Backup process for GitHub repositories
There currently isn't a solution for scripted/automated import of repositories from GitHub into GitLab (with additional metadata such as issues, PRs, and wiki pages). There are three potential options to resolve this.
1. Bulk import manually
In the GitHub import integration, there is a section describing a bulk import interface. This import is started manually, but doesn't require an active browser session once running (the user can navigate away). A filter can be used to pare down the large number of managed repositories, and there is a quick way to choose the namespace and project name for each. This is the option that requires the least work, needing only the GitHub integration to be activated and set up. There are some manual steps in this option, but they are quick and guarantee all the data is brought over.
2. UI Automation
On the project creation screen, there is an option to import a GitHub project, which calls out various metadata sections of GitHub repositories as being imported. There are a few setup steps before this import process can be used, but they amount to basic linking of the current GitLab account to a GitHub account.
One limitation of this option is that imported users are matched on a publicly available email address or on a GitLab account linked in GitHub. As it isn't feasible to ask people to link their GitHub accounts to this GitLab instance, matching will most likely be done on public email addresses. This presents an issue: data imported for users with noreply/private email addresses will be attributed to the webmaster account (or whoever is importing the data). This broken link will cause problems in managing things such as active PRs or issues, as their owners will no longer be able to manage them as easily.
An additional limitation is that large changes to the import process or UI will break the automation. This means more maintenance as the platform evolves, and extra tests that must be run whenever the platform is updated.
This could be accomplished using something like Node.js, selenium-webdriver, and headless Chrome, so that the automation could run in the background. There would be development and maintenance overhead with this option, but it likely wouldn't be very high unless a blocker turns up somewhere.
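To make the automation maintainable as the UI changes, the navigation steps could be kept as a declarative plan separate from the driver code. A minimal sketch of that idea (Python here for illustration; the actual implementation would likely use the Node.js + selenium-webdriver stack described above, and every selector and path below is a hypothetical placeholder):

```python
# Declarative plan for the import flow. Keeping selectors in one table means
# UI changes only require updating this list, not the driver logic.
# All selectors/paths are hypothetical placeholders, not real GitLab markup.
IMPORT_STEPS = [
    ("open", "/projects/new#import_project"),  # project creation screen
    ("click", "a.js-import-github"),           # hypothetical GitHub tab
    ("fill", "input#personal_access_token"),   # GitHub PAT field
    ("click", "button[type=submit]"),          # start the import
]


def describe_plan(steps):
    """Render the automation plan as readable lines, e.g. for logging."""
    return [f"{action}: {target}" for action, target in steps]


if __name__ == "__main__":
    # Actually driving these steps would open a headless Chrome session via
    # Selenium; that part is omitted here because it requires a browser binary.
    print("\n".join(describe_plan(IMPORT_STEPS)))
```

The driver would walk `IMPORT_STEPS` in order, which also makes the extra regression tests mentioned above easier to write against a single table.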
3. Import API
The API includes an import endpoint that indicates support for GitHub. In local tests, however, this endpoint appears to be broken, returning 500 responses. The error seems to be caused by bad nil checks or bad comparisons deep in the import code:
{
  "time": "2021-02-26T16:51:13.755Z",
  "severity": "INFO",
  "duration": 20.37,
  "db": 0,
  "view": 20.37,
  "status": 500,
  "method": "POST",
  "path": "/api/v4/import/github",
  "params": [
    {
      "key": "personal_access_token",
      "value": "[FILTERED]"
    },
    {
      "key": "repo_id",
      "value": "264273415"
    },
    {
      "key": "target_namespace",
      "value": "2212"
    }
  ],
  "host": "localhost",
  "remote_ip": "172.17.0.1, 127.0.0.1",
  "ua": "insomnia/2020.3.3",
  "route": "/api/:version/import/github",
  "exception.class": "NoMethodError",
  "exception.message": "undefined method `namespace_path' for nil:NilClass",
  "exception.backtrace": [
    "app/services/import/github_service.rb:37:in `target_namespace'",
    "app/services/import/github_service.rb:45:in `authorized?'",
    "app/services/import/github_service.rb:9:in `execute'",
    "lib/api/import_github.rb:40:in `block in <class:ImportGithub>'",
    "lib/api/api_guard.rb:168:in `call'"
  ],
  "queue_duration": 13.73,
  "correlation_id": "tfUNr5h7ez"
}
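For reference, the failing call from the log above can be reproduced with a request shaped like the sketch below (the instance URL and token are placeholders; the endpoint path, `repo_id`, and `target_namespace` values are taken from the log):

```python
from urllib import parse, request

GITLAB_URL = "http://localhost"  # placeholder instance URL


def build_github_import_request(token, repo_id, target_namespace):
    """Build the POST /api/v4/import/github request that produced the 500."""
    params = {
        "personal_access_token": token,        # GitHub PAT, filtered in the log
        "repo_id": repo_id,                    # numeric GitHub repository ID
        "target_namespace": target_namespace,  # GitLab namespace ID
    }
    return request.Request(
        f"{GITLAB_URL}/api/v4/import/github",
        data=parse.urlencode(params).encode(),
        method="POST",
    )


req = build_github_import_request("<github-pat>", "264273415", "2212")
print(req.full_url)  # http://localhost/api/v4/import/github
```

Sending this with `urllib.request.urlopen(req)` against the local instance is what produced the `NoMethodError` above.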
Additionally, unlike other areas of the documentation, there is no documentation of what data this call imports. I've created an issue in the GitLab issue tracker for this, but I seriously doubt this API will get much support, as the UI seems to be the main focus of the import experience.
Relying on this endpoint would likely mean a long wait before these issues are addressed, causing large delays to the investigation of this option.