Add an API to get a list of repositories for a project
We need a way to get the list of repositories for individual projects. We currently need this to gather project metrics and to identify repositories that require review by the IP Team. As we move to implement metrics via an external provider, we're going to need a consistent and fully supported means of getting the list of repositories.
Here's what the scripts that I maintain currently do.
We start by getting the project metadata via an API call to projects.eclipse.org (e.g., https://projects.eclipse.org/json/project/adoptium.aqavit). From that, we get:
- A list of repository URLs from the `source_repo` field;
- Zero or more GitHub organisations from the `github_org` field; and
- Zero or more GitLab groups from the `gl_project_group` field, and any excluded subgroups from the `gl_excl_sub_groups` field.
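A minimal Python sketch of this step follows. It assumes the endpoint wraps the project as `{"projects": {"<project id>": {...}}}`, reads repository URLs from the `source_repo` key (as it appears in the Dash example below), and assumes `github_org` uses the same list-of-`{"value": ...}` shape as the GitLab fields. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: endpoint format taken from the example URL above.
METADATA_URL = "https://projects.eclipse.org/json/project/{project_id}"


def fetch_project_metadata(project_id: str) -> dict:
    """Fetch one project's metadata (assumes {"projects": {project_id: {...}}})."""
    with urllib.request.urlopen(METADATA_URL.format(project_id=project_id)) as response:
        payload = json.load(response)
    return payload["projects"][project_id]


def _values(metadata: dict, field: str) -> list[str]:
    """Return the "value" strings of a multi-valued field; missing or null fields give []."""
    return [entry["value"] for entry in (metadata.get(field) or []) if entry.get("value")]


def extract_repository_sources(metadata: dict) -> dict[str, list[str]]:
    """Collect the inputs needed to build a project's repository list."""
    return {
        "source_repos": [repo["url"] for repo in (metadata.get("source_repo") or []) if repo.get("url")],
        "github_orgs": _values(metadata, "github_org"),
        "gitlab_groups": _values(metadata, "gl_project_group"),
        "gitlab_excluded": _values(metadata, "gl_excl_sub_groups"),
    }
```

Applied to the trimmed Dash excerpt below, this yields one `source_repos` entry, no GitHub organisations, the `eclipse/technology/dash` group, and one excluded subgroup.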
There is potentially some duplication between what is listed in the `source_repo` field and what we find in the GitHub organisations, so I filter for that.
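One way to do that filtering, sketched in Python: compare URLs in a normalised form so that, for example, `https://github.com/eclipse/dash-licenses` and `https://github.com/eclipse/dash-licenses.git` count as the same repository. Comparing case-insensitively is an assumption (it holds for GitHub owner and repository names), and the helper names are hypothetical.

```python
def normalize_repo_url(url: str) -> str:
    """Comparison key only: trimmed, lowercased, no trailing slash or ".git" suffix."""
    url = url.strip().rstrip("/")
    if url.endswith(".git"):
        url = url[: -len(".git")]
    return url.lower()


def merge_repositories(*sources: list[str]) -> list[str]:
    """Merge URL lists in order, keeping the first spelling seen for each repository."""
    seen: set[str] = set()
    merged: list[str] = []
    for source in sources:
        for url in source:
            key = normalize_repo_url(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged
```

Passing the `source_repo` URLs first means the spelling recorded in the project metadata wins over the one returned by the GitHub API.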
For each of the GitHub organisations, I use the GitHub API to get the list of repositories.
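A sketch of the GitHub step in Python, using the REST endpoint `GET /orgs/{org}/repos` with `per_page`/`page` pagination. The request is unauthenticated here for brevity (a real script would send a token to avoid rate limits and to see non-public repositories), and the `fetch` parameter is a hypothetical injection point so the paging logic can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable

GITHUB_API = "https://api.github.com"


def github_get(url: str) -> list[dict]:
    """Minimal unauthenticated GET returning the decoded JSON list."""
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def list_github_org_repos(org: str,
                          fetch: Callable[[str], list[dict]] = github_get,
                          per_page: int = 100) -> list[str]:
    """Return the html_url of every repository listed for a GitHub organisation."""
    urls: list[str] = []
    page = 1
    while True:
        batch = fetch(f"{GITHUB_API}/orgs/{org}/repos?per_page={per_page}&page={page}")
        urls.extend(repo["html_url"] for repo in batch)
        if len(batch) < per_page:  # a short (or empty) page means there are no more pages
            return urls
        page += 1
```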
For each of the GitLab groups, I use the GitLab API to get the list of repositories. I do this recursively so that subgroups are included, pruning any excluded subgroup (and everything beneath it) during the recursion. Note that, AFAICT, this "excluded groups" feature isn't actually used by any projects currently.
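The recursive GitLab walk can be sketched against the GitLab REST API (`GET /groups/:id/projects` and `GET /groups/:id/subgroups`, where `:id` may be the URL-encoded full path of the group). The sketch assumes the `gitlab.eclipse.org` base URL, passes `with_shared=false` so that projects merely shared into a group are not counted, and matches excluded groups by full path; the function names and the `fetch` injection point are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable

GITLAB_API = "https://gitlab.eclipse.org/api/v4"


def gitlab_get_all(url: str) -> list[dict]:
    """GET every page of a GitLab list endpoint (unauthenticated, 100 items per page)."""
    items: list[dict] = []
    page = 1
    sep = "&" if "?" in url else "?"
    while True:
        with urllib.request.urlopen(f"{url}{sep}per_page=100&page={page}") as response:
            batch = json.load(response)
        items.extend(batch)
        if len(batch) < 100:
            return items
        page += 1


def list_gitlab_group_repos(group: str, excluded: set[str],
                            fetch: Callable[[str], list[dict]] = gitlab_get_all) -> list[str]:
    """Recursively collect repository URLs under a group, pruning excluded subgroups."""
    if group in excluded:
        return []  # skip this subgroup and everything beneath it
    encoded = urllib.parse.quote(group, safe="")  # e.g. "eclipse%2Ftechnology%2Fdash"
    urls = [p["http_url_to_repo"]
            for p in fetch(f"{GITLAB_API}/groups/{encoded}/projects?with_shared=false")]
    for subgroup in fetch(f"{GITLAB_API}/groups/{encoded}/subgroups"):
        urls.extend(list_gitlab_group_repos(subgroup["full_path"], excluded, fetch))
    return urls
```

Checking exclusions before making any request for a group means a pruned branch costs no API calls at all.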
With a full list of repositories gathered in this manner, I exclude some repositories that I know to be mirrors or that otherwise do not contain Eclipse project code. This includes a number of OpenJDK repositories from the Adoptium subprojects and a bunch of mirror/third-party repositories under Oniro. I also skip all "website" repositories on the assumption that they do not contain project code. These exclusions are all done with a nasty bit of hard-coded regular expressions that an earlier version of myself decided would be a temporary hack.
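A sketch of one way to keep those exclusions as data rather than code, so that each rule carries a reason and the rules could eventually be maintained alongside the project metadata. Only the "website" rule reflects something described above; the rule format and all names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ExclusionRule:
    pattern: re.Pattern
    reason: str


# Hypothetical rule set; the OpenJDK and Oniro mirror rules are not reproduced here.
RULES = [
    ExclusionRule(re.compile(r"/[^/]*website[^/]*$", re.IGNORECASE),
                  "website repositories are assumed not to contain project code"),
]


def apply_exclusions(urls: list[str],
                     rules: list[ExclusionRule]) -> tuple[list[str], dict[str, str]]:
    """Split URLs into kept repositories and excluded ones (mapped to the matching reason)."""
    kept: list[str] = []
    excluded: dict[str, str] = {}
    for url in urls:
        rule = next((r for r in rules if r.pattern.search(url)), None)
        if rule is None:
            kept.append(url)
        else:
            excluded[url] = rule.reason
    return kept, excluded
```

Returning the excluded repositories with their reasons makes it possible to report what was skipped instead of silently dropping it.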
I currently have this all implemented twice: once in PHP, and once in Java.
I would very much like to have this all implemented once and maintained by the same team that decides how all of this information is represented.
By way of example, the Eclipse Dash metadata contains this (trimmed):
```
...
"gl_excl_sub_groups" : [
  {
    "value" : "eclipse/technology/dash/sync-script-testing/exclude-test"
  }
],
"gl_project_group" : [
  {
    "value" : "eclipse/technology/dash"
  }
],
...
"source_repo" : [
  {
    "name" : "dash-licenses",
    "path" : "https://github.com/eclipse/dash-licenses",
    "type" : "github",
    "url" : "https://github.com/eclipse/dash-licenses"
  }
],
...
```
Which gives me this list of repositories:
- https://github.com/eclipse/dash-licenses
- https://gitlab.eclipse.org/eclipse/technology/dash/eclipse-api-for-java.git
- https://gitlab.eclipse.org/eclipse/technology/dash/eclipse-project-code.git
- https://gitlab.eclipse.org/eclipse/technology/dash/org.eclipse.dash.handbook.git