Migrating projects.eclipse.org is going to be a huge undertaking. This could be seen as an opportunity to migrate away from Drupal for projects.eclipse.org if we decide that's the best approach.
I believe this is at minimum a one-year project for one full-time developer.
This does not include the time that I need to manage/oversee the project. I also believe this project will require @wbeaton to be involved.
From my perspective, I am trying to understand if we need staff from my team to help with this initiative in 2022.
I'm thinking that we have an opportunity to rethink and revamp what we're doing with the PMI.
Our world was different when we created the PMI. We weren't even on GitHub at that point in time and repositories were hard to find. GitHub revolutionized how people find things. I believe that most of the community finds Eclipse projects through GitHub and other means that are not the PMI. The PMI was far more important as a community development tool back in the day than it is now. In fact, as more of our projects move off of straight up Git and Gerrit, I'll suggest that the PMI will become interesting and useful only to a very small number of insiders.
I am very much on a track to push more information about a project into the project's Git repositories. It makes more sense to me, for example, to get projects to use a standard change log format to capture information about releases (rather than having us populate the "issues" tab in a release record). If we do this, then, couldn't the change log drive any list of releases that we generate? That is... the change log becomes the release records. Of course, then, the bigger question is whether or not we have to generate a list of releases?
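To illustrate how a change log could drive a list of releases, here is a minimal sketch. It assumes projects adopt a "Keep a Changelog"-style layout in which each release has a heading such as `## [1.1.0] - 2021-11-03`; that format choice, and the names `Release` and `releases_from_changelog`, are assumptions for illustration and not something we have agreed on.

```python
import re
from typing import List, NamedTuple


class Release(NamedTuple):
    version: str
    date: str


# Assumed heading format: "## [1.1.0] - 2021-11-03".
# Headings without a date (e.g. "## [Unreleased]") are deliberately skipped.
HEADING = re.compile(
    r"^##\s+\[(?P<version>[^\]]+)\]\s+-\s+(?P<date>\d{4}-\d{2}-\d{2})\s*$"
)


def releases_from_changelog(text: str) -> List[Release]:
    """Return the releases in the order they appear in the change log."""
    releases = []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            releases.append(Release(match["version"], match["date"]))
    return releases
```

If something like this works across projects, the list of releases is derived data and no separate release record needs to be maintained.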
In our first project portal, we depended on projects providing certain files in their repositories to provide information like a description of the project. This was a bit cumbersome back in the day, but I think that it may be worth thinking about in the modern context. Providing READMEs, LICENSE files, and more in a repository is far more natural today than it had been previously. That many of our projects have multiple repositories may complicate this a bit (along with the fact that we use the PMI to keep track of a project's repositories), but I expect that there is an easy solution for this.
The things that come immediately to mind as needing automation support are elections and the resulting paperwork process.
In the past, we've discussed doing something with Hugo. I'm interested in exploring this.
In any case, before we go too far down a reimplementation rabbit hole (which is what migrating to Drupal 9 sounds like), we should engage in a requirements gathering process to see if our actual needs would be met by engaging in that sort of activity. At the very least, we can shed some unnecessary functionality to help streamline a migration.
I do like this idea, especially if we can streamline the migration.
How should we get started on the requirement gathering process?
I would like to better understand what features my team will be expected to migrate in 2022 for planning purposes.
At this point in time, I still believe Drupal is the right solution for projects.eclipse.org but I might change my mind once the list of requirements is set.
We can do some piecemeal work by killing off things like the issues tab in releases, but that's probably not going to get as far along as we'd like.
My sense is that if we back up, make a problem statement, and start capturing requirements, we'll end up with a shorter list than we expect.
I need to think a bit about the nature of the problem that I think we're actually trying to solve.
In the meantime, here are some thoughts:
As a general principle, data that we don't want project teams manipulating should be in the Foundation DB:
Project name;
Whether or not a project is a specification project and the selection of patent license;
Project license and scope;
Git repository URLs;
Creation and termination date (we currently guess this based on the start date of the first committer, and end date of the last); and
Committers, project leads, mentors (already there, but included for completeness).
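The way creation and termination dates are currently guessed (earliest committer start, latest committer end) can be sketched as follows. The `CommitterTerm` record is a hypothetical shape, not the Foundation DB schema, and treating a project as not terminated while any committer is still active is an assumption on my part.

```python
from datetime import date
from typing import List, NamedTuple, Optional, Tuple


class CommitterTerm(NamedTuple):
    start: date
    end: Optional[date]  # None means the committer is still active


def guess_project_dates(
    terms: List[CommitterTerm],
) -> Tuple[Optional[date], Optional[date]]:
    """Guess (creation, termination) dates from committer terms."""
    if not terms:
        return None, None
    created = min(term.start for term in terms)
    # Assumption: a project with an active committer has not terminated.
    if any(term.end is None for term in terms):
        return created, None
    return created, max(term.end for term in terms)
```

Storing both dates explicitly in the Foundation DB would make this guesswork unnecessary.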
Putting all of this stuff in the Foundation DB should actually make it easier for webmaster. I believe that we've made a decision that the Foundation DB is the source of truth (at one point my intention was to make the PMI the source of truth for project stuff, but that's not true any more). So, let's double down on making it the source of truth and stop replicating information.
Other thoughts:
Move the simultaneous release tracking stuff out of the PMI (e.g., make it an IDE working group problem);
Pointers to builds belong in the README in the Git repositories;
Most of the documentation fields (e.g., "build help") belong in the README or CONTRIBUTING file;
We need a means of making an association between projects and reviews.
We are in the strange predicament where doing nothing is not an option, moving the PMI to D9 is a lot of work and not the best use of resources, but finding a new home for those pieces that are used, and fixing the parts that do depend on PMI data, is also going to be a lot of work and disruptive.
I think we should all share a screen, go through PMI, piece by piece, and document if it's still needed or not, what depends on it and where it should end up.
This could be a way to get started. However, I think the PMI does more than what users can see. I feel we will need to dig into the code to have a better understanding of what it's actually doing.
One thing is certain: with this site and all the others, we need to document what it's currently doing. Once we have that information, we will need to decide what stays and what needs to go.
This is a bit of a swing back to what we used to do, but I'm thinking that we should use Git to represent most project information.
We could set up a repository specifically for project information where we can keep a lot of the project information in a machine readable format. We could, for example, create some sort of project data file with this sort of information:
name, by-line
description and scope
license (expressed as an SPDX expression)
list of repositories
communication channels (e.g., mailing lists)
pointer to CI
list of review dates, results, and pointer to tracking issue
related projects
associated trademarks
related working groups (though this probably belongs with working group data)
categorization, tags
...
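As a rough sketch of what such a project data file might look like, the example below uses JSON and covers only a few of the fields listed above. The file format, the field names, and all of the values (`technology.example`, the `example.org` URL) are hypothetical placeholders, not a proposed schema.

```python
import json
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class ProjectInfo:
    project_id: str
    name: str
    license: str  # SPDX expression, e.g. "EPL-2.0"
    repositories: List[str] = field(default_factory=list)
    mailing_lists: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)


REQUIRED = ("project_id", "name", "license")

# Illustrative data file content; every value here is a placeholder.
EXAMPLE = """
{
  "project_id": "technology.example",
  "name": "Eclipse Example",
  "license": "EPL-2.0",
  "repositories": ["https://example.org/git/example.git"],
  "mailing_lists": ["example-dev"]
}
"""


def parse_project(text: str) -> ProjectInfo:
    """Parse one project data file, rejecting files that lack required fields."""
    data = json.loads(text)
    missing = [key for key in REQUIRED if key not in data]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    known = {f.name for f in fields(ProjectInfo)}
    # Unknown keys are ignored so the file can grow without breaking readers.
    return ProjectInfo(**{k: v for k, v in data.items() if k in known})
```

A check like `parse_project` could run against merge requests so that malformed files are caught before anyone reviews them.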
The benefits of using a Git repository for this are (to start) very easy tracking of change and the ability for literally anybody to suggest changes via merge requests (which should reduce maintenance burden). That, and all of the information is in one location. I'm thinking that, at least for now, EF staff would be the only ones who can merge a merge request.
If it's in a machine readable format, then developing an API to grab this information should be relatively straightforward. We then use that API to populate "project information" along the lines of what the PMI does.
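To give a sense of how small that API layer could be, here is a sketch that indexes a checkout of the project information repository and answers lookups. It assumes one JSON file per project, named after the project ID; that layout and the function names `load_projects` and `project_response` are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, Tuple


def load_projects(directory: Path) -> Dict[str, dict]:
    """Index every <project-id>.json file in a checkout of the data repository."""
    projects = {}
    for path in sorted(directory.glob("*.json")):
        projects[path.stem] = json.loads(path.read_text(encoding="utf-8"))
    return projects


def project_response(projects: Dict[str, dict], project_id: str) -> Tuple[int, str]:
    """Return (HTTP status, JSON body) the way a request handler might."""
    project = projects.get(project_id)
    if project is None:
        return 404, json.dumps({"error": "unknown project"})
    return 200, json.dumps(project, sort_keys=True)
```

Whatever renders the "project information" page would call something like `project_response` rather than reading Drupal nodes.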
I'm thinking that we leverage the release support provided in GitLab and GitHub; I don't think that having any additional notion of releases in our data is required (I assume that we can get at the releases via the GitLab/GitHub APIs if we decide we need that). We do need to keep track of reviews, but adding them to the project's data file makes sense to me.
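As a quick check that releases are reachable this way, here is a sketch against the GitHub REST API's "list releases" endpoint (`GET /repos/{owner}/{repo}/releases`); GitLab has a comparable endpoint under `/projects/:id/releases`. The function names and the shape of the summary are my own choices, and authentication, pagination, and rate limiting are left out.

```python
import json
import urllib.request
from typing import List

GITHUB_API = "https://api.github.com"


def releases_url(owner: str, repo: str) -> str:
    """Build the URL of GitHub's 'list releases' endpoint for a repository."""
    return f"{GITHUB_API}/repos/{owner}/{repo}/releases"


def summarize_releases(payload: str) -> List[dict]:
    """Reduce the API's JSON response to the fields a project page needs."""
    return [
        {
            "tag": release["tag_name"],
            "name": release.get("name"),
            "published": release.get("published_at"),
        }
        for release in json.loads(payload)
        if not release.get("draft", False)  # drafts are not public releases
    ]


def fetch_releases(owner: str, repo: str) -> List[dict]:
    """Fetch and summarize the first page of releases (unauthenticated)."""
    request = urllib.request.Request(
        releases_url(owner, repo),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return summarize_releases(response.read().decode("utf-8"))
```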
As we revamp the IP due diligence process, I'm thinking that the IPZilla integration just disappears. I'd like to stick as close as we can to an "off the shelf" solution for a replacement, so -- at least at this point -- I think that this support just goes away, or becomes a separate concern.
This leaves us with:
Committer and project lead elections. When we created the PMI, I toyed with just running these in the project mailing list. I'm willing to revisit that, but I expect that we'll end up still wanting something to support elections. In any case, we'll need the committer paperwork bits that follow committer elections.
Tracking committer and project lead status. We use the database for this. Again, I presume that we can use an API to display this information on a "project information" page from there. Not duplicating this information in whatever we create to replace the PMI should be a goal.
Contributors. Not sure what to do with this.
IP Log Generator. As we turn our attention to doing SPDX-style SBOMs, I think that this just goes away from the PMI. If anything, it will end up being something that gets generated by the IPZilla replacement (most likely ORT at this point).
Create a Review. @mdelgado624 and I need to think about what we're going to do here. We've started down a path where the EMO initiates progress reviews, but it's not clear to me how we should handle reviews initiated by committers (e.g., release reviews for specification projects).
Tracking specification projects. IMHO, this belongs in the Foundation DB.
Charts. We need to continue to provide the various charts and graphs of project activity. These currently have no behaviour beyond displaying content in the PMI.
We can considerably simplify the way that we handle Git repositories. The initial design intentionally leaned on committers to provide repository information, so there's a lot of code in there to help committers get it right (and resolve conflicts when they get it wrong). If we lock this down and only let the EF staff set repository information, then we can drop a lot of code (especially if we retire Gerrit).
Currently we have:
A background process (in the project-services repository) that queries Gerrit for a list of repositories and puts the results into a table in the dashboard database that enables committers to do wildcard matches in the repository list (e.g., you can specify your repositories as "/gitroot/dash/*"). Note that this is not used for content assist. Rather, the wildcard match happens whenever the repositories are displayed (i.e., there's a DB query). At least in part, this was implemented in this manner because there was initially a separation between webmaster and the PMI: when webmaster created a new repository on Git (pre-Gerrit), that change was not reflected in the PMI until somebody noticed and added it. The use of the "/gitroot/" prefix reflects the state of the world when this was created: this was before Gerrit, when all of our repositories could be accessed "locally" via an NFS mount.
A custom Drupal field display hook that, when displaying the "Source Repositories" field, rolls in the GitHub and GitLab repositories. Unfortunately, when the "Source Repositories" field is empty, the field does not display and it looks like the project has no repositories. To force the field to display, I've been putting a /gitroot/bogus/* wildcard into the field which matches to nothing, but invokes the hooks that include the GitHub and GitLab stuff. I'm thinking that we should stop rolling these fields together and just display them separately (should we choose to keep this on Drupal). There's more gory detail here.
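For anyone unfamiliar with the wildcard behaviour described above, the sketch below shows the idea using Python's shell-style `fnmatch` matching. The real implementation is a database query at display time, so this is an approximation of the behaviour rather than the PMI's code, and the repository paths in the usage note are made up.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def expand_repositories(patterns: Iterable[str], known: Iterable[str]) -> List[str]:
    """Expand wildcard entries against the harvested list of repository paths."""
    known = list(known)
    matched: List[str] = []
    for pattern in patterns:
        for path in known:
            # Note: fnmatch's "*" also matches "/", so nested paths match too.
            if fnmatchcase(path, pattern) and path not in matched:
                matched.append(path)
    return matched
```

With a harvested list containing `/gitroot/dash/a.git` and `/gitroot/other/b.git`, the pattern `/gitroot/dash/*` expands to the first path only, while `/gitroot/bogus/*` expands to nothing, which is why it works as a placeholder that only triggers the display hook.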
There's also some infrastructure to support mailing lists that we can probably remove/simplify, depending on what course we choose to take.
Again, there's a background script that harvests information about mailing lists and puts it into a table in the dashboard database. This data is used to provide content assist in the mailing list fields.
The table is also used when displaying information about a mailing list. What we actually capture in the PMI is the mailing list name (e.g., "dash-dev"), but we display the URL to the subscribe page and provide a link to actually send email.
The script actually stores the name, subscribe URL, and email address. During the time of multiple forges, we couldn't just guess at the URL and address from the name so the table wound up being the source of that information.
If we decide that EF sets the mailing list info, then we don't need the content assist. We may be able to do away with the table altogether if we decide that mailing list information is set by EF staff.
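The display side of the mailing list table amounts to a lookup from the stored name to the subscribe URL and address. A minimal sketch, with a hypothetical `MailingList` record standing in for a table row and placeholder URLs in the test data:

```python
from typing import Dict, NamedTuple, Optional


class MailingList(NamedTuple):
    name: str           # what the PMI stores, e.g. "dash-dev"
    subscribe_url: str  # harvested by the background script
    address: str        # harvested by the background script


def render_links(name: str, table: Dict[str, MailingList]) -> Optional[Dict[str, str]]:
    """Return the links shown for a mailing list, or None if it is unknown."""
    entry = table.get(name)
    if entry is None:
        return None
    return {"subscribe": entry.subscribe_url, "email": f"mailto:{entry.address}"}
```

If EF staff record the URL and address directly alongside the name, this lookup (and the harvesting script that feeds it) is no longer needed.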
I just made a commitment to retire IPZilla in 2022.
As part of this, the related bits (specifically, the projects_iplog, iplog_generator, and ipzilla modules) from the PMI can be retired. We'll have to coordinate this with the creation of replacements (which at this point I am thinking are GitLab issue templates).