Is it time to retire the IP Log generator?
Section IV of the Eclipse Foundation's Intellectual Property Policy [1] describes our requirements for record keeping. It does not specifically describe IP Logs as we capture them today. The current implementation of the generated IP Log is a derivative of the original implementation and is based on requirements implied by that original implementation. Note that Implementing the IP Policy [2] makes no mention of record keeping.
The implementation of the IP Log Generator is generally stable and so does not itself require a lot of maintenance. It draws data from a collection of sources that are themselves generally stable. There is, however, some complexity and subtlety in the implementation that makes potential future maintenance challenging. In some cases, generating an IP Log and submitting it for review can get expensive (to mitigate this, we limit that ability to committers).
IP Logs currently capture the project's licensing scheme, list of source code repositories, list of committers (along with their active/inactive dates), third-party content (separated by content type), and commits by non-committers. It's not clear to me that these IP Logs are actually used by anybody (IBM used to grab them, but it's been a very long time since anybody has asked me about getting access to IP Logs). All of this information is available in other, perhaps better, sources.
The PMI provides access to license, repository, and committer information.
The Git log (e.g., shortlog) provides a list of contributions by committers and contributors. Note that the original implementation pulled contributor information from Bugzilla because this information was not available in CVS. Author information is available in Git.
For example:
$ git shortlog --email
Ahmed Ashour <asashour@yahoo.com> (7):
      Fix eclipse checkstyle errors (#381)
      Correct JavaDoc about not deprecated method (#380)
      DateTime support conversion to Instant (#382)
      Ignore test-output folders (#383)
      Fix eclipse checkstyle warnings (#384)
      Ignore test-output folders (#383) (#385)
      Use Path instead of File (#386)
Ari Suutari <ari.suutari@syncrontech.com> (3):
      Add simple address space browsing example
      Wait for SHARED_EXECUTOR_SERVICE and EXECUTOR_SERVICE to stop. (#72)
      Add stubs for HistoryRead and HistoryUpdate services (#110)
...
The wrinkle here is that current IP Logs snapshot a moment in time. In principle, we can play back the Git log from a particular tag or branch to get a historical view of any particular release. We don't generally allow rewriting history in Git repositories, but it does occasionally happen, so we might lose that historical information in the rare cases where content is removed from a repository (rewriting long-past history is extremely rare). We might also lose some information when a project team decides to retire a repository (though we generally archive rather than delete, so even retired repositories generally persist).
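Playing back history for a particular release might look like the following minimal sketch, which asks Git for per-author commit counts reachable from a given tag or branch. The helper names (`contributors_at`, `summary_counts`) are hypothetical; the Git options used are the standard `shortlog` flags `-s` (summary counts), `-n` (sort by count), and `-e` (show email).

```python
import subprocess


def summary_counts(text: str) -> dict[str, int]:
    """Parse `git shortlog -sne` lines of the form "<count>\\t<Name> <email>"."""
    counts: dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        count, _, author = line.strip().partition("\t")
        counts[author] = int(count)
    return counts


def contributors_at(ref: str, repo: str = ".") -> dict[str, int]:
    """Commit counts per author for the history reachable from `ref`."""
    # Pass an explicit revision: with none given and stdin not a terminal,
    # git shortlog summarizes a log read from standard input instead.
    out = subprocess.run(
        ["git", "-C", repo, "shortlog", "-sne", ref],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return summary_counts(out)
```

For example, `contributors_at("v1.0.0", "path/to/repo")`, where the tag name and path are placeholders. The result is only as complete as the history that survives in the repository, which is exactly the caveat about rewritten history above.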
Another wrinkle is that some of our older projects have commits, migrated from CVS, that record only committer information. That is, commits that predate our move to Git do not record author information for contributions; rather, these commits are attributed exclusively to the committer who pushed them. The IP Log generator accounts for this by pulling contribution information out of Bugzilla. With the prospect of Bugzilla being retired, this source of information may disappear. Regardless, our historical record of IP Log reviews captures all of this information.
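As an illustration of how the "commits by non-committers" part of an IP Log might be derived from Git alone, here is a minimal sketch. It assumes we already have the set of committer email addresses (for example, exported from the PMI; how that list is obtained is left out) and that matching on email address is good enough, which is a simplification since people often commit under several addresses. `read_log` and `non_committer_commits` are hypothetical helpers. As noted above, this cannot recover contributions in CVS-migrated commits, because those commits carry the committer as the author.

```python
import subprocess


def read_log(ref: str, repo: str = ".") -> str:
    """One line per commit: sha, author email, committer email (tab-separated)."""
    return subprocess.run(
        ["git", "-C", repo, "log", "--format=%H%x09%ae%x09%ce", ref],
        check=True,
        capture_output=True,
        text=True,
    ).stdout


def non_committer_commits(log_text: str, committers: set[str]) -> list[tuple[str, str, str]]:
    """Return (sha, author email, committer email) for commits not authored by a committer."""
    known = {email.lower() for email in committers}  # compare case-insensitively
    result = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        sha, author_email, committer_email = line.split("\t")
        # CVS-migrated commits record the committer as the author, so
        # contributions pushed on someone's behalf there are invisible here.
        if author_email.lower() not in known:
            result.append((sha, author_email, committer_email))
    return result
```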
Reporting of third-party content has always been spotty; it has always depended on committers correctly recording all of their dependencies via IPzilla. This data has always missed, for example, third-party content leveraged indirectly via linking to other Eclipse project content. With the policy changes that remove the need for piggyback CQs and that leverage ClearlyDefined data, there are even more holes.
The third-party library holes are nicely filled by the new Dash License Tool, which generates a "DEPENDENCIES" file. I believe that committers should generate this file and commit it directly to the Git repository. This would capture the dependencies in Git commits, providing a nice historical record.
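With a committed DEPENDENCIES file, each release tag carries its own dependency list. The sketch below reads such a file and lists the entries that still need attention. It assumes each non-blank line has four comma-separated fields (content identifier, license, status, authority); that layout is an assumption to verify against the Dash License Tool's documentation. The helper names, and the choice to treat any status other than `approved` as needing review, are also assumptions.

```python
from typing import NamedTuple


class Dependency(NamedTuple):
    content_id: str  # e.g. a ClearlyDefined-style id (assumed)
    license: str     # SPDX license expression
    status: str      # e.g. "approved" or "restricted" (assumed values)
    authority: str   # who vetted the license (assumed field)


def parse_dependencies(text: str) -> list[Dependency]:
    """Parse DEPENDENCIES content, one four-field entry per non-blank line."""
    deps = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 comma-separated fields, got {len(fields)}")
        deps.append(Dependency(*fields))
    return deps


def needs_review(deps: list[Dependency]) -> list[Dependency]:
    """Entries whose status is anything other than "approved"."""
    return [dep for dep in deps if dep.status != "approved"]
```

To check the file as it stood at a particular release, its contents could be read with `git show <tag>:DEPENDENCIES` and passed to `parse_dependencies`.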
In cases where dependencies cannot be automatically extracted using the build tool (e.g., CMake builds), project teams will be required to maintain a DEPENDENCIES file manually. In these cases, project teams generally have to provide a list of dependencies in build instructions anyway, so this shouldn't be particularly onerous.
The TL;DR is that we can make each Git repository its own record of intellectual property contributions.
So... I propose that we retire the IP Log generator. Further, I propose that we stop using IPzilla to request and record IP Log reviews, and instead make reviewing the project's intellectual property tracking practices (which is effectively what an IP Log review is anyway) part of the EDP review process (e.g., release reviews).
The primary benefit is that this will remove a massive chunk of code that provides functionality that I believe is of limited additional value over and above what already exists.
What have I missed?
[1] https://www.eclipse.org/org/documents/Eclipse_IP_Policy.pdf
[2] https://www.eclipse.org/org/documents/implementing_the_ip_policy.pdf