Debian SPDX data imported into Fossology should not have any LicenseInfoInFile data
Why Debian SPDX data imported into Fossology should not have any LicenseInfoInFile data but only LicenseConcluded data
Fossology
In Fossology, SPDX LicenseInfoInFile data are imported by reportImport as scanner findings, while LicenseConcluded data are imported as auditor decisions.
The main difference is scope. Fossology stores scanner findings against a file checksum, so a finding on one file applies to every other occurrence of that file (same checksum) in any other upload, including future uploads and regardless of file name, provided that the same scanner agent (here, the reportImport agent) has also been scheduled on that other upload. Decisions, by contrast, are stored against a particular upload (and a particular audit group), so they are never applied to other uploads unless someone manually schedules a reuser agent job.
Importing license metadata extracted from debian/copyright as LicenseInfoInFile therefore means that the metadata apply not only to the upload against which the reportImport agent is run, but also to every file with the same checksum in any other upload for which the reportImport agent is run.
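The difference in scope can be illustrated with a toy model. This is only a sketch of the behaviour described above, not Fossology's actual schema or API: the class ToyFossology, its method names, the upload names and the checksums are all hypothetical.

```python
from collections import defaultdict


class ToyFossology:
    """Toy model of the two storage scopes (not Fossology's real schema)."""

    def __init__(self):
        # Scanner findings: keyed by file checksum only.
        self.findings = defaultdict(set)
        # Auditor decisions: keyed by (upload, checksum).
        self.decisions = {}
        # Uploads on which the hypothetical import agent has been run.
        self.agent_ran_on = set()

    def import_info_in_file(self, upload, checksum, license_id):
        """Import a LicenseInfoInFile value as a scanner finding."""
        self.agent_ran_on.add(upload)
        self.findings[checksum].add(license_id)

    def import_concluded(self, upload, checksum, license_id):
        """Import a LicenseConcluded value as an auditor decision."""
        self.agent_ran_on.add(upload)
        self.decisions[(upload, checksum)] = license_id

    def findings_for(self, upload, checksum):
        # Findings are visible in any upload the agent has been run on.
        if upload not in self.agent_ran_on:
            return set()
        return set(self.findings[checksum])

    def decision_for(self, upload, checksum):
        # Decisions are visible only in the upload they were made for.
        return self.decisions.get((upload, checksum))


foss = ToyFossology()
foss.import_info_in_file("upload-a", "sha256:abc", "GPL-2.0-only")
foss.import_concluded("upload-a", "sha256:abc", "GPL-2.0-only")
# The agent is later run on upload-b, importing data for an unrelated file.
foss.import_concluded("upload-b", "sha256:def", "MIT")

print(foss.findings_for("upload-b", "sha256:abc"))  # {'GPL-2.0-only'}
print(foss.decision_for("upload-b", "sha256:abc"))  # None
```

In this model the finding imported for upload-a surfaces in upload-b as soon as the agent has run there, while the decision stays confined to upload-a.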
Debian
The Debian DEP5 specification requires copyright and license information for every single file, and allows this to be achieved through wildcards (*), so files that contain no license notice must still "get" a license. Moreover, files that do contain a license notice may "get" a different license from debian/copyright because, for example, wildcards in the debian/copyright file may apply the "prevailing" outbound license (e.g. GPL-2.0-only) over permissive inbound licenses (e.g. MIT or BSD). Therefore, when the same files are found in other packages/uploads, they may get a different license in debian/copyright depending on the package's license context.
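A minimal sketch of how wildcards make the license of a file depend on the package follows. It assumes the DEP5 rule that the last matching Files paragraph applies, and approximates DEP5 patterns with Python's fnmatch (which also accepts bracket expressions that DEP5 does not define). The two stanza lists, the paths and the function debian_license are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical debian/copyright stanzas for two packages, as
# (Files patterns, License) pairs in the order they appear in the file.
PKG_A = [
    (["*"], "GPL-2.0-only"),
    (["lib/compat/*"], "BSD-3-Clause"),
]
PKG_B = [
    (["*"], "LGPL-2.1-or-later"),
]


def debian_license(path, stanzas):
    """Return the license of the last stanza whose pattern matches path."""
    result = None
    for patterns, license_id in stanzas:
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            result = license_id  # later stanzas override earlier ones
    return result


# The same helper script, shipped unchanged in both packages.
print(debian_license("install-sh", PKG_A))            # GPL-2.0-only
print(debian_license("install-sh", PKG_B))            # LGPL-2.1-or-later
print(debian_license("lib/compat/strlcpy.c", PKG_A))  # BSD-3-Clause
```

The first two calls show an identical file receiving two different licenses purely because of the catch-all wildcard of the package it ships in.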
Fossology vs. Debian
If Debian license metadata for files that occur in different uploads (typically shell scripts, m4 files, etc.) are imported into Fossology as LicenseInfoInFile data, they are applied to every occurrence of each such file in any upload for which the reportImport agent is run. And if different LicenseInfoInFile data for the same file are imported from different Debian packages for different Fossology uploads, they are all applied together to every occurrence of that file in any upload for which reportImport is run.
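The accumulation can be made concrete with a small simulation. It is a sketch under stated assumptions: the package names, file names, file content and licenses are invented, and the two dictionaries only mimic the checksum-wide and per-upload scopes rather than Fossology's real storage.

```python
import hashlib
from collections import defaultdict


def checksum(content: bytes) -> str:
    """Identify a file by its content, independent of its name."""
    return hashlib.sha256(content).hexdigest()


SCRIPT = b"#!/bin/sh\necho configure\n"

# Hypothetical imports: (upload, file name, content, debian/copyright license).
IMPORTS = [
    ("pkg-a", "build-aux/install-sh", SCRIPT, "GPL-2.0-only"),
    ("pkg-b", "scripts/install.sh", SCRIPT, "LGPL-2.1-or-later"),
]

findings = defaultdict(set)  # checksum -> licenses, shared by all uploads
decisions = {}               # (upload, checksum) -> license, per upload

for upload, name, content, license_id in IMPORTS:
    key = checksum(content)
    findings[key].add(license_id)           # as if LicenseInfoInFile
    decisions[(upload, key)] = license_id   # as if LicenseConcluded

key = checksum(SCRIPT)
for upload, name, _, _ in IMPORTS:
    print(upload, name, sorted(findings[key]), decisions[(upload, key)])
# pkg-a build-aux/install-sh ['GPL-2.0-only', 'LGPL-2.1-or-later'] GPL-2.0-only
# pkg-b scripts/install.sh ['GPL-2.0-only', 'LGPL-2.1-or-later'] LGPL-2.1-or-later
```

Both uploads end up showing both licenses as findings for the shared file, whereas each decision reflects only the package it was imported from.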
Given that Debian license metadata may depend on the upload or package context, and may be unrelated to the license notice (if any) found within the file, importing them as LicenseInfoInFile may have unintended side effects and ultimately lead to inconsistent results on files "shared" among different uploads.