* Is curated SPDX data sharing a thing?
From: Jérôme Carretero @ 2020-12-18 20:15 UTC
To: yocto, Richard Purdie, Joshua Watt

Hi,

Please correct me if I'm wrong, but as far as I understand it, as of
today the flow for generating SPDX data to build software BoMs,
documented e.g. in:

- https://www.fossology.org/get-started/basic-workflow/
- https://elinux.org/images/2/20/License_Compliance_in_Embedded_Linux_with_the_Yocto_Project.pdf

involves building your own database of SPDX files after reviewing all
the sources, which doesn't look to be within reach of most businesses.

I am wondering, by extension:

- Whether there are businesses selling pre-masticated SPDX data
  (I can imagine one would be willing to pay a little something to
  obtain a collection of "certified" (or possibly "insured") SPDX);
- Whether there are (plans for having) public, collaborative
  repositories of SPDX data that could be trusted over automatic
  scans of source.

Best regards,

--
Jérôme
* Re: Is curated SPDX data sharing a thing?
From: Richard Purdie @ 2020-12-18 20:34 UTC
To: Jérôme Carretero, yocto, Joshua Watt

Hi,

On Fri, 2020-12-18 at 15:15 -0500, Jérôme Carretero wrote:
> Please correct me if I'm wrong, but as far as I understand it, as of
> today the flow for generating SPDX data to build software BoMs,
> documented e.g. in:
>
> - https://www.fossology.org/get-started/basic-workflow/
> - https://elinux.org/images/2/20/License_Compliance_in_Embedded_Linux_with_the_Yocto_Project.pdf
>
> involves building your own database of SPDX files after reviewing all
> the sources, which doesn't look to be within reach of most
> businesses.

The challenge is that Yocto Project lets you build your own custom
software, which means you also end up in your own BoM situation. We
therefore generally provide tooling that can help you generate the
information you need, but there usually isn't "one size fits all".

I would mention the meta-spdxscanner layer as having
support/integration for some of the more recent scanning and document
generation tools.

> I am wondering, by extension:
>
> - Whether there are businesses selling pre-masticated SPDX data
>   (I can imagine one would be willing to pay a little something to
>   obtain a collection of "certified" (or possibly "insured") SPDX);

I'm sure there are services provided, particularly by some of the
member OSVs, but as I mention above, it's hard to have a one size fits
all since you can patch or reconfigure the sources at will.

> - Whether there are (plans for having) public, collaborative
>   repositories of SPDX data that could be trusted over automatic
>   scans of source.
We are hoping to have better tools integration where the build process
may be able to generate better SBoM and SPDX information directly.
Unfortunately it's an area where it's hard to find people willing to
contribute.

Cheers,

Richard
* Re: [yocto] Is curated SPDX data sharing a thing?
From: Jérôme Carretero @ 2020-12-18 21:51 UTC
To: Richard Purdie; +Cc: yocto

On Fri, 18 Dec 2020 20:34:01 +0000
"Richard Purdie" <richard.purdie@linuxfoundation.org> wrote:

> The challenge is that Yocto Project lets you build your own custom
> software, which means you also end up in your own BoM situation. We
> therefore generally provide tooling that can help you generate the
> information you need, but there usually isn't "one size fits all".

Of course different choices can be made regarding obligations (where
licenses are shown, how sources are distributed), but in the same way
that today ${LICENSE_DIRECTORY}/${P}/recipeinfo contains a LICENSE key
which is very useful for figuring out obligations, SPDX could be used
to have more information and more trust.

In most of my experience, a product mostly contains F/LOSS code from
major Yocto/OE layers, maybe a couple of other 3rd party libraries, a
couple of patches here and there, and a few 100kSLOC of "original"
code; the BoM consists... of an image manifest file.

A huge portion of the SPDX data could be reused, to get an
almost-complete, better BoM.

> I would mention the meta-spdxscanner layer as having
> support/integration for some of the more recent scanning and document
> generation tools.

Yeah, I used it. I can see that it mostly works, except for the fact
that you either spend a lifetime doing source code analysis, or just a
few years because you trust the agreement of multiple robots on the
license verdict, which only leaves you the ambiguous files to process
(and that's time-consuming work).

> I'm sure there are services provided, particularly by some of the
> member OSVs, but as I mention above, it's hard to have a one size fits
> all since you can patch or reconfigure the sources at will.
SPDX data contains package and also source file info (based on hashes),
so if a patch is applied, an analysis would only need to concern
modified files. Provided a development history and a baseline SPDX are
available, it would significantly reduce the amount of work one would
face.

> We are hoping to have better tools integration where the build process
> may be able to generate better SBoM and SPDX information directly.
> Unfortunately it's an area where it's hard to find people willing to
> contribute.

It's certainly easy to verify after do_patch (or after do_compile in
some cases) that sources correspond to existing SPDX files, or to look
up SPDX files in an external database based on hashes of sources, but
automatically generating SPDX:

- is very time-consuming and I don't see it as something that one would
  even do e.g. in continuous integration;
- is not perfect; I don't think the build process could automatically
  generate more than "candidate SPDX" information except maybe for a
  couple of really-clean packages where the developers care about that.

Is there a more focused discussion list on that topic, or is here OK?
I may have a lot of questions/ideas but don't want to cause off-topic
noise.

Best,

--
Jérôme
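To illustrate the hash-based idea in the message above, here is a minimal Python sketch of how a patched source tree could be compared against a baseline, so that only new or modified files are flagged for review. It is only a sketch under assumptions: the `baseline` mapping (relative path to SHA-1) is hypothetical and is assumed to have been extracted beforehand from the file checksums of existing SPDX documents, and the function names are invented for this example rather than taken from any Yocto or SPDX tool.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha1_of_file(path: Path) -> str:
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_needing_review(source_root: Path, baseline: Dict[str, str]) -> List[str]:
    """List files under source_root that are new or differ from the baseline.

    `baseline` is a hypothetical mapping of POSIX-style relative paths to the
    SHA-1 recorded for each file in an existing (already reviewed) SPDX set.
    """
    changed = []
    for path in sorted(source_root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(source_root).as_posix()
        # A missing entry (new file) and a different hash (patched file)
        # are treated the same way: both need a fresh look.
        if baseline.get(relative) != sha1_of_file(path):
            changed.append(relative)
    return changed
```

Run against the unpacked and patched sources, a call such as `files_needing_review(Path("path/to/sources"), baseline)` would return only the paths whose content no longer matches the baseline, which is the subset the message argues a reviewer would need to look at.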
* Re: [yocto] Is curated SPDX data sharing a thing?
From: Richard Purdie @ 2020-12-18 22:23 UTC
To: Jérôme Carretero; +Cc: yocto, licensing

On Fri, 2020-12-18 at 16:51 -0500, Jérôme Carretero wrote:
> On Fri, 18 Dec 2020 20:34:01 +0000
> "Richard Purdie" <richard.purdie@linuxfoundation.org> wrote:
>
> > The challenge is that Yocto Project lets you build your own custom
> > software, which means you also end up in your own BoM situation. We
> > therefore generally provide tooling that can help you generate the
> > information you need, but there usually isn't "one size fits all".
>
> Of course different choices can be made regarding obligations (where
> licenses are shown, how sources are distributed), but in the same
> way that today ${LICENSE_DIRECTORY}/${P}/recipeinfo contains a
> LICENSE key which is very useful for figuring out obligations, SPDX
> could be used to have more information and more trust.

It's going to take someone to stand up and provide the first "version"
of that and I'm not sure anyone wants to step up and be that
person/organisation...

> In most of my experience, a product mostly contains F/LOSS code from
> major Yocto/OE layers, maybe a couple of other 3rd party libraries, a
> couple of patches here and there, and a few 100kSLOC of "original"
> code;
> the BoM consists... of an image manifest file.
>
> A huge portion of the SPDX data could be reused, to get an
> almost-complete, better BoM.

It does depend on which data we're talking about. You also have the
issue that it's fine to generate tons of data, but at some point you
have to interpret what it means too...

> > I would mention the meta-spdxscanner layer as having
> > support/integration for some of the more recent scanning and
> > document generation tools.
>
> Yeah, I used it.
> I can see that it mostly works, except for the fact
> that you either spend a lifetime doing source code analysis, or just
> a few years because you trust the agreement of multiple robots on the
> license verdict, which only leaves you the ambiguous files to process
> (and that's time-consuming work).

I watched and helped our older LICENSE field work and I can say it's a
thankless task which it's very hard to get people to do. I fear that
the SPDX scans you refer to are so complex it will be hard to do this
consistently across the codebase. I'm actually hoping things may go a
slightly different route, such as ultimately a majority of code having
license identifiers in it (we've tried to ensure YP code has them).

> > I'm sure there are services provided, particularly by some of the
> > member OSVs, but as I mention above, it's hard to have a one size
> > fits all since you can patch or reconfigure the sources at will.
>
> SPDX data contains package and also source file info (based on
> hashes), so if a patch is applied, an analysis would only need to
> concern modified files. Provided a development history and a baseline
> SPDX are available, it would significantly reduce the amount of work
> one would face.

Sure, how do we get people to build such a baseline though?

> > We are hoping to have better tools integration where the build
> > process may be able to generate better SBoM and SPDX information
> > directly. Unfortunately it's an area where it's hard to find people
> > willing to contribute.
>
> It's certainly easy to verify after do_patch (or after do_compile in
> some cases) that sources correspond to existing SPDX files, or to
> look up SPDX files in an external database based on hashes of sources,
> but automatically generating SPDX:
>
> - is very time-consuming and I don't see it as something that one
>   would even do e.g.
>   in continuous integration;
> - is not perfect; I don't think the build process could automatically
>   generate more than "candidate SPDX" information except maybe for a
>   couple of really-clean packages where the developers care about
>   that.

There are certainly ways it could be done, if there are people who
agree on a common objective and are willing/able to contribute time to
it.

> Is there a more focused discussion list on that topic, or is here OK?
> I may have a lot of questions/ideas but don't want to cause off-topic
> noise.

We did set one up, so there is
https://lists.yoctoproject.org/g/licensing/topics
but it hasn't really taken off (yet?)...

Cheers,

Richard
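The "license identifiers in code" route mentioned in this reply can be illustrated with a short Python sketch that reads the `SPDX-License-Identifier:` tag from the top of a source file's text. This is an illustration only, not how any Yocto Project tooling does it: the function name, the choice to look at only the first 20 lines, and the simple stripping of a trailing `*/` comment closer are all assumptions made for this example.

```python
import re
from typing import Optional

# Matches the conventional tag and captures the rest of the line.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")


def declared_license(text: str, max_lines: int = 20) -> Optional[str]:
    """Return the license expression declared near the top of `text`, if any.

    Only the first `max_lines` lines are inspected (an arbitrary limit chosen
    for this sketch). Returns None when no tag is found.
    """
    for line in text.splitlines()[:max_lines]:
        match = SPDX_TAG.search(line)
        if match:
            # Drop a trailing C-style comment closer such as " */".
            return match.group(1).strip().rstrip("*/").strip()
    return None
```

A file that carries such a tag needs no text-matching scan to determine its declared license, which is the appeal of this route; files for which `declared_license` returns `None` would still need a scanner or a human reviewer.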