* Is curated SPDX data sharing a thing?
From: Jérôme Carretero @ 2020-12-18 20:15 UTC
  To: yocto, Richard Purdie, Joshua Watt

Hi,


Please correct me if I'm wrong but as far as I understand it, as of
today the flow for generating SPDX data to build software BoMs,
documented e.g. in:

- https://www.fossology.org/get-started/basic-workflow/
- https://elinux.org/images/2/20/License_Compliance_in_Embedded_Linux_with_the_Yocto_Project.pdf

involves building your own database of SPDX files after reviewing all
the sources, which doesn't look to be something within reach of most
businesses.


I am wondering by extension:

- Whether there are businesses selling pre-masticated SPDX data
  (I can imagine one would be willing to pay a little something to
  obtain a collection of "certified" (or possibly "insured") SPDX);

- Whether there are (plans for having) public, collaborative
  repositories of SPDX data that could be trusted over automatic scans
  of source.


Best regards,

-- 
Jérôme


* Re: Is curated SPDX data sharing a thing?
From: Richard Purdie @ 2020-12-18 20:34 UTC
  To: Jérôme Carretero, yocto, Joshua Watt

Hi,

On Fri, 2020-12-18 at 15:15 -0500, Jérôme Carretero wrote:
> Please correct me if I'm wrong but as far as I understand it, as of
> today the flow for generating SPDX data to build software BoMs,
> documented eg. in:
> 
> - https://www.fossology.org/get-started/basic-workflow/
> - https://elinux.org/images/2/20/License_Compliance_in_Embedded_Linux_with_the_Yocto_Project.pdf
> 
> involves building your own database of SPDX files after reviewing all
> the sources, which doesn't look to be something within reach of most
> businesses.

The challenge is that Yocto Project lets you build your own custom
software, which means you also end up in your own BoM situation. We
generally therefore provide tooling that can help you generate the
information you need but there usually isn't "one size fits all".

I would mention the meta-spdxscanner layer as having
support/integration for some of the more recent scanning and document
generation tools.

> I am wondering by extension:
> 
> - Whether there are businesses selling pre-masticated SPDX data
>   (I can imagine one would be willing to pay a little something to
>   obtain a collection of "certified" (or possibly "insured") SPDX);

I'm sure there are services provided, particularly by some of the
member OSVs, but as I mention above, it's hard to have a one size fits
all since you can patch or reconfigure the sources at will.

> - Whether there are (plans for having) public, collaborative
>   repositories of SPDX data that could be trusted over automatic
>   scans of source.

We are hoping to have better tools integration where the build process
may be able to generate better SBoM and SPDX information directly.
Unfortunately it's an area where it's hard to find people willing to
contribute.

Cheers,

Richard



* Re: [yocto] Is curated SPDX data sharing a thing?
From: Jérôme Carretero @ 2020-12-18 21:51 UTC
  To: Richard Purdie; +Cc: yocto

On Fri, 18 Dec 2020 20:34:01 +0000
"Richard Purdie" <richard.purdie@linuxfoundation.org> wrote:

> The challenge is that Yocto Project lets you build your own custom
> software, which means you also end up in your own BoM situation. We
> generally therefore provide tooling that can help you generate the
> information you need but there usually isn't "one size fits all".

Of course different choices can be made regarding obligations (where
licenses are shown, how sources are distributed), but in the same way
that today ${LICENSE_DIRECTORY}/${P}/recipeinfo contains a LICENSE key,
which is very useful for figuring out obligations, SPDX could be used
to provide more information and more trust.
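
To illustrate reading that LICENSE key, here is a minimal sketch. It
assumes recipeinfo is a plain-text file of "KEY: value" lines (that
layout and the helper name parse_recipeinfo are assumptions made for
illustration, not something confirmed in this thread):

```python
def parse_recipeinfo(text):
    """Parse 'KEY: value' lines into a dict.

    The 'KEY: value' layout is an assumption about recipeinfo;
    lines without a colon are ignored.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


# Hypothetical file content, for illustration only.
sample = "LICENSE: GPL-2.0-only & MIT\nPR: r0\nPV: 1.2.3\n"
license_expr = parse_recipeinfo(sample).get("LICENSE")
```

The point is only that a single license expression per recipe is easy
to consume; per-file SPDX data would need richer tooling.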

In most of my experience, a product mostly contains F/LOSS code from
major Yocto/OE layers, maybe a couple of other 3rd party libraries, a
couple of patches here and there, and a few 100kSLOC of "original" code;
the BoM consists... of an image manifest file.

A huge portion of the SPDX data could be reused, to get a better,
almost-complete BoM.

> I would mention the meta-spdxscanner layer as having
> support/integration for some of the more recent scanning and document
> generation tools.

Yeah, I used it. I can see that it mostly works except for the fact
that you either spend a lifetime doing source code analysis, or just a
few years because you trust the agreement of multiple robots on the
license verdict, which only leaves you the ambiguous files to process
(and that's time-consuming work).

> I'm sure there are services provided, particularly by some of the
> member OSVs, but as I mention above, it's hard to have a one size fits
> all since you can patch or reconfigure the sources at will.

SPDX data contains package and also source file info (based on hashes),
so if a patch is applied, an analysis would only need to concern
modified files. Provided a development history and a baseline SPDX are
available, this would significantly reduce the amount of work one
would face.
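
A rough sketch of that idea, assuming the baseline SPDX document has
already been parsed into a mapping from relative path to SHA-1 (SPDX
2.x documents record a SHA-1 checksum per file). The function names and
the pre-parsed "baseline" dict are hypothetical:

```python
import hashlib
from pathlib import Path


def sha1_of_file(path):
    """Return the hex SHA-1 of a file, read in chunks."""
    digest = hashlib.sha1()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_needing_review(source_dir, baseline):
    """List files whose hash is absent from, or differs from, the baseline.

    baseline: dict of relative POSIX path -> SHA-1 hex string, assumed
    to have been extracted from an existing SPDX document.
    """
    root = Path(source_dir)
    changed = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if baseline.get(rel) != sha1_of_file(path):
            changed.append(rel)
    return sorted(changed)
```

Run against a patched source tree, only the files touched or added by
the patches come back, and those are the ones left for a human to look
at.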

> We are hoping to have better tools integration where the build process
> may be able to generate better SBoM and SPDX information directly.
> Unfortunately it's an area where it's hard to find people willing to
> contribute.

It's certainly easy to verify after do_patch (or after do_compile in
some cases) that sources correspond to existing SPDX files, or to
look up SPDX files in an external database based on hashes of sources,
but automatically generating SPDX:

- is very time-consuming and I don't see it as something that one would
  even do e.g. in continuous integration;
- is not perfect; I don't think the build process could automatically
  generate more than "candidate SPDX" information except maybe for a
  couple of really-clean packages where the developers care about that.
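
For the "verify that sources correspond to existing SPDX files" part, a
minimal sketch based on the package verification code defined in the
SPDX 2.x specification (SHA-1 over the sorted, concatenated per-file
SHA-1 values). The function names are hypothetical, and handling of
excluded files is left out:

```python
import hashlib


def package_verification_code(file_sha1s):
    """Compute an SPDX 2.x style package verification code.

    file_sha1s: iterable of hex SHA-1 strings, one per source file.
    The hashes are sorted, concatenated without separators, and
    hashed again with SHA-1.
    """
    joined = "".join(sorted(h.lower() for h in file_sha1s))
    return hashlib.sha1(joined.encode("ascii")).hexdigest()


def sources_match_spdx(file_sha1s, expected_code):
    """True if the tree's file hashes reproduce the recorded code."""
    return package_verification_code(file_sha1s) == expected_code.lower()
```

A check like this is cheap enough to run after do_patch: if the code
matches, the existing SPDX file still describes the tree; if not, a
per-file comparison tells you which files changed.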

Is there a more focused discussion list on that topic, or is here OK?
I may have a lot of questions/ideas but don't want to cause off-topic
noise.


Best,

-- 
Jérôme


* Re: [yocto] Is curated SPDX data sharing a thing?
From: Richard Purdie @ 2020-12-18 22:23 UTC
  To: Jérôme Carretero; +Cc: yocto, licensing

On Fri, 2020-12-18 at 16:51 -0500, Jérôme Carretero wrote:
> On Fri, 18 Dec 2020 20:34:01 +0000
> "Richard Purdie" <richard.purdie@linuxfoundation.org> wrote:
> 
> > The challenge is that Yocto Project lets you build your own custom
> > software, which means you also end up in your own BoM situation. We
> > generally therefore provide tooling that can help you generate the
> > information you need but there usually isn't "one size fits all".
> 
> Of course different choices can be made regarding obligations (where
> licenses are shown, how sources are distributed), but in the same
> way that today ${LICENSE_DIRECTORY}/${P}/recipeinfo contains a
> LICENSE key, which is very useful for figuring out obligations, SPDX
> could be used to provide more information and more trust.

It's going to take someone to stand up and provide the first "version"
of that and I'm not sure anyone wants to step up and be that
person/organisation...

> In most of my experience, a product mostly contains F/LOSS code from
> major Yocto/OE layers, maybe a couple of other 3rd party libraries, a
> couple of patches here and there, and a few 100kSLOC of "original"
> code; the BoM consists... in an image manifest file.
> 
> A huge portion of the SPDX data could be reused, to get an
> almost-complete better BoM.

It does depend on which data we're talking about. You also have the
issue that it's fine to generate tons of data, but at some point you
have to interpret what it means too...

> > I would mention the meta-spdxscanner layer as having
> > support/integration for some of the more recent scanning and
> > document generation tools.
> 
> Yeah, I used it. I can see that it mostly works except for the fact
> that you either spend a lifetime doing source code analysis, or just
> a few years because you trust the agreement of multiple robots on the
> license verdict, which only leaves you the ambiguous files to process
> (and that's time-consuming work).

I watched and helped with our older LICENSE field work and I can say
it's a thankless task which it's very hard to get people to do. I fear
that the SPDX scans you refer to are so complex that it will be hard to
do this consistently across the codebase. I'm actually hoping things
may go a slightly different route, such as ultimately a majority of
code having license identifiers in it (we've tried to ensure YP code
has them).
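
As a sketch of why in-file identifiers help: with the short-form
"SPDX-License-Identifier:" tag, the declared license of a file can be
read by a trivial scanner rather than a full license-text matcher. The
helper name and the 20-line search window below are arbitrary choices
for illustration:

```python
import re

# Matches the short-form tag and captures the license expression.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")
# Trailing comment terminators that may follow the expression.
COMMENT_END = re.compile(r"\s*(\*/|-->)\s*$")


def find_license_identifier(text, max_lines=20):
    """Return the SPDX expression declared near the top of a file.

    Returns None when no tag is found in the first max_lines lines.
    """
    for line in text.splitlines()[:max_lines]:
        match = SPDX_TAG.search(line)
        if match:
            return COMMENT_END.sub("", match.group(1)).strip()
    return None
```

Files where this returns None are the ones that would still need a
scanner or a human.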

> > I'm sure there are services provided, particularly by some of the
> > member OSVs, but as I mention above, it's hard to have a one size
> > fits all since you can patch or reconfigure the sources at will.
> 
> SPDX data contains package and also source file info (based on
> hashes), so if a patch is applied, an analysis would only need to
> concern modified files. Provided a development history and a baseline
> SPDX are available, this would significantly reduce the amount of
> work one would face.

Sure, how do we get people to build such a baseline though?

> > We are hoping to have better tools integration where the build
> > process may be able to generate better SBoM and SPDX information
> > directly. Unfortunately it's an area where it's hard to find people
> > willing to contribute.
> 
> It's certainly easy to verify after do_patch (or after do_compile in
> some cases) that sources correspond to existing SPDX files, or to
> lookup SPDX files in an external database based on hashes of sources,
> but automatically generating SPDX:
> 
> - is very time-consuming and I don't see it as something that one
>   would even do eg. in continuous integration;
> - is not perfect; I don't think the build process could automatically
>   generate more than "candidate SPDX" information except maybe for a
>   couple of really-clean packages where the developers care about
>   that.

There are certainly ways it could be done, if there are people who
agree on a common objective and are willing/able to contribute time to
it.

> Is there a more focused discussion list on that topic, or is here
> OK? I may have a lot of questions/ideas but don't want to cause
> off-topic noise.

We did set one up, so there is
https://lists.yoctoproject.org/g/licensing/topics but it hasn't really
taken off (yet?)...

Cheers,

Richard


