* Adding more information to the SBOM
@ 2022-09-14 14:16 Marta Rybczynska
  2022-09-14 14:56 ` Joshua Watt
  2022-09-15 12:16 ` Richard Purdie
  0 siblings, 2 replies; 11+ messages in thread
From: Marta Rybczynska @ 2022-09-14 14:16 UTC (permalink / raw)
  To: OE-core, openembedded-architecture, Joshua Watt

Dear all,
(cross-posting to oe-core and *-architecture)
In the last months, we have worked in Oniro on using the create-spdx
class for both IP compliance and security.

During this work, Alberto Pianon has found that some information is
missing from the SBOM and it does not contain enough for Software
Composition Analysis. The main missing point is the relation between
the actual upstream sources and the final binaries (create-spdx uses
composite sources).

Alberto has worked on how to obtain the missing data and now has a
POC. This POC provides full source-to-binary tracking of Yocto builds
through a couple of scripts (intended to be transformed into a new
bbclass at a later stage). The goal is to add the missing pieces of
information in order to get a "real" SBOM from Yocto, which should, at
a minimum:

- carefully describe what is found in a final image (i.e. binary files
and their dependencies), since that is what is actually distributed
and goes into the final product;
- describe how such binary files have been generated and where they
come from (i.e. upstream sources, including patches and other stuff
added from meta-layers); provenance is important for a number of
reasons related to IP Compliance and security.

The aim is to become able to:

- map binaries to their corresponding upstream source packages (and
not to the "internal" source packages created by recipes by combining
multiple upstream sources and patches)
- map binaries to the source files that have been actually used to
build them - which usually are a small subset of the whole source
package

With respect to IP compliance, this would allow to, among other things:

- get the real license text for each binary file, by getting the
license of the specific source files it has been generated from
(provided by Fossology, for instance), - and not the main license
stated in the corresponding recipe (which may be as confusing as
GPL-2.0-or-later & LGPL-2.1-or-later & BSD-3-Clause & BSD-4-Clause, or
even worse)
- automatically check license incompatibilities at the binary file level.

Other possible interesting things could be done also on the security side.

This work intends to add a way to provide additional data that can be
used by create-spdx, not to replace create-spdx in any way.

The sources with a long README are available at
https://gitlab.eclipse.org/eclipse/oniro-compliancetoolchain/toolchain/tinfoilhat/-/tree/srctracker/srctracker

What do you think of this work? Would it be of interest to integrate
into YP at some point? Shall we discuss this?

Marta and Alberto



* Re: Adding more information to the SBOM
  2022-09-14 14:16 Adding more information to the SBOM Marta Rybczynska
@ 2022-09-14 14:56 ` Joshua Watt
  2022-09-14 17:10   ` [OE-core] " Alberto Pianon
  2022-09-15  1:16   ` [Openembedded-architecture] " Mark Hatle
  2022-09-15 12:16 ` Richard Purdie
  1 sibling, 2 replies; 11+ messages in thread
From: Joshua Watt @ 2022-09-14 14:56 UTC (permalink / raw)
  To: Marta Rybczynska; +Cc: OE-core, openembedded-architecture

On Wed, Sep 14, 2022 at 9:16 AM Marta Rybczynska <rybczynska@gmail.com> wrote:
>
> Dear all,
> (cross-posting to oe-core and *-architecture)
> In the last months, we have worked in Oniro on using the create-spdx
> class for both IP compliance and security.
>
> During this work, Alberto Pianon has found that some information is
> missing from the SBOM and it does not contain enough for Software
> Composition Analysis. The main missing point is the relation between
> the actual upstream sources and the final binaries (create-spdx uses
> composite sources).

I believe we map the binaries to the source code from the -dbg
packages; is the premise that this is insufficient? Can you elaborate
more on why that is, I don't quite understand. The debug sources are
(basically) what we actually compiled (e.g. post-do_patch) to produce
the binary, and you can in turn follow these back to the upstream
sources with the downloadLocation property.

>
> Alberto has worked on how to obtain the missing data and now has a
> POC. This POC provides full source-to-binary tracking of Yocto builds
> through a couple of scripts (intended to be transformed into a new
> bbclass at a later stage). The goal is to add the missing pieces of
> information in order to get a "real" SBOM from Yocto, which should, at
> a minimum:

Please be a little careful with the wording; SBoMs have a lot of uses,
and many of them we can satisfy with what we currently generate; it
may not do the exact use case you are looking for, but that doesn't
mean it's not a "real" SBoM :)

>
> - carefully describe what is found in a final image (i.e. binary files
> and their dependencies), since that is what is actually distributed
> and goes into the final product;
> - describe how such binary files have been generated and where they
> come from (i.e. upstream sources, including patches and other stuff
> added from meta-layers); provenance is important for a number of
> reasons related to IP Compliance and security.
>
> The aim is to become able to:
>
> - map binaries to their corresponding upstream source packages (and
> not to the "internal" source packages created by recipes by combining
> multiple upstream sources and patches)
> - map binaries to the source files that have been actually used to
> build them - which usually are a small subset of the whole source
> package
>
> With respect to IP compliance, this would allow to, among other things:
>
> - get the real license text for each binary file, by getting the
> license of the specific source files it has been generated from
> (provided by Fossology, for instance), - and not the main license
> stated in the corresponding recipe (which may be as confusing as
> GPL-2.0-or-later & LGPL-2.1-or-later & BSD-3-Clause & BSD-4-Clause, or
> even worse)

IIUC this is the difference between the "Declared" license and the
"Concluded" license. You can report both, and I think
create-spdx.bbclass can currently do this with its rudimentary source
license scanning. You really do want both and it's a great way to make
sure that the "Declared" license (that is the license in the recipe)
reflects the reality of the source code.
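
As a concrete sketch of the two fields (SPDX 2.x field names; the values are
invented for illustration, expressed here as a Python dict rather than real
create-spdx output):

# Hypothetical SPDX 2.x package entry: "licenseDeclared" mirrors the
# recipe's LICENSE variable, while "licenseConcluded" is what scanning
# or review actually found for this unit.
package = {
    "name": "util-linux",
    "licenseDeclared": "GPL-2.0-or-later & LGPL-2.1-or-later & BSD-3-Clause & BSD-4-Clause",
    "licenseConcluded": "GPL-2.0-or-later",
}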

> - automatically check license incompatibilities at the binary file level.
>
> Other possible interesting things could be done also on the security side.
>
> This work intends to add a way to provide additional data that can be
> used by create-spdx, not to replace create-spdx in any way.
>
> The sources with a long README are available at
> https://gitlab.eclipse.org/eclipse/oniro-compliancetoolchain/toolchain/tinfoilhat/-/tree/srctracker/srctracker
>
> What do you think of this work? Would it be of interest to integrate
> into YP at some point? Shall we discuss this?

This seems promising as something that could potentially move into
core. I have a few points:
 - The extraction of the sources to a dedicated directory is something
that Richard has been toying around with for quite a while, and I
think it would greatly simplify that part of your process. I would
very much encourage you to look at the work he's done, and work on
that to get it pushed across the finish line as it's a really good
improvement that would benefit not just your source scanning.
 - I would encourage you to not wait to turn this into a bbclass
and/or library functions. You should be able to do this in a new
layer, and that would make it much clearer as to what the path to
being included in OE-core would look like. It also would (IMHO) be
nicer to the users :)

>
> Marta and Alberto



* Re: [OE-core] Adding more information to the SBOM
  2022-09-14 14:56 ` Joshua Watt
@ 2022-09-14 17:10   ` Alberto Pianon
  2022-09-14 20:52     ` Joshua Watt
  2022-09-15  1:16   ` [Openembedded-architecture] " Mark Hatle
  1 sibling, 1 reply; 11+ messages in thread
From: Alberto Pianon @ 2022-09-14 17:10 UTC (permalink / raw)
  To: Joshua Watt; +Cc: Marta Rybczynska, OE-core, openembedded-architecture

Hi Joshua,

nice to meet you!

I'm new to this list, and I've always approached Yocto just from the
"IP compliance side", so I may miss important pieces of information. 
That
is why Marta encouraged me and is helping me to ask community feedback.

On 2022-09-14 16:56, Joshua Watt wrote:
> On Wed, Sep 14, 2022 at 9:16 AM Marta Rybczynska <rybczynska@gmail.com> 
> wrote:
>> 
>> Dear all,
>> (cross-posting to oe-core and *-architecture)
>> In the last months, we have worked in Oniro on using the create-spdx
>> class for both IP compliance and security.
>> 
>> During this work, Alberto Pianon has found that some information is
>> missing from the SBOM and it does not contain enough for Software
>> Composition Analysis. The main missing point is the relation between
>> the actual upstream sources and the final binaries (create-spdx uses
>> composite sources).
> 
> I believe we map the binaries to the source code from the -dbg
> packages; is the premise that this is insufficient? Can you elaborate
> more on why that is, I don't quite understand. The debug sources are
> (basically) what we actually compiled (e.g. post-do_patch) to produce
> the binary, and you can in turn follow these back to the upstream
> sources with the downloadLocation property.

This was also my assumption at the beginning. But then I found that
there are recipes with multiple upstream sources, which may be
combined/mixed together in recipes' WORKDIR. For instance this one:

https://git.yoctoproject.org/meta-virtualization/tree/recipes-networking/cni/cni_git.bb

SRC_URI = "\
    git://github.com/containernetworking/cni.git;branch=main;name=cni;protocol=https \
    git://github.com/containernetworking/plugins.git;branch=release-1.1;destsuffix=${S}/src/github.com/containernetworking/plugins;name=plugins;protocol=https \
    git://github.com/flannel-io/cni-plugin;branch=main;name=flannel_plugin;protocol=https;destsuffix=${S}/src/github.com/containernetworking/plugins/plugins/meta/flannel \
    "

(The third source is unpacked in a subdir of the second one)

From here I discovered that we can't assume that the first non-local URI
is the downloadLocation for all source files, because it is not always
the case.
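
To illustrate the subtree-ownership idea in code, here is a minimal Python
sketch (hypothetical, not part of the POC) that attributes each unpacked file
to the SRC_URI entry owning the longest matching destination subtree:

# Illustrative destsuffix-subtree -> upstream URI map for the recipe above;
# a real implementation would derive this from the parsed SRC_URI entries.
subtree_owners = {
    "src/github.com/containernetworking/plugins/plugins/meta/flannel":
        "git://github.com/flannel-io/cni-plugin",
    "src/github.com/containernetworking/plugins":
        "git://github.com/containernetworking/plugins.git",
    "": "git://github.com/containernetworking/cni.git",  # fallback owner
}

def download_location(relpath):
    # Longest matching prefix wins, so files under the flannel subdir are
    # attributed to the third URI even though it nests inside the second.
    matches = [p for p in subtree_owners if relpath.startswith(p)]
    return subtree_owners[max(matches, key=len)]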

Moreover, in the context of our project we also needed to find the
upstream sources for local patches, scripts, etc. added by recipes
(i.e. the corresponding layers' repos).

> 
>> 
>> Alberto has worked on how to obtain the missing data and now has a
>> POC. This POC provides full source-to-binary tracking of Yocto builds
>> through a couple of scripts (intended to be transformed into a new
>> bbclass at a later stage). The goal is to add the missing pieces of
>> information in order to get a "real" SBOM from Yocto, which should, at
>> a minimum:
> 
> Please be a little careful with the wording; SBoMs have a lot of uses,
> and many of them we can satisfy with what we currently generate; it
> may not do the exact use case you are looking for, but that doesn't
> mean it's not a "real" SBoM :)

You are right, sorry! "real" is meant in the context of our project,
where we need to make our Fossology Audit Team work on "original"
upstream source packages/repos, for a number of reasons (the main being
that in the Oniro project we have a complex build matrix with a lot of
available target machines and quite a number of different overrides
depending on the machine, so when it comes to IP compliance we need to
aggregate and simplify, otherwise our IP auditors would die :) )

But since our Audit Team, unlike in a commercial project, works fully
in the open, other projects may benefit from this approach as well:
with fully reviewed file-level license data publicly available for
quite a number of upstream sources and Yocto layers, a complete
source-to-binary tracking system would enable any Yocto project to get
very detailed license information for its images, to automatically
detect license incompatibilities between linked binary files, etc.

> 
>> 
>> - carefully describe what is found in a final image (i.e. binary files
>> and their dependencies), since that is what is actually distributed
>> and goes into the final product;
>> - describe how such binary files have been generated and where they
>> come from (i.e. upstream sources, including patches and other stuff
>> added from meta-layers); provenance is important for a number of
>> reasons related to IP Compliance and security.
>> 
>> The aim is to become able to:
>> 
>> - map binaries to their corresponding upstream source packages (and
>> not to the "internal" source packages created by recipes by combining
>> multiple upstream sources and patches)
>> - map binaries to the source files that have been actually used to
>> build them - which usually are a small subset of the whole source
>> package
>> 
>> With respect to IP compliance, this would allow to, among other 
>> things:
>> 
>> - get the real license text for each binary file, by getting the
>> license of the specific source files it has been generated from
>> (provided by Fossology, for instance), - and not the main license
>> stated in the corresponding recipe (which may be as confusing as
>> GPL-2.0-or-later & LGPL-2.1-or-later & BSD-3-Clause & BSD-4-Clause, or
>> even worse)
> 
> IIUC this is the difference between the "Declared" license and the
> "Concluded" license. You can report both, and I think
> create-spdx.bbclass can currently do this with its rudimentary source
> license scanning. You really do want both and it's a great way to make
> sure that the "Declared" license (that is the license in the recipe)
> reflects the reality of the source code.
> 

The issue is with components like util-linux, which contains a lot of
sub-components subject to different licenses; util-linux recipe's
license is "GPL-2.0-or-later & LGPL-2.1-or-later & BSD-3-Clause &
BSD-4-Clause", but from such information one cannot tell if a particular
binary file generated from util-linux is subject to GPL, LGPL, or
BSD-3|4-clause.

Of course, being able to track upstream sources to binaries at file
level would be useless if one doesn't have file-level license
information; but since Scancode and Fossology (and our Audit Team) may
provide such information, such tracking may become super-useful, in our
opinion.
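
As a toy example of what file-level data would enable, here is a Python
sketch (all file names and license data invented) deriving a binary's
effective license from the source files compiled into it:

# Assumed inputs: per-file concluded licenses (e.g. from Fossology) and a
# binary -> contributing-source-files map (e.g. from debug source tracking).
file_licenses = {
    "libuuid/src/gen_uuid.c": "BSD-3-Clause",
    "libuuid/src/pack.c": "BSD-3-Clause",
}
binary_sources = {
    "libuuid.so.1.3.0": ["libuuid/src/gen_uuid.c", "libuuid/src/pack.c"],
}

def binary_license(binary):
    # Conjunction (SPDX "AND") of the licenses of the files actually built in.
    return " AND ".join(sorted({file_licenses[f] for f in binary_sources[binary]}))

print(binary_license("libuuid.so.1.3.0"))  # -> BSD-3-Clause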


>> - automatically check license incompatibilities at the binary file 
>> level.
>> 
>> Other possible interesting things could be done also on the security 
>> side.
>> 
>> This work intends to add a way to provide additional data that can be
>> used by create-spdx, not to replace create-spdx in any way.
>> 
>> The sources with a long README are available at
>> https://gitlab.eclipse.org/eclipse/oniro-compliancetoolchain/toolchain/tinfoilhat/-/tree/srctracker/srctracker
>> 
>> What do you think of this work? Would it be of interest to integrate
>> into YP at some point? Shall we discuss this?
> 
> This seems promising as something that could potentially move into
> core. I have a few points:
>  - The extraction of the sources to a dedicated directory is something
> that Richard has been toying around with for quite a while, and I
> think it would greatly simplify that part of your process. I would
> very much encourage you to look at the work he's done, and work on
> that to get it pushed across the finish line as it's a really good
> improvement that would benefit not just your source scanning.

Thanks for the suggestion, could you point me to Richard's work?
I'll surely look into it.

>  - I would encourage you to not wait to turn this into a bbclass
> and/or library functions. You should be able to do this in a new
> layer, and that would make it much clearer as to what the path to
> being included in OE-core would look like. It also would (IMHO) be
> nicer to the users :)

Understood :)

I'm the newbie here, so any other suggestion is warmly welcome.

Regards,

Alberto



* Re: [OE-core] Adding more information to the SBOM
  2022-09-14 17:10   ` [OE-core] " Alberto Pianon
@ 2022-09-14 20:52     ` Joshua Watt
  0 siblings, 0 replies; 11+ messages in thread
From: Joshua Watt @ 2022-09-14 20:52 UTC (permalink / raw)
  To: Alberto Pianon; +Cc: Marta Rybczynska, OE-core, openembedded-architecture

On Wed, Sep 14, 2022 at 12:10 PM Alberto Pianon <alberto@pianon.eu> wrote:
>
> Hi Joshua,
>
> nice to meet you!
>
> I'm new to this list, and I've always approached Yocto just from the
> "IP compliance side", so I may miss important pieces of information.
> That is why Marta encouraged me and is helping me to ask for community
> feedback.
>
> On 2022-09-14 16:56, Joshua Watt wrote:
> > On Wed, Sep 14, 2022 at 9:16 AM Marta Rybczynska <rybczynska@gmail.com>
> > wrote:
> >>
> >> Dear all,
> >> (cross-posting to oe-core and *-architecture)
> >> In the last months, we have worked in Oniro on using the create-spdx
> >> class for both IP compliance and security.
> >>
> >> During this work, Alberto Pianon has found that some information is
> >> missing from the SBOM and it does not contain enough for Software
> >> Composition Analysis. The main missing point is the relation between
> >> the actual upstream sources and the final binaries (create-spdx uses
> >> composite sources).
> >
> > I believe we map the binaries to the source code from the -dbg
> > packages; is the premise that this is insufficient? Can you elaborate
> > more on why that is, I don't quite understand. The debug sources are
> > (basically) what we actually compiled (e.g. post-do_patch) to produce
> > the binary, and you can in turn follow these back to the upstream
> > sources with the downloadLocation property.
>
> This was also my assumption at the beginning. But then I found that
> there are recipes with multiple upstream sources, which may be
> combined/mixed together in recipes' WORKDIR. For instance this one:
>
> https://git.yoctoproject.org/meta-virtualization/tree/recipes-networking/cni/cni_git.bb
>
> SRC_URI = "\
>     git://github.com/containernetworking/cni.git;branch=main;name=cni;protocol=https \
>     git://github.com/containernetworking/plugins.git;branch=release-1.1;destsuffix=${S}/src/github.com/containernetworking/plugins;name=plugins;protocol=https \
>     git://github.com/flannel-io/cni-plugin;branch=main;name=flannel_plugin;protocol=https;destsuffix=${S}/src/github.com/containernetworking/plugins/plugins/meta/flannel \
>     "
>
> (The third source is unpacked in a subdir of the second one)
>
>  From here I discovered that we can't assume that the first non-local URI
> is the downloadLocation for all source files, because it is not always
> the case.

This is true, but I think that's more of a problem with the inability
to express multiple download locations in the SPDX, not that we don't
have all the source when we generate the SPDX, correct? I _believe_
the -dbg package still contains all the source code from all three
URLs?

>
> Moreover, in the context of our project we also needed to find the
> upstream sources for local patches, scripts, etc. added by recipes
> (i.e. the corresponding layers' repos).

Ok, so this makes me wonder: If we implement the better source
extraction in OE core, does that help this problem? Is the primary
problem that you want the unpatched upstream source code files instead
of the patched ones, or is it some other problem?

AFAIK, the -dbg package contains the source code we actually
compiled... so I have a hard time understanding what's "incorrect"
(or not ideal) about referencing it; but I think I'm missing something
important :)

>
> >
> >>
> >> Alberto has worked on how to obtain the missing data and now has a
> >> POC. This POC provides full source-to-binary tracking of Yocto builds
> >> through a couple of scripts (intended to be transformed into a new
> >> bbclass at a later stage). The goal is to add the missing pieces of
> >> information in order to get a "real" SBOM from Yocto, which should, at
> >> a minimum:
> >
> > Please be a little careful with the wording; SBoMs have a lot of uses,
> > and many of them we can satisfy with what we currently generate; it
> > may not do the exact use case you are looking for, but that doesn't
> > mean it's not a "real" SBoM :)
>
> You are right, sorry! "real" is meant in the context of our project,
> where we need to make our Fossology Audit Team work on "original"
> upstream source packages/repos, for a number of reasons (the main being
> that in the Oniro project we have a complex build matrix with a lot of
> available target machines and quite a number of different overrides
> depending on the machine, so when it comes to IP compliance we need to
> aggregate and simplify, otherwise our IP auditors would die :) )
>
> But since our Audit Team, unlike in a commercial project, works fully
> in the open, other projects may benefit from this approach as well:
> with fully reviewed file-level license data publicly available for
> quite a number of upstream sources and Yocto layers, a complete
> source-to-binary tracking system would enable any Yocto project to get
> very detailed license information for its images, to automatically
> detect license incompatibilities between linked binary files, etc.

Ok, so let me see if I can follow what you want here:
 1) Your Audit Team scans some open source repository, and generates
some sort of license report for it
 2) You do a Yocto build that builds that repository
 3) You want to link the SBoM generated by Yocto back to the report
from the Audit Team; specifically, you want to be able to trace binaries
in the system back to the original source code from the Audit Team report?

Currently #3 is difficult because
 1) Yocto only reports one SRC_URI in the SBoM
 2) Binaries are tracked back to the patched source code (in the
-dbg packages), so the checksums may not match the original upstream
source code.
Any other reasons?

>
> >
> >>
> >> - carefully describe what is found in a final image (i.e. binary files
> >> and their dependencies), since that is what is actually distributed
> >> and goes into the final product;
> >> - describe how such binary files have been generated and where they
> >> come from (i.e. upstream sources, including patches and other stuff
> >> added from meta-layers); provenance is important for a number of
> >> reasons related to IP Compliance and security.
> >>
> >> The aim is to become able to:
> >>
> >> - map binaries to their corresponding upstream source packages (and
> >> not to the "internal" source packages created by recipes by combining
> >> multiple upstream sources and patches)
> >> - map binaries to the source files that have been actually used to
> >> build them - which usually are a small subset of the whole source
> >> package
> >>
> >> With respect to IP compliance, this would allow to, among other
> >> things:
> >>
> >> - get the real license text for each binary file, by getting the
> >> license of the specific source files it has been generated from
> >> (provided by Fossology, for instance), - and not the main license
> >> stated in the corresponding recipe (which may be as confusing as
> >> GPL-2.0-or-later & LGPL-2.1-or-later & BSD-3-Clause & BSD-4-Clause, or
> >> even worse)
> >
> > IIUC this is the difference between the "Declared" license and the
> > "Concluded" license. You can report both, and I think
> > create-spdx.bbclass can currently do this with its rudimentary source
> > license scanning. You really do want both and it's a great way to make
> > sure that the "Declared" license (that is the license in the recipe)
> > reflects the reality of the source code.
> >
>
> The issue is with components like util-linux, which contains a lot of
> sub-components subject to different licenses; util-linux recipe's
> license is "GPL-2.0-or-later & LGPL-2.1-or-later & BSD-3-Clause &
> BSD-4-Clause", but from such information one cannot tell if a particular
> binary file generated from util-linux is subject to GPL, LGPL, or
> BSD-3|4-clause.
>
> Of course, being able to track upstream sources to binaries at file
> level would be useless if one doesn't have file-level license
> information; but since Scancode and Fossology (and our Audit Team) may
> provide such information, such tracking may become super-useful, in our
> opinion.

We also implement (and report) some rudimentary license scanning in
Yocto, but we only look for "SPDX-License-Identifier" tags.
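
That tag scanning amounts to something like the following sketch
(illustrative only, not the actual create-spdx code):

import re

# Matches e.g. "// SPDX-License-Identifier: GPL-2.0-only" in a source file.
TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")

def scan_file(path, max_lines=20):
    # Only the first few lines are checked, since the tag conventionally
    # sits in the file header.
    with open(path, errors="ignore") as f:
        for _, line in zip(range(max_lines), f):
            m = TAG.search(line)
            if m:
                return m.group(1).strip()
    return None  # no tag; only the recipe's declared license is available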



>
>
> >> - automatically check license incompatibilities at the binary file
> >> level.
> >>
> >> Other possible interesting things could be done also on the security
> >> side.
> >>
> >> This work intends to add a way to provide additional data that can be
> >> used by create-spdx, not to replace create-spdx in any way.
> >>
> >> The sources with a long README are available at
> >> https://gitlab.eclipse.org/eclipse/oniro-compliancetoolchain/toolchain/tinfoilhat/-/tree/srctracker/srctracker
> >>
> >> What do you think of this work? Would it be of interest to integrate
> >> into YP at some point? Shall we discuss this?
> >
> > This seems promising as something that could potentially move into
> > core. I have a few points:
> >  - The extraction of the sources to a dedicated directory is something
> > that Richard has been toying around with for quite a while, and I
> > think it would greatly simplify that part of your process. I would
> > very much encourage you to look at the work he's done, and work on
> > that to get it pushed across the finish line as it's a really good
> > improvement that would benefit not just your source scanning.
>
> Thanks for the suggestion, could you point me to Richard's work?
> I'll surely look into it.
>
> >  - I would encourage you to not wait to turn this into a bbclass
> > and/or library functions. You should be able to do this in a new
> > layer, and that would make it much clearer as to what the path to
> > being included in OE-core would look like. It also would (IMHO) be
> > nicer to the users :)
>
> Understood :)
>
> I'm the newbie here, so any other suggestion is warmly welcome.
>
> Regards,
>
> Alberto



* Re: [Openembedded-architecture] Adding more information to the SBOM
  2022-09-14 14:56 ` Joshua Watt
  2022-09-14 17:10   ` [OE-core] " Alberto Pianon
@ 2022-09-15  1:16   ` Mark Hatle
  1 sibling, 0 replies; 11+ messages in thread
From: Mark Hatle @ 2022-09-15  1:16 UTC (permalink / raw)
  To: Joshua Watt, Marta Rybczynska; +Cc: OE-core, openembedded-architecture



On 9/14/22 9:56 AM, Joshua Watt wrote:
> On Wed, Sep 14, 2022 at 9:16 AM Marta Rybczynska <rybczynska@gmail.com> wrote:
>>
>> Dear all,
>> (cross-posting to oe-core and *-architecture)
>> In the last months, we have worked in Oniro on using the create-spdx
>> class for both IP compliance and security.
>>
>> During this work, Alberto Pianon has found that some information is
>> missing from the SBOM and it does not contain enough for Software
>> Composition Analysis. The main missing point is the relation between
>> the actual upstream sources and the final binaries (create-spdx uses
>> composite sources).
> 
> I believe we map the binaries to the source code from the -dbg
> packages; is the premise that this is insufficient? Can you elaborate
> more on why that is, I don't quite understand. The debug sources are
> (basically) what we actually compiled (e.g. post-do_patch) to produce
> the binary, and you can in turn follow these back to the upstream
> sources with the downloadLocation property.

When I last looked at this, it was critical that the analysis be:

binary -> patched & configured source (dbg package) -> how the sources were 
constructed.

As Joshua said above, I believe all of the information is present for this, as
you can tie the binary (through debug symbols) back to the debug package, and
the source of the debug package back to the sources that constructed it via
heuristics. (If you enable the git patch mechanism, it should even be possible
to use git blame to find exactly which upstreams constructed the patched
sources.)

For generated content, it's more difficult -- but for those items usually there 
is a header which indicates what generated the content so other heuristics can 
be used.
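
As an illustration of the debug-symbol link in that chain, here is a sketch
using pyelftools (an assumed third-party library, not what the build system
itself uses) to list the source files recorded in a binary's DWARF info:

from elftools.elf.elffile import ELFFile  # pyelftools

def dwarf_source_files(path):
    # Collect the primary source file of each DWARF compilation unit in a
    # binary (or its -dbg counterpart); these are the files that were
    # actually compiled into it.
    files = set()
    with open(path, "rb") as f:
        elf = ELFFile(f)
        if elf.has_dwarf_info():
            for cu in elf.get_dwarf_info().iter_CUs():
                name = cu.get_top_DIE().attributes.get("DW_AT_name")
                if name:
                    files.add(name.value.decode())
    return files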

>>
>> Alberto has worked on how to obtain the missing data and now has a
>> POC. This POC provides full source-to-binary tracking of Yocto builds
>> through a couple of scripts (intended to be transformed into a new
>> bbclass at a later stage). The goal is to add the missing pieces of
>> information in order to get a "real" SBOM from Yocto, which should, at
>> a minimum:
> 
> Please be a little careful with the wording; SBoMs have a lot of uses,
> and many of them we can satisfy with what we currently generate; it
> may not do the exact use case you are looking for, but that doesn't
> mean it's not a "real" SBoM :)
> 
>>
>> - carefully describe what is found in a final image (i.e. binary files
>> and their dependencies), since that is what is actually distributed
>> and goes into the final product;
>> - describe how such binary files have been generated and where they
>> come from (i.e. upstream sources, including patches and other stuff
>> added from meta-layers); provenance is important for a number of
>> reasons related to IP Compliance and security.

Full compliance will require binaries mapped to patched source to upstream
sources _AND_ the instructions (layer/recipe/configuration) used to build them.
But it's up to the local legal determination to figure out 'how far you really
need to go', vs. just "here are the layers I used to build my project".

>> The aim is to become able to:
>>
>> - map binaries to their corresponding upstream source packages (and
>> not to the "internal" source packages created by recipes by combining
>> multiple upstream sources and patches)
>> - map binaries to the source files that have been actually used to
>> build them - which usually are a small subset of the whole source
>> package
>>
>> With respect to IP compliance, this would allow to, among other things:
>>
>> - get the real license text for each binary file, by getting the
>> license of the specific source files it has been generated from
>> (provided by Fossology, for instance), - and not the main license
>> stated in the corresponding recipe (which may be as confusing as
>> GPL-2.0-or-later & LGPL-2.1-or-later & BSD-3-Clause & BSD-4-Clause, or
>> even worse)
> 
> IIUC this is the difference between the "Declared" license and the
> "Concluded" license. You can report both, and I think
> create-spdx.bbclass can currently do this with its rudimentary source
> license scanning. You really do want both and it's a great way to make
> sure that the "Declared" license (that is the license in the recipe)
> reflects the reality of the source code.

And the thing to keep in mind is that in a given package the "Declared" is
usually what a LICENSE file or header says.  But the "Concluded" has levels of
quality behind it.  The first level of quality is "Declared".  The next level
is automation (something like Fossology), the next level is human reviewed, and
the highest level is "lawyer reviewed".

So being able to inject SPDX information with Concluded values for evaluation 
and track the 'quality level' has always been something I wanted to do, but 
never had time.

At the time, my idea was a database (and/or bbappend) for each component that
would include pre-processed SPDX data for each recipe.  This data would run
through a validation step to show it actually matches the patched sources.  (If 
any file checksums do NOT match, then they would be flagged for follow up.)
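
That validation step might look roughly like this sketch (the data layout is
invented for illustration):

import hashlib, os

def validate_spdx_data(reviewed_checksums, srcdir):
    # 'reviewed_checksums' maps relative paths to the sha256 recorded when
    # the pre-processed SPDX data was produced; mismatches get flagged.
    flagged = []
    for relpath, expected in reviewed_checksums.items():
        path = os.path.join(srcdir, relpath)
        if not os.path.isfile(path):
            flagged.append(relpath)
            continue
        with open(path, "rb") as f:
            if hashlib.sha256(f.read()).hexdigest() != expected:
                flagged.append(relpath)
    return flagged  # files needing manual follow-up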

>> - automatically check license incompatibilities at the binary file level.
>>
>> Other possible interesting things could be done also on the security side.
>>
>> This work intends to add a way to provide additional data that can be
>> used by create-spdx, not to replace create-spdx in any way.
>>
>> The sources with a long README are available at
>> https://gitlab.eclipse.org/eclipse/oniro-compliancetoolchain/toolchain/tinfoilhat/-/tree/srctracker/srctracker
>>
>> What do you think of this work? Would it be of interest to integrate
>> into YP at some point? Shall we discuss this?
> 
> This seems promising as something that could potentially move into
> core. I have a few points:
>   - The extraction of the sources to a dedicated directory is something
> that Richard has been toying around with for quite a while, and I
> think it would greatly simplify that part of your process. I would
> very much encourage you to look at the work he's done, and work on
> that to get it pushed across the finish line as it's a really good
> improvement that would benefit not just your source scanning.
>   - I would encourage you to not wait to turn this into a bbclass
> and/or library functions. You should be able to do this in a new
> layer, and that would make it much clearer as to what the path to
> being included in OE-core would look like. It also would (IMHO) be
> nicer to the users :)

Agreed, this looks useful.  The key is to start turning it into one or more
bbclasses now -- things that work with the Yocto Project process.  Don't try to
"post-process" and reconstruct sources.  Instead, inject steps that will run
your file checksums and build up your database as the sources are constructed
(i.e. in do_unpack, do_patch, etc.).
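
That per-stage capture could be as small as this sketch, hooked in as
do_unpack/do_patch postfuncs (the hook points and output layout are
assumptions, not an existing OE-core API):

import hashlib, json, os

def snapshot_checksums(rootdir, outfile):
    # Record a {relative path: sha256} snapshot of a source tree; calling
    # this after do_unpack and again after do_patch lets a later tool see
    # which files each stage introduced or modified.
    snap = {}
    for dirpath, _, filenames in os.walk(rootdir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                snap[os.path.relpath(path, rootdir)] = \
                    hashlib.sha256(f.read()).hexdigest()
    with open(outfile, "w") as f:
        json.dump(snap, f, indent=2, sort_keys=True)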

The key is, all of the information IS available.  It just may not be in the 
format you want.

--Mark

>>
>> Marta and Alberto



* Re: [Openembedded-architecture] Adding more information to the SBOM
  2022-09-14 14:16 Adding more information to the SBOM Marta Rybczynska
  2022-09-14 14:56 ` Joshua Watt
@ 2022-09-15 12:16 ` Richard Purdie
  2022-09-16 15:18   ` Alberto Pianon
  1 sibling, 1 reply; 11+ messages in thread
From: Richard Purdie @ 2022-09-15 12:16 UTC (permalink / raw)
  To: Marta Rybczynska, OE-core, openembedded-architecture, Joshua Watt

On Wed, 2022-09-14 at 16:16 +0200, Marta Rybczynska wrote:
> The sources with a long README are available at
> https://gitlab.eclipse.org/eclipse/oniro-compliancetoolchain/toolchain/tinfoilhat/-/tree/srctracker/srctracker
> 
> What do you think of this work? Would it be of interest to integrate
> into YP at some point? Shall we discuss this?

I had a look at this and was a bit puzzled by some of it.

I can see the issues you'd have if you want to separate the unpatched
source from the patches and know which files had patches applied as
that is hard to track. There would be significant overhead in trying
to process and store that information in the unpack/patch steps and the
archiver class does some of that already. It is messy, hard and doesn't
perform well. I'm reluctant to force everyone to do it as a result but
that can also result in multiple code paths and when you have that, the
result is that one breaks :(.

I also can see the issue with multiple sources in SRC_URI, although you
should be able to map those back if you assume subtrees are "owned" by
given SRC_URI entries. I suspect there may be a SPDX format limit in
documenting that piece?

Where I became puzzled is where you say "Information about debug
sources for each actual binary file is then taken from
tmp/pkgdata/<machine>/extended/*.json.zstd". This is the data we added
and use for the spdx class so you shouldn't need to reinvent that
piece. It should be the exact same data the spdx class uses.
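
For reference, decoding those files is straightforward; a sketch (the
zstandard module is a third-party assumption, and the JSON structure inside
is whatever the pkgdata code wrote, so treat the keys as version specific):

import json
import zstandard  # third-party zstd bindings

def read_extended_pkgdata(path):
    # e.g. path = "tmp/pkgdata/<machine>/extended/<pkg>.json.zstd"
    with open(path, "rb") as f:
        with zstandard.ZstdDecompressor().stream_reader(f) as reader:
            return json.load(reader)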

I was also puzzled about the difference between rpm and the other
package backends. The exact same files are packaged by all the package
backends so the checksums from do_package should be fine.


For the source issues above, it basically comes down to how much
"pain" we want to push onto all users for the sake of adding in this
data. Unfortunately it is data which many won't need or use and
different legal departments do have different requirements. Experience
with archiver.bbclass shows that multiple codepaths doing these things
is a nightmare to keep working, particularly for corner cases which do
interesting things with the code (externalsrc, gcc shared workdir, the
kernel and more).

Cheers,

Richard



* Re: [Openembedded-architecture] Adding more information to the SBOM
  2022-09-15 12:16 ` Richard Purdie
@ 2022-09-16 15:18   ` Alberto Pianon
  2022-09-16 15:49     ` Mark Hatle
  2022-09-16 16:08     ` Richard Purdie
  0 siblings, 2 replies; 11+ messages in thread
From: Alberto Pianon @ 2022-09-16 15:18 UTC (permalink / raw)
  To: Richard Purdie
  Cc: Marta Rybczynska, OE-core, openembedded-architecture,
	Joshua Watt, 'Carlo Piana',
	davide.ricci

Hi Richard,

thank you for your reply, you gave me very interesting cues to think
about. I'll reply in reverse/importance order

On 2022-09-15 14:16, Richard Purdie wrote:
> 
> For the source issues above, it basically comes down to how much
> "pain" we want to push onto all users for the sake of adding in this
> data. Unfortunately it is data which many won't need or use and
> different legal departments do have different requirements.

We didn't paint the overall picture sufficiently well, therefore our
requirements may come across as coming from a particularly pedantic
legal department; my fault :)

Oniro is not "yet another commercial Yocto project"; we are not a legal
department (even if we are experienced FLOSS lawyers and auditors, the
most prominent of whom is Carlo Piana -- cc'ed -- former general counsel
of FSFE and member of OSI Board).

Our rather ambitious goal is not limited to Oniro, and consists in doing
compliance in the open source way and both setting an example and
providing guidance and material for others to benefit from our effort.
Our work will therefore be shared (and possibly improved by others) not
only with Oniro-based projects but also with any Yocto project. Among
other things, the most relevant bit of work that we want to share is
**fully reviewed license information** and other legal metadata about a
whole bunch of open source components commonly used in Yocto projects.

To do that in a **scalable and fully automated way**, we need Yocto to
collect some information that is currently discarded (or simply not
collected) at build time.

The Oniro Project Leader, Davide Ricci (cc'ed), strongly encouraged us
to seek feedback from you in order to find out the best way to do it.

Maybe organizing a call would be more convenient than discussing
background and requirements here, if you (and others) are available.


> Experience
> with archiver.bbclass shows that multiple codepaths doing these things
> is a nightmare to keep working, particularly for corner cases which do
> interesting things with the code (externalsrc, gcc shared workdir, the
> kernel and more).
> 
> I had a look at this and was a bit puzzled by some of it.
> 
> I can see the issues you'd have if you want to separate the unpatched
> source from the patches and know which files had patches applied as
> that is hard to track. There would be significant overhead in trying
> to process and store that information in the unpack/patch steps and the
> archiver class does some of that already. It is messy, hard and doesn't
> perform well. I'm reluctant to force everyone to do it as a result but
> that can also result in multiple code paths and when you have that, the
> result is that one breaks :(.
> 
> I also can see the issue with multiple sources in SRC_URI, although you
> should be able to map those back if you assume subtrees are "owned" by
> given SRC_URI entries. I suspect there may be a SPDX format limit in
> documenting that piece?

I'm replying in reverse order:

- there is a SPDX format limit, but it is by design: a SPDX package
   entity is a single sw distribution unit, so it may have only one
   downloadLocation; if you have more than one downloadLocation, you must
   have more than one SPDX package, according to SPDX specs;

- I understand that my solution is a bit hacky; but IMHO any other
   *post-mortem* solution would be far more hacky; the real solution
   would be collecting required information directly in do_fetch and
   do_unpack

- I also understand that we should reduce pain, otherwise nobody would
   use our solution; the simplest and cleanest way I can think about is
   collecting just package (in the SPDX sense) files' relative paths and
   checksums at every stage (fetch, unpack, patch, package), and leave
   data processing (i.e. mapping upstream source packages -> recipe's
   WORKDIR package -> debug source package -> binary packages -> binary
   image) to a separate tool, that may use (just a thought) a graph
   database to process things more efficiently.
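
As a toy illustration of that separate processing tool, assuming the
per-stage snapshots described above as input (the graph database is left
out; a plain dict shows the principle):

def trace_provenance(stages):
    # 'stages' is an ordered list of (stage_name, {relpath: checksum})
    # pairs, e.g. fetch -> unpack -> patch -> package. Content is
    # attributed to the earliest stage where its checksum appears.
    provenance = {}
    for stage_name, snapshot in stages:
        for relpath, csum in snapshot.items():
            provenance.setdefault(csum, (stage_name, relpath))
    return provenance  # checksum -> (first stage seen, path there)

Looking up a packaged file's checksum in that map tells you where its
content first appeared unchanged.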


> 
> Where I became puzzled is where you say "Information about debug
> sources for each actual binary file is then taken from
> tmp/pkgdata/<machine>/extended/*.json.zstd". This is the data we added
> and use for the spdx class so you shouldn't need to reinvent that
> piece. It should be the exact same data the spdx class uses.
> 

you're right, but in the context of a POC it was easier to extract them
directly from json files than from SPDX data :) It's just a POC to show
that required information may be retrieved in some way; implementation
details do not matter.

> I was also puzzled about the difference between rpm and the other
> package backends. The exact same files are packaged by all the package
> backends so the checksums from do_package should be fine.
> 

Here I may miss some piece of information. I looked at files in
tmp/pkgdata but I couldn't find package file checksums anywhere: that is
why I parsed rpm packages. But if such checksums were already available
somewhere in tmp/pkgdata, it wouldn't be necessary to parse rpm packages
at all... Could you point me to what I'm (maybe) missing here? Thanks!

In any case, thank you so much for all your insights, they were
super-useful!

Cheers,

Alberto



* Re: [Openembedded-architecture] Adding more information to the SBOM
  2022-09-16 15:18   ` Alberto Pianon
@ 2022-09-16 15:49     ` Mark Hatle
  2022-09-20 12:25       ` Alberto Pianon
  2022-09-16 16:08     ` Richard Purdie
  1 sibling, 1 reply; 11+ messages in thread
From: Mark Hatle @ 2022-09-16 15:49 UTC (permalink / raw)
  To: Alberto Pianon, Richard Purdie
  Cc: Marta Rybczynska, OE-core, openembedded-architecture,
	Joshua Watt, 'Carlo Piana',
	davide.ricci



On 9/16/22 10:18 AM, Alberto Pianon wrote:

... trimmed ...

>> I also can see the issue with multiple sources in SRC_URI, although you
>> should be able to map those back if you assume subtrees are "owned" by
>> given SRC_URI entries. I suspect there may be a SPDX format limit in
>> documenting that piece?
> 
> I'm replying in reverse order:
> 
> - there is a SPDX format limit, but it is by design: a SPDX package
>     entity is a single sw distribution unit, so it may have only one
>     downloadLocation; if you have more than one downloadLocation, you must
>     have more than one SPDX package, according to SPDX specs;

I think my interpretation of this is different.  I've got a view of 'sourcing
materials', and then verifying they are what we think they are and can be used
the way we want.  The "upstream sources" (and patches) are really just 'raw
materials' that the Yocto Project combines to create "the source".

So for the purpose of the SPDX, each upstream source _may_ have a corresponding
SPDX, but for the binaries their source is the combined unit, not multiple
SPDXes.  Think of it something like:

upstream source1 - SPDX
upstream source2 - SPDX
upstream patch
recipe patch1
recipe patch2

In the above, each of those items would be combined by the recipe system to 
construct the source used to build an individual recipe (and collection of 
packages).  Automation _IS_ used to combine the components [unpack/fetch] and
_MAY_ be used to generate a combined SPDX.

So your "upstream" location for this recipe is the local machine's source 
archive.  The SPDX for the local recipe files can merge the SPDX information 
they know (and if it's at a file level) can use checksums to identify the items 
not captured/modified by the patches for further review (either manual or 
automation like fossology).  In the case where an upstream has SPDX data, you 
should be able to inherit MOST files this way... but the output is specific to 
your configuration and patches.

1 - SPDX |
2 - SPDX |
patch    |---> recipe specific SPDX
patch    |
patch    |

In some cases someone may want to generate SPDX data for the 3 patches, but that 
may or may not be useful in this context.

> - I understand that my solution is a bit hacky; but IMHO any other
>     *post-mortem* solution would be far more hacky; the real solution
>     would be collecting required information directly in do_fetch and
>     do_unpack

I've not looked at the current SPDX spec, but past versions had a notes section.
Assuming this is still present, you can use it to reference back to how this
component was constructed and the upstream source URIs (and SPDX files) you used
for processing.

This way nothing really changes in do_fetch or do_unpack.  (You may want to find
a way to capture file checksums and what the source was for a particular file,
but it may not really be necessary!)

> - I also understand that we should reduce pain, otherwise nobody would
>     use our solution; the simplest and cleanest way I can think about is
>     collecting just package (in the SPDX sense) files' relative paths and
>     checksums at every stage (fetch, unpack, patch, package), and leave
>     data processing (i.e. mapping upstream source packages -> recipe's
>     WORKDIR package -> debug source package -> binary packages -> binary
>     image) to a separate tool, that may use (just a thought) a graph
>     database to process things more efficiently.

Even in do_patch nothing really changes, other than that, again, you may want
to capture checksums to identify things that need further processing.


This approach greatly simplifies things, and gives people doing code reviews
insight into the source used when shipping the binaries (which is really
an important aspect of this), as well as which recipe and "build" (really
fetch/unpack/patch) were used to construct the sources.  If they want to
investigate the sources further back to their provider, then the notes would
have the information for that, and you could transition back to the "raw
materials" providers.

>>
>> Where I became puzzled is where you say "Information about debug
>> sources for each actual binary file is then taken from
>> tmp/pkgdata/<machine>/extended/*.json.zstd". This is the data we added
>> and use for the spdx class so you shouldn't need to reinvent that
>> piece. It should be the exact same data the spdx class uses.
>>
> 
> you're right, but in the context of a POC it was easier to extract them
> directly from json files than from SPDX data :) It's just a POC to show
> that required information may be retrieved in some way; implementation
> details do not matter.
> 
>> I was also puzzled about the difference between rpm and the other
>> package backends. The exact same files are packaged by all the package
>> backends so the checksums from do_package should be fine.
>>
> 
> Here I may miss some piece of information. I looked at files in
> tmp/pkgdata but I couldn't find package file checksums anywhere: that is
> why I parsed rpm packages. But if such checksums were already available
> somewhere in tmp/pkgdata, it wouldn't be necessary to parse rpm packages
> at all... Could you point me to what I'm (maybe) missing here? Thanks!

File checksumming is expensive.  There are checksums available to individual
packaging engines, as well as aggregate checksums for "hash equivalency", but
I'm not aware of any per-file checksum that is stored.

You definitely shouldn't be parsing packages of any type (rpm or otherwise), as 
packages are truly optional.  It's the binaries that matter here.

--Mark

> In any case, thank you so much for all your insights, they were
> super-useful!
> 
> Cheers,
> 
> Alberto



* Re: [Openembedded-architecture] Adding more information to the SBOM
  2022-09-16 15:18   ` Alberto Pianon
  2022-09-16 15:49     ` Mark Hatle
@ 2022-09-16 16:08     ` Richard Purdie
       [not found]       ` <1061592967.5114533.1663597215958.JavaMail.zimbra@piana.eu>
  1 sibling, 1 reply; 11+ messages in thread
From: Richard Purdie @ 2022-09-16 16:08 UTC (permalink / raw)
  To: Alberto Pianon
  Cc: Marta Rybczynska, OE-core, openembedded-architecture,
	Joshua Watt, 'Carlo Piana',
	davide.ricci

On Fri, 2022-09-16 at 17:18 +0200, Alberto Pianon wrote:
> On 2022-09-15 14:16, Richard Purdie wrote:
> > 
> > For the source issues above, it basically comes down to how much
> > "pain" we want to push onto all users for the sake of adding in this
> > data. Unfortunately it is data which many won't need or use and
> > different legal departments do have different requirements.
> 
> We didn't paint the overall picture sufficiently well, therefore our
> requirements may come across as coming from a particularly pedantic
> legal department; my fault :)
> 
> Oniro is not "yet another commercial Yocto project"; we are not a legal
> department (even if we are experienced FLOSS lawyers and auditors, the
> most prominent of whom is Carlo Piana -- cc'ed -- former general counsel
> of FSFE and member of OSI Board).
> 
> Our rather ambitious goal is not limited to Oniro, and consists in doing
> compliance in the open source way and both setting an example and
> providing guidance and material for others to benefit from our effort.
> Our work will therefore be shared (and possibly improved by others) not
> only with Oniro-based projects but also with any Yocto project. Among
> other things, the most relevant bit of work that we want to share is
> **fully reviewed license information** and other legal metadata about a
> whole bunch of open source components commonly used in Yocto projects.

I certainly love the goal. I presume you're going to share your review
criteria somehow? There must be some further set of steps,
documentation and results beyond what we're discussing here?

I think the challenge will be whether you can publish that review with
sufficient "proof" that other legal departments can leverage it. I
wouldn't underestimate how different the requirements and process can
be between different people/teams/companies.

> To do that in a **scalable and fully automated way**, we need Yocto to
> collect some information that is currently discarded (or simply not
> collected) at build time.
> 
> The Oniro Project Leader, Davide Ricci (cc'ed), strongly encouraged us
> to seek feedback from you in order to find out the best way to do it.
> 
> Maybe organizing a call would be more convenient than discussing
> background and requirements here, if you (and others) are available.

I don't mind having a call but the discussion in this current form may
have an important element we shouldn't overlook, which is that it isn't
just me you need to convince on some of this.

If, for example, we should radically change the unpack/patch process,
we need to have a good explanation for why people need to take that
build time/space/resource hit. If we conclude that on a call, the case
to the wider community would still have to be made.

> > Experience
> > with archiver.bbclass shows that multiple codepaths doing these things
> > is a nightmare to keep working, particularly for corner cases which do
> > interesting things with the code (externalsrc, gcc shared workdir, the
> > kernel and more).
> > 
> > I had a look at this and was a bit puzzled by some of it.
> > 
> > I can see the issues you'd have if you want to separate the unpatched
> > source from the patches and know which files had patches applied as
> > that is hard to track. There would be significant overhead in trying
> > to process and store that information in the unpack/patch steps and the
> > archiver class does some of that already. It is messy, hard and doesn't
> > perform well. I'm reluctant to force everyone to do it as a result but
> > that can also result in multiple code paths and when you have that, the
> > result is that one breaks :(.
> > 
> > I also can see the issue with multiple sources in SRC_URI, although you
> > should be able to map those back if you assume subtrees are "owned" by
> > given SRC_URI entries. I suspect there may be a SPDX format limit in
> > documenting that piece?
> 
> I'm replying in reverse order:
> 
> - there is a SPDX format limit, but it is by design: a SPDX package
>    entity is a single sw distribution unit, so it may have only one
>    downloadLocation; if you have more than one downloadLocation, you must
>    have more than one SPDX package, according to SPDX specs;

I think we may need to talk to the SPDX people about that as I'm not
convinced it always holds that you can divide software into such units.
Certainly you can construct a situation where there are two
repositories, each containing a source file, where the two files are
only ever linked together as one binary.

> - I understand that my solution is a bit hacky; but IMHO any other
>    *post-mortem* solution would be far more hacky; the real solution
>    would be collecting required information directly in do_fetch and
>    do_unpack

Agreed, this needs to be done at unpack/patch time. Don't underestimate
the impact of this on general users though as many won't appreciate
slowing down their builds generating this information :/.

There is also a pile of information some legal departments want which
you've not mentioned here, such as build scripts and configuration
information. Some previous discussions with other parts of the wider
open source community rejected Yocto Projects efforts as insufficient
since we didn't mandate and capture all of this too (the archiver could
optionally do some of it iirc). Is this just the first step and we're
going to continue dumping more data? Or is this sufficient and all any
legal department should need?

> - I also understand that we should reduce pain, otherwise nobody would
>    use our solution; the simplest and cleanest way I can think about is
>    collecting just package (in the SPDX sense) files' relative paths and
>    checksums at every stage (fetch, unpack, patch, package), and leave
>    data processing (i.e. mapping upstream source packages -> recipe's
>    WORKDIR package -> debug source package -> binary packages -> binary
>    image) to a separate tool, that may use (just a thought) a graph
>    database to process things more efficiently.

I'd suggest stepping back and working out whether the SPDX requirement
of a "single download location", from which some of this stems, really
makes sense.

> > Where I became puzzled is where you say "Information about debug
> > sources for each actual binary file is then taken from
> > tmp/pkgdata/<machine>/extended/*.json.zstd". This is the data we added
> > and use for the spdx class so you shouldn't need to reinvent that
> > piece. It should be the exact same data the spdx class uses.
> > 
> 
> you're right, but in the context of a POC it was easier to extract them
> directly from json files than from SPDX data :) It's just a POC to show
> that required information may be retrieved in some way; implementation
> details do not matter

Fair enough, I just want to be clear we don't want to duplicate this.

> 
> > I was also puzzled about the difference between rpm and the other
> > package backends. The exact same files are packaged by all the package
> > backends so the checksums from do_package should be fine.
> > 
> 
> Here I may be missing some piece of information. I looked at files in
> tmp/pkgdata but I couldn't find package file checksums anywhere: that is
> why I parsed rpm packages. But if such checksums were already available
> somewhere in tmp/pkgdata, it wouldn't be necessary to parse rpm packages
> at all... Could you point me to what I'm (maybe) missing here? Thanks!

In some ways this is quite simple: at do_package time, the output
packages don't exist, only their content. The final output packages
are generated in do_package_write_{ipk|deb|rpm}.

You'd probably have to add a stage to the package_write tasks which
wrote out more checksum data since the checksums are only known at the
end of those tasks. I would question whether adding this additional
checksum into the SPDX output actually helps much in the real world
though. I guess it means you could look an RPM up against its checksum
but is that something people need to do?
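
If it did turn out to be needed, the natural shape would be a postfunc
on the write task, something like this untested sketch (PKGWRITEDIRRPM
is from memory and should be treated as an assumption):

    python do_package_write_rpm_checksums() {
        import hashlib, json, os
        sums = {}
        for root, dirs, files in os.walk(d.getVar("PKGWRITEDIRRPM")):
            for name in files:
                if not name.endswith(".rpm"):
                    continue
                with open(os.path.join(root, name), "rb") as f:
                    sums[name] = hashlib.sha256(f.read()).hexdigest()
        outpath = os.path.join(d.getVar("WORKDIR"), "rpm-checksums.json")
        with open(outpath, "w") as f:
            json.dump(sums, f, indent=2)
    }
    do_package_write_rpm[postfuncs] += "do_package_write_rpm_checksums"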

Cheers,

Richard






^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Openembedded-architecture] Adding more information to the SBOM
  2022-09-16 15:49     ` Mark Hatle
@ 2022-09-20 12:25       ` Alberto Pianon
  0 siblings, 0 replies; 11+ messages in thread
From: Alberto Pianon @ 2022-09-20 12:25 UTC (permalink / raw)
  To: Mark Hatle
  Cc: Richard Purdie, Marta Rybczynska, OE-core,
	openembedded-architecture, Joshua Watt, 'Carlo Piana',
	davide.ricci


On 2022-09-16 17:49, Mark Hatle wrote:
> On 9/16/22 10:18 AM, Alberto Pianon wrote:
> 
> ... trimmed ...
> 
>>> I also can see the issue with multiple sources in SRC_URI, although you
>>> should be able to map those back if you assume subtrees are "owned" by
>>> given SRC_URI entries. I suspect there may be a SPDX format limit in
>>> documenting that piece?
>> 
>> I'm replying in reverse order:
>> 
>> - there is a SPDX format limit, but it is by design: a SPDX package
>>     entity is a single sw distribution unit, so it may have only one
>>     downloadLocation; if you have more than one downloadLocation, you must
>>     have more than one SPDX package, according to SPDX specs;
> 
> I think my interpretation of this is different.  I've got a view of
> 'sourcing materials', and then verifying they are what we think they
> are and can be used the way we want.  The "upstream sources" (and
> patches) are really just 'raw materials' that we use the Yocto Project
> to combine to create "the source".
> 
> So for the purpose of the SPDX, each upstream source _may_ have a
> corresponding SPDX, but for the binaries their source is the combined
> unit... not multiple SPDXes.  Think of it as something like:
> 
> upstream source1 - SPDX
> upstream source2 - SPDX
> upstream patch
> recipe patch1
> recipe patch2
> 
> In the above, each of those items would be combined by the recipe
> system to construct the source used to build an individual recipe (and
> collection of packages).  Automation _IS_ used to combine the
> components [unpack/fetch] and _MAY_ be used to generate a combined
> SPDX.
> 
> So your "upstream" location for this recipe is the local machine's
> source archive.  The SPDX for the local recipe files can merge the
> SPDX information they know and, if it's at a file level, can use
> checksums to identify the items not captured/modified by the patches
> for further review (either manual or automation like fossology).  In
> the case where an upstream has SPDX data, you should be able to
> inherit MOST files this way... but the output is specific to your
> configuration and patches.
> 
> 1 - SPDX |
> 2 - SPDX |
> patch    |---> recipe specific SPDX
> patch    |
> patch    |
> 
> In some cases someone may want to generate SPDX data for the 3
> patches, but that may or may not be useful in this context.

IMHO it's a matter of different ways of framing Yocto recipes into SPDX
format.

Upstream sources are all SPDX packages. Yocto layers are SPDX packages,
too, containing patches that are PATCH_FOR upstream packages.

Upstream sources and Yocto layers are the "final" upstream sources, and
each of them has its downloadLocation.

"The source" created by a recipe is another SPDX package, GENERATED_FROM
upstream source packages + recipe and patches from Yocto layer
package(s). "The source" may need to be distributed by downstream users
(e.g. to comply with *GPL-* obligations or when providing SDKs), so
downstream users may make it available from their own infrastructure,
"giving" it a downloadLocation.

(in SPDX, GENERATED_FROM and PATCH_FOR relationships may be between
files, so one may map files found in "the source" package to individual
files found in upstream source packages)

Binary packages GENERATED_FROM "the source" are local SPDX packages,
too. And firmware images are SPDX packages, too, GENERATED_FROM all the
above. Firmware images are distributed by downstream users, who will
provide their own downloadLocation.
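
Roughly, the chain in SPDX relationship terms would be (sketch only,
identifiers invented):

    {"spdxElementId": "SPDXRef-layer-patch",
     "relationshipType": "PATCH_FOR",
     "relatedSpdxElement": "SPDXRef-upstream-src"},
    {"spdxElementId": "SPDXRef-the-source",
     "relationshipType": "GENERATED_FROM",
     "relatedSpdxElement": "SPDXRef-upstream-src"},
    {"spdxElementId": "SPDXRef-binary-package",
     "relationshipType": "GENERATED_FROM",
     "relatedSpdxElement": "SPDXRef-the-source"},
    {"spdxElementId": "SPDXRef-firmware-image",
     "relationshipType": "GENERATED_FROM",
     "relatedSpdxElement": "SPDXRef-binary-package"}

Each package in the chain keeps its own downloadLocation, which is the
point of this framing.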

> 
>> - I understand that my solution is a bit hacky; but IMHO any other
>>     *post-mortem* solution would be far more hacky; the real solution
>>     would be collecting required information directly in do_fetch and
>>     do_unpack
> 
> I've not looked at the current SPDX spec, but past versions had a
> notes section.  Assuming this is still present, you can use it to
> reference back to how this component was constructed and the upstream
> source URIs (and SPDX files) you used for processing.
> 
> This way nothing really changes in do_fetch or do_unpack.  (You may
> want to find a way to capture file checksums and what the source was
> for a particular file... but it may not really be necessary!)
> 

If you want to automatically map all files to their corresponding
upstream sources, it actually is necessary... see my next point


>> - I also understand that we should reduce pain, otherwise nobody would
>>     use our solution; the simplest and cleanest way I can think of is
>>     collecting just package (in the SPDX sense) files' relative paths and
>>     checksums at every stage (fetch, unpack, patch, package), and leave
>>     data processing (i.e. mapping upstream source packages -> recipe's
>>     WORKDIR package -> debug source package -> binary packages -> binary
>>     image) to a separate tool, that may use (just a thought) a graph
>>     database to process things more efficiently.
> 
> Even in do_patch nothing really changes, other than that, again, you may
> want to capture checksums to identify things that need further processing.
> 
> 
> This approach greatly simplifies things, and gives people doing code
> reviews insight into what source was used when shipping the
> binaries (which is really an important aspect of this), as well as
> which recipe and "build" (really fetch/unpack/patch) were used to
> construct the sources.  If they want to investigate the sources
> further back to their provider, then the notes would have the
> information for that, and you could transition back to the "raw
> materials" providers.

The point is precisely that we would like to help people avoid doing
this job, because if you scale up to n different Yocto projects it would
be a time-consuming, error-prone and hardly maintainable process. Since
SPDX allows representing relationships between any kind of entities
(files, packages), we would like to use that feature to map local source
files to upstream source files, so machines may do the job instead of
people -- and people (auditors) may concentrate on reviewing upstream
sources -- i.e. the atomic ingredients used across different projects or
across different versions of the same project.
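
As a sketch of what "machines doing the job" could mean, it is mostly a
join on checksums between per-stage manifests (manifest formats
invented here, untested):

    import json

    def map_to_upstream(local_manifest, upstream_manifests):
        # local_manifest: {path: sha256} for "the source";
        # upstream_manifests: {package: {path: sha256}} per upstream.
        by_checksum = {}
        for pkg, files in upstream_manifests.items():
            for path, csum in files.items():
                by_checksum.setdefault(csum, []).append((pkg, path))
        mapping = {}
        for path, csum in local_manifest.items():
            # no match means the file was added or changed by patches
            mapping[path] = by_checksum.get(csum, [])
        return mapping

Files with no upstream match are exactly the ones an auditor still has
to look at by hand; everything else inherits the review already done on
its upstream source.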


> 
>>> 
>>> Where I became puzzled is where you say "Information about debug
>>> sources for each actual binary file is then taken from
>>> tmp/pkgdata/<machine>/extended/*.json.zstd". This is the data we added
>>> and use for the spdx class so you shouldn't need to reinvent that
>>> piece. It should be the exact same data the spdx class uses.
>>> 
>> 
>> you're right, but in the context of a POC it was easier to extract them
>> directly from json files than from SPDX data :) It's just a POC to show
>> that required information may be retrieved in some way; implementation
>> details do not matter
>> 
>>> I was also puzzled about the difference between rpm and the other
>>> package backends. The exact same files are packaged by all the package
>>> backends so the checksums from do_package should be fine.
>>> 
>> 
>> Here I may be missing some piece of information. I looked at files in
>> tmp/pkgdata but I couldn't find package file checksums anywhere: that is
>> why I parsed rpm packages. But if such checksums were already available
>> somewhere in tmp/pkgdata, it wouldn't be necessary to parse rpm packages
>> at all... Could you point me to what I'm (maybe) missing here? Thanks!
> 
> File checksumming is expensive.  There are checksums available to
> individual packaging engines, as well as aggregate checksums for "hash
> equivalence"... but I'm not aware of any per-file checksum that is
> stored.
> 
> You definitely shouldn't be parsing packages of any type (rpm or
> otherwise), as packages are truly optional.  It's the binaries that
> matter here.

You are definitely right. I guess that it should be done (optionally) in
do_package.



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Openembedded-architecture] Adding more information to the SBOM
       [not found]       ` <1061592967.5114533.1663597215958.JavaMail.zimbra@piana.eu>
@ 2022-09-20 13:15         ` Richard Purdie
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Purdie @ 2022-09-20 13:15 UTC (permalink / raw)
  To: Carlo Piana
  Cc: Alberto Pianon, Marta Rybczynska, OE-core,
	openembedded-architecture, Joshua Watt, davide ricci

On Mon, 2022-09-19 at 16:20 +0200, Carlo Piana wrote:
> thank you for a well-detailed and sensible answer. I certainly cannot
> speak on technical issues, although I can understand there are
> activities which could seriously impact the overall process and need
> to be minimized.
> 
> 
> > On Fri, 2022-09-16 at 17:18 +0200, Alberto Pianon wrote:
> > > On 2022-09-15 14:16, Richard Purdie wrote:
> > > > 
> > > > For the source issues above, it basically comes down to how much
> > > > "pain" we want to push onto all users for the sake of adding in this
> > > > data. Unfortunately it is data which many won't need or use and
> > > > different legal departments do have different requirements.
> > > 
> > > We didn't paint the overall picture sufficiently well; therefore our
> > > requirements may come across as coming from a particularly pedantic
> > > legal department; my fault :)
> > > 
> > > Oniro is not "yet another commercial Yocto project", nor are we a legal
> > > department (even if we are experienced FLOSS lawyers and auditors, the
> > > most prominent of whom is Carlo Piana -- cc'ed -- former general counsel
> > > of FSFE and member of OSI Board).
> > > 
> > > Our rather ambitious goal is not limited to Oniro, and consists in doing
> > > compliance in the open source way, both setting an example and
> > > providing guidance and material for others to benefit from our effort.
> > > Our work will therefore be shared (and possibly improved by others) not
> > > only with Oniro-based projects but also with any Yocto project. Among
> > > other things, the most relevant bit of work that we want to share is
> > > **fully reviewed license information** and other legal metadata about a
> > > whole bunch of open source components commonly used in Yocto projects.
> > 
> > I certainly love the goal. I presume you're going to share your review
> > criteria somehow? There must be some further set of steps,
> > documentation and results beyond what we're discussing here?
> 
> Our mandate (and our own attitude) is precisely to make everything as
> public as possible.
> 
> We have published already about it
> https://gitlab.eclipse.org/eclipse/oniro-compliancetoolchain/toolchain/docs/-/tree/main/audit_workflow
> 
> The entire review process is made using GitLab's issues and will be
> made public.

I need to read into the details but that looks like a great start and
I'm happy to see the process being documented!

Thanks for the link, I'll try and have a read.

> We have only one reservation concerning sensitive material,
> in case we find something legally problematic (to comply with
> attorney/client privilege) or security-critical (in which case we
> adopt a responsible disclosure principle and embargo some details).

That makes sense, it is a tricky balancing act at times.

> > I think the challenge will be whether you can publish that review with
> > sufficient "proof" that other legal departments can leverage it. I
> > wouldn't underestimate how different the requirements and process can
> > be between different people/teams/companies.
> 
> Speaking from a legal perspective, this is precisely the point. It is
> true that we want to create a curated database of decisions which, like
> any human enterprise, is prone to errors and corrections, and therefore
> we cannot have the last word. However, IF we can at least point to a
> unique artifact and give its exact hash, there will be no need to
> trust us: everything would be open to inspection, because everybody
> else could look at the same source we have identified and make sure we
> have extracted all the information.

I do love the idea and I think it is quite possible. I do think this
does lead to one of the key details we need to think about though.

From a legal perspective I'd imagine you like dealing with a set of
files that make up the source of some piece of software. I'm not going
to use the word "package" since I think the term is overloaded and
confusing. That set of files can all be identified by checksums. This
pushes us towards wanting checksums of every file.

Stepping over to the build world, we have bitbake's fetcher and it
actually requires something similar - any given "input" must be
uniquely identifiable from the SRC_URI and possibly a set of SRCREVs.

Why? We firstly need to embed this information into the task signature.
If it changes, we know we need to rerun the fetch and re-obtain the
data. We work on inputs to generate this hash, not outputs, and we
require all fetcher modules to be able to identify sources like this.

In the case of a git repo, the hash of a git commit is good enough. For
a tarball, it would be a checksum of the tarball. Where there are local
patch files, we include the hashes of those files.
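
Very loosely, and glossing over what bitbake's signature code actually
does, the idea is just:

    # Illustrative only -- not bitbake's real signature generation.
    import hashlib

    def fetch_input_hash(src_uris, srcrevs, local_file_sums):
        h = hashlib.sha256()
        for uri in sorted(src_uris):
            h.update(uri.encode())
        for name, rev in sorted(srcrevs.items()):
            h.update(("%s=%s" % (name, rev)).encode())
        for path, csum in sorted(local_file_sums.items()):
            h.update(("%s=%s" % (path, csum)).encode())
        return h.hexdigest()

One hash over the inputs identifies the fetched source without ever
checksumming the expanded tree.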

The bottom line is that we already have a hash which represents the
task inputs. Bugs happen, sure. There are also poor fetchers; npm and
go present challenges in particular, but we've tried to work around
those issues.

What you're saying is that you don't trust what bitbake does, so you
want the next level of information, about the individual files.

In theory we could put the SRC_URI and SRCREVs into the SPDX as the
source (which could be summarised into a task hash) rather than the
upstream URL. It all depends which level you want to break things down
to.

I do see a case for needing the lower-level info, as in review you are
going to want to know the delta against the last review decisions. You
would also prefer a different "upstream" URL form for some kinds of
checks, like CVEs. It does feel a lot like we're trying to duplicate
information and cause significant growth of the SPDX files without an
actual definitive need.

You could equally put in a mapping between a fetch task checksum and
the checksums of all the files that fetch task would expand to if run
(it should always do it deterministically).
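
That mapping could be one small record per fetch task, e.g. (format
invented purely for illustration):

    {
      "fetch_task_hash": "<task signature>",
      "expands_to": {
        "src/main.c": "<sha256>",
        "src/util.c": "<sha256>"
      }
    }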

> To be clearer, we are not discussing here the obligation to provide
> the entire corresponding source code as with *GPLv3, but rather we
> are seeking to establish the *provenance* of the software, of all
> bits (also in order to see what patch has been applied by who and to
> close which vulnerability, in case).

My worry is that by not considering the obligation, we don't cater for
a portion of the userbase and, by doing so, we limit the possible
adoption.

> Provenance also has a great impact on "reproducibility" of legal work
> on sources. If we are not able to tell what has gone into our package
> from where (and this may prove hard and require a lot of manual - and
therefore error-prone - work especially in the case of complex Yocto
recipes using e.g. crate/cargo or npm(sw) fetchers), we (lawyers and
> compliance specialists) are at a great disadvantage proving we have
> covered all our bases.

I understand this more than you realise as we have the same problem in
the bitbake fetcher and have spent a lot of time trying to solve it. I
won't claim we're there for some of the modern runtimes, and I'd love
help both in explaining to the upstream projects why we need this and
in technically fixing the fetchers so these modern runtimes work
better.

> This is a very good point, and I can vouch that this is really
> important, but maybe you are reading too much into this: at this stage,
> our goal is not to convince anyone to radically change Yocto tasks to
> meet our requirements, but it is to share such requirements and their
> rationale, collect your feedback and possibly adjust them, and also
> to figure out the least impactful solution to meet them (possibly
> without radical changes but just by adding optional functions in
> existing tasks).

"optional functions" fill me with dread, this is the archiver problem I
mentioned.

One of the things I try really hard to do is to have one good way of
doing things rather than multiple options with different levels of
functionality. If you give people choices, they use them. When
someone's build fails, I don't want to have to ask "which fetcher were
you using? Did you configure X or Y or Z?". If we can all use the same
code and codepaths, it means we see bugs, we see regressions and we
have a common experience without the need for complex test matrices.

Worst case you can add optional functions but I kind of see that as a
failure. If we can find something with low overhead which we can all
use, that would be much better. Whether it is possible, I don't know,
but it is why we're having the discussion. This is why I have a
preference for trying to keep common code paths for the core though.

> > > - I understand that my solution is a bit hacky; but IMHO any other
> > >    *post-mortem* solution would be far more hacky; the real solution
> > >    would be collecting required information directly in do_fetch and
> > >    do_unpack
> > 
> > Agreed, this needs to be done at unpack/patch time. Don't underestimate
> > the impact of this on general users though as many won't appreciate
> > slowing down their builds generating this information :/.
> 
> Can't this be made optional, so one could just go for the "old" way
> without much impact? Sorry, I'm stepping in where I'm naive.

See above :).

> 
> > 
> > There is also a pile of information some legal departments want which
> > you've not mentioned here, such as build scripts and configuration
> > information. Some previous discussions with other parts of the wider
> > open source community rejected the Yocto Project's efforts as insufficient
> > since we didn't mandate and capture all of this too (the archiver could
> > optionally do some of it iirc). Is this just the first step and we're
> > going to continue dumping more data? Or is this sufficient and all any
> > legal department should need?
> > 
> 
> I think that trying to give all legal departments what they want
> would prove impossible. I think the idea here is more to start
> building a collectively managed database of provenance and licensing
> data, with a curated set of decisions for as many packages as
> possible. This way everybody can have some good clue -- and
> increasingly a better one -- as to which license(s) apply to which
> package, removing much of the guesswork that is required today.

It makes sense and is a worthy goal. I just wish we could key this off
bitbake's fetch task checksum rather than having to dump reams of file
checksums!

> We ourselves reuse a lot of information coming from Debian's machine-
> readable copyright files, sometimes finding mistakes and opening issues
> upstream. That has helped us cut down the license information
> harvesting and review work by a great deal.

This does explain why the bitbake fetch mechanism would be a struggle
for you though as you don't want to use our fetch units as your base
component (which is why we end up struggling with some of the issues).

In the interests of moving towards a conclusion, I think what we'll end
up needing to do is generate more information from the fetch and patch
tasks, perhaps with a json file summary of what they do (filenames and
checksums?). That would give your tools data to work from, even if I'm
not convinced we should be dumping more and more data into the final
SPDX files.
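
Something along these lines, perhaps (format invented purely for
illustration):

    {
      "task": "do_unpack",
      "recipe": "example-recipe-1.0",
      "inputs": [
        {"src_uri": "https://example.org/example-1.0.tar.gz",
         "sha256": "<tarball checksum>"}
      ],
      "files": [
        {"path": "example-1.0/src/main.c",
         "sha256": "<file checksum>"}
      ]
    }

That would keep the heavy per-file data out of the SPDX documents
themselves while still letting external tools join everything up by
checksum.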

Cheers,

Richard




^ permalink raw reply	[flat|nested] 11+ messages in thread


Thread overview: 11+ messages
2022-09-14 14:16 Adding more information to the SBOM Marta Rybczynska
2022-09-14 14:56 ` Joshua Watt
2022-09-14 17:10   ` [OE-core] " Alberto Pianon
2022-09-14 20:52     ` Joshua Watt
2022-09-15  1:16   ` [Openembedded-architecture] " Mark Hatle
2022-09-15 12:16 ` Richard Purdie
2022-09-16 15:18   ` Alberto Pianon
2022-09-16 15:49     ` Mark Hatle
2022-09-20 12:25       ` Alberto Pianon
2022-09-16 16:08     ` Richard Purdie
     [not found]       ` <1061592967.5114533.1663597215958.JavaMail.zimbra@piana.eu>
2022-09-20 13:15         ` Richard Purdie
