* [PATCH] recipetool/create: Scan for SPDX-License-Identifier
From: Saul Wold @ 2022-02-03 17:07 UTC (permalink / raw)
  To: openembedded-core, ticotimo

When a file cannot be identified by checksum but contains an
SPDX-License-Identifier tag, use that tag as the found license.

[YOCTO #14529]

Tested with LICENSE files that contain 1 or more SPDX-License-Identifier tags

Signed-off-by: Saul Wold <saul.wold@windriver.com>
---
 scripts/lib/recipetool/create.py | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/scripts/lib/recipetool/create.py b/scripts/lib/recipetool/create.py
index 507a230511..9149c2d94f 100644
--- a/scripts/lib/recipetool/create.py
+++ b/scripts/lib/recipetool/create.py
@@ -1221,14 +1221,20 @@ def guess_license(srctree, d):
     for licfile in sorted(licfiles):
         md5value = bb.utils.md5_file(licfile)
         license = md5sums.get(md5value, None)
+        license_list = []
         if not license:
             license, crunched_md5, lictext = crunch_license(licfile)
             if lictext and not license:
-                license = 'Unknown'
-                logger.info("Please add the following line for '%s' to a 'lib/recipetool/licenses.csv' " \
-                    "and replace `Unknown` with the license:\n" \
-                    "%s,Unknown" % (os.path.relpath(licfile, srctree), md5value))
-        if license:
+                spdx_re = re.compile('SPDX-License-Identifier:\s+([-A-Za-z\d. ]+)[ |\n|\r\n]*?')
+                license_list = re.findall(spdx_re, "\n".join(lictext))
+                if not license_list:
+                    license_list.append('Unknown')
+                    logger.info("Please add the following line for '%s' to a 'lib/recipetool/licenses.csv' " \
+                        "and replace `Unknown` with the license:\n" \
+                        "%s,Unknown" % (os.path.relpath(licfile, srctree), md5value))
+        else:
+            license_list.append(license)
+        for license in license_list:
             licenses.append((license, os.path.relpath(licfile, srctree), md5value))
 
     # FIXME should we grab at least one source file with a license header and add that too?
-- 
2.25.1
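
The matching step in the patch can be tried in isolation. The following is a minimal standalone sketch, not the recipetool code: the helper name find_spdx_ids and the sample text are hypothetical, and the pattern is a simplified raw-string variant of the one in the diff, under the assumption that a tag value consists only of letters, digits, '.', '-', '+', parentheses and spaces.

```python
import re

# Simplified variant of the pattern in the patch. A raw string keeps \s as a
# regex escape instead of a Python string escape.
# Assumption: tag values use only letters, digits, '.', '-', '+', '(', ')'
# and spaces, which covers simple expressions such as "MIT OR Apache-2.0".
SPDX_RE = re.compile(r'SPDX-License-Identifier:\s+([-A-Za-z0-9.+() ]+)')


def find_spdx_ids(text):
    """Return every SPDX-License-Identifier value found in text.

    Hypothetical helper for illustration; trailing spaces are stripped.
    """
    return [match.strip() for match in SPDX_RE.findall(text)]


sample = "\n".join([
    "SPDX-License-Identifier: MIT",
    "Some license text here.",
    "SPDX-License-Identifier: GPL-2.0-only ",
])
print(find_spdx_ids(sample))
```

A file with several tags yields one entry per tag, which matches the patch's behaviour of appending one tuple to licenses for each identifier found.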



* Re: [OE-core] [PATCH] recipetool/create: Scan for SPDX-License-Identifier
From: Richard Purdie @ 2022-02-03 21:24 UTC (permalink / raw)
  To: Saul Wold, openembedded-core, ticotimo

On Thu, 2022-02-03 at 09:07 -0800, Saul Wold wrote:
>      # FIXME should we grab at least one source file with a license header and add that too?

I think to close this bug the code may need to go one step further and
effectively grep over the source tree. 

We'd probably want to list the value of any SPDX-License-Identifier: header
found in any of the source files for the user to then decide upon?

Or am I misunderstanding?

Cheers,

Richard







* Re: [OE-core] [PATCH] recipetool/create: Scan for SPDX-License-Identifier
From: Saul Wold @ 2022-02-03 21:58 UTC (permalink / raw)
  To: Richard Purdie, openembedded-core, ticotimo



On 2/3/22 13:24, Richard Purdie wrote:
> 
> I think to close this bug the code may need to go one step further and
> effectively grep over the source tree.
> 
> We'd probably want to list the value of any SPDX-License-Identifier: header
> found in any of the source files for the user to then decide upon?
> 
That's moving into create-spdx.bbclass territory, I think. The 
change would need to be much larger, and I will likely have to shelve 
it for a while.

> Or am I misunderstanding?
>
Maybe it's my misunderstanding; Tim mentioned the LICENSE-related 
files in the bug report.

-- 
Sau!


* Re: [OE-core] [PATCH] recipetool/create: Scan for SPDX-License-Identifier
From: Richard Purdie @ 2022-02-03 22:01 UTC (permalink / raw)
  To: Saul Wold, openembedded-core, ticotimo

On Thu, 2022-02-03 at 13:58 -0800, Saul Wold wrote:
> > 
> > I think to close this bug the code may need to go one step further and
> > effectively grep over the source tree.
> > 
> > We'd probably want to list the value of any SPDX-License-Identifier: header
> > found in any of the source files for the user to then decide upon?
> > 
> That's moving into create-spdx.bbclass territory, I think. The 
> change would need to be much larger, and I will likely have to shelve 
> it for a while.

This isn't related to create-spdx.

> 
> > Or am I misunderstanding?
> > 
> Maybe it's my misunderstanding; Tim mentioned the LICENSE-related 
> files in the bug report.

Right, we want to "guess" what the right LICENSE is for the new recipe. To do
that, wouldn't we scan all the source files for SPDX-License-Identifier: lines
in the headers, combine them, and suggest the result as the LICENSE field?

Cheers,

Richard
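
The scan suggested above could look roughly like the following. This is a sketch under stated assumptions, not code from recipetool: the function names, the 20-line header limit, and joining the identifiers with ' & ' to form a suggested LICENSE value are all hypothetical choices made for illustration.

```python
import os
import re
import tempfile

# Assumption: tag values use only letters, digits, '.', '-', '+', '(', ')'
# and spaces, so comment closers such as "*/" end the match.
SPDX_RE = re.compile(r'SPDX-License-Identifier:\s+([-A-Za-z0-9.+() ]+)')


def scan_tree_for_spdx(srctree, max_header_lines=20):
    """Map each SPDX identifier to the relative paths whose header contains it."""
    found = {}
    for root, dirs, files in os.walk(srctree):
        dirs[:] = [d for d in dirs if d != '.git']  # skip VCS metadata
        for name in files:
            path = os.path.join(root, name)
            try:
                with open(path, errors='ignore') as f:
                    # Read only the first few lines, where headers live.
                    header = ''.join(
                        line for _, line in zip(range(max_header_lines), f))
            except OSError:
                continue  # unreadable file or broken symlink
            for ident in SPDX_RE.findall(header):
                relpath = os.path.relpath(path, srctree)
                found.setdefault(ident.strip(), []).append(relpath)
    return found


def suggest_license(found):
    """Combine the identifiers into one suggestion for the user to review."""
    return ' & '.join(sorted(found)) or 'Unknown'


# Small demonstration on a throwaway source tree.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, 'a.c'), 'w') as f:
        f.write('/* SPDX-License-Identifier: MIT */\nint x;\n')
    with open(os.path.join(tmp, 'b.py'), 'w') as f:
        f.write('# SPDX-License-Identifier: GPL-2.0-only\n')
    print(suggest_license(scan_tree_for_spdx(tmp)))
```

Keeping the per-identifier file list, rather than only the combined string, lets the tool show the user where each identifier came from before they decide on the final LICENSE value.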





* Re: [OE-core] [PATCH] recipetool/create: Scan for SPDX-License-Identifier
From: Stefan Herbrechtsmeier @ 2022-02-04  8:11 UTC (permalink / raw)
  To: Saul.Wold, openembedded-core, ticotimo

Hi Saul,

On 03.02.2022 at 18:07, Saul Wold via lists.openembedded.org wrote:
> When a file can not be identified by checksum and they contain an SPDX
> License-Identifier tag, use it as the found license.
> 
> [YOCTO #14529]
> 
> Tested with LICENSE files that contain 1 or more SPDX-License-Identifier tags

Can you please give an example of a project that uses an 
SPDX-License-Identifier inside a license file?


>       for licfile in sorted(licfiles):
>           md5value = bb.utils.md5_file(licfile)
>           license = md5sums.get(md5value, None)
> +        license_list = []

Could you please use another name? We already have licenses, and it is 
hard to tell licenses and license_list apart.


Regards
   Stefan


* Re: [OE-core] [PATCH] recipetool/create: Scan for SPDX-License-Identifier
From: Stefan Herbrechtsmeier @ 2022-02-04  9:05 UTC (permalink / raw)
  To: openembedded-core

Hi Richard,

On 03.02.2022 at 22:24, Richard Purdie via lists.openembedded.org wrote:
> 
> I think to close this bug the code may need to go one step further and
> effectively grep over the source tree.

Please keep in mind that we need the full license text, not only the 
license name, for license compliance. The current function only searches 
for license files with license text.

> We'd probably want to list the value of any SPDX-License-Identifier: header
> found in any of the source files for the user to then decide upon?

I think this is a different feature, more like a license checker, because 
an SPDX-License-Identifier without a license text is a license 
violation.

This brings us to the problem that this code will interpret a file with 
only an SPDX-License-Identifier as a license file with license text.

Regards
   Stefan


* Re: [OE-core] [PATCH] recipetool/create: Scan for SPDX-License-Identifier
From: Richard Purdie @ 2022-02-04 13:41 UTC (permalink / raw)
  To: Stefan Herbrechtsmeier, openembedded-core

On Fri, 2022-02-04 at 10:05 +0100, Stefan Herbrechtsmeier wrote:
> > 
> > I think to close this bug the code may need to go one step further and
> > effectively grep over the source tree.
> 
> Please keep in mind that we need the full license text, not only the 
> license name, for license compliance. The current function only searches 
> for license files with license text.
> 
> > We'd probably want to list the value of any SPDX-License-Identifier: header
> > found in any of the source files for the user to then decide upon?
> 
> I think this is a different feature, more like a license checker, because 
> an SPDX-License-Identifier without a license text is a license 
> violation.
> 
> This brings us to the problem that this code will interpret a file with 
> only an SPDX-License-Identifier as a license file with license text.

As I understand it, the tool is there to help write a recipe, so filling out
LICENSE and highlighting a missing full license text would be a valid approach
for the tool and helpful to the user?

It certainly isn't intended as full validation, just intended to assist the
creation of a recipe.

Cheers,

Richard








* Re: [OE-core] [PATCH] recipetool/create: Scan for SPDX-License-Identifier
From: Stefan Herbrechtsmeier @ 2022-02-04 14:40 UTC (permalink / raw)
  To: Richard Purdie, openembedded-core

On 04.02.2022 at 14:41, Richard Purdie wrote:
> On Fri, 2022-02-04 at 10:05 +0100, Stefan Herbrechtsmeier wrote:
>>>
>>> I think to close this bug the code may need to go one step further and
>>> effectively grep over the source tree.
>>
>> Please keep in mind that we need the full license text, not only the
>> license name, for license compliance. The current function only searches
>> for license files with license text.
>>
>>> We'd probably want to list the value of any SPDX-License-Identifier: header
>>> found in any of the source files for the user to then decide upon?
>>
>> I think this is a different feature, more like a license checker, because
>> an SPDX-License-Identifier without a license text is a license
>> violation.
>>
>> This brings us to the problem that this code will interpret a file with
>> only an SPDX-License-Identifier as a license file with license text.
> 
> As I understand it the tool is there to help write a recipe so filling out
> LICENSE and highlighting a missing full license text would be a valid approach
> for the tool and helpful to the user?

Yes, but we should distinguish between license files, which are guessed via 
a hash of their content, and an SPDX-License-Identifier, which labels the 
source code's license. In this case the SPDX-License-Identifier is 
non-material text in a license file and should be filtered out inside 
the crunch_license function.

Collecting all used licenses via SPDX-License-Identifier is an 
additional feature, and we need a warning if an SPDX-License-Identifier 
exists without a license file.

> It certainly isn't intended as full validation, just intended to assist the
> creation of a recipe.

But this patch is a regression because it doesn't distinguish between a 
license file with a known hash and a mostly empty file with an 
SPDX-License-Identifier.

Regards
   Stefan
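
The distinction drawn above can be sketched as follows. This is an illustration only and does not reproduce crunch_license: the helper names, the 20-word threshold for "material" text, and the shape of the inputs to missing_license_files are assumptions.

```python
import re

# Matches any whole line that carries an SPDX tag, whatever comment syntax
# surrounds it.
SPDX_LINE_RE = re.compile(r'^.*SPDX-License-Identifier:.*$', re.MULTILINE)


def strip_spdx_lines(text):
    """Drop SPDX tag lines, which label a license but are not license text."""
    return SPDX_LINE_RE.sub('', text)


def has_license_text(text, min_words=20):
    """Heuristic: a real license file still has material text without its tags.

    The min_words threshold is an arbitrary assumption for this sketch.
    """
    return len(strip_spdx_lines(text).split()) >= min_words


def missing_license_files(spdx_ids, license_file_ids):
    """Identifiers seen in SPDX tags that no detected license file covers."""
    return sorted(set(spdx_ids) - set(license_file_ids))


tag_only = "SPDX-License-Identifier: MIT\n"
print(has_license_text(tag_only))
print(missing_license_files(['MIT', 'GPL-2.0-only'], ['MIT']))
```

With checks like these, a file that holds only a tag would not be recorded as a license file, and every identifier returned by missing_license_files could be reported as a warning to the user.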

