linux-spdx.vger.kernel.org archive mirror
* [PATCH] scripts/spdxcheck.py: Lets strictly read license files in utf-8
@ 2021-07-03  1:21 Nishanth Menon
  2021-07-07  9:00 ` Thomas Gleixner
  0 siblings, 1 reply; 3+ messages in thread
From: Nishanth Menon @ 2021-07-03  1:21 UTC
  To: Greg Kroah-Hartman, Thomas Gleixner
  Cc: linux-kernel, linux-spdx, Rahul T R, Nishanth Menon

Commit bc41a7f36469 ("LICENSES: Add the CC-BY-4.0 license")
unfortunately introduced LICENSES/dual/CC-BY-4.0 in UTF-8 Unicode text,
which makes python barf at it with:

FAIL: 'ascii' codec can't decode byte 0xe2 in position 2109: ordinal not in range(128)
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 244, in <module>
    spdx = read_spdxdata(repo)
  File "scripts/spdxcheck.py", line 47, in read_spdxdata
    for l in open(el.path).readlines():
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2109: ordinal not in range(128)

While it is indeed debatable whether 'Licensor.' used in the license file
needs unicode quotes, let us force spdxcheck to read utf-8 instead.

Reported-by: Rahul T R <r-ravikumar@ti.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
---
 scripts/spdxcheck.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/spdxcheck.py b/scripts/spdxcheck.py
index 3e784cf9f401..ebd06ae642c9 100755
--- a/scripts/spdxcheck.py
+++ b/scripts/spdxcheck.py
@@ -44,7 +44,7 @@ def read_spdxdata(repo):
                 continue
 
             exception = None
-            for l in open(el.path).readlines():
+            for l in open(el.path, encoding="utf-8").readlines():
                 if l.startswith('Valid-License-Identifier:'):
                     lid = l.split(':')[1].strip().upper()
                     if lid in spdx.licenses:
-- 
2.32.0
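
For readers of the archive, the failure and the fix above can be reproduced
outside the kernel tree. The following is only a minimal sketch, not part of
the patch: the helper names read_license_lines() and demo() and the sample
license text are made up for illustration. It assumes the traceback comes
from the locale's default encoding being ASCII, which the sketch imitates by
passing encoding="ascii" explicitly.

```python
import os
import tempfile


def read_license_lines(path, encoding="utf-8"):
    """Read a file with an explicit encoding (hypothetical helper)."""
    # Passing encoding= makes the result independent of the locale,
    # which is what the one-line change in the patch relies on.
    with open(path, encoding=encoding) as f:
        return f.readlines()


def demo():
    """Write a UTF-8 file with curly quotes, then read it both ways."""
    # U+201C / U+201D encode to bytes starting with 0xe2 in UTF-8,
    # the same leading byte reported in the traceback.
    text = ("Valid-License-Identifier: CC-BY-4.0\n"
            "\u201cLicensor\u201d is made-up sample text.\n")
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write(text.encode("utf-8"))
        path = tmp.name
    try:
        try:
            # Stand-in for an ASCII default locale (assumption).
            read_license_lines(path, encoding="ascii")
            ascii_failed = False
        except UnicodeDecodeError:
            ascii_failed = True
        lines = read_license_lines(path)  # explicit utf-8 succeeds
    finally:
        os.unlink(path)
    return ascii_failed, lines


if __name__ == "__main__":
    failed, lines = demo()
    print("ascii read failed:", failed)
    print("first line:", lines[0].strip())
```

Using a "with" block also closes the file deterministically; the patch itself
keeps the original open(...).readlines() form and only adds the encoding.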



* Re: [PATCH] scripts/spdxcheck.py: Lets strictly read license files in utf-8
  2021-07-03  1:21 [PATCH] scripts/spdxcheck.py: Lets strictly read license files in utf-8 Nishanth Menon
@ 2021-07-07  9:00 ` Thomas Gleixner
  2021-07-07 16:53   ` Jonathan Corbet
  0 siblings, 1 reply; 3+ messages in thread
From: Thomas Gleixner @ 2021-07-07  9:00 UTC
  To: Nishanth Menon, Greg Kroah-Hartman
  Cc: linux-kernel, linux-spdx, Rahul T R, Nishanth Menon

Nishanth,
On Fri, Jul 02 2021 at 20:21, Nishanth Menon wrote:
> Commit bc41a7f36469 ("LICENSES: Add the CC-BY-4.0 license")
> unfortunately introduced LICENSES/dual/CC-BY-4.0 in UTF-8 Unicode text

Sigh. Why are people adding such things w/o running this script in the
first place?

> While python will barf at it with:
>
> FAIL: 'ascii' codec can't decode byte 0xe2 in position 2109: ordinal not in range(128)
> Traceback (most recent call last):
>   File "scripts/spdxcheck.py", line 244, in <module>
>     spdx = read_spdxdata(repo)
>   File "scripts/spdxcheck.py", line 47, in read_spdxdata
>     for l in open(el.path).readlines():
>   File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
>     return codecs.ascii_decode(input, self.errors)[0]
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2109: ordinal not in range(128)
>
> While it is indeed debatable if 'Licensor.' used in the license file
> needs unicode quotes, instead, let us force spdxcheck to read utf-8
> instead.

s/let us//

Ditto for the $subject. See Documentation/process/ for further enlightenment.

> Reported-by: Rahul T R <r-ravikumar@ti.com>
> Signed-off-by: Nishanth Menon <nm@ti.com>

With that fixed:

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>


* Re: [PATCH] scripts/spdxcheck.py: Lets strictly read license files in utf-8
  2021-07-07  9:00 ` Thomas Gleixner
@ 2021-07-07 16:53   ` Jonathan Corbet
  0 siblings, 0 replies; 3+ messages in thread
From: Jonathan Corbet @ 2021-07-07 16:53 UTC
  To: Thomas Gleixner, Nishanth Menon, Greg Kroah-Hartman
  Cc: linux-kernel, linux-spdx, Rahul T R, Nishanth Menon

Thomas Gleixner <tglx@linutronix.de> writes:

> Nishanth,
> On Fri, Jul 02 2021 at 20:21, Nishanth Menon wrote:
>> Commit bc41a7f36469 ("LICENSES: Add the CC-BY-4.0 license")
>> unfortunately introduced LICENSES/dual/CC-BY-4.0 in UTF-8 Unicode text
>
> Sigh. Why are people adding such things w/o running this script in the
> first place.

I have a guess on that front ... there is nothing in our documentation
that says anybody should run it, and the script itself gives no
indication of what it does, when it should be run, or how to run it.
That might just reduce uptake a little bit...:)

I increasingly believe that anything we add to scripts/ should start
with a "usage" header describing why it exists and how to make it do its
thing.  That would be a welcome addition to spdxcheck.py.  Adding
something to Documentation/process/license-rules.html would be a nice
bonus.

Thanks,

jon

