linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode
@ 2018-12-12 13:12 Thierry Reding
  2018-12-12 13:12 ` [PATCH 2/2] scripts: Add spdxcheck.py self test Thierry Reding
  2018-12-12 18:14 ` [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode Jeremy Cline
  0 siblings, 2 replies; 8+ messages in thread
From: Thierry Reding @ 2018-12-12 13:12 UTC (permalink / raw)
  To: Andrew Morton, Thomas Gleixner
  Cc: Jonathan Corbet, Joe Perches, Jeremy Cline,
	Uwe Kleine-König, linux-kernel

From: Thierry Reding <treding@nvidia.com>

The spdxcheck script currently falls over when confronted with a binary
file (such as Documentation/logo.gif). To avoid that, always open files
in binary mode and decode line-by-line, ignoring encoding errors.

One tricky case is when piping data into the script and reading it from
standard input. By default, standard input will be opened in text mode,
so we need to reopen it in binary mode.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 scripts/spdxcheck.py | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/scripts/spdxcheck.py b/scripts/spdxcheck.py
index 5056fb3b897d..e559c6294c39 100755
--- a/scripts/spdxcheck.py
+++ b/scripts/spdxcheck.py
@@ -168,6 +168,7 @@ class id_parser(object):
         self.curline = 0
         try:
             for line in fd:
+                line = line.decode(locale.getpreferredencoding(False), errors='ignore')
                 self.curline += 1
                 if self.curline > maxlines:
                     break
@@ -249,12 +250,13 @@ if __name__ == '__main__':
 
     try:
         if len(args.path) and args.path[0] == '-':
-            parser.parse_lines(sys.stdin, args.maxlines, '-')
+            stdin = os.fdopen(sys.stdin.fileno(), 'rb')
+            parser.parse_lines(stdin, args.maxlines, '-')
         else:
             if args.path:
                 for p in args.path:
                     if os.path.isfile(p):
-                        parser.parse_lines(open(p), args.maxlines, p)
+                        parser.parse_lines(open(p, 'rb'), args.maxlines, p)
                     elif os.path.isdir(p):
                         scan_git_subtree(repo.head.reference.commit.tree, p)
                     else:
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] scripts: Add spdxcheck.py self test
  2018-12-12 13:12 [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode Thierry Reding
@ 2018-12-12 13:12 ` Thierry Reding
  2018-12-12 18:14 ` [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode Jeremy Cline
  1 sibling, 0 replies; 8+ messages in thread
From: Thierry Reding @ 2018-12-12 13:12 UTC (permalink / raw)
  To: Andrew Morton, Thomas Gleixner
  Cc: Jonathan Corbet, Joe Perches, Jeremy Cline,
	Uwe Kleine-König, linux-kernel

From: Thierry Reding <treding@nvidia.com>

Add a script that will run spdxcheck.py through a couple of self tests
to simplify validation in the future. The tests are run for both Python
2 and Python 3 to make sure all changes to the script remain compatible
across both versions.

The script tests a regular text file (Makefile) for basic sanity checks
and then runs it on a binary file (Documentation/logo.gif) to make sure
it works in both cases. It also tests opening files passed on the
command line as well as piped files read from standard input. Finally a
run on the complete tree will be performed to catch any other potential
issues.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 scripts/spdxcheck-test.sh | 12 ++++++++++++
 1 file changed, 12 insertions(+)
 create mode 100755 scripts/spdxcheck-test.sh

diff --git a/scripts/spdxcheck-test.sh b/scripts/spdxcheck-test.sh
new file mode 100755
index 000000000000..cfea6a0d1cc0
--- /dev/null
+++ b/scripts/spdxcheck-test.sh
@@ -0,0 +1,12 @@
+#!/bin/sh
+
+for PYTHON in python2 python3; do
+	# run check on a text and a binary file
+	for FILE in Makefile Documentation/logo.gif; do
+		$PYTHON scripts/spdxcheck.py $FILE
+		$PYTHON scripts/spdxcheck.py - < $FILE
+	done
+
+	# run check on complete tree to catch any other issues
+	$PYTHON scripts/spdxcheck.py > /dev/null
+done
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode
  2018-12-12 13:12 [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode Thierry Reding
  2018-12-12 13:12 ` [PATCH 2/2] scripts: Add spdxcheck.py self test Thierry Reding
@ 2018-12-12 18:14 ` Jeremy Cline
  2018-12-13  2:51   ` Joe Perches
  2018-12-13  7:37   ` Uwe Kleine-König
  1 sibling, 2 replies; 8+ messages in thread
From: Jeremy Cline @ 2018-12-12 18:14 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Andrew Morton, Thomas Gleixner, Jonathan Corbet, Joe Perches,
	Uwe Kleine-König, linux-kernel

Hi,

On Wed, Dec 12, 2018 at 02:12:09PM +0100, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
> 
> The spdxcheck script currently falls over when confronted with a binary
> file (such as Documentation/logo.gif). To avoid that, always open files
> in binary mode and decode line-by-line, ignoring encoding errors.
> 
> One tricky case is when piping data into the script and reading it from
> standard input. By default, standard input will be opened in text mode,
> so we need to reopen it in binary mode.
> 
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
>  scripts/spdxcheck.py | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/scripts/spdxcheck.py b/scripts/spdxcheck.py
> index 5056fb3b897d..e559c6294c39 100755
> --- a/scripts/spdxcheck.py
> +++ b/scripts/spdxcheck.py
> @@ -168,6 +168,7 @@ class id_parser(object):
>          self.curline = 0
>          try:
>              for line in fd:
> +                line = line.decode(locale.getpreferredencoding(False), errors='ignore')
>                  self.curline += 1
>                  if self.curline > maxlines:
>                      break
> @@ -249,12 +250,13 @@ if __name__ == '__main__':
>  
>      try:
>          if len(args.path) and args.path[0] == '-':
> -            parser.parse_lines(sys.stdin, args.maxlines, '-')
> +            stdin = os.fdopen(sys.stdin.fileno(), 'rb')
> +            parser.parse_lines(stdin, args.maxlines, '-')
>          else:
>              if args.path:
>                  for p in args.path:
>                      if os.path.isfile(p):
> -                        parser.parse_lines(open(p), args.maxlines, p)
> +                        parser.parse_lines(open(p, 'rb'), args.maxlines, p)
>                      elif os.path.isdir(p):
>                          scan_git_subtree(repo.head.reference.commit.tree, p)
>                      else:
> -- 
> 2.19.1
> 

It might be worth noting this fixes commit 6f4d29df66ac
("scripts/spdxcheck.py: make python3 compliant") and also Cc this for
stable since 6f4d29df66ac got backported to v4.19. While that commit
did indeed make the script work with Python 3 for piping data, it broke
Python 2 and made its way to stable.

Reviewed-by: Jeremy Cline <jcline@redhat.com>

Regards,
Jeremy

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode
  2018-12-12 18:14 ` [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode Jeremy Cline
@ 2018-12-13  2:51   ` Joe Perches
  2018-12-13 16:14     ` Thierry Reding
  2018-12-13  7:37   ` Uwe Kleine-König
  1 sibling, 1 reply; 8+ messages in thread
From: Joe Perches @ 2018-12-13  2:51 UTC (permalink / raw)
  To: Jeremy Cline, Thierry Reding
  Cc: Andrew Morton, Thomas Gleixner, Jonathan Corbet,
	Uwe Kleine-König, linux-kernel

On Wed, 2018-12-12 at 13:14 -0500, Jeremy Cline wrote:
> It might be worth noting this fixes commit 6f4d29df66ac
> ("scripts/spdxcheck.py: make python3 compliant")

umm, how does it do that?

> and also Cc this for
> stable since 6f4d29df66ac got backported to v4.19. While that commit
> did indeed make the script work with Python 3 for piping data, it broke
> Python 2 and made its way to stable.

I think attempting to use spdxcheck on binary files
is not useful in the first place.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode
  2018-12-12 18:14 ` [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode Jeremy Cline
  2018-12-13  2:51   ` Joe Perches
@ 2018-12-13  7:37   ` Uwe Kleine-König
  2018-12-13 15:10     ` Jeremy Cline
  1 sibling, 1 reply; 8+ messages in thread
From: Uwe Kleine-König @ 2018-12-13  7:37 UTC (permalink / raw)
  To: Jeremy Cline
  Cc: Thierry Reding, Andrew Morton, Thomas Gleixner, Jonathan Corbet,
	Joe Perches, linux-kernel

On Wed, Dec 12, 2018 at 01:14:10PM -0500, Jeremy Cline wrote:
> Hi,
> 
> On Wed, Dec 12, 2018 at 02:12:09PM +0100, Thierry Reding wrote:
> > From: Thierry Reding <treding@nvidia.com>
> > 
> > The spdxcheck script currently falls over when confronted with a binary
> > file (such as Documentation/logo.gif). To avoid that, always open files
> > in binary mode and decode line-by-line, ignoring encoding errors.

I suggest pointing out that the breakage only happens with python3 and
results in a UnicodeDecodeError.

> > One tricky case is when piping data into the script and reading it from
> > standard input. By default, standard input will be opened in text mode,
> > so we need to reopen it in binary mode.
> > 
> > Signed-off-by: Thierry Reding <treding@nvidia.com>
> > ---
> >  scripts/spdxcheck.py | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/scripts/spdxcheck.py b/scripts/spdxcheck.py
> > index 5056fb3b897d..e559c6294c39 100755
> > --- a/scripts/spdxcheck.py
> > +++ b/scripts/spdxcheck.py
> > @@ -168,6 +168,7 @@ class id_parser(object):
> >          self.curline = 0
> >          try:
> >              for line in fd:
> > +                line = line.decode(locale.getpreferredencoding(False), errors='ignore')
> >                  self.curline += 1
> >                  if self.curline > maxlines:
> >                      break
> > @@ -249,12 +250,13 @@ if __name__ == '__main__':
> >  
> >      try:
> >          if len(args.path) and args.path[0] == '-':
> > -            parser.parse_lines(sys.stdin, args.maxlines, '-')
> > +            stdin = os.fdopen(sys.stdin.fileno(), 'rb')
> > +            parser.parse_lines(stdin, args.maxlines, '-')
> >          else:
> >              if args.path:
> >                  for p in args.path:
> >                      if os.path.isfile(p):
> > -                        parser.parse_lines(open(p), args.maxlines, p)
> > +                        parser.parse_lines(open(p, 'rb'), args.maxlines, p)
> >                      elif os.path.isdir(p):
> >                          scan_git_subtree(repo.head.reference.commit.tree, p)
> >                      else:
> > -- 
> > 2.19.1
> > 
> 
> It might be worth noting this fixes commit 6f4d29df66ac
> ("scripts/spdxcheck.py: make python3 compliant") and also Cc this for
> stable since 6f4d29df66ac got backported to v4.19. While that commit
> did indeed make the script work with Python 3 for piping data, it broke
> Python 2 and made its way to stable.

It didn't break for me. Can you provide details about how and when it
broke for you?

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode
  2018-12-13  7:37   ` Uwe Kleine-König
@ 2018-12-13 15:10     ` Jeremy Cline
  2018-12-13 19:40       ` Uwe Kleine-König
  0 siblings, 1 reply; 8+ messages in thread
From: Jeremy Cline @ 2018-12-13 15:10 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Thierry Reding, Andrew Morton, Thomas Gleixner, Jonathan Corbet,
	Joe Perches, linux-kernel

On Thu, Dec 13, 2018 at 08:37:08AM +0100, Uwe Kleine-König wrote:
> On Wed, Dec 12, 2018 at 01:14:10PM -0500, Jeremy Cline wrote:
> > Hi,
> > 
> > On Wed, Dec 12, 2018 at 02:12:09PM +0100, Thierry Reding wrote:
> > > From: Thierry Reding <treding@nvidia.com>
> > > 
> > > The spdxcheck script currently falls over when confronted with a binary
> > > file (such as Documentation/logo.gif). To avoid that, always open files
> > > in binary mode and decode line-by-line, ignoring encoding errors.
> 
> I suggest pointing out that the breakage only happens with python3 and
> results in a UnicodeDecodeError.
> 
> > > One tricky case is when piping data into the script and reading it from
> > > standard input. By default, standard input will be opened in text mode,
> > > so we need to reopen it in binary mode.
> > > 
> > > Signed-off-by: Thierry Reding <treding@nvidia.com>
> > > ---
> > >  scripts/spdxcheck.py | 6 ++++--
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/scripts/spdxcheck.py b/scripts/spdxcheck.py
> > > index 5056fb3b897d..e559c6294c39 100755
> > > --- a/scripts/spdxcheck.py
> > > +++ b/scripts/spdxcheck.py
> > > @@ -168,6 +168,7 @@ class id_parser(object):
> > >          self.curline = 0
> > >          try:
> > >              for line in fd:
> > > +                line = line.decode(locale.getpreferredencoding(False), errors='ignore')
> > >                  self.curline += 1
> > >                  if self.curline > maxlines:
> > >                      break
> > > @@ -249,12 +250,13 @@ if __name__ == '__main__':
> > >  
> > >      try:
> > >          if len(args.path) and args.path[0] == '-':
> > > -            parser.parse_lines(sys.stdin, args.maxlines, '-')
> > > +            stdin = os.fdopen(sys.stdin.fileno(), 'rb')
> > > +            parser.parse_lines(stdin, args.maxlines, '-')
> > >          else:
> > >              if args.path:
> > >                  for p in args.path:
> > >                      if os.path.isfile(p):
> > > -                        parser.parse_lines(open(p), args.maxlines, p)
> > > +                        parser.parse_lines(open(p, 'rb'), args.maxlines, p)
> > >                      elif os.path.isdir(p):
> > >                          scan_git_subtree(repo.head.reference.commit.tree, p)
> > >                      else:
> > > -- 
> > > 2.19.1
> > > 
> > 
> > It might be worth noting this fixes commit 6f4d29df66ac
> > ("scripts/spdxcheck.py: make python3 compliant") and also Cc this for
> > stable since 6f4d29df66ac got backported to v4.19. While that commit
> > did indeed make the script work with Python 3 for piping data, it broke
> > Python 2 and made its way to stable.
> 
> It didn't break for me. Can you provide details about how and when it
> broke for you?

I was wrong about it being Python 2 that broke, sorry about that.
6f4d29df66ac broke Python 3 when you run it against a sub-tree because
scan_git_tree() opens the files in binary mode, but then find is run
with a text string:

$ python3 scripts/spdxcheck.py net/
FAIL: argument should be integer or bytes-like object, not 'str'
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 259, in <module>
    scan_git_subtree(repo.head.reference.commit.tree, p)
  File "scripts/spdxcheck.py", line 211, in scan_git_subtree
    scan_git_tree(tree)
  File "scripts/spdxcheck.py", line 206, in scan_git_tree
    parser.parse_lines(fd, args.maxlines, el.path)
  File "scripts/spdxcheck.py", line 175, in parse_lines
    if line.find("SPDX-License-Identifier:") < 0:
TypeError: argument should be integer or bytes-like object, not 'str'

The reason I opened things in binary mode when I started adding Python 3
support was because not all files were valid UTF-8 (and some were
binary) so I decoded the text line-by-line and ignored any decoding
errors for simplicity's sake.

Regards,
Jeremy

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode
  2018-12-13  2:51   ` Joe Perches
@ 2018-12-13 16:14     ` Thierry Reding
  0 siblings, 0 replies; 8+ messages in thread
From: Thierry Reding @ 2018-12-13 16:14 UTC (permalink / raw)
  To: Joe Perches
  Cc: Jeremy Cline, Andrew Morton, Thomas Gleixner, Jonathan Corbet,
	Uwe Kleine-König, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 888 bytes --]

On Wed, Dec 12, 2018 at 06:51:19PM -0800, Joe Perches wrote:
> On Wed, 2018-12-12 at 13:14 -0500, Jeremy Cline wrote:
> > It might be worth noting this fixes commit 6f4d29df66ac
> > ("scripts/spdxcheck.py: make python3 compliant")
> 
> umm, how does it do that?
> 
> > and also Cc this for
> > stable since 6f4d29df66ac got backported to v4.19. While that commit
> > did indeed make the script work with Python 3 for piping data, it broke
> > Python 2 and made its way to stable.
> 
> I think attempting to use spdxcheck on binary files
> is not useful in the first place.

Agreed. However, if run without arguments the tool will walk the entire
tree and inevitably run into binary files. I realize that this is kind
of a lame way of dealing with binary files, but the alternatives would
be much more complicated and involve dealing with mimetypes and such.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode
  2018-12-13 15:10     ` Jeremy Cline
@ 2018-12-13 19:40       ` Uwe Kleine-König
  0 siblings, 0 replies; 8+ messages in thread
From: Uwe Kleine-König @ 2018-12-13 19:40 UTC (permalink / raw)
  To: Jeremy Cline
  Cc: Thierry Reding, Andrew Morton, Thomas Gleixner, Jonathan Corbet,
	Joe Perches, linux-kernel

On Thu, Dec 13, 2018 at 10:10:52AM -0500, Jeremy Cline wrote:
> On Thu, Dec 13, 2018 at 08:37:08AM +0100, Uwe Kleine-König wrote:
> > It didn't break for me. Can you provide details about how and when it
> > broke for you?
> 
> I was wrong about it being Python 2 that broke, sorry about that.
> 6f4d29df66ac broke Python 3 when you run it against a sub-tree because
> scan_git_tree() opens the files in binary mode, but then find is run
> with a text string:
> 
> $ python3 scripts/spdxcheck.py net/
> FAIL: argument should be integer or bytes-like object, not 'str'
> Traceback (most recent call last):
>   File "scripts/spdxcheck.py", line 259, in <module>
>     scan_git_subtree(repo.head.reference.commit.tree, p)
>   File "scripts/spdxcheck.py", line 211, in scan_git_subtree
>     scan_git_tree(tree)
>   File "scripts/spdxcheck.py", line 206, in scan_git_tree
>     parser.parse_lines(fd, args.maxlines, el.path)
>   File "scripts/spdxcheck.py", line 175, in parse_lines
>     if line.find("SPDX-License-Identifier:") < 0:
> TypeError: argument should be integer or bytes-like object, not 'str'
> 
> The reason I opened things in binary mode when I started adding Python 3
> support was because not all files were valid UTF-8 (and some were
> binary) so I decoded the text line-by-line and ignored any decoding
> errors for simplicity's sake.

OK I understand. The problem is that there are inconsistencies
in handling files as binaries or not that already existed before
6f4d29df66ac. Different code paths result in a different type for line
depending on how fd was opened. I fixed the cases where fd was opened
as text file and broke the cases where it was opened as binary.

So changing this to consistently using binary mode (as the patch by
Thierry does) seems the right thing to do. 

Thanks
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-12-13 19:40 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-12 13:12 [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode Thierry Reding
2018-12-12 13:12 ` [PATCH 2/2] scripts: Add spdxcheck.py self test Thierry Reding
2018-12-12 18:14 ` [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode Jeremy Cline
2018-12-13  2:51   ` Joe Perches
2018-12-13 16:14     ` Thierry Reding
2018-12-13  7:37   ` Uwe Kleine-König
2018-12-13 15:10     ` Jeremy Cline
2018-12-13 19:40       ` Uwe Kleine-König

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).