linux-kernel.vger.kernel.org archive mirror
* [patch 0/9] scripts/spdxcheck: Better statistics and exclude handling
@ 2022-05-16 10:27 Thomas Gleixner
  2022-05-16 10:27 ` [patch 1/9] scripts/spdxcheck: Add percentage to statistics Thomas Gleixner
                   ` (9 more replies)
  0 siblings, 10 replies; 20+ messages in thread
From: Thomas Gleixner @ 2022-05-16 10:27 UTC (permalink / raw)
  To: LKML; +Cc: linux-spdx, Greg Kroah-Hartman, Christoph Hellwig

The spdxcheck -v output provides only basic statistics and lacks
per-directory statistics.

Finding files without SPDX identifiers is cumbersome with spdxcheck, though
it has all the information required.

The exclusion of files and directories is hardcoded in the script, which
makes it hard to maintain, and the information cannot be accessed by
external tools.

The following series addresses this by adding:

 1) Directory statistics

    Incomplete directories: SPDX in Files
    ./                               :     6 of    13   46%
    ./Documentation                  :  4096 of  8451   48%
    ./arch                           : 13476 of 16402   82%
    ./block                          :   100 of   101   99%
    ./certs                          :    11 of    14   78%
    ./crypto                         :   145 of   176   82%
    ./drivers                        : 24682 of 30745   80%

 2) The ability to show files without SPDX

    Files without SPDX:
    ./kernel/cpu.c
    ./kernel/kmod.c
    ./kernel/relay.c

 3) File-based handling of exclude patterns

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [patch 1/9] scripts/spdxcheck: Add percentage to statistics
  2022-05-16 10:27 [patch 0/9] scripts/spdxcheck: Better statistics and exclude handling Thomas Gleixner
@ 2022-05-16 10:27 ` Thomas Gleixner
  2022-05-16 10:27 ` [patch 2/9] scripts/spdxcheck: Add directory statistics Thomas Gleixner
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Thomas Gleixner @ 2022-05-16 10:27 UTC (permalink / raw)
  To: LKML; +Cc: linux-spdx, Greg Kroah-Hartman, Christoph Hellwig

Show the percentage of files with a valid SPDX identifier in the verbose
statistics output:

Files checked:            75856
Lines checked:           294516
Files with SPDX:          59410  78%
Files with errors:            0

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 scripts/spdxcheck.py |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/scripts/spdxcheck.py
+++ b/scripts/spdxcheck.py
@@ -285,7 +285,9 @@ import os
                 sys.stderr.write('\n')
                 sys.stderr.write('Files checked:     %12d\n' %parser.checked)
                 sys.stderr.write('Lines checked:     %12d\n' %parser.lines_checked)
-                sys.stderr.write('Files with SPDX:   %12d\n' %parser.spdx_valid)
+                if parser.checked:
+                    pc = int(100 * parser.spdx_valid / parser.checked)
+                    sys.stderr.write('Files with SPDX:   %12d %3d%%\n' %(parser.spdx_valid, pc))
                 sys.stderr.write('Files with errors: %12d\n' %parser.spdx_errors)
 
             sys.exit(0)
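
As a rough illustration of the calculation in the hunk above, the following
standalone sketch computes the same truncated integer percentage and guards
against an empty scan. The function name percent() and the way the numbers
are passed in are assumptions made for this example; they are not part of
spdxcheck.py.

```python
def percent(part, total):
    """Return part/total as a truncated integer percentage.

    Returns None when total is 0, mirroring the 'if parser.checked:' guard
    in the patch (hypothetical helper, not part of spdxcheck.py).
    """
    if not total:
        return None
    return int(100 * part / total)


# Numbers taken from the sample output in the commit message.
valid, checked = 59410, 75856
pc = percent(valid, checked)
if pc is not None:
    print('Files with SPDX:   %12d %3d%%' % (valid, pc))
```

Because int() truncates rather than rounds, a directory with 100 of 101
files tagged is reported as 99%, so 100% only appears when nothing is
missing.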



* [patch 2/9] scripts/spdxcheck: Add directory statistics
  2022-05-16 10:27 [patch 0/9] scripts/spdxcheck: Better statistics and exclude handling Thomas Gleixner
  2022-05-16 10:27 ` [patch 1/9] scripts/spdxcheck: Add percentage to statistics Thomas Gleixner
@ 2022-05-16 10:27 ` Thomas Gleixner
  2022-05-16 10:27 ` [patch 3/9] scripts/spdxcheck: Add [sub]directory statistics Thomas Gleixner
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Thomas Gleixner @ 2022-05-16 10:27 UTC (permalink / raw)
  To: LKML; +Cc: linux-spdx, Greg Kroah-Hartman, Christoph Hellwig

Add directory statistics to the verbose output for better insight:

Directories accounted:     4646
Directories complete:      2565  55%

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 scripts/spdxcheck.py |   27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

--- a/scripts/spdxcheck.py
+++ b/scripts/spdxcheck.py
@@ -28,6 +28,15 @@ import os
         self.licenses = [ ]
         self.exceptions = { }
 
+class dirinfo(object):
+    def __init__(self):
+        self.missing = 0
+        self.total = 0
+
+    def update(self, miss):
+        self.total += 1
+        self.missing += miss
+
 # Read the spdx data from the LICENSES directory
 def read_spdxdata(repo):
 
@@ -93,6 +102,7 @@ import os
         self.checked = 0
         self.spdx_valid = 0
         self.spdx_errors = 0
+        self.spdx_dirs = {}
         self.curline = 0
         self.deepest = 0
 
@@ -167,6 +177,7 @@ import os
     def parse_lines(self, fd, maxlines, fname):
         self.checked += 1
         self.curline = 0
+        fail = 1
         try:
             for line in fd:
                 line = line.decode(locale.getpreferredencoding(False), errors='ignore')
@@ -192,6 +203,7 @@ import os
                 # Should we check for more SPDX ids in the same file and
                 # complain if there are any?
                 #
+                fail = 0
                 break
 
         except ParserException as pe:
@@ -203,6 +215,11 @@ import os
                 sys.stdout.write('%s: %d:0 %s\n' %(fname, self.curline, pe.txt))
             self.spdx_errors += 1
 
+        base = os.path.dirname(fname)
+        di = self.spdx_dirs.get(base, dirinfo())
+        di.update(fail)
+        self.spdx_dirs[base] = di
+
 def scan_git_tree(tree):
     for el in tree.traverse():
         # Exclude stuff which would make pointless noise
@@ -289,6 +306,16 @@ import os
                     pc = int(100 * parser.spdx_valid / parser.checked)
                     sys.stderr.write('Files with SPDX:   %12d %3d%%\n' %(parser.spdx_valid, pc))
                 sys.stderr.write('Files with errors: %12d\n' %parser.spdx_errors)
+                ndirs = len(parser.spdx_dirs)
+                dirsok = 0
+                if ndirs:
+                    sys.stderr.write('\n')
+                    sys.stderr.write('Directories accounted: %8d\n' %ndirs)
+                    for di in parser.spdx_dirs.values():
+                        if not di.missing:
+                            dirsok += 1
+                    pc = int(100 * dirsok / ndirs)
+                    sys.stderr.write('Directories complete:  %8d %3d%%\n' %(dirsok, pc))
 
             sys.exit(0)
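
The per-directory accounting introduced above can be sketched in isolation
as follows. This is a minimal sketch, assuming the scan results are already
available as (path, has_spdx) pairs; the names DirInfo and account() and the
sample data are made up for the example, and dict.setdefault() is used in
place of the get()/store pair from the patch.

```python
import os


class DirInfo:
    """Per-directory counters: files seen and files lacking an SPDX id."""

    def __init__(self):
        self.total = 0
        self.missing = 0

    def update(self, miss):
        self.total += 1
        self.missing += miss


def account(results):
    """Build {dirname: DirInfo} from (path, has_spdx) pairs (hypothetical input)."""
    dirs = {}
    for path, has_spdx in results:
        di = dirs.setdefault(os.path.dirname(path), DirInfo())
        di.update(0 if has_spdx else 1)
    return dirs


# Made-up sample data, only to exercise the counters.
sample = [
    ('kernel/cpu.c', False),
    ('kernel/fork.c', True),
    ('kernel/sched/core.c', True),
]
dirs = account(sample)
complete = sum(1 for di in dirs.values() if not di.missing)
print('Directories accounted: %8d' % len(dirs))
print('Directories complete:  %8d %3d%%'
      % (complete, int(100 * complete / len(dirs))))
```

A directory counts as complete only when its missing counter is zero, so a
single untagged file is enough to mark the whole directory incomplete.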
 



* [patch 3/9] scripts/spdxcheck: Add [sub]directory statistics
  2022-05-16 10:27 [patch 0/9] scripts/spdxcheck: Better statistics and exclude handling Thomas Gleixner
  2022-05-16 10:27 ` [patch 1/9] scripts/spdxcheck: Add percentage to statistics Thomas Gleixner
  2022-05-16 10:27 ` [patch 2/9] scripts/spdxcheck: Add directory statistics Thomas Gleixner
@ 2022-05-16 10:27 ` Thomas Gleixner
  2022-05-16 10:27 ` [patch 4/9] scripts/spdxcheck: Add option to display files without SPDX Thomas Gleixner
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Thomas Gleixner @ 2022-05-16 10:27 UTC (permalink / raw)
  To: LKML; +Cc: linux-spdx, Greg Kroah-Hartman, Christoph Hellwig

Add functionality to display [sub]directory statistics. This is enabled by
adding '-d' to the command line. The optional '-D' parameter limits the
directory depth. If supplied, the statistics of deeper subdirectories are
accumulated into the directory at that depth.

# scripts/spdxcheck.py -d kernel/
Incomplete directories: SPDX in Files
    ./kernel                         :   111 of   114   97%
    ./kernel/bpf                     :    43 of    45   95%
    ./kernel/bpf/preload             :     4 of     5   80%
    ./kernel/bpf/preload/iterators   :     4 of     5   80%
    ./kernel/cgroup                  :    10 of    13   76%
    ./kernel/configs                 :     0 of     9    0%
    ./kernel/debug                   :     3 of     4   75%
    ./kernel/debug/kdb               :     1 of    11    9%
    ./kernel/locking                 :    29 of    32   90%
    ./kernel/sched                   :    38 of    39   97%

The result can be accumulated by restricting the depth via the new command
line option '-D $DEPTH':

# scripts/spdxcheck.py -d -D1
Incomplete directories: SPDX in Files
    ./                               :     6 of    13   46%
    ./Documentation                  :  4096 of  8451   48%
    ./arch                           : 13476 of 16402   82%
    ./block                          :   100 of   101   99%
    ./certs                          :    11 of    14   78%
    ./crypto                         :   145 of   176   82%
    ./drivers                        : 24682 of 30745   80%
    ./fs                             :  1876 of  2110   88%
    ./include                        :  5175 of  5757   89%
    ./ipc                            :    12 of    13   92%
    ./kernel                         :   493 of   527   93%
    ./lib                            :   393 of   524   75%
    ./mm                             :   151 of   159   94%
    ./net                            :  1713 of  1900   90%
    ./samples                        :   211 of   273   77%
    ./scripts                        :   341 of   435   78%
    ./security                       :   241 of   250   96%
    ./sound                          :  2438 of  2503   97%
    ./tools                          :  3810 of  5462   69%
    ./usr                            :     9 of    10   90%

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 scripts/spdxcheck.py |   67 +++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 57 insertions(+), 10 deletions(-)

--- a/scripts/spdxcheck.py
+++ b/scripts/spdxcheck.py
@@ -103,9 +103,21 @@ import os
         self.spdx_valid = 0
         self.spdx_errors = 0
         self.spdx_dirs = {}
+        self.dirdepth = -1
+        self.basedir = '.'
         self.curline = 0
         self.deepest = 0
 
+    def set_dirinfo(self, basedir, dirdepth):
+        if dirdepth >= 0:
+            self.basedir = basedir
+            bdir = basedir.lstrip('./').rstrip('/')
+            if bdir != '':
+                parts = bdir.split('/')
+            else:
+                parts = []
+            self.dirdepth = dirdepth + len(parts)
+
     # Validate License and Exception IDs
     def validate(self, tok):
         id = tok.value.upper()
@@ -215,12 +227,29 @@ import os
                 sys.stdout.write('%s: %d:0 %s\n' %(fname, self.curline, pe.txt))
             self.spdx_errors += 1
 
+        if fname == '-':
+            return
+
         base = os.path.dirname(fname)
+        if self.dirdepth > 0:
+            parts = base.split('/')
+            i = 0
+            base = '.'
+            while i < self.dirdepth and i < len(parts) and len(parts[i]):
+                base += '/' + parts[i]
+                i += 1
+        elif self.dirdepth == 0:
+            base = self.basedir
+        else:
+            base = './' + base.rstrip('/')
+        base += '/'
+
         di = self.spdx_dirs.get(base, dirinfo())
         di.update(fail)
         self.spdx_dirs[base] = di
 
-def scan_git_tree(tree):
+def scan_git_tree(tree, basedir, dirdepth):
+    parser.set_dirinfo(basedir, dirdepth)
     for el in tree.traverse():
         # Exclude stuff which would make pointless noise
         # FIXME: Put this somewhere more sensible
@@ -233,15 +262,19 @@ import os
         with open(el.path, 'rb') as fd:
             parser.parse_lines(fd, args.maxlines, el.path)
 
-def scan_git_subtree(tree, path):
+def scan_git_subtree(tree, path, dirdepth):
     for p in path.strip('/').split('/'):
         tree = tree[p]
-    scan_git_tree(tree)
+    scan_git_tree(tree, path.strip('/'), dirdepth)
 
 if __name__ == '__main__':
 
     ap = ArgumentParser(description='SPDX expression checker')
     ap.add_argument('path', nargs='*', help='Check path or file. If not given full git tree scan. For stdin use "-"')
+    ap.add_argument('-d', '--dirs', action='store_true',
+                    help='Show [sub]directory statistics.')
+    ap.add_argument('-D', '--depth', type=int, default=-1,
+                    help='Directory depth for -d statistics. Default: unlimited')
     ap.add_argument('-m', '--maxlines', type=int, default=15,
                     help='Maximum number of lines to scan in a file. Default 15')
     ap.add_argument('-v', '--verbose', action='store_true', help='Verbose statistics output')
@@ -285,13 +318,21 @@ import os
                     if os.path.isfile(p):
                         parser.parse_lines(open(p, 'rb'), args.maxlines, p)
                     elif os.path.isdir(p):
-                        scan_git_subtree(repo.head.reference.commit.tree, p)
+                        scan_git_subtree(repo.head.reference.commit.tree, p,
+                                         args.depth)
                     else:
                         sys.stderr.write('path %s does not exist\n' %p)
                         sys.exit(1)
             else:
                 # Full git tree scan
-                scan_git_tree(repo.head.commit.tree)
+                scan_git_tree(repo.head.commit.tree, '.', args.depth)
+
+            ndirs = len(parser.spdx_dirs)
+            dirsok = 0
+            if ndirs:
+                for di in parser.spdx_dirs.values():
+                    if not di.missing:
+                        dirsok += 1
 
             if args.verbose:
                 sys.stderr.write('\n')
@@ -306,17 +347,23 @@ import os
                     pc = int(100 * parser.spdx_valid / parser.checked)
                     sys.stderr.write('Files with SPDX:   %12d %3d%%\n' %(parser.spdx_valid, pc))
                 sys.stderr.write('Files with errors: %12d\n' %parser.spdx_errors)
-                ndirs = len(parser.spdx_dirs)
-                dirsok = 0
                 if ndirs:
                     sys.stderr.write('\n')
                     sys.stderr.write('Directories accounted: %8d\n' %ndirs)
-                    for di in parser.spdx_dirs.values():
-                        if not di.missing:
-                            dirsok += 1
                     pc = int(100 * dirsok / ndirs)
                     sys.stderr.write('Directories complete:  %8d %3d%%\n' %(dirsok, pc))
 
+            if ndirs and ndirs != dirsok and args.dirs:
+                if args.verbose:
+                    sys.stderr.write('\n')
+                sys.stderr.write('Incomplete directories: SPDX in Files\n')
+                for f in sorted(parser.spdx_dirs.keys()):
+                    di = parser.spdx_dirs[f]
+                    if di.missing:
+                        valid = di.total - di.missing
+                        pc = int(100 * valid / di.total)
+                        sys.stderr.write('    %-80s: %5d of %5d  %3d%%\n' %(f, valid, di.total, pc))
+
             sys.exit(0)
 
     except Exception as ex:
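
How a file's directory is mapped to a statistics bucket for a given depth
can be sketched as below. This is a simplified reading of the three cases in
parse_lines() and of set_dirinfo(), not the script itself: the function
names bucket() and absolute_depth() are hypothetical, and absolute_depth()
splits the base directory into components instead of using lstrip('./').

```python
def absolute_depth(basedir, depth):
    """Translate a user supplied depth into a depth counted from the tree root.

    depth < 0 means unlimited. Hypothetical helper for illustration.
    """
    if depth < 0:
        return -1
    parts = [p for p in basedir.split('/') if p not in ('', '.')]
    return depth + len(parts)


def bucket(dirname, depth, basedir='.'):
    """Return the statistics key for a file located in dirname.

    depth is the absolute depth as computed by absolute_depth().
    """
    if depth > 0:
        # Keep at most 'depth' leading path components.
        key = '.'
        for part in dirname.split('/')[:depth]:
            if not part:
                break
            key += '/' + part
    elif depth == 0:
        # Everything is accumulated into the base directory.
        key = basedir
    else:
        # Unlimited depth: every directory gets its own bucket.
        key = './' + dirname.rstrip('/')
    return key + '/'


for d in ('kernel', 'kernel/bpf', 'kernel/bpf/preload'):
    print('%-20s -> %s' % (d, bucket(d, absolute_depth('kernel/', 1))))
```

With a base directory of kernel/ and a user depth of 1 the absolute depth
is 2, so files below kernel/bpf/preload are counted in the kernel/bpf
bucket.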



* [patch 4/9] scripts/spdxcheck: Add option to display files without SPDX
  2022-05-16 10:27 [patch 0/9] scripts/spdxcheck: Better statistics and exclude handling Thomas Gleixner
                   ` (2 preceding siblings ...)
  2022-05-16 10:27 ` [patch 3/9] scripts/spdxcheck: Add [sub]directory statistics Thomas Gleixner
@ 2022-05-16 10:27 ` Thomas Gleixner
  2022-05-16 10:27 ` [patch 5/9] scripts/spdxcheck: Put excluded files and directories into a separate file Thomas Gleixner
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Thomas Gleixner @ 2022-05-16 10:27 UTC (permalink / raw)
  To: LKML; +Cc: linux-spdx, Greg Kroah-Hartman, Christoph Hellwig

This makes life easier when chasing the missing ones. It is activated with
'-f' on the command line.

# scripts/spdxcheck.py -f kernel/
Files without SPDX:
    ./kernel/cpu.c
    ./kernel/kmod.c
    ./kernel/relay.c
    ./kernel/bpf/offload.c
    ./kernel/bpf/preload/.gitignore
    ./kernel/bpf/preload/iterators/README
    ./kernel/bpf/ringbuf.c
    ./kernel/cgroup/cgroup.c
    ./kernel/cgroup/cpuset.c
    ./kernel/cgroup/legacy_freezer.c
    ./kernel/debug/debug_core.h
    ./kernel/debug/kdb/Makefile
    ./kernel/debug/kdb/kdb_bp.c
    ./kernel/debug/kdb/kdb_bt.c
    ./kernel/debug/kdb/kdb_cmds
    ./kernel/debug/kdb/kdb_debugger.c
    ./kernel/debug/kdb/kdb_io.c
    ./kernel/debug/kdb/kdb_keyboard.c
    ./kernel/debug/kdb/kdb_main.c
    ./kernel/debug/kdb/kdb_private.h
    ./kernel/debug/kdb/kdb_support.c
    ./kernel/locking/lockdep_states.h
    ./kernel/locking/mutex-debug.c
    ./kernel/locking/spinlock_debug.c
    ./kernel/sched/pelt.h

With the optional -D parameter the directory depth can be limited:

# scripts/spdxcheck.py -f -D 0 kernel/
Files without SPDX:
    ./kernel/cpu.c
    ./kernel/kmod.c
    ./kernel/relay.c

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 scripts/spdxcheck.py |   21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

--- a/scripts/spdxcheck.py
+++ b/scripts/spdxcheck.py
@@ -32,10 +32,16 @@ import os
     def __init__(self):
         self.missing = 0
         self.total = 0
+        self.files = []
 
-    def update(self, miss):
+    def update(self, fname, basedir, miss):
         self.total += 1
         self.missing += miss
+        if miss:
+            fname = './' + fname
+            bdir = os.path.dirname(fname)
+            if bdir == basedir.rstrip('/'):
+                self.files.append(fname)
 
 # Read the spdx data from the LICENSES directory
 def read_spdxdata(repo):
@@ -245,7 +251,7 @@ import os
         base += '/'
 
         di = self.spdx_dirs.get(base, dirinfo())
-        di.update(fail)
+        di.update(fname, base, fail)
         self.spdx_dirs[base] = di
 
 def scan_git_tree(tree, basedir, dirdepth):
@@ -275,6 +281,8 @@ import os
                     help='Show [sub]directory statistics.')
     ap.add_argument('-D', '--depth', type=int, default=-1,
                     help='Directory depth for -d statistics. Default: unlimited')
+    ap.add_argument('-f', '--files', action='store_true',
+                    help='Show files without SPDX.')
     ap.add_argument('-m', '--maxlines', type=int, default=15,
                     help='Maximum number of lines to scan in a file. Default 15')
     ap.add_argument('-v', '--verbose', action='store_true', help='Verbose statistics output')
@@ -364,6 +372,15 @@ import os
                         pc = int(100 * valid / di.total)
                         sys.stderr.write('    %-80s: %5d of %5d  %3d%%\n' %(f, valid, di.total, pc))
 
+            if ndirs and ndirs != dirsok and args.files:
+                if args.verbose or args.dirs:
+                    sys.stderr.write('\n')
+                sys.stderr.write('Files without SPDX:\n')
+                for f in sorted(parser.spdx_dirs.keys()):
+                    di = parser.spdx_dirs[f]
+                    for f in sorted(di.files):
+                        sys.stderr.write('    %s\n' %f)
+
             sys.exit(0)
 
     except Exception as ex:
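
The effect of the directory comparison in dirinfo.update() on the '-f'
listing can be sketched as follows: a file is only recorded when it lives
directly in the bucket directory, which is why '-D 0 kernel/' lists the
three files in kernel/ itself but none from its subdirectories. The helper
name missing_in_bucket() and the input list are assumptions for this
example.

```python
import os


def missing_in_bucket(missing_files, bucket):
    """Return the untagged files located directly in bucket, sorted.

    bucket is a statistics key such as './kernel/'; files in its
    subdirectories are skipped (hypothetical helper for illustration).
    """
    want = bucket.rstrip('/')
    prefixed = ('./' + m for m in missing_files)
    return sorted(f for f in prefixed if os.path.dirname(f) == want)


# A few paths from the commit message above, in arbitrary order.
missing = [
    'kernel/relay.c',
    'kernel/bpf/offload.c',
    'kernel/cpu.c',
    'kernel/kmod.c',
]
print('Files without SPDX:')
for f in missing_in_bucket(missing, './kernel/'):
    print('    %s' % f)
```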



* [patch 5/9] scripts/spdxcheck: Put excluded files and directories into a separate file
  2022-05-16 10:27 [patch 0/9] scripts/spdxcheck: Better statistics and exclude handling Thomas Gleixner
                   ` (3 preceding siblings ...)
  2022-05-16 10:27 ` [patch 4/9] scripts/spdxcheck: Add option to display files without SPDX Thomas Gleixner
@ 2022-05-16 10:27 ` Thomas Gleixner
  2022-05-16 10:27 ` [patch 6/9] scripts/spdxcheck: Exclude config directories Thomas Gleixner
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Thomas Gleixner @ 2022-05-16 10:27 UTC (permalink / raw)
  To: LKML; +Cc: linux-spdx, Greg Kroah-Hartman, Christoph Hellwig

The files and directories which are excluded from scanning are currently
hard coded in the script. That's not maintainable and not accessible for
external tools.

Move the files and directories which should be excluded into a file.  The
default file is scripts/spdxexclude. This can be overridden with the
'-e $FILE' command line option.

The file format and syntax are similar to those of a .gitignore file.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 scripts/spdxcheck.py |   70 ++++++++++++++++++++++++++++++++++++++++++++++-----
 scripts/spdxexclude  |    8 +++++
 2 files changed, 72 insertions(+), 6 deletions(-)

--- a/scripts/spdxcheck.py
+++ b/scripts/spdxcheck.py
@@ -6,6 +6,7 @@ from argparse import ArgumentParser
 from ply import lex, yacc
 import locale
 import traceback
+import fnmatch
 import sys
 import git
 import re
@@ -106,6 +107,7 @@ import os
         self.parser = yacc.yacc(module = self, write_tables = False, debug = False)
         self.lines_checked = 0
         self.checked = 0
+        self.excluded = 0
         self.spdx_valid = 0
         self.spdx_errors = 0
         self.spdx_dirs = {}
@@ -254,17 +256,47 @@ import os
         di.update(fname, base, fail)
         self.spdx_dirs[base] = di
 
+class pattern(object):
+    def __init__(self, line):
+        self.pattern = line
+        self.match = self.match_file
+        if line == '.*':
+            self.match = self.match_dot
+        elif line.endswith('/'):
+            self.pattern = line[:-1]
+            self.match = self.match_dir
+        elif line.startswith('/'):
+            self.pattern = line[1:]
+            self.match = self.match_fn
+
+    def match_dot(self, fpath):
+        return os.path.basename(fpath).startswith('.')
+
+    def match_file(self, fpath):
+        return os.path.basename(fpath) == self.pattern
+
+    def match_fn(self, fpath):
+        return fnmatch.fnmatchcase(fpath, self.pattern)
+
+    def match_dir(self, fpath):
+        if self.match_fn(os.path.dirname(fpath)):
+            return True
+        return fpath.startswith(self.pattern)
+
+def exclude_file(fpath):
+    for rule in exclude_rules:
+        if rule.match(fpath):
+            return True
+    return False
+
 def scan_git_tree(tree, basedir, dirdepth):
     parser.set_dirinfo(basedir, dirdepth)
     for el in tree.traverse():
-        # Exclude stuff which would make pointless noise
-        # FIXME: Put this somewhere more sensible
-        if el.path.startswith("LICENSES"):
-            continue
-        if el.path.find("license-rules.rst") >= 0:
-            continue
         if not os.path.isfile(el.path):
             continue
+        if exclude_file(el.path):
+            parser.excluded += 1
+            continue
         with open(el.path, 'rb') as fd:
             parser.parse_lines(fd, args.maxlines, el.path)
 
@@ -273,6 +305,20 @@ import os
         tree = tree[p]
     scan_git_tree(tree, path.strip('/'), dirdepth)
 
+def read_exclude_file(fname):
+    rules = []
+    if not fname:
+        return rules
+    with open(fname) as fd:
+        for line in fd:
+            line = line.strip()
+            if line.startswith('#'):
+                continue
+            if not len(line):
+                continue
+            rules.append(pattern(line))
+    return rules
+
 if __name__ == '__main__':
 
     ap = ArgumentParser(description='SPDX expression checker')
@@ -281,6 +327,8 @@ import os
                     help='Show [sub]directory statistics.')
     ap.add_argument('-D', '--depth', type=int, default=-1,
                     help='Directory depth for -d statistics. Default: unlimited')
+    ap.add_argument('-e', '--exclude',
+                    help='File containing file patterns to exclude. Default: scripts/spdxexclude')
     ap.add_argument('-f', '--files', action='store_true',
                     help='Show files without SPDX.')
     ap.add_argument('-m', '--maxlines', type=int, default=15,
@@ -317,6 +365,15 @@ import os
         sys.exit(1)
 
     try:
+        fname = args.exclude
+        if not fname:
+            fname = os.path.join(os.path.dirname(__file__), 'spdxexclude')
+        exclude_rules = read_exclude_file(fname)
+    except Exception as ex:
+        sys.stderr.write('FAIL: Reading exclude file %s: %s\n' %(fname, ex))
+        sys.exit(1)
+
+    try:
         if len(args.path) and args.path[0] == '-':
             stdin = os.fdopen(sys.stdin.fileno(), 'rb')
             parser.parse_lines(stdin, args.maxlines, '-')
@@ -349,6 +406,7 @@ import os
                 sys.stderr.write('License IDs        %12d\n' %len(spdx.licenses))
                 sys.stderr.write('Exception IDs      %12d\n' %len(spdx.exceptions))
                 sys.stderr.write('\n')
+                sys.stderr.write('Files excluded:    %12d\n' %parser.excluded)
                 sys.stderr.write('Files checked:     %12d\n' %parser.checked)
                 sys.stderr.write('Lines checked:     %12d\n' %parser.lines_checked)
                 if parser.checked:
--- /dev/null
+++ b/scripts/spdxexclude
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Patterns for excluding files and directories
+
+# Ignore the license directory and the licensing documentation which would
+# create lots of noise for no value
+LICENSES/
+license-rules.rst
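
The four rule kinds handled by the pattern class above can be condensed
into one function for experimentation. This is a sketch under the
assumption that rules are plain strings and paths are relative to the tree
root; is_excluded() is a hypothetical name, and the rule list mixes entries
from this patch with ones added later in the series.

```python
import fnmatch
import os


def is_excluded(path, rules):
    """Return True if a repo-relative path matches any exclude rule.

    Follows the four cases of the pattern class in the patch
    (hypothetical helper, not part of spdxcheck.py).
    """
    name = os.path.basename(path)
    for rule in rules:
        if rule == '.*':
            # Any dot file, in any directory.
            hit = name.startswith('.')
        elif rule.endswith('/'):
            # Directory rule: wildcard match on the file's directory,
            # or a plain prefix match on the whole path.
            d = rule[:-1]
            hit = (fnmatch.fnmatchcase(os.path.dirname(path), d)
                   or path.startswith(d))
        elif rule.startswith('/'):
            # Anchored at the tree root.
            hit = fnmatch.fnmatchcase(path, rule[1:])
        else:
            # Bare file name, matched in any directory.
            hit = name == rule
        if hit:
            return True
    return False


rules = ['LICENSES/', 'license-rules.rst', '/CREDITS', '.*']
for p in ('LICENSES/preferred/GPL-2.0', 'CREDITS', 'kernel/cpu.c'):
    print('%-30s %s' % (p, is_excluded(p, rules)))
```

Note that the prefix test in the directory case is purely textual, so a
rule like 'LICENSES/' would also match a path that merely starts with the
string 'LICENSES'.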



* [patch 6/9] scripts/spdxcheck: Exclude config directories
  2022-05-16 10:27 [patch 0/9] scripts/spdxcheck: Better statistics and exclude handling Thomas Gleixner
                   ` (4 preceding siblings ...)
  2022-05-16 10:27 ` [patch 5/9] scripts/spdxcheck: Put excluded files and directories into a separate file Thomas Gleixner
@ 2022-05-16 10:27 ` Thomas Gleixner
  2022-05-16 10:27 ` [patch 7/9] scripts/spdxcheck: Exclude MAINTAINERS/CREDITS Thomas Gleixner
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Thomas Gleixner @ 2022-05-16 10:27 UTC (permalink / raw)
  To: LKML; +Cc: linux-spdx, Greg Kroah-Hartman, Christoph Hellwig

Kernel configuration files like default configs are machine generated and
pretty useless outside of the kernel context.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 scripts/spdxexclude |    5 +++++
 1 file changed, 5 insertions(+)

--- a/scripts/spdxexclude
+++ b/scripts/spdxexclude
@@ -6,3 +6,8 @@
 # create lots of noise for no value
 LICENSES/
 license-rules.rst
+
+# Ignore config files and snippets. The majority is generated
+# by the Kconfig tools
+kernel/configs/
+arch/*/configs/
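
For the wildcard rule it may help to see how fnmatch treats '*'. The sketch
below reuses the directory-rule logic from patch 5/9 in a standalone
helper; in_excluded_dir() is a hypothetical name and the third path is
invented to show that '*' is not limited to a single path component.

```python
import fnmatch
import os


def in_excluded_dir(path, dir_rule):
    """Directory-rule check as in match_dir() of patch 5/9 (illustrative copy)."""
    d = dir_rule[:-1] if dir_rule.endswith('/') else dir_rule
    return fnmatch.fnmatchcase(os.path.dirname(path), d) or path.startswith(d)


rule = 'arch/*/configs/'
print(in_excluded_dir('arch/x86/configs/x86_64_defconfig', rule))  # True
print(in_excluded_dir('arch/x86/kernel/cpu/common.c', rule))       # False
# fnmatch's '*' also matches '/', so deeper directories match as well.
print(in_excluded_dir('arch/arm/mach-foo/configs/bar', rule))      # True
```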



* [patch 7/9] scripts/spdxcheck: Exclude MAINTAINERS/CREDITS
  2022-05-16 10:27 [patch 0/9] scripts/spdxcheck: Better statistics and exclude handling Thomas Gleixner
                   ` (5 preceding siblings ...)
  2022-05-16 10:27 ` [patch 6/9] scripts/spdxcheck: Exclude config directories Thomas Gleixner
@ 2022-05-16 10:27 ` Thomas Gleixner
  2022-05-16 10:27 ` [patch 8/9] scripts/spdxcheck: Exclude dot files Thomas Gleixner
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Thomas Gleixner @ 2022-05-16 10:27 UTC (permalink / raw)
  To: LKML; +Cc: linux-spdx, Greg Kroah-Hartman, Christoph Hellwig

Listings of maintainers and people who deserve credits are not really
interesting in terms of copyright. The usage of these files outside of the
kernel is pointless and the file format is trivial. No point in chasing
them or slapping an SPDX identifier into them just because.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 scripts/spdxexclude |    4 ++++
 1 file changed, 4 insertions(+)

--- a/scripts/spdxexclude
+++ b/scripts/spdxexclude
@@ -11,3 +11,7 @@ license-rules.rst
 # by the Kconfig tools
 kernel/configs/
 arch/*/configs/
+
+# Other files without copyrightable content
+/CREDITS
+/MAINTAINERS



* [patch 8/9] scripts/spdxcheck: Exclude dot files
  2022-05-16 10:27 [patch 0/9] scripts/spdxcheck: Better statistics and exclude handling Thomas Gleixner
                   ` (6 preceding siblings ...)
  2022-05-16 10:27 ` [patch 7/9] scripts/spdxcheck: Exclude MAINTAINERS/CREDITS Thomas Gleixner
@ 2022-05-16 10:27 ` Thomas Gleixner
  2022-05-16 14:22   ` Miguel Ojeda
  2022-05-16 10:27 ` [patch 9/9] scripts/spdxcheck: Exclude top-level README Thomas Gleixner
  2022-05-16 13:14 ` [patch 0/9] scripts/spdxcheck: Better statistics and exclude handling Max Mehl
  9 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2022-05-16 10:27 UTC (permalink / raw)
  To: LKML; +Cc: linux-spdx, Greg Kroah-Hartman, Christoph Hellwig

None of these files

     .clang-format, .cocciconfig, .get_maintainer.ignore, .gitattributes,
     .gitignore, .mailmap

have copyrightable content. They are configuration files which use a
publicly documented format.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 scripts/spdxexclude |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/scripts/spdxexclude
+++ b/scripts/spdxexclude
@@ -2,6 +2,12 @@
 #
 # Patterns for excluding files and directories
 
+# Ignore dot files:
+# .clang-format, .cocciconfig, .get_maintainer.ignore
+# .gitattributes, .gitignore, .mailmap
+# do not really have copyrightable content
+.*
+
 # Ignore the license directory and the licensing documentation which would
 # create lots of noise for no value
 LICENSES/



* [patch 9/9] scripts/spdxcheck: Exclude top-level README
  2022-05-16 10:27 [patch 0/9] scripts/spdxcheck: Better statistics and exclude handling Thomas Gleixner
                   ` (7 preceding siblings ...)
  2022-05-16 10:27 ` [patch 8/9] scripts/spdxcheck: Exclude dot files Thomas Gleixner
@ 2022-05-16 10:27 ` Thomas Gleixner
  2022-05-16 13:14 ` [patch 0/9] scripts/spdxcheck: Better statistics and exclude handling Max Mehl
  9 siblings, 0 replies; 20+ messages in thread
From: Thomas Gleixner @ 2022-05-16 10:27 UTC (permalink / raw)
  To: LKML; +Cc: linux-spdx, Greg Kroah-Hartman, Christoph Hellwig

Nothing copyrightable to see here.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 scripts/spdxexclude |    1 +
 1 file changed, 1 insertion(+)

--- a/scripts/spdxexclude
+++ b/scripts/spdxexclude
@@ -21,3 +21,4 @@ arch/*/configs/
 # Other files without copyrightable content
 /CREDITS
 /MAINTAINERS
+/README



* Re: [patch 0/9] scripts/spdxcheck: Better statistics and exclude handling
  2022-05-16 10:27 [patch 0/9] scripts/spdxcheck: Better statistics and exclude handling Thomas Gleixner
                   ` (8 preceding siblings ...)
  2022-05-16 10:27 ` [patch 9/9] scripts/spdxcheck: Exclude top-level README Thomas Gleixner
@ 2022-05-16 13:14 ` Max Mehl
  2022-05-16 18:52   ` Thomas Gleixner
  9 siblings, 1 reply; 20+ messages in thread
From: Max Mehl @ 2022-05-16 13:14 UTC (permalink / raw)
  To: LKML, Thomas Gleixner; +Cc: Greg Kroah-Hartman, Christoph Hellwig, linux-spdx

Thank you for picking up the effort to add license (and perhaps also
copyright) info to all files in the Kernel. This, as you know, is also
working towards making the whole repo REUSE compliant [^1].

~ Thomas Gleixner [2022-05-16 12:27 +0200]:
> Finding files without SPDX identifiers is cumbersome with spdxcheck, though
> it has all the information required.
> 
> The exclude of files and directories is hardcoded in the script which makes
> it hard to maintain and the information cannot be accessed by external tools.

Unfortunately, excluding files (i.e. not adding machine-readable
license/copyright information to it) would also block reaching full
compliance with the REUSE best practices. Have you considered making
them available under GPL-2.0-only or a license similar to public domain
[^2]?

Regarding false-positives, e.g. in license-rules.rst, you could use the
brand-new feature that allows ignoring blocks of code (to be released
later this week) [^3]. I am aware that spdxcheck would not be able to
detect this, but using the REUSE helper tool [^4] could also be a
solution to scan for missing files.

Best,
Max


[^1]: https://reuse.software

[^2]: https://reuse.software/faq/#exclude-file

[^3]: https://github.com/fsfe/reuse-docs/pull/104/files

[^4]: https://github.com/fsfe/reuse-tool

-- 
Max Mehl - Programme Manager -- Free Software Foundation Europe
Contact and information: https://fsfe.org/about/mehl -- @mxmehl
The FSFE is a charity that empowers users to control technology

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 8/9] scripts/spdxcheck: Exclude dot files
  2022-05-16 10:27 ` [patch 8/9] scripts/spdxcheck: Exclude dot files Thomas Gleixner
@ 2022-05-16 14:22   ` Miguel Ojeda
  2022-05-16 18:43     ` Thomas Gleixner
  0 siblings, 1 reply; 20+ messages in thread
From: Miguel Ojeda @ 2022-05-16 14:22 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, linux-spdx, Greg Kroah-Hartman, Christoph Hellwig

On Mon, May 16, 2022 at 3:55 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> None of these files
>
>      .clang-format, .cocciconfig, .get_maintainer.ignore, .gitattributes,
>      .gitignore, .mailmap
>
> have copyrightable content. They are configuration files which use a
> publicly documented format.

Should these files remove their SPDX-License-Identifier? If yes, we
should do that for `.clang-format`.

As another suggestion, we should check that the ignored files actually
do _not_ have the `SPDX-License-Identifier` (i.e. so the above case
would trigger a diagnostic).
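
A rough sketch of what such a check could look like, in Python since
spdxcheck is a Python script. This is not existing spdxcheck code: the
function names, the five-line scan window and the input format (a dict
mapping path to file content) are assumptions for illustration.

```python
import re

# Matches the tag and captures the rest of the line as the expression.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")


def find_spdx_expression(text, max_lines=5):
    """Return the license expression found near the top of a file, or None."""
    for line in text.splitlines()[:max_lines]:
        match = SPDX_TAG.search(line)
        if match:
            expr = match.group(1).strip()
            # Drop a trailing C comment terminator, if present.
            if expr.endswith("*/"):
                expr = expr[:-2].strip()
            return expr
    return None


def excluded_but_tagged(excluded_files):
    """Return (path, expression) for excluded files that carry an SPDX tag.

    excluded_files maps a path to the file's text content.
    """
    hits = []
    for path in sorted(excluded_files):
        expr = find_spdx_expression(excluded_files[path])
        if expr is not None:
            hits.append((path, expr))
    return hits
```

Each hit would then be reported as a diagnostic, so that either the
identifier or the exclude entry gets removed.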

Cheers,
Miguel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 8/9] scripts/spdxcheck: Exclude dot files
  2022-05-16 14:22   ` Miguel Ojeda
@ 2022-05-16 18:43     ` Thomas Gleixner
  2022-05-18 13:36       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2022-05-16 18:43 UTC (permalink / raw)
  To: Miguel Ojeda; +Cc: LKML, linux-spdx, Greg Kroah-Hartman, Christoph Hellwig

On Mon, May 16 2022 at 16:22, Miguel Ojeda wrote:
> On Mon, May 16, 2022 at 3:55 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> None of these files
>>
>>      .clang-format, .cocciconfig, .get_maintainer.ignore, .gitattributes,
>>      .gitignore, .mailmap
>>
>> have copyrightable content. They are configuration files which use a
>> publicly documented format.
>
> Should these files remove their SPDX-License-Identifier? If yes, we
> should do that for `.clang-format`.
>
> As another suggestion, we should check that the ignored files actually
> do _not_ have the `SPDX-License-Identifier` (i.e. so the above case
> would trigger a diagnostic).

Good questions. I'm happy to drop this patch for now until this
discussion has been settled.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 0/9] scripts/spdxcheck: Better statistics and exclude handling
  2022-05-16 13:14 ` [patch 0/9] scripts/spdxcheck: Better statistics and exclude handling Max Mehl
@ 2022-05-16 18:52   ` Thomas Gleixner
  2022-05-16 18:59     ` Thomas Gleixner
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2022-05-16 18:52 UTC (permalink / raw)
  To: Max Mehl, LKML; +Cc: Greg Kroah-Hartman, Christoph Hellwig, linux-spdx

On Mon, May 16 2022 at 15:14, Max Mehl wrote:
> Thank you for picking up the effort to add license (and perhaps also
> copyright) info to all files in the Kernel.

Adding copyright notices retroactively is not going to happen
ever. That's just impossible.

>> The exclude of files and directories is hardcoded in the script which makes
>> it hard to maintain and the information cannot be accessed by external tools.
>
> Unfortunately, excluding files (i.e. not adding machine-readable
> license/copyright information to them) would also block reaching full
> compliance with the REUSE best practices. Have you considered making
> them available under GPL-2.0-only or a license similar to public domain
> [^2]?

The LICENSE directory is already handled by spdxcheck as the license
information is read from there. And no, we cannot add a GPL-2.0-only
identifier to all of the files under the LICENSE directory for obvious
reasons.

license-rules.rst is no longer a problem as all incarnations have a
proper SPDX identifier today.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 0/9] scripts/spdxcheck: Better statistics and exclude handling
  2022-05-16 18:52   ` Thomas Gleixner
@ 2022-05-16 18:59     ` Thomas Gleixner
  2022-05-17  8:25       ` Max Mehl
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2022-05-16 18:59 UTC (permalink / raw)
  To: Max Mehl, LKML; +Cc: Greg Kroah-Hartman, Christoph Hellwig, linux-spdx

On Mon, May 16 2022 at 20:52, Thomas Gleixner wrote:

> On Mon, May 16 2022 at 15:14, Max Mehl wrote:
>> Thank you for picking up the effort to add license (and perhaps also
>> copyright) info to all files in the Kernel.
>
> Adding copyright notices retroactively is not going to happen
> ever. That's just impossible.
>
>>> The exclude of files and directories is hardcoded in the script which makes
>>> it hard to maintain and the information cannot be accessed by external tools.
>>
>> Unfortunately, excluding files (i.e. not adding machine-readable
>> license/copyright information to them) would also block reaching full
>> compliance with the REUSE best practices. Have you considered making
>> them available under GPL-2.0-only or a license similar to public domain
>> [^2]?
>
> The LICENSE directory is already handled by spdxcheck as the license
> information is read from there. And no, we cannot add a GPL-2.0-only
> identifier to all of the files under the LICENSE directory for obvious
> reasons.
>
> license-rules.rst is no longer a problem as all incarnations have a
> proper SPDX identifier today.

There is also an argument to be made whether we really need to have SPDX
identifiers on trivial files:

#include <someheader.h>
<EOF>

Such files are not copyrightable by any means. So what's the value of
doubling the line count to add an SPDX identifier? Just to make nice
statistics?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 0/9] scripts/spdxcheck: Better statistics and exclude handling
  2022-05-16 18:59     ` Thomas Gleixner
@ 2022-05-17  8:25       ` Max Mehl
  2022-05-17 21:43         ` Thomas Gleixner
  0 siblings, 1 reply; 20+ messages in thread
From: Max Mehl @ 2022-05-17  8:25 UTC (permalink / raw)
  To: LKML, Thomas Gleixner; +Cc: Greg Kroah-Hartman, Christoph Hellwig, linux-spdx

~ Thomas Gleixner [2022-05-16 20:59 +0200]:
> On Mon, May 16 2022 at 20:52, Thomas Gleixner wrote:
>>> Unfortunately, excluding files (i.e. not adding machine-readable
>>> license/copyright information to them) would also block reaching full
>>> compliance with the REUSE best practices. Have you considered making
>>> them available under GPL-2.0-only or a license similar to public domain
>>> [^2]?
>>
>> The LICENSE directory is already handled by spdxcheck as the license
>> information is read from there. And no, we cannot add a GPL-2.0-only
>> identifier to all of the files under the LICENSE directory for obvious
>> reasons.

Absolutely. REUSE obviously also ignores this directory, as well as
e.g. zero-length files, symlinks, submodules, or .git directory.

> There is also an argument to be made whether we really need to have SPDX
> identifiers on trivial files:
> 
> #include <someheader.h>
> <EOF>
> 
> Such files are not copyrightable by any means. So what's the value of
> doubling the line count to add an SPDX identifier? Just to make nice
> statistics?

We agree that such files are not copyrightable. But where is the
threshold? Lines of code? Creativity? Number of used functions? And how
to embed this threshold in tooling? So instead of fuzzy exclusion of
such files in tools like spdxcheck or REUSE, it makes sense to treat
them as every other file with the cost of adding two comment lines.

This clear-cut rule eases maintaining and growing the effort you and
others did because developers would know exactly what to add to a new
file (license + copyright) without requiring looking up the thresholds
or a manual review by maintainers who can interpret them.

Best,
Max

-- 
Max Mehl - Programme Manager -- Free Software Foundation Europe
Contact and information: https://fsfe.org/about/mehl -- @mxmehl
The FSFE is a charity that empowers users to control technology

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 0/9] scripts/spdxcheck: Better statistics and exclude handling
  2022-05-17  8:25       ` Max Mehl
@ 2022-05-17 21:43         ` Thomas Gleixner
  2022-05-23 16:11           ` J Lovejoy
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2022-05-17 21:43 UTC (permalink / raw)
  To: Max Mehl, LKML; +Cc: Greg Kroah-Hartman, Christoph Hellwig, linux-spdx

On Tue, May 17 2022 at 10:25, Max Mehl wrote:
> ~ Thomas Gleixner [2022-05-16 20:59 +0200]:
>> There is also an argument to be made whether we really need to have SPDX
>> identifiers on trivial files:
>> 
>> #include <someheader.h>
>> <EOF>
>> 
>> Such files are not copyrightable by any means. So what's the value of
>> doubling the line count to add an SPDX identifier? Just to make nice
>> statistics?
>
> We agree that such files are not copyrightable. But where is the
> threshold? Lines of code? Creativity? Number of used functions? And how
> to embed this threshold in tooling? So instead of fuzzy exclusion of
> such files in tools like spdxcheck or REUSE, it makes sense to treat
> them as every other file with the cost of adding two comment lines.
>
> This clear-cut rule eases maintaining and growing the effort you and
> others did because developers would know exactly what to add to a new
> file (license + copyright) without requiring looking up the thresholds
> or a manual review by maintainers who can interpret them.

Seriously no. I'm outright refusing to add my copyright to a trivial
file with one or two includes or a silly comment like '/* empty because */'.

     There is nothing copyrightable there.

I'm not going to make myself a fool just to make tools happy, which can
figure out on their own whether there is reasonable content in the vast
majority of cases.
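
To illustrate the kind of check a tool could do on its own, here is a
deliberately naive Python sketch which treats a C file as trivial when
nothing but comments, blank lines and #include directives remain. The
function name and the definition of "trivial" are assumptions for the sake
of the example, not an agreed rule.

```python
import re

_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
_LINE_COMMENT = re.compile(r"//[^\n]*")
_INCLUDE = re.compile(r'#\s*include\s*(?:<[^>]+>|"[^"]+")')


def is_trivial_c_file(text):
    """Return True if the file holds only comments, blanks and includes.

    The comment stripping is naive: it does not understand string
    literals, so odd input is more likely to be reported as non-trivial
    than the other way around.
    """
    stripped = _BLOCK_COMMENT.sub("", text)
    stripped = _LINE_COMMENT.sub("", stripped)
    for line in stripped.splitlines():
        line = line.strip()
        if line and not _INCLUDE.fullmatch(line):
            return False
    return True
```

Anything which fails this test would still go through the normal SPDX
check, so the heuristic can only reduce noise, not hide real content.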

Also you need some exclude rules in any case. Why?

  - How do you tell a tool that a file is generated, e.g. in the kernel
    the default configuration files?

    Yes, the file content depends on human input to the generator tool,
    but I'm looking forward to the explanation how this is
    copyrightable especially with multiple people updating this file
    over time where some of the updates are just done by invoking the
    generator tool itself.

  - How do you tell a tool that a file contains licensing documentation?

    Go and look what license scanners make out of all the various
    license-rules.rst files.

  - ....

  Do all scanners have to grow heuristics for ignoring the content past
  the topmost SPDX License identifier in certain files or for figuring
  out what might be generated content?

You also might need to add information about binary blobs, which
obviously cannot be part of the binary blobs themselves.

The exclude rules I added are lazy and mostly focussed on spdxcheck, but
I'm happy to make them more useful and let them carry information about
the nature of the exclude or morph them into a general scanner info
which also contains binary blob info and other helpful information. But
that needs a larger discussion about the format and rules for such a
file.
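
Purely as a strawman for that discussion: one conceivable shape is a second
whitespace-separated column which states the reason for the exclude. The
column layout, the reason keywords and the names in the Python sketch below
are all hypothetical; nothing like this exists in scripts/spdxexclude today.

```python
from typing import NamedTuple


class ExcludeRule(NamedTuple):
    pattern: str   # path pattern, as in the current exclude file
    reason: str    # hypothetical annotation, e.g. "generated"


def parse_annotated_excludes(lines):
    """Parse '<pattern> [<reason>]' lines; '#' starts a comment line."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 1)
        reason = fields[1].strip() if len(fields) > 1 else "unspecified"
        rules.append(ExcludeRule(fields[0], reason))
    return rules
```

A scanner could then filter on the reason column, for example to list only
generated files or only licensing documentation.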

That said, I'm all for clear-cut rules, but rules just for rules' sake
are almost as bad as no rules at all.

As always you have to apply common sense and look at the bigger picture
and come up with solutions which are practicable, enforceable and useful
for the larger eco-system.

Your goal of having SPDX ids and copyright notices in every file of a
project is honorable, but impractical for various reasons.

See above.

Aside from that you cannot replace a full-blown license scanner with REUSE
even if your project is SPDX and Copyright notice clean at the top level
of a file. You still need to verify that there is no other information
in a 'clean' file which might be contradicting or supplemental. You
cannot add all of this functionality to REUSE or whatever.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 8/9] scripts/spdxcheck: Exclude dot files
  2022-05-16 18:43     ` Thomas Gleixner
@ 2022-05-18 13:36       ` Greg Kroah-Hartman
  0 siblings, 0 replies; 20+ messages in thread
From: Greg Kroah-Hartman @ 2022-05-18 13:36 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Miguel Ojeda, LKML, linux-spdx, Christoph Hellwig

On Mon, May 16, 2022 at 08:43:52PM +0200, Thomas Gleixner wrote:
> On Mon, May 16 2022 at 16:22, Miguel Ojeda wrote:
> > On Mon, May 16, 2022 at 3:55 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >>
> >> None of these files
> >>
> >>      .clang-format, .cocciconfig, .get_maintainer.ignore, .gitattributes,
> >>      .gitignore, .mailmap
> >>
> >> have copyrightable content. They are configuration files which use a
> >> publicly documented format.
> >
> > Should these files remove their SPDX-License-Identifier? If yes, we
> > should do that for `.clang-format`.
> >
> > As another suggestion, we should check that the ignored files actually
> > do _not_ have the `SPDX-License-Identifier` (i.e. so the above case
> > would trigger a diagnostic).
> 
> Good questions. I'm happy to drop this patch for now until this
> discussion has been settled.

I've now taken all patches in this series except for this one.

thanks for doing this work,

greg k-h

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 0/9] scripts/spdxcheck: Better statistics and exclude handling
  2022-05-17 21:43         ` Thomas Gleixner
@ 2022-05-23 16:11           ` J Lovejoy
  2022-05-23 21:44             ` Thomas Gleixner
  0 siblings, 1 reply; 20+ messages in thread
From: J Lovejoy @ 2022-05-23 16:11 UTC (permalink / raw)
  To: Thomas Gleixner, Max Mehl, LKML
  Cc: Greg Kroah-Hartman, Christoph Hellwig, linux-spdx



On 5/17/22 3:43 PM, Thomas Gleixner wrote:
> On Tue, May 17 2022 at 10:25, Max Mehl wrote:
>> ~ Thomas Gleixner [2022-05-16 20:59 +0200]:
>>> There is also an argument to be made whether we really need to have SPDX
>>> identifiers on trivial files:
>>>
>>> #include <someheader.h>
>>> <EOF>
>>>
>>> Such files are not copyrightable by any means. So what's the value of
>>> doubling the line count to add an SPDX identifier? Just to make nice
>>> statistics?
>> We agree that such files are not copyrightable. But where is the
>> threshold? Lines of code? Creativity? Number of used functions? And how
>> to embed this threshold in tooling? So instead of fuzzy exclusion of
>> such files in tools like spdxcheck or REUSE, it makes sense to treat
>> them as every other file with the cost of adding two comment lines.
>>
>> This clear-cut rule eases maintaining and growing the effort you and
>> others did because developers would know exactly what to add to a new
>> file (license + copyright) without requiring looking up the thresholds
>> or a manual review by maintainers who can interpret them.
> Seriously no. I'm outright refusing to add my copyright to a trivial
> file with one or two includes or a silly comment like '/* empty because */'.
>
>       There is nothing copyrightable there.
>
> I'm not going to make myself a fool just to make tools happy, which can
> figure out on their own whether there is reasonable content in the vast
> majority of cases.
>
> Also you need some exclude rules in any case. Why?
>
>    - How do you tell a tool that a file is generated, e.g. in the kernel
>      the default configuration files?
>
>      Yes, the file content depends on human input to the generator tool,
>      but I'm looking forward to the explanation how this is
>      copyrightable especially with multiple people updating this file
>      over time where some of the updates are just done by invoking the
>      generator tool itself.
>
>    - How do you tell a tool that a file contains licensing documentation?
>
>      Go and look what license scanners make out of all the various
>      license-rules.rst files.
>
>    - ....
>
>    Do all scanners have to grow heuristics for ignoring the content past
>    the topmost SPDX License identifier in certain files or for figuring
>    out what might be generated content?
>
> You also might need to add information about binary blobs, which
> obviously cannot be part of the binary blobs themselves.
>
> The exclude rules I added are lazy and mostly focussed on spdxcheck, but
> I'm happy to make them more useful and let them carry information about
> the nature of the exclude or morph them into a general scanner info
> which also contains binary blob info and other helpful information. But
> that needs a larger discussion about the format and rules for such a
> file.
>
> That said, I'm all for clear cut rules, but rules just for the rules
> sake are almost as bad as no rules at all.
>
> As always you have to apply common sense and look at the bigger picture
> and come up with solutions which are practicable, enforceable and useful
> for the larger eco-system.
>
> Your goal of having SPDX ids and copyright notices in every file of a
> project is honorable, but impractical for various reasons.
>
> See above.
>
> Aside from that you cannot replace a full-blown license scanner with REUSE
> even if your project is SPDX and Copyright notice clean at the top level
> of a file. You still need to verify that there is no other information
> in a 'clean' file which might be contradicting or supplemental. You
> cannot add all of this functionality to REUSE or whatever.
>
Max, Thomas,

I think the discussion here is hitting upon the "inconvenience" of the 
lack of black/white rules in the law (as to what is copyrightable) 
versus the convenience of downstream recipients of code who want to be 
sure they have proper rights (which mixes in the guidance/rules of 
Reuse, tooling, etc.).

I think some rules in terms of files that are clearly not copyrightable 
can be implemented in various tooling (hopefully, with the guidance of a 
lawyer steeped in copyright law), and I agree that putting a license (by 
way of an SPDX identifier or any other way for that matter) on such 
files is neither a good use of time nor a good idea (from the 
perspective of being inaccurate as to the need for a license and thus 
sending the wrong impression). That being said, there will not be a way 
to make clear cut rules for everything, without involving a judge. 
Sorry! That's just how the law works (and we often don't actually want
black/white lines in the law).

I can see a policy of, "when it's not clear (as to copyrightability), 
then add a license", though.

Thanks,
Jilayne


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 0/9] scripts/spdxcheck: Better statistics and exclude handling
  2022-05-23 16:11           ` J Lovejoy
@ 2022-05-23 21:44             ` Thomas Gleixner
  0 siblings, 0 replies; 20+ messages in thread
From: Thomas Gleixner @ 2022-05-23 21:44 UTC (permalink / raw)
  To: J Lovejoy, Max Mehl, LKML
  Cc: Greg Kroah-Hartman, Christoph Hellwig, linux-spdx

On Mon, May 23 2022 at 10:11, J. Lovejoy wrote:
> On 5/17/22 3:43 PM, Thomas Gleixner wrote:
> I think the discussion here is hitting upon the "inconvenience" of the 
> lack of black/white rules in the law (as to what is copyrightable) 
> versus the convenience of downstream recipients of code who want to be 
> sure they have proper rights (which mixes in the guidance/rules of 
> Reuse, tooling, etc.).

Correct.

> I think some rules in terms of files that are clearly not copyrightable 
> can be implemented in various tooling (hopefully, with the guidance of a 
> lawyer steeped in copyright law), and I agree that putting a license (by 
> way of an SPDX identifier or any other way for that matter) on such 
> files is neither a good use of time nor a good idea (from the 
> perspective of being inaccurate as to the need for a license and thus 
> sending the wrong impression). That being said, there will not be a way 
> to make clear cut rules for everything, without involving a judge. 
> Sorry! That's just how the law works (and we often don't actually want
> black/white lines in the law).
>
> I can see a policy of, "when it's not clear (as to copyrightability), 
> then add a license", though.

No argument here, but trivial things like an include file which only
includes another include file are pretty clear IMO and we really should
make up our minds on those. Even a header file which contains a single
function declaration is questionable at best, but yes, it's hard to draw
a hard line on those.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-05-23 21:44 UTC | newest]

Thread overview: 20+ messages
2022-05-16 10:27 [patch 0/9] scripts/spdxcheck: Better statistics and exclude handling Thomas Gleixner
2022-05-16 10:27 ` [patch 1/9] scripts/spdxcheck: Add percentage to statistics Thomas Gleixner
2022-05-16 10:27 ` [patch 2/9] scripts/spdxcheck: Add directory statistics Thomas Gleixner
2022-05-16 10:27 ` [patch 3/9] scripts/spdxcheck: Add [sub]directory statistics Thomas Gleixner
2022-05-16 10:27 ` [patch 4/9] scripts/spdxcheck: Add option to display files without SPDX Thomas Gleixner
2022-05-16 10:27 ` [patch 5/9] scripts/spdxcheck: Put excluded files and directories into a separate file Thomas Gleixner
2022-05-16 10:27 ` [patch 6/9] scripts/spdxcheck: Exclude config directories Thomas Gleixner
2022-05-16 10:27 ` [patch 7/9] scripts/spdxcheck: Exclude MAINTAINERS/CREDITS Thomas Gleixner
2022-05-16 10:27 ` [patch 8/9] scripts/spdxcheck: Exclude dot files Thomas Gleixner
2022-05-16 14:22   ` Miguel Ojeda
2022-05-16 18:43     ` Thomas Gleixner
2022-05-18 13:36       ` Greg Kroah-Hartman
2022-05-16 10:27 ` [patch 9/9] scripts/spdxcheck: Exclude top-level README Thomas Gleixner
2022-05-16 13:14 ` [patch 0/9] scripts/spdxcheck: Better statistics and exclude handling Max Mehl
2022-05-16 18:52   ` Thomas Gleixner
2022-05-16 18:59     ` Thomas Gleixner
2022-05-17  8:25       ` Max Mehl
2022-05-17 21:43         ` Thomas Gleixner
2022-05-23 16:11           ` J Lovejoy
2022-05-23 21:44             ` Thomas Gleixner
