linux-kernel-mentees.lists.linuxfoundation.org archive mirror
 help / color / mirror / Atom feed
From: Mrinal Pandey <mrinalmni@gmail.com>
To: Lukas Bulwahn <lukas.bulwahn@gmail.com>,
	skhan@linuxfoundation.org,
	Linux-kernel-mentees@lists.linuxfoundation.org,
	mrinalmni@gmail.com
Subject: Re: [Linux-kernel-mentees] [PATCH] checkpatch: Improve SPDX license identifier check for script files
Date: Tue, 25 Aug 2020 10:23:58 +0530	[thread overview]
Message-ID: <20200825045358.hygtrkm6qxyiwjft@mrinalpandey> (raw)
In-Reply-To: <alpine.DEB.2.21.2008241052310.2932@felia>


[-- Attachment #1.1.1: Type: text/plain, Size: 12170 bytes --]

On 20/08/24 10:54AM, Lukas Bulwahn wrote:
> 
> 
> On Mon, 24 Aug 2020, Mrinal Pandey wrote:
> 
> > On 20/08/22 10:24AM, Lukas Bulwahn wrote:
> > > 
> > > 
> > > On Thu, 20 Aug 2020, Mrinal Pandey wrote:
> > > 
> > > > On 20/08/09 12:52PM, Mrinal Pandey wrote:
> > > > > On 20/08/04 09:37PM, Lukas Bulwahn wrote:
> > > > > > 
> > > > > > 
> > > > > > On Tue, 4 Aug 2020, Mrinal Pandey wrote:
> > > > > > 
> > > > > > > On 20/08/03 12:59PM, Lukas Bulwahn wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > On Mon, 3 Aug 2020, Mrinal Pandey wrote:
> > > > > > > > 
> > > > > > > > > The diff content includes the SPDX licensing information but excludes the
> > > > > > > > > shebang when a change is made to a script file in commit 37f8173dd849
> > > > > > > > > ("locking/atomics: Flip fallbacks and  instrumentation") and commit
> > > > > > > > > 075c8aa79d54 ("selftests: forwarding: tc_actions.sh: add matchall mirror
> > > > > > > > > test"). In these cases checkpatch issues a false positive warning:
> > > > > > > > > "Misplaced SPDX-License-Identifier tag - use line 1 instead".
> > > > > > > > > 
> > > > > > > > > Currently, if checkpatch finds a shebang in line 1, it expects the
> > > > > > > > > license identifier in line 2. However, this doesn't work when a shebang
> > > > > > > > > isn't found on the line 1.
> > > > > > > > 
> > > > > > > > It does not work when the diff does not contain line 1, but only line 2,
> > > > > > > > because then the shebang check for line 1 cannot work.
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > I noticed this false positive, while running checkpatch on the set of
> > > > > > > > > commits from v5.7 to v5.8-rc1 of the kernel, on the said commits.
> > > > > > > > > This false positive exists in checkpatch since commit a8da38a9cf0e
> > > > > > > > > ("checkpatch: add test for SPDX-License-Identifier on wrong line #")
> > > > > > > > > when the corresponding rule was first added.
> > > > > > > > > 
> > > > > > > > > The alternatives considered to improve this check were looking the file
> > > > > > > > > to be a script by either examining the file extension or file permissions.
> > > > > > > > >
> > > > > > > > 
> > > > > > > > Make this sentence shorter. Try.
> > > > > > > >  
> > > > > > > > > The evaluation on former option resulted in 120 files which had a shebang
> > > > > > > > > in the first line but no file extension. This didn't look like a promising
> > > > > > > > > result and hence I dropped the idea of using this approach.
> > > > > > > > > 
> > > > > > > > > The evaluation on the latter approach shows that there are 53 files in the
> > > > > > > > > kernel which have an executable bit set but don't have a shebang in the
> > > > > > > > > first line.
> > > > > > > > > 
> > > > > > > > > At the first sight on these 53 files, it seems that they either have a
> > > > > > > > > wrong file permission set or could be reasonably extended with a shebang
> > > > > > > > > and SPDX license information. Thus, further cleanup in the repository
> > > > > > > > > would make the latter approach to work even more precisely.
> > > > > > > > > 
> > > > > > > > > Hence, I chose to check the file permissions to determine if the file is a
> > > > > > > > > script and notify checkpatch to expect SPDX on second line for such files.
> > > > > > > > >
> > > > > > > > 
> > > > > > > > There is no notification here. Think about better wording.
> > > > > > > >  
> > > > > > > > > Signed-off-by: Mrinal Pandey <mrinalmni@gmail.com>
> > > > > > > > > ---
> > > > > > > > >  scripts/checkpatch.pl | 3 +++
> > > > > > > > >  1 file changed, 3 insertions(+)
> > > > > > > > > 
> > > > > > > > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > > > > > > > index 4c820607540b..bae1dd824518 100755
> > > > > > > > > --- a/scripts/checkpatch.pl
> > > > > > > > > +++ b/scripts/checkpatch.pl
> > > > > > > > > @@ -3166,6 +3166,9 @@ sub process {
> > > > > > > > >  		}
> > > > > > > > >  
> > > > > > > > >  # check for using SPDX license tag at beginning of files
> > > > > > > > > +		if ($line =~ /^index\ .*\.\..*\ .*[7531]\d{0,2}$/) {
> > > > > > > > > +			$checklicenseline = 2;
> > > > > > > > > +		}
> > > > > > > > 
> > > > > > > > That check looks good now.
> > > > > > > > 
> > > > > > > > >  		if ($realline == $checklicenseline) {
> > > > > > > > >  			if ($rawline =~ /^[ \+]\s*\#\!\s*\//) {
> > > > > > > > >  				$checklicenseline = 2;
> > > > > > > > 
> > > > > > > > This is probably broken now. It should check for shebang in line 1 and 
> > > > > > > > then set checklicenseline to line 2, right?
> > > > > > > 
> > > > > > > Sir,
> > > > > > > 
> > > > > > > Should we remove this check? Earlier when I checked for file extension
> > > > > > > we had 120 cases where this check was also needed but now we have a
> > > > > > > better heuristic which is going to work for all cases where license
> > > > > > > should be on line 2 irrespective of the fact that we know the first line
> > > > > > > or not.
> > > > > > >
> > > > > > 
> > > > > > Are you sure about that? Where is the evaluation that proves your point?
> > > > > > 
> > > > > > E.g., are all files that contain a shebang really with an executable flag?
> > > > > > 
> > > > > > Which commands did you run to check this?
> > > > > >  
> > > > > > > If I am missing out on something and we should not be removing this check,
> > > > > > > then I suggest placing the new heuristics below this block so that it doesn't
> > > > > > > interfere with the existing logic.
> > > > > > > 
> > > > > > > Please let me know which path should I go about and then I shall resend
> > > > > > > the patch with the modified commit message.
> > > > > > > 
> > > > > > 
> > > > > > Think about the strengths and weaknesses of the potential solutions, then 
> > > > > > show with some commands (as I did for example, for finding the first 
> > > > > > lines previously) that you can show that it practically makes a 
> > > > > > difference and you can numbers on those differences.
> > > > > > 
> > > > > > When you did that, send a new patch.
> > > > > > 
> > > > > > Lukas
> > > > > > 
> > > > > Sir,
> > > > > 
> > > > > I ran the evaluation as:
> > > > > 
> > > > > mrinalpandey@mrinalpandey:~/linux/linux$ cat get_permissions.sh
> > > > > #!/bin/bash
> > > > > 
> > > > > for file in $(git ls-files)
> > > > > do
> > > > >         permissions="$(stat -c "%a %n" $file)"
> > > > >         echo "$permissions"
> > > > > done
> > > > > 
> > > > > mrinalpandey@mrinalpandey:~/linux/linux$ sh get_permissions.sh | grep ^[7531] > temp
> > > > > 
> > > > > mrinalpandey@mrinalpandey:~/linux/linux$ cut -d ' ' -f 2 temp > executables
> > > > > 
> > > > > mrinalpandey@mrinalpandey:~/linux/linux$ cat first_line.sh
> > > > > #!/bin/bash
> > > > > file="executables"
> > > > > while IFS= read -r line
> > > > > do
> > > > >         firstline=`head -n 1 $line`
> > > > >         printf '%s:%s\n' "$firstline" "$line"
> > > > > done <"$file"
> > > > > 
> > > > > mrinalpandey@mrinalpandey:~/linux/linux$ cat executables | wc -l
> > > > > 611
> > > > > 
> > > > > mrinalpandey@mrinalpandey:~/linux/linux$ sh first_line.sh | grep ^#! | wc -l
> > > > > head: error reading 'scripts/dtc/include-prefixes/arc': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/arm': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/arm64': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/c6x': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/dt-bindings': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/h8300': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/microblaze': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/mips': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/nios2': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/openrisc': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/powerpc': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/sh': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/xtensa': Is a directory
> > > > > 540
> > > > > 
> > > > > We can see that there are 71 files where the executable bit is set but
> > > > > the first line is not a shebang. These include 13 directories which
> > > > > throw the error above. Remaining 58 files(earlier the number was 53)
> > > > > could be cleaned so that this heuristic works better as we saw. So, by
> > > > > checking only for the executable bit we can say that license should be
> > > > > on second line, we probably don't need to check for the shebang on line
> > > > > 1.
> > > > > Please let me know if the evaluation makes sense.
> > > > >
> > > 
> > > This evaluation makes sense to find the cases that should be cleaned up.
> > > 
> > > Either the executable flag is simply set wrongly and should be dropped or 
> > > it is actually a script and should get a shebang in the beginning.
> > > 
> > > I actually already started cleaning up. See:
> > > 
> > > https://lore.kernel.org/lkml/20200819081808.26796-1-lukas.bulwahn@gmail.com/
> > > 
> > > We can discuss how to continue this cleanup.
> > >
> > 
> > Sir,
> > 
> > Sure. I would love to discuss that.
> > Should I send cleanup related patches directly to lkml or do they need to
> > go through you, Shuah ma'am and mentee list first?
> > 
> > > However, you cannot use this evaluation to say the checking the executable 
> > > bit is enough, simply as from this evaluation, you do not know how many 
> > > files have a shebang in the first line and are not set with executable 
> > > flag. For those files, we expect the SPDX identifier in the second line 
> > > but your suggested approach would expect it in the first line.
> > > 
> > > So we need an evaluation to find out how many cases of that situation 
> > > exist?
> > > 
> > > Then, can we easily fix those as well?
> > 
> > Here is the evaluation I ran to find such cases:
> > 
> > mrinalpandey@mrinalpandey:~/linux/linux$ cat first_lines.sh
> > #!/bin/bash
> > 
> > for file in $(git ls-files)
> > do
> > 	permissions="$(stat -c '%a %n' $file)"
> > 	firstline="$(head -n 1 $file)"
> > 	echo "$permissions : $firstline"
> > done
> > mrinalpandey@mrinalpandey:~/linux/linux$ sh first_lines.sh | grep ^[642].*#! | wc -l
> > head: error reading 'scripts/dtc/include-prefixes/arc': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/arm': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/arm64': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/c6x': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/dt-bindings': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/h8300': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/microblaze': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/mips': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/nios2': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/openrisc': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/powerpc': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/sh': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/xtensa': Is a directory
> > 82
> > 
> > The last line in these 82 lines is "Binary file (standard input) matches" so we have 81 instances
> > where we have shebang in the first line but the file is non-executable.
> >
> 
> How about removing the symbolic link files and then sharing that list?

Sir,

I couldn't find any symbolic links in these 81 files. The list is
attached herewith.
The list is in the format:
<permissions> <file-name> : <first-line>

Thank you.
> 
> Lukas

[-- Attachment #1.1.2: make_executable --]
[-- Type: text/plain, Size: 4304 bytes --]

664 Documentation/admin-guide/aoe/autoload.sh : #!/bin/sh
664 Documentation/admin-guide/aoe/status.sh : #! /bin/sh
664 Documentation/arm64/kasan-offsets.sh : #!/bin/sh
664 Documentation/firmware_class/hotplug-script : #!/bin/sh
664 Documentation/networking/device_drivers/atm/cxacru-cf.py : #!/usr/bin/env python
664 Documentation/s390/config3270.sh : #!/bin/sh
664 Documentation/sphinx/parallel-wrapper.sh : #!/bin/sh
664 Documentation/trace/postprocess/decode_msr.py : #!/usr/bin/python
664 Documentation/trace/postprocess/trace-pagealloc-postprocess.pl : #!/usr/bin/perl
664 Documentation/trace/postprocess/trace-vmscan-postprocess.pl : #!/usr/bin/perl
664 arch/alpha/kernel/syscalls/syscallhdr.sh : #!/bin/sh
664 arch/alpha/kernel/syscalls/syscalltbl.sh : #!/bin/sh
664 arch/arm/boot/install.sh : #!/bin/sh
664 arch/arm/crypto/poly1305-armv4.pl : #!/usr/bin/env perl
664 arch/arm/crypto/sha256-armv4.pl : #!/usr/bin/env perl
664 arch/arm/crypto/sha512-armv4.pl : #!/usr/bin/env perl
664 arch/arm/tools/gen-mach-types : #!/bin/awk
664 arch/arm/tools/syscallhdr.sh : #!/bin/sh
664 arch/arm/tools/syscallnr.sh : #!/bin/sh
664 arch/arm/tools/syscalltbl.sh : #!/bin/sh
664 arch/arm64/boot/install.sh : #!/bin/sh
664 arch/arm64/crypto/poly1305-armv8.pl : #!/usr/bin/env perl
664 arch/arm64/crypto/sha512-armv8.pl : #! /usr/bin/env perl
664 arch/ia64/install.sh : #!/bin/sh
664 arch/ia64/kernel/syscalls/syscallhdr.sh : #!/bin/sh
664 arch/ia64/kernel/syscalls/syscalltbl.sh : #!/bin/sh
664 arch/ia64/scripts/unwcheck.py : #!/usr/bin/python
664 arch/m68k/install.sh : #!/bin/sh
664 arch/m68k/kernel/syscalls/syscallhdr.sh : #!/bin/sh
664 arch/m68k/kernel/syscalls/syscalltbl.sh : #!/bin/sh
664 arch/microblaze/kernel/syscalls/syscallhdr.sh : #!/bin/sh
664 arch/microblaze/kernel/syscalls/syscalltbl.sh : #!/bin/sh
664 arch/mips/crypto/poly1305-mips.pl : #!/usr/bin/env perl
664 arch/mips/kernel/syscalls/syscallhdr.sh : #!/bin/sh
664 arch/mips/kernel/syscalls/syscallnr.sh : #!/bin/sh
664 arch/mips/kernel/syscalls/syscalltbl.sh : #!/bin/sh
664 arch/nios2/boot/install.sh : #!/bin/sh
664 arch/parisc/boot/install.sh : #!/bin/sh
664 arch/parisc/install.sh : #!/bin/sh
664 arch/parisc/kernel/syscalls/syscallhdr.sh : #!/bin/sh
664 arch/parisc/kernel/syscalls/syscalltbl.sh : #!/bin/sh
664 arch/parisc/nm : #!/bin/sh
664 arch/powerpc/boot/install.sh : #!/bin/sh
664 arch/powerpc/kernel/prom_init_check.sh : #!/bin/sh
664 arch/powerpc/kernel/syscalls/syscallhdr.sh : #!/bin/sh
664 arch/powerpc/kernel/syscalls/syscalltbl.sh : #!/bin/sh
664 arch/powerpc/kernel/systbl_chk.sh : #!/bin/sh
664 arch/riscv/boot/install.sh : #!/bin/sh
664 arch/s390/boot/install.sh : #!/bin/sh
664 arch/sh/boot/compressed/install.sh : #!/bin/sh
664 arch/sh/kernel/syscalls/syscallhdr.sh : #!/bin/sh
664 arch/sh/kernel/syscalls/syscalltbl.sh : #!/bin/sh
664 arch/sh/tools/gen-mach-types : #!/bin/awk
664 arch/sparc/boot/install.sh : #!/bin/sh
664 arch/sparc/kernel/syscalls/syscallhdr.sh : #!/bin/sh
664 arch/sparc/kernel/syscalls/syscalltbl.sh : #!/bin/sh
664 arch/sparc/vdso/checkundef.sh : #!/bin/sh
664 arch/x86/boot/genimage.sh : #!/bin/sh
664 arch/x86/boot/install.sh : #!/bin/sh
664 arch/x86/crypto/poly1305-x86_64-cryptogams.pl : #!/usr/bin/env perl
664 arch/x86/entry/syscalls/syscallhdr.sh : #!/bin/sh
664 arch/x86/entry/syscalls/syscalltbl.sh : #!/bin/bash
664 arch/x86/kernel/cpu/mkcapflags.sh : #!/bin/sh
664 arch/x86/tools/gen-insn-attr-x86.awk : #!/bin/awk -f
664 arch/x86/tools/objdump_reformat.awk : #!/bin/awk -f
664 arch/x86/um/vdso/checkundef.sh : #!/bin/sh
664 arch/xtensa/kernel/syscalls/syscallhdr.sh : #!/bin/sh
664 arch/xtensa/kernel/syscalls/syscalltbl.sh : #!/bin/sh
664 drivers/block/paride/mkd : #!/bin/bash
664 drivers/crypto/vmx/aesp8-ppc.pl : #! /usr/bin/env perl
664 drivers/crypto/vmx/ghashp8-ppc.pl : #!/usr/bin/env perl
664 drivers/crypto/vmx/ppc-xlate.pl : #!/usr/bin/env perl
664 drivers/scsi/script_asm.pl : #!/usr/bin/perl -s
664 drivers/usb/serial/ezusb_convert.pl : #! /usr/bin/perl -w
664 samples/bpf/lwt_len_hist.sh : #!/bin/bash
664 samples/bpf/test_lwt_bpf.sh : #!/bin/bash
664 scripts/atomic/gen-atomics.sh : #!/bin/sh
664 scripts/gcc-plugins/gen-random-seed.sh : #!/bin/sh
664 scripts/nsdeps : #!/bin/sh
664 scripts/spdxcheck-test.sh : #!/bin/sh
664 scripts/xen-hypercalls.sh : #!/bin/sh

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

  reply	other threads:[~2020-08-25  4:54 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-03  7:58 [Linux-kernel-mentees] [PATCH] checkpatch: Improve SPDX license identifier check for script files Mrinal Pandey
2020-08-03 10:59 ` Lukas Bulwahn
2020-08-04 15:56   ` Mrinal Pandey
2020-08-04 19:37     ` Lukas Bulwahn
2020-08-09  7:22       ` Mrinal Pandey
2020-08-20  4:42         ` Mrinal Pandey
2020-08-22  8:24           ` Lukas Bulwahn
2020-08-22 19:25             ` Lukas Bulwahn
2020-08-24  8:35               ` Mrinal Pandey
2020-08-24  8:56                 ` Lukas Bulwahn
2020-08-25  4:56                   ` Mrinal Pandey
2020-08-25  5:54                     ` Lukas Bulwahn
2020-08-24  8:23             ` Mrinal Pandey
2020-08-24  8:54               ` Lukas Bulwahn
2020-08-25  4:53                 ` Mrinal Pandey [this message]
2020-09-06  4:27                   ` Mrinal Pandey
2020-09-07  7:10                     ` Lukas Bulwahn
2020-09-11 10:16                       ` Mrinal Pandey
2020-09-11 10:33                         ` Lukas Bulwahn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200825045358.hygtrkm6qxyiwjft@mrinalpandey \
    --to=mrinalmni@gmail.com \
    --cc=Linux-kernel-mentees@lists.linuxfoundation.org \
    --cc=lukas.bulwahn@gmail.com \
    --cc=skhan@linuxfoundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).