linux-kernel-mentees.lists.linuxfoundation.org archive mirror
 help / color / mirror / Atom feed
From: Lukas Bulwahn <lukas.bulwahn@gmail.com>
To: Mrinal Pandey <mrinalmni@gmail.com>
Cc: Linux-kernel-mentees@lists.linuxfoundation.org
Subject: Re: [Linux-kernel-mentees] [PATCH] checkpatch: Fix SPDX license check for scripts
Date: Fri, 24 Jul 2020 10:09:25 +0200 (CEST)	[thread overview]
Message-ID: <alpine.DEB.2.21.2007240956440.10209@felia> (raw)
In-Reply-To: <CAD1=X6mvdYOfGg__UNHKAskf8Oe17p9V2Rj8jEaRY1CbstoTCw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 6305 bytes --]



On Fri, 24 Jul 2020, Mrinal Pandey wrote:

> 
> 
> On Wed, Jul 22, 2020 at 3:51 PM Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:
> 
> 
>       On Tue, 21 Jul 2020, Mrinal Pandey wrote:
> 
>       > In all the scripts, the SPDX license should be on the second line,
>       > the first line being the shebang, but checkpatch issues a warning
>       > "Misplaced SPDX-License-Identifier tag - use line 1 instead" for the
>       > scripts that have SPDX license in the second line.
>       >
> 
>       That what you wrote is actually wrong. Checkpatch only wrongly warns if
>       the first line is not included in the diff, right? Otherwise, it does not
>       warn as intended.
> 
> 
> Sir,
> 
> Yes. I would modify the description accordingly.
> 
>       > However, this warning is not issued when checkpatch is run on a file.
>       > The case for files has been handled gracefully by checking first line of
>       > the file to be a shebang and then setting `$checklicenseline` to `2`but
>       > this doesn't work when we don't have shebang in diff content of a patch
>       > and `$checklicenseline` continues to be `1` in such cases. Therefore,
>       > checkpatch expects the line `1` to contain the SPDX license when it
>       > should have been `2` instead.
>       >
> 
>       It is described very complicated here. Maybe think about rephrasing it, so
>       it becomes more clear.
> 
> 
>       > I noticed this false positive while running checkpatch on the set of
>       > commits from v5.7 to v5.8-rc1 of the kernel on the commits which modified
>       > a script file.
>       >
> 
>       How about naming the commits and providing a statistics how often that
>       happens?
> 
>       > This false positive exists in checkpatch since commit a8da38a9cf0e
>       > ("checkpatch: add test for SPDX-License-Identifier on wrong line #")
>       > when the corresponding rule was first commited.
>       >
>       > Fix this by setting `$checklicenseline` to `2` whenever the file or diff
>       > content we are checking comes from a script instead of checking first
>       > line to be a shebang, thus, informing checkpatch that the SPDX license
>       > should be expected on the second line.
>       >
> 
>       Well, now the critical part:
> 
>       It seems that this now makes the situation worse compared to before.
> 
>       I quickly ran this evaluation:
> 
>       $ cat first-line.sh
>       #!/bin/bash
> 
>       for file in $(git ls-files)
>       do
>               firstline="$(head --silent -n 1 $file)"
>               echo "$file: $firstline"
>       done
> 
>       $ sh ./first-line.sh > first-lines
> 
>       $ grep "#!" first-lines | wc -l
>       810
> 
>       $ grep "#!" first-lines | sed 's/^\(.*[^\]*\):.*$/\1/' | grep -v "\."
> 
>       $ grep "#!" first-lines | sed 's/^\(.*[^\]*\):.*$/\1/' | grep -v "\." | wc
>       -l
>       120
> 
>       $ grep "#!" first-lines | sed 's/^\(.*[^\]*\):.*$/\1/' | rev | cut -d"/"
>       -f1 | rev | grep  "\." | cut -d"." -f2 | sort | uniq -c | sort -nr
>           509 sh
>            91 tc
>            45 pl
>            37 py
>             6 awk
>             1 config
> 
>       So, in 120 cases you actually would now warn on line 1 where the should be
>       a shebang as intended.
> 
>       Also, I cross-checked .yaml files:
> 
>       $ grep "\.yaml:" first-lines | grep "SPDX-License-Identifier"  | wc -l
>       1035
> 
>       $ grep "\.yaml:" first-lines | wc -l
>       1037
> 
> 
>       So, for yaml, SPDX is actually in line 1.
> 
> 
> Yes, I should remove the `yaml` files from this check.
> 
> 
>       I guess you need to think about this a bit more.
> 
> 
> If I am correct now we have an issue with all those files which are scripts without an extension.

Yes, correct.

> In this case, the earlier logic which I removed was working fine(now I get an idea of why it was like
> that in the first place). I think the best way is to keep both my suggested change and the logic that
> already exists so that we can at least avoid those false positives when we know the extension without
> essentially duplicating the logic.

You need to combine the previous heuristic with your further extension, 
making it overall better and try to keep the logic checking as simple as 
possible.

If the first line is there and a shebang, use it to make a decision. If 
the first line is not available, use another heuristic.

> Checking for files with no extension, no shebang and only SPDX in diff is tricky since we can't actually open the file.

You cannot open the file, but if a first line is available, you need to 
use that information.

I suggest that you evaluate your patch on a set of use cases with a clear 
report what was reported before and after applying your patch.

Lukas


> Please let me know your views.
>  
> Thank you.
> 
> 
>       > Signed-off-by: Mrinal Pandey <mrinalmni@gmail.com>
>       > ---
>       >  scripts/checkpatch.pl | 7 ++++---
>       >  1 file changed, 4 insertions(+), 3 deletions(-)
>       >
>       > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
>       > index 4c820607540b..bdd2f9a80891 100755
>       > --- a/scripts/checkpatch.pl
>       > +++ b/scripts/checkpatch.pl
>       > @@ -3166,10 +3166,11 @@ sub process {
>       >               }
>       > 
>       >  # check for using SPDX license tag at beginning of files
>       > +             if ($realfile =~ /.*\.\(py\|sh\|pl\|awk\|tc\|yaml\)/) {
>       > +                     $checklicenseline = 2;
>       > +             }
>       >               if ($realline == $checklicenseline) {
>       > -                     if ($rawline =~ /^[ \+]\s*\#\!\s*\//) {
>       > -                             $checklicenseline = 2;
>       > -                     } elsif ($rawline =~ /^\+/) {
>       > +                     if ($rawline =~ /^\+/) {
>       >                               my $comment = "";
>       >                               if ($realfile =~ /\.(h|s|S)$/) {
>       >                                       $comment = '/*';
>       > --
>       > 2.25.1
>       >
>       >
> 
> 
> 

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

  reply	other threads:[~2020-07-24  8:09 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-21  5:44 [Linux-kernel-mentees] [PATCH] checkpatch: Fix SPDX license check for scripts Mrinal Pandey
2020-07-22 10:20 ` Lukas Bulwahn
2020-07-24  7:02   ` Mrinal Pandey
2020-07-24  8:09     ` Lukas Bulwahn [this message]
  -- strict thread matches above, loose matches on Subject: below --
2020-07-13  9:57 Mrinal Pandey
2020-07-13 19:46 ` Lukas Bulwahn
2020-07-14  5:35   ` Mrinal Pandey
2020-07-14  6:03     ` Lukas Bulwahn
2020-07-16  5:15       ` Mrinal Pandey
2020-07-16  5:31         ` Lukas Bulwahn
2020-07-17  9:54           ` Mrinal Pandey
2020-07-17 11:47             ` Lukas Bulwahn
2020-07-19  6:27               ` Mrinal Pandey
2020-07-19  7:13                 ` Lukas Bulwahn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.21.2007240956440.10209@felia \
    --to=lukas.bulwahn@gmail.com \
    --cc=Linux-kernel-mentees@lists.linuxfoundation.org \
    --cc=mrinalmni@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).