linux-kernel-mentees.lists.linuxfoundation.org archive mirror
 help / color / mirror / Atom feed
* [Linux-kernel-mentees] [PATCH] checkpatch: Improve SPDX license identifier check for script files
@ 2020-08-03  7:58 Mrinal Pandey
  2020-08-03 10:59 ` Lukas Bulwahn
  0 siblings, 1 reply; 19+ messages in thread
From: Mrinal Pandey @ 2020-08-03  7:58 UTC (permalink / raw)
  To: lukas.bulwahn, skhan, Linux-kernel-mentees, mrinalmni


[-- Attachment #1.1: Type: text/plain, Size: 2491 bytes --]

The diff content includes the SPDX licensing information but excludes the
shebang when a change is made to a script file in commit 37f8173dd849
("locking/atomics: Flip fallbacks and  instrumentation") and commit
075c8aa79d54 ("selftests: forwarding: tc_actions.sh: add matchall mirror
test"). In these cases checkpatch issues a false positive warning:
"Misplaced SPDX-License-Identifier tag - use line 1 instead".

Currently, if checkpatch finds a shebang in line 1, it expects the
license identifier in line 2. However, this doesn't work when a shebang
isn't found on the line 1.

I noticed this false positive, while running checkpatch on the set of
commits from v5.7 to v5.8-rc1 of the kernel, on the said commits.
This false positive exists in checkpatch since commit a8da38a9cf0e
("checkpatch: add test for SPDX-License-Identifier on wrong line #")
when the corresponding rule was first added.

The alternatives considered to improve this check were looking the file
to be a script by either examining the file extension or file permissions.

The evaluation on former option resulted in 120 files which had a shebang
in the first line but no file extension. This didn't look like a promising
result and hence I dropped the idea of using this approach.

The evaluation on the latter approach shows that there are 53 files in the
kernel which have an executable bit set but don't have a shebang in the
first line.

At the first sight on these 53 files, it seems that they either have a
wrong file permission set or could be reasonably extended with a shebang
and SPDX license information. Thus, further cleanup in the repository
would make the latter approach to work even more precisely.

Hence, I chose to check the file permissions to determine if the file is a
script and notify checkpatch to expect SPDX on second line for such files.

Signed-off-by: Mrinal Pandey <mrinalmni@gmail.com>
---
 scripts/checkpatch.pl | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 4c820607540b..bae1dd824518 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3166,6 +3166,9 @@ sub process {
 		}
 
 # check for using SPDX license tag at beginning of files
+		if ($line =~ /^index\ .*\.\..*\ .*[7531]\d{0,2}$/) {
+			$checklicenseline = 2;
+		}
 		if ($realline == $checklicenseline) {
 			if ($rawline =~ /^[ \+]\s*\#\!\s*\//) {
 				$checklicenseline = 2;
-- 
2.25.1


[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: Improve SPDX license identifier check for script files
  2020-08-03  7:58 [Linux-kernel-mentees] [PATCH] checkpatch: Improve SPDX license identifier check for script files Mrinal Pandey
@ 2020-08-03 10:59 ` Lukas Bulwahn
  2020-08-04 15:56   ` Mrinal Pandey
  0 siblings, 1 reply; 19+ messages in thread
From: Lukas Bulwahn @ 2020-08-03 10:59 UTC (permalink / raw)
  To: Mrinal Pandey; +Cc: Linux-kernel-mentees



On Mon, 3 Aug 2020, Mrinal Pandey wrote:

> The diff content includes the SPDX licensing information but excludes the
> shebang when a change is made to a script file in commit 37f8173dd849
> ("locking/atomics: Flip fallbacks and  instrumentation") and commit
> 075c8aa79d54 ("selftests: forwarding: tc_actions.sh: add matchall mirror
> test"). In these cases checkpatch issues a false positive warning:
> "Misplaced SPDX-License-Identifier tag - use line 1 instead".
> 
> Currently, if checkpatch finds a shebang in line 1, it expects the
> license identifier in line 2. However, this doesn't work when a shebang
> isn't found on the line 1.

It does not work when the diff does not contain line 1, but only line 2,
because then the shebang check for line 1 cannot work.

> 
> I noticed this false positive, while running checkpatch on the set of
> commits from v5.7 to v5.8-rc1 of the kernel, on the said commits.
> This false positive exists in checkpatch since commit a8da38a9cf0e
> ("checkpatch: add test for SPDX-License-Identifier on wrong line #")
> when the corresponding rule was first added.
> 
> The alternatives considered to improve this check were looking the file
> to be a script by either examining the file extension or file permissions.
>

Make this sentence shorter. Try.
 
> The evaluation on former option resulted in 120 files which had a shebang
> in the first line but no file extension. This didn't look like a promising
> result and hence I dropped the idea of using this approach.
> 
> The evaluation on the latter approach shows that there are 53 files in the
> kernel which have an executable bit set but don't have a shebang in the
> first line.
> 
> At the first sight on these 53 files, it seems that they either have a
> wrong file permission set or could be reasonably extended with a shebang
> and SPDX license information. Thus, further cleanup in the repository
> would make the latter approach to work even more precisely.
> 
> Hence, I chose to check the file permissions to determine if the file is a
> script and notify checkpatch to expect SPDX on second line for such files.
>

There is no notification here. Think about better wording.
 
> Signed-off-by: Mrinal Pandey <mrinalmni@gmail.com>
> ---
>  scripts/checkpatch.pl | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 4c820607540b..bae1dd824518 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -3166,6 +3166,9 @@ sub process {
>  		}
>  
>  # check for using SPDX license tag at beginning of files
> +		if ($line =~ /^index\ .*\.\..*\ .*[7531]\d{0,2}$/) {
> +			$checklicenseline = 2;
> +		}

That check looks good now.

>  		if ($realline == $checklicenseline) {
>  			if ($rawline =~ /^[ \+]\s*\#\!\s*\//) {
>  				$checklicenseline = 2;

This is probably broken now. It should check for shebang in line 1 and 
then set checklicenseline to line 2, right?

> -- 
> 2.25.1
> 
> 
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: Improve SPDX license identifier check for script files
  2020-08-03 10:59 ` Lukas Bulwahn
@ 2020-08-04 15:56   ` Mrinal Pandey
  2020-08-04 19:37     ` Lukas Bulwahn
  0 siblings, 1 reply; 19+ messages in thread
From: Mrinal Pandey @ 2020-08-04 15:56 UTC (permalink / raw)
  To: Lukas Bulwahn, skhan, Linux-kernel-mentees


[-- Attachment #1.1: Type: text/plain, Size: 3882 bytes --]

On 20/08/03 12:59PM, Lukas Bulwahn wrote:
> 
> 
> On Mon, 3 Aug 2020, Mrinal Pandey wrote:
> 
> > The diff content includes the SPDX licensing information but excludes the
> > shebang when a change is made to a script file in commit 37f8173dd849
> > ("locking/atomics: Flip fallbacks and  instrumentation") and commit
> > 075c8aa79d54 ("selftests: forwarding: tc_actions.sh: add matchall mirror
> > test"). In these cases checkpatch issues a false positive warning:
> > "Misplaced SPDX-License-Identifier tag - use line 1 instead".
> > 
> > Currently, if checkpatch finds a shebang in line 1, it expects the
> > license identifier in line 2. However, this doesn't work when a shebang
> > isn't found on the line 1.
> 
> It does not work when the diff does not contain line 1, but only line 2,
> because then the shebang check for line 1 cannot work.
> 
> > 
> > I noticed this false positive, while running checkpatch on the set of
> > commits from v5.7 to v5.8-rc1 of the kernel, on the said commits.
> > This false positive exists in checkpatch since commit a8da38a9cf0e
> > ("checkpatch: add test for SPDX-License-Identifier on wrong line #")
> > when the corresponding rule was first added.
> > 
> > The alternatives considered to improve this check were looking the file
> > to be a script by either examining the file extension or file permissions.
> >
> 
> Make this sentence shorter. Try.
>  
> > The evaluation on former option resulted in 120 files which had a shebang
> > in the first line but no file extension. This didn't look like a promising
> > result and hence I dropped the idea of using this approach.
> > 
> > The evaluation on the latter approach shows that there are 53 files in the
> > kernel which have an executable bit set but don't have a shebang in the
> > first line.
> > 
> > At the first sight on these 53 files, it seems that they either have a
> > wrong file permission set or could be reasonably extended with a shebang
> > and SPDX license information. Thus, further cleanup in the repository
> > would make the latter approach to work even more precisely.
> > 
> > Hence, I chose to check the file permissions to determine if the file is a
> > script and notify checkpatch to expect SPDX on second line for such files.
> >
> 
> There is no notification here. Think about better wording.
>  
> > Signed-off-by: Mrinal Pandey <mrinalmni@gmail.com>
> > ---
> >  scripts/checkpatch.pl | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > index 4c820607540b..bae1dd824518 100755
> > --- a/scripts/checkpatch.pl
> > +++ b/scripts/checkpatch.pl
> > @@ -3166,6 +3166,9 @@ sub process {
> >  		}
> >  
> >  # check for using SPDX license tag at beginning of files
> > +		if ($line =~ /^index\ .*\.\..*\ .*[7531]\d{0,2}$/) {
> > +			$checklicenseline = 2;
> > +		}
> 
> That check looks good now.
> 
> >  		if ($realline == $checklicenseline) {
> >  			if ($rawline =~ /^[ \+]\s*\#\!\s*\//) {
> >  				$checklicenseline = 2;
> 
> This is probably broken now. It should check for shebang in line 1 and 
> then set checklicenseline to line 2, right?

Sir,

Should we remove this check? Earlier when I checked for file extension
we had 120 cases where this check was also needed but now we have a
better heuristic which is going to work for all cases where license
should be on line 2 irrespective of the fact that we know the first line
or not.

If I am missing out on something and we should not be removing this check,
then I suggest placing the new heuristics below this block so that it doesn't
interfere with the existing logic.

Please let me know which path should I go about and then I shall resend
the patch with the modified commit message.

Thank you.
> 
> > -- 
> > 2.25.1
> > 
> > 

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: Improve SPDX license identifier check for script files
  2020-08-04 15:56   ` Mrinal Pandey
@ 2020-08-04 19:37     ` Lukas Bulwahn
  2020-08-09  7:22       ` Mrinal Pandey
  0 siblings, 1 reply; 19+ messages in thread
From: Lukas Bulwahn @ 2020-08-04 19:37 UTC (permalink / raw)
  To: Mrinal Pandey; +Cc: Linux-kernel-mentees



On Tue, 4 Aug 2020, Mrinal Pandey wrote:

> On 20/08/03 12:59PM, Lukas Bulwahn wrote:
> > 
> > 
> > On Mon, 3 Aug 2020, Mrinal Pandey wrote:
> > 
> > > The diff content includes the SPDX licensing information but excludes the
> > > shebang when a change is made to a script file in commit 37f8173dd849
> > > ("locking/atomics: Flip fallbacks and  instrumentation") and commit
> > > 075c8aa79d54 ("selftests: forwarding: tc_actions.sh: add matchall mirror
> > > test"). In these cases checkpatch issues a false positive warning:
> > > "Misplaced SPDX-License-Identifier tag - use line 1 instead".
> > > 
> > > Currently, if checkpatch finds a shebang in line 1, it expects the
> > > license identifier in line 2. However, this doesn't work when a shebang
> > > isn't found on the line 1.
> > 
> > It does not work when the diff does not contain line 1, but only line 2,
> > because then the shebang check for line 1 cannot work.
> > 
> > > 
> > > I noticed this false positive, while running checkpatch on the set of
> > > commits from v5.7 to v5.8-rc1 of the kernel, on the said commits.
> > > This false positive exists in checkpatch since commit a8da38a9cf0e
> > > ("checkpatch: add test for SPDX-License-Identifier on wrong line #")
> > > when the corresponding rule was first added.
> > > 
> > > The alternatives considered to improve this check were looking the file
> > > to be a script by either examining the file extension or file permissions.
> > >
> > 
> > Make this sentence shorter. Try.
> >  
> > > The evaluation on former option resulted in 120 files which had a shebang
> > > in the first line but no file extension. This didn't look like a promising
> > > result and hence I dropped the idea of using this approach.
> > > 
> > > The evaluation on the latter approach shows that there are 53 files in the
> > > kernel which have an executable bit set but don't have a shebang in the
> > > first line.
> > > 
> > > At the first sight on these 53 files, it seems that they either have a
> > > wrong file permission set or could be reasonably extended with a shebang
> > > and SPDX license information. Thus, further cleanup in the repository
> > > would make the latter approach to work even more precisely.
> > > 
> > > Hence, I chose to check the file permissions to determine if the file is a
> > > script and notify checkpatch to expect SPDX on second line for such files.
> > >
> > 
> > There is no notification here. Think about better wording.
> >  
> > > Signed-off-by: Mrinal Pandey <mrinalmni@gmail.com>
> > > ---
> > >  scripts/checkpatch.pl | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > index 4c820607540b..bae1dd824518 100755
> > > --- a/scripts/checkpatch.pl
> > > +++ b/scripts/checkpatch.pl
> > > @@ -3166,6 +3166,9 @@ sub process {
> > >  		}
> > >  
> > >  # check for using SPDX license tag at beginning of files
> > > +		if ($line =~ /^index\ .*\.\..*\ .*[7531]\d{0,2}$/) {
> > > +			$checklicenseline = 2;
> > > +		}
> > 
> > That check looks good now.
> > 
> > >  		if ($realline == $checklicenseline) {
> > >  			if ($rawline =~ /^[ \+]\s*\#\!\s*\//) {
> > >  				$checklicenseline = 2;
> > 
> > This is probably broken now. It should check for shebang in line 1 and 
> > then set checklicenseline to line 2, right?
> 
> Sir,
> 
> Should we remove this check? Earlier when I checked for file extension
> we had 120 cases where this check was also needed but now we have a
> better heuristic which is going to work for all cases where license
> should be on line 2 irrespective of the fact that we know the first line
> or not.
>

Are you sure about that? Where is the evaluation that proves your point?

E.g., are all files that contain a shebang really with an executable flag?

Which commands did you run to check this?
 
> If I am missing out on something and we should not be removing this check,
> then I suggest placing the new heuristics below this block so that it doesn't
> interfere with the existing logic.
> 
> Please let me know which path should I go about and then I shall resend
> the patch with the modified commit message.
> 

Think about the strengths and weaknesses of the potential solutions, then 
show with some commands (as I did for example, for finding the first 
lines previously) that you can show that it practically makes a 
difference and you can numbers on those differences.

When you did that, send a new patch.

Lukas

> Thank you. > > 
> > > -- 
> > > 2.25.1
> > > 
> > > 
> 
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: Improve SPDX license identifier check for script files
  2020-08-04 19:37     ` Lukas Bulwahn
@ 2020-08-09  7:22       ` Mrinal Pandey
  2020-08-20  4:42         ` Mrinal Pandey
  0 siblings, 1 reply; 19+ messages in thread
From: Mrinal Pandey @ 2020-08-09  7:22 UTC (permalink / raw)
  To: Lukas Bulwahn, skhan, Linux-kernel-mentees


[-- Attachment #1.1: Type: text/plain, Size: 7181 bytes --]

On 20/08/04 09:37PM, Lukas Bulwahn wrote:
> 
> 
> On Tue, 4 Aug 2020, Mrinal Pandey wrote:
> 
> > On 20/08/03 12:59PM, Lukas Bulwahn wrote:
> > > 
> > > 
> > > On Mon, 3 Aug 2020, Mrinal Pandey wrote:
> > > 
> > > > The diff content includes the SPDX licensing information but excludes the
> > > > shebang when a change is made to a script file in commit 37f8173dd849
> > > > ("locking/atomics: Flip fallbacks and  instrumentation") and commit
> > > > 075c8aa79d54 ("selftests: forwarding: tc_actions.sh: add matchall mirror
> > > > test"). In these cases checkpatch issues a false positive warning:
> > > > "Misplaced SPDX-License-Identifier tag - use line 1 instead".
> > > > 
> > > > Currently, if checkpatch finds a shebang in line 1, it expects the
> > > > license identifier in line 2. However, this doesn't work when a shebang
> > > > isn't found on the line 1.
> > > 
> > > It does not work when the diff does not contain line 1, but only line 2,
> > > because then the shebang check for line 1 cannot work.
> > > 
> > > > 
> > > > I noticed this false positive, while running checkpatch on the set of
> > > > commits from v5.7 to v5.8-rc1 of the kernel, on the said commits.
> > > > This false positive exists in checkpatch since commit a8da38a9cf0e
> > > > ("checkpatch: add test for SPDX-License-Identifier on wrong line #")
> > > > when the corresponding rule was first added.
> > > > 
> > > > The alternatives considered to improve this check were looking the file
> > > > to be a script by either examining the file extension or file permissions.
> > > >
> > > 
> > > Make this sentence shorter. Try.
> > >  
> > > > The evaluation on former option resulted in 120 files which had a shebang
> > > > in the first line but no file extension. This didn't look like a promising
> > > > result and hence I dropped the idea of using this approach.
> > > > 
> > > > The evaluation on the latter approach shows that there are 53 files in the
> > > > kernel which have an executable bit set but don't have a shebang in the
> > > > first line.
> > > > 
> > > > At the first sight on these 53 files, it seems that they either have a
> > > > wrong file permission set or could be reasonably extended with a shebang
> > > > and SPDX license information. Thus, further cleanup in the repository
> > > > would make the latter approach to work even more precisely.
> > > > 
> > > > Hence, I chose to check the file permissions to determine if the file is a
> > > > script and notify checkpatch to expect SPDX on second line for such files.
> > > >
> > > 
> > > There is no notification here. Think about better wording.
> > >  
> > > > Signed-off-by: Mrinal Pandey <mrinalmni@gmail.com>
> > > > ---
> > > >  scripts/checkpatch.pl | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > > 
> > > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > > index 4c820607540b..bae1dd824518 100755
> > > > --- a/scripts/checkpatch.pl
> > > > +++ b/scripts/checkpatch.pl
> > > > @@ -3166,6 +3166,9 @@ sub process {
> > > >  		}
> > > >  
> > > >  # check for using SPDX license tag at beginning of files
> > > > +		if ($line =~ /^index\ .*\.\..*\ .*[7531]\d{0,2}$/) {
> > > > +			$checklicenseline = 2;
> > > > +		}
> > > 
> > > That check looks good now.
> > > 
> > > >  		if ($realline == $checklicenseline) {
> > > >  			if ($rawline =~ /^[ \+]\s*\#\!\s*\//) {
> > > >  				$checklicenseline = 2;
> > > 
> > > This is probably broken now. It should check for shebang in line 1 and 
> > > then set checklicenseline to line 2, right?
> > 
> > Sir,
> > 
> > Should we remove this check? Earlier when I checked for file extension
> > we had 120 cases where this check was also needed but now we have a
> > better heuristic which is going to work for all cases where license
> > should be on line 2 irrespective of the fact that we know the first line
> > or not.
> >
> 
> Are you sure about that? Where is the evaluation that proves your point?
> 
> E.g., are all files that contain a shebang really with an executable flag?
> 
> Which commands did you run to check this?
>  
> > If I am missing out on something and we should not be removing this check,
> > then I suggest placing the new heuristics below this block so that it doesn't
> > interfere with the existing logic.
> > 
> > Please let me know which path should I go about and then I shall resend
> > the patch with the modified commit message.
> > 
> 
> Think about the strengths and weaknesses of the potential solutions, then 
> show with some commands (as I did for example, for finding the first 
> lines previously) that you can show that it practically makes a 
> difference and you can numbers on those differences.
> 
> When you did that, send a new patch.
> 
> Lukas
> 
Sir,

I ran the evaluation as:

mrinalpandey@mrinalpandey:~/linux/linux$ cat get_permissions.sh
#!/bin/bash

for file in $(git ls-files)
do
        permissions="$(stat -c "%a %n" $file)"
        echo "$permissions"
done

mrinalpandey@mrinalpandey:~/linux/linux$ sh get_permissions.sh | grep ^[7531] > temp

mrinalpandey@mrinalpandey:~/linux/linux$ cut -d ' ' -f 2 temp > executables

mrinalpandey@mrinalpandey:~/linux/linux$ cat first_line.sh
#!/bin/bash
file="executables"
while IFS= read -r line
do
        firstline=`head -n 1 $line`
        printf '%s:%s\n' "$firstline" "$line"
done <"$file"

mrinalpandey@mrinalpandey:~/linux/linux$ cat executables | wc -l
611

mrinalpandey@mrinalpandey:~/linux/linux$ sh first_line.sh | grep ^#! | wc -l
head: error reading 'scripts/dtc/include-prefixes/arc': Is a directory
head: error reading 'scripts/dtc/include-prefixes/arm': Is a directory
head: error reading 'scripts/dtc/include-prefixes/arm64': Is a directory
head: error reading 'scripts/dtc/include-prefixes/c6x': Is a directory
head: error reading 'scripts/dtc/include-prefixes/dt-bindings': Is a directory
head: error reading 'scripts/dtc/include-prefixes/h8300': Is a directory
head: error reading 'scripts/dtc/include-prefixes/microblaze': Is a directory
head: error reading 'scripts/dtc/include-prefixes/mips': Is a directory
head: error reading 'scripts/dtc/include-prefixes/nios2': Is a directory
head: error reading 'scripts/dtc/include-prefixes/openrisc': Is a directory
head: error reading 'scripts/dtc/include-prefixes/powerpc': Is a directory
head: error reading 'scripts/dtc/include-prefixes/sh': Is a directory
head: error reading 'scripts/dtc/include-prefixes/xtensa': Is a directory
540

We can see that there are 71 files where the executable bit is set but
the first line is not a shebang. These include 13 directories which
throw the error above. Remaining 58 files(earlier the number was 53)
could be cleaned so that this heuristic works better as we saw. So, by
checking only for the executable bit we can say that license should be
on second line, we probably don't need to check for the shebang on line
1.
Please let me know if the evaluation makes sense.

Thank you.

> > Thank you. > > 
> > > > -- 
> > > > 2.25.1
> > > > 
> > > > 
> > 

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: Improve SPDX license identifier check for script files
  2020-08-09  7:22       ` Mrinal Pandey
@ 2020-08-20  4:42         ` Mrinal Pandey
  2020-08-22  8:24           ` Lukas Bulwahn
  0 siblings, 1 reply; 19+ messages in thread
From: Mrinal Pandey @ 2020-08-20  4:42 UTC (permalink / raw)
  To: Lukas Bulwahn, skhan, Linux-kernel-mentees


[-- Attachment #1.1: Type: text/plain, Size: 7766 bytes --]

On 20/08/09 12:52PM, Mrinal Pandey wrote:
> On 20/08/04 09:37PM, Lukas Bulwahn wrote:
> > 
> > 
> > On Tue, 4 Aug 2020, Mrinal Pandey wrote:
> > 
> > > On 20/08/03 12:59PM, Lukas Bulwahn wrote:
> > > > 
> > > > 
> > > > On Mon, 3 Aug 2020, Mrinal Pandey wrote:
> > > > 
> > > > > The diff content includes the SPDX licensing information but excludes the
> > > > > shebang when a change is made to a script file in commit 37f8173dd849
> > > > > ("locking/atomics: Flip fallbacks and  instrumentation") and commit
> > > > > 075c8aa79d54 ("selftests: forwarding: tc_actions.sh: add matchall mirror
> > > > > test"). In these cases checkpatch issues a false positive warning:
> > > > > "Misplaced SPDX-License-Identifier tag - use line 1 instead".
> > > > > 
> > > > > Currently, if checkpatch finds a shebang in line 1, it expects the
> > > > > license identifier in line 2. However, this doesn't work when a shebang
> > > > > isn't found on the line 1.
> > > > 
> > > > It does not work when the diff does not contain line 1, but only line 2,
> > > > because then the shebang check for line 1 cannot work.
> > > > 
> > > > > 
> > > > > I noticed this false positive, while running checkpatch on the set of
> > > > > commits from v5.7 to v5.8-rc1 of the kernel, on the said commits.
> > > > > This false positive exists in checkpatch since commit a8da38a9cf0e
> > > > > ("checkpatch: add test for SPDX-License-Identifier on wrong line #")
> > > > > when the corresponding rule was first added.
> > > > > 
> > > > > The alternatives considered to improve this check were looking the file
> > > > > to be a script by either examining the file extension or file permissions.
> > > > >
> > > > 
> > > > Make this sentence shorter. Try.
> > > >  
> > > > > The evaluation on former option resulted in 120 files which had a shebang
> > > > > in the first line but no file extension. This didn't look like a promising
> > > > > result and hence I dropped the idea of using this approach.
> > > > > 
> > > > > The evaluation on the latter approach shows that there are 53 files in the
> > > > > kernel which have an executable bit set but don't have a shebang in the
> > > > > first line.
> > > > > 
> > > > > At the first sight on these 53 files, it seems that they either have a
> > > > > wrong file permission set or could be reasonably extended with a shebang
> > > > > and SPDX license information. Thus, further cleanup in the repository
> > > > > would make the latter approach to work even more precisely.
> > > > > 
> > > > > Hence, I chose to check the file permissions to determine if the file is a
> > > > > script and notify checkpatch to expect SPDX on second line for such files.
> > > > >
> > > > 
> > > > There is no notification here. Think about better wording.
> > > >  
> > > > > Signed-off-by: Mrinal Pandey <mrinalmni@gmail.com>
> > > > > ---
> > > > >  scripts/checkpatch.pl | 3 +++
> > > > >  1 file changed, 3 insertions(+)
> > > > > 
> > > > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > > > index 4c820607540b..bae1dd824518 100755
> > > > > --- a/scripts/checkpatch.pl
> > > > > +++ b/scripts/checkpatch.pl
> > > > > @@ -3166,6 +3166,9 @@ sub process {
> > > > >  		}
> > > > >  
> > > > >  # check for using SPDX license tag at beginning of files
> > > > > +		if ($line =~ /^index\ .*\.\..*\ .*[7531]\d{0,2}$/) {
> > > > > +			$checklicenseline = 2;
> > > > > +		}
> > > > 
> > > > That check looks good now.
> > > > 
> > > > >  		if ($realline == $checklicenseline) {
> > > > >  			if ($rawline =~ /^[ \+]\s*\#\!\s*\//) {
> > > > >  				$checklicenseline = 2;
> > > > 
> > > > This is probably broken now. It should check for shebang in line 1 and 
> > > > then set checklicenseline to line 2, right?
> > > 
> > > Sir,
> > > 
> > > Should we remove this check? Earlier when I checked for file extension
> > > we had 120 cases where this check was also needed but now we have a
> > > better heuristic which is going to work for all cases where license
> > > should be on line 2 irrespective of the fact that we know the first line
> > > or not.
> > >
> > 
> > Are you sure about that? Where is the evaluation that proves your point?
> > 
> > E.g., are all files that contain a shebang really with an executable flag?
> > 
> > Which commands did you run to check this?
> >  
> > > If I am missing out on something and we should not be removing this check,
> > > then I suggest placing the new heuristics below this block so that it doesn't
> > > interfere with the existing logic.
> > > 
> > > Please let me know which path should I go about and then I shall resend
> > > the patch with the modified commit message.
> > > 
> > 
> > Think about the strengths and weaknesses of the potential solutions, then 
> > show with some commands (as I did for example, for finding the first 
> > lines previously) that you can show that it practically makes a 
> > difference and you can numbers on those differences.
> > 
> > When you did that, send a new patch.
> > 
> > Lukas
> > 
> Sir,
> 
> I ran the evaluation as:
> 
> mrinalpandey@mrinalpandey:~/linux/linux$ cat get_permissions.sh
> #!/bin/bash
> 
> for file in $(git ls-files)
> do
>         permissions="$(stat -c "%a %n" $file)"
>         echo "$permissions"
> done
> 
> mrinalpandey@mrinalpandey:~/linux/linux$ sh get_permissions.sh | grep ^[7531] > temp
> 
> mrinalpandey@mrinalpandey:~/linux/linux$ cut -d ' ' -f 2 temp > executables
> 
> mrinalpandey@mrinalpandey:~/linux/linux$ cat first_line.sh
> #!/bin/bash
> file="executables"
> while IFS= read -r line
> do
>         firstline=`head -n 1 $line`
>         printf '%s:%s\n' "$firstline" "$line"
> done <"$file"
> 
> mrinalpandey@mrinalpandey:~/linux/linux$ cat executables | wc -l
> 611
> 
> mrinalpandey@mrinalpandey:~/linux/linux$ sh first_line.sh | grep ^#! | wc -l
> head: error reading 'scripts/dtc/include-prefixes/arc': Is a directory
> head: error reading 'scripts/dtc/include-prefixes/arm': Is a directory
> head: error reading 'scripts/dtc/include-prefixes/arm64': Is a directory
> head: error reading 'scripts/dtc/include-prefixes/c6x': Is a directory
> head: error reading 'scripts/dtc/include-prefixes/dt-bindings': Is a directory
> head: error reading 'scripts/dtc/include-prefixes/h8300': Is a directory
> head: error reading 'scripts/dtc/include-prefixes/microblaze': Is a directory
> head: error reading 'scripts/dtc/include-prefixes/mips': Is a directory
> head: error reading 'scripts/dtc/include-prefixes/nios2': Is a directory
> head: error reading 'scripts/dtc/include-prefixes/openrisc': Is a directory
> head: error reading 'scripts/dtc/include-prefixes/powerpc': Is a directory
> head: error reading 'scripts/dtc/include-prefixes/sh': Is a directory
> head: error reading 'scripts/dtc/include-prefixes/xtensa': Is a directory
> 540
> 
> We can see that there are 71 files where the executable bit is set but
> the first line is not a shebang. These include 13 directories which
> throw the error above. Remaining 58 files(earlier the number was 53)
> could be cleaned so that this heuristic works better as we saw. So, by
> checking only for the executable bit we can say that license should be
> on second line, we probably don't need to check for the shebang on line
> 1.
> Please let me know if the evaluation makes sense.
> 
Sir,

Did you get the time to see this evaluation? Your comments would help me
to proceed with this particular patch.
Please let me know about this as time permits.

Thank you.

> Thank you.
> 
> > > Thank you. > > 
> > > > > -- 
> > > > > 2.25.1
> > > > > 
> > > > > 
> > > 



[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: Improve SPDX license identifier check for script files
  2020-08-20  4:42         ` Mrinal Pandey
@ 2020-08-22  8:24           ` Lukas Bulwahn
  2020-08-22 19:25             ` Lukas Bulwahn
  2020-08-24  8:23             ` Mrinal Pandey
  0 siblings, 2 replies; 19+ messages in thread
From: Lukas Bulwahn @ 2020-08-22  8:24 UTC (permalink / raw)
  To: Mrinal Pandey; +Cc: Linux-kernel-mentees



On Thu, 20 Aug 2020, Mrinal Pandey wrote:

> On 20/08/09 12:52PM, Mrinal Pandey wrote:
> > On 20/08/04 09:37PM, Lukas Bulwahn wrote:
> > > 
> > > 
> > > On Tue, 4 Aug 2020, Mrinal Pandey wrote:
> > > 
> > > > On 20/08/03 12:59PM, Lukas Bulwahn wrote:
> > > > > 
> > > > > 
> > > > > On Mon, 3 Aug 2020, Mrinal Pandey wrote:
> > > > > 
> > > > > > The diff content includes the SPDX licensing information but excludes the
> > > > > > shebang when a change is made to a script file in commit 37f8173dd849
> > > > > > ("locking/atomics: Flip fallbacks and  instrumentation") and commit
> > > > > > 075c8aa79d54 ("selftests: forwarding: tc_actions.sh: add matchall mirror
> > > > > > test"). In these cases checkpatch issues a false positive warning:
> > > > > > "Misplaced SPDX-License-Identifier tag - use line 1 instead".
> > > > > > 
> > > > > > Currently, if checkpatch finds a shebang in line 1, it expects the
> > > > > > license identifier in line 2. However, this doesn't work when a shebang
> > > > > > isn't found on the line 1.
> > > > > 
> > > > > It does not work when the diff does not contain line 1, but only line 2,
> > > > > because then the shebang check for line 1 cannot work.
> > > > > 
> > > > > > 
> > > > > > I noticed this false positive, while running checkpatch on the set of
> > > > > > commits from v5.7 to v5.8-rc1 of the kernel, on the said commits.
> > > > > > This false positive exists in checkpatch since commit a8da38a9cf0e
> > > > > > ("checkpatch: add test for SPDX-License-Identifier on wrong line #")
> > > > > > when the corresponding rule was first added.
> > > > > > 
> > > > > > The alternatives considered to improve this check were looking the file
> > > > > > to be a script by either examining the file extension or file permissions.
> > > > > >
> > > > > 
> > > > > Make this sentence shorter. Try.
> > > > >  
> > > > > > The evaluation on former option resulted in 120 files which had a shebang
> > > > > > in the first line but no file extension. This didn't look like a promising
> > > > > > result and hence I dropped the idea of using this approach.
> > > > > > 
> > > > > > The evaluation on the latter approach shows that there are 53 files in the
> > > > > > kernel which have an executable bit set but don't have a shebang in the
> > > > > > first line.
> > > > > > 
> > > > > > At the first sight on these 53 files, it seems that they either have a
> > > > > > wrong file permission set or could be reasonably extended with a shebang
> > > > > > and SPDX license information. Thus, further cleanup in the repository
> > > > > > would make the latter approach to work even more precisely.
> > > > > > 
> > > > > > Hence, I chose to check the file permissions to determine if the file is a
> > > > > > script and notify checkpatch to expect SPDX on second line for such files.
> > > > > >
> > > > > 
> > > > > There is no notification here. Think about better wording.
> > > > >  
> > > > > > Signed-off-by: Mrinal Pandey <mrinalmni@gmail.com>
> > > > > > ---
> > > > > >  scripts/checkpatch.pl | 3 +++
> > > > > >  1 file changed, 3 insertions(+)
> > > > > > 
> > > > > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > > > > index 4c820607540b..bae1dd824518 100755
> > > > > > --- a/scripts/checkpatch.pl
> > > > > > +++ b/scripts/checkpatch.pl
> > > > > > @@ -3166,6 +3166,9 @@ sub process {
> > > > > >  		}
> > > > > >  
> > > > > >  # check for using SPDX license tag at beginning of files
> > > > > > +		if ($line =~ /^index\ .*\.\..*\ .*[7531]\d{0,2}$/) {
> > > > > > +			$checklicenseline = 2;
> > > > > > +		}
> > > > > 
> > > > > That check looks good now.
> > > > > 
> > > > > >  		if ($realline == $checklicenseline) {
> > > > > >  			if ($rawline =~ /^[ \+]\s*\#\!\s*\//) {
> > > > > >  				$checklicenseline = 2;
> > > > > 
> > > > > This is probably broken now. It should check for shebang in line 1 and 
> > > > > then set checklicenseline to line 2, right?
> > > > 
> > > > Sir,
> > > > 
> > > > Should we remove this check? Earlier when I checked for file extension
> > > > we had 120 cases where this check was also needed but now we have a
> > > > better heuristic which is going to work for all cases where license
> > > > should be on line 2 irrespective of the fact that we know the first line
> > > > or not.
> > > >
> > > 
> > > Are you sure about that? Where is the evaluation that proves your point?
> > > 
> > > E.g., are all files that contain a shebang really with an executable flag?
> > > 
> > > Which commands did you run to check this?
> > >  
> > > > If I am missing out on something and we should not be removing this check,
> > > > then I suggest placing the new heuristics below this block so that it doesn't
> > > > interfere with the existing logic.
> > > > 
> > > > Please let me know which path should I go about and then I shall resend
> > > > the patch with the modified commit message.
> > > > 
> > > 
> > > Think about the strengths and weaknesses of the potential solutions, then 
> > > show with some commands (as I did for example, for finding the first 
> > > lines previously) that you can show that it practically makes a 
> > > difference and you can numbers on those differences.
> > > 
> > > When you did that, send a new patch.
> > > 
> > > Lukas
> > > 
> > Sir,
> > 
> > I ran the evaluation as:
> > 
> > mrinalpandey@mrinalpandey:~/linux/linux$ cat get_permissions.sh
> > #!/bin/bash
> > 
> > for file in $(git ls-files)
> > do
> >         permissions="$(stat -c "%a %n" $file)"
> >         echo "$permissions"
> > done
> > 
> > mrinalpandey@mrinalpandey:~/linux/linux$ sh get_permissions.sh | grep ^[7531] > temp
> > 
> > mrinalpandey@mrinalpandey:~/linux/linux$ cut -d ' ' -f 2 temp > executables
> > 
> > mrinalpandey@mrinalpandey:~/linux/linux$ cat first_line.sh
> > #!/bin/bash
> > file="executables"
> > while IFS= read -r line
> > do
> >         firstline=`head -n 1 $line`
> >         printf '%s:%s\n' "$firstline" "$line"
> > done <"$file"
> > 
> > mrinalpandey@mrinalpandey:~/linux/linux$ cat executables | wc -l
> > 611
> > 
> > mrinalpandey@mrinalpandey:~/linux/linux$ sh first_line.sh | grep ^#! | wc -l
> > head: error reading 'scripts/dtc/include-prefixes/arc': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/arm': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/arm64': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/c6x': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/dt-bindings': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/h8300': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/microblaze': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/mips': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/nios2': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/openrisc': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/powerpc': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/sh': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/xtensa': Is a directory
> > 540
> > 
> > We can see that there are 71 files where the executable bit is set but
> > the first line is not a shebang. These include 13 directories which
> > throw the error above. Remaining 58 files(earlier the number was 53)
> > could be cleaned so that this heuristic works better as we saw. So, by
> > checking only for the executable bit we can say that license should be
> > on second line, we probably don't need to check for the shebang on line
> > 1.
> > Please let me know if the evaluation makes sense.
> >

This evaluation makes sense to find the cases that should be cleaned up.

Either the executable flag is simply set wrongly and should be dropped or 
it is actually a script and should get a shebang in the beginning.

I actually already started cleaning up. See:

https://lore.kernel.org/lkml/20200819081808.26796-1-lukas.bulwahn@gmail.com/

We can discuss how to continue this cleanup.

However, you cannot use this evaluation to say the checking the executable 
bit is enough, simply as from this evaluation, you do not know how many 
files have a shebang in the first line and are not set with executable 
flag. For those files, we expect the SPDX identifier in the second line 
but your suggested approach would expect it in the first line.

So we need an evaluation to find out how many cases of that situation 
exist?

Then, can we easily fix those as well?

This is certainly good investigation and it leads to some cleanup, so let 
us continue here.

Lukas
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: Improve SPDX license identifier check for script files
  2020-08-22  8:24           ` Lukas Bulwahn
@ 2020-08-22 19:25             ` Lukas Bulwahn
  2020-08-24  8:35               ` Mrinal Pandey
  2020-08-24  8:23             ` Mrinal Pandey
  1 sibling, 1 reply; 19+ messages in thread
From: Lukas Bulwahn @ 2020-08-22 19:25 UTC (permalink / raw)
  To: Mrinal Pandey; +Cc: linux-kernel-mentees



On Sat, 22 Aug 2020, Lukas Bulwahn wrote:

> 
> 
> On Thu, 20 Aug 2020, Mrinal Pandey wrote:
> 
> > On 20/08/09 12:52PM, Mrinal Pandey wrote:
> > > On 20/08/04 09:37PM, Lukas Bulwahn wrote:
> > > > 
> > > > 
> > > > On Tue, 4 Aug 2020, Mrinal Pandey wrote:
> > > > 
> > > > > On 20/08/03 12:59PM, Lukas Bulwahn wrote:
> > > > > > 
> > > > > > 
> > > > > > On Mon, 3 Aug 2020, Mrinal Pandey wrote:
> > > > > > 
> > > > > > > The diff content includes the SPDX licensing information but excludes the
> > > > > > > shebang when a change is made to a script file in commit 37f8173dd849
> > > > > > > ("locking/atomics: Flip fallbacks and  instrumentation") and commit
> > > > > > > 075c8aa79d54 ("selftests: forwarding: tc_actions.sh: add matchall mirror
> > > > > > > test"). In these cases checkpatch issues a false positive warning:
> > > > > > > "Misplaced SPDX-License-Identifier tag - use line 1 instead".
> > > > > > > 
> > > > > > > Currently, if checkpatch finds a shebang in line 1, it expects the
> > > > > > > license identifier in line 2. However, this doesn't work when a shebang
> > > > > > > isn't found on the line 1.
> > > > > > 
> > > > > > It does not work when the diff does not contain line 1, but only line 2,
> > > > > > because then the shebang check for line 1 cannot work.
> > > > > > 
> > > > > > > 
> > > > > > > I noticed this false positive, while running checkpatch on the set of
> > > > > > > commits from v5.7 to v5.8-rc1 of the kernel, on the said commits.
> > > > > > > This false positive exists in checkpatch since commit a8da38a9cf0e
> > > > > > > ("checkpatch: add test for SPDX-License-Identifier on wrong line #")
> > > > > > > when the corresponding rule was first added.
> > > > > > > 
> > > > > > > The alternatives considered to improve this check were looking the file
> > > > > > > to be a script by either examining the file extension or file permissions.
> > > > > > >
> > > > > > 
> > > > > > Make this sentence shorter. Try.
> > > > > >  
> > > > > > > The evaluation on former option resulted in 120 files which had a shebang
> > > > > > > in the first line but no file extension. This didn't look like a promising
> > > > > > > result and hence I dropped the idea of using this approach.
> > > > > > > 
> > > > > > > The evaluation on the latter approach shows that there are 53 files in the
> > > > > > > kernel which have an executable bit set but don't have a shebang in the
> > > > > > > first line.
> > > > > > > 
> > > > > > > At the first sight on these 53 files, it seems that they either have a
> > > > > > > wrong file permission set or could be reasonably extended with a shebang
> > > > > > > and SPDX license information. Thus, further cleanup in the repository
> > > > > > > would make the latter approach to work even more precisely.
> > > > > > > 
> > > > > > > Hence, I chose to check the file permissions to determine if the file is a
> > > > > > > script and notify checkpatch to expect SPDX on second line for such files.
> > > > > > >
> > > > > > 
> > > > > > There is no notification here. Think about better wording.
> > > > > >  
> > > > > > > Signed-off-by: Mrinal Pandey <mrinalmni@gmail.com>
> > > > > > > ---
> > > > > > >  scripts/checkpatch.pl | 3 +++
> > > > > > >  1 file changed, 3 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > > > > > index 4c820607540b..bae1dd824518 100755
> > > > > > > --- a/scripts/checkpatch.pl
> > > > > > > +++ b/scripts/checkpatch.pl
> > > > > > > @@ -3166,6 +3166,9 @@ sub process {
> > > > > > >  		}
> > > > > > >  
> > > > > > >  # check for using SPDX license tag at beginning of files
> > > > > > > +		if ($line =~ /^index\ .*\.\..*\ .*[7531]\d{0,2}$/) {
> > > > > > > +			$checklicenseline = 2;
> > > > > > > +		}
> > > > > > 
> > > > > > That check looks good now.
> > > > > > 
> > > > > > >  		if ($realline == $checklicenseline) {
> > > > > > >  			if ($rawline =~ /^[ \+]\s*\#\!\s*\//) {
> > > > > > >  				$checklicenseline = 2;
> > > > > > 
> > > > > > This is probably broken now. It should check for shebang in line 1 and 
> > > > > > then set checklicenseline to line 2, right?
> > > > > 
> > > > > Sir,
> > > > > 
> > > > > Should we remove this check? Earlier when I checked for file extension
> > > > > we had 120 cases where this check was also needed but now we have a
> > > > > better heuristic which is going to work for all cases where license
> > > > > should be on line 2 irrespective of the fact that we know the first line
> > > > > or not.
> > > > >
> > > > 
> > > > Are you sure about that? Where is the evaluation that proves your point?
> > > > 
> > > > E.g., are all files that contain a shebang really with an executable flag?
> > > > 
> > > > Which commands did you run to check this?
> > > >  
> > > > > If I am missing out on something and we should not be removing this check,
> > > > > then I suggest placing the new heuristics below this block so that it doesn't
> > > > > interfere with the existing logic.
> > > > > 
> > > > > Please let me know which path should I go about and then I shall resend
> > > > > the patch with the modified commit message.
> > > > > 
> > > > 
> > > > Think about the strengths and weaknesses of the potential solutions, then 
> > > > show with some commands (as I did for example, for finding the first 
> > > > lines previously) that you can show that it practically makes a 
> > > > difference and you can numbers on those differences.
> > > > 
> > > > When you did that, send a new patch.
> > > > 
> > > > Lukas
> > > > 
> > > Sir,
> > > 
> > > I ran the evaluation as:
> > > 
> > > mrinalpandey@mrinalpandey:~/linux/linux$ cat get_permissions.sh
> > > #!/bin/bash
> > > 
> > > for file in $(git ls-files)
> > > do
> > >         permissions="$(stat -c "%a %n" $file)"
> > >         echo "$permissions"
> > > done
> > > 
> > > mrinalpandey@mrinalpandey:~/linux/linux$ sh get_permissions.sh | grep ^[7531] > temp
> > > 
> > > mrinalpandey@mrinalpandey:~/linux/linux$ cut -d ' ' -f 2 temp > executables
> > > 
> > > mrinalpandey@mrinalpandey:~/linux/linux$ cat first_line.sh
> > > #!/bin/bash
> > > file="executables"
> > > while IFS= read -r line
> > > do
> > >         firstline=`head -n 1 $line`
> > >         printf '%s:%s\n' "$firstline" "$line"
> > > done <"$file"
> > > 
> > > mrinalpandey@mrinalpandey:~/linux/linux$ cat executables | wc -l
> > > 611
> > > 
> > > mrinalpandey@mrinalpandey:~/linux/linux$ sh first_line.sh | grep ^#! | wc -l
> > > head: error reading 'scripts/dtc/include-prefixes/arc': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/arm': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/arm64': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/c6x': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/dt-bindings': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/h8300': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/microblaze': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/mips': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/nios2': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/openrisc': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/powerpc': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/sh': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/xtensa': Is a directory
> > > 540
> > > 
> > > We can see that there are 71 files where the executable bit is set but
> > > the first line is not a shebang. These include 13 directories which
> > > throw the error above. Remaining 58 files(earlier the number was 53)
> > > could be cleaned so that this heuristic works better as we saw. So, by
> > > checking only for the executable bit we can say that license should be
> > > on second line, we probably don't need to check for the shebang on line
> > > 1.
> > > Please let me know if the evaluation makes sense.
> > >
> 
> This evaluation makes sense to find the cases that should be cleaned up.
> 
> Either the executable flag is simply set wrongly and should be dropped or 
> it is actually a script and should get a shebang in the beginning.
> 
> I actually already started cleaning up. See:
> 
> https://lore.kernel.org/lkml/20200819081808.26796-1-lukas.bulwahn@gmail.com/
> 
> We can discuss how to continue this cleanup.
>

I had another look at the results of your script.

Just a minor improvement to that resulting list:

I think symbolic links in the repository are always of permission 777, and 
I think that is reasonable.

So maybe you can filter out the symbolic links in your get_permissions.sh?

Then, the list is probably down to a few 20 to 30 cases that should 
probably really be cleaned up.

Can you share that script and the results? Then, let us start cleaning 
up.

Lukas
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: Improve SPDX license identifier check for script files
  2020-08-22  8:24           ` Lukas Bulwahn
  2020-08-22 19:25             ` Lukas Bulwahn
@ 2020-08-24  8:23             ` Mrinal Pandey
  2020-08-24  8:54               ` Lukas Bulwahn
  1 sibling, 1 reply; 19+ messages in thread
From: Mrinal Pandey @ 2020-08-24  8:23 UTC (permalink / raw)
  To: Lukas Bulwahn, skhan, Linux-kernel-mentees, mrinalmni


[-- Attachment #1.1: Type: text/plain, Size: 10996 bytes --]

On 20/08/22 10:24AM, Lukas Bulwahn wrote:
> 
> 
> On Thu, 20 Aug 2020, Mrinal Pandey wrote:
> 
> > On 20/08/09 12:52PM, Mrinal Pandey wrote:
> > > On 20/08/04 09:37PM, Lukas Bulwahn wrote:
> > > > 
> > > > 
> > > > On Tue, 4 Aug 2020, Mrinal Pandey wrote:
> > > > 
> > > > > On 20/08/03 12:59PM, Lukas Bulwahn wrote:
> > > > > > 
> > > > > > 
> > > > > > On Mon, 3 Aug 2020, Mrinal Pandey wrote:
> > > > > > 
> > > > > > > The diff content includes the SPDX licensing information but excludes the
> > > > > > > shebang when a change is made to a script file in commit 37f8173dd849
> > > > > > > ("locking/atomics: Flip fallbacks and  instrumentation") and commit
> > > > > > > 075c8aa79d54 ("selftests: forwarding: tc_actions.sh: add matchall mirror
> > > > > > > test"). In these cases checkpatch issues a false positive warning:
> > > > > > > "Misplaced SPDX-License-Identifier tag - use line 1 instead".
> > > > > > > 
> > > > > > > Currently, if checkpatch finds a shebang in line 1, it expects the
> > > > > > > license identifier in line 2. However, this doesn't work when a shebang
> > > > > > > isn't found on the line 1.
> > > > > > 
> > > > > > It does not work when the diff does not contain line 1, but only line 2,
> > > > > > because then the shebang check for line 1 cannot work.
> > > > > > 
> > > > > > > 
> > > > > > > I noticed this false positive, while running checkpatch on the set of
> > > > > > > commits from v5.7 to v5.8-rc1 of the kernel, on the said commits.
> > > > > > > This false positive exists in checkpatch since commit a8da38a9cf0e
> > > > > > > ("checkpatch: add test for SPDX-License-Identifier on wrong line #")
> > > > > > > when the corresponding rule was first added.
> > > > > > > 
> > > > > > > The alternatives considered to improve this check were looking the file
> > > > > > > to be a script by either examining the file extension or file permissions.
> > > > > > >
> > > > > > 
> > > > > > Make this sentence shorter. Try.
> > > > > >  
> > > > > > > The evaluation on former option resulted in 120 files which had a shebang
> > > > > > > in the first line but no file extension. This didn't look like a promising
> > > > > > > result and hence I dropped the idea of using this approach.
> > > > > > > 
> > > > > > > The evaluation on the latter approach shows that there are 53 files in the
> > > > > > > kernel which have an executable bit set but don't have a shebang in the
> > > > > > > first line.
> > > > > > > 
> > > > > > > At the first sight on these 53 files, it seems that they either have a
> > > > > > > wrong file permission set or could be reasonably extended with a shebang
> > > > > > > and SPDX license information. Thus, further cleanup in the repository
> > > > > > > would make the latter approach to work even more precisely.
> > > > > > > 
> > > > > > > Hence, I chose to check the file permissions to determine if the file is a
> > > > > > > script and notify checkpatch to expect SPDX on second line for such files.
> > > > > > >
> > > > > > 
> > > > > > There is no notification here. Think about better wording.
> > > > > >  
> > > > > > > Signed-off-by: Mrinal Pandey <mrinalmni@gmail.com>
> > > > > > > ---
> > > > > > >  scripts/checkpatch.pl | 3 +++
> > > > > > >  1 file changed, 3 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > > > > > index 4c820607540b..bae1dd824518 100755
> > > > > > > --- a/scripts/checkpatch.pl
> > > > > > > +++ b/scripts/checkpatch.pl
> > > > > > > @@ -3166,6 +3166,9 @@ sub process {
> > > > > > >  		}
> > > > > > >  
> > > > > > >  # check for using SPDX license tag at beginning of files
> > > > > > > +		if ($line =~ /^index\ .*\.\..*\ .*[7531]\d{0,2}$/) {
> > > > > > > +			$checklicenseline = 2;
> > > > > > > +		}
> > > > > > 
> > > > > > That check looks good now.
> > > > > > 
> > > > > > >  		if ($realline == $checklicenseline) {
> > > > > > >  			if ($rawline =~ /^[ \+]\s*\#\!\s*\//) {
> > > > > > >  				$checklicenseline = 2;
> > > > > > 
> > > > > > This is probably broken now. It should check for shebang in line 1 and 
> > > > > > then set checklicenseline to line 2, right?
> > > > > 
> > > > > Sir,
> > > > > 
> > > > > Should we remove this check? Earlier when I checked for file extension
> > > > > we had 120 cases where this check was also needed but now we have a
> > > > > better heuristic which is going to work for all cases where license
> > > > > should be on line 2 irrespective of the fact that we know the first line
> > > > > or not.
> > > > >
> > > > 
> > > > Are you sure about that? Where is the evaluation that proves your point?
> > > > 
> > > > E.g., are all files that contain a shebang really with an executable flag?
> > > > 
> > > > Which commands did you run to check this?
> > > >  
> > > > > If I am missing out on something and we should not be removing this check,
> > > > > then I suggest placing the new heuristics below this block so that it doesn't
> > > > > interfere with the existing logic.
> > > > > 
> > > > > Please let me know which path should I go about and then I shall resend
> > > > > the patch with the modified commit message.
> > > > > 
> > > > 
> > > > Think about the strengths and weaknesses of the potential solutions, then 
> > > > show with some commands (as I did for example, for finding the first 
> > > > lines previously) that you can show that it practically makes a 
> > > > difference and you can numbers on those differences.
> > > > 
> > > > When you did that, send a new patch.
> > > > 
> > > > Lukas
> > > > 
> > > Sir,
> > > 
> > > I ran the evaluation as:
> > > 
> > > mrinalpandey@mrinalpandey:~/linux/linux$ cat get_permissions.sh
> > > #!/bin/bash
> > > 
> > > for file in $(git ls-files)
> > > do
> > >         permissions="$(stat -c "%a %n" $file)"
> > >         echo "$permissions"
> > > done
> > > 
> > > mrinalpandey@mrinalpandey:~/linux/linux$ sh get_permissions.sh | grep ^[7531] > temp
> > > 
> > > mrinalpandey@mrinalpandey:~/linux/linux$ cut -d ' ' -f 2 temp > executables
> > > 
> > > mrinalpandey@mrinalpandey:~/linux/linux$ cat first_line.sh
> > > #!/bin/bash
> > > file="executables"
> > > while IFS= read -r line
> > > do
> > >         firstline=`head -n 1 $line`
> > >         printf '%s:%s\n' "$firstline" "$line"
> > > done <"$file"
> > > 
> > > mrinalpandey@mrinalpandey:~/linux/linux$ cat executables | wc -l
> > > 611
> > > 
> > > mrinalpandey@mrinalpandey:~/linux/linux$ sh first_line.sh | grep ^#! | wc -l
> > > head: error reading 'scripts/dtc/include-prefixes/arc': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/arm': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/arm64': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/c6x': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/dt-bindings': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/h8300': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/microblaze': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/mips': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/nios2': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/openrisc': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/powerpc': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/sh': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/xtensa': Is a directory
> > > 540
> > > 
> > > We can see that there are 71 files where the executable bit is set but
> > > the first line is not a shebang. These include 13 directories which
> > > throw the error above. Remaining 58 files(earlier the number was 53)
> > > could be cleaned so that this heuristic works better as we saw. So, by
> > > checking only for the executable bit we can say that license should be
> > > on second line, we probably don't need to check for the shebang on line
> > > 1.
> > > Please let me know if the evaluation makes sense.
> > >
> 
> This evaluation makes sense to find the cases that should be cleaned up.
> 
> Either the executable flag is simply set wrongly and should be dropped or 
> it is actually a script and should get a shebang in the beginning.
> 
> I actually already started cleaning up. See:
> 
> https://lore.kernel.org/lkml/20200819081808.26796-1-lukas.bulwahn@gmail.com/
> 
> We can discuss how to continue this cleanup.
>

Sir,

Sure. I would love to discuss that.
Should I send cleanup related patches directly to lkml or do they need to
go through you, Shuah ma'am and mentee list first?

> However, you cannot use this evaluation to say the checking the executable 
> bit is enough, simply as from this evaluation, you do not know how many 
> files have a shebang in the first line and are not set with executable 
> flag. For those files, we expect the SPDX identifier in the second line 
> but your suggested approach would expect it in the first line.
> 
> So we need an evaluation to find out how many cases of that situation 
> exist?
> 
> Then, can we easily fix those as well?

Here is the evaluation I ran to find such cases:

mrinalpandey@mrinalpandey:~/linux/linux$ cat first_lines.sh
#!/bin/bash

for file in $(git ls-files)
do
	permissions="$(stat -c '%a %n' $file)"
	firstline="$(head -n 1 $file)"
	echo "$permissions : $firstline"
done
mrinalpandey@mrinalpandey:~/linux/linux$ sh first_lines.sh | grep ^[642].*#! | wc -l
head: error reading 'scripts/dtc/include-prefixes/arc': Is a directory
head: error reading 'scripts/dtc/include-prefixes/arm': Is a directory
head: error reading 'scripts/dtc/include-prefixes/arm64': Is a directory
head: error reading 'scripts/dtc/include-prefixes/c6x': Is a directory
head: error reading 'scripts/dtc/include-prefixes/dt-bindings': Is a directory
head: error reading 'scripts/dtc/include-prefixes/h8300': Is a directory
head: error reading 'scripts/dtc/include-prefixes/microblaze': Is a directory
head: error reading 'scripts/dtc/include-prefixes/mips': Is a directory
head: error reading 'scripts/dtc/include-prefixes/nios2': Is a directory
head: error reading 'scripts/dtc/include-prefixes/openrisc': Is a directory
head: error reading 'scripts/dtc/include-prefixes/powerpc': Is a directory
head: error reading 'scripts/dtc/include-prefixes/sh': Is a directory
head: error reading 'scripts/dtc/include-prefixes/xtensa': Is a directory
82

The last line in these 82 lines is "Binary file (standard input) matches" so we have 81 instances
where we have shebang in the first line but the file is non-executable.

> 
> This is certainly good investigation and it leads to some cleanup, so let 
> us continue here.
>
Thank you sir.

> Lukas

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: Improve SPDX license identifier check for script files
  2020-08-22 19:25             ` Lukas Bulwahn
@ 2020-08-24  8:35               ` Mrinal Pandey
  2020-08-24  8:56                 ` Lukas Bulwahn
  0 siblings, 1 reply; 19+ messages in thread
From: Mrinal Pandey @ 2020-08-24  8:35 UTC (permalink / raw)
  To: Lukas Bulwahn, skhan, Linux-kernel-mentees, mrinalmni


[-- Attachment #1.1: Type: text/plain, Size: 10470 bytes --]

On 20/08/22 09:25PM, Lukas Bulwahn wrote:
> 
> 
> On Sat, 22 Aug 2020, Lukas Bulwahn wrote:
> 
> > 
> > 
> > On Thu, 20 Aug 2020, Mrinal Pandey wrote:
> > 
> > > On 20/08/09 12:52PM, Mrinal Pandey wrote:
> > > > On 20/08/04 09:37PM, Lukas Bulwahn wrote:
> > > > > 
> > > > > 
> > > > > On Tue, 4 Aug 2020, Mrinal Pandey wrote:
> > > > > 
> > > > > > On 20/08/03 12:59PM, Lukas Bulwahn wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > On Mon, 3 Aug 2020, Mrinal Pandey wrote:
> > > > > > > 
> > > > > > > > The diff content includes the SPDX licensing information but excludes the
> > > > > > > > shebang when a change is made to a script file in commit 37f8173dd849
> > > > > > > > ("locking/atomics: Flip fallbacks and  instrumentation") and commit
> > > > > > > > 075c8aa79d54 ("selftests: forwarding: tc_actions.sh: add matchall mirror
> > > > > > > > test"). In these cases checkpatch issues a false positive warning:
> > > > > > > > "Misplaced SPDX-License-Identifier tag - use line 1 instead".
> > > > > > > > 
> > > > > > > > Currently, if checkpatch finds a shebang in line 1, it expects the
> > > > > > > > license identifier in line 2. However, this doesn't work when a shebang
> > > > > > > > isn't found on the line 1.
> > > > > > > 
> > > > > > > It does not work when the diff does not contain line 1, but only line 2,
> > > > > > > because then the shebang check for line 1 cannot work.
> > > > > > > 
> > > > > > > > 
> > > > > > > > I noticed this false positive, while running checkpatch on the set of
> > > > > > > > commits from v5.7 to v5.8-rc1 of the kernel, on the said commits.
> > > > > > > > This false positive exists in checkpatch since commit a8da38a9cf0e
> > > > > > > > ("checkpatch: add test for SPDX-License-Identifier on wrong line #")
> > > > > > > > when the corresponding rule was first added.
> > > > > > > > 
> > > > > > > > The alternatives considered to improve this check were looking the file
> > > > > > > > to be a script by either examining the file extension or file permissions.
> > > > > > > >
> > > > > > > 
> > > > > > > Make this sentence shorter. Try.
> > > > > > >  
> > > > > > > > The evaluation on former option resulted in 120 files which had a shebang
> > > > > > > > in the first line but no file extension. This didn't look like a promising
> > > > > > > > result and hence I dropped the idea of using this approach.
> > > > > > > > 
> > > > > > > > The evaluation on the latter approach shows that there are 53 files in the
> > > > > > > > kernel which have an executable bit set but don't have a shebang in the
> > > > > > > > first line.
> > > > > > > > 
> > > > > > > > At the first sight on these 53 files, it seems that they either have a
> > > > > > > > wrong file permission set or could be reasonably extended with a shebang
> > > > > > > > and SPDX license information. Thus, further cleanup in the repository
> > > > > > > > would make the latter approach to work even more precisely.
> > > > > > > > 
> > > > > > > > Hence, I chose to check the file permissions to determine if the file is a
> > > > > > > > script and notify checkpatch to expect SPDX on second line for such files.
> > > > > > > >
> > > > > > > 
> > > > > > > There is no notification here. Think about better wording.
> > > > > > >  
> > > > > > > > Signed-off-by: Mrinal Pandey <mrinalmni@gmail.com>
> > > > > > > > ---
> > > > > > > >  scripts/checkpatch.pl | 3 +++
> > > > > > > >  1 file changed, 3 insertions(+)
> > > > > > > > 
> > > > > > > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > > > > > > index 4c820607540b..bae1dd824518 100755
> > > > > > > > --- a/scripts/checkpatch.pl
> > > > > > > > +++ b/scripts/checkpatch.pl
> > > > > > > > @@ -3166,6 +3166,9 @@ sub process {
> > > > > > > >  		}
> > > > > > > >  
> > > > > > > >  # check for using SPDX license tag at beginning of files
> > > > > > > > +		if ($line =~ /^index\ .*\.\..*\ .*[7531]\d{0,2}$/) {
> > > > > > > > +			$checklicenseline = 2;
> > > > > > > > +		}
> > > > > > > 
> > > > > > > That check looks good now.
> > > > > > > 
> > > > > > > >  		if ($realline == $checklicenseline) {
> > > > > > > >  			if ($rawline =~ /^[ \+]\s*\#\!\s*\//) {
> > > > > > > >  				$checklicenseline = 2;
> > > > > > > 
> > > > > > > This is probably broken now. It should check for shebang in line 1 and 
> > > > > > > then set checklicenseline to line 2, right?
> > > > > > 
> > > > > > Sir,
> > > > > > 
> > > > > > Should we remove this check? Earlier when I checked for file extension
> > > > > > we had 120 cases where this check was also needed but now we have a
> > > > > > better heuristic which is going to work for all cases where license
> > > > > > should be on line 2 irrespective of the fact that we know the first line
> > > > > > or not.
> > > > > >
> > > > > 
> > > > > Are you sure about that? Where is the evaluation that proves your point?
> > > > > 
> > > > > E.g., are all files that contain a shebang really with an executable flag?
> > > > > 
> > > > > Which commands did you run to check this?
> > > > >  
> > > > > > If I am missing out on something and we should not be removing this check,
> > > > > > then I suggest placing the new heuristics below this block so that it doesn't
> > > > > > interfere with the existing logic.
> > > > > > 
> > > > > > Please let me know which path should I go about and then I shall resend
> > > > > > the patch with the modified commit message.
> > > > > > 
> > > > > 
> > > > > Think about the strengths and weaknesses of the potential solutions, then 
> > > > > show with some commands (as I did for example, for finding the first 
> > > > > lines previously) that you can show that it practically makes a 
> > > > > difference and you can numbers on those differences.
> > > > > 
> > > > > When you did that, send a new patch.
> > > > > 
> > > > > Lukas
> > > > > 
> > > > Sir,
> > > > 
> > > > I ran the evaluation as:
> > > > 
> > > > mrinalpandey@mrinalpandey:~/linux/linux$ cat get_permissions.sh
> > > > #!/bin/bash
> > > > 
> > > > for file in $(git ls-files)
> > > > do
> > > >         permissions="$(stat -c "%a %n" $file)"
> > > >         echo "$permissions"
> > > > done
> > > > 
> > > > mrinalpandey@mrinalpandey:~/linux/linux$ sh get_permissions.sh | grep ^[7531] > temp
> > > > 
> > > > mrinalpandey@mrinalpandey:~/linux/linux$ cut -d ' ' -f 2 temp > executables
> > > > 
> > > > mrinalpandey@mrinalpandey:~/linux/linux$ cat first_line.sh
> > > > #!/bin/bash
> > > > file="executables"
> > > > while IFS= read -r line
> > > > do
> > > >         firstline=`head -n 1 $line`
> > > >         printf '%s:%s\n' "$firstline" "$line"
> > > > done <"$file"
> > > > 
> > > > mrinalpandey@mrinalpandey:~/linux/linux$ cat executables | wc -l
> > > > 611
> > > > 
> > > > mrinalpandey@mrinalpandey:~/linux/linux$ sh first_line.sh | grep ^#! | wc -l
> > > > head: error reading 'scripts/dtc/include-prefixes/arc': Is a directory
> > > > head: error reading 'scripts/dtc/include-prefixes/arm': Is a directory
> > > > head: error reading 'scripts/dtc/include-prefixes/arm64': Is a directory
> > > > head: error reading 'scripts/dtc/include-prefixes/c6x': Is a directory
> > > > head: error reading 'scripts/dtc/include-prefixes/dt-bindings': Is a directory
> > > > head: error reading 'scripts/dtc/include-prefixes/h8300': Is a directory
> > > > head: error reading 'scripts/dtc/include-prefixes/microblaze': Is a directory
> > > > head: error reading 'scripts/dtc/include-prefixes/mips': Is a directory
> > > > head: error reading 'scripts/dtc/include-prefixes/nios2': Is a directory
> > > > head: error reading 'scripts/dtc/include-prefixes/openrisc': Is a directory
> > > > head: error reading 'scripts/dtc/include-prefixes/powerpc': Is a directory
> > > > head: error reading 'scripts/dtc/include-prefixes/sh': Is a directory
> > > > head: error reading 'scripts/dtc/include-prefixes/xtensa': Is a directory
> > > > 540
> > > > 
> > > > We can see that there are 71 files where the executable bit is set but
> > > > the first line is not a shebang. These include 13 directories which
> > > > throw the error above. Remaining 58 files(earlier the number was 53)
> > > > could be cleaned so that this heuristic works better as we saw. So, by
> > > > checking only for the executable bit we can say that license should be
> > > > on second line, we probably don't need to check for the shebang on line
> > > > 1.
> > > > Please let me know if the evaluation makes sense.
> > > >
> > 
> > This evaluation makes sense to find the cases that should be cleaned up.
> > 
> > Either the executable flag is simply set wrongly and should be dropped or 
> > it is actually a script and should get a shebang in the beginning.
> > 
> > I actually already started cleaning up. See:
> > 
> > https://lore.kernel.org/lkml/20200819081808.26796-1-lukas.bulwahn@gmail.com/
> > 
> > We can discuss how to continue this cleanup.
> >
> 
> I had another look at the results of your script.
> 
> Just a minor improvement to that resulting list:
> 
> I think symbolic links in the repository are always of permission 777, and 
> I think that is reasonable.
> 
> So maybe you can filter out the symbolic links in your get_permissions.sh?
> 
> Then, the list is probably down to a few 20 to 30 cases that should 
> probably really be cleaned up.
> 
> Can you share that script and the results? Then, let us start cleaning 
> up.

Sir,

Here is what I ran:

mrinalpandey@mrinalpandey:~/linux/linux$ cat get_permissions.sh
#!/bin/bash

for file in $(git ls-files)
do
	permissions="$(stat -c '%a %n' $file)"
	details="$(ls -l $file)"
	echo "$permissions $details"
done
mrinalpandey@mrinalpandey:~/linux/linux$ sh get_permissions.sh | grep ^[7531] | grep -v "\->" > temp
mrinalpandey@mrinalpandey:~/linux/linux$ cut -d ' ' -f 2 temp > executables
mrinalpandey@mrinalpandey:~/linux/linux$ cat executables | wc -l
574
mrinalpandey@mrinalpandey:~/linux/linux$ cat first_line.sh
#!/bin/bash
file="executables"
while IFS= read -r line
do
	firstline=`head -n 1 $line`
	printf '%s:%s\n' "$firstline" "$line"
done <"$file"
mrinalpandey@mrinalpandey:~/linux/linux$ sh first_line.sh | grep ^#! | wc -l
539

Hence, there are only 35 cases to be cleaned up.

Thank you.
> 
> Lukas

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: Improve SPDX license identifier check for script files
  2020-08-24  8:23             ` Mrinal Pandey
@ 2020-08-24  8:54               ` Lukas Bulwahn
  2020-08-25  4:53                 ` Mrinal Pandey
  0 siblings, 1 reply; 19+ messages in thread
From: Lukas Bulwahn @ 2020-08-24  8:54 UTC (permalink / raw)
  To: Mrinal Pandey; +Cc: linux-kernel-mentees



On Mon, 24 Aug 2020, Mrinal Pandey wrote:

> On 20/08/22 10:24AM, Lukas Bulwahn wrote:
> > 
> > 
> > On Thu, 20 Aug 2020, Mrinal Pandey wrote:
> > 
> > > On 20/08/09 12:52PM, Mrinal Pandey wrote:
> > > > On 20/08/04 09:37PM, Lukas Bulwahn wrote:
> > > > > 
> > > > > 
> > > > > On Tue, 4 Aug 2020, Mrinal Pandey wrote:
> > > > > 
> > > > > > On 20/08/03 12:59PM, Lukas Bulwahn wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > On Mon, 3 Aug 2020, Mrinal Pandey wrote:
> > > > > > > 
> > > > > > > > The diff content includes the SPDX licensing information but excludes the
> > > > > > > > shebang when a change is made to a script file in commit 37f8173dd849
> > > > > > > > ("locking/atomics: Flip fallbacks and  instrumentation") and commit
> > > > > > > > 075c8aa79d54 ("selftests: forwarding: tc_actions.sh: add matchall mirror
> > > > > > > > test"). In these cases checkpatch issues a false positive warning:
> > > > > > > > "Misplaced SPDX-License-Identifier tag - use line 1 instead".
> > > > > > > > 
> > > > > > > > Currently, if checkpatch finds a shebang in line 1, it expects the
> > > > > > > > license identifier in line 2. However, this doesn't work when a shebang
> > > > > > > > isn't found on the line 1.
> > > > > > > 
> > > > > > > It does not work when the diff does not contain line 1, but only line 2,
> > > > > > > because then the shebang check for line 1 cannot work.
> > > > > > > 
> > > > > > > > 
> > > > > > > > I noticed this false positive, while running checkpatch on the set of
> > > > > > > > commits from v5.7 to v5.8-rc1 of the kernel, on the said commits.
> > > > > > > > This false positive exists in checkpatch since commit a8da38a9cf0e
> > > > > > > > ("checkpatch: add test for SPDX-License-Identifier on wrong line #")
> > > > > > > > when the corresponding rule was first added.
> > > > > > > > 
> > > > > > > > The alternatives considered to improve this check were looking the file
> > > > > > > > to be a script by either examining the file extension or file permissions.
> > > > > > > >
> > > > > > > 
> > > > > > > Make this sentence shorter. Try.
> > > > > > >  
> > > > > > > > The evaluation on former option resulted in 120 files which had a shebang
> > > > > > > > in the first line but no file extension. This didn't look like a promising
> > > > > > > > result and hence I dropped the idea of using this approach.
> > > > > > > > 
> > > > > > > > The evaluation on the latter approach shows that there are 53 files in the
> > > > > > > > kernel which have an executable bit set but don't have a shebang in the
> > > > > > > > first line.
> > > > > > > > 
> > > > > > > > At the first sight on these 53 files, it seems that they either have a
> > > > > > > > wrong file permission set or could be reasonably extended with a shebang
> > > > > > > > and SPDX license information. Thus, further cleanup in the repository
> > > > > > > > would make the latter approach to work even more precisely.
> > > > > > > > 
> > > > > > > > Hence, I chose to check the file permissions to determine if the file is a
> > > > > > > > script and notify checkpatch to expect SPDX on second line for such files.
> > > > > > > >
> > > > > > > 
> > > > > > > There is no notification here. Think about better wording.
> > > > > > >  
> > > > > > > > Signed-off-by: Mrinal Pandey <mrinalmni@gmail.com>
> > > > > > > > ---
> > > > > > > >  scripts/checkpatch.pl | 3 +++
> > > > > > > >  1 file changed, 3 insertions(+)
> > > > > > > > 
> > > > > > > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > > > > > > index 4c820607540b..bae1dd824518 100755
> > > > > > > > --- a/scripts/checkpatch.pl
> > > > > > > > +++ b/scripts/checkpatch.pl
> > > > > > > > @@ -3166,6 +3166,9 @@ sub process {
> > > > > > > >  		}
> > > > > > > >  
> > > > > > > >  # check for using SPDX license tag at beginning of files
> > > > > > > > +		if ($line =~ /^index\ .*\.\..*\ .*[7531]\d{0,2}$/) {
> > > > > > > > +			$checklicenseline = 2;
> > > > > > > > +		}
> > > > > > > 
> > > > > > > That check looks good now.
> > > > > > > 
> > > > > > > >  		if ($realline == $checklicenseline) {
> > > > > > > >  			if ($rawline =~ /^[ \+]\s*\#\!\s*\//) {
> > > > > > > >  				$checklicenseline = 2;
> > > > > > > 
> > > > > > > This is probably broken now. It should check for shebang in line 1 and 
> > > > > > > then set checklicenseline to line 2, right?
> > > > > > 
> > > > > > Sir,
> > > > > > 
> > > > > > Should we remove this check? Earlier when I checked for file extension
> > > > > > we had 120 cases where this check was also needed but now we have a
> > > > > > better heuristic which is going to work for all cases where license
> > > > > > should be on line 2 irrespective of the fact that we know the first line
> > > > > > or not.
> > > > > >
> > > > > 
> > > > > Are you sure about that? Where is the evaluation that proves your point?
> > > > > 
> > > > > E.g., are all files that contain a shebang really with an executable flag?
> > > > > 
> > > > > Which commands did you run to check this?
> > > > >  
> > > > > > If I am missing out on something and we should not be removing this check,
> > > > > > then I suggest placing the new heuristics below this block so that it doesn't
> > > > > > interfere with the existing logic.
> > > > > > 
> > > > > > Please let me know which path should I go about and then I shall resend
> > > > > > the patch with the modified commit message.
> > > > > > 
> > > > > 
> > > > > Think about the strengths and weaknesses of the potential solutions, then 
> > > > > show with some commands (as I did for example, for finding the first 
> > > > > lines previously) that you can show that it practically makes a 
> > > > > difference and you can numbers on those differences.
> > > > > 
> > > > > When you did that, send a new patch.
> > > > > 
> > > > > Lukas
> > > > > 
> > > > Sir,
> > > > 
> > > > I ran the evaluation as:
> > > > 
> > > > mrinalpandey@mrinalpandey:~/linux/linux$ cat get_permissions.sh
> > > > #!/bin/bash
> > > > 
> > > > for file in $(git ls-files)
> > > > do
> > > >         permissions="$(stat -c "%a %n" $file)"
> > > >         echo "$permissions"
> > > > done
> > > > 
> > > > mrinalpandey@mrinalpandey:~/linux/linux$ sh get_permissions.sh | grep ^[7531] > temp
> > > > 
> > > > mrinalpandey@mrinalpandey:~/linux/linux$ cut -d ' ' -f 2 temp > executables
> > > > 
> > > > mrinalpandey@mrinalpandey:~/linux/linux$ cat first_line.sh
> > > > #!/bin/bash
> > > > file="executables"
> > > > while IFS= read -r line
> > > > do
> > > >         firstline=`head -n 1 $line`
> > > >         printf '%s:%s\n' "$firstline" "$line"
> > > > done <"$file"
> > > > 
> > > > mrinalpandey@mrinalpandey:~/linux/linux$ cat executables | wc -l
> > > > 611
> > > > 
> > > > mrinalpandey@mrinalpandey:~/linux/linux$ sh first_line.sh | grep ^#! | wc -l
> > > > head: error reading 'scripts/dtc/include-prefixes/arc': Is a directory
> > > > head: error reading 'scripts/dtc/include-prefixes/arm': Is a directory
> > > > head: error reading 'scripts/dtc/include-prefixes/arm64': Is a directory
> > > > head: error reading 'scripts/dtc/include-prefixes/c6x': Is a directory
> > > > head: error reading 'scripts/dtc/include-prefixes/dt-bindings': Is a directory
> > > > head: error reading 'scripts/dtc/include-prefixes/h8300': Is a directory
> > > > head: error reading 'scripts/dtc/include-prefixes/microblaze': Is a directory
> > > > head: error reading 'scripts/dtc/include-prefixes/mips': Is a directory
> > > > head: error reading 'scripts/dtc/include-prefixes/nios2': Is a directory
> > > > head: error reading 'scripts/dtc/include-prefixes/openrisc': Is a directory
> > > > head: error reading 'scripts/dtc/include-prefixes/powerpc': Is a directory
> > > > head: error reading 'scripts/dtc/include-prefixes/sh': Is a directory
> > > > head: error reading 'scripts/dtc/include-prefixes/xtensa': Is a directory
> > > > 540
> > > > 
> > > > We can see that there are 71 files where the executable bit is set but
> > > > the first line is not a shebang. These include 13 directories which
> > > > throw the error above. Remaining 58 files(earlier the number was 53)
> > > > could be cleaned so that this heuristic works better as we saw. So, by
> > > > checking only for the executable bit we can say that license should be
> > > > on second line, we probably don't need to check for the shebang on line
> > > > 1.
> > > > Please let me know if the evaluation makes sense.
> > > >
> > 
> > This evaluation makes sense to find the cases that should be cleaned up.
> > 
> > Either the executable flag is simply set wrongly and should be dropped or 
> > it is actually a script and should get a shebang in the beginning.
> > 
> > I actually already started cleaning up. See:
> > 
> > https://lore.kernel.org/lkml/20200819081808.26796-1-lukas.bulwahn@gmail.com/
> > 
> > We can discuss how to continue this cleanup.
> >
> 
> Sir,
> 
> Sure. I would love to discuss that.
> Should I send cleanup related patches directly to lkml or do they need to
> go through you, Shuah ma'am and mentee list first?
> 
> > However, you cannot use this evaluation to say the checking the executable 
> > bit is enough, simply as from this evaluation, you do not know how many 
> > files have a shebang in the first line and are not set with executable 
> > flag. For those files, we expect the SPDX identifier in the second line 
> > but your suggested approach would expect it in the first line.
> > 
> > So we need an evaluation to find out how many cases of that situation 
> > exist?
> > 
> > Then, can we easily fix those as well?
> 
> Here is the evaluation I ran to find such cases:
> 
> mrinalpandey@mrinalpandey:~/linux/linux$ cat first_lines.sh
> #!/bin/bash
> 
> for file in $(git ls-files)
> do
> 	permissions="$(stat -c '%a %n' $file)"
> 	firstline="$(head -n 1 $file)"
> 	echo "$permissions : $firstline"
> done
> mrinalpandey@mrinalpandey:~/linux/linux$ sh first_lines.sh | grep ^[642].*#! | wc -l
> head: error reading 'scripts/dtc/include-prefixes/arc': Is a directory
> head: error reading 'scripts/dtc/include-prefixes/arm': Is a directory
> head: error reading 'scripts/dtc/include-prefixes/arm64': Is a directory
> head: error reading 'scripts/dtc/include-prefixes/c6x': Is a directory
> head: error reading 'scripts/dtc/include-prefixes/dt-bindings': Is a directory
> head: error reading 'scripts/dtc/include-prefixes/h8300': Is a directory
> head: error reading 'scripts/dtc/include-prefixes/microblaze': Is a directory
> head: error reading 'scripts/dtc/include-prefixes/mips': Is a directory
> head: error reading 'scripts/dtc/include-prefixes/nios2': Is a directory
> head: error reading 'scripts/dtc/include-prefixes/openrisc': Is a directory
> head: error reading 'scripts/dtc/include-prefixes/powerpc': Is a directory
> head: error reading 'scripts/dtc/include-prefixes/sh': Is a directory
> head: error reading 'scripts/dtc/include-prefixes/xtensa': Is a directory
> 82
> 
> The last line in these 82 lines is "Binary file (standard input) matches" so we have 81 instances
> where we have shebang in the first line but the file is non-executable.
>

How about removing the symbolic link files and then sharing that list?

Lukas
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: Improve SPDX license identifier check for script files
  2020-08-24  8:35               ` Mrinal Pandey
@ 2020-08-24  8:56                 ` Lukas Bulwahn
  2020-08-25  4:56                   ` Mrinal Pandey
  0 siblings, 1 reply; 19+ messages in thread
From: Lukas Bulwahn @ 2020-08-24  8:56 UTC (permalink / raw)
  To: Mrinal Pandey; +Cc: Linux-kernel-mentees



On Mon, 24 Aug 2020, Mrinal Pandey wrote:

> On 20/08/22 09:25PM, Lukas Bulwahn wrote:
> > 
> > 
> > On Sat, 22 Aug 2020, Lukas Bulwahn wrote:
> > 
> > > 
> > > 
> > > On Thu, 20 Aug 2020, Mrinal Pandey wrote:
> > > 
> > > > On 20/08/09 12:52PM, Mrinal Pandey wrote:
> > > > > On 20/08/04 09:37PM, Lukas Bulwahn wrote:
> > > > > > 
> > > > > > 
> > > > > > On Tue, 4 Aug 2020, Mrinal Pandey wrote:
> > > > > > 
> > > > > > > On 20/08/03 12:59PM, Lukas Bulwahn wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > On Mon, 3 Aug 2020, Mrinal Pandey wrote:
> > > > > > > > 
> > > > > > > > > The diff content includes the SPDX licensing information but excludes the
> > > > > > > > > shebang when a change is made to a script file in commit 37f8173dd849
> > > > > > > > > ("locking/atomics: Flip fallbacks and  instrumentation") and commit
> > > > > > > > > 075c8aa79d54 ("selftests: forwarding: tc_actions.sh: add matchall mirror
> > > > > > > > > test"). In these cases checkpatch issues a false positive warning:
> > > > > > > > > "Misplaced SPDX-License-Identifier tag - use line 1 instead".
> > > > > > > > > 
> > > > > > > > > Currently, if checkpatch finds a shebang in line 1, it expects the
> > > > > > > > > license identifier in line 2. However, this doesn't work when a shebang
> > > > > > > > > isn't found on the line 1.
> > > > > > > > 
> > > > > > > > It does not work when the diff does not contain line 1, but only line 2,
> > > > > > > > because then the shebang check for line 1 cannot work.
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > I noticed this false positive, while running checkpatch on the set of
> > > > > > > > > commits from v5.7 to v5.8-rc1 of the kernel, on the said commits.
> > > > > > > > > This false positive exists in checkpatch since commit a8da38a9cf0e
> > > > > > > > > ("checkpatch: add test for SPDX-License-Identifier on wrong line #")
> > > > > > > > > when the corresponding rule was first added.
> > > > > > > > > 
> > > > > > > > > The alternatives considered to improve this check were looking the file
> > > > > > > > > to be a script by either examining the file extension or file permissions.
> > > > > > > > >
> > > > > > > > 
> > > > > > > > Make this sentence shorter. Try.
> > > > > > > >  
> > > > > > > > > The evaluation on former option resulted in 120 files which had a shebang
> > > > > > > > > in the first line but no file extension. This didn't look like a promising
> > > > > > > > > result and hence I dropped the idea of using this approach.
> > > > > > > > > 
> > > > > > > > > The evaluation on the latter approach shows that there are 53 files in the
> > > > > > > > > kernel which have an executable bit set but don't have a shebang in the
> > > > > > > > > first line.
> > > > > > > > > 
> > > > > > > > > At the first sight on these 53 files, it seems that they either have a
> > > > > > > > > wrong file permission set or could be reasonably extended with a shebang
> > > > > > > > > and SPDX license information. Thus, further cleanup in the repository
> > > > > > > > > would make the latter approach to work even more precisely.
> > > > > > > > > 
> > > > > > > > > Hence, I chose to check the file permissions to determine if the file is a
> > > > > > > > > script and notify checkpatch to expect SPDX on second line for such files.
> > > > > > > > >
> > > > > > > > 
> > > > > > > > There is no notification here. Think about better wording.
> > > > > > > >  
> > > > > > > > > Signed-off-by: Mrinal Pandey <mrinalmni@gmail.com>
> > > > > > > > > ---
> > > > > > > > >  scripts/checkpatch.pl | 3 +++
> > > > > > > > >  1 file changed, 3 insertions(+)
> > > > > > > > > 
> > > > > > > > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > > > > > > > index 4c820607540b..bae1dd824518 100755
> > > > > > > > > --- a/scripts/checkpatch.pl
> > > > > > > > > +++ b/scripts/checkpatch.pl
> > > > > > > > > @@ -3166,6 +3166,9 @@ sub process {
> > > > > > > > >  		}
> > > > > > > > >  
> > > > > > > > >  # check for using SPDX license tag at beginning of files
> > > > > > > > > +		if ($line =~ /^index\ .*\.\..*\ .*[7531]\d{0,2}$/) {
> > > > > > > > > +			$checklicenseline = 2;
> > > > > > > > > +		}
> > > > > > > > 
> > > > > > > > That check looks good now.
> > > > > > > > 
> > > > > > > > >  		if ($realline == $checklicenseline) {
> > > > > > > > >  			if ($rawline =~ /^[ \+]\s*\#\!\s*\//) {
> > > > > > > > >  				$checklicenseline = 2;
> > > > > > > > 
> > > > > > > > This is probably broken now. It should check for shebang in line 1 and 
> > > > > > > > then set checklicenseline to line 2, right?
> > > > > > > 
> > > > > > > Sir,
> > > > > > > 
> > > > > > > Should we remove this check? Earlier when I checked for file extension
> > > > > > > we had 120 cases where this check was also needed but now we have a
> > > > > > > better heuristic which is going to work for all cases where license
> > > > > > > should be on line 2 irrespective of the fact that we know the first line
> > > > > > > or not.
> > > > > > >
> > > > > > 
> > > > > > Are you sure about that? Where is the evaluation that proves your point?
> > > > > > 
> > > > > > E.g., are all files that contain a shebang really with an executable flag?
> > > > > > 
> > > > > > Which commands did you run to check this?
> > > > > >  
> > > > > > > If I am missing out on something and we should not be removing this check,
> > > > > > > then I suggest placing the new heuristics below this block so that it doesn't
> > > > > > > interfere with the existing logic.
> > > > > > > 
> > > > > > > Please let me know which path should I go about and then I shall resend
> > > > > > > the patch with the modified commit message.
> > > > > > > 
> > > > > > 
> > > > > > Think about the strengths and weaknesses of the potential solutions, then 
> > > > > > show with some commands (as I did for example, for finding the first 
> > > > > > lines previously) that you can show that it practically makes a 
> > > > > > difference and you can numbers on those differences.
> > > > > > 
> > > > > > When you did that, send a new patch.
> > > > > > 
> > > > > > Lukas
> > > > > > 
> > > > > Sir,
> > > > > 
> > > > > I ran the evaluation as:
> > > > > 
> > > > > mrinalpandey@mrinalpandey:~/linux/linux$ cat get_permissions.sh
> > > > > #!/bin/bash
> > > > > 
> > > > > for file in $(git ls-files)
> > > > > do
> > > > >         permissions="$(stat -c "%a %n" $file)"
> > > > >         echo "$permissions"
> > > > > done
> > > > > 
> > > > > mrinalpandey@mrinalpandey:~/linux/linux$ sh get_permissions.sh | grep ^[7531] > temp
> > > > > 
> > > > > mrinalpandey@mrinalpandey:~/linux/linux$ cut -d ' ' -f 2 temp > executables
> > > > > 
> > > > > mrinalpandey@mrinalpandey:~/linux/linux$ cat first_line.sh
> > > > > #!/bin/bash
> > > > > file="executables"
> > > > > while IFS= read -r line
> > > > > do
> > > > >         firstline=`head -n 1 $line`
> > > > >         printf '%s:%s\n' "$firstline" "$line"
> > > > > done <"$file"
> > > > > 
> > > > > mrinalpandey@mrinalpandey:~/linux/linux$ cat executables | wc -l
> > > > > 611
> > > > > 
> > > > > mrinalpandey@mrinalpandey:~/linux/linux$ sh first_line.sh | grep ^#! | wc -l
> > > > > head: error reading 'scripts/dtc/include-prefixes/arc': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/arm': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/arm64': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/c6x': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/dt-bindings': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/h8300': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/microblaze': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/mips': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/nios2': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/openrisc': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/powerpc': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/sh': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/xtensa': Is a directory
> > > > > 540
> > > > > 
> > > > > We can see that there are 71 files where the executable bit is set but
> > > > > the first line is not a shebang. These include 13 directories which
> > > > > throw the error above. Remaining 58 files(earlier the number was 53)
> > > > > could be cleaned so that this heuristic works better as we saw. So, by
> > > > > checking only for the executable bit we can say that license should be
> > > > > on second line, we probably don't need to check for the shebang on line
> > > > > 1.
> > > > > Please let me know if the evaluation makes sense.
> > > > >
> > > 
> > > This evaluation makes sense to find the cases that should be cleaned up.
> > > 
> > > Either the executable flag is simply set wrongly and should be dropped or 
> > > it is actually a script and should get a shebang in the beginning.
> > > 
> > > I actually already started cleaning up. See:
> > > 
> > > https://lore.kernel.org/lkml/20200819081808.26796-1-lukas.bulwahn@gmail.com/
> > > 
> > > We can discuss how to continue this cleanup.
> > >
> > 
> > I had another look at the results of your script.
> > 
> > Just a minor improvement to that resulting list:
> > 
> > I think symbolic links in the repository are always of permission 777, and 
> > I think that is reasonable.
> > 
> > So maybe you can filter out the symbolic links in your get_permissions.sh?
> > 
> > Then, the list is probably down to a few 20 to 30 cases that should 
> > probably really be cleaned up.
> > 
> > Can you share that script and the results? Then, let us start cleaning 
> > up.
> 
> Sir,
> 
> Here is what I ran:
> 
> mrinalpandey@mrinalpandey:~/linux/linux$ cat get_permissions.sh
> #!/bin/bash
> 
> for file in $(git ls-files)
> do
> 	permissions="$(stat -c '%a %n' $file)"
> 	details="$(ls -l $file)"
> 	echo "$permissions $details"
> done
> mrinalpandey@mrinalpandey:~/linux/linux$ sh get_permissions.sh | grep ^[7531] | grep -v "\->" > temp
> mrinalpandey@mrinalpandey:~/linux/linux$ cut -d ' ' -f 2 temp > executables
> mrinalpandey@mrinalpandey:~/linux/linux$ cat executables | wc -l
> 574
> mrinalpandey@mrinalpandey:~/linux/linux$ cat first_line.sh
> #!/bin/bash
> file="executables"
> while IFS= read -r line
> do
> 	firstline=`head -n 1 $line`
> 	printf '%s:%s\n' "$firstline" "$line"
> done <"$file"
> mrinalpandey@mrinalpandey:~/linux/linux$ sh first_line.sh | grep ^#! | wc -l
> 539
> 
> Hence, there are only 35 cases to be cleaned up.
>

Can you share those 35 cases you identified?

Then, we can discuss the individual changes for those 35 cases.

Lukas
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: Improve SPDX license identifier check for script files
  2020-08-24  8:54               ` Lukas Bulwahn
@ 2020-08-25  4:53                 ` Mrinal Pandey
  2020-09-06  4:27                   ` Mrinal Pandey
  0 siblings, 1 reply; 19+ messages in thread
From: Mrinal Pandey @ 2020-08-25  4:53 UTC (permalink / raw)
  To: Lukas Bulwahn, skhan, Linux-kernel-mentees, mrinalmni


[-- Attachment #1.1.1: Type: text/plain, Size: 12170 bytes --]

On 20/08/24 10:54AM, Lukas Bulwahn wrote:
> 
> 
> On Mon, 24 Aug 2020, Mrinal Pandey wrote:
> 
> > On 20/08/22 10:24AM, Lukas Bulwahn wrote:
> > > 
> > > 
> > > On Thu, 20 Aug 2020, Mrinal Pandey wrote:
> > > 
> > > > On 20/08/09 12:52PM, Mrinal Pandey wrote:
> > > > > On 20/08/04 09:37PM, Lukas Bulwahn wrote:
> > > > > > 
> > > > > > 
> > > > > > On Tue, 4 Aug 2020, Mrinal Pandey wrote:
> > > > > > 
> > > > > > > On 20/08/03 12:59PM, Lukas Bulwahn wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > On Mon, 3 Aug 2020, Mrinal Pandey wrote:
> > > > > > > > 
> > > > > > > > > The diff content includes the SPDX licensing information but excludes the
> > > > > > > > > shebang when a change is made to a script file in commit 37f8173dd849
> > > > > > > > > ("locking/atomics: Flip fallbacks and  instrumentation") and commit
> > > > > > > > > 075c8aa79d54 ("selftests: forwarding: tc_actions.sh: add matchall mirror
> > > > > > > > > test"). In these cases checkpatch issues a false positive warning:
> > > > > > > > > "Misplaced SPDX-License-Identifier tag - use line 1 instead".
> > > > > > > > > 
> > > > > > > > > Currently, if checkpatch finds a shebang in line 1, it expects the
> > > > > > > > > license identifier in line 2. However, this doesn't work when a shebang
> > > > > > > > > isn't found on the line 1.
> > > > > > > > 
> > > > > > > > It does not work when the diff does not contain line 1, but only line 2,
> > > > > > > > because then the shebang check for line 1 cannot work.
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > I noticed this false positive, while running checkpatch on the set of
> > > > > > > > > commits from v5.7 to v5.8-rc1 of the kernel, on the said commits.
> > > > > > > > > This false positive exists in checkpatch since commit a8da38a9cf0e
> > > > > > > > > ("checkpatch: add test for SPDX-License-Identifier on wrong line #")
> > > > > > > > > when the corresponding rule was first added.
> > > > > > > > > 
> > > > > > > > > The alternatives considered to improve this check were looking the file
> > > > > > > > > to be a script by either examining the file extension or file permissions.
> > > > > > > > >
> > > > > > > > 
> > > > > > > > Make this sentence shorter. Try.
> > > > > > > >  
> > > > > > > > > The evaluation on former option resulted in 120 files which had a shebang
> > > > > > > > > in the first line but no file extension. This didn't look like a promising
> > > > > > > > > result and hence I dropped the idea of using this approach.
> > > > > > > > > 
> > > > > > > > > The evaluation on the latter approach shows that there are 53 files in the
> > > > > > > > > kernel which have an executable bit set but don't have a shebang in the
> > > > > > > > > first line.
> > > > > > > > > 
> > > > > > > > > At the first sight on these 53 files, it seems that they either have a
> > > > > > > > > wrong file permission set or could be reasonably extended with a shebang
> > > > > > > > > and SPDX license information. Thus, further cleanup in the repository
> > > > > > > > > would make the latter approach to work even more precisely.
> > > > > > > > > 
> > > > > > > > > Hence, I chose to check the file permissions to determine if the file is a
> > > > > > > > > script and notify checkpatch to expect SPDX on second line for such files.
> > > > > > > > >
> > > > > > > > 
> > > > > > > > There is no notification here. Think about better wording.
> > > > > > > >  
> > > > > > > > > Signed-off-by: Mrinal Pandey <mrinalmni@gmail.com>
> > > > > > > > > ---
> > > > > > > > >  scripts/checkpatch.pl | 3 +++
> > > > > > > > >  1 file changed, 3 insertions(+)
> > > > > > > > > 
> > > > > > > > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > > > > > > > index 4c820607540b..bae1dd824518 100755
> > > > > > > > > --- a/scripts/checkpatch.pl
> > > > > > > > > +++ b/scripts/checkpatch.pl
> > > > > > > > > @@ -3166,6 +3166,9 @@ sub process {
> > > > > > > > >  		}
> > > > > > > > >  
> > > > > > > > >  # check for using SPDX license tag at beginning of files
> > > > > > > > > +		if ($line =~ /^index\ .*\.\..*\ .*[7531]\d{0,2}$/) {
> > > > > > > > > +			$checklicenseline = 2;
> > > > > > > > > +		}
> > > > > > > > 
> > > > > > > > That check looks good now.
> > > > > > > > 
> > > > > > > > >  		if ($realline == $checklicenseline) {
> > > > > > > > >  			if ($rawline =~ /^[ \+]\s*\#\!\s*\//) {
> > > > > > > > >  				$checklicenseline = 2;
> > > > > > > > 
> > > > > > > > This is probably broken now. It should check for shebang in line 1 and 
> > > > > > > > then set checklicenseline to line 2, right?
> > > > > > > 
> > > > > > > Sir,
> > > > > > > 
> > > > > > > Should we remove this check? Earlier when I checked for file extension
> > > > > > > we had 120 cases where this check was also needed but now we have a
> > > > > > > better heuristic which is going to work for all cases where license
> > > > > > > should be on line 2 irrespective of the fact that we know the first line
> > > > > > > or not.
> > > > > > >
> > > > > > 
> > > > > > Are you sure about that? Where is the evaluation that proves your point?
> > > > > > 
> > > > > > E.g., are all files that contain a shebang really with an executable flag?
> > > > > > 
> > > > > > Which commands did you run to check this?
> > > > > >  
> > > > > > > If I am missing out on something and we should not be removing this check,
> > > > > > > then I suggest placing the new heuristics below this block so that it doesn't
> > > > > > > interfere with the existing logic.
> > > > > > > 
> > > > > > > Please let me know which path should I go about and then I shall resend
> > > > > > > the patch with the modified commit message.
> > > > > > > 
> > > > > > 
> > > > > > Think about the strengths and weaknesses of the potential solutions, then 
> > > > > > show with some commands (as I did for example, for finding the first 
> > > > > > lines previously) that you can show that it practically makes a 
> > > > > > difference and you can numbers on those differences.
> > > > > > 
> > > > > > When you did that, send a new patch.
> > > > > > 
> > > > > > Lukas
> > > > > > 
> > > > > Sir,
> > > > > 
> > > > > I ran the evaluation as:
> > > > > 
> > > > > mrinalpandey@mrinalpandey:~/linux/linux$ cat get_permissions.sh
> > > > > #!/bin/bash
> > > > > 
> > > > > for file in $(git ls-files)
> > > > > do
> > > > >         permissions="$(stat -c "%a %n" $file)"
> > > > >         echo "$permissions"
> > > > > done
> > > > > 
> > > > > mrinalpandey@mrinalpandey:~/linux/linux$ sh get_permissions.sh | grep ^[7531] > temp
> > > > > 
> > > > > mrinalpandey@mrinalpandey:~/linux/linux$ cut -d ' ' -f 2 temp > executables
> > > > > 
> > > > > mrinalpandey@mrinalpandey:~/linux/linux$ cat first_line.sh
> > > > > #!/bin/bash
> > > > > file="executables"
> > > > > while IFS= read -r line
> > > > > do
> > > > >         firstline=`head -n 1 $line`
> > > > >         printf '%s:%s\n' "$firstline" "$line"
> > > > > done <"$file"
> > > > > 
> > > > > mrinalpandey@mrinalpandey:~/linux/linux$ cat executables | wc -l
> > > > > 611
> > > > > 
> > > > > mrinalpandey@mrinalpandey:~/linux/linux$ sh first_line.sh | grep ^#! | wc -l
> > > > > head: error reading 'scripts/dtc/include-prefixes/arc': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/arm': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/arm64': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/c6x': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/dt-bindings': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/h8300': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/microblaze': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/mips': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/nios2': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/openrisc': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/powerpc': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/sh': Is a directory
> > > > > head: error reading 'scripts/dtc/include-prefixes/xtensa': Is a directory
> > > > > 540
> > > > > 
> > > > > We can see that there are 71 files where the executable bit is set but
> > > > > the first line is not a shebang. These include 13 directories which
> > > > > throw the error above. Remaining 58 files(earlier the number was 53)
> > > > > could be cleaned so that this heuristic works better as we saw. So, by
> > > > > checking only for the executable bit we can say that license should be
> > > > > on second line, we probably don't need to check for the shebang on line
> > > > > 1.
> > > > > Please let me know if the evaluation makes sense.
> > > > >
> > > 
> > > This evaluation makes sense to find the cases that should be cleaned up.
> > > 
> > > Either the executable flag is simply set wrongly and should be dropped or 
> > > it is actually a script and should get a shebang in the beginning.
> > > 
> > > I actually already started cleaning up. See:
> > > 
> > > https://lore.kernel.org/lkml/20200819081808.26796-1-lukas.bulwahn@gmail.com/
> > > 
> > > We can discuss how to continue this cleanup.
> > >
> > 
> > Sir,
> > 
> > Sure. I would love to discuss that.
> > Should I send cleanup related patches directly to lkml or do they need to
> > go through you, Shuah ma'am and mentee list first?
> > 
> > > However, you cannot use this evaluation to say the checking the executable 
> > > bit is enough, simply as from this evaluation, you do not know how many 
> > > files have a shebang in the first line and are not set with executable 
> > > flag. For those files, we expect the SPDX identifier in the second line 
> > > but your suggested approach would expect it in the first line.
> > > 
> > > So we need an evaluation to find out how many cases of that situation 
> > > exist?
> > > 
> > > Then, can we easily fix those as well?
> > 
> > Here is the evaluation I ran to find such cases:
> > 
> > mrinalpandey@mrinalpandey:~/linux/linux$ cat first_lines.sh
> > #!/bin/bash
> > 
> > for file in $(git ls-files)
> > do
> > 	permissions="$(stat -c '%a %n' $file)"
> > 	firstline="$(head -n 1 $file)"
> > 	echo "$permissions : $firstline"
> > done
> > mrinalpandey@mrinalpandey:~/linux/linux$ sh first_lines.sh | grep ^[642].*#! | wc -l
> > head: error reading 'scripts/dtc/include-prefixes/arc': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/arm': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/arm64': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/c6x': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/dt-bindings': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/h8300': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/microblaze': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/mips': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/nios2': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/openrisc': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/powerpc': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/sh': Is a directory
> > head: error reading 'scripts/dtc/include-prefixes/xtensa': Is a directory
> > 82
> > 
> > The last line in these 82 lines is "Binary file (standard input) matches" so we have 81 instances
> > where we have shebang in the first line but the file is non-executable.
> >
> 
> How about removing the symbolic link files and then sharing that list?

Sir,

I couldn't find any symbolic links in these 81 files. The list is
attached herewith.
The list is in the format:
<permissions> <file-name> : <first-line>

Thank you.
> 
> Lukas

[-- Attachment #1.1.2: make_executable --]
[-- Type: text/plain, Size: 4304 bytes --]

664 Documentation/admin-guide/aoe/autoload.sh : #!/bin/sh
664 Documentation/admin-guide/aoe/status.sh : #! /bin/sh
664 Documentation/arm64/kasan-offsets.sh : #!/bin/sh
664 Documentation/firmware_class/hotplug-script : #!/bin/sh
664 Documentation/networking/device_drivers/atm/cxacru-cf.py : #!/usr/bin/env python
664 Documentation/s390/config3270.sh : #!/bin/sh
664 Documentation/sphinx/parallel-wrapper.sh : #!/bin/sh
664 Documentation/trace/postprocess/decode_msr.py : #!/usr/bin/python
664 Documentation/trace/postprocess/trace-pagealloc-postprocess.pl : #!/usr/bin/perl
664 Documentation/trace/postprocess/trace-vmscan-postprocess.pl : #!/usr/bin/perl
664 arch/alpha/kernel/syscalls/syscallhdr.sh : #!/bin/sh
664 arch/alpha/kernel/syscalls/syscalltbl.sh : #!/bin/sh
664 arch/arm/boot/install.sh : #!/bin/sh
664 arch/arm/crypto/poly1305-armv4.pl : #!/usr/bin/env perl
664 arch/arm/crypto/sha256-armv4.pl : #!/usr/bin/env perl
664 arch/arm/crypto/sha512-armv4.pl : #!/usr/bin/env perl
664 arch/arm/tools/gen-mach-types : #!/bin/awk
664 arch/arm/tools/syscallhdr.sh : #!/bin/sh
664 arch/arm/tools/syscallnr.sh : #!/bin/sh
664 arch/arm/tools/syscalltbl.sh : #!/bin/sh
664 arch/arm64/boot/install.sh : #!/bin/sh
664 arch/arm64/crypto/poly1305-armv8.pl : #!/usr/bin/env perl
664 arch/arm64/crypto/sha512-armv8.pl : #! /usr/bin/env perl
664 arch/ia64/install.sh : #!/bin/sh
664 arch/ia64/kernel/syscalls/syscallhdr.sh : #!/bin/sh
664 arch/ia64/kernel/syscalls/syscalltbl.sh : #!/bin/sh
664 arch/ia64/scripts/unwcheck.py : #!/usr/bin/python
664 arch/m68k/install.sh : #!/bin/sh
664 arch/m68k/kernel/syscalls/syscallhdr.sh : #!/bin/sh
664 arch/m68k/kernel/syscalls/syscalltbl.sh : #!/bin/sh
664 arch/microblaze/kernel/syscalls/syscallhdr.sh : #!/bin/sh
664 arch/microblaze/kernel/syscalls/syscalltbl.sh : #!/bin/sh
664 arch/mips/crypto/poly1305-mips.pl : #!/usr/bin/env perl
664 arch/mips/kernel/syscalls/syscallhdr.sh : #!/bin/sh
664 arch/mips/kernel/syscalls/syscallnr.sh : #!/bin/sh
664 arch/mips/kernel/syscalls/syscalltbl.sh : #!/bin/sh
664 arch/nios2/boot/install.sh : #!/bin/sh
664 arch/parisc/boot/install.sh : #!/bin/sh
664 arch/parisc/install.sh : #!/bin/sh
664 arch/parisc/kernel/syscalls/syscallhdr.sh : #!/bin/sh
664 arch/parisc/kernel/syscalls/syscalltbl.sh : #!/bin/sh
664 arch/parisc/nm : #!/bin/sh
664 arch/powerpc/boot/install.sh : #!/bin/sh
664 arch/powerpc/kernel/prom_init_check.sh : #!/bin/sh
664 arch/powerpc/kernel/syscalls/syscallhdr.sh : #!/bin/sh
664 arch/powerpc/kernel/syscalls/syscalltbl.sh : #!/bin/sh
664 arch/powerpc/kernel/systbl_chk.sh : #!/bin/sh
664 arch/riscv/boot/install.sh : #!/bin/sh
664 arch/s390/boot/install.sh : #!/bin/sh
664 arch/sh/boot/compressed/install.sh : #!/bin/sh
664 arch/sh/kernel/syscalls/syscallhdr.sh : #!/bin/sh
664 arch/sh/kernel/syscalls/syscalltbl.sh : #!/bin/sh
664 arch/sh/tools/gen-mach-types : #!/bin/awk
664 arch/sparc/boot/install.sh : #!/bin/sh
664 arch/sparc/kernel/syscalls/syscallhdr.sh : #!/bin/sh
664 arch/sparc/kernel/syscalls/syscalltbl.sh : #!/bin/sh
664 arch/sparc/vdso/checkundef.sh : #!/bin/sh
664 arch/x86/boot/genimage.sh : #!/bin/sh
664 arch/x86/boot/install.sh : #!/bin/sh
664 arch/x86/crypto/poly1305-x86_64-cryptogams.pl : #!/usr/bin/env perl
664 arch/x86/entry/syscalls/syscallhdr.sh : #!/bin/sh
664 arch/x86/entry/syscalls/syscalltbl.sh : #!/bin/bash
664 arch/x86/kernel/cpu/mkcapflags.sh : #!/bin/sh
664 arch/x86/tools/gen-insn-attr-x86.awk : #!/bin/awk -f
664 arch/x86/tools/objdump_reformat.awk : #!/bin/awk -f
664 arch/x86/um/vdso/checkundef.sh : #!/bin/sh
664 arch/xtensa/kernel/syscalls/syscallhdr.sh : #!/bin/sh
664 arch/xtensa/kernel/syscalls/syscalltbl.sh : #!/bin/sh
664 drivers/block/paride/mkd : #!/bin/bash
664 drivers/crypto/vmx/aesp8-ppc.pl : #! /usr/bin/env perl
664 drivers/crypto/vmx/ghashp8-ppc.pl : #!/usr/bin/env perl
664 drivers/crypto/vmx/ppc-xlate.pl : #!/usr/bin/env perl
664 drivers/scsi/script_asm.pl : #!/usr/bin/perl -s
664 drivers/usb/serial/ezusb_convert.pl : #! /usr/bin/perl -w
664 samples/bpf/lwt_len_hist.sh : #!/bin/bash
664 samples/bpf/test_lwt_bpf.sh : #!/bin/bash
664 scripts/atomic/gen-atomics.sh : #!/bin/sh
664 scripts/gcc-plugins/gen-random-seed.sh : #!/bin/sh
664 scripts/nsdeps : #!/bin/sh
664 scripts/spdxcheck-test.sh : #!/bin/sh
664 scripts/xen-hypercalls.sh : #!/bin/sh

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: Improve SPDX license identifier check for script files
  2020-08-24  8:56                 ` Lukas Bulwahn
@ 2020-08-25  4:56                   ` Mrinal Pandey
  2020-08-25  5:54                     ` Lukas Bulwahn
  0 siblings, 1 reply; 19+ messages in thread
From: Mrinal Pandey @ 2020-08-25  4:56 UTC (permalink / raw)
  To: Lukas Bulwahn, Linux-kernel-mentees, skhan, mrinalmni


[-- Attachment #1.1.1: Type: text/plain, Size: 11729 bytes --]

On 20/08/24 10:56AM, Lukas Bulwahn wrote:
> 
> 
> On Mon, 24 Aug 2020, Mrinal Pandey wrote:
> 
> > On 20/08/22 09:25PM, Lukas Bulwahn wrote:
> > > 
> > > 
> > > On Sat, 22 Aug 2020, Lukas Bulwahn wrote:
> > > 
> > > > 
> > > > 
> > > > On Thu, 20 Aug 2020, Mrinal Pandey wrote:
> > > > 
> > > > > On 20/08/09 12:52PM, Mrinal Pandey wrote:
> > > > > > On 20/08/04 09:37PM, Lukas Bulwahn wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > On Tue, 4 Aug 2020, Mrinal Pandey wrote:
> > > > > > > 
> > > > > > > > On 20/08/03 12:59PM, Lukas Bulwahn wrote:
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > On Mon, 3 Aug 2020, Mrinal Pandey wrote:
> > > > > > > > > 
> > > > > > > > > > The diff content includes the SPDX licensing information but excludes the
> > > > > > > > > > shebang when a change is made to a script file in commit 37f8173dd849
> > > > > > > > > > ("locking/atomics: Flip fallbacks and  instrumentation") and commit
> > > > > > > > > > 075c8aa79d54 ("selftests: forwarding: tc_actions.sh: add matchall mirror
> > > > > > > > > > test"). In these cases checkpatch issues a false positive warning:
> > > > > > > > > > "Misplaced SPDX-License-Identifier tag - use line 1 instead".
> > > > > > > > > > 
> > > > > > > > > > Currently, if checkpatch finds a shebang in line 1, it expects the
> > > > > > > > > > license identifier in line 2. However, this doesn't work when a shebang
> > > > > > > > > > isn't found on the line 1.
> > > > > > > > > 
> > > > > > > > > It does not work when the diff does not contain line 1, but only line 2,
> > > > > > > > > because then the shebang check for line 1 cannot work.
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > I noticed this false positive, while running checkpatch on the set of
> > > > > > > > > > commits from v5.7 to v5.8-rc1 of the kernel, on the said commits.
> > > > > > > > > > This false positive exists in checkpatch since commit a8da38a9cf0e
> > > > > > > > > > ("checkpatch: add test for SPDX-License-Identifier on wrong line #")
> > > > > > > > > > when the corresponding rule was first added.
> > > > > > > > > > 
> > > > > > > > > > The alternatives considered to improve this check were looking the file
> > > > > > > > > > to be a script by either examining the file extension or file permissions.
> > > > > > > > > >
> > > > > > > > > 
> > > > > > > > > Make this sentence shorter. Try.
> > > > > > > > >  
> > > > > > > > > > The evaluation on former option resulted in 120 files which had a shebang
> > > > > > > > > > in the first line but no file extension. This didn't look like a promising
> > > > > > > > > > result and hence I dropped the idea of using this approach.
> > > > > > > > > > 
> > > > > > > > > > The evaluation on the latter approach shows that there are 53 files in the
> > > > > > > > > > kernel which have an executable bit set but don't have a shebang in the
> > > > > > > > > > first line.
> > > > > > > > > > 
> > > > > > > > > > At the first sight on these 53 files, it seems that they either have a
> > > > > > > > > > wrong file permission set or could be reasonably extended with a shebang
> > > > > > > > > > and SPDX license information. Thus, further cleanup in the repository
> > > > > > > > > > would make the latter approach to work even more precisely.
> > > > > > > > > > 
> > > > > > > > > > Hence, I chose to check the file permissions to determine if the file is a
> > > > > > > > > > script and notify checkpatch to expect SPDX on second line for such files.
> > > > > > > > > >
> > > > > > > > > 
> > > > > > > > > There is no notification here. Think about better wording.
> > > > > > > > >  
> > > > > > > > > > Signed-off-by: Mrinal Pandey <mrinalmni@gmail.com>
> > > > > > > > > > ---
> > > > > > > > > >  scripts/checkpatch.pl | 3 +++
> > > > > > > > > >  1 file changed, 3 insertions(+)
> > > > > > > > > > 
> > > > > > > > > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > > > > > > > > index 4c820607540b..bae1dd824518 100755
> > > > > > > > > > --- a/scripts/checkpatch.pl
> > > > > > > > > > +++ b/scripts/checkpatch.pl
> > > > > > > > > > @@ -3166,6 +3166,9 @@ sub process {
> > > > > > > > > >  		}
> > > > > > > > > >  
> > > > > > > > > >  # check for using SPDX license tag at beginning of files
> > > > > > > > > > +		if ($line =~ /^index\ .*\.\..*\ .*[7531]\d{0,2}$/) {
> > > > > > > > > > +			$checklicenseline = 2;
> > > > > > > > > > +		}
> > > > > > > > > 
> > > > > > > > > That check looks good now.
> > > > > > > > > 
> > > > > > > > > >  		if ($realline == $checklicenseline) {
> > > > > > > > > >  			if ($rawline =~ /^[ \+]\s*\#\!\s*\//) {
> > > > > > > > > >  				$checklicenseline = 2;
> > > > > > > > > 
> > > > > > > > > This is probably broken now. It should check for shebang in line 1 and 
> > > > > > > > > then set checklicenseline to line 2, right?
> > > > > > > > 
> > > > > > > > Sir,
> > > > > > > > 
> > > > > > > > Should we remove this check? Earlier when I checked for file extension
> > > > > > > > we had 120 cases where this check was also needed but now we have a
> > > > > > > > better heuristic which is going to work for all cases where license
> > > > > > > > should be on line 2 irrespective of the fact that we know the first line
> > > > > > > > or not.
> > > > > > > >
> > > > > > > 
> > > > > > > Are you sure about that? Where is the evaluation that proves your point?
> > > > > > > 
> > > > > > > E.g., are all files that contain a shebang really with an executable flag?
> > > > > > > 
> > > > > > > Which commands did you run to check this?
> > > > > > >  
> > > > > > > > If I am missing out on something and we should not be removing this check,
> > > > > > > > then I suggest placing the new heuristics below this block so that it doesn't
> > > > > > > > interfere with the existing logic.
> > > > > > > > 
> > > > > > > > Please let me know which path should I go about and then I shall resend
> > > > > > > > the patch with the modified commit message.
> > > > > > > > 
> > > > > > > 
> > > > > > > Think about the strengths and weaknesses of the potential solutions, then 
> > > > > > > show with some commands (as I did for example, for finding the first 
> > > > > > > lines previously) that you can show that it practically makes a 
> > > > > > > difference and you can numbers on those differences.
> > > > > > > 
> > > > > > > When you did that, send a new patch.
> > > > > > > 
> > > > > > > Lukas
> > > > > > > 
> > > > > > Sir,
> > > > > > 
> > > > > > I ran the evaluation as:
> > > > > > 
> > > > > > mrinalpandey@mrinalpandey:~/linux/linux$ cat get_permissions.sh
> > > > > > #!/bin/bash
> > > > > > 
> > > > > > for file in $(git ls-files)
> > > > > > do
> > > > > >         permissions="$(stat -c "%a %n" $file)"
> > > > > >         echo "$permissions"
> > > > > > done
> > > > > > 
> > > > > > mrinalpandey@mrinalpandey:~/linux/linux$ sh get_permissions.sh | grep ^[7531] > temp
> > > > > > 
> > > > > > mrinalpandey@mrinalpandey:~/linux/linux$ cut -d ' ' -f 2 temp > executables
> > > > > > 
> > > > > > mrinalpandey@mrinalpandey:~/linux/linux$ cat first_line.sh
> > > > > > #!/bin/bash
> > > > > > file="executables"
> > > > > > while IFS= read -r line
> > > > > > do
> > > > > >         firstline=`head -n 1 $line`
> > > > > >         printf '%s:%s\n' "$firstline" "$line"
> > > > > > done <"$file"
> > > > > > 
> > > > > > mrinalpandey@mrinalpandey:~/linux/linux$ cat executables | wc -l
> > > > > > 611
> > > > > > 
> > > > > > mrinalpandey@mrinalpandey:~/linux/linux$ sh first_line.sh | grep ^#! | wc -l
> > > > > > head: error reading 'scripts/dtc/include-prefixes/arc': Is a directory
> > > > > > head: error reading 'scripts/dtc/include-prefixes/arm': Is a directory
> > > > > > head: error reading 'scripts/dtc/include-prefixes/arm64': Is a directory
> > > > > > head: error reading 'scripts/dtc/include-prefixes/c6x': Is a directory
> > > > > > head: error reading 'scripts/dtc/include-prefixes/dt-bindings': Is a directory
> > > > > > head: error reading 'scripts/dtc/include-prefixes/h8300': Is a directory
> > > > > > head: error reading 'scripts/dtc/include-prefixes/microblaze': Is a directory
> > > > > > head: error reading 'scripts/dtc/include-prefixes/mips': Is a directory
> > > > > > head: error reading 'scripts/dtc/include-prefixes/nios2': Is a directory
> > > > > > head: error reading 'scripts/dtc/include-prefixes/openrisc': Is a directory
> > > > > > head: error reading 'scripts/dtc/include-prefixes/powerpc': Is a directory
> > > > > > head: error reading 'scripts/dtc/include-prefixes/sh': Is a directory
> > > > > > head: error reading 'scripts/dtc/include-prefixes/xtensa': Is a directory
> > > > > > 540
> > > > > > 
> > > > > > We can see that there are 71 files where the executable bit is set but
> > > > > > the first line is not a shebang. These include 13 directories which
> > > > > > throw the error above. Remaining 58 files(earlier the number was 53)
> > > > > > could be cleaned so that this heuristic works better as we saw. So, by
> > > > > > checking only for the executable bit we can say that license should be
> > > > > > on second line, we probably don't need to check for the shebang on line
> > > > > > 1.
> > > > > > Please let me know if the evaluation makes sense.
> > > > > >
> > > > 
> > > > This evaluation makes sense to find the cases that should be cleaned up.
> > > > 
> > > > Either the executable flag is simply set wrongly and should be dropped or 
> > > > it is actually a script and should get a shebang in the beginning.
> > > > 
> > > > I actually already started cleaning up. See:
> > > > 
> > > > https://lore.kernel.org/lkml/20200819081808.26796-1-lukas.bulwahn@gmail.com/
> > > > 
> > > > We can discuss how to continue this cleanup.
> > > >
> > > 
> > > I had another look at the results of your script.
> > > 
> > > Just a minor improvement to that resulting list:
> > > 
> > > I think symbolic links in the repository are always of permission 777, and 
> > > I think that is reasonable.
> > > 
> > > So maybe you can filter out the symbolic links in your get_permissions.sh?
> > > 
> > > Then, the list is probably down to a few 20 to 30 cases that should 
> > > probably really be cleaned up.
> > > 
> > > Can you share that script and the results? Then, let us start cleaning 
> > > up.
> > 
> > Sir,
> > 
> > Here is what I ran:
> > 
> > mrinalpandey@mrinalpandey:~/linux/linux$ cat get_permissions.sh
> > #!/bin/bash
> > 
> > for file in $(git ls-files)
> > do
> > 	permissions="$(stat -c '%a %n' $file)"
> > 	details="$(ls -l $file)"
> > 	echo "$permissions $details"
> > done
> > mrinalpandey@mrinalpandey:~/linux/linux$ sh get_permissions.sh | grep ^[7531] | grep -v "\->" > temp
> > mrinalpandey@mrinalpandey:~/linux/linux$ cut -d ' ' -f 2 temp > executables
> > mrinalpandey@mrinalpandey:~/linux/linux$ cat executables | wc -l
> > 574
> > mrinalpandey@mrinalpandey:~/linux/linux$ cat first_line.sh
> > #!/bin/bash
> > file="executables"
> > while IFS= read -r line
> > do
> > 	firstline=`head -n 1 $line`
> > 	printf '%s:%s\n' "$firstline" "$line"
> > done <"$file"
> > mrinalpandey@mrinalpandey:~/linux/linux$ sh first_line.sh | grep ^#! | wc -l
> > 539
> > 
> > Hence, there are only 35 cases to be cleaned up.
> >
> 
> Can you share those 35 cases you identified?
> 
> Then, we can discuss the individual changes for those 35 cases.

Sir,

The list is attached herewith and is in the format:
<first-line> : <file-name>

Thank you.
> 
> Lukas

[-- Attachment #1.1.2: tobecleaned --]
[-- Type: text/plain, Size: 2250 bytes --]

# : Documentation/features/list-arch.sh
# : Documentation/features/scripts/features-refresh.sh
# Copyright © 2016 IBM Corporation : arch/powerpc/tools/unrel_branch_check.sh
/* : drivers/gpu/drm/amd/amdgpu/gfx_v9_4.c
/* : drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
#ifndef _dcn_3_0_0_OFFSET_HEADER : drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_3_0_0_offset.h
#ifndef _dcn_3_0_0_SH_MASK_HEADER : drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_3_0_0_sh_mask.h
#ifndef _dpcs_3_0_0_OFFSET_HEADER : drivers/gpu/drm/amd/include/asic_reg/dcn/dpcs_3_0_0_offset.h
#ifndef _dpcs_3_0_0_SH_MASK_HEADER : drivers/gpu/drm/amd/include/asic_reg/dcn/dpcs_3_0_0_sh_mask.h
# name	meta	args... : scripts/atomic/atomics.tbl
cat <<EOF : scripts/atomic/fallbacks/acquire
cat <<EOF : scripts/atomic/fallbacks/add_negative
cat << EOF : scripts/atomic/fallbacks/add_unless
cat <<EOF : scripts/atomic/fallbacks/andnot
cat <<EOF : scripts/atomic/fallbacks/dec
cat <<EOF : scripts/atomic/fallbacks/dec_and_test
cat <<EOF : scripts/atomic/fallbacks/dec_if_positive
cat <<EOF : scripts/atomic/fallbacks/dec_unless_positive
cat <<EOF : scripts/atomic/fallbacks/fence
cat << EOF : scripts/atomic/fallbacks/fetch_add_unless
cat <<EOF : scripts/atomic/fallbacks/inc
cat <<EOF : scripts/atomic/fallbacks/inc_and_test
cat <<EOF : scripts/atomic/fallbacks/inc_not_zero
cat <<EOF : scripts/atomic/fallbacks/inc_unless_negative
cat <<EOF : scripts/atomic/fallbacks/read_acquire
cat <<EOF : scripts/atomic/fallbacks/release
cat <<EOF : scripts/atomic/fallbacks/set_release
cat <<EOF : scripts/atomic/fallbacks/sub_and_test
cat <<EOF : scripts/atomic/fallbacks/try_cmpxchg
# EventClass.py : tools/perf/scripts/python/Perf-Trace-Util/lib/Perf/Trace/EventClass.py
# flamegraph.py - create flame graphs from perf samples : tools/perf/scripts/python/flamegraph.py
# Monitor the system for dropped packets and proudce a report of drop locations and counts : tools/perf/scripts/python/net_dropmonitor.py
# stackcollapse.py - format perf samples with one line per distinct call stack : tools/perf/scripts/python/stackcollapse.py
 : tools/testing/selftests/gpio/gpio-mockup-sysfs.sh
# SPDX-License-Identifier: GPL-2.0 : tools/testing/selftests/net/forwarding/sch_red.sh

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: Improve SPDX license identifier check for script files
  2020-08-25  4:56                   ` Mrinal Pandey
@ 2020-08-25  5:54                     ` Lukas Bulwahn
  0 siblings, 0 replies; 19+ messages in thread
From: Lukas Bulwahn @ 2020-08-25  5:54 UTC (permalink / raw)
  To: Mrinal Pandey; +Cc: Linux-kernel-mentees

> >
> > Can you share those 35 cases you identified?
> >
> > Then, we can discuss the individual changes for those 35 cases.
>
> Sir,
>
> The list is attached herewith and is in the format:
> <first-line> : <file-name>

664 scripts/gcc-plugins/gen-random-seed.sh : #!/bin/sh
664 scripts/nsdeps : #!/bin/sh
664 scripts/spdxcheck-test.sh : #!/bin/sh
664 scripts/xen-hypercalls.sh : #!/bin/sh

These four can all be set to be executable and spdxcheck-test.sh, also
can be a SPDX-License line.

The changes should be clear, can you phrase a commit message and
determine the developers the different patches shall be sent to.

Lukas
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: Improve SPDX license identifier check for script files
  2020-08-25  4:53                 ` Mrinal Pandey
@ 2020-09-06  4:27                   ` Mrinal Pandey
  2020-09-07  7:10                     ` Lukas Bulwahn
  0 siblings, 1 reply; 19+ messages in thread
From: Mrinal Pandey @ 2020-09-06  4:27 UTC (permalink / raw)
  To: Lukas Bulwahn, skhan, Linux-kernel-mentees, mrinalmni


[-- Attachment #1.1: Type: text/plain, Size: 17712 bytes --]

On 20/08/25 10:23AM, Mrinal Pandey wrote:
> On 20/08/24 10:54AM, Lukas Bulwahn wrote:
> > 
> > 
> > On Mon, 24 Aug 2020, Mrinal Pandey wrote:
> > 
> > > On 20/08/22 10:24AM, Lukas Bulwahn wrote:
> > > > 
> > > > 
> > > > On Thu, 20 Aug 2020, Mrinal Pandey wrote:
> > > > 
> > > > > On 20/08/09 12:52PM, Mrinal Pandey wrote:
> > > > > > On 20/08/04 09:37PM, Lukas Bulwahn wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > On Tue, 4 Aug 2020, Mrinal Pandey wrote:
> > > > > > > 
> > > > > > > > On 20/08/03 12:59PM, Lukas Bulwahn wrote:
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > On Mon, 3 Aug 2020, Mrinal Pandey wrote:
> > > > > > > > > 
> > > > > > > > > > The diff content includes the SPDX licensing information but excludes the
> > > > > > > > > > shebang when a change is made to a script file in commit 37f8173dd849
> > > > > > > > > > ("locking/atomics: Flip fallbacks and  instrumentation") and commit
> > > > > > > > > > 075c8aa79d54 ("selftests: forwarding: tc_actions.sh: add matchall mirror
> > > > > > > > > > test"). In these cases checkpatch issues a false positive warning:
> > > > > > > > > > "Misplaced SPDX-License-Identifier tag - use line 1 instead".
> > > > > > > > > > 
> > > > > > > > > > Currently, if checkpatch finds a shebang in line 1, it expects the
> > > > > > > > > > license identifier in line 2. However, this doesn't work when a shebang
> > > > > > > > > > isn't found on the line 1.
> > > > > > > > > 
> > > > > > > > > It does not work when the diff does not contain line 1, but only line 2,
> > > > > > > > > because then the shebang check for line 1 cannot work.
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > I noticed this false positive, while running checkpatch on the set of
> > > > > > > > > > commits from v5.7 to v5.8-rc1 of the kernel, on the said commits.
> > > > > > > > > > This false positive exists in checkpatch since commit a8da38a9cf0e
> > > > > > > > > > ("checkpatch: add test for SPDX-License-Identifier on wrong line #")
> > > > > > > > > > when the corresponding rule was first added.
> > > > > > > > > > 
> > > > > > > > > > The alternatives considered to improve this check were looking the file
> > > > > > > > > > to be a script by either examining the file extension or file permissions.
> > > > > > > > > >
> > > > > > > > > 
> > > > > > > > > Make this sentence shorter. Try.
> > > > > > > > >  
> > > > > > > > > > The evaluation on former option resulted in 120 files which had a shebang
> > > > > > > > > > in the first line but no file extension. This didn't look like a promising
> > > > > > > > > > result and hence I dropped the idea of using this approach.
> > > > > > > > > > 
> > > > > > > > > > The evaluation on the latter approach shows that there are 53 files in the
> > > > > > > > > > kernel which have an executable bit set but don't have a shebang in the
> > > > > > > > > > first line.
> > > > > > > > > > 
> > > > > > > > > > At the first sight on these 53 files, it seems that they either have a
> > > > > > > > > > wrong file permission set or could be reasonably extended with a shebang
> > > > > > > > > > and SPDX license information. Thus, further cleanup in the repository
> > > > > > > > > > would make the latter approach to work even more precisely.
> > > > > > > > > > 
> > > > > > > > > > Hence, I chose to check the file permissions to determine if the file is a
> > > > > > > > > > script and notify checkpatch to expect SPDX on second line for such files.
> > > > > > > > > >
> > > > > > > > > 
> > > > > > > > > There is no notification here. Think about better wording.
> > > > > > > > >  
> > > > > > > > > > Signed-off-by: Mrinal Pandey <mrinalmni@gmail.com>
> > > > > > > > > > ---
> > > > > > > > > >  scripts/checkpatch.pl | 3 +++
> > > > > > > > > >  1 file changed, 3 insertions(+)
> > > > > > > > > > 
> > > > > > > > > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > > > > > > > > index 4c820607540b..bae1dd824518 100755
> > > > > > > > > > --- a/scripts/checkpatch.pl
> > > > > > > > > > +++ b/scripts/checkpatch.pl
> > > > > > > > > > @@ -3166,6 +3166,9 @@ sub process {
> > > > > > > > > >  		}
> > > > > > > > > >  
> > > > > > > > > >  # check for using SPDX license tag at beginning of files
> > > > > > > > > > +		if ($line =~ /^index\ .*\.\..*\ .*[7531]\d{0,2}$/) {
> > > > > > > > > > +			$checklicenseline = 2;
> > > > > > > > > > +		}
> > > > > > > > > 
> > > > > > > > > That check looks good now.
> > > > > > > > > 
> > > > > > > > > >  		if ($realline == $checklicenseline) {
> > > > > > > > > >  			if ($rawline =~ /^[ \+]\s*\#\!\s*\//) {
> > > > > > > > > >  				$checklicenseline = 2;
> > > > > > > > > 
> > > > > > > > > This is probably broken now. It should check for shebang in line 1 and 
> > > > > > > > > then set checklicenseline to line 2, right?
> > > > > > > > 
> > > > > > > > Sir,
> > > > > > > > 
> > > > > > > > Should we remove this check? Earlier when I checked for file extension
> > > > > > > > we had 120 cases where this check was also needed but now we have a
> > > > > > > > better heuristic which is going to work for all cases where license
> > > > > > > > should be on line 2 irrespective of the fact that we know the first line
> > > > > > > > or not.
> > > > > > > >
> > > > > > > 
> > > > > > > Are you sure about that? Where is the evaluation that proves your point?
> > > > > > > 
> > > > > > > E.g., are all files that contain a shebang really with an executable flag?
> > > > > > > 
> > > > > > > Which commands did you run to check this?
> > > > > > >  
> > > > > > > > If I am missing out on something and we should not be removing this check,
> > > > > > > > then I suggest placing the new heuristics below this block so that it doesn't
> > > > > > > > interfere with the existing logic.
> > > > > > > > 
> > > > > > > > Please let me know which path should I go about and then I shall resend
> > > > > > > > the patch with the modified commit message.
> > > > > > > > 
> > > > > > > 
> > > > > > > Think about the strengths and weaknesses of the potential solutions, then 
> > > > > > > show with some commands (as I did for example, for finding the first 
> > > > > > > lines previously) that you can show that it practically makes a 
> > > > > > > difference and you can numbers on those differences.
> > > > > > > 
> > > > > > > When you did that, send a new patch.
> > > > > > > 
> > > > > > > Lukas
> > > > > > > 
> > > > > > Sir,
> > > > > > 
> > > > > > I ran the evaluation as:
> > > > > > 
> > > > > > mrinalpandey@mrinalpandey:~/linux/linux$ cat get_permissions.sh
> > > > > > #!/bin/bash
> > > > > > 
> > > > > > for file in $(git ls-files)
> > > > > > do
> > > > > >         permissions="$(stat -c "%a %n" $file)"
> > > > > >         echo "$permissions"
> > > > > > done
> > > > > > 
> > > > > > mrinalpandey@mrinalpandey:~/linux/linux$ sh get_permissions.sh | grep ^[7531] > temp
> > > > > > 
> > > > > > mrinalpandey@mrinalpandey:~/linux/linux$ cut -d ' ' -f 2 temp > executables
> > > > > > 
> > > > > > mrinalpandey@mrinalpandey:~/linux/linux$ cat first_line.sh
> > > > > > #!/bin/bash
> > > > > > file="executables"
> > > > > > while IFS= read -r line
> > > > > > do
> > > > > >         firstline=`head -n 1 $line`
> > > > > >         printf '%s:%s\n' "$firstline" "$line"
> > > > > > done <"$file"
> > > > > > 
> > > > > > mrinalpandey@mrinalpandey:~/linux/linux$ cat executables | wc -l
> > > > > > 611
> > > > > > 
> > > > > > mrinalpandey@mrinalpandey:~/linux/linux$ sh first_line.sh | grep ^#! | wc -l
> > > > > > head: error reading 'scripts/dtc/include-prefixes/arc': Is a directory
> > > > > > head: error reading 'scripts/dtc/include-prefixes/arm': Is a directory
> > > > > > head: error reading 'scripts/dtc/include-prefixes/arm64': Is a directory
> > > > > > head: error reading 'scripts/dtc/include-prefixes/c6x': Is a directory
> > > > > > head: error reading 'scripts/dtc/include-prefixes/dt-bindings': Is a directory
> > > > > > head: error reading 'scripts/dtc/include-prefixes/h8300': Is a directory
> > > > > > head: error reading 'scripts/dtc/include-prefixes/microblaze': Is a directory
> > > > > > head: error reading 'scripts/dtc/include-prefixes/mips': Is a directory
> > > > > > head: error reading 'scripts/dtc/include-prefixes/nios2': Is a directory
> > > > > > head: error reading 'scripts/dtc/include-prefixes/openrisc': Is a directory
> > > > > > head: error reading 'scripts/dtc/include-prefixes/powerpc': Is a directory
> > > > > > head: error reading 'scripts/dtc/include-prefixes/sh': Is a directory
> > > > > > head: error reading 'scripts/dtc/include-prefixes/xtensa': Is a directory
> > > > > > 540
> > > > > > 
> > > > > > We can see that there are 71 files where the executable bit is set but
> > > > > > the first line is not a shebang. These include 13 directories which
> > > > > > throw the error above. Remaining 58 files(earlier the number was 53)
> > > > > > could be cleaned so that this heuristic works better as we saw. So, by
> > > > > > checking only for the executable bit we can say that license should be
> > > > > > on second line, we probably don't need to check for the shebang on line
> > > > > > 1.
> > > > > > Please let me know if the evaluation makes sense.
> > > > > >
> > > > 
> > > > This evaluation makes sense to find the cases that should be cleaned up.
> > > > 
> > > > Either the executable flag is simply set wrongly and should be dropped or 
> > > > it is actually a script and should get a shebang in the beginning.
> > > > 
> > > > I actually already started cleaning up. See:
> > > > 
> > > > https://lore.kernel.org/lkml/20200819081808.26796-1-lukas.bulwahn@gmail.com/
> > > > 
> > > > We can discuss how to continue this cleanup.
> > > >
> > > 
> > > Sir,
> > > 
> > > Sure. I would love to discuss that.
> > > Should I send cleanup related patches directly to lkml or do they need to
> > > go through you, Shuah ma'am and mentee list first?
> > > 
> > > > However, you cannot use this evaluation to say the checking the executable 
> > > > bit is enough, simply as from this evaluation, you do not know how many 
> > > > files have a shebang in the first line and are not set with executable 
> > > > flag. For those files, we expect the SPDX identifier in the second line 
> > > > but your suggested approach would expect it in the first line.
> > > > 
> > > > So we need an evaluation to find out how many cases of that situation 
> > > > exist?
> > > > 
> > > > Then, can we easily fix those as well?
> > > 
> > > Here is the evaluation I ran to find such cases:
> > > 
> > > mrinalpandey@mrinalpandey:~/linux/linux$ cat first_lines.sh
> > > #!/bin/bash
> > > 
> > > for file in $(git ls-files)
> > > do
> > > 	permissions="$(stat -c '%a %n' $file)"
> > > 	firstline="$(head -n 1 $file)"
> > > 	echo "$permissions : $firstline"
> > > done
> > > mrinalpandey@mrinalpandey:~/linux/linux$ sh first_lines.sh | grep ^[642].*#! | wc -l
> > > head: error reading 'scripts/dtc/include-prefixes/arc': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/arm': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/arm64': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/c6x': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/dt-bindings': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/h8300': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/microblaze': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/mips': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/nios2': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/openrisc': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/powerpc': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/sh': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/xtensa': Is a directory
> > > 82
> > > 
> > > The last line in these 82 lines is "Binary file (standard input) matches" so we have 81 instances
> > > where we have shebang in the first line but the file is non-executable.
> > >
> > 
> > How about removing the symbolic link files and then sharing that list?
> 
> Sir,
> 
> I couldn't find any symbolic links in these 81 files. The list is
> attached herewith.
> The list is in the format:
> <permissions> <file-name> : <first-line>
> 
> Thank you.
> > 
> > Lukas

> 664 Documentation/admin-guide/aoe/autoload.sh : #!/bin/sh
> 664 Documentation/admin-guide/aoe/status.sh : #! /bin/sh
> 664 Documentation/arm64/kasan-offsets.sh : #!/bin/sh
> 664 Documentation/firmware_class/hotplug-script : #!/bin/sh
> 664 Documentation/networking/device_drivers/atm/cxacru-cf.py : #!/usr/bin/env python
> 664 Documentation/s390/config3270.sh : #!/bin/sh
> 664 Documentation/sphinx/parallel-wrapper.sh : #!/bin/sh
> 664 Documentation/trace/postprocess/decode_msr.py : #!/usr/bin/python
> 664 Documentation/trace/postprocess/trace-pagealloc-postprocess.pl : #!/usr/bin/perl
> 664 Documentation/trace/postprocess/trace-vmscan-postprocess.pl : #!/usr/bin/perl
> 664 arch/alpha/kernel/syscalls/syscallhdr.sh : #!/bin/sh
> 664 arch/alpha/kernel/syscalls/syscalltbl.sh : #!/bin/sh
> 664 arch/arm/boot/install.sh : #!/bin/sh
> 664 arch/arm/crypto/poly1305-armv4.pl : #!/usr/bin/env perl
> 664 arch/arm/crypto/sha256-armv4.pl : #!/usr/bin/env perl
> 664 arch/arm/crypto/sha512-armv4.pl : #!/usr/bin/env perl
> 664 arch/arm/tools/gen-mach-types : #!/bin/awk
> 664 arch/arm/tools/syscallhdr.sh : #!/bin/sh
> 664 arch/arm/tools/syscallnr.sh : #!/bin/sh
> 664 arch/arm/tools/syscalltbl.sh : #!/bin/sh
> 664 arch/arm64/boot/install.sh : #!/bin/sh
> 664 arch/arm64/crypto/poly1305-armv8.pl : #!/usr/bin/env perl
> 664 arch/arm64/crypto/sha512-armv8.pl : #! /usr/bin/env perl
> 664 arch/ia64/install.sh : #!/bin/sh
> 664 arch/ia64/kernel/syscalls/syscallhdr.sh : #!/bin/sh
> 664 arch/ia64/kernel/syscalls/syscalltbl.sh : #!/bin/sh
> 664 arch/ia64/scripts/unwcheck.py : #!/usr/bin/python
> 664 arch/m68k/install.sh : #!/bin/sh
> 664 arch/m68k/kernel/syscalls/syscallhdr.sh : #!/bin/sh
> 664 arch/m68k/kernel/syscalls/syscalltbl.sh : #!/bin/sh
> 664 arch/microblaze/kernel/syscalls/syscallhdr.sh : #!/bin/sh
> 664 arch/microblaze/kernel/syscalls/syscalltbl.sh : #!/bin/sh
> 664 arch/mips/crypto/poly1305-mips.pl : #!/usr/bin/env perl
> 664 arch/mips/kernel/syscalls/syscallhdr.sh : #!/bin/sh
> 664 arch/mips/kernel/syscalls/syscallnr.sh : #!/bin/sh
> 664 arch/mips/kernel/syscalls/syscalltbl.sh : #!/bin/sh
> 664 arch/nios2/boot/install.sh : #!/bin/sh
> 664 arch/parisc/boot/install.sh : #!/bin/sh
> 664 arch/parisc/install.sh : #!/bin/sh
> 664 arch/parisc/kernel/syscalls/syscallhdr.sh : #!/bin/sh
> 664 arch/parisc/kernel/syscalls/syscalltbl.sh : #!/bin/sh
> 664 arch/parisc/nm : #!/bin/sh
> 664 arch/powerpc/boot/install.sh : #!/bin/sh
> 664 arch/powerpc/kernel/prom_init_check.sh : #!/bin/sh
> 664 arch/powerpc/kernel/syscalls/syscallhdr.sh : #!/bin/sh
> 664 arch/powerpc/kernel/syscalls/syscalltbl.sh : #!/bin/sh
> 664 arch/powerpc/kernel/systbl_chk.sh : #!/bin/sh
> 664 arch/riscv/boot/install.sh : #!/bin/sh
> 664 arch/s390/boot/install.sh : #!/bin/sh
> 664 arch/sh/boot/compressed/install.sh : #!/bin/sh
> 664 arch/sh/kernel/syscalls/syscallhdr.sh : #!/bin/sh
> 664 arch/sh/kernel/syscalls/syscalltbl.sh : #!/bin/sh
> 664 arch/sh/tools/gen-mach-types : #!/bin/awk
> 664 arch/sparc/boot/install.sh : #!/bin/sh
> 664 arch/sparc/kernel/syscalls/syscallhdr.sh : #!/bin/sh
> 664 arch/sparc/kernel/syscalls/syscalltbl.sh : #!/bin/sh
> 664 arch/sparc/vdso/checkundef.sh : #!/bin/sh
> 664 arch/x86/boot/genimage.sh : #!/bin/sh
> 664 arch/x86/boot/install.sh : #!/bin/sh
> 664 arch/x86/crypto/poly1305-x86_64-cryptogams.pl : #!/usr/bin/env perl
> 664 arch/x86/entry/syscalls/syscallhdr.sh : #!/bin/sh
> 664 arch/x86/entry/syscalls/syscalltbl.sh : #!/bin/bash
> 664 arch/x86/kernel/cpu/mkcapflags.sh : #!/bin/sh
> 664 arch/x86/tools/gen-insn-attr-x86.awk : #!/bin/awk -f
> 664 arch/x86/tools/objdump_reformat.awk : #!/bin/awk -f
> 664 arch/x86/um/vdso/checkundef.sh : #!/bin/sh
> 664 arch/xtensa/kernel/syscalls/syscallhdr.sh : #!/bin/sh
> 664 arch/xtensa/kernel/syscalls/syscalltbl.sh : #!/bin/sh
> 664 drivers/block/paride/mkd : #!/bin/bash
> 664 drivers/crypto/vmx/aesp8-ppc.pl : #! /usr/bin/env perl
> 664 drivers/crypto/vmx/ghashp8-ppc.pl : #!/usr/bin/env perl
> 664 drivers/crypto/vmx/ppc-xlate.pl : #!/usr/bin/env perl
> 664 drivers/scsi/script_asm.pl : #!/usr/bin/perl -s
> 664 drivers/usb/serial/ezusb_convert.pl : #! /usr/bin/perl -w
> 664 samples/bpf/lwt_len_hist.sh : #!/bin/bash
> 664 samples/bpf/test_lwt_bpf.sh : #!/bin/bash
> 664 scripts/atomic/gen-atomics.sh : #!/bin/sh
> 664 scripts/gcc-plugins/gen-random-seed.sh : #!/bin/sh
> 664 scripts/nsdeps : #!/bin/sh
> 664 scripts/spdxcheck-test.sh : #!/bin/sh
> 664 scripts/xen-hypercalls.sh : #!/bin/sh

Sir,

I was wondering how should I go about the original patch? Should I keep
both the checks or just the executable one and let be the patch as it is
as further cleanup can lead to better performance as we have already
seen with the evaluations.

Also, I have completed all the tasks related to this project on Community
Bridge's site. What are the next steps of mentorship? By when can I
expect the results sir/ma'am?

Thank you.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: Improve SPDX license identifier check for script files
  2020-09-06  4:27                   ` Mrinal Pandey
@ 2020-09-07  7:10                     ` Lukas Bulwahn
  2020-09-11 10:16                       ` Mrinal Pandey
  0 siblings, 1 reply; 19+ messages in thread
From: Lukas Bulwahn @ 2020-09-07  7:10 UTC (permalink / raw)
  To: Mrinal Pandey; +Cc: Linux-kernel-mentees

> I was wondering how should I go about the original patch? Should I keep
> both the checks or just the executable one and let be the patch as it is
> as further cleanup can lead to better performance as we have already
> seen with the evaluations.
>

Kees Cook and Andrew Morton pointed out that a tool may not rely on
the executable flag. So, checkpatch.pl may not rely on the executable
flag; the patch we created here is misdesigned, now we know that.

> Also, I have completed all the tasks related to this project on Community
> Bridge's site. What are the next steps of mentorship? By when can I
> expect the results sir/ma'am?
>

Mentorship can begin by October 1st. We need to develop a project plan
for the mentorship, you are in the lead to write that. We need a
project plan for the mentorship to start.

In the process of the checkpatch.pl patch creation, we identified that
we can check if we find any invocations that rely on the executable
flag. This could be a first set of tasks to do in the mentorship.

Maybe, you can find another aspect to look into the checkpatch.pl
evaluation? Maybe the check for the line of the SPDX-Identifier was
not the best case to start with, because it is more complex than
initially thought.


Lukas
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: Improve SPDX license identifier check for script files
  2020-09-07  7:10                     ` Lukas Bulwahn
@ 2020-09-11 10:16                       ` Mrinal Pandey
  2020-09-11 10:33                         ` Lukas Bulwahn
  0 siblings, 1 reply; 19+ messages in thread
From: Mrinal Pandey @ 2020-09-11 10:16 UTC (permalink / raw)
  To: Lukas Bulwahn, skhan, Linux-kernel-mentees, mrinalmni


[-- Attachment #1.1: Type: text/plain, Size: 1835 bytes --]

On 20/09/07 09:10AM, Lukas Bulwahn wrote:
> > I was wondering how should I go about the original patch? Should I keep
> > both the checks or just the executable one and let be the patch as it is
> > as further cleanup can lead to better performance as we have already
> > seen with the evaluations.
> >
> 
> Kees Cook and Andrew Morton pointed out that a tool may not rely on
> the executable flag. So, checkpatch.pl may not rely on the executable
> flag; the patch we created here is misdesigned, now we know that.

Sir,

Yes.
> 
> > Also, I have completed all the tasks related to this project on Community
> > Bridge's site. What are the next steps of mentorship? By when can I
> > expect the results sir/ma'am?
> >
> 
> Mentorship can begin by October 1st. We need to develop a project plan
> for the mentorship, you are in the lead to write that. We need a
> project plan for the mentorship to start.
> 
Eagerly looking forward to writing the plan. How, when and to whom am I
required to submit this plan and what should it broadly contain?

> In the process of the checkpatch.pl patch creation, we identified that
> we can check if we find any invocations that rely on the executable
> flag. This could be a first set of tasks to do in the mentorship.
>
Yes, sure sir.

> Maybe, you can find another aspect to look into the checkpatch.pl
> evaluation? Maybe the check for the line of the SPDX-Identifier was
> not the best case to start with, because it is more complex than
> initially thought.
>
I suggested few patches before, guess they can sip into mentorship if
you find them fit enough.
Apart from those I have a few other observations on checkpatch.pl and I
am continuously evaluating the output file for more.
I will share these evaluations on a different thread.
> 
> Lukas

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Linux-kernel-mentees] [PATCH] checkpatch: Improve SPDX license identifier check for script files
  2020-09-11 10:16                       ` Mrinal Pandey
@ 2020-09-11 10:33                         ` Lukas Bulwahn
  0 siblings, 0 replies; 19+ messages in thread
From: Lukas Bulwahn @ 2020-09-11 10:33 UTC (permalink / raw)
  To: Mrinal Pandey; +Cc: Linux-kernel-mentees



On Fri, 11 Sep 2020, Mrinal Pandey wrote:

> On 20/09/07 09:10AM, Lukas Bulwahn wrote:
> > > I was wondering how should I go about the original patch? Should I keep
> > > both the checks or just the executable one and let be the patch as it is
> > > as further cleanup can lead to better performance as we have already
> > > seen with the evaluations.
> > >
> > 
> > Kees Cook and Andrew Morton pointed out that a tool may not rely on
> > the executable flag. So, checkpatch.pl may not rely on the executable
> > flag; the patch we created here is misdesigned, now we know that.
> 
> Sir,
> 
> Yes.
> > 
> > > Also, I have completed all the tasks related to this project on Community
> > > Bridge's site. What are the next steps of mentorship? By when can I
> > > expect the results sir/ma'am?
> > >
> > 
> > Mentorship can begin by October 1st. We need to develop a project plan
> > for the mentorship, you are in the lead to write that. We need a
> > project plan for the mentorship to start.
> > 
> Eagerly looking forward to writing the plan. How, when and to whom am I
> required to submit this plan and what should it broadly contain?
>

A rough plan is required for the beginning of the mentorship, if you would 
like to start on October 1st.

Simply send the plan as text email or share it as a plain Google Doc 
document with me.

I should describe buckets of tasks around checkpatch.pl improvements that 
you see that you can work on.

Just as a hint:

You should have the script for evaluating checkpatch.pl.
If you find many false positives, you can try to improve the check.
If you find many same true positives, you can try to implement a "fix 
automation" in checkpatch.pl.

The end goal is to have a setup/config where we have very few false 
positives and very much fix automation.
Then, we can report back the checkpatch.pl findings for the patches sent 
to the mailing list and integrate the reliable fix automation into
tools for maintainers to pick up patches, such as b4, to quickly fix 
patches on the fly when integrating.

Lukas
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-09-11 10:34 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-03  7:58 [Linux-kernel-mentees] [PATCH] checkpatch: Improve SPDX license identifier check for script files Mrinal Pandey
2020-08-03 10:59 ` Lukas Bulwahn
2020-08-04 15:56   ` Mrinal Pandey
2020-08-04 19:37     ` Lukas Bulwahn
2020-08-09  7:22       ` Mrinal Pandey
2020-08-20  4:42         ` Mrinal Pandey
2020-08-22  8:24           ` Lukas Bulwahn
2020-08-22 19:25             ` Lukas Bulwahn
2020-08-24  8:35               ` Mrinal Pandey
2020-08-24  8:56                 ` Lukas Bulwahn
2020-08-25  4:56                   ` Mrinal Pandey
2020-08-25  5:54                     ` Lukas Bulwahn
2020-08-24  8:23             ` Mrinal Pandey
2020-08-24  8:54               ` Lukas Bulwahn
2020-08-25  4:53                 ` Mrinal Pandey
2020-09-06  4:27                   ` Mrinal Pandey
2020-09-07  7:10                     ` Lukas Bulwahn
2020-09-11 10:16                       ` Mrinal Pandey
2020-09-11 10:33                         ` Lukas Bulwahn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).