linux-kernel-mentees.lists.linuxfoundation.org archive mirror
 help / color / mirror / Atom feed
* Re: [Linux-kernel-mentees] Linux kernel checkpatch.pl mentorship
       [not found] ` <alpine.DEB.2.21.2009110925160.9220@felia>
@ 2020-09-12  9:09   ` Dwaipayan Ray
  2020-09-12 11:03     ` Lukas Bulwahn
  0 siblings, 1 reply; 21+ messages in thread
From: Dwaipayan Ray @ 2020-09-12  9:09 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: linux-kernel-mentees


[-- Attachment #1.1: Type: text/plain, Size: 3908 bytes --]

On Fri, Sep 11, 2020 at 12:59 PM Lukas Bulwahn <lukas.bulwahn@gmail.com>
wrote:

> Dear Dwaipayan,
>
>
> The zeroth task is to learn suitable netiquette for the communication with
> the kernel community.
>
> First, please do not top-post.
>
>     A: Because we read from top to bottom, left to right.
>     Q: Why should I start my reply below the quoted text?
>
>     A: Because it messes up the order in which people normally read text.
>     Q: Why is top-posting such a bad thing?
>
>     A: The lost context.
>     Q: What makes top-posted replies harder to read than bottom-posted?
>
>     A: Yes.
>     Q: Should I trim down the quoted part of an email to which I'm
> replying?
>
> Second, please always CC: linux-kernel-mentees@lists.linuxfoundation.org.
>
> Third, set up your email client according to the kernel community rules.
>
>
> Then, the first task is to run checkpatch.pl on a few kernel patches and
> collect the results. When you have that, please share your script with
> me, e.g., in a github repository.
>
>
> Hints to the first task:
>
> Can you create a list of all non-merge commits that were added in the
> version v5.8 of the kernel, i.e., all non-merge commits that are in v5.8
> and not already in v5.7?
>
> Can you share the script/command you executed and the resulting list on
> github?
>
> Can you run your script on all commits of this list above and record
> all checkpatch.pl reports, and store them in your github repository?
>
> Can you suggest ideas how to aggregate the findings and create a
> statistics? For example: Which type of error is reported most?
> Can you implement that idea?
>
>
> I also suggest to have a look at the options ./scripts/checkpatch.pl
> --list-types and ./scripts/checkpatch.pl --show-types. The option
> --show-types changes the output of checkpatch.pl to list type
> identifiers,
> so it is easier to parse and aggregate the output.
>
> Please also share the script you create for that purpose on your
> github repository.
>
>
> The second task is to pick one warning that appears often and improve
> checkpatch.pl to handle that better and get it accepted by the kernel
> community.
>
> Hints to the second task follow when the first task is solved.
>
> If you fail on any of those tasks, you are out of the selection process.
>
> I could implement that with just a few lines of code changes, but please
> do not underestimate the learning curve here. I hope you are very fit in
> Perl, that is required for this project.
>
>
> Lukas
>


Hello Sir,
I have gone through the zeroth task and I am aware of the mailing rules
now.
Also I have implemented the first task and I would like you to review it.

The task was to run checkpatch.pl on some commits and aggregate the
reports.

The first subtask was to collect the non merge commits between versions 5.7
and 5.8.
I aggregated the commit hashes and author names into a single file using
git's log
and pretty format directives.

The next subtask was to run checkpatch.pl on all the given commits.
I wrote a perl script to this effect which reads the commits stored in the
file and runs the checkpatch.pl script with the commit hash. Also I used
the
--show-types directives in this stage which allowed me easier collection of
warning and error identifiers.

The final subtask was to aggregate and parse the data. Looking at the
checkpatch's output
format, I determined it was enough to parse only the first two tokens from
each line.
I calculated three possibilities:  "WARNING:{warning_identifier}",
"ERROR:{error_identifier}",
"Commit {commit_abbrev_hash}", and aggregated these values.

Finally I used them to find total commits read, total errors, total
warnings,
and the most frequent warnings and errors from checkpatch's output.

I have uploaded the scripts and output files to
https://github.com/raydwaipayan/lkm-task-1

Hoping to get your review soon.

Thanking you,
Dwaipayan.

[-- Attachment #1.2: Type: text/html, Size: 5552 bytes --]

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux-kernel-mentees] Linux kernel checkpatch.pl mentorship
  2020-09-12  9:09   ` [Linux-kernel-mentees] Linux kernel checkpatch.pl mentorship Dwaipayan Ray
@ 2020-09-12 11:03     ` Lukas Bulwahn
  2020-09-12 12:08       ` Dwaipayan Ray
  0 siblings, 1 reply; 21+ messages in thread
From: Lukas Bulwahn @ 2020-09-12 11:03 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees

[-- Attachment #1: Type: text/plain, Size: 5433 bytes --]



On Sat, 12 Sep 2020, Dwaipayan Ray wrote:

> 
> 
> On Fri, Sep 11, 2020 at 12:59 PM Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:
>       Dear Dwaipayan,
> 
> 
>       The zeroth task is to learn suitable netiquette for the communication with
>       the kernel community.
> 
>       First, please do not top-post.
> 
>           A: Because we read from top to bottom, left to right.
>           Q: Why should I start my reply below the quoted text?
> 
>           A: Because it messes up the order in which people normally read text.
>           Q: Why is top-posting such a bad thing?
> 
>           A: The lost context.
>           Q: What makes top-posted replies harder to read than bottom-posted?
> 
>           A: Yes.
>           Q: Should I trim down the quoted part of an email to which I'm
>       replying?
> 
>       Second, please always CC: linux-kernel-mentees@lists.linuxfoundation.org.
> 
>       Third, set up your email client according to the kernel community rules.
> 
> 
>       Then, the first task is to run checkpatch.pl on a few kernel patches and
>       collect the results. When you have that, please share your script with
>       me, e.g., in a github repository.
> 
> 
>       Hints to the first task:
> 
>       Can you create a list of all non-merge commits that were added in the
>       version v5.8 of the kernel, i.e., all non-merge commits that are in v5.8
>       and not already in v5.7?
> 
>       Can you share the script/command you executed and the resulting list on
>       github?
> 
>       Can you run your script on all commits of this list above and record
>       all checkpatch.pl reports, and store them in your github repository?
> 
>       Can you suggest ideas how to aggregate the findings and create a
>       statistics? For example: Which type of error is reported most?
>       Can you implement that idea?
> 
> 
>       I also suggest to have a look at the options ./scripts/checkpatch.pl
>       --list-types and ./scripts/checkpatch.pl --show-types. The option
>       --show-types changes the output of checkpatch.pl to list type identifiers,
>       so it is easier to parse and aggregate the output.
> 
>       Please also share the script you create for that purpose on your
>       github repository.
> 
> 
>       The second task is to pick one warning that appears often and improve
>       checkpatch.pl to handle that better and get it accepted by the kernel
>       community.
> 
>       Hints to the second task follow when the first task is solved.
> 
>       If you fail on any of those tasks, you are out of the selection process.
> 
>       I could implement that with just a few lines of code changes, but please
>       do not underestimate the learning curve here. I hope you are very fit in
>       Perl, that is required for this project.
> 
> 
>       Lukas
> 
> 
> 
> Hello Sir,
> I have gone through the zeroth task and I am aware of the mailing rules now. 
> Also I have implemented the first task and I would like you to review it.
>

Hmm, your email client still seems to be broken :( If you answer to my 
email, it should use ">" not tabs. Maybe you can fix that.

> The task was to run checkpatch.pl on some commits and aggregate the reports. 
> 
> The first subtask was to collect the non merge commits between versions 5.7 and 5.8. 
> I aggregated the commit hashes and author names into a single file using git's log
> and pretty format directives.
> 
> The next subtask was to run checkpatch.pl on all the given commits.
> I wrote a perl script to this effect which reads the commits stored in the 
> file and runs the checkpatch.pl script with the commit hash. Also I used the 
> --show-types directives in this stage which allowed me easier collection of
> warning and error identifiers.
> 
> The final subtask was to aggregate and parse the data. Looking at the checkpatch's output 
> format, I determined it was enough to parse only the first two tokens from each line.
> I calculated three possibilities:  "WARNING:{warning_identifier}", "ERROR:{error_identifier}",
> "Commit {commit_abbrev_hash}", and aggregated these values.
> 
> Finally I used them to find total commits read, total errors, total warnings,
> and the most frequent warnings and errors from checkpatch's output. 
> 
> I have uploaded the scripts and output files to https://github.com/raydwaipayan/lkm-task-1
>

I looked at your scripts, I did not run them. They look as if they would 
do the job you claim they do. They are more complicated than needed, but 
it was not the task to find a simple solution. So, let us try them.

Please have a look at this patch:

https://lore.kernel.org/linux-kernel-mentees/20200912094826.150170-1-ayush@disroot.org/

The author states:

This issue was discovered through a thorough analysis of checkpatch.pl
errors and warnings of type GIT_COMMIT_ID on commits between v5.7 and 
v5.8.

Before applying this patch, checkpatch.pl reported 342 errors of type
GIT_COMMIT_ID. After applying patch, errors reduced to 284.


If your scripts work, you should be able to confirm the statement.

The tasks are:

1. Run your scripts and create a full statistics of all error types with 
their according count for v5.7..v5.8.

2. Apply the patch with git am.

3. Run your scripts again and create a new statistics.

4. Compare before and after

5. Make all results available on your github repository.


Lukas

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux-kernel-mentees] Linux kernel checkpatch.pl mentorship
  2020-09-12 11:03     ` Lukas Bulwahn
@ 2020-09-12 12:08       ` Dwaipayan Ray
  2020-09-12 12:21         ` Lukas Bulwahn
  0 siblings, 1 reply; 21+ messages in thread
From: Dwaipayan Ray @ 2020-09-12 12:08 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: linux-kernel-mentees


[-- Attachment #1.1: Type: text/plain, Size: 1439 bytes --]

> Hmm, your email client still seems to be broken :( If you answer to my
> email, it should use ">" not tabs. Maybe you can fix that.
>
> I looked at your scripts, I did not run them. They look as if they would
> do the job you claim they do. They are more complicated than needed, but
> it was not the task to find a simple solution. So, let us try them.
>
> Please have a look at this patch:
>
>
https://lore.kernel.org/linux-kernel-mentees/20200912094826.150170-1-ayush@disroot.org/
>
> The author states:
>
> This issue was discovered through a thorough analysis of checkpatch.pl
> errors and warnings of type GIT_COMMIT_ID on commits between v5.7 and
> v5.8.
>
> Before applying this patch, checkpatch.pl reported 342 errors of type
> GIT_COMMIT_ID. After applying patch, errors reduced to 284.
>
>
> If your scripts work, you should be able to confirm the statement.
>
> The tasks are:
>
> 1. Run your scripts and create a full statistics of all error types with
> their according count for v5.7..v5.8.
>
> 2. Apply the patch with git am.
>
> 3. Run your scripts again and create a new statistics.
>
> 4. Compare before and after
>
> 5. Make all results available on your github repository.
>

Hi,
I think I got my email client working now.

The last time checkpatch.pl took awfully long to run on so many commits. :(
So, it might take a while but I will report my findings on this as soon
it is done processing.

Thanks,
Dwaipayan.

[-- Attachment #1.2: Type: text/html, Size: 2114 bytes --]

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux-kernel-mentees] Linux kernel checkpatch.pl mentorship
  2020-09-12 12:08       ` Dwaipayan Ray
@ 2020-09-12 12:21         ` Lukas Bulwahn
  2020-09-13  8:16           ` Dwaipayan Ray
  0 siblings, 1 reply; 21+ messages in thread
From: Lukas Bulwahn @ 2020-09-12 12:21 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees

[-- Attachment #1: Type: text/plain, Size: 1676 bytes --]



On Sat, 12 Sep 2020, Dwaipayan Ray wrote:

> 
> > Hmm, your email client still seems to be broken :( If you answer to my
> > email, it should use ">" not tabs. Maybe you can fix that.>
> > I looked at your scripts, I did not run them. They look as if they would
> > do the job you claim they do. They are more complicated than needed, but
> > it was not the task to find a simple solution. So, let us try them.
> >
> > Please have a look at this patch:
> >
> > https://lore.kernel.org/linux-kernel-mentees/20200912094826.150170-1-ayush@disroot.org/
> >
> > The author states:
> >
> > This issue was discovered through a thorough analysis of checkpatch.pl
> > errors and warnings of type GIT_COMMIT_ID on commits between v5.7 and
> > v5.8.
> >
> > Before applying this patch, checkpatch.pl reported 342 errors of type
> > GIT_COMMIT_ID. After applying patch, errors reduced to 284.
> >
> >
> > If your scripts work, you should be able to confirm the statement.
> >
> > The tasks are:
> >
> > 1. Run your scripts and create a full statistics of all error types with
> > their according count for v5.7..v5.8.
> >
> > 2. Apply the patch with git am.
> >
> > 3. Run your scripts again and create a new statistics.
> >
> > 4. Compare before and after
> >
> > 5. Make all results available on your github repository.
> >
> 
> Hi,
> I think I got my email client working now.
> 

Looks better.

> The last time checkpatch.pl took awfully long to run on so many commits. :( 
> So, it might take a while but I will report my findings on this as soon 
> it is done processing.
>

I guess a bit computing power is required, SSDs, parallelization and 
multi-core systems help :)

Lukas

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux-kernel-mentees] Linux kernel checkpatch.pl mentorship
  2020-09-12 12:21         ` Lukas Bulwahn
@ 2020-09-13  8:16           ` Dwaipayan Ray
  2020-09-13 11:05             ` Lukas Bulwahn
  0 siblings, 1 reply; 21+ messages in thread
From: Dwaipayan Ray @ 2020-09-13  8:16 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: linux-kernel-mentees


[-- Attachment #1.1: Type: text/plain, Size: 2809 bytes --]

On Sat, Sep 12, 2020 at 5:51 PM Lukas Bulwahn <lukas.bulwahn@gmail.com>
wrote:
>
>
>
> On Sat, 12 Sep 2020, Dwaipayan Ray wrote:
>
> >
> > > Hmm, your email client still seems to be broken :( If you answer to my
> > > email, it should use ">" not tabs. Maybe you can fix that.>
> > > I looked at your scripts, I did not run them. They look as if they
would
> > > do the job you claim they do. They are more complicated than needed,
but
> > > it was not the task to find a simple solution. So, let us try them.
> > >
> > > Please have a look at this patch:
> > >
> > >
https://lore.kernel.org/linux-kernel-mentees/20200912094826.150170-1-ayush@disroot.org/
> > >
> > > The author states:
> > >
> > > This issue was discovered through a thorough analysis of checkpatch.pl
> > > errors and warnings of type GIT_COMMIT_ID on commits between v5.7 and
> > > v5.8.
> > >
> > > Before applying this patch, checkpatch.pl reported 342 errors of type
> > > GIT_COMMIT_ID. After applying patch, errors reduced to 284.
> > >
> > >
> > > If your scripts work, you should be able to confirm the statement.
> > >
> > > The tasks are:
> > >
> > > 1. Run your scripts and create a full statistics of all error types
with
> > > their according count for v5.7..v5.8.
> > >
> > > 2. Apply the patch with git am.
> > >
> > > 3. Run your scripts again and create a new statistics.
> > >
> > > 4. Compare before and after
> > >
> > > 5. Make all results available on your github repository.
> > >

> > The last time checkpatch.pl took awfully long to run on so many
commits. :(
> > So, it might take a while but I will report my findings on this as soon
> > it is done processing.
> >
>
> I guess a bit computing power is required, SSDs, parallelization and
> multi-core systems help :)
>

Hi,
So i managed to add parallelization in my script and the generation was a
lot faster
( brought down from >3 hours to <40 mins ).
The script is at:
https://github.com/raydwaipayan/lkm-task-1/blob/master/run_checkpatch.pl

As you told, I applied the patch and compared the statistics. In my case
however the statistics
vary. Before patching total no. of errors of type GIT_COMMIT_ID was 270,
and after patching,
the number went down to 251. This is in constrast to the author's finding
but the number
reduces, so the patch works apparently.

I am posting the diff of statistics before and after the patch:

99c99
< COMMIT_LOG_LONG_LINE: 1059
---
> COMMIT_LOG_LONG_LINE: 1057
126c126
< GIT_COMMIT_ID: 270
---
> GIT_COMMIT_ID: 251
142,143c142,143
< Warnings generated: 25661
< Errors generated: 4768
---
> Warnings generated: 25659
> Errors generated: 4749

As you can see, the warnings of type COMMIT_LOG_LONG_LINE decreased too.

The entire output and scripts can be found at:
https://github.com/raydwaipayan/lkm-task-1


Thanks,
Dwaipayan.

[-- Attachment #1.2: Type: text/html, Size: 4576 bytes --]

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux-kernel-mentees] Linux kernel checkpatch.pl mentorship
  2020-09-13  8:16           ` Dwaipayan Ray
@ 2020-09-13 11:05             ` Lukas Bulwahn
       [not found]               ` <CABJPP5BmRcC+OTSjuX_QrYononVq__DkhjGOgiKrP147MAXK+g@mail.gmail.com>
  0 siblings, 1 reply; 21+ messages in thread
From: Lukas Bulwahn @ 2020-09-13 11:05 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees

[-- Attachment #1: Type: text/plain, Size: 1037 bytes --]

> 
> As you told, I applied the patch and compared the statistics. In my case however the statistics 
> vary. Before patching total no. of errors of type GIT_COMMIT_ID was 270, and after patching,
> the number went down to 251. This is in constrast to the author's finding but the number 
> reduces, so the patch works apparently.
> 
> I am posting the diff of statistics before and after the patch:
> 
> 99c99
> < COMMIT_LOG_LONG_LINE: 1059
> ---
> > COMMIT_LOG_LONG_LINE: 1057
> 126c126
> < GIT_COMMIT_ID: 270
> ---
> > GIT_COMMIT_ID: 251
> 142,143c142,143
> < Warnings generated: 25661
> < Errors generated: 4768
> ---
> > Warnings generated: 25659
> > Errors generated: 4749
> 
> As you can see, the warnings of type COMMIT_LOG_LONG_LINE decreased too.
>

Can you share the errors that have changed after the patch was applied?

- Which two cases of COMMIT_LOG_LONG_LINE have disappeared?
- Which cases of GIT_COMMIT_ID have disappeared?
- Which cases of GIT_COMMIT_ID are new?
- Which cases of GIT_COMMIT_ID have changed?


Lukas

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux-kernel-mentees] Linux kernel checkpatch.pl mentorship
       [not found]                 ` <alpine.DEB.2.21.2009132015570.6806@felia>
@ 2020-09-13 18:23                   ` Dwaipayan Ray
  0 siblings, 0 replies; 21+ messages in thread
From: Dwaipayan Ray @ 2020-09-13 18:23 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: linux-kernel-mentees


[-- Attachment #1.1: Type: text/plain, Size: 1563 bytes --]

I missed to CC linux kernel mentees list in my previous mail. The contents
follow:

Posting the abbrev. commit hashes for the changed cases:

> Can you share the errors that have changed after the patch was applied?
>
> - Which two cases of COMMIT_LOG_LONG_LINE have disappeared?

The following commits have disappeared which earlier had a
COMMIT_LOG_LONG_LINE warning:
4714d13791f8
6536993371fa

> - Which cases of GIT_COMMIT_ID have disappeared?

The following commits which threw GIT_COMMIT_ID have disappeared:
0b987032f8b5
4557ac6b344b
ad0f75e5f57c
c3dbe541ef77
da785a87787c
e575fb9e76c8
0b22c25e1b81
ba8c90c61847
0ae705f3d2b2
490705888107
0828137e8f16
e9e81b634303
e9e4ef9116b1
68cda40d9f3c
1ca0c2f61211
6f8b12d661d0
97a37c919f62

> - Which cases of GIT_COMMIT_ID are new?

Only one case is new:
ebf57440ec59

> - Which cases of GIT_COMMIT_ID have changed?

As there can be multiple GIT_COMMIT_ID errors per commit hash processed, I
am
giving the change in occurrence per commit hash:

e9e4ef9116b1: 1-> 0
490705888107: 1-> 0
ba8c90c61847: 1-> 0
97a37c919f62: 1-> 0
e9e81b634303: 1-> 0
0828137e8f16: 1-> 0
5fdbe136ae19: 2-> 3
0b22c25e1b81: 1-> 0
e575fb9e76c8: 1-> 0
c3dbe541ef77: 1-> 0
ad0f75e5f57c: 1-> 0
e0a851fe6b9b: 2-> 1
0ae705f3d2b2: 1-> 0
064c73afe738: 3-> 2
4557ac6b344b: 1-> 0
6f8b12d661d0: 1-> 0
da785a87787c: 1-> 0
68cda40d9f3c: 1-> 0
1ca0c2f61211: 1-> 0
bcf41dc480b1: 4-> 2
0b987032f8b5: 1-> 0
ebf57440ec59: 0-> 1

The aggregation of the occurences will tell you that there has been a net
decrease of 19
GIT_COMMIT_ID errors.

Thanks,
Dwaipayan

[-- Attachment #1.2: Type: text/html, Size: 2041 bytes --]

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux-kernel-mentees] Linux kernel checkpatch.pl mentorship
       [not found]                 ` <alpine.DEB.2.21.2009132010300.6806@felia>
@ 2020-09-13 18:39                   ` Dwaipayan Ray
  2020-09-14  5:17                     ` Lukas Bulwahn
  0 siblings, 1 reply; 21+ messages in thread
From: Dwaipayan Ray @ 2020-09-13 18:39 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: linux-kernel-mentees


[-- Attachment #1.1: Type: text/plain, Size: 847 bytes --]

> You should also try to find a case where checkpatch.pl can be improved:
>
> - It is best if you look at the checkpatch.pl error/warning reports and
> check if you can find a class of typical false positives in the reported
> data. Then, think about how to make the pattern in checkpatch and propose
> a patch to checkpatch.pl on this mailing list; with the evaluation data
> backing that your patch really improves the situation. We will then test
> and check your patch as well.
>
> - If you really cannot find a typical false positive, maybe you can take
> the task from Ayush to fix checkpatch with git ranges?
>
Hi,
I will try to find and evaluate the possiblity of such a case, and report
my finding
as soon as I find something substantial.

I may also require some advice from you on the patch submission process.

Thanking you,
Dwaipayan

[-- Attachment #1.2: Type: text/html, Size: 1174 bytes --]

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux-kernel-mentees] Linux kernel checkpatch.pl mentorship
  2020-09-13 18:39                   ` Dwaipayan Ray
@ 2020-09-14  5:17                     ` Lukas Bulwahn
  2020-09-14 12:31                       ` Dwaipayan Ray
  0 siblings, 1 reply; 21+ messages in thread
From: Lukas Bulwahn @ 2020-09-14  5:17 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees

[-- Attachment #1: Type: text/plain, Size: 1047 bytes --]



On Mon, 14 Sep 2020, Dwaipayan Ray wrote:

> 
> 
> > You should also try to find a case where checkpatch.pl can be improved:
> >
> > - It is best if you look at the checkpatch.pl error/warning reports and
> > check if you can find a class of typical false positives in the reported
> > data. Then, think about how to make the pattern in checkpatch and propose
> > a patch to checkpatch.pl on this mailing list; with the evaluation data
> > backing that your patch really improves the situation. We will then test
> > and check your patch as well.
> >
> > - If you really cannot find a typical false positive, maybe you can take
> > the task from Ayush to fix checkpatch with git ranges?
> >
> Hi,I will try to find and evaluate the possiblity of such a case, and report my finding 
> as soon as I find something substantial.
> 
> I may also require some advice from you on the patch submission process.
> 

That is what a mentor is there for, but try hard to understand the 
warnings yourself first. This is a mentorship, not a tutorial.

Lukas

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux-kernel-mentees] Linux kernel checkpatch.pl mentorship
  2020-09-14  5:17                     ` Lukas Bulwahn
@ 2020-09-14 12:31                       ` Dwaipayan Ray
  2020-09-14 13:49                         ` Lukas Bulwahn
  0 siblings, 1 reply; 21+ messages in thread
From: Dwaipayan Ray @ 2020-09-14 12:31 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: linux-kernel-mentees


[-- Attachment #1.1: Type: text/plain, Size: 3766 bytes --]

On Mon, Sep 14, 2020 at 10:47 AM Lukas Bulwahn <lukas.bulwahn@gmail.com>
wrote:
>
>
>
> On Mon, 14 Sep 2020, Dwaipayan Ray wrote:
>
> >
> >
> > > You should also try to find a case where checkpatch.pl can be
improved:
> > >
> > > - It is best if you look at the checkpatch.pl error/warning reports
and
> > > check if you can find a class of typical false positives in the
reported
> > > data. Then, think about how to make the pattern in checkpatch and
propose
> > > a patch to checkpatch.pl on this mailing list; with the evaluation
data
> > > backing that your patch really improves the situation. We will then
test
> > > and check your patch as well.
> > >
> > > - If you really cannot find a typical false positive, maybe you can
take
> > > the task from Ayush to fix checkpatch with git ranges?
> > >
> > Hi,I will try to find and evaluate the possibiltiy of such a case, and
report my finding
> > as soon as I find something substantial.
> >
> > I may also require some advice from you on the patch submission process.
> >
>
> That is what a mentor is there for, but try hard to understand the
> warnings yourself first. This is a mentorship, not a tutorial.
>
> Lukas

Hi,
I looked into the checkpatch.pl 's output I collected. And I have found
some possible candidates
but I would like your opinion on them.

The first one is double warnings for the same kind of coding style error.
If an SPDX tag is in the following format: "// SPDX-License-Identifier:
GPL-2.0",
then checkpatch.pl generates two warnings:

1) Improper SPDX comment style for ... , please use '/*' instead.
2) Missing or malformed SPDX-License-Identifier tag in line 1.

Example: commit 726721a51838
Warnings generated:

WARNING:SPDX_LICENSE_TAG: Improper SPDX comment style for
'kernel/trace/trace_synth.h', please use '/*' instead
#4041: FILE: kernel/trace/trace_synth.h:1:
+// SPDX-License-Identifier: GPL-2.0

WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag
in line 1
#4041: FILE: kernel/trace/trace_synth.h:1:
+// SPDX-License-Identifier: GPL-2.0

In my opinion, only one warning should suffice if a '//' comment style is
used in spdx tag.
But I would like to know your view on it.

----------------------------------------------------------------------------------------------
The second candidate is related to the following warning:

WARNING:SPDX_LICENSE_TAG: DT binding documents should be licensed
(GPL-2.0-only OR BSD-2-Clause)
#110: FILE:
Documentation/devicetree/bindings/i2c/snps,designware-i2c.yaml:1:
+# SPDX-License-Identifier: GPL-2.0-only

In checkpath.pl line 3209:
 $fixed[$fixlinenr] =~ s/SPDX-License-Identifier:
.*/SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)/;

The desired fix is "(GPL-2.0-only OR BSD-2-Clause)" . Shouldn't the
identifier be "GPL-2.0-only"
or "BSD-2-Clause", that is either of them? Or was the former the decided
one?

----------------------------------------------------------------------------------------------
The third candidate is related to the warning:

WARNING:NO_AUTHOR_SIGN_OFF: Missing Signed-off-by: line by nominal patch
author ...

I found several such commits in which the author had used different mail
addresses in the
signed-off -by section, due to which this warning is generated.

An example is Commit dc5bdb68b5b3 .
Git log show:
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
....,.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

This is infact a common scenario. I easily found another commit
207324a321a8.
Git log shows:
Author: Minas Harutyunyan <Minas.Harutyunyan@synopsys.com>
...
Signed-off-by: Minas Harutyunyan <hminas@synopsys.com>

This warning could be avoided or at least handled better.


I would like to know if any of them can be worked on.

Thanks,
Dwaipayan.

[-- Attachment #1.2: Type: text/html, Size: 5594 bytes --]

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux-kernel-mentees] Linux kernel checkpatch.pl mentorship
  2020-09-14 12:31                       ` Dwaipayan Ray
@ 2020-09-14 13:49                         ` Lukas Bulwahn
  2020-09-14 15:39                           ` Dwaipayan Ray
  0 siblings, 1 reply; 21+ messages in thread
From: Lukas Bulwahn @ 2020-09-14 13:49 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees

[-- Attachment #1: Type: text/plain, Size: 4697 bytes --]



On Mon, 14 Sep 2020, Dwaipayan Ray wrote:

> 
> 
> 
> On Mon, Sep 14, 2020 at 10:47 AM Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:
> >
> >
> >
> > On Mon, 14 Sep 2020, Dwaipayan Ray wrote:
> >
> > >
> > >
> > > > You should also try to find a case where checkpatch.pl can be improved:
> > > >
> > > > - It is best if you look at the checkpatch.pl error/warning reports and
> > > > check if you can find a class of typical false positives in the reported
> > > > data. Then, think about how to make the pattern in checkpatch and propose
> > > > a patch to checkpatch.pl on this mailing list; with the evaluation data
> > > > backing that your patch really improves the situation. We will then test
> > > > and check your patch as well.
> > > >
> > > > - If you really cannot find a typical false positive, maybe you can take
> > > > the task from Ayush to fix checkpatch with git ranges?
> > > >
> > > Hi,I will try to find and evaluate the possibiltiy of such a case, and report my finding
> > > as soon as I find something substantial.
> > >
> > > I may also require some advice from you on the patch submission process.
> > >
> >
> > That is what a mentor is there for, but try hard to understand the
> > warnings yourself first. This is a mentorship, not a tutorial.
> >
> > Lukas
> Hi,
> I looked into the checkpatch.pl 's output I collected. And I have found some possible candidates
> but I would like your opinion on them.
> 
> The first one is double warnings for the same kind of coding style error.
> If an SPDX tag is in the following format: "// SPDX-License-Identifier: GPL-2.0",
> then checkpatch.pl generates two warnings:
> 
> 1) Improper SPDX comment style for ... , please use '/*' instead.
> 2) Missing or malformed SPDX-License-Identifier tag in line 1.
> 
> Example: commit 726721a51838
> Warnings generated:
> 
> WARNING:SPDX_LICENSE_TAG: Improper SPDX comment style for 'kernel/trace/trace_synth.h', please use '/*' instead
> #4041: FILE: kernel/trace/trace_synth.h:1:
> +// SPDX-License-Identifier: GPL-2.0
> 
> WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
> #4041: FILE: kernel/trace/trace_synth.h:1:
> +// SPDX-License-Identifier: GPL-2.0
> 
> In my opinion, only one warning should suffice if a '//' comment style is used in spdx tag. 
> But I would like to know your view on it.
> 
> ----------------------------------------------------------------------------------------------
> The second candidate is related to the following warning:
>

I do not think that this is a good candidate.
 
> WARNING:SPDX_LICENSE_TAG: DT binding documents should be licensed (GPL-2.0-only OR BSD-2-Clause)
> #110: FILE: Documentation/devicetree/bindings/i2c/snps,designware-i2c.yaml:1:
> +# SPDX-License-Identifier: GPL-2.0-only
> 
> In checkpath.pl line 3209:
>  $fixed[$fixlinenr] =~ s/SPDX-License-Identifier: .*/SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)/;
> 
> The desired fix is "(GPL-2.0-only OR BSD-2-Clause)" . Shouldn't the identifier be "GPL-2.0-only" 
> or "BSD-2-Clause", that is either of them? Or was the former the decided one?
>

Hmm, I do not think so. It is supposed to be dual-licensed with 
GPL-2.0-only and BSD-2-Clause. In SPDX, this is written exactly as 
described in the provided warning of checkpatch.pl.
  
> ----------------------------------------------------------------------------------------------
> The third candidate is related to the warning:
> 
> WARNING:NO_AUTHOR_SIGN_OFF: Missing Signed-off-by: line by nominal patch author ...
> 
> I found several such commits in which the author had used different mail addresses in the 
> signed-off -by section, due to which this warning is generated.
> 
> An example is Commit dc5bdb68b5b3 .
> Git log show:
> Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> ....,.
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> 
> This is infact a common scenario. I easily found another commit 207324a321a8.
> Git log shows:
> Author: Minas Harutyunyan <Minas.Harutyunyan@synopsys.com>
> ...
> Signed-off-by: Minas Harutyunyan <hminas@synopsys.com>
> 
> This warning could be avoided or at least handled better.
> 
> 
> I would like to know if any of them can be worked on.
> 

This last one might be good to look into.

But what is your specific solution you have in mind?

There is a file .mailmap in the repository that allows some kind of 
mapping. Maybe that is helpful.

I suggest that you describe your proposed change in a clear way.
Then, we can discuss if that change is reasonable or not.


Maybe you can continue to look at further potential candidates, just to 
see if there is another type for improvement.

Best regards,

Lukas

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux-kernel-mentees] Linux kernel checkpatch.pl mentorship
  2020-09-14 13:49                         ` Lukas Bulwahn
@ 2020-09-14 15:39                           ` Dwaipayan Ray
  2020-09-14 18:32                             ` Lukas Bulwahn
  0 siblings, 1 reply; 21+ messages in thread
From: Dwaipayan Ray @ 2020-09-14 15:39 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: linux-kernel-mentees


[-- Attachment #1.1: Type: text/plain, Size: 2217 bytes --]

> > The third candidate is related to the warning:
> >
> > WARNING:NO_AUTHOR_SIGN_OFF: Missing Signed-off-by: line by nominal
patch author ...
> >
> > I found several such commits in which the author had used different
mail addresses in the
> > signed-off -by section, due to which this warning is generated.
> >
> > An example is Commit dc5bdb68b5b3 .
> > Git log show:
> > Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> > ....,.
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> >
> > This is infact a common scenario. I easily found another commit
207324a321a8.
> > Git log shows:
> > Author: Minas Harutyunyan <Minas.Harutyunyan@synopsys.com>
> > ...
> > Signed-off-by: Minas Harutyunyan <hminas@synopsys.com>
> >
> > This warning could be avoided or at least handled better.
> >
> >
> > I would like to know if any of them can be worked on.
> >
>
> This last one might be good to look into.
>
> But what is your specific solution you have in mind?
>
> There is a file .mailmap in the repository that allows some kind of
> mapping. Maybe that is helpful.
>
> I suggest that you describe your proposed change in a clear way.
> Then, we can discuss if that change is reasonable or not.

I checked the .mailmap file and it doesn't have all the email addresses. So
it might not
be effective.

checkpatch.pl seems to compare only the email addresses to determine
whether the author
has signed off.

checkpatch, line 2673:
if ($author ne '') {
                if (same_email_addresses($1, $author)) {
                    $authorsignoff = 1;
                }
            }

This causes numerous false positives if the author uses a different mail,
which seems to be quite frequent.

A possible solution would be to compare the names, i.e. $1 and $author, and
keep its result stored
in some variable authorsignoff_byname. If the author's mail is not found,
but his name matches,
there can be a better warning message on the lines of:
"Possible missing signed-off-by line by nominal author. Author's email
$email does not match signed off email. $email2"

Or, if the warning message cannot be changed, there could be verbose
information regarding the mismatch of email.
Is this feasible?

Thanks,
Dwaipayan.

[-- Attachment #1.2: Type: text/html, Size: 3064 bytes --]

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux-kernel-mentees] Linux kernel checkpatch.pl mentorship
  2020-09-14 15:39                           ` Dwaipayan Ray
@ 2020-09-14 18:32                             ` Lukas Bulwahn
  2020-09-15 13:04                               ` Dwaipayan Ray
  0 siblings, 1 reply; 21+ messages in thread
From: Lukas Bulwahn @ 2020-09-14 18:32 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees

[-- Attachment #1: Type: text/plain, Size: 3949 bytes --]



On Mon, 14 Sep 2020, Dwaipayan Ray wrote:

> 
> > > The third candidate is related to the warning:
> > >
> > > WARNING:NO_AUTHOR_SIGN_OFF: Missing Signed-off-by: line by nominal patch author ...
> > >
> > > I found several such commits in which the author had used different mail addresses in the
> > > signed-off -by section, due to which this warning is generated.
> > >
> > > An example is Commit dc5bdb68b5b3 .
> > > Git log show:
> > > Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > ....,.
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > >
> > > This is infact a common scenario. I easily found another commit 207324a321a8.
> > > Git log shows:
> > > Author: Minas Harutyunyan <Minas.Harutyunyan@synopsys.com>
> > > ...
> > > Signed-off-by: Minas Harutyunyan <hminas@synopsys.com>
> > >
> > > This warning could be avoided or at least handled better.
> > >
> > >
> > > I would like to know if any of them can be worked on.
> > >
> >
> > This last one might be good to look into.
> >
> > But what is your specific solution you have in mind?
> >
> > There is a file .mailmap in the repository that allows some kind of
> > mapping. Maybe that is helpful.
> >
> > I suggest that you describe your proposed change in a clear way.
> > Then, we can discuss if that change is reasonable or not.
> 
> I checked the .mailmap file and it doesn't have all the email addresses. So it might not
> be effective.
> 
> checkpatch.pl seems to compare only the email addresses to determine whether the author
> has signed off.
> 
> checkpatch, line 2673:
> if ($author ne '') {
>                 if (same_email_addresses($1, $author)) {
>                     $authorsignoff = 1;
>                 }
>             }
> 
> This causes numerous false positives if the author uses a different mail, which seems to be quite frequent.
>

But using different mail addresses should be complained about; unless we 
really have a record that these two email addresses point to the same 
person.
 
> A possible solution would be to compare the names, i.e. $1 and $author, and keep its result stored
> in some variable authorsignoff_byname. If the author's mail is not found, but his name matches, 
> there can be a better warning message on the lines of:
> "Possible missing signed-off-by line by nominal author. Author's email $email does not match signed off email.
> $email2"
> 
> Or, if the warning message cannot be changed, there could be verbose information regarding the mismatch of email.
> Is this feasible? 
> 

(Hmm, your email client is broken... lines so long...)

Sure, that is possible if the check is reasonable. But I do not understand 
what we are doing here.

It should complain when the author email and sign-off-by email does not 
match, right?

First explain:

- which situations does checkpatch.pl currently complain about?

- for which situation do you want to have more refined checks?

- why does that actually improve checkpatch.pl?

Checkpatch.pl should complain when developers do something wrong.

To really understand what is wrong behavior and what is not, you probably 
need to create some statistics on who authors and signs off with which 
names and email addresses.

Maybe we can collect all the previous instances where we know that 
frequent developers, e.g., with more >100 commits, use multiple email 
addresses interchangeably. If we add that list to the repository and
let others know how to maintain it, checkpatch.pl can make use of that.

With that extended check, we can warn newbies that just have a broken git 
and sign-off setup and still reduce the false positives for the 
experienced developers that might just have good reasons to have the 
setup they have, i.e., they have this setup for many years and want to 
keep it that way. 

You can try to work that through or look for another case of potential 
checkpatch.pl improvement in your evaluation data.

Lukas

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux-kernel-mentees] Linux kernel checkpatch.pl mentorship
  2020-09-14 18:32                             ` Lukas Bulwahn
@ 2020-09-15 13:04                               ` Dwaipayan Ray
  2020-09-16  7:01                                 ` Lukas Bulwahn
  0 siblings, 1 reply; 21+ messages in thread
From: Dwaipayan Ray @ 2020-09-15 13:04 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: linux-kernel-mentees


[-- Attachment #1.1: Type: text/plain, Size: 2495 bytes --]

Hi,
Sorry for the late reply.

> First explain:
>
> - which situations does checkpatch.pl currently complain about?
>
Currently, checkpatch complains whenever the author mail is not
found in any signed-off-by block in the patch.


> - for which situation do you want to have more refined checks?
>
The situations where author might have signed off using a different
email. I believe multiple mail addresses isn't uncommon.


> - why does that actually improve checkpatch.pl?

It shall significantly reduce the number of author_sign_off warnings. I have
not yet created a statistical count, but looking at the data I found several
such instances. This is certainly a false positive due to a condition which
checkpatch was not programmed to handle.
The avoidance of warnings on such known cases might also save the
committer and the maintainers some time.


> Checkpatch.pl should complain when developers do something wrong.
>
> To really understand what is wrong behavior and what is not, you probably
> need to create some statistics on who authors and signs off with which
> names and email addresses.
>
> Maybe we can collect all the previous instances where we know that
> frequent developers, e.g., with more >100 commits, use multiple email
> addresses interchangeably. If we add that list to the repository and
> let others know how to maintain it, checkpatch.pl can make use of that.
>
> With that extended check, we can warn newbies that just have a broken git
> and sign-off setup and still reduce the false positives for the
> experienced developers that might just have good reasons to have the
> setup they have, i.e., they have this setup for many years and want to
> keep it that way.

This seems like a great idea. I can load the mailmap data into checkpatch
and form some kind of map between names and mail addresses.
If two mail addresses belong to same user then the warning can be ignored
totally.

I know that the kernel community is strict about such changes. So will this
be
acceptible? I can generate a proof of concept patch with the data at hand
if it seems like a good thing to work on.


> You can try to work that through or look for another case of potential
> checkpatch.pl improvement in your evaluation data.

I haven't found anything substantial yet. I will continue looking.
Earlier, you had told if I would like to take the task from Ayush to
fix checkpatch with git ranges. I would like to know about the task
and take it up if possible.

Thanks,
Dwaipayan.

[-- Attachment #1.2: Type: text/html, Size: 4841 bytes --]

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux-kernel-mentees] Linux kernel checkpatch.pl mentorship
  2020-09-15 13:04                               ` Dwaipayan Ray
@ 2020-09-16  7:01                                 ` Lukas Bulwahn
  2020-09-17 13:09                                   ` Dwaipayan Ray
  0 siblings, 1 reply; 21+ messages in thread
From: Lukas Bulwahn @ 2020-09-16  7:01 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees

[-- Attachment #1: Type: text/plain, Size: 4617 bytes --]



On Tue, 15 Sep 2020, Dwaipayan Ray wrote:

> Hi,
> Sorry for the late reply.
> 
> > First explain:
> >> - which situations does checkpatch.pl currently complain about?
> >
> Currently, checkpatch complains whenever the author mail is not
> found in any signed-off-by block in the patch.
> 

Generally, this makes sense, right?

The author shall also sign-off the patch.

Now, the question is how to determine identities?

We have two bits of information, the name and the email.

So, there are three options to define 'identity':

A. the name and the email needs to match.
B. at least, the name needs to match
C. at least, the email needs to match

I believe A. is perfect; B and C deserve at least a note from 
checkpatch.pl.

> 
> > - for which situation do you want to have more refined checks?
> >
> The situations where author might have signed off using a different
> email. I believe multiple mail addresses isn't uncommon.
>

Yes, but who is doing it on purpose, and who is doing it by mistake.

We need numbers. How often is A met? How often is B met? How often is C 
met?

We can improve the identity check, but it probably needs more thought than 
the email does not need to match, but the name does. That will probably 
just show more cases where the name does not match, but the email does.

> 
> > - why does that actually improve checkpatch.pl?
> 
> It shall significantly reduce the number of author_sign_off warnings. I have
> not yet created a statistical count, but looking at the data I found several
> such instances. This is certainly a false positive due to a condition which
> checkpatch was not programmed to handle.
> The avoidance of warnings on such known cases might also save the 
> committer and the maintainers some time.
>

Hmm, not really a good rationale yet. How about deleting checkpatch.pl 
completely? Then it cannot complain either. No false positives, no problem 
:)

Maybe it was not programmed to handle that on purpose, e.g., it was a 
clear design decision to check if at least the email is met.

Check the git history of checkpatch.pl with git blame and you will know 
more. You need to understand the history of this check and warning.

> 
> > Checkpatch.pl should complain when developers do something wrong.
> >
> > To really understand what is wrong behavior and what is not, you probably
> > need to create some statistics on who authors and signs off with which
> > names and email addresses.
> >
> > Maybe we can collect all the previous instances where we know that
> > frequent developers, e.g., with more >100 commits, use multiple email
> > addresses interchangeably. If we add that list to the repository and
> > let others know how to maintain it, checkpatch.pl can make use of that.
> >
> > With that extended check, we can warn newbies that just have a broken git
> > and sign-off setup and still reduce the false positives for the
> > experienced developers that might just have good reasons to have the
> > setup they have, i.e., they have this setup for many years and want to
> > keep it that way.
> 
> This seems like a great idea. I can load the mailmap data into checkpatch
> and form some kind of map between names and mail addresses. 
> If two mail addresses belong to same user then the warning can be ignored
> totally.
>

Please check with an evaluation if that makes a big difference, though.
We have other sources to determine identities as well.
 
> I know that the kernel community is strict about such changes. So will this be
> acceptible? I can generate a proof of concept patch with the data at hand
> if it seems like a good thing to work on.
>

If you do your homework, proper research what was decided in the past, 
proper evaluations what the difference of your change is, proper 
implementation, proper arguments for your change, it has high chances of 
being accepted. Many agree that checkpatch.pl can be useful, but many 
agree that it needs some improvements.

It is certainly not a quick improvement, and needs some thought to make it 
really better.

> 
> > You can try to work that through or look for another case of potential
> > checkpatch.pl improvement in your evaluation data.
> 
> I haven't found anything substantial yet. I will continue looking. 
> Earlier, you had told if I would like to take the task from Ayush to
> fix checkpatch with git ranges. I would like to know about the task
> and take it up if possible.
>

Please reach out to Ayush to understand the encountered issue and CC: this 
mailing list.

I know there are more issues that checkpatch.pl can be improved with, keep 
looking :)


Lukas

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux-kernel-mentees] Linux kernel checkpatch.pl mentorship
  2020-09-16  7:01                                 ` Lukas Bulwahn
@ 2020-09-17 13:09                                   ` Dwaipayan Ray
  2020-09-17 13:41                                     ` Dwaipayan Ray
  2020-09-17 14:13                                     ` Lukas Bulwahn
  0 siblings, 2 replies; 21+ messages in thread
From: Dwaipayan Ray @ 2020-09-17 13:09 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: linux-kernel-mentees


[-- Attachment #1.1: Type: text/plain, Size: 2618 bytes --]

> If you do your homework, proper research what was decided in the past,
> proper evaluations what the difference of your change is, proper
> implementation, proper arguments for your change, it has high chances of
> being accepted. Many agree that checkpatch.pl can be useful, but many
> agree that it needs some improvements.
>
> It is certainly not a quick improvement, and needs some thought to make it
> really better.
>
> >
> > > You can try to work that through or look for another case of potential
> > > checkpatch.pl improvement in your evaluation data.
>

Hi,
I would like to report you another finding about AUTHOR_SIGN_OFF.

I found that some commits with the same author mail and sign off mail
also threw off this warning when there were non-ascii characters in
the name.
Let me give some examples:

Commit 9d9cc58aff46 :

Author: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com>
.....
Signed-off-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com>

As you can see, there should have been no error message, but checkpatch
gave this:

WARNING:NO_AUTHOR_SIGN_OFF: Missing Signed-off-by: line by nominal
 patch author ''


Also commit 440d7a6f7390 and commit 424c85e1ffea by the same person
threw the error again. Evidently non ascii characters cause the parsing to
fail.
Both had the same chinese characters in the name.


I also found another person with the same issue.
Commit b03628b73564
Author: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
...
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>

Checkpatch gave the warning:
WARNING:NO_AUTHOR_SIGN_OFF: Missing Signed-off-by: line by nominal
patch author '' . The string here is again empty.

Another same issue
Commit f0a087a533b3
Author: Vesa Jääskeläinen <vesa.jaaskelainen@vaisala.com>
...
Signed-off-by: Vesa Jääskeläinen <vesa.jaaskelainen@vaisala.com>

So you see in all these cases non ascii characters are present. I looked
into checkpatch.pl .

line 2662:
        if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
           $author = $1;
            $author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
            $author =~ s/"//g;
            $author = reformat_email($author);
        }

When i looked into $line, it gave below:
From: =?UTF-8?q?=E5=91=A8=E7=90=B0=E6=9D=B0=20=28Zhou=20Yanjie
=29?= <zhouyanjie@wanyeetech.com>

And at the end of this block, $author was equal to ''.
There seems to be a parsing problem there.

Does this seem like a proper fixable candidate?

Thanks,
Dwaipayan.

[-- Attachment #1.2: Type: text/html, Size: 4387 bytes --]

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux-kernel-mentees] Linux kernel checkpatch.pl mentorship
  2020-09-17 13:09                                   ` Dwaipayan Ray
@ 2020-09-17 13:41                                     ` Dwaipayan Ray
  2020-09-17 14:18                                       ` Lukas Bulwahn
  2020-09-17 14:13                                     ` Lukas Bulwahn
  1 sibling, 1 reply; 21+ messages in thread
From: Dwaipayan Ray @ 2020-09-17 13:41 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: linux-kernel-mentees

> So you see in all these cases non ascii characters are present. I looked
> into checkpatch.pl .
>
> line 2662:
>         if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
>            $author = $1;
>             $author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
>             $author =~ s/"//g;
>             $author = reformat_email($author);
>         }
>
> When i looked into $line, it gave below:
> From: =?UTF-8?q?=E5=91=A8=E7=90=B0=E6=9D=B0=20=28Zhou=20Yanjie
> =29?=
> <zhouyanjie@wanyeetech.com>
>
> And at the end of this block, $author was equal to ''.
> There seems to be a parsing problem there.

Also I looked into this output more, and I looked into the git
history. For the FROM:
parsing part, the author wrote this:

"Non-ASCII quoted printable encoding in From: headers and (lack of) double
quotes are handled.  Split From: headers are not fully handled: only the
the first part is compared."

So the problem instead seems to be the Split From: headers. In all the
cases that I
mentioned, due to formatting issues or something, the email address was
pushed off into the next line, due to which parsing fails.

A solution would be to write logic for handling these Split From:
headers, which the
author left off.

Thanks,
Dwaipayan.
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux-kernel-mentees] Linux kernel checkpatch.pl mentorship
  2020-09-17 13:09                                   ` Dwaipayan Ray
  2020-09-17 13:41                                     ` Dwaipayan Ray
@ 2020-09-17 14:13                                     ` Lukas Bulwahn
  1 sibling, 0 replies; 21+ messages in thread
From: Lukas Bulwahn @ 2020-09-17 14:13 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees

[-- Attachment #1: Type: text/plain, Size: 3677 bytes --]



On Thu, 17 Sep 2020, Dwaipayan Ray wrote:

> 
> 
> > If you do your homework, proper research what was decided in the past,
> > proper evaluations what the difference of your change is, proper
> > implementation, proper arguments for your change, it has high chances of
> > being accepted. Many agree that checkpatch.pl can be useful, but many
> > agree that it needs some improvements.
> >
> > It is certainly not a quick improvement, and needs some thought to make it
> > really better.
> >
> > >
> > > > You can try to work that through or look for another case of potential
> > > > checkpatch.pl improvement in your evaluation data.
> > 
> 
> Hi,
> I would like to report you another finding about AUTHOR_SIGN_OFF.
> 
> I found that some commits with the same author mail and sign off mail 
> also threw off this warning when there were non-ascii characters in
> the name.
> Let me give some examples:
> 
> Commit 9d9cc58aff46 :
> 
> Author: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com>
> .....
> Signed-off-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com>
> 
> As you can see, there should have been no error message, but checkpatch
> gave this:
> 
> WARNING:NO_AUTHOR_SIGN_OFF: Missing Signed-off-by: line by nominal
>  patch author ''
> 
> 
> Also commit 440d7a6f7390 and commit 424c85e1ffea by the same person 
> threw the error again. Evidently non ascii characters cause the parsing to fail.
> Both had the same chinese characters in the name.
> 
> 
> I also found another person with the same issue.
> Commit b03628b73564
> Author: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
> ...
> Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
> 
> Checkpatch gave the warning:
> WARNING:NO_AUTHOR_SIGN_OFF: Missing Signed-off-by: line by nominal 
> patch author '' . The string here is again empty.
> 
> Another same issue
> Commit f0a087a533b3
> Author: Vesa Jääskeläinen <vesa.jaaskelainen@vaisala.com>
> ...
> Signed-off-by: Vesa Jääskeläinen <vesa.jaaskelainen@vaisala.com>
>

This is a very good investigation and I think it deserves to be handled 
better in checkpatch.pl if we can do this 'easily'. It is a bit involved, 
but I think it feasible because you do not need to write the complete 
encoding stuff.
 
> So you see in all these cases non ascii characters are present. I looked 
> into checkpatch.pl .
> 
> line 2662:
>         if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
>            $author = $1; 
>             $author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
>             $author =~ s/"//g;
>             $author = reformat_email($author); 
>         }
>

Just a quick comment, referring to a line really does not work if you are 
not providing the sha of the git change you are looking at.

We might just be looking at two different versions, right? ...and hence 
the lines differ.

> When i looked into $line, it gave below:
> From: =?UTF-8?q?=E5=91=A8=E7=90=B0=E6=9D=B0=20=28Zhou=20Yanjie
> =29?= <zhouyanjie@wanyeetech.com>
>
> And at the end of this block, $author was equal to ''.
> There seems to be a parsing problem there.
>

Agree. Somehow the parsing seems to be just not fit for the job...
 
> Does this seem like a proper fixable candidate?
>

Yes, please move ahead and see if you find a suitable way to handle names 
with these different encodings.

It is very good that you kept searching in your investigation data for 
problems that deserve a fix and are feasible to fix. If you can 
actually fix it in an acceptable way, That will get you a job for the
future; there is certainly some work ahead.


Lukas

[-- Attachment #2: Type: text/plain, Size: 201 bytes --]

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux-kernel-mentees] Linux kernel checkpatch.pl mentorship
  2020-09-17 13:41                                     ` Dwaipayan Ray
@ 2020-09-17 14:18                                       ` Lukas Bulwahn
  2020-09-17 14:43                                         ` Dwaipayan Ray
  0 siblings, 1 reply; 21+ messages in thread
From: Lukas Bulwahn @ 2020-09-17 14:18 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees



On Thu, 17 Sep 2020, Dwaipayan Ray wrote:

> > So you see in all these cases non ascii characters are present. I looked
> > into checkpatch.pl .
> >
> > line 2662:
> >         if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
> >            $author = $1;
> >             $author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
> >             $author =~ s/"//g;
> >             $author = reformat_email($author);
> >         }
> >
> > When i looked into $line, it gave below:
> > From: =?UTF-8?q?=E5=91=A8=E7=90=B0=E6=9D=B0=20=28Zhou=20Yanjie
> > =29?=
> > <zhouyanjie@wanyeetech.com>
> >
> > And at the end of this block, $author was equal to ''.
> > There seems to be a parsing problem there.
> 
> Also I looked into this output more, and I looked into the git
> history. For the FROM:
> parsing part, the author wrote this:
> 
> "Non-ASCII quoted printable encoding in From: headers and (lack of) double
> quotes are handled.  Split From: headers are not fully handled: only the
> the first part is compared."
> 
> So the problem instead seems to be the Split From: headers. In all the
> cases that I
> mentioned, due to formatting issues or something, the email address was
> pushed off into the next line, due to which parsing fails.
> 
> A solution would be to write logic for handling these Split From:
> headers, which the
> author left off.
>

Good investigation. Can you please share the commit of the commit message 
you refer to in the future?

I found it, commit cd2614967d8b ("checkpatch: warn if missing author 
Signed-off-by"), but to convince the maintainers and authors that you did 
your homework properly, it is good to refer to the commits you looked at.

The kernel documentation tells you how to refer to commits.

Lukas
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux-kernel-mentees] Linux kernel checkpatch.pl mentorship
  2020-09-17 14:18                                       ` Lukas Bulwahn
@ 2020-09-17 14:43                                         ` Dwaipayan Ray
  2020-09-17 15:03                                           ` Lukas Bulwahn
  0 siblings, 1 reply; 21+ messages in thread
From: Dwaipayan Ray @ 2020-09-17 14:43 UTC (permalink / raw)
  To: Lukas Bulwahn; +Cc: linux-kernel-mentees

> Good investigation. Can you please share the commit of the commit message
> you refer to in the future?
>
> I found it, commit cd2614967d8b ("checkpatch: warn if missing author
> Signed-off-by"), but to convince the maintainers and authors that you did
> your homework properly, it is good to refer to the commits you looked at.
>
> The kernel documentation tells you how to refer to commits.
>

Yes sure, I will remember to do that in the future.

For now, I am thinking to fix the split FROM header issue. What I have thought
of is to keep a variable keeping the part data and if the author cannot be
resolved at the present line, then at the next line join the data and parse
the email.
I think the implementation to this will be fairly easy, with one
flag variable to indicate whether the from header has been parsed in the
previous line.

Should i proceed with this?

Thanks,
Dwaipayan.
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linux-kernel-mentees] Linux kernel checkpatch.pl mentorship
  2020-09-17 14:43                                         ` Dwaipayan Ray
@ 2020-09-17 15:03                                           ` Lukas Bulwahn
  0 siblings, 0 replies; 21+ messages in thread
From: Lukas Bulwahn @ 2020-09-17 15:03 UTC (permalink / raw)
  To: Dwaipayan Ray; +Cc: linux-kernel-mentees



On Thu, 17 Sep 2020, Dwaipayan Ray wrote:

> > Good investigation. Can you please share the commit of the commit message
> > you refer to in the future?
> >
> > I found it, commit cd2614967d8b ("checkpatch: warn if missing author
> > Signed-off-by"), but to convince the maintainers and authors that you did
> > your homework properly, it is good to refer to the commits you looked at.
> >
> > The kernel documentation tells you how to refer to commits.
> >
> 
> Yes sure, I will remember to do that in the future.
> 
> For now, I am thinking to fix the split FROM header issue. What I have thought
> of is to keep a variable keeping the part data and if the author cannot be
> resolved at the present line, then at the next line join the data and parse
> the email.
> I think the implementation to this will be fairly easy, with one
> flag variable to indicate whether the from header has been parsed in the
> previous line.
> 
> Should i proceed with this?
>

Sure. I suggest you just have a look at how checkpatch.pl handles rules 
when content is split over multiple lines, though. I think Ayush copied 
one typical pattern of checking the current line and the next line.

I will trust you that you will find a solution that works...

Lukas
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2020-09-17 15:03 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CABJPP5CSa_QowD-O3_E5ndoQJvuBv=n2x5WG-acwQKE=rt5+Rw@mail.gmail.com>
     [not found] ` <alpine.DEB.2.21.2009110925160.9220@felia>
2020-09-12  9:09   ` [Linux-kernel-mentees] Linux kernel checkpatch.pl mentorship Dwaipayan Ray
2020-09-12 11:03     ` Lukas Bulwahn
2020-09-12 12:08       ` Dwaipayan Ray
2020-09-12 12:21         ` Lukas Bulwahn
2020-09-13  8:16           ` Dwaipayan Ray
2020-09-13 11:05             ` Lukas Bulwahn
     [not found]               ` <CABJPP5BmRcC+OTSjuX_QrYononVq__DkhjGOgiKrP147MAXK+g@mail.gmail.com>
     [not found]                 ` <alpine.DEB.2.21.2009132015570.6806@felia>
2020-09-13 18:23                   ` Dwaipayan Ray
     [not found]                 ` <alpine.DEB.2.21.2009132010300.6806@felia>
2020-09-13 18:39                   ` Dwaipayan Ray
2020-09-14  5:17                     ` Lukas Bulwahn
2020-09-14 12:31                       ` Dwaipayan Ray
2020-09-14 13:49                         ` Lukas Bulwahn
2020-09-14 15:39                           ` Dwaipayan Ray
2020-09-14 18:32                             ` Lukas Bulwahn
2020-09-15 13:04                               ` Dwaipayan Ray
2020-09-16  7:01                                 ` Lukas Bulwahn
2020-09-17 13:09                                   ` Dwaipayan Ray
2020-09-17 13:41                                     ` Dwaipayan Ray
2020-09-17 14:18                                       ` Lukas Bulwahn
2020-09-17 14:43                                         ` Dwaipayan Ray
2020-09-17 15:03                                           ` Lukas Bulwahn
2020-09-17 14:13                                     ` Lukas Bulwahn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).