From: Aditya <yashsri421@gmail.com>
To: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: linux-kernel-mentees@lists.linuxfoundation.org
Subject: Re: [Linux-kernel-mentees] Fix for BAD_SIGN_OFF: non-standard signature
Date: Wed, 18 Nov 2020 02:24:32 +0530
Message-ID: <050ddb53-33ff-83bc-7f91-b7c2874211f6@gmail.com>
In-Reply-To: <CAKXUXMwsUYgg+oOADM3wrryr2Vob_Yo1YX-ms5eKCeCAx7HTpw@mail.gmail.com>

On 17/11/20 11:12 pm, Lukas Bulwahn wrote:
> On Tue, Nov 17, 2020 at 7:03 PM Aditya <yashsri421@gmail.com> wrote:
>>
>> On 13/11/20 11:55 pm, Aditya wrote:
>>> On 13/11/20 8:56 pm, Lukas Bulwahn wrote:
>>>> On Fri, Nov 13, 2020 at 4:00 PM Aditya <yashsri421@gmail.com> wrote:
>>>>>
>>>>> On 13/11/20 8:05 pm, Aditya wrote:
>>>>>> On 12/11/20 1:34 am, Lukas Bulwahn wrote:
>>>>>>> On Wed, Nov 11, 2020 at 3:13 PM Aditya <yashsri421@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi Sir
>>>>>>>> I have analyzed the checkpatch report for BAD_SIGN_OFF (over
>>>>>>>> v4.13..v5.8) for non-standard signatures and generated reports for it.
>>>>>>>> Some mistakes are more frequent than others, and some occur only
>>>>>>>> once.
>>>>>>>>
>>>>>>>> Non-standard signatures occurring with their frequency:
>>>>>>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/non_standard_signs.txt
>>>>>>>>
>>>>>>>> Complete warning messages:
>>>>>>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/warn_msgs.txt
>>>>>>>>
>>>>>>>> Should I implement the fix similar to TYPO_SPELLING, where we have a
>>>>>>>> separate file for common misspellings and their corrections? Or should
>>>>>>>> I keep a hash of these misspellings in checkpatch.pl itself?
>>>>>>>>
>>>>>>>> Also, should I include all these misspelled words in it? Or omit words
>>>>>>>> below a certain frequency?
>>>>>>>>
>>>>>>>
>>>>>>> I think the best way would be to compute some kind of edit distance to
>>>>>>> the known signature tags and if this edit distance is below a certain
>>>>>>> threshold, suggest that signature tag as the fix. We can then evaluate
>>>>>>> to determine the most suitable threshold. The edit distance between
>>>>>>> the different tags is so large that this should always work as
>>>>>>> intended.
>>>>>>>
>>>>>>> Then, we can look into these other creative tags and propose suitable
>>>>>>> existing tags for the more frequent ones that are non-standard. Or, in
>>>>>>> case none of the existing ones fit, we can start a discussion on
>>>>>>> proposing some new standard ones.
>>>>>>>
>>>>>>
>>>>>> I have generated a list of non-standard signatures and their fixes on
>>>>>> the basis of edit distance.
>>>>>>
>>>>>> This is the common list of non-standard signatures and fixes (in
>>>>>> detail):
>>>>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/min_dists.txt
>>>>>>
>>>>>> From what I observed, I think we can consider '<=2' as the threshold
>>>>>> edit distance.
>>>>>> List of non-standard signatures and their proposed fixes with edit
>>>>>> distance <= 2:
>>>>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/less_than_3.txt
>>>>>>
>>>>>> I have also generated lists for 3 and 4 edit distance separately for
>>>>>> reference:
>>>>>> Equal to 3:
>>>>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/equal_3.txt
>>>>>>
>>>>>> Equal to 4:
>>>>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/equal_4.txt
>>>>>>
>>>>>> For the rest, I guess we'll need to hard-code the fixes, e.g. for
>>>>>> 'Debugged-by', 'Requested-by' etc.
>>>>>>
>>>>>> These are the complete lists of non-standard signatures:
>>>>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/non_standard_signs.txt
>>>>>>
>>>>
>>>> Can you share which non-standard signatures would be
>>>> handled/transformed with edit distance 2 and which would not, in a
>>>> format similar to non_standard_signs.txt (so, ordered by frequency)?
>>>>
>>>> We can then consider those that remain and find a good next strategy
>>>> for the most frequent non-standard signatures.
>>>>
>>>
>>> Non-standard signatures handled with edit distance 2:
>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/less_than2/signs_freq.txt
>>>
>>> Non-standard signatures with edit distance greater than 2:
>>> https://github.com/AdityaSrivast/kernel-tasks/tree/master/random/non_standard_signature/more_than2
>>>
>>
>> I think this mail probably got missed. I'll summarize it a bit for
>> simplicity:
>> With the edit distance approach and a threshold of 2, we're able to
>> handle 39 out of 109 'distinct' cases of non-standard signatures. Among
>> these 39, the most frequent is 'Reviwed-by:' with 19 occurrences,
>> followed by 'Reviewd-by:' with 9 and other common misspellings.
>> Complete List:
>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/less_than2/signs_freq.txt
>>
>> However, we are still unable to account for 70 non-standard signatures,
>> which occur more frequently (e.g. 'Debugged-by:', which has occurred 61
>> times; 'Requested-by:', 48 times; and so on).
>> Complete list:
>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/more_than2/signs_freq.txt
>>
>> I think for these cases we'd need a separate file (as is used for
>> TYPO_SPELLING) or a hash.
>> What do you think/suggest?
>>
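The edit-distance approach summarized above can be sketched roughly as
follows. This is only an illustration in Python, not the checkpatch.pl
code (which is Perl): the tag list is an assumed subset of the standard
tags, and the names edit_distance and suggest_tag are hypothetical.

```python
# Assumed subset of the standard signature tags; the real list lives in
# checkpatch.pl and may differ.
STANDARD_TAGS = [
    "Signed-off-by:",
    "Co-developed-by:",
    "Acked-by:",
    "Tested-by:",
    "Reviewed-by:",
    "Reported-by:",
    "Suggested-by:",
]


def edit_distance(a, b):
    """Levenshtein distance between a and b, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def suggest_tag(tag, threshold=2):
    """Return the closest standard tag, or None if it is too far away."""
    lowered = tag.lower()
    best = min(STANDARD_TAGS,
               key=lambda std: edit_distance(lowered, std.lower()))
    if edit_distance(lowered, best.lower()) <= threshold:
        return best
    return None


print(suggest_tag("Reviwed-by:"))
print(suggest_tag("Debugged-by:"))
```

With the threshold of 2, 'Reviwed-by:' is mapped to 'Reviewed-by:',
whereas 'Debugged-by:' gets no suggestion and would need a separate
lookup.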
> 
> Yes, I agree.
> 
> Goal 1: Try to map as many of the non-standard signatures as possible
> to their "standard" counterparts.
> 
> Goal 2: Introduce a very small number of new signatures to handle those
> cases that really cannot be mapped to a standard signature.
> 
> Provide good rationales that you can defend and provide documentation
> for when checkpatch shall explain the fix it proposes.
> 
> Here is an example for the first ten cases:
> 
> 1)Debugged-by: 61 -> Codeveloped-by:
> 
> Rationale: Debugging is part of Software Development; so
> Codeveloped-by is perfectly fine, even if the contributor did not
> create code.
> 
> (alternatively: maybe a new Assisted-by would do here.)
> 
> 2)Requested-by: 48 -> Suggested-by:
> 
> Rationale: In an open-source project, there are "no requests", just
> "suggestions" to convince a maintainer to accept your patch.
> 
> 3)Co-authored-by: 43 -> Codeveloped-by:
> 
> Rationale: clear. Codeveloped-by and Co-authored-by are synonyms.
> 
> 4)Originally-by: 39
> 
> Maybe something like this deserves to be a new tag. There is a
> significant difference to codeveloped-by. But that needs discussion.
> 
> 5)Analyzed-by: 22 -> Codeveloped-by:
> 
> Rationale: Analyzing is part of Software Development; so
> Codeveloped-by is perfectly fine, even if the contributor did not
> create code.
> (alternatively: maybe a new Assisted-by would do here.)
> 
> 6)Bisected-by: 20
> 
> Difficult...
> (maybe a new Assisted-by would do here.)
> 
> 7)Improvements-by: 19 -> Codeveloped-by:
> 
> 8)Generated-by: 17 -> Reported-by: ?
> 
> What does generated-by actually mean?
> 
> 9)Noticed-by: 11 -> Reported-by:
> 
> 10)Inspired-by: 11 -> Suggested-by:
> 
> Maybe you can come up with a list for the next twenty and then we
> discuss them with Joe Perches and then a larger group?
> 
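The mapping proposed in the quoted list could be kept as a simple hash,
roughly as sketched below. This is only an illustration in Python, not
checkpatch.pl code; the names PROPOSED_FIXES and propose_fix are
hypothetical. Only the entries that have an explicit arrow and no
question mark in the quoted list are included, and these are still
proposals under discussion. The quoted list writes 'Codeveloped-by';
the sketch spells it 'Co-developed-by:' as the kernel tag is written.

```python
# Proposed (not yet agreed) mapping from frequent non-standard tags to
# standard ones, taken from the quoted list. Keys are lower-cased so the
# lookup is case-insensitive.
PROPOSED_FIXES = {
    "debugged-by:": "Co-developed-by:",
    "requested-by:": "Suggested-by:",
    "co-authored-by:": "Co-developed-by:",
    "improvements-by:": "Co-developed-by:",
    "noticed-by:": "Reported-by:",
    "inspired-by:": "Suggested-by:",
}


def propose_fix(tag):
    """Return the proposed standard tag, or None if nothing is mapped."""
    return PROPOSED_FIXES.get(tag.strip().lower())


for tag in ("Requested-by:", "Bisected-by:"):
    print(tag, "->", propose_fix(tag))
```

Tags that are still open in the list, such as 'Originally-by:' or
'Bisected-by:', return None here and would be left for the wider
discussion.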

Sounds good. Will send by tomorrow morning :)

Thanks
Aditya