From: Lukas Bulwahn <lukas.bulwahn@gmail.com>
To: Dwaipayan Ray <dwaipayanray1@gmail.com>,
Aditya Srivastava <yashsri421@gmail.com>
Cc: linux-kernel-mentees@lists.linuxfoundation.org
Subject: Re: [Linux-kernel-mentees] [PATCH RFC] checkpatch: improve handling of email comments
Date: Wed, 28 Oct 2020 16:59:47 +0100 (CET) [thread overview]
Message-ID: <alpine.DEB.2.21.2010281659160.13040@felia> (raw)
In-Reply-To: <CABJPP5CkzaQ8x+b8UYZCLwXC1WaZL6+rpkespV684zkPAeJXrA@mail.gmail.com>
On Wed, 28 Oct 2020, Dwaipayan Ray wrote:
> On Wed, Oct 28, 2020 at 8:55 PM Dwaipayan Ray <dwaipayanray1@gmail.com> wrote:
> >
> > checkpatch has limited support for parsing email comments. It only
> > support single name comments or single after address comments.
> > Whereas, RFC 5322 specifies that comments can be inserted in
> > between any tokens of the email fields.
> >
> > On analyzing 50,000 commits from v5.4 it was found that there were
> > about 370 false positives resulting from wrong parsing of comments.
> >
> > Improve comment parsing mechanism in checkpatch.
> >
> > What is handled now:
> >
> > - Multiple name/address comments
> > - Comments anywhere in between name/address
> > - Multi level comments like (John (Doe) )
> >
> > Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>
> > ---
> > scripts/checkpatch.pl | 19 ++++++++++++++-----
> > 1 file changed, 14 insertions(+), 5 deletions(-)
> >
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > index fab38b493cef..ae8436385fc1 100755
> > --- a/scripts/checkpatch.pl
> > +++ b/scripts/checkpatch.pl
> > @@ -1183,14 +1183,20 @@ sub parse_email {
> > }
> > }
> >
> > - $comment = trim($comment);
> > + # Comments in between name like John(A nice chap) Doe
> > + while ($name =~ s/\s*($balanced_parens)\s*/ /) {
> > + $name_comment .= trim($1);
> > + }
> > $name = trim($name);
> > $name =~ s/^\"|\"$//g;
> > - if ($name =~ s/(\s*\([^\)]+\))\s*//) {
> > - $name_comment = trim($1);
> > +
> > + # Comments in between address like <john(his account)@doe.com>
> > + while ($address =~ s/\s*($balanced_parens)\s*//) {
> > + $comment .= trim($1);
> > }
> > $address = trim($address);
> > $address =~ s/^\<|\>$//g;
> > + $comment = trim($comment);
> >
> > if ($name =~ /[^\w \-]/i) { ##has "must quote" chars
> > $name =~ s/(?<!\\)"/\\"/g; ##escape quotes
> > @@ -1205,8 +1211,6 @@ sub format_email {
> >
> > my $formatted_email;
> >
> > - $name_comment = trim($name_comment);
> > - $comment = trim($comment);
> > $name = trim($name);
> > $name =~ s/^\"|\"$//g;
> > $address = trim($address);
> > @@ -1216,6 +1220,11 @@ sub format_email {
> > $name = "\"$name\"";
> > }
> >
> > + $name_comment = trim($name_comment);
> > + $name_comment =~ s/(.+)/ $1/;
> > + $comment = trim($comment);
> > + $comment =~ s/(.+)/ $1/;
> > +
> > if ("$name" eq "") {
> > $formatted_email = "$address";
> > } else {
> > --
> > 2.27.0
> >
>
> Hi,
> This patch is a follow up to our discussion earlier about improving the
> comment handling at:
> https://lore.kernel.org/linux-kernel-mentees/alpine.DEB.2.21.2010251022060.25172@felia/
>
> I have only tested it on a few patterns. But I will run an
> evaluation again on those 50k commits to see what changes.
>
> What do you think of this?
>
That was exactly the comment I wanted to make.
Commit message is clear; code addition also looks pretty clear.
We are just missing the evaluation to make sure how much we improve and
that we did not add some stupid bug on the way.
Aditya, can you help us here with evaluation?
E.g., Dwaipayan takes from v5.4 into the future, e.g., until v5.8/v5.9 or
so.
Aditya, you could then check e.g., v5.0..v5.4, so v5.4 down to the past as
far as your computer and scripts handles within roughly half a day...
An evaluation on 100,000 commits is certainly a good basis.
Of course, every mentorship candidate can join here as well and show off
their own evaluation script.
Lukas
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees
next prev parent reply other threads:[~2020-10-28 16:00 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-10-28 15:25 [Linux-kernel-mentees] [PATCH RFC] checkpatch: improve handling of email comments Dwaipayan Ray
2020-10-28 15:28 ` Dwaipayan Ray
2020-10-28 15:59 ` Lukas Bulwahn
2020-10-28 15:59 ` Lukas Bulwahn [this message]
2020-10-28 16:28 ` Aditya
2020-10-28 16:38 ` Dwaipayan Ray
2020-10-29 15:03 ` Aditya
2020-10-29 15:47 ` Dwaipayan Ray
2020-10-29 19:43 ` Dwaipayan Ray
2020-10-30 5:30 ` Lukas Bulwahn
2020-10-30 8:31 ` Aditya
2020-10-30 8:47 ` Dwaipayan Ray
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.DEB.2.21.2010281659160.13040@felia \
--to=lukas.bulwahn@gmail.com \
--cc=dwaipayanray1@gmail.com \
--cc=linux-kernel-mentees@lists.linuxfoundation.org \
--cc=yashsri421@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).