linux-kernel-mentees.lists.linuxfoundation.org archive mirror
From: Joe Perches <joe@perches.com>
To: Dwaipayan Ray <dwaipayanray1@gmail.com>,
	Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: apw@canonical.com,
	linux-kernel-mentees@lists.linuxfoundation.org,
	linux-kernel@vger.kernel.org
Subject: Re: [Linux-kernel-mentees] [PATCH] checkpatch: extend author Signed-off-by check for split From: header
Date: Sat, 19 Sep 2020 12:47:43 -0700
Message-ID: <6f612f0e19c0877763ce964e3164e3f062d28741.camel@perches.com>
In-Reply-To: <CABJPP5Daf0UiKjeQp71_bViAkQQh7znWrArKAS1FqnjGU1nX4A@mail.gmail.com>

On Sun, 2020-09-20 at 01:08 +0530, Dwaipayan Ray wrote:
> On Sun, Sep 20, 2020 at 12:06 AM Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:
> > On Sat, 19 Sep 2020, Joe Perches wrote:
> > > On Sat, 2020-09-19 at 20:12 +0200, Lukas Bulwahn wrote:
> > > > On Sat, 19 Sep 2020, Joe Perches wrote:
> > > > > On Sat, 2020-09-19 at 13:42 +0530, Dwaipayan Ray wrote:
> > > > > > Checkpatch did not handle cases where the author From: header
> > > > > > was split into two lines. The author string went empty and
> > > > > > checkpatch generated a false NO_AUTHOR_SIGN_OFF warning.
> > > > > 
> > > > > It's good to provide an example where the current code
> > > > > doesn't work.
> > > > > 
> > > > Joe, as this is a linux-kernel-mentees patch, we discussed that before
> > > > reaching out to you; you can find Dwaipayan's own evaluation here:
> > > > 
> > > > https://lore.kernel.org/linux-kernel-mentees/CABJPP5BOTG0QLFSaRJTb2vAZ_hJf229OAQihHKG4sYd35i_WMw@mail.gmail.com/
> > > > 
> > > > Dwaipayan, Joe's comment is still valid; it would be good to describe
> > > > the reasons why patches might have split lines (as far as I see, long
> > > > encodings for non-ASCII names).
> > > > 
> > > > I will run my own evaluation of checkpatch.pl before and after patch
> > > > application on Monday and then check if I can confirm Dwaipayan's results.
> > > > 
> > > > > It likely would be better to do this by searching forward for
> > > > > any extension lines after a "^From:" rather than searching
> > > > > backwards as there can be any number of extension lines.
> > > > > 
> > > > Just to be sure what you are talking about...
> > > > 
> > > > You mean just to access the next line through the lines array, rather
> > > > than using prevheader and trying to decode that one line twice.
> > > > 
> > > > I agree the logic is a bit redundant and complicated at the moment.
> > > > 
> > > > Once prevheader is non-empty, it is already clear that author is '' and
> > > > prevheader decodes with that match, because that is the only way to
> > > > make prevheader non-empty in the first place; at least as far as I see it
> > > > right now.
> > > 
> > > Yeah, something like this (completely untested):
> > > ---
> > >  scripts/checkpatch.pl | 8 ++++++--
> > >  1 file changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > index 3e474072aa90..2c710d05b184 100755
> > > --- a/scripts/checkpatch.pl
> > > +++ b/scripts/checkpatch.pl
> > > @@ -2679,9 +2679,13 @@ sub process {
> > >               }
> > > 
> > >  # Check the patch for a From:
> > > -             if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
> > > +             if ($line =~ /^From:\s*(.*)/i) {
> > >                       $author = $1;
> > > -                     $author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
> > > +                     my $curline = $linenr;
> > > +                     while (defined($rawlines[$curline]) && $rawlines[$curline++] =~ /^ \s*(.*)/) {
> > > +                             $author .= $1;
> > > +                     }
> > > +                     $author = encode("utf8", $author) if ($author =~ /=\?utf-8\?/i);
> > >                       $author =~ s/"//g;
> > >                       $author = reformat_email($author);
> > >               }
> > > 
> 
> Hi,
> 
> Yeah I think the backwards checking was pretty redundant after all. If the
> extended encoding went too long, the From: header would be split into
> more than two lines and my proposed solution would fail.
> 
> Thanks for the heads up, Joe!
> 
> > Yeah, I get how you would like to see that being implemented. I will work
> > with Dwaipayan to get that properly implemented, properly described and
> > tested.
> > 
> > But let us keep the fun of that task to Dwaipayan... that is what a
> > mentorship is all about :)
> > 
> > Lukas
> 
> Yes definitely, the task is interesting for me, and I would like to solve
> it in a proper way.
> 
> As for the fix, shouldn't we stop the author string concatenation once
> an email address is found? Something like:
> 
>   last if $rawlines[$curline] =~ /^\s*(\S+\@\S+)\s*/;

Probably not.

I think it should follow the RFC standard, with extension
lines starting with a space.

See RFC 5322, section 2.2.3, Long Header Fields
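
A minimal sketch of that unfolding rule, searching forward from the From:
line. The helper name unfold_from and the sample header are made up for
illustration and are not part of checkpatch.pl. The sketch also accepts a
leading tab and joins the pieces with a single space, which differs from the
patch quoted above (that one concatenates directly), so treat it as one
possible reading rather than the final code:

```perl
use strict;
use warnings;

# Hypothetical helper (not in checkpatch.pl): return the From: value joined
# with its continuation lines, i.e. the following lines that begin with a
# space or tab (RFC 5322, section 2.2.3).
sub unfold_from {
	my (@lines) = @_;
	for my $i (0 .. $#lines) {
		next unless $lines[$i] =~ /^From:\s*(.*)/i;
		my $author = $1;
		my $next = $i + 1;
		# Search forward: any number of continuation lines may follow.
		while ($next <= $#lines && $lines[$next] =~ /^[ \t]+(.*)/) {
			$author .= " $1";	# assumption: normalize the fold to one space
			$next++;
		}
		return $author;
	}
	return '';
}

# Made-up header whose From: field is folded across three lines.
my @header = (
	'From: =?UTF-8?q?Some_Very_Long?=',
	' =?UTF-8?q?_Encoded_Name?=',
	"\t<someone\@example.com>",
	'Subject: [PATCH] example',
);
print unfold_from(@header), "\n";
```

The loop stops at the first line that does not start with whitespace, so no
separate check for an email address is needed to end the concatenation.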

> I will update the patch and sync up with Lukas on this.

Enjoy.

I believe I now have a working version; we can compare later.

cheers, Joe

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees
