All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/1] checkpatch: check for subject uniqueness in git repository.
@ 2014-09-15 18:43 Fabian Frederick
  2014-09-15 21:02 ` Joe Perches
  2014-09-16  3:22 ` Joe Perches
  0 siblings, 2 replies; 7+ messages in thread
From: Fabian Frederick @ 2014-09-15 18:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: Fabian Frederick, Andrew Morton, Joe Perches

Adding patch subject uniqueness check in checkpatch --strict mode.
See Documentation/SubmittingPatches/globally-unique identifier.

Inspired-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
---
 scripts/checkpatch.pl | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 0c520f7..2be06c9 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -1810,6 +1810,7 @@ sub process {
 
 	our $clean = 1;
 	my $signoff = 0;
+	my $git_samesubjects = 0;
 	my $is_patch = 0;
 
 	my $in_header_lines = $file ? 0 : 1;
@@ -2047,6 +2048,15 @@ sub process {
 			}
 		}
 
+# Check patch subject on subjective/strict check mode
+		if ($check && $line =~ /^Subject: \[(.*)\](.*)/) {
+			my $subject = $2;
+			if ($quiet == 0) {
+				print "Looking for patches with the same subject in git repository ...\n";
+			}
+			$git_samesubjects = `git log --oneline | grep -m1 "$subject\$" | wc  -l`;
+		}
+
 # Check the patch for a signoff:
 		if ($line =~ /^\s*signed-off-by:/i) {
 			$signoff++;
@@ -5091,6 +5101,10 @@ sub process {
 		ERROR("MISSING_SIGN_OFF",
 		      "Missing Signed-off-by: line(s)\n");
 	}
+	if ($is_patch && $git_samesubjects > 0)  {
+		WARN("NOT_UNIQUE_SUBJECT",
+		     "similar subjects found in git repository\n");
+	}
 
 	print report_dump();
 	if ($summary && !($clean == 1 && $quiet == 1)) {
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] checkpatch: check for subject uniqueness in git repository.
  2014-09-15 18:43 [PATCH 1/1] checkpatch: check for subject uniqueness in git repository Fabian Frederick
@ 2014-09-15 21:02 ` Joe Perches
  2014-09-16  3:22 ` Joe Perches
  1 sibling, 0 replies; 7+ messages in thread
From: Joe Perches @ 2014-09-15 21:02 UTC (permalink / raw)
  To: Fabian Frederick; +Cc: linux-kernel, Andrew Morton

On Mon, 2014-09-15 at 20:43 +0200, Fabian Frederick wrote:
> Adding patch subject uniqueness check in checkpatch --strict mode.
> See Documentation/SubmittingPatches/globally-unique identifier.

I'm not sold on the concept as I'm not sure that
global subject uniqueness is all that important.

There are many "update defconfig" type patches that
have identical subjects.  There really isn't a need
to have more information or add a numeric sequence #
to a subject like that.

> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
[]
> +# Check patch subject on subjective/strict check mode
> +		if ($check && $line =~ /^Subject: \[(.*)\](.*)/) {
> +			my $subject = $2;
> +			if ($quiet == 0) {
> +				print "Looking for patches with the same subject in git repository ...\n";
> +			}
> +			$git_samesubjects = `git log --oneline | grep -m1 "$subject\$" | wc  -l`;

checkpatch doesn't require a git repository to be
successfully used.

Please add a new function, something like the existing
"sub git_commit_info", for this so that if git is not
installed, there will not be a runtime failure.

Store the output in a perl array and use perl's grep
mechanism rather than the grep command line tool.

This will be _very_ slow.

On my computer with an ssd it takes > 20 seconds
just to do:

$ git --no-pager log --nocolor --no-merges --format='%H %s'

fyi: there are already > ~3500 commits with duplicate titles.

This will not find duplicate titles in a patch series unless
the patches were created with git and the current branch is
the same one that was used to create the series.

$ git checkout -b <name>
while (more changes to make) {
  [ modify / compile / test]
  $ git commit ...
}
$ git format-patch -o <dir> --cover-letter master
$ git checkout master
$ ./scripts/checkpatch.pl <dir>/*

No problem would be found...

Now for a patch series, it would be useful/better to scan
directories for patches with duplicate titles using
something like:

$ ./scripts/checkpatch.pl <dir>/*.patch

and compare the "^Subject:" lines for duplicates without
using git at all.

> @@ -5091,6 +5101,10 @@ sub process {
>  		ERROR("MISSING_SIGN_OFF",
>  		      "Missing Signed-off-by: line(s)\n");
>  	}
> +	if ($is_patch && $git_samesubjects > 0)  {
> +		WARN("NOT_UNIQUE_SUBJECT",
> +		     "similar subjects found in git repository\n");

s/similar subjects/identical patch subjects/



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] checkpatch: check for subject uniqueness in git repository.
  2014-09-15 18:43 [PATCH 1/1] checkpatch: check for subject uniqueness in git repository Fabian Frederick
  2014-09-15 21:02 ` Joe Perches
@ 2014-09-16  3:22 ` Joe Perches
  2014-09-16 16:15   ` Fabian Frederick
  1 sibling, 1 reply; 7+ messages in thread
From: Joe Perches @ 2014-09-16  3:22 UTC (permalink / raw)
  To: Fabian Frederick; +Cc: linux-kernel, Andrew Morton

On Mon, 2014-09-15 at 20:43 +0200, Fabian Frederick wrote:
> Adding patch subject uniqueness check in checkpatch --strict mode.
> See Documentation/SubmittingPatches/globally-unique identifier.

Perhaps something like this?
---
 scripts/checkpatch.pl | 52 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 0c520f7..e19c40b 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -622,6 +622,27 @@ sub git_commit_info {
 	return ($id, $desc);
 }
 
+my @git_commits = ();
+my @previous_subjects = ();
+
+sub check_subject_duplication {
+	my ($subject) = @_;
+
+	if ($check && $#git_commits < 1 && which("git") && -e ".git") {
+		my $output = `git log --no-color --format='%H %s' 2>&1`;
+		$output =~ s/^\s*//gm;
+		@git_commits = split("\n", $output);
+	}
+
+	return 1 if (grep(/^[a-f0-9]{40,40} $subject$/, @git_commits));
+
+	return 2 if (grep(/^$subject$/, @previous_subjects));
+
+	push(@previous_subjects, $subject);
+
+	return 0;
+}
+
 $chk_signoff = 0 if ($file);
 
 my @rawlines = ();
@@ -1264,6 +1285,20 @@ sub raw_line {
 	return $line;
 }
 
+sub get_complete_header {
+	my ($linenr) = @_;
+
+	my $offset = $linenr - 1;
+
+	my $line = $rawlines[$offset++];
+	while (defined $rawlines[$offset] &&
+	       $rawlines[$offset] =~ /\s(\S.*)$/) {
+		$line .= $rawlines[$offset++];
+	}
+
+	return $line;
+}
+
 sub cat_vet {
 	my ($vet) = @_;
 	my ($res, $coded);
@@ -2047,6 +2082,23 @@ sub process {
 			}
 		}
 
+# Check git commit and patch subject duplication
+		if ($in_header_lines && $line =~ /^Subject:/) {
+			my $subject = get_complete_header($linenr);
+			if ($subject =~ /^Subject:\s*(?:\[[^\]]*\])?\s*(.*)$/) {
+				$subject = $1;
+			}
+
+			my $subject_dup = check_subject_duplication($subject);
+			if ($subject_dup == 1) {
+				CHK("NOT_UNIQUE_SUBJECT",
+				    "Identical git commit subject found\n" . $herecurr);
+			} elsif ($subject_dup == 2) {
+				CHK("NOT_UNIQUE_SUBJECT",
+				    "Identical patch commit subject found\n" . $herecurr);
+			}
+		}
+
 # Check the patch for a signoff:
 		if ($line =~ /^\s*signed-off-by:/i) {
 			$signoff++;



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] checkpatch: check for subject uniqueness in git repository.
  2014-09-16  3:22 ` Joe Perches
@ 2014-09-16 16:15   ` Fabian Frederick
  2014-09-16 16:31     ` Joe Perches
  0 siblings, 1 reply; 7+ messages in thread
From: Fabian Frederick @ 2014-09-16 16:15 UTC (permalink / raw)
  To: Joe Perches; +Cc: Andrew Morton, linux-kernel



> On 16 September 2014 at 05:22 Joe Perches <joe@perches.com> wrote:
>
>
> On Mon, 2014-09-15 at 20:43 +0200, Fabian Frederick wrote:
> > Adding patch subject uniqueness check in checkpatch --strict mode.
> > See Documentation/SubmittingPatches/globally-unique identifier.
>
> Perhaps something like this?
> ---
>  scripts/checkpatch.pl | 52
>+++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 52 insertions(+)
>
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 0c520f7..e19c40b 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -622,6 +622,27 @@ sub git_commit_info {
>       return ($id, $desc);
>  }
> 
> +my @git_commits = ();
> +my @previous_subjects = ();
> +
> +sub check_subject_duplication {
> +     my ($subject) = @_;
> +
> +     if ($check && $#git_commits < 1 && which("git") && -e ".git") {
> +             my $output = `git log --no-color --format='%H %s' 2>&1`;
> +             $output =~ s/^\s*//gm;
> +             @git_commits = split("\n", $output);
> +     }
> +
> +     return 1 if (grep(/^[a-f0-9]{40,40} $subject$/, @git_commits));
> +
> +     return 2 if (grep(/^$subject$/, @previous_subjects));
> +
> +     push(@previous_subjects, $subject);
> +
> +     return 0;
> +}
> +
>  $chk_signoff = 0 if ($file);
> 
>  my @rawlines = ();
> @@ -1264,6 +1285,20 @@ sub raw_line {
>       return $line;
>  }
> 
> +sub get_complete_header {
> +     my ($linenr) = @_;
> +
> +     my $offset = $linenr - 1;
> +
> +     my $line = $rawlines[$offset++];
> +     while (defined $rawlines[$offset] &&
> +            $rawlines[$offset] =~ /\s(\S.*)$/) {
> +             $line .= $rawlines[$offset++];
> +     }
> +
> +     return $line;
> +}
> +
>  sub cat_vet {
>       my ($vet) = @_;
>       my ($res, $coded);
> @@ -2047,6 +2082,23 @@ sub process {
>                       }
>               }
> 
> +# Check git commit and patch subject duplication
> +             if ($in_header_lines && $line =~ /^Subject:/) {
> +                     my $subject = get_complete_header($linenr);
> +                     if ($subject =~ /^Subject:\s*(?:\[[^\]]*\])?\s*(.*)$/) {
> +                             $subject = $1;
> +                     }
> +
> +                     my $subject_dup = check_subject_duplication($subject);
> +                     if ($subject_dup == 1) {
> +                             CHK("NOT_UNIQUE_SUBJECT",
> +                                 "Identical git commit subject found\n" .
> $herecurr);
> +                     } elsif ($subject_dup == 2) {
> +                             CHK("NOT_UNIQUE_SUBJECT",
> +                                 "Identical patch commit subject found\n" .
> $herecurr);
> +                     }
> +             }
> +
>  # Check the patch for a signoff:
>               if ($line =~ /^\s*signed-off-by:/i) {
>                       $signoff++;
>
>
Perfect ! :)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] checkpatch: check for subject uniqueness in git repository.
  2014-09-16 16:15   ` Fabian Frederick
@ 2014-09-16 16:31     ` Joe Perches
  2014-09-20 10:31       ` Fabian Frederick
  0 siblings, 1 reply; 7+ messages in thread
From: Joe Perches @ 2014-09-16 16:31 UTC (permalink / raw)
  To: Fabian Frederick; +Cc: Andrew Morton, linux-kernel

On Tue, 2014-09-16 at 18:15 +0200, Fabian Frederick wrote:
> > On 16 September 2014 at 05:22 Joe Perches <joe@perches.com> wrote:
> > On Mon, 2014-09-15 at 20:43 +0200, Fabian Frederick wrote:
> > > Adding patch subject uniqueness check in checkpatch --strict mode.
> > > See Documentation/SubmittingPatches/globally-unique identifier.
> >
> > Perhaps something like this?
[]
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
[]
> > +     if ($check && $#git_commits < 1 && which("git") && -e ".git") {
> > +             my $output = `git log --no-color --format='%H %s' 2>&1`;
> > +             $output =~ s/^\s*//gm;
> > +             @git_commits = split("\n", $output);
> > +     }

> Perfect ! :)

Except for that _really, really_ long time to do the
git log of all commits...

Maybe the git log should be enabled only with another
command-line option.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] checkpatch: check for subject uniqueness in git repository.
  2014-09-16 16:31     ` Joe Perches
@ 2014-09-20 10:31       ` Fabian Frederick
  2014-09-20 16:07         ` Joe Perches
  0 siblings, 1 reply; 7+ messages in thread
From: Fabian Frederick @ 2014-09-20 10:31 UTC (permalink / raw)
  To: Joe Perches; +Cc: Andrew Morton, linux-kernel



> On 16 September 2014 at 18:31 Joe Perches <joe@perches.com> wrote:
>
>
> On Tue, 2014-09-16 at 18:15 +0200, Fabian Frederick wrote:
> > > On 16 September 2014 at 05:22 Joe Perches <joe@perches.com> wrote:
> > > On Mon, 2014-09-15 at 20:43 +0200, Fabian Frederick wrote:
> > > > Adding patch subject uniqueness check in checkpatch --strict mode.
> > > > See Documentation/SubmittingPatches/globally-unique identifier.
> > >
> > > Perhaps something like this?
> []
> > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> []
> > > +     if ($check && $#git_commits < 1 && which("git") && -e ".git") {
> > > +             my $output = `git log --no-color --format='%H %s' 2>&1`;
> > > +             $output =~ s/^\s*//gm;
> > > +             @git_commits = split("\n", $output);
> > > +     }
>
> > Perfect ! :)
>
> Except for that _really, really_ long time to do the
> git log of all commits...
>
> Maybe the git log should be enabled only with another
> command-line option.
>
Hello Joe,

checkpatck does already execute git log in seed_camelcase_includes and
git_commit_info when .git exists
so we could add some specific option instead like --complete --verbose
--checkduplicates or something for this test ?

Regards,
Fabian
 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] checkpatch: check for subject uniqueness in git repository.
  2014-09-20 10:31       ` Fabian Frederick
@ 2014-09-20 16:07         ` Joe Perches
  0 siblings, 0 replies; 7+ messages in thread
From: Joe Perches @ 2014-09-20 16:07 UTC (permalink / raw)
  To: Fabian Frederick; +Cc: Andrew Morton, linux-kernel

On Sat, 2014-09-20 at 12:31 +0200, Fabian Frederick wrote:
> > On 16 September 2014 at 18:31 Joe Perches <joe@perches.com> wrote:
> > On Tue, 2014-09-16 at 18:15 +0200, Fabian Frederick wrote:
> > > > On 16 September 2014 at 05:22 Joe Perches <joe@perches.com> wrote:
> > > > On Mon, 2014-09-15 at 20:43 +0200, Fabian Frederick wrote:
> > > > > Adding patch subject uniqueness check in checkpatch --strict mode.
> > > > > See Documentation/SubmittingPatches/globally-unique identifier.
> > > >
> > > > Perhaps something like this?
> > []
> > > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > []
> > > > +     if ($check && $#git_commits < 1 && which("git") && -e ".git") {
> > > > +             my $output = `git log --no-color --format='%H %s' 2>&1`;
> > > > +             $output =~ s/^\s*//gm;
> > > > +             @git_commits = split("\n", $output);
> > > > +     }
> >
> > > Perfect ! :)
> >
> > Except for that _really, really_ long time to do the
> > git log of all commits...
> >
> > Maybe the git log should be enabled only with another
> > command-line option.
> >
> Hello Joe,

Dag Fabian:

> checkpatck does already execute git log in seed_camelcase_includes and
> git_commit_info when .git exists

which takes almost no time

$ time git log --no-merges --pretty=format:"%h%n" -1 -- include
40182b1

real	0m0.013s
user	0m0.008s
sys	0m0.000s

and

$ time git log --no-color --format='%H %s' -1 ^HEAD 2>&1

real	0m0.013s
user	0m0.004s
sys	0m0.004s

vs

$ time git log --no-merges --pretty=format:"%H %s" > /dev/null 2>&1

real	0m19.386s
user	0m17.264s
sys	0m1.008s

> so we could add some specific option instead like --complete --verbose
> --checkduplicates or something for this test ?

Maybe.

I'm still dubious about both the git use because of
the runtime and the checkpatch use as it's after the
fact.

The concept seems more suited to a git commit-msg hook
to me.  The runtime there is still a consideration too.

In a bit more detail:

If this modified checkpatch was run on a branch where
the patch was committed, in this sort of sequence:

	git checkout -b [some_branch]
	[make changes]
	git commit [etc]
	git format-patch -1
	./scripts/checkpatch.pl <patch from git format-patch>

then this new warning would trigger which seems senseless.

And in the case where multiple patches are done and then
scanned with checkpatch in a single block, the damage has
already been done and the defect was not found when it
occurred.

Andrew? Do you have an opinion?


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-09-20 16:07 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-15 18:43 [PATCH 1/1] checkpatch: check for subject uniqueness in git repository Fabian Frederick
2014-09-15 21:02 ` Joe Perches
2014-09-16  3:22 ` Joe Perches
2014-09-16 16:15   ` Fabian Frederick
2014-09-16 16:31     ` Joe Perches
2014-09-20 10:31       ` Fabian Frederick
2014-09-20 16:07         ` Joe Perches

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.