* get_maintainer.pl produces non-deterministic results
From: Dmitry Vyukov @ 2019-12-10 13:47 UTC
To: Joe Perches, Vegard Nossum, LKML, Michael Ellerman

Hi Joe,

scripts/get_maintainer.pl fs/proc/task_mmu.c
non-deterministically gives me from 13 to 16 results, different number
every time (on upstream 6794862a). Perl v5.28.1. Michael confirmed
this with v5.28.2.
Vegard suggested to check PERL_HASH_SEED=0. Indeed it fixes
non-determinism. But I guess it's not the right solution, there should
be some logical problem.
My perl-fo is weak, I appreciate if somebody with proper perl-fo takes a look.

Thanks

* Re: get_maintainer.pl produces non-deterministic results
From: Joe Perches @ 2019-12-11 0:01 UTC
To: Dmitry Vyukov, Vegard Nossum, LKML, Michael Ellerman

On Tue, 2019-12-10 at 14:47 +0100, Dmitry Vyukov wrote:
> Hi Joe,
>
> scripts/get_maintainer.pl fs/proc/task_mmu.c
> non-deterministically gives me from 13 to 16 results, different number
> every time (on upstream 6794862a). Perl v5.28.1. Michael confirmed
> this with v5.28.2.
> Vegard suggested to check PERL_HASH_SEED=0. Indeed it fixes
> non-determinism. But I guess it's not the right solution, there should
> be some logical problem.
> My perl-fo is weak, I appreciate if somebody with proper perl-fo takes a look.
>
> Thanks

https://lkml.org/lkml/2017/7/13/789

* Re: get_maintainer.pl produces non-deterministic results
From: Vegard Nossum @ 2019-12-11 6:41 UTC
To: Joe Perches
Cc: Dmitry Vyukov, LKML, Michael Ellerman

On Wed, 11 Dec 2019 at 01:02, Joe Perches <joe@perches.com> wrote:
>
> On Tue, 2019-12-10 at 14:47 +0100, Dmitry Vyukov wrote:
> > Hi Joe,
> >
> > scripts/get_maintainer.pl fs/proc/task_mmu.c
> > non-deterministically gives me from 13 to 16 results, different number
> > every time (on upstream 6794862a). Perl v5.28.1. Michael confirmed
> > this with v5.28.2.
> > Vegard suggested to check PERL_HASH_SEED=0. Indeed it fixes
> > non-determinism. But I guess it's not the right solution, there should
> > be some logical problem.
> > My perl-fo is weak, I appreciate if somebody with proper perl-fo takes a look.
> >
> > Thanks
>
> https://lkml.org/lkml/2017/7/13/789

Right, so you can make it reproducible if you add a tie-break to the sorting:

diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
index 34085d146fa2c..109d9fb134dad 100755
--- a/scripts/get_maintainer.pl
+++ b/scripts/get_maintainer.pl
@@ -2179,7 +2179,7 @@ sub vcs_assign {
     $hash{$_}++ for @lines;
 
     # sort -rn
-    foreach my $line (sort {$hash{$b} <=> $hash{$a}} keys %hash) {
+    foreach my $line (sort {$hash{$b} <=> $hash{$a} || $a cmp $b} keys %hash) {
         my $sign_offs = $hash{$line};
         my $percent = $sign_offs * 100 / $divisor;
 

This would actually favour names that start with early letters (A, B,
...) over late letters (..., Y, Z), which might also be a bad thing. I
think to fix that you could include everybody who has the same number
of signoffs at the cutoff:

diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
index 34085d146fa2c..80d3ed2ee6d70 100755
--- a/scripts/get_maintainer.pl
+++ b/scripts/get_maintainer.pl
@@ -2179,7 +2179,8 @@ sub vcs_assign {
     $hash{$_}++ for @lines;
 
     # sort -rn
-    foreach my $line (sort {$hash{$b} <=> $hash{$a}} keys %hash) {
+    my $prev_sign_offs = -1;
+    foreach my $line (sort {$hash{$b} <=> $hash{$a} || $a cmp $b} keys %hash) {
         my $sign_offs = $hash{$line};
         my $percent = $sign_offs * 100 / $divisor;
 
@@ -2187,7 +2188,7 @@ sub vcs_assign {
         next if (ignore_email_address($line));
         $count++;
         last if ($sign_offs < $email_git_min_signatures ||
-                 $count > $email_git_max_maintainers ||
+                 ($prev_sign_offs != $sign_offs && $count > $email_git_max_maintainers) ||
                  $percent < $email_git_min_percent);
         push_email_address($line, '');
         if ($output_rolestats) {
@@ -2196,6 +2197,8 @@ sub vcs_assign {
         } else {
             add_role($line, $role);
         }
+
+        $prev_sign_offs = $sign_offs;
     }
 }

These patches are probably horribly whitespace damaged, hopefully you
get the gist of it though...

Vegard

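For readers following the thread, the two ideas in the message above (a name tie-break in the sort, and keeping everyone who is tied at the cutoff) can be tried in isolation. The following is only a minimal standalone sketch, not code from get_maintainer.pl: the helper `select_signers`, its `$max` and `$keep_ties` parameters, and the sample names and counts are all made up for illustration, and the other checks in `vcs_assign` (minimum signatures, minimum percent, ignored addresses) are left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of get_maintainer.pl: pick the top signers
# from a hash reference of (name => sign-off count).
#   $max       - cap on the number of entries
#   $keep_ties - if true, also keep everyone tied with the last entry kept
sub select_signers {
    my ($counts, $max, $keep_ties) = @_;

    # Descending by count; ties are broken by name, so the result no longer
    # depends on Perl's per-process hash iteration order.
    my @sorted = sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b }
                 keys %$counts;

    my @picked;
    my $prev = -1;
    foreach my $name (@sorted) {
        my $n = $counts->{$name};
        if (@picked >= $max) {
            # Past the cap: stop unless this entry ties with the previous one.
            last unless $keep_ties && $n == $prev;
        }
        push @picked, $name;
        $prev = $n;
    }
    return @picked;
}

# Made-up sample data: three people tied on 2 sign-offs.
my %counts = (alice => 5, bob => 2, carol => 2, dave => 2, erin => 1);

print join(', ', select_signers(\%counts, 2, 0)), "\n";  # strict cap
print join(', ', select_signers(\%counts, 2, 1)), "\n";  # cap plus ties
```

With the strict cap, the name tie-break alone decides which of the tied entries survives, which is the alphabetical bias mentioned above. With `$keep_ties` set, all entries tied at the cutoff are kept, so the list can grow past `$max` but no longer depends on name order.
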
* Re: get_maintainer.pl produces non-deterministic results
From: Joe Perches @ 2019-12-12 0:12 UTC
To: Vegard Nossum
Cc: Dmitry Vyukov, LKML, Michael Ellerman

On Wed, 2019-12-11 at 07:41 +0100, Vegard Nossum wrote:
> On Wed, 11 Dec 2019 at 01:02, Joe Perches <joe@perches.com> wrote:
> > On Tue, 2019-12-10 at 14:47 +0100, Dmitry Vyukov wrote:
> > > Hi Joe,
> > >
> > > scripts/get_maintainer.pl fs/proc/task_mmu.c
> > > non-deterministically gives me from 13 to 16 results, different number
> > > every time (on upstream 6794862a). Perl v5.28.1. Michael confirmed
> > > this with v5.28.2.
> > > Vegard suggested to check PERL_HASH_SEED=0. Indeed it fixes
> > > non-determinism. But I guess it's not the right solution, there should
> > > be some logical problem.
> > > My perl-fo is weak, I appreciate if somebody with proper perl-fo takes a look.
> > >
> > > Thanks
> >
> > https://lkml.org/lkml/2017/7/13/789
>
> Right, so you can make it reproducible if you add a tie-break to the sorting:
>
> diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
> index 34085d146fa2c..109d9fb134dad 100755
> --- a/scripts/get_maintainer.pl
> +++ b/scripts/get_maintainer.pl
> @@ -2179,7 +2179,7 @@ sub vcs_assign {
>      $hash{$_}++ for @lines;
>
>      # sort -rn
> -    foreach my $line (sort {$hash{$b} <=> $hash{$a}} keys %hash) {
> +    foreach my $line (sort {$hash{$b} <=> $hash{$a} || $a cmp $b} keys %hash) {
>          my $sign_offs = $hash{$line};
>          my $percent = $sign_offs * 100 / $divisor;
>
> This would actually favour names that start with early letters (A, B,
> ...) over late letters (..., Y, Z), which might also be a bad thing. I
> think to fix that you could include everybody who has the same number
> of signoffs at the cutoff:
>
> diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
> index 34085d146fa2c..80d3ed2ee6d70 100755
> --- a/scripts/get_maintainer.pl
> +++ b/scripts/get_maintainer.pl
> @@ -2179,7 +2179,8 @@ sub vcs_assign {
>      $hash{$_}++ for @lines;
>
>      # sort -rn
> -    foreach my $line (sort {$hash{$b} <=> $hash{$a}} keys %hash) {
> +    my $prev_sign_offs = -1;
> +    foreach my $line (sort {$hash{$b} <=> $hash{$a} || $a cmp $b} keys %hash) {
>          my $sign_offs = $hash{$line};
>          my $percent = $sign_offs * 100 / $divisor;
>
> @@ -2187,7 +2188,7 @@ sub vcs_assign {
>          next if (ignore_email_address($line));
>          $count++;
>          last if ($sign_offs < $email_git_min_signatures ||
> -                 $count > $email_git_max_maintainers ||
> +                 ($prev_sign_offs != $sign_offs && $count >
> $email_git_max_maintainers) ||
>                   $percent < $email_git_min_percent);
>          push_email_address($line, '');
>          if ($output_rolestats) {
> @@ -2196,6 +2197,8 @@ sub vcs_assign {
>          } else {
>              add_role($line, $role);
>          }
> +
> +        $prev_sign_offs = $sign_offs;
>      }
>  }
>
> These patches are probably horribly whitespace damaged, hopefully you
> get the gist of it though...

I get the gist, but I think it's also not particularly
important to be repeatable.

I think it's more important to get more pattern coverage
of the MAINTAINERS file as that is more important than
any use of git history.