* [PATCH] get_maintainer: Extend matched name characters in maintainers_in_file()
From: Janne Grunau @ 2022-09-16  8:47 UTC
To: Joe Perches; +Cc: linux-kernel

Extend the regexp matching name characters to cover Unicode blocks Latin
Extended-A and Extended-B.
Fixes 'scripts/get_maintainer.pl -f' for
'Documentation/devicetree/bindings/clock/apple,nco.yaml'.

Signed-off-by: Janne Grunau <j@jannau.net>
---
This still excludes Greek and Cyrillic characters which should be
expected in names as well. I tried to use '\p{L}' to match all Unicode
letters but couldn't get it to work. Feel free to understand this as a bug
report with an incomplete fix.

best regards,
Janne

---
 scripts/get_maintainer.pl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
index ab123b498fd9..7c06f06dcbfa 100755
--- a/scripts/get_maintainer.pl
+++ b/scripts/get_maintainer.pl
@@ -442,7 +442,7 @@ sub maintainers_in_file {
         my $text = do { local($/) ; <$f> };
         close($f);
 
-        my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
+        my @poss_addr = $text =~ m$[A-Za-zÀ-ɏ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
         push(@file_emails, clean_file_emails(@poss_addr));
     }
 }
@@ -2460,7 +2460,7 @@ sub clean_file_emails {
             $name = "";
         }
 
-        my @nw = split(/[^A-Za-zÀ-ÿ\'\,\.\+-]/, $name);
+        my @nw = split(/[^A-Za-zÀ-ɏ\'\,\.\+-]/, $name);
         if (@nw > 2) {
             my $first = $nw[@nw - 3];
             my $middle = $nw[@nw - 2];
-- 
2.35.1

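One possible explanation for the '\p{L}' failure mentioned above, offered as a sketch rather than a diagnosis: if the file contents are matched as raw bytes (an assumption here, since the open() call is not part of the quoted diff), every byte of a multi-byte UTF-8 character is tested as a separate code point, and the continuation bytes are not letters. The sample name and the helper letter_runs() are illustrative only and do not come from get_maintainer.pl.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return every maximal run of Unicode letters in $s.
sub letter_runs {
    my ($s) = @_;
    return $s =~ /(\p{L}+)/g;
}

# "Rafał Miłecki" as the raw UTF-8 bytes a plain open()/readline would return
# (the letter U+0142 is encoded as the two bytes 0xC5 0x82).
my $bytes = "Rafa\x{C5}\x{82} Mi\x{C5}\x{82}ecki";

# The same text decoded into Perl characters.
my $chars = decode('UTF-8', $bytes);

my @from_bytes = letter_runs($bytes);
my @from_chars = letter_runs($chars);

# On bytes, 0x82 is treated as a control character and splits each word.
printf "byte string:      %d letter runs\n", scalar @from_bytes;
printf "character string: %d letter runs\n", scalar @from_chars;
```

If that assumption holds, decoding the input (or opening the file with an encoding layer) would be needed before any Unicode property class can match whole names.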
* Re: [PATCH] get_maintainer: Extend matched name characters in maintainers_in_file()
From: Martin Povišer @ 2022-09-16 15:51 UTC
To: Janne Grunau; +Cc: Joe Perches, linux-kernel

> On 16. 9. 2022, at 10:47, Janne Grunau <j@jannau.net> wrote:
>
> Extend the regexp matching name characters to cover Unicode blocks Latin
> Extended-A and Extended-B.
> Fixes 'scripts/get_maintainer.pl -f' for
> 'Documentation/devicetree/bindings/clock/apple,nco.yaml'.
>
> Signed-off-by: Janne Grunau <j@jannau.net>

Applauded-and-tested-by: Martin Povišer <povik+lin@cutebit.org>

On behalf of those not wanting to mangle our names to appease software,
let me thank you.

> This still excludes Greek and Cyrillic characters which should be
> expected in names as well. I tried to use '\p{L}' to match all Unicode
> letters but couldn't get it to work. Feel free to understand this as a bug
> report with an incomplete fix.
>
> best regards,
> Janne

* Re: [PATCH] get_maintainer: Extend matched name characters in maintainers_in_file()
From: Joe Perches @ 2022-09-17 14:11 UTC
To: Janne Grunau; +Cc: linux-kernel

On Fri, 2022-09-16 at 10:47 +0200, Janne Grunau wrote:
> Extend the regexp matching name characters to cover Unicode blocks Latin
> Extended-A and Extended-B.
> Fixes 'scripts/get_maintainer.pl -f' for
> 'Documentation/devicetree/bindings/clock/apple,nco.yaml'.
>
> Signed-off-by: Janne Grunau <j@jannau.net>
>
> ---
> This still excludes Greek and Cyrillic characters which should be
> expected in names as well. I tried to use '\p{L}' to match all Unicode
> letters but couldn't get it to work. Feel free to understand this as a bug
> report with an incomplete fix.

Maybe use \p{XPosixAlpha} ?

but I don't know what version of perl introduced this.

> diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
[]
> @@ -442,7 +442,7 @@ sub maintainers_in_file {
>          my $text = do { local($/) ; <$f> };
>          close($f);
>
> -        my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> +        my @poss_addr = $text =~ m$[A-Za-zÀ-ɏ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;

        my @poss_addr = $text =~ m$[\p{XPosixAlpha}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;

?

>          push(@file_emails, clean_file_emails(@poss_addr));
>      }
>  }
> @@ -2460,7 +2460,7 @@ sub clean_file_emails {
>              $name = "";
>          }
>
> -        my @nw = split(/[^A-Za-zÀ-ÿ\'\,\.\+-]/, $name);
> +        my @nw = split(/[^A-Za-zÀ-ɏ\'\,\.\+-]/, $name);

Maybe here too

> +        my @nw = split(/[^\p{XPosixAlpha}\'\,\.\+-]/, $name);

Dunno, haven't tested. Maybe you care to test?

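On the open question of which perl versions know \p{XPosixAlpha}: the thread does not answer it, but a property can be probed at run time. The following is a minimal sketch; property_supported() is a hypothetical helper, not part of get_maintainer.pl, and it relies only on the fact that an unknown property name is a fatal regex error that eval can trap.

```perl
use strict;
use warnings;

# Hypothetical helper: true if this perl accepts \p{NAME} in a regex.
sub property_supported {
    my ($name) = @_;
    my $src = "\\p{$name}";
    # Compile and run the pattern inside eval so that an unknown
    # property (a fatal error) is trapped instead of killing the script.
    return eval { my $m = ("a" =~ /$src/); 1 } ? 1 : 0;
}

for my $prop (qw(XPosixAlpha L print)) {
    printf "%-12s %s\n", $prop,
        property_supported($prop) ? "supported" : "not supported";
}
print "perl version: $]\n";
```

Note that a property being supported says nothing about whether it matches the intended characters; that also depends on whether the input is decoded text or raw bytes.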
* Re: [PATCH] get_maintainer: Extend matched name characters in maintainers_in_file()
From: Joe Perches @ 2022-09-18 17:03 UTC
To: Janne Grunau; +Cc: linux-kernel, Martin Povišer

On Sat, 2022-09-17 at 07:11 -0700, Joe Perches wrote:
> On Fri, 2022-09-16 at 10:47 +0200, Janne Grunau wrote:
> > Extend the regexp matching name characters to cover Unicode blocks Latin
> > Extended-A and Extended-B.
> > Fixes 'scripts/get_maintainer.pl -f' for
> > 'Documentation/devicetree/bindings/clock/apple,nco.yaml'.
> >
> > Signed-off-by: Janne Grunau <j@jannau.net>
> >
> > ---
> > This still excludes Greek and Cyrillic characters which should be
> > expected in names as well. I tried to use '\p{L}' to match all Unicode
> > letters but couldn't get it to work. Feel free to understand this as a bug
> > report with an incomplete fix.
>
> Maybe use \p{XPosixAlpha} ?
>
> but I don't know what version of perl introduced this.
>
> > diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
> []
> > @@ -442,7 +442,7 @@ sub maintainers_in_file {
> >          my $text = do { local($/) ; <$f> };
> >          close($f);
> >
> > -        my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> > +        my @poss_addr = $text =~ m$[A-Za-zÀ-ɏ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
>
>         my @poss_addr = $text =~ m$[\p{XPosixAlpha}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;

Using variations of \p{posix} doesn't seem to work for at least perl 5.34.

\p{print} seems to work for Documentation/devicetree/bindings/clock/apple,nco.yaml,
but I don't know how fragile it is.

\p{print} might be too greedy...
---
 scripts/get_maintainer.pl | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
index ab123b498fd9..790112c3e1d7 100755
--- a/scripts/get_maintainer.pl
+++ b/scripts/get_maintainer.pl
@@ -442,7 +442,7 @@ sub maintainers_in_file {
         my $text = do { local($/) ; <$f> };
         close($f);
 
-        my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
+        my @poss_addr = $text =~ m$[\p{print}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
         push(@file_emails, clean_file_emails(@poss_addr));
     }
 }
@@ -2456,11 +2456,12 @@ sub clean_file_emails {
     foreach my $email (@file_emails) {
         $email =~ s/[\(\<\{]{0,1}([A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+)[\)\>\}]{0,1}/\<$1\>/g;
         my ($name, $address) = parse_email($email);
+        $name =~ s/^\p{space}*\p{punct}*\p{space}*//;
         if ($name eq '"[,\.]"') {
             $name = "";
         }
 
-        my @nw = split(/[^A-Za-zÀ-ÿ\'\,\.\+-]/, $name);
+        my @nw = split(/[^\p{print}\'\,\.\+-]/, $name);
         if (@nw > 2) {
             my $first = $nw[@nw - 3];
             my $middle = $nw[@nw - 2];

* Re: [PATCH] get_maintainer: Extend matched name characters in maintainers_in_file()
From: Janne Grunau @ 2022-09-18 20:32 UTC
To: Joe Perches; +Cc: linux-kernel, Martin Povišer

On 2022-09-18 10:03:17 -0700, Joe Perches wrote:
> On Sat, 2022-09-17 at 07:11 -0700, Joe Perches wrote:
> > On Fri, 2022-09-16 at 10:47 +0200, Janne Grunau wrote:
> > > Extend the regexp matching name characters to cover Unicode blocks Latin
> > > Extended-A and Extended-B.
> > > Fixes 'scripts/get_maintainer.pl -f' for
> > > 'Documentation/devicetree/bindings/clock/apple,nco.yaml'.
> > >
> > > Signed-off-by: Janne Grunau <j@jannau.net>
> > >
> > > ---
> > > This still excludes Greek and Cyrillic characters which should be
> > > expected in names as well. I tried to use '\p{L}' to match all Unicode
> > > letters but couldn't get it to work. Feel free to understand this as a bug
> > > report with an incomplete fix.
> >
> > Maybe use \p{XPosixAlpha} ?
> >
> > but I don't know what version of perl introduced this.
> >
> > > diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
> > []
> > > @@ -442,7 +442,7 @@ sub maintainers_in_file {
> > >          my $text = do { local($/) ; <$f> };
> > >          close($f);
> > >
> > > -        my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> > > +        my @poss_addr = $text =~ m$[A-Za-zÀ-ɏ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> >
> >         my @poss_addr = $text =~ m$[\p{XPosixAlpha}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
>
> Using variations of \p{posix} doesn't seem to work for at least perl 5.34.
>
> \p{print} seems to work for Documentation/devicetree/bindings/clock/apple,nco.yaml,
> but I don't know how fragile it is.
>
> \p{print} might be too greedy...

It is, it produces the following diff (checking all files in
Documentation/devicetree/bindings):
-Lubomir Rintel <lkundrak@v3.sk> (in file)
+"Copyright 2019,2020 Lubomir Rintel" <lkundrak@v3.sk> (in file)

There are multiple hits of this form. The main issue is that \p{print}
includes space. That however fixes many names with 3 parts.

It still fails for "Rafał Miłecki <rafal@milecki.pl>" which my change
handles correctly.

I'm testing with perl 5.36.

> ---
>  scripts/get_maintainer.pl | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
> index ab123b498fd9..790112c3e1d7 100755
> --- a/scripts/get_maintainer.pl
> +++ b/scripts/get_maintainer.pl
> @@ -442,7 +442,7 @@ sub maintainers_in_file {
>          my $text = do { local($/) ; <$f> };
>          close($f);
>
> -        my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> +        my @poss_addr = $text =~ m$[\p{print}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
>          push(@file_emails, clean_file_emails(@poss_addr));
>      }
>  }
> @@ -2456,11 +2456,12 @@ sub clean_file_emails {
>      foreach my $email (@file_emails) {
>          $email =~ s/[\(\<\{]{0,1}([A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+)[\)\>\}]{0,1}/\<$1\>/g;
>          my ($name, $address) = parse_email($email);
> +        $name =~ s/^\p{space}*\p{punct}*\p{space}*//;

This change is useful independently of the name regexp as it rejects
'- <email@addr.ess>' (yaml list items) as a valid name, email combination.

Janne

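The over-matching reported above can be reproduced in isolation. This is a simplified sketch, not the script's code: the address pattern is condensed from the one in the patch, first_hit() is a hypothetical helper, and the sample line is modelled on the diff output quoted in this message. Because \p{print} covers spaces, digits and '#', the name class swallows everything in front of the address.

```perl
use strict;
use warnings;

# Address part, condensed from the pattern in the patch ({0,1} written as ?).
my $addr = qr/[\(\<\{]?[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]?/;

# Hypothetical helper: first "name + address" hit for a given name class.
sub first_hit {
    my ($name_class, $text) = @_;
    my ($hit) = $text =~ /(${name_class}*\s*,*\s*$addr)/;
    return $hit;
}

# Sample input modelled on the output quoted in the thread.
my $line = '# Copyright 2019,2020 Lubomir Rintel <lkundrak@v3.sk>';

# ASCII-only name class: digits are excluded, so the match starts after "2020".
my $ascii_hit = first_hit(qr/[A-Za-z"' ,.+-]/, $line);

# \p{print} name class: every printable character counts as part of the name.
my $print_hit = first_hit(qr/[\p{print}"' ,.+-]/, $line);

print "ascii: [$ascii_hit]\n";
print "print: [$print_hit]\n";
```

With the ASCII class the hit begins at the space before "Lubomir"; with \p{print} it spans the whole line, which matches the shape of the "Copyright 2019,2020 ..." entries described above.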
* Re: [PATCH] get_maintainer: Extend matched name characters in maintainers_in_file()
From: Joe Perches @ 2022-09-18 21:40 UTC
To: Janne Grunau; +Cc: linux-kernel, Martin Povišer

On Sun, 2022-09-18 at 22:32 +0200, Janne Grunau wrote:
> On 2022-09-18 10:03:17 -0700, Joe Perches wrote:
> > On Sat, 2022-09-17 at 07:11 -0700, Joe Perches wrote:
> > > On Fri, 2022-09-16 at 10:47 +0200, Janne Grunau wrote:
> > > > Extend the regexp matching name characters to cover Unicode blocks Latin
> > > > Extended-A and Extended-B.
> > > > Fixes 'scripts/get_maintainer.pl -f' for
> > > > 'Documentation/devicetree/bindings/clock/apple,nco.yaml'.
[]
> > > > diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
> > > []
> > > > @@ -442,7 +442,7 @@ sub maintainers_in_file {
> > > >          my $text = do { local($/) ; <$f> };
> > > >          close($f);
> > > >
> > > > -        my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> > > > +        my @poss_addr = $text =~ m$[A-Za-zÀ-ɏ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> > >
> > >         my @poss_addr = $text =~ m$[\p{XPosixAlpha}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> >
> > Using variations of \p{posix} doesn't seem to work for at least perl 5.34.
> >
> > \p{print} seems to work for Documentation/devicetree/bindings/clock/apple,nco.yaml,
> > but I don't know how fragile it is.
> >
> > \p{print} might be too greedy...
>
> It is, it produces the following diff (checking all files in
> Documentation/devicetree/bindings):
> -Lubomir Rintel <lkundrak@v3.sk> (in file)
> +"Copyright 2019,2020 Lubomir Rintel" <lkundrak@v3.sk> (in file)
>
> There are multiple hits of this form. The main issue is that \p{print}
> includes space. That however fixes many names with 3 parts.

right

> > diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
[]
> > @@ -2456,11 +2456,12 @@ sub clean_file_emails {
> >      foreach my $email (@file_emails) {
> >          $email =~ s/[\(\<\{]{0,1}([A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+)[\)\>\}]{0,1}/\<$1\>/g;
> >          my ($name, $address) = parse_email($email);
> > +        $name =~ s/^\p{space}*\p{punct}*\p{space}*//;
>
> This change is useful independently of the name regexp as it rejects
> '- <email@addr.ess>' (yaml list items) as a valid name, email combination.

Good. The below might be a bit better too:

        $name =~ s/(?:\p{space}|\p{punct})*//;

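To compare the two substitutions discussed in this message, here is a small sketch that applies both to a few sample names. The helper names strip_anchored() and strip_mixed() and the sample inputs are made up for illustration; only the two s/// expressions are taken from the thread.

```perl
use strict;
use warnings;

# The form from the patch: optional spaces, then punctuation, then spaces.
sub strip_anchored {
    my ($name) = @_;
    $name =~ s/^\p{space}*\p{punct}*\p{space}*//;
    return $name;
}

# The later suggestion: any leading mix of spaces and punctuation.
# There is no ^, but the first match attempt is at offset 0 and always
# succeeds (possibly empty), so only a leading run is removed.
sub strip_mixed {
    my ($name) = @_;
    $name =~ s/(?:\p{space}|\p{punct})*//;
    return $name;
}

for my $name ('-', '- Janne Grunau', '- - Janne Grunau', 'Janne Grunau') {
    printf "%-20s anchored=[%s] mixed=[%s]\n",
        "[$name]", strip_anchored($name), strip_mixed($name);
}
```

The two forms agree on a single leading "- " (a YAML list marker) and differ only when spaces and punctuation alternate more than once, where the second form removes the whole run.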