* [PATCH] get_maintainer: Extend matched name characters in maintainers_in_file()
@ 2022-09-16  8:47 Janne Grunau
  2022-09-16 15:51 ` Martin Povišer
  2022-09-17 14:11 ` Joe Perches
  0 siblings, 2 replies; 6+ messages in thread
From: Janne Grunau @ 2022-09-16  8:47 UTC (permalink / raw)
  To: Joe Perches; +Cc: linux-kernel

Extend the regexp matching name characters to cover Unicode blocks Latin
Extended-A and Extended-B.
Fixes 'scripts/get_maintainer.pl -f' for
'Documentation/devicetree/bindings/clock/apple,nco.yaml'.

Signed-off-by: Janne Grunau <j@jannau.net>

---
This still excludes Greek and Cyrillic characters which should be
expected in names as well. I tried to use '\p{L}' to match all Unicode
letters but couldn't get it to work. Feel free to understand this as a
bug report with an incomplete fix.

best regards,
Janne

---
 scripts/get_maintainer.pl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
index ab123b498fd9..7c06f06dcbfa 100755
--- a/scripts/get_maintainer.pl
+++ b/scripts/get_maintainer.pl
@@ -442,7 +442,7 @@ sub maintainers_in_file {
 	my $text = do { local($/) ; <$f> };
 	close($f);
 
-	my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
+	my @poss_addr = $text =~ m$[A-Za-zÀ-ɏ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
 	push(@file_emails, clean_file_emails(@poss_addr));
     }
 }
@@ -2460,7 +2460,7 @@ sub clean_file_emails {
 	    $name = "";
 	}
 
-	my @nw = split(/[^A-Za-zÀ-ÿ\'\,\.\+-]/, $name);
+	my @nw = split(/[^A-Za-zÀ-ɏ\'\,\.\+-]/, $name);
 	if (@nw > 2) {
 	    my $first = $nw[@nw - 3];
 	    my $middle = $nw[@nw - 2];
-- 
2.35.1


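The note in this message says that '\p{L}' could not be made to work. One possible explanation, which is an assumption and not something established in this thread, is that the text is matched as undecoded UTF-8 bytes, so each byte of a multi-byte character is tested on its own. A minimal sketch of that effect, using the core Encode module and a hypothetical sample line rather than the script's actual file handling:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample line. The UTF-8 bytes of "ł" (U+0142) are spelled out
# so the example does not depend on the encoding of this source file.
my $bytes = "Rafa\xC5\x82 Mi\xC5\x82ecki <rafal\@milecki.pl>";

# On the undecoded string every byte counts as one code point, so the
# continuation byte 0x82 is tested as U+0082, which is not a letter.
my ($raw_name) = $bytes =~ /^([\p{L} ]+)/;

# After decoding, "ł" is a single code point and \p{L} matches it.
my $chars = decode('UTF-8', $bytes);
my ($decoded_name) = $chars =~ /^([\p{L} ]+)/;
$decoded_name =~ s/\s+$//;

binmode(STDOUT, ':encoding(UTF-8)');
print "undecoded match: $raw_name\n";
print "decoded match:   $decoded_name\n";
```

If this assumption holds for the script, decoding the input before matching would be a precondition for any '\p{...}' based character class, independent of which property is chosen.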

* Re: [PATCH] get_maintainer: Extend matched name characters in maintainers_in_file()
  2022-09-16  8:47 [PATCH] get_maintainer: Extend matched name characters in maintainers_in_file() Janne Grunau
@ 2022-09-16 15:51 ` Martin Povišer
  2022-09-17 14:11 ` Joe Perches
  1 sibling, 0 replies; 6+ messages in thread
From: Martin Povišer @ 2022-09-16 15:51 UTC (permalink / raw)
  To: Janne Grunau; +Cc: Joe Perches, linux-kernel


> On 16. 9. 2022, at 10:47, Janne Grunau <j@jannau.net> wrote:
> 
> Extend the regexp matching name characters to cover Unicode blocks Latin
> Extended-A and Extended-B.
> Fixes 'scripts/get_maintainer.pl -f' for
> 'Documentation/devicetree/bindings/clock/apple,nco.yaml'.
> 
> Signed-off-by: Janne Grunau <j@jannau.net>

Applauded-and-tested-by: Martin Povišer <povik+lin@cutebit.org>

On behalf of those not wanting to mangle our names to appease software,
let me thank you.

> This still excludes Greek and Cyrillic characters which should be
> expected in names as well. I tried to use '\p{L}' to match all Unicode
> letters but couldn't get it to work. Feel free to understand this as a
> bug report with an incomplete fix.
> 
> best regards,
> Janne



* Re: [PATCH] get_maintainer: Extend matched name characters in maintainers_in_file()
  2022-09-16  8:47 [PATCH] get_maintainer: Extend matched name characters in maintainers_in_file() Janne Grunau
  2022-09-16 15:51 ` Martin Povišer
@ 2022-09-17 14:11 ` Joe Perches
  2022-09-18 17:03   ` Joe Perches
  1 sibling, 1 reply; 6+ messages in thread
From: Joe Perches @ 2022-09-17 14:11 UTC (permalink / raw)
  To: Janne Grunau; +Cc: linux-kernel

On Fri, 2022-09-16 at 10:47 +0200, Janne Grunau wrote:
> Extend the regexp matching name characters to cover Unicode blocks Latin
> Extended-A and Extended-B.
> Fixes 'scripts/get_maintainer.pl -f' for
> 'Documentation/devicetree/bindings/clock/apple,nco.yaml'.
> 
> Signed-off-by: Janne Grunau <j@jannau.net>
> 
> ---
> This still excludes Greek and Cyrillic characters which should be
> expected in names as well. I tried to use '\p{L}' to match all Unicode
> letters but couldn't get it to work. Feel free to understand this as a
> bug report with an incomplete fix.

Maybe use \p{XPosixAlpha} ?

but I don't know what version of perl introduced this.

> diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
[]
> @@ -442,7 +442,7 @@ sub maintainers_in_file {
>  	my $text = do { local($/) ; <$f> };
>  	close($f);
>  
> -	my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> +	my @poss_addr = $text =~ m$[A-Za-zÀ-ɏ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;

	my @poss_addr = $text =~ m$[\p{XPosixAlpha}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;

?

>  	push(@file_emails, clean_file_emails(@poss_addr));
>      }
>  }
> @@ -2460,7 +2460,7 @@ sub clean_file_emails {
>  	    $name = "";
>  	}
>  
> -	my @nw = split(/[^A-Za-zÀ-ÿ\'\,\.\+-]/, $name);
> +	my @nw = split(/[^A-Za-zÀ-ɏ\'\,\.\+-]/, $name);

Maybe here too

> +	my @nw = split(/[^\p{XPosixAlpha}\'\,\.\+-]/, $name);

Dunno haven't tested.  Maybe you care to test?



* Re: [PATCH] get_maintainer: Extend matched name characters in maintainers_in_file()
  2022-09-17 14:11 ` Joe Perches
@ 2022-09-18 17:03   ` Joe Perches
  2022-09-18 20:32     ` Janne Grunau
  0 siblings, 1 reply; 6+ messages in thread
From: Joe Perches @ 2022-09-18 17:03 UTC (permalink / raw)
  To: Janne Grunau; +Cc: linux-kernel, Martin Povišer

On Sat, 2022-09-17 at 07:11 -0700, Joe Perches wrote:
> On Fri, 2022-09-16 at 10:47 +0200, Janne Grunau wrote:
> > Extend the regexp matching name characters to cover Unicode blocks Latin
> > Extended-A and Extended-B.
> > Fixes 'scripts/get_maintainer.pl -f' for
> > 'Documentation/devicetree/bindings/clock/apple,nco.yaml'.
> > 
> > Signed-off-by: Janne Grunau <j@jannau.net>
> > 
> > ---
> > This still excludes Greek and Cyrillic characters which should be
> > expected in names as well. I tried to use '\p{L}' to match all Unicode
> > letters but couldn't get it to work. Feel free to understand this as a
> > bug report with an incomplete fix.
> 
> Maybe use \p{XPosixAlpha} ?
> 
> but I don't know what version of perl introduced this.
> 
> > diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
> []
> > @@ -442,7 +442,7 @@ sub maintainers_in_file {
> >  	my $text = do { local($/) ; <$f> };
> >  	close($f);
> >  
> > -	my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> > +	my @poss_addr = $text =~ m$[A-Za-zÀ-ɏ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> 
> 	my @poss_addr = $text =~ m$[\p{XPosixAlpha}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;

Using variations of \p{posix} doesn't seem to work for at least perl 5.34.

\p{print} seems to work for Documentation/devicetree/bindings/clock/apple,nco.yaml,
but I don't know how fragile it is.

\p{print} might be too greedy...

---
 scripts/get_maintainer.pl | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
index ab123b498fd9..790112c3e1d7 100755
--- a/scripts/get_maintainer.pl
+++ b/scripts/get_maintainer.pl
@@ -442,7 +442,7 @@ sub maintainers_in_file {
 	my $text = do { local($/) ; <$f> };
 	close($f);
 
-	my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
+	my @poss_addr = $text =~ m$[\p{print}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
 	push(@file_emails, clean_file_emails(@poss_addr));
     }
 }
@@ -2456,11 +2456,12 @@ sub clean_file_emails {
     foreach my $email (@file_emails) {
 	$email =~ s/[\(\<\{]{0,1}([A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+)[\)\>\}]{0,1}/\<$1\>/g;
 	my ($name, $address) = parse_email($email);
+	$name =~ s/^\p{space}*\p{punct}*\p{space}*//;
 	if ($name eq '"[,\.]"') {
 	    $name = "";
 	}
 
-	my @nw = split(/[^A-Za-zÀ-ÿ\'\,\.\+-]/, $name);
+	my @nw = split(/[^\p{print}\'\,\.\+-]/, $name);
 	if (@nw > 2) {
 	    my $first = $nw[@nw - 3];
 	    my $middle = $nw[@nw - 2];


* Re: [PATCH] get_maintainer: Extend matched name characters in maintainers_in_file()
  2022-09-18 17:03   ` Joe Perches
@ 2022-09-18 20:32     ` Janne Grunau
  2022-09-18 21:40       ` Joe Perches
  0 siblings, 1 reply; 6+ messages in thread
From: Janne Grunau @ 2022-09-18 20:32 UTC (permalink / raw)
  To: Joe Perches; +Cc: linux-kernel, Martin Povišer

On 2022-09-18 10:03:17 -0700, Joe Perches wrote:
> On Sat, 2022-09-17 at 07:11 -0700, Joe Perches wrote:
> > On Fri, 2022-09-16 at 10:47 +0200, Janne Grunau wrote:
> > > Extend the regexp matching name characters to cover Unicode blocks Latin
> > > Extended-A and Extended-B.
> > > Fixes 'scripts/get_maintainer.pl -f' for
> > > 'Documentation/devicetree/bindings/clock/apple,nco.yaml'.
> > > 
> > > Signed-off-by: Janne Grunau <j@jannau.net>
> > > 
> > > ---
> > > This still excludes Greek and Cyrillic characters which should be
> > > expected in names as well. I tried to use '\p{L}' to match all Unicode
> > > letters but couldn't get it to work. Feel free to understand this as a
> > > bug report with an incomplete fix.
> > 
> > Maybe use \p{XPosixAlpha} ?
> > 
> > but I don't know what version of perl introduced this.
> > 
> > > diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
> > []
> > > @@ -442,7 +442,7 @@ sub maintainers_in_file {
> > >  	my $text = do { local($/) ; <$f> };
> > >  	close($f);
> > >  
> > > -	my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> > > +	my @poss_addr = $text =~ m$[A-Za-zÀ-ɏ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> > 
> > 	my @poss_addr = $text =~ m$[\p{XPosixAlpha}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> 
> Using variations of \p{posix} doesn't seem to work for at least perl 5.34.
> 
> \p{print} seems to work for Documentation/devicetree/bindings/clock/apple,nco.yaml,
> but I don't know how fragile it is.
> 
> \p{print} might be too greedy...

It is. It produces the following diff (checking all files in
Documentation/devicetree/bindings):
-Lubomir Rintel <lkundrak@v3.sk> (in file)
+"Copyright 2019,2020 Lubomir Rintel" <lkundrak@v3.sk> (in file)

There are multiple hits of this form. The main issue is that \p{print} 
includes space. That however fixes many names with 3 parts.

It still fails for "Rafał Miłecki <rafal@milecki.pl>" which my change 
handles correctly.

I'm testing with perl 5.36

> ---
>  scripts/get_maintainer.pl | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
> index ab123b498fd9..790112c3e1d7 100755
> --- a/scripts/get_maintainer.pl
> +++ b/scripts/get_maintainer.pl
> @@ -442,7 +442,7 @@ sub maintainers_in_file {
>  	my $text = do { local($/) ; <$f> };
>  	close($f);
>  
> -	my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> +	my @poss_addr = $text =~ m$[\p{print}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
>  	push(@file_emails, clean_file_emails(@poss_addr));
>      }
>  }
> @@ -2456,11 +2456,12 @@ sub clean_file_emails {
>      foreach my $email (@file_emails) {
>  	$email =~ s/[\(\<\{]{0,1}([A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+)[\)\>\}]{0,1}/\<$1\>/g;
>  	my ($name, $address) = parse_email($email);
> +	$name =~ s/^\p{space}*\p{punct}*\p{space}*//;

This change is useful independently of the name regexp, as it rejects
'- <email@addr.ess>' (YAML list items) as a valid name/email combination.

Janne

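The over-matching described in this message can be reproduced in isolation. The sketch below uses simplified stand-ins for the script's pattern (the `$addr` sub-pattern and the variable names are hypothetical, not taken from get_maintainer.pl) and contrasts a '\p{print}' name class, which also accepts digits, with a letter-based class:

```perl
use strict;
use warnings;

my $line = 'Copyright 2019,2020 Lubomir Rintel <lkundrak@v3.sk>';

# Simplified stand-in for the address part of the script's pattern.
my $addr = qr/<[A-Za-z0-9_.+-]+\@[A-Za-z0-9.-]+>/;

# \p{print} accepts digits and spaces, so the name run starts at "Copyright".
my ($print_name) = $line =~ /([\p{print}]*)\s*$addr/;

# A letter-based class stops at the digits, so the run starts after "2020".
my ($letter_name) = $line =~ /([\p{L}\"\' ,.+-]*)\s*$addr/;

# Trim surrounding whitespace for display.
s/^\s+|\s+$//g for ($print_name, $letter_name);

print "print class:  $print_name\n";
print "letter class: $letter_name\n";
```

The letter-based class is shown only for contrast; whether it is usable in the script depends on how the input is decoded.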


* Re: [PATCH] get_maintainer: Extend matched name characters in maintainers_in_file()
  2022-09-18 20:32     ` Janne Grunau
@ 2022-09-18 21:40       ` Joe Perches
  0 siblings, 0 replies; 6+ messages in thread
From: Joe Perches @ 2022-09-18 21:40 UTC (permalink / raw)
  To: Janne Grunau; +Cc: linux-kernel, Martin Povišer

On Sun, 2022-09-18 at 22:32 +0200, Janne Grunau wrote:
> On 2022-09-18 10:03:17 -0700, Joe Perches wrote:
> > On Sat, 2022-09-17 at 07:11 -0700, Joe Perches wrote:
> > > On Fri, 2022-09-16 at 10:47 +0200, Janne Grunau wrote:
> > > > Extend the regexp matching name characters to cover Unicode blocks Latin
> > > > Extended-A and Extended-B.
> > > > Fixes 'scripts/get_maintainer.pl -f' for
> > > > 'Documentation/devicetree/bindings/clock/apple,nco.yaml'.
[]
> > > > diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
> > > []
> > > > @@ -442,7 +442,7 @@ sub maintainers_in_file {
> > > >  	my $text = do { local($/) ; <$f> };
> > > >  	close($f);
> > > >  
> > > > -	my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> > > > +	my @poss_addr = $text =~ m$[A-Za-zÀ-ɏ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> > > 
> > > 	my @poss_addr = $text =~ m$[\p{XPosixAlpha}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> > 
> > Using variations of \p{posix} doesn't seem to work for at least perl 5.34.
> > 
> > \p{print} seems to work for Documentation/devicetree/bindings/clock/apple,nco.yaml,
> > but I don't know how fragile it is.
> > 
> > \p{print} might be too greedy...
> 
> It is. It produces the following diff (checking all files in
> Documentation/devicetree/bindings):
> -Lubomir Rintel <lkundrak@v3.sk> (in file)
> +"Copyright 2019,2020 Lubomir Rintel" <lkundrak@v3.sk> (in file)
> 
> There are multiple hits of this form. The main issue is that \p{print} 
> includes space. That however fixes many names with 3 parts.

right

> > diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
[]
> > @@ -2456,11 +2456,12 @@ sub clean_file_emails {
> >      foreach my $email (@file_emails) {
> >  	$email =~ s/[\(\<\{]{0,1}([A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+)[\)\>\}]{0,1}/\<$1\>/g;
> >  	my ($name, $address) = parse_email($email);
> > +	$name =~ s/^\p{space}*\p{punct}*\p{space}*//;
> 
> > This change is useful independently of the name regexp, as it rejects
> > '- <email@addr.ess>' (YAML list items) as a valid name/email combination.

Good.  The below might be a bit better too:

	$name =~ s/(?:\p{space}|\p{punct})*//;

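The two substitutions discussed in this message differ when whitespace and punctuation are interleaved at the start of a name. A small sketch comparing them on hypothetical inputs (the helper names `strip_sequential` and `strip_alternating` are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical name strings, e.g. what might remain of a YAML list item.
my @names = ('- ', '-- - Jane Doe', 'Jane Doe');

# First variant: one run of spaces, one run of punctuation, one run of spaces.
sub strip_sequential {
    my ($name) = @_;
    $name =~ s/^\p{space}*\p{punct}*\p{space}*//;
    return $name;
}

# Second variant: any mix of spaces and punctuation. There is no "^", but the
# leftmost match always begins at position 0 (it may be empty), so only a
# leading run is removed.
sub strip_alternating {
    my ($name) = @_;
    $name =~ s/(?:\p{space}|\p{punct})*//;
    return $name;
}

for my $name (@names) {
    printf "%-16s sequential=[%s] alternating=[%s]\n",
        "[$name]", strip_sequential($name), strip_alternating($name);
}
```

On these inputs both variants reduce '- ' to the empty string, while only the second fully strips the interleaved prefix of '-- - Jane Doe'.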

