[RFC,v3] scripts: kernel-doc: fix typedef support for struct/union parsing
diff mbox series

Message ID 20210225145033.11431-1-yashsri421@gmail.com
State In Next
Commit 4754eeb04933bfc856588ddc60a5ba01f75eaa91
Headers show
Series
  • [RFC,v3] scripts: kernel-doc: fix typedef support for struct/union parsing
Related show

Commit Message

Aditya Srivastava Feb. 25, 2021, 2:50 p.m. UTC
Currently, there are ~1290 occurrences in 447 files in the kernel tree
'typedef struct/union' syntax for defining some struct/union. However,
kernel-doc currently does not support that syntax. Of the ~1290
occurrences, there are four occurrences in ./include/linux/zstd.h with
typedef struct/union syntax and a preceding kernel-doc; all other
occurrences have no preceding kernel-doc.

Add support for parsing struct/union following this syntax.

Signed-off-by: Aditya Srivastava <yashsri421@gmail.com>
---
Changes in v3:
- Modify commit message

Changes in v2:
- Split recurring regex into multiple variables
- Modify commit message

 scripts/kernel-doc | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

Comments

Jonathan Corbet March 1, 2021, 9:51 p.m. UTC | #1
Aditya Srivastava <yashsri421@gmail.com> writes:

> Currently, there are ~1290 occurrences in 447 files in the kernel tree
> 'typedef struct/union' syntax for defining some struct/union. However,
> kernel-doc currently does not support that syntax. Of the ~1290
> occurrences, there are four occurrences in ./include/linux/zstd.h with
> typedef struct/union syntax and a preceding kernel-doc; all other
> occurrences have no preceding kernel-doc.
>
> Add support for parsing struct/union following this syntax.
>
> Signed-off-by: Aditya Srivastava <yashsri421@gmail.com>

Applied, thanks.

jon
Matthew Wilcox March 6, 2021, 4:35 a.m. UTC | #2
On Thu, Feb 25, 2021 at 08:20:33PM +0530, Aditya Srivastava wrote:
> +++ b/scripts/kernel-doc
> @@ -1201,12 +1201,23 @@ sub dump_union($$) {
>  sub dump_struct($$) {
>      my $x = shift;
>      my $file = shift;
> +    my $decl_type;
> +    my $members;
> +    my $type = qr{struct|union};
> +    # For capturing struct/union definition body, i.e. "{members*}qualifiers*"
> +    my $definition_body = qr{\{(.*)\}(?:\s*(?:__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*};
> -    if ($x =~ /(struct|union)\s+(\w+)\s*\{(.*)\}(\s*(__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*/) {
> -	my $decl_type = $1;
> +    if ($x =~ /($type)\s+(\w+)\s*$definition_body/) {
> +	$decl_type = $1;
>  	$declaration_name = $2;
> -	my $members = $3;
> +	$members = $3;
> +    } elsif ($x =~ /typedef\s+($type)\s*$definition_body\s*(\w+)\s*;/) {
> +	$decl_type = $1;
> +	$declaration_name = $3;
> +	$members = $2;
> +    }

In the same spirit as dump_function, would something like this work?

-    if ($x =~ /(struct|union)\s+(\w+)\s*\{(.*)\}(\s*(__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*/) {
+    $x =~ s/__packed +//;
+    $x =~ s/__aligned +//;
+    $x =~ s/____cacheline_aligned_in_smp +//;
+    $x =~ s/____cacheline_aligned +//;
+    $x =~ s/__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)//;
+
+    if ($x =~ /(struct|union)\s+(\w+)\s*\{(.*)\}(\s*)*/) {
Lukas Bulwahn March 6, 2021, 6:25 a.m. UTC | #3
On Sat, Mar 6, 2021 at 5:35 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Feb 25, 2021 at 08:20:33PM +0530, Aditya Srivastava wrote:
> > +++ b/scripts/kernel-doc
> > @@ -1201,12 +1201,23 @@ sub dump_union($$) {
> >  sub dump_struct($$) {
> >      my $x = shift;
> >      my $file = shift;
> > +    my $decl_type;
> > +    my $members;
> > +    my $type = qr{struct|union};
> > +    # For capturing struct/union definition body, i.e. "{members*}qualifiers*"
> > +    my $definition_body = qr{\{(.*)\}(?:\s*(?:__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*};
> > -    if ($x =~ /(struct|union)\s+(\w+)\s*\{(.*)\}(\s*(__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*/) {
> > -     my $decl_type = $1;
> > +    if ($x =~ /($type)\s+(\w+)\s*$definition_body/) {
> > +     $decl_type = $1;
> >       $declaration_name = $2;
> > -     my $members = $3;
> > +     $members = $3;
> > +    } elsif ($x =~ /typedef\s+($type)\s*$definition_body\s*(\w+)\s*;/) {
> > +     $decl_type = $1;
> > +     $declaration_name = $3;
> > +     $members = $2;
> > +    }
>
> In the same spirit as dump_function, would something like this work?
>

I agree. That might be a suitable clean-up to keep the code for
functions and struct/union parsing similar in style/spirit.

Aditya, would you like to create a patch for that?

Lukas
Aditya Srivastava March 6, 2021, 7:48 a.m. UTC | #4
On 6/3/21 11:55 am, Lukas Bulwahn wrote:
> On Sat, Mar 6, 2021 at 5:35 AM Matthew Wilcox <willy@infradead.org> wrote:
>>
>> On Thu, Feb 25, 2021 at 08:20:33PM +0530, Aditya Srivastava wrote:
>>> +++ b/scripts/kernel-doc
>>> @@ -1201,12 +1201,23 @@ sub dump_union($$) {
>>>  sub dump_struct($$) {
>>>      my $x = shift;
>>>      my $file = shift;
>>> +    my $decl_type;
>>> +    my $members;
>>> +    my $type = qr{struct|union};
>>> +    # For capturing struct/union definition body, i.e. "{members*}qualifiers*"
>>> +    my $definition_body = qr{\{(.*)\}(?:\s*(?:__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*};
>>> -    if ($x =~ /(struct|union)\s+(\w+)\s*\{(.*)\}(\s*(__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*/) {
>>> -     my $decl_type = $1;
>>> +    if ($x =~ /($type)\s+(\w+)\s*$definition_body/) {
>>> +     $decl_type = $1;
>>>       $declaration_name = $2;
>>> -     my $members = $3;
>>> +     $members = $3;
>>> +    } elsif ($x =~ /typedef\s+($type)\s*$definition_body\s*(\w+)\s*;/) {
>>> +     $decl_type = $1;
>>> +     $declaration_name = $3;
>>> +     $members = $2;
>>> +    }
>>
>> In the same spirit as dump_function, would something like this work?
>>
> 
> I agree. That might be a suitable clean-up to keep the code for
> functions and struct/union parsing similar in style/spirit.
> 
> Aditya, would you like to create a patch for that?
> 

Sure Lukas.
I have a doubt though, Can't we use a single expression separated by
"|" here, instead of multiple lines? i.e.,

$x =~
s/__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)\s*//;


Probably we could do something similar for dump_function, i.e.,
-    $prototype =~ s/^static +//;
-    $prototype =~ s/^extern +//;
-    $prototype =~ s/^asmlinkage +//;
-    $prototype =~ s/^inline +//;
-    $prototype =~ s/^__inline__ +//;
-    $prototype =~ s/^__inline +//;
-    $prototype =~ s/^__always_inline +//;
-    $prototype =~ s/^noinline +//;

+    $prototype =~
s/^(?:static|extern|asmlinkage|__?inline__?|__always_inline|noinline) +//;
And so on for other regexps.

What do you think?

Thanks
Aditya
Matthew Wilcox March 6, 2021, 3:20 p.m. UTC | #5
On Sat, Mar 06, 2021 at 01:18:38PM +0530, Aditya wrote:
> On 6/3/21 11:55 am, Lukas Bulwahn wrote:
> > I agree. That might be a suitable clean-up to keep the code for
> > functions and struct/union parsing similar in style/spirit.
> > 
> > Aditya, would you like to create a patch for that?
> 
> Sure Lukas.
> I have a doubt though, Can't we use a single expression separated by
> "|" here, instead of multiple lines? i.e.,
> 
> $x =~
> s/__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)\s*//;
> 
> 
> Probably we could do something similar for dump_function, i.e.,
> -    $prototype =~ s/^static +//;
> -    $prototype =~ s/^extern +//;
> -    $prototype =~ s/^asmlinkage +//;
> -    $prototype =~ s/^inline +//;
> -    $prototype =~ s/^__inline__ +//;
> -    $prototype =~ s/^__inline +//;
> -    $prototype =~ s/^__always_inline +//;
> -    $prototype =~ s/^noinline +//;
> 
> +    $prototype =~
> s/^(?:static|extern|asmlinkage|__?inline__?|__always_inline|noinline) +//;
> And so on for other regexps.
> 
> What do you think?

I think there's a tradeoff between speed / compactness and readability.
As someone who doesn't know perl particularly well, I can look at the
series of lines and say "Oh, it's stripping out these unwanted things".
Your one line, while undoubtedly more efficient, is considerably less
easy to understand.

Maybe there's another way to do it that's more efficient while not
sacrificing the readability?

Also, would your suggestion work for 'static inline void foo(void)'?
I think it needs to remove multiple occurrences of the things in the
regex.  But maybe that's what the ?: on the beginning is for?
Aditya Srivastava March 7, 2021, 7:36 a.m. UTC | #6
On 6/3/21 8:50 pm, Matthew Wilcox wrote:
> On Sat, Mar 06, 2021 at 01:18:38PM +0530, Aditya wrote:
>> On 6/3/21 11:55 am, Lukas Bulwahn wrote:
>>> I agree. That might be a suitable clean-up to keep the code for
>>> functions and struct/union parsing similar in style/spirit.
>>>
>>> Aditya, would you like to create a patch for that?
>>
>> Sure Lukas.
>> I have a doubt though, Can't we use a single expression separated by
>> "|" here, instead of multiple lines? i.e.,
>>
>> $x =~
>> s/__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)\s*//;
>>
>>
>> Probably we could do something similar for dump_function, i.e.,
>> -    $prototype =~ s/^static +//;
>> -    $prototype =~ s/^extern +//;
>> -    $prototype =~ s/^asmlinkage +//;
>> -    $prototype =~ s/^inline +//;
>> -    $prototype =~ s/^__inline__ +//;
>> -    $prototype =~ s/^__inline +//;
>> -    $prototype =~ s/^__always_inline +//;
>> -    $prototype =~ s/^noinline +//;
>>
>> +    $prototype =~
>> s/^(?:static|extern|asmlinkage|__?inline__?|__always_inline|noinline) +//;
>> And so on for other regexps.
>>
>> What do you think?
> 
> I think there's a tradeoff between speed / compactness and readability.
> As someone who doesn't know perl particularly well, I can look at the
> series of lines and say "Oh, it's stripping out these unwanted things".
> Your one line, while undoubtedly more efficient, is considerably less
> easy to understand.
> 
> Maybe there's another way to do it that's more efficient while not
> sacrificing the readability?
> 
> Also, would your suggestion work for 'static inline void foo(void)'?
> I think it needs to remove multiple occurrences of the things in the
> regex.  

Ah, I get it.

But maybe that's what the ?: on the beginning is for?
> 

No, "?:" is just to use this regex for matching, without capturing.
So, the regex will just remove any of those 'starting' occurrences,
consequently, "static inline" occurrence will probably not be removed.
I think the reason for using multiple lines for substitution in
dump_function is for the same reason, ie, subsequent substitution.

But for dump_struct, it is probably not desired, i.e., subsequent
substitution/removal. Also, for the same reason, using it with
dump_struct may cause unwelcomed discrepancies, or cause the user to
understand that here multiple substitutions are required, but is not so.
So, I think that we should use a single expression substitution for
dump_struct, at best. But am not sure if that is required, as
currently also we are not capturing this part of the regex, as well
as, matching only at certain position of the definition expression.

But I am curious to what others think about this.
Will be happy to work on this patch :)

Thanks
Aditya

Patch
diff mbox series

diff --git a/scripts/kernel-doc b/scripts/kernel-doc
index 8b5bc7bf4bb8..68df17877384 100755
--- a/scripts/kernel-doc
+++ b/scripts/kernel-doc
@@ -1201,12 +1201,23 @@  sub dump_union($$) {
 sub dump_struct($$) {
     my $x = shift;
     my $file = shift;
+    my $decl_type;
+    my $members;
+    my $type = qr{struct|union};
+    # For capturing struct/union definition body, i.e. "{members*}qualifiers*"
+    my $definition_body = qr{\{(.*)\}(?:\s*(?:__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*};
 
-    if ($x =~ /(struct|union)\s+(\w+)\s*\{(.*)\}(\s*(__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*/) {
-	my $decl_type = $1;
+    if ($x =~ /($type)\s+(\w+)\s*$definition_body/) {
+	$decl_type = $1;
 	$declaration_name = $2;
-	my $members = $3;
+	$members = $3;
+    } elsif ($x =~ /typedef\s+($type)\s*$definition_body\s*(\w+)\s*;/) {
+	$decl_type = $1;
+	$declaration_name = $3;
+	$members = $2;
+    }
 
+    if ($members) {
 	if ($identifier ne $declaration_name) {
 	    print STDERR "${file}:$.: warning: expecting prototype for $decl_type $identifier. Prototype was for $decl_type $declaration_name instead\n";
 	    return;