linux-kernel.vger.kernel.org archive mirror
* [RFC] duplicate #include check for build system
@ 2006-02-21  1:48 Herbert Poetzl
  2006-02-21  6:10 ` Sam Ravnborg
  2006-02-21  7:29 ` Daniel Barkalow
  0 siblings, 2 replies; 11+ messages in thread
From: Herbert Poetzl @ 2006-02-21  1:48 UTC (permalink / raw)
  To: Sam Ravnborg; +Cc: Linux Kernel ML


Hi Sam! Folks!

recently had the idea to utilize cpp or sparse to
do some automated #include checking, and I came up
with the following proof of concept:

I just replaced the sparse binary by the following
script (basically hijacking the make C=1 system)

it would allow kernel developers to easily identify
duplicate includes, which in turn might reduce
dependencies and thus build time ...


----------------
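#!/bin/bash
# Stand-in for the sparse binary: make C=1 runs it with the compiler
# flags followed by the source file as the last argument.

# Sort all arguments but the last: keep -D defines, drop -W warning
# flags, pass everything else on to cpp.
while [ $# -gt 1 ]; do
  case $1 in
        -D*) DEF="$DEF $1";;
        -W*) ;;
          *) OPT="$OPT $1";;
  esac
  shift
done

# ( $CPP $DEF -H -dI $OPT $1 1>&2 ) 2>&1 | grep "^[#.]"
# -H lists each header on stderr, one dot per nesting level; send that
# listing to gawk and throw the preprocessed source away.
$CPP $DEF -H  $OPT "$1" 2>&1 >/dev/null | gawk -vFILE="$1" '

# I[d] is the file open at depth d; depth 0 is the source file itself
BEGIN   { I[0]=FILE; }

/^[.]+/ { D=length($1);         # depth = number of leading dots

          # C[i,hdr]: how often hdr has shown up at a depth greater than i
          for (i=0; i<D; i++)
            C[i,$2]++;

          I[D]=$2;

          # M[i,hdr]: file that was open at level i at that time
          for (i=0; i<D; i++)
            M[i,$2]=I[i];

          # hdr has been seen before at this depth or deeper
          if (C[D-1,$2]>1) {
            printf "··· %s in %s ",$2,I[D-1];

            # chain of files through which an earlier, deeper
            # include of hdr was reached
            for (i=D; M[i,$2]; i++)
              printf "%c%s", (i==D)?"[":"·", M[i,$2];

            # (dup): the same parent included hdr at this depth last time
            printf "%s", (i>D) ? "]\n" :
                ((X[D,$2]==I[D-1]) ? "(dup)\n" : "\n");
          }

          # X[D,hdr]: last file that included hdr at depth D
          X[D,$2]=I[D-1];
        }
' 1>&2

# never fail the build
true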
----------------
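
For anyone who wants to look at the raw data first, here is a minimal
sketch of a filter for the listing that -H writes. It assumes the format
the script above relies on (a run of dots, one per nesting level, then a
space and the header path). The name repeated_headers is made up for this
illustration and is not part of the script.

```shell
#!/bin/bash
# Sketch only: repeated_headers is a hypothetical helper. It reads a
# "cpp -H" style listing on stdin and prints every header that is
# listed more than once, in first-seen order.
repeated_headers() {
  awk '
    /^[.]+ / {
      count[$2]++                    # times this header was listed
      if (!($2 in first)) {          # remember first-seen order and depth
        first[$2] = length($1)
        order[++n] = $2
      }
    }
    END {
      for (k = 1; k <= n; k++) {
        h = order[k]
        if (count[h] > 1)
          printf "%s x%d (first at depth %d)\n", h, count[h], first[h]
      }
    }
  '
}

# Usage, assuming CPP, DEF and OPT are set as in the script above:
#   $CPP $DEF -H $OPT file.c 2>&1 >/dev/null | repeated_headers
```

This only counts repeats; unlike the gawk script it does not say which
file pulled the header in.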

of course, most of it would not be required if
there was support in the kernel build system,
and, if there is any preference for perl over
bash/gawk it could be easily rewritten ...

please let me know what you think of this and if
you could imagine adding something similar to the
build system 

TIA,
Herbert



* Re: [RFC] duplicate #include check for build system
  2006-02-21  1:48 [RFC] duplicate #include check for build system Herbert Poetzl
@ 2006-02-21  6:10 ` Sam Ravnborg
  2006-02-21  7:29 ` Daniel Barkalow
  1 sibling, 0 replies; 11+ messages in thread
From: Sam Ravnborg @ 2006-02-21  6:10 UTC (permalink / raw)
  To: Linux Kernel ML

On Tue, Feb 21, 2006 at 02:48:24AM +0100, Herbert Poetzl wrote:
> 
> Hi Sam! Folks!
> 
> recently had the idea to utilize cpp or sparse to
> do some automated #include checking, and I came up
> with the following proof of concept:
Nice way - looks better than randomly parsing a lot of files.
 
> I just replaced the sparse binary by the following
> script (basically hijacking the make C=1 system)
That's just fine. It is made general to support more than sparse if
feasible.
 
> of course, most of it would not be required if
> there was support in the kernel build system,
I do not see what you think could improve things on the kbuild side.
Please elaborate a bit more.

> and, if there is any preference for perl over
> bash/gawk it could be easily rewritten ...
For tools that are run only now and then you are fine to use perl if you
prefer. We have, though, avoided it as a strict dependency until now.
Also a few more comments would be nice so even I understand the script.

	Sam


* Re: [RFC] duplicate #include check for build system
  2006-02-21  1:48 [RFC] duplicate #include check for build system Herbert Poetzl
  2006-02-21  6:10 ` Sam Ravnborg
@ 2006-02-21  7:29 ` Daniel Barkalow
  2006-02-21 17:52   ` Sam Ravnborg
  1 sibling, 1 reply; 11+ messages in thread
From: Daniel Barkalow @ 2006-02-21  7:29 UTC (permalink / raw)
  To: Herbert Poetzl; +Cc: Sam Ravnborg, Linux Kernel ML

On Tue, 21 Feb 2006, Herbert Poetzl wrote:

> Hi Sam! Folks!
> 
> recently had the idea to utilize cpp or sparse to
> do some automated #include checking, and I came up
> with the following proof of concept:
> 
> I just replaced the sparse binary by the following
> script (basically hijacking the make C=1 system)
> 
> it would allow kernel developers to easily identify
> duplicate includes, which in turn, might reduce
> dependencies and thus build time ...

I think the kernel style is to encourage duplicate includes, rather than
removing them. Removing duplicate includes won't remove any dependencies
(since the includes that they duplicate will remain). And it makes it
harder to remove unnecessary includes (which does reduce dependencies),
because when header A stops needing header B, various other code could
expect that including header A means they get header B, and these places
have to be found and the formerly-duplicate include put back. So you
actually do best to have lots of duplicate includes and aggressively prune
unnecessary includes.

	-Daniel
*This .sig left intentionally blank*


* Re: [RFC] duplicate #include check for build system
  2006-02-21  7:29 ` Daniel Barkalow
@ 2006-02-21 17:52   ` Sam Ravnborg
  2006-02-22  0:11     ` Herbert Poetzl
  2006-02-24  7:23     ` Jan Engelhardt
  0 siblings, 2 replies; 11+ messages in thread
From: Sam Ravnborg @ 2006-02-21 17:52 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Herbert Poetzl, Linux Kernel ML

On Tue, Feb 21, 2006 at 02:29:12AM -0500, Daniel Barkalow wrote:
> On Tue, 21 Feb 2006, Herbert Poetzl wrote:
 
> I think the kernel style is to encourage duplicate includes, rather than
> removing them. Removing duplicate includes won't remove any dependencies
> (since the includes that they duplicate will remain).
The style as I have understood it is that each .h file in include/linux/
is supposed to be self-contained. So it includes what it needs, and the
'what it needs' is kept small.

Keeping the 'what it needs' part small is a challenge resulting in
smaller .h files. But also a good way to keep related things together.

	Sam


* Re: [RFC] duplicate #include check for build system
  2006-02-21 17:52   ` Sam Ravnborg
@ 2006-02-22  0:11     ` Herbert Poetzl
  2006-02-22  0:18       ` Randy.Dunlap
  2006-02-22  5:57       ` Sam Ravnborg
  2006-02-24  7:23     ` Jan Engelhardt
  1 sibling, 2 replies; 11+ messages in thread
From: Herbert Poetzl @ 2006-02-22  0:11 UTC (permalink / raw)
  To: Sam Ravnborg; +Cc: Daniel Barkalow, Linux Kernel ML

On Tue, Feb 21, 2006 at 06:52:46PM +0100, Sam Ravnborg wrote:
> On Tue, Feb 21, 2006 at 02:29:12AM -0500, Daniel Barkalow wrote:
> > On Tue, 21 Feb 2006, Herbert Poetzl wrote:
>  
> > I think the kernel style is to encourage duplicate includes, rather than
> > removing them. Removing duplicate includes won't remove any dependencies
> > (since the includes that they duplicate will remain).

> The style as I have understood it is that each .h file in include/linux/
> is supposed to be self-contained. So it includes what it needs, and the
> 'what it needs' is kept small.
> 
> Keeping the 'what it needs' part small is a challenge resulting in
> smaller .h files. But also a good way to keep related things together.

glad that I stimulated a philosophical discussion
about the kernel header files and what they should
include or not ... 

but the idea was more to give the developers an
instrument to verify that they are not including
stuff several times, and that's actually in .h
and .c files, because it seems that often the same
header file is included twice in the _same_ file

anyway, was this a positive or negative reply?

TIA,
Herbert
 
> 	Sam


* Re: [RFC] duplicate #include check for build system
  2006-02-22  0:11     ` Herbert Poetzl
@ 2006-02-22  0:18       ` Randy.Dunlap
  2006-02-22  0:26         ` Herbert Poetzl
  2006-02-22  5:57       ` Sam Ravnborg
  1 sibling, 1 reply; 11+ messages in thread
From: Randy.Dunlap @ 2006-02-22  0:18 UTC (permalink / raw)
  To: Herbert Poetzl; +Cc: Sam Ravnborg, Daniel Barkalow, Linux Kernel ML

On Wed, 22 Feb 2006, Herbert Poetzl wrote:

> On Tue, Feb 21, 2006 at 06:52:46PM +0100, Sam Ravnborg wrote:
> > On Tue, Feb 21, 2006 at 02:29:12AM -0500, Daniel Barkalow wrote:
> > > On Tue, 21 Feb 2006, Herbert Poetzl wrote:
> >
> > > I think the kernel style is to encourage duplicate includes, rather than
> > > removing them. Removing duplicate includes won't remove any dependencies
> > > (since the includes that they duplicate will remain).
>
> > The style as I have understood it is that each .h file in include/linux/
> > is supposed to be self-contained. So it includes what it needs, and the
> > 'what it needs' is kept small.
> >
> > Keeping the 'what it needs' part small is a challenge resulting in
> > smaller .h files. But also a good way to keep related things together.
>
> glad that I stimulated a philosophical discussion
> about the kernel header files and what they should
> include or not ...
>
> but the idea was more to give the developers an
> instrument to verify that they are not including
> stuff several times, and that's actually in .h
> and .c files, because it seems that often the same
> header file is included twice in the _same_ file
>
> anyway, was this a positive or negative reply?

Hi Herbert,

The goal is not to remove the most possible #includes.

E.g., if sched.h already sucks in kernel.h,
kernel.h still should be #included if the source (.c)
file uses any APIs or extern data from kernel.h.

Does that help?
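
A rough way to see which headers a file gets only through other headers,
assuming a cpp -H style listing (one dot per nesting level, then a space
and the path): the sketch below prints headers that never show up at
depth 1. The helper name indirect_only is hypothetical, and whether the
.c file actually uses anything from those headers still has to be checked
by hand.

```shell
#!/bin/bash
# Sketch only: indirect_only is a hypothetical helper. It reads a
# "cpp -H" style listing on stdin and prints the headers that are
# reached only through other headers, never included directly.
indirect_only() {
  awk '
    /^[.]+ / {
      if (length($1) == 1) {
        direct[$2] = 1               # included by the source file itself
      } else if (!($2 in seen)) {
        seen[$2] = 1                 # first time seen at depth 2 or more
        order[++n] = $2
      }
    }
    END {
      for (k = 1; k <= n; k++)
        if (!(order[k] in direct))
          print order[k]
    }
  '
}

# Usage, assuming CPP and the flags for the file are known:
#   $CPP -H file.c 2>&1 >/dev/null | indirect_only
```

In the sched.h example, kernel.h would be printed for a .c file that
includes only sched.h, and would drop off the list once the file
includes kernel.h itself.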

-- 
~Randy


* Re: [RFC] duplicate #include check for build system
  2006-02-22  0:18       ` Randy.Dunlap
@ 2006-02-22  0:26         ` Herbert Poetzl
  2006-02-22  0:28           ` Randy.Dunlap
  0 siblings, 1 reply; 11+ messages in thread
From: Herbert Poetzl @ 2006-02-22  0:26 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: Sam Ravnborg, Daniel Barkalow, Linux Kernel ML

On Tue, Feb 21, 2006 at 04:18:57PM -0800, Randy.Dunlap wrote:
> On Wed, 22 Feb 2006, Herbert Poetzl wrote:
> 
> > On Tue, Feb 21, 2006 at 06:52:46PM +0100, Sam Ravnborg wrote:
> > > On Tue, Feb 21, 2006 at 02:29:12AM -0500, Daniel Barkalow wrote:
> > > > On Tue, 21 Feb 2006, Herbert Poetzl wrote:
> > >
> > > > I think the kernel style is to encourage duplicate includes,
> > > > rather than removing them. Removing duplicate includes won't
> > > > remove any dependencies (since the includes that they duplicate
> > > > will remain).
> >
> > > The style as I have understood it is that each .h file in
> > > include/linux/ is supposed to be self-contained. So it includes
> > > what it needs, and the 'what it needs' is kept small.
> > >
> > > Keeping the 'what it needs' part small is a challenge resulting
> > > in smaller .h files. But also a good way to keep related things
> > > together.
> >
> > glad that I stimulated a philosophical discussion
> > about the kernel header files and what they should
> > include or not ...
> >
> > but the idea was more to give the developers an
> > instrument to verify that they are not including
> > stuff several times, and that's actually in .h
> > and .c files, because it seems that often the same
> > header file is included twice in the _same_ file
> >
> > anyway, was this a positive or negative reply?
> 
> Hi Herbert,
> 
> The goal is not to remove the most possible #includes.

which I totally agree with ...

but a) how is that related to _having_ a tool to
check for duplicate includes, and b) how is it
related to removing duplicate includes in general?

let me give a simple example here:

 #include <linux/pm.h>
 #include <linux/sched.h>
 #include <linux/proc_fs.h>
-#include <linux/pm.h>

do you think the second one is really desired?

> E.g., if sched.h already sucks in kernel.h,
> kernel.h still should be #included if the source (.c)
> file uses any APIs or extern data from kernel.h.
> 
> Does that help?

no, sorry, doesn't help here ...

nevertheless thanks,
Herbert

> -- 
> ~Randy


* Re: [RFC] duplicate #include check for build system
  2006-02-22  0:26         ` Herbert Poetzl
@ 2006-02-22  0:28           ` Randy.Dunlap
  0 siblings, 0 replies; 11+ messages in thread
From: Randy.Dunlap @ 2006-02-22  0:28 UTC (permalink / raw)
  To: Herbert Poetzl
  Cc: Randy.Dunlap, Sam Ravnborg, Daniel Barkalow, Linux Kernel ML

On Wed, 22 Feb 2006, Herbert Poetzl wrote:

> On Tue, Feb 21, 2006 at 04:18:57PM -0800, Randy.Dunlap wrote:
> > On Wed, 22 Feb 2006, Herbert Poetzl wrote:
> >
> > > On Tue, Feb 21, 2006 at 06:52:46PM +0100, Sam Ravnborg wrote:
> > > > On Tue, Feb 21, 2006 at 02:29:12AM -0500, Daniel Barkalow wrote:
> > > > > On Tue, 21 Feb 2006, Herbert Poetzl wrote:
> > > >
> > > > > I think the kernel style is to encourage duplicate includes,
> > > > > rather than removing them. Removing duplicate includes won't
> > > > > remove any dependencies (since the includes that they duplicate
> > > > > will remain).
> > >
> > > > The style as I have understood it is that each .h file in
> > > > include/linux/ is supposed to be self-contained. So it includes
> > > > what it needs, and the 'what it needs' is kept small.
> > > >
> > > > Keeping the 'what it needs' part small is a challenge resulting
> > > > in smaller .h files. But also a good way to keep related things
> > > > together.
> > >
> > > glad that I stimulated a philosophical discussion
> > > about the kernel header files and what they should
> > > include or not ...
> > >
> > > but the idea was more to give the developers an
> > > instrument to verify that they are not including
> > > stuff several times, and that's actually in .h
> > > and .c files, because it seems that often the same
> > > header file is included twice in the _same_ file
> > >
> > > anyway, was this a positive or negative reply?
> >
> > Hi Herbert,
> >
> > The goal is not to remove the most possible #includes.
>
> which I totally agree with ...
>
> but a) how is that related to _having_ a tool to
> check for duplicate includes, and b) how is it
> related to removing duplicate includes in general?
>
> let me give a simple example here:
>
>  #include <linux/pm.h>
>  #include <linux/sched.h>
>  #include <linux/proc_fs.h>
> -#include <linux/pm.h>
>
> do you think the second one is really desired?

Of course not.

> > E.g., if sched.h already sucks in kernel.h,
> > kernel.h still should be #included if the source (.c)
> > file uses any APIs or extern data from kernel.h.
> >
> > Does that help?
>
> no, sorry, doesn't help here ...
>
> nevertheless thanks,
> Herbert

-- 
~Randy


* Re: [RFC] duplicate #include check for build system
  2006-02-22  0:11     ` Herbert Poetzl
  2006-02-22  0:18       ` Randy.Dunlap
@ 2006-02-22  5:57       ` Sam Ravnborg
  1 sibling, 0 replies; 11+ messages in thread
From: Sam Ravnborg @ 2006-02-22  5:57 UTC (permalink / raw)
  To: Daniel Barkalow, Linux Kernel ML

On Wed, Feb 22, 2006 at 01:11:53AM +0100, Herbert Poetzl wrote:
> > 
> > Keeping the 'what it needs' part small is a challenge resulting in
> > smaller .h files. But also a good way to keep related things together.
> 
> glad that I stimulated a philosophical discussion
> about the kernel header files and what they should
> include or not ... 
> 
> but the idea was more to give the developers an
> instrument to verify that they are not including
> stuff several times, and that's actually in .h
> and .c files, because it seems that often the same
> header file is included twice in the _same_ file
> 
> anyway, was this a positive or negative reply?

If you can restrict the tool to locate the cases where
a header file is included twice in the same file - then
this is a positive reply.
If you limit the check to the top-level file this is fine,
but I would not be surprised if one or a few .h files
have duplicated includes too.
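
Restricted to a single file, the check does not need cpp at all. A
minimal sketch, assuming one file per call and ignoring #ifdef branches
(so an include repeated in two alternative branches is reported too);
<x.h> and "x.h" are treated as the same name, and dup_includes is a
hypothetical name:

```shell
#!/bin/bash
# Sketch only: dup_includes is a hypothetical helper. It reports every
# #include line in the given file that names a header already included
# earlier in that same file.
dup_includes() {
  awk '
    /^[ \t]*#[ \t]*include[ \t]*[<"]/ {
      hdr = $0
      sub(/^[ \t]*#[ \t]*include[ \t]*/, "", hdr)  # strip the directive
      hdr = substr(hdr, 2)                         # strip opening < or quote
      sub(/[>"].*$/, "", hdr)                      # strip closing one and the rest
      if (hdr in first)
        printf "%s:%d: duplicate include of %s (first at line %d)\n",
               FILENAME, FNR, hdr, first[hdr]
      else
        first[hdr] = FNR
    }
  ' "$1"
}

# Usage:
#   dup_includes kernel/power/pm.c
# or, for every header below include/linux:
#   find include/linux -name "*.h" | while read -r f; do dup_includes "$f"; done
```

Since it looks at one file at a time it works the same for .c and .h
files, which should cover the duplicated includes inside headers as well.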

	Sam


* Re: [RFC] duplicate #include check for build system
  2006-02-21 17:52   ` Sam Ravnborg
  2006-02-22  0:11     ` Herbert Poetzl
@ 2006-02-24  7:23     ` Jan Engelhardt
  2006-02-25  2:59       ` Randy.Dunlap
  1 sibling, 1 reply; 11+ messages in thread
From: Jan Engelhardt @ 2006-02-24  7:23 UTC (permalink / raw)
  To: Sam Ravnborg; +Cc: Daniel Barkalow, Herbert Poetzl, Linux Kernel ML

>> I think the kernel style is to encourage duplicate includes, rather than
>> removing them. Removing duplicate includes won't remove any dependencies
>> (since the includes that they duplicate will remain).
>The style as I have understood it is that each .h file in include/linux/
>is supposed to be self-contained. So it includes what it needs, and the
>'what it needs' is kept small.
>
>Keeping the 'what it needs' part small is a challenge resulting in
>smaller .h files. But also a good way to keep related things together.
>
How far does this go? Consider the following hypothetical case:

---dcache.h---
struct dentry {
   ...
};
---fs.h---
#include "dcache.h"
struct inode {
    struct dentry *de;
};

Since only a pointer to struct dentry is involved, I would compress it to:

---fs.h---
struct dentry;
struct inode {
    struct dentry *de;
};

The fs.h file still "compiles" (gcc -xc fs.h), and there is one file less 
to be read. And since dcache.h in this case here should anyway be included 
in the .c file if de is dereferenced, I do not see a problem with this.
Objections?
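
The "still compiles" check can be scripted. A minimal sketch, assuming a
gcc-compatible compiler reachable as cc (or through CC); -fsyntax-only is
used so that nothing is written or linked. header_compiles and
demo_forward_decl are hypothetical names:

```shell
#!/bin/bash
# Sketch only: succeed if a header passes the compiler syntax check on
# its own. CC is an assumption and defaults to cc.
header_compiles() {
  ${CC:-cc} -xc -fsyntax-only "$1" 2>/dev/null
}

# Writes two variants of the hypothetical fs.h into a scratch directory
# and reports which one gets by with only a forward declaration.
demo_forward_decl() {
  local dir
  dir=$(mktemp -d) || return 1

  # a pointer to an incomplete type: the forward declaration is enough
  printf 'struct dentry;\nstruct inode {\n    struct dentry *de;\n};\n' \
    > "$dir/fs.h"

  # embedding the struct by value needs the full definition
  printf 'struct dentry;\nstruct inode {\n    struct dentry de;\n};\n' \
    > "$dir/bad.h"

  header_compiles "$dir/fs.h"  && echo "fs.h: ok"
  header_compiles "$dir/bad.h" || echo "bad.h: needs the full definition"
  rm -rf "$dir"
}

# Usage:
#   demo_forward_decl
#   header_compiles include/linux/fs.h && echo self-contained
```

The second variant is there to show where the trick stops: as soon as the
size or the members of struct dentry are needed, the include has to come
back.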



Jan Engelhardt
-- 


* Re: [RFC] duplicate #include check for build system
  2006-02-24  7:23     ` Jan Engelhardt
@ 2006-02-25  2:59       ` Randy.Dunlap
  0 siblings, 0 replies; 11+ messages in thread
From: Randy.Dunlap @ 2006-02-25  2:59 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: sam, barkalow, herbert, linux-kernel

On Fri, 24 Feb 2006 08:23:23 +0100 (MET) Jan Engelhardt wrote:

> >> I think the kernel style is to encourage duplicate includes, rather than
> >> removing them. Removing duplicate includes won't remove any dependencies
> >> (since the includes that they duplicate will remain).
> >The style as I have understood it is that each .h file in include/linux/
> >is supposed to be self-contained. So it includes what it needs, and the
> >'what it needs' is kept small.
> >
> >Keeping the 'what it needs' part small is a challenge resulting in
> >smaller .h files. But also a good way to keep related things together.
> >
> How far does this go? Consider the following hypothetical case:
> 
> ---dcache.h---
> struct dentry {
>    ...
> };
> ---fs.h---
> #include "dcache.h"
> struct inode {
>     struct dentry *de;
> };
> 
> Since only a pointer to struct dentry is involved, I would compress it to:
> 
> ---fs.h---
> struct dentry;
> struct inode {
>     struct dentry *de;
> };
> 
> The fs.h file still "compiles" (gcc -xc fs.h), and there is one file less 
> to be read. And since dcache.h in this case here should anyway be included 
> in the .c file if de is dereferenced, I do not see a problem with this.
> Objections?

Nope, your method is good & correct AFAIAC.

---
~Randy

