dwarves.vger.kernel.org archive mirror
* Re: Can't persuade pahole to see through forward declarations
       [not found]   ` <20090617170217.GB21530-f8uhVLnGfZaxAyOMLChx1axOck334EZe@public.gmane.org>
@ 2009-06-17 17:25     ` Zack Weinberg
       [not found]       ` <20090617102506.34aaf8e2-4eJtQOnFJqFBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Zack Weinberg @ 2009-06-17 17:25 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, dwarves-u79uwXL29TY76Z2rM5mHXA

Arnaldo Carvalho de Melo <acme-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On Mon, Jun 15, 2009 at 05:24:09PM -0700, Zack Weinberg wrote:
> > I've been experimenting with your 'pahole' utility to analyze some
> > of the data structures in Mozilla.  I've run into trouble because
> > pahole seems to give up when it hits a forward declaration (a type
> > DIE with DW_AT_declaration=1).  For instance, this is what -E -M
> > mode prints for one of the classes of interest:
> > 
> > class nsFrame : public nsBox, public nsIFrameDebug {
> > public:
> > 	/* class nsBox {
> > 	} <ancestor>; */              /*     0     0 */
> > 
> > 	/* XXX 64 bytes hole, try to pack */
> > 
> > 	/* --- cacheline 1 boundary (64 bytes) --- */
> > 	/* class nsIFrameDebug {
> > 		int ()(void) * *_vptr.nsIFrameDebug; 
> > 			/*	64     8 */ 
> > 	} <ancestor>; */ /*    64     8 */
> > 
> > 	/* size: 72, cachelines: 2, members: 2 */
> > 	/* sum members: 8, holes: 1, sum holes: 64 */
> > 	/* last cacheline: 8 bytes */
> > };
> > 
> > As you can see, because it only sees the forward declaration for
> > class nsBox, it thinks there's a gigantic hole in this class.  This
> > makes it very hard to look for real problems.
> > 
> > I tried to patch pahole to look through forward declarations
> > whenever possible (using cus__find_struct_by_name) but could not
> > get it to work, even after switching the main loop from a load
> > stealer to an iteration over all CUs after they're all loaded.  I
> > get gibberish:
> > 
> > class nsFrame : public nsBox, public nsIFrameDebug {
> > public:
> > 	/* class nsBox : public M-0M-^I^D {
> > 	public:
> > 		/* PLDHashNumber      (<ancestor>)(PLDHashTable *,
> > 		const void  *); */    /*     0     0 */
> > 	} <ancestor>; */ /*     0     0 */
> > 	/* XXX 64 bytes hole, try to pack */
> > ...
> > 
> > I'm attaching the code changes I made.  Any advice you could give me
> > would be greatly appreciated.  Note that there are two other changes
> > in there -- I had to take out the cmake test for libebl, whatever
> > that is (nothing fails to link without it), and I also changed -M
> > mode to not print vtables.
> 
> I'm trying to figure this out, but the xulrunner debuginfo file I have
> doesn't have the nsIFrameDebug class as an ancestor for nsFrame:

This is probably because it was configured out.  I am working with a 
development build, which has all sorts of extra debugging mechanisms
configured in.

> Anyway, it shows nsBox as a zero sized class, which it isn't. Problem
> is that if we process _just_ this CU we will never know what is the
> size for nsBox... but should be doable for multi-CU files where we
> can try to find another CU with the missing definition.

Right, that's what I figured was the problem.

> This will require a bit more coding, so that we can first go on
> discarding CUs till we find class nsFrame, then if we find forward
> decls in it, we must steal the CU and set it aside, then continue to
> the other CUs, till we find the one with the class nsBox definition,
> then process the first CU with nsFrame and when not finding
> definitions for some of its ancestors, look on the other CU where we
> found the nsBox definition.

What if the CU that has the nsBox definition has already gone past when
we hit the one with nsFrame?  It seems to me that this would only work
for some link orders.

> Look below for a comment explaining the bug in your code.

Thanks!  That makes sense.  Meantime I've written code to post-process
the incomplete output and fill in the missing classes, though.


Tangentially, I would really like to be able to pass a *list* of
structure/class names to -C (or have a separate option that reads a
list from a file).  The full analysis I'm doing needs to look at 200 or
so of the thousands of classes in xulrunner; currently I need to run
pahole once for each, which is really slow.

Also, I wonder if you could see your way clear to adding an alternative
output format that is easily machine-parseable?
Approximation-to-C-source format is nice for humans but I've spent the
past day and a bit writing a sed script to turn it into something that
I can do programmed analysis on and it was no fun.

zw
--
To unsubscribe from this list: send the line "unsubscribe dwarves" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Can't persuade pahole to see through forward declarations
       [not found]       ` <20090617102506.34aaf8e2-4eJtQOnFJqFBDgjK7y7TUQ@public.gmane.org>
@ 2009-06-17 20:56         ` Arnaldo Carvalho de Melo
  2009-06-18 18:36         ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2009-06-17 20:56 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: dwarves-u79uwXL29TY76Z2rM5mHXA

On Wed, Jun 17, 2009 at 10:25:06AM -0700, Zack Weinberg wrote:
> Arnaldo Carvalho de Melo <acme-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > On Mon, Jun 15, 2009 at 05:24:09PM -0700, Zack Weinberg wrote:
> > > I've been experimenting with your 'pahole' utility to analyze some
> > > of the data structures in Mozilla.  I've run into trouble because
> > > pahole seems to give up when it hits a forward declaration (a type
> > > DIE with DW_AT_declaration=1).  For instance, this is what -E -M
> > > mode prints for one of the classes of interest:
> > > 
> > > class nsFrame : public nsBox, public nsIFrameDebug {
> > > public:
> > > 	/* class nsBox {
> > > 	} <ancestor>; */              /*     0     0 */
> > > 
> > > 	/* XXX 64 bytes hole, try to pack */
> > > 
> > > 	/* --- cacheline 1 boundary (64 bytes) --- */
> > > 	/* class nsIFrameDebug {
> > > 		int ()(void) * *_vptr.nsIFrameDebug; 
> > > 			/*	64     8 */ 
> > > 	} <ancestor>; */ /*    64     8 */
> > > 
> > > 	/* size: 72, cachelines: 2, members: 2 */
> > > 	/* sum members: 8, holes: 1, sum holes: 64 */
> > > 	/* last cacheline: 8 bytes */
> > > };
> > > 
> > > As you can see, because it only sees the forward declaration for
> > > class nsBox, it thinks there's a gigantic hole in this class.  This
> > > makes it very hard to look for real problems.
> > > 
> > > I tried to patch pahole to look through forward declarations
> > > whenever possible (using cus__find_struct_by_name) but could not
> > > get it to work, even after switching the main loop from a load
> > > stealer to an iteration over all CUs after they're all loaded.  I
> > > get gibberish:
> > > 
> > > class nsFrame : public nsBox, public nsIFrameDebug {
> > > public:
> > > 	/* class nsBox : public M-0M-^I^D {
> > > 	public:
> > > 		/* PLDHashNumber      (<ancestor>)(PLDHashTable *,
> > > 		const void  *); */    /*     0     0 */
> > > 	} <ancestor>; */ /*     0     0 */
> > > 	/* XXX 64 bytes hole, try to pack */
> > > ...
> > > 
> > > I'm attaching the code changes I made.  Any advice you could give me
> > > would be greatly appreciated.  Note that there are two other changes
> > > in there -- I had to take out the cmake test for libebl, whatever
> > > that is (nothing fails to link without it), and I also changed -M
> > > mode to not print vtables.
> > 
> > I'm trying to figure this out, but the xulrunner debuginfo file I have
> > doesn't have the nsIFrameDebug class as an ancestor for nsFrame:
> 
> This is probably because it was configured out.  I am working with a 
> development build, which has all sorts of extra debugging mechanisms
> configured in.
> 
> > Anyway, it shows nsBox as a zero sized class, which it isn't. Problem
> > is that if we process _just_ this CU we will never know what is the
> > size for nsBox... but should be doable for multi-CU files where we
> > can try to find another CU with the missing definition.
> 
> Right, that's what I figured was the problem.
> 
> > This will require a bit more coding, so that we can first go on
> > discarding CUs till we find class nsFrame, then if we find forward
> > decls in it, we must steal the CU and set it aside, then continue to
> > the other CUs, till we find the one with the class nsBox definition,
> > then process the first CU with nsFrame and when not finding
> > definitions for some of its ancestors, look on the other CU where we
> > found the nsBox definition.
> 
> What if the CU that has the nsBox definition has already gone past when
> we hit the one with nsFrame?  It seems to me that this would only work
> for some link orders.
> 
> > Look below for a comment explaining the bug in your code.
> 
> Thanks!  That makes sense.  Meantime I've written code to post-process
> the incomplete output and fill in the missing classes, though.
> 
> 
> Tangentially, I would really like to be able to pass a *list* of
> structure/class names to -C (or have a separate option that reads a
> list from a file).  The full analysis I'm doing needs to look at 200 or
> so of the thousands of classes in xulrunner; currently I need to run
> pahole once for each, which is really slow.

Sure, I'll do that: take the list via -C and also from a file, then on
each CU try to find every entry in the list, removing the ones I
find/print and stopping the whole process once the list is empty.
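
A minimal sketch of that worklist idea in C. It assumes a hypothetical
struct fake_cu that only lists the type names a CU defines; pahole's
real CU structures and lookup functions are not used, and "printing" a
type is reduced to a printf of its name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a loaded CU: only the names of the
 * struct/class types it defines. */
struct fake_cu {
	const char **type_names;
	size_t nr_types;
};

static bool cu_defines(const struct fake_cu *cu, const char *name)
{
	for (size_t i = 0; i < cu->nr_types; i++)
		if (strcmp(cu->type_names[i], name) == 0)
			return true;
	return false;
}

/* Visit CUs in order; print each wanted type in the first CU that
 * defines it and drop it from the worklist.  Stops as soon as the
 * worklist is empty and returns how many CUs were visited. */
static size_t find_wanted(const struct fake_cu *cus, size_t nr_cus,
			  const char **wanted, size_t *nr_wanted)
{
	size_t c;

	assert(nr_wanted != NULL);
	for (c = 0; c < nr_cus && *nr_wanted > 0; c++) {
		size_t w = 0;

		while (w < *nr_wanted) {
			if (cu_defines(&cus[c], wanted[w])) {
				printf("%s (found in CU %zu)\n", wanted[w], c);
				/* swap-remove; don't advance w, recheck the moved entry */
				wanted[w] = wanted[--*nr_wanted];
			} else {
				w++;
			}
		}
	}
	return c;
}
```

Swap-removing keeps each removal cheap; the price is that types come
out in CU order rather than in the order they were given on the command
line.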
 
> Also, I wonder if you could see your way clear to adding an alternative
> output format that is easily machine-parseable?
> Approximation-to-C-source format is nice for humans but I've spent the
> past day and a bit writing a sed script to turn it into something that
> I can do programmed analysis on and it was no fun.

What do you suggest? Removing \n and comments (which can already be
removed through conf knobs)?

I'm open to whatever changes make these utilities more useful.

- Arnaldo

* Re: Can't persuade pahole to see through forward declarations
       [not found]       ` <20090617102506.34aaf8e2-4eJtQOnFJqFBDgjK7y7TUQ@public.gmane.org>
  2009-06-17 20:56         ` Arnaldo Carvalho de Melo
@ 2009-06-18 18:36         ` Arnaldo Carvalho de Melo
       [not found]           ` <20090618183634.GE21530-f8uhVLnGfZaxAyOMLChx1axOck334EZe@public.gmane.org>
  1 sibling, 1 reply; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2009-06-18 18:36 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: dwarves-u79uwXL29TY76Z2rM5mHXA

On Wed, Jun 17, 2009 at 10:25:06AM -0700, Zack Weinberg wrote:
> Arnaldo Carvalho de Melo <acme-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > On Mon, Jun 15, 2009 at 05:24:09PM -0700, Zack Weinberg wrote:
> > > I've been experimenting with your 'pahole' utility to analyze some
> > > of the data structures in Mozilla.  I've run into trouble because
> > > pahole seems to give up when it hits a forward declaration (a type
> > > DIE with DW_AT_declaration=1).  For instance, this is what -E -M
> > > mode prints for one of the classes of interest:
> > > 
> > > class nsFrame : public nsBox, public nsIFrameDebug {
> > > public:
> > > 	/* class nsBox {
> > > 	} <ancestor>; */              /*     0     0 */
> > > 
> > > 	/* XXX 64 bytes hole, try to pack */
> > > 
> > > 	/* --- cacheline 1 boundary (64 bytes) --- */
> > > 	/* class nsIFrameDebug {
> > > 		int ()(void) * *_vptr.nsIFrameDebug; 
> > > 			/*	64     8 */ 
> > > 	} <ancestor>; */ /*    64     8 */
> > > 
> > > 	/* size: 72, cachelines: 2, members: 2 */
> > > 	/* sum members: 8, holes: 1, sum holes: 64 */
> > > 	/* last cacheline: 8 bytes */
> > > };
> > > 
> > > As you can see, because it only sees the forward declaration for
> > > class nsBox, it thinks there's a gigantic hole in this class.  This
> > > makes it very hard to look for real problems.
> > > 
> > > I tried to patch pahole to look through forward declarations
> > > whenever possible (using cus__find_struct_by_name) but could not
> > > get it to work, even after switching the main loop from a load
> > > stealer to an iteration over all CUs after they're all loaded.  I
> > > get gibberish:
> > > 
> > > class nsFrame : public nsBox, public nsIFrameDebug {
> > > public:
> > > 	/* class nsBox : public M-0M-^I^D {
> > > 	public:
> > > 		/* PLDHashNumber      (<ancestor>)(PLDHashTable *,
> > > 		const void  *); */    /*     0     0 */
> > > 	} <ancestor>; */ /*     0     0 */
> > > 	/* XXX 64 bytes hole, try to pack */
> > > ...
> > > 
> > > I'm attaching the code changes I made.  Any advice you could give me
> > > would be greatly appreciated.  Note that there are two other changes
> > > in there -- I had to take out the cmake test for libebl, whatever
> > > that is (nothing fails to link without it), and I also changed -M
> > > mode to not print vtables.
> > 
> > I'm trying to figure this out, but the xulrunner debuginfo file I have
> > doesn't have the nsIFrameDebug class as an ancestor for nsFrame:
> 
> This is probably because it was configured out.  I am working with a 
> development build, which has all sorts of extra debugging mechanisms
> configured in.

OK, can you please send me a multi-cu file that has the above
definitions so that I can use it as the test case for this new feature?
 
> > Anyway, it shows nsBox as a zero sized class, which it isn't. Problem
> > is that if we process _just_ this CU we will never know what is the
> > size for nsBox... but should be doable for multi-CU files where we
> > can try to find another CU with the missing definition.
> 
> Right, that's what I figured was the problem.
> 
> > This will require a bit more coding, so that we can first go on
> > discarding CUs till we find class nsFrame, then if we find forward
> > decls in it, we must steal the CU and set it aside, then continue to
> > the other CUs, till we find the one with the class nsBox definition,
> > then process the first CU with nsFrame and when not finding
> > definitions for some of its ancestors, look on the other CU where we
> > found the nsBox definition.
> 
> What if the CU that has the nsBox definition has already gone past when
> we hit the one with nsFrame?  It seems to me that this would only work
> for some link orders.

Yeah, that probably requires a two-pass strategy, but perhaps I'm trying
to optimize this a bit too much; we'll see when I try.
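
A rough sketch of what such a two-pass approach might look like,
assuming a hypothetical, simplified struct type_info in place of real
DWARF DIEs. The first pass walks every CU and records the first full
definition (DW_AT_declaration not set) of each type name; only then does
the second pass look through forward declarations, so the answer no
longer depends on link order:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of a struct/class type DIE. */
struct type_info {
	const char *name;
	bool declaration;	/* DW_AT_declaration=1: forward decl only */
	size_t byte_size;	/* meaningful only when !declaration */
};

#define MAX_DEFS 64

struct def_table {
	const struct type_info *defs[MAX_DEFS];
	size_t nr;
};

/* Pass 1, called once per CU: remember the first full definition of
 * each name, whichever CU it appears in. */
static void collect_definitions(struct def_table *t,
				const struct type_info *types, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		bool seen = false;

		if (types[i].declaration)
			continue;
		for (size_t j = 0; j < t->nr && !seen; j++)
			seen = strcmp(t->defs[j]->name, types[i].name) == 0;
		if (!seen) {
			assert(t->nr < MAX_DEFS);
			t->defs[t->nr++] = &types[i];
		}
	}
}

/* Pass 2, after every CU has been seen: look through a forward
 * declaration to the definition.  NULL means no CU defined it. */
static const struct type_info *resolve(const struct def_table *t,
				       const struct type_info *ty)
{
	if (!ty->declaration)
		return ty;
	for (size_t j = 0; j < t->nr; j++)
		if (strcmp(t->defs[j]->name, ty->name) == 0)
			return t->defs[j];
	return NULL;
}
```

Even when the CU containing nsFrame (with only a forward declaration of
nsBox) is collected before the CU that defines nsBox, resolve() finds
the definition. A real implementation would presumably use a hash table
keyed by name rather than the linear scans and fixed-size table used
here for brevity.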

> > Look below for a comment explaining the bug in your code.
> 
> Thanks!  That makes sense.  Meantime I've written code to post-process
> the incomplete output and fill in the missing classes, though.
> 
> 
> Tangentially, I would really like to be able to pass a *list* of
> structure/class names to -C (or have a separate option that reads a
> list from a file).  The full analysis I'm doing needs to look at 200 or
> so of the thousands of classes in xulrunner; currently I need to run
> pahole once for each, which is really slow.

Look at the last commit :-)

http://git.kernel.org/?p=linux/kernel/git/acme/pahole.git;a=commitdiff;h=519d1d3d9691ca94f458853c4710d501fb33720f
 
> Also, I wonder if you could see your way clear to adding an alternative
> output format that is easily machine-parseable?
> Approximation-to-C-source format is nice for humans but I've spent the
> past day and a bit writing a sed script to turn it into something that
> I can do programmed analysis on and it was no fun.

What would it look like?

- Arnaldo

* Re: Can't persuade pahole to see through forward declarations
       [not found]           ` <20090618183634.GE21530-f8uhVLnGfZaxAyOMLChx1axOck334EZe@public.gmane.org>
@ 2009-06-18 20:28             ` Zack Weinberg
       [not found]               ` <20090618132820.2eb7371a-4eJtQOnFJqFBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Zack Weinberg @ 2009-06-18 20:28 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, dwarves-u79uwXL29TY76Z2rM5mHXA

Arnaldo Carvalho de Melo <acme-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > > I'm trying to figure this out, but the xulrunner debuginfo file I
> > > have doesn't have the nsIFrameDebug class as an ancestor for
> > > nsFrame:
> > 
> > This is probably because it was configured out.  I am working with
> > a development build, which has all sorts of extra debugging
> > mechanisms configured in.
> 
> OK, can you please send me a multi-cu file that has the above
> definitions so that I can use it as the test case for this new
> feature?

It's 42MB bzipped.  You can download it from
http://www.owlfolio.org/libgklayout.so.bz2 but I'm going to delete it
after a week.

> > Tangentially, I would really like to be able to pass a *list* of
> > structure/class names to -C (or have a separate option that reads a
> > list from a file).  The full analysis I'm doing needs to look at
> > 200 or so of the thousands of classes in xulrunner; currently I
> > need to run pahole once for each, which is really slow.
> 
> Look at the last commit :-)
> 
> http://git.kernel.org/?p=linux/kernel/git/acme/pahole.git;a=commitdiff;h=519d1d3d9691ca94f458853c4710d501fb33720f

Perfect, thanks.

> > Also, I wonder if you could see your way clear to adding an
> > alternative output format that is easily machine-parseable?
> > Approximation-to-C-source format is nice for humans but I've spent
> > the past day and a bit writing a sed script to turn it into
> > something that I can do programmed analysis on and it was no fun.
> 
> What would it look like?

For the analysis I'm doing, the ideal format would be very flat and
line-oriented.  Consider this structure definition:

struct Foo {
  union {
    struct { int x; int y; } a;
    struct { float z; short y; } b;
    double c;
    void* d;
  } u;
  char n[4];
  void (*ptr)(int);
  void (*ptrs[2])(int);
  int bf:12;
  short bg:3;
};

I would like to get something like this (assuming LP64):

name|type|bytes|bits|byteoff|bitoff|cacheline
Foo|struct Foo|48|0|0|0|0
Foo.u|union|8|0|0|0|0
Foo.u.a|struct|8|0|0|0|0
Foo.u.a.x|int|4|0|0|0|0
Foo.u.a.y|int|4|0|4|0|0
Foo.u.b|struct|8|0|0|0|0
Foo.u.b.z|float|4|0|0|0|0
Foo.u.b.y|short int|2|0|4|0|0
Foo.u.b.<pad>|pad|2|0|6|0|0
Foo.u.c|double|8|0|0|0|0
Foo.u.d|void *|8|0|0|0|0
Foo.n|char[4]|4|0|8|0|0
Foo.<hole1>|pad|4|0|12|0|0
Foo.ptr|void(*)(int)|8|0|16|0|0
Foo.ptrs|void(*[2])(int)|16|0|24|0|0
Foo.bf|int|1|4|40|0|0
Foo.bg|short int|0|3|41|4|0
Foo.<pad>|pad|6|1|41|7|0

I suggest "|" for the field separator because I'm pretty sure it can't
appear in a C/C++ "abstract declaration" (i.e. the "type" field).  Tabs
are visually confusable with the spaces that you do occasionally need
in an abstract declaration.
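
For anyone who wants to check the numbers in that table, a small C
sketch comparing the listed byte offsets and sizes with what the
compiler actually does. It assumes an LP64 target (for instance x86-64
Linux with GCC), as the example itself does; the bit-field rows cannot
be checked this way because offsetof() does not accept bit-fields:

```c
#include <assert.h>
#include <stddef.h>

/* The example structure from the message, unchanged. */
struct Foo {
	union {
		struct { int x; int y; } a;
		struct { float z; short y; } b;
		double c;
		void *d;
	} u;
	char n[4];
	void (*ptr)(int);
	void (*ptrs[2])(int);
	int bf:12;
	short bg:3;	/* short bit-fields are a common compiler extension */
};

/* Check the byte offsets and sizes in the proposed output against the
 * compiler's layout.  Only meaningful on LP64; bf and bg are skipped. */
static void check_foo_layout_lp64(void)
{
	struct Foo f;

	assert(sizeof(void *) == 8 && sizeof(long) == 8); /* LP64 guard */
	assert(sizeof f == 48);
	assert(offsetof(struct Foo, u) == 0 && sizeof f.u == 8);
	assert(sizeof f.u.b == 8);	/* 6 bytes of data + 2 bytes tail pad */
	assert(offsetof(struct Foo, u.b.y) == 4);
	assert(offsetof(struct Foo, n) == 8);
	assert(offsetof(struct Foo, ptr) == 16);	/* 4-byte hole after n */
	assert(offsetof(struct Foo, ptrs) == 24 && sizeof f.ptrs == 16);
}
```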

The key properties of this are:

 - There is only one kind of record to process.
 - Each line can be examined in isolation, if you don't care about the
   nesting structure.
 - You do not have to process C declaration syntax to find the name of
   each field.
 - There is never missing data; in many cases pahole currently will
   omit the offset in its annotation of a full nested structure,
   for instance, which is fine for humans but really bad for machine
   processing.
 - Padding at the end of a structure is explicit, always.  (The current
   pahole output doesn't call it out at all for the 'b' struct inside
   the union.)
 - Bitfields are not special: the structure is treated as a linear
   array of bits, within which every field starts at bit
   (byteoff*8+bitoff) and continues for (bytes*8+bits) bits.
   The bitoff and bits columns are always in the range 0..7.
   This saves some fiddly math.
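
To illustrate the last point, a sketch of how a consumer might read one
record of the proposed format and apply the linear-bit arithmetic. The
struct flat_rec type and the 127-character field limits are assumptions
made for the example, not part of the proposal:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One record: name|type|bytes|bits|byteoff|bitoff|cacheline */
struct flat_rec {
	char name[128];
	char type[128];
	unsigned long bytes, bits, byteoff, bitoff, cacheline;
};

/* Returns 0 on success, -1 on a malformed line.  %[^|] lets the name
 * and type fields contain spaces ("short int", "void *"). */
static int parse_flat_rec(const char *line, struct flat_rec *r)
{
	assert(line != NULL && r != NULL);
	if (sscanf(line, "%127[^|]|%127[^|]|%lu|%lu|%lu|%lu|%lu",
		   r->name, r->type, &r->bytes, &r->bits,
		   &r->byteoff, &r->bitoff, &r->cacheline) != 7)
		return -1;
	/* the proposal keeps the sub-byte parts in 0..7 */
	if (r->bits > 7 || r->bitoff > 7)
		return -1;
	return 0;
}

/* First bit of the field, counting from bit 0 of the outermost struct. */
static unsigned long rec_start_bit(const struct flat_rec *r)
{
	return r->byteoff * 8 + r->bitoff;
}

/* Width of the field in bits. */
static unsigned long rec_nbits(const struct flat_rec *r)
{
	return r->bytes * 8 + r->bits;
}
```

With the bf, bg and trailing <pad> rows from the example, bf covers bits
320..331, bg starts right after it at bit 332 (41*8+4), and the padding
ends exactly at bit 384, i.e. 48 bytes, so gaps and overlaps become
plain integer comparisons.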

zw

* Formatting "drivers" was Re: Can't persuade pahole to see through forward declarations
       [not found]               ` <20090618132820.2eb7371a-4eJtQOnFJqFBDgjK7y7TUQ@public.gmane.org>
@ 2009-06-18 20:50                 ` Arnaldo Carvalho de Melo
       [not found]                   ` <20090618205053.GA4258-f8uhVLnGfZaxAyOMLChx1axOck334EZe@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2009-06-18 20:50 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Ilpo Järvinen, dwarves-u79uwXL29TY76Z2rM5mHXA

On Thu, Jun 18, 2009 at 01:28:20PM -0700, Zack Weinberg wrote:
> Arnaldo Carvalho de Melo <acme-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > > > I'm trying to figure this out, but the xulrunner debuginfo file I
> > > > have doesn't have the nsIFrameDebug class as an ancestor for
> > > > nsFrame:
> > > 
> > > This is probably because it was configured out.  I am working with
> > > a development build, which has all sorts of extra debugging
> > > mechanisms configured in.
> > 
> > OK, can you please send me a multi-cu file that has the above
> > definitions so that I can use it as the test case for this new
> > feature?
> 
> It's 42MB bzipped.  You can download it from
> http://www.owlfolio.org/libgklayout.so.bz2 but I'm going to delete it
> after a week.

I'm downloading it now
 
> > > Tangentially, I would really like to be able to pass a *list* of
> > > structure/class names to -C (or have a separate option that reads a
> > > list from a file).  The full analysis I'm doing needs to look at
> > > 200 or so of the thousands of classes in xulrunner; currently I
> > > need to run pahole once for each, which is really slow.
> > 
> > Look at the last commit :-)
> > 
> > http://git.kernel.org/?p=linux/kernel/git/acme/pahole.git;a=commitdiff;h=519d1d3d9691ca94f458853c4710d501fb33720f
> 
> Perfect, thanks.
> 
> > > Also, I wonder if you could see your way clear to adding an
> > > alternative output format that is easily machine-parseable?
> > > Approximation-to-C-source format is nice for humans but I've spent
> > > the past day and a bit writing a sed script to turn it into
> > > something that I can do programmed analysis on and it was no fun.
> > 
> > What would it look like?
> 
> For the analysis I'm doing, the ideal format would be very flat and
> line-oriented.  Consider this structure definition:
> 
> struct Foo {
>   union {
>     struct { int x; int y; } a;
>     struct { float z; short y; } b;
>     double c;
>     void* d;
>   } u;
>   char n[4];
>   void (*ptr)(int);
>   void (*ptrs[2])(int);
>   int bf:12;
>   short bg:3;
> };
> 
> I would like to get something like this (assuming LP64):
> 
> name|type|bytes|bits|byteoff|bitoff|cacheline
> Foo|struct Foo|48|0|0|0|0
> Foo.u|union|8|0|0|0|0
> Foo.u.a|struct|8|0|0|0|0
> Foo.u.a.x|int|4|0|0|0|0
> Foo.u.a.y|int|4|0|4|0|0
> Foo.u.b|struct|8|0|0|0|0
> Foo.u.b.z|float|4|0|0|0|0
> Foo.u.b.y|short int|2|0|4|0|0
> Foo.u.b.<pad>|pad|2|0|6|0|0
> Foo.u.c|double|8|0|0|0|0
> Foo.u.d|void *|8|0|0|0|0
> Foo.n|char[4]|4|0|8|0|0
> Foo.<hole1>|pad|4|0|12|0|0
> Foo.ptr|void(*)(int)|8|0|16|0|0
> Foo.ptrs|void(*[2])(int)|16|0|24|0|0
> Foo.bf|int|1|4|40|0|0
> Foo.bg|short int|0|3|41|4|0
> Foo.<pad>|pad|6|1|41|7|0
> 
> I suggest "|" for the field separator because I'm pretty sure it can't
> appear in a C/C++ "abstract declaration" (i.e. the "type" field).  Tabs
> are visually confusable with the spaces that you do occasionally need
> in an abstract declaration.
> 
> The key properties of this are:
> 
>  - There is only one kind of record to process.
>  - Each line can be examined in isolation, if you don't care about the
>    nesting structure.
>  - You do not have to process C declaration syntax to find the name of
>    each field.
>  - There is never missing data; in many cases pahole currently will
>    omit the offset in its annotation of a full nested structure,
>    for instance, which is fine for humans but really bad for machine
>    processing.

Annoying "simplification"; I'll put the offset there explicitly. I'm just
worried that Ilpo may be using it in his sed scripts... Ilpo?

>  - Padding at the end of a structure is explicit, always.  (The current
>    pahole output doesn't call it out at all for the 'b' struct inside
>    the union.)

This one is a bug, I'll fix it.

>  - Bitfields are not special: the structure is treated as a linear
>    array of bits, within which every field starts at bit
>    (byteoff*8+bitoff) and continues for (bytes*8+bits) bits.
>    The bitoff and bits columns are always in the range 0..7.
>    This saves some fiddly math.

Well, here the CTFication of the core will pay a dividend :-) We
already treat everything as bit_offsets; see struct class_member.

My first reaction is that dwarf_fprintf would need a "fprintf_ops"
struct: the current set of functions called from tag__fprintf would
become the first formatter, and a second one would just do as you
suggest.

I'll investigate that idea.
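
A rough sketch of what such a "formatting driver" table could look
like. Everything here (struct member_desc, a struct fprintf_ops with a
single member hook, both formatters) is hypothetical and far simpler
than the real tag__fprintf machinery; it only shows that, with offsets
kept in bits, a second driver can emit the flat format without touching
the traversal code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, minimal member description. */
struct member_desc {
	const char *name;
	const char *type;
	size_t bit_offset;	/* from the start of the outer struct */
	size_t bit_size;
};

/* Hypothetical driver table: one hook per kind of output element. */
struct fprintf_ops {
	int (*member)(char *buf, size_t len, const char *outer,
		      const struct member_desc *m);
};

/* Naive human-oriented output: "type name; / * offset size * /".
 * Ignores arrays, function pointers and bit-fields. */
static int c_like_member(char *buf, size_t len, const char *outer,
			 const struct member_desc *m)
{
	(void)outer;
	return snprintf(buf, len, "%s %s; /* %zu %zu */", m->type, m->name,
			m->bit_offset / 8, m->bit_size / 8);
}

/* Flat '|'-separated record; assumes 64-byte cache lines. */
static int flat_member(char *buf, size_t len, const char *outer,
		       const struct member_desc *m)
{
	return snprintf(buf, len, "%s.%s|%s|%zu|%zu|%zu|%zu|%zu",
			outer, m->name, m->type,
			m->bit_size / 8, m->bit_size % 8,
			m->bit_offset / 8, m->bit_offset % 8,
			m->bit_offset / 8 / 64);
}

static const struct fprintf_ops c_like_ops = { .member = c_like_member };
static const struct fprintf_ops flat_ops = { .member = flat_member };

/* The caller picks a driver; the traversal code stays the same. */
static int print_member(const struct fprintf_ops *ops, char *buf,
			size_t len, const char *outer,
			const struct member_desc *m)
{
	assert(ops != NULL && ops->member != NULL);
	return ops->member(buf, len, outer, m);
}
```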

- Arnaldo

* Re: Formatting "drivers" was Re: Can't persuade pahole to see through forward declarations
       [not found]                   ` <20090618205053.GA4258-f8uhVLnGfZaxAyOMLChx1axOck334EZe@public.gmane.org>
@ 2009-06-24  6:29                     ` Ilpo Järvinen
       [not found]                       ` <Pine.LNX.4.64.0906240918170.26611-tOJ5Hk0ALFH+1EwMtL+0ZheBoQapMCRCVQQcQy+6Uvc@public.gmane.org>
  2009-06-24  6:42                     ` Zack Weinberg
  1 sibling, 1 reply; 9+ messages in thread
From: Ilpo Järvinen @ 2009-06-24  6:29 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Zack Weinberg, dwarves-u79uwXL29TY76Z2rM5mHXA

On Thu, 18 Jun 2009, Arnaldo Carvalho de Melo wrote:

> On Thu, Jun 18, 2009 at 01:28:20PM -0700, Zack Weinberg wrote:
> > Arnaldo Carvalho de Melo <acme-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > 
> > > > Also, I wonder if you could see your way clear to adding an
> > > > alternative output format that is easily machine-parseable?
> > > > Approximation-to-C-source format is nice for humans but I've spent
> > > > the past day and a bit writing a sed script to turn it into
> > > > something that I can do programmed analysis on and it was no fun.
> > > 
> > > What would it look like?
> > 
> > For the analysis I'm doing, the ideal format would be very flat and
> > line-oriented.  Consider this structure definition:
> > 
> > struct Foo {
> >   union {
> >     struct { int x; int y; } a;
> >     struct { float z; short y; } b;
> >     double c;
> >     void* d;
> >   } u;
> >   char n[4];
> >   void (*ptr)(int);
> >   void (*ptrs[2])(int);
> >   int bf:12;
> >   short bg:3;
> > };
> > 
> > I would like to get something like this (assuming LP64):
> > 
> > name|type|bytes|bits|byteoff|bitoff|cacheline
> > Foo|struct Foo|48|0|0|0|0
> > Foo.u|union|8|0|0|0|0
> > Foo.u.a|struct|8|0|0|0|0
> > Foo.u.a.x|int|4|0|0|0|0
> > Foo.u.a.y|int|4|0|4|0|0
> > Foo.u.b|struct|8|0|0|0|0
> > Foo.u.b.z|float|4|0|0|0|0
> > Foo.u.b.y|short int|2|0|4|0|0
> > Foo.u.b.<pad>|pad|2|0|6|0|0
> > Foo.u.c|double|8|0|0|0|0
> > Foo.u.d|void *|8|0|0|0|0
> > Foo.n|char[4]|4|0|8|0|0
> > Foo.<hole1>|pad|4|0|12|0|0
> > Foo.ptr|void(*)(int)|8|0|16|0|0
> > Foo.ptrs|void(*[2])(int)|16|0|24|0|0
> > Foo.bf|int|1|4|40|0|0
> > Foo.bg|short int|0|3|41|4|0
> > Foo.<pad>|pad|6|1|41|7|0
> > 
> > I suggest "|" for the field separator because I'm pretty sure it can't
> > appear in a C/C++ "abstract declaration" (i.e. the "type" field).  Tabs
> > are visually confusable with the spaces that you do occasionally need
> > in an abstract declaration.
> > 
> > The key properties of this are:
> > 
> >  - There is only one kind of record to process.
> >  - Each line can be examined in isolation, if you don't care about the
> >    nesting structure.
> >  - You do not have to process C declaration syntax to find the name of
> >    each field.

Good point; this is one of the hardest things for scripts to get right
in the general case. Besides typedefs, function pointers appearing in
return values and arguments make it a task too hairy for any sane
person, not that I'd say it is impossible... :-)

> >  - There is never missing data; in many cases pahole currently will
> >    omit the offset in its annotation of a full nested structure,
> >    for instance, which is fine for humans but really bad for machine
> >    processing.
> 
> Annoying "simplification"; I'll put the offset there explicitly. I'm just
> worried that Ilpo may be using it in his sed scripts... Ilpo?

No, I'm not. I usually try to avoid trusting such things anyway, unless I
really have to. You just don't parse anything C-like with newlines /
spaces as significant :-).


-- 
 i.

* Re: Formatting "drivers" was Re: Can't persuade pahole to see through forward declarations
       [not found]                       ` <Pine.LNX.4.64.0906240918170.26611-tOJ5Hk0ALFH+1EwMtL+0ZheBoQapMCRCVQQcQy+6Uvc@public.gmane.org>
@ 2009-06-24  6:40                         ` Zack Weinberg
       [not found]                           ` <20090623234018.61f1ce02-4eJtQOnFJqFBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Zack Weinberg @ 2009-06-24  6:40 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Arnaldo Carvalho de Melo, dwarves-u79uwXL29TY76Z2rM5mHXA

"Ilpo Järvinen" <ilpo.jarvinen-pxSi+dnQzZMxHbG02/KK1g@public.gmane.org> wrote:
> On Thu, 18 Jun 2009, Arnaldo Carvalho de Melo wrote:
> > On Thu, Jun 18, 2009 at 01:28:20PM -0700, Zack Weinberg wrote:
> > >  - You do not have to process C declaration syntax to find the
> > > name of each field.
> 
> Good point; this is one of the hardest things for scripts to get
> right in the general case. Besides typedefs, function pointers
> appearing in return values and arguments make it a task too hairy
> for any sane person, not that I'd say it is impossible... :-)

I used to hack gcc.  I know *exactly* how hard it is to parse C
declarations. :-)

I had been doing okay, for the limited thing I am trying to do, with
sed scripts to munge the pahole output into something that could be
relatively easily parsed by a Python script, but then I ran into this
construct:

  struct S {
    ...
    struct T {
      int a;
    } tee[2];
  };

No hint that there are two copies of T embedded in S, here, until it's
far too late to do anything about it (if you're a sed script).

> > >  - There is never missing data; in many cases pahole currently
> > > will omit the offset in its annotation of a full nested structure,
> > >    for instance, which is fine for humans but really bad for
> > > machine processing.
> > 
> > Annoying "simplification"; I'll put the offset there explicitly.
> > I'm just worried that Ilpo may be using it in his sed scripts... Ilpo?
> 
> No, I'm not. I usually try to avoid trusting such things anyway,
> unless I really have to. You just don't parse anything C-like while
> treating newlines / spaces as significant :-).

I actually have a patch for this one now :-)  I looked harder at my
data set and realized it only happens with unions.  There's also a
small typo fix in here: tag__prefix returned "class" with no trailing
space, so I was getting "classnsThing *" a lot...

diff --git a/dwarves_fprintf.c b/dwarves_fprintf.c
index e3e621f..bbc5dd6 100644
--- a/dwarves_fprintf.c
+++ b/dwarves_fprintf.c
@@ -319,7 +319,7 @@ static const char *tag__prefix(const struct cu *cu, const uint32_t tag)
 	case DW_TAG_structure_type:
 		return cu->language == DW_LANG_C_plus_plus ? "class " :
 							     "struct ";
-	case DW_TAG_class_type:		return "class";
+	case DW_TAG_class_type:		return "class ";
 	case DW_TAG_union_type:		return "union ";
 	case DW_TAG_pointer_type:	return " *";
 	case DW_TAG_reference_type:	return " &";
@@ -679,6 +679,7 @@ static size_t union_member__fprintf(struct class_member *self,
 				    const struct conf_fprintf *conf, FILE *fp)
 {
 	const size_t size = self->byte_size;
+	const size_t offset = conf->base_offset;
 	size_t printed = type__fprintf(type, cu, s(cu, self->name), conf, fp);
 
 	if ((tag__is_union(type) || tag__is_struct(type) ||
@@ -693,17 +694,17 @@ static size_t union_member__fprintf(struct class_member *self,
 			 * '} member_name;' last line of the type printed in the
 			 * above call to type__fprintf.
 			 */
-			printed += fprintf(fp, ";%*s/* %11zd */",
+			printed += fprintf(fp, ";%*s/* %5zd %5zd */",
 					   (conf->type_spacing +
-					    conf->name_spacing - slen - 3), " ", size);
+					    conf->name_spacing - slen - 3), " ", offset, size);
 		}
 	} else {
 		printed += fprintf(fp, ";");
 
 		if (!conf->suppress_offset_comment) {
 			const int spacing = conf->type_spacing + conf->name_spacing - printed;
-			printed += fprintf(fp, "%*s/* %11zd */",
-					   spacing > 0 ? spacing : 0, " ", size);
+			printed += fprintf(fp, "%*s/* %5zd %5zd */",
+					   spacing > 0 ? spacing : 0, " ", offset, size);
 		}
 	}
 
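Here is a sketch of the consumer side. It assumes that member lines end in a trailing comment shaped like the output of the patched format string in union_member__fprintf (slash-star, offset, size, star-slash). The helper `parse_offset_size` is hypothetical. It accepts a line only when both numbers are present, so the old single-number annotation is rejected instead of being misread as an offset.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the offset and size from the last comment on a
 * pahole-style member line.  Returns 1 only if both numbers are present. */
static int parse_offset_size(const char *line, long *offset, long *size)
{
    const char *comment = NULL;

    /* Remember the last comment opener on the line. */
    for (const char *p = line; (p = strstr(p, "/*")) != NULL; p += 2)
        comment = p;
    if (comment == NULL)
        return 0;

    char close[3] = "";
    /* Expect: slash-star, two integers, then the star-slash terminator. */
    if (sscanf(comment, "/* %ld %ld %2s", offset, size, close) != 3)
        return 0;
    return strcmp(close, "*/") == 0;
}
```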
--
To unsubscribe from this list: send the line "unsubscribe dwarves" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: Formatting "drivers" was Re: Can't persuade pahole to see through forward declarations
       [not found]                   ` <20090618205053.GA4258-f8uhVLnGfZaxAyOMLChx1axOck334EZe@public.gmane.org>
  2009-06-24  6:29                     ` Ilpo Järvinen
@ 2009-06-24  6:42                     ` Zack Weinberg
  1 sibling, 0 replies; 9+ messages in thread
From: Zack Weinberg @ 2009-06-24  6:42 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ilpo Järvinen, dwarves-u79uwXL29TY76Z2rM5mHXA

Arnaldo Carvalho de Melo <acme-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> Em Thu, Jun 18, 2009 at 01:28:20PM -0700, Zack Weinberg escreveu:
> >  - Padding at the end of a structure is explicit, always.  (The
> > current pahole output doesn't call it out at all for the 'b' struct
> > inside the union.)
> 
> This one is a bug, I'll fix it.

Cool.
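The number a formatter would print here is the struct size minus the end of the last member. This minimal C sketch uses a made-up `struct padded` in place of the 'b' struct from the earlier message, whose definition isn't quoted here. The helper `tail_padding` is also hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for a struct with trailing padding. */
struct padded {
    int  i;
    char c;
};

/* Tail padding = total size minus the end of the last member.  Emitting it
 * even when it is 0 spares a script from recomputing it. */
static size_t tail_padding(size_t struct_size, size_t last_offset, size_t last_size)
{
    assert(last_offset + last_size <= struct_size);
    return struct_size - (last_offset + last_size);
}
```

On common ABIs where `int` is 4 bytes with 4-byte alignment, `struct padded` is 8 bytes, `c` sits at offset 4, and 3 bytes of tail padding follow it.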

> >  - Bitfields are not special: the structure is treated as a linear
> >    array of bits, within which every field starts at bit
> >    (byteoff*8+bitoff) and continues for (bytes*8+bits) bits.
> >    The bitoff and bits columns are always in the range 0..7.
> >    This saves some fiddly math.
> 
> Well, here the CTFication of the core will pay a dividend :-) We
> already treat everything as bit_offsets, see struct class_member.

I don't know what CTF is, and -z doesn't seem to do anything... but
it's good to know the internal representation has no trouble here.
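The following C sketch shows the proposed four-column convention. The `struct bit_span` type and its helpers are hypothetical. Bits are assumed to be numbered upward from the start of the structure. The sketch does not address how DWARF's bit-offset attributes map onto that numbering, which depends on endianness. Once every field is a plain bit range, the overlap and hole checks need no special case for bitfields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record for the proposed columns: a field starts at bit
 * byteoff*8 + bitoff and spans bytes*8 + bits bits. */
struct bit_span {
    uint64_t byteoff;
    unsigned bitoff;   /* always 0..7 */
    uint64_t bytes;
    unsigned bits;     /* always 0..7 */
};

/* Split a linear bit offset and bit size into the four columns. */
static struct bit_span bit_span_from_bits(uint64_t bit_offset, uint64_t bit_size)
{
    struct bit_span s = {
        .byteoff = bit_offset / 8, .bitoff = (unsigned)(bit_offset % 8),
        .bytes   = bit_size / 8,   .bits   = (unsigned)(bit_size % 8),
    };
    return s;
}

/* First bit of the field. */
static uint64_t bit_span_start(struct bit_span s)
{
    assert(s.bitoff < 8 && s.bits < 8);
    return s.byteoff * 8 + s.bitoff;
}

/* One past the last bit of the field. */
static uint64_t bit_span_end(struct bit_span s)
{
    return bit_span_start(s) + s.bytes * 8 + s.bits;
}

/* Ordinary members and bitfields are handled identically. */
static int bit_spans_overlap(struct bit_span a, struct bit_span b)
{
    return bit_span_start(a) < bit_span_end(b) &&
           bit_span_start(b) < bit_span_end(a);
}

/* Size of the hole, in bits, between two non-overlapping fields. */
static uint64_t bit_gap(struct bit_span prev, struct bit_span next)
{
    assert(bit_span_end(prev) <= bit_span_start(next));
    return bit_span_start(next) - bit_span_end(prev);
}
```

For example, a 3-bit field at bit 35 becomes byteoff 4, bitoff 3, bytes 0, bits 3.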

> My first reaction is that dwarf_fprintf would need a "fprintf_ops"
> struct, so that the current set of functions called from
> tag__fprintf would become the first formatter, and a second one
> would do just what you suggest.

Yes, that sounds like a good strategy.
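Here is a rough C sketch of that structure. Only the name "fprintf_ops" comes from the message. `struct member_info`, the `member` callback, both formatters, and `format_member` are hypothetical, and output goes to a buffer rather than a FILE so the result is easy to check. The walk over members stays the same, and only the ops table changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified member record (dwarves' own is struct class_member). */
struct member_info {
    const char *type_name;
    const char *name;
    size_t offset;  /* bytes from the start of the enclosing type */
    size_t size;    /* bytes */
};

/* Hypothetical ops table: one callback per kind of thing to print. */
struct fprintf_ops {
    int (*member)(char *buf, size_t len, const struct member_info *m);
};

/* Roughly the current, human-oriented style. */
static int human_member(char *buf, size_t len, const struct member_info *m)
{
    return snprintf(buf, len, "\t%s %s; /* %5zu %5zu */",
                    m->type_name, m->name, m->offset, m->size);
}

/* Machine-oriented: tab-separated name, type, offset, size. */
static int machine_member(char *buf, size_t len, const struct member_info *m)
{
    return snprintf(buf, len, "%s\t%s\t%zu\t%zu",
                    m->name, m->type_name, m->offset, m->size);
}

static const struct fprintf_ops human_ops   = { .member = human_member };
static const struct fprintf_ops machine_ops = { .member = machine_member };

/* The caller only picks an ops table; everything else is shared. */
static int format_member(const struct fprintf_ops *ops, char *buf, size_t len,
                         const struct member_info *m)
{
    assert(ops != NULL && ops->member != NULL);
    return ops->member(buf, len, m);
}
```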

zw

* Re: Formatting "drivers" was Re: Can't persuade pahole to see through forward declarations
       [not found]                           ` <20090623234018.61f1ce02-4eJtQOnFJqFBDgjK7y7TUQ@public.gmane.org>
@ 2009-06-24  6:58                             ` Ilpo Järvinen
  0 siblings, 0 replies; 9+ messages in thread
From: Ilpo Järvinen @ 2009-06-24  6:58 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Arnaldo Carvalho de Melo, dwarves-u79uwXL29TY76Z2rM5mHXA

On Tue, 23 Jun 2009, Zack Weinberg wrote:

> "Ilpo Järvinen" <ilpo.jarvinen-pxSi+dnQzZMxHbG02/KK1g@public.gmane.org> wrote:
> > On Thu, 18 Jun 2009, Arnaldo Carvalho de Melo wrote:
> > > Em Thu, Jun 18, 2009 at 01:28:20PM -0700, Zack Weinberg escreveu:
> > > >  - You do not have to process C declaration syntax to find the
> > > > name of each field.
> > 
> > Good point, this is one of the most complex tasks in scripts to get the
> > general case right. Besides typedefs, especially the function
> > pointers that appear in return values and arguments make it a task
> > too hairy for any sane person, not that I'd say it is
> > impossible... :-)
> 
> I used to hack gcc.  I know *exactly* how hard it is to parse C
> declarations. :-)
> 
> I had been doing okay, for the limited thing I am trying to do, with
> sed scripts to munge the pahole output into something that could be
> relatively easily parsed by a Python script, but then I ran into this
> construct:
> 
>   struct S {
>     ...
>     struct T {
>       int a;
>     } tee[2];
>   };
> 
> There's no hint here that two copies of T are embedded in S until it's
> far too late to do anything about it (if you're a sed script).

Agreed, those kinds of constructs are quite annoying. I'd probably tac
the input, preprocess it into something I can handle line by line, and
then tac it back (though that of course has some performance penalty
if the input is very large).
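Here is a rough C sketch of the same idea, done in memory rather than with tac. It walks the lines of one struct from last to first, so each nested type's closing `} name[N];` line is seen before its members. For each line it records how many copies of that line the outermost struct contains. The helper names are hypothetical. The line shapes are assumptions for this sketch, not a documented pahole format: a nested type opens on a line ending in `{` and closes on a line starting with `}`.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NESTING 32

/* -1 if the line does not close a type; otherwise its array count
 * (1 when there is no "[N]" suffix). */
static long closer_count(const char *line)
{
    line += strspn(line, " \t");
    if (*line != '}')
        return -1;
    const char *br = strchr(line, '[');
    return br ? strtol(br + 1, NULL, 10) : 1;
}

/* Nonzero if the line opens a (nested) type, i.e. ends in '{'. */
static int opens_type(const char *line)
{
    size_t n = strlen(line);
    while (n > 0 && strchr(" \t\n", line[n - 1]) != NULL)
        n--;
    return n > 0 && line[n - 1] == '{';
}

/* Fill copies[i] with the number of instances of line i in the outermost
 * struct.  Returns 0 on success, -1 on unbalanced or too-deep input. */
static int copies_per_line(const char *const *lines, size_t n, long *copies)
{
    long saved[MAX_NESTING];
    int depth = 0;
    long mult = 1;

    for (size_t i = n; i-- > 0;) {          /* last line first, like tac */
        long c = closer_count(lines[i]);
        if (c >= 0) {                       /* entering a type from its end */
            if (depth == MAX_NESTING)
                return -1;
            saved[depth++] = mult;
            mult *= c;
            copies[i] = mult;
        } else if (opens_type(lines[i])) {  /* leaving it at its opener */
            if (depth == 0)
                return -1;
            copies[i] = mult;
            mult = saved[--depth];
        } else {
            copies[i] = mult;
        }
    }
    return depth == 0 ? 0 : -1;
}
```

For the S/T example in the quoted message, the `int a;` line gets a count of 2, and the outer members get 1.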

-- 
 i.

end of thread, other threads:[~2009-06-24  6:58 UTC | newest]

Thread overview: 9+ messages
     [not found] <20090615172409.6f0f322b@mozilla.com>
     [not found] ` <20090617170217.GB21530@ghostprotocols.net>
     [not found]   ` <20090617170217.GB21530-f8uhVLnGfZaxAyOMLChx1axOck334EZe@public.gmane.org>
2009-06-17 17:25     ` Can't persuade pahole to see through forward declarations Zack Weinberg
     [not found]       ` <20090617102506.34aaf8e2-4eJtQOnFJqFBDgjK7y7TUQ@public.gmane.org>
2009-06-17 20:56         ` Arnaldo Carvalho de Melo
2009-06-18 18:36         ` Arnaldo Carvalho de Melo
     [not found]           ` <20090618183634.GE21530-f8uhVLnGfZaxAyOMLChx1axOck334EZe@public.gmane.org>
2009-06-18 20:28             ` Zack Weinberg
     [not found]               ` <20090618132820.2eb7371a-4eJtQOnFJqFBDgjK7y7TUQ@public.gmane.org>
2009-06-18 20:50                 ` Formatting "drivers" was " Arnaldo Carvalho de Melo
     [not found]                   ` <20090618205053.GA4258-f8uhVLnGfZaxAyOMLChx1axOck334EZe@public.gmane.org>
2009-06-24  6:29                     ` Ilpo Järvinen
     [not found]                       ` <Pine.LNX.4.64.0906240918170.26611-tOJ5Hk0ALFH+1EwMtL+0ZheBoQapMCRCVQQcQy+6Uvc@public.gmane.org>
2009-06-24  6:40                         ` Zack Weinberg
     [not found]                           ` <20090623234018.61f1ce02-4eJtQOnFJqFBDgjK7y7TUQ@public.gmane.org>
2009-06-24  6:58                             ` Ilpo Järvinen
2009-06-24  6:42                     ` Zack Weinberg
