dwarves.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Porting pahole from dwarf_next_unit() to dwarf_get_units()
@ 2023-12-05 13:03 Dimitri John Ledkov
  2023-12-05 15:47 ` Arnaldo Carvalho de Melo
  2024-01-14 22:29 ` Mark Wielaard
  0 siblings, 2 replies; 6+ messages in thread
From: Dimitri John Ledkov @ 2023-12-05 13:03 UTC (permalink / raw)
  To: dwarves, elfutils-devel; +Cc: acme, mark

Currently pahole warns and does nothing upon hitting
DW_TAG_skeleton_unit as implemented at
https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=0135ccd632796ab3aff65b7c99b374c4682c2bcf

In elfutils, a while back a new API got added that aids with discovery
and processing of such tags -
https://sourceware.org/git/?p=elfutils.git;a=commitdiff;h=79f0e623dcde4b042bb72f636a2211d67d5c0ade

It seems to me if pahole is ported from using dwarf_next_unit() to
instead use dwarf_get_units() native support can be added for
split-dwarf (dwo) files.

I am trying to write such a port, but it is proving to be very
difficult. I am entirely unfamiliar with neither pahole nor libdw nor
the dwarf file format. Thus it is very confusing when both pahole and
dwarf library use very similar type names and structs. For example
libdw has struct Dwarf_CU and pahole has unrelated dwarf_cu struct.

What are the differences between dwarf_nextcu(), dwarf_next_unit(),
dwarf_get_units() and when should one use each one of them? (or nest
them?)

Is a port of https://git.kernel.org/pub/scm/devel/pahole/pahole.git/tree/dwarf_loader.c?id=0135ccd632796ab3aff65b7c99b374c4682c2bcf
to use dwarf_get_units() a right approach and would be welcomed?

Is anyone else interested in providing any help, or guidance?

-- 
Dimitri

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Porting pahole from dwarf_next_unit() to dwarf_get_units()
  2023-12-05 13:03 Porting pahole from dwarf_next_unit() to dwarf_get_units() Dimitri John Ledkov
@ 2023-12-05 15:47 ` Arnaldo Carvalho de Melo
  2023-12-05 16:11   ` Dimitri John Ledkov
  2024-01-14 22:29 ` Mark Wielaard
  1 sibling, 1 reply; 6+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-12-05 15:47 UTC (permalink / raw)
  To: Dimitri John Ledkov; +Cc: dwarves, elfutils-devel, mark

Em Tue, Dec 05, 2023 at 01:03:01PM +0000, Dimitri John Ledkov escreveu:
> Currently pahole warns and does nothing upon hitting
> DW_TAG_skeleton_unit as implemented at
> https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=0135ccd632796ab3aff65b7c99b374c4682c2bcf
> 
> In elfutils, a while back a new API got added that aids with discovery
> and processing of such tags -
> https://sourceware.org/git/?p=elfutils.git;a=commitdiff;h=79f0e623dcde4b042bb72f636a2211d67d5c0ade
> 
> It seems to me if pahole is ported from using dwarf_next_unit() to
> instead use dwarf_get_units() native support can be added for
> split-dwarf (dwo) files.

That would be awesome!
 
> I am trying to write such a port, but it is proving to be very
> difficult.

I did some work on supporting split-dwarf months ago, but got
sidetracked with other work, BTF related and then the code bitrotted, I
have to go back looking at it to swap back the details into my brain:

https://git.kernel.org/pub/scm/devel/pahole/pahole.git/log/?h=alt_dwarf

The patches after:

45c044860c2abce7 dwarf_loader: Sync with LINUX_ELFNOTE_LTO_INFO macro from kernel

Are the ones to suport alt dwarf.

> I am entirely unfamiliar with neither pahole nor libdw nor
> the dwarf file format. Thus it is very confusing when both pahole and
> dwarf library use very similar type names and structs. For example
> libdw has struct Dwarf_CU and pahole has unrelated dwarf_cu struct.
 
> What are the differences between dwarf_nextcu(), dwarf_next_unit(),
> dwarf_get_units() and when should one use each one of them? (or nest
> them?)

> Is a port of https://git.kernel.org/pub/scm/devel/pahole/pahole.git/tree/dwarf_loader.c?id=0135ccd632796ab3aff65b7c99b374c4682c2bcf
> to use dwarf_get_units() a right approach and would be welcomed?

Yes, we need to support DWARF5 fully.
 
> Is anyone else interested in providing any help, or guidance?

I'm interested, and I think if Mark could help it would be great as
well.

- ARnaldo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Porting pahole from dwarf_next_unit() to dwarf_get_units()
  2023-12-05 15:47 ` Arnaldo Carvalho de Melo
@ 2023-12-05 16:11   ` Dimitri John Ledkov
  2023-12-05 17:43     ` [PATCH] HACK: port dwarf_loader to dwarf_get_units Dimitri John Ledkov
  2023-12-05 18:46     ` Porting pahole from dwarf_next_unit() to dwarf_get_units() Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 6+ messages in thread
From: Dimitri John Ledkov @ 2023-12-05 16:11 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: dwarves, elfutils-devel, mark

I

On Tue, 5 Dec 2023, 15:47 Arnaldo Carvalho de Melo, <acme@kernel.org> wrote:
>
> Em Tue, Dec 05, 2023 at 01:03:01PM +0000, Dimitri John Ledkov escreveu:
> > Currently pahole warns and does nothing upon hitting
> > DW_TAG_skeleton_unit as implemented at
> > https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=0135ccd632796ab3aff65b7c99b374c4682c2bcf
> >
> > In elfutils, a while back a new API got added that aids with discovery
> > and processing of such tags -
> > https://sourceware.org/git/?p=elfutils.git;a=commitdiff;h=79f0e623dcde4b042bb72f636a2211d67d5c0ade
> >
> > It seems to me if pahole is ported from using dwarf_next_unit() to
> > instead use dwarf_get_units() native support can be added for
> > split-dwarf (dwo) files.
>
> That would be awesome!
>
> > I am trying to write such a port, but it is proving to be very
> > difficult.
>
> I did some work on supporting split-dwarf months ago, but got
> sidetracked with other work, BTF related and then the code bitrotted, I
> have to go back looking at it to swap back the details into my brain:
>
> https://git.kernel.org/pub/scm/devel/pahole/pahole.git/log/?h=alt_dwarf
>
> The patches after:
>
> 45c044860c2abce7 dwarf_loader: Sync with LINUX_ELFNOTE_LTO_INFO macro from kernel
>
> Are the ones to suport alt dwarf.

I will read into those thanks.

>
> > I am entirely unfamiliar with neither pahole nor libdw nor
> > the dwarf file format. Thus it is very confusing when both pahole and
> > dwarf library use very similar type names and structs. For example
> > libdw has struct Dwarf_CU and pahole has unrelated dwarf_cu struct.
>
> > What are the differences between dwarf_nextcu(), dwarf_next_unit(),
> > dwarf_get_units() and when should one use each one of them? (or nest
> > them?)
>
> > Is a port of https://git.kernel.org/pub/scm/devel/pahole/pahole.git/tree/dwarf_loader.c?id=0135ccd632796ab3aff65b7c99b374c4682c2bcf
> > to use dwarf_get_units() a right approach and would be welcomed?
>
> Yes, we need to support DWARF5 fully.

ack

>
> > Is anyone else interested in providing any help, or guidance?
>
> I'm interested, and I think if Mark could help it would be great as
> well.

I have something that sort of works, but then like aboarts with
invalid free's on exit - which the purist in me cares, but not sure if
it is of practical value or not.
And eu-readelf code also mentions that it deliberary leaks memory,
because life is hard.
I will try to address or warn about memory leaks to see if stuff
works, and post and RFC.

Regards,

Dimitri.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] HACK: port dwarf_loader to dwarf_get_units
  2023-12-05 16:11   ` Dimitri John Ledkov
@ 2023-12-05 17:43     ` Dimitri John Ledkov
  2023-12-05 18:46     ` Porting pahole from dwarf_next_unit() to dwarf_get_units() Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 6+ messages in thread
From: Dimitri John Ledkov @ 2023-12-05 17:43 UTC (permalink / raw)
  To: dwarves; +Cc: acme, mark

Add a trivial sample to demonstrate that loading dwarf-4, dwarf-5,
with and without split-dwarf works.

Observe that before this patch there are valgrind leaks.
Obverse that there are even more valgrind leaks after this patch.

But it does make things load all 4 sample builds, and present the same
sample pahole output for all of them.

Abosolutely no idea what pointer_size variable is, or what it was in
dwarf_next_unit() or what it is in dwarf_cu_die(), or if that's what
cu__new() needs.

Separtely if one compiles linux kernel with dwarf-5 and split-dwarf,
and tries to run this build of pahole, even on like a trivial kernel
module it segfaults eventually when trying to
ftype__recode_dwarf_types, because eventually pos becomes NULL and
NULL dereference happens. So something about the created cus is not
quite right. Strongly suspecting that pointer_size actually matters
and possibly loaidng subdie needs to be properly accounted for.

NACKed-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
---
 dwarf_loader.c | 31 ++++++++++++-------------------
 dwarves.c      |  8 ++++++++
 sample.c       | 10 ++++++++++
 sample.sh      | 18 ++++++++++++++++++
 4 files changed, 48 insertions(+), 19 deletions(-)
 create mode 100644 sample.c
 create mode 100755 sample.sh

diff --git a/dwarf_loader.c b/dwarf_loader.c
index ccf31943bb..84aebed7c5 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -2823,13 +2823,6 @@ static int die__process(Dwarf_Die *die, struct cu *cu, struct conf_load *conf)
 	const uint16_t tag = dwarf_tag(die);
 
 	if (tag == DW_TAG_skeleton_unit) {
-		static bool warned;
-
-		if (!warned) {
-			fprintf(stderr, "WARNING: DW_TAG_skeleton_unit used, please look for a .dwo file and use it instead.\n"
-					"         A future version of pahole will support do this automagically.\n");
-			warned = true;
-		}
 		return 0; // so that other units can be processed
 	}
 
@@ -3068,20 +3061,22 @@ static int __cus__load_debug_types(struct conf_load *conf, Dwfl_Module *mod, Dwa
 				   const char *filename, const unsigned char *build_id,
 				   int build_id_len, struct cu **cup, struct dwarf_cu *dcup)
 {
-	Dwarf_Off off = 0, noff, type_off;
-	size_t cuhl;
 	uint8_t pointer_size, offset_size;
-	uint64_t signature;
+
+	Dwarf_CU *dgu_cu = NULL;
+	Dwarf_Die result;
+	Dwarf_Die cudie;
+	Dwarf_Die subdie;
 
 	*cup = NULL;
 
-	while (dwarf_next_unit(dw, off, &noff, &cuhl, NULL, NULL, &pointer_size,
-			       &offset_size, &signature, &type_off)
-		== 0) {
+	while (dwarf_get_units(dw,dgu_cu,&dgu_cu,NULL,NULL,&cudie,&subdie)==0) {
 
 		if (*cup == NULL) {
 			struct cu *cu;
 
+			dwarf_cu_die (dgu_cu, &result, NULL, NULL, &pointer_size, NULL, NULL, NULL);
+
 			cu = cu__new("", pointer_size, build_id,
 				     build_id_len, filename, conf->use_obstack);
 			if (cu == NULL ||
@@ -3100,14 +3095,12 @@ static int __cus__load_debug_types(struct conf_load *conf, Dwfl_Module *mod, Dwa
 			*cup = cu;
 		}
 
-		Dwarf_Die die_mem;
-		Dwarf_Die *cu_die = dwarf_offdie_types(dw, off + cuhl,
-						       &die_mem);
-
-		if (die__process(cu_die, *cup, conf) != 0)
+		if (die__process(&cudie, *cup, conf) != 0)
 			return DWARF_CB_ABORT;
+		if (dwarf_tag(&subdie) != DW_TAG_invalid)
+			if (die__process(&subdie, *cup, conf) != 0)
+				return DWARF_CB_ABORT;
 
-		off = noff;
 	}
 
 	if (*cup != NULL && cu__recode_dwarf_types(*cup) != 0)
diff --git a/dwarves.c b/dwarves.c
index 9f97edaea2..2abab727fc 100644
--- a/dwarves.c
+++ b/dwarves.c
@@ -62,6 +62,10 @@ void *cu__malloc(struct cu *cu, size_t size)
 
 void cu__free(struct cu *cu, void *ptr)
 {
+	// FIXME it seems like loading split dwarf, causes multiple
+	// references to the same thing, and thus current clean up &
+	// free code aborts on double free's
+	return;
 	if (!cu->use_obstack)
 		free(ptr);
 
@@ -711,6 +715,10 @@ out_free:
 
 void cu__delete(struct cu *cu)
 {
+	// FIXME it seems like loading split dwarf, causes multiple
+	// references to the same thing, and thus current clean up &
+	// free code aborts on double free's
+	return;
 	if (cu == NULL)
 		return;
 
diff --git a/sample.c b/sample.c
new file mode 100644
index 0000000000..bff48d3139
--- /dev/null
+++ b/sample.c
@@ -0,0 +1,10 @@
+#include <stdio.h>
+struct Sample {
+  char sample_string[50];
+  int  sample_integer;
+  float sample_float;
+};
+void main() {
+	struct Sample sample = {.sample_integer = 3};
+	printf("%i\n", sample.sample_integer);
+}
diff --git a/sample.sh b/sample.sh
new file mode 100755
index 0000000000..241d4adec8
--- /dev/null
+++ b/sample.sh
@@ -0,0 +1,18 @@
+#!/bin/sh
+set -x
+rc=0
+gcc -gdwarf-5 -gsplit-dwarf ./sample.c
+valgrind ./pahole ./a.out
+rc=$((rc + $?))
+gcc -gdwarf-4 -gsplit-dwarf ./sample.c
+valgrind ./pahole ./a.out
+rc=$((rc + $?))
+gcc -gdwarf-5 ./sample.c
+valgrind ./pahole ./a.out
+rc=$((rc + $?))
+gcc -gdwarf-4 ./sample.c
+valgrind ./pahole ./a.out
+rc=$((rc + $?))
+rm -f a.out
+exit $rc
+
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: Porting pahole from dwarf_next_unit() to dwarf_get_units()
  2023-12-05 16:11   ` Dimitri John Ledkov
  2023-12-05 17:43     ` [PATCH] HACK: port dwarf_loader to dwarf_get_units Dimitri John Ledkov
@ 2023-12-05 18:46     ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 6+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-12-05 18:46 UTC (permalink / raw)
  To: Dimitri John Ledkov; +Cc: dwarves, elfutils-devel, mark

Em Tue, Dec 05, 2023 at 04:11:06PM +0000, Dimitri John Ledkov escreveu:
> I
> 
> On Tue, 5 Dec 2023, 15:47 Arnaldo Carvalho de Melo, <acme@kernel.org> wrote:
> >
> > Em Tue, Dec 05, 2023 at 01:03:01PM +0000, Dimitri John Ledkov escreveu:
> > > Currently pahole warns and does nothing upon hitting
> > > DW_TAG_skeleton_unit as implemented at
> > > https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=0135ccd632796ab3aff65b7c99b374c4682c2bcf
> > >
> > > In elfutils, a while back a new API got added that aids with discovery
> > > and processing of such tags -
> > > https://sourceware.org/git/?p=elfutils.git;a=commitdiff;h=79f0e623dcde4b042bb72f636a2211d67d5c0ade
> > >
> > > It seems to me if pahole is ported from using dwarf_next_unit() to
> > > instead use dwarf_get_units() native support can be added for
> > > split-dwarf (dwo) files.
> >
> > That would be awesome!
> >
> > > I am trying to write such a port, but it is proving to be very
> > > difficult.
> >
> > I did some work on supporting split-dwarf months ago, but got
> > sidetracked with other work, BTF related and then the code bitrotted, I
> > have to go back looking at it to swap back the details into my brain:
> >
> > https://git.kernel.org/pub/scm/devel/pahole/pahole.git/log/?h=alt_dwarf
> >
> > The patches after:
> >
> > 45c044860c2abce7 dwarf_loader: Sync with LINUX_ELFNOTE_LTO_INFO macro from kernel
> >
> > Are the ones to suport alt dwarf.
> 
> I will read into those thanks.

Hopefully what you've been work ends up less convoluted by using
new elfutils functions, but it may be useful to help understand how the
internal pahole code deals with Dwarf offsets to reduce them to 32 bits
for conversion to CTF/BTF (where it is not really that important as
libbpf does this work).
 
> > > I am entirely unfamiliar with neither pahole nor libdw nor
> > > the dwarf file format. Thus it is very confusing when both pahole and
> > > dwarf library use very similar type names and structs. For example
> > > libdw has struct Dwarf_CU and pahole has unrelated dwarf_cu struct.
> >
> > > What are the differences between dwarf_nextcu(), dwarf_next_unit(),
> > > dwarf_get_units() and when should one use each one of them? (or nest
> > > them?)
> >
> > > Is a port of https://git.kernel.org/pub/scm/devel/pahole/pahole.git/tree/dwarf_loader.c?id=0135ccd632796ab3aff65b7c99b374c4682c2bcf
> > > to use dwarf_get_units() a right approach and would be welcomed?
> >
> > Yes, we need to support DWARF5 fully.

> ack
 
> > > Is anyone else interested in providing any help, or guidance?
> >
> > I'm interested, and I think if Mark could help it would be great as
> > well.
> 
> I have something that sort of works, but then like aboarts with
> invalid free's on exit - which the purist in me cares, but not sure if
> it is of practical value or not.
> And eu-readelf code also mentions that it deliberary leaks memory,
> because life is hard.

Off course ;-) Frees on exit are interesting to try to evaluate if
things that are not just frees on exit are leaking, when we're sure that
is the case, don't free on exit, as its just overhead.

> I will try to address or warn about memory leaks to see if stuff
> works, and post and RFC.

Great, thanks a lot for working on this!

- Arnaldo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Porting pahole from dwarf_next_unit() to dwarf_get_units()
  2023-12-05 13:03 Porting pahole from dwarf_next_unit() to dwarf_get_units() Dimitri John Ledkov
  2023-12-05 15:47 ` Arnaldo Carvalho de Melo
@ 2024-01-14 22:29 ` Mark Wielaard
  1 sibling, 0 replies; 6+ messages in thread
From: Mark Wielaard @ 2024-01-14 22:29 UTC (permalink / raw)
  To: Dimitri John Ledkov; +Cc: dwarves, elfutils-devel, acme

Hi Dimitri,

Sorry, this arrived before my vacation and then the new year happened.

On Tue, Dec 05, 2023 at 01:03:01PM +0000, Dimitri John Ledkov wrote:
> Currently pahole warns and does nothing upon hitting
> DW_TAG_skeleton_unit as implemented at
> https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=0135ccd632796ab3aff65b7c99b374c4682c2bcf
> 
> In elfutils, a while back a new API got added that aids with discovery
> and processing of such tags -
> https://sourceware.org/git/?p=elfutils.git;a=commitdiff;h=79f0e623dcde4b042bb72f636a2211d67d5c0ade
> 
> It seems to me if pahole is ported from using dwarf_next_unit() to
> instead use dwarf_get_units() native support can be added for
> split-dwarf (dwo) files.
> 
> I am trying to write such a port, but it is proving to be very
> difficult. I am entirely unfamiliar with neither pahole nor libdw nor
> the dwarf file format. Thus it is very confusing when both pahole and
> dwarf library use very similar type names and structs. For example
> libdw has struct Dwarf_CU and pahole has unrelated dwarf_cu struct.
> 
> What are the differences between dwarf_nextcu(), dwarf_next_unit(),
> dwarf_get_units() and when should one use each one of them? (or nest
> them?)

The dwarf_nextcu was the original way to iterate over the CUs from
.debug_info. Then dwarf_next_unit was added when type units could come
from a .debug_types section. Both functions use and return offsets to
iterate through the section and then get the CU DIE using dwarf_offdie
(or dwarf_offdie_types). This requires the user to know beforehand
where to DIE data is stored (in the .debug_info or .debug_types
section).  For type units one also needs to use the type offset to
create the actual type DIE. In DWARF5 DIEs can come from even more
data locations. And there are also skeleton units which require the
user to find the associated split compile unit DIE (which would come
from a different file).
    
The new dwarf_get_units function simplifies iterating over the units
in a DWARF file. It doesn't require the user to know where the DIE
data is stored, it will automagically iterate over all know data
sources (sections) returning the Dwarf_CU and the associated Dwarf_Die
if requested. If the user requests to know the associated "subdie" it
will also be resolved.
    
A subdie is either a type DIE for a type unit or a split unit DIE for
a skeleton unit. The same (and some more) info about DWARF_CUs can
also be gotten through the dwarf_cu_info function.

You should either use dwarf_nextcu or dwarf_next_unit with
dwarf_offdie to get the (top-level) DIE. Or use dwarf_get_units and
possibly dwarf_cu_info. In general you shouldn't mix them.

Hope this helps and let me know if you need more info.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-01-14 22:37 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-05 13:03 Porting pahole from dwarf_next_unit() to dwarf_get_units() Dimitri John Ledkov
2023-12-05 15:47 ` Arnaldo Carvalho de Melo
2023-12-05 16:11   ` Dimitri John Ledkov
2023-12-05 17:43     ` [PATCH] HACK: port dwarf_loader to dwarf_get_units Dimitri John Ledkov
2023-12-05 18:46     ` Porting pahole from dwarf_next_unit() to dwarf_get_units() Arnaldo Carvalho de Melo
2024-01-14 22:29 ` Mark Wielaard

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).