* [RFC PATCH 0/1] package.bbclass: Expose list of split out debug files
@ 2024-02-28  6:21 Philip Lorenz
  2024-02-28  6:21 ` [RFC PATCH 1/1] " Philip Lorenz
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Philip Lorenz @ 2024-02-28  6:21 UTC (permalink / raw)
  To: openembedded-core; +Cc: Philip Lorenz

With the introduction of debuginfod ([1]), providing debug symbols to
developers has been greatly simplified. Initial support for spawning a
debuginfod server is already available as part of poky.

However, this relies on debuginfod scraping the debug packages for their
build IDs. This is not only inefficient (as all packages need to be
extracted again), but it also does not scale well when covering a large
number of builds.

To mitigate this, we are currently working on an approach to extract the
metadata needed to provide debug symbols as part of the bitbake build.
This metadata includes the mapping of the GNU build ID to the package
holding the debug symbol. The metadata will be treated as another build
artifact and can be consumed by a daemon implementing the debuginfod
HTTP API to serve debug symbol file requests from the package feed
produced by the bitbake build.

Initially, we considered implementing the generation of debug metadata
directly as part of emit_pkgdata() in package.bbclass (disabled by
default). However, we discarded this idea as introducing a configuration
option would increase maintenance effort for a feature that would
potentially only be enabled in very few builds.  Instead, we opted to
extend package.bbclass to expose the minimal information needed to
reliably identify debug symbol files, which can then be consumed by a
packaging hook.

Is this extension something that is viable to be merged? We are
considering open-sourcing the other parts needed to implement the setup
described above, but as those parts are still in the prototyping phase,
it will require some more time.

[1] https://sourceware.org/elfutils/Debuginfod.html

Philip Lorenz (1):
  package.bbclass: Expose list of split out debug files

 meta/classes-global/package.bbclass |  4 ++++
 meta/lib/oe/package.py              | 19 ++++++++++---------
 2 files changed, 14 insertions(+), 9 deletions(-)

-- 
2.43.2




* [RFC PATCH 1/1] package.bbclass: Expose list of split out debug files
  2024-02-28  6:21 [RFC PATCH 0/1] package.bbclass: Expose list of split out debug files Philip Lorenz
@ 2024-02-28  6:21 ` Philip Lorenz
  2024-02-28  7:41 ` [OE-core] [RFC PATCH 0/1] " Alexander Kanavin
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Philip Lorenz @ 2024-02-28  6:21 UTC (permalink / raw)
  To: openembedded-core; +Cc: Philip Lorenz

A packaging hook installed via PACKAGEFUNCS may want to access the list
of debug symbol files produced during packaging.

Correctly determining the list of debug files based on existing
variables is non-trivial, so this patch introduces the PKGDEBUGFILES
variable which holds the path to all files generated during stripping.

Signed-off-by: Philip Lorenz <philip.lorenz@bmw.de>
---
 meta/classes-global/package.bbclass |  4 ++++
 meta/lib/oe/package.py              | 19 ++++++++++---------
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/meta/classes-global/package.bbclass b/meta/classes-global/package.bbclass
index aa1eb5e901..f2d358459f 100644
--- a/meta/classes-global/package.bbclass
+++ b/meta/classes-global/package.bbclass
@@ -67,6 +67,10 @@ PACKAGE_DEPENDS += "rpm-native dwarfsrcfiles-native"
 # tools at rootfs build time.
 PACKAGE_WRITE_DEPS ??= ""
 
+# List of files containing debug symbols. The paths are rooted at their
+# destination path (e.g. /usr/lib/.debug instead of ${PKGD}/usr/lib/.debug)
+PKGDEBUGFILES = ""
+
 def legitimize_package_name(s):
     return oe.package.legitimize_package_name(s)
 
diff --git a/meta/lib/oe/package.py b/meta/lib/oe/package.py
index 587810bdaf..921a958ed1 100644
--- a/meta/lib/oe/package.py
+++ b/meta/lib/oe/package.py
@@ -781,7 +781,7 @@ def splitdebuginfo(file, dvar, dv, d):
     # target system binary, the other contains any debugging information. The
     # two files are linked to reference each other.
     #
-    # return a mapping of files:debugsources
+    # return a mapping of files:debugfile:debugsources
 
     src = file[len(dvar):]
     dest = dv["libdir"] + os.path.dirname(src) + dv["dir"] + "/" + os.path.basename(src) + dv["append"]
@@ -791,7 +791,7 @@ def splitdebuginfo(file, dvar, dv, d):
     if file.endswith(".ko") and file.find("/lib/modules/") != -1:
         if oe.package.is_kernel_module_signed(file):
             bb.debug(1, "Skip strip on signed module %s" % file)
-            return (file, sources)
+            return (file, file, sources)
 
     # Split the file...
     bb.utils.mkdirhier(os.path.dirname(debugfile))
@@ -821,7 +821,7 @@ def splitdebuginfo(file, dvar, dv, d):
     if newmode:
         os.chmod(file, origmode)
 
-    return (file, sources)
+    return (file, debugfile, sources)
 
 def splitstaticdebuginfo(file, dvar, dv, d):
     # Unlike the function above, there is no way to split a static library
@@ -830,7 +830,7 @@ def splitstaticdebuginfo(file, dvar, dv, d):
     # We will then strip (preserving symbols) the static library in the
     # typical location.
     #
-    # return a mapping of files:debugsources
+    # return a mapping of files:debugfile:debugsources
 
     src = file[len(dvar):]
     dest = dv["staticlibdir"] + os.path.dirname(src) + dv["staticdir"] + "/" + os.path.basename(src) + dv["staticappend"]
@@ -861,7 +861,7 @@ def splitstaticdebuginfo(file, dvar, dv, d):
     if newmode:
         os.chmod(file, origmode)
 
-    return (file, sources)
+    return (file, debugfile, sources)
 
 def inject_minidebuginfo(file, dvar, dv, d):
     # Extract just the symbols from debuginfo into minidebuginfo,
@@ -1175,13 +1175,14 @@ def process_split_and_strip_files(d):
                 results = oe.utils.multiprocess_launch(splitstaticdebuginfo, staticlibs, d, extraargs=(dvar, dv, d))
             else:
                 for file in staticlibs:
-                    results.append( (file,source_info(file, d)) )
+                    results.append( (file,file,source_info(file, d)) )
 
-        d.setVar("PKGDEBUGSOURCES", {strip_pkgd_prefix(f): sorted(s) for f, s in results})
+        d.setVar("PKGDEBUGSOURCES", {strip_pkgd_prefix(f): sorted(s) for f, _, s in results})
+        d.setVar("PKGDEBUGFILES", [strip_pkgd_prefix(d) for _, d, _ in results])
 
         sources = set()
-        for r in results:
-            sources.update(r[1])
+        for _, _, sourcefile in results:
+            sources.update(sourcefile)
 
         # Hardlink our debug symbols to the other hardlink copies
         for ref in inodes:
-- 
2.43.2




* Re: [OE-core] [RFC PATCH 0/1] package.bbclass: Expose list of split out debug files
  2024-02-28  6:21 [RFC PATCH 0/1] package.bbclass: Expose list of split out debug files Philip Lorenz
  2024-02-28  6:21 ` [RFC PATCH 1/1] " Philip Lorenz
@ 2024-02-28  7:41 ` Alexander Kanavin
  2024-02-28 15:22   ` Philip Lorenz
  2024-02-28  9:14 ` Richard Purdie
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Alexander Kanavin @ 2024-02-28  7:41 UTC (permalink / raw)
  To: Philip Lorenz; +Cc: openembedded-core

On Wed, 28 Feb 2024 at 07:22, Philip Lorenz <philip.lorenz@bmw.de> wrote:
> However, this relies on debuginfod scraping the debug packages for their
> build IDs. This is not only inefficient (as all packages need to be
> extracted again), but it also does not scale well when covering a large
> number of builds.

Is it possible to see numbers behind this claim? When there is a
proposal to increase code complexity, that needs to be justified in a
way that can be locally observed.

> Is this extension something that is viable to be merged? We are
> considering open-sourcing the other parts needed to implement the setup
> described above, but as those parts are still in the prototyping phase,
> it will require some more time.

The patch looks okay, but it's not useful without those other parts,
so you need to get them ready and submit the whole set.

Alex



* Re: [OE-core] [RFC PATCH 0/1] package.bbclass: Expose list of split out debug files
  2024-02-28  6:21 [RFC PATCH 0/1] package.bbclass: Expose list of split out debug files Philip Lorenz
  2024-02-28  6:21 ` [RFC PATCH 1/1] " Philip Lorenz
  2024-02-28  7:41 ` [OE-core] [RFC PATCH 0/1] " Alexander Kanavin
@ 2024-02-28  9:14 ` Richard Purdie
  2024-02-28 15:41   ` Philip Lorenz
  2024-03-05 16:18 ` [RFC PATCH v2 0/3] package: Extract GNU build ID during packaging Philip Lorenz
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Richard Purdie @ 2024-02-28  9:14 UTC (permalink / raw)
  To: Philip Lorenz, openembedded-core

On Wed, 2024-02-28 at 07:21 +0100, Philip Lorenz wrote:
> With the introduction of debuginfod ([1]), providing debug symbols to
> developers has been greatly simplified. Initial support for spawning a
> debuginfod server is already available as part of poky.
> 
> However, this relies on debuginfod scraping the debug packages for their
> build IDs. This is not only inefficient (as all packages need to be
> extracted again), but it also does not scale well when covering a large
> number of builds.
> 
> To mitigate this, we are currently working on an approach to extract the
> metadata needed to provide debug symbols as part of the bitbake build.
> This metadata includes the mapping of the GNU build ID to the package
> holding the debug symbol. The metadata will be treated as another build
> artifact and can be consumed by a daemon implementing the debuginfod
> HTTP API to serve debug symbol file requests from the package feed
> produced by the bitbake build.
> 
> Initially, we considered implementing the generation of debug metadata
> directly as part of emit_pkgdata() in package.bbclass (disabled by
> default). However, we discarded this idea as introducing a configuration
> option would increase maintenance effort for a feature that would
> potentially only be enabled in very few builds.  Instead, we opted to
> extend package.bbclass to expose the minimal information needed to
> reliably identify debug symbol files, which can then be consumed by a
> packaging hook.
> 
> Is this extension something that is viable to be merged? We are
> considering open-sourcing the other parts needed to implement the setup
> described above, but as those parts are still in the prototyping phase,
> it will require some more time.
> 
> [1] https://sourceware.org/elfutils/Debuginfod.html

I think this is the kind of direction we've wanted to go in. I'm not
sure the patch as it stands is that useful as it just lists files which
you could just as easily obtain with a os.walk on the filesystem but in
principle I'd be fine with writing some extra data during do_package or
do_packagedata which saves the buildid mappings.
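A rough sketch of the os.walk approach mentioned above, assuming the default ".debug" split layout. Other PACKAGE_DEBUG_SPLIT_STYLE settings place debug files elsewhere, so the directory check is an assumption, and find_debug_files is a hypothetical helper rather than anything in oe-core:

```python
import os
import tempfile


def find_debug_files(root):
    """Return sorted paths, rooted at '/', of files inside any '.debug' directory below root."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        # Only look at directories that have a '.debug' path component
        # (assumption: default split style).
        if ".debug" not in rel_dir.split(os.sep):
            continue
        for name in filenames:
            matches.append("/" + os.path.join(rel_dir, name))
    return sorted(matches)


# Small self-contained demo on a throwaway tree.
with tempfile.TemporaryDirectory() as root:
    for rel in ("usr/lib/libz.so.1", "usr/lib/.debug/libz.so.1"):
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    found = find_debug_files(root)

print(found)
```

The walk only finds files by location; it does not know which of them were actually produced by the split step, which is the gap the proposed variable is meant to close.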

So yes, in principle the idea sounds good but obviously the final
decision would depend upon the patches.

I'm assuming this data wouldn't be that large or that expensive to
compute so I'd prefer not to hide it behind extra configuration options
if we can help it. That does depend on the overheads/costs though.

Cheers,

Richard





* Re: [OE-core] [RFC PATCH 0/1] package.bbclass: Expose list of split out debug files
  2024-02-28  7:41 ` [OE-core] [RFC PATCH 0/1] " Alexander Kanavin
@ 2024-02-28 15:22   ` Philip Lorenz
  0 siblings, 0 replies; 14+ messages in thread
From: Philip Lorenz @ 2024-02-28 15:22 UTC (permalink / raw)
  To: Alexander Kanavin; +Cc: openembedded-core

Hi Alex,

On 28.02.24 08:41, Alexander Kanavin wrote:
> On Wed, 28 Feb 2024 at 07:22, Philip Lorenz <philip.lorenz@bmw.de> wrote:
>> However, this relies on debuginfod scraping the debug packages for their
>> build IDs. This is not only inefficient (as all packages need to be
>> extracted again), but it also does not scale well when covering a large
>> number of builds.
> Is it possible to see numbers behind this claim? When there is a
> proposal to increase code complexity, that needs to be justified in a
> way that can be locally observed.

Let me provide some numbers from both an internal medium-sized build
based on kirkstone and a core-image-minimal build based on master.

Kirkstone:

> find -name "*.ipk" | wc
>    8415    8415  615076
> du -h -c
> 3.5G    total

> time /bin/sh -c 'for f in */*.ipk; do ar p $f data.tar.xz | tar -tJ > /dev/null; done'
>
> real    5m13.629s
> user    4m56.653s
> sys     1m41.578s

master (core-image-minimal):

> find -name "*.ipk" | wc
>    4553    4553  287890
> du -h -c
> 2.1G    total

> time /bin/sh -c 'for f in */*.ipk; do ar p $f data.tar.zst | tar --zstd -t > /dev/null; done'
>
> real    1m2.521s
> user    0m40.876s
> sys     1m8.232s

Exact figures of course vary, and this can be further optimized by
introducing parallelism. However, the artifacts are available
uncompressed during packaging, and the packaging step is also the one
responsible for splitting out the debug symbols. Limiting build ID
extraction to the files that are known to contain debug symbols is
therefore also an efficiency win (and it avoids having to implement any
kind of heuristics to determine which files actually contain the debug
symbols).

>
>> Is this extension something that is viable to be merged? We are
>> considering open-sourcing the other parts needed to implement the setup
>> described above, but as those parts are still in the prototyping phase,
>> it will require some more time.
> The patch looks okay, but it's not useful without those other parts,
> so you need to get them ready and submit the whole set.

I'll answer this as part of my reply to Richard. I'd be more than happy 
to share the tooling we use to produce the build ID metadata and this 
was more an issue of where to actually place it. The only thing that is 
not yet in a state ready for public consumption is our daemon that 
consumes this metadata and then transparently fulfills any incoming 
debuginfo requests by retrieving the debug file from the corresponding 
package.

Br,

Philip

-- 
Philip Lorenz
BMW Car IT GmbH, Software-Plattform, -Integration Connected Company, Lise-Meitner-Straße 14, 89081 Ulm
-------------------------------------------------------------------------
BMW Car IT GmbH
Management: Chris Brandt and Michael Böttrich
Domicile and Court of Registry: München HRB 134810
-------------------------------------------------------------------------




* Re: [OE-core] [RFC PATCH 0/1] package.bbclass: Expose list of split out debug files
  2024-02-28  9:14 ` Richard Purdie
@ 2024-02-28 15:41   ` Philip Lorenz
  2024-02-28 17:40     ` Alexander Kanavin
  0 siblings, 1 reply; 14+ messages in thread
From: Philip Lorenz @ 2024-02-28 15:41 UTC (permalink / raw)
  To: Richard Purdie; +Cc: openembedded-core

Hi Richard,

On 28.02.24 10:14, Richard Purdie wrote:
> On Wed, 2024-02-28 at 07:21 +0100, Philip Lorenz wrote:
>> With the introduction of debuginfod ([1]), providing debug symbols to
>> developers has been greatly simplified. Initial support for spawning a
>> debuginfod server is already available as part of poky.
>>
>> However, this relies on debuginfod scraping the debug packages for their
>> build IDs. This is not only inefficient (as all packages need to be
>> extracted again), but it also does not scale well when covering a large
>> number of builds.
>>
>> To mitigate this, we are currently working on an approach to extract the
>> metadata needed to provide debug symbols as part of the bitbake build.
>> This metadata includes the mapping of the GNU build ID to the package
>> holding the debug symbol. The metadata will be treated as another build
>> artifact and can be consumed by a daemon implementing the debuginfod
>> HTTP API to serve debug symbol file requests from the package feed
>> produced by the bitbake build.
>>
>> Initially, we considered implementing the generation of debug metadata
>> directly as part of emit_pkgdata() in package.bbclass (disabled by
>> default). However, we discarded this idea as introducing a configuration
>> option would increase maintenance effort for a feature that would
>> potentially only be enabled in very few builds.  Instead, we opted to
>> extend package.bbclass to expose the minimal information needed to
>> reliably identify debug symbol files, which can then be consumed by a
>> packaging hook.
>>
>> Is this extension something that is viable to be merged? We are
>> considering open-sourcing the other parts needed to implement the setup
>> described above, but as those parts are still in the prototyping phase,
>> it will require some more time.
>>
>> [1] https://sourceware.org/elfutils/Debuginfod.html
> I think this is the kind of direction we've wanted to go in. I'm not
> sure the patch as it stands is that useful as it just lists files which
> you could just as easily obtain with a os.walk on the filesystem but in
> principle I'd be fine with writing some extra data during do_package or
> do_packagedata which saves the buildid mappings.
In one of my first iterations I placed the build ID to file mapping into 
the "extended" section of "pkgdata". We'd then consume this data after 
the build has finished to produce the debug info metadata database which 
contains the mapping from build ID to debug symbol file and the package 
containing the file. If this sounds sane to you I can clean up that 
version and share it here.
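As a minimal sketch of that post-build step: the per-package "extended" data is inverted into a single build ID index. The "gnu_build_id" key and the build_debug_index helper are hypothetical names chosen for illustration; only the "files_info" layout is assumed from the existing extended pkgdata.

```python
def build_debug_index(extended_by_pkg, build_id_key="gnu_build_id"):
    """Invert per-package 'files_info' data into a build ID -> (package, file) map."""
    index = {}
    for pkg, extended in extended_by_pkg.items():
        for path, info in extended.get("files_info", {}).items():
            build_id = info.get(build_id_key)
            if build_id:  # files without a build ID are skipped
                index[build_id] = {"package": pkg, "file": path}
    return index


# Hypothetical extended data for two packages.
extended_by_pkg = {
    "zlib-dbg": {"files_info": {
        "/usr/lib/.debug/libz.so.1.3.1": {"size": 123456, "gnu_build_id": "0123abcd"},
    }},
    "zlib": {"files_info": {
        "/usr/lib/libz.so.1.3.1": {"size": 98765},
    }},
}
index = build_debug_index(extended_by_pkg)
print(index)
```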
> So yes, in principle the idea sounds good but obviously the final
> decision would depend upon the patches.
>
> I'm assuming this data wouldn't be that large or that expensive to
> compute so I'd prefer not to hide it behind extra configuration options
> if we can help it. That does depend on the overheads/costs though.
>
I just executed build ID extraction on the debug packages of our medium 
sized kirkstone based distro (see my reply to Alex for more details). 
Sequentially extracting build IDs from around 8000 files took around 
1:30 minutes on my machine. While I wouldn't call this excessive, I am 
also not sure whether this is too much overhead given that I only expect 
this data to be used in some deployments.

Br,

Philip






* Re: [OE-core] [RFC PATCH 0/1] package.bbclass: Expose list of split out debug files
  2024-02-28 15:41   ` Philip Lorenz
@ 2024-02-28 17:40     ` Alexander Kanavin
  2024-02-29  8:20       ` Philip Lorenz
  0 siblings, 1 reply; 14+ messages in thread
From: Alexander Kanavin @ 2024-02-28 17:40 UTC (permalink / raw)
  To: Philip Lorenz; +Cc: Richard Purdie, openembedded-core

On Wed, 28 Feb 2024 at 16:41, Philip Lorenz <philip.lorenz@bmw.de> wrote:
> > I'm assuming this data wouldn't be that large or that expensive to
> > compute so I'd prefer not to hide it behind extra configuration options
> > if we can help it. That does depend on the overheads/costs though.
> >
> I just executed build ID extraction on the debug packages of our medium
> sized kirkstone based distro (see my reply to Alex for more details).
> Sequentially extracting build IDs from around 8000 files took around
> 1:30 minutes on my machine. While I wouldn't call this excessive, I am
> also not sure whether this is too much overhead given that I only expect
> this data to be used in some deployments.

I have to object to the numbers because they were done with a
sequential shell loop. Debuginfod does it in threads and is able to
complete the scans much faster. So you need to check how quickly it
completes its job when started with oe-debuginfod rather. There might
be an improvement coming from what you are proposing, but it's most
likely not going to be as drastic.

From debuginfod manpage:

-c NUM --concurrency=NUM
Set the concurrency limit for the scanning queue threads, which work
together to process archives & files located by the traversal thread.
This is important for controlling CPU-intensive operations like parsing
an ELF file and especially decompressing archives. The default is the
number of processors on the system; the minimum is 1.
https://manpages.debian.org/testing/debuginfod/debuginfod.8.en.html

There's also something else I noticed just now: there seems to be an
alternative implementation of debuginfod you want to introduce? Why?
If the original from elfutils isn't working well enough, shouldn't we
make it better?

One possibility is teaching it to mass-import pre-computed entries
into its index, so that sweeping file tree scans with archive
extractions can be avoided altogether. Or doing incremental index
imports directly from do_package.

Alex



* Re: [OE-core] [RFC PATCH 0/1] package.bbclass: Expose list of split out debug files
  2024-02-28 17:40     ` Alexander Kanavin
@ 2024-02-29  8:20       ` Philip Lorenz
  2024-02-29  8:54         ` Alexander Kanavin
  0 siblings, 1 reply; 14+ messages in thread
From: Philip Lorenz @ 2024-02-29  8:20 UTC (permalink / raw)
  To: Alexander Kanavin; +Cc: Richard Purdie, openembedded-core

Hi Alex,

On 28.02.24 18:40, Alexander Kanavin wrote:
> On Wed, 28 Feb 2024 at 16:41, Philip Lorenz <philip.lorenz@bmw.de> wrote:
>>> I'm assuming this data wouldn't be that large or that expensive to
>>> compute so I'd prefer not to hide it behind extra configuration options
>>> if we can help it. That does depend on the overheads/costs though.
>>>
>> I just executed build ID extraction on the debug packages of our medium
>> sized kirkstone based distro (see my reply to Alex for more details).
>> Sequentially extracting build IDs from around 8000 files took around
>> 1:30 minutes on my machine. While I wouldn't call this excessive, I am
>> also not sure whether this is too much overhead given that I only expect
>> this data to be used in some deployments.
> I have to object to the numbers because they were done with a
> sequential shell loop. Debuginfod does it in threads and is able to
> complete the scans much faster. So you need to check how quickly it
> completes its job when started with oe-debuginfod rather. There might
> be an improvement coming from what you are proposing, but it's most
> likely not going to be as drastic.

I think there's some misunderstanding that I'd like to sort out first:
This is in no way about deprecating or not using debuginfod. It is,
however, an optimization of how build IDs are extracted, which can be
used by a variety of tools (such as debuginfod). As such, a sequential
scan should give a rough idea of how much time it takes to extract the
build IDs during do_package (wall clock time is bound to differ). Based
on this we can see that it's not free but also not extremely expensive,
although I'd like to leave the judgement call on whether this is
something that should be enabled on all builds to someone else.

> There's also something else I noticed just now: there seems to be an
> alternative implementation of debuginfod you want to introduce? Why?
> If the original from elfutils isn't working well enough, shouldn't we
> make it better?

Let me try to give you some insight into how we are planning to use it;
I hope this clarifies things:

In our case we are dealing with hundreds of bitbake builds whose
artifacts (including package feeds) are published to some storage
accessible via HTTP. We would now like to offer a service that gives
developers access to the debug files in a seamless way (i.e. we want to
eliminate the process of manually having to download the debug packages
matching a particular build). To accomplish this, our setup is based
around a lightweight "gateway" daemon that translates a debuginfo HTTP
request into a fetch of the corresponding package from the matching
repository, extracts the debug symbol file and then serves it to the
requesting client.
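The request-translation step can be sketched roughly as follows. This is not the daemon described above; resolve_debuginfo_request and the index layout are hypothetical, and only the /buildid/<BUILDID>/debuginfo path comes from the debuginfod HTTP API. Fetching the package and extracting the file are left out.

```python
import re

# Path layout of the debuginfod HTTP API for debug info lookups.
_DEBUGINFO_RE = re.compile(r"^/buildid/([0-9a-f]+)/debuginfo$")


def resolve_debuginfo_request(path, index):
    """Map a request path to an index entry, or None if it cannot be served."""
    match = _DEBUGINFO_RE.match(path)
    if match is None:
        return None
    return index.get(match.group(1))


# Hypothetical index: build ID -> package and debug file inside it.
index = {"0123abcd": {"package": "zlib-dbg", "file": "/usr/lib/.debug/libz.so.1.3.1"}}
hit = resolve_debuginfo_request("/buildid/0123abcd/debuginfo", index)
miss = resolve_debuginfo_request("/buildid/ffff/debuginfo", index)
print(hit, miss)
```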

This is quite different to the way debuginfod works (which seems to be
built around the idea of having the debug symbol files readily available
via the file system) and I also see advantages in that approach when one
has a fairly static set of debug symbol files one wants to serve.
There are also some other non-functional requirements that would make
deployment of debuginfod in our case quite difficult.

This is in no way meant to be a fully fledged debuginfod reimplementation
but a simple gateway between the debuginfod protocol and a backing
package repository. I am not sure whether such an extension is in scope
of the elfutils package.
> One possibility is teaching it to mass-import pre-computed entries
> into its index, so that sweeping file tree scans with archive
> extractions can be avoided altogether. Or doing incremental index
> imports directly from do_package.
Producing this data is exactly what this RFC is about. Using the 
extracted build ID information to optimize the import into debuginfod is 
one of the possible use cases but I'd also suggest to keep the extracted 
data agnostic of any concrete tooling (e.g. pkgdata).

Br,

Philip





* Re: [OE-core] [RFC PATCH 0/1] package.bbclass: Expose list of split out debug files
  2024-02-29  8:20       ` Philip Lorenz
@ 2024-02-29  8:54         ` Alexander Kanavin
  0 siblings, 0 replies; 14+ messages in thread
From: Alexander Kanavin @ 2024-02-29  8:54 UTC (permalink / raw)
  To: Philip Lorenz; +Cc: Richard Purdie, openembedded-core

On Thu, 29 Feb 2024 at 09:20, Philip Lorenz <philip.lorenz@bmw.de> wrote:
> > One possibility is teaching it to mass-import pre-computed entries
> > into its index, so that sweeping file tree scans with archive
> > extractions can be avoided altogether. Or doing incremental index
> > imports directly from do_package.
> Producing this data is exactly what this RFC is about. Using the
> extracted build ID information to optimize the import into debuginfod is
> one of the possible use cases but I'd also suggest to keep the extracted
> data agnostic of any concrete tooling (e.g. pkgdata).

This is fair enough. But you need to think upfront about how producing
this data should be tested with just oe-core/poky and what use cases
it could have. Simple sanity check is ok, but improving debuginfod to
import the pre-computed values is much better. Maybe something else
too?

This also allows you to develop and publish the alternative service on
its own schedule and terms, if the code is not mature or BMW legal is
having a hard time signing off on making it public etc. We don't need
to see it, if there's a use case in core.

Alex



* [RFC PATCH v2 0/3] package: Extract GNU build ID during packaging
  2024-02-28  6:21 [RFC PATCH 0/1] package.bbclass: Expose list of split out debug files Philip Lorenz
                   ` (2 preceding siblings ...)
  2024-02-28  9:14 ` Richard Purdie
@ 2024-03-05 16:18 ` Philip Lorenz
  2024-03-06  9:27   ` [OE-core] " Alexander Kanavin
  2024-03-05 16:18 ` [RFC PATCH v2 1/3] oe-pkgdata-util: Add read-extended command Philip Lorenz
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Philip Lorenz @ 2024-03-05 16:18 UTC (permalink / raw)
  To: openembedded-core; +Cc: Philip Lorenz

This is the follow-up to "package.bbclass: Expose list of split out
debug files", adding the extraction of the build IDs directly into the
do_package task.

Build IDs are stored inside the "extended" section of the pkgdata and,
to enable easier testing, the "read-extended" command is added to
"oe-pkgdata-util".

Sequentially reading the build ID from ~8000 debug symbol files takes
approximately 90 seconds on my machine, and given that extraction
parallelises well, I deemed those figures low enough to enable this
feature by default without a configuration switch. Let me know if this
doesn't match your expectations.
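To illustrate why the per-file cost is small, here is a sketch of pulling a build ID out of raw ELF note data. It is not necessarily how the patch extracts it; build_id_from_notes is a hypothetical helper, and 4-byte note alignment and little-endian byte order are assumed.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type used for GNU build IDs


def _align4(n):
    """Round n up to the next multiple of 4 (assumed note alignment)."""
    return (n + 3) & ~3


def build_id_from_notes(data, byteorder="<"):
    """Return the hex build ID found in a raw ELF note blob, or None."""
    offset = 0
    while offset + 12 <= len(data):
        # Each note starts with three 32-bit words: name size, desc size, type.
        namesz, descsz, ntype = struct.unpack_from(byteorder + "III", data, offset)
        offset += 12
        name = data[offset:offset + namesz]
        offset += _align4(namesz)
        desc = data[offset:offset + descsz]
        offset += _align4(descsz)
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\0":
            return desc.hex()
    return None


# Synthetic note carrying a 4-byte build ID, for demonstration only.
note = struct.pack("<III", 4, 4, NT_GNU_BUILD_ID) + b"GNU\0" + bytes.fromhex("deadbeef")
build_id = build_id_from_notes(note)
print(build_id)
```

A real caller would first locate the .note.gnu.build-id section (or PT_NOTE segment) in the ELF file and pass its contents to this function.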

Philip Lorenz (3):
  oe-pkgdata-util: Add read-extended command
  package: Expose list of split out debug files
  packagedata: Extract GNU build ID during pkgdata creation

 meta/classes-global/package.bbclass     |  4 +++
 meta/lib/oe/package.py                  | 19 ++++++-------
 meta/lib/oe/packagedata.py              | 25 +++++++++++++++++
 meta/lib/oeqa/selftest/cases/pkgdata.py | 18 +++++++++++++
 scripts/oe-pkgdata-util                 | 36 +++++++++++++++++++++++++
 5 files changed, 93 insertions(+), 9 deletions(-)

-- 
2.44.0




* [RFC PATCH v2 1/3] oe-pkgdata-util: Add read-extended command
  2024-02-28  6:21 [RFC PATCH 0/1] package.bbclass: Expose list of split out debug files Philip Lorenz
                   ` (3 preceding siblings ...)
  2024-03-05 16:18 ` [RFC PATCH v2 0/3] package: Extract GNU build ID during packaging Philip Lorenz
@ 2024-03-05 16:18 ` Philip Lorenz
  2024-03-05 16:18 ` [RFC PATCH v2 2/3] package: Expose list of split out debug files Philip Lorenz
  2024-03-05 16:18 ` [RFC PATCH v2 3/3] packagedata: Extract GNU build ID during pkgdata creation Philip Lorenz
  6 siblings, 0 replies; 14+ messages in thread
From: Philip Lorenz @ 2024-03-05 16:18 UTC (permalink / raw)
  To: openembedded-core; +Cc: Philip Lorenz

So far, reading the "extended" data of a package stored within "pkgdata"
is not supported. Extend oe-pkgdata-util to support this use case.

For symmetry with `read-value` and `package-info`, it expects the runtime
package name as its argument. Passing in multiple packages is not
supported as this would require further processing by clients using this
command before the returned JSON payload can be parsed.

Signed-off-by: Philip Lorenz <philip.lorenz@bmw.de>
---
 meta/lib/oeqa/selftest/cases/pkgdata.py | 16 +++++++++++
 scripts/oe-pkgdata-util                 | 36 +++++++++++++++++++++++++
 2 files changed, 52 insertions(+)

diff --git a/meta/lib/oeqa/selftest/cases/pkgdata.py b/meta/lib/oeqa/selftest/cases/pkgdata.py
index d786c33018d..6c5b7a84f47 100644
--- a/meta/lib/oeqa/selftest/cases/pkgdata.py
+++ b/meta/lib/oeqa/selftest/cases/pkgdata.py
@@ -4,7 +4,9 @@
 # SPDX-License-Identifier: MIT
 #
 
+import json
 import os
+import pathlib
 import tempfile
 import fnmatch
 
@@ -225,3 +227,17 @@ class OePkgdataUtilTests(OESelftestTestCase):
         self.assertEqual(result.status, 2, "Status different than 2. output: %s" % result.output)
         currpos = result.output.find('usage: oe-pkgdata-util')
         self.assertTrue(currpos != -1, msg = "Test is Failed. Help is not Displayed in %s" % result.output)
+
+    def test_read_extended(self):
+        result = runCmd('oe-pkgdata-util read-extended libz-dbg')
+        extended_data = json.loads(result.output)
+
+        self.assertIn('files_info', extended_data, "Could not find key 'files_info' in '%s'" % extended_data)
+
+        files_info = extended_data['files_info']
+        libz_file_name = next((key for key in files_info.keys() \
+            if pathlib.Path(key).name.startswith('libz')), None)
+        self.assertIsNotNone(libz_file_name, "Couldn't find libz in '%s'" % files_info)
+
+        file_info = files_info[libz_file_name]
+        self.assertIn('size', file_info, "Couldn't find key 'size' in '%s'" % file_info)
diff --git a/scripts/oe-pkgdata-util b/scripts/oe-pkgdata-util
index 44ae40549ae..50be8e0bb60 100755
--- a/scripts/oe-pkgdata-util
+++ b/scripts/oe-pkgdata-util
@@ -16,6 +16,8 @@ import fnmatch
 import re
 import argparse
 import logging
+import pathlib
+import subprocess
 from collections import defaultdict, OrderedDict
 
 scripts_path = os.path.dirname(os.path.realpath(__file__))
@@ -206,6 +208,34 @@ def read_value(args):
         else:
             logger.debug("revlink %s does not exist", revlink)
 
+def read_extended(args):
+    if not args.pkg:
+        logger.error("No package specified")
+        sys.exit(1)
+
+    logger.debug("read-extended('%s', '%s')" % (args.pkgdata_dir, args.pkg))
+
+    pkgdata_dir = pathlib.Path(args.pkgdata_dir)
+    pkg_name = args.pkg.split('_')[0]
+
+    # Map runtime package name to recipe-world
+    runtimepkgpath = pkgdata_dir / "runtime-reverse" / pkg_name
+    recipe_pkg_name = runtimepkgpath.readlink().name
+
+    extendedpath = pkgdata_dir / "extended" / ("%s.json.zstd" % recipe_pkg_name)
+
+    if not extendedpath.exists():
+        logger.error("Extended package information '%s' does not exist", extendedpath)
+        sys.exit(1)
+
+    try:
+        info = subprocess.check_output(["zstdcat", extendedpath]).decode("utf-8")
+        print(info)
+    except subprocess.CalledProcessError as exc:
+        logger.error("Failed to decompress '%s': %s", extendedpath, exc, exc_info=exc)
+        sys.exit(1)
+
+
 def lookup_pkglist(pkgs, pkgdata_dir, reverse):
     if reverse:
         mappings = OrderedDict()
@@ -586,6 +616,12 @@ def main():
     parser_read_value.add_argument('-u', '--unescape', help='Expand escapes such as \\n', action='store_true')
     parser_read_value.set_defaults(func=read_value)
 
+    parser_read_extended = subparsers.add_parser('read-extended',
+                                                   help='Read extended pkgdata for a package',
+                                                   description='Outputs the extended data content of a package')
+    parser_read_extended.add_argument('pkg', help='Package name to look up')
+    parser_read_extended.set_defaults(func=read_extended)
+
     parser_glob = subparsers.add_parser('glob',
                                           help='Expand package name glob expression',
                                           description='Expands one or more glob expressions over the packages listed in pkglistfile')
-- 
2.44.0
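
A minimal sketch of how a client might consume the JSON payload printed by
`read-extended`, assuming the "files_info" layout described in this series.
The function name `index_build_ids` and the sample payload are illustrative
only and are not part of the patch. The "gnu_build_id" key is only present
once patch 3/3 is applied, so files without it are skipped.

```python
import json


def index_build_ids(payload):
    """Map GNU build ID -> file path for one read-extended JSON payload."""
    extended = json.loads(payload)
    index = {}
    for path, info in extended.get("files_info", {}).items():
        build_id = info.get("gnu_build_id")
        if build_id:  # files without debug symbols carry no build ID
            index[build_id] = path
    return index


# Hypothetical payload, shaped like the example in patch 3/3.
sample = json.dumps({
    "files_info": {
        "/lib/.debug/ld-linux-x86-64.so.2": {
            "gnu_build_id": "a165bf2a6c9c6a0818450293bd6bb66a316eaa4f",
            "size": 1170552,
        },
        "/usr/share/doc/README": {"size": 42},
    }
})

print(index_build_ids(sample))
```

In practice the payload would come from the standard output of
`oe-pkgdata-util read-extended <runtime package name>`.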




* [RFC PATCH v2 2/3] package: Expose list of split out debug files
  2024-02-28  6:21 [RFC PATCH 0/1] package.bbclass: Expose list of split out debug files Philip Lorenz
                   ` (4 preceding siblings ...)
  2024-03-05 16:18 ` [RFC PATCH v2 1/3] oe-pkgdata-util: Add read-extended command Philip Lorenz
@ 2024-03-05 16:18 ` Philip Lorenz
  2024-03-05 16:18 ` [RFC PATCH v2 3/3] packagedata: Extract GNU build ID during pkgdata creation Philip Lorenz
  6 siblings, 0 replies; 14+ messages in thread
From: Philip Lorenz @ 2024-03-05 16:18 UTC (permalink / raw)
  To: openembedded-core; +Cc: Philip Lorenz

As correctly determining the list of debug files based on existing
variables is non-trivial, this patch introduces the PKGDEBUGFILES
variable, which holds the paths of all files generated during stripping.

This list may then be used for further processing, such as extracting
the GNU build ID of all files containing debug symbols.

Signed-off-by: Philip Lorenz <philip.lorenz@bmw.de>
---
 meta/classes-global/package.bbclass |  4 ++++
 meta/lib/oe/package.py              | 19 ++++++++++---------
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/meta/classes-global/package.bbclass b/meta/classes-global/package.bbclass
index aa1eb5e901c..f2d358459f8 100644
--- a/meta/classes-global/package.bbclass
+++ b/meta/classes-global/package.bbclass
@@ -67,6 +67,10 @@ PACKAGE_DEPENDS += "rpm-native dwarfsrcfiles-native"
 # tools at rootfs build time.
 PACKAGE_WRITE_DEPS ??= ""
 
+# List of files containing debug symbols. The paths are rooted at their
+# destination path (e.g. /usr/lib/.debug instead of ${PKGD}/usr/lib/.debug)
+PKGDEBUGFILES = ""
+
 def legitimize_package_name(s):
     return oe.package.legitimize_package_name(s)
 
diff --git a/meta/lib/oe/package.py b/meta/lib/oe/package.py
index 587810bdafd..921a958ed1f 100644
--- a/meta/lib/oe/package.py
+++ b/meta/lib/oe/package.py
@@ -781,7 +781,7 @@ def splitdebuginfo(file, dvar, dv, d):
     # target system binary, the other contains any debugging information. The
     # two files are linked to reference each other.
     #
-    # return a mapping of files:debugsources
+    # return a mapping of files:debugfile:debugsources
 
     src = file[len(dvar):]
     dest = dv["libdir"] + os.path.dirname(src) + dv["dir"] + "/" + os.path.basename(src) + dv["append"]
@@ -791,7 +791,7 @@ def splitdebuginfo(file, dvar, dv, d):
     if file.endswith(".ko") and file.find("/lib/modules/") != -1:
         if oe.package.is_kernel_module_signed(file):
             bb.debug(1, "Skip strip on signed module %s" % file)
-            return (file, sources)
+            return (file, file, sources)
 
     # Split the file...
     bb.utils.mkdirhier(os.path.dirname(debugfile))
@@ -821,7 +821,7 @@ def splitdebuginfo(file, dvar, dv, d):
     if newmode:
         os.chmod(file, origmode)
 
-    return (file, sources)
+    return (file, debugfile, sources)
 
 def splitstaticdebuginfo(file, dvar, dv, d):
     # Unlike the function above, there is no way to split a static library
@@ -830,7 +830,7 @@ def splitstaticdebuginfo(file, dvar, dv, d):
     # We will then strip (preserving symbols) the static library in the
     # typical location.
     #
-    # return a mapping of files:debugsources
+    # return a mapping of files:debugfile:debugsources
 
     src = file[len(dvar):]
     dest = dv["staticlibdir"] + os.path.dirname(src) + dv["staticdir"] + "/" + os.path.basename(src) + dv["staticappend"]
@@ -861,7 +861,7 @@ def splitstaticdebuginfo(file, dvar, dv, d):
     if newmode:
         os.chmod(file, origmode)
 
-    return (file, sources)
+    return (file, debugfile, sources)
 
 def inject_minidebuginfo(file, dvar, dv, d):
     # Extract just the symbols from debuginfo into minidebuginfo,
@@ -1175,13 +1175,14 @@ def process_split_and_strip_files(d):
                 results = oe.utils.multiprocess_launch(splitstaticdebuginfo, staticlibs, d, extraargs=(dvar, dv, d))
             else:
                 for file in staticlibs:
-                    results.append( (file,source_info(file, d)) )
+                    results.append( (file,file,source_info(file, d)) )
 
-        d.setVar("PKGDEBUGSOURCES", {strip_pkgd_prefix(f): sorted(s) for f, s in results})
+        d.setVar("PKGDEBUGSOURCES", {strip_pkgd_prefix(f): sorted(s) for f, _, s in results})
+        d.setVar("PKGDEBUGFILES", [strip_pkgd_prefix(debugfile) for _, debugfile, _ in results])
 
         sources = set()
-        for r in results:
-            sources.update(r[1])
+        for _, _, sourcefile in results:
+            sources.update(sourcefile)
 
         # Hardlink our debug symbols to the other hardlink copies
         for ref in inodes:
-- 
2.44.0
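
The following sketch illustrates how the three-element result tuples
(file, debugfile, sources) introduced by this patch feed the two variables.
It is a standalone illustration, not the code from package.py: the PKGD
path, the file names, and the local `strip_pkgd_prefix` stand-in are
assumptions made for the example.

```python
PKGD = "/work/zlib/package"  # hypothetical ${PKGD} location


def strip_pkgd_prefix(path, pkgd=PKGD):
    """Stand-in for the helper of the same name used in the patch."""
    return path[len(pkgd):] if path.startswith(pkgd) else path


# Each entry mirrors the new return shape: (file, debugfile, sources).
results = [
    (PKGD + "/usr/lib/libz.so.1.3",
     PKGD + "/usr/lib/.debug/libz.so.1.3",
     {"/usr/src/debug/zlib/inflate.c", "/usr/src/debug/zlib/adler32.c"}),
    # A signed kernel module is not split, so the patch returns the file
    # itself in the debugfile position.
    (PKGD + "/lib/modules/6.6/extra/foo.ko",
     PKGD + "/lib/modules/6.6/extra/foo.ko",
     set()),
]

pkgdebugsources = {strip_pkgd_prefix(f): sorted(s) for f, _, s in results}
pkgdebugfiles = [strip_pkgd_prefix(dbg) for _, dbg, _ in results]

print(pkgdebugfiles)
```

Both collections are rooted at the destination path, matching the comment
added to package.bbclass.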




* [RFC PATCH v2 3/3] packagedata: Extract GNU build ID during pkgdata creation
  2024-02-28  6:21 [RFC PATCH 0/1] package.bbclass: Expose list of split out debug files Philip Lorenz
                   ` (5 preceding siblings ...)
  2024-03-05 16:18 ` [RFC PATCH v2 2/3] package: Expose list of split out debug files Philip Lorenz
@ 2024-03-05 16:18 ` Philip Lorenz
  6 siblings, 0 replies; 14+ messages in thread
From: Philip Lorenz @ 2024-03-05 16:18 UTC (permalink / raw)
  To: openembedded-core; +Cc: Philip Lorenz

Extract the GNU build ID from all files containing debug symbols and
store it within the "extended" package information of pkgdata as an
attribute of the files contained within a package - e.g.

{
  "files_info": {
    "/lib/.debug/ld-linux-x86-64.so.2": {
      "gnu_build_id": "a165bf2a6c9c6a0818450293bd6bb66a316eaa4f",
      "size":1170552
    }
  }
}

Tools can then consume this data for different purposes, such as building
a build ID index into the generated package feed or preseeding the
database of a debug info server such as debuginfod.

Sequentially reading out the GNU build ID of ~8000 debug symbol files
(from a core-image-minimal build) took approximately 90 seconds on my
machine. Given that the read out in a typical build will be highly
parallel, I deemed this figure low enough to simply enable it without
an additional configuration flag.

Signed-off-by: Philip Lorenz <philip.lorenz@bmw.de>
---
 meta/lib/oe/packagedata.py              | 25 +++++++++++++++++++++++++
 meta/lib/oeqa/selftest/cases/pkgdata.py |  2 ++
 2 files changed, 27 insertions(+)

diff --git a/meta/lib/oe/packagedata.py b/meta/lib/oe/packagedata.py
index 2d1d6ddeb75..3404c2a5cd2 100644
--- a/meta/lib/oe/packagedata.py
+++ b/meta/lib/oe/packagedata.py
@@ -255,6 +255,27 @@ fi
         fd.write("PACKAGES: %s\n" % packages)
 
     pkgdebugsource = d.getVar("PKGDEBUGSOURCES") or []
+    pkgdebugfiles = d.getVar("PKGDEBUGFILES") or []
+
+    pkgd = d.getVar("PKGD")
+    readelf = d.getVar("READELF")
+
+    def extract_gnu_build_id(file):
+        import subprocess
+
+        cmd = "%s -n '%s' | grep '^    Build ID: '" % (readelf, pkgd + file)
+        try:
+            result = subprocess.check_output(cmd, shell=True)
+            # If grep hadn't matched it would've returned a non-zero exit code
+            # and the CalledProcessError would've been raised. It is therefore
+            # safe to assume that the output has the format "    Build ID: "
+            gnu_build_id = result[result.rfind(b" ") + 1:].rstrip().decode()
+            return (file, gnu_build_id)
+        except subprocess.CalledProcessError:
+            return (None, None)
+
+    pkg_debug_build_ids = { debug_file: build_id \
+        for debug_file, build_id in oe.utils.multiprocess_launch(extract_gnu_build_id, pkgdebugfiles, d) if debug_file}
 
     pn = d.getVar('PN')
     global_variants = (d.getVar('MULTILIB_GLOBAL_VARIANTS') or "").split()
@@ -300,6 +321,10 @@ fi
             if fpath in pkgdebugsource:
                 extended_data["files_info"][fpath]['debugsrc'] = pkgdebugsource[fpath]
                 del pkgdebugsource[fpath]
+            if fpath in pkg_debug_build_ids:
+                extended_data["files_info"][fpath]['gnu_build_id'] = pkg_debug_build_ids[fpath]
+                del pkg_debug_build_ids[fpath]
+
 
         d.setVar('FILES_INFO:' + pkg , json.dumps(files, sort_keys=True))
 
diff --git a/meta/lib/oeqa/selftest/cases/pkgdata.py b/meta/lib/oeqa/selftest/cases/pkgdata.py
index 6c5b7a84f47..8b993261055 100644
--- a/meta/lib/oeqa/selftest/cases/pkgdata.py
+++ b/meta/lib/oeqa/selftest/cases/pkgdata.py
@@ -241,3 +241,5 @@ class OePkgdataUtilTests(OESelftestTestCase):
 
         file_info = files_info[libz_file_name]
         self.assertIn('size', file_info, "Couldn't find key 'size' in '%s'" % file_info)
+        self.assertIn('gnu_build_id', file_info, "Couldn't find key 'gnu_build_id' in '%s'" % file_info)
+        self.assertGreater(len(file_info['gnu_build_id']), 0)
-- 
2.44.0
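
The patch extracts the build ID with a `readelf -n ... | grep` shell
pipeline. As an alternative sketch (not what the patch does), the same
line could be picked out of the `readelf -n` text in Python, which avoids
quoting file names for the shell. The function name `parse_build_id` and
the sample note text are assumptions for illustration.

```python
import re

# Matches the "    Build ID: <hex>" line printed by readelf -n.
_BUILD_ID_RE = re.compile(r"^\s*Build ID:\s*([0-9a-fA-F]+)\s*$", re.MULTILINE)


def parse_build_id(readelf_notes):
    """Return the GNU build ID found in readelf -n output, or None."""
    match = _BUILD_ID_RE.search(readelf_notes)
    return match.group(1) if match else None


# Hypothetical readelf -n output for a file with a build ID note.
sample_notes = (
    "Displaying notes found in: .note.gnu.build-id\n"
    "  Owner                Data size \tDescription\n"
    "  GNU                  0x00000014\tNT_GNU_BUILD_ID (unique build ID bitstring)\n"
    "    Build ID: a165bf2a6c9c6a0818450293bd6bb66a316eaa4f\n"
)

print(parse_build_id(sample_notes))
```

The text passed in would be the decoded output of running ${READELF} with
`-n` on the debug file, for example via `subprocess.check_output` with an
argument list instead of `shell=True`.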




* Re: [OE-core] [RFC PATCH v2 0/3] package: Extract GNU build ID during packaging
  2024-03-05 16:18 ` [RFC PATCH v2 0/3] package: Extract GNU build ID during packaging Philip Lorenz
@ 2024-03-06  9:27   ` Alexander Kanavin
  0 siblings, 0 replies; 14+ messages in thread
From: Alexander Kanavin @ 2024-03-06  9:27 UTC (permalink / raw)
  To: Philip Lorenz; +Cc: openembedded-core

The patchset looks okay, but it'll probably have to wait until after LTS.

Alex

On Tue, 5 Mar 2024 at 17:19, Philip Lorenz <philip.lorenz@bmw.de> wrote:
>
> This is the follow up to "package.bbclass: Expose list of split out
> debug files" adding the extraction of the build IDs directly into the
> do_package task.
>
> Build IDs are stored inside the "extended" section of the pkgdata and to
> enable easier testing the "read-extended" command is added to
> "oe-pkgdata-util".
>
> Sequentially reading the build ID from ~8000 debug symbol files takes
> approximately 90 seconds on my machine, and given that extraction
> parallelises well, I deemed those figures low enough to enable this
> feature by default without a configuration switch. Let me know if this
> doesn't match your expectations.
>
> Philip Lorenz (3):
>   oe-pkgdata-util: Add read-extended command
>   package: Expose list of split out debug files
>   packagedata: Extract GNU build ID during pkgdata creation
>
>  meta/classes-global/package.bbclass     |  4 +++
>  meta/lib/oe/package.py                  | 19 ++++++-------
>  meta/lib/oe/packagedata.py              | 25 +++++++++++++++++
>  meta/lib/oeqa/selftest/cases/pkgdata.py | 18 +++++++++++++
>  scripts/oe-pkgdata-util                 | 36 +++++++++++++++++++++++++
>  5 files changed, 93 insertions(+), 9 deletions(-)
>
> --
> 2.44.0
>
>



end of thread, other threads:[~2024-03-06  9:27 UTC | newest]

Thread overview: 14+ messages
2024-02-28  6:21 [RFC PATCH 0/1] package.bbclass: Expose list of split out debug files Philip Lorenz
2024-02-28  6:21 ` [RFC PATCH 1/1] " Philip Lorenz
2024-02-28  7:41 ` [OE-core] [RFC PATCH 0/1] " Alexander Kanavin
2024-02-28 15:22   ` Philip Lorenz
2024-02-28  9:14 ` Richard Purdie
2024-02-28 15:41   ` Philip Lorenz
2024-02-28 17:40     ` Alexander Kanavin
2024-02-29  8:20       ` Philip Lorenz
2024-02-29  8:54         ` Alexander Kanavin
2024-03-05 16:18 ` [RFC PATCH v2 0/3] package: Extract GNU build ID during packaging Philip Lorenz
2024-03-06  9:27   ` [OE-core] " Alexander Kanavin
2024-03-05 16:18 ` [RFC PATCH v2 1/3] oe-pkgdata-util: Add read-extended command Philip Lorenz
2024-03-05 16:18 ` [RFC PATCH v2 2/3] package: Expose list of split out debug files Philip Lorenz
2024-03-05 16:18 ` [RFC PATCH v2 3/3] packagedata: Extract GNU build ID during pkgdata creation Philip Lorenz
