* Issue With Kernel Changes To Core Dump Collection (Kernel Bug...?)
@ 2022-01-21  1:31 Bill Messmer
  2022-01-21 18:17 ` Randy Dunlap
From: Bill Messmer @ 2022-01-21  1:31 UTC
  To: linux-kernel

Hello,

It has been my understanding for some time that the kernel config option CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS (and the corresponding bit 4 of the coredump filter) was added to ensure that the GNU build-id of ELF objects is included in core dumps.  The option's help text in Kconfig.binfmt even alludes to this.
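
For anyone checking whether that filter bit is set for a process, a minimal sketch follows. It assumes the documented /proc/<pid>/coredump_filter interface, which reports the mask as hexadecimal text; the helper names are illustrative and not part of any existing tool.

```python
# Bit 4 of the coredump filter controls dumping of ELF headers (see core(5)).
ELF_HEADERS_BIT = 4


def elf_headers_enabled(filter_text):
    """Return True if bit 4 is set in a coredump_filter value given as hex text."""
    mask = int(filter_text.strip(), 16)
    return bool(mask & (1 << ELF_HEADERS_BIT))


def read_coredump_filter(pid="self"):
    """Read the raw filter text for a process (Linux only)."""
    with open(f"/proc/{pid}/coredump_filter") as f:
        return f.read()
```

On a Linux system, elf_headers_enabled(read_coredump_filter()) reports the setting for the current process.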

I am trying to understand why, in the 5.10+ kernels, the kernel changed from checking whether a given memory mapping begins with an ELF header to checking whether the backing inode is executable when deciding whether to include the mapping's first page.  The change in question:

	github.com/torvalds/linux/commit/429a22e776a2b9f85a2b9c53d8e647598b553dd1

In many distributions (e.g.: Ubuntu), the shared objects in /usr/lib and elsewhere are not marked as executable.  One of the net effects here is that the first page of shared objects on these distributions is no longer captured in core dumps.
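
To make the difference between the two checks concrete, here is a rough userspace sketch. It is not the kernel code: it contrasts a content-based test (does the file start with the ELF magic?) with a mode-based test (is any execute bit set?), and lists files that pass the first but fail the second. All function names are hypothetical.

```python
import os

ELF_MAGIC = b"\x7fELF"


def starts_with_elf_magic(path):
    """Content-based test: does the file begin with the ELF magic bytes?"""
    with open(path, "rb") as f:
        return f.read(4) == ELF_MAGIC


def has_any_exec_bit(path):
    """Mode-based test: is any user/group/other execute bit set?"""
    return bool(os.stat(path).st_mode & 0o111)


def elf_files_without_exec_bit(directory):
    """List regular files in `directory` that are ELF but not marked executable."""
    hits = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        if starts_with_elf_magic(path) and not has_any_exec_bit(path):
            hits.append(path)
    return hits
```

Running elf_files_without_exec_bit("/usr/lib/x86_64-linux-gnu") on an affected system would be one way to see which libraries a mode-based check skips.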

A core dump taken on Ubuntu 21.10 (with the 5.13 kernel) will, by default, not include these pages:

  LOAD           0x0000000000007000 0x00007f375855f000 0x0000000000000000
                 0x0000000000000000 0x000000000002c000  R      0x1000

   0x00007f375855f000  0x00007f375858b000  0x0000000000000000
        /usr/lib/x86_64-linux-gnu/libc.so.6

Doing a quick "sudo chmod +x /usr/lib/x86_64-linux-gnu/libc.so.6" and repeating the dump shows that the page is then included:

  LOAD           0x0000000000007000 0x00007fefd5282000 0x0000000000000000
                 0x0000000000001000 0x000000000002c000  R      0x1000

    0x00007fefd5282000  0x00007fefd52ae000  0x0000000000000000
        /usr/lib/x86_64-linux-gnu/libc.so.6
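
The listings above appear to be in readelf program-header format, where the second row of each LOAD entry starts with the file size: 0 in the first dump, 0x1000 in the second. A small sketch for finding such empty LOAD segments in a core file is below. It assumes a little-endian ELF64 image and does not handle the extended program-header count used by cores with very many segments; the function names are made up for illustration.

```python
import struct

PT_LOAD = 1
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # 64-byte ELF64 file header
ELF64_PHDR = struct.Struct("<IIQQQQQQ")          # 56-byte ELF64 program header


def load_segments(core):
    """Yield (vaddr, filesz, memsz) for each PT_LOAD header in an ELF64 LE image."""
    ehdr = ELF64_EHDR.unpack_from(core, 0)
    ident, e_phoff, e_phentsize, e_phnum = ehdr[0], ehdr[5], ehdr[9], ehdr[10]
    # ident[4] == 2 means ELFCLASS64, ident[5] == 1 means little-endian.
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    for i in range(e_phnum):
        fields = ELF64_PHDR.unpack_from(core, e_phoff + i * e_phentsize)
        p_type, _flags, _offset, vaddr, _paddr, filesz, memsz, _align = fields
        if p_type == PT_LOAD:
            yield vaddr, filesz, memsz


def undumped_segments(core):
    """Return vaddrs of LOAD segments that occupy memory but store no bytes in the file."""
    return [v for v, filesz, memsz in load_segments(core) if filesz == 0 and memsz > 0]
```

Given the bytes of a core file, undumped_segments() returns the start addresses whose contents were left out, which can then be matched against the file mappings listed in the core's notes.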

Prior to running with 5.10+ kernels, I was always seeing the first page of shared objects (and the contained build-id) within core dumps (assuming the proper kernel config and core dump filter bits).  Not any longer.

The reason I ask this is that, as more teams here at Microsoft have products running on Linux (or in Linux containers), we have been pushing the crash reports for those up through the same post-mortem crash analysis infrastructure that we use for Windows.  That means that what has traditionally been the Windows debugger (e.g.: WinDbg) has, for some time, been able to open, debug, and analyze various Linux post-mortem crash formats.  Part of doing this on a post-mortem basis requires finding the original images and debug information for the executables and shared objects referenced in those core dumps.  Whether we do that via our own symbol servers or via a debuginfod service -- the post-mortem debugger needs access to the build-ids of those objects.
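
As an illustration of what a debugger has to recover from that first page, the sketch below walks raw ELF note data and returns the GNU build-id. It assumes little-endian notes with 4-byte padding and takes the note bytes as input; locating the note segment through the program headers is left out, and the function names are hypothetical rather than taken from any debugger.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type for the GNU build-id, owner name "GNU"


def _align4(n):
    """Round n up to the next multiple of 4 (note name/desc padding)."""
    return (n + 3) & ~3


def find_build_id(notes):
    """Return the GNU build-id as a hex string from raw note data, or None."""
    off = 0
    while off + 12 <= len(notes):
        namesz, descsz, ntype = struct.unpack_from("<III", notes, off)
        name_off = off + 12
        desc_off = name_off + _align4(namesz)
        if desc_off + descsz > len(notes):
            break  # truncated note, e.g. the page was not captured in full
        name = notes[name_off:name_off + namesz]
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\0":
            return notes[desc_off:desc_off + descsz].hex()
        off = desc_off + _align4(descsz)
    return None
```

When the first page of a shared object is missing from the core, there are no note bytes to feed into a routine like this, so the lookup on a symbol server or debuginfod service has no key to use.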

Until recently, finding these from a core dump has been stable and working quite well.  Of late, however, we have been seeing a number of crash reports (e.g.: from Debian or Ubuntu containers) where we can no longer find images & symbols based on the core dumps because this kernel change has caused the first page of shared object files to not be captured in core dumps.  I don't know how many post-mortem Linux crash analysis solutions this is affecting...  

Was the change here really the intent...?  or is this a kernel bug?

Sincerely,

Bill Messmer
wmessmer@microsoft.com


* Re: Issue With Kernel Changes To Core Dump Collection (Kernel Bug...?)
  2022-01-21  1:31 Issue With Kernel Changes To Core Dump Collection (Kernel Bug...?) Bill Messmer
@ 2022-01-21 18:17 ` Randy Dunlap
  2022-01-21 18:39   ` Jann Horn
From: Randy Dunlap @ 2022-01-21 18:17 UTC
  To: Bill Messmer, linux-kernel, Jann Horn

[add the patch author, Jann]


On 1/20/22 17:31, Bill Messmer wrote:
> Hello,
> 
> It has been my understanding for some time that the kernel config option CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS (and the corresponding bit 4 of the coredump filter) was added to ensure that the GNU build-id of ELF objects is included in core dumps.  The option's help text in Kconfig.binfmt even alludes to this.
> 
> I am trying to understand why, in the 5.10+ kernels, the kernel changed from checking whether a given memory mapping begins with an ELF header to checking whether the backing inode is executable when deciding whether to include the mapping's first page.  The change in question:
> 
> 	github.com/torvalds/linux/commit/429a22e776a2b9f85a2b9c53d8e647598b553dd1
> 

Bill,
You should send email(s) to the relevant people if you can identify them.
LKML is a huge pipe (hose) and people don't normally browse it.  :)


> [...]

-- 
~Randy


* Re: Issue With Kernel Changes To Core Dump Collection (Kernel Bug...?)
  2022-01-21 18:17 ` Randy Dunlap
@ 2022-01-21 18:39   ` Jann Horn
From: Jann Horn @ 2022-01-21 18:39 UTC
  To: Bill Messmer; +Cc: linux-kernel, Randy Dunlap, Linus Torvalds

On Fri, Jan 21, 2022 at 7:18 PM Randy Dunlap <rdunlap@infradead.org> wrote:
> On 1/20/22 17:31, Bill Messmer wrote:
> > Hello,
> >
> > It has been my understanding for some time that the kernel config option CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS (and the corresponding bit 4 of the coredump filter) was added to ensure that the GNU build-id of ELF objects is included in core dumps.  The option's help text in Kconfig.binfmt even alludes to this.
> >
> > I am trying to understand why, in the 5.10+ kernels, the kernel changed from checking whether a given memory mapping begins with an ELF header to checking whether the backing inode is executable when deciding whether to include the mapping's first page.  The change in question:
> >
> >       github.com/torvalds/linux/commit/429a22e776a2b9f85a2b9c53d8e647598b553dd1

As the commit message says, it was an attempt to avoid a deadlock
without making the code overly complicated. Clearly that didn't go as
planned...

> > In many distributions (e.g.: Ubuntu), the shared objects in /usr/lib and elsewhere are not marked as executable.

Urgh, crap. I'm looking around on my Debian box now, and I also see
that some libraries (like ld.so and libc) are marked executable, but
many others are not...

[...]
> > Was the change here really the intent...?  or is this a kernel bug?

Yeah, that's a bug. Linus suggested it as a way to simplify my
original patch (https://lore.kernel.org/all/CAHk-=wiOqR-4jXpPe-5PBKSCwQQFDaiJwkJr6ULwhcN8DJoG0A@mail.gmail.com/)
and it seemed like a good idea to me...

I guess the good news is that the original patch
https://lore.kernel.org/all/20200818061239.29091-5-jannh@google.com/
already has the code for doing it properly, so it should be pretty
straightforward to fix this by just pasting over some bits from the
old patch... I'll try to get around to that soon.

This would be so much nicer if the kernel actually knew what is a
library mapping and what isn't... oh well.
