From: Stephen Brennan <stephen.s.brennan@oracle.com>
To: "Petr Tesařík" <petr@tesarici.cz>
Cc: Omar Sandoval <osandov@osandov.com>, linux-debuggers@vger.kernel.org
Subject: Re: Segmentation fault with drgn + libkdumpfile
Date: Tue, 09 Jan 2024 17:40:15 -0800
Message-ID: <87sf36yni8.fsf@oracle.com>
In-Reply-To: <20240109100609.4e956beb@meshulam.tesarici.cz>

Petr Tesařík <petr@tesarici.cz> writes:

> On Mon, 8 Jan 2024 21:40:08 +0100
> Petr Tesařík <petr@tesarici.cz> wrote:
>
>> On Fri, 05 Jan 2024 13:53:15 -0800
>> Stephen Brennan <stephen.s.brennan@oracle.com> wrote:
>> 
>> > Petr Tesařík <petr@tesarici.cz> writes:  
>> > > On Fri, 05 Jan 2024 10:38:16 -0800
>> > > Stephen Brennan <stephen.s.brennan@oracle.com> wrote:
>> > >    
>> > >> Hi Petr,
>> > >> 
>> > >> I recently encountered a segmentation fault with libkdumpfile & drgn
>> > >> which appears to be related to the cache implementation. I've included
>> > >> the stack trace at the end of this message, since it's a bit long. The
>> > >> exact issue occurred with a test vmcore, which I could probably share
>> > >> with you privately if you'd like. In any case, the reproducer is
>> > >> fairly straightforward in drgn code:
>> > >> 
>> > >> from drgn.helpers.linux.pid import for_each_task
>> > >> 
>> > >> for t in for_each_task(prog):
>> > >>     prog.stack_trace(t)
>> > >> for t in for_each_task(prog):
>> > >>     prog.stack_trace(t)
>> > >> 
>> > >> The repetition is required: the segfault only occurs on the second
>> > >> iteration of the loop, which, in hindsight, is a textbook sign that
>> > >> the issue has to do with caching. I'd expect the issue is specific to
>> > >> this vmcore; it doesn't reproduce on others.
>> > >> 
>> > >> I stuck that into a git bisect script and bisected to the libkdumpfile
>> > >> commit that introduced it:
>> > >> 
>> > >> commit 487a8042ea5da580e1fdb5b8f91c8bd7cad05cd6
>> > >> Author: Petr Tesarik <petr@tesarici.cz>
>> > >> Date:   Wed Jan 11 22:53:01 2023 +0100
>> > >> 
>> > >>     Cache: Calculate eprobe in reinit_entry()
>> > >> 
>> > >>     If this function is called to reuse a ghost entry, the probe list
>> > >>     has not been walked yet, so eprobe is left uninitialized.
>> > >> 
>> > >>     This passed the test case, because the correct old value was left
>> > >>     on stack. Modify the test case to poison the stack.
>> > >> 
>> > >>     Signed-off-by: Petr Tesarik <petr@tesarici.cz>
>> > >> 
>> > >>  src/kdumpfile/cache.c      |  6 +++++-
>> > >>  src/kdumpfile/test-cache.c | 13 +++++++++++++
>> > >>  2 files changed, 18 insertions(+), 1 deletion(-)    
>> > >
>> > > This looks like a red herring to me. The cache most likely continues in
>> > > a corrupted state without this commit, which may mask the issue (until
>> > > it resurfaces later).    
>> > 
>> > I see, that makes a lot of sense.
>> >   
>> > >> I haven't yet tried to debug the logic of the cache implementation and
>> > >> create a patch. I'm totally willing to try that, but I figured I would
>> > >> send this report to you first, to see if there's something obvious that
>> > >> sticks out to your eyes.    
>> > >
>> > > No, but I should be able to recreate the issue if I get a log of the
>> > > cache API calls:
>> > >
>> > > - cache_alloc() - to know the number of elements
>> > > - cache_get_entry()
>> > > - cache_put_entry()
>> > > - cache_insert()
>> > > - cache_discard()
>> > > - cache_flush() - not likely after initialization, but...    
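>> > >
>> > > A simple fprintf at the top of each of these should be enough, e.g.
>> > > (untested, and assuming the key is an integer type):
>> > >
>> > > 	fprintf(stderr, "%p: cache_get_entry(%llx)\n",
>> > > 		(void *) cache, (unsigned long long) key);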
>> > 
>> > I went ahead and logged each of these calls as you suggested. I tried
>> > to log them at the beginning of each function, always including the
>> > cache pointer, the cache_entry, and the key. I took the resulting log,
>> > filtered it down to just the most recently logged cache prior to the
>> > crash, compressed it, and attached it. For completeness, the patch
>> > I used is below (applies to tip branch 8254897 ("Merge pull request #78
>> > from fweimer-rh/c99")).
>> > 
>> > I'll also see if I can reproduce it based on the log.  
>> 
>> Thank you for the log. I haven't had much time to look at it, but the
>> first line is a good hint already:
>> 
>> 0x56098b68c4c0: cache_alloc(1024, 0)
>> 
>> Zero size means the data pointers are managed by the caller, so this
>> must be the cache of mmap()'ed segments. That's the only cache which
>> installs a cleanup callback with set_cache_entry_cleanup(). There is
>> only one call to the cleanup callback for evicted entries in cache.c:
>> 
>> 		/* Get an unused cached entry. */
>> 		if (cs->nuprobe != 0 &&
>> 		    (cs->nuprec == 0 || cache->nprobe + bias > cache->dprobe))
>> 			evict = evict_probe(cache, cs);
>> 		else
>> 			evict = evict_prec(cache, cs);
>> 		if (cache->entry_cleanup)
>> 			cache->entry_cleanup(cache->cleanup_data, evict);
>> 
>> Entries can be evicted either from the probe partition or from the
>> precious partition, and that might be relevant. Could you please re-run
>> and log which partition the evicted entry comes from?
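>> 
>> E.g. something like this (an untested sketch on top of the code above):
>> 
>> 		if (cs->nuprobe != 0 &&
>> 		    (cs->nuprec == 0 || cache->nprobe + bias > cache->dprobe)) {
>> 			evict = evict_probe(cache, cs);
>> 			fprintf(stderr, "%p: evict probe %p\n",
>> 				(void *) cache, (void *) evict);
>> 		} else {
>> 			evict = evict_prec(cache, cs);
>> 			fprintf(stderr, "%p: evict prec %p\n",
>> 				(void *) cache, (void *) evict);
>> 		}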
>
> I found some time this morning, and that logging wouldn't help after
> all. Because of a bug in fcache_new(), the number of elements in the
> cache is big enough that cache entries are never evicted in your case.
> It's quite weird to hit a cache metadata bug after all elements have
> been inserted. FWIW, I am not able to reproduce the bug by replaying
> the logged file read pattern.
>
> Since you have a reliable reproducer, it cannot be a Heisenbug. But it
> could be caused by the other cache - the cache of decompressed pages.
> Do you know for sure that lzo1x_decompress_safe() crashes while trying
> to _read_ from the input buffer, and not while trying to _write_ to the
> output buffer?

Hi Petr,

Sorry for the delay here; I got pulled into other issues and am trying
to attend to all my work in a round-robin fashion :)

The fault is definitely in lzo1x_decompress_safe() *writing* to address
0. I fetched debuginfo for all the necessary libraries and we see the
following stack trace:

%<-----------------------
#0  0x00007fcd9adddef3 in lzo1x_decompress_safe (in=<optimized out>,
    in_len=<optimized out>, out=0x0, out_len=0x7ffdee2c1388, wrkmem=<optimized out>)
    at src/lzo1x_d.ch:120
#1  0x00007fcd9ae25be1 in diskdump_read_page (pio=0x7ffdee2c1590) at diskdump.c:584
#2  0x00007fcd9ae32d4d in _kdumpfile_priv_cache_get_page (pio=0x7ffdee2c1590,
    fn=0x7fcd9ae257ae <diskdump_read_page>) at read.c:69
#3  0x00007fcd9ae25e44 in diskdump_get_page (pio=0x7ffdee2c1590) at diskdump.c:647
#4  0x00007fcd9ae32be0 in get_page (pio=0x7ffdee2c1590)
    at /home/stepbren/repos/libkdumpfile/src/kdumpfile/kdumpfile-priv.h:1512
#5  0x00007fcd9ae32ed4 in get_page_xlat (pio=0x7ffdee2c1590) at read.c:126
#6  0x00007fcd9ae32f22 in get_page_maybe_xlat (pio=0x7ffdee2c1590) at read.c:137
#7  0x00007fcd9ae32fb1 in _kdumpfile_priv_read_locked (ctx=0x55745bfca8f0,
    as=KDUMP_KVADDR, addr=18446612133360081960, buffer=0x7ffdee2c17df,
    plength=0x7ffdee2c1698) at read.c:169
#8  0x00007fcd9ae330dd in kdump_read (ctx=0x55745bfca8f0, as=KDUMP_KVADDR,
    addr=18446612133360081960, buffer=0x7ffdee2c17df, plength=0x7ffdee2c1698)
    at read.c:196
#9  0x00007fcd9afb0cc4 in drgn_read_kdump (buf=0x7ffdee2c17df,
    address=18446612133360081960, count=4, offset=18446612133360081960,
    arg=0x55745bfca8f0, physical=false) at ../../libdrgn/kdump.c:73
%<-----------------------

In frame 1, where we call the decompressor:

%<-----------------------
(gdb) frame 1
#1  0x00007fcd9ae25be1 in diskdump_read_page (pio=0x7ffdee2c1590) at diskdump.c:584
584                     int ret = lzo1x_decompress_safe(fch.data, pd.size,
(gdb) list
579                     if (ret != KDUMP_OK)
580                             return ret;
581             } else if (pd.flags & DUMP_DH_COMPRESSED_LZO) {
582     #if USE_LZO
583                     lzo_uint retlen = get_page_size(ctx);
584                     int ret = lzo1x_decompress_safe(fch.data, pd.size,
585                                                     pio->chunk.data,
586                                                     &retlen,
587                                                     LZO1X_MEM_DECOMPRESS);
588                     fcache_put_chunk(&fch);
(gdb) p retlen
$7 = 0
(gdb) p pio->chunk.data
$8 = (void *) 0x0
(gdb) p fch.data
$9 = (void *) 0x7fcd7cc33da4
(gdb) p pd.size
$10 = 816
%<-----------------------
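
So the cache handed the decompressor a NULL output buffer. For what
it's worth, a defensive check just before the call would at least turn
the crash into a clean error. Here's a sketch (untested, and assuming
the internal set_error() helper and KDUMP_ERR_CORRUPT are appropriate
at this point in diskdump_read_page()):

%<-----------------------
		/* Hypothetical guard, not a proposed fix: bail out
		 * cleanly if the cache entry has no data buffer. */
		if (!pio->chunk.data) {
			fcache_put_chunk(&fch);
			return set_error(ctx, KDUMP_ERR_CORRUPT,
					 "cache entry has no data buffer");
		}
		int ret = lzo1x_decompress_safe(fch.data, pd.size,
						pio->chunk.data, &retlen,
						LZO1X_MEM_DECOMPRESS);
%<-----------------------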

As far as I can tell, pio->chunk.data comes directly from the
cache_get_page() function in frame 2, and the cache entry it returned
already has data = 0x0:

%<-----------------------
(gdb) up
#2  0x00007fcd9ae32d4d in _kdumpfile_priv_cache_get_page (pio=0x7ffdee2c1590,
    fn=0x7fcd9ae257ae <diskdump_read_page>) at read.c:69
69              ret = fn(pio);
(gdb) list
64              pio->chunk.data = entry->data;
65              pio->chunk.embed_fces->ce = entry;
66              if (cache_entry_valid(entry))
67                      return KDUMP_OK;
68
69              ret = fn(pio);
70              mutex_lock(&ctx->shared->cache_lock);
71              if (ret == KDUMP_OK)
72                      cache_insert(pio->chunk.embed_fces->cache, entry);
73              else
(gdb) p *entry
$11 = {key = 1045860353, state = cs_precious, next = 626, prev = 626, refcnt = 1,
  data = 0x0}
(gdb) p *pio
$12 = {ctx = 0x55745bfca8f0, addr = {addr = 1045860352, as = ADDRXLAT_MACHPHYSADDR},
  chunk = {data = 0x0, nent = 1, {embed_fces = {{data = 0xffff880ff1470788,
          len = 140728599320032, ce = 0x55745c1003d8, cache = 0x55745c0fb540}, {
          data = 0x55745bfd42f0, len = 140728599320112,
          ce = 0x7fcd9ae330ef <kdump_read+102>, cache = 0xffff88003e569c28}},
      fces = 0xffff880ff1470788}}}
%<-----------------------

And here is the cache structure; note that nprec + nprobe + ninflight =
1020 + 3 + 1 happens to equal cap = 1024:

%<-----------------------
(gdb) p *pio->chunk.embed_fces->cache
$16 = {split = 487, nprec = 1020, ngprec = 248, nprobe = 3, ngprobe = 239, dprobe = 2,
  cap = 1024, inflight = 626, ninflight = 1, hits = {number = 168473, address = 168473,
    string = 0x29219 <error: Cannot access memory at address 0x29219>,
    bitmap = 0x29219, blob = 0x29219}, misses = {number = 1913, address = 1913,
    string = 0x779 <error: Cannot access memory at address 0x779>, bitmap = 0x779,
    blob = 0x779}, elemsize = 4096, data = 0x7fcd997fe010, entry_cleanup = 0x0,
  cleanup_data = 0x0, ce = 0x55745c0fb598}
%<-----------------------
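
One more observation, with the caveat that I haven't verified how
cache.c assigns data pointers: if a live entry's data were simply a
fixed slot carved out of cache->data (elemsize 4096, base
0x7fcd997fe010), then this entry, which appears to sit at index 626
(next == prev == inflight == 626, and it is the only in-flight entry),
should point well away from NULL:

%<-----------------------
/* Back-of-the-envelope check of where entry->data "should" point,
 * assuming (unverified!) that live entries get fixed-size slots
 * inside cache->data. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uintptr_t base = 0x7fcd997fe010; /* cache->data from gdb */
	size_t elemsize = 4096;          /* cache->elemsize */
	size_t idx = 626;                /* apparent entry index */

	printf("expected entry->data: 0x%" PRIxPTR "\n",
	       base + idx * elemsize);  /* prints 0x7fcd99a70010 */
	return 0;
}
%<-----------------------

So either that assumption is wrong, or this entry never had a data
pointer assigned at all; for example, a ghost entry that was reused
without being fully reinitialized, which is exactly the code path the
bisected commit touched.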

Thanks for looking into this! I'll continue investigating on my end as
well.

Stephen

Thread overview: 11+ messages
2024-01-05 18:38 Segmentation fault with drgn + libkdumpfile Stephen Brennan
2024-01-05 19:23 ` Petr Tesařík
2024-01-05 21:53   ` Stephen Brennan
2024-01-08 20:40     ` Petr Tesařík
2024-01-09  9:06       ` Petr Tesařík
2024-01-10  1:40         ` Stephen Brennan [this message]
2024-01-10  8:36           ` Petr Tesařík
2024-01-10 13:49             ` Petr Tesařík
2024-01-10 18:03               ` Petr Tesařík
2024-01-10 19:48                 ` Stephen Brennan
2024-01-10 19:58                   ` Petr Tesařík
