From: "Petr Tesařík" <petr@tesarici.cz>
To: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Omar Sandoval <osandov@osandov.com>, linux-debuggers@vger.kernel.org
Subject: Re: Segmentation fault with drgn + libkdumpfile
Date: Wed, 10 Jan 2024 09:36:00 +0100 [thread overview]
Message-ID: <20240110093600.61005acd@meshulam.tesarici.cz> (raw)
In-Reply-To: <87sf36yni8.fsf@oracle.com>
On Tue, 09 Jan 2024 17:40:15 -0800
Stephen Brennan <stephen.s.brennan@oracle.com> wrote:
> Petr Tesařík <petr@tesarici.cz> writes:
>
> > On Mon, 8 Jan 2024 21:40:08 +0100
> > Petr Tesařík <petr@tesarici.cz> wrote:
> >
> >> On Fri, 05 Jan 2024 13:53:15 -0800
> >> Stephen Brennan <stephen.s.brennan@oracle.com> wrote:
> >>
> >> > Petr Tesařík <petr@tesarici.cz> writes:
> >> > > On Fri, 05 Jan 2024 10:38:16 -0800
> >> > > Stephen Brennan <stephen.s.brennan@oracle.com> wrote:
> >> > >
> >> > >> Hi Petr,
> >> > >>
> >> > >> I recently encountered a segmentation fault with libkdumpfile & drgn
> >> > >> which appears to be related to the cache implementation. I've included
> >> > >> the stack trace at the end of this message, since it's a bit of a longer
> >> > >> one. The exact issue occurred with a test vmcore that I could probably
> >> > >> share with you privately if you'd like. In any case, the reproducer is
> >> > >> fairly straightforward in drgn code:
> >> > >>
> >> > >> for t in for_each_task(prog):
> >> > >>     prog.stack_trace(t)
> >> > >> for t in for_each_task(prog):
> >> > >>     prog.stack_trace(t)
> >> > >>
> >> > >> The repetition is required, the segfault only occurs on the second
> >> > >> iteration of the loop. Which, in hindsight, is a textbook sign that the
> >> > >> issue has to do with caching. I'd expect that the issue is specific to
> >> > >> this vmcore, it doesn't reproduce on others.
> >> > >>
> >> > >> I stuck that into a git bisect script and bisected the libkdumpfile
> >> > >> commit that introduced it:
> >> > >>
> >> > >> commit 487a8042ea5da580e1fdb5b8f91c8bd7cad05cd6
> >> > >> Author: Petr Tesarik <petr@tesarici.cz>
> >> > >> Date: Wed Jan 11 22:53:01 2023 +0100
> >> > >>
> >> > >>     Cache: Calculate eprobe in reinit_entry()
> >> > >>
> >> > >>     If this function is called to reuse a ghost entry, the probe list
> >> > >>     has not been walked yet, so eprobe is left uninitialized.
> >> > >>
> >> > >>     This passed the test case, because the correct old value was left
> >> > >>     on stack. Modify the test case to poison the stack.
> >> > >>
> >> > >>     Signed-off-by: Petr Tesarik <petr@tesarici.cz>
> >> > >>
> >> > >>  src/kdumpfile/cache.c      |  6 +++++-
> >> > >>  src/kdumpfile/test-cache.c | 13 +++++++++++++
> >> > >>  2 files changed, 18 insertions(+), 1 deletion(-)
> >> > >
> >> > > This looks like a red herring to me. The cache most likely continues in
> >> > > a corrupted state without this commit, which may mask the issue (until
> >> > > it resurfaces later).
> >> >
> >> > I see, that makes a lot of sense.
> >> >
> >> > >> I haven't yet tried to debug the logic of the cache implementation and
> >> > >> create a patch. I'm totally willing to try that, but I figured I would
> >> > >> send this report to you first, to see if there's something obvious that
> >> > >> sticks out to your eyes.
> >> > >
> >> > > No, but I should be able to recreate the issue if I get a log of the
> >> > > cache API calls:
> >> > >
> >> > > - cache_alloc() - to know the number of elements
> >> > > - cache_get_entry()
> >> > > - cache_put_entry()
> >> > > - cache_insert()
> >> > > - cache_discard()
> >> > > - cache_flush() - not likely after initialization, but...
> >> >
> >> > I went ahead and logged each of these calls as you suggested, I tried to
> >> > log them at the beginning of the function call and always include the
> >> > cache pointer, cache_entry, and the key. I took the resulting log and
> >> > filtered it to just contain the most recently logged cache prior to the
> >> > crash, compressed it, and attached it. For completeness, the patch
> >> > I used is below (applies to tip branch 8254897 ("Merge pull request #78
> >> > from fweimer-rh/c99")).
> >> >
> >> > I'll also see if I can reproduce it based on the log.
> >>
> >> Thank you for the log. I haven't had much time to look at it, but the
> >> first line is a good hint already:
> >>
> >> 0x56098b68c4c0: cache_alloc(1024, 0)
> >>
> >> Zero size means the data pointers are managed by the caller, so this
> >> must be the cache of mmap()'ed segments. That's the only cache which
> >> installs a cleanup callback with set_cache_entry_cleanup(). There is
> >> only one call to the cleanup callback for evicted entries in cache.c:
> >>
> >> 	/* Get an unused cached entry. */
> >> 	if (cs->nuprobe != 0 &&
> >> 	    (cs->nuprec == 0 || cache->nprobe + bias > cache->dprobe))
> >> 		evict = evict_probe(cache, cs);
> >> 	else
> >> 		evict = evict_prec(cache, cs);
> >> 	if (cache->entry_cleanup)
> >> 		cache->entry_cleanup(cache->cleanup_data, evict);
> >>
> >> The entries can be evicted from the probe partition or from the precious
> >> partition. This might be relevant. Please, can you re-run and log where
> >> the evict entry comes from?
> >
> > I found some time this morning, and logging the evictions would not help. Because of a bug
> > in fcache_new(), the number of elements in the cache is big enough that
> > cache entries are never evicted in your case. It's quite weird to hit a
> > cache metadata bug after elements have been inserted. FWIW I am not
> > able to reproduce the bug by replaying the logged file read pattern.
> >
> > Since you have a reliable reproducer, it cannot be a Heisenbug. But it
> > could be caused by the other cache - the cache of decompressed pages.
> > Do you know for sure that lzo1x_decompress_safe() crashes while trying
> > to _read_ from the input buffer, and not while trying to _write_ to the
> > output buffer?
>
> Hi Petr,
>
> Sorry for the delay here, I got pulled into other issues and am trying
> to attend to all my work in a round-robin fashion :)
Hi Stephen,
that's fine, I also work on this only as time permits. ;-)
> The fault is definitely in lzo1x_decompress_safe() *writing* to address
> 0. I fetched debuginfo for all the necessary libraries and we see the
> following stack trace:
>
> %<-----------------------
> #0  0x00007fcd9adddef3 in lzo1x_decompress_safe (in=<optimized out>,
>     in_len=<optimized out>, out=0x0, out_len=0x7ffdee2c1388, wrkmem=<optimized out>)
>     at src/lzo1x_d.ch:120
> #1  0x00007fcd9ae25be1 in diskdump_read_page (pio=0x7ffdee2c1590) at diskdump.c:584
> #2  0x00007fcd9ae32d4d in _kdumpfile_priv_cache_get_page (pio=0x7ffdee2c1590,
>     fn=0x7fcd9ae257ae <diskdump_read_page>) at read.c:69
> #3  0x00007fcd9ae25e44 in diskdump_get_page (pio=0x7ffdee2c1590) at diskdump.c:647
> #4  0x00007fcd9ae32be0 in get_page (pio=0x7ffdee2c1590)
>     at /home/stepbren/repos/libkdumpfile/src/kdumpfile/kdumpfile-priv.h:1512
> #5  0x00007fcd9ae32ed4 in get_page_xlat (pio=0x7ffdee2c1590) at read.c:126
> #6  0x00007fcd9ae32f22 in get_page_maybe_xlat (pio=0x7ffdee2c1590) at read.c:137
> #7  0x00007fcd9ae32fb1 in _kdumpfile_priv_read_locked (ctx=0x55745bfca8f0,
>     as=KDUMP_KVADDR, addr=18446612133360081960, buffer=0x7ffdee2c17df,
>     plength=0x7ffdee2c1698) at read.c:169
> #8  0x00007fcd9ae330dd in kdump_read (ctx=0x55745bfca8f0, as=KDUMP_KVADDR,
>     addr=18446612133360081960, buffer=0x7ffdee2c17df, plength=0x7ffdee2c1698)
>     at read.c:196
> #9  0x00007fcd9afb0cc4 in drgn_read_kdump (buf=0x7ffdee2c17df,
>     address=18446612133360081960, count=4, offset=18446612133360081960,
>     arg=0x55745bfca8f0, physical=false) at ../../libdrgn/kdump.c:73
> %<-----------------------
>
> In frame 1 where we are calling the decompressor:
>
> %<-----------------------
> (gdb) frame 1
> #1 0x00007fcd9ae25be1 in diskdump_read_page (pio=0x7ffdee2c1590) at diskdump.c:584
> 584			int ret = lzo1x_decompress_safe(fch.data, pd.size,
> (gdb) list
> 579			if (ret != KDUMP_OK)
> 580				return ret;
> 581		} else if (pd.flags & DUMP_DH_COMPRESSED_LZO) {
> 582	#if USE_LZO
> 583			lzo_uint retlen = get_page_size(ctx);
> 584			int ret = lzo1x_decompress_safe(fch.data, pd.size,
> 585							pio->chunk.data,
> 586							&retlen,
> 587							LZO1X_MEM_DECOMPRESS);
> 588			fcache_put_chunk(&fch);
> (gdb) p retlen
> $7 = 0
This is a bit weird. Looking at liblzo sources, it seems to me that
the output length is not changed until right before returning from
lzo1x_decompress_safe().
> (gdb) p pio->chunk.data
> $8 = (void *) 0x0
OK, here's our immediate root cause. ;-)
> (gdb) p fch.data
> $9 = (void *) 0x7fcd7cc33da4
This looks sane.
> (gdb) p pd.size
> $10 = 816
This also looks sane.
> %<-----------------------
>
> As far as I can tell, pio->chunk.data comes directly from the
> cache_get_page() function in frame 2:
>
> %<-----------------------
> (gdb) up
> #2 0x00007fcd9ae32d4d in _kdumpfile_priv_cache_get_page (pio=0x7ffdee2c1590,
> fn=0x7fcd9ae257ae <diskdump_read_page>) at read.c:69
> 69 ret = fn(pio);
> (gdb) list
> 64		pio->chunk.data = entry->data;
> 65		pio->chunk.embed_fces->ce = entry;
> 66		if (cache_entry_valid(entry))
> 67			return KDUMP_OK;
> 68
> 69		ret = fn(pio);
> 70		mutex_lock(&ctx->shared->cache_lock);
> 71		if (ret == KDUMP_OK)
> 72			cache_insert(pio->chunk.embed_fces->cache, entry);
> 73		else
> (gdb) p *entry
> $11 = {key = 1045860353, state = cs_precious, next = 626, prev = 626, refcnt = 1,
> data = 0x0}
The key (0x3e569000 | ADDRXLAT_MACHPHYSADDR) corresponds to the
requested virtual address 0xffff88003e569c28.
> (gdb) p *pio
> $12 = {ctx = 0x55745bfca8f0, addr = {addr = 1045860352, as = ADDRXLAT_MACHPHYSADDR},
> chunk = {data = 0x0, nent = 1, {embed_fces = {{data = 0xffff880ff1470788,
> len = 140728599320032, ce = 0x55745c1003d8, cache = 0x55745c0fb540}, {
> data = 0x55745bfd42f0, len = 140728599320112,
> ce = 0x7fcd9ae330ef <kdump_read+102>, cache = 0xffff88003e569c28}},
> fces = 0xffff880ff1470788}}}
> %<-----------------------
Looking at pio->chunk.embed_fces->ce, struct cache_entry is at
0x55745c1003d8. Assuming that sizeof(struct cache_entry) == 32 on your
system, this is element 626 in the cache entry array. The next and
prev indices are also 626, which looks good, because cache->ninflight
is 1, so this is the only element in the (circular) in-flight list.
Since state is cs_precious, but the data was discarded, this cache
entry has just been recovered from a ghost partition, evicting another
entry, and _that_ entry had a NULL data pointer.
It would be really helpful if I could get the log for this cache
instead of the one you posted earlier.
>
> And here is the cache structure:
>
> %<-----------------------
> (gdb) p *pio->chunk.embed_fces->cache
> $16 = {split = 487, nprec = 1020, ngprec = 248, nprobe = 3, ngprobe = 239, dprobe = 2,
> cap = 1024, inflight = 626, ninflight = 1, hits = {number = 168473, address = 168473,
> string = 0x29219 <error: Cannot access memory at address 0x29219>,
> bitmap = 0x29219, blob = 0x29219}, misses = {number = 1913, address = 1913,
> string = 0x779 <error: Cannot access memory at address 0x779>, bitmap = 0x779,
> blob = 0x779}, elemsize = 4096, data = 0x7fcd997fe010, entry_cleanup = 0x0,
> cleanup_data = 0x0, ce = 0x55745c0fb598}
> %<-----------------------
BTW with a 98.9% hit ratio (168473 hits, 1913 misses), the cache proves
to be quite effective for your workload. ;-)
Petr T
> Thanks for looking into this! I'll continue investigating more as well.
>
> Stephen
>
Thread overview: 11+ messages
2024-01-05 18:38 Segmentation fault with drgn + libkdumpfile Stephen Brennan
2024-01-05 19:23 ` Petr Tesařík
2024-01-05 21:53 ` Stephen Brennan
2024-01-08 20:40 ` Petr Tesařík
2024-01-09 9:06 ` Petr Tesařík
2024-01-10 1:40 ` Stephen Brennan
2024-01-10 8:36 ` Petr Tesařík [this message]
2024-01-10 13:49 ` Petr Tesařík
2024-01-10 18:03 ` Petr Tesařík
2024-01-10 19:48 ` Stephen Brennan
2024-01-10 19:58 ` Petr Tesařík