qemu-devel.nongnu.org archive mirror
From: Stefan Hajnoczi <stefanha@gmail.com>
To: Eryu Guan <eguan@linux.alibaba.com>
Cc: Igor Mammedov <imammedo@redhat.com>,
	Julia Suvorova <jusual@redhat.com>,
	qemu-devel@nongnu.org
Subject: Re: [BUG qemu 4.0] segfault when unplugging virtio-blk-pci device
Date: Mon, 13 Jan 2020 16:38:55 +0000
Message-ID: <20200113163855.GC103384@stefanha-x1.localdomain>
In-Reply-To: <20200109045806.GB79586@e18g06458.et15sqa>

On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote:
> On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
> > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <eguan@linux.alibaba.com> wrote:
> > >
> > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
> > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
> > > > > On Tue, 31 Dec 2019 18:34:34 +0800
> > > > > Eryu Guan <eguan@linux.alibaba.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > > I'm using qemu 4.0 and hit a segfault when tearing down a kata sandbox. I
> > > > > > > think it's because an I/O completion hits a use-after-free when the device
> > > > > > > is already gone. Is this a known bug that has been fixed? (I went through
> > > > > > > the git log but didn't find anything obvious.)
> > > > > >
> > > > > > gdb backtrace is:
> > > > > >
> > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
> > > > > > Program terminated with signal 11, Segmentation fault.
> > > > > > #0 object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
> > > > > > 903        return obj->class;
> > > > > > (gdb) bt
> > > > > > #0  object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
> > > > > > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
> > > > > >     vector=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
> > > > > > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
> > > > > >     opaque=0x558a2f2fd420, ret=0)
> > > > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
> > > > > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
> > > > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
> > > > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
> > > > > >     i1=<optimized out>) at /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
> > > > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
> > > > > > #6  0x00007fff9ed75780 in ?? ()
> > > > > > #7  0x0000000000000000 in ?? ()
> > > > > >
> > > > > > > It seems like qemu was completing a discard/write_zeroes request, but
> > > > > > > the parent BusState was already freed & set to NULL.
> > > > > > >
> > > > > > > Do we need to drain all pending requests before unrealizing the virtio-blk
> > > > > > > device, as the following patch proposed?
> > > > > >
> > > > > > https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
> > > > > >
> > > > > > If more info is needed, please let me know.
> > > > >
> > > > > > Maybe this will help: https://patchwork.kernel.org/patch/11213047/
> > > >
> > > > Yeah, this looks promising! I'll try it out (though it's a one-time
> > > > crash for me). Thanks!
> > >
> > > > After applying this patch, I don't see the original segfault and
> > > > backtrace, but I see this crash:
> > >
> > > [Thread debugging using libthread_db enabled]
> > > Using host libthread_db library "/lib64/libthread_db.so.1".
> > > Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
> > > Program terminated with signal 11, Segmentation fault.
> > > #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
> > > 1324        VirtIOPCIProxy *proxy = VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
> > > Missing separate debuginfos, use: debuginfo-install glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
> > > (gdb) bt
> > > #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
> > > #1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>, shift=<optimized out>, mask=<optimized out>, attrs=...) at /usr/src/debug/qemu-4.0/memory.c:502
> > > #2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340, attrs=...)
> > >     at /usr/src/debug/qemu-4.0/memory.c:568
> > > #3  0x0000561216837c66 in memory_region_dispatch_write (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
> > > #4  0x00005612167e036f in flatview_write_continue (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=len@entry=2, addr1=<optimized out>, l=<optimized out>, mr=0x56121846d340)
> > >     at /usr/src/debug/qemu-4.0/exec.c:3279
> > > #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0, addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318
> > > #6  0x00005612167e4a1b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3408
> > > #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=..., attrs@entry=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=<optimized out>, is_write=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3419
> > > #8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
> > > #9  0x000056121682255e in qemu_kvm_cpu_thread_fn (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
> > > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
> > > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
> > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
> > >
> > > And I searched and found
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1706759 , which has the same
> > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
> > > blk_drain() to virtio_blk_device_unrealize()") is to fix this particular
> > > bug.
> > >
> > > > But I can still hit the bug even after applying the commit. Am I missing
> > > > anything?
> > 
> > Hi Eryu,
> > This backtrace seems to be caused by this bug (there were two bugs in
> > 1706759): https://bugzilla.redhat.com/show_bug.cgi?id=1708480
> > Although the solution hasn't been tested on virtio-blk yet, you may
> > want to apply this patch:
> >     https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
> > Let me know if this works.
> 
> Unfortunately, I still see the same segfault & backtrace after applying
> commit 421afd2fe8dd ("virtio: reset region cache when on queue
> deletion")
> 
> Is there anything I can do to help debug this?

Please post the QEMU command-line and the QMP commands used to remove the
device.

The backtrace shows a vcpu thread submitting a request.  The device
seems to be partially destroyed.  That's surprising because the monitor
and the vcpu thread should use the QEMU global mutex to avoid race
conditions.  Maybe seeing the QMP commands will make it clearer...
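
To illustrate the expectation above, here is a minimal sketch of how one
shared lock serializes a teardown path and a notify path. It is not QEMU
code: Bus, Device, big_lock, device_unplug() and device_notify() are
hypothetical stand-ins, and the NULL check is an assumption made for the
example rather than something the virtio code is known to do.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only, not QEMU types. */
typedef struct Bus {
    int notifications;      /* number of notifications delivered */
} Bus;

typedef struct Device {
    Bus *parent_bus;        /* cleared once the device is unplugged */
} Device;

/* Plays the role of a global lock shared by the monitor and vcpu threads. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Monitor-side path: detach the device from its bus while holding the lock. */
static void device_unplug(Device *dev)
{
    pthread_mutex_lock(&big_lock);
    dev->parent_bus = NULL;
    pthread_mutex_unlock(&big_lock);
}

/*
 * vcpu-side path: deliver a notification only if the device is still
 * attached. Returns false when the device has already been unplugged.
 */
static bool device_notify(Device *dev)
{
    bool delivered = false;

    pthread_mutex_lock(&big_lock);
    if (dev->parent_bus != NULL) {
        dev->parent_bus->notifications++;
        delivered = true;
    }
    pthread_mutex_unlock(&big_lock);
    return delivered;
}
```

Because both paths take the same lock, device_notify() sees the device
either fully attached or fully detached, never half torn down. A crash
like the one in the backtrace suggests that one of the two paths ran
outside such a lock, or that teardown happened in several steps with
the lock released in between.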

Stefan

