* [BUG qemu 4.0] segfault when unplugging virtio-blk-pci device
@ 2019-12-31 10:34 Eryu Guan
  2019-12-31 10:51 ` Igor Mammedov
  0 siblings, 1 reply; 11+ messages in thread
From: Eryu Guan @ 2019-12-31 10:34 UTC (permalink / raw)
  To: qemu-devel

Hi,

I'm using qemu 4.0 and hit a segfault when tearing down a kata sandbox.
I think it's because an I/O completion runs into a use-after-free after
the device is already gone. Is this a known bug that has already been
fixed? (I went through the git log but didn't find anything obvious.)

gdb backtrace is:

Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
Program terminated with signal 11, Segmentation fault.
#0 object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
903        return obj->class;
(gdb) bt
#0  object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
#1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
    vector=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
#2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
    opaque=0x558a2f2fd420, ret=0)
    at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
#3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
    at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
#4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
    i1=<optimized out>) at /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
#5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
#6  0x00007fff9ed75780 in ?? ()
#7  0x0000000000000000 in ?? ()

It seems like qemu was completing a discard/write_zeroes request, but
the parent BusState had already been freed and set to NULL.

Do we need to drain all pending requests before unrealizing the
virtio-blk device, as the following patch proposed?

https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
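
Roughly, I imagine the drain would sit in virtio_blk_device_unrealize(),
something like the sketch below (function and field names are taken from
my qemu 4.0 tree; this is only an illustration of the idea, not the
patch linked above):

static void virtio_blk_device_unrealize(DeviceState *dev, Error **errp)
{
    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
    VirtIOBlock *s = VIRTIO_BLK(dev);

    /* Sketch: wait for in-flight requests (e.g. the discard/write-zeroes
     * request in the trace above) to complete while vdev->parent_bus is
     * still valid, so their completions cannot notify a torn-down device. */
    blk_drain(s->blk);

    virtio_blk_data_plane_destroy(s->dataplane);
    s->dataplane = NULL;
    qemu_del_vm_change_state_handler(s->change);
    blockdev_mark_auto_del(s->blk);
    virtio_cleanup(vdev);
}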

If more info is needed, please let me know.

Thanks,
Eryu



* Re: [BUG qemu 4.0] segfault when unplugging virtio-blk-pci device
  2019-12-31 10:34 [BUG qemu 4.0] segfault when unplugging virtio-blk-pci device Eryu Guan
@ 2019-12-31 10:51 ` Igor Mammedov
  2020-01-02  2:08   ` Eryu Guan
  0 siblings, 1 reply; 11+ messages in thread
From: Igor Mammedov @ 2019-12-31 10:51 UTC (permalink / raw)
  To: Eryu Guan; +Cc: qemu-devel

On Tue, 31 Dec 2019 18:34:34 +0800
Eryu Guan <eguan@linux.alibaba.com> wrote:

> Hi,
> 
> I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I
> think it's because io completion hits use-after-free when device is
> already gone. Is this a known bug that has been fixed? (I went through
> the git log but didn't find anything obvious).
> 
> gdb backtrace is:
> 
> Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
> Program terminated with signal 11, Segmentation fault.
> #0 object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
> 903        return obj->class;
> (gdb) bt
> #0  object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
> #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
>     vector=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
> #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
>     opaque=0x558a2f2fd420, ret=0)
>     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
> #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
>     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
> #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
>     i1=<optimized out>) at /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
> #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
> #6  0x00007fff9ed75780 in ?? ()
> #7  0x0000000000000000 in ?? ()
> 
> It seems like qemu was completing a discard/write_zero request, but
> parent BusState was already freed & set to NULL.
> 
> Do we need to drain all pending request before unrealizing virtio-blk
> device? Like the following patch proposed?
> 
> https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
> 
> If more info is needed, please let me know.

Maybe this will help: https://patchwork.kernel.org/patch/11213047/


> 
> Thanks,
> Eryu
> 




* Re: [BUG qemu 4.0] segfault when unplugging virtio-blk-pci device
  2019-12-31 10:51 ` Igor Mammedov
@ 2020-01-02  2:08   ` Eryu Guan
  2020-01-07 13:06     ` Eryu Guan
  0 siblings, 1 reply; 11+ messages in thread
From: Eryu Guan @ 2020-01-02  2:08 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: qemu-devel

On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
> On Tue, 31 Dec 2019 18:34:34 +0800
> Eryu Guan <eguan@linux.alibaba.com> wrote:
> 
> > Hi,
> > 
> > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I
> > think it's because io completion hits use-after-free when device is
> > already gone. Is this a known bug that has been fixed? (I went through
> > the git log but didn't find anything obvious).
> > 
> > gdb backtrace is:
> > 
> > Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
> > Program terminated with signal 11, Segmentation fault.
> > #0 object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
> > 903        return obj->class;
> > (gdb) bt
> > #0  object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
> > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
> >     vector=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
> > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
> >     opaque=0x558a2f2fd420, ret=0)
> >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
> > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
> >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
> > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
> >     i1=<optimized out>) at /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
> > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
> > #6  0x00007fff9ed75780 in ?? ()
> > #7  0x0000000000000000 in ?? ()
> > 
> > It seems like qemu was completing a discard/write_zero request, but
> > parent BusState was already freed & set to NULL.
> > 
> > Do we need to drain all pending request before unrealizing virtio-blk
> > device? Like the following patch proposed?
> > 
> > https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
> > 
> > If more info is needed, please let me know.
> 
> may be this will help: https://patchwork.kernel.org/patch/11213047/

Yeah, this looks promising! I'll try it out (though it's a one-time
crash for me). Thanks!

Eryu



* Re: [BUG qemu 4.0] segfault when unplugging virtio-blk-pci device
  2020-01-02  2:08   ` Eryu Guan
@ 2020-01-07 13:06     ` Eryu Guan
  2020-01-07 14:01       ` Julia Suvorova
  0 siblings, 1 reply; 11+ messages in thread
From: Eryu Guan @ 2020-01-07 13:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Igor Mammedov, Julia Suvorova

On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
> On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
> > On Tue, 31 Dec 2019 18:34:34 +0800
> > Eryu Guan <eguan@linux.alibaba.com> wrote:
> > 
> > > Hi,
> > > 
> > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I
> > > think it's because io completion hits use-after-free when device is
> > > already gone. Is this a known bug that has been fixed? (I went through
> > > the git log but didn't find anything obvious).
> > > 
> > > gdb backtrace is:
> > > 
> > > Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
> > > Program terminated with signal 11, Segmentation fault.
> > > #0 object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
> > > 903        return obj->class;
> > > (gdb) bt
> > > #0  object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
> > > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
> > >     vector=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
> > > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
> > >     opaque=0x558a2f2fd420, ret=0)
> > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
> > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
> > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
> > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
> > >     i1=<optimized out>) at /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
> > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
> > > #6  0x00007fff9ed75780 in ?? ()
> > > #7  0x0000000000000000 in ?? ()
> > > 
> > > It seems like qemu was completing a discard/write_zero request, but
> > > parent BusState was already freed & set to NULL.
> > > 
> > > Do we need to drain all pending request before unrealizing virtio-blk
> > > device? Like the following patch proposed?
> > > 
> > > https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
> > > 
> > > If more info is needed, please let me know.
> > 
> > may be this will help: https://patchwork.kernel.org/patch/11213047/
> 
> Yeah, this looks promising! I'll try it out (though it's a one-time
> crash for me). Thanks!

After applying this patch, I no longer see the original segfault and
backtrace, but I do see this crash:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
1324        VirtIOPCIProxy *proxy = VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
Missing separate debuginfos, use: debuginfo-install glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
(gdb) bt
#0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
#1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>, shift=<optimized out>, mask=<optimized out>, attrs=...) at /usr/src/debug/qemu-4.0/memory.c:502
#2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340, attrs=...)
    at /usr/src/debug/qemu-4.0/memory.c:568
#3  0x0000561216837c66 in memory_region_dispatch_write (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
#4  0x00005612167e036f in flatview_write_continue (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=len@entry=2, addr1=<optimized out>, l=<optimized out>, mr=0x56121846d340)
    at /usr/src/debug/qemu-4.0/exec.c:3279
#5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0, addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318
#6  0x00005612167e4a1b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3408
#7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=..., attrs@entry=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=<optimized out>, is_write=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3419
#8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
#9  0x000056121682255e in qemu_kvm_cpu_thread_fn (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
#10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
#11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
#12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
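
For reference, the faulting statement at virtio-pci.c:1324 is the chained
dereference through the parent bus; annotated from my 4.0 tree (body
trimmed, comments mine):

static void virtio_pci_notify_write(void *opaque, hwaddr addr,
                                    uint64_t val, unsigned size)
{
    VirtIODevice *vdev = opaque;
    /* After unrealize, DEVICE(vdev)->parent_bus is already NULL, so
     * ->parent dereferences a null pointer when a vcpu still writes to
     * the (stale) notify region. */
    VirtIOPCIProxy *proxy = VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
    ...
}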

I searched and found
https://bugzilla.redhat.com/show_bug.cgi?id=1706759 , which has the same
backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
blk_drain() to virtio_blk_device_unrealize()") is meant to fix this
particular bug.

But I can still hit the bug even after applying that commit. Am I
missing anything?

Thanks,
Eryu
> Eryu



* Re: [BUG qemu 4.0] segfault when unplugging virtio-blk-pci device
  2020-01-07 13:06     ` Eryu Guan
@ 2020-01-07 14:01       ` Julia Suvorova
  2020-01-08  1:54         ` Eryu Guan
  2020-01-09  4:58         ` Eryu Guan
  0 siblings, 2 replies; 11+ messages in thread
From: Julia Suvorova @ 2020-01-07 14:01 UTC (permalink / raw)
  To: Eryu Guan; +Cc: Igor Mammedov, qemu-devel

On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <eguan@linux.alibaba.com> wrote:
>
> On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
> > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
> > > On Tue, 31 Dec 2019 18:34:34 +0800
> > > Eryu Guan <eguan@linux.alibaba.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I
> > > > think it's because io completion hits use-after-free when device is
> > > > already gone. Is this a known bug that has been fixed? (I went through
> > > > the git log but didn't find anything obvious).
> > > >
> > > > gdb backtrace is:
> > > >
> > > > Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
> > > > Program terminated with signal 11, Segmentation fault.
> > > > #0 object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
> > > > 903        return obj->class;
> > > > (gdb) bt
> > > > #0  object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
> > > > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
> > > >     vector=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
> > > > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
> > > >     opaque=0x558a2f2fd420, ret=0)
> > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
> > > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
> > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
> > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
> > > >     i1=<optimized out>) at /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
> > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
> > > > #6  0x00007fff9ed75780 in ?? ()
> > > > #7  0x0000000000000000 in ?? ()
> > > >
> > > > It seems like qemu was completing a discard/write_zero request, but
> > > > parent BusState was already freed & set to NULL.
> > > >
> > > > Do we need to drain all pending request before unrealizing virtio-blk
> > > > device? Like the following patch proposed?
> > > >
> > > > https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
> > > >
> > > > If more info is needed, please let me know.
> > >
> > > may be this will help: https://patchwork.kernel.org/patch/11213047/
> >
> > Yeah, this looks promising! I'll try it out (though it's a one-time
> > crash for me). Thanks!
>
> After applying this patch, I don't see the original segfaut and
> backtrace, but I see this crash
>
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
> 1324        VirtIOPCIProxy *proxy = VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
> Missing separate debuginfos, use: debuginfo-install glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
> (gdb) bt
> #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
> #1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>, shift=<optimized out>, mask=<optimized out>, attrs=...) at /usr/src/debug/qemu-4.0/memory.c:502
> #2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340, attrs=...)
>     at /usr/src/debug/qemu-4.0/memory.c:568
> #3  0x0000561216837c66 in memory_region_dispatch_write (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
> #4  0x00005612167e036f in flatview_write_continue (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=len@entry=2, addr1=<optimized out>, l=<optimized out>, mr=0x56121846d340)
>     at /usr/src/debug/qemu-4.0/exec.c:3279
> #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0, addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318
> #6  0x00005612167e4a1b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3408
> #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=..., attrs@entry=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=<optimized out>, is_write=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3419
> #8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
> #9  0x000056121682255e in qemu_kvm_cpu_thread_fn (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
> #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
> #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
> #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
>
> And I searched and found
> https://bugzilla.redhat.com/show_bug.cgi?id=1706759 , which has the same
> backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
> blk_drain() to virtio_blk_device_unrealize()") is to fix this particular
> bug.
>
> But I can still hit the bug even after applying the commit. Do I miss
> anything?

Hi Eryu,
This backtrace seems to be caused by this bug (there were two bugs in
1706759): https://bugzilla.redhat.com/show_bug.cgi?id=1708480
Although the solution hasn't been tested on virtio-blk yet, you may
want to apply this patch:
    https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
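
As far as I understand it, the fix boils down to also invalidating the
virtqueue's cached vring mappings when the queue is deleted, so nothing
can dereference a stale MemoryRegionCache afterwards. Paraphrased (not
the exact diff; names as in the qemu tree):

void virtio_del_queue(VirtIODevice *vdev, int n)
{
    if (n < 0 || n >= VIRTIO_QUEUE_MAX) {
        abort();
    }

    vdev->vq[n].vring.num = 0;
    vdev->vq[n].vring.num_default = 0;
    vdev->vq[n].handle_output = NULL;
    vdev->vq[n].handle_aio_output = NULL;
    /* new: drop the cached vring mappings as well */
    virtio_virtqueue_reset_region_cache(&vdev->vq[n]);
}
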
Let me know if this works.

Best regards, Julia Suvorova.




* Re: [BUG qemu 4.0] segfault when unplugging virtio-blk-pci device
  2020-01-07 14:01       ` Julia Suvorova
@ 2020-01-08  1:54         ` Eryu Guan
  2020-01-09  4:58         ` Eryu Guan
  1 sibling, 0 replies; 11+ messages in thread
From: Eryu Guan @ 2020-01-08  1:54 UTC (permalink / raw)
  To: Julia Suvorova; +Cc: Igor Mammedov, qemu-devel

On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
> On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <eguan@linux.alibaba.com> wrote:
> >
> > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
> > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
> > > > On Tue, 31 Dec 2019 18:34:34 +0800
> > > > Eryu Guan <eguan@linux.alibaba.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I
> > > > > think it's because io completion hits use-after-free when device is
> > > > > already gone. Is this a known bug that has been fixed? (I went through
> > > > > the git log but didn't find anything obvious).
> > > > >
> > > > > gdb backtrace is:
> > > > >
> > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
> > > > > Program terminated with signal 11, Segmentation fault.
> > > > > #0 object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
> > > > > 903        return obj->class;
> > > > > (gdb) bt
> > > > > #0  object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
> > > > > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
> > > > >     vector=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
> > > > > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
> > > > >     opaque=0x558a2f2fd420, ret=0)
> > > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
> > > > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
> > > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
> > > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
> > > > >     i1=<optimized out>) at /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
> > > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
> > > > > #6  0x00007fff9ed75780 in ?? ()
> > > > > #7  0x0000000000000000 in ?? ()
> > > > >
> > > > > It seems like qemu was completing a discard/write_zero request, but
> > > > > parent BusState was already freed & set to NULL.
> > > > >
> > > > > Do we need to drain all pending request before unrealizing virtio-blk
> > > > > device? Like the following patch proposed?
> > > > >
> > > > > https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
> > > > >
> > > > > If more info is needed, please let me know.
> > > >
> > > > may be this will help: https://patchwork.kernel.org/patch/11213047/
> > >
> > > Yeah, this looks promising! I'll try it out (though it's a one-time
> > > crash for me). Thanks!
> >
> > After applying this patch, I don't see the original segfaut and
> > backtrace, but I see this crash
> >
> > [Thread debugging using libthread_db enabled]
> > Using host libthread_db library "/lib64/libthread_db.so.1".
> > Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
> > Program terminated with signal 11, Segmentation fault.
> > #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
> > 1324        VirtIOPCIProxy *proxy = VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
> > Missing separate debuginfos, use: debuginfo-install glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
> > (gdb) bt
> > #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
> > #1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>, shift=<optimized out>, mask=<optimized out>, attrs=...) at /usr/src/debug/qemu-4.0/memory.c:502
> > #2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340, attrs=...)
> >     at /usr/src/debug/qemu-4.0/memory.c:568
> > #3  0x0000561216837c66 in memory_region_dispatch_write (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
> > #4  0x00005612167e036f in flatview_write_continue (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=len@entry=2, addr1=<optimized out>, l=<optimized out>, mr=0x56121846d340)
> >     at /usr/src/debug/qemu-4.0/exec.c:3279
> > #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0, addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318
> > #6  0x00005612167e4a1b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3408
> > #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=..., attrs@entry=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=<optimized out>, is_write=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3419
> > #8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
> > #9  0x000056121682255e in qemu_kvm_cpu_thread_fn (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
> > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
> > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
> > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
> >
> > And I searched and found
> > https://bugzilla.redhat.com/show_bug.cgi?id=1706759 , which has the same
> > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
> > blk_drain() to virtio_blk_device_unrealize()") is to fix this particular
> > bug.
> >
> > But I can still hit the bug even after applying the commit. Do I miss
> > anything?
> 
> Hi Eryu,
> This backtrace seems to be caused by this bug (there were two bugs in
> 1706759): https://bugzilla.redhat.com/show_bug.cgi?id=1708480
> Although the solution hasn't been tested on virtio-blk yet, you may
> want to apply this patch:
>     https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
> Let me know if this works.

Will try it out, thanks a lot!

Eryu



* Re: [BUG qemu 4.0] segfault when unplugging virtio-blk-pci device
  2020-01-07 14:01       ` Julia Suvorova
  2020-01-08  1:54         ` Eryu Guan
@ 2020-01-09  4:58         ` Eryu Guan
  2020-01-13 16:38           ` Stefan Hajnoczi
  1 sibling, 1 reply; 11+ messages in thread
From: Eryu Guan @ 2020-01-09  4:58 UTC (permalink / raw)
  To: Julia Suvorova; +Cc: Igor Mammedov, qemu-devel

On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
> On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <eguan@linux.alibaba.com> wrote:
> >
> > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
> > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
> > > > On Tue, 31 Dec 2019 18:34:34 +0800
> > > > Eryu Guan <eguan@linux.alibaba.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I
> > > > > think it's because io completion hits use-after-free when device is
> > > > > already gone. Is this a known bug that has been fixed? (I went through
> > > > > the git log but didn't find anything obvious).
> > > > >
> > > > > gdb backtrace is:
> > > > >
> > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
> > > > > Program terminated with signal 11, Segmentation fault.
> > > > > #0 object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
> > > > > 903        return obj->class;
> > > > > (gdb) bt
> > > > > #0  object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
> > > > > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
> > > > >     vector=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
> > > > > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
> > > > >     opaque=0x558a2f2fd420, ret=0)
> > > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
> > > > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
> > > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
> > > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
> > > > >     i1=<optimized out>) at /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
> > > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
> > > > > #6  0x00007fff9ed75780 in ?? ()
> > > > > #7  0x0000000000000000 in ?? ()
> > > > >
> > > > > It seems like qemu was completing a discard/write_zero request, but
> > > > > parent BusState was already freed & set to NULL.
> > > > >
> > > > > Do we need to drain all pending request before unrealizing virtio-blk
> > > > > device? Like the following patch proposed?
> > > > >
> > > > > https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
> > > > >
> > > > > If more info is needed, please let me know.
> > > >
> > > > may be this will help: https://patchwork.kernel.org/patch/11213047/
> > >
> > > Yeah, this looks promising! I'll try it out (though it's a one-time
> > > crash for me). Thanks!
> >
> > After applying this patch, I don't see the original segfaut and
> > backtrace, but I see this crash
> >
> > [Thread debugging using libthread_db enabled]
> > Using host libthread_db library "/lib64/libthread_db.so.1".
> > Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
> > Program terminated with signal 11, Segmentation fault.
> > #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
> > 1324        VirtIOPCIProxy *proxy = VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
> > Missing separate debuginfos, use: debuginfo-install glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
> > (gdb) bt
> > #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
> > #1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>, shift=<optimized out>, mask=<optimized out>, attrs=...) at /usr/src/debug/qemu-4.0/memory.c:502
> > #2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340, attrs=...)
> >     at /usr/src/debug/qemu-4.0/memory.c:568
> > #3  0x0000561216837c66 in memory_region_dispatch_write (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
> > #4  0x00005612167e036f in flatview_write_continue (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=len@entry=2, addr1=<optimized out>, l=<optimized out>, mr=0x56121846d340)
> >     at /usr/src/debug/qemu-4.0/exec.c:3279
> > #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0, addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318
> > #6  0x00005612167e4a1b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3408
> > #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=..., attrs@entry=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=<optimized out>, is_write=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3419
> > #8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
> > #9  0x000056121682255e in qemu_kvm_cpu_thread_fn (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
> > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
> > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
> > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
> >
> > And I searched and found
> > https://bugzilla.redhat.com/show_bug.cgi?id=1706759 , which has the same
> > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
> > blk_drain() to virtio_blk_device_unrealize()") is to fix this particular
> > bug.
> >
> > But I can still hit the bug even after applying the commit. Do I miss
> > anything?
> 
> Hi Eryu,
> This backtrace seems to be caused by this bug (there were two bugs in
> 1706759): https://bugzilla.redhat.com/show_bug.cgi?id=1708480
> Although the solution hasn't been tested on virtio-blk yet, you may
> want to apply this patch:
>     https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
> Let me know if this works.

Unfortunately, I still see the same segfault & backtrace after applying
commit 421afd2fe8dd ("virtio: reset region cache when on queue
deletion").

Is there anything I can do to help debug this?

Thanks,
Eryu



* Re: [BUG qemu 4.0] segfault when unplugging virtio-blk-pci device
  2020-01-09  4:58         ` Eryu Guan
@ 2020-01-13 16:38           ` Stefan Hajnoczi
  2020-01-14  2:50             ` Eryu Guan
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2020-01-13 16:38 UTC (permalink / raw)
  To: Eryu Guan; +Cc: Igor Mammedov, Julia Suvorova, qemu-devel


On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote:
> On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
> > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <eguan@linux.alibaba.com> wrote:
> > >
> > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
> > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
> > > > > On Tue, 31 Dec 2019 18:34:34 +0800
> > > > > Eryu Guan <eguan@linux.alibaba.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I
> > > > > > think it's because io completion hits use-after-free when device is
> > > > > > already gone. Is this a known bug that has been fixed? (I went through
> > > > > > the git log but didn't find anything obvious).
> > > > > >
> > > > > > gdb backtrace is:
> > > > > >
> > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
> > > > > > Program terminated with signal 11, Segmentation fault.
> > > > > > #0 object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
> > > > > > 903        return obj->class;
> > > > > > (gdb) bt
> > > > > > #0  object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
> > > > > > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
> > > > > >     vector=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
> > > > > > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
> > > > > >     opaque=0x558a2f2fd420, ret=0)
> > > > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
> > > > > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
> > > > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
> > > > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
> > > > > >     i1=<optimized out>) at /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
> > > > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
> > > > > > #6  0x00007fff9ed75780 in ?? ()
> > > > > > #7  0x0000000000000000 in ?? ()
> > > > > >
> > > > > > It seems like qemu was completing a discard/write_zero request, but
> > > > > > parent BusState was already freed & set to NULL.
> > > > > >
> > > > > > Do we need to drain all pending request before unrealizing virtio-blk
> > > > > > device? Like the following patch proposed?
> > > > > >
> > > > > > https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
> > > > > >
> > > > > > If more info is needed, please let me know.
> > > > >
> > > > > may be this will help: https://patchwork.kernel.org/patch/11213047/
> > > >
> > > > Yeah, this looks promising! I'll try it out (though it's a one-time
> > > > crash for me). Thanks!
> > >
> > > After applying this patch, I don't see the original segfaut and
> > > backtrace, but I see this crash
> > >
> > > [Thread debugging using libthread_db enabled]
> > > Using host libthread_db library "/lib64/libthread_db.so.1".
> > > Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
> > > Program terminated with signal 11, Segmentation fault.
> > > #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
> > > 1324        VirtIOPCIProxy *proxy = VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
> > > Missing separate debuginfos, use: debuginfo-install glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
> > > (gdb) bt
> > > #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
> > > #1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>, shift=<optimized out>, mask=<optimized out>, attrs=...) at /usr/src/debug/qemu-4.0/memory.c:502
> > > #2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340, attrs=...)
> > >     at /usr/src/debug/qemu-4.0/memory.c:568
> > > #3  0x0000561216837c66 in memory_region_dispatch_write (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
> > > #4  0x00005612167e036f in flatview_write_continue (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=len@entry=2, addr1=<optimized out>, l=<optimized out>, mr=0x56121846d340)
> > >     at /usr/src/debug/qemu-4.0/exec.c:3279
> > > #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0, addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318
> > > #6  0x00005612167e4a1b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3408
> > > #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=..., attrs@entry=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=<optimized out>, is_write=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3419
> > > #8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
> > > #9  0x000056121682255e in qemu_kvm_cpu_thread_fn (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
> > > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
> > > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
> > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
> > >
> > > And I searched and found
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1706759 , which has the same
> > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
> > > blk_drain() to virtio_blk_device_unrealize()") is to fix this particular
> > > bug.
> > >
> > > But I can still hit the bug even after applying the commit. Do I miss
> > > anything?
> > 
> > Hi Eryu,
> > This backtrace seems to be caused by this bug (there were two bugs in
> > 1706759): https://bugzilla.redhat.com/show_bug.cgi?id=1708480
> > Although the solution hasn't been tested on virtio-blk yet, you may
> > want to apply this patch:
> >     https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
> > Let me know if this works.
> 
> Unfortunately, I still see the same segfault & backtrace after applying
> commit 421afd2fe8dd ("virtio: reset region cache when on queue
> deletion")
> 
> Anything I can help to debug?

Please post the QEMU command line and the QMP commands used to remove
the device.

The backtrace shows a vcpu thread submitting a request.  The device
seems to be partially destroyed.  That's surprising because the monitor
and the vcpu thread should use the QEMU global mutex to avoid race
conditions.  Maybe seeing the QMP commands will make it clearer...

Stefan



* Re: [BUG qemu 4.0] segfault when unplugging virtio-blk-pci device
  2020-01-13 16:38           ` Stefan Hajnoczi
@ 2020-01-14  2:50             ` Eryu Guan
  2020-01-14 16:16               ` Stefan Hajnoczi
  0 siblings, 1 reply; 11+ messages in thread
From: Eryu Guan @ 2020-01-14  2:50 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Igor Mammedov, Julia Suvorova, qemu-devel

On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote:
> On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote:
> > On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
> > > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <eguan@linux.alibaba.com> wrote:
> > > >
> > > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
> > > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
> > > > > > On Tue, 31 Dec 2019 18:34:34 +0800
> > > > > > Eryu Guan <eguan@linux.alibaba.com> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I
> > > > > > > think it's because io completion hits use-after-free when device is
> > > > > > > already gone. Is this a known bug that has been fixed? (I went through
> > > > > > > the git log but didn't find anything obvious).
> > > > > > >
> > > > > > > gdb backtrace is:
> > > > > > >
> > > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
> > > > > > > Program terminated with signal 11, Segmentation fault.
> > > > > > > #0 object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
> > > > > > > 903        return obj->class;
> > > > > > > (gdb) bt
> > > > > > > #0  object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
> > > > > > > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
> > > > > > >     vector=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
> > > > > > > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
> > > > > > >     opaque=0x558a2f2fd420, ret=0)
> > > > > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
> > > > > > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
> > > > > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
> > > > > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
> > > > > > >     i1=<optimized out>) at /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
> > > > > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
> > > > > > > #6  0x00007fff9ed75780 in ?? ()
> > > > > > > #7  0x0000000000000000 in ?? ()
> > > > > > >
> > > > > > > It seems like qemu was completing a discard/write_zero request, but
> > > > > > > parent BusState was already freed & set to NULL.
> > > > > > >
> > > > > > > Do we need to drain all pending request before unrealizing virtio-blk
> > > > > > > device? Like the following patch proposed?
> > > > > > >
> > > > > > > https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
> > > > > > >
> > > > > > > If more info is needed, please let me know.
> > > > > >
> > > > > > may be this will help: https://patchwork.kernel.org/patch/11213047/
> > > > >
> > > > > Yeah, this looks promising! I'll try it out (though it's a one-time
> > > > > crash for me). Thanks!
> > > >
> > > > After applying this patch, I don't see the original segfaut and
> > > > backtrace, but I see this crash
> > > >
> > > > [Thread debugging using libthread_db enabled]
> > > > Using host libthread_db library "/lib64/libthread_db.so.1".
> > > > Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
> > > > Program terminated with signal 11, Segmentation fault.
> > > > #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
> > > > 1324        VirtIOPCIProxy *proxy = VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
> > > > Missing separate debuginfos, use: debuginfo-install glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
> > > > (gdb) bt
> > > > #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
> > > > #1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>, shift=<optimized out>, mask=<optimized out>, attrs=...) at /usr/src/debug/qemu-4.0/memory.c:502
> > > > #2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340, attrs=...)
> > > >     at /usr/src/debug/qemu-4.0/memory.c:568
> > > > #3  0x0000561216837c66 in memory_region_dispatch_write (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
> > > > #4  0x00005612167e036f in flatview_write_continue (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=len@entry=2, addr1=<optimized out>, l=<optimized out>, mr=0x56121846d340)
> > > >     at /usr/src/debug/qemu-4.0/exec.c:3279
> > > > #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0, addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318
> > > > #6  0x00005612167e4a1b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3408
> > > > #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=..., attrs@entry=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=<optimized out>, is_write=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3419
> > > > #8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
> > > > #9  0x000056121682255e in qemu_kvm_cpu_thread_fn (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
> > > > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
> > > > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
> > > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
> > > >
> > > > And I searched and found
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=1706759 , which has the same
> > > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
> > > > blk_drain() to virtio_blk_device_unrealize()") is to fix this particular
> > > > bug.
> > > >
> > > > But I can still hit the bug even after applying the commit. Do I miss
> > > > anything?
> > > 
> > > Hi Eryu,
> > > This backtrace seems to be caused by this bug (there were two bugs in
> > > 1706759): https://bugzilla.redhat.com/show_bug.cgi?id=1708480
> > > Although the solution hasn't been tested on virtio-blk yet, you may
> > > want to apply this patch:
> > >     https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
> > > Let me know if this works.
> > 
> > Unfortunately, I still see the same segfault & backtrace after applying
> > commit 421afd2fe8dd ("virtio: reset region cache when on queue
> > deletion")
> > 
> > Anything I can help to debug?
> 
> Please post the QEMU command-line and the QMP commands use to remove the
> device.

It's a normal kata instance using virtio-fs as rootfs.

/usr/local/libexec/qemu-kvm -name sandbox-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d \
 -uuid e03f6b6b-b80b-40c0-8d5b-0cbfed1305d2 -machine q35,accel=kvm,kernel_irqchip,nvdimm,nosmm,nosmbus,nosata,nopit \
 -cpu host -qmp unix:/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait \
 -qmp unix:/run/vc/vm/debug-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait \
 -m 2048M,slots=10,maxmem=773893M -device pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \
 -device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device virtconsole,chardev=charconsole0,id=console0 \
 -chardev socket,id=charconsole0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/console.sock,server,nowait \
 -device virtserialport,chardev=metricagent,id=channel10,name=metric.agent.channel.10 \
 -chardev socket,id=metricagent,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/metric.agent.channel.sock,server,nowait \
 -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/usr/local/share/containers-image-1.9.0.img,size=268435456 \
 -object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng,rng=rng0,romfile= \
 -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 \
 -chardev socket,id=charch0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/kata.sock,server,nowait \
 -chardev socket,id=char-6fca044b801a78a1,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/vhost-fs.sock \
 -device vhost-user-fs-pci,chardev=char-6fca044b801a78a1,tag=kataShared,cache-size=8192M -netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 \
 -device driver=virtio-net-pci,netdev=network-0,mac=76:57:f1:ab:51:5c,disable-modern=false,mq=on,vectors=4,romfile= \
 -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize \
 -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on -numa node,memdev=dimm1 -kernel /usr/local/share/kernel \
 -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro ro rootfstype=ext4 quiet systemd.show_status=false panic=1 nr_cpus=96 agent.use_vsock=false init=/usr/lib/systemd/systemd systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket \
 -pidfile /run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/pid \
 -smp 1,cores=1,threads=1,sockets=96,maxcpus=96

QMP command to delete the device (the device id is just an example, not
the one that caused the crash):

"{\"arguments\":{\"id\":\"virtio-drive-5967abfb917c8da6\"},\"execute\":\"device_del\"}"

which had been hot-plugged by:
"{\"arguments\":{\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":{\"driver\":\"file\",\"filename\":\"/dev/dm-18\"},\"node-name\":\"drive-5967abfb917c8da6\"},\"execute\":\"blockdev-add\"}"
"{\"return\": {}}"
"{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}"
"{\"return\": {}}"

> 
> The backtrace shows a vcpu thread submitting a request.  The device
> seems to be partially destroyed.  That's surprising because the monitor
> and the vcpu thread should use the QEMU global mutex to avoid race
> conditions.  Maybe seeing the QMP commands will make it clearer...
> 
> Stefan

Thanks!

Eryu



* Re: [BUG qemu 4.0] segfault when unplugging virtio-blk-pci device
  2020-01-14  2:50             ` Eryu Guan
@ 2020-01-14 16:16               ` Stefan Hajnoczi
  2020-01-19  8:13                 ` Eryu Guan
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2020-01-14 16:16 UTC (permalink / raw)
  To: Eryu Guan; +Cc: Igor Mammedov, Julia Suvorova, qemu-devel


On Tue, Jan 14, 2020 at 10:50:58AM +0800, Eryu Guan wrote:
> On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote:
> > On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote:
> > > On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
> > > > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <eguan@linux.alibaba.com> wrote:
> > > > >
> > > > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
> > > > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
> > > > > > > On Tue, 31 Dec 2019 18:34:34 +0800
> > > > > > > Eryu Guan <eguan@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I
> > > > > > > > think it's because io completion hits use-after-free when device is
> > > > > > > > already gone. Is this a known bug that has been fixed? (I went through
> > > > > > > > the git log but didn't find anything obvious).
> > > > > > > >
> > > > > > > > gdb backtrace is:
> > > > > > > >
> > > > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
> > > > > > > > Program terminated with signal 11, Segmentation fault.
> > > > > > > > #0 object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
> > > > > > > > 903        return obj->class;
> > > > > > > > (gdb) bt
> > > > > > > > #0  object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
> > > > > > > > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
> > > > > > > >     vector=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
> > > > > > > > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
> > > > > > > >     opaque=0x558a2f2fd420, ret=0)
> > > > > > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
> > > > > > > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
> > > > > > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
> > > > > > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
> > > > > > > >     i1=<optimized out>) at /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
> > > > > > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
> > > > > > > > #6  0x00007fff9ed75780 in ?? ()
> > > > > > > > #7  0x0000000000000000 in ?? ()
> > > > > > > >
> > > > > > > > It seems like qemu was completing a discard/write_zero request, but
> > > > > > > > parent BusState was already freed & set to NULL.
> > > > > > > >
> > > > > > > > Do we need to drain all pending request before unrealizing virtio-blk
> > > > > > > > device? Like the following patch proposed?
> > > > > > > >
> > > > > > > > https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
> > > > > > > >
> > > > > > > > If more info is needed, please let me know.
> > > > > > >
> > > > > > > may be this will help: https://patchwork.kernel.org/patch/11213047/
> > > > > >
> > > > > > Yeah, this looks promising! I'll try it out (though it's a one-time
> > > > > > crash for me). Thanks!
> > > > >
> > > > > After applying this patch, I don't see the original segfault and
> > > > > backtrace, but I see this crash
> > > > >
> > > > > [Thread debugging using libthread_db enabled]
> > > > > Using host libthread_db library "/lib64/libthread_db.so.1".
> > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
> > > > > Program terminated with signal 11, Segmentation fault.
> > > > > #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
> > > > > 1324        VirtIOPCIProxy *proxy = VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
> > > > > Missing separate debuginfos, use: debuginfo-install glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
> > > > > (gdb) bt
> > > > > #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
> > > > > #1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>, shift=<optimized out>, mask=<optimized out>, attrs=...) at /usr/src/debug/qemu-4.0/memory.c:502
> > > > > #2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340, attrs=...)
> > > > >     at /usr/src/debug/qemu-4.0/memory.c:568
> > > > > #3  0x0000561216837c66 in memory_region_dispatch_write (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
> > > > > #4  0x00005612167e036f in flatview_write_continue (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=len@entry=2, addr1=<optimized out>, l=<optimized out>, mr=0x56121846d340)
> > > > >     at /usr/src/debug/qemu-4.0/exec.c:3279
> > > > > #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0, addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318
> > > > > #6  0x00005612167e4a1b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3408
> > > > > #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=..., attrs@entry=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=<optimized out>, is_write=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3419
> > > > > #8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
> > > > > #9  0x000056121682255e in qemu_kvm_cpu_thread_fn (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
> > > > > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
> > > > > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
> > > > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
> > > > >
> > > > > And I searched and found
> > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1706759 , which has the same
> > > > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
> > > > > blk_drain() to virtio_blk_device_unrealize()") is to fix this particular
> > > > > bug.
> > > > >
> > > > > But I can still hit the bug even after applying the commit. Do I miss
> > > > > anything?
> > > > 
> > > > Hi Eryu,
> > > > This backtrace seems to be caused by this bug (there were two bugs in
> > > > 1706759): https://bugzilla.redhat.com/show_bug.cgi?id=1708480
> > > > Although the solution hasn't been tested on virtio-blk yet, you may
> > > > want to apply this patch:
> > > >     https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
> > > > Let me know if this works.
> > > 
> > > Unfortunately, I still see the same segfault & backtrace after applying
> > > commit 421afd2fe8dd ("virtio: reset region cache when on queue
> > > deletion")
> > > 
> > > Anything I can help to debug?
> > 
> > Please post the QEMU command-line and the QMP commands used to remove the
> > device.
> 
> It's a normal kata instance using virtio-fs as rootfs.
> 
> /usr/local/libexec/qemu-kvm -name sandbox-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d \
>  -uuid e03f6b6b-b80b-40c0-8d5b-0cbfed1305d2 -machine q35,accel=kvm,kernel_irqchip,nvdimm,nosmm,nosmbus,nosata,nopit \
>  -cpu host -qmp unix:/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait \
>  -qmp unix:/run/vc/vm/debug-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait \
>  -m 2048M,slots=10,maxmem=773893M -device pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \
>  -device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device virtconsole,chardev=charconsole0,id=console0 \
>  -chardev socket,id=charconsole0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/console.sock,server,nowait \
>  -device virtserialport,chardev=metricagent,id=channel10,name=metric.agent.channel.10 \
>  -chardev socket,id=metricagent,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/metric.agent.channel.sock,server,nowait \
>  -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/usr/local/share/containers-image-1.9.0.img,size=268435456 \
>  -object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng,rng=rng0,romfile= \
>  -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 \
>  -chardev socket,id=charch0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/kata.sock,server,nowait \
>  -chardev socket,id=char-6fca044b801a78a1,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/vhost-fs.sock \
>  -device vhost-user-fs-pci,chardev=char-6fca044b801a78a1,tag=kataShared,cache-size=8192M -netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 \
>  -device driver=virtio-net-pci,netdev=network-0,mac=76:57:f1:ab:51:5c,disable-modern=false,mq=on,vectors=4,romfile= \
>  -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize \
>  -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on -numa node,memdev=dimm1 -kernel /usr/local/share/kernel \
>  -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro ro rootfstype=ext4 quiet systemd.show_status=false panic=1 nr_cpus=96 agent.use_vsock=false init=/usr/lib/systemd/systemd systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket \
>  -pidfile /run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/pid \
>  -smp 1,cores=1,threads=1,sockets=96,maxcpus=96
> 
> QMP command to delete the device (the device id is just an example, not
> the one that caused the crash):
> 
> "{\"arguments\":{\"id\":\"virtio-drive-5967abfb917c8da6\"},\"execute\":\"device_del\"}"
> 
> which has been hot plugged by:
> "{\"arguments\":{\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":{\"driver\":\"file\",\"filename\":\"/dev/dm-18\"},\"node-name\":\"drive-5967abfb917c8da6\"},\"execute\":\"blockdev-add\"}"
> "{\"return\": {}}"
> "{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}"
> "{\"return\": {}}"

Thanks.  I wasn't able to reproduce this crash with qemu.git/master.

One thing that is strange about the latest backtrace you posted: QEMU is
dispatching the memory access instead of using the ioeventfd code path
that virtio-blk-pci normally takes when a virtqueue is notified.  I
guess this means ioeventfd has already been disabled due to the hot
unplug.
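
A quick way to double-check that from the core dump, assuming frame 0's
opaque really is the VirtIODevice in this build (the expression is only
illustrative and may need adjusting for your exact binary):

  (gdb) frame 0
  (gdb) p ((DeviceState *) opaque)->parent_bus

If that prints (BusState *) 0x0, the device had already been detached
from its bus when the notify write was dispatched.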

Could you try with machine type "i440fx" instead of "q35"?  I wonder if
pci-bridge/shpc is part of the problem.
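
Something along these lines should do it (illustrative only; keep the
rest of the command line unchanged, and drop any q35-only machine
options your build rejects — note the root bus is named pci.0 instead of
pcie.0 on the i440fx machine):

  -machine pc,accel=kvm,kernel_irqchip,nvdimm \
  -device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \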

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [BUG qemu 4.0] segfault when unplugging virtio-blk-pci device
  2020-01-14 16:16               ` Stefan Hajnoczi
@ 2020-01-19  8:13                 ` Eryu Guan
  0 siblings, 0 replies; 11+ messages in thread
From: Eryu Guan @ 2020-01-19  8:13 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Igor Mammedov, Julia Suvorova, qemu-devel

On Tue, Jan 14, 2020 at 04:16:24PM +0000, Stefan Hajnoczi wrote:
> On Tue, Jan 14, 2020 at 10:50:58AM +0800, Eryu Guan wrote:
> > On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote:
> > > On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote:
> > > > On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
> > > > > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <eguan@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
> > > > > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
> > > > > > > > On Tue, 31 Dec 2019 18:34:34 +0800
> > > > > > > > Eryu Guan <eguan@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I
> > > > > > > > > think it's because io completion hits use-after-free when device is
> > > > > > > > > already gone. Is this a known bug that has been fixed? (I went through
> > > > > > > > > the git log but didn't find anything obvious).
> > > > > > > > >
> > > > > > > > > gdb backtrace is:
> > > > > > > > >
> > > > > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
> > > > > > > > > Program terminated with signal 11, Segmentation fault.
> > > > > > > > > #0 object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
> > > > > > > > > 903        return obj->class;
> > > > > > > > > (gdb) bt
> > > > > > > > > #0  object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
> > > > > > > > > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
> > > > > > > > >     vector=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
> > > > > > > > > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
> > > > > > > > >     opaque=0x558a2f2fd420, ret=0)
> > > > > > > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
> > > > > > > > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
> > > > > > > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
> > > > > > > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
> > > > > > > > >     i1=<optimized out>) at /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
> > > > > > > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
> > > > > > > > > #6  0x00007fff9ed75780 in ?? ()
> > > > > > > > > #7  0x0000000000000000 in ?? ()
> > > > > > > > >
> > > > > > > > > It seems like qemu was completing a discard/write_zero request, but
> > > > > > > > > parent BusState was already freed & set to NULL.
> > > > > > > > >
> > > > > > > > > Do we need to drain all pending request before unrealizing virtio-blk
> > > > > > > > > device? Like the following patch proposed?
> > > > > > > > >
> > > > > > > > > https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
> > > > > > > > >
> > > > > > > > > If more info is needed, please let me know.
> > > > > > > >
> > > > > > > > may be this will help: https://patchwork.kernel.org/patch/11213047/
> > > > > > >
> > > > > > > Yeah, this looks promising! I'll try it out (though it's a one-time
> > > > > > > crash for me). Thanks!
> > > > > >
> > > > > > After applying this patch, I don't see the original segfault and
> > > > > > backtrace, but I see this crash
> > > > > >
> > > > > > [Thread debugging using libthread_db enabled]
> > > > > > Using host libthread_db library "/lib64/libthread_db.so.1".
> > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
> > > > > > Program terminated with signal 11, Segmentation fault.
> > > > > > #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
> > > > > > 1324        VirtIOPCIProxy *proxy = VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
> > > > > > Missing separate debuginfos, use: debuginfo-install glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
> > > > > > (gdb) bt
> > > > > > #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
> > > > > > #1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>, shift=<optimized out>, mask=<optimized out>, attrs=...) at /usr/src/debug/qemu-4.0/memory.c:502
> > > > > > #2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340, attrs=...)
> > > > > >     at /usr/src/debug/qemu-4.0/memory.c:568
> > > > > > #3  0x0000561216837c66 in memory_region_dispatch_write (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
> > > > > > #4  0x00005612167e036f in flatview_write_continue (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=len@entry=2, addr1=<optimized out>, l=<optimized out>, mr=0x56121846d340)
> > > > > >     at /usr/src/debug/qemu-4.0/exec.c:3279
> > > > > > #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0, addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318
> > > > > > #6  0x00005612167e4a1b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3408
> > > > > > #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=..., attrs@entry=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=<optimized out>, is_write=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3419
> > > > > > #8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
> > > > > > #9  0x000056121682255e in qemu_kvm_cpu_thread_fn (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
> > > > > > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
> > > > > > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
> > > > > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
> > > > > >
> > > > > > And I searched and found
> > > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1706759 , which has the same
> > > > > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
> > > > > > blk_drain() to virtio_blk_device_unrealize()") is to fix this particular
> > > > > > bug.
> > > > > >
> > > > > > But I can still hit the bug even after applying the commit. Do I miss
> > > > > > anything?
> > > > > 
> > > > > Hi Eryu,
> > > > > This backtrace seems to be caused by this bug (there were two bugs in
> > > > > 1706759): https://bugzilla.redhat.com/show_bug.cgi?id=1708480
> > > > > Although the solution hasn't been tested on virtio-blk yet, you may
> > > > > want to apply this patch:
> > > > >     https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
> > > > > Let me know if this works.
> > > > 
> > > > Unfortunately, I still see the same segfault & backtrace after applying
> > > > commit 421afd2fe8dd ("virtio: reset region cache when on queue
> > > > deletion")
> > > > 
> > > > Anything I can help to debug?
> > > 
> > > Please post the QEMU command-line and the QMP commands used to remove the
> > > device.
> > 
> > It's a normal kata instance using virtio-fs as rootfs.
> > 
> > /usr/local/libexec/qemu-kvm -name sandbox-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d \
> >  -uuid e03f6b6b-b80b-40c0-8d5b-0cbfed1305d2 -machine q35,accel=kvm,kernel_irqchip,nvdimm,nosmm,nosmbus,nosata,nopit \
> >  -cpu host -qmp unix:/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait \
> >  -qmp unix:/run/vc/vm/debug-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait \
> >  -m 2048M,slots=10,maxmem=773893M -device pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \
> >  -device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device virtconsole,chardev=charconsole0,id=console0 \
> >  -chardev socket,id=charconsole0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/console.sock,server,nowait \
> >  -device virtserialport,chardev=metricagent,id=channel10,name=metric.agent.channel.10 \
> >  -chardev socket,id=metricagent,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/metric.agent.channel.sock,server,nowait \
> >  -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/usr/local/share/containers-image-1.9.0.img,size=268435456 \
> >  -object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng,rng=rng0,romfile= \
> >  -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 \
> >  -chardev socket,id=charch0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/kata.sock,server,nowait \
> >  -chardev socket,id=char-6fca044b801a78a1,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/vhost-fs.sock \
> >  -device vhost-user-fs-pci,chardev=char-6fca044b801a78a1,tag=kataShared,cache-size=8192M -netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 \
> >  -device driver=virtio-net-pci,netdev=network-0,mac=76:57:f1:ab:51:5c,disable-modern=false,mq=on,vectors=4,romfile= \
> >  -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize \
> >  -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on -numa node,memdev=dimm1 -kernel /usr/local/share/kernel \
> >  -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro ro rootfstype=ext4 quiet systemd.show_status=false panic=1 nr_cpus=96 agent.use_vsock=false init=/usr/lib/systemd/systemd systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket \
> >  -pidfile /run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/pid \
> >  -smp 1,cores=1,threads=1,sockets=96,maxcpus=96
> > 
> > QMP command to delete the device (the device id is just an example, not
> > the one that caused the crash):
> > 
> > "{\"arguments\":{\"id\":\"virtio-drive-5967abfb917c8da6\"},\"execute\":\"device_del\"}"
> > 
> > which has been hot plugged by:
> > "{\"arguments\":{\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":{\"driver\":\"file\",\"filename\":\"/dev/dm-18\"},\"node-name\":\"drive-5967abfb917c8da6\"},\"execute\":\"blockdev-add\"}"
> > "{\"return\": {}}"
> > "{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}"
> > "{\"return\": {}}"
> 
> Thanks.  I wasn't able to reproduce this crash with qemu.git/master.
> 
> One thing that is strange about the latest backtrace you posted: QEMU is
> dispatching the memory access instead of using the ioeventfd code path
> that virtio-blk-pci normally takes when a virtqueue is notified.  I
> guess this means ioeventfd has already been disabled due to the hot
> unplug.
> 
> Could you try with machine type "i440fx" instead of "q35"?  I wonder if
> pci-bridge/shpc is part of the problem.

Sure, I will try it. But it may take some time, as the test bed is busy
with other testing tasks. I'll report back once I have the results.

Thanks,
Eryu


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-01-19  8:14 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-31 10:34 [BUG qemu 4.0] segfault when unplugging virtio-blk-pci device Eryu Guan
2019-12-31 10:51 ` Igor Mammedov
2020-01-02  2:08   ` Eryu Guan
2020-01-07 13:06     ` Eryu Guan
2020-01-07 14:01       ` Julia Suvorova
2020-01-08  1:54         ` Eryu Guan
2020-01-09  4:58         ` Eryu Guan
2020-01-13 16:38           ` Stefan Hajnoczi
2020-01-14  2:50             ` Eryu Guan
2020-01-14 16:16               ` Stefan Hajnoczi
2020-01-19  8:13                 ` Eryu Guan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).