* [Qemu-devel] segfault use VRingMemoryRegionCaches for avail and used ring vs num-queues
@ 2017-02-27 14:09 Christian Borntraeger
  2017-02-27 15:06 ` Cornelia Huck
  2017-02-27 15:28 ` Paolo Bonzini
  0 siblings, 2 replies; 8+ messages in thread
From: Christian Borntraeger @ 2017-02-27 14:09 UTC (permalink / raw)
  To: Paolo Bonzini, Michael S. Tsirkin; +Cc: qemu-devel, Cornelia Huck, Halil Pasic

Paolo,

commit 97cd965c070152bc626c7507df9fb356bbe1cd81
"virtio: use VRingMemoryRegionCaches for avail and used rings"
does cause a segfault on my s390 system when I use num-queues.

gdb --args qemu-system-s390x -nographic -enable-kvm -m 1G -drive file=/var/lib/libvirt/qemu/image.zhyp137,if=none,id=d1 -device virtio-blk-ccw,drive=d1,iothread=io1,num-queues=2 -object iothread,id=io1


Thread 3 "qemu-system-s39" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x3ffe587f910 (LWP 40630)]
0x0000000001024a26 in address_space_translate_cached (cache=0x38, addr=2, xlat=0x3ffe587bff8, plen=0x3ffe587bff0, is_write=false) at /home/cborntra/REPOS/qemu/exec.c:3187
3187	    assert(addr < cache->len && *plen <= cache->len - addr);
(gdb) bt
#0  0x0000000001024a26 in address_space_translate_cached (cache=0x38, addr=2, xlat=0x3ffe587bff8, plen=0x3ffe587bff0, is_write=false) at /home/cborntra/REPOS/qemu/exec.c:3187
#1  0x0000000001025596 in address_space_lduw_internal_cached (cache=0x38, addr=2, attrs=..., result=0x0, endian=DEVICE_BIG_ENDIAN) at /home/cborntra/REPOS/qemu/memory_ldst.inc.c:264
#2  0x0000000001025846 in address_space_lduw_be_cached (cache=0x38, addr=2, attrs=..., result=0x0) at /home/cborntra/REPOS/qemu/memory_ldst.inc.c:322
#3  0x000000000102597e in lduw_be_phys_cached (cache=0x38, addr=2) at /home/cborntra/REPOS/qemu/memory_ldst.inc.c:340
#4  0x0000000001114856 in virtio_lduw_phys_cached (vdev=0x1c57cd0, cache=0x38, pa=2) at /home/cborntra/REPOS/qemu/include/hw/virtio/virtio-access.h:164
#5  0x000000000111523c in vring_avail_idx (vq=0x3fffde1e090) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:201
#6  0x0000000001115bba in virtio_queue_empty (vq=0x3fffde1e090) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:332
#7  0x000000000111c312 in virtio_queue_host_notifier_aio_poll (opaque=0x3fffde1e0f8) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:2294
#8  0x000000000147a036 in run_poll_handlers_once (ctx=0x1bb8bb0) at /home/cborntra/REPOS/qemu/util/aio-posix.c:490
#9  0x000000000147a2fe in try_poll_mode (ctx=0x1bb8bb0, blocking=true) at /home/cborntra/REPOS/qemu/util/aio-posix.c:566
#10 0x000000000147a3ca in aio_poll (ctx=0x1bb8bb0, blocking=true) at /home/cborntra/REPOS/qemu/util/aio-posix.c:595
#11 0x00000000011a0176 in iothread_run (opaque=0x1bb86c0) at /home/cborntra/REPOS/qemu/iothread.c:59
#12 0x000003ffe9087bc4 in start_thread () at /lib64/libpthread.so.0
#13 0x000003ffe8f8a9f2 in thread_start () at /lib64/libc.so.6

It seems to make a difference if it's the boot disk or not. Maybe the reset of the
devices that the bootloader does before handing over control to Linux creates
some trouble here.





* Re: [Qemu-devel] segfault use VRingMemoryRegionCaches for avail and used ring vs num-queues
  2017-02-27 14:09 [Qemu-devel] segfault use VRingMemoryRegionCaches for avail and used ring vs num-queues Christian Borntraeger
@ 2017-02-27 15:06 ` Cornelia Huck
  2017-02-27 15:37   ` Cornelia Huck
  2017-02-27 15:28 ` Paolo Bonzini
  1 sibling, 1 reply; 8+ messages in thread
From: Cornelia Huck @ 2017-02-27 15:06 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Paolo Bonzini, Michael S. Tsirkin, qemu-devel, Halil Pasic

On Mon, 27 Feb 2017 15:09:30 +0100
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> Paolo,
> 
> commit 97cd965c070152bc626c7507df9fb356bbe1cd81
> "virtio: use VRingMemoryRegionCaches for avail and used rings"
> does cause a segfault on my s390 system when I use num-queues.
> 
> gdb --args qemu-system-s390x -nographic -enable-kvm -m 1G -drive file=/var/lib/libvirt/qemu/image.zhyp137,if=none,id=d1 -device virtio-blk-ccw,drive=d1,iothread=io1,num-queues=2 -object iothread,id=io1

(...)

> (gdb) bt
> #0  0x0000000001024a26 in address_space_translate_cached (cache=0x38, addr=2, xlat=0x3ffe587bff8, plen=0x3ffe587bff0, is_write=false) at /home/cborntra/REPOS/qemu/exec.c:3187
> #1  0x0000000001025596 in address_space_lduw_internal_cached (cache=0x38, addr=2, attrs=..., result=0x0, endian=DEVICE_BIG_ENDIAN) at /home/cborntra/REPOS/qemu/memory_ldst.inc.c:264
> #2  0x0000000001025846 in address_space_lduw_be_cached (cache=0x38, addr=2, attrs=..., result=0x0) at /home/cborntra/REPOS/qemu/memory_ldst.inc.c:322
> #3  0x000000000102597e in lduw_be_phys_cached (cache=0x38, addr=2) at /home/cborntra/REPOS/qemu/memory_ldst.inc.c:340
> #4  0x0000000001114856 in virtio_lduw_phys_cached (vdev=0x1c57cd0, cache=0x38, pa=2) at /home/cborntra/REPOS/qemu/include/hw/virtio/virtio-access.h:164
> #5  0x000000000111523c in vring_avail_idx (vq=0x3fffde1e090) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:201
> #6  0x0000000001115bba in virtio_queue_empty (vq=0x3fffde1e090) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:332
> #7  0x000000000111c312 in virtio_queue_host_notifier_aio_poll (opaque=0x3fffde1e0f8) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:2294
> #8  0x000000000147a036 in run_poll_handlers_once (ctx=0x1bb8bb0) at /home/cborntra/REPOS/qemu/util/aio-posix.c:490
> #9  0x000000000147a2fe in try_poll_mode (ctx=0x1bb8bb0, blocking=true) at /home/cborntra/REPOS/qemu/util/aio-posix.c:566
> #10 0x000000000147a3ca in aio_poll (ctx=0x1bb8bb0, blocking=true) at /home/cborntra/REPOS/qemu/util/aio-posix.c:595
> #11 0x00000000011a0176 in iothread_run (opaque=0x1bb86c0) at /home/cborntra/REPOS/qemu/iothread.c:59
> #12 0x000003ffe9087bc4 in start_thread () at /lib64/libpthread.so.0
> #13 0x000003ffe8f8a9f2 in thread_start () at /lib64/libc.so.6
> 
> It seems to make a difference if it's the boot disk or not. Maybe the reset of the
> devices that the bootloader does before handing over control to Linux creates
> some trouble here.

I can reproduce this (the root cause seems to be that the bootloader
only sets up the first queue but the dataplane code wants to handle
both queues); this particular problem is fixed by 
https://patchwork.ozlabs.org/patch/731445/ but then I hit a similar
problem later:

0x0000000010019b46 in address_space_translate_cached (cache=0x60, addr=0, 
    xlat=0x3fffcb7e420, plen=0x3fffcb7e418, is_write=false)
    at /root/git/qemu/exec.c:3187
3187	    assert(addr < cache->len && *plen <= cache->len - addr);

(...)

(gdb) bt
#0  0x0000000010019b46 in address_space_translate_cached (cache=0x60, addr=0, 
    xlat=0x3fffcb7e420, plen=0x3fffcb7e418, is_write=false)
    at /root/git/qemu/exec.c:3187
#1  0x000000001001a5fe in address_space_lduw_internal_cached (cache=0x60, 
    addr=0, attrs=..., result=0x0, endian=DEVICE_BIG_ENDIAN)
    at /root/git/qemu/memory_ldst.inc.c:264
#2  0x000000001001a88e in address_space_lduw_be_cached (cache=0x60, addr=0, 
    attrs=..., result=0x0) at /root/git/qemu/memory_ldst.inc.c:322
#3  0x000000001001a9c6 in lduw_be_phys_cached (cache=0x60, addr=0)
    at /root/git/qemu/memory_ldst.inc.c:340
#4  0x00000000100fa876 in virtio_lduw_phys_cached (vdev=0x10bc2ce0, cache=0x60, 
    pa=0) at /root/git/qemu/include/hw/virtio/virtio-access.h:164
#5  0x00000000100fb536 in vring_used_flags_set_bit (vq=0x3fffdebc090, mask=1)
    at /root/git/qemu/hw/virtio/virtio.c:255
#6  0x00000000100fb7fa in virtio_queue_set_notification (vq=0x3fffdebc090, 
    enable=0) at /root/git/qemu/hw/virtio/virtio.c:297
#7  0x0000000010101d22 in virtio_queue_host_notifier_aio_poll_begin (
    n=0x3fffdebc0f8) at /root/git/qemu/hw/virtio/virtio.c:2285
#8  0x00000000103f4164 in poll_set_started (ctx=0x10ae8230, started=true)
    at /root/git/qemu/util/aio-posix.c:338
#9  0x00000000103f4d5a in try_poll_mode (ctx=0x10ae8230, blocking=true)
    at /root/git/qemu/util/aio-posix.c:553
#10 0x00000000103f4e56 in aio_poll (ctx=0x10ae8230, blocking=true)
    at /root/git/qemu/util/aio-posix.c:595
#11 0x000000001017ea36 in iothread_run (opaque=0x10ae7d40)
    at /root/git/qemu/iothread.c:59
#12 0x000003fffd6084c6 in start_thread () from /lib64/libpthread.so.0
#13 0x000003fffd502ec2 in thread_start () from /lib64/libc.so.6

I think we may be missing guards for not-yet-setup queues in other
places; maybe we can centralize this instead of playing whack-a-mole?
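
One way to centralize it might be a single predicate checked by every
host-side path before it touches the rings; a rough sketch (the helper
name is made up here, just to illustrate the idea):

static bool virtio_queue_ready(VirtQueue *vq)
{
    /* A queue the guest never configured has no descriptor ring address. */
    return vq->vring.desc != 0;
}

so that e.g. virtio_queue_host_notifier_aio_poll() could bail out early
for such queues instead of every vring accessor guarding itself.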


* Re: [Qemu-devel] segfault use VRingMemoryRegionCaches for avail and used ring vs num-queues
  2017-02-27 14:09 [Qemu-devel] segfault use VRingMemoryRegionCaches for avail and used ring vs num-queues Christian Borntraeger
  2017-02-27 15:06 ` Cornelia Huck
@ 2017-02-27 15:28 ` Paolo Bonzini
  1 sibling, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2017-02-27 15:28 UTC (permalink / raw)
  To: Christian Borntraeger, Michael S. Tsirkin
  Cc: qemu-devel, Cornelia Huck, Halil Pasic



On 27/02/2017 15:09, Christian Borntraeger wrote:
> Paolo,
> 
> commit 97cd965c070152bc626c7507df9fb356bbe1cd81
> "virtio: use VRingMemoryRegionCaches for avail and used rings"
> does cause a segfault on my s390 system when I use num-queues.

I've sent a patch for this already:
[PATCH] virtio: check for vring setup in virtio_queue_empty
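
The check presumably amounts to bailing out of virtio_queue_empty() before
the avail ring cache is dereferenced for a queue the guest never set up;
a minimal sketch of the idea (not the actual patch):

int virtio_queue_empty(VirtQueue *vq)
{
    if (unlikely(!vq->vring.avail)) {
        /* Queue not set up by the guest yet: report it as empty. */
        return 1;
    }
    /* ... existing comparison of the avail index with last_avail_idx ... */
}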

Paolo


* Re: [Qemu-devel] segfault use VRingMemoryRegionCaches for avail and used ring vs num-queues
  2017-02-27 15:06 ` Cornelia Huck
@ 2017-02-27 15:37   ` Cornelia Huck
  2017-02-27 15:41     ` Paolo Bonzini
  0 siblings, 1 reply; 8+ messages in thread
From: Cornelia Huck @ 2017-02-27 15:37 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Paolo Bonzini, Michael S. Tsirkin, qemu-devel, Halil Pasic

On Mon, 27 Feb 2017 16:06:09 +0100
Cornelia Huck <cornelia.huck@de.ibm.com> wrote:

> On Mon, 27 Feb 2017 15:09:30 +0100
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
> > Paolo,
> > 
> > commit 97cd965c070152bc626c7507df9fb356bbe1cd81
> > "virtio: use VRingMemoryRegionCaches for avail and used rings"
> > does cause a segfault on my s390 system when I use num-queues.
> > 
> > gdb --args qemu-system-s390x -nographic -enable-kvm -m 1G -drive file=/var/lib/libvirt/qemu/image.zhyp137,if=none,id=d1 -device virtio-blk-ccw,drive=d1,iothread=io1,num-queues=2 -object iothread,id=io1
> 
> (...)
> 
> > (gdb) bt
> > #0  0x0000000001024a26 in address_space_translate_cached (cache=0x38, addr=2, xlat=0x3ffe587bff8, plen=0x3ffe587bff0, is_write=false) at /home/cborntra/REPOS/qemu/exec.c:3187
> > #1  0x0000000001025596 in address_space_lduw_internal_cached (cache=0x38, addr=2, attrs=..., result=0x0, endian=DEVICE_BIG_ENDIAN) at /home/cborntra/REPOS/qemu/memory_ldst.inc.c:264
> > #2  0x0000000001025846 in address_space_lduw_be_cached (cache=0x38, addr=2, attrs=..., result=0x0) at /home/cborntra/REPOS/qemu/memory_ldst.inc.c:322
> > #3  0x000000000102597e in lduw_be_phys_cached (cache=0x38, addr=2) at /home/cborntra/REPOS/qemu/memory_ldst.inc.c:340
> > #4  0x0000000001114856 in virtio_lduw_phys_cached (vdev=0x1c57cd0, cache=0x38, pa=2) at /home/cborntra/REPOS/qemu/include/hw/virtio/virtio-access.h:164
> > #5  0x000000000111523c in vring_avail_idx (vq=0x3fffde1e090) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:201
> > #6  0x0000000001115bba in virtio_queue_empty (vq=0x3fffde1e090) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:332
> > #7  0x000000000111c312 in virtio_queue_host_notifier_aio_poll (opaque=0x3fffde1e0f8) at /home/cborntra/REPOS/qemu/hw/virtio/virtio.c:2294
> > #8  0x000000000147a036 in run_poll_handlers_once (ctx=0x1bb8bb0) at /home/cborntra/REPOS/qemu/util/aio-posix.c:490
> > #9  0x000000000147a2fe in try_poll_mode (ctx=0x1bb8bb0, blocking=true) at /home/cborntra/REPOS/qemu/util/aio-posix.c:566
> > #10 0x000000000147a3ca in aio_poll (ctx=0x1bb8bb0, blocking=true) at /home/cborntra/REPOS/qemu/util/aio-posix.c:595
> > #11 0x00000000011a0176 in iothread_run (opaque=0x1bb86c0) at /home/cborntra/REPOS/qemu/iothread.c:59
> > #12 0x000003ffe9087bc4 in start_thread () at /lib64/libpthread.so.0
> > #13 0x000003ffe8f8a9f2 in thread_start () at /lib64/libc.so.6
> > 
> > It seems to make a difference if it's the boot disk or not. Maybe the reset of the
> > devices that the bootloader does before handing over control to Linux creates
> > some trouble here.
> 
> I can reproduce this (the root cause seems to be that the bootloader
> only sets up the first queue but the dataplane code wants to handle
> both queues); this particular problem is fixed by 
> https://patchwork.ozlabs.org/patch/731445/ but then I hit a similar
> problem later:
> 
> 0x0000000010019b46 in address_space_translate_cached (cache=0x60, addr=0, 
>     xlat=0x3fffcb7e420, plen=0x3fffcb7e418, is_write=false)
>     at /root/git/qemu/exec.c:3187
> 3187	    assert(addr < cache->len && *plen <= cache->len - addr);
> 
> (...)
> 
> (gdb) bt
> #0  0x0000000010019b46 in address_space_translate_cached (cache=0x60, addr=0, 
>     xlat=0x3fffcb7e420, plen=0x3fffcb7e418, is_write=false)
>     at /root/git/qemu/exec.c:3187
> #1  0x000000001001a5fe in address_space_lduw_internal_cached (cache=0x60, 
>     addr=0, attrs=..., result=0x0, endian=DEVICE_BIG_ENDIAN)
>     at /root/git/qemu/memory_ldst.inc.c:264
> #2  0x000000001001a88e in address_space_lduw_be_cached (cache=0x60, addr=0, 
>     attrs=..., result=0x0) at /root/git/qemu/memory_ldst.inc.c:322
> #3  0x000000001001a9c6 in lduw_be_phys_cached (cache=0x60, addr=0)
>     at /root/git/qemu/memory_ldst.inc.c:340
> #4  0x00000000100fa876 in virtio_lduw_phys_cached (vdev=0x10bc2ce0, cache=0x60, 
>     pa=0) at /root/git/qemu/include/hw/virtio/virtio-access.h:164
> #5  0x00000000100fb536 in vring_used_flags_set_bit (vq=0x3fffdebc090, mask=1)
>     at /root/git/qemu/hw/virtio/virtio.c:255
> #6  0x00000000100fb7fa in virtio_queue_set_notification (vq=0x3fffdebc090, 
>     enable=0) at /root/git/qemu/hw/virtio/virtio.c:297
> #7  0x0000000010101d22 in virtio_queue_host_notifier_aio_poll_begin (
>     n=0x3fffdebc0f8) at /root/git/qemu/hw/virtio/virtio.c:2285
> #8  0x00000000103f4164 in poll_set_started (ctx=0x10ae8230, started=true)
>     at /root/git/qemu/util/aio-posix.c:338
> #9  0x00000000103f4d5a in try_poll_mode (ctx=0x10ae8230, blocking=true)
>     at /root/git/qemu/util/aio-posix.c:553
> #10 0x00000000103f4e56 in aio_poll (ctx=0x10ae8230, blocking=true)
>     at /root/git/qemu/util/aio-posix.c:595
> #11 0x000000001017ea36 in iothread_run (opaque=0x10ae7d40)
>     at /root/git/qemu/iothread.c:59
> #12 0x000003fffd6084c6 in start_thread () from /lib64/libpthread.so.0
> #13 0x000003fffd502ec2 in thread_start () from /lib64/libc.so.6
> 
> I think we may be missing guards for not-yet-setup queues in other
> places; maybe we can centralize this instead of playing whack-a-mole?

With the following applied (probably whitespace damaged), my guest
starts:

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index e487e36..28906e5 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -287,6 +287,9 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val)
 void virtio_queue_set_notification(VirtQueue *vq, int enable)
 {
     vq->notification = enable;
+    if (!vq->vring.desc) {
+        return;
+    }
 
     rcu_read_lock();
     if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {

Maybe the introduction of caches just exposed bugs that were already there
(trying to muck with vring state for queues that have not been set up?)
Should we stick some asserts into the respective functions to help
flush out the remaining bugs?
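
For instance (just a sketch of the assert idea, not a tested change), the
accessors that go through the region caches could assert that the queue
has actually been set up:

static inline uint16_t vring_avail_idx(VirtQueue *vq)
{
    /* Catch host-side code poking at a queue the guest never configured. */
    assert(vq->vring.avail != 0);
    /* ... existing cached read of the avail index ... */
}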


* Re: [Qemu-devel] segfault use VRingMemoryRegionCaches for avail and used ring vs num-queues
  2017-02-27 15:37   ` Cornelia Huck
@ 2017-02-27 15:41     ` Paolo Bonzini
  2017-02-28 12:48       ` Cornelia Huck
  0 siblings, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2017-02-27 15:41 UTC (permalink / raw)
  To: Cornelia Huck, Christian Borntraeger
  Cc: Michael S. Tsirkin, qemu-devel, Halil Pasic



On 27/02/2017 16:37, Cornelia Huck wrote:
> With the following applied (probably whitespace damaged), my guest
> starts:
> 
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index e487e36..28906e5 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -287,6 +287,9 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val)
>  void virtio_queue_set_notification(VirtQueue *vq, int enable)
>  {
>      vq->notification = enable;
> +    if (!vq->vring.desc) {
> +        return;
> +    }
>  
>      rcu_read_lock();
>      if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
> 
> Maybe the introduction of caches just exposed bugs that were already there
> (trying to muck with vring state for queues that have not been set up?)

Yes, it did.  I had caught a few while writing the patches, but it does
feel like whack-a-mole...

Paolo

> Should we stick some asserts into the respective functions to help
> flush out the remaining bugs?


* Re: [Qemu-devel] segfault use VRingMemoryRegionCaches for avail and used ring vs num-queues
  2017-02-27 15:41     ` Paolo Bonzini
@ 2017-02-28 12:48       ` Cornelia Huck
  2017-02-28 13:46         ` Paolo Bonzini
  0 siblings, 1 reply; 8+ messages in thread
From: Cornelia Huck @ 2017-02-28 12:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Christian Borntraeger, Michael S. Tsirkin, qemu-devel, Halil Pasic

On Mon, 27 Feb 2017 16:41:09 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 27/02/2017 16:37, Cornelia Huck wrote:
> > With the following applied (probably whitespace damaged), my guest
> > starts:
> > 
> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > index e487e36..28906e5 100644
> > --- a/hw/virtio/virtio.c
> > +++ b/hw/virtio/virtio.c
> > @@ -287,6 +287,9 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val)
> >  void virtio_queue_set_notification(VirtQueue *vq, int enable)
> >  {
> >      vq->notification = enable;
> > +    if (!vq->vring.desc) {
> > +        return;
> > +    }
> >  
> >      rcu_read_lock();
> >      if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
> > 
> > Maybe the introduction of caches just exposed bugs that were already there
> > (trying to muck with vring state for queues that have not been set up?)
> 
> Yes, it did.  I had caught a few while writing the patches, but it does
> feel like whack-a-mole...
> 
> Paolo
> 
> > Should we stick some asserts into the respective functions to help
> > flush out the remaining bugs?

I've been staring at the code some more and I'm not really sure how to
fix this properly.

The dataplane code tries to switch handlers for all virtqueues,
regardless of whether they are configured or not. My hack above leaves the
notification in a bit of an ambiguous state, as it cannot
enable/disable notifications on the real queues.

This is ok for this particular case, where we hand over from the bios
(which only enables the first queue) to the Linux kernel (which uses
multiple queues) - but with a virtio reset before additional queues are
configured. I don't think the spec prohibits configuring extra queues
(if present) on the fly, however...


* Re: [Qemu-devel] segfault use VRingMemoryRegionCaches for avail and used ring vs num-queues
  2017-02-28 12:48       ` Cornelia Huck
@ 2017-02-28 13:46         ` Paolo Bonzini
  2017-02-28 14:15           ` Cornelia Huck
  0 siblings, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2017-02-28 13:46 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Christian Borntraeger, Michael S. Tsirkin, qemu-devel, Halil Pasic



On 28/02/2017 13:48, Cornelia Huck wrote:
> On Mon, 27 Feb 2017 16:41:09 +0100
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>> On 27/02/2017 16:37, Cornelia Huck wrote:
>>> With the following applied (probably whitespace damaged), my guest
>>> starts:
>>>
>>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>>> index e487e36..28906e5 100644
>>> --- a/hw/virtio/virtio.c
>>> +++ b/hw/virtio/virtio.c
>>> @@ -287,6 +287,9 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val)
>>>  void virtio_queue_set_notification(VirtQueue *vq, int enable)
>>>  {
>>>      vq->notification = enable;
>>> +    if (!vq->vring.desc) {
>>> +        return;
>>> +    }
>>>  
>>>      rcu_read_lock();
>>>      if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
>>>
>>> Maybe the introduction of caches just exposed bugs that were already there
>>> (trying to muck with vring state for queues that have not been set up?)
>>
>> Yes, it did.  I had caught a few while writing the patches, but it does
>> feel like whack-a-mole...
>>
>> Paolo
>>
>>> Should we stick some asserts into the respective functions to help
>>> flush out the remaining bugs?
> 
> I've been staring at the code some more and I'm not really sure how to
> fix this properly.
> 
> The dataplane code tries to switch handlers for all virtqueues,
> regardless of whether they are configured or not. My hack above leaves the
> notification in a bit of an ambiguous state, as it cannot
> enable/disable notifications on the real queues.

What if virtio_queue_set_addr called virtio_queue_set_notification(vq,
vq->notification)?  In fact the RCU-protected part of
virtio_queue_set_notification could become its own function, something
like virtio_queue_update_notification or perhaps a better name.

virtio_queue_set_addr is only called by the virtio transports, not e.g.
on migration, so it seems to be the right spot.
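
A rough sketch of that split (names and placement only illustrative):

static void virtio_queue_update_notification(VirtQueue *vq)
{
    rcu_read_lock();
    /* ... the existing EVENT_IDX / NO_NOTIFY updates of the used ring ... */
    rcu_read_unlock();
}

void virtio_queue_set_notification(VirtQueue *vq, int enable)
{
    vq->notification = enable;
    if (!vq->vring.desc) {
        /* Nothing to update yet; the transport re-syncs once rings exist. */
        return;
    }
    virtio_queue_update_notification(vq);
}

with virtio_queue_set_addr() then calling
virtio_queue_update_notification(vq) after the ring address is stored.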

Paolo

> This is ok for this particular case, where we hand over from the bios
> (which only enables the first queue) to the Linux kernel (which uses
> multiple queues) - but with a virtio reset before additional queues are
> configured. I don't think the spec prohibits configuring extra queues
> (if present) on the fly, however...
> 


* Re: [Qemu-devel] segfault use VRingMemoryRegionCaches for avail and used ring vs num-queues
  2017-02-28 13:46         ` Paolo Bonzini
@ 2017-02-28 14:15           ` Cornelia Huck
  0 siblings, 0 replies; 8+ messages in thread
From: Cornelia Huck @ 2017-02-28 14:15 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Christian Borntraeger, Michael S. Tsirkin, qemu-devel, Halil Pasic

On Tue, 28 Feb 2017 14:46:10 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 28/02/2017 13:48, Cornelia Huck wrote:
> > On Mon, 27 Feb 2017 16:41:09 +0100
> > Paolo Bonzini <pbonzini@redhat.com> wrote:
> > 
> >> On 27/02/2017 16:37, Cornelia Huck wrote:
> >>> With the following applied (probably whitespace damaged), my guest
> >>> starts:
> >>>
> >>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >>> index e487e36..28906e5 100644
> >>> --- a/hw/virtio/virtio.c
> >>> +++ b/hw/virtio/virtio.c
> >>> @@ -287,6 +287,9 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val)
> >>>  void virtio_queue_set_notification(VirtQueue *vq, int enable)
> >>>  {
> >>>      vq->notification = enable;
> >>> +    if (!vq->vring.desc) {
> >>> +        return;
> >>> +    }
> >>>  
> >>>      rcu_read_lock();
> >>>      if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
> >>>
> >>> Maybe the introduction of caches just exposed bugs that were already there
> >>> (trying to muck with vring state for queues that have not been set up?)
> >>
> >> Yes, it did.  I had caught a few while writing the patches, but it does
> >> feel like whack-a-mole...
> >>
> >> Paolo
> >>
> >>> Should we stick some asserts into the respective functions to help
> >>> flush out the remaining bugs?
> > 
> > I've been staring at the code some more and I'm not really sure how to
> > fix this properly.
> > 
> > The dataplane code tries to switch handlers for all virtqueues,
> > regardless of whether they are configured or not. My hack above leaves the
> > notification in a bit of an ambiguous state, as it cannot
> > enable/disable notifications on the real queues.
> 
> What if virtio_queue_set_addr called virtio_queue_set_notification(vq,
> vq->notification)?  In fact the RCU-protected part of
> virtio_queue_set_notification could become its own function, something
> like virtio_queue_update_notification or perhaps a better name.
> 
> virtio_queue_set_addr is only called by the virtio transports, not e.g.
> on migration, so it seems to be the right spot.

And virtio_queue_set_rings() for virtio-1. This sounds like a plan;
I'll play with it a bit.


