From: Jason Wang <jasowang@redhat.com>
To: Dongli Zhang <dongli.zhang@oracle.com>,
qemu-block@nongnu.org, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, fam@euphon.net, berrange@redhat.com,
ehabkost@redhat.com, mst@redhat.com, joe.jin@oracle.com,
armbru@redhat.com, dgilbert@redhat.com, stefanha@redhat.com,
pbonzini@redhat.com, mreitz@redhat.com
Subject: Re: [PATCH 0/6] Add debug interface to kick/call on purpose
Date: Fri, 26 Mar 2021 15:24:46 +0800
Message-ID: <440216a8-821f-92dd-bc8b-fb2427bdc0e6@redhat.com>
In-Reply-To: <20210326054433.11762-1-dongli.zhang@oracle.com>

On 2021/3/26 1:44 PM, Dongli Zhang wrote:
> The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
> the loss of doorbell kick, e.g.,
>
> https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
>
> ... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
> fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").
>
> This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
> to help narrow down if the issue is due to loss of irq/kick. So far the new
> interface handles only two events: 'call' and 'kick'. Any device (e.g.,
> virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
> or legacy IRQ).
>
> The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
> vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
> on purpose by admin at QEMU/host side for a specific device.
>
>
> This device can be used as a workaround if call/kick is lost due to
> virtualization software (e.g., kernel or QEMU) issue.
>
> We may also implement the interface for VFIO PCI, e.g., to write to
> VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM
> on purpose. This is considered future work once the virtio part is done.
>
>
> Below is from live crash analysis. Initially, the queue=2 has count=15 for
> 'kick' eventfd_ctx. Suppose there is data in vring avail while there is no
> used available. We suspect this is because vhost-scsi was not notified by
> VM. In order to narrow down and analyze the issue, we use live crash to
> dump the current counter of eventfd for queue=2.
>
> crash> eventfd_ctx ffff8f67f6bbe700
> struct eventfd_ctx {
> kref = {
> refcount = {
> refs = {
> counter = 4
> }
> }
> },
> wqh = {
> lock = {
> {
> rlock = {
> raw_lock = {
> val = {
> counter = 0
> }
> }
> }
> }
> },
> head = {
> next = 0xffff8f841dc08e18,
> prev = 0xffff8f841dc08e18
> }
> },
> count = 15, ---> eventfd is 15 !!!
> flags = 526336
> }
>
> Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic
> with this interface.
>
> { "execute": "x-debug-device-event",
> "arguments": { "dev": "/machine/peripheral/vscsi0",
> "event": "kick", "queue": 2 } }
>
> The counter is increased to 16. Suppose the hang issue is resolved, it
> indicates something bad is in software that the 'kick' is lost.

What do you mean by "software" here? It looks to me like you're testing
whether event_notifier_set() is called by virtio_queue_notify(). If so,
I'm not sure how much value we could gain from a dedicated debug
interface like this, considering there are already a lot of existing
general-purpose debugging methods such as tracing or gdb. I'd say the
path from virtio_queue_notify() to event_notifier_set() is only a very
small fraction of the virtqueue kick process, and it is unlikely to be
buggy.

Considering that the ioeventfd is usually offloaded to KVM, it's more
likely that something is wrong in setting up the ioeventfd than here.
Irq is even more complicated.

I think we would not gain much by introducing a dedicated mechanism
for such a corner case.

Thanks

>
> crash> eventfd_ctx ffff8f67f6bbe700
> struct eventfd_ctx {
> kref = {
> refcount = {
> refs = {
> counter = 4
> }
> }
> },
> wqh = {
> lock = {
> {
> rlock = {
> raw_lock = {
> val = {
> counter = 0
> }
> }
> }
> }
> },
> head = {
> next = 0xffff8f841dc08e18,
> prev = 0xffff8f841dc08e18
> }
> },
> count = 16, ---> eventfd incremented to 16 !!!
> flags = 526336
> }
>
>
> Original RFC link:
>
> https://lists.nongnu.org/archive/html/qemu-devel/2021-01/msg03441.html
>
> Changed since RFC:
> - add support for more virtio/vhost pci devices
> - add log (toggled by DEBUG_VIRTIO_EVENT) to virtio.c to say that this
> mischievous command had been used
> - fix grammar error (s/lost/loss/)
> - change version to 6.1
> - fix incorrect example in qapi/qdev.json
> - manage event types with enum/array, instead of hard coding
>
>
> Dongli Zhang (6):
> qdev: introduce qapi/hmp command for kick/call event
> virtio: introduce helper function for kick/call device event
> virtio-blk-pci: implement device event interface for kick/call
> virtio-scsi-pci: implement device event interface for kick/call
> vhost-scsi-pci: implement device event interface for kick/call
> virtio-net-pci: implement device event interface for kick/call
>
> hmp-commands.hx | 14 ++++++++
> hw/block/virtio-blk.c | 9 +++++
> hw/net/virtio-net.c | 9 +++++
> hw/scsi/vhost-scsi.c | 6 ++++
> hw/scsi/virtio-scsi.c | 9 +++++
> hw/virtio/vhost-scsi-pci.c | 10 ++++++
> hw/virtio/virtio-blk-pci.c | 10 ++++++
> hw/virtio/virtio-net-pci.c | 10 ++++++
> hw/virtio/virtio-scsi-pci.c | 10 ++++++
> hw/virtio/virtio.c | 64 ++++++++++++++++++++++++++++++++++++
> include/hw/qdev-core.h | 9 +++++
> include/hw/virtio/vhost-scsi.h | 3 ++
> include/hw/virtio/virtio-blk.h | 2 ++
> include/hw/virtio/virtio-net.h | 3 ++
> include/hw/virtio/virtio-scsi.h | 3 ++
> include/hw/virtio/virtio.h | 3 ++
> include/monitor/hmp.h | 1 +
> qapi/qdev.json | 30 +++++++++++++++++
> softmmu/qdev-monitor.c | 56 +++++++++++++++++++++++++++++++
> 19 files changed, 261 insertions(+)
>
>
> I did tests with below cases.
>
> - virtio-blk-pci (ioeventfd on/off, iothread, live migration)
> - virtio-scsi-pci (ioeventfd on/off)
> - vhost-scsi-pci
> - virtio-net-pci (ioeventfd on/off, vhost)
>
> Thank you very much!
>
> Dongli Zhang
>
>