From: Jason Wang <jasowang@redhat.com>
To: Dongli Zhang <dongli.zhang@oracle.com>,
	qemu-block@nongnu.org, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, fam@euphon.net, berrange@redhat.com,
	ehabkost@redhat.com, mst@redhat.com, joe.jin@oracle.com,
	armbru@redhat.com, dgilbert@redhat.com, stefanha@redhat.com,
	pbonzini@redhat.com, mreitz@redhat.com
Subject: Re: [PATCH 0/6] Add debug interface to kick/call on purpose
Date: Fri, 26 Mar 2021 15:24:46 +0800
Message-ID: <440216a8-821f-92dd-bc8b-fb2427bdc0e6@redhat.com>
In-Reply-To: <20210326054433.11762-1-dongli.zhang@oracle.com>


On 2021/3/26 1:44 PM, Dongli Zhang wrote:
> The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
> the loss of doorbell kick, e.g.,
>
> https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
>
> ... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
> fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").
>
> This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
> to help narrow down if the issue is due to loss of irq/kick. So far the new
> interface handles only two events: 'call' and 'kick'. Any device (e.g.,
> virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
> or legacy IRQ).
>
> The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
> vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
> on purpose by admin at QEMU/host side for a specific device.
>
>
> This interface can be used as a workaround if a call/kick is lost due to
> an issue in the virtualization software (e.g., kernel or QEMU).
>
> We may also implement the interface for VFIO PCI, e.g., to write to
> VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM
> on purpose. This is considered future work once the virtio part is done.
>
>
> Below is from live crash analysis. Initially, the queue=2 has count=15 for
> 'kick' eventfd_ctx. Suppose there is data in vring avail while there is no
> used available. We suspect this is because vhost-scsi was not notified by
> VM. In order to narrow down and analyze the issue, we use live crash to
> dump the current counter of eventfd for queue=2.
>
> crash> eventfd_ctx ffff8f67f6bbe700
> struct eventfd_ctx {
>    kref = {
>      refcount = {
>        refs = {
>          counter = 4
>        }
>      }
>    },
>    wqh = {
>      lock = {
>        {
>          rlock = {
>            raw_lock = {
>              val = {
>                counter = 0
>              }
>            }
>          }
>        }
>      },
>      head = {
>        next = 0xffff8f841dc08e18,
>        prev = 0xffff8f841dc08e18
>      }
>    },
>    count = 15, ---> eventfd is 15 !!!
>    flags = 526336
> }
>
> Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic
> with this interface.
>
> { "execute": "x-debug-device-event",
>    "arguments": { "dev": "/machine/peripheral/vscsi0",
>                   "event": "kick", "queue": 2 } }
>
> The counter is increased to 16. If the hang is then resolved, it
> indicates that something is wrong in software and the 'kick' was lost.
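
For readers who want to script the request quoted above, here is a minimal sketch that only builds the QMP message. Note that x-debug-device-event is the command proposed by this series, not an established QEMU command, and the helper name build_device_event_command is hypothetical. The 'kick' and 'call' event names and the argument layout are taken from the cover letter; sending the message over a QMP socket is left out.

```python
import json

# Event names taken from the cover letter; the command itself is only
# proposed by this patch series and may not exist in any QEMU release.
VALID_EVENTS = ("kick", "call")


def build_device_event_command(dev: str, event: str, queue: int) -> str:
    """Return the JSON text of the proposed x-debug-device-event request.

    This helper is hypothetical; it mirrors the example in the cover letter.
    """
    if event not in VALID_EVENTS:
        raise ValueError(f"unknown event {event!r}, expected one of {VALID_EVENTS}")
    if queue < 0:
        raise ValueError("queue index must be non-negative")
    request = {
        "execute": "x-debug-device-event",
        "arguments": {"dev": dev, "event": event, "queue": queue},
    }
    return json.dumps(request)


message = build_device_event_command("/machine/peripheral/vscsi0", "kick", 2)
print(message)
```

The device path and queue index are the ones from the quoted example; any other values would depend on the guest configuration.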


What do you mean by "software" here? It looks to me like you're testing
whether event_notifier_set() is called by virtio_queue_notify() here. If
so, I'm not sure how much value we could gain from a dedicated debug
interface like this, considering there are a lot of existing
general-purpose debugging methods such as tracing or gdb. I'd say the path
from virtio_queue_notify() to event_notifier_set() is only a very small
fraction of the virtqueue kick process, and it is unlikely to be buggy.
Considering that the ioeventfd is usually offloaded to KVM, it is more
likely that something is wrong in setting up the ioeventfd than here.
IRQs are even more complicated.

I don't think we would gain much by introducing a dedicated mechanism
for such a corner case.

Thanks


>
> crash> eventfd_ctx ffff8f67f6bbe700
> struct eventfd_ctx {
>    kref = {
>      refcount = {
>        refs = {
>          counter = 4
>        }
>      }
>    },
>    wqh = {
>      lock = {
>        {
>          rlock = {
>            raw_lock = {
>              val = {
>                counter = 0
>              }
>            }
>          }
>        }
>      },
>      head = {
>        next = 0xffff8f841dc08e18,
>        prev = 0xffff8f841dc08e18
>      }
>    },
>    count = 16, ---> eventfd incremented to 16 !!!
>    flags = 526336
> }
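
As a side note on the dumps above, the flags value 526336 can be decoded by hand. The sketch below assumes the x86-64 Linux values of the eventfd flag bits (EFD_NONBLOCK = 0o4000, EFD_CLOEXEC = 0o2000000, EFD_SEMAPHORE = 1); these constants are hard-coded here as an assumption and differ on some other architectures. The function name decode_eventfd_flags is hypothetical.

```python
# Assumed x86-64 Linux values; other architectures may use different bits.
EFD_SEMAPHORE = 0o1        # 1
EFD_NONBLOCK = 0o4000      # 2048, same bit as O_NONBLOCK
EFD_CLOEXEC = 0o2000000    # 524288, same bit as O_CLOEXEC

KNOWN_FLAGS = (
    ("EFD_SEMAPHORE", EFD_SEMAPHORE),
    ("EFD_NONBLOCK", EFD_NONBLOCK),
    ("EFD_CLOEXEC", EFD_CLOEXEC),
)


def decode_eventfd_flags(flags: int):
    """Split a raw flags word into known flag names and leftover bits."""
    names = []
    remaining = flags
    for name, bit in KNOWN_FLAGS:
        if flags & bit:
            names.append(name)
            remaining &= ~bit
    return names, remaining


# 526336 is the 'flags' value shown in both crash dumps.
names, unknown_bits = decode_eventfd_flags(526336)
print(names, hex(unknown_bits))
```

Under those assumptions, 526336 = 524288 + 2048, so the eventfd is non-blocking and close-on-exec, and it is not in semaphore mode.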
>
>
> Original RFC link:
>
> https://lists.nongnu.org/archive/html/qemu-devel/2021-01/msg03441.html
>
> Changed since RFC:
>    - add support for more virtio/vhost pci devices
>    - add log (toggled by DEBUG_VIRTIO_EVENT) to virtio.c to say that this
>      mischievous command had been used
>    - fix grammar error (s/lost/loss/)
>    - change version to 6.1
>    - fix incorrect example in qapi/qdev.json
>    - manage event types with enum/array, instead of hard coding
>
>
> Dongli Zhang (6):
>     qdev: introduce qapi/hmp command for kick/call event
>     virtio: introduce helper function for kick/call device event
>     virtio-blk-pci: implement device event interface for kick/call
>     virtio-scsi-pci: implement device event interface for kick/call
>     vhost-scsi-pci: implement device event interface for kick/call
>     virtio-net-pci: implement device event interface for kick/call
>
>   hmp-commands.hx                 | 14 ++++++++
>   hw/block/virtio-blk.c           |  9 +++++
>   hw/net/virtio-net.c             |  9 +++++
>   hw/scsi/vhost-scsi.c            |  6 ++++
>   hw/scsi/virtio-scsi.c           |  9 +++++
>   hw/virtio/vhost-scsi-pci.c      | 10 ++++++
>   hw/virtio/virtio-blk-pci.c      | 10 ++++++
>   hw/virtio/virtio-net-pci.c      | 10 ++++++
>   hw/virtio/virtio-scsi-pci.c     | 10 ++++++
>   hw/virtio/virtio.c              | 64 ++++++++++++++++++++++++++++++++++++
>   include/hw/qdev-core.h          |  9 +++++
>   include/hw/virtio/vhost-scsi.h  |  3 ++
>   include/hw/virtio/virtio-blk.h  |  2 ++
>   include/hw/virtio/virtio-net.h  |  3 ++
>   include/hw/virtio/virtio-scsi.h |  3 ++
>   include/hw/virtio/virtio.h      |  3 ++
>   include/monitor/hmp.h           |  1 +
>   qapi/qdev.json                  | 30 +++++++++++++++++
>   softmmu/qdev-monitor.c          | 56 +++++++++++++++++++++++++++++++
>   19 files changed, 261 insertions(+)
>
>
> I tested the following cases.
>
> - virtio-blk-pci (ioeventfd on/off, iothread, live migration)
> - virtio-scsi-pci (ioeventfd on/off)
> - vhost-scsi-pci
> - virtio-net-pci (ioeventfd on/off, vhost)
>
> Thank you very much!
>
> Dongli Zhang
>
>


