qemu-devel.nongnu.org archive mirror
* [PATCH RFC 0/2] Add debug interface to kick/call on purpose
@ 2021-01-15  0:27 Dongli Zhang
  2021-01-15  0:27 ` [PATCH RFC 1/2] qdev: add debug interface to kick/call eventfd Dongli Zhang
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Dongli Zhang @ 2021-01-15  0:27 UTC
  To: qemu-devel
  Cc: berrange, ehabkost, mst, joe.jin, armbru, dgilbert, pbonzini,
	joao.m.martins

A virtio device/driver (e.g., vhost-scsi), and indeed any device including
e1000e, may hang due to the loss of an IRQ or the loss of a doorbell
register kick, e.g.,

https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html

In the above link, virtio-net was in trouble because the 'kick' did not
take effect (it was missed).

This RFC adds a new debug interface 'DeviceEvent' to DeviceClass to help
narrow down whether the issue is due to a lost irq/kick. So far the new
interface handles only two events: 'call' and 'kick'. Any device (e.g.,
e1000e or vhost-scsi) may implement it (e.g., via eventfd, MSI-X or legacy
IRQ).

The 'call' event lets the admin deliberately inject an IRQ for a specific
device (e.g., vhost-scsi) from QEMU/host into the VM, while the 'kick' event
lets the admin deliberately kick the doorbell of a specific device from the
QEMU/host side.

This interface can also be used as a workaround if a call/kick is lost due
to an issue in the virtualization software (e.g., kernel or QEMU).


Below is from live crash analysis. Initially, the 'kick' eventfd_ctx of
queue=3 has count=30. Suppose there are entries in the vring avail ring but
none in the used ring. We suspect this is because vhost-scsi was not
notified by the VM. To narrow down and analyze the issue, we use live crash
to dump the current eventfd counter for queue=3.

crash> eventfd_ctx ffffa10392537ac0
struct eventfd_ctx {
  kref = {
    refcount = {
      refs = {
        counter = 4
      }
    }
  }, 
  wqh = {
    lock = {
      {
        rlock = {
          raw_lock = {
            {
              val = {
                counter = 0
              }, 
              {
                locked = 0 '\000', 
                pending = 0 '\000'
              }, 
              {
                locked_pending = 0, 
                tail = 0
              }
            }
          }
        }
      }
    }, 
    head = {
      next = 0xffffa104ae40d360, 
      prev = 0xffffa104ae40d360
    }
  }, 
  count = 30,  -----> eventfd is 30 !!! 
  flags = 526336, 
  id = 26
}

Now we deliberately kick the doorbell of vhost-scsi queue=3 with this
interface, for diagnostic purposes.

{ "execute": "x-debug-device-event", "arguments": { "dev": "/machine/peripheral/vscsi0", "event": "kick", "queue": 3 } }


The counter increased to 31. If the hang is resolved by this, it indicates
that a software bug caused the 'kick' to be lost.

crash> eventfd_ctx ffffa10392537ac0
struct eventfd_ctx {
  kref = {
    refcount = {
      refs = {
        counter = 4
      }
    }
  },
  wqh = {
    lock = {
      {
        rlock = {
          raw_lock = {
            {
              val = {
                counter = 0
              },
              {
                locked = 0 '\000',
                pending = 0 '\000'
              },
              {
                locked_pending = 0,
                tail = 0
              }
            }
          }
        }
      }
    },
    head = {
      next = 0xffffa104ae40d360,
      prev = 0xffffa104ae40d360
    }
  },
  count = 31,  -----> eventfd incremented to 31 !!!
  flags = 526336,
  id = 26
}
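
The counter behaviour visible in the two dumps can be reproduced in user
space. The following is a minimal sketch, assuming Linux and eventfd(2); the
function names are illustrative and are not part of this series. Note that
crash only peeks at the kernel counter, whereas a user-space read() consumes
it. On x86-64 Linux, the flags value 526336 (0x80800) in the dumps matches
EFD_CLOEXEC | EFD_NONBLOCK.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Signal an eventfd once, as a 'kick' or 'call' would: adds 1 to the counter. */
int eventfd_signal_once(int fd)
{
    uint64_t one = 1;

    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Consume the counter: returns the accumulated value and resets it to 0
 * (non-semaphore mode). Returns 0 if nothing is pending on a non-blocking
 * eventfd.
 */
uint64_t eventfd_drain(int fd)
{
    uint64_t value = 0;

    if (read(fd, &value, sizeof(value)) != (ssize_t)sizeof(value)) {
        return 0;
    }
    return value;
}

/* Reproduce the 30 -> 31 transition: start at 30, signal once, read back. */
uint64_t demo_kick_counter(void)
{
    int fd = eventfd(30, EFD_CLOEXEC | EFD_NONBLOCK);
    uint64_t seen = 0;

    assert(fd >= 0);
    if (eventfd_signal_once(fd) == 0) {
        seen = eventfd_drain(fd);
    }
    close(fd);
    return seen;
}
```

Calling demo_kick_counter() returns the counter after one extra signal on
top of the initial 30.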


Only the interface for vhost-scsi is implemented since this is an RFC. I will
implement it for other types (e.g., eventfd or MSI-X) if the RFC is
considered reasonable.

Thank you very much!

Dongli Zhang





* [PATCH RFC 1/2] qdev: add debug interface to kick/call eventfd
  2021-01-15  0:27 [PATCH RFC 0/2] Add debug interface to kick/call on purpose Dongli Zhang
@ 2021-01-15  0:27 ` Dongli Zhang
  2021-01-19 22:20   ` Eric Blake
  2021-01-15  0:27 ` [PATCH RFC 2/2] vhost-scsi: implement DeviceEvent Dongli Zhang
  2021-01-15 10:27 ` [PATCH RFC 0/2] Add debug interface to kick/call on purpose Daniel P. Berrangé
  2 siblings, 1 reply; 7+ messages in thread
From: Dongli Zhang @ 2021-01-15  0:27 UTC
  To: qemu-devel
  Cc: berrange, ehabkost, mst, joe.jin, armbru, dgilbert, pbonzini,
	joao.m.martins

The virtio device/driver (e.g., vhost-scsi) may hang due to the lost of IRQ
or the lost of doorbell register kick, e.g.,

https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html

This patch adds a new debug interface 'DeviceEvent' to DeviceClass to help
narrow down if the issue is due to lost of irq/kick. So far the new
interface handles only two events: 'call' and 'kick'. Any device (e.g.,
e1000e or vhost-scsi) may implement (e.g., via eventfd, MSI-X or legacy
IRQ).

The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
on purpose by admin at QEMU/host side for a specific device.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
 hmp-commands.hx        | 14 ++++++++++++++
 include/hw/qdev-core.h |  6 ++++++
 include/monitor/hmp.h  |  1 +
 qapi/qdev.json         | 30 ++++++++++++++++++++++++++++++
 softmmu/qdev-monitor.c | 41 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 92 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 73e0832ea1..0fbb72568f 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1867,3 +1867,17 @@ ERST
         .flags      = "p",
     },
 
+    {
+        .name       = "x-debug-device-event",
+        .args_type  = "dev:s,event:s,queue:l",
+        .params     = "dev event queue",
+        .help       = "generate device event for a specific device queue",
+        .cmd        = hmp_x_debug_device_event,
+        .flags      = "p",
+    },
+
+SRST
+``x-debug-device-event`` *dev* *event* *queue*
+  Generate device event *event* for specific *queue* of *dev*
+ERST
+
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index bafc311bfa..83df3bab89 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -29,9 +29,14 @@ typedef enum DeviceCategory {
     DEVICE_CATEGORY_MAX
 } DeviceCategory;
 
+#define DEVICE_EVENT_CALL 1
+#define DEVICE_EVENT_KICK 2
+
 typedef void (*DeviceRealize)(DeviceState *dev, Error **errp);
 typedef void (*DeviceUnrealize)(DeviceState *dev);
 typedef void (*DeviceReset)(DeviceState *dev);
+typedef void (*DeviceEvent)(DeviceState *dev, int event, int queue,
+                            Error **errp);
 typedef void (*BusRealize)(BusState *bus, Error **errp);
 typedef void (*BusUnrealize)(BusState *bus);
 
@@ -132,6 +137,7 @@ struct DeviceClass {
     DeviceReset reset;
     DeviceRealize realize;
     DeviceUnrealize unrealize;
+    DeviceEvent event;
 
     /* device state */
     const VMStateDescription *vmsd;
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index ed2913fd18..ffb48fce06 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -133,5 +133,6 @@ void hmp_info_replay(Monitor *mon, const QDict *qdict);
 void hmp_replay_break(Monitor *mon, const QDict *qdict);
 void hmp_replay_delete_break(Monitor *mon, const QDict *qdict);
 void hmp_replay_seek(Monitor *mon, const QDict *qdict);
+void hmp_x_debug_device_event(Monitor *mon, const QDict *qdict);
 
 #endif
diff --git a/qapi/qdev.json b/qapi/qdev.json
index b83178220b..6fc7a5bfc1 100644
--- a/qapi/qdev.json
+++ b/qapi/qdev.json
@@ -124,3 +124,33 @@
 ##
 { 'event': 'DEVICE_DELETED',
   'data': { '*device': 'str', 'path': 'str' } }
+
+##
+# @x-debug-device-event:
+#
+# Generate device event for a specific device queue
+#
+# @dev: device path
+#
+# @event: event (e.g., kick or call) to trigger
+#
+# @queue: queue id
+#
+# Returns: Nothing on success
+#
+# Since: 5.3
+#
+# Notes: This is used to debug VM driver hang issue. The 'kick' event is to
+#        send notification to QEMU/vhost while the 'call' event is to
+#        interrupt VM on purpose.
+#
+# Example:
+#
+# -> { "execute": "x-debug-device_event",
+#      "arguments": { "dev": "/machine/peripheral/vscsi0", "event": "kick",
+#                     "queue": "1" } }
+# <- { "return": {} }
+#
+##
+{ 'command': 'x-debug-device-event',
+  'data': {'dev': 'str', 'event': 'str', 'queue': 'int'} }
diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index 8dc656becc..63dee5f1a6 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -915,6 +915,47 @@ void hmp_device_del(Monitor *mon, const QDict *qdict)
     hmp_handle_error(mon, err);
 }
 
+void qmp_x_debug_device_event(const char *dev, const char *event,
+                              int64_t queue, Error **errp)
+{
+    DeviceState *device = find_device_state(dev, NULL);
+    DeviceClass *dc;
+    int evt;
+
+    if (!device) {
+        error_setg(errp, "Device %s not found", dev);
+        return;
+    }
+
+    dc = DEVICE_GET_CLASS(device);
+    if (!dc->event) {
+        error_setg(errp, "device_event is not supported");
+        return;
+    }
+
+    if (!strcmp(event, "kick"))
+        evt = DEVICE_EVENT_KICK;
+    else if (!strcmp(event, "call"))
+        evt = DEVICE_EVENT_CALL;
+    else {
+        error_setg(errp, "Unsupported event %s", event);
+        return;
+    }
+
+    dc->event(device, evt, queue, errp);
+}
+
+void hmp_x_debug_device_event(Monitor *mon, const QDict *qdict)
+{
+    const char *dev = qdict_get_str(qdict, "dev");
+    const char *event = qdict_get_str(qdict, "event");
+    int queue = qdict_get_try_int(qdict, "queue", -1);
+    Error *err = NULL;
+
+    qmp_x_debug_device_event(dev, event, queue, &err);
+    hmp_handle_error(mon, err);
+}
+
 BlockBackend *blk_by_qdev_id(const char *id, Error **errp)
 {
     DeviceState *dev;
-- 
2.17.1
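
One detail of the handler above: qmp_x_debug_device_event() receives the
queue as int64_t, while the DeviceEvent callback takes an int. A value such
as 4294967297 could therefore be narrowed (to 1 on common compilers) before
any range check in the device sees it. The following is a minimal sketch of
a checked conversion; queue_index_to_int() is a hypothetical helper and is
not part of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: accept a 64-bit queue index only if it lies in
 * [0, num_queues), and only then store it as an int. The comparison is done
 * in 64 bits, so no truncation happens before the check.
 */
bool queue_index_to_int(int64_t queue, int num_queues, int *out)
{
    assert(out != NULL);

    if (queue < 0 || queue >= (int64_t)num_queues) {
        return false;
    }
    *out = (int)queue;
    return true;
}
```

With such a check, an out-of-range 64-bit value is rejected instead of being
silently mapped onto a valid queue.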




* [PATCH RFC 2/2] vhost-scsi: implement DeviceEvent
  2021-01-15  0:27 [PATCH RFC 0/2] Add debug interface to kick/call on purpose Dongli Zhang
  2021-01-15  0:27 ` [PATCH RFC 1/2] qdev: add debug interface to kick/call eventfd Dongli Zhang
@ 2021-01-15  0:27 ` Dongli Zhang
  2021-01-15 10:27 ` [PATCH RFC 0/2] Add debug interface to kick/call on purpose Daniel P. Berrangé
  2 siblings, 0 replies; 7+ messages in thread
From: Dongli Zhang @ 2021-01-15  0:27 UTC
  To: qemu-devel
  Cc: berrange, ehabkost, mst, joe.jin, armbru, dgilbert, pbonzini,
	joao.m.martins

This patch implements DeviceEvent for vhost-scsi. As an RFC, this patch only
considers the eventfd case, and only for vhost-scsi.

Below are examples for HMP and QMP.

(qemu) x-debug-device-event /machine/peripheral/vscsi0 kick 1

{ "execute": "x-debug-device-event",
  "arguments": { "dev": "/machine/peripheral/vscsi0",
                 "event": "kick",
                 "queue": 1 } }

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
 hw/virtio/vhost-scsi-pci.c | 10 ++++++++++
 hw/virtio/virtio.c         | 19 +++++++++++++++++++
 include/hw/virtio/virtio.h |  3 +++
 3 files changed, 32 insertions(+)

diff --git a/hw/virtio/vhost-scsi-pci.c b/hw/virtio/vhost-scsi-pci.c
index cb71a294fa..0236720868 100644
--- a/hw/virtio/vhost-scsi-pci.c
+++ b/hw/virtio/vhost-scsi-pci.c
@@ -62,12 +62,22 @@ static void vhost_scsi_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
     qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
 }
 
+static void vhost_scsi_pci_event(DeviceState *dev, int event, int queue,
+                                 Error **errp)
+{
+    VHostSCSIPCI *vscsi = VHOST_SCSI_PCI(dev);
+    DeviceState *vdev = DEVICE(&vscsi->vdev);
+
+    virtio_device_event_eventfd(vdev, event, queue, errp);
+}
+
 static void vhost_scsi_pci_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
     VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
     PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
     k->realize = vhost_scsi_pci_realize;
+    dc->event = vhost_scsi_pci_event;
     set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
     device_class_set_props(dc, vhost_scsi_pci_properties);
     pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index b308026596..d9168c4ac8 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3690,6 +3690,25 @@ static void virtio_device_unrealize(DeviceState *dev)
     vdev->bus_name = NULL;
 }
 
+void virtio_device_event_eventfd(DeviceState *dev, int event, int queue,
+                         Error **errp)
+{
+    struct VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    int num = virtio_get_num_queues(vdev);
+
+    if (queue < 0 || queue >= num) {
+        error_setg(errp, "Invalid queue %d", queue);
+        return;
+    }
+
+    VirtQueue *vq = &vdev->vq[queue];
+
+    if (event == DEVICE_EVENT_CALL)
+        event_notifier_set(&vq->guest_notifier);
+    else if (event == DEVICE_EVENT_KICK)
+        event_notifier_set(&vq->host_notifier);
+}
+
 static void virtio_device_free_virtqueues(VirtIODevice *vdev)
 {
     int i;
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index b7ece7a6a8..606ebdfb85 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -397,4 +397,7 @@ static inline bool virtio_device_disabled(VirtIODevice *vdev)
 bool virtio_legacy_allowed(VirtIODevice *vdev);
 bool virtio_legacy_check_disabled(VirtIODevice *vdev);
 
+void virtio_device_event_eventfd(DeviceState *dev, int event, int queue,
+                                 Error **errp);
+
 #endif
-- 
2.17.1
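
The dispatch in virtio_device_event_eventfd() can be modelled outside QEMU
with plain eventfds. The following is a minimal sketch, assuming Linux and
assuming both notifiers are eventfd-backed, as the commit message states.
All model_* names are hypothetical stand-ins, not QEMU APIs. Unlike the
patch, the model reports an unknown event as an error instead of ignoring
it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Same values as DEVICE_EVENT_CALL / DEVICE_EVENT_KICK in patch 1/2. */
enum { MODEL_EVENT_CALL = 1, MODEL_EVENT_KICK = 2 };

/* Hypothetical stand-in for a VirtQueue: two plain eventfds. */
struct model_queue {
    int guest_notifier; /* 'call': signalled to interrupt the guest */
    int host_notifier;  /* 'kick': signalled to wake the backend */
};

int model_queue_init(struct model_queue *q)
{
    q->guest_notifier = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
    q->host_notifier = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
    return (q->guest_notifier >= 0 && q->host_notifier >= 0) ? 0 : -1;
}

void model_queue_cleanup(struct model_queue *q)
{
    if (q->guest_notifier >= 0) {
        close(q->guest_notifier);
    }
    if (q->host_notifier >= 0) {
        close(q->host_notifier);
    }
}

/* Number of signals pending on fd (consumes them); 0 if none. */
uint64_t model_pending(int fd)
{
    uint64_t v = 0;

    return read(fd, &v, sizeof(v)) == (ssize_t)sizeof(v) ? v : 0;
}

/* Range-check the queue, then signal the notifier selected by the event. */
int model_device_event(struct model_queue *vqs, int num, int event, int queue)
{
    uint64_t one = 1;
    int fd;

    assert(vqs != NULL);
    if (queue < 0 || queue >= num) {
        return -1;
    }
    if (event == MODEL_EVENT_CALL) {
        fd = vqs[queue].guest_notifier;
    } else if (event == MODEL_EVENT_KICK) {
        fd = vqs[queue].host_notifier;
    } else {
        return -1;
    }
    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}
```

A 'kick' on queue 1 leaves exactly one pending signal on that queue's host
notifier and touches nothing else; an out-of-range queue is rejected before
any notifier is signalled.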




* Re: [PATCH RFC 0/2] Add debug interface to kick/call on purpose
  2021-01-15  0:27 [PATCH RFC 0/2] Add debug interface to kick/call on purpose Dongli Zhang
  2021-01-15  0:27 ` [PATCH RFC 1/2] qdev: add debug interface to kick/call eventfd Dongli Zhang
  2021-01-15  0:27 ` [PATCH RFC 2/2] vhost-scsi: implement DeviceEvent Dongli Zhang
@ 2021-01-15 10:27 ` Daniel P. Berrangé
  2021-01-18 16:59   ` Dr. David Alan Gilbert
  2 siblings, 1 reply; 7+ messages in thread
From: Daniel P. Berrangé @ 2021-01-15 10:27 UTC
  To: Dongli Zhang
  Cc: ehabkost, mst, joe.jin, armbru, qemu-devel, pbonzini,
	joao.m.martins, dgilbert

On Thu, Jan 14, 2021 at 04:27:28PM -0800, Dongli Zhang wrote:
> The virtio device/driver (e.g., vhost-scsi and indeed any device including
> e1000e) may hang due to the lost of IRQ or the lost of doorbell register
> kick, e.g.,
> 
> https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
> 
> The virtio-net was in trouble in above link because the 'kick' was not
> taking effect (missed).
> 
> This RFC adds a new debug interface 'DeviceEvent' to DeviceClass to help
> narrow down if the issue is due to lost of irq/kick. So far the new
> interface handles only two events: 'call' and 'kick'. Any device (e.g.,
> e1000e or vhost-scsi) may implement (e.g., via eventfd, MSI-X or legacy
> IRQ).
> 
> The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
> vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
> on purpose by admin at QEMU/host side for a specific device.

I'm really not convinced that we want to give admins the direct ability to
poke at internals of devices in a running QEMU. It feels like there is way
too much potential for the admin to make a situation far worse by doing
the wrong thing here, and people dealing with support tickets will have
no idea that the admin has been poking internals of the device and broken
it by doing something wrong.

You pointed to a bug where this could conceivably be useful, but
that's a one-time issue and should not be a common occurrence that justifies
making an official public API to poke at devices forever more IMHO.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH RFC 0/2] Add debug interface to kick/call on purpose
  2021-01-15 10:27 ` [PATCH RFC 0/2] Add debug interface to kick/call on purpose Daniel P. Berrangé
@ 2021-01-18 16:59   ` Dr. David Alan Gilbert
  2021-01-19 22:11     ` Dongli Zhang
  0 siblings, 1 reply; 7+ messages in thread
From: Dr. David Alan Gilbert @ 2021-01-18 16:59 UTC
  To: Daniel P. Berrangé
  Cc: ehabkost, mst, Dongli Zhang, joe.jin, qemu-devel, armbru,
	pbonzini, joao.m.martins

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Jan 14, 2021 at 04:27:28PM -0800, Dongli Zhang wrote:
> > The virtio device/driver (e.g., vhost-scsi and indeed any device including
> > e1000e) may hang due to the lost of IRQ or the lost of doorbell register
> > kick, e.g.,
> > 
> > https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
> > 
> > The virtio-net was in trouble in above link because the 'kick' was not
> > taking effect (missed).
> > 
> > This RFC adds a new debug interface 'DeviceEvent' to DeviceClass to help
> > narrow down if the issue is due to lost of irq/kick. So far the new
> > interface handles only two events: 'call' and 'kick'. Any device (e.g.,
> > e1000e or vhost-scsi) may implement (e.g., via eventfd, MSI-X or legacy
> > IRQ).
> > 
> > The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
> > vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
> > on purpose by admin at QEMU/host side for a specific device.
> 
> I'm really not convinced that we want to give admins the direct ability to
> poke at internals of devices in a running QEMU. It feels like there is way
> too much potential for the admin to make a situation far worse by doing
> the wrong thing here,

We already do have commands to write to an I/O port, and to inject MCEs for
example; is this that much different?

> and people dealing with support tickets will have
> no idea that the admin has been poking internals of the device and broken
> it by doing something wrong.

You could add a one-time log entry to say that this mischievous command
had been used.
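
A one-time log entry of this kind could look roughly like the sketch below.
It is standalone C rather than QEMU code: debug_event_note_first_use() and
the message text are hypothetical, and the plain static flag is not
thread-safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: log a warning the first time the debug command is
 * used and stay silent afterwards. Returns true only when it logged.
 */
bool debug_event_note_first_use(FILE *log, const char *dev, const char *event)
{
    static bool warned; /* zero-initialised: nothing logged yet */

    assert(log != NULL && dev != NULL && event != NULL);
    if (warned) {
        return false;
    }
    warned = true;
    fprintf(log,
            "warning: x-debug-device-event used (first use: '%s' on %s); "
            "device state may have been changed by the admin\n",
            event, dev);
    return true;
}
```

The single line in the log would be enough for someone handling a support
ticket to see that the device had been poked by hand.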

> You pointed to bug that hit where this could conceivably be useful, but
> that's a one time issue and should not a common occurrance that justifies
> making an official public API to poke at devices forever more IMHO.

I think where it might be practically useful is if you were debugging a
hung customer's VM and needed to find a way to get it to move again.
That's something I'm not familiar with on the virtio side;
mst - is this useful from a virtio side?

Dave

> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH RFC 0/2] Add debug interface to kick/call on purpose
  2021-01-18 16:59   ` Dr. David Alan Gilbert
@ 2021-01-19 22:11     ` Dongli Zhang
  0 siblings, 0 replies; 7+ messages in thread
From: Dongli Zhang @ 2021-01-19 22:11 UTC
  To: Dr. David Alan Gilbert, Daniel P. Berrangé
  Cc: ehabkost, mst, joe.jin, armbru, qemu-devel, pbonzini, joao.m.martins



On 1/18/21 8:59 AM, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
>> On Thu, Jan 14, 2021 at 04:27:28PM -0800, Dongli Zhang wrote:
>>> The virtio device/driver (e.g., vhost-scsi and indeed any device including
>>> e1000e) may hang due to the lost of IRQ or the lost of doorbell register
>>> kick, e.g.,
>>>
>>> https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
>>>
>>> The virtio-net was in trouble in above link because the 'kick' was not
>>> taking effect (missed).
>>>
>>> This RFC adds a new debug interface 'DeviceEvent' to DeviceClass to help
>>> narrow down if the issue is due to lost of irq/kick. So far the new
>>> interface handles only two events: 'call' and 'kick'. Any device (e.g.,
>>> e1000e or vhost-scsi) may implement (e.g., via eventfd, MSI-X or legacy
>>> IRQ).
>>>
>>> The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
>>> vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
>>> on purpose by admin at QEMU/host side for a specific device.
>>
>> I'm really not convinced that we want to give admins the direct ability to
>> poke at internals of devices in a running QEMU. It feels like there is way
>> too much potential for the admin to make a situation far worse by doing
>> the wrong thing here,
> 
> We already do have commands to write to an iport, and to inject MCEs for
> example; is this that much different?
> 
>> and people dealing with support tickets will have
>> no idea that the admin has been poking internals of the device and broken
>> it by doing something wrong.
> 
> You could add a one time log entry to say that this mischeivous command
> had been used.
> 
>> You pointed to bug that hit where this could conceivably be useful, but
>> that's a one time issue and should not a common occurrance that justifies
>> making an official public API to poke at devices forever more IMHO.
> 
> I think where it might be practically useful is if you were debugging a
> hung customers VM and need to find a way to get it to move again.
> THat's something I'm not familiar with on the virtio side;
> mst - is this useful from a virtio side?

BTW, the Linux kernel blk-mq has a similar idea/interface. Running the
command below will 'run' the block IO queue on purpose.

echo "kick" > /sys/kernel/debug/block/sda/state

It is helpful for diagnosis if we suspect that the IO stall is due to an
unknown race in which a 'run' of the queue was missed.
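
For tooling that cannot shell out, the echo above amounts to a single write
to the debugfs file. The following is a minimal sketch in C;
write_control_file() is an illustrative name, and the call needs root and a
mounted debugfs to succeed on the real path.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Write a short command string to a control file; returns 0 or -errno. */
int write_control_file(const char *path, const char *cmd)
{
    size_t len;
    ssize_t n;
    int err = 0;
    int fd;

    assert(path != NULL && cmd != NULL);
    len = strlen(cmd);

    fd = open(path, O_WRONLY);
    if (fd < 0) {
        return -errno;
    }
    n = write(fd, cmd, len);
    if (n < 0) {
        err = -errno;
    } else if ((size_t)n != len) {
        err = -EIO; /* short write: treat as an I/O error */
    }
    if (close(fd) < 0 && err == 0) {
        err = -errno;
    }
    return err;
}
```

The equivalent of the echo is
write_control_file("/sys/kernel/debug/block/sda/state", "kick\n").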

Dongli Zhang

> 
> Dave
> 
>> Regards,
>> Daniel
>> -- 



* Re: [PATCH RFC 1/2] qdev: add debug interface to kick/call eventfd
  2021-01-15  0:27 ` [PATCH RFC 1/2] qdev: add debug interface to kick/call eventfd Dongli Zhang
@ 2021-01-19 22:20   ` Eric Blake
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Blake @ 2021-01-19 22:20 UTC
  To: Dongli Zhang, qemu-devel
  Cc: berrange, ehabkost, mst, joe.jin, dgilbert, armbru, pbonzini,
	joao.m.martins

On 1/14/21 6:27 PM, Dongli Zhang wrote:
> The virtio device/driver (e.g., vhost-scsi) may hang due to the lost of IRQ

s/lost/loss/

> or the lost of doorbell register kick, e.g.,

and again

> 
> https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
> 
> This patch adds a new debug interface 'DeviceEvent' to DeviceClass to help
> narrow down if the issue is due to lost of irq/kick. So far the new

and again

> interface handles only two events: 'call' and 'kick'. Any device (e.g.,
> e1000e or vhost-scsi) may implement (e.g., via eventfd, MSI-X or legacy
> IRQ).
> 
> The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
> vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
> on purpose by admin at QEMU/host side for a specific device.
> 
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---

> +++ b/qapi/qdev.json
> @@ -124,3 +124,33 @@
>  ##
>  { 'event': 'DEVICE_DELETED',
>    'data': { '*device': 'str', 'path': 'str' } }
> +
> +##
> +# @x-debug-device-event:
> +#
> +# Generate device event for a specific device queue
> +#
> +# @dev: device path
> +#
> +# @event: event (e.g., kick or call) to trigger
> +#
> +# @queue: queue id
> +#
> +# Returns: Nothing on success
> +#
> +# Since: 5.3

The next release is named 6.0, not 5.3.

> +#
> +# Notes: This is used to debug VM driver hang issue. The 'kick' event is to
> +#        send notification to QEMU/vhost while the 'call' event is to
> +#        interrupt VM on purpose.
> +#
> +# Example:
> +#
> +# -> { "execute": "x-debug-device_event",
> +#      "arguments": { "dev": "/machine/peripheral/vscsi0", "event": "kick",
> +#                     "queue": "1" } }

Your example has queue typed as a string...

> +# <- { "return": {} }
> +#
> +##
> +{ 'command': 'x-debug-device-event',
> +  'data': {'dev': 'str', 'event': 'str', 'queue': 'int'} }

...which does not match its actual type as an integer.

event should be an enum type (the finite choice of 'kick' or 'call', and
introspectible if we add new choices in the future) rather than an
open-coded str.
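
On the C side, an enum with a name table replaces the strcmp() chain in
qmp_x_debug_device_event(). In QEMU the enum would normally be generated
from the QAPI schema; the hand-written sketch below only illustrates the
effect, and every identifier in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum covering the finite set of supported events. */
typedef enum DebugDeviceEvent {
    DEBUG_DEVICE_EVENT_CALL,
    DEBUG_DEVICE_EVENT_KICK,
    DEBUG_DEVICE_EVENT__MAX
} DebugDeviceEvent;

/* One name per enum value; adding an event means adding one row here. */
static const char *const debug_device_event_names[DEBUG_DEVICE_EVENT__MAX] = {
    [DEBUG_DEVICE_EVENT_CALL] = "call",
    [DEBUG_DEVICE_EVENT_KICK] = "kick",
};

/* Returns the enum value for name, or -1 if name is not a known event. */
int debug_device_event_parse(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < DEBUG_DEVICE_EVENT__MAX; i++) {
        if (strcmp(name, debug_device_event_names[i]) == 0) {
            return i;
        }
    }
    return -1;
}

/* Returns the wire name for ev, or NULL if ev is out of range. */
const char *debug_device_event_str(int ev)
{
    if (ev < 0 || ev >= DEBUG_DEVICE_EVENT__MAX) {
        return NULL;
    }
    return debug_device_event_names[ev];
}
```

With an enum in the schema, an unknown event name is rejected while the
command is parsed, so the handler never sees an arbitrary string.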

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


