qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/7] Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread
@ 2020-07-15 15:01 Maxim Levitsky
  2020-07-15 15:01 ` [PATCH v3 1/7] scsi/scsi_bus: switch search direction in scsi_device_find Maxim Levitsky
                   ` (9 more replies)
  0 siblings, 10 replies; 14+ messages in thread
From: Maxim Levitsky @ 2020-07-15 15:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Maxim Levitsky,
	Paolo Bonzini

Hi!

This is a patch series that is a result of my discussion with Paulo on
how to correctly fix the root cause of the BZ #1812399.

The root cause of this bug is the fact that IO thread is running mostly
unlocked versus main thread on which device hotplug is done.

qdev_device_add first creates the device object, then places it on the bus,
and only then realizes it.

However some drivers and currently only virtio-scsi enumerate its child bus
devices on each request that is received from the guest and that can happen on the IO
thread.

Thus we have a window when new device is on the bus but not realized and can be accessed
by the virtio-scsi driver in that state.

Fix that by doing two things:

1. Add partial RCU protection to the list of a bus's child devices.
This allows the scsi IO thread to safely enumerate the child devices
while it races with the hotplug placing the device on the bus.

2. Make the virtio-scsi driver check .realized property of the scsi device
and avoid touching the device if it isn't

Note that in the particular bug report the issue wasn't a race but rather due
to combination of things, the .realize code in the middle managed to trigger IO on the virtqueue
which caused the virtio-scsi driver to access the half realized device. However
since this can happen as well with real IO thread, this patch series was done,
which fixes this as well.

Changes from V1:
  * Patch 2 is new, as suggested by Stefan, added drain_call_rcu() to fix the failing unit test,
    make check pass now

  * Patches 6,7 are new as well: I added scsi_device_get as suggested by Stefan as well, although
    this is more a refactoring that anything else as it doesn't solve
    an existing race.

  * Addressed most of the review feedback from V1
    - still need to decide if we need QTAILQ_FOREACH_WITH_RCU_READ_LOCK

Changes from V2:

  * No longer RFC
  * Addressed most of the feedback from Stefan
  * Fixed reference count leak in patch 7 when device is about to be unrealized
  * Better testing

This series was tested by adding a virtio-scsi drive with iothread,
then running fio stress job in the guest in a loop, and then adding/removing
the scsi drive on the host in the loop.
This test was failing usually on 1st iteration withouth this patch series,
and now it seems to work smoothly.

Best regards,
	Maxim Levitsky

Maxim Levitsky (7):
  scsi/scsi_bus: switch search direction in scsi_device_find
  Implement drain_call_rcu and use it in hmp_device_del
  device-core: use RCU for list of childs of a bus
  device-core: use atomic_set on .realized property
  virtio-scsi: don't touch scsi devices that are not yet realized or
    about to be un-realized
  scsi: Add scsi_device_get
  virtio-scsi: use scsi_device_get

 hw/core/bus.c          | 28 +++++++++++++--------
 hw/core/qdev.c         | 56 +++++++++++++++++++++++++++++++-----------
 hw/scsi/scsi-bus.c     | 48 +++++++++++++++++++++++++++++++-----
 hw/scsi/virtio-scsi.c  | 47 ++++++++++++++++++++++++++++-------
 include/hw/qdev-core.h | 11 +++++++++
 include/hw/scsi/scsi.h |  2 ++
 include/qemu/rcu.h     |  1 +
 qdev-monitor.c         | 22 +++++++++++++++++
 util/rcu.c             | 55 +++++++++++++++++++++++++++++++++++++++++
 9 files changed, 230 insertions(+), 40 deletions(-)

-- 
2.26.2




^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v3 1/7] scsi/scsi_bus: switch search direction in scsi_device_find
  2020-07-15 15:01 [PATCH v3 0/7] Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread Maxim Levitsky
@ 2020-07-15 15:01 ` Maxim Levitsky
  2020-07-15 15:01 ` [PATCH v3 2/7] Implement drain_call_rcu and use it in hmp_device_del Maxim Levitsky
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Maxim Levitsky @ 2020-07-15 15:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Maxim Levitsky,
	Stefan Hajnoczi, Paolo Bonzini

This change will allow us to convert the bus children list to RCU,
while not changing the logic of this function

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/scsi/scsi-bus.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index df65cc2223..f8adfbc2a5 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -1572,7 +1572,7 @@ SCSIDevice *scsi_device_find(SCSIBus *bus, int channel, int id, int lun)
     BusChild *kid;
     SCSIDevice *target_dev = NULL;
 
-    QTAILQ_FOREACH_REVERSE(kid, &bus->qbus.children, sibling) {
+    QTAILQ_FOREACH(kid, &bus->qbus.children, sibling) {
         DeviceState *qdev = kid->child;
         SCSIDevice *dev = SCSI_DEVICE(qdev);
 
@@ -1580,7 +1580,15 @@ SCSIDevice *scsi_device_find(SCSIBus *bus, int channel, int id, int lun)
             if (dev->lun == lun) {
                 return dev;
             }
-            target_dev = dev;
+
+            /*
+             * If we don't find exact match (channel/bus/lun),
+             * we will return the first device which matches channel/bus
+             */
+
+            if (!target_dev) {
+                target_dev = dev;
+            }
         }
     }
     return target_dev;
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 2/7] Implement drain_call_rcu and use it in hmp_device_del
  2020-07-15 15:01 [PATCH v3 0/7] Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread Maxim Levitsky
  2020-07-15 15:01 ` [PATCH v3 1/7] scsi/scsi_bus: switch search direction in scsi_device_find Maxim Levitsky
@ 2020-07-15 15:01 ` Maxim Levitsky
  2020-07-15 15:01 ` [PATCH v3 3/7] device-core: use RCU for list of childs of a bus Maxim Levitsky
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Maxim Levitsky @ 2020-07-15 15:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Stefan Hajnoczi,
	Maxim Levitsky, Stefan Hajnoczi, Paolo Bonzini

This allows to preserve the semantics of hmp_device_del,
that the device is deleted immediatly which was changed by previos
patch that delayed this to RCU callback

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Suggested-by: Stefan Hajnoczi <stefanha@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/qemu/rcu.h |  1 +
 qdev-monitor.c     | 22 +++++++++++++++++++
 util/rcu.c         | 55 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 78 insertions(+)

diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h
index 570aa603eb..0e375ebe13 100644
--- a/include/qemu/rcu.h
+++ b/include/qemu/rcu.h
@@ -133,6 +133,7 @@ struct rcu_head {
 };
 
 extern void call_rcu1(struct rcu_head *head, RCUCBFunc *func);
+extern void drain_call_rcu(void);
 
 /* The operands of the minus operator must have the same type,
  * which must be the one that we specify in the cast.
diff --git a/qdev-monitor.c b/qdev-monitor.c
index 71ebce19df..a889fbc509 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -803,6 +803,18 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
         return;
     }
     dev = qdev_device_add(opts, errp);
+
+    /*
+     * Drain all pending RCU callbacks. This is done because
+     * some bus related operations can delay a device removal
+     * (in this case this can happen if device is added and then
+     * removed due to a configuration error)
+     * to a RCU callback, but user might expect that this interface
+     * will finish its job completely once qmp command returns result
+     * to the user
+     */
+    drain_call_rcu();
+
     if (!dev) {
         qemu_opts_del(opts);
         return;
@@ -894,6 +906,16 @@ void qmp_device_del(const char *id, Error **errp)
         }
 
         qdev_unplug(dev, errp);
+
+        /*
+         * Drain all pending RCU callbacks. This is done because
+         * some bus related operations can delay a device removal
+         * to a RCU callback, but user might expect that this interface
+         * will finish its job completely once qmp command returns result
+         * to the user
+         */
+
+        drain_call_rcu();
     }
 }
 
diff --git a/util/rcu.c b/util/rcu.c
index 60a37f72c3..c4fefa9333 100644
--- a/util/rcu.c
+++ b/util/rcu.c
@@ -293,6 +293,61 @@ void call_rcu1(struct rcu_head *node, void (*func)(struct rcu_head *node))
     qemu_event_set(&rcu_call_ready_event);
 }
 
+
+struct rcu_drain {
+    struct rcu_head rcu;
+    QemuEvent drain_complete_event;
+};
+
+static void drain_rcu_callback(struct rcu_head *node)
+{
+    struct rcu_drain *event = (struct rcu_drain *)node;
+    qemu_event_set(&event->drain_complete_event);
+}
+
+/*
+ * This function ensures that all pending RCU callbacks
+ * on the current thread are done executing
+
+ * drops big qemu lock during the wait to allow RCU thread
+ * to process the callbacks
+ *
+ */
+
+void drain_call_rcu(void)
+{
+    struct rcu_drain rcu_drain;
+    bool locked = qemu_mutex_iothread_locked();
+
+    memset(&rcu_drain, 0, sizeof(struct rcu_drain));
+    qemu_event_init(&rcu_drain.drain_complete_event, false);
+
+    if (locked) {
+        qemu_mutex_unlock_iothread();
+    }
+
+
+    /*
+     * RCU callbacks are invoked in the same order as in which they
+     * are registered, thus we can be sure that when 'drain_rcu_callback'
+     * is called, all RCU callbacks that were registered on this thread
+     * prior to calling this function are completed.
+     *
+     * Note that since we have only one global queue of the RCU callbacks,
+     * we also end up waiting for most of RCU callbacks that were registered
+     * on the other threads, but this is a side effect that shoudn't be
+     * assumed.
+     */
+
+    call_rcu1(&rcu_drain.rcu, drain_rcu_callback);
+    qemu_event_wait(&rcu_drain.drain_complete_event);
+
+    if (locked) {
+        qemu_mutex_lock_iothread();
+    }
+
+}
+
 void rcu_register_thread(void)
 {
     assert(rcu_reader.ctr == 0);
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 3/7] device-core: use RCU for list of childs of a bus
  2020-07-15 15:01 [PATCH v3 0/7] Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread Maxim Levitsky
  2020-07-15 15:01 ` [PATCH v3 1/7] scsi/scsi_bus: switch search direction in scsi_device_find Maxim Levitsky
  2020-07-15 15:01 ` [PATCH v3 2/7] Implement drain_call_rcu and use it in hmp_device_del Maxim Levitsky
@ 2020-07-15 15:01 ` Maxim Levitsky
  2020-08-05 10:32   ` Stefan Hajnoczi
  2020-07-15 15:01 ` [PATCH v3 4/7] device-core: use atomic_set on .realized property Maxim Levitsky
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 14+ messages in thread
From: Maxim Levitsky @ 2020-07-15 15:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Maxim Levitsky,
	Paolo Bonzini

This fixes the race between device emulation code that tries to find
a child device to dispatch the request to (e.g a scsi disk),
and hotplug of a new device to that bus.

Note that this doesn't convert all the readers of the list
but only these that might go over that list without BQL held.

This is a very small first step to make this code thread safe.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 hw/core/bus.c          | 28 ++++++++++++++++++----------
 hw/core/qdev.c         | 37 +++++++++++++++++++++++--------------
 hw/scsi/scsi-bus.c     | 17 ++++++++++++++---
 hw/scsi/virtio-scsi.c  |  6 +++++-
 include/hw/qdev-core.h |  9 +++++++++
 5 files changed, 69 insertions(+), 28 deletions(-)

diff --git a/hw/core/bus.c b/hw/core/bus.c
index 6b987b6946..385eb3ad5a 100644
--- a/hw/core/bus.c
+++ b/hw/core/bus.c
@@ -49,12 +49,14 @@ int qbus_walk_children(BusState *bus,
         }
     }
 
-    QTAILQ_FOREACH(kid, &bus->children, sibling) {
-        err = qdev_walk_children(kid->child,
-                                 pre_devfn, pre_busfn,
-                                 post_devfn, post_busfn, opaque);
-        if (err < 0) {
-            return err;
+    WITH_RCU_READ_LOCK_GUARD() {
+        QTAILQ_FOREACH_RCU(kid, &bus->children, sibling) {
+            err = qdev_walk_children(kid->child,
+                                     pre_devfn, pre_busfn,
+                                     post_devfn, post_busfn, opaque);
+            if (err < 0) {
+                return err;
+            }
         }
     }
 
@@ -90,9 +92,13 @@ static void bus_reset_child_foreach(Object *obj, ResettableChildCallback cb,
     BusState *bus = BUS(obj);
     BusChild *kid;
 
-    QTAILQ_FOREACH(kid, &bus->children, sibling) {
+    rcu_read_lock();
+
+    QTAILQ_FOREACH_RCU(kid, &bus->children, sibling) {
         cb(OBJECT(kid->child), opaque, type);
     }
+
+    rcu_read_unlock();
 }
 
 static void qbus_init(BusState *bus, DeviceState *parent, const char *name)
@@ -194,9 +200,11 @@ static void bus_set_realized(Object *obj, bool value, Error **errp)
 
         /* TODO: recursive realization */
     } else if (!value && bus->realized) {
-        QTAILQ_FOREACH(kid, &bus->children, sibling) {
-            DeviceState *dev = kid->child;
-            qdev_unrealize(dev);
+        WITH_RCU_READ_LOCK_GUARD() {
+            QTAILQ_FOREACH_RCU(kid, &bus->children, sibling) {
+                DeviceState *dev = kid->child;
+                qdev_unrealize(dev);
+            }
         }
         if (bc->unrealize) {
             bc->unrealize(bus);
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 01796823b4..ee69b11d11 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -51,6 +51,12 @@ const VMStateDescription *qdev_get_vmsd(DeviceState *dev)
     return dc->vmsd;
 }
 
+static void bus_free_bus_child(BusChild *kid)
+{
+    object_unref(OBJECT(kid->child));
+    g_free(kid);
+}
+
 static void bus_remove_child(BusState *bus, DeviceState *child)
 {
     BusChild *kid;
@@ -60,15 +66,16 @@ static void bus_remove_child(BusState *bus, DeviceState *child)
             char name[32];
 
             snprintf(name, sizeof(name), "child[%d]", kid->index);
-            QTAILQ_REMOVE(&bus->children, kid, sibling);
+            QTAILQ_REMOVE_RCU(&bus->children, kid, sibling);
 
             bus->num_children--;
 
             /* This gives back ownership of kid->child back to us.  */
             object_property_del(OBJECT(bus), name);
-            object_unref(OBJECT(kid->child));
-            g_free(kid);
-            return;
+
+            /* free the bus kid, when it is safe to do so*/
+            call_rcu(kid, bus_free_bus_child, rcu);
+            break;
         }
     }
 }
@@ -83,7 +90,7 @@ static void bus_add_child(BusState *bus, DeviceState *child)
     kid->child = child;
     object_ref(OBJECT(kid->child));
 
-    QTAILQ_INSERT_HEAD(&bus->children, kid, sibling);
+    QTAILQ_INSERT_HEAD_RCU(&bus->children, kid, sibling);
 
     /* This transfers ownership of kid->child to the property.  */
     snprintf(name, sizeof(name), "child[%d]", kid->index);
@@ -689,17 +696,19 @@ DeviceState *qdev_find_recursive(BusState *bus, const char *id)
     DeviceState *ret;
     BusState *child;
 
-    QTAILQ_FOREACH(kid, &bus->children, sibling) {
-        DeviceState *dev = kid->child;
+    WITH_RCU_READ_LOCK_GUARD() {
+        QTAILQ_FOREACH_RCU(kid, &bus->children, sibling) {
+            DeviceState *dev = kid->child;
 
-        if (dev->id && strcmp(dev->id, id) == 0) {
-            return dev;
-        }
+            if (dev->id && strcmp(dev->id, id) == 0) {
+                return dev;
+            }
 
-        QLIST_FOREACH(child, &dev->child_bus, sibling) {
-            ret = qdev_find_recursive(child, id);
-            if (ret) {
-                return ret;
+            QLIST_FOREACH(child, &dev->child_bus, sibling) {
+                ret = qdev_find_recursive(child, id);
+                if (ret) {
+                    return ret;
+                }
             }
         }
     }
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index f8adfbc2a5..92d412b65c 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -400,7 +400,10 @@ static bool scsi_target_emulate_report_luns(SCSITargetReq *r)
     id = r->req.dev->id;
     found_lun0 = false;
     n = 0;
-    QTAILQ_FOREACH(kid, &r->req.bus->qbus.children, sibling) {
+
+    rcu_read_lock();
+
+    QTAILQ_FOREACH_RCU(kid, &r->req.bus->qbus.children, sibling) {
         DeviceState *qdev = kid->child;
         SCSIDevice *dev = SCSI_DEVICE(qdev);
 
@@ -421,7 +424,7 @@ static bool scsi_target_emulate_report_luns(SCSITargetReq *r)
     memset(r->buf, 0, len);
     stl_be_p(&r->buf[0], n);
     i = found_lun0 ? 8 : 16;
-    QTAILQ_FOREACH(kid, &r->req.bus->qbus.children, sibling) {
+    QTAILQ_FOREACH_RCU(kid, &r->req.bus->qbus.children, sibling) {
         DeviceState *qdev = kid->child;
         SCSIDevice *dev = SCSI_DEVICE(qdev);
 
@@ -430,6 +433,9 @@ static bool scsi_target_emulate_report_luns(SCSITargetReq *r)
             i += 8;
         }
     }
+
+    rcu_read_unlock();
+
     assert(i == n + 8);
     r->len = len;
     return true;
@@ -1572,12 +1578,15 @@ SCSIDevice *scsi_device_find(SCSIBus *bus, int channel, int id, int lun)
     BusChild *kid;
     SCSIDevice *target_dev = NULL;
 
-    QTAILQ_FOREACH(kid, &bus->qbus.children, sibling) {
+    rcu_read_lock();
+
+    QTAILQ_FOREACH_RCU(kid, &bus->qbus.children, sibling) {
         DeviceState *qdev = kid->child;
         SCSIDevice *dev = SCSI_DEVICE(qdev);
 
         if (dev->channel == channel && dev->id == id) {
             if (dev->lun == lun) {
+                rcu_read_unlock();
                 return dev;
             }
 
@@ -1591,6 +1600,8 @@ SCSIDevice *scsi_device_find(SCSIBus *bus, int channel, int id, int lun)
             }
         }
     }
+
+    rcu_read_unlock();
     return target_dev;
 }
 
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index b49775269e..d3cfc8dfb6 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -367,12 +367,16 @@ static int virtio_scsi_do_tmf(VirtIOSCSI *s, VirtIOSCSIReq *req)
     case VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET:
         target = req->req.tmf.lun[1];
         s->resetting++;
-        QTAILQ_FOREACH(kid, &s->bus.qbus.children, sibling) {
+
+        rcu_read_lock();
+        QTAILQ_FOREACH_RCU(kid, &s->bus.qbus.children, sibling) {
              d = SCSI_DEVICE(kid->child);
              if (d->channel == 0 && d->id == target) {
                 qdev_reset_all(&d->qdev);
              }
         }
+        rcu_read_unlock();
+
         s->resetting--;
         break;
 
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index fe78073c70..e252640c1a 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -3,6 +3,8 @@
 
 #include "qemu/queue.h"
 #include "qemu/bitmap.h"
+#include "qemu/rcu.h"
+#include "qemu/rcu_queue.h"
 #include "qom/object.h"
 #include "hw/hotplug.h"
 #include "hw/resettable.h"
@@ -230,6 +232,7 @@ struct BusClass {
 };
 
 typedef struct BusChild {
+    struct rcu_head rcu;
     DeviceState *child;
     int index;
     QTAILQ_ENTRY(BusChild) sibling;
@@ -250,6 +253,12 @@ struct BusState {
     int max_index;
     bool realized;
     int num_children;
+
+    /*
+     * children is a RCU QTAILQ, thus readers must use RCU to access it,
+     * and writers must hold the big qemu lock
+     */
+
     QTAILQ_HEAD(, BusChild) children;
     QLIST_ENTRY(BusState) sibling;
     ResettableState reset;
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 4/7] device-core: use atomic_set on .realized property
  2020-07-15 15:01 [PATCH v3 0/7] Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread Maxim Levitsky
                   ` (2 preceding siblings ...)
  2020-07-15 15:01 ` [PATCH v3 3/7] device-core: use RCU for list of childs of a bus Maxim Levitsky
@ 2020-07-15 15:01 ` Maxim Levitsky
  2020-08-05 10:33   ` Stefan Hajnoczi
  2020-07-15 15:01 ` [PATCH v3 5/7] virtio-scsi: don't touch scsi devices that are not yet realized or about to be un-realized Maxim Levitsky
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 14+ messages in thread
From: Maxim Levitsky @ 2020-07-15 15:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Maxim Levitsky,
	Paolo Bonzini

Some code might race with placement of new devices on a bus.
We currently first place a (unrealized) device on the bus
and then realize it.

As a workaround, users that scan the child device list, can
check the realized property to see if it is safe to access such a device.
Use an atomic write here too to aid with this.

A separate discussion is what to do with devices that are unrealized:
It looks like for this case we only call the hotplug handler's unplug
callback and its up to it to unrealize the device.
An atomic operation doesn't cause harm for this code path though.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 hw/core/qdev.c         | 19 ++++++++++++++++++-
 include/hw/qdev-core.h |  2 ++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index ee69b11d11..2ec4971aec 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -966,7 +966,25 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
             }
        }
 
+       atomic_store_release(&dev->realized, value);
+
     } else if (!value && dev->realized) {
+
+        /*
+         * Change the value so that any concurrent users are aware
+         * that the device is going to be unrealized
+         *
+         * TODO: change .realized property to enum that states
+         * each phase of the device realization/unrealization
+         */
+
+        atomic_set(&dev->realized, value);
+        /*
+         * execute full memory barrier to ensure that concurrent users
+         * see this update prior to any other changes to the device
+         */
+        smp_mb();
+
         QLIST_FOREACH(bus, &dev->child_bus, sibling) {
             qbus_unrealize(bus);
         }
@@ -981,7 +999,6 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
     }
 
     assert(local_err == NULL);
-    dev->realized = value;
     return;
 
 child_realize_fail:
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index e252640c1a..5faa715449 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -165,6 +165,8 @@ struct NamedClockList {
 /**
  * DeviceState:
  * @realized: Indicates whether the device has been fully constructed.
+ *            When accessed outsize big qemu lock, must be accessed with
+ *            atomic_load_acquire()
  * @reset: ResettableState for the device; handled by Resettable interface.
  *
  * This structure should not be accessed directly.  We declare it here
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 5/7] virtio-scsi: don't touch scsi devices that are not yet realized or about to be un-realized
  2020-07-15 15:01 [PATCH v3 0/7] Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread Maxim Levitsky
                   ` (3 preceding siblings ...)
  2020-07-15 15:01 ` [PATCH v3 4/7] device-core: use atomic_set on .realized property Maxim Levitsky
@ 2020-07-15 15:01 ` Maxim Levitsky
  2020-07-15 15:01 ` [PATCH v3 6/7] scsi: Add scsi_device_get Maxim Levitsky
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Maxim Levitsky @ 2020-07-15 15:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Maxim Levitsky,
	Stefan Hajnoczi, Paolo Bonzini

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1812399

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/scsi/virtio-scsi.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index d3cfc8dfb6..18952e0f1c 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -35,13 +35,30 @@ static inline int virtio_scsi_get_lun(uint8_t *lun)
 
 static inline SCSIDevice *virtio_scsi_device_find(VirtIOSCSI *s, uint8_t *lun)
 {
+    SCSIDevice *device = NULL;
+
     if (lun[0] != 1) {
         return NULL;
     }
     if (lun[2] != 0 && !(lun[2] >= 0x40 && lun[2] < 0x80)) {
         return NULL;
     }
-    return scsi_device_find(&s->bus, 0, lun[1], virtio_scsi_get_lun(lun));
+
+    device = scsi_device_find(&s->bus, 0, lun[1], virtio_scsi_get_lun(lun));
+
+    /*
+     * This function might run on the IO thread and we might race against
+     * main thread hot-plugging the device.
+     *
+     * We assume that as soon as .realized is set to true we can let
+     * the user access the device.
+     */
+
+    if (!device || !atomic_load_acquire(&device->qdev.realized)) {
+        return NULL;
+    }
+
+    return device;
 }
 
 void virtio_scsi_init_req(VirtIOSCSI *s, VirtQueue *vq, VirtIOSCSIReq *req)
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 6/7] scsi: Add scsi_device_get
  2020-07-15 15:01 [PATCH v3 0/7] Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread Maxim Levitsky
                   ` (4 preceding siblings ...)
  2020-07-15 15:01 ` [PATCH v3 5/7] virtio-scsi: don't touch scsi devices that are not yet realized or about to be un-realized Maxim Levitsky
@ 2020-07-15 15:01 ` Maxim Levitsky
  2020-08-05 10:45   ` Stefan Hajnoczi
  2020-07-15 15:01 ` [PATCH v3 7/7] virtio-scsi: use scsi_device_get Maxim Levitsky
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 14+ messages in thread
From: Maxim Levitsky @ 2020-07-15 15:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Stefan Hajnoczi,
	Maxim Levitsky, Paolo Bonzini

Add scsi_device_get which finds the scsi device
and takes a reference to it.

Suggested-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 hw/scsi/scsi-bus.c     | 31 ++++++++++++++++++++++++-------
 include/hw/scsi/scsi.h |  2 ++
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 92d412b65c..677074eaa6 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -1572,8 +1572,13 @@ static char *scsibus_get_fw_dev_path(DeviceState *dev)
     return g_strdup_printf("channel@%x/%s@%x,%x", d->channel,
                            qdev_fw_name(dev), d->id, d->lun);
 }
-
-SCSIDevice *scsi_device_find(SCSIBus *bus, int channel, int id, int lun)
+/*
+ * Finds a matching channel/id/lun device on scsi bus, and
+ * takes a reference to it and returns it.
+ * If we don't find exact match (channel/bus/lun),
+ * we will return the first device which matches channel/bus
+ */
+SCSIDevice *scsi_device_get(SCSIBus *bus, int channel, int id, int lun)
 {
     BusChild *kid;
     SCSIDevice *target_dev = NULL;
@@ -1586,25 +1591,37 @@ SCSIDevice *scsi_device_find(SCSIBus *bus, int channel, int id, int lun)
 
         if (dev->channel == channel && dev->id == id) {
             if (dev->lun == lun) {
+                object_ref(OBJECT(dev));
                 rcu_read_unlock();
                 return dev;
             }
 
-            /*
-             * If we don't find exact match (channel/bus/lun),
-             * we will return the first device which matches channel/bus
-             */
-
             if (!target_dev) {
                 target_dev = dev;
             }
         }
     }
 
+    object_ref(OBJECT(target_dev));
     rcu_read_unlock();
     return target_dev;
 }
 
+/*
+ * This function works like scsi_device_get but doesn't take a reference
+ * to the returned object.
+ * Devices that run under the QEMU global mutex can use this function.
+ * Devices that run outside the QEMU global mutex must use
+ * scsi_device_get() instead.
+ */
+SCSIDevice *scsi_device_find(SCSIBus *bus, int channel, int id, int lun)
+{
+    SCSIDevice *dev = scsi_device_get(bus, channel, id, lun);
+
+    object_unref(OBJECT(dev));
+    return dev;
+}
+
 /* SCSI request list.  For simplicity, pv points to the whole device */
 
 static int put_scsi_requests(QEMUFile *f, void *pv, size_t size,
diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
index 2fc23e44ba..6e11ece641 100644
--- a/include/hw/scsi/scsi.h
+++ b/include/hw/scsi/scsi.h
@@ -193,7 +193,9 @@ void scsi_generic_read_device_inquiry(SCSIDevice *dev);
 int scsi_device_get_sense(SCSIDevice *dev, uint8_t *buf, int len, bool fixed);
 int scsi_SG_IO_FROM_DEV(BlockBackend *blk, uint8_t *cmd, uint8_t cmd_size,
                         uint8_t *buf, uint8_t buf_size);
+
 SCSIDevice *scsi_device_find(SCSIBus *bus, int channel, int target, int lun);
+SCSIDevice *scsi_device_get(SCSIBus *bus, int channel, int target, int lun);
 
 /* scsi-generic.c. */
 extern const SCSIReqOps scsi_generic_req_ops;
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 7/7] virtio-scsi: use scsi_device_get
  2020-07-15 15:01 [PATCH v3 0/7] Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread Maxim Levitsky
                   ` (5 preceding siblings ...)
  2020-07-15 15:01 ` [PATCH v3 6/7] scsi: Add scsi_device_get Maxim Levitsky
@ 2020-07-15 15:01 ` Maxim Levitsky
  2020-07-29 19:56 ` [PATCH v3 0/7] Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread Maxim Levitsky
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Maxim Levitsky @ 2020-07-15 15:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Stefan Hajnoczi,
	Maxim Levitsky, Stefan Hajnoczi, Paolo Bonzini

This will help us to avoid the scsi device disappearing
after we took a reference to it.

It doesn't by itself forbid case when we try to access
an unrealized device

Suggested-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/scsi/virtio-scsi.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 18952e0f1c..a4870b155d 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -33,7 +33,7 @@ static inline int virtio_scsi_get_lun(uint8_t *lun)
     return ((lun[2] << 8) | lun[3]) & 0x3FFF;
 }
 
-static inline SCSIDevice *virtio_scsi_device_find(VirtIOSCSI *s, uint8_t *lun)
+static inline SCSIDevice *virtio_scsi_device_get(VirtIOSCSI *s, uint8_t *lun)
 {
     SCSIDevice *device = NULL;
 
@@ -44,7 +44,7 @@ static inline SCSIDevice *virtio_scsi_device_find(VirtIOSCSI *s, uint8_t *lun)
         return NULL;
     }
 
-    device = scsi_device_find(&s->bus, 0, lun[1], virtio_scsi_get_lun(lun));
+    device = scsi_device_get(&s->bus, 0, lun[1], virtio_scsi_get_lun(lun));
 
     /*
      * This function might run on the IO thread and we might race against
@@ -55,12 +55,15 @@ static inline SCSIDevice *virtio_scsi_device_find(VirtIOSCSI *s, uint8_t *lun)
      */
 
     if (!device || !atomic_load_acquire(&device->qdev.realized)) {
+        object_unref(OBJECT(device));
         return NULL;
     }
 
     return device;
 }
 
+
+
 void virtio_scsi_init_req(VirtIOSCSI *s, VirtQueue *vq, VirtIOSCSIReq *req)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(s);
@@ -273,7 +276,7 @@ static inline void virtio_scsi_ctx_check(VirtIOSCSI *s, SCSIDevice *d)
  *  case of async cancellation. */
 static int virtio_scsi_do_tmf(VirtIOSCSI *s, VirtIOSCSIReq *req)
 {
-    SCSIDevice *d = virtio_scsi_device_find(s, req->req.tmf.lun);
+    SCSIDevice *d = virtio_scsi_device_get(s, req->req.tmf.lun);
     SCSIRequest *r, *next;
     BusChild *kid;
     int target;
@@ -387,10 +390,10 @@ static int virtio_scsi_do_tmf(VirtIOSCSI *s, VirtIOSCSIReq *req)
 
         rcu_read_lock();
         QTAILQ_FOREACH_RCU(kid, &s->bus.qbus.children, sibling) {
-             d = SCSI_DEVICE(kid->child);
-             if (d->channel == 0 && d->id == target) {
-                qdev_reset_all(&d->qdev);
-             }
+            SCSIDevice *d1 = SCSI_DEVICE(kid->child);
+            if (d1->channel == 0 && d1->id == target) {
+                qdev_reset_all(&d1->qdev);
+            }
         }
         rcu_read_unlock();
 
@@ -403,14 +406,17 @@ static int virtio_scsi_do_tmf(VirtIOSCSI *s, VirtIOSCSIReq *req)
         break;
     }
 
+    object_unref(OBJECT(d));
     return ret;
 
 incorrect_lun:
     req->resp.tmf.response = VIRTIO_SCSI_S_INCORRECT_LUN;
+    object_unref(OBJECT(d));
     return ret;
 
 fail:
     req->resp.tmf.response = VIRTIO_SCSI_S_BAD_TARGET;
+    object_unref(OBJECT(d));
     return ret;
 }
 
@@ -581,7 +587,7 @@ static int virtio_scsi_handle_cmd_req_prepare(VirtIOSCSI *s, VirtIOSCSIReq *req)
         }
     }
 
-    d = virtio_scsi_device_find(s, req->req.cmd.lun);
+    d = virtio_scsi_device_get(s, req->req.cmd.lun);
     if (!d) {
         req->resp.cmd.response = VIRTIO_SCSI_S_BAD_TARGET;
         virtio_scsi_complete_cmd_req(req);
@@ -597,10 +603,12 @@ static int virtio_scsi_handle_cmd_req_prepare(VirtIOSCSI *s, VirtIOSCSIReq *req)
             req->sreq->cmd.xfer > req->qsgl.size)) {
         req->resp.cmd.response = VIRTIO_SCSI_S_OVERRUN;
         virtio_scsi_complete_cmd_req(req);
+        object_unref(OBJECT(d));
         return -ENOBUFS;
     }
     scsi_req_ref(req->sreq);
     blk_io_plug(d->conf.blk);
+    object_unref(OBJECT(d));
     return 0;
 }
 
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 0/7] Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread
  2020-07-15 15:01 [PATCH v3 0/7] Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread Maxim Levitsky
                   ` (6 preceding siblings ...)
  2020-07-15 15:01 ` [PATCH v3 7/7] virtio-scsi: use scsi_device_get Maxim Levitsky
@ 2020-07-29 19:56 ` Maxim Levitsky
  2020-08-05 10:48 ` Stefan Hajnoczi
  2020-08-13 12:26 ` Maxim Levitsky
  9 siblings, 0 replies; 14+ messages in thread
From: Maxim Levitsky @ 2020-07-29 19:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin

On Wed, 2020-07-15 at 18:01 +0300, Maxim Levitsky wrote:
> Hi!
> 
> This is a patch series that is a result of my discussion with Paulo on
> how to correctly fix the root cause of the BZ #1812399.
> 
> The root cause of this bug is the fact that IO thread is running mostly
> unlocked versus main thread on which device hotplug is done.
> 
> qdev_device_add first creates the device object, then places it on the bus,
> and only then realizes it.
> 
> However some drivers and currently only virtio-scsi enumerate its child bus
> devices on each request that is received from the guest and that can happen on the IO
> thread.
> 
> Thus we have a window when new device is on the bus but not realized and can be accessed
> by the virtio-scsi driver in that state.
> 
> Fix that by doing two things:
> 
> 1. Add partial RCU protection to the list of a bus's child devices.
> This allows the scsi IO thread to safely enumerate the child devices
> while it races with the hotplug placing the device on the bus.
> 
> 2. Make the virtio-scsi driver check .realized property of the scsi device
> and avoid touching the device if it isn't
> 
> Note that in the particular bug report the issue wasn't a race but rather due
> to combination of things, the .realize code in the middle managed to trigger IO on the virtqueue
> which caused the virtio-scsi driver to access the half realized device. However
> since this can happen as well with real IO thread, this patch series was done,
> which fixes this as well.
> 
> Changes from V1:
>   * Patch 2 is new, as suggested by Stefan, added drain_call_rcu() to fix the failing unit test,
>     make check pass now
> 
>   * Patches 6,7 are new as well: I added scsi_device_get as suggested by Stefan as well, although
>     this is more a refactoring that anything else as it doesn't solve
>     an existing race.
> 
>   * Addressed most of the review feedback from V1
>     - still need to decide if we need QTAILQ_FOREACH_WITH_RCU_READ_LOCK
> 
> Changes from V2:
> 
>   * No longer RFC
>   * Addressed most of the feedback from Stefan
>   * Fixed reference count leak in patch 7 when device is about to be unrealized
>   * Better testing
> 
> This series was tested by adding a virtio-scsi drive with iothread,
> then running fio stress job in the guest in a loop, and then adding/removing
> the scsi drive on the host in the loop.
> This test was failing usually on 1st iteration withouth this patch series,
> and now it seems to work smoothly.
> 
> Best regards,
> 	Maxim Levitsky
> 
> Maxim Levitsky (7):
>   scsi/scsi_bus: switch search direction in scsi_device_find
>   Implement drain_call_rcu and use it in hmp_device_del
>   device-core: use RCU for list of childs of a bus
>   device-core: use atomic_set on .realized property
>   virtio-scsi: don't touch scsi devices that are not yet realized or
>     about to be un-realized
>   scsi: Add scsi_device_get
>   virtio-scsi: use scsi_device_get
> 
>  hw/core/bus.c          | 28 +++++++++++++--------
>  hw/core/qdev.c         | 56 +++++++++++++++++++++++++++++++-----------
>  hw/scsi/scsi-bus.c     | 48 +++++++++++++++++++++++++++++++-----
>  hw/scsi/virtio-scsi.c  | 47 ++++++++++++++++++++++++++++-------
>  include/hw/qdev-core.h | 11 +++++++++
>  include/hw/scsi/scsi.h |  2 ++
>  include/qemu/rcu.h     |  1 +
>  qdev-monitor.c         | 22 +++++++++++++++++
>  util/rcu.c             | 55 +++++++++++++++++++++++++++++++++++++++++
>  9 files changed, 230 insertions(+), 40 deletions(-)
> 
> -- 
> 2.26.2
> 
Very gentle ping about this patch series.

Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 3/7] device-core: use RCU for list of childs of a bus
  2020-07-15 15:01 ` [PATCH v3 3/7] device-core: use RCU for list of childs of a bus Maxim Levitsky
@ 2020-08-05 10:32   ` Stefan Hajnoczi
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Hajnoczi @ 2020-08-05 10:32 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Fam Zheng, Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, qemu-devel, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 936 bytes --]

On Wed, Jul 15, 2020 at 06:01:55PM +0300, Maxim Levitsky wrote:
> This fixes the race between device emulation code that tries to find
> a child device to dispatch the request to (e.g a scsi disk),
> and hotplug of a new device to that bus.
> 
> Note that this doesn't convert all the readers of the list
> but only these that might go over that list without BQL held.
> 
> This is a very small first step to make this code thread safe.
> 
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  hw/core/bus.c          | 28 ++++++++++++++++++----------
>  hw/core/qdev.c         | 37 +++++++++++++++++++++++--------------
>  hw/scsi/scsi-bus.c     | 17 ++++++++++++++---
>  hw/scsi/virtio-scsi.c  |  6 +++++-
>  include/hw/qdev-core.h |  9 +++++++++
>  5 files changed, 69 insertions(+), 28 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 4/7] device-core: use atomic_set on .realized property
  2020-07-15 15:01 ` [PATCH v3 4/7] device-core: use atomic_set on .realized property Maxim Levitsky
@ 2020-08-05 10:33   ` Stefan Hajnoczi
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Hajnoczi @ 2020-08-05 10:33 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Fam Zheng, Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, qemu-devel, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 941 bytes --]

On Wed, Jul 15, 2020 at 06:01:56PM +0300, Maxim Levitsky wrote:
> Some code might race with placement of new devices on a bus.
> We currently first place a (unrealized) device on the bus
> and then realize it.
> 
> As a workaround, users that scan the child device list, can
> check the realized property to see if it is safe to access such a device.
> Use an atomic write here too to aid with this.
> 
> A separate discussion is what to do with devices that are unrealized:
> It looks like for this case we only call the hotplug handler's unplug
> callback and its up to it to unrealize the device.
> An atomic operation doesn't cause harm for this code path though.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  hw/core/qdev.c         | 19 ++++++++++++++++++-
>  include/hw/qdev-core.h |  2 ++
>  2 files changed, 20 insertions(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 6/7] scsi: Add scsi_device_get
  2020-07-15 15:01 ` [PATCH v3 6/7] scsi: Add scsi_device_get Maxim Levitsky
@ 2020-08-05 10:45   ` Stefan Hajnoczi
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Hajnoczi @ 2020-08-05 10:45 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Fam Zheng, Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, Stefan Hajnoczi, qemu-devel,
	Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 475 bytes --]

On Wed, Jul 15, 2020 at 06:01:58PM +0300, Maxim Levitsky wrote:
> Add scsi_device_get which finds the scsi device
> and takes a reference to it.
> 
> Suggested-by: Stefan Hajnoczi <stefanha@gmail.com>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  hw/scsi/scsi-bus.c     | 31 ++++++++++++++++++++++++-------
>  include/hw/scsi/scsi.h |  2 ++
>  2 files changed, 26 insertions(+), 7 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 0/7] Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread
  2020-07-15 15:01 [PATCH v3 0/7] Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread Maxim Levitsky
                   ` (7 preceding siblings ...)
  2020-07-29 19:56 ` [PATCH v3 0/7] Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread Maxim Levitsky
@ 2020-08-05 10:48 ` Stefan Hajnoczi
  2020-08-13 12:26 ` Maxim Levitsky
  9 siblings, 0 replies; 14+ messages in thread
From: Stefan Hajnoczi @ 2020-08-05 10:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Fam Zheng, Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin, qemu-devel, Maxim Levitsky

[-- Attachment #1: Type: text/plain, Size: 263 bytes --]

On Wed, Jul 15, 2020 at 06:01:52PM +0300, Maxim Levitsky wrote:
> Hi!
> 
> This is a patch series that is a result of my discussion with Paulo on
> how to correctly fix the root cause of the BZ #1812399.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 0/7] Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread
  2020-07-15 15:01 [PATCH v3 0/7] Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread Maxim Levitsky
                   ` (8 preceding siblings ...)
  2020-08-05 10:48 ` Stefan Hajnoczi
@ 2020-08-13 12:26 ` Maxim Levitsky
  9 siblings, 0 replies; 14+ messages in thread
From: Maxim Levitsky @ 2020-08-13 12:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Michael S. Tsirkin

On Wed, 2020-07-15 at 18:01 +0300, Maxim Levitsky wrote:
> Hi!
> 
> This is a patch series that is a result of my discussion with Paulo on
> how to correctly fix the root cause of the BZ #1812399.
> 
> The root cause of this bug is the fact that IO thread is running mostly
> unlocked versus main thread on which device hotplug is done.
> 
> qdev_device_add first creates the device object, then places it on the bus,
> and only then realizes it.
> 
> However some drivers and currently only virtio-scsi enumerate its child bus
> devices on each request that is received from the guest and that can happen on the IO
> thread.
> 
> Thus we have a window when new device is on the bus but not realized and can be accessed
> by the virtio-scsi driver in that state.
> 
> Fix that by doing two things:
> 
> 1. Add partial RCU protection to the list of a bus's child devices.
> This allows the scsi IO thread to safely enumerate the child devices
> while it races with the hotplug placing the device on the bus.
> 
> 2. Make the virtio-scsi driver check .realized property of the scsi device
> and avoid touching the device if it isn't
> 
> Note that in the particular bug report the issue wasn't a race but rather due
> to combination of things, the .realize code in the middle managed to trigger IO on the virtqueue
> which caused the virtio-scsi driver to access the half realized device. However
> since this can happen as well with real IO thread, this patch series was done,
> which fixes this as well.
> 
> Changes from V1:
>   * Patch 2 is new, as suggested by Stefan, added drain_call_rcu() to fix the failing unit test,
>     make check pass now
> 
>   * Patches 6,7 are new as well: I added scsi_device_get as suggested by Stefan as well, although
>     this is more a refactoring that anything else as it doesn't solve
>     an existing race.
> 
>   * Addressed most of the review feedback from V1
>     - still need to decide if we need QTAILQ_FOREACH_WITH_RCU_READ_LOCK
> 
> Changes from V2:
> 
>   * No longer RFC
>   * Addressed most of the feedback from Stefan
>   * Fixed reference count leak in patch 7 when device is about to be unrealized
>   * Better testing
> 
> This series was tested by adding a virtio-scsi drive with iothread,
> then running fio stress job in the guest in a loop, and then adding/removing
> the scsi drive on the host in the loop.
> This test was failing usually on 1st iteration withouth this patch series,
> and now it seems to work smoothly.
> 
> Best regards,
> 	Maxim Levitsky
> 
> Maxim Levitsky (7):
>   scsi/scsi_bus: switch search direction in scsi_device_find
>   Implement drain_call_rcu and use it in hmp_device_del
>   device-core: use RCU for list of childs of a bus
>   device-core: use atomic_set on .realized property
>   virtio-scsi: don't touch scsi devices that are not yet realized or
>     about to be un-realized
>   scsi: Add scsi_device_get
>   virtio-scsi: use scsi_device_get
> 
>  hw/core/bus.c          | 28 +++++++++++++--------
>  hw/core/qdev.c         | 56 +++++++++++++++++++++++++++++++-----------
>  hw/scsi/scsi-bus.c     | 48 +++++++++++++++++++++++++++++++-----
>  hw/scsi/virtio-scsi.c  | 47 ++++++++++++++++++++++++++++-------
>  include/hw/qdev-core.h | 11 +++++++++
>  include/hw/scsi/scsi.h |  2 ++
>  include/qemu/rcu.h     |  1 +
>  qdev-monitor.c         | 22 +++++++++++++++++
>  util/rcu.c             | 55 +++++++++++++++++++++++++++++++++++++++++
>  9 files changed, 230 insertions(+), 40 deletions(-)
> 
> -- 
> 2.26.2
> 

Seems that this patch series doesn't have any disagreeements, so
anybody wants to put it on an maintainer's tree, now that the freeze is over?

Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-08-13 12:29 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-15 15:01 [PATCH v3 0/7] Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread Maxim Levitsky
2020-07-15 15:01 ` [PATCH v3 1/7] scsi/scsi_bus: switch search direction in scsi_device_find Maxim Levitsky
2020-07-15 15:01 ` [PATCH v3 2/7] Implement drain_call_rcu and use it in hmp_device_del Maxim Levitsky
2020-07-15 15:01 ` [PATCH v3 3/7] device-core: use RCU for list of childs of a bus Maxim Levitsky
2020-08-05 10:32   ` Stefan Hajnoczi
2020-07-15 15:01 ` [PATCH v3 4/7] device-core: use atomic_set on .realized property Maxim Levitsky
2020-08-05 10:33   ` Stefan Hajnoczi
2020-07-15 15:01 ` [PATCH v3 5/7] virtio-scsi: don't touch scsi devices that are not yet realized or about to be un-realized Maxim Levitsky
2020-07-15 15:01 ` [PATCH v3 6/7] scsi: Add scsi_device_get Maxim Levitsky
2020-08-05 10:45   ` Stefan Hajnoczi
2020-07-15 15:01 ` [PATCH v3 7/7] virtio-scsi: use scsi_device_get Maxim Levitsky
2020-07-29 19:56 ` [PATCH v3 0/7] Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread Maxim Levitsky
2020-08-05 10:48 ` Stefan Hajnoczi
2020-08-13 12:26 ` Maxim Levitsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).