All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/3] memory: prevent dma-reentracy issues
@ 2023-01-19  7:03 Alexander Bulekov
  2023-01-19  7:03 ` [PATCH v4 1/3] " Alexander Bulekov
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Alexander Bulekov @ 2023-01-19  7:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexander Bulekov, Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Mauro Matteo Cascella, Peter Xu, Jason Wang, David Hildenbrand,
	Gerd Hoffmann, Thomas Huth, Laurent Vivier, Bandan Das,
	Edgar E . Iglesias, Darren Kenny, Bin Meng, Paolo Bonzini,
	Michael S . Tsirkin, Marcel Apfelbaum, Daniel P . Berrangé,
	Eduardo Habkost, Jon Maloy, Siqi Chen

These patches aim to solve two types of DMA-reentrancy issues:

1.) mmio -> dma -> mmio case
To solve this, we track whether the device is engaged in io by
checking/setting a reentrancy-guard within APIs used for MMIO access.

2.) bh -> dma write -> mmio case
This case is trickier, since we dont have a generic way to associate a
bh with the underlying Device/DeviceState. Thus, this version allows a
device to associate a reentrancy-guard with a bh, when creating it.
(Instead of calling qemu_bh_new, you call qemu_bh_new_guarded)

I replaced most of the qemu_bh_new invocations with the guarded analog,
except for the ones where the DeviceState was not trivially accessible

Unlike v3, these changes should address issues in devices that bypass
DMA apis and directly call into address_space.
e.g. https://gitlab.com/qemu-project/qemu/-/issues/827

v3 -> v4: Instead of changing all of the DMA APIs, instead add an
    optional reentrancy guard to the BH API.

v2 -> v3: Bite the bullet and modify the DMA APIs, rather than
    attempting to guess DeviceStates in BHs.


Alexander Bulekov (3):
  memory: prevent dma-reentracy issues
  async: Add an optional reentrancy guard to the BH API
  hw: replace most qemu_bh_new calls with qemu_bh_new_guarded

Alexander Bulekov (3):
  memory: prevent dma-reentracy issues
  async: Add an optional reentrancy guard to the BH API
  hw: replace most qemu_bh_new calls with qemu_bh_new_guarded

 docs/devel/multiple-iothreads.txt |  2 ++
 hw/9pfs/xen-9p-backend.c          |  4 +++-
 hw/block/dataplane/virtio-blk.c   |  3 ++-
 hw/block/dataplane/xen-block.c    |  5 +++--
 hw/block/virtio-blk.c             |  5 +++--
 hw/char/virtio-serial-bus.c       |  3 ++-
 hw/display/qxl.c                  |  9 ++++++---
 hw/display/virtio-gpu.c           |  6 ++++--
 hw/ide/ahci.c                     |  3 ++-
 hw/ide/core.c                     |  3 ++-
 hw/misc/imx_rngc.c                |  6 ++++--
 hw/misc/macio/mac_dbdma.c         |  2 +-
 hw/net/virtio-net.c               |  3 ++-
 hw/nvme/ctrl.c                    |  6 ++++--
 hw/scsi/mptsas.c                  |  3 ++-
 hw/scsi/scsi-bus.c                |  3 ++-
 hw/scsi/vmw_pvscsi.c              |  3 ++-
 hw/usb/dev-uas.c                  |  3 ++-
 hw/usb/hcd-dwc2.c                 |  3 ++-
 hw/usb/hcd-ehci.c                 |  3 ++-
 hw/usb/hcd-uhci.c                 |  2 +-
 hw/usb/host-libusb.c              |  6 ++++--
 hw/usb/redirect.c                 |  6 ++++--
 hw/usb/xen-usb.c                  |  3 ++-
 hw/virtio/virtio-balloon.c        |  5 +++--
 hw/virtio/virtio-crypto.c         |  3 ++-
 include/block/aio.h               | 18 ++++++++++++++++--
 include/hw/qdev-core.h            |  7 +++++++
 include/qemu/main-loop.h          |  7 +++++--
 softmmu/memory.c                  | 15 +++++++++++++++
 softmmu/trace-events              |  1 +
 tests/unit/ptimer-test-stubs.c    |  3 ++-
 util/async.c                      | 12 +++++++++++-
 util/main-loop.c                  |  5 +++--
 34 files changed, 128 insertions(+), 43 deletions(-)

-- 
2.39.0



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v4 1/3] memory: prevent dma-reentracy issues
  2023-01-19  7:03 [PATCH v4 0/3] memory: prevent dma-reentracy issues Alexander Bulekov
@ 2023-01-19  7:03 ` Alexander Bulekov
  2023-01-25 21:12   ` Stefan Hajnoczi
  2023-01-19  7:03 ` [PATCH v4 2/3] async: Add an optional reentrancy guard to the BH API Alexander Bulekov
  2023-01-19  7:03 ` [PATCH v4 3/3] hw: replace most qemu_bh_new calls with qemu_bh_new_guarded Alexander Bulekov
  2 siblings, 1 reply; 11+ messages in thread
From: Alexander Bulekov @ 2023-01-19  7:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexander Bulekov, Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Mauro Matteo Cascella, Peter Xu, Jason Wang, David Hildenbrand,
	Gerd Hoffmann, Thomas Huth, Laurent Vivier, Bandan Das,
	Edgar E . Iglesias, Darren Kenny, Bin Meng, Paolo Bonzini,
	Michael S . Tsirkin, Marcel Apfelbaum, Daniel P . Berrangé,
	Eduardo Habkost, Jon Maloy, Siqi Chen

Add a flag to the DeviceState, when a device is engaged in PIO/MMIO/DMA.
This flag is set/checked prior to calling a device's MemoryRegion
handlers, and set when device code initiates DMA.  The purpose of this
flag is to prevent two types of DMA-based reentrancy issues:

1.) mmio -> dma -> mmio case
2.) bh -> dma write -> mmio case

These issues have led to problems such as stack-exhaustion and
use-after-frees.

Summary of the problem from Peter Maydell:
https://lore.kernel.org/qemu-devel/CAFEAcA_23vc7hE3iaM-JVA6W38LK4hJoWae5KcknhPRD5fPBZA@mail.gmail.com

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/62
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/540
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/541
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/556
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/557
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/827
Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
---
 include/hw/qdev-core.h |  7 +++++++
 softmmu/memory.c       | 15 +++++++++++++++
 softmmu/trace-events   |  1 +
 3 files changed, 23 insertions(+)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 35fddb19a6..8858195262 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -162,6 +162,10 @@ struct NamedClockList {
     QLIST_ENTRY(NamedClockList) node;
 };
 
+typedef struct {
+    bool engaged_in_io;
+} MemReentrancyGuard;
+
 /**
  * DeviceState:
  * @realized: Indicates whether the device has been fully constructed.
@@ -194,6 +198,9 @@ struct DeviceState {
     int alias_required_for_version;
     ResettableState reset;
     GSList *unplug_blockers;
+
+    /* Is the device currently in mmio/pio/dma? Used to prevent re-entrancy */
+    MemReentrancyGuard mem_reentrancy_guard;
 };
 
 struct DeviceListener {
diff --git a/softmmu/memory.c b/softmmu/memory.c
index e05332d07f..90ffaaa4f5 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -533,6 +533,7 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
     uint64_t access_mask;
     unsigned access_size;
     unsigned i;
+    DeviceState *dev = NULL;
     MemTxResult r = MEMTX_OK;
 
     if (!access_size_min) {
@@ -542,6 +543,17 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
         access_size_max = 4;
     }
 
+    /* Do not allow more than one simultanous access to a device's IO Regions */
+    if (mr->owner &&
+        !mr->ram_device && !mr->ram && !mr->rom_device && !mr->readonly) {
+        dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE);
+        if (dev->mem_reentrancy_guard.engaged_in_io) {
+            trace_memory_region_reentrant_io(get_cpu_index(), mr, addr, size);
+            return MEMTX_ERROR;
+        }
+        dev->mem_reentrancy_guard.engaged_in_io = true;
+    }
+
     /* FIXME: support unaligned access? */
     access_size = MAX(MIN(size, access_size_max), access_size_min);
     access_mask = MAKE_64BIT_MASK(0, access_size * 8);
@@ -556,6 +568,9 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
                         access_mask, attrs);
         }
     }
+    if (dev) {
+        dev->mem_reentrancy_guard.engaged_in_io = false;
+    }
     return r;
 }
 
diff --git a/softmmu/trace-events b/softmmu/trace-events
index 22606dc27b..62d04ea9a7 100644
--- a/softmmu/trace-events
+++ b/softmmu/trace-events
@@ -13,6 +13,7 @@ memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, u
 memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size, const char *name) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u name '%s'"
 memory_region_subpage_read(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_subpage_write(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
+memory_region_reentrant_io(int cpu_index, void *mr, uint64_t offset, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" size %u"
 memory_region_ram_device_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_ram_device_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_sync_dirty(const char *mr, const char *listener, int global) "mr '%s' listener '%s' synced (global=%d)"
-- 
2.39.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 2/3] async: Add an optional reentrancy guard to the BH API
  2023-01-19  7:03 [PATCH v4 0/3] memory: prevent dma-reentracy issues Alexander Bulekov
  2023-01-19  7:03 ` [PATCH v4 1/3] " Alexander Bulekov
@ 2023-01-19  7:03 ` Alexander Bulekov
  2023-01-25 21:24   ` Stefan Hajnoczi
  2023-01-19  7:03 ` [PATCH v4 3/3] hw: replace most qemu_bh_new calls with qemu_bh_new_guarded Alexander Bulekov
  2 siblings, 1 reply; 11+ messages in thread
From: Alexander Bulekov @ 2023-01-19  7:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexander Bulekov, Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Mauro Matteo Cascella, Peter Xu, Jason Wang, David Hildenbrand,
	Gerd Hoffmann, Thomas Huth, Laurent Vivier, Bandan Das,
	Edgar E . Iglesias, Darren Kenny, Bin Meng, Paolo Bonzini,
	Michael S . Tsirkin, Marcel Apfelbaum, Daniel P . Berrangé,
	Eduardo Habkost, Jon Maloy, Siqi Chen, Fam Zheng, Kevin Wolf,
	Hanna Reitz, open list:Block I/O path

Devices can pass their MemoryReentrancyGuard (from their DeviceState),
when creating new BHes. Then, the async API will toggle the guard
before/after calling the BH call-back. This prevents bh->mmio reentrancy
issues.

Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
---
 docs/devel/multiple-iothreads.txt |  2 ++
 include/block/aio.h               | 18 ++++++++++++++++--
 include/qemu/main-loop.h          |  7 +++++--
 tests/unit/ptimer-test-stubs.c    |  3 ++-
 util/async.c                      | 12 +++++++++++-
 util/main-loop.c                  |  5 +++--
 6 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/docs/devel/multiple-iothreads.txt b/docs/devel/multiple-iothreads.txt
index 343120f2ef..e4fafed9d9 100644
--- a/docs/devel/multiple-iothreads.txt
+++ b/docs/devel/multiple-iothreads.txt
@@ -61,6 +61,7 @@ There are several old APIs that use the main loop AioContext:
  * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
  * LEGACY timer_new_ms() - create a timer
  * LEGACY qemu_bh_new() - create a BH
+ * LEGACY qemu_bh_new_guarded() - create a BH with a device re-entrancy guard
  * LEGACY qemu_aio_wait() - run an event loop iteration
 
 Since they implicitly work on the main loop they cannot be used in code that
@@ -72,6 +73,7 @@ Instead, use the AioContext functions directly (see include/block/aio.h):
  * aio_set_event_notifier() - monitor an event notifier
  * aio_timer_new() - create a timer
  * aio_bh_new() - create a BH
+ * aio_bh_new_guarded() - create a BH with a device re-entrancy guard
  * aio_poll() - run an event loop iteration
 
 The AioContext can be obtained from the IOThread using
diff --git a/include/block/aio.h b/include/block/aio.h
index 0f65a3cc9e..94d661ff7e 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -23,6 +23,8 @@
 #include "qemu/thread.h"
 #include "qemu/timer.h"
 #include "block/graph-lock.h"
+#include "hw/qdev-core.h"
+
 
 typedef struct BlockAIOCB BlockAIOCB;
 typedef void BlockCompletionFunc(void *opaque, int ret);
@@ -332,9 +334,11 @@ void aio_bh_schedule_oneshot_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
  * is opaque and must be allocated prior to its use.
  *
  * @name: A human-readable identifier for debugging purposes.
+ * @reentrancy_guard: A guard set when entering a cb to prevent
+ * device-reentrancy issues
  */
 QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
-                        const char *name);
+                        const char *name, MemReentrancyGuard *reentrancy_guard);
 
 /**
  * aio_bh_new: Allocate a new bottom half structure
@@ -343,7 +347,17 @@ QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
  * string.
  */
 #define aio_bh_new(ctx, cb, opaque) \
-    aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)))
+    aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)), NULL)
+
+/**
+ * aio_bh_new_guarded: Allocate a new bottom half structure with a
+ * reentrancy_guard
+ *
+ * A convenience wrapper for aio_bh_new_full() that uses the cb as the name
+ * string.
+ */
+#define aio_bh_new_guarded(ctx, cb, opaque, guard) \
+    aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)), guard)
 
 /**
  * aio_notify: Force processing of pending events.
diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index c25f390696..84d1ce57f0 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -389,9 +389,12 @@ void qemu_cond_timedwait_iothread(QemuCond *cond, int ms);
 
 void qemu_fd_register(int fd);
 
+#define qemu_bh_new_guarded(cb, opaque, guard) \
+    qemu_bh_new_full((cb), (opaque), (stringify(cb)), guard)
 #define qemu_bh_new(cb, opaque) \
-    qemu_bh_new_full((cb), (opaque), (stringify(cb)))
-QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name);
+    qemu_bh_new_full((cb), (opaque), (stringify(cb)), NULL)
+QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name,
+                         MemReentrancyGuard *reentrancy_guard);
 void qemu_bh_schedule_idle(QEMUBH *bh);
 
 enum {
diff --git a/tests/unit/ptimer-test-stubs.c b/tests/unit/ptimer-test-stubs.c
index f5e75a96b6..24d5413f9d 100644
--- a/tests/unit/ptimer-test-stubs.c
+++ b/tests/unit/ptimer-test-stubs.c
@@ -107,7 +107,8 @@ int64_t qemu_clock_deadline_ns_all(QEMUClockType type, int attr_mask)
     return deadline;
 }
 
-QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name)
+QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name,
+                         MemReentrancyGuard *reentrancy_guard)
 {
     QEMUBH *bh = g_new(QEMUBH, 1);
 
diff --git a/util/async.c b/util/async.c
index 14d63b3091..08924c3212 100644
--- a/util/async.c
+++ b/util/async.c
@@ -65,6 +65,7 @@ struct QEMUBH {
     void *opaque;
     QSLIST_ENTRY(QEMUBH) next;
     unsigned flags;
+    MemReentrancyGuard *reentrancy_guard;
 };
 
 /* Called concurrently from any thread */
@@ -133,7 +134,7 @@ void aio_bh_schedule_oneshot_full(AioContext *ctx, QEMUBHFunc *cb,
 }
 
 QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
-                        const char *name)
+                        const char *name, MemReentrancyGuard *reentrancy_guard)
 {
     QEMUBH *bh;
     bh = g_new(QEMUBH, 1);
@@ -142,13 +143,22 @@ QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
         .cb = cb,
         .opaque = opaque,
         .name = name,
+        .reentrancy_guard = reentrancy_guard,
     };
     return bh;
 }
 
 void aio_bh_call(QEMUBH *bh)
 {
+    if (bh->reentrancy_guard) {
+        bh->reentrancy_guard->engaged_in_io = true;
+    }
+
     bh->cb(bh->opaque);
+
+    if (bh->reentrancy_guard) {
+        bh->reentrancy_guard->engaged_in_io = false;
+    }
 }
 
 /* Multiple occurrences of aio_bh_poll cannot be called concurrently. */
diff --git a/util/main-loop.c b/util/main-loop.c
index 58f776a8c9..07d2e2040a 100644
--- a/util/main-loop.c
+++ b/util/main-loop.c
@@ -617,9 +617,10 @@ void main_loop_wait(int nonblocking)
 
 /* Functions to operate on the main QEMU AioContext.  */
 
-QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name)
+QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name, MemReentrancyGuard *reentrancy_guard)
 {
-    return aio_bh_new_full(qemu_aio_context, cb, opaque, name);
+    return aio_bh_new_full(qemu_aio_context, cb, opaque, name,
+                           reentrancy_guard);
 }
 
 /*
-- 
2.39.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 3/3] hw: replace most qemu_bh_new calls with qemu_bh_new_guarded
  2023-01-19  7:03 [PATCH v4 0/3] memory: prevent dma-reentracy issues Alexander Bulekov
  2023-01-19  7:03 ` [PATCH v4 1/3] " Alexander Bulekov
  2023-01-19  7:03 ` [PATCH v4 2/3] async: Add an optional reentrancy guard to the BH API Alexander Bulekov
@ 2023-01-19  7:03 ` Alexander Bulekov
  2023-01-25 22:19   ` Stefan Hajnoczi
  2023-01-25 22:19   ` Stefan Hajnoczi
  2 siblings, 2 replies; 11+ messages in thread
From: Alexander Bulekov @ 2023-01-19  7:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexander Bulekov, Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Mauro Matteo Cascella, Peter Xu, Jason Wang, David Hildenbrand,
	Gerd Hoffmann, Thomas Huth, Laurent Vivier, Bandan Das,
	Edgar E . Iglesias, Darren Kenny, Bin Meng, Paolo Bonzini,
	Michael S . Tsirkin, Marcel Apfelbaum, Daniel P . Berrangé,
	Eduardo Habkost, Jon Maloy, Siqi Chen, Stefano Stabellini,
	Anthony Perard, Paul Durrant, Kevin Wolf, Hanna Reitz, Amit Shah,
	Marc-André Lureau, John Snow, Peter Maydell,
	Mark Cave-Ayland, Keith Busch, Klaus Jensen, Fam Zheng,
	Dmitry Fleytman, Gonglei (Arei),
	open list:X86 Xen CPUs, open list:virtio-blk,
	open list:i.MX31 (kzm), open list:Old World (g3beige)

This protects devices from bh->mmio reentrancy issues.

Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
---
 hw/9pfs/xen-9p-backend.c        | 4 +++-
 hw/block/dataplane/virtio-blk.c | 3 ++-
 hw/block/dataplane/xen-block.c  | 5 +++--
 hw/block/virtio-blk.c           | 5 +++--
 hw/char/virtio-serial-bus.c     | 3 ++-
 hw/display/qxl.c                | 9 ++++++---
 hw/display/virtio-gpu.c         | 6 ++++--
 hw/ide/ahci.c                   | 3 ++-
 hw/ide/core.c                   | 3 ++-
 hw/misc/imx_rngc.c              | 6 ++++--
 hw/misc/macio/mac_dbdma.c       | 2 +-
 hw/net/virtio-net.c             | 3 ++-
 hw/nvme/ctrl.c                  | 6 ++++--
 hw/scsi/mptsas.c                | 3 ++-
 hw/scsi/scsi-bus.c              | 3 ++-
 hw/scsi/vmw_pvscsi.c            | 3 ++-
 hw/usb/dev-uas.c                | 3 ++-
 hw/usb/hcd-dwc2.c               | 3 ++-
 hw/usb/hcd-ehci.c               | 3 ++-
 hw/usb/hcd-uhci.c               | 2 +-
 hw/usb/host-libusb.c            | 6 ++++--
 hw/usb/redirect.c               | 6 ++++--
 hw/usb/xen-usb.c                | 3 ++-
 hw/virtio/virtio-balloon.c      | 5 +++--
 hw/virtio/virtio-crypto.c       | 3 ++-
 25 files changed, 66 insertions(+), 35 deletions(-)

diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
index 65c4979c3c..f077c1b255 100644
--- a/hw/9pfs/xen-9p-backend.c
+++ b/hw/9pfs/xen-9p-backend.c
@@ -441,7 +441,9 @@ static int xen_9pfs_connect(struct XenLegacyDevice *xendev)
         xen_9pdev->rings[i].ring.out = xen_9pdev->rings[i].data +
                                        XEN_FLEX_RING_SIZE(ring_order);
 
-        xen_9pdev->rings[i].bh = qemu_bh_new(xen_9pfs_bh, &xen_9pdev->rings[i]);
+        xen_9pdev->rings[i].bh = qemu_bh_new_guarded(xen_9pfs_bh,
+                                                     &xen_9pdev->rings[i],
+                                                     &DEVICE(xen_9pdev)->mem_reentrancy_guard);
         xen_9pdev->rings[i].out_cons = 0;
         xen_9pdev->rings[i].out_size = 0;
         xen_9pdev->rings[i].inprogress = false;
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 26f965cabc..191a8c90aa 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -127,7 +127,8 @@ bool virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf,
     } else {
         s->ctx = qemu_get_aio_context();
     }
-    s->bh = aio_bh_new(s->ctx, notify_guest_bh, s);
+    s->bh = aio_bh_new_guarded(s->ctx, notify_guest_bh, s,
+                               &DEVICE(s)->mem_reentrancy_guard);
     s->batch_notify_vqs = bitmap_new(conf->num_queues);
 
     *dataplane = s;
diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c
index 2785b9e849..e31806b317 100644
--- a/hw/block/dataplane/xen-block.c
+++ b/hw/block/dataplane/xen-block.c
@@ -632,8 +632,9 @@ XenBlockDataPlane *xen_block_dataplane_create(XenDevice *xendev,
     } else {
         dataplane->ctx = qemu_get_aio_context();
     }
-    dataplane->bh = aio_bh_new(dataplane->ctx, xen_block_dataplane_bh,
-                               dataplane);
+    dataplane->bh = aio_bh_new_guarded(dataplane->ctx, xen_block_dataplane_bh,
+                                       dataplane,
+                                       &DEVICE(xendev)->mem_reentrancy_guard);
 
     return dataplane;
 }
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index f717550fdc..e9f516e633 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -866,8 +866,9 @@ static void virtio_blk_dma_restart_cb(void *opaque, bool running,
      * requests will be processed while starting the data plane.
      */
     if (!s->bh && !virtio_bus_ioeventfd_enabled(bus)) {
-        s->bh = aio_bh_new(blk_get_aio_context(s->conf.conf.blk),
-                           virtio_blk_dma_restart_bh, s);
+        s->bh = aio_bh_new_guarded(blk_get_aio_context(s->conf.conf.blk),
+                                   virtio_blk_dma_restart_bh, s,
+                                   &DEVICE(s)->mem_reentrancy_guard);
         blk_inc_in_flight(s->conf.conf.blk);
         qemu_bh_schedule(s->bh);
     }
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 7d4601cb5d..dd619f0731 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -985,7 +985,8 @@ static void virtser_port_device_realize(DeviceState *dev, Error **errp)
         return;
     }
 
-    port->bh = qemu_bh_new(flush_queued_data_bh, port);
+    port->bh = qemu_bh_new_guarded(flush_queued_data_bh, port,
+                                   &dev->mem_reentrancy_guard);
     port->elem = NULL;
 }
 
diff --git a/hw/display/qxl.c b/hw/display/qxl.c
index 6772849dec..67efa3c3ef 100644
--- a/hw/display/qxl.c
+++ b/hw/display/qxl.c
@@ -2223,11 +2223,14 @@ static void qxl_realize_common(PCIQXLDevice *qxl, Error **errp)
 
     qemu_add_vm_change_state_handler(qxl_vm_change_state_handler, qxl);
 
-    qxl->update_irq = qemu_bh_new(qxl_update_irq_bh, qxl);
+    qxl->update_irq = qemu_bh_new_guarded(qxl_update_irq_bh, qxl,
+                                          &DEVICE(qxl)->mem_reentrancy_guard);
     qxl_reset_state(qxl);
 
-    qxl->update_area_bh = qemu_bh_new(qxl_render_update_area_bh, qxl);
-    qxl->ssd.cursor_bh = qemu_bh_new(qemu_spice_cursor_refresh_bh, &qxl->ssd);
+    qxl->update_area_bh = qemu_bh_new_guarded(qxl_render_update_area_bh, qxl,
+                                              &DEVICE(qxl)->mem_reentrancy_guard);
+    qxl->ssd.cursor_bh = qemu_bh_new_guarded(qemu_spice_cursor_refresh_bh, &qxl->ssd,
+                                             &DEVICE(qxl)->mem_reentrancy_guard);
 }
 
 static void qxl_realize_primary(PCIDevice *dev, Error **errp)
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 5e15c79b94..66ac9b6cc5 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1339,8 +1339,10 @@ void virtio_gpu_device_realize(DeviceState *qdev, Error **errp)
 
     g->ctrl_vq = virtio_get_queue(vdev, 0);
     g->cursor_vq = virtio_get_queue(vdev, 1);
-    g->ctrl_bh = qemu_bh_new(virtio_gpu_ctrl_bh, g);
-    g->cursor_bh = qemu_bh_new(virtio_gpu_cursor_bh, g);
+    g->ctrl_bh = qemu_bh_new_guarded(virtio_gpu_ctrl_bh, g,
+                                     &qdev->mem_reentrancy_guard);
+    g->cursor_bh = qemu_bh_new_guarded(virtio_gpu_cursor_bh, g,
+                                       &qdev->mem_reentrancy_guard);
     QTAILQ_INIT(&g->reslist);
     QTAILQ_INIT(&g->cmdq);
     QTAILQ_INIT(&g->fenceq);
diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 7ce001cacd..37091150cb 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1508,7 +1508,8 @@ static void ahci_cmd_done(const IDEDMA *dma)
     ahci_write_fis_d2h(ad);
 
     if (ad->port_regs.cmd_issue && !ad->check_bh) {
-        ad->check_bh = qemu_bh_new(ahci_check_cmd_bh, ad);
+        ad->check_bh = qemu_bh_new_guarded(ahci_check_cmd_bh, ad,
+                                           &DEVICE(ad)->mem_reentrancy_guard);
         qemu_bh_schedule(ad->check_bh);
     }
 }
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 5d1039378f..8c8d1a8ec2 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -519,7 +519,8 @@ BlockAIOCB *ide_issue_trim(
 
     iocb = blk_aio_get(&trim_aiocb_info, s->blk, cb, cb_opaque);
     iocb->s = s;
-    iocb->bh = qemu_bh_new(ide_trim_bh_cb, iocb);
+    iocb->bh = qemu_bh_new_guarded(ide_trim_bh_cb, iocb,
+                                   &DEVICE(s)->mem_reentrancy_guard);
     iocb->ret = 0;
     iocb->qiov = qiov;
     iocb->i = -1;
diff --git a/hw/misc/imx_rngc.c b/hw/misc/imx_rngc.c
index 632c03779c..082c6980ad 100644
--- a/hw/misc/imx_rngc.c
+++ b/hw/misc/imx_rngc.c
@@ -228,8 +228,10 @@ static void imx_rngc_realize(DeviceState *dev, Error **errp)
     sysbus_init_mmio(sbd, &s->iomem);
 
     sysbus_init_irq(sbd, &s->irq);
-    s->self_test_bh = qemu_bh_new(imx_rngc_self_test, s);
-    s->seed_bh = qemu_bh_new(imx_rngc_seed, s);
+    s->self_test_bh = qemu_bh_new_guarded(imx_rngc_self_test, s,
+                                          &dev->mem_reentrancy_guard);
+    s->seed_bh = qemu_bh_new_guarded(imx_rngc_seed, s,
+                                     &dev->mem_reentrancy_guard);
 }
 
 static void imx_rngc_reset(DeviceState *dev)
diff --git a/hw/misc/macio/mac_dbdma.c b/hw/misc/macio/mac_dbdma.c
index efcc02609f..cc7e02203d 100644
--- a/hw/misc/macio/mac_dbdma.c
+++ b/hw/misc/macio/mac_dbdma.c
@@ -914,7 +914,7 @@ static void mac_dbdma_realize(DeviceState *dev, Error **errp)
 {
     DBDMAState *s = MAC_DBDMA(dev);
 
-    s->bh = qemu_bh_new(DBDMA_run_bh, s);
+    s->bh = qemu_bh_new_guarded(DBDMA_run_bh, s, &dev->mem_reentrancy_guard);
 }
 
 static void mac_dbdma_class_init(ObjectClass *oc, void *data)
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 3ae909041a..a170c724de 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2885,7 +2885,8 @@ static void virtio_net_add_queue(VirtIONet *n, int index)
         n->vqs[index].tx_vq =
             virtio_add_queue(vdev, n->net_conf.tx_queue_size,
                              virtio_net_handle_tx_bh);
-        n->vqs[index].tx_bh = qemu_bh_new(virtio_net_tx_bh, &n->vqs[index]);
+        n->vqs[index].tx_bh = qemu_bh_new_guarded(virtio_net_tx_bh, &n->vqs[index],
+                                                  &DEVICE(vdev)->mem_reentrancy_guard);
     }
 
     n->vqs[index].tx_waiting = 0;
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index f25cc2c235..dcb250e772 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4318,7 +4318,8 @@ static void nvme_init_sq(NvmeSQueue *sq, NvmeCtrl *n, uint64_t dma_addr,
         QTAILQ_INSERT_TAIL(&(sq->req_list), &sq->io_req[i], entry);
     }
 
-    sq->bh = qemu_bh_new(nvme_process_sq, sq);
+    sq->bh = qemu_bh_new_guarded(nvme_process_sq, sq,
+                                 &DEVICE(sq->ctrl)->mem_reentrancy_guard);
 
     if (n->dbbuf_enabled) {
         sq->db_addr = n->dbbuf_dbs + (sqid << 3);
@@ -4708,7 +4709,8 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr,
         }
     }
     n->cq[cqid] = cq;
-    cq->bh = qemu_bh_new(nvme_post_cqes, cq);
+    cq->bh = qemu_bh_new_guarded(nvme_post_cqes, cq,
+                                 &DEVICE(cq->ctrl)->mem_reentrancy_guard);
 }
 
 static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest *req)
diff --git a/hw/scsi/mptsas.c b/hw/scsi/mptsas.c
index c485da792c..3de288b454 100644
--- a/hw/scsi/mptsas.c
+++ b/hw/scsi/mptsas.c
@@ -1322,7 +1322,8 @@ static void mptsas_scsi_realize(PCIDevice *dev, Error **errp)
     }
     s->max_devices = MPTSAS_NUM_PORTS;
 
-    s->request_bh = qemu_bh_new(mptsas_fetch_requests, s);
+    s->request_bh = qemu_bh_new_guarded(mptsas_fetch_requests, s,
+                                        &DEVICE(dev)->mem_reentrancy_guard);
 
     scsi_bus_init(&s->bus, sizeof(s->bus), &dev->qdev, &mptsas_scsi_info);
 }
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index ceceafb2cd..e5c9f7a53d 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -193,7 +193,8 @@ static void scsi_dma_restart_cb(void *opaque, bool running, RunState state)
         AioContext *ctx = blk_get_aio_context(s->conf.blk);
         /* The reference is dropped in scsi_dma_restart_bh.*/
         object_ref(OBJECT(s));
-        s->bh = aio_bh_new(ctx, scsi_dma_restart_bh, s);
+        s->bh = aio_bh_new_guarded(ctx, scsi_dma_restart_bh, s,
+                                   &DEVICE(s)->mem_reentrancy_guard);
         qemu_bh_schedule(s->bh);
     }
 }
diff --git a/hw/scsi/vmw_pvscsi.c b/hw/scsi/vmw_pvscsi.c
index fa76696855..4de34536e9 100644
--- a/hw/scsi/vmw_pvscsi.c
+++ b/hw/scsi/vmw_pvscsi.c
@@ -1184,7 +1184,8 @@ pvscsi_realizefn(PCIDevice *pci_dev, Error **errp)
         pcie_endpoint_cap_init(pci_dev, PVSCSI_EXP_EP_OFFSET);
     }
 
-    s->completion_worker = qemu_bh_new(pvscsi_process_completion_queue, s);
+    s->completion_worker = qemu_bh_new_guarded(pvscsi_process_completion_queue, s,
+                                               &DEVICE(pci_dev)->mem_reentrancy_guard);
 
     scsi_bus_init(&s->bus, sizeof(s->bus), DEVICE(pci_dev), &pvscsi_scsi_info);
     /* override default SCSI bus hotplug-handler, with pvscsi's one */
diff --git a/hw/usb/dev-uas.c b/hw/usb/dev-uas.c
index 88f99c05d5..f013ded91e 100644
--- a/hw/usb/dev-uas.c
+++ b/hw/usb/dev-uas.c
@@ -937,7 +937,8 @@ static void usb_uas_realize(USBDevice *dev, Error **errp)
 
     QTAILQ_INIT(&uas->results);
     QTAILQ_INIT(&uas->requests);
-    uas->status_bh = qemu_bh_new(usb_uas_send_status_bh, uas);
+    uas->status_bh = qemu_bh_new_guarded(usb_uas_send_status_bh, uas,
+                                         &d->mem_reentrancy_guard);
 
     dev->flags |= (1 << USB_DEV_FLAG_IS_SCSI_STORAGE);
     scsi_bus_init(&uas->bus, sizeof(uas->bus), DEVICE(dev), &usb_uas_scsi_info);
diff --git a/hw/usb/hcd-dwc2.c b/hw/usb/hcd-dwc2.c
index 8755e9cbb0..a0c4e782b2 100644
--- a/hw/usb/hcd-dwc2.c
+++ b/hw/usb/hcd-dwc2.c
@@ -1364,7 +1364,8 @@ static void dwc2_realize(DeviceState *dev, Error **errp)
     s->fi = USB_FRMINTVL - 1;
     s->eof_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, dwc2_frame_boundary, s);
     s->frame_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, dwc2_work_timer, s);
-    s->async_bh = qemu_bh_new(dwc2_work_bh, s);
+    s->async_bh = qemu_bh_new_guarded(dwc2_work_bh, s,
+                                      &dev->mem_reentrancy_guard);
 
     sysbus_init_irq(sbd, &s->irq);
 }
diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index d4da8dcb8d..c930c60921 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -2533,7 +2533,8 @@ void usb_ehci_realize(EHCIState *s, DeviceState *dev, Error **errp)
     }
 
     s->frame_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, ehci_work_timer, s);
-    s->async_bh = qemu_bh_new(ehci_work_bh, s);
+    s->async_bh = qemu_bh_new_guarded(ehci_work_bh, s,
+                                      &dev->mem_reentrancy_guard);
     s->device = dev;
 
     s->vmstate = qemu_add_vm_change_state_handler(usb_ehci_vm_state_change, s);
diff --git a/hw/usb/hcd-uhci.c b/hw/usb/hcd-uhci.c
index 30ae0104bb..bdc891f57a 100644
--- a/hw/usb/hcd-uhci.c
+++ b/hw/usb/hcd-uhci.c
@@ -1193,7 +1193,7 @@ void usb_uhci_common_realize(PCIDevice *dev, Error **errp)
                               USB_SPEED_MASK_LOW | USB_SPEED_MASK_FULL);
         }
     }
-    s->bh = qemu_bh_new(uhci_bh, s);
+    s->bh = qemu_bh_new_guarded(uhci_bh, s, &DEVICE(dev)->mem_reentrancy_guard);
     s->frame_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, uhci_frame_timer, s);
     s->num_ports_vmstate = NB_PORTS;
     QTAILQ_INIT(&s->queues);
diff --git a/hw/usb/host-libusb.c b/hw/usb/host-libusb.c
index 176868d345..f500db85ab 100644
--- a/hw/usb/host-libusb.c
+++ b/hw/usb/host-libusb.c
@@ -1141,7 +1141,8 @@ static void usb_host_nodev_bh(void *opaque)
 static void usb_host_nodev(USBHostDevice *s)
 {
     if (!s->bh_nodev) {
-        s->bh_nodev = qemu_bh_new(usb_host_nodev_bh, s);
+        s->bh_nodev = qemu_bh_new_guarded(usb_host_nodev_bh, s,
+                                          &DEVICE(s)->mem_reentrancy_guard);
     }
     qemu_bh_schedule(s->bh_nodev);
 }
@@ -1739,7 +1740,8 @@ static int usb_host_post_load(void *opaque, int version_id)
     USBHostDevice *dev = opaque;
 
     if (!dev->bh_postld) {
-        dev->bh_postld = qemu_bh_new(usb_host_post_load_bh, dev);
+        dev->bh_postld = qemu_bh_new_guarded(usb_host_post_load_bh, dev,
+                                             &DEVICE(dev)->mem_reentrancy_guard);
     }
     qemu_bh_schedule(dev->bh_postld);
     dev->bh_postld_pending = true;
diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
index fd7df599bc..39fbaaab16 100644
--- a/hw/usb/redirect.c
+++ b/hw/usb/redirect.c
@@ -1441,8 +1441,10 @@ static void usbredir_realize(USBDevice *udev, Error **errp)
         }
     }
 
-    dev->chardev_close_bh = qemu_bh_new(usbredir_chardev_close_bh, dev);
-    dev->device_reject_bh = qemu_bh_new(usbredir_device_reject_bh, dev);
+    dev->chardev_close_bh = qemu_bh_new_guarded(usbredir_chardev_close_bh, dev,
+                                                &DEVICE(dev)->mem_reentrancy_guard);
+    dev->device_reject_bh = qemu_bh_new_guarded(usbredir_device_reject_bh, dev,
+                                                &DEVICE(dev)->mem_reentrancy_guard);
     dev->attach_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL, usbredir_do_attach, dev);
 
     packet_id_queue_init(&dev->cancelled, dev, "cancelled");
diff --git a/hw/usb/xen-usb.c b/hw/usb/xen-usb.c
index 0f7369e7ed..dec91294ad 100644
--- a/hw/usb/xen-usb.c
+++ b/hw/usb/xen-usb.c
@@ -1021,7 +1021,8 @@ static void usbback_alloc(struct XenLegacyDevice *xendev)
 
     QTAILQ_INIT(&usbif->req_free_q);
     QSIMPLEQ_INIT(&usbif->hotplug_q);
-    usbif->bh = qemu_bh_new(usbback_bh, usbif);
+    usbif->bh = qemu_bh_new_guarded(usbback_bh, usbif,
+                                    &DEVICE(xendev)->mem_reentrancy_guard);
 }
 
 static int usbback_free(struct XenLegacyDevice *xendev)
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 746f07c4d2..309cebacc6 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -908,8 +908,9 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
         precopy_add_notifier(&s->free_page_hint_notify);
 
         object_ref(OBJECT(s->iothread));
-        s->free_page_bh = aio_bh_new(iothread_get_aio_context(s->iothread),
-                                     virtio_ballloon_get_free_page_hints, s);
+        s->free_page_bh = aio_bh_new_guarded(iothread_get_aio_context(s->iothread),
+                                             virtio_ballloon_get_free_page_hints, s,
+                                             &DEVICE(s)->mem_reentrancy_guard);
     }
 
     if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_REPORTING)) {
diff --git a/hw/virtio/virtio-crypto.c b/hw/virtio/virtio-crypto.c
index 516425e26a..4c95f1096e 100644
--- a/hw/virtio/virtio-crypto.c
+++ b/hw/virtio/virtio-crypto.c
@@ -1050,7 +1050,8 @@ static void virtio_crypto_device_realize(DeviceState *dev, Error **errp)
         vcrypto->vqs[i].dataq =
                  virtio_add_queue(vdev, 1024, virtio_crypto_handle_dataq_bh);
         vcrypto->vqs[i].dataq_bh =
-                 qemu_bh_new(virtio_crypto_dataq_bh, &vcrypto->vqs[i]);
+                 qemu_bh_new_guarded(virtio_crypto_dataq_bh, &vcrypto->vqs[i],
+                                     &dev->mem_reentrancy_guard);
         vcrypto->vqs[i].vcrypto = vcrypto;
     }
 
-- 
2.39.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 1/3] memory: prevent dma-reentracy issues
  2023-01-19  7:03 ` [PATCH v4 1/3] " Alexander Bulekov
@ 2023-01-25 21:12   ` Stefan Hajnoczi
  0 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2023-01-25 21:12 UTC (permalink / raw)
  To: Alexander Bulekov
  Cc: qemu-devel, Philippe Mathieu-Daudé,
	Mauro Matteo Cascella, Peter Xu, Jason Wang, David Hildenbrand,
	Gerd Hoffmann, Thomas Huth, Laurent Vivier, Bandan Das,
	Edgar E . Iglesias, Darren Kenny, Bin Meng, Paolo Bonzini,
	Michael S . Tsirkin, Marcel Apfelbaum, Daniel P . Berrangé,
	Eduardo Habkost, Jon Maloy, Siqi Chen

[-- Attachment #1: Type: text/plain, Size: 1322 bytes --]

On Thu, Jan 19, 2023 at 02:03:06AM -0500, Alexander Bulekov wrote:
> Add a flag to the DeviceState, when a device is engaged in PIO/MMIO/DMA.
> This flag is set/checked prior to calling a device's MemoryRegion
> handlers, and set when device code initiates DMA.  The purpose of this
> flag is to prevent two types of DMA-based reentrancy issues:
> 
> 1.) mmio -> dma -> mmio case
> 2.) bh -> dma write -> mmio case
> 
> These issues have led to problems such as stack-exhaustion and
> use-after-frees.
> 
> Summary of the problem from Peter Maydell:
> https://lore.kernel.org/qemu-devel/CAFEAcA_23vc7hE3iaM-JVA6W38LK4hJoWae5KcknhPRD5fPBZA@mail.gmail.com
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/62
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/540
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/541
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/556
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/557
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/827
> Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
> ---
>  include/hw/qdev-core.h |  7 +++++++
>  softmmu/memory.c       | 15 +++++++++++++++
>  softmmu/trace-events   |  1 +
>  3 files changed, 23 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 2/3] async: Add an optional reentrancy guard to the BH API
  2023-01-19  7:03 ` [PATCH v4 2/3] async: Add an optional reentrancy guard to the BH API Alexander Bulekov
@ 2023-01-25 21:24   ` Stefan Hajnoczi
  2023-01-26  4:18     ` Alexander Bulekov
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2023-01-25 21:24 UTC (permalink / raw)
  To: Alexander Bulekov
  Cc: qemu-devel, Philippe Mathieu-Daudé,
	Mauro Matteo Cascella, Peter Xu, Jason Wang, David Hildenbrand,
	Gerd Hoffmann, Thomas Huth, Laurent Vivier, Bandan Das,
	Edgar E . Iglesias, Darren Kenny, Bin Meng, Paolo Bonzini,
	Michael S . Tsirkin, Marcel Apfelbaum, Daniel P . Berrangé,
	Eduardo Habkost, Jon Maloy, Siqi Chen, Fam Zheng, Kevin Wolf,
	Hanna Reitz, open list:Block I/O path

[-- Attachment #1: Type: text/plain, Size: 8235 bytes --]

On Thu, Jan 19, 2023 at 02:03:07AM -0500, Alexander Bulekov wrote:
> Devices can pass their MemoryReentrancyGuard (from their DeviceState),
> when creating new BHes. Then, the async API will toggle the guard
> before/after calling the BH call-back. This prevents bh->mmio reentrancy
> issues.
> 
> Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
> ---
>  docs/devel/multiple-iothreads.txt |  2 ++
>  include/block/aio.h               | 18 ++++++++++++++++--
>  include/qemu/main-loop.h          |  7 +++++--
>  tests/unit/ptimer-test-stubs.c    |  3 ++-
>  util/async.c                      | 12 +++++++++++-
>  util/main-loop.c                  |  5 +++--
>  6 files changed, 39 insertions(+), 8 deletions(-)
> 
> diff --git a/docs/devel/multiple-iothreads.txt b/docs/devel/multiple-iothreads.txt
> index 343120f2ef..e4fafed9d9 100644
> --- a/docs/devel/multiple-iothreads.txt
> +++ b/docs/devel/multiple-iothreads.txt
> @@ -61,6 +61,7 @@ There are several old APIs that use the main loop AioContext:
>   * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
>   * LEGACY timer_new_ms() - create a timer
>   * LEGACY qemu_bh_new() - create a BH
> + * LEGACY qemu_bh_new_guarded() - create a BH with a device re-entrancy guard
>   * LEGACY qemu_aio_wait() - run an event loop iteration
>  
>  Since they implicitly work on the main loop they cannot be used in code that
> @@ -72,6 +73,7 @@ Instead, use the AioContext functions directly (see include/block/aio.h):
>   * aio_set_event_notifier() - monitor an event notifier
>   * aio_timer_new() - create a timer
>   * aio_bh_new() - create a BH
> + * aio_bh_new_guarded() - create a BH with a device re-entrancy guard
>   * aio_poll() - run an event loop iteration
>  
>  The AioContext can be obtained from the IOThread using
> diff --git a/include/block/aio.h b/include/block/aio.h
> index 0f65a3cc9e..94d661ff7e 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -23,6 +23,8 @@
>  #include "qemu/thread.h"
>  #include "qemu/timer.h"
>  #include "block/graph-lock.h"
> +#include "hw/qdev-core.h"
> +
>  
>  typedef struct BlockAIOCB BlockAIOCB;
>  typedef void BlockCompletionFunc(void *opaque, int ret);
> @@ -332,9 +334,11 @@ void aio_bh_schedule_oneshot_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
>   * is opaque and must be allocated prior to its use.
>   *
>   * @name: A human-readable identifier for debugging purposes.
> + * @reentrancy_guard: A guard set when entering a cb to prevent
> + * device-reentrancy issues
>   */
>  QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
> -                        const char *name);
> +                        const char *name, MemReentrancyGuard *reentrancy_guard);
>  
>  /**
>   * aio_bh_new: Allocate a new bottom half structure
> @@ -343,7 +347,17 @@ QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
>   * string.
>   */
>  #define aio_bh_new(ctx, cb, opaque) \
> -    aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)))
> +    aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)), NULL)
> +
> +/**
> + * aio_bh_new_guarded: Allocate a new bottom half structure with a
> + * reentrancy_guard
> + *
> + * A convenience wrapper for aio_bh_new_full() that uses the cb as the name
> + * string.
> + */
> +#define aio_bh_new_guarded(ctx, cb, opaque, guard) \
> +    aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)), guard)
>  
>  /**
>   * aio_notify: Force processing of pending events.
> diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
> index c25f390696..84d1ce57f0 100644
> --- a/include/qemu/main-loop.h
> +++ b/include/qemu/main-loop.h
> @@ -389,9 +389,12 @@ void qemu_cond_timedwait_iothread(QemuCond *cond, int ms);
>  
>  void qemu_fd_register(int fd);
>  
> +#define qemu_bh_new_guarded(cb, opaque, guard) \
> +    qemu_bh_new_full((cb), (opaque), (stringify(cb)), guard)
>  #define qemu_bh_new(cb, opaque) \
> -    qemu_bh_new_full((cb), (opaque), (stringify(cb)))
> -QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name);
> +    qemu_bh_new_full((cb), (opaque), (stringify(cb)), NULL)
> +QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name,
> +                         MemReentrancyGuard *reentrancy_guard);
>  void qemu_bh_schedule_idle(QEMUBH *bh);
>  
>  enum {
> diff --git a/tests/unit/ptimer-test-stubs.c b/tests/unit/ptimer-test-stubs.c
> index f5e75a96b6..24d5413f9d 100644
> --- a/tests/unit/ptimer-test-stubs.c
> +++ b/tests/unit/ptimer-test-stubs.c
> @@ -107,7 +107,8 @@ int64_t qemu_clock_deadline_ns_all(QEMUClockType type, int attr_mask)
>      return deadline;
>  }
>  
> -QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name)
> +QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name,
> +                         MemReentrancyGuard *reentrancy_guard)
>  {
>      QEMUBH *bh = g_new(QEMUBH, 1);
>  
> diff --git a/util/async.c b/util/async.c
> index 14d63b3091..08924c3212 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -65,6 +65,7 @@ struct QEMUBH {
>      void *opaque;
>      QSLIST_ENTRY(QEMUBH) next;
>      unsigned flags;
> +    MemReentrancyGuard *reentrancy_guard;
>  };
>  
>  /* Called concurrently from any thread */
> @@ -133,7 +134,7 @@ void aio_bh_schedule_oneshot_full(AioContext *ctx, QEMUBHFunc *cb,
>  }
>  
>  QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
> -                        const char *name)
> +                        const char *name, MemReentrancyGuard *reentrancy_guard)
>  {
>      QEMUBH *bh;
>      bh = g_new(QEMUBH, 1);
> @@ -142,13 +143,22 @@ QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
>          .cb = cb,
>          .opaque = opaque,
>          .name = name,
> +        .reentrancy_guard = reentrancy_guard,
>      };
>      return bh;
>  }
>  
>  void aio_bh_call(QEMUBH *bh)
>  {
> +    if (bh->reentrancy_guard) {
> +        bh->reentrancy_guard->engaged_in_io = true;
> +    }
> +
>      bh->cb(bh->opaque);
> +
> +    if (bh->reentrancy_guard) {
> +        bh->reentrancy_guard->engaged_in_io = false;
> +    }
>  }

QEMU supports nested event loops. I think aio_bh_call() -> cb() ->
aio_poll() -> aio_bh_call() -> ... is possible although it should be
rare.

->engaged_in_io will set to false after the innermost aio_bh_call()
returns. Therefore the protection doesn't cover the remainder of the
parent cb() functions.

I think aio_bh_call() should be:

  void aio_bh_call(QEMUBH *bh)
  {
      bool last_engaged_in_io = false;

      if (bh->reentrancy_guard) {
          last_engaged_in_io = bh->reentrancy_guard->engaged_in_io;
          bh->reentrancy_guard->engaged_in_io = true;
      }

      bh->cb(bh->opaque);

      if (bh->reentrancy_guard) {
          bh->reentrancy_guard->engaged_in_io = last_engaged_in_io;
      }
  }

That way nested aio_poll() calls work as expected.

This also raises the question whether aio_bh_call() should call abort(3)
if ->engaged_in_io is already true when the function is entered? I think
that may be too strict, but I'm not sure. A scenario where this can
happen:

The memory region read/write function calls aio_poll() -> aio_bh_call()
and a BH with our device's re-entrancy guard is executed.

>  
>  /* Multiple occurrences of aio_bh_poll cannot be called concurrently. */
> diff --git a/util/main-loop.c b/util/main-loop.c
> index 58f776a8c9..07d2e2040a 100644
> --- a/util/main-loop.c
> +++ b/util/main-loop.c
> @@ -617,9 +617,10 @@ void main_loop_wait(int nonblocking)
>  
>  /* Functions to operate on the main QEMU AioContext.  */
>  
> -QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name)
> +QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name, MemReentrancyGuard *reentrancy_guard)
>  {
> -    return aio_bh_new_full(qemu_aio_context, cb, opaque, name);
> +    return aio_bh_new_full(qemu_aio_context, cb, opaque, name,
> +                           reentrancy_guard);
>  }
>  
>  /*
> -- 
> 2.39.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 3/3] hw: replace most qemu_bh_new calls with qemu_bh_new_guarded
  2023-01-19  7:03 ` [PATCH v4 3/3] hw: replace most qemu_bh_new calls with qemu_bh_new_guarded Alexander Bulekov
@ 2023-01-25 22:19   ` Stefan Hajnoczi
  2023-01-25 22:19   ` Stefan Hajnoczi
  1 sibling, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2023-01-25 22:19 UTC (permalink / raw)
  To: Alexander Bulekov
  Cc: qemu-devel, Philippe Mathieu-Daudé,
	Mauro Matteo Cascella, Peter Xu, Jason Wang, David Hildenbrand,
	Gerd Hoffmann, Thomas Huth, Laurent Vivier, Bandan Das,
	Edgar E . Iglesias, Darren Kenny, Bin Meng, Paolo Bonzini,
	Michael S . Tsirkin, Marcel Apfelbaum, Daniel P . Berrangé,
	Eduardo Habkost, Jon Maloy, Siqi Chen, Stefano Stabellini,
	Anthony Perard, Paul Durrant, Kevin Wolf, Hanna Reitz, Amit Shah,
	Marc-André Lureau, John Snow, Peter Maydell,
	Mark Cave-Ayland, Keith Busch, Klaus Jensen, Fam Zheng,
	Dmitry Fleytman, Gonglei (Arei),
	open list:X86 Xen CPUs, open list:virtio-blk,
	open list:i.MX31 (kzm), open list:Old World (g3beige)

[-- Attachment #1: Type: text/plain, Size: 1569 bytes --]

On Thu, Jan 19, 2023 at 02:03:08AM -0500, Alexander Bulekov wrote:
> This protects devices from bh->mmio reentrancy issues.
> 
> Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
> ---
>  hw/9pfs/xen-9p-backend.c        | 4 +++-
>  hw/block/dataplane/virtio-blk.c | 3 ++-
>  hw/block/dataplane/xen-block.c  | 5 +++--
>  hw/block/virtio-blk.c           | 5 +++--
>  hw/char/virtio-serial-bus.c     | 3 ++-
>  hw/display/qxl.c                | 9 ++++++---
>  hw/display/virtio-gpu.c         | 6 ++++--
>  hw/ide/ahci.c                   | 3 ++-
>  hw/ide/core.c                   | 3 ++-
>  hw/misc/imx_rngc.c              | 6 ++++--
>  hw/misc/macio/mac_dbdma.c       | 2 +-
>  hw/net/virtio-net.c             | 3 ++-
>  hw/nvme/ctrl.c                  | 6 ++++--
>  hw/scsi/mptsas.c                | 3 ++-
>  hw/scsi/scsi-bus.c              | 3 ++-
>  hw/scsi/vmw_pvscsi.c            | 3 ++-
>  hw/usb/dev-uas.c                | 3 ++-
>  hw/usb/hcd-dwc2.c               | 3 ++-
>  hw/usb/hcd-ehci.c               | 3 ++-
>  hw/usb/hcd-uhci.c               | 2 +-
>  hw/usb/host-libusb.c            | 6 ++++--
>  hw/usb/redirect.c               | 6 ++++--
>  hw/usb/xen-usb.c                | 3 ++-
>  hw/virtio/virtio-balloon.c      | 5 +++--
>  hw/virtio/virtio-crypto.c       | 3 ++-
>  25 files changed, 66 insertions(+), 35 deletions(-)

Should scripts/checkpatch.pl complain when qemu_bh_new() or aio_bh_new()
are called from hw/? Adding a check is important so new instances cannot
be added accidentally in the future.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 3/3] hw: replace most qemu_bh_new calls with qemu_bh_new_guarded
  2023-01-19  7:03 ` [PATCH v4 3/3] hw: replace most qemu_bh_new calls with qemu_bh_new_guarded Alexander Bulekov
  2023-01-25 22:19   ` Stefan Hajnoczi
@ 2023-01-25 22:19   ` Stefan Hajnoczi
  1 sibling, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2023-01-25 22:19 UTC (permalink / raw)
  To: Alexander Bulekov
  Cc: qemu-devel, Philippe Mathieu-Daudé,
	Mauro Matteo Cascella, Peter Xu, Jason Wang, David Hildenbrand,
	Gerd Hoffmann, Thomas Huth, Laurent Vivier, Bandan Das,
	Edgar E . Iglesias, Darren Kenny, Bin Meng, Paolo Bonzini,
	Michael S . Tsirkin, Marcel Apfelbaum, Daniel P . Berrangé,
	Eduardo Habkost, Jon Maloy, Siqi Chen, Stefano Stabellini,
	Anthony Perard, Paul Durrant, Kevin Wolf, Hanna Reitz, Amit Shah,
	Marc-André Lureau, John Snow, Peter Maydell,
	Mark Cave-Ayland, Keith Busch, Klaus Jensen, Fam Zheng,
	Dmitry Fleytman, Gonglei (Arei),
	open list:X86 Xen CPUs, open list:virtio-blk,
	open list:i.MX31 (kzm), open list:Old World (g3beige)

[-- Attachment #1: Type: text/plain, Size: 1425 bytes --]

On Thu, Jan 19, 2023 at 02:03:08AM -0500, Alexander Bulekov wrote:
> This protects devices from bh->mmio reentrancy issues.
> 
> Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
> ---
>  hw/9pfs/xen-9p-backend.c        | 4 +++-
>  hw/block/dataplane/virtio-blk.c | 3 ++-
>  hw/block/dataplane/xen-block.c  | 5 +++--
>  hw/block/virtio-blk.c           | 5 +++--
>  hw/char/virtio-serial-bus.c     | 3 ++-
>  hw/display/qxl.c                | 9 ++++++---
>  hw/display/virtio-gpu.c         | 6 ++++--
>  hw/ide/ahci.c                   | 3 ++-
>  hw/ide/core.c                   | 3 ++-
>  hw/misc/imx_rngc.c              | 6 ++++--
>  hw/misc/macio/mac_dbdma.c       | 2 +-
>  hw/net/virtio-net.c             | 3 ++-
>  hw/nvme/ctrl.c                  | 6 ++++--
>  hw/scsi/mptsas.c                | 3 ++-
>  hw/scsi/scsi-bus.c              | 3 ++-
>  hw/scsi/vmw_pvscsi.c            | 3 ++-
>  hw/usb/dev-uas.c                | 3 ++-
>  hw/usb/hcd-dwc2.c               | 3 ++-
>  hw/usb/hcd-ehci.c               | 3 ++-
>  hw/usb/hcd-uhci.c               | 2 +-
>  hw/usb/host-libusb.c            | 6 ++++--
>  hw/usb/redirect.c               | 6 ++++--
>  hw/usb/xen-usb.c                | 3 ++-
>  hw/virtio/virtio-balloon.c      | 5 +++--
>  hw/virtio/virtio-crypto.c       | 3 ++-
>  25 files changed, 66 insertions(+), 35 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 2/3] async: Add an optional reentrancy guard to the BH API
  2023-01-25 21:24   ` Stefan Hajnoczi
@ 2023-01-26  4:18     ` Alexander Bulekov
  0 siblings, 0 replies; 11+ messages in thread
From: Alexander Bulekov @ 2023-01-26  4:18 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Philippe Mathieu-Daudé,
	Mauro Matteo Cascella, Peter Xu, Jason Wang, David Hildenbrand,
	Gerd Hoffmann, Thomas Huth, Laurent Vivier, Bandan Das,
	Edgar E . Iglesias, Darren Kenny, Bin Meng, Paolo Bonzini,
	Michael S . Tsirkin, Marcel Apfelbaum, Daniel P . Berrangé,
	Eduardo Habkost, Jon Maloy, Siqi Chen, Fam Zheng, Kevin Wolf,
	Hanna Reitz, open list:Block I/O path

On 230125 1624, Stefan Hajnoczi wrote:
> On Thu, Jan 19, 2023 at 02:03:07AM -0500, Alexander Bulekov wrote:
> > Devices can pass their MemoryReentrancyGuard (from their DeviceState),
> > when creating new BHes. Then, the async API will toggle the guard
> > before/after calling the BH call-back. This prevents bh->mmio reentrancy
> > issues.
> > 
> > Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
> > ---
> >  docs/devel/multiple-iothreads.txt |  2 ++
> >  include/block/aio.h               | 18 ++++++++++++++++--
> >  include/qemu/main-loop.h          |  7 +++++--
> >  tests/unit/ptimer-test-stubs.c    |  3 ++-
> >  util/async.c                      | 12 +++++++++++-
> >  util/main-loop.c                  |  5 +++--
> >  6 files changed, 39 insertions(+), 8 deletions(-)
> > 
> > diff --git a/docs/devel/multiple-iothreads.txt b/docs/devel/multiple-iothreads.txt
> > index 343120f2ef..e4fafed9d9 100644
> > --- a/docs/devel/multiple-iothreads.txt
> > +++ b/docs/devel/multiple-iothreads.txt
> > @@ -61,6 +61,7 @@ There are several old APIs that use the main loop AioContext:
> >   * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
> >   * LEGACY timer_new_ms() - create a timer
> >   * LEGACY qemu_bh_new() - create a BH
> > + * LEGACY qemu_bh_new_guarded() - create a BH with a device re-entrancy guard
> >   * LEGACY qemu_aio_wait() - run an event loop iteration
> >  
> >  Since they implicitly work on the main loop they cannot be used in code that
> > @@ -72,6 +73,7 @@ Instead, use the AioContext functions directly (see include/block/aio.h):
> >   * aio_set_event_notifier() - monitor an event notifier
> >   * aio_timer_new() - create a timer
> >   * aio_bh_new() - create a BH
> > + * aio_bh_new_guarded() - create a BH with a device re-entrancy guard
> >   * aio_poll() - run an event loop iteration
> >  
> >  The AioContext can be obtained from the IOThread using
> > diff --git a/include/block/aio.h b/include/block/aio.h
> > index 0f65a3cc9e..94d661ff7e 100644
> > --- a/include/block/aio.h
> > +++ b/include/block/aio.h
> > @@ -23,6 +23,8 @@
> >  #include "qemu/thread.h"
> >  #include "qemu/timer.h"
> >  #include "block/graph-lock.h"
> > +#include "hw/qdev-core.h"
> > +
> >  
> >  typedef struct BlockAIOCB BlockAIOCB;
> >  typedef void BlockCompletionFunc(void *opaque, int ret);
> > @@ -332,9 +334,11 @@ void aio_bh_schedule_oneshot_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
> >   * is opaque and must be allocated prior to its use.
> >   *
> >   * @name: A human-readable identifier for debugging purposes.
> > + * @reentrancy_guard: A guard set when entering a cb to prevent
> > + * device-reentrancy issues
> >   */
> >  QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
> > -                        const char *name);
> > +                        const char *name, MemReentrancyGuard *reentrancy_guard);
> >  
> >  /**
> >   * aio_bh_new: Allocate a new bottom half structure
> > @@ -343,7 +347,17 @@ QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
> >   * string.
> >   */
> >  #define aio_bh_new(ctx, cb, opaque) \
> > -    aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)))
> > +    aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)), NULL)
> > +
> > +/**
> > + * aio_bh_new_guarded: Allocate a new bottom half structure with a
> > + * reentrancy_guard
> > + *
> > + * A convenience wrapper for aio_bh_new_full() that uses the cb as the name
> > + * string.
> > + */
> > +#define aio_bh_new_guarded(ctx, cb, opaque, guard) \
> > +    aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)), guard)
> >  
> >  /**
> >   * aio_notify: Force processing of pending events.
> > diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
> > index c25f390696..84d1ce57f0 100644
> > --- a/include/qemu/main-loop.h
> > +++ b/include/qemu/main-loop.h
> > @@ -389,9 +389,12 @@ void qemu_cond_timedwait_iothread(QemuCond *cond, int ms);
> >  
> >  void qemu_fd_register(int fd);
> >  
> > +#define qemu_bh_new_guarded(cb, opaque, guard) \
> > +    qemu_bh_new_full((cb), (opaque), (stringify(cb)), guard)
> >  #define qemu_bh_new(cb, opaque) \
> > -    qemu_bh_new_full((cb), (opaque), (stringify(cb)))
> > -QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name);
> > +    qemu_bh_new_full((cb), (opaque), (stringify(cb)), NULL)
> > +QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name,
> > +                         MemReentrancyGuard *reentrancy_guard);
> >  void qemu_bh_schedule_idle(QEMUBH *bh);
> >  
> >  enum {
> > diff --git a/tests/unit/ptimer-test-stubs.c b/tests/unit/ptimer-test-stubs.c
> > index f5e75a96b6..24d5413f9d 100644
> > --- a/tests/unit/ptimer-test-stubs.c
> > +++ b/tests/unit/ptimer-test-stubs.c
> > @@ -107,7 +107,8 @@ int64_t qemu_clock_deadline_ns_all(QEMUClockType type, int attr_mask)
> >      return deadline;
> >  }
> >  
> > -QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name)
> > +QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name,
> > +                         MemReentrancyGuard *reentrancy_guard)
> >  {
> >      QEMUBH *bh = g_new(QEMUBH, 1);
> >  
> > diff --git a/util/async.c b/util/async.c
> > index 14d63b3091..08924c3212 100644
> > --- a/util/async.c
> > +++ b/util/async.c
> > @@ -65,6 +65,7 @@ struct QEMUBH {
> >      void *opaque;
> >      QSLIST_ENTRY(QEMUBH) next;
> >      unsigned flags;
> > +    MemReentrancyGuard *reentrancy_guard;
> >  };
> >  
> >  /* Called concurrently from any thread */
> > @@ -133,7 +134,7 @@ void aio_bh_schedule_oneshot_full(AioContext *ctx, QEMUBHFunc *cb,
> >  }
> >  
> >  QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
> > -                        const char *name)
> > +                        const char *name, MemReentrancyGuard *reentrancy_guard)
> >  {
> >      QEMUBH *bh;
> >      bh = g_new(QEMUBH, 1);
> > @@ -142,13 +143,22 @@ QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
> >          .cb = cb,
> >          .opaque = opaque,
> >          .name = name,
> > +        .reentrancy_guard = reentrancy_guard,
> >      };
> >      return bh;
> >  }
> >  
> >  void aio_bh_call(QEMUBH *bh)
> >  {
> > +    if (bh->reentrancy_guard) {
> > +        bh->reentrancy_guard->engaged_in_io = true;
> > +    }
> > +
> >      bh->cb(bh->opaque);
> > +
> > +    if (bh->reentrancy_guard) {
> > +        bh->reentrancy_guard->engaged_in_io = false;
> > +    }
> >  }
> 
> QEMU supports nested event loops. I think aio_bh_call() -> cb() ->
> aio_poll() -> aio_bh_call() -> ... is possible although it should be
> rare.
> 

Maybe 9p's v9fs_co_run_in_worker is an example of that, though I'm not
sure. That was one of the calls to qemu_bh_new that I could not find
a straightforward way to refactor..

> ->engaged_in_io will set to false after the innermost aio_bh_call()
> returns. Therefore the protection doesn't cover the remainder of the
> parent cb() functions.
> 
> I think aio_bh_call() should be:
> 
>   void aio_bh_call(QEMUBH *bh)
>   {
>       bool last_engaged_in_io = false;
> 
>       if (bh->reentrancy_guard) {
>           last_engaged_in_io = bh->reentrancy_guard->engaged_in_io;
>           bh->reentrancy_guard->engaged_in_io = true;
>       }
> 
>       bh->cb(bh->opaque);
> 
>       if (bh->reentrancy_guard) {
>           bh->reentrancy_guard->engaged_in_io = last_engaged_in_io;
>       }
>   }
> 
> That way nested aio_poll() calls work as expected.
> 
> This also raises the question whether aio_bh_call() should call abort(3)
> if ->engaged_in_io is already true when the function is entered? I think
> that may be too strict, but I'm not sure. A scenario where this can
> happen:
> 
> The memory region read/write function calls aio_poll() -> aio_bh_call()
> and a BH with our device's re-entrancy guard is executed.
> 

Is this sort of "bh reentrancy" only likely through a deliberate
design-decision by the code author? If so then, maybe it doesn't need to
be treated with the same severity as the memory-reentrancy case. I'll
add a tracepoint in the next version.
Thanks
-Alex

> >  
> >  /* Multiple occurrences of aio_bh_poll cannot be called concurrently. */
> > diff --git a/util/main-loop.c b/util/main-loop.c
> > index 58f776a8c9..07d2e2040a 100644
> > --- a/util/main-loop.c
> > +++ b/util/main-loop.c
> > @@ -617,9 +617,10 @@ void main_loop_wait(int nonblocking)
> >  
> >  /* Functions to operate on the main QEMU AioContext.  */
> >  
> > -QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name)
> > +QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name, MemReentrancyGuard *reentrancy_guard)
> >  {
> > -    return aio_bh_new_full(qemu_aio_context, cb, opaque, name);
> > +    return aio_bh_new_full(qemu_aio_context, cb, opaque, name,
> > +                           reentrancy_guard);
> >  }
> >  
> >  /*
> > -- 
> > 2.39.0
> > 




^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v4 0/3] memory: prevent dma-reentracy issues
@ 2023-01-19  7:01 Alexander Bulekov
  0 siblings, 0 replies; 11+ messages in thread
From: Alexander Bulekov @ 2023-01-19  7:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexander Bulekov, Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Mauro Matteo Cascella, Peter Xu, Jason Wang, David Hildenbrand,
	Gerd Hoffmann, Thomas Huth, Laurent Vivier, Bandan Das,
	Edgar E . Iglesias, Darren Kenny, Bin Meng, Paolo Bonzini,
	Michael S . Tsirkin, Marcel Apfelbaum, Daniel P . Berrangé,
	Eduardo Habkost, Jon Maloy, Siqi Chen

These patches aim to solve two types of DMA-reentrancy issues:

1.) mmio -> dma -> mmio case
To solve this, we track whether the device is engaged in io by
checking/setting a reentrancy-guard within APIs used for MMIO access.

2.) bh -> dma write -> mmio case
This case is trickier, since we dont have a generic way to associate a
bh with the underlying Device/DeviceState. Thus, this version allows a
device to associate a reentrancy-guard with a bh, when creating it.
(Instead of calling qemu_bh_new, you call qemu_bh_new_guarded)

I replaced most of the qemu_bh_new invocations with the guarded analog,
except for the ones where the DeviceState was not trivially accessible

Unlike v3, these changes should address issues in devices that bypass
DMA apis and directly call into address_space.
e.g. https://gitlab.com/qemu-project/qemu/-/issues/827

v3 -> v4: Instead of changing all of the DMA APIs, instead add an
    optional reentrancy guard to the BH API.

v2 -> v3: Bite the bullet and modify the DMA APIs, rather than
    attempting to guess DeviceStates in BHs.


Alexander Bulekov (3):
  memory: prevent dma-reentracy issues
  async: Add an optional reentrancy guard to the BH API
  hw: replace most qemu_bh_new calls with qemu_bh_new_guarded

Alexander Bulekov (3):
  memory: prevent dma-reentracy issues
  async: Add an optional reentrancy guard to the BH API
  hw: replace most qemu_bh_new calls with qemu_bh_new_guarded

 docs/devel/multiple-iothreads.txt |  2 ++
 hw/9pfs/xen-9p-backend.c          |  4 +++-
 hw/block/dataplane/virtio-blk.c   |  3 ++-
 hw/block/dataplane/xen-block.c    |  5 +++--
 hw/block/virtio-blk.c             |  5 +++--
 hw/char/virtio-serial-bus.c       |  3 ++-
 hw/display/qxl.c                  |  9 ++++++---
 hw/display/virtio-gpu.c           |  6 ++++--
 hw/ide/ahci.c                     |  3 ++-
 hw/ide/core.c                     |  3 ++-
 hw/misc/imx_rngc.c                |  6 ++++--
 hw/misc/macio/mac_dbdma.c         |  2 +-
 hw/net/virtio-net.c               |  3 ++-
 hw/nvme/ctrl.c                    |  6 ++++--
 hw/scsi/mptsas.c                  |  3 ++-
 hw/scsi/scsi-bus.c                |  3 ++-
 hw/scsi/vmw_pvscsi.c              |  3 ++-
 hw/usb/dev-uas.c                  |  3 ++-
 hw/usb/hcd-dwc2.c                 |  3 ++-
 hw/usb/hcd-ehci.c                 |  3 ++-
 hw/usb/hcd-uhci.c                 |  2 +-
 hw/usb/host-libusb.c              |  6 ++++--
 hw/usb/redirect.c                 |  6 ++++--
 hw/usb/xen-usb.c                  |  3 ++-
 hw/virtio/virtio-balloon.c        |  5 +++--
 hw/virtio/virtio-crypto.c         |  3 ++-
 include/block/aio.h               | 18 ++++++++++++++++--
 include/hw/qdev-core.h            |  7 +++++++
 include/qemu/main-loop.h          |  7 +++++--
 softmmu/memory.c                  | 15 +++++++++++++++
 softmmu/trace-events              |  1 +
 tests/unit/ptimer-test-stubs.c    |  3 ++-
 util/async.c                      | 12 +++++++++++-
 util/main-loop.c                  |  5 +++--
 34 files changed, 128 insertions(+), 43 deletions(-)

-- 
2.39.0



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v4 0/3] memory: prevent dma-reentracy issues
@ 2023-01-19  7:00 Alexander Bulekov
  0 siblings, 0 replies; 11+ messages in thread
From: Alexander Bulekov @ 2023-01-19  7:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexander Bulekov, Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Mauro Matteo Cascella, Peter Xu, Jason Wang, David Hildenbrand,
	Gerd Hoffmann, Thomas Huth, Laurent Vivier, Bandan Das,
	Edgar E . Iglesias, Darren Kenny, Bin Meng, Paolo Bonzini,
	Michael S . Tsirkin, Marcel Apfelbaum, Daniel P . Berrangé,
	Eduardo Habkost, Jon Maloy, Siqi Chen

These patches aim to solve two types of DMA-reentrancy issues:

1.) mmio -> dma -> mmio case
To solve this, we track whether the device is engaged in io by
checking/setting a reentrancy-guard within APIs used for MMIO access.

2.) bh -> dma write -> mmio case
This case is trickier, since we dont have a generic way to associate a
bh with the underlying Device/DeviceState. Thus, this version allows a
device to associate a reentrancy-guard with a bh, when creating it.
(Instead of calling qemu_bh_new, you call qemu_bh_new_guarded)

I replaced most of the qemu_bh_new invocations with the guarded analog,
except for the ones where the DeviceState was not trivially accessible

Unlike v3, these changes should address issues in devices that bypass
DMA apis and directly call into address_space.
e.g. https://gitlab.com/qemu-project/qemu/-/issues/827

v3 -> v4: Instead of changing all of the DMA APIs, instead add an
    optional reentrancy guard to the BH API.

v2 -> v3: Bite the bullet and modify the DMA APIs, rather than
    attempting to guess DeviceStates in BHs.


Alexander Bulekov (3):
  memory: prevent dma-reentracy issues
  async: Add an optional reentrancy guard to the BH API
  hw: replace most qemu_bh_new calls with qemu_bh_new_guarded

Alexander Bulekov (3):
  memory: prevent dma-reentracy issues
  async: Add an optional reentrancy guard to the BH API
  hw: replace most qemu_bh_new calls with qemu_bh_new_guarded

 docs/devel/multiple-iothreads.txt |  2 ++
 hw/9pfs/xen-9p-backend.c          |  4 +++-
 hw/block/dataplane/virtio-blk.c   |  3 ++-
 hw/block/dataplane/xen-block.c    |  5 +++--
 hw/block/virtio-blk.c             |  5 +++--
 hw/char/virtio-serial-bus.c       |  3 ++-
 hw/display/qxl.c                  |  9 ++++++---
 hw/display/virtio-gpu.c           |  6 ++++--
 hw/ide/ahci.c                     |  3 ++-
 hw/ide/core.c                     |  3 ++-
 hw/misc/imx_rngc.c                |  6 ++++--
 hw/misc/macio/mac_dbdma.c         |  2 +-
 hw/net/virtio-net.c               |  3 ++-
 hw/nvme/ctrl.c                    |  6 ++++--
 hw/scsi/mptsas.c                  |  3 ++-
 hw/scsi/scsi-bus.c                |  3 ++-
 hw/scsi/vmw_pvscsi.c              |  3 ++-
 hw/usb/dev-uas.c                  |  3 ++-
 hw/usb/hcd-dwc2.c                 |  3 ++-
 hw/usb/hcd-ehci.c                 |  3 ++-
 hw/usb/hcd-uhci.c                 |  2 +-
 hw/usb/host-libusb.c              |  6 ++++--
 hw/usb/redirect.c                 |  6 ++++--
 hw/usb/xen-usb.c                  |  3 ++-
 hw/virtio/virtio-balloon.c        |  5 +++--
 hw/virtio/virtio-crypto.c         |  3 ++-
 include/block/aio.h               | 18 ++++++++++++++++--
 include/hw/qdev-core.h            |  7 +++++++
 include/qemu/main-loop.h          |  7 +++++--
 softmmu/memory.c                  | 15 +++++++++++++++
 softmmu/trace-events              |  1 +
 tests/unit/ptimer-test-stubs.c    |  3 ++-
 util/async.c                      | 12 +++++++++++-
 util/main-loop.c                  |  5 +++--
 34 files changed, 128 insertions(+), 43 deletions(-)

-- 
2.39.0



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-01-26  4:19 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-19  7:03 [PATCH v4 0/3] memory: prevent dma-reentracy issues Alexander Bulekov
2023-01-19  7:03 ` [PATCH v4 1/3] " Alexander Bulekov
2023-01-25 21:12   ` Stefan Hajnoczi
2023-01-19  7:03 ` [PATCH v4 2/3] async: Add an optional reentrancy guard to the BH API Alexander Bulekov
2023-01-25 21:24   ` Stefan Hajnoczi
2023-01-26  4:18     ` Alexander Bulekov
2023-01-19  7:03 ` [PATCH v4 3/3] hw: replace most qemu_bh_new calls with qemu_bh_new_guarded Alexander Bulekov
2023-01-25 22:19   ` Stefan Hajnoczi
2023-01-25 22:19   ` Stefan Hajnoczi
  -- strict thread matches above, loose matches on Subject: below --
2023-01-19  7:01 [PATCH v4 0/3] memory: prevent dma-reentracy issues Alexander Bulekov
2023-01-19  7:00 Alexander Bulekov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.