All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] memory: Fix dma-reentrancy issues at the MMIO level
@ 2021-12-17  3:08 Alexander Bulekov
  2021-12-17  6:27 ` Qiuhao Li
  2021-12-17 13:58 ` Philippe Mathieu-Daudé
  0 siblings, 2 replies; 10+ messages in thread
From: Alexander Bulekov @ 2021-12-17  3:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Maydell, Mauro Matteo Cascella,
	Daniel P. Berrangé,
	Darren Kenny, David Hildenbrand, Jason Wang, Bin Meng, Li Qiang,
	Qiuhao Li, Peter Xu, Eduardo Habkost, Alexander Bulekov,
	Bandan Das, Gerd Hoffmann, Stefan Hajnoczi, Paolo Bonzini,
	Thomas Huth, Edgar E . Iglesias, Philippe Mathieu-Daudé

Here's my shot at fixing dma-reentracy issues. This patch adds a flag to
the DeviceState, which is set/checked when we call an accessor
associated with the device's IO MRs.

The problem, in short, as I understand it: For the vast majority of
cases, we want to prevent a device from accessing it's own PIO/MMIO
regions over DMA.

This patch/solution is based on some assumptions:
1. DMA accesses that hit mmio regions are only dangerous if they end up
interacting with memory-regions belonging to the device initiating the
DMA.
Not dangerous:  sdhci_pio->dma_write->e1000_mmio
Dangerous:      sdhci_pio->dma_write->sdhci_mmio

2. Most devices do not interact with their own PIO/MMIO memory-regions
using DMA.

3. There is no way for there to be multiple simultaneous accesses to a
device's PIO/MMIO memory-regions.

4. All devices are QOMified :-)

With this patch, I wasn't able to reproduce the issues being tracked
here, with QTest reproducers:
https://gitlab.com/qemu-project/qemu/-/issues/556

This passes the i386 qos/qtests for me and I was able to boot some linux/windows
VMs with basic devices configured, without any apparent problems.

Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: Mauro Matteo Cascella <mcascell@redhat.com>
Cc: Qiuhao Li <Qiuhao.Li@outlook.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Li Qiang <liq3ea@gmail.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: Bandan Das <bsd@redhat.com>
Cc: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Cc: Darren Kenny <darren.kenny@oracle.com>
Cc: Bin Meng <bin.meng@windriver.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>

Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
---
 include/hw/qdev-core.h |  1 +
 softmmu/memory.c       | 15 +++++++++++++++
 softmmu/trace-events   |  1 +
 3 files changed, 17 insertions(+)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 20d3066595..32f7c779ab 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -193,6 +193,7 @@ struct DeviceState {
     int instance_id_alias;
     int alias_required_for_version;
     ResettableState reset;
+    int engaged_in_direct_io;
 };
 
 struct DeviceListener {
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 7340e19ff5..255c3c602f 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -532,6 +532,7 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
     uint64_t access_mask;
     unsigned access_size;
     unsigned i;
+    DeviceState *dev = NULL;
     MemTxResult r = MEMTX_OK;
 
     if (!access_size_min) {
@@ -541,6 +542,17 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
         access_size_max = 4;
     }
 
+    /* Do not allow more than one simultanous access to a device's IO Regions */
+    if (mr->owner &&
+        !mr->ram_device && !mr->ram && !mr->rom_device && !mr->readonly) {
+        dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE);
+        if (dev->engaged_in_direct_io) {
+            trace_memory_region_reentrant_io(get_cpu_index(), mr, addr, size);
+            return MEMTX_ERROR;
+        }
+        dev->engaged_in_direct_io = true;
+    }
+
     /* FIXME: support unaligned access? */
     access_size = MAX(MIN(size, access_size_max), access_size_min);
     access_mask = MAKE_64BIT_MASK(0, access_size * 8);
@@ -555,6 +567,9 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
                         access_mask, attrs);
         }
     }
+    if (dev) {
+        dev->engaged_in_direct_io = false;
+    }
     return r;
 }
 
diff --git a/softmmu/trace-events b/softmmu/trace-events
index 9c88887b3c..d7228316db 100644
--- a/softmmu/trace-events
+++ b/softmmu/trace-events
@@ -13,6 +13,7 @@ memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, u
 memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size, const char *name) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u name '%s'"
 memory_region_subpage_read(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_subpage_write(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
+memory_region_reentrant_io(int cpu_index, void *mr, uint64_t offset, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" size %u"
 memory_region_ram_device_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_ram_device_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_sync_dirty(const char *mr, const char *listener, int global) "mr '%s' listener '%s' synced (global=%d)"
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 10+ messages in thread
* Re: [RFC PATCH] memory: Fix dma-reentrancy issues at the MMIO level
@ 2021-12-17  8:51 Qiuhao Li
  0 siblings, 0 replies; 10+ messages in thread
From: Qiuhao Li @ 2021-12-17  8:51 UTC (permalink / raw)
  To: Klaus Jensen
  Cc: Laurent Vivier, Peter Maydell, Mauro Matteo Cascella,
	Daniel P. Berrangé,
	Darren Kenny, David Hildenbrand, Jason Wang, Bin Meng, Li Qiang,
	qemu-devel, Peter Xu, Eduardo Habkost, Alexander Bulekov,
	Bandan Das, Gerd Hoffmann, Stefan Hajnoczi, Edgar E . Iglesias,
	Thomas Huth, Paolo Bonzini, Philippe Mathieu-Daudé

[-- Attachment #1: Type: text/plain, Size: 4486 bytes --]

Yes, it still works. Now it looks orthodox:

cat << EOF | ./qemu-system-x86_64 -display none -machine accel=qtest \
-machine q35 -nodefaults -drive file=null-co://,if=none,format=raw,id=disk0 \
-device nvme,drive=disk0,serial=1 -qtest stdio \

outl 0xcf8 0x80000810               /* MLBAR (BAR0) – Memory Register Base Address, lower 32-bits */
outl 0xcfc 0xe0000000               /* MMIO Base Address = 0xe0000000 */
outl 0xcf8 0x80000804               /* CMD - Command */
outw 0xcfc 0x06                     /* Bus Master Enable, Memory Space Enable */
write 0x1000 0x1 0x02               /* cmd->opcode, NVME_ADM_CMD_GET_LOG_PAGE, nvme_get_log() */
write 0x1018 0x4 0x140000e0         /* prp1 = 0xe0000014, NVME_REG_CC, nvme_ctrl_reset() */
write 0x1028 0x4 0x03000004         /* cmd->cdw10, lid = 3 NVME_LOG_FW_SLOT_INFO, nvme_fw_log_info, buf_len = 4 */
write 0x1030 0x4 0xfc010000         /* cmd->cdw12 = 0x1fc, Log Page Offset, trans_len = sizeof(fw_log) - 0x1fc = 4 */
write 0xe0000024 0x4 0x02000200     /* [3] 3.1.8, Admin Queue Attributes */
write 0xe0000028 0x4 0x00100000     /* asq = 0x1000 */
write 0xe0000030 0x4 0x00200000     /* acq = 0x2000 */
write 0xe0000014 0x4 0x01004600     /* [3] 3.1.5, Controller Configuration, start ctrl */
write 0xe0001000 0x1 0x01           /* [3] 3.1.24, SQyTDBL – Submission Queue y Tail Doorbell */
clock_step
EOF

I also wrote a PoC in the guest OS which led to worse result, but the QTest reproducer may be enough.


________________________________
From: Klaus Jensen
Sent: Friday, December 17, 2021 16:37
To: Qiuhao Li
Cc: Alexander Bulekov; qemu-devel@nongnu.org; Laurent Vivier; Peter Maydell; Mauro Matteo Cascella; Daniel P. Berrangé; David Hildenbrand; Jason Wang; Bin Meng; Li Qiang; Thomas Huth; Peter Xu; Eduardo Habkost; Darren Kenny; Bandan Das; Gerd Hoffmann; Stefan Hajnoczi; Paolo Bonzini; Edgar E . Iglesias; Philippe Mathieu-Daudé
Subject: Re: [RFC PATCH] memory: Fix dma-reentrancy issues at the MMIO level

On Dec 17 06:27, Qiuhao Li wrote:
> Thanks Alex. It seems this patch sets and checks if the destination device is busy. But how about the data transfers not triggered directly by PMIO/MMIO handlers? For example:
>
> 1. Device A Timer's callback -> Device A MMIO handler
> 2. Device A BH's callback -> Device A MMIO handler
>
> In these situations, when A launches a DMA to itself, the dev->engaged_in_direct_io is not set, so the operation is allowed. Maybe we should log the source and check the destination when we launch data transfers. Is there a way to do that?
>
> Below is a reproducer in NVMe which triggers DMA in a timer's callback (nvme_process_sq). I can still trigger use-after-free exception with this patch on qemu-6.1.0:
>
> cat << EOF | ./qemu-system-x86_64 -display none -machine accel=qtest \
> -machine q35 -nodefaults -drive file=null-co://,if=none,format=raw,id=disk0 \
> -device nvme,drive=disk0,serial=1 -qtest stdio \
>
> outl 0xcf8 0x80000810               /* MLBAR (BAR0) – Memory Register Base Address, lower 32-bits */
> outl 0xcfc 0xe0000000               /* MMIO Base Address = 0xe0000000 */
> outl 0xcf8 0x80000804               /* CMD - Command */
> outw 0xcfc 0x06                     /* Bus Master Enable, Memory Space Enable */
> write 0xe0000024 0x4 0x02000200     /* [3] 3.1.8, Admin Queue Attributes */
> write 0xe0000028 0x4 0x00100000     /* asq = 0x1000 */
> write 0xe0000030 0x4 0x00200000     /* acq = 0x2000 */
> write 0xe0000014 0x4 0x01004600     /* [3] 3.1.5, Controller Configuration, start ctrl */
> write 0xe0001000 0x1 0x01           /* [3] 3.1.24, SQyTDBL – Submission Queue y Tail Doorbell */
> write 0x1000 0x1 0x02               /* cmd->opcode, NVME_ADM_CMD_GET_LOG_PAGE, nvme_get_log() */
> write 0x1018 0x4 0x140000e0         /* prp1 = 0xe0000014, NVME_REG_CC, nvme_ctrl_reset() */
> write 0x1028 0x4 0x03000004         /* cmd->cdw10, lid = 3 NVME_LOG_FW_SLOT_INFO, nvme_fw_log_info, buf_len = 4 */
> write 0x1030 0x4 0xfc010000         /* cmd->cdw12 = 0x1fc, Log Page Offset, trans_len = sizeof(fw_log) - 0x1fc = 4 */
> clock_step
> EOF
>
> CC: Mauro Matteo Cascella and Philippe Mathieu-Daudé. Should we put the reproducer above to https://gitlab.com/qemu-project/qemu/-/issues/556?
>

This is a good reproducer. Does it still work if you do the `write
0xe0001000 0x1 0x01` at the end instead? It looks weird that you ring
the doorbell prior to writing the command in the queue.

[-- Attachment #2: Type: text/html, Size: 7207 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-12-17 16:53 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-17  3:08 [RFC PATCH] memory: Fix dma-reentrancy issues at the MMIO level Alexander Bulekov
2021-12-17  6:27 ` Qiuhao Li
2021-12-17  8:37   ` Klaus Jensen
2021-12-17  9:44   ` Mauro Matteo Cascella
2021-12-17 14:20   ` Alexander Bulekov
2021-12-17 13:58 ` Philippe Mathieu-Daudé
2021-12-17 14:30   ` Alexander Bulekov
2021-12-17 15:25     ` Philippe Mathieu-Daudé
2021-12-17 16:51       ` Alexander Bulekov
2021-12-17  8:51 Qiuhao Li

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.