* [RFC PATCH] memory: Fix dma-reentrancy issues at the MMIO level
@ 2021-12-17  3:08 Alexander Bulekov
  2021-12-17  6:27 ` Qiuhao Li
  2021-12-17 13:58 ` Philippe Mathieu-Daudé
  0 siblings, 2 replies; 10+ messages in thread
From: Alexander Bulekov @ 2021-12-17  3:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Maydell, Mauro Matteo Cascella,
	Daniel P. Berrangé,
	Darren Kenny, David Hildenbrand, Jason Wang, Bin Meng, Li Qiang,
	Qiuhao Li, Peter Xu, Eduardo Habkost, Alexander Bulekov,
	Bandan Das, Gerd Hoffmann, Stefan Hajnoczi, Paolo Bonzini,
	Thomas Huth, Edgar E . Iglesias, Philippe Mathieu-Daudé

Here's my shot at fixing dma-reentrancy issues. This patch adds a flag to
the DeviceState, which is set/checked when we call an accessor
associated with the device's IO MRs.

The problem, in short, as I understand it: For the vast majority of
cases, we want to prevent a device from accessing its own PIO/MMIO
regions over DMA.

This patch/solution is based on some assumptions:
1. DMA accesses that hit mmio regions are only dangerous if they end up
interacting with memory-regions belonging to the device initiating the
DMA.
Not dangerous:  sdhci_pio->dma_write->e1000_mmio
Dangerous:      sdhci_pio->dma_write->sdhci_mmio

2. Most devices do not interact with their own PIO/MMIO memory-regions
using DMA.

3. There are never multiple simultaneous accesses to a device's PIO/MMIO
memory-regions.

4. All devices are QOMified :-)

With this patch, I was no longer able to reproduce the issues tracked
here using the QTest reproducers:
https://gitlab.com/qemu-project/qemu/-/issues/556

This passes the i386 qos/qtests for me, and I was able to boot some Linux/Windows
VMs with basic devices configured without any apparent problems.
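
To make the "Dangerous" case above concrete, here is a minimal,
hypothetical sketch (MyDevState, the register offsets and the buffer are
invented for illustration -- this is not code from any real device model)
of how a guest-programmed DMA destination inside the device's own BAR
re-enters the device's MMIO handler; the check added below refuses exactly
this nested access:

typedef struct MyDevState {
    PCIDevice parent_obj;
    uint64_t dma_dst;
    uint8_t buf[256];
} MyDevState;

static void mydev_mmio_write(void *opaque, hwaddr addr,
                             uint64_t val, unsigned size)
{
    MyDevState *s = opaque;

    switch (addr) {
    case 0x00: /* hypothetical REG_DMA_DST */
        s->dma_dst = val;
        break;
    case 0x08: /* hypothetical REG_DMA_KICK */
        /*
         * If the guest pointed dma_dst back into this device's own BAR,
         * this DMA write re-enters mydev_mmio_write() while the current
         * invocation is still on the stack.
         */
        pci_dma_write(&s->parent_obj, s->dma_dst, s->buf, sizeof(s->buf));
        break;
    }
}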

Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: Mauro Matteo Cascella <mcascell@redhat.com>
Cc: Qiuhao Li <Qiuhao.Li@outlook.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Li Qiang <liq3ea@gmail.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: Bandan Das <bsd@redhat.com>
Cc: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Cc: Darren Kenny <darren.kenny@oracle.com>
Cc: Bin Meng <bin.meng@windriver.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>

Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
---
 include/hw/qdev-core.h |  1 +
 softmmu/memory.c       | 15 +++++++++++++++
 softmmu/trace-events   |  1 +
 3 files changed, 17 insertions(+)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 20d3066595..32f7c779ab 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -193,6 +193,7 @@ struct DeviceState {
     int instance_id_alias;
     int alias_required_for_version;
     ResettableState reset;
+    int engaged_in_direct_io;
 };
 
 struct DeviceListener {
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 7340e19ff5..255c3c602f 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -532,6 +532,7 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
     uint64_t access_mask;
     unsigned access_size;
     unsigned i;
+    DeviceState *dev = NULL;
     MemTxResult r = MEMTX_OK;
 
     if (!access_size_min) {
@@ -541,6 +542,17 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
         access_size_max = 4;
     }
 
+    /* Do not allow more than one simultaneous access to a device's IO Regions */
+    if (mr->owner &&
+        !mr->ram_device && !mr->ram && !mr->rom_device && !mr->readonly) {
+        dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE);
+        if (dev->engaged_in_direct_io) {
+            trace_memory_region_reentrant_io(get_cpu_index(), mr, addr, size);
+            return MEMTX_ERROR;
+        }
+        dev->engaged_in_direct_io = true;
+    }
+
     /* FIXME: support unaligned access? */
     access_size = MAX(MIN(size, access_size_max), access_size_min);
     access_mask = MAKE_64BIT_MASK(0, access_size * 8);
@@ -555,6 +567,9 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
                         access_mask, attrs);
         }
     }
+    if (dev) {
+        dev->engaged_in_direct_io = false;
+    }
     return r;
 }
 
diff --git a/softmmu/trace-events b/softmmu/trace-events
index 9c88887b3c..d7228316db 100644
--- a/softmmu/trace-events
+++ b/softmmu/trace-events
@@ -13,6 +13,7 @@ memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, u
 memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size, const char *name) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u name '%s'"
 memory_region_subpage_read(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_subpage_write(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
+memory_region_reentrant_io(int cpu_index, void *mr, uint64_t offset, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" size %u"
 memory_region_ram_device_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_ram_device_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_sync_dirty(const char *mr, const char *listener, int global) "mr '%s' listener '%s' synced (global=%d)"
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] memory: Fix dma-reentrancy issues at the MMIO level
  2021-12-17  3:08 [RFC PATCH] memory: Fix dma-reentrancy issues at the MMIO level Alexander Bulekov
@ 2021-12-17  6:27 ` Qiuhao Li
  2021-12-17  8:37   ` Klaus Jensen
                     ` (2 more replies)
  2021-12-17 13:58 ` Philippe Mathieu-Daudé
  1 sibling, 3 replies; 10+ messages in thread
From: Qiuhao Li @ 2021-12-17  6:27 UTC (permalink / raw)
  To: Alexander Bulekov, qemu-devel
  Cc: Laurent Vivier, Peter Maydell, Mauro Matteo Cascella,
	Daniel P. Berrangé,
	David Hildenbrand, Jason Wang, Bin Meng, Li Qiang, Thomas Huth,
	Peter Xu, Eduardo Habkost, Darren Kenny, Bandan Das,
	Gerd Hoffmann, Stefan Hajnoczi, Paolo Bonzini,
	Edgar E . Iglesias, Philippe Mathieu-Daudé

[-- Attachment #1: Type: text/plain, Size: 8357 bytes --]

Thanks Alex. It seems this patch sets and checks a flag indicating whether the destination device is busy. But what about data transfers that are not triggered directly by PMIO/MMIO handlers? For example:

1. Device A Timer's callback -> Device A MMIO handler
2. Device A BH's callback -> Device A MMIO handler

In these situations, when A launches a DMA to itself, dev->engaged_in_direct_io is not set, so the operation is allowed. Maybe we should log the source and check the destination when we launch data transfers. Is there a way to do that?
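
In other words, roughly the following (purely an illustrative sketch, the
names are made up and this is not the real hw/nvme code): the callback runs
with no MMIO accessor on the stack, so dev->engaged_in_direct_io is still
false when the DMA reaches access_with_adjusted_size(), and a destination
inside the device's own register window is let through:

static void mydev_timer_cb(void *opaque)
{
    MyDevState *s = opaque;

    /* Process a command/descriptor the guest queued earlier via MMIO... */

    /*
     * DMA issued from timer (or BH) context: the re-entrancy flag was
     * never set, so if guest_prp points back into the device's own BAR
     * (e.g. at a reset-triggering register), the new check in memory.c
     * does not fire and the device model is re-entered.
     */
    pci_dma_write(&s->parent_obj, s->guest_prp, &s->reply, sizeof(s->reply));
}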

Below is a reproducer for NVMe which triggers DMA in a timer's callback (nvme_process_sq). I can still trigger a use-after-free with this patch on qemu-6.1.0:

cat << EOF | ./qemu-system-x86_64 -display none -machine accel=qtest \
-machine q35 -nodefaults -drive file=null-co://,if=none,format=raw,id=disk0 \
-device nvme,drive=disk0,serial=1 -qtest stdio \

outl 0xcf8 0x80000810               /* MLBAR (BAR0) – Memory Register Base Address, lower 32-bits */
outl 0xcfc 0xe0000000               /* MMIO Base Address = 0xe0000000 */
outl 0xcf8 0x80000804               /* CMD - Command */
outw 0xcfc 0x06                     /* Bus Master Enable, Memory Space Enable */
write 0xe0000024 0x4 0x02000200     /* [3] 3.1.8, Admin Queue Attributes */
write 0xe0000028 0x4 0x00100000     /* asq = 0x1000 */
write 0xe0000030 0x4 0x00200000     /* acq = 0x2000 */
write 0xe0000014 0x4 0x01004600     /* [3] 3.1.5, Controller Configuration, start ctrl */
write 0xe0001000 0x1 0x01           /* [3] 3.1.24, SQyTDBL – Submission Queue y Tail Doorbell */
write 0x1000 0x1 0x02               /* cmd->opcode, NVME_ADM_CMD_GET_LOG_PAGE, nvme_get_log() */
write 0x1018 0x4 0x140000e0         /* prp1 = 0xe0000014, NVME_REG_CC, nvme_ctrl_reset() */
write 0x1028 0x4 0x03000004         /* cmd->cdw10, lid = 3 NVME_LOG_FW_SLOT_INFO, nvme_fw_log_info, buf_len = 4 */
write 0x1030 0x4 0xfc010000         /* cmd->cdw12 = 0x1fc, Log Page Offset, trans_len = sizeof(fw_log) - 0x1fc = 4 */
clock_step
EOF

CC: Mauro Matteo Cascella and Philippe Mathieu-Daudé. Should we add the reproducer above to https://gitlab.com/qemu-project/qemu/-/issues/556?

________________________________
From: Alexander Bulekov <alxndr@bu.edu>
Sent: Friday, December 17, 2021 11:08
To: qemu-devel@nongnu.org <qemu-devel@nongnu.org>
Cc: Alexander Bulekov <alxndr@bu.edu>; Philippe Mathieu-Daudé <philmd@redhat.com>; Mauro Matteo Cascella <mcascell@redhat.com>; Qiuhao Li <Qiuhao.Li@outlook.com>; Peter Xu <peterx@redhat.com>; Jason Wang <jasowang@redhat.com>; David Hildenbrand <david@redhat.com>; Gerd Hoffmann <kraxel@redhat.com>; Peter Maydell <peter.maydell@linaro.org>; Li Qiang <liq3ea@gmail.com>; Thomas Huth <thuth@redhat.com>; Laurent Vivier <lvivier@redhat.com>; Bandan Das <bsd@redhat.com>; Edgar E . Iglesias <edgar.iglesias@gmail.com>; Darren Kenny <darren.kenny@oracle.com>; Bin Meng <bin.meng@windriver.com>; Paolo Bonzini <pbonzini@redhat.com>; Stefan Hajnoczi <stefanha@redhat.com>; Daniel P. Berrangé <berrange@redhat.com>; Eduardo Habkost <eduardo@habkost.net>
Subject: [RFC PATCH] memory: Fix dma-reentrancy issues at the MMIO level

Here's my shot at fixing dma-reentrancy issues. This patch adds a flag to
the DeviceState, which is set/checked when we call an accessor
associated with the device's IO MRs.

The problem, in short, as I understand it: For the vast majority of
cases, we want to prevent a device from accessing its own PIO/MMIO
regions over DMA.

This patch/solution is based on some assumptions:
1. DMA accesses that hit mmio regions are only dangerous if they end up
interacting with memory-regions belonging to the device initiating the
DMA.
Not dangerous:  sdhci_pio->dma_write->e1000_mmio
Dangerous:      sdhci_pio->dma_write->sdhci_mmio

2. Most devices do not interact with their own PIO/MMIO memory-regions
using DMA.

3. There is no way for there to be multiple simultaneous accesses to a
device's PIO/MMIO memory-regions.

4. All devices are QOMified :-)

With this patch, I wasn't able to reproduce the issues being tracked
here, with QTest reproducers:
https://gitlab.com/qemu-project/qemu/-/issues/556

This passes the i386 qos/qtests for me and I was able to boot some linux/windows
VMs with basic devices configured, without any apparent problems.

Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: Mauro Matteo Cascella <mcascell@redhat.com>
Cc: Qiuhao Li <Qiuhao.Li@outlook.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Li Qiang <liq3ea@gmail.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: Bandan Das <bsd@redhat.com>
Cc: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Cc: Darren Kenny <darren.kenny@oracle.com>
Cc: Bin Meng <bin.meng@windriver.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>

Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
---
 include/hw/qdev-core.h |  1 +
 softmmu/memory.c       | 15 +++++++++++++++
 softmmu/trace-events   |  1 +
 3 files changed, 17 insertions(+)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 20d3066595..32f7c779ab 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -193,6 +193,7 @@ struct DeviceState {
     int instance_id_alias;
     int alias_required_for_version;
     ResettableState reset;
+    int engaged_in_direct_io;
 };

 struct DeviceListener {
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 7340e19ff5..255c3c602f 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -532,6 +532,7 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
     uint64_t access_mask;
     unsigned access_size;
     unsigned i;
+    DeviceState *dev = NULL;
     MemTxResult r = MEMTX_OK;

     if (!access_size_min) {
@@ -541,6 +542,17 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
         access_size_max = 4;
     }

+    /* Do not allow more than one simultaneous access to a device's IO Regions */
+    if (mr->owner &&
+        !mr->ram_device && !mr->ram && !mr->rom_device && !mr->readonly) {
+        dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE);
+        if (dev->engaged_in_direct_io) {
+            trace_memory_region_reentrant_io(get_cpu_index(), mr, addr, size);
+            return MEMTX_ERROR;
+        }
+        dev->engaged_in_direct_io = true;
+    }
+
     /* FIXME: support unaligned access? */
     access_size = MAX(MIN(size, access_size_max), access_size_min);
     access_mask = MAKE_64BIT_MASK(0, access_size * 8);
@@ -555,6 +567,9 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
                         access_mask, attrs);
         }
     }
+    if (dev) {
+        dev->engaged_in_direct_io = false;
+    }
     return r;
 }

diff --git a/softmmu/trace-events b/softmmu/trace-events
index 9c88887b3c..d7228316db 100644
--- a/softmmu/trace-events
+++ b/softmmu/trace-events
@@ -13,6 +13,7 @@ memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, u
 memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size, const char *name) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u name '%s'"
 memory_region_subpage_read(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_subpage_write(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
+memory_region_reentrant_io(int cpu_index, void *mr, uint64_t offset, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" size %u"
 memory_region_ram_device_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_ram_device_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_sync_dirty(const char *mr, const char *listener, int global) "mr '%s' listener '%s' synced (global=%d)"
--
2.33.0


[-- Attachment #2: Type: text/html, Size: 14469 bytes --]

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] memory: Fix dma-reentrancy issues at the MMIO level
  2021-12-17  6:27 ` Qiuhao Li
@ 2021-12-17  8:37   ` Klaus Jensen
  2021-12-17  9:44   ` Mauro Matteo Cascella
  2021-12-17 14:20   ` Alexander Bulekov
  2 siblings, 0 replies; 10+ messages in thread
From: Klaus Jensen @ 2021-12-17  8:37 UTC (permalink / raw)
  To: Qiuhao Li
  Cc: Laurent Vivier, Peter Maydell, Mauro Matteo Cascella,
	Daniel P. Berrangé,
	Darren Kenny, David Hildenbrand, Jason Wang, Bin Meng, Li Qiang,
	qemu-devel, Peter Xu, Eduardo Habkost, Alexander Bulekov,
	Bandan Das, Gerd Hoffmann, Stefan Hajnoczi, Edgar E . Iglesias,
	Thomas Huth, Paolo Bonzini, Philippe Mathieu-Daudé

[-- Attachment #1: Type: text/plain, Size: 2487 bytes --]

On Dec 17 06:27, Qiuhao Li wrote:
> Thanks Alex. It seems this patch sets and checks if the destination device is busy. But how about the data transfers not triggered directly by PMIO/MMIO handlers? For example:
> 
> 1. Device A Timer's callback -> Device A MMIO handler
> 2. Device A BH's callback -> Device A MMIO handler
> 
> In these situations, when A launches a DMA to itself, the dev->engaged_in_direct_io is not set, so the operation is allowed. Maybe we should log the source and check the destination when we launch data transfers. Is there a way to do that?
> 
> Below is a reproducer in NVMe which triggers DMA in a timer's callback (nvme_process_sq). I can still trigger use-after-free exception with this patch on qemu-6.1.0:
> 
> cat << EOF | ./qemu-system-x86_64 -display none -machine accel=qtest \
> -machine q35 -nodefaults -drive file=null-co://,if=none,format=raw,id=disk0 \
> -device nvme,drive=disk0,serial=1 -qtest stdio \
> 
> outl 0xcf8 0x80000810               /* MLBAR (BAR0) – Memory Register Base Address, lower 32-bits */
> outl 0xcfc 0xe0000000               /* MMIO Base Address = 0xe0000000 */
> outl 0xcf8 0x80000804               /* CMD - Command */
> outw 0xcfc 0x06                     /* Bus Master Enable, Memory Space Enable */
> write 0xe0000024 0x4 0x02000200     /* [3] 3.1.8, Admin Queue Attributes */
> write 0xe0000028 0x4 0x00100000     /* asq = 0x1000 */
> write 0xe0000030 0x4 0x00200000     /* acq = 0x2000 */
> write 0xe0000014 0x4 0x01004600     /* [3] 3.1.5, Controller Configuration, start ctrl */
> write 0xe0001000 0x1 0x01           /* [3] 3.1.24, SQyTDBL – Submission Queue y Tail Doorbell */
> write 0x1000 0x1 0x02               /* cmd->opcode, NVME_ADM_CMD_GET_LOG_PAGE, nvme_get_log() */
> write 0x1018 0x4 0x140000e0         /* prp1 = 0xe0000014, NVME_REG_CC, nvme_ctrl_reset() */
> write 0x1028 0x4 0x03000004         /* cmd->cdw10, lid = 3 NVME_LOG_FW_SLOT_INFO, nvme_fw_log_info, buf_len = 4 */
> write 0x1030 0x4 0xfc010000         /* cmd->cdw12 = 0x1fc, Log Page Offset, trans_len = sizeof(fw_log) - 0x1fc = 4 */
> clock_step
> EOF
> 
> CC: Mauro Matteo Cascella and Philippe Mathieu-Daudé. Should we put the reproducer above to https://gitlab.com/qemu-project/qemu/-/issues/556?
> 

This is a good reproducer. Does it still work if you do the `write
0xe0001000 0x1 0x01` at the end instead? It looks weird that you ring
the doorbell prior to writing the command in the queue.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] memory: Fix dma-reentrancy issues at the MMIO level
  2021-12-17  6:27 ` Qiuhao Li
  2021-12-17  8:37   ` Klaus Jensen
@ 2021-12-17  9:44   ` Mauro Matteo Cascella
  2021-12-17 14:20   ` Alexander Bulekov
  2 siblings, 0 replies; 10+ messages in thread
From: Mauro Matteo Cascella @ 2021-12-17  9:44 UTC (permalink / raw)
  To: Qiuhao Li
  Cc: Laurent Vivier, Peter Maydell, Thomas Huth,
	Daniel P. Berrangé,
	Darren Kenny, David Hildenbrand, Jason Wang, Bin Meng, Li Qiang,
	qemu-devel, Peter Xu, Eduardo Habkost, Alexander Bulekov,
	Bandan Das, Gerd Hoffmann, Stefan Hajnoczi, Paolo Bonzini,
	Edgar E . Iglesias, Philippe Mathieu-Daudé

On Fri, Dec 17, 2021 at 7:28 AM Qiuhao Li <Qiuhao.Li@outlook.com> wrote:
>
> Thanks Alex. It seems this patch sets and checks if the destination device is busy. But how about the data transfers not triggered directly by PMIO/MMIO handlers? For example:
>
> 1. Device A Timer's callback -> Device A MMIO handler
> 2. Device A BH's callback -> Device A MMIO handler
>
> In these situations, when A launches a DMA to itself, the dev->engaged_in_direct_io is not set, so the operation is allowed. Maybe we should log the source and check the destination when we launch data transfers. Is there a way to do that?
>
> Below is a reproducer in NVMe which triggers DMA in a timer's callback (nvme_process_sq). I can still trigger use-after-free exception with this patch on qemu-6.1.0:
>
> cat << EOF | ./qemu-system-x86_64 -display none -machine accel=qtest \
> -machine q35 -nodefaults -drive file=null-co://,if=none,format=raw,id=disk0 \
> -device nvme,drive=disk0,serial=1 -qtest stdio \
>
> outl 0xcf8 0x80000810               /* MLBAR (BAR0) – Memory Register Base Address, lower 32-bits */
> outl 0xcfc 0xe0000000               /* MMIO Base Address = 0xe0000000 */
> outl 0xcf8 0x80000804               /* CMD - Command */
> outw 0xcfc 0x06                     /* Bus Master Enable, Memory Space Enable */
> write 0xe0000024 0x4 0x02000200     /* [3] 3.1.8, Admin Queue Attributes */
> write 0xe0000028 0x4 0x00100000     /* asq = 0x1000 */
> write 0xe0000030 0x4 0x00200000     /* acq = 0x2000 */
> write 0xe0000014 0x4 0x01004600     /* [3] 3.1.5, Controller Configuration, start ctrl */
> write 0xe0001000 0x1 0x01           /* [3] 3.1.24, SQyTDBL – Submission Queue y Tail Doorbell */
> write 0x1000 0x1 0x02               /* cmd->opcode, NVME_ADM_CMD_GET_LOG_PAGE, nvme_get_log() */
> write 0x1018 0x4 0x140000e0         /* prp1 = 0xe0000014, NVME_REG_CC, nvme_ctrl_reset() */
> write 0x1028 0x4 0x03000004         /* cmd->cdw10, lid = 3 NVME_LOG_FW_SLOT_INFO, nvme_fw_log_info, buf_len = 4 */
> write 0x1030 0x4 0xfc010000         /* cmd->cdw12 = 0x1fc, Log Page Offset, trans_len = sizeof(fw_log) - 0x1fc = 4 */
> clock_step
> EOF
>
> CC: Mauro Matteo Cascella and Philippe Mathieu-Daudé. Should we put the reproducer above to https://gitlab.com/qemu-project/qemu/-/issues/556??

Upstream issue: https://gitlab.com/qemu-project/qemu/-/issues/782

Thanks.
-- 
Mauro Matteo Cascella
Red Hat Product Security
PGP-Key ID: BB3410B0



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] memory: Fix dma-reentrancy issues at the MMIO level
  2021-12-17  3:08 [RFC PATCH] memory: Fix dma-reentrancy issues at the MMIO level Alexander Bulekov
  2021-12-17  6:27 ` Qiuhao Li
@ 2021-12-17 13:58 ` Philippe Mathieu-Daudé
  2021-12-17 14:30   ` Alexander Bulekov
  1 sibling, 1 reply; 10+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-12-17 13:58 UTC (permalink / raw)
  To: Alexander Bulekov, qemu-devel, Paolo Bonzini, Edgar E . Iglesias
  Cc: Laurent Vivier, Peter Maydell, Mauro Matteo Cascella,
	Daniel P. Berrangé,
	David Hildenbrand, Jason Wang, Bin Meng, Li Qiang, Qiuhao Li,
	Peter Xu, Eduardo Habkost, Darren Kenny, Bandan Das,
	Gerd Hoffmann, Stefan Hajnoczi, Thomas Huth

On 12/17/21 04:08, Alexander Bulekov wrote:
> Here's my shot at fixing dma-reentracy issues. This patch adds a flag to
> the DeviceState, which is set/checked when we call an accessor
> associated with the device's IO MRs.

Your approach is exactly what Gerd suggested:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg831437.html

> The problem, in short, as I understand it: For the vast majority of
> cases, we want to prevent a device from accessing it's own PIO/MMIO
> regions over DMA.
> 
> This patch/solution is based on some assumptions:
> 1. DMA accesses that hit mmio regions are only dangerous if they end up
> interacting with memory-regions belonging to the device initiating the
> DMA.
> Not dangerous:  sdhci_pio->dma_write->e1000_mmio
> Dangerous:      sdhci_pio->dma_write->sdhci_mmio

It doesn't have to be dangerous; see Paolo's example, which
invalidated my previous attempt and forced me to write 24
patches in multiple series to keep the "niche" cases working:
https://www.mail-archive.com/qemu-block@nongnu.org/msg72939.html

> 
> 2. Most devices do not interact with their own PIO/MMIO memory-regions
> using DMA.
> 
> 3. There is no way for there to be multiple simultaneous accesses to a
> device's PIO/MMIO memory-regions.
> 
> 4. All devices are QOMified :-)
> 
> With this patch, I wasn't able to reproduce the issues being tracked
> here, with QTest reproducers:
> https://gitlab.com/qemu-project/qemu/-/issues/556
> 
> This passes the i386 qos/qtests for me and I was able to boot some linux/windows
> VMs with basic devices configured, without any apparent problems.
> 
> Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
> Cc: Mauro Matteo Cascella <mcascell@redhat.com>
> Cc: Qiuhao Li <Qiuhao.Li@outlook.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Cc: Li Qiang <liq3ea@gmail.com>
> Cc: Thomas Huth <thuth@redhat.com>
> Cc: Laurent Vivier <lvivier@redhat.com>
> Cc: Bandan Das <bsd@redhat.com>
> Cc: Edgar E. Iglesias <edgar.iglesias@gmail.com>
> Cc: Darren Kenny <darren.kenny@oracle.com>
> Cc: Bin Meng <bin.meng@windriver.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
> ---
>  include/hw/qdev-core.h |  1 +
>  softmmu/memory.c       | 15 +++++++++++++++
>  softmmu/trace-events   |  1 +
>  3 files changed, 17 insertions(+)
> 
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index 20d3066595..32f7c779ab 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -193,6 +193,7 @@ struct DeviceState {
>      int instance_id_alias;
>      int alias_required_for_version;
>      ResettableState reset;
> +    int engaged_in_direct_io;
>  };
>  
>  struct DeviceListener {
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index 7340e19ff5..255c3c602f 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -532,6 +532,7 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
>      uint64_t access_mask;
>      unsigned access_size;
>      unsigned i;
> +    DeviceState *dev = NULL;
>      MemTxResult r = MEMTX_OK;
>  
>      if (!access_size_min) {
> @@ -541,6 +542,17 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
>          access_size_max = 4;
>      }
>  
> +    /* Do not allow more than one simultanous access to a device's IO Regions */
> +    if (mr->owner &&
> +        !mr->ram_device && !mr->ram && !mr->rom_device && !mr->readonly) {
> +        dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE);
> +        if (dev->engaged_in_direct_io) {
> +            trace_memory_region_reentrant_io(get_cpu_index(), mr, addr, size);
> +            return MEMTX_ERROR;
> +        }
> +        dev->engaged_in_direct_io = true;
> +    }
> +
>      /* FIXME: support unaligned access? */
>      access_size = MAX(MIN(size, access_size_max), access_size_min);
>      access_mask = MAKE_64BIT_MASK(0, access_size * 8);
> @@ -555,6 +567,9 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
>                          access_mask, attrs);
>          }
>      }
> +    if (dev) {
> +        dev->engaged_in_direct_io = false;
> +    }
>      return r;
>  }
>  
> diff --git a/softmmu/trace-events b/softmmu/trace-events
> index 9c88887b3c..d7228316db 100644
> --- a/softmmu/trace-events
> +++ b/softmmu/trace-events
> @@ -13,6 +13,7 @@ memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, u
>  memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size, const char *name) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u name '%s'"
>  memory_region_subpage_read(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
>  memory_region_subpage_write(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
> +memory_region_reentrant_io(int cpu_index, void *mr, uint64_t offset, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" size %u"
>  memory_region_ram_device_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
>  memory_region_ram_device_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
>  memory_region_sync_dirty(const char *mr, const char *listener, int global) "mr '%s' listener '%s' synced (global=%d)"
> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] memory: Fix dma-reentrancy issues at the MMIO level
  2021-12-17  6:27 ` Qiuhao Li
  2021-12-17  8:37   ` Klaus Jensen
  2021-12-17  9:44   ` Mauro Matteo Cascella
@ 2021-12-17 14:20   ` Alexander Bulekov
  2 siblings, 0 replies; 10+ messages in thread
From: Alexander Bulekov @ 2021-12-17 14:20 UTC (permalink / raw)
  To: Qiuhao Li
  Cc: Laurent Vivier, Peter Maydell, Mauro Matteo Cascella,
	Daniel P. Berrangé,
	David Hildenbrand, Jason Wang, Bin Meng, Li Qiang, qemu-devel,
	Peter Xu, Eduardo Habkost, Darren Kenny, Bandan Das,
	Gerd Hoffmann, Stefan Hajnoczi, Paolo Bonzini, Thomas Huth,
	Edgar E . Iglesias, Philippe Mathieu-Daudé

On 211217 0627, Qiuhao Li wrote:
> Thanks Alex. It seems this patch sets and checks if the destination device is busy. But how about the data transfers not triggered directly by PMIO/MMIO handlers? For example:
> 
> 1. Device A Timer's callback -> Device A MMIO handler
> 2. Device A BH's callback -> Device A MMIO handler
> 
> In these situations, when A launches a DMA to itself, the dev->engaged_in_direct_io is not set, so the operation is allowed. Maybe we should log the source and check the destination when we launch data transfers. Is there a way to do that?

Ahh, I forgot that this can happen in BHs as well... I think the only
place to do that, without major API changes, is at pci_dma_rw and
dma_buffer_rw (where we can catch a glimpse of the
PCIDevice/DeviceState).
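
As a rough illustration of that idea (not the actual v2 patch, and the
exact pci_dma_rw()/dma_memory_rw() signatures differ between QEMU versions,
so treat this purely as a sketch), the initiating device could be marked
for the duration of the transfer:

static inline MemTxResult pci_dma_rw(PCIDevice *dev, dma_addr_t addr,
                                     void *buf, dma_addr_t len,
                                     DMADirection dir)
{
    DeviceState *ds = DEVICE(dev);
    MemTxResult res;

    /*
     * Flag the device as the DMA initiator while the transfer runs, so a
     * nested access landing on one of its own IO MemoryRegions can be
     * detected (and refused or logged) in softmmu/memory.c.
     */
    ds->engaged_in_direct_io = true;
    res = dma_memory_rw(pci_get_address_space(dev), addr, buf, len, dir);
    ds->engaged_in_direct_io = false;

    return res;
}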

I'll send a V2.

-Alex

> 
> Below is a reproducer in NVMe which triggers DMA in a timer's callback (nvme_process_sq). I can still trigger use-after-free exception with this patch on qemu-6.1.0:
> 
> cat << EOF | ./qemu-system-x86_64 -display none -machine accel=qtest \
> -machine q35 -nodefaults -drive file=null-co://,if=none,format=raw,id=disk0 \
> -device nvme,drive=disk0,serial=1 -qtest stdio \
> 
> outl 0xcf8 0x80000810               /* MLBAR (BAR0) – Memory Register Base Address, lower 32-bits */
> outl 0xcfc 0xe0000000               /* MMIO Base Address = 0xe0000000 */
> outl 0xcf8 0x80000804               /* CMD - Command */
> outw 0xcfc 0x06                     /* Bus Master Enable, Memory Space Enable */
> write 0xe0000024 0x4 0x02000200     /* [3] 3.1.8, Admin Queue Attributes */
> write 0xe0000028 0x4 0x00100000     /* asq = 0x1000 */
> write 0xe0000030 0x4 0x00200000     /* acq = 0x2000 */
> write 0xe0000014 0x4 0x01004600     /* [3] 3.1.5, Controller Configuration, start ctrl */
> write 0xe0001000 0x1 0x01           /* [3] 3.1.24, SQyTDBL – Submission Queue y Tail Doorbell */
> write 0x1000 0x1 0x02               /* cmd->opcode, NVME_ADM_CMD_GET_LOG_PAGE, nvme_get_log() */
> write 0x1018 0x4 0x140000e0         /* prp1 = 0xe0000014, NVME_REG_CC, nvme_ctrl_reset() */
> write 0x1028 0x4 0x03000004         /* cmd->cdw10, lid = 3 NVME_LOG_FW_SLOT_INFO, nvme_fw_log_info, buf_len = 4 */
> write 0x1030 0x4 0xfc010000         /* cmd->cdw12 = 0x1fc, Log Page Offset, trans_len = sizeof(fw_log) - 0x1fc = 4 */
> clock_step
> EOF
> 
> CC: Mauro Matteo Cascella and Philippe Mathieu-Daudé. Should we put the reproducer above to https://gitlab.com/qemu-project/qemu/-/issues/556?
> 
> ________________________________
> From: Alexander Bulekov <alxndr@bu.edu>
> Sent: Friday, December 17, 2021 11:08
> To: qemu-devel@nongnu.org <qemu-devel@nongnu.org>
> Cc: Alexander Bulekov <alxndr@bu.edu>; Philippe Mathieu-Daudé <philmd@redhat.com>; Mauro Matteo Cascella <mcascell@redhat.com>; Qiuhao Li <Qiuhao.Li@outlook.com>; Peter Xu <peterx@redhat.com>; Jason Wang <jasowang@redhat.com>; David Hildenbrand <david@redhat.com>; Gerd Hoffmann <kraxel@redhat.com>; Peter Maydell <peter.maydell@linaro.org>; Li Qiang <liq3ea@gmail.com>; Thomas Huth <thuth@redhat.com>; Laurent Vivier <lvivier@redhat.com>; Bandan Das <bsd@redhat.com>; Edgar E . Iglesias <edgar.iglesias@gmail.com>; Darren Kenny <darren.kenny@oracle.com>; Bin Meng <bin.meng@windriver.com>; Paolo Bonzini <pbonzini@redhat.com>; Stefan Hajnoczi <stefanha@redhat.com>; Daniel P. Berrangé <berrange@redhat.com>; Eduardo Habkost <eduardo@habkost.net>
> Subject: [RFC PATCH] memory: Fix dma-reentrancy issues at the MMIO level
> 
> Here's my shot at fixing dma-reentracy issues. This patch adds a flag to
> the DeviceState, which is set/checked when we call an accessor
> associated with the device's IO MRs.
> 
> The problem, in short, as I understand it: For the vast majority of
> cases, we want to prevent a device from accessing it's own PIO/MMIO
> regions over DMA.
> 
> This patch/solution is based on some assumptions:
> 1. DMA accesses that hit mmio regions are only dangerous if they end up
> interacting with memory-regions belonging to the device initiating the
> DMA.
> Not dangerous:  sdhci_pio->dma_write->e1000_mmio
> Dangerous:      sdhci_pio->dma_write->sdhci_mmio
> 
> 2. Most devices do not interact with their own PIO/MMIO memory-regions
> using DMA.
> 
> 3. There is no way for there to be multiple simultaneous accesses to a
> device's PIO/MMIO memory-regions.
> 
> 4. All devices are QOMified :-)
> 
> With this patch, I wasn't able to reproduce the issues being tracked
> here, with QTest reproducers:
> https://gitlab.com/qemu-project/qemu/-/issues/556
> 
> This passes the i386 qos/qtests for me and I was able to boot some linux/windows
> VMs with basic devices configured, without any apparent problems.
> 
> Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
> Cc: Mauro Matteo Cascella <mcascell@redhat.com>
> Cc: Qiuhao Li <Qiuhao.Li@outlook.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Cc: Li Qiang <liq3ea@gmail.com>
> Cc: Thomas Huth <thuth@redhat.com>
> Cc: Laurent Vivier <lvivier@redhat.com>
> Cc: Bandan Das <bsd@redhat.com>
> Cc: Edgar E. Iglesias <edgar.iglesias@gmail.com>
> Cc: Darren Kenny <darren.kenny@oracle.com>
> Cc: Bin Meng <bin.meng@windriver.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
> ---
>  include/hw/qdev-core.h |  1 +
>  softmmu/memory.c       | 15 +++++++++++++++
>  softmmu/trace-events   |  1 +
>  3 files changed, 17 insertions(+)
> 
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index 20d3066595..32f7c779ab 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -193,6 +193,7 @@ struct DeviceState {
>      int instance_id_alias;
>      int alias_required_for_version;
>      ResettableState reset;
> +    int engaged_in_direct_io;
>  };
> 
>  struct DeviceListener {
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index 7340e19ff5..255c3c602f 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -532,6 +532,7 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
>      uint64_t access_mask;
>      unsigned access_size;
>      unsigned i;
> +    DeviceState *dev = NULL;
>      MemTxResult r = MEMTX_OK;
> 
>      if (!access_size_min) {
> @@ -541,6 +542,17 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
>          access_size_max = 4;
>      }
> 
> +    /* Do not allow more than one simultanous access to a device's IO Regions */
> +    if (mr->owner &&
> +        !mr->ram_device && !mr->ram && !mr->rom_device && !mr->readonly) {
> +        dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE);
> +        if (dev->engaged_in_direct_io) {
> +            trace_memory_region_reentrant_io(get_cpu_index(), mr, addr, size);
> +            return MEMTX_ERROR;
> +        }
> +        dev->engaged_in_direct_io = true;
> +    }
> +
>      /* FIXME: support unaligned access? */
>      access_size = MAX(MIN(size, access_size_max), access_size_min);
>      access_mask = MAKE_64BIT_MASK(0, access_size * 8);
> @@ -555,6 +567,9 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
>                          access_mask, attrs);
>          }
>      }
> +    if (dev) {
> +        dev->engaged_in_direct_io = false;
> +    }
>      return r;
>  }
> 
> diff --git a/softmmu/trace-events b/softmmu/trace-events
> index 9c88887b3c..d7228316db 100644
> --- a/softmmu/trace-events
> +++ b/softmmu/trace-events
> @@ -13,6 +13,7 @@ memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, u
>  memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size, const char *name) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u name '%s'"
>  memory_region_subpage_read(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
>  memory_region_subpage_write(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
> +memory_region_reentrant_io(int cpu_index, void *mr, uint64_t offset, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" size %u"
>  memory_region_ram_device_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
>  memory_region_ram_device_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
>  memory_region_sync_dirty(const char *mr, const char *listener, int global) "mr '%s' listener '%s' synced (global=%d)"
> --
> 2.33.0
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] memory: Fix dma-reentrancy issues at the MMIO level
  2021-12-17 13:58 ` Philippe Mathieu-Daudé
@ 2021-12-17 14:30   ` Alexander Bulekov
  2021-12-17 15:25     ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 10+ messages in thread
From: Alexander Bulekov @ 2021-12-17 14:30 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Laurent Vivier, Peter Maydell, Mauro Matteo Cascella,
	Daniel P. Berrangé,
	David Hildenbrand, Jason Wang, Bin Meng, Li Qiang, qemu-devel,
	Peter Xu, Qiuhao Li, Darren Kenny, Bandan Das, Gerd Hoffmann,
	Stefan Hajnoczi, Edgar E . Iglesias, Thomas Huth, Paolo Bonzini,
	Eduardo Habkost

On 211217 1458, Philippe Mathieu-Daudé wrote:
> On 12/17/21 04:08, Alexander Bulekov wrote:
> > Here's my shot at fixing dma-reentracy issues. This patch adds a flag to
> > the DeviceState, which is set/checked when we call an accessor
> > associated with the device's IO MRs.
> 
> Your approach is exactly what Gerd suggested:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg831437.html

Yes - my bad for not searching my mail more carefully.

> 
> > The problem, in short, as I understand it: For the vast majority of
> > cases, we want to prevent a device from accessing it's own PIO/MMIO
> > regions over DMA.
> > 
> > This patch/solution is based on some assumptions:
> > 1. DMA accesses that hit mmio regions are only dangerous if they end up
> > interacting with memory-regions belonging to the device initiating the
> > DMA.
> > Not dangerous:  sdhci_pio->dma_write->e1000_mmio
> > Dangerous:      sdhci_pio->dma_write->sdhci_mmio
> 
> It doesn't have to be dangerous, see Paolo's example which
> invalidated my previous attempt and forced me to write 24
> patches in multiples series to keep the "niche" cases working:
> https://www.mail-archive.com/qemu-block@nongnu.org/msg72939.html

I don't understand what IO accesses this decodes to. This is loading a
picture into VRAM?

-Alex

> 
> > 
> > 2. Most devices do not interact with their own PIO/MMIO memory-regions
> > using DMA.
> > 
> > 3. There is no way for there to be multiple simultaneous accesses to a
> > device's PIO/MMIO memory-regions.
> > 
> > 4. All devices are QOMified :-)
> > 
> > With this patch, I wasn't able to reproduce the issues being tracked
> > here, with QTest reproducers:
> > https://gitlab.com/qemu-project/qemu/-/issues/556
> > 
> > This passes the i386 qos/qtests for me and I was able to boot some linux/windows
> > VMs with basic devices configured, without any apparent problems.
> > 
> > Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
> > Cc: Mauro Matteo Cascella <mcascell@redhat.com>
> > Cc: Qiuhao Li <Qiuhao.Li@outlook.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Jason Wang <jasowang@redhat.com>
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Gerd Hoffmann <kraxel@redhat.com>
> > Cc: Peter Maydell <peter.maydell@linaro.org>
> > Cc: Li Qiang <liq3ea@gmail.com>
> > Cc: Thomas Huth <thuth@redhat.com>
> > Cc: Laurent Vivier <lvivier@redhat.com>
> > Cc: Bandan Das <bsd@redhat.com>
> > Cc: Edgar E. Iglesias <edgar.iglesias@gmail.com>
> > Cc: Darren Kenny <darren.kenny@oracle.com>
> > Cc: Bin Meng <bin.meng@windriver.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Stefan Hajnoczi <stefanha@redhat.com>
> > 
> > Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
> > ---
> >  include/hw/qdev-core.h |  1 +
> >  softmmu/memory.c       | 15 +++++++++++++++
> >  softmmu/trace-events   |  1 +
> >  3 files changed, 17 insertions(+)
> > 
> > diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> > index 20d3066595..32f7c779ab 100644
> > --- a/include/hw/qdev-core.h
> > +++ b/include/hw/qdev-core.h
> > @@ -193,6 +193,7 @@ struct DeviceState {
> >      int instance_id_alias;
> >      int alias_required_for_version;
> >      ResettableState reset;
> > +    int engaged_in_direct_io;
> >  };
> >  
> >  struct DeviceListener {
> > diff --git a/softmmu/memory.c b/softmmu/memory.c
> > index 7340e19ff5..255c3c602f 100644
> > --- a/softmmu/memory.c
> > +++ b/softmmu/memory.c
> > @@ -532,6 +532,7 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
> >      uint64_t access_mask;
> >      unsigned access_size;
> >      unsigned i;
> > +    DeviceState *dev = NULL;
> >      MemTxResult r = MEMTX_OK;
> >  
> >      if (!access_size_min) {
> > @@ -541,6 +542,17 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
> >          access_size_max = 4;
> >      }
> >  
> > +    /* Do not allow more than one simultanous access to a device's IO Regions */
> > +    if (mr->owner &&
> > +        !mr->ram_device && !mr->ram && !mr->rom_device && !mr->readonly) {
> > +        dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE);
> > +        if (dev->engaged_in_direct_io) {
> > +            trace_memory_region_reentrant_io(get_cpu_index(), mr, addr, size);
> > +            return MEMTX_ERROR;
> > +        }
> > +        dev->engaged_in_direct_io = true;
> > +    }
> > +
> >      /* FIXME: support unaligned access? */
> >      access_size = MAX(MIN(size, access_size_max), access_size_min);
> >      access_mask = MAKE_64BIT_MASK(0, access_size * 8);
> > @@ -555,6 +567,9 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
> >                          access_mask, attrs);
> >          }
> >      }
> > +    if (dev) {
> > +        dev->engaged_in_direct_io = false;
> > +    }
> >      return r;
> >  }
> >  
> > diff --git a/softmmu/trace-events b/softmmu/trace-events
> > index 9c88887b3c..d7228316db 100644
> > --- a/softmmu/trace-events
> > +++ b/softmmu/trace-events
> > @@ -13,6 +13,7 @@ memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, u
> >  memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size, const char *name) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u name '%s'"
> >  memory_region_subpage_read(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
> >  memory_region_subpage_write(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
> > +memory_region_reentrant_io(int cpu_index, void *mr, uint64_t offset, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" size %u"
> >  memory_region_ram_device_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
> >  memory_region_ram_device_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
> >  memory_region_sync_dirty(const char *mr, const char *listener, int global) "mr '%s' listener '%s' synced (global=%d)"
> > 
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] memory: Fix dma-reentrancy issues at the MMIO level
  2021-12-17 14:30   ` Alexander Bulekov
@ 2021-12-17 15:25     ` Philippe Mathieu-Daudé
  2021-12-17 16:51       ` Alexander Bulekov
  0 siblings, 1 reply; 10+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-12-17 15:25 UTC (permalink / raw)
  To: Alexander Bulekov
  Cc: Laurent Vivier, Peter Maydell, Mauro Matteo Cascella,
	Daniel P. Berrangé,
	David Hildenbrand, Jason Wang, Bin Meng, Li Qiang, qemu-devel,
	Peter Xu, Qiuhao Li, Darren Kenny, Bandan Das, Gerd Hoffmann,
	Stefan Hajnoczi, Edgar E . Iglesias, Thomas Huth, Paolo Bonzini,
	Eduardo Habkost

On 12/17/21 15:30, Alexander Bulekov wrote:
> On 211217 1458, Philippe Mathieu-Daudé wrote:
>> On 12/17/21 04:08, Alexander Bulekov wrote:
>>> Here's my shot at fixing dma-reentracy issues. This patch adds a flag to
>>> the DeviceState, which is set/checked when we call an accessor
>>> associated with the device's IO MRs.
>>
>> Your approach is exactly what Gerd suggested:
>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg831437.html
> 
> Yes - my bad for not searching my mail more carefully.

Well it is not "exactly" the same, but almost.

>>
>>> The problem, in short, as I understand it: For the vast majority of
>>> cases, we want to prevent a device from accessing it's own PIO/MMIO
>>> regions over DMA.
>>>
>>> This patch/solution is based on some assumptions:
>>> 1. DMA accesses that hit mmio regions are only dangerous if they end up
>>> interacting with memory-regions belonging to the device initiating the
>>> DMA.
>>> Not dangerous:  sdhci_pio->dma_write->e1000_mmio
>>> Dangerous:      sdhci_pio->dma_write->sdhci_mmio
>>
>> It doesn't have to be dangerous, see Paolo's example which
>> invalidated my previous attempt and forced me to write 24
>> patches in multiples series to keep the "niche" cases working:
>> https://www.mail-archive.com/qemu-block@nongnu.org/msg72939.html
> 
> I don't understand what IO accesses this decodes to. This is loading a
> picture into VRAM?

I'd say "loading a picture into VRAM via the DMA" but am not sure :)

This link is helpful:
http://petesqbsite.com/sections/tutorials/tutorials/peekpoke.txt



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] memory: Fix dma-reentrancy issues at the MMIO level
  2021-12-17 15:25     ` Philippe Mathieu-Daudé
@ 2021-12-17 16:51       ` Alexander Bulekov
  0 siblings, 0 replies; 10+ messages in thread
From: Alexander Bulekov @ 2021-12-17 16:51 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Laurent Vivier, Peter Maydell, Mauro Matteo Cascella,
	Daniel P. Berrangé,
	David Hildenbrand, Jason Wang, Bin Meng, Li Qiang, qemu-devel,
	Peter Xu, Qiuhao Li, Darren Kenny, Bandan Das, Gerd Hoffmann,
	Stefan Hajnoczi, Edgar E . Iglesias, Thomas Huth, Paolo Bonzini,
	Eduardo Habkost

On 211217 1625, Philippe Mathieu-Daudé wrote:
> On 12/17/21 15:30, Alexander Bulekov wrote:
> > On 211217 1458, Philippe Mathieu-Daudé wrote:
> >> On 12/17/21 04:08, Alexander Bulekov wrote:
> >>> Here's my shot at fixing dma-reentracy issues. This patch adds a flag to
> >>> the DeviceState, which is set/checked when we call an accessor
> >>> associated with the device's IO MRs.
> >>
> >> Your approach is exactly what Gerd suggested:
> >> https://www.mail-archive.com/qemu-devel@nongnu.org/msg831437.html
> > 
> > Yes - my bad for not searching my mail more carefully.
> 
> Well it is not "exactly" the same, but almost.
> 
> >>
> >>> The problem, in short, as I understand it: For the vast majority of
> >>> cases, we want to prevent a device from accessing it's own PIO/MMIO
> >>> regions over DMA.
> >>>
> >>> This patch/solution is based on some assumptions:
> >>> 1. DMA accesses that hit mmio regions are only dangerous if they end up
> >>> interacting with memory-regions belonging to the device initiating the
> >>> DMA.
> >>> Not dangerous:  sdhci_pio->dma_write->e1000_mmio
> >>> Dangerous:      sdhci_pio->dma_write->sdhci_mmio
> >>
> >> It doesn't have to be dangerous, see Paolo's example which
> >> invalidated my previous attempt and forced me to write 24
> >> patches in multiples series to keep the "niche" cases working:
> >> https://www.mail-archive.com/qemu-block@nongnu.org/msg72939.html
> > 
> > I don't understand what IO accesses this decodes to. This is loading a
> > picture into VRAM?
> 
> I'd say "loading a picture into VRAM via the DMA" but am not sure :)
> 
> This link is helpful:
> http://petesqbsite.com/sections/tutorials/tutorials/peekpoke.txt
>

https://github.com/microsoft/GW-BASIC/blob/edf82c2ebf6bfe099c2054e0ae125c3efe5769c4/GIO86.ASM#L333

AFAICT this would just do repeated MMIO writes to VRAM - no DMA
involved?

Maybe there is some way to log when a device performs a DMA access to
its own IO regions, so that we could identify these niche cases? We
would still need a way to actually trigger that behavior...
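
For example (purely a sketch of the logging idea, built on the owner check
from this patch rather than being a proposed change), the check in
access_with_adjusted_size() could warn instead of failing the access:

    if (dev && dev->engaged_in_direct_io) {
        qemu_log_mask(LOG_GUEST_ERROR,
                      "reentrant IO: %s accessed its own region %s "
                      "(addr 0x%" HWADDR_PRIx ", size %u)\n",
                      object_get_typename(OBJECT(dev)),
                      memory_region_name(mr), addr, size);
        /* fall through: let the access proceed so niche cases keep working */
    }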


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH] memory: Fix dma-reentrancy issues at the MMIO level
@ 2021-12-17  8:51 Qiuhao Li
  0 siblings, 0 replies; 10+ messages in thread
From: Qiuhao Li @ 2021-12-17  8:51 UTC (permalink / raw)
  To: Klaus Jensen
  Cc: Laurent Vivier, Peter Maydell, Mauro Matteo Cascella,
	Daniel P. Berrangé,
	Darren Kenny, David Hildenbrand, Jason Wang, Bin Meng, Li Qiang,
	qemu-devel, Peter Xu, Eduardo Habkost, Alexander Bulekov,
	Bandan Das, Gerd Hoffmann, Stefan Hajnoczi, Edgar E . Iglesias,
	Thomas Huth, Paolo Bonzini, Philippe Mathieu-Daudé

[-- Attachment #1: Type: text/plain, Size: 4486 bytes --]

Yes, it still works. Now it looks orthodox:

cat << EOF | ./qemu-system-x86_64 -display none -machine accel=qtest \
-machine q35 -nodefaults -drive file=null-co://,if=none,format=raw,id=disk0 \
-device nvme,drive=disk0,serial=1 -qtest stdio \

outl 0xcf8 0x80000810               /* MLBAR (BAR0) – Memory Register Base Address, lower 32-bits */
outl 0xcfc 0xe0000000               /* MMIO Base Address = 0xe0000000 */
outl 0xcf8 0x80000804               /* CMD - Command */
outw 0xcfc 0x06                     /* Bus Master Enable, Memory Space Enable */
write 0x1000 0x1 0x02               /* cmd->opcode, NVME_ADM_CMD_GET_LOG_PAGE, nvme_get_log() */
write 0x1018 0x4 0x140000e0         /* prp1 = 0xe0000014, NVME_REG_CC, nvme_ctrl_reset() */
write 0x1028 0x4 0x03000004         /* cmd->cdw10, lid = 3 NVME_LOG_FW_SLOT_INFO, nvme_fw_log_info, buf_len = 4 */
write 0x1030 0x4 0xfc010000         /* cmd->cdw12 = 0x1fc, Log Page Offset, trans_len = sizeof(fw_log) - 0x1fc = 4 */
write 0xe0000024 0x4 0x02000200     /* [3] 3.1.8, Admin Queue Attributes */
write 0xe0000028 0x4 0x00100000     /* asq = 0x1000 */
write 0xe0000030 0x4 0x00200000     /* acq = 0x2000 */
write 0xe0000014 0x4 0x01004600     /* [3] 3.1.5, Controller Configuration, start ctrl */
write 0xe0001000 0x1 0x01           /* [3] 3.1.24, SQyTDBL – Submission Queue y Tail Doorbell */
clock_step
EOF

I also wrote a PoC in the guest OS which led to a worse result, but the QTest reproducer may be enough.


________________________________
From: Klaus Jensen
Sent: Friday, December 17, 2021 16:37
To: Qiuhao Li
Cc: Alexander Bulekov; qemu-devel@nongnu.org; Laurent Vivier; Peter Maydell; Mauro Matteo Cascella; Daniel P. Berrangé; David Hildenbrand; Jason Wang; Bin Meng; Li Qiang; Thomas Huth; Peter Xu; Eduardo Habkost; Darren Kenny; Bandan Das; Gerd Hoffmann; Stefan Hajnoczi; Paolo Bonzini; Edgar E . Iglesias; Philippe Mathieu-Daudé
Subject: Re: [RFC PATCH] memory: Fix dma-reentrancy issues at the MMIO level

On Dec 17 06:27, Qiuhao Li wrote:
> Thanks Alex. It seems this patch sets and checks if the destination device is busy. But how about the data transfers not triggered directly by PMIO/MMIO handlers? For example:
>
> 1. Device A Timer's callback -> Device A MMIO handler
> 2. Device A BH's callback -> Device A MMIO handler
>
> In these situations, when A launches a DMA to itself, the dev->engaged_in_direct_io is not set, so the operation is allowed. Maybe we should log the source and check the destination when we launch data transfers. Is there a way to do that?
>
> Below is a reproducer in NVMe which triggers DMA in a timer's callback (nvme_process_sq). I can still trigger use-after-free exception with this patch on qemu-6.1.0:
>
> cat << EOF | ./qemu-system-x86_64 -display none -machine accel=qtest \
> -machine q35 -nodefaults -drive file=null-co://,if=none,format=raw,id=disk0 \
> -device nvme,drive=disk0,serial=1 -qtest stdio \
>
> outl 0xcf8 0x80000810               /* MLBAR (BAR0) – Memory Register Base Address, lower 32-bits */
> outl 0xcfc 0xe0000000               /* MMIO Base Address = 0xe0000000 */
> outl 0xcf8 0x80000804               /* CMD - Command */
> outw 0xcfc 0x06                     /* Bus Master Enable, Memory Space Enable */
> write 0xe0000024 0x4 0x02000200     /* [3] 3.1.8, Admin Queue Attributes */
> write 0xe0000028 0x4 0x00100000     /* asq = 0x1000 */
> write 0xe0000030 0x4 0x00200000     /* acq = 0x2000 */
> write 0xe0000014 0x4 0x01004600     /* [3] 3.1.5, Controller Configuration, start ctrl */
> write 0xe0001000 0x1 0x01           /* [3] 3.1.24, SQyTDBL – Submission Queue y Tail Doorbell */
> write 0x1000 0x1 0x02               /* cmd->opcode, NVME_ADM_CMD_GET_LOG_PAGE, nvme_get_log() */
> write 0x1018 0x4 0x140000e0         /* prp1 = 0xe0000014, NVME_REG_CC, nvme_ctrl_reset() */
> write 0x1028 0x4 0x03000004         /* cmd->cdw10, lid = 3 NVME_LOG_FW_SLOT_INFO, nvme_fw_log_info, buf_len = 4 */
> write 0x1030 0x4 0xfc010000         /* cmd->cdw12 = 0x1fc, Log Page Offset, trans_len = sizeof(fw_log) - 0x1fc = 4 */
> clock_step
> EOF
>
> CC: Mauro Matteo Cascella and Philippe Mathieu-Daudé. Should we put the reproducer above to https://gitlab.com/qemu-project/qemu/-/issues/556?
>

This is a good reproducer. Does it still work if you do the `write
0xe0001000 0x1 0x01` at the end instead? It looks weird that you ring
the doorbell prior to writing the command in the queue.

[-- Attachment #2: Type: text/html, Size: 7207 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-12-17 16:53 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-17  3:08 [RFC PATCH] memory: Fix dma-reentrancy issues at the MMIO level Alexander Bulekov
2021-12-17  6:27 ` Qiuhao Li
2021-12-17  8:37   ` Klaus Jensen
2021-12-17  9:44   ` Mauro Matteo Cascella
2021-12-17 14:20   ` Alexander Bulekov
2021-12-17 13:58 ` Philippe Mathieu-Daudé
2021-12-17 14:30   ` Alexander Bulekov
2021-12-17 15:25     ` Philippe Mathieu-Daudé
2021-12-17 16:51       ` Alexander Bulekov
2021-12-17  8:51 Qiuhao Li
