* [Qemu-devel] [PATCH 00/13] iommu series
@ 2012-06-19  6:39 Benjamin Herrenschmidt
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 01/13] Better support for dma_addr_t variables Benjamin Herrenschmidt
                   ` (13 more replies)
  0 siblings, 14 replies; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-19  6:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: anthony

This is a rebase of the iommu series and the barrier patch together
on top of current qemu.

As for our discussions about doing things with Memory Regions etc
I eventually came to the conclusion that we should just apply this
first :-)

My reasons (other than it makes my life much easier which it does)
are that:

 - We already have PCI DMA accessors, so devices using those
will be unaffected by further changes

 - The few devices that are modified in this series to use the
DMA accessors directly are ... few, and need to do so essentially
because they either deal with multiple bus types (AHCI, EHCI, ...)
or sit in a separate layer (bdev). Fixing them to use some other
interface would be easy (they are few) and might be unnecessary
anyway, since we could easily keep an object of type "DMAContext"
to represent the DMA capabilities of a device as the head of the
chain of MemoryRegions in a future, more flexible design.

 - It provides a good spot to stick our memory barrier

 - It gives us something working now for 1.2; I know that at least
Freescale PowerPC and a number of ARM folks are waiting for it.

Cheers,
Ben.


* [Qemu-devel] [PATCH 01/13] Better support for dma_addr_t variables
  2012-06-19  6:39 [Qemu-devel] [PATCH 00/13] iommu series Benjamin Herrenschmidt
@ 2012-06-19  6:39 ` Benjamin Herrenschmidt
  2012-06-20 21:14   ` Anthony Liguori
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 02/13] Implement cpu_physical_memory_set() Benjamin Herrenschmidt
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-19  6:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: anthony, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

A while back, we introduced the dma_addr_t type, which is supposed to
be used for bus visible memory addresses.  At present, this is an
alias for target_phys_addr_t, but this will change when we eventually
add support for guest visible IOMMUs.

There are some instances of target_phys_addr_t in the code now which
should really be dma_addr_t, but can't be trivially converted due to
missing features which this patch corrects.

 * We add DMA_ADDR_BITS analogous to TARGET_PHYS_ADDR_BITS.  This is
   important where we need to make a compile-time (#if) decision based
   on the size of dma_addr_t.

 * We add a new helper macro to create device properties which take a
   dma_addr_t, currently an alias to DEFINE_PROP_TADDR().
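
As a rough illustration of the first point, device code could branch on
the address width at compile time.  This is only a sketch: the typedef
and the value 64 are stand-ins picked for the example, and DMA_ADDR_MAX
and dma_addr_fits_32() are hypothetical names, not something this patch
adds.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the example only; in QEMU these would come from dma.h. */
typedef uint64_t dma_addr_t;
#define DMA_ADDR_BITS 64

/* Catch a mismatch between the macro and the type at build time. */
static_assert(sizeof(dma_addr_t) * 8 == DMA_ADDR_BITS,
              "DMA_ADDR_BITS must match the width of dma_addr_t");

/* Compile-time (#if) selection based on the size of dma_addr_t. */
#if DMA_ADDR_BITS == 32
#define DMA_ADDR_MAX UINT32_MAX
#elif DMA_ADDR_BITS == 64
#define DMA_ADDR_MAX UINT64_MAX
#else
#error "unsupported DMA_ADDR_BITS"
#endif

/* Hypothetical helper: does a bus address fit a 32-bit descriptor field? */
static inline int dma_addr_fits_32(dma_addr_t addr)
{
#if DMA_ADDR_BITS > 32
    return addr <= UINT32_MAX;
#else
    (void)addr;
    return 1;   /* every address fits when dma_addr_t is 32 bits wide */
#endif
}
```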

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 dma.h         |    1 +
 hw/qdev-dma.h |   12 ++++++++++++
 2 files changed, 13 insertions(+)
 create mode 100644 hw/qdev-dma.h

diff --git a/dma.h b/dma.h
index 8c1ec8f..fe08b72 100644
--- a/dma.h
+++ b/dma.h
@@ -31,6 +31,7 @@ struct QEMUSGList {
 #if defined(TARGET_PHYS_ADDR_BITS)
 typedef target_phys_addr_t dma_addr_t;
 
+#define DMA_ADDR_BITS TARGET_PHYS_ADDR_BITS
 #define DMA_ADDR_FMT TARGET_FMT_plx
 
 struct ScatterGatherEntry {
diff --git a/hw/qdev-dma.h b/hw/qdev-dma.h
new file mode 100644
index 0000000..f0ff558
--- /dev/null
+++ b/hw/qdev-dma.h
@@ -0,0 +1,12 @@
+/*
+ * Support for dma_addr_t typed properties
+ *
+ * Copyright (C) 2012 David Gibson, IBM Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include "qdev-addr.h"
+
+#define DEFINE_PROP_DMAADDR(_n, _s, _f, _d)                               \
+    DEFINE_PROP_TADDR(_n, _s, _f, _d)
-- 
1.7.9.5


* [Qemu-devel] [PATCH 02/13] Implement cpu_physical_memory_set()
  2012-06-19  6:39 [Qemu-devel] [PATCH 00/13] iommu series Benjamin Herrenschmidt
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 01/13] Better support for dma_addr_t variables Benjamin Herrenschmidt
@ 2012-06-19  6:39 ` Benjamin Herrenschmidt
  2012-06-20 21:15   ` Anthony Liguori
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 03/13] iommu: Add universal DMA helper functions Benjamin Herrenschmidt
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-19  6:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: anthony, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

This patch adds a cpu_physical_memory_set() function.  This is equivalent
to calling cpu_physical_memory_write() with a buffer filled with a single
byte value, i.e. a memset of target memory.

It uses a small temporary buffer on the stack.
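
The chunked fill can be sketched in isolation roughly as follows.  The
guest_ram array and guest_write() are stand-ins invented for the
example (they play the role of target memory and
cpu_physical_memory_write()), and guest_set() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILLBUF_SIZE 512

/* Toy "guest memory" standing in for the real target address space. */
static uint8_t guest_ram[4096];

/* Stand-in for cpu_physical_memory_write(): copy a host buffer to the guest. */
static void guest_write(size_t addr, const uint8_t *buf, size_t len)
{
    assert(addr <= sizeof(guest_ram) && len <= sizeof(guest_ram) - addr);
    memcpy(&guest_ram[addr], buf, len);
}

/* memset-like fill, issued as a series of writes of at most FILLBUF_SIZE. */
static void guest_set(size_t addr, uint8_t c, size_t len)
{
    uint8_t fillbuf[FILLBUF_SIZE];

    memset(fillbuf, c, sizeof(fillbuf));
    while (len > 0) {
        size_t l = len < FILLBUF_SIZE ? len : FILLBUF_SIZE;

        guest_write(addr, fillbuf, l);
        len -= l;   /* advance by the chunk just written */
        addr += l;
    }
}
```

The buffer is filled once and reused for every chunk, so the stack cost
stays at FILLBUF_SIZE bytes regardless of the fill length.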

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 cpu-common.h |    1 +
 exec.c       |   15 +++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/cpu-common.h b/cpu-common.h
index 1fe3280..8d3596a 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -53,6 +53,7 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev);
 
 void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
                             int len, int is_write);
+void cpu_physical_memory_set(target_phys_addr_t addr, uint8_t c, int len);
 static inline void cpu_physical_memory_read(target_phys_addr_t addr,
                                             void *buf, int len)
 {
diff --git a/exec.c b/exec.c
index b5d6885..cfd7008 100644
--- a/exec.c
+++ b/exec.c
@@ -3601,6 +3601,21 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
     }
 }
 
+void cpu_physical_memory_set(target_phys_addr_t addr, uint8_t c, int len)
+{
+#define FILLBUF_SIZE 512
+    uint8_t fillbuf[FILLBUF_SIZE];
+    int l;
+
+    memset(fillbuf, c, FILLBUF_SIZE);
+    while (len > 0) {
+        l = len < FILLBUF_SIZE ? len : FILLBUF_SIZE;
+        cpu_physical_memory_rw(addr, fillbuf, l, true);
+        len -= l;
+        addr += l;
+    }
+}
+
 /* used for ROM loading : can write in RAM and ROM */
 void cpu_physical_memory_write_rom(target_phys_addr_t addr,
                                    const uint8_t *buf, int len)
-- 
1.7.9.5


* [Qemu-devel] [PATCH 03/13] iommu: Add universal DMA helper functions
  2012-06-19  6:39 [Qemu-devel] [PATCH 00/13] iommu series Benjamin Herrenschmidt
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 01/13] Better support for dma_addr_t variables Benjamin Herrenschmidt
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 02/13] Implement cpu_physical_memory_set() Benjamin Herrenschmidt
@ 2012-06-19  6:39 ` Benjamin Herrenschmidt
  2012-06-20 21:16   ` Anthony Liguori
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 04/13] usb-ohci: Use " Benjamin Herrenschmidt
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-19  6:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, anthony, Eduard - Gabriel Munteanu,
	David Gibson, Richard Henderson

From: David Gibson <david@gibson.dropbear.id.au>

Not that long ago, every device implementation using DMA directly
accessed guest memory using cpu_physical_memory_*().  This meant that
adding support for a guest visible IOMMU would require changing every
one of these devices to go through IOMMU translation.

Shortly before qemu 1.0, I made a start on fixing this by providing
helper functions for PCI DMA.  These are currently just stubs which
call the direct access functions, but mean that an IOMMU can be
implemented in one place, rather than for every PCI device.

Clearly, this doesn't help for non PCI devices, which could also be
IOMMU translated on some platforms.  It is also problematic for the
devices which have both PCI and non-PCI versions (e.g. OHCI, AHCI) - we
cannot use the pci_dma_*() functions, because they assume the
presence of a PCIDevice, but we don't want to have to check between
pci_dma_*() and cpu_physical_memory_*() every time we do a DMA in the
device code.

This patch takes the first step towards addressing both these problems,
by introducing new (stub) dma helper functions which can be used for
any DMA capable device.

These dma functions take a DMAContext *, a new (currently empty)
variable describing the DMA address space in which the operation is to
take place.  NULL indicates untranslated DMA directly into guest
physical address space.  The intention is that in future non-NULL
values will give information about any necessary IOMMU translation.

DMA using devices must obtain a DMAContext (or, potentially, contexts)
from their bus or platform.  For now this patch just converts the PCI
wrappers to be implemented in terms of the universal wrappers;
converting other drivers can take place over time.
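
To make the NULL convention concrete, here is a self-contained sketch of
how a translating implementation might later slot in behind the same
interface.  Everything in it is an assumption made for illustration: the
patch leaves DMAContext empty, so the 'offset' field, the guest_ram
array and the sketch_dma_*() names are invented here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t dma_addr_t;

/* Hypothetical context: a fixed offset is the simplest possible
 * "translation" and exists only to give the example something to do. */
typedef struct DMAContext {
    dma_addr_t offset;
} DMAContext;

/* Toy guest physical memory. */
static uint8_t guest_ram[1024];

/* NULL context: the bus address is used as the guest physical address. */
static int sketch_dma_translate(DMAContext *dma, dma_addr_t addr,
                                dma_addr_t len, dma_addr_t *paddr)
{
    dma_addr_t phys = dma ? addr + dma->offset : addr;

    if (phys > sizeof(guest_ram) || len > sizeof(guest_ram) - phys) {
        return -1;              /* range not valid for DMA */
    }
    *paddr = phys;
    return 0;
}

static int sketch_dma_read(DMAContext *dma, dma_addr_t addr,
                           void *buf, dma_addr_t len)
{
    dma_addr_t phys;

    assert(buf != NULL);
    if (sketch_dma_translate(dma, addr, len, &phys) < 0) {
        return -1;
    }
    memcpy(buf, &guest_ram[phys], len);
    return 0;
}

static int sketch_dma_write(DMAContext *dma, dma_addr_t addr,
                            const void *buf, dma_addr_t len)
{
    dma_addr_t phys;

    assert(buf != NULL);
    if (sketch_dma_translate(dma, addr, len, &phys) < 0) {
        return -1;
    }
    memcpy(&guest_ram[phys], buf, len);
    return 0;
}
```

Device code calls the same two functions in both cases; only the
context pointer it was handed differs.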

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Richard Henderson <rth@twiddle.net>

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 dma.h         |  100 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/pci.h      |   21 ++++++------
 qemu-common.h |    1 +
 3 files changed, 113 insertions(+), 9 deletions(-)

diff --git a/dma.h b/dma.h
index fe08b72..4449a0c 100644
--- a/dma.h
+++ b/dma.h
@@ -34,6 +34,106 @@ typedef target_phys_addr_t dma_addr_t;
 #define DMA_ADDR_BITS TARGET_PHYS_ADDR_BITS
 #define DMA_ADDR_FMT TARGET_FMT_plx
 
+/* Checks that the given range of addresses is valid for DMA.  This is
+ * useful for certain cases, but usually you should just use
+ * dma_memory_{read,write}() and check for errors */
+static inline bool dma_memory_valid(DMAContext *dma, dma_addr_t addr,
+                                    dma_addr_t len, DMADirection dir)
+{
+    /* Stub version, with no iommu we assume all bus addresses are valid */
+    return true;
+}
+
+static inline int dma_memory_rw(DMAContext *dma, dma_addr_t addr,
+                                void *buf, dma_addr_t len, DMADirection dir)
+{
+    /* Stub version when we have no iommu support */
+    cpu_physical_memory_rw(addr, buf, (target_phys_addr_t)len,
+                           dir == DMA_DIRECTION_FROM_DEVICE);
+    return 0;
+}
+
+static inline int dma_memory_read(DMAContext *dma, dma_addr_t addr,
+                                  void *buf, dma_addr_t len)
+{
+    return dma_memory_rw(dma, addr, buf, len, DMA_DIRECTION_TO_DEVICE);
+}
+
+static inline int dma_memory_write(DMAContext *dma, dma_addr_t addr,
+                                   const void *buf, dma_addr_t len)
+{
+    return dma_memory_rw(dma, addr, (void *)buf, len,
+                         DMA_DIRECTION_FROM_DEVICE);
+}
+
+static inline int dma_memory_set(DMAContext *dma, dma_addr_t addr,
+                                 uint8_t c, dma_addr_t len)
+{
+    /* Stub version when we have no iommu support */
+    cpu_physical_memory_set(addr, c, len);
+    return 0;
+}
+
+static inline void *dma_memory_map(DMAContext *dma,
+                                   dma_addr_t addr, dma_addr_t *len,
+                                   DMADirection dir)
+{
+    target_phys_addr_t xlen = *len;
+    void *p;
+
+    p = cpu_physical_memory_map(addr, &xlen,
+                                dir == DMA_DIRECTION_FROM_DEVICE);
+    *len = xlen;
+    return p;
+}
+
+static inline void dma_memory_unmap(DMAContext *dma,
+                                    void *buffer, dma_addr_t len,
+                                    DMADirection dir, dma_addr_t access_len)
+{
+    cpu_physical_memory_unmap(buffer, (target_phys_addr_t)len,
+                              dir == DMA_DIRECTION_FROM_DEVICE,
+                              access_len);
+}
+
+#define DEFINE_LDST_DMA(_lname, _sname, _bits, _end) \
+    static inline uint##_bits##_t ld##_lname##_##_end##_dma(DMAContext *dma, \
+                                                            dma_addr_t addr) \
+    {                                                                   \
+        uint##_bits##_t val;                                            \
+        dma_memory_read(dma, addr, &val, (_bits) / 8);                  \
+        return _end##_bits##_to_cpu(val);                               \
+    }                                                                   \
+    static inline void st##_sname##_##_end##_dma(DMAContext *dma,       \
+                                                 dma_addr_t addr,       \
+                                                 uint##_bits##_t val)   \
+    {                                                                   \
+        val = cpu_to_##_end##_bits(val);                                \
+        dma_memory_write(dma, addr, &val, (_bits) / 8);                 \
+    }
+
+static inline uint8_t ldub_dma(DMAContext *dma, dma_addr_t addr)
+{
+    uint8_t val;
+
+    dma_memory_read(dma, addr, &val, 1);
+    return val;
+}
+
+static inline void stb_dma(DMAContext *dma, dma_addr_t addr, uint8_t val)
+{
+    dma_memory_write(dma, addr, &val, 1);
+}
+
+DEFINE_LDST_DMA(uw, w, 16, le);
+DEFINE_LDST_DMA(l, l, 32, le);
+DEFINE_LDST_DMA(q, q, 64, le);
+DEFINE_LDST_DMA(uw, w, 16, be);
+DEFINE_LDST_DMA(l, l, 32, be);
+DEFINE_LDST_DMA(q, q, 64, be);
+
+#undef DEFINE_LDST_DMA
+
 struct ScatterGatherEntry {
     dma_addr_t base;
     dma_addr_t len;
diff --git a/hw/pci.h b/hw/pci.h
index 7f223c0..ee669d9 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -558,10 +558,16 @@ static inline uint32_t pci_config_size(const PCIDevice *d)
 }
 
 /* DMA access functions */
+static inline DMAContext *pci_dma_context(PCIDevice *dev)
+{
+    /* Stub for when we have no PCI iommu support */
+    return NULL;
+}
+
 static inline int pci_dma_rw(PCIDevice *dev, dma_addr_t addr,
                              void *buf, dma_addr_t len, DMADirection dir)
 {
-    cpu_physical_memory_rw(addr, buf, len, dir == DMA_DIRECTION_FROM_DEVICE);
+    dma_memory_rw(pci_dma_context(dev), addr, buf, len, dir);
     return 0;
 }
 
@@ -581,12 +587,12 @@ static inline int pci_dma_write(PCIDevice *dev, dma_addr_t addr,
     static inline uint##_bits##_t ld##_l##_pci_dma(PCIDevice *dev,      \
                                                    dma_addr_t addr)     \
     {                                                                   \
-        return ld##_l##_phys(addr);                                     \
+        return ld##_l##_dma(pci_dma_context(dev), addr);                \
     }                                                                   \
     static inline void st##_s##_pci_dma(PCIDevice *dev,                 \
-                          dma_addr_t addr, uint##_bits##_t val)         \
+                                        dma_addr_t addr, uint##_bits##_t val) \
     {                                                                   \
-        st##_s##_phys(addr, val);                                       \
+        st##_s##_dma(pci_dma_context(dev), addr, val);                  \
     }
 
 PCI_DMA_DEFINE_LDST(ub, b, 8);
@@ -602,19 +608,16 @@ PCI_DMA_DEFINE_LDST(q_be, q_be, 64);
 static inline void *pci_dma_map(PCIDevice *dev, dma_addr_t addr,
                                 dma_addr_t *plen, DMADirection dir)
 {
-    target_phys_addr_t len = *plen;
     void *buf;
 
-    buf = cpu_physical_memory_map(addr, &len, dir == DMA_DIRECTION_FROM_DEVICE);
-    *plen = len;
+    buf = dma_memory_map(pci_dma_context(dev), addr, plen, dir);
     return buf;
 }
 
 static inline void pci_dma_unmap(PCIDevice *dev, void *buffer, dma_addr_t len,
                                  DMADirection dir, dma_addr_t access_len)
 {
-    cpu_physical_memory_unmap(buffer, len, dir == DMA_DIRECTION_FROM_DEVICE,
-                              access_len);
+    dma_memory_unmap(pci_dma_context(dev), buffer, len, dir, access_len);
 }
 
 static inline void pci_dma_sglist_init(QEMUSGList *qsg, PCIDevice *dev,
diff --git a/qemu-common.h b/qemu-common.h
index 8f87e41..80026af 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -264,6 +264,7 @@ typedef struct EventNotifier EventNotifier;
 typedef struct VirtIODevice VirtIODevice;
 typedef struct QEMUSGList QEMUSGList;
 typedef struct SHPCDevice SHPCDevice;
+typedef struct DMAContext DMAContext;
 
 typedef uint64_t pcibus_t;
 
-- 
1.7.9.5


* [Qemu-devel] [PATCH 04/13] usb-ohci: Use universal DMA helper functions
  2012-06-19  6:39 [Qemu-devel] [PATCH 00/13] iommu series Benjamin Herrenschmidt
                   ` (2 preceding siblings ...)
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 03/13] iommu: Add universal DMA helper functions Benjamin Herrenschmidt
@ 2012-06-19  6:39 ` Benjamin Herrenschmidt
  2012-06-20 21:18   ` Anthony Liguori
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 05/13] iommu: Make sglists and dma_bdrv helpers use new universal DMA helpers Benjamin Herrenschmidt
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-19  6:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: Michael S. Tsirkin, Gerd Hoffmann, anthony, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

The OHCI device emulation can provide both PCI and SysBus OHCI
implementations.  Because of this, it was not previously converted to
use the PCI DMA helper functions.

This patch converts it to use the new universal DMA helper functions.
In the PCI case, it obtains its DMAContext from pci_dma_context(); in
the SysBus case, it uses NULL - i.e. assumes for now that there will
be no IOMMU translation for a SysBus OHCI.
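
The pattern is: the bus-specific init picks the context once, and the
shared device code only ever sees the stored pointer.  A minimal sketch,
with all names (SketchHostController, sketch_init(), sketch_get_dwords(),
sketch_dma_read(), guest_ram) made up for the example and only the
NULL, untranslated case modelled:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t dma_addr_t;
typedef struct DMAContext DMAContext;   /* opaque to device code */

/* Toy guest memory. */
static uint8_t guest_ram[256];

/* Stand-in for dma_memory_read(); the context is ignored because only
 * the NULL (no translation) case is modelled in this sketch. */
static int sketch_dma_read(DMAContext *dma, dma_addr_t addr,
                           void *buf, dma_addr_t len)
{
    (void)dma;
    assert(addr <= sizeof(guest_ram) && len <= sizeof(guest_ram) - addr);
    memcpy(buf, &guest_ram[addr], len);
    return 0;
}

typedef struct SketchHostController {
    DMAContext *dma;            /* chosen once by the bus-specific init */
    dma_addr_t localmem_base;
} SketchHostController;

/* Shared init: a PCI front end would pass its bus context, a SysBus
 * front end would pass NULL. */
static void sketch_init(SketchHostController *hc, dma_addr_t localmem_base,
                        DMAContext *dma)
{
    hc->dma = dma;
    hc->localmem_base = localmem_base;
}

/* Read 'num' little-endian 32-bit words, whatever the host byte order. */
static int sketch_get_dwords(SketchHostController *hc, dma_addr_t addr,
                             uint32_t *buf, int num)
{
    int i;

    addr += hc->localmem_base;
    for (i = 0; i < num; i++, addr += 4) {
        uint8_t raw[4];

        if (sketch_dma_read(hc->dma, addr, raw, sizeof(raw)) < 0) {
            return 0;
        }
        buf[i] = (uint32_t)raw[0] | (uint32_t)raw[1] << 8 |
                 (uint32_t)raw[2] << 16 | (uint32_t)raw[3] << 24;
    }
    return 1;
}
```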

Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 hw/usb/hcd-ohci.c |   93 +++++++++++++++++++++++++++++------------------------
 1 file changed, 51 insertions(+), 42 deletions(-)

diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
index 1a1cc88..844e7ed 100644
--- a/hw/usb/hcd-ohci.c
+++ b/hw/usb/hcd-ohci.c
@@ -31,7 +31,7 @@
 #include "hw/usb.h"
 #include "hw/pci.h"
 #include "hw/sysbus.h"
-#include "hw/qdev-addr.h"
+#include "hw/qdev-dma.h"
 
 //#define DEBUG_OHCI
 /* Dump packet contents.  */
@@ -62,6 +62,7 @@ typedef struct {
     USBBus bus;
     qemu_irq irq;
     MemoryRegion mem;
+    DMAContext *dma;
     int num_ports;
     const char *name;
 
@@ -104,7 +105,7 @@ typedef struct {
     uint32_t htest;
 
     /* SM501 local memory offset */
-    target_phys_addr_t localmem_base;
+    dma_addr_t localmem_base;
 
     /* Active packets.  */
     uint32_t old_ctl;
@@ -482,14 +483,14 @@ static void ohci_reset(void *opaque)
 
 /* Get an array of dwords from main memory */
 static inline int get_dwords(OHCIState *ohci,
-                             uint32_t addr, uint32_t *buf, int num)
+                             dma_addr_t addr, uint32_t *buf, int num)
 {
     int i;
 
     addr += ohci->localmem_base;
 
     for (i = 0; i < num; i++, buf++, addr += sizeof(*buf)) {
-        cpu_physical_memory_read(addr, buf, sizeof(*buf));
+        dma_memory_read(ohci->dma, addr, buf, sizeof(*buf));
         *buf = le32_to_cpu(*buf);
     }
 
@@ -498,7 +499,7 @@ static inline int get_dwords(OHCIState *ohci,
 
 /* Put an array of dwords in to main memory */
 static inline int put_dwords(OHCIState *ohci,
-                             uint32_t addr, uint32_t *buf, int num)
+                             dma_addr_t addr, uint32_t *buf, int num)
 {
     int i;
 
@@ -506,7 +507,7 @@ static inline int put_dwords(OHCIState *ohci,
 
     for (i = 0; i < num; i++, buf++, addr += sizeof(*buf)) {
         uint32_t tmp = cpu_to_le32(*buf);
-        cpu_physical_memory_write(addr, &tmp, sizeof(tmp));
+        dma_memory_write(ohci->dma, addr, &tmp, sizeof(tmp));
     }
 
     return 1;
@@ -514,14 +515,14 @@ static inline int put_dwords(OHCIState *ohci,
 
 /* Get an array of words from main memory */
 static inline int get_words(OHCIState *ohci,
-                            uint32_t addr, uint16_t *buf, int num)
+                            dma_addr_t addr, uint16_t *buf, int num)
 {
     int i;
 
     addr += ohci->localmem_base;
 
     for (i = 0; i < num; i++, buf++, addr += sizeof(*buf)) {
-        cpu_physical_memory_read(addr, buf, sizeof(*buf));
+        dma_memory_read(ohci->dma, addr, buf, sizeof(*buf));
         *buf = le16_to_cpu(*buf);
     }
 
@@ -530,7 +531,7 @@ static inline int get_words(OHCIState *ohci,
 
 /* Put an array of words in to main memory */
 static inline int put_words(OHCIState *ohci,
-                            uint32_t addr, uint16_t *buf, int num)
+                            dma_addr_t addr, uint16_t *buf, int num)
 {
     int i;
 
@@ -538,40 +539,40 @@ static inline int put_words(OHCIState *ohci,
 
     for (i = 0; i < num; i++, buf++, addr += sizeof(*buf)) {
         uint16_t tmp = cpu_to_le16(*buf);
-        cpu_physical_memory_write(addr, &tmp, sizeof(tmp));
+        dma_memory_write(ohci->dma, addr, &tmp, sizeof(tmp));
     }
 
     return 1;
 }
 
 static inline int ohci_read_ed(OHCIState *ohci,
-                               uint32_t addr, struct ohci_ed *ed)
+                               dma_addr_t addr, struct ohci_ed *ed)
 {
     return get_dwords(ohci, addr, (uint32_t *)ed, sizeof(*ed) >> 2);
 }
 
 static inline int ohci_read_td(OHCIState *ohci,
-                               uint32_t addr, struct ohci_td *td)
+                               dma_addr_t addr, struct ohci_td *td)
 {
     return get_dwords(ohci, addr, (uint32_t *)td, sizeof(*td) >> 2);
 }
 
 static inline int ohci_read_iso_td(OHCIState *ohci,
-                                   uint32_t addr, struct ohci_iso_td *td)
+                                   dma_addr_t addr, struct ohci_iso_td *td)
 {
     return (get_dwords(ohci, addr, (uint32_t *)td, 4) &&
             get_words(ohci, addr + 16, td->offset, 8));
 }
 
 static inline int ohci_read_hcca(OHCIState *ohci,
-                                 uint32_t addr, struct ohci_hcca *hcca)
+                                 dma_addr_t addr, struct ohci_hcca *hcca)
 {
-    cpu_physical_memory_read(addr + ohci->localmem_base, hcca, sizeof(*hcca));
+    dma_memory_read(ohci->dma, addr + ohci->localmem_base, hcca, sizeof(*hcca));
     return 1;
 }
 
 static inline int ohci_put_ed(OHCIState *ohci,
-                              uint32_t addr, struct ohci_ed *ed)
+                              dma_addr_t addr, struct ohci_ed *ed)
 {
     /* ed->tail is under control of the HCD.
      * Since just ed->head is changed by HC, just write back this
@@ -583,64 +584,63 @@ static inline int ohci_put_ed(OHCIState *ohci,
 }
 
 static inline int ohci_put_td(OHCIState *ohci,
-                              uint32_t addr, struct ohci_td *td)
+                              dma_addr_t addr, struct ohci_td *td)
 {
     return put_dwords(ohci, addr, (uint32_t *)td, sizeof(*td) >> 2);
 }
 
 static inline int ohci_put_iso_td(OHCIState *ohci,
-                                  uint32_t addr, struct ohci_iso_td *td)
+                                  dma_addr_t addr, struct ohci_iso_td *td)
 {
     return (put_dwords(ohci, addr, (uint32_t *)td, 4) &&
             put_words(ohci, addr + 16, td->offset, 8));
 }
 
 static inline int ohci_put_hcca(OHCIState *ohci,
-                                uint32_t addr, struct ohci_hcca *hcca)
+                                dma_addr_t addr, struct ohci_hcca *hcca)
 {
-    cpu_physical_memory_write(addr + ohci->localmem_base + HCCA_WRITEBACK_OFFSET,
-                              (char *)hcca + HCCA_WRITEBACK_OFFSET,
-                              HCCA_WRITEBACK_SIZE);
+    dma_memory_write(ohci->dma,
+                     addr + ohci->localmem_base + HCCA_WRITEBACK_OFFSET,
+                     (char *)hcca + HCCA_WRITEBACK_OFFSET,
+                     HCCA_WRITEBACK_SIZE);
     return 1;
 }
 
 /* Read/Write the contents of a TD from/to main memory.  */
 static void ohci_copy_td(OHCIState *ohci, struct ohci_td *td,
-                         uint8_t *buf, int len, int write)
+                         uint8_t *buf, int len, DMADirection dir)
 {
-    uint32_t ptr;
-    uint32_t n;
+    dma_addr_t ptr, n;
 
     ptr = td->cbp;
     n = 0x1000 - (ptr & 0xfff);
     if (n > len)
         n = len;
-    cpu_physical_memory_rw(ptr + ohci->localmem_base, buf, n, write);
+    dma_memory_rw(ohci->dma, ptr + ohci->localmem_base, buf, n, dir);
     if (n == len)
         return;
     ptr = td->be & ~0xfffu;
     buf += n;
-    cpu_physical_memory_rw(ptr + ohci->localmem_base, buf, len - n, write);
+    dma_memory_rw(ohci->dma, ptr + ohci->localmem_base, buf, len - n, dir);
 }
 
 /* Read/Write the contents of an ISO TD from/to main memory.  */
 static void ohci_copy_iso_td(OHCIState *ohci,
                              uint32_t start_addr, uint32_t end_addr,
-                             uint8_t *buf, int len, int write)
+                             uint8_t *buf, int len, DMADirection dir)
 {
-    uint32_t ptr;
-    uint32_t n;
+    dma_addr_t ptr, n;
 
     ptr = start_addr;
     n = 0x1000 - (ptr & 0xfff);
     if (n > len)
         n = len;
-    cpu_physical_memory_rw(ptr + ohci->localmem_base, buf, n, write);
+    dma_memory_rw(ohci->dma, ptr + ohci->localmem_base, buf, n, dir);
     if (n == len)
         return;
     ptr = end_addr & ~0xfffu;
     buf += n;
-    cpu_physical_memory_rw(ptr + ohci->localmem_base, buf, len - n, write);
+    dma_memory_rw(ohci->dma, ptr + ohci->localmem_base, buf, len - n, dir);
 }
 
 static void ohci_process_lists(OHCIState *ohci, int completion);
@@ -803,7 +803,8 @@ static int ohci_service_iso_td(OHCIState *ohci, struct ohci_ed *ed,
     }
 
     if (len && dir != OHCI_TD_DIR_IN) {
-        ohci_copy_iso_td(ohci, start_addr, end_addr, ohci->usb_buf, len, 0);
+        ohci_copy_iso_td(ohci, start_addr, end_addr, ohci->usb_buf, len,
+                         DMA_DIRECTION_TO_DEVICE);
     }
 
     if (completion) {
@@ -827,7 +828,8 @@ static int ohci_service_iso_td(OHCIState *ohci, struct ohci_ed *ed,
     /* Writeback */
     if (dir == OHCI_TD_DIR_IN && ret >= 0 && ret <= len) {
         /* IN transfer succeeded */
-        ohci_copy_iso_td(ohci, start_addr, end_addr, ohci->usb_buf, ret, 1);
+        ohci_copy_iso_td(ohci, start_addr, end_addr, ohci->usb_buf, ret,
+                         DMA_DIRECTION_FROM_DEVICE);
         OHCI_SET_BM(iso_td.offset[relative_frame_number], TD_PSW_CC,
                     OHCI_CC_NOERROR);
         OHCI_SET_BM(iso_td.offset[relative_frame_number], TD_PSW_SIZE, ret);
@@ -971,7 +973,8 @@ static int ohci_service_td(OHCIState *ohci, struct ohci_ed *ed)
                 pktlen = len;
             }
             if (!completion) {
-                ohci_copy_td(ohci, &td, ohci->usb_buf, pktlen, 0);
+                ohci_copy_td(ohci, &td, ohci->usb_buf, pktlen,
+                             DMA_DIRECTION_TO_DEVICE);
             }
         }
     }
@@ -1021,7 +1024,8 @@ static int ohci_service_td(OHCIState *ohci, struct ohci_ed *ed)
     }
     if (ret >= 0) {
         if (dir == OHCI_TD_DIR_IN) {
-            ohci_copy_td(ohci, &td, ohci->usb_buf, ret, 1);
+            ohci_copy_td(ohci, &td, ohci->usb_buf, ret,
+                         DMA_DIRECTION_FROM_DEVICE);
 #ifdef DEBUG_PACKET
             DPRINTF("  data:");
             for (i = 0; i < ret; i++)
@@ -1748,11 +1752,14 @@ static USBBusOps ohci_bus_ops = {
 };
 
 static int usb_ohci_init(OHCIState *ohci, DeviceState *dev,
-                         int num_ports, uint32_t localmem_base,
-                         char *masterbus, uint32_t firstport)
+                         int num_ports, dma_addr_t localmem_base,
+                         char *masterbus, uint32_t firstport,
+                         DMAContext *dma)
 {
     int i;
 
+    ohci->dma = dma;
+
     if (usb_frame_time == 0) {
 #ifdef OHCI_TIME_WARP
         usb_frame_time = get_ticks_per_sec();
@@ -1817,7 +1824,8 @@ static int usb_ohci_initfn_pci(struct PCIDevice *dev)
     ohci->pci_dev.config[PCI_INTERRUPT_PIN] = 0x01; /* interrupt pin A */
 
     if (usb_ohci_init(&ohci->state, &dev->qdev, ohci->num_ports, 0,
-                      ohci->masterbus, ohci->firstport) != 0) {
+                      ohci->masterbus, ohci->firstport,
+                      pci_dma_context(dev)) != 0) {
         return -1;
     }
     ohci->state.irq = ohci->pci_dev.irq[0];
@@ -1831,7 +1839,7 @@ typedef struct {
     SysBusDevice busdev;
     OHCIState ohci;
     uint32_t num_ports;
-    target_phys_addr_t dma_offset;
+    dma_addr_t dma_offset;
 } OHCISysBusState;
 
 static int ohci_init_pxa(SysBusDevice *dev)
@@ -1839,7 +1847,8 @@ static int ohci_init_pxa(SysBusDevice *dev)
     OHCISysBusState *s = FROM_SYSBUS(OHCISysBusState, dev);
 
     /* Cannot fail as we pass NULL for masterbus */
-    usb_ohci_init(&s->ohci, &dev->qdev, s->num_ports, s->dma_offset, NULL, 0);
+    usb_ohci_init(&s->ohci, &dev->qdev, s->num_ports, s->dma_offset, NULL, 0,
+                  NULL);
     sysbus_init_irq(dev, &s->ohci.irq);
     sysbus_init_mmio(dev, &s->ohci.mem);
 
@@ -1875,7 +1884,7 @@ static TypeInfo ohci_pci_info = {
 
 static Property ohci_sysbus_properties[] = {
     DEFINE_PROP_UINT32("num-ports", OHCISysBusState, num_ports, 3),
-    DEFINE_PROP_TADDR("dma-offset", OHCISysBusState, dma_offset, 3),
+    DEFINE_PROP_DMAADDR("dma-offset", OHCISysBusState, dma_offset, 3),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
1.7.9.5


* [Qemu-devel] [PATCH 05/13] iommu: Make sglists and dma_bdrv helpers use new universal DMA helpers
  2012-06-19  6:39 [Qemu-devel] [PATCH 00/13] iommu series Benjamin Herrenschmidt
                   ` (3 preceding siblings ...)
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 04/13] usb-ohci: Use " Benjamin Herrenschmidt
@ 2012-06-19  6:39 ` Benjamin Herrenschmidt
  2012-06-20 21:21   ` Anthony Liguori
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 06/13] ide/ahci: Use universal DMA helper functions Benjamin Herrenschmidt
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-19  6:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Michael S. Tsirkin, anthony, Paolo Bonzini, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

dma-helpers.c contains a number of helper functions for doing
scatter/gather DMA, and various block device related DMA.  Currently,
these directly access guest memory using cpu_physical_memory_*(),
assuming no IOMMU translation.

This patch updates this code to use the new universal DMA helper
functions.  qemu_sglist_init() now takes a DMAContext * to describe
the DMA address space in which the scatter/gather will take place.

We minimally update the callers of qemu_sglist_init() to pass NULL
(i.e. no translation, same as current behaviour).  Some of those
callers should pass something else in some cases to allow proper IOMMU
translation in future, but that will be fixed in later patches.
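
In outline, the list simply remembers the context it was created with so
that later map/unmap calls can use it.  The sketch below is standalone
and uses hypothetical Sketch* names with plain malloc()/realloc(); it
does not reproduce the real QEMUSGList layout beyond the fields touched
by this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t dma_addr_t;
typedef struct DMAContext DMAContext;   /* opaque; NULL means untranslated */

typedef struct SketchSGEntry {
    dma_addr_t base;
    dma_addr_t len;
} SketchSGEntry;

typedef struct SketchSGList {
    SketchSGEntry *sg;
    int nsg;
    int nalloc;
    dma_addr_t size;
    DMAContext *dma;    /* address space every entry is expressed in */
} SketchSGList;

static void sketch_sglist_init(SketchSGList *qsg, int alloc_hint,
                               DMAContext *dma)
{
    assert(alloc_hint > 0);
    qsg->sg = malloc((size_t)alloc_hint * sizeof(SketchSGEntry));
    assert(qsg->sg != NULL);    /* sketch only: no graceful OOM handling */
    qsg->nsg = 0;
    qsg->nalloc = alloc_hint;
    qsg->size = 0;
    qsg->dma = dma;
}

static void sketch_sglist_add(SketchSGList *qsg, dma_addr_t base,
                              dma_addr_t len)
{
    if (qsg->nsg == qsg->nalloc) {
        int nalloc = 2 * qsg->nalloc + 1;
        SketchSGEntry *sg = realloc(qsg->sg,
                                    (size_t)nalloc * sizeof(SketchSGEntry));

        assert(sg != NULL);
        qsg->sg = sg;
        qsg->nalloc = nalloc;
    }
    qsg->sg[qsg->nsg].base = base;
    qsg->sg[qsg->nsg].len = len;
    qsg->size += len;
    ++qsg->nsg;
}

static void sketch_sglist_destroy(SketchSGList *qsg)
{
    free(qsg->sg);
    qsg->sg = NULL;
    qsg->nsg = qsg->nalloc = 0;
}
```

Keeping the context on the list rather than passing it to every helper
means the block-layer code never needs to know which bus the request
came from.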

Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 dma-helpers.c  |   24 ++++++++++++------------
 dma.h          |    3 ++-
 hw/ide/ahci.c  |    3 ++-
 hw/ide/macio.c |    4 ++--
 hw/pci.h       |    2 +-
 5 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/dma-helpers.c b/dma-helpers.c
index 7971a89..2dc4691 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -10,12 +10,13 @@
 #include "dma.h"
 #include "trace.h"
 
-void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint)
+void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint, DMAContext *dma)
 {
     qsg->sg = g_malloc(alloc_hint * sizeof(ScatterGatherEntry));
     qsg->nsg = 0;
     qsg->nalloc = alloc_hint;
     qsg->size = 0;
+    qsg->dma = dma;
 }
 
 void qemu_sglist_add(QEMUSGList *qsg, dma_addr_t base, dma_addr_t len)
@@ -74,10 +75,9 @@ static void dma_bdrv_unmap(DMAAIOCB *dbs)
     int i;
 
     for (i = 0; i < dbs->iov.niov; ++i) {
-        cpu_physical_memory_unmap(dbs->iov.iov[i].iov_base,
-                                  dbs->iov.iov[i].iov_len,
-                                  dbs->dir != DMA_DIRECTION_TO_DEVICE,
-                                  dbs->iov.iov[i].iov_len);
+        dma_memory_unmap(dbs->sg->dma, dbs->iov.iov[i].iov_base,
+                         dbs->iov.iov[i].iov_len, dbs->dir,
+                         dbs->iov.iov[i].iov_len);
     }
     qemu_iovec_reset(&dbs->iov);
 }
@@ -106,7 +106,7 @@ static void dma_complete(DMAAIOCB *dbs, int ret)
 static void dma_bdrv_cb(void *opaque, int ret)
 {
     DMAAIOCB *dbs = (DMAAIOCB *)opaque;
-    target_phys_addr_t cur_addr, cur_len;
+    dma_addr_t cur_addr, cur_len;
     void *mem;
 
     trace_dma_bdrv_cb(dbs, ret);
@@ -123,8 +123,7 @@ static void dma_bdrv_cb(void *opaque, int ret)
     while (dbs->sg_cur_index < dbs->sg->nsg) {
         cur_addr = dbs->sg->sg[dbs->sg_cur_index].base + dbs->sg_cur_byte;
         cur_len = dbs->sg->sg[dbs->sg_cur_index].len - dbs->sg_cur_byte;
-        mem = cpu_physical_memory_map(cur_addr, &cur_len,
-                                      dbs->dir != DMA_DIRECTION_TO_DEVICE);
+        mem = dma_memory_map(dbs->sg->dma, cur_addr, &cur_len, dbs->dir);
         if (!mem)
             break;
         qemu_iovec_add(&dbs->iov, mem, cur_len);
@@ -209,7 +208,8 @@ BlockDriverAIOCB *dma_bdrv_write(BlockDriverState *bs,
 }
 
 
-static uint64_t dma_buf_rw(uint8_t *ptr, int32_t len, QEMUSGList *sg, bool to_dev)
+static uint64_t dma_buf_rw(uint8_t *ptr, int32_t len, QEMUSGList *sg,
+                           DMADirection dir)
 {
     uint64_t resid;
     int sg_cur_index;
@@ -220,7 +220,7 @@ static uint64_t dma_buf_rw(uint8_t *ptr, int32_t len, QEMUSGList *sg, bool to_de
     while (len > 0) {
         ScatterGatherEntry entry = sg->sg[sg_cur_index++];
         int32_t xfer = MIN(len, entry.len);
-        cpu_physical_memory_rw(entry.base, ptr, xfer, !to_dev);
+        dma_memory_rw(sg->dma, entry.base, ptr, xfer, dir);
         ptr += xfer;
         len -= xfer;
         resid -= xfer;
@@ -231,12 +231,12 @@ static uint64_t dma_buf_rw(uint8_t *ptr, int32_t len, QEMUSGList *sg, bool to_de
 
 uint64_t dma_buf_read(uint8_t *ptr, int32_t len, QEMUSGList *sg)
 {
-    return dma_buf_rw(ptr, len, sg, 0);
+    return dma_buf_rw(ptr, len, sg, DMA_DIRECTION_FROM_DEVICE);
 }
 
 uint64_t dma_buf_write(uint8_t *ptr, int32_t len, QEMUSGList *sg)
 {
-    return dma_buf_rw(ptr, len, sg, 1);
+    return dma_buf_rw(ptr, len, sg, DMA_DIRECTION_TO_DEVICE);
 }
 
 void dma_acct_start(BlockDriverState *bs, BlockAcctCookie *cookie,
diff --git a/dma.h b/dma.h
index 4449a0c..cd002c7 100644
--- a/dma.h
+++ b/dma.h
@@ -26,6 +26,7 @@ struct QEMUSGList {
     int nsg;
     int nalloc;
     size_t size;
+    DMAContext *dma;
 };
 
 #if defined(TARGET_PHYS_ADDR_BITS)
@@ -139,7 +140,7 @@ struct ScatterGatherEntry {
     dma_addr_t len;
 };
 
-void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint);
+void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint, DMAContext *dma);
 void qemu_sglist_add(QEMUSGList *qsg, dma_addr_t base, dma_addr_t len);
 void qemu_sglist_destroy(QEMUSGList *qsg);
 #endif
diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index e275e68..6c4226d 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -667,7 +667,8 @@ static int ahci_populate_sglist(AHCIDevice *ad, QEMUSGList *sglist)
     if (sglist_alloc_hint > 0) {
         AHCI_SG *tbl = (AHCI_SG *)prdt;
 
-        qemu_sglist_init(sglist, sglist_alloc_hint);
+        /* FIXME: pass the correct DMAContext */
+        qemu_sglist_init(sglist, sglist_alloc_hint, NULL);
         for (i = 0; i < sglist_alloc_hint; i++) {
             /* flags_size is zero-based */
             qemu_sglist_add(sglist, le64_to_cpu(tbl[i].addr),
diff --git a/hw/ide/macio.c b/hw/ide/macio.c
index 7b38d9e..848cb31 100644
--- a/hw/ide/macio.c
+++ b/hw/ide/macio.c
@@ -76,7 +76,7 @@ static void pmac_ide_atapi_transfer_cb(void *opaque, int ret)
 
     s->io_buffer_size = io->len;
 
-    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1);
+    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1, NULL);
     qemu_sglist_add(&s->sg, io->addr, io->len);
     io->addr += io->len;
     io->len = 0;
@@ -133,7 +133,7 @@ static void pmac_ide_transfer_cb(void *opaque, int ret)
     s->io_buffer_index = 0;
     s->io_buffer_size = io->len;
 
-    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1);
+    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1, NULL);
     qemu_sglist_add(&s->sg, io->addr, io->len);
     io->addr += io->len;
     io->len = 0;
diff --git a/hw/pci.h b/hw/pci.h
index ee669d9..99b7e61 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -623,7 +623,7 @@ static inline void pci_dma_unmap(PCIDevice *dev, void *buffer, dma_addr_t len,
 static inline void pci_dma_sglist_init(QEMUSGList *qsg, PCIDevice *dev,
                                        int alloc_hint)
 {
-    qemu_sglist_init(qsg, alloc_hint);
+    qemu_sglist_init(qsg, alloc_hint, pci_dma_context(dev));
 }
 
 extern const VMStateDescription vmstate_pci_device;
-- 
1.7.9.5


* [Qemu-devel] [PATCH 06/13] ide/ahci: Use universal DMA helper functions
  2012-06-19  6:39 [Qemu-devel] [PATCH 00/13] iommu series Benjamin Herrenschmidt
                   ` (4 preceding siblings ...)
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 05/13] iommu: Make sglists and dma_bdrv helpers use new universal DMA helpers Benjamin Herrenschmidt
@ 2012-06-19  6:39 ` Benjamin Herrenschmidt
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 07/13] usb: Convert usb_packet_{map, unmap} to universal DMA helpers Benjamin Herrenschmidt
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-19  6:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Michael S. Tsirkin, anthony, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

The AHCI device can provide both PCI and SysBus AHCI device
emulations.  For this reason, it wasn't previously converted to use
the pci_dma_*() helper functions.  Now that we have universal DMA
helper functions, this converts AHCI to use them.

The DMAContext is obtained from pci_dma_context() in the PCI case and
set to NULL in the SysBus case (i.e. we assume for now that a SysBus
AHCI has no IOMMU translation).

Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 hw/ide/ahci.c |   34 ++++++++++++++++++++--------------
 hw/ide/ahci.h |    3 ++-
 hw/ide/ich.c  |    2 +-
 3 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 6c4226d..efea93f 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -588,7 +588,7 @@ static void ahci_write_fis_d2h(AHCIDevice *ad, uint8_t *cmd_fis)
     AHCIPortRegs *pr = &ad->port_regs;
     uint8_t *d2h_fis;
     int i;
-    target_phys_addr_t cmd_len = 0x80;
+    dma_addr_t cmd_len = 0x80;
     int cmd_mapped = 0;
 
     if (!ad->res_fis || !(pr->cmd & PORT_CMD_FIS_RX)) {
@@ -598,7 +598,8 @@ static void ahci_write_fis_d2h(AHCIDevice *ad, uint8_t *cmd_fis)
     if (!cmd_fis) {
         /* map cmd_fis */
         uint64_t tbl_addr = le64_to_cpu(ad->cur_cmd->tbl_addr);
-        cmd_fis = cpu_physical_memory_map(tbl_addr, &cmd_len, 0);
+        cmd_fis = dma_memory_map(ad->hba->dma, tbl_addr, &cmd_len,
+                                 DMA_DIRECTION_TO_DEVICE);
         cmd_mapped = 1;
     }
 
@@ -630,7 +631,8 @@ static void ahci_write_fis_d2h(AHCIDevice *ad, uint8_t *cmd_fis)
     ahci_trigger_irq(ad->hba, ad, PORT_IRQ_D2H_REG_FIS);
 
     if (cmd_mapped) {
-        cpu_physical_memory_unmap(cmd_fis, cmd_len, 0, cmd_len);
+        dma_memory_unmap(ad->hba->dma, cmd_fis, cmd_len,
+                         DMA_DIRECTION_TO_DEVICE, cmd_len);
     }
 }
 
@@ -640,8 +642,8 @@ static int ahci_populate_sglist(AHCIDevice *ad, QEMUSGList *sglist)
     uint32_t opts = le32_to_cpu(cmd->opts);
     uint64_t prdt_addr = le64_to_cpu(cmd->tbl_addr) + 0x80;
     int sglist_alloc_hint = opts >> AHCI_CMD_HDR_PRDT_LEN;
-    target_phys_addr_t prdt_len = (sglist_alloc_hint * sizeof(AHCI_SG));
-    target_phys_addr_t real_prdt_len = prdt_len;
+    dma_addr_t prdt_len = (sglist_alloc_hint * sizeof(AHCI_SG));
+    dma_addr_t real_prdt_len = prdt_len;
     uint8_t *prdt;
     int i;
     int r = 0;
@@ -652,7 +654,8 @@ static int ahci_populate_sglist(AHCIDevice *ad, QEMUSGList *sglist)
     }
 
     /* map PRDT */
-    if (!(prdt = cpu_physical_memory_map(prdt_addr, &prdt_len, 0))){
+    if (!(prdt = dma_memory_map(ad->hba->dma, prdt_addr, &prdt_len,
+                                DMA_DIRECTION_TO_DEVICE))){
         DPRINTF(ad->port_no, "map failed\n");
         return -1;
     }
@@ -667,8 +670,7 @@ static int ahci_populate_sglist(AHCIDevice *ad, QEMUSGList *sglist)
     if (sglist_alloc_hint > 0) {
         AHCI_SG *tbl = (AHCI_SG *)prdt;
 
-        /* FIXME: pass the correct DMAContext */
-        qemu_sglist_init(sglist, sglist_alloc_hint, NULL);
+        qemu_sglist_init(sglist, sglist_alloc_hint, ad->hba->dma);
         for (i = 0; i < sglist_alloc_hint; i++) {
             /* flags_size is zero-based */
             qemu_sglist_add(sglist, le64_to_cpu(tbl[i].addr),
@@ -677,7 +679,8 @@ static int ahci_populate_sglist(AHCIDevice *ad, QEMUSGList *sglist)
     }
 
 out:
-    cpu_physical_memory_unmap(prdt, prdt_len, 0, prdt_len);
+    dma_memory_unmap(ad->hba->dma, prdt, prdt_len,
+                     DMA_DIRECTION_TO_DEVICE, prdt_len);
     return r;
 }
 
@@ -787,7 +790,7 @@ static int handle_cmd(AHCIState *s, int port, int slot)
     uint64_t tbl_addr;
     AHCICmdHdr *cmd;
     uint8_t *cmd_fis;
-    target_phys_addr_t cmd_len;
+    dma_addr_t cmd_len;
 
     if (s->dev[port].port.ifs[0].status & (BUSY_STAT|DRQ_STAT)) {
         /* Engine currently busy, try again later */
@@ -809,7 +812,8 @@ static int handle_cmd(AHCIState *s, int port, int slot)
     tbl_addr = le64_to_cpu(cmd->tbl_addr);
 
     cmd_len = 0x80;
-    cmd_fis = cpu_physical_memory_map(tbl_addr, &cmd_len, 1);
+    cmd_fis = dma_memory_map(s->dma, tbl_addr, &cmd_len,
+                             DMA_DIRECTION_FROM_DEVICE);
 
     if (!cmd_fis) {
         DPRINTF(port, "error: guest passed us an invalid cmd fis\n");
@@ -935,7 +939,8 @@ static int handle_cmd(AHCIState *s, int port, int slot)
     }
 
 out:
-    cpu_physical_memory_unmap(cmd_fis, cmd_len, 1, cmd_len);
+    dma_memory_unmap(s->dma, cmd_fis, cmd_len, DMA_DIRECTION_FROM_DEVICE,
+                     cmd_len);
 
     if (s->dev[port].port.ifs[0].status & (BUSY_STAT|DRQ_STAT)) {
         /* async command, complete later */
@@ -1115,11 +1120,12 @@ static const IDEDMAOps ahci_dma_ops = {
     .reset = ahci_dma_reset,
 };
 
-void ahci_init(AHCIState *s, DeviceState *qdev, int ports)
+void ahci_init(AHCIState *s, DeviceState *qdev, DMAContext *dma, int ports)
 {
     qemu_irq *irqs;
     int i;
 
+    s->dma = dma;
     s->ports = ports;
     s->dev = g_malloc0(sizeof(AHCIDevice) * ports);
     ahci_reg_init(s);
@@ -1188,7 +1194,7 @@ static void sysbus_ahci_reset(DeviceState *dev)
 static int sysbus_ahci_init(SysBusDevice *dev)
 {
     SysbusAHCIState *s = FROM_SYSBUS(SysbusAHCIState, dev);
-    ahci_init(&s->ahci, &dev->qdev, s->num_ports);
+    ahci_init(&s->ahci, &dev->qdev, NULL, s->num_ports);
 
     sysbus_init_mmio(dev, &s->ahci.mem);
     sysbus_init_irq(dev, &s->ahci.irq);
diff --git a/hw/ide/ahci.h b/hw/ide/ahci.h
index ec1b6a5..1200a56 100644
--- a/hw/ide/ahci.h
+++ b/hw/ide/ahci.h
@@ -299,6 +299,7 @@ typedef struct AHCIState {
     uint32_t idp_index;     /* Current IDP index */
     int ports;
     qemu_irq irq;
+    DMAContext *dma;
 } AHCIState;
 
 typedef struct AHCIPCIState {
@@ -329,7 +330,7 @@ typedef struct NCQFrame {
     uint8_t reserved10;
 } QEMU_PACKED NCQFrame;
 
-void ahci_init(AHCIState *s, DeviceState *qdev, int ports);
+void ahci_init(AHCIState *s, DeviceState *qdev, DMAContext *dma, int ports);
 void ahci_uninit(AHCIState *s);
 
 void ahci_reset(AHCIState *s);
diff --git a/hw/ide/ich.c b/hw/ide/ich.c
index e3eaaea..319bc2b 100644
--- a/hw/ide/ich.c
+++ b/hw/ide/ich.c
@@ -98,7 +98,7 @@ static int pci_ich9_ahci_init(PCIDevice *dev)
     uint8_t *sata_cap;
     d = DO_UPCAST(struct AHCIPCIState, card, dev);
 
-    ahci_init(&d->ahci, &dev->qdev, 6);
+    ahci_init(&d->ahci, &dev->qdev, pci_dma_context(dev), 6);
 
     pci_config_set_prog_interface(d->card.config, AHCI_PROGMODE_MAJOR_REV_1);
 
-- 
1.7.9.5


* [Qemu-devel] [PATCH 07/13] usb: Convert usb_packet_{map, unmap} to universal DMA helpers
  2012-06-19  6:39 [Qemu-devel] [PATCH 00/13] iommu series Benjamin Herrenschmidt
                   ` (5 preceding siblings ...)
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 06/13] ide/ahci: Use universal DMA helper functions Benjamin Herrenschmidt
@ 2012-06-19  6:39 ` Benjamin Herrenschmidt
  2012-06-19 13:42   ` Gerd Hoffmann
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 08/13] iommu: Introduce IOMMU emulation infrastructure Benjamin Herrenschmidt
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-19  6:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: anthony, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

The USB UHCI and EHCI drivers were converted some time ago to use the
pci_dma_*() helper functions.  However, this conversion was not complete
because in some places both these drivers do DMA via the usb_packet_map()
function in usb-libhw.c.  That function directly used
cpu_physical_memory_map().

Now that the sglist code uses DMA wrappers properly, we can convert the
functions in usb-libhw.c, thus completing the conversion of UHCI and EHCI
to use the DMA wrappers.

Note that usb_packet_map() invokes dma_memory_map() with a NULL invalidate
callback function.  When IOMMU support is added, this will mean that
usb_packet_map() and the corresponding usb_packet_unmap() must be called in
close proximity without dropping the qemu device lock - otherwise the guest
might invalidate IOMMU mappings while they are still in use by the device
code.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 hw/usb.h          |    2 +-
 hw/usb/hcd-ehci.c |    4 ++--
 hw/usb/hcd-uhci.c |    2 +-
 hw/usb/libhw.c    |   21 +++++++++++----------
 4 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/hw/usb.h b/hw/usb.h
index 2a56fe5..a5623d3 100644
--- a/hw/usb.h
+++ b/hw/usb.h
@@ -345,7 +345,7 @@ void usb_packet_check_state(USBPacket *p, USBPacketState expected);
 void usb_packet_setup(USBPacket *p, int pid, USBEndpoint *ep);
 void usb_packet_addbuf(USBPacket *p, void *ptr, size_t len);
 int usb_packet_map(USBPacket *p, QEMUSGList *sgl);
-void usb_packet_unmap(USBPacket *p);
+void usb_packet_unmap(USBPacket *p, QEMUSGList *sgl);
 void usb_packet_copy(USBPacket *p, void *ptr, size_t bytes);
 void usb_packet_skip(USBPacket *p, size_t bytes);
 void usb_packet_cleanup(USBPacket *p);
diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 5298204..81bbc54 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -1422,8 +1422,8 @@ static void ehci_execute_complete(EHCIQueue *q)
         set_field(&q->qh.token, p->tbytes, QTD_TOKEN_TBYTES);
     }
     ehci_finish_transfer(q, p->usb_status);
+    usb_packet_unmap(&p->packet, &p->sgl);
     qemu_sglist_destroy(&p->sgl);
-    usb_packet_unmap(&p->packet);
 
     q->qh.token ^= QTD_TOKEN_DTOGGLE;
     q->qh.token &= ~QTD_TOKEN_ACTIVE;
@@ -1547,7 +1547,7 @@ static int ehci_process_itd(EHCIState *ehci,
                 usb_packet_map(&ehci->ipacket, &ehci->isgl);
                 ret = usb_handle_packet(dev, &ehci->ipacket);
                 assert(ret != USB_RET_ASYNC);
-                usb_packet_unmap(&ehci->ipacket);
+                usb_packet_unmap(&ehci->ipacket, &ehci->isgl);
             } else {
                 DPRINTF("ISOCH: attempt to addess non-iso endpoint\n");
                 ret = USB_RET_NAK;
diff --git a/hw/usb/hcd-uhci.c b/hw/usb/hcd-uhci.c
index 9871e24..86888ce 100644
--- a/hw/usb/hcd-uhci.c
+++ b/hw/usb/hcd-uhci.c
@@ -871,7 +871,7 @@ static int uhci_handle_td(UHCIState *s, uint32_t addr, UHCI_TD *td,
 
 done:
     len = uhci_complete_td(s, td, async, int_mask);
-    usb_packet_unmap(&async->packet);
+    usb_packet_unmap(&async->packet, &async->sgl);
     uhci_async_free(async);
     return len;
 }
diff --git a/hw/usb/libhw.c b/hw/usb/libhw.c
index 2462351..c0de30e 100644
--- a/hw/usb/libhw.c
+++ b/hw/usb/libhw.c
@@ -26,15 +26,15 @@
 
 int usb_packet_map(USBPacket *p, QEMUSGList *sgl)
 {
-    int is_write = (p->pid == USB_TOKEN_IN);
-    target_phys_addr_t len;
+    DMADirection dir = (p->pid == USB_TOKEN_IN) ?
+        DMA_DIRECTION_FROM_DEVICE : DMA_DIRECTION_TO_DEVICE;
+    dma_addr_t len;
     void *mem;
     int i;
 
     for (i = 0; i < sgl->nsg; i++) {
         len = sgl->sg[i].len;
-        mem = cpu_physical_memory_map(sgl->sg[i].base, &len,
-                                      is_write);
+        mem = dma_memory_map(sgl->dma, sgl->sg[i].base, &len, dir);
         if (!mem) {
             goto err;
         }
@@ -46,18 +46,19 @@ int usb_packet_map(USBPacket *p, QEMUSGList *sgl)
     return 0;
 
 err:
-    usb_packet_unmap(p);
+    usb_packet_unmap(p, sgl);
     return -1;
 }
 
-void usb_packet_unmap(USBPacket *p)
+void usb_packet_unmap(USBPacket *p, QEMUSGList *sgl)
 {
-    int is_write = (p->pid == USB_TOKEN_IN);
+    DMADirection dir = (p->pid == USB_TOKEN_IN) ?
+        DMA_DIRECTION_FROM_DEVICE : DMA_DIRECTION_TO_DEVICE;
     int i;
 
     for (i = 0; i < p->iov.niov; i++) {
-        cpu_physical_memory_unmap(p->iov.iov[i].iov_base,
-                                  p->iov.iov[i].iov_len, is_write,
-                                  p->iov.iov[i].iov_len);
+        dma_memory_unmap(sgl->dma, p->iov.iov[i].iov_base,
+                         p->iov.iov[i].iov_len, dir,
+                         p->iov.iov[i].iov_len);
     }
 }
-- 
1.7.9.5


* [Qemu-devel] [PATCH 08/13] iommu: Introduce IOMMU emulation infrastructure
  2012-06-19  6:39 [Qemu-devel] [PATCH 00/13] iommu series Benjamin Herrenschmidt
                   ` (6 preceding siblings ...)
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 07/13] usb: Convert usb_packet_{map, unmap} to universal DMA helpers Benjamin Herrenschmidt
@ 2012-06-19  6:39 ` Benjamin Herrenschmidt
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 09/13] iommu: Add facility to cancel in-use dma memory maps Benjamin Herrenschmidt
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-19  6:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, anthony, Eduard - Gabriel Munteanu,
	David Gibson, Richard Henderson

From: David Gibson <david@gibson.dropbear.id.au>

This patch adds the basic infrastructure necessary to emulate an IOMMU
visible to the guest.  The DMAContext structure is extended with
information and a callback describing the translation, and the various
DMA functions used by devices will now perform IOMMU translation using
this callback.
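
As an illustration only (not part of the patch), the translate-and-clamp
loop that the rw/set/valid helpers share can be sketched in a
self-contained way.  The toy page table, the 16-byte page size and all
Toy*/toy_* names are assumptions made for this example; the real
callback type is DMATranslateFunc and the real accesses go through
cpu_physical_memory_rw().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for illustration, not the QEMU definitions. */
typedef uint64_t iova_t;
typedef enum { TOY_TO_DEVICE = 0, TOY_FROM_DEVICE = 1 } ToyDir;

#define TOY_PAGE_SIZE 16u
#define TOY_NPAGES    4u

typedef struct ToyIommu ToyIommu;
typedef int ToyTranslateFunc(ToyIommu *mmu, iova_t addr,
                             uint64_t *paddr, uint64_t *plen, ToyDir dir);

struct ToyIommu {
    ToyTranslateFunc *translate;
    int64_t page_map[TOY_NPAGES];   /* IOVA page -> phys page, -1 = unmapped */
    uint8_t *ram;                   /* toy "physical memory" */
};

/* Translate one address; *plen is how far the translation stays valid. */
int toy_translate(ToyIommu *mmu, iova_t addr,
                  uint64_t *paddr, uint64_t *plen, ToyDir dir)
{
    uint64_t page = addr / TOY_PAGE_SIZE;
    uint64_t off = addr % TOY_PAGE_SIZE;

    (void)dir;                      /* the toy table has no permissions */
    if (page >= TOY_NPAGES || mmu->page_map[page] < 0) {
        return -1;
    }
    *paddr = (uint64_t)mmu->page_map[page] * TOY_PAGE_SIZE + off;
    *plen = TOY_PAGE_SIZE - off;
    return 0;
}

/* Same shape as the loop in the patch: translate, clamp, access, advance. */
int toy_iommu_rw(ToyIommu *mmu, iova_t addr, void *buf, uint64_t len,
                 ToyDir dir)
{
    uint8_t *p = buf;
    uint64_t paddr, plen;

    while (len) {
        if (mmu->translate(mmu, addr, &paddr, &plen, dir) != 0) {
            return -1;              /* earlier chunks stay transferred */
        }
        if (plen > len) {
            plen = len;             /* translation may cover more than asked */
        }
        if (dir == TOY_FROM_DEVICE) {
            memcpy(mmu->ram + paddr, p, (size_t)plen);
        } else {
            memcpy(p, mmu->ram + paddr, (size_t)plen);
        }
        len -= plen;
        addr += plen;
        p += plen;
    }
    return 0;
}
```

A transfer that is contiguous in bus addresses may thus land in
several disjoint physical chunks, and a failure part-way through leaves
the earlier chunks already written.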

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 dma-helpers.c |  155 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 dma.h         |  118 ++++++++++++++++++++++++++++++++++---------
 hw/qdev-dma.h |    4 +-
 3 files changed, 250 insertions(+), 27 deletions(-)

diff --git a/dma-helpers.c b/dma-helpers.c
index 2dc4691..b4ee827 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -9,6 +9,10 @@
 
 #include "dma.h"
 #include "trace.h"
+#include "range.h"
+#include "qemu-thread.h"
+
+/* #define DEBUG_IOMMU */
 
 void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint, DMAContext *dma)
 {
@@ -244,3 +248,154 @@ void dma_acct_start(BlockDriverState *bs, BlockAcctCookie *cookie,
 {
     bdrv_acct_start(bs, cookie, sg->size, type);
 }
+
+bool iommu_dma_memory_valid(DMAContext *dma, dma_addr_t addr, dma_addr_t len,
+                            DMADirection dir)
+{
+    target_phys_addr_t paddr, plen;
+
+#ifdef DEBUG_IOMMU
+    fprintf(stderr, "dma_memory_check context=%p addr=0x" DMA_ADDR_FMT
+            " len=0x" DMA_ADDR_FMT " dir=%d\n", dma, addr, len, dir);
+#endif
+
+    while (len) {
+        if (dma->translate(dma, addr, &paddr, &plen, dir) != 0) {
+            return false;
+        }
+
+        /* The translation might be valid for larger regions. */
+        if (plen > len) {
+            plen = len;
+        }
+
+        len -= plen;
+        addr += plen;
+    }
+
+    return true;
+}
+
+int iommu_dma_memory_rw(DMAContext *dma, dma_addr_t addr,
+                        void *buf, dma_addr_t len, DMADirection dir)
+{
+    target_phys_addr_t paddr, plen;
+    int err;
+
+#ifdef DEBUG_IOMMU
+    fprintf(stderr, "dma_memory_rw context=%p addr=0x" DMA_ADDR_FMT " len=0x"
+            DMA_ADDR_FMT " dir=%d\n", dma, addr, len, dir);
+#endif
+
+    while (len) {
+        err = dma->translate(dma, addr, &paddr, &plen, dir);
+        if (err) {
+            return -1;
+        }
+
+        /* The translation might be valid for larger regions. */
+        if (plen > len) {
+            plen = len;
+        }
+
+        cpu_physical_memory_rw(paddr, buf, plen,
+                               dir == DMA_DIRECTION_FROM_DEVICE);
+
+        len -= plen;
+        addr += plen;
+        buf += plen;
+    }
+
+    return 0;
+}
+
+int iommu_dma_memory_set(DMAContext *dma, dma_addr_t addr, uint8_t c,
+                         dma_addr_t len)
+{
+    target_phys_addr_t paddr, plen;
+    int err;
+
+#ifdef DEBUG_IOMMU
+    fprintf(stderr, "dma_memory_zero context=%p addr=0x" DMA_ADDR_FMT
+            " len=0x" DMA_ADDR_FMT "\n", dma, addr, len);
+#endif
+
+    while (len) {
+        err = dma->translate(dma, addr, &paddr, &plen,
+                             DMA_DIRECTION_FROM_DEVICE);
+        if (err) {
+            return err;
+        }
+
+        /* The translation might be valid for larger regions. */
+        if (plen > len) {
+            plen = len;
+        }
+
+        cpu_physical_memory_set(paddr, c, plen);
+
+        len -= plen;
+        addr += plen;
+    }
+
+    return 0;
+}
+
+void dma_context_init(DMAContext *dma, DMATranslateFunc translate,
+                      DMAMapFunc map, DMAUnmapFunc unmap)
+{
+#ifdef DEBUG_IOMMU
+    fprintf(stderr, "dma_context_init(%p, %p, %p, %p)\n",
+            dma, translate, map, unmap);
+#endif
+    dma->translate = translate;
+    dma->map = map;
+    dma->unmap = unmap;
+}
+
+void *iommu_dma_memory_map(DMAContext *dma, dma_addr_t addr, dma_addr_t *len,
+                           DMADirection dir)
+{
+    int err;
+    target_phys_addr_t paddr, plen;
+    void *buf;
+
+    if (dma->map) {
+        return dma->map(dma, addr, len, dir);
+    }
+
+    plen = *len;
+    err = dma->translate(dma, addr, &paddr, &plen, dir);
+    if (err) {
+        return NULL;
+    }
+
+    /*
+     * If this is true, the virtual region is contiguous,
+     * but the translated physical region isn't. We just
+     * clamp *len, much like cpu_physical_memory_map() does.
+     */
+    if (plen < *len) {
+        *len = plen;
+    }
+
+    buf = cpu_physical_memory_map(paddr, &plen,
+                                  dir == DMA_DIRECTION_FROM_DEVICE);
+    *len = plen;
+
+    return buf;
+}
+
+void iommu_dma_memory_unmap(DMAContext *dma, void *buffer, dma_addr_t len,
+                            DMADirection dir, dma_addr_t access_len)
+{
+    if (dma->unmap) {
+        dma->unmap(dma, buffer, len, dir, access_len);
+        return;
+    }
+
+    cpu_physical_memory_unmap(buffer, len,
+                              dir == DMA_DIRECTION_FROM_DEVICE,
+                              access_len);
+
+}
diff --git a/dma.h b/dma.h
index cd002c7..14fe17d 100644
--- a/dma.h
+++ b/dma.h
@@ -14,6 +14,7 @@
 #include "hw/hw.h"
 #include "block.h"
 
+typedef struct DMAContext DMAContext;
 typedef struct ScatterGatherEntry ScatterGatherEntry;
 
 typedef enum {
@@ -30,28 +31,74 @@ struct QEMUSGList {
 };
 
 #if defined(TARGET_PHYS_ADDR_BITS)
-typedef target_phys_addr_t dma_addr_t;
 
-#define DMA_ADDR_BITS TARGET_PHYS_ADDR_BITS
-#define DMA_ADDR_FMT TARGET_FMT_plx
+/*
+ * When an IOMMU is present, bus addresses become distinct from
+ * CPU/memory physical addresses and may be a different size.  Because
+ * the IOVA size depends more on the bus than on the platform, we more
+ * or less have to treat these as 64-bit always to cover all (or at
+ * least most) cases.
+ */
+typedef uint64_t dma_addr_t;
+
+#define DMA_ADDR_BITS 64
+#define DMA_ADDR_FMT "%" PRIx64
+
+typedef int DMATranslateFunc(DMAContext *dma,
+                             dma_addr_t addr,
+                             target_phys_addr_t *paddr,
+                             target_phys_addr_t *len,
+                             DMADirection dir);
+typedef void* DMAMapFunc(DMAContext *dma,
+                         dma_addr_t addr,
+                         dma_addr_t *len,
+                         DMADirection dir);
+typedef void DMAUnmapFunc(DMAContext *dma,
+                          void *buffer,
+                          dma_addr_t len,
+                          DMADirection dir,
+                          dma_addr_t access_len);
+
+typedef struct DMAContext {
+    DMATranslateFunc *translate;
+    DMAMapFunc *map;
+    DMAUnmapFunc *unmap;
+} DMAContext;
+
+static inline bool dma_has_iommu(DMAContext *dma)
+{
+    return !!dma;
+}
 
 /* Checks that the given range of addresses is valid for DMA.  This is
  * useful for certain cases, but usually you should just use
  * dma_memory_{read,write}() and check for errors */
-static inline bool dma_memory_valid(DMAContext *dma, dma_addr_t addr,
-                                    dma_addr_t len, DMADirection dir)
+bool iommu_dma_memory_valid(DMAContext *dma, dma_addr_t addr, dma_addr_t len,
+                            DMADirection dir);
+static inline bool dma_memory_valid(DMAContext *dma,
+                                    dma_addr_t addr, dma_addr_t len,
+                                    DMADirection dir)
 {
-    /* Stub version, with no iommu we assume all bus addresses are valid */
-    return true;
+    if (!dma_has_iommu(dma)) {
+        return true;
+    } else {
+        return iommu_dma_memory_valid(dma, addr, len, dir);
+    }
 }
 
+int iommu_dma_memory_rw(DMAContext *dma, dma_addr_t addr,
+                        void *buf, dma_addr_t len, DMADirection dir);
 static inline int dma_memory_rw(DMAContext *dma, dma_addr_t addr,
                                 void *buf, dma_addr_t len, DMADirection dir)
 {
-    /* Stub version when we have no iommu support */
-    cpu_physical_memory_rw(addr, buf, (target_phys_addr_t)len,
-                           dir == DMA_DIRECTION_FROM_DEVICE);
-    return 0;
+    if (!dma_has_iommu(dma)) {
+        /* Fast-path for no IOMMU */
+        cpu_physical_memory_rw(addr, buf, len,
+                               dir == DMA_DIRECTION_FROM_DEVICE);
+        return 0;
+    } else {
+        return iommu_dma_memory_rw(dma, addr, buf, len, dir);
+    }
 }
 
 static inline int dma_memory_read(DMAContext *dma, dma_addr_t addr,
@@ -67,34 +114,54 @@ static inline int dma_memory_write(DMAContext *dma, dma_addr_t addr,
                          DMA_DIRECTION_FROM_DEVICE);
 }
 
+int iommu_dma_memory_set(DMAContext *dma, dma_addr_t addr, uint8_t c,
+                         dma_addr_t len);
 static inline int dma_memory_set(DMAContext *dma, dma_addr_t addr,
                                  uint8_t c, dma_addr_t len)
 {
-    /* Stub version when we have no iommu support */
-    cpu_physical_memory_set(addr, c, len);
-    return 0;
+    if (!dma_has_iommu(dma)) {
+        /* Fast-path for no IOMMU */
+        cpu_physical_memory_set(addr, c, len);
+        return 0;
+    } else {
+        return iommu_dma_memory_set(dma, addr, c, len);
+    }
 }
 
+void *iommu_dma_memory_map(DMAContext *dma,
+                           dma_addr_t addr, dma_addr_t *len,
+                           DMADirection dir);
 static inline void *dma_memory_map(DMAContext *dma,
                                    dma_addr_t addr, dma_addr_t *len,
                                    DMADirection dir)
 {
-    target_phys_addr_t xlen = *len;
-    void *p;
-
-    p = cpu_physical_memory_map(addr, &xlen,
-                                dir == DMA_DIRECTION_FROM_DEVICE);
-    *len = xlen;
-    return p;
+    if (!dma_has_iommu(dma)) {
+        target_phys_addr_t xlen = *len;
+        void *p;
+
+        p = cpu_physical_memory_map(addr, &xlen,
+                                    dir == DMA_DIRECTION_FROM_DEVICE);
+        *len = xlen;
+        return p;
+    } else {
+        return iommu_dma_memory_map(dma, addr, len, dir);
+    }
 }
 
+void iommu_dma_memory_unmap(DMAContext *dma,
+                            void *buffer, dma_addr_t len,
+                            DMADirection dir, dma_addr_t access_len);
 static inline void dma_memory_unmap(DMAContext *dma,
                                     void *buffer, dma_addr_t len,
                                     DMADirection dir, dma_addr_t access_len)
 {
-    return cpu_physical_memory_unmap(buffer, (target_phys_addr_t)len,
-                                     dir == DMA_DIRECTION_FROM_DEVICE,
-                                     access_len);
+    if (!dma_has_iommu(dma)) {
+        return cpu_physical_memory_unmap(buffer, (target_phys_addr_t)len,
+                                         dir == DMA_DIRECTION_FROM_DEVICE,
+                                         access_len);
+    } else {
+        iommu_dma_memory_unmap(dma, buffer, len, dir, access_len);
+    }
 }
 
 #define DEFINE_LDST_DMA(_lname, _sname, _bits, _end) \
@@ -135,6 +202,9 @@ DEFINE_LDST_DMA(q, q, 64, be);
 
 #undef DEFINE_LDST_DMA
 
+void dma_context_init(DMAContext *dma, DMATranslateFunc translate,
+                      DMAMapFunc map, DMAUnmapFunc unmap);
+
 struct ScatterGatherEntry {
     dma_addr_t base;
     dma_addr_t len;
diff --git a/hw/qdev-dma.h b/hw/qdev-dma.h
index f0ff558..6812735 100644
--- a/hw/qdev-dma.h
+++ b/hw/qdev-dma.h
@@ -6,7 +6,5 @@
  * This work is licensed under the terms of the GNU GPL, version 2 or later.
  * See the COPYING file in the top-level directory.
  */
-#include "qdev-addr.h"
-
 #define DEFINE_PROP_DMAADDR(_n, _s, _f, _d)                               \
-    DEFINE_PROP_TADDR(_n, _s, _f, _d)
+    DEFINE_PROP_HEX64(_n, _s, _f, _d)
-- 
1.7.9.5


* [Qemu-devel] [PATCH 09/13] iommu: Add facility to cancel in-use dma memory maps
  2012-06-19  6:39 [Qemu-devel] [PATCH 00/13] iommu series Benjamin Herrenschmidt
                   ` (7 preceding siblings ...)
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 08/13] iommu: Introduce IOMMU emulation infrastructure Benjamin Herrenschmidt
@ 2012-06-19  6:39 ` Benjamin Herrenschmidt
  2012-06-20 21:25   ` Anthony Liguori
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 10/13] pseries: Convert sPAPR TCEs to use generic IOMMU infrastructure Benjamin Herrenschmidt
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-19  6:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: anthony, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

One new complication raised by IOMMU support, compared with handling DMA
directly to physical addresses, is the dma_memory_map() case (replacing
cpu_physical_memory_map()) in which the IOMMU translations for the
IOVAs covered by a map are invalidated or changed while the map
is active.  This should never happen with correct guest software, but
we do need to handle buggy guests.  This case might also occur during
handovers between different guest software stages if the handover
protocols aren't fully seamless.

The iommu implementation will have to wait for maps to be removed
before it can "complete" an invalidation of a translation, which
can take a long time. In order to make it possible to speed that
process up, we add a "Cancel" callback to the map function which
the clients can optionally provide.

The core makes no use of that, but the iommu backend implementation
may choose to keep track of maps and call the respective cancel
callback whenever a translation within a map is removed, allowing
the driver to do things like cancel async IOs etc.
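
As an illustration only, here is a minimal sketch of how a backend might
track live maps and fire the cancel callbacks when a translation is
removed.  Everything in it (TrackedMap, track_map(), untrack_map(),
invalidate_range(), the fixed-size table) is hypothetical and is not part
of this patch or of the QEMU API; it only assumes a callback of the same
shape as DMACancelMapFunc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; same shape as DMACancelMapFunc. */
typedef void CancelFunc(void *opaque);

typedef struct TrackedMap {
    uint64_t iova;       /* start of the mapped I/O virtual range */
    uint64_t len;        /* length of the range in bytes */
    CancelFunc *cancel;  /* NULL when the client provided no callback */
    void *opaque;        /* passed back to the callback unchanged */
    int active;          /* non-zero while the map is live */
} TrackedMap;

#define MAX_MAPS 8
static TrackedMap maps[MAX_MAPS];

/* Record a live map; returns its slot, or -1 if the table is full. */
static int track_map(uint64_t iova, uint64_t len,
                     CancelFunc *cancel, void *opaque)
{
    for (int i = 0; i < MAX_MAPS; i++) {
        if (!maps[i].active) {
            maps[i] = (TrackedMap){ iova, len, cancel, opaque, 1 };
            return i;
        }
    }
    return -1;
}

/* Forget a map once the client has unmapped it. */
static void untrack_map(int slot)
{
    assert(slot >= 0 && slot < MAX_MAPS);
    maps[slot].active = 0;
}

/*
 * Called when the translation for [iova, iova + len) is removed.
 * Invokes the cancel callback of every live map overlapping the range
 * and returns how many callbacks were invoked.
 */
static int invalidate_range(uint64_t iova, uint64_t len)
{
    int called = 0;

    for (int i = 0; i < MAX_MAPS; i++) {
        TrackedMap *m = &maps[i];

        if (m->active && m->cancel &&
            m->iova < iova + len && iova < m->iova + m->len) {
            m->cancel(m->opaque);
            called++;
        }
    }
    return called;
}

/* Example callback: counts how often it was asked to cancel. */
static void count_cancel(void *opaque)
{
    (*(int *)opaque)++;
}
```

Maps created without a callback are simply skipped here, so for those
the backend would still have to wait for the unmap before it can
complete the invalidation.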

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 dma-helpers.c |   49 ++++++++++++++++++++++++++++---------------------
 dma.h         |   23 +++++++++++++++++++----
 2 files changed, 47 insertions(+), 25 deletions(-)

diff --git a/dma-helpers.c b/dma-helpers.c
index b4ee827..6e6c7b3 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -107,6 +107,28 @@ static void dma_complete(DMAAIOCB *dbs, int ret)
     }
 }
 
+static void dma_aio_cancel(BlockDriverAIOCB *acb)
+{
+    DMAAIOCB *dbs = container_of(acb, DMAAIOCB, common);
+
+    trace_dma_aio_cancel(dbs);
+
+    if (dbs->acb) {
+        BlockDriverAIOCB *acb = dbs->acb;
+        dbs->acb = NULL;
+        dbs->in_cancel = true;
+        bdrv_aio_cancel(acb);
+        dbs->in_cancel = false;
+    }
+    dbs->common.cb = NULL;
+    dma_complete(dbs, 0);
+}
+
+static void dma_bdrv_cancel_cb(void *opaque)
+{
+    dma_aio_cancel(&((DMAAIOCB *)opaque)->common);
+}
+
 static void dma_bdrv_cb(void *opaque, int ret)
 {
     DMAAIOCB *dbs = (DMAAIOCB *)opaque;
@@ -127,7 +149,8 @@ static void dma_bdrv_cb(void *opaque, int ret)
     while (dbs->sg_cur_index < dbs->sg->nsg) {
         cur_addr = dbs->sg->sg[dbs->sg_cur_index].base + dbs->sg_cur_byte;
         cur_len = dbs->sg->sg[dbs->sg_cur_index].len - dbs->sg_cur_byte;
-        mem = dma_memory_map(dbs->sg->dma, cur_addr, &cur_len, dbs->dir);
+        mem = dma_memory_map_with_cancel(dbs->sg->dma, dma_bdrv_cancel_cb, dbs,
+                                         cur_addr, &cur_len, dbs->dir);
         if (!mem)
             break;
         qemu_iovec_add(&dbs->iov, mem, cur_len);
@@ -149,23 +172,6 @@ static void dma_bdrv_cb(void *opaque, int ret)
     assert(dbs->acb);
 }
 
-static void dma_aio_cancel(BlockDriverAIOCB *acb)
-{
-    DMAAIOCB *dbs = container_of(acb, DMAAIOCB, common);
-
-    trace_dma_aio_cancel(dbs);
-
-    if (dbs->acb) {
-        BlockDriverAIOCB *acb = dbs->acb;
-        dbs->acb = NULL;
-        dbs->in_cancel = true;
-        bdrv_aio_cancel(acb);
-        dbs->in_cancel = false;
-    }
-    dbs->common.cb = NULL;
-    dma_complete(dbs, 0);
-}
-
 static AIOPool dma_aio_pool = {
     .aiocb_size         = sizeof(DMAAIOCB),
     .cancel             = dma_aio_cancel,
@@ -353,7 +359,9 @@ void dma_context_init(DMAContext *dma, DMATranslateFunc translate,
     dma->unmap = unmap;
 }
 
-void *iommu_dma_memory_map(DMAContext *dma, dma_addr_t addr, dma_addr_t *len,
+void *iommu_dma_memory_map(DMAContext *dma,
+                           DMACancelMapFunc cb, void *cb_opaque,
+                           dma_addr_t addr, dma_addr_t *len,
                            DMADirection dir)
 {
     int err;
@@ -361,7 +369,7 @@ void *iommu_dma_memory_map(DMAContext *dma, dma_addr_t addr, dma_addr_t *len,
     void *buf;
 
     if (dma->map) {
-        return dma->map(dma, addr, len, dir);
+        return dma->map(dma, cb, cb_opaque, addr, len, dir);
     }
 
     plen = *len;
@@ -397,5 +405,4 @@ void iommu_dma_memory_unmap(DMAContext *dma, void *buffer, dma_addr_t len,
     cpu_physical_memory_unmap(buffer, len,
                               dir == DMA_DIRECTION_FROM_DEVICE,
                               access_len);
-
 }
diff --git a/dma.h b/dma.h
index 14fe17d..f1fcb71 100644
--- a/dma.h
+++ b/dma.h
@@ -49,10 +49,15 @@ typedef int DMATranslateFunc(DMAContext *dma,
                              target_phys_addr_t *paddr,
                              target_phys_addr_t *len,
                              DMADirection dir);
+
+typedef void DMACancelMapFunc(void *);
 typedef void* DMAMapFunc(DMAContext *dma,
+                         DMACancelMapFunc cb,
+                         void *cb_opaque,
                          dma_addr_t addr,
                          dma_addr_t *len,
                          DMADirection dir);
+
 typedef void DMAUnmapFunc(DMAContext *dma,
                           void *buffer,
                           dma_addr_t len,
@@ -129,11 +134,15 @@ static inline int dma_memory_set(DMAContext *dma, dma_addr_t addr,
 }
 
 void *iommu_dma_memory_map(DMAContext *dma,
+                           DMACancelMapFunc *cb, void *opaque,
                            dma_addr_t addr, dma_addr_t *len,
                            DMADirection dir);
-static inline void *dma_memory_map(DMAContext *dma,
-                                   dma_addr_t addr, dma_addr_t *len,
-                                   DMADirection dir)
+static inline void *dma_memory_map_with_cancel(DMAContext *dma,
+                                               DMACancelMapFunc *cb,
+                                               void *opaque,
+                                               dma_addr_t addr,
+                                               dma_addr_t *len,
+                                               DMADirection dir)
 {
     if (!dma_has_iommu(dma)) {
         target_phys_addr_t xlen = *len;
@@ -144,9 +153,15 @@ static inline void *dma_memory_map(DMAContext *dma,
         *len = xlen;
         return p;
     } else {
-        return iommu_dma_memory_map(dma, addr, len, dir);
+        return iommu_dma_memory_map(dma, cb, opaque, addr, len, dir);
     }
 }
+static inline void *dma_memory_map(DMAContext *dma,
+                                   dma_addr_t addr, dma_addr_t *len,
+                                   DMADirection dir)
+{
+    return dma_memory_map_with_cancel(dma, NULL, NULL, addr, len, dir);
+}
 
 void iommu_dma_memory_unmap(DMAContext *dma,
                             void *buffer, dma_addr_t len,
-- 
1.7.9.5


* [Qemu-devel] [PATCH 10/13] pseries: Convert sPAPR TCEs to use generic IOMMU infrastructure
  2012-06-19  6:39 [Qemu-devel] [PATCH 00/13] iommu series Benjamin Herrenschmidt
                   ` (8 preceding siblings ...)
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 09/13] iommu: Add facility to cancel in-use dma memory maps Benjamin Herrenschmidt
@ 2012-06-19  6:39 ` Benjamin Herrenschmidt
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 11/13] iommu: Allow PCI to use " Benjamin Herrenschmidt
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-19  6:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Graf, anthony, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

The pseries platform already contains an IOMMU implementation, since it is
essential for the platform's paravirtualized VIO devices.  This IOMMU
support is currently built into the implementation of the VIO "bus" and
the various VIO devices.

This patch converts this code to make use of the new common IOMMU
infrastructure.

We don't yet handle synchronization of map/unmap callbacks vs. invalidations;
this will require some complex interaction with the kernel and is not a
major concern at this stage.
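
For readers unfamiliar with TCE tables, the per-page lookup that the
translate hook performs can be sketched standalone as below.  This is a
simplified illustration modelled on spapr_tce_translate() in this patch,
with hypothetical names (tce_translate(), TCE_READ, TCE_WRITE) and a bare
uint64_t array standing in for the sPAPRTCE table; it assumes 4K TCE
pages and the low two TCE bits as read/write permissions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TCE_PAGE_SHIFT 12
#define TCE_PAGE_SIZE  (1ULL << TCE_PAGE_SHIFT)
#define TCE_PAGE_MASK  (TCE_PAGE_SIZE - 1)

/* Permission bits held in the low bits of each TCE (assumed layout). */
enum { TCE_READ = 1, TCE_WRITE = 2 };

/*
 * Translate one I/O address through a flat TCE table.
 * On success returns 0, stores the physical address in *paddr and the
 * number of bytes left until the end of the TCE page in *len.
 */
static int tce_translate(const uint64_t *table, uint64_t window_size,
                         uint64_t addr, int access,
                         uint64_t *paddr, uint64_t *len)
{
    uint64_t tce;

    assert(table && paddr && len);

    /* Reject addresses outside the DMA window. */
    if (addr >= window_size) {
        return -EFAULT;
    }

    tce = table[addr >> TCE_PAGE_SHIFT];

    /* The entry must grant the requested kind of access. */
    if (!(tce & access)) {
        return -EPERM;
    }

    /* A translation is only valid up to the end of the current page. */
    *len = ((~addr) & TCE_PAGE_MASK) + 1;

    /* Page frame comes from the TCE, page offset from the I/O address. */
    *paddr = (tce & ~TCE_PAGE_MASK) | (addr & TCE_PAGE_MASK);

    return 0;
}
```

Because the returned length never crosses a page boundary, a caller
wanting to transfer a larger buffer has to loop, translating again at
each page.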

Cc: Alex Graf <agraf@suse.de>

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 hw/ppc/Makefile.objs |    2 +-
 hw/spapr.c           |    3 +
 hw/spapr.h           |   16 +++
 hw/spapr_iommu.c     |  242 +++++++++++++++++++++++++++++++++++++++++++
 hw/spapr_llan.c      |   63 +++++------
 hw/spapr_vio.c       |  281 ++++----------------------------------------------
 hw/spapr_vio.h       |   73 ++++++-------
 hw/spapr_vscsi.c     |   26 ++---
 hw/spapr_vty.c       |    2 +-
 target-ppc/kvm.c     |    4 +-
 10 files changed, 369 insertions(+), 343 deletions(-)
 create mode 100644 hw/spapr_iommu.c

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index 44a1e8c..f573a95 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -10,7 +10,7 @@ obj-y += ppc_newworld.o
 # IBM pSeries (sPAPR)
 obj-$(CONFIG_PSERIES) += spapr.o spapr_hcall.o spapr_rtas.o spapr_vio.o
 obj-$(CONFIG_PSERIES) += xics.o spapr_vty.o spapr_llan.o spapr_vscsi.o
-obj-$(CONFIG_PSERIES) += spapr_pci.o pci-hotplug.o
+obj-$(CONFIG_PSERIES) += spapr_pci.o pci-hotplug.o spapr_iommu.o
 # PowerPC 4xx boards
 obj-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
 obj-y += ppc440_bamboo.o
diff --git a/hw/spapr.c b/hw/spapr.c
index d0bddbc..8bdf0d1 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -628,6 +628,9 @@ static void ppc_spapr_init(ram_addr_t ram_size,
     spapr->icp = xics_system_init(XICS_IRQS);
     spapr->next_irq = 16;
 
+    /* Set up IOMMU */
+    spapr_iommu_init();
+
     /* Set up VIO bus */
     spapr->vio_bus = spapr_vio_bus_init();
 
diff --git a/hw/spapr.h b/hw/spapr.h
index 654a7a8..df3e8b1 100644
--- a/hw/spapr.h
+++ b/hw/spapr.h
@@ -319,4 +319,20 @@ target_ulong spapr_rtas_call(sPAPREnvironment *spapr,
 int spapr_rtas_device_tree_setup(void *fdt, target_phys_addr_t rtas_addr,
                                  target_phys_addr_t rtas_size);
 
+#define SPAPR_TCE_PAGE_SHIFT   12
+#define SPAPR_TCE_PAGE_SIZE    (1ULL << SPAPR_TCE_PAGE_SHIFT)
+#define SPAPR_TCE_PAGE_MASK    (SPAPR_TCE_PAGE_SIZE - 1)
+
+typedef struct sPAPRTCE {
+    uint64_t tce;
+} sPAPRTCE;
+
+#define SPAPR_VIO_BASE_LIOBN    0x00000000
+
+void spapr_iommu_init(void);
+DMAContext *spapr_tce_new_dma_context(uint32_t liobn, size_t window_size);
+void spapr_tce_free(DMAContext *dma);
+int spapr_dma_dt(void *fdt, int node_off, const char *propname,
+                 DMAContext *dma);
+
 #endif /* !defined (__HW_SPAPR_H__) */
diff --git a/hw/spapr_iommu.c b/hw/spapr_iommu.c
new file mode 100644
index 0000000..5a769b9
--- /dev/null
+++ b/hw/spapr_iommu.c
@@ -0,0 +1,242 @@
+/*
+ * QEMU sPAPR IOMMU (TCE) code
+ *
+ * Copyright (c) 2010 David Gibson, IBM Corporation <dwg@au1.ibm.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+#include "hw.h"
+#include "kvm.h"
+#include "qdev.h"
+#include "kvm_ppc.h"
+#include "dma.h"
+
+#include "hw/spapr.h"
+
+#include <libfdt.h>
+
+/* #define DEBUG_TCE */
+
+enum sPAPRTCEAccess {
+    SPAPR_TCE_FAULT = 0,
+    SPAPR_TCE_RO = 1,
+    SPAPR_TCE_WO = 2,
+    SPAPR_TCE_RW = 3,
+};
+
+typedef struct sPAPRTCETable sPAPRTCETable;
+
+struct sPAPRTCETable {
+    DMAContext dma;
+    uint32_t liobn;
+    uint32_t window_size;
+    sPAPRTCE *table;
+    int fd;
+    QLIST_ENTRY(sPAPRTCETable) list;
+};
+
+
+QLIST_HEAD(spapr_tce_tables, sPAPRTCETable) spapr_tce_tables;
+
+static sPAPRTCETable *spapr_tce_find_by_liobn(uint32_t liobn)
+{
+    sPAPRTCETable *tcet;
+
+    QLIST_FOREACH(tcet, &spapr_tce_tables, list) {
+        if (tcet->liobn == liobn) {
+            return tcet;
+        }
+    }
+
+    return NULL;
+}
+
+static int spapr_tce_translate(DMAContext *dma,
+                               dma_addr_t addr,
+                               target_phys_addr_t *paddr,
+                               target_phys_addr_t *len,
+                               DMADirection dir)
+{
+    sPAPRTCETable *tcet = DO_UPCAST(sPAPRTCETable, dma, dma);
+    enum sPAPRTCEAccess access = (dir == DMA_DIRECTION_FROM_DEVICE)
+        ? SPAPR_TCE_WO : SPAPR_TCE_RO;
+    uint64_t tce;
+
+#ifdef DEBUG_TCE
+    fprintf(stderr, "spapr_tce_translate liobn=0x%" PRIx32 " addr=0x"
+            DMA_ADDR_FMT "\n", tcet->liobn, addr);
+#endif
+
+    /* Check if we are in bound */
+    if (addr >= tcet->window_size) {
+#ifdef DEBUG_TCE
+        fprintf(stderr, "spapr_tce_translate out of bounds\n");
+#endif
+        return -EFAULT;
+    }
+
+    tce = tcet->table[addr >> SPAPR_TCE_PAGE_SHIFT].tce;
+
+    /* Check TCE */
+    if (!(tce & access)) {
+        return -EPERM;
+    }
+
+    /* How much til end of page ? */
+    *len = ((~addr) & SPAPR_TCE_PAGE_MASK) + 1;
+
+    /* Translate */
+    *paddr = (tce & ~SPAPR_TCE_PAGE_MASK) |
+        (addr & SPAPR_TCE_PAGE_MASK);
+
+#ifdef DEBUG_TCE
+    fprintf(stderr, " ->  *paddr=0x" TARGET_FMT_plx ", *len=0x"
+            TARGET_FMT_plx "\n", *paddr, *len);
+#endif
+
+    return 0;
+}
+
+DMAContext *spapr_tce_new_dma_context(uint32_t liobn, size_t window_size)
+{
+    sPAPRTCETable *tcet;
+
+    if (!window_size) {
+        return NULL;
+    }
+
+    tcet = g_malloc0(sizeof(*tcet));
+    dma_context_init(&tcet->dma, spapr_tce_translate, NULL, NULL);
+
+    tcet->liobn = liobn;
+    tcet->window_size = window_size;
+
+    if (kvm_enabled()) {
+        tcet->table = kvmppc_create_spapr_tce(liobn,
+                                              window_size,
+                                              &tcet->fd);
+    }
+
+    if (!tcet->table) {
+        size_t table_size = (window_size >> SPAPR_TCE_PAGE_SHIFT)
+            * sizeof(sPAPRTCE);
+        tcet->table = g_malloc0(table_size);
+    }
+
+#ifdef DEBUG_TCE
+    fprintf(stderr, "spapr_iommu: New TCE table, liobn=0x%x, context @ %p, "
+            "table @ %p, fd=%d\n", liobn, &tcet->dma, tcet->table, tcet->fd);
+#endif
+
+    QLIST_INSERT_HEAD(&spapr_tce_tables, tcet, list);
+
+    return &tcet->dma;
+}
+
+void spapr_tce_free(DMAContext *dma)
+{
+
+    if (dma) {
+        sPAPRTCETable *tcet = DO_UPCAST(sPAPRTCETable, dma, dma);
+
+        QLIST_REMOVE(tcet, list);
+
+        if (!kvm_enabled() ||
+            (kvmppc_remove_spapr_tce(tcet->table, tcet->fd,
+                                     tcet->window_size) != 0)) {
+            g_free(tcet->table);
+        }
+
+        g_free(tcet);
+    }
+}
+
+
+static target_ulong h_put_tce(CPUPPCState *env, sPAPREnvironment *spapr,
+                              target_ulong opcode, target_ulong *args)
+{
+    target_ulong liobn = args[0];
+    target_ulong ioba = args[1];
+    target_ulong tce = args[2];
+    sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
+    sPAPRTCE *tcep;
+
+    if (liobn & 0xFFFFFFFF00000000ULL) {
+        hcall_dprintf("spapr_vio_put_tce on out-of-bounds LIOBN "
+                      TARGET_FMT_lx "\n", liobn);
+        return H_PARAMETER;
+    }
+    if (!tcet) {
+        hcall_dprintf("spapr_vio_put_tce on non-existent LIOBN "
+                      TARGET_FMT_lx "\n", liobn);
+        return H_PARAMETER;
+    }
+
+    ioba &= ~(SPAPR_TCE_PAGE_SIZE - 1);
+
+#ifdef DEBUG_TCE
+    fprintf(stderr, "spapr_vio_put_tce on liobn=" TARGET_FMT_lx /*%s*/
+            "  ioba 0x" TARGET_FMT_lx "  TCE 0x" TARGET_FMT_lx "\n",
+            liobn, /*dev->qdev.id, */ioba, tce);
+#endif
+
+    if (ioba >= tcet->window_size) {
+        hcall_dprintf("spapr_vio_put_tce on out-of-bounds IOBA 0x"
+                      TARGET_FMT_lx "\n", ioba);
+        return H_PARAMETER;
+    }
+
+    tcep = tcet->table + (ioba >> SPAPR_TCE_PAGE_SHIFT);
+    tcep->tce = tce;
+
+    return H_SUCCESS;
+}
+
+void spapr_iommu_init(void)
+{
+    QLIST_INIT(&spapr_tce_tables);
+
+    /* hcall-tce */
+    spapr_register_hypercall(H_PUT_TCE, h_put_tce);
+}
+
+int spapr_dma_dt(void *fdt, int node_off, const char *propname,
+                 DMAContext *dma)
+{
+    if (dma) {
+        sPAPRTCETable *tcet = DO_UPCAST(sPAPRTCETable, dma, dma);
+        uint32_t dma_prop[] = {cpu_to_be32(tcet->liobn),
+                               0, 0,
+                               0, cpu_to_be32(tcet->window_size)};
+        int ret;
+
+        ret = fdt_setprop_cell(fdt, node_off, "ibm,#dma-address-cells", 2);
+        if (ret < 0) {
+            return ret;
+        }
+
+        ret = fdt_setprop_cell(fdt, node_off, "ibm,#dma-size-cells", 2);
+        if (ret < 0) {
+            return ret;
+        }
+
+        ret = fdt_setprop(fdt, node_off, propname, dma_prop,
+                          sizeof(dma_prop));
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
diff --git a/hw/spapr_llan.c b/hw/spapr_llan.c
index 8313043..d26fe9f 100644
--- a/hw/spapr_llan.c
+++ b/hw/spapr_llan.c
@@ -71,7 +71,7 @@ typedef uint64_t vlan_bd_t;
 #define VLAN_RXQ_BD_OFF      0
 #define VLAN_FILTER_BD_OFF   8
 #define VLAN_RX_BDS_OFF      16
-#define VLAN_MAX_BUFS        ((SPAPR_VIO_TCE_PAGE_SIZE - VLAN_RX_BDS_OFF) / 8)
+#define VLAN_MAX_BUFS        ((SPAPR_TCE_PAGE_SIZE - VLAN_RX_BDS_OFF) / 8)
 
 typedef struct VIOsPAPRVLANDevice {
     VIOsPAPRDevice sdev;
@@ -95,7 +95,7 @@ static ssize_t spapr_vlan_receive(VLANClientState *nc, const uint8_t *buf,
 {
     VIOsPAPRDevice *sdev = DO_UPCAST(NICState, nc, nc)->opaque;
     VIOsPAPRVLANDevice *dev = (VIOsPAPRVLANDevice *)sdev;
-    vlan_bd_t rxq_bd = ldq_tce(sdev, dev->buf_list + VLAN_RXQ_BD_OFF);
+    vlan_bd_t rxq_bd = vio_ldq(sdev, dev->buf_list + VLAN_RXQ_BD_OFF);
     vlan_bd_t bd;
     int buf_ptr = dev->use_buf_ptr;
     uint64_t handle;
@@ -114,11 +114,11 @@ static ssize_t spapr_vlan_receive(VLANClientState *nc, const uint8_t *buf,
 
     do {
         buf_ptr += 8;
-        if (buf_ptr >= SPAPR_VIO_TCE_PAGE_SIZE) {
+        if (buf_ptr >= SPAPR_TCE_PAGE_SIZE) {
             buf_ptr = VLAN_RX_BDS_OFF;
         }
 
-        bd = ldq_tce(sdev, dev->buf_list + buf_ptr);
+        bd = vio_ldq(sdev, dev->buf_list + buf_ptr);
         dprintf("use_buf_ptr=%d bd=0x%016llx\n",
                 buf_ptr, (unsigned long long)bd);
     } while ((!(bd & VLAN_BD_VALID) || (VLAN_BD_LEN(bd) < (size + 8)))
@@ -132,12 +132,12 @@ static ssize_t spapr_vlan_receive(VLANClientState *nc, const uint8_t *buf,
     /* Remove the buffer from the pool */
     dev->rx_bufs--;
     dev->use_buf_ptr = buf_ptr;
-    stq_tce(sdev, dev->buf_list + dev->use_buf_ptr, 0);
+    vio_stq(sdev, dev->buf_list + dev->use_buf_ptr, 0);
 
     dprintf("Found buffer: ptr=%d num=%d\n", dev->use_buf_ptr, dev->rx_bufs);
 
     /* Transfer the packet data */
-    if (spapr_tce_dma_write(sdev, VLAN_BD_ADDR(bd) + 8, buf, size) < 0) {
+    if (spapr_vio_dma_write(sdev, VLAN_BD_ADDR(bd) + 8, buf, size) < 0) {
         return -1;
     }
 
@@ -149,23 +149,23 @@ static ssize_t spapr_vlan_receive(VLANClientState *nc, const uint8_t *buf,
         control ^= VLAN_RXQC_TOGGLE;
     }
 
-    handle = ldq_tce(sdev, VLAN_BD_ADDR(bd));
-    stq_tce(sdev, VLAN_BD_ADDR(rxq_bd) + dev->rxq_ptr + 8, handle);
-    stw_tce(sdev, VLAN_BD_ADDR(rxq_bd) + dev->rxq_ptr + 4, size);
-    sth_tce(sdev, VLAN_BD_ADDR(rxq_bd) + dev->rxq_ptr + 2, 8);
-    stb_tce(sdev, VLAN_BD_ADDR(rxq_bd) + dev->rxq_ptr, control);
+    handle = vio_ldq(sdev, VLAN_BD_ADDR(bd));
+    vio_stq(sdev, VLAN_BD_ADDR(rxq_bd) + dev->rxq_ptr + 8, handle);
+    vio_stl(sdev, VLAN_BD_ADDR(rxq_bd) + dev->rxq_ptr + 4, size);
+    vio_sth(sdev, VLAN_BD_ADDR(rxq_bd) + dev->rxq_ptr + 2, 8);
+    vio_stb(sdev, VLAN_BD_ADDR(rxq_bd) + dev->rxq_ptr, control);
 
     dprintf("wrote rxq entry (ptr=0x%llx): 0x%016llx 0x%016llx\n",
             (unsigned long long)dev->rxq_ptr,
-            (unsigned long long)ldq_tce(sdev, VLAN_BD_ADDR(rxq_bd) +
+            (unsigned long long)vio_ldq(sdev, VLAN_BD_ADDR(rxq_bd) +
                                         dev->rxq_ptr),
-            (unsigned long long)ldq_tce(sdev, VLAN_BD_ADDR(rxq_bd) +
+            (unsigned long long)vio_ldq(sdev, VLAN_BD_ADDR(rxq_bd) +
                                         dev->rxq_ptr + 8));
 
     dev->rxq_ptr += 16;
     if (dev->rxq_ptr >= VLAN_BD_LEN(rxq_bd)) {
         dev->rxq_ptr = 0;
-        stq_tce(sdev, dev->buf_list + VLAN_RXQ_BD_OFF, rxq_bd ^ VLAN_BD_TOGGLE);
+        vio_stq(sdev, dev->buf_list + VLAN_RXQ_BD_OFF, rxq_bd ^ VLAN_BD_TOGGLE);
     }
 
     if (sdev->signal_state & 1) {
@@ -254,8 +254,10 @@ static int check_bd(VIOsPAPRVLANDevice *dev, vlan_bd_t bd,
         return -1;
     }
 
-    if (spapr_vio_check_tces(&dev->sdev, VLAN_BD_ADDR(bd),
-                             VLAN_BD_LEN(bd), SPAPR_TCE_RW) != 0) {
+    if (!spapr_vio_dma_valid(&dev->sdev, VLAN_BD_ADDR(bd),
+                             VLAN_BD_LEN(bd), DMA_DIRECTION_FROM_DEVICE)
+        || !spapr_vio_dma_valid(&dev->sdev, VLAN_BD_ADDR(bd),
+                                VLAN_BD_LEN(bd), DMA_DIRECTION_TO_DEVICE)) {
         return -1;
     }
 
@@ -285,14 +287,14 @@ static target_ulong h_register_logical_lan(CPUPPCState *env,
         return H_RESOURCE;
     }
 
-    if (check_bd(dev, VLAN_VALID_BD(buf_list, SPAPR_VIO_TCE_PAGE_SIZE),
-                 SPAPR_VIO_TCE_PAGE_SIZE) < 0) {
+    if (check_bd(dev, VLAN_VALID_BD(buf_list, SPAPR_TCE_PAGE_SIZE),
+                 SPAPR_TCE_PAGE_SIZE) < 0) {
         hcall_dprintf("Bad buf_list 0x" TARGET_FMT_lx "\n", buf_list);
         return H_PARAMETER;
     }
 
-    filter_list_bd = VLAN_VALID_BD(filter_list, SPAPR_VIO_TCE_PAGE_SIZE);
-    if (check_bd(dev, filter_list_bd, SPAPR_VIO_TCE_PAGE_SIZE) < 0) {
+    filter_list_bd = VLAN_VALID_BD(filter_list, SPAPR_TCE_PAGE_SIZE);
+    if (check_bd(dev, filter_list_bd, SPAPR_TCE_PAGE_SIZE) < 0) {
         hcall_dprintf("Bad filter_list 0x" TARGET_FMT_lx "\n", filter_list);
         return H_PARAMETER;
     }
@@ -309,17 +311,17 @@ static target_ulong h_register_logical_lan(CPUPPCState *env,
     rec_queue &= ~VLAN_BD_TOGGLE;
 
     /* Initialize the buffer list */
-    stq_tce(sdev, buf_list, rec_queue);
-    stq_tce(sdev, buf_list + 8, filter_list_bd);
-    spapr_tce_dma_zero(sdev, buf_list + VLAN_RX_BDS_OFF,
-                       SPAPR_VIO_TCE_PAGE_SIZE - VLAN_RX_BDS_OFF);
+    vio_stq(sdev, buf_list, rec_queue);
+    vio_stq(sdev, buf_list + 8, filter_list_bd);
+    spapr_vio_dma_set(sdev, buf_list + VLAN_RX_BDS_OFF, 0,
+                      SPAPR_TCE_PAGE_SIZE - VLAN_RX_BDS_OFF);
     dev->add_buf_ptr = VLAN_RX_BDS_OFF - 8;
     dev->use_buf_ptr = VLAN_RX_BDS_OFF - 8;
     dev->rx_bufs = 0;
     dev->rxq_ptr = 0;
 
     /* Initialize the receive queue */
-    spapr_tce_dma_zero(sdev, VLAN_BD_ADDR(rec_queue), VLAN_BD_LEN(rec_queue));
+    spapr_vio_dma_set(sdev, VLAN_BD_ADDR(rec_queue), 0, VLAN_BD_LEN(rec_queue));
 
     dev->isopen = 1;
     return H_SUCCESS;
@@ -378,14 +380,14 @@ static target_ulong h_add_logical_lan_buffer(CPUPPCState *env,
 
     do {
         dev->add_buf_ptr += 8;
-        if (dev->add_buf_ptr >= SPAPR_VIO_TCE_PAGE_SIZE) {
+        if (dev->add_buf_ptr >= SPAPR_TCE_PAGE_SIZE) {
             dev->add_buf_ptr = VLAN_RX_BDS_OFF;
         }
 
-        bd = ldq_tce(sdev, dev->buf_list + dev->add_buf_ptr);
+        bd = vio_ldq(sdev, dev->buf_list + dev->add_buf_ptr);
     } while (bd & VLAN_BD_VALID);
 
-    stq_tce(sdev, dev->buf_list + dev->add_buf_ptr, buf);
+    vio_stq(sdev, dev->buf_list + dev->add_buf_ptr, buf);
 
     dev->rx_bufs++;
 
@@ -451,7 +453,7 @@ static target_ulong h_send_logical_lan(CPUPPCState *env, sPAPREnvironment *spapr
     lbuf = alloca(total_len);
     p = lbuf;
     for (i = 0; i < nbufs; i++) {
-        ret = spapr_tce_dma_read(sdev, VLAN_BD_ADDR(bufs[i]),
+        ret = spapr_vio_dma_read(sdev, VLAN_BD_ADDR(bufs[i]),
                                  p, VLAN_BD_LEN(bufs[i]));
         if (ret < 0) {
             return ret;
@@ -479,7 +481,7 @@ static target_ulong h_multicast_ctrl(CPUPPCState *env, sPAPREnvironment *spapr,
 }
 
 static Property spapr_vlan_properties[] = {
-    DEFINE_SPAPR_PROPERTIES(VIOsPAPRVLANDevice, sdev, 0x10000000),
+    DEFINE_SPAPR_PROPERTIES(VIOsPAPRVLANDevice, sdev),
     DEFINE_NIC_PROPERTIES(VIOsPAPRVLANDevice, nicconf),
     DEFINE_PROP_END_OF_LIST(),
 };
@@ -497,6 +499,7 @@ static void spapr_vlan_class_init(ObjectClass *klass, void *data)
     k->dt_compatible = "IBM,l-lan";
     k->signal_mask = 0x1;
     dc->props = spapr_vlan_properties;
+    k->rtce_window_size = 0x10000000;
 }
 
 static TypeInfo spapr_vlan_info = {
diff --git a/hw/spapr_vio.c b/hw/spapr_vio.c
index c8271c6..05b5503 100644
--- a/hw/spapr_vio.c
+++ b/hw/spapr_vio.c
@@ -39,7 +39,6 @@
 #endif /* CONFIG_FDT */
 
 /* #define DEBUG_SPAPR */
-/* #define DEBUG_TCE */
 
 #ifdef DEBUG_SPAPR
 #define dprintf(fmt, ...) \
@@ -143,26 +142,9 @@ static int vio_make_devnode(VIOsPAPRDevice *dev,
         }
     }
 
-    if (dev->rtce_window_size) {
-        uint32_t dma_prop[] = {cpu_to_be32(dev->reg),
-                               0, 0,
-                               0, cpu_to_be32(dev->rtce_window_size)};
-
-        ret = fdt_setprop_cell(fdt, node_off, "ibm,#dma-address-cells", 2);
-        if (ret < 0) {
-            return ret;
-        }
-
-        ret = fdt_setprop_cell(fdt, node_off, "ibm,#dma-size-cells", 2);
-        if (ret < 0) {
-            return ret;
-        }
-
-        ret = fdt_setprop(fdt, node_off, "ibm,my-dma-window", dma_prop,
-                          sizeof(dma_prop));
-        if (ret < 0) {
-            return ret;
-        }
+    ret = spapr_dma_dt(fdt, node_off, "ibm,my-dma-window", dev->dma);
+    if (ret < 0) {
+        return ret;
     }
 
     if (pc->devnode) {
@@ -177,232 +159,6 @@ static int vio_make_devnode(VIOsPAPRDevice *dev,
 #endif /* CONFIG_FDT */
 
 /*
- * RTCE handling
- */
-
-static void rtce_init(VIOsPAPRDevice *dev)
-{
-    size_t size = (dev->rtce_window_size >> SPAPR_VIO_TCE_PAGE_SHIFT)
-        * sizeof(VIOsPAPR_RTCE);
-
-    if (size) {
-        dev->rtce_table = kvmppc_create_spapr_tce(dev->reg,
-                                                  dev->rtce_window_size,
-                                                  &dev->kvmtce_fd);
-
-        if (!dev->rtce_table) {
-            dev->rtce_table = g_malloc0(size);
-        }
-    }
-}
-
-static target_ulong h_put_tce(CPUPPCState *env, sPAPREnvironment *spapr,
-                              target_ulong opcode, target_ulong *args)
-{
-    target_ulong liobn = args[0];
-    target_ulong ioba = args[1];
-    target_ulong tce = args[2];
-    VIOsPAPRDevice *dev = spapr_vio_find_by_reg(spapr->vio_bus, liobn);
-    VIOsPAPR_RTCE *rtce;
-
-    if (!dev) {
-        hcall_dprintf("LIOBN 0x" TARGET_FMT_lx " does not exist\n", liobn);
-        return H_PARAMETER;
-    }
-
-    ioba &= ~(SPAPR_VIO_TCE_PAGE_SIZE - 1);
-
-#ifdef DEBUG_TCE
-    fprintf(stderr, "spapr_vio_put_tce on %s  ioba 0x" TARGET_FMT_lx
-            "  TCE 0x" TARGET_FMT_lx "\n", dev->qdev.id, ioba, tce);
-#endif
-
-    if (ioba >= dev->rtce_window_size) {
-        hcall_dprintf("Out-of-bounds IOBA 0x" TARGET_FMT_lx "\n", ioba);
-        return H_PARAMETER;
-    }
-
-    rtce = dev->rtce_table + (ioba >> SPAPR_VIO_TCE_PAGE_SHIFT);
-    rtce->tce = tce;
-
-    return H_SUCCESS;
-}
-
-int spapr_vio_check_tces(VIOsPAPRDevice *dev, target_ulong ioba,
-                         target_ulong len, enum VIOsPAPR_TCEAccess access)
-{
-    int start, end, i;
-
-    start = ioba >> SPAPR_VIO_TCE_PAGE_SHIFT;
-    end = (ioba + len - 1) >> SPAPR_VIO_TCE_PAGE_SHIFT;
-
-    for (i = start; i <= end; i++) {
-        if ((dev->rtce_table[i].tce & access) != access) {
-#ifdef DEBUG_TCE
-            fprintf(stderr, "FAIL on %d\n", i);
-#endif
-            return -1;
-        }
-    }
-
-    return 0;
-}
-
-int spapr_tce_dma_write(VIOsPAPRDevice *dev, uint64_t taddr, const void *buf,
-                        uint32_t size)
-{
-#ifdef DEBUG_TCE
-    fprintf(stderr, "spapr_tce_dma_write taddr=0x%llx size=0x%x\n",
-            (unsigned long long)taddr, size);
-#endif
-
-    /* Check for bypass */
-    if (dev->flags & VIO_PAPR_FLAG_DMA_BYPASS) {
-        cpu_physical_memory_write(taddr, buf, size);
-        return 0;
-    }
-
-    while (size) {
-        uint64_t tce;
-        uint32_t lsize;
-        uint64_t txaddr;
-
-        /* Check if we are in bound */
-        if (taddr >= dev->rtce_window_size) {
-#ifdef DEBUG_TCE
-            fprintf(stderr, "spapr_tce_dma_write out of bounds\n");
-#endif
-            return H_DEST_PARM;
-        }
-        tce = dev->rtce_table[taddr >> SPAPR_VIO_TCE_PAGE_SHIFT].tce;
-
-        /* How much til end of page ? */
-        lsize = MIN(size, ((~taddr) & SPAPR_VIO_TCE_PAGE_MASK) + 1);
-
-        /* Check TCE */
-        if (!(tce & 2)) {
-            return H_DEST_PARM;
-        }
-
-        /* Translate */
-        txaddr = (tce & ~SPAPR_VIO_TCE_PAGE_MASK) |
-            (taddr & SPAPR_VIO_TCE_PAGE_MASK);
-
-#ifdef DEBUG_TCE
-        fprintf(stderr, " -> write to txaddr=0x%llx, size=0x%x\n",
-                (unsigned long long)txaddr, lsize);
-#endif
-
-        /* Do it */
-        cpu_physical_memory_write(txaddr, buf, lsize);
-        buf += lsize;
-        taddr += lsize;
-        size -= lsize;
-    }
-    return 0;
-}
-
-int spapr_tce_dma_zero(VIOsPAPRDevice *dev, uint64_t taddr, uint32_t size)
-{
-    /* FIXME: allocating a temp buffer is nasty, but just stepping
-     * through writing zeroes is awkward.  This will do for now. */
-    uint8_t zeroes[size];
-
-#ifdef DEBUG_TCE
-    fprintf(stderr, "spapr_tce_dma_zero taddr=0x%llx size=0x%x\n",
-            (unsigned long long)taddr, size);
-#endif
-
-    memset(zeroes, 0, size);
-    return spapr_tce_dma_write(dev, taddr, zeroes, size);
-}
-
-void stb_tce(VIOsPAPRDevice *dev, uint64_t taddr, uint8_t val)
-{
-    spapr_tce_dma_write(dev, taddr, &val, sizeof(val));
-}
-
-void sth_tce(VIOsPAPRDevice *dev, uint64_t taddr, uint16_t val)
-{
-    val = tswap16(val);
-    spapr_tce_dma_write(dev, taddr, &val, sizeof(val));
-}
-
-
-void stw_tce(VIOsPAPRDevice *dev, uint64_t taddr, uint32_t val)
-{
-    val = tswap32(val);
-    spapr_tce_dma_write(dev, taddr, &val, sizeof(val));
-}
-
-void stq_tce(VIOsPAPRDevice *dev, uint64_t taddr, uint64_t val)
-{
-    val = tswap64(val);
-    spapr_tce_dma_write(dev, taddr, &val, sizeof(val));
-}
-
-int spapr_tce_dma_read(VIOsPAPRDevice *dev, uint64_t taddr, void *buf,
-                       uint32_t size)
-{
-#ifdef DEBUG_TCE
-    fprintf(stderr, "spapr_tce_dma_write taddr=0x%llx size=0x%x\n",
-            (unsigned long long)taddr, size);
-#endif
-
-    /* Check for bypass */
-    if (dev->flags & VIO_PAPR_FLAG_DMA_BYPASS) {
-        cpu_physical_memory_read(taddr, buf, size);
-        return 0;
-    }
-
-    while (size) {
-        uint64_t tce;
-        uint32_t lsize;
-        uint64_t txaddr;
-
-        /* Check if we are in bound */
-        if (taddr >= dev->rtce_window_size) {
-#ifdef DEBUG_TCE
-            fprintf(stderr, "spapr_tce_dma_read out of bounds\n");
-#endif
-            return H_DEST_PARM;
-        }
-        tce = dev->rtce_table[taddr >> SPAPR_VIO_TCE_PAGE_SHIFT].tce;
-
-        /* How much til end of page ? */
-        lsize = MIN(size, ((~taddr) & SPAPR_VIO_TCE_PAGE_MASK) + 1);
-
-        /* Check TCE */
-        if (!(tce & 1)) {
-            return H_DEST_PARM;
-        }
-
-        /* Translate */
-        txaddr = (tce & ~SPAPR_VIO_TCE_PAGE_MASK) |
-            (taddr & SPAPR_VIO_TCE_PAGE_MASK);
-
-#ifdef DEBUG_TCE
-        fprintf(stderr, " -> write to txaddr=0x%llx, size=0x%x\n",
-                (unsigned long long)txaddr, lsize);
-#endif
-        /* Do it */
-        cpu_physical_memory_read(txaddr, buf, lsize);
-        buf += lsize;
-        taddr += lsize;
-        size -= lsize;
-    }
-    return H_SUCCESS;
-}
-
-uint64_t ldq_tce(VIOsPAPRDevice *dev, uint64_t taddr)
-{
-    uint64_t val;
-
-    spapr_tce_dma_read(dev, taddr, &val, sizeof(val));
-    return tswap64(val);
-}
-
-/*
  * CRQ handling
  */
 static target_ulong h_reg_crq(CPUPPCState *env, sPAPREnvironment *spapr,
@@ -526,7 +282,7 @@ int spapr_vio_send_crq(VIOsPAPRDevice *dev, uint8_t *crq)
     }
 
     /* Maybe do a fast path for KVM just writing to the pages */
-    rc = spapr_tce_dma_read(dev, dev->crq.qladdr + dev->crq.qnext, &byte, 1);
+    rc = spapr_vio_dma_read(dev, dev->crq.qladdr + dev->crq.qnext, &byte, 1);
     if (rc) {
         return rc;
     }
@@ -534,7 +290,7 @@ int spapr_vio_send_crq(VIOsPAPRDevice *dev, uint8_t *crq)
         return 1;
     }
 
-    rc = spapr_tce_dma_write(dev, dev->crq.qladdr + dev->crq.qnext + 8,
+    rc = spapr_vio_dma_write(dev, dev->crq.qladdr + dev->crq.qnext + 8,
                              &crq[8], 8);
     if (rc) {
         return rc;
@@ -542,7 +298,7 @@ int spapr_vio_send_crq(VIOsPAPRDevice *dev, uint8_t *crq)
 
     kvmppc_eieio();
 
-    rc = spapr_tce_dma_write(dev, dev->crq.qladdr + dev->crq.qnext, crq, 8);
+    rc = spapr_vio_dma_write(dev, dev->crq.qladdr + dev->crq.qnext, crq, 8);
     if (rc) {
         return rc;
     }
@@ -560,13 +316,13 @@ int spapr_vio_send_crq(VIOsPAPRDevice *dev, uint8_t *crq)
 
 static void spapr_vio_quiesce_one(VIOsPAPRDevice *dev)
 {
-    dev->flags &= ~VIO_PAPR_FLAG_DMA_BYPASS;
+    VIOsPAPRDeviceClass *pc = VIO_SPAPR_DEVICE_GET_CLASS(dev);
+    uint32_t liobn = SPAPR_VIO_BASE_LIOBN | dev->reg;
 
-    if (dev->rtce_table) {
-        size_t size = (dev->rtce_window_size >> SPAPR_VIO_TCE_PAGE_SHIFT)
-            * sizeof(VIOsPAPR_RTCE);
-        memset(dev->rtce_table, 0, size);
+    if (dev->dma) {
+        spapr_tce_free(dev->dma);
     }
+    dev->dma = spapr_tce_new_dma_context(liobn, pc->rtce_window_size);
 
     dev->crq.qladdr = 0;
     dev->crq.qsize = 0;
@@ -593,9 +349,13 @@ static void rtas_set_tce_bypass(sPAPREnvironment *spapr, uint32_t token,
         return;
     }
     if (enable) {
-        dev->flags |= VIO_PAPR_FLAG_DMA_BYPASS;
+        spapr_tce_free(dev->dma);
+        dev->dma = NULL;
     } else {
-        dev->flags &= ~VIO_PAPR_FLAG_DMA_BYPASS;
+        VIOsPAPRDeviceClass *pc = VIO_SPAPR_DEVICE_GET_CLASS(dev);
+        uint32_t liobn = SPAPR_VIO_BASE_LIOBN | dev->reg;
+
+        dev->dma = spapr_tce_new_dma_context(liobn, pc->rtce_window_size);
     }
 
     rtas_st(rets, 0, 0);
@@ -662,6 +422,7 @@ static int spapr_vio_busdev_init(DeviceState *qdev)
 {
     VIOsPAPRDevice *dev = (VIOsPAPRDevice *)qdev;
     VIOsPAPRDeviceClass *pc = VIO_SPAPR_DEVICE_GET_CLASS(dev);
+    uint32_t liobn;
     char *id;
 
     if (dev->reg != -1) {
@@ -703,7 +464,8 @@ static int spapr_vio_busdev_init(DeviceState *qdev)
         return -1;
     }
 
-    rtce_init(dev);
+    liobn = SPAPR_VIO_BASE_LIOBN | dev->reg;
+    dev->dma = spapr_tce_new_dma_context(liobn, pc->rtce_window_size);
 
     return pc->init(dev);
 }
@@ -751,9 +513,6 @@ VIOsPAPRBus *spapr_vio_bus_init(void)
     /* hcall-vio */
     spapr_register_hypercall(H_VIO_SIGNAL, h_vio_signal);
 
-    /* hcall-tce */
-    spapr_register_hypercall(H_PUT_TCE, h_put_tce);
-
     /* hcall-crq */
     spapr_register_hypercall(H_REG_CRQ, h_reg_crq);
     spapr_register_hypercall(H_FREE_CRQ, h_free_crq);
diff --git a/hw/spapr_vio.h b/hw/spapr_vio.h
index 2adad77..6f9a498 100644
--- a/hw/spapr_vio.h
+++ b/hw/spapr_vio.h
@@ -21,16 +21,7 @@
  * License along with this library; if not, see <http://www.gnu.org/licenses/>.
  */
 
-#define SPAPR_VIO_TCE_PAGE_SHIFT   12
-#define SPAPR_VIO_TCE_PAGE_SIZE    (1ULL << SPAPR_VIO_TCE_PAGE_SHIFT)
-#define SPAPR_VIO_TCE_PAGE_MASK    (SPAPR_VIO_TCE_PAGE_SIZE - 1)
-
-enum VIOsPAPR_TCEAccess {
-    SPAPR_TCE_FAULT = 0,
-    SPAPR_TCE_RO = 1,
-    SPAPR_TCE_WO = 2,
-    SPAPR_TCE_RW = 3,
-};
+#include "dma.h"
 
 #define TYPE_VIO_SPAPR_DEVICE "vio-spapr-device"
 #define VIO_SPAPR_DEVICE(obj) \
@@ -45,10 +36,6 @@ enum VIOsPAPR_TCEAccess {
 
 struct VIOsPAPRDevice;
 
-typedef struct VIOsPAPR_RTCE {
-    uint64_t tce;
-} VIOsPAPR_RTCE;
-
 typedef struct VIOsPAPR_CRQ {
     uint64_t qladdr;
     uint32_t qsize;
@@ -64,6 +51,7 @@ typedef struct VIOsPAPRDeviceClass {
 
     const char *dt_name, *dt_type, *dt_compatible;
     target_ulong signal_mask;
+    uint32_t rtce_window_size;
     int (*init)(VIOsPAPRDevice *dev);
     void (*reset)(VIOsPAPRDevice *dev);
     int (*devnode)(VIOsPAPRDevice *dev, void *fdt, int node_off);
@@ -73,20 +61,15 @@ struct VIOsPAPRDevice {
     DeviceState qdev;
     uint32_t reg;
     uint32_t flags;
-#define VIO_PAPR_FLAG_DMA_BYPASS        0x1
     qemu_irq qirq;
     uint32_t vio_irq_num;
     target_ulong signal_state;
-    uint32_t rtce_window_size;
-    VIOsPAPR_RTCE *rtce_table;
-    int kvmtce_fd;
     VIOsPAPR_CRQ crq;
+    DMAContext *dma;
 };
 
-#define DEFINE_SPAPR_PROPERTIES(type, field, default_dma_window)       \
-        DEFINE_PROP_UINT32("reg", type, field.reg, -1),                \
-        DEFINE_PROP_UINT32("dma-window", type, field.rtce_window_size, \
-                           default_dma_window)
+#define DEFINE_SPAPR_PROPERTIES(type, field)           \
+        DEFINE_PROP_UINT32("reg", type, field.reg, -1)
 
 struct VIOsPAPRBus {
     BusState bus;
@@ -102,20 +85,38 @@ extern int spapr_populate_chosen_stdout(void *fdt, VIOsPAPRBus *bus);
 
 extern int spapr_vio_signal(VIOsPAPRDevice *dev, target_ulong mode);
 
-int spapr_vio_check_tces(VIOsPAPRDevice *dev, target_ulong ioba,
-                         target_ulong len,
-                         enum VIOsPAPR_TCEAccess access);
-
-int spapr_tce_dma_read(VIOsPAPRDevice *dev, uint64_t taddr,
-                       void *buf, uint32_t size);
-int spapr_tce_dma_write(VIOsPAPRDevice *dev, uint64_t taddr,
-                        const void *buf, uint32_t size);
-int spapr_tce_dma_zero(VIOsPAPRDevice *dev, uint64_t taddr, uint32_t size);
-void stb_tce(VIOsPAPRDevice *dev, uint64_t taddr, uint8_t val);
-void sth_tce(VIOsPAPRDevice *dev, uint64_t taddr, uint16_t val);
-void stw_tce(VIOsPAPRDevice *dev, uint64_t taddr, uint32_t val);
-void stq_tce(VIOsPAPRDevice *dev, uint64_t taddr, uint64_t val);
-uint64_t ldq_tce(VIOsPAPRDevice *dev, uint64_t taddr);
+static inline bool spapr_vio_dma_valid(VIOsPAPRDevice *dev, uint64_t taddr,
+                                       uint32_t size, DMADirection dir)
+{
+    return dma_memory_valid(dev->dma, taddr, size, dir);
+}
+
+static inline int spapr_vio_dma_read(VIOsPAPRDevice *dev, uint64_t taddr,
+                                     void *buf, uint32_t size)
+{
+    return (dma_memory_read(dev->dma, taddr, buf, size) != 0) ?
+        H_DEST_PARM : H_SUCCESS;
+}
+
+static inline int spapr_vio_dma_write(VIOsPAPRDevice *dev, uint64_t taddr,
+                                      const void *buf, uint32_t size)
+{
+    return (dma_memory_write(dev->dma, taddr, buf, size) != 0) ?
+        H_DEST_PARM : H_SUCCESS;
+}
+
+static inline int spapr_vio_dma_set(VIOsPAPRDevice *dev, uint64_t taddr,
+                                    uint8_t c, uint32_t size)
+{
+    return (dma_memory_set(dev->dma, taddr, c, size) != 0) ?
+        H_DEST_PARM : H_SUCCESS;
+}
+
+#define vio_stb(_dev, _addr, _val) (stb_dma((_dev)->dma, (_addr), (_val)))
+#define vio_sth(_dev, _addr, _val) (stw_be_dma((_dev)->dma, (_addr), (_val)))
+#define vio_stl(_dev, _addr, _val) (stl_be_dma((_dev)->dma, (_addr), (_val)))
+#define vio_stq(_dev, _addr, _val) (stq_be_dma((_dev)->dma, (_addr), (_val)))
+#define vio_ldq(_dev, _addr) (ldq_be_dma((_dev)->dma, (_addr)))
 
 int spapr_vio_send_crq(VIOsPAPRDevice *dev, uint8_t *crq);
 
diff --git a/hw/spapr_vscsi.c b/hw/spapr_vscsi.c
index 037867a..d2fe3e5 100644
--- a/hw/spapr_vscsi.c
+++ b/hw/spapr_vscsi.c
@@ -165,7 +165,7 @@ static int vscsi_send_iu(VSCSIState *s, vscsi_req *req,
     long rc, rc1;
 
     /* First copy the SRP */
-    rc = spapr_tce_dma_write(&s->vdev, req->crq.s.IU_data_ptr,
+    rc = spapr_vio_dma_write(&s->vdev, req->crq.s.IU_data_ptr,
                              &req->iu, length);
     if (rc) {
         fprintf(stderr, "vscsi_send_iu: DMA write failure !\n");
@@ -281,9 +281,9 @@ static int vscsi_srp_direct_data(VSCSIState *s, vscsi_req *req,
     llen = MIN(len, md->len);
     if (llen) {
         if (req->writing) { /* writing = to device = reading from memory */
-            rc = spapr_tce_dma_read(&s->vdev, md->va, buf, llen);
+            rc = spapr_vio_dma_read(&s->vdev, md->va, buf, llen);
         } else {
-            rc = spapr_tce_dma_write(&s->vdev, md->va, buf, llen);
+            rc = spapr_vio_dma_write(&s->vdev, md->va, buf, llen);
         }
     }
     md->len -= llen;
@@ -329,10 +329,11 @@ static int vscsi_srp_indirect_data(VSCSIState *s, vscsi_req *req,
             md = req->cur_desc = &req->ext_desc;
             dprintf("VSCSI:   Reading desc from 0x%llx\n",
                     (unsigned long long)td->va);
-            rc = spapr_tce_dma_read(&s->vdev, td->va, md,
+            rc = spapr_vio_dma_read(&s->vdev, td->va, md,
                                     sizeof(struct srp_direct_buf));
             if (rc) {
-                dprintf("VSCSI: tce_dma_read -> %d reading ext_desc\n", rc);
+                dprintf("VSCSI: spapr_vio_dma_read -> %d reading ext_desc\n",
+                        rc);
                 break;
             }
             vscsi_swap_desc(md);
@@ -345,12 +346,12 @@ static int vscsi_srp_indirect_data(VSCSIState *s, vscsi_req *req,
         /* Perform transfer */
         llen = MIN(len, md->len);
         if (req->writing) { /* writing = to device = reading from memory */
-            rc = spapr_tce_dma_read(&s->vdev, md->va, buf, llen);
+            rc = spapr_vio_dma_read(&s->vdev, md->va, buf, llen);
         } else {
-            rc = spapr_tce_dma_write(&s->vdev, md->va, buf, llen);
+            rc = spapr_vio_dma_write(&s->vdev, md->va, buf, llen);
         }
         if (rc) {
-            dprintf("VSCSI: tce_dma_r/w(%d) -> %d\n", req->writing, rc);
+            dprintf("VSCSI: spapr_vio_dma_r/w(%d) -> %d\n", req->writing, rc);
             break;
         }
         dprintf("VSCSI:     data: %02x %02x %02x %02x...\n",
@@ -728,7 +729,7 @@ static int vscsi_send_adapter_info(VSCSIState *s, vscsi_req *req)
     sinfo = &req->iu.mad.adapter_info;
 
 #if 0 /* What for ? */
-    rc = spapr_tce_dma_read(&s->vdev, be64_to_cpu(sinfo->buffer),
+    rc = spapr_vio_dma_read(&s->vdev, be64_to_cpu(sinfo->buffer),
                             &info, be16_to_cpu(sinfo->common.length));
     if (rc) {
         fprintf(stderr, "vscsi_send_adapter_info: DMA read failure !\n");
@@ -742,7 +743,7 @@ static int vscsi_send_adapter_info(VSCSIState *s, vscsi_req *req)
     info.os_type = cpu_to_be32(2);
     info.port_max_txu[0] = cpu_to_be32(VSCSI_MAX_SECTORS << 9);
 
-    rc = spapr_tce_dma_write(&s->vdev, be64_to_cpu(sinfo->buffer),
+    rc = spapr_vio_dma_write(&s->vdev, be64_to_cpu(sinfo->buffer),
                              &info, be16_to_cpu(sinfo->common.length));
     if (rc)  {
         fprintf(stderr, "vscsi_send_adapter_info: DMA write failure !\n");
@@ -804,7 +805,7 @@ static void vscsi_got_payload(VSCSIState *s, vscsi_crq *crq)
     }
 
     /* XXX Handle failure differently ? */
-    if (spapr_tce_dma_read(&s->vdev, crq->s.IU_data_ptr, &req->iu,
+    if (spapr_vio_dma_read(&s->vdev, crq->s.IU_data_ptr, &req->iu,
                            crq->s.IU_length)) {
         fprintf(stderr, "vscsi_got_payload: DMA read failure !\n");
         g_free(req);
@@ -945,7 +946,7 @@ static int spapr_vscsi_devnode(VIOsPAPRDevice *dev, void *fdt, int node_off)
 }
 
 static Property spapr_vscsi_properties[] = {
-    DEFINE_SPAPR_PROPERTIES(VSCSIState, vdev, 0x10000000),
+    DEFINE_SPAPR_PROPERTIES(VSCSIState, vdev),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -962,6 +963,7 @@ static void spapr_vscsi_class_init(ObjectClass *klass, void *data)
     k->dt_compatible = "IBM,v-scsi";
     k->signal_mask = 0x00000001;
     dc->props = spapr_vscsi_properties;
+    k->rtce_window_size = 0x10000000;
 }
 
 static TypeInfo spapr_vscsi_info = {
diff --git a/hw/spapr_vty.c b/hw/spapr_vty.c
index f340b83..99e52cc 100644
--- a/hw/spapr_vty.c
+++ b/hw/spapr_vty.c
@@ -133,7 +133,7 @@ void spapr_vty_create(VIOsPAPRBus *bus, CharDriverState *chardev)
 }
 
 static Property spapr_vty_properties[] = {
-    DEFINE_SPAPR_PROPERTIES(VIOsPAPRVTYDevice, sdev, 0),
+    DEFINE_SPAPR_PROPERTIES(VIOsPAPRVTYDevice, sdev),
     DEFINE_PROP_CHR("chardev", VIOsPAPRVTYDevice, chardev),
     DEFINE_PROP_END_OF_LIST(),
 };
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index c09cc39..0ab7630 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -859,7 +859,7 @@ void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t window_size, int *pfd)
         return NULL;
     }
 
-    len = (window_size / SPAPR_VIO_TCE_PAGE_SIZE) * sizeof(VIOsPAPR_RTCE);
+    len = (window_size / SPAPR_TCE_PAGE_SIZE) * sizeof(sPAPRTCE);
     /* FIXME: round this up to page size */
 
     table = mmap(NULL, len, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
@@ -882,7 +882,7 @@ int kvmppc_remove_spapr_tce(void *table, int fd, uint32_t window_size)
         return -1;
     }
 
-    len = (window_size / SPAPR_VIO_TCE_PAGE_SIZE)*sizeof(VIOsPAPR_RTCE);
+    len = (window_size / SPAPR_TCE_PAGE_SIZE)*sizeof(sPAPRTCE);
     if ((munmap(table, len) < 0) ||
         (close(fd) < 0)) {
         fprintf(stderr, "KVM: Unexpected error removing TCE table: %s",
-- 
1.7.9.5


* [Qemu-devel] [PATCH 11/13] iommu: Allow PCI to use IOMMU infrastructure
  2012-06-19  6:39 [Qemu-devel] [PATCH 00/13] iommu series Benjamin Herrenschmidt
                   ` (9 preceding siblings ...)
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 10/13] pseries: Convert sPAPR TCEs to use generic IOMMU infrastructure Benjamin Herrenschmidt
@ 2012-06-19  6:39 ` Benjamin Herrenschmidt
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 12/13] pseries: Implement IOMMU and DMA for PAPR PCI devices Benjamin Herrenschmidt
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-19  6:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, anthony, Eduard - Gabriel Munteanu,
	David Gibson, Richard Henderson

From: David Gibson <david@gibson.dropbear.id.au>

This patch adds some hooks to let PCI devices and busses use the new IOMMU
infrastructure.  When IOMMU support is enabled, each PCI device now
contains a DMAContext * which is used by the pci_dma_*() wrapper functions.

By default, the contexts are initialized to NULL, assuming no IOMMU.
However, the platform or host bridge code which sets up the PCI bus can use
pci_setup_iommu() to set a function which will determine the correct
DMAContext for a given PCI device.
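
As a rough illustration of how the hook behaves, here is a standalone
sketch. It is not the QEMU code: Bus, Device, bus_setup_iommu(),
device_register() and shared_context_fn() are simplified stand-ins
invented for this example, and only the control flow (optional callback,
NULL meaning "no IOMMU") is taken from the patch below.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for QEMU's DMAContext, PCIBus and PCIDevice. */
typedef struct DMAContext { int id; } DMAContext;
typedef struct Bus Bus;
typedef DMAContext *(*DMAContextFunc)(Bus *bus, void *opaque, int devfn);

struct Bus {
    DMAContextFunc dma_context_fn;   /* NULL means "no IOMMU" */
    void *dma_context_opaque;
};

typedef struct Device {
    Bus *bus;
    int devfn;
    DMAContext *dma;                 /* NULL selects the no-IOMMU path */
} Device;

/* Called by platform or host bridge code, in the role of pci_setup_iommu(). */
static void bus_setup_iommu(Bus *bus, DMAContextFunc fn, void *opaque)
{
    bus->dma_context_fn = fn;
    bus->dma_context_opaque = opaque;
}

/* Mirrors the lookup this patch adds to do_pci_register_device(). */
static void device_register(Device *dev, Bus *bus, int devfn)
{
    assert(dev != NULL && bus != NULL);
    dev->bus = bus;
    dev->devfn = devfn;
    dev->dma = NULL;
    if (bus->dma_context_fn) {
        dev->dma = bus->dma_context_fn(bus, bus->dma_context_opaque, devfn);
    }
}

/* Example policy (hypothetical): all devices on the bus share one context. */
static DMAContext *shared_context_fn(Bus *bus, void *opaque, int devfn)
{
    (void)bus;
    (void)devfn;
    return opaque;
}
```

Note that in this sketch, as in the patch, the context is looked up once
at registration time, so a device registered before the hook is set up
keeps a NULL context.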

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 hw/pci.c           |    9 +++++++++
 hw/pci.h           |    9 +++++++--
 hw/pci_internals.h |    2 ++
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index bdfb3d6..c8d16a4 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -775,6 +775,9 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
         return NULL;
     }
     pci_dev->bus = bus;
+    if (bus->dma_context_fn) {
+        pci_dev->dma = bus->dma_context_fn(bus, bus->dma_context_opaque, devfn);
+    }
     pci_dev->devfn = devfn;
     pstrcpy(pci_dev->name, sizeof(pci_dev->name), name);
     pci_dev->irq_state = 0;
@@ -2021,6 +2024,12 @@ static void pci_device_class_init(ObjectClass *klass, void *data)
     k->props = pci_props;
 }
 
+void pci_setup_iommu(PCIBus *bus, PCIDMAContextFunc fn, void *opaque)
+{
+    bus->dma_context_fn = fn;
+    bus->dma_context_opaque = opaque;
+}
+
 static TypeInfo pci_device_type_info = {
     .name = TYPE_PCI_DEVICE,
     .parent = TYPE_DEVICE,
diff --git a/hw/pci.h b/hw/pci.h
index 99b7e61..c099766 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -179,6 +179,7 @@ typedef void (*MSIVectorReleaseNotifier)(PCIDevice *dev, unsigned int vector);
 
 struct PCIDevice {
     DeviceState qdev;
+
     /* PCI config space */
     uint8_t *config;
 
@@ -200,6 +201,7 @@ struct PCIDevice {
     int32_t devfn;
     char name[64];
     PCIIORegion io_regions[PCI_NUM_REGIONS];
+    DMAContext *dma;
 
     /* do not access the following fields */
     PCIConfigReadFunc *config_read;
@@ -324,6 +326,10 @@ int pci_read_devaddr(Monitor *mon, const char *addr, int *domp, int *busp,
 
 void pci_device_deassert_intx(PCIDevice *dev);
 
+typedef DMAContext *(*PCIDMAContextFunc)(PCIBus *, void *, int);
+
+void pci_setup_iommu(PCIBus *bus, PCIDMAContextFunc fn, void *opaque);
+
 static inline void
 pci_set_byte(uint8_t *config, uint8_t val)
 {
@@ -560,8 +566,7 @@ static inline uint32_t pci_config_size(const PCIDevice *d)
 /* DMA access functions */
 static inline DMAContext *pci_dma_context(PCIDevice *dev)
 {
-    /* Stub for when we have no PCI iommu support */
-    return NULL;
+    return dev->dma;
 }
 
 static inline int pci_dma_rw(PCIDevice *dev, dma_addr_t addr,
diff --git a/hw/pci_internals.h b/hw/pci_internals.h
index 399c6d4..e8bc9f6 100644
--- a/hw/pci_internals.h
+++ b/hw/pci_internals.h
@@ -17,6 +17,8 @@
 
 struct PCIBus {
     BusState qbus;
+    PCIDMAContextFunc dma_context_fn;
+    void *dma_context_opaque;
     uint8_t devfn_min;
     pci_set_irq_fn set_irq;
     pci_map_irq_fn map_irq;
-- 
1.7.9.5


* [Qemu-devel] [PATCH 12/13] pseries: Implement IOMMU and DMA for PAPR PCI devices
  2012-06-19  6:39 [Qemu-devel] [PATCH 00/13] iommu series Benjamin Herrenschmidt
                   ` (10 preceding siblings ...)
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 11/13] iommu: Allow PCI to use " Benjamin Herrenschmidt
@ 2012-06-19  6:39 ` Benjamin Herrenschmidt
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 13/13] Add a memory barrier to DMA functions Benjamin Herrenschmidt
  2012-06-20 21:12 ` [Qemu-devel] [PATCH 00/13] iommu series Anthony Liguori
  13 siblings, 0 replies; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-19  6:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, Alex Graf, anthony, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

Currently the pseries machine emulation does not support DMA for emulated
PCI devices, because the PAPR spec always requires a (guest visible,
paravirtualized) IOMMU which was not implemented.  Now that we have
infrastructure for IOMMU emulation, we can correct this and allow PCI DMA
for pseries.

With the existing PAPR IOMMU code used for VIO devices, this is almost
trivial. We use a single DMAContext for each (virtual) PCI host bridge,
which is the usual configuration on real PAPR machines (which often have
_many_ PCI host bridges).
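
For reference, a small standalone sketch of how the LIOBN values end up
being composed. The two base constants are the ones from hw/spapr.h in
this series; vio_liobn() and pci_liobn() are hypothetical helper names
used only here, and the range check in pci_liobn() is an assumption of
the sketch, not something the patch enforces.

```c
#include <assert.h>
#include <stdint.h>

#define SPAPR_VIO_BASE_LIOBN 0x00000000u
#define SPAPR_PCI_BASE_LIOBN 0x80000000u

/* VIO devices: the base ORed with the device's "reg" value. */
static uint32_t vio_liobn(uint32_t reg)
{
    return SPAPR_VIO_BASE_LIOBN | reg;
}

/* PCI host bridges: the base ORed with the PCI domain number shifted
 * up by 16 bits. */
static uint32_t pci_liobn(uint32_t domain)
{
    assert(domain < 0x8000);   /* assumption: keep the shift clear of bit 31 */
    return SPAPR_PCI_BASE_LIOBN | (domain << 16);
}
```

With these definitions every PCI LIOBN has the top bit set, so the two
ranges stay apart as long as VIO "reg" values remain below 0x80000000.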

Cc: Alex Graf <agraf@suse.de>

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 hw/spapr.h       |    1 +
 hw/spapr_iommu.c |   40 ++++++++++++++++++++++------------------
 hw/spapr_pci.c   |   15 +++++++++++++++
 hw/spapr_pci.h   |    1 +
 4 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/hw/spapr.h b/hw/spapr.h
index df3e8b1..7c497aa 100644
--- a/hw/spapr.h
+++ b/hw/spapr.h
@@ -328,6 +328,7 @@ typedef struct sPAPRTCE {
 } sPAPRTCE;
 
 #define SPAPR_VIO_BASE_LIOBN    0x00000000
+#define SPAPR_PCI_BASE_LIOBN    0x80000000
 
 void spapr_iommu_init(void);
 DMAContext *spapr_tce_new_dma_context(uint32_t liobn, size_t window_size);
diff --git a/hw/spapr_iommu.c b/hw/spapr_iommu.c
index 5a769b9..388ffa4 100644
--- a/hw/spapr_iommu.c
+++ b/hw/spapr_iommu.c
@@ -162,6 +162,22 @@ void spapr_tce_free(DMAContext *dma)
     }
 }
 
+static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
+                                target_ulong tce)
+{
+    sPAPRTCE *tcep;
+
+    if (ioba >= tcet->window_size) {
+        hcall_dprintf("spapr_vio_put_tce on out-of-bounds IOBA 0x"
+                      TARGET_FMT_lx "\n", ioba);
+        return H_PARAMETER;
+    }
+
+    tcep = tcet->table + (ioba >> SPAPR_TCE_PAGE_SHIFT);
+    tcep->tce = tce;
+
+    return H_SUCCESS;
+}
 
 static target_ulong h_put_tce(CPUPPCState *env, sPAPREnvironment *spapr,
                               target_ulong opcode, target_ulong *args)
@@ -170,37 +186,25 @@ static target_ulong h_put_tce(CPUPPCState *env, sPAPREnvironment *spapr,
     target_ulong ioba = args[1];
     target_ulong tce = args[2];
     sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
-    sPAPRTCE *tcep;
 
     if (liobn & 0xFFFFFFFF00000000ULL) {
         hcall_dprintf("spapr_vio_put_tce on out-of-boundsw LIOBN "
                       TARGET_FMT_lx "\n", liobn);
         return H_PARAMETER;
     }
-    if (!tcet) {
-        hcall_dprintf("spapr_vio_put_tce on non-existent LIOBN "
-                      TARGET_FMT_lx "\n", liobn);
-        return H_PARAMETER;
-    }
 
     ioba &= ~(SPAPR_TCE_PAGE_SIZE - 1);
 
+    if (tcet) {
+        return put_tce_emu(tcet, ioba, tce);
+    }
 #ifdef DEBUG_TCE
-    fprintf(stderr, "spapr_vio_put_tce on liobn=" TARGET_FMT_lx /*%s*/
+    fprintf(stderr, "%s on liobn=" TARGET_FMT_lx /*%s*/
             "  ioba 0x" TARGET_FMT_lx "  TCE 0x" TARGET_FMT_lx "\n",
-            liobn, /*dev->qdev.id, */ioba, tce);
+            __func__, liobn, /*dev->qdev.id, */ioba, tce);
 #endif
 
-    if (ioba >= tcet->window_size) {
-        hcall_dprintf("spapr_vio_put_tce on out-of-boards IOBA 0x"
-                      TARGET_FMT_lx "\n", ioba);
-        return H_PARAMETER;
-    }
-
-    tcep = tcet->table + (ioba >> SPAPR_TCE_PAGE_SHIFT);
-    tcep->tce = tce;
-
-    return H_SUCCESS;
+    return H_PARAMETER;
 }
 
 void spapr_iommu_init(void)
diff --git a/hw/spapr_pci.c b/hw/spapr_pci.c
index 97d417a..47ba5ff 100644
--- a/hw/spapr_pci.c
+++ b/hw/spapr_pci.c
@@ -266,12 +266,21 @@ static const MemoryRegionOps spapr_io_ops = {
 /*
  * PHB PCI device
  */
+static DMAContext *spapr_pci_dma_context_fn(PCIBus *bus, void *opaque,
+                                            int devfn)
+{
+    sPAPRPHBState *phb = opaque;
+
+    return phb->dma;
+}
+
 static int spapr_phb_init(SysBusDevice *s)
 {
     sPAPRPHBState *phb = FROM_SYSBUS(sPAPRPHBState, s);
     char *namebuf;
     int i;
     PCIBus *bus;
+    uint32_t liobn;
 
     phb->dtbusname = g_strdup_printf("pci@%" PRIx64, phb->buid);
     namebuf = alloca(strlen(phb->dtbusname) + 32);
@@ -312,6 +321,10 @@ static int spapr_phb_init(SysBusDevice *s)
                            PCI_DEVFN(0, 0), PCI_NUM_PINS);
     phb->host_state.bus = bus;
 
+    liobn = SPAPR_PCI_BASE_LIOBN | (pci_find_domain(bus) << 16);
+    phb->dma = spapr_tce_new_dma_context(liobn, 0x40000000);
+    pci_setup_iommu(bus, spapr_pci_dma_context_fn, phb);
+
     QLIST_INSERT_HEAD(&spapr->phbs, phb, list);
 
     /* Initialize the LSI table */
@@ -472,6 +485,8 @@ int spapr_populate_pci_devices(sPAPRPHBState *phb,
     _FDT(fdt_setprop(fdt, bus_off, "interrupt-map", &interrupt_map,
                      sizeof(interrupt_map)));
 
+    spapr_dma_dt(fdt, bus_off, "ibm,dma-window", phb->dma);
+
     return 0;
 }
 
diff --git a/hw/spapr_pci.h b/hw/spapr_pci.h
index f54c2e8..d9e46e2 100644
--- a/hw/spapr_pci.h
+++ b/hw/spapr_pci.h
@@ -38,6 +38,7 @@ typedef struct sPAPRPHBState {
     MemoryRegion memspace, iospace;
     target_phys_addr_t mem_win_addr, mem_win_size, io_win_addr, io_win_size;
     MemoryRegion memwindow, iowindow;
+    DMAContext *dma;
 
     struct {
         uint32_t dt_irq;
-- 
1.7.9.5


* [Qemu-devel] [PATCH 13/13] Add a memory barrier to DMA functions
  2012-06-19  6:39 [Qemu-devel] [PATCH 00/13] iommu series Benjamin Herrenschmidt
                   ` (11 preceding siblings ...)
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 12/13] pseries: Implement IOMMU and DMA for PAPR PCI devices Benjamin Herrenschmidt
@ 2012-06-19  6:39 ` Benjamin Herrenschmidt
  2012-06-20 21:12 ` [Qemu-devel] [PATCH 00/13] iommu series Anthony Liguori
  13 siblings, 0 replies; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-19  6:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: anthony

The emulated devices can run simultaneously with the guest, so
we need to be careful with the ordering of loads and stores done by
them to guest system memory, which need to be observed in
the right order by the guest operating system.

This adds a barrier call to the basic DMA read/write ops which
is currently implemented as a smp_mb(), but could be later
improved for more fine grained control of barriers.

Additionally, a _relaxed() variant of the accessors is provided
to make it easy to convert devices that are performance sensitive
and would be negatively impacted by the change.
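
The split between the two variants can be sketched standalone roughly as
follows. This is an illustration only: guest_ram, dma_write() and
dma_write_relaxed() are made-up names, and the C11
atomic_thread_fence(memory_order_seq_cst) call merely plays the role that
smp_mb() has in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUEST_RAM_SIZE 4096
static uint8_t guest_ram[GUEST_RAM_SIZE];   /* stand-in for guest memory */

/* Relaxed variant: plain copy, the caller is responsible for ordering. */
static int dma_write_relaxed(size_t addr, const void *buf, size_t len)
{
    assert(buf != NULL);
    if (addr > GUEST_RAM_SIZE || len > GUEST_RAM_SIZE - addr) {
        return -1;                          /* out of bounds */
    }
    memcpy(guest_ram + addr, buf, len);
    return 0;
}

/* Default variant: full barrier first, then the same copy. */
static int dma_write(size_t addr, const void *buf, size_t len)
{
    atomic_thread_fence(memory_order_seq_cst);  /* in place of smp_mb() */
    return dma_write_relaxed(addr, buf, len);
}
```

A device model that writes a descriptor and then a "done" flag would use
dma_write() for the flag, so that the earlier stores are ordered before
it; a bulk data path that does its own ordering could use the relaxed
form.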

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 dma.h |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 52 insertions(+), 2 deletions(-)

diff --git a/dma.h b/dma.h
index f1fcb71..0d57e50 100644
--- a/dma.h
+++ b/dma.h
@@ -13,6 +13,7 @@
 #include <stdio.h>
 #include "hw/hw.h"
 #include "block.h"
+#include "kvm.h"
 
 typedef struct DMAContext DMAContext;
 typedef struct ScatterGatherEntry ScatterGatherEntry;
@@ -70,6 +71,30 @@ typedef struct DMAContext {
     DMAUnmapFunc *unmap;
 } DMAContext;
 
+static inline void dma_barrier(DMAContext *dma, DMADirection dir)
+{
+    /*
+     * This is called before DMA read and write operations
+     * unless the _relaxed form is used and is responsible
+     * for providing some sane ordering of accesses vs
+     * concurrently running VCPUs.
+     *
+     * Users of map(), unmap() or lower level st/ld_*
+     * operations are responsible for providing their own
+     * ordering via barriers.
+     *
+     * This primitive implementation does a simple smp_mb()
+     * before each operation which provides pretty much full
+     * ordering.
+     *
+     * A smarter implementation can be devised if needed to
+     * use lighter barriers based on the direction of the
+     * transfer, the DMA context, etc...
+     */
+    if (kvm_enabled())
+        smp_mb();
+}
+
 static inline bool dma_has_iommu(DMAContext *dma)
 {
     return !!dma;
@@ -93,8 +118,9 @@ static inline bool dma_memory_valid(DMAContext *dma,
 
 int iommu_dma_memory_rw(DMAContext *dma, dma_addr_t addr,
                         void *buf, dma_addr_t len, DMADirection dir);
-static inline int dma_memory_rw(DMAContext *dma, dma_addr_t addr,
-                                void *buf, dma_addr_t len, DMADirection dir)
+static inline int dma_memory_rw_relaxed(DMAContext *dma, dma_addr_t addr,
+                                        void *buf, dma_addr_t len,
+                                        DMADirection dir)
 {
     if (!dma_has_iommu(dma)) {
         /* Fast-path for no IOMMU */
@@ -106,6 +132,28 @@ static inline int dma_memory_rw(DMAContext *dma, dma_addr_t addr,
     }
 }
 
+static inline int dma_memory_read_relaxed(DMAContext *dma, dma_addr_t addr,
+                                          void *buf, dma_addr_t len)
+{
+    return dma_memory_rw_relaxed(dma, addr, buf, len, DMA_DIRECTION_TO_DEVICE);
+}
+
+static inline int dma_memory_write_relaxed(DMAContext *dma, dma_addr_t addr,
+                                           const void *buf, dma_addr_t len)
+{
+    return dma_memory_rw_relaxed(dma, addr, (void *)buf, len,
+                                 DMA_DIRECTION_FROM_DEVICE);
+}
+
+static inline int dma_memory_rw(DMAContext *dma, dma_addr_t addr,
+                                void *buf, dma_addr_t len,
+                                DMADirection dir)
+{
+    dma_barrier(dma, dir);
+
+    return dma_memory_rw_relaxed(dma, addr, buf, len, dir);
+}
+
 static inline int dma_memory_read(DMAContext *dma, dma_addr_t addr,
                                   void *buf, dma_addr_t len)
 {
@@ -124,6 +172,8 @@ int iommu_dma_memory_set(DMAContext *dma, dma_addr_t addr, uint8_t c,
 static inline int dma_memory_set(DMAContext *dma, dma_addr_t addr,
                                  uint8_t c, dma_addr_t len)
 {
+    dma_barrier(dma, DMA_DIRECTION_FROM_DEVICE);
+
     if (!dma_has_iommu(dma)) {
         /* Fast-path for no IOMMU */
         cpu_physical_memory_set(addr, c, len);
-- 
1.7.9.5


* Re: [Qemu-devel] [PATCH 07/13] usb: Convert usb_packet_{map, unmap} to universal DMA helpers
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 07/13] usb: Convert usb_packet_{map, unmap} to universal DMA helpers Benjamin Herrenschmidt
@ 2012-06-19 13:42   ` Gerd Hoffmann
  2012-06-19 20:23     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 63+ messages in thread
From: Gerd Hoffmann @ 2012-06-19 13:42 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: qemu-devel, anthony, David Gibson

  Hi,

> Note that usb_packet_map() invokes dma_memory_map() with a NULL invalidate
> callback function.  When IOMMU support is added, this will mean that
> usb_packet_map() and the corresponding usb_packet_unmap() must be called in
> close proximity without dropping the qemu device lock

Well, that isn't guaranteed ...

> - otherwise the guest
> might invalidate IOMMU mappings while they are still in use by the device
> code.

Guest tearing down mapping while usb packets using them are still in
flight would be a guest bug.  Still not impossible to happen though. How
is this case supposed to be handled?

cheers,
  Gerd


* Re: [Qemu-devel] [PATCH 07/13] usb: Convert usb_packet_{map, unmap} to universal DMA helpers
  2012-06-19 13:42   ` Gerd Hoffmann
@ 2012-06-19 20:23     ` Benjamin Herrenschmidt
  2012-06-20  3:14       ` David Gibson
  0 siblings, 1 reply; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-19 20:23 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: qemu-devel, anthony, David Gibson

On Tue, 2012-06-19 at 15:42 +0200, Gerd Hoffmann wrote:
> Well, that isn't guaranteed ...
> 
> > - otherwise the guest
> > might invalidate IOMMU mappings while they are still in use by the device
> > code.
> 
> Guest tearing down mapping while usb packets using them are still in
> flight would be a guest bug.  Still not impossible to happen though. How
> is this case supposed to be handled? 

Like with any other device, it's hard ... what would happen on real
hardware is that the USB controller will get a target abort, which will
result in the controller reporting an error (typically in the PCI status
register) and stopping.

In qemu we tend not to deal with DMA failures at all.

If the scenario above happens, we will potentially continue accessing
the guest memory after it has been unmapped. While this is bad, in
practice, it's not a huge deal because the USB controller is only
accessed by the guest kernel so it's a matter of the guest kernel
shooting itself in the foot.

So we don't have to fix it as a pre-req to merging the patches, though
it would be nice if we did in the long run.

The way to fix it is to register a cancel callback
(dma_memory_map_with_cancel), which will be called by the iommu code
when the translation is invalidated, and which can be used to cancel
pending transactions etc... and generally prevent further access to the
memory.

However the current implementation never calls cancel.
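
For illustration, the callback arrangement described above could look
roughly like the sketch below.  All names here (TrackedMap, ToyPacket,
tracked_map_*) are hypothetical and are not the interfaces in this series;
the sketch only shows the shape of "the iommu side calls back into the
device when a translation goes away".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration only; not the names used in the series. */
typedef void (*map_cancel_fn)(void *opaque);

typedef struct TrackedMap {
    map_cancel_fn cancel;   /* called when the translation is invalidated */
    void *opaque;           /* device state handed back to the callback */
    bool active;            /* true between map and unmap */
} TrackedMap;

typedef struct ToyPacket {
    bool cancelled;         /* set once the device must stop touching memory */
} ToyPacket;

/* Device side: stop the in-flight transfer so the buffer is no longer used. */
static void toy_packet_cancel(void *opaque)
{
    ToyPacket *p = opaque;
    p->cancelled = true;
}

/* Map side: remember whom to notify if the guest tears the mapping down. */
static void tracked_map_init(TrackedMap *m, map_cancel_fn cancel, void *opaque)
{
    assert(m != NULL);
    m->cancel = cancel;
    m->opaque = opaque;
    m->active = true;
}

/* Normal completion path: the device unmapped before any invalidation. */
static void tracked_unmap(TrackedMap *m)
{
    m->active = false;
}

/* IOMMU side: invoked when the guest invalidates the translation. */
static void tracked_map_invalidate(TrackedMap *m)
{
    if (m->active && m->cancel) {
        m->cancel(m->opaque);
    }
    m->active = false;
}
```

The callback only fires for maps that are still active, so the common case
(unmap before invalidate) costs nothing extra on the device side.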

Cheers.
Ben.


* Re: [Qemu-devel] [PATCH 07/13] usb: Convert usb_packet_{map, unmap} to universal DMA helpers
  2012-06-19 20:23     ` Benjamin Herrenschmidt
@ 2012-06-20  3:14       ` David Gibson
  2012-06-20  3:52         ` Benjamin Herrenschmidt
  2012-06-20  6:25         ` Gerd Hoffmann
  0 siblings, 2 replies; 63+ messages in thread
From: David Gibson @ 2012-06-20  3:14 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Gerd Hoffmann, anthony, qemu-devel

On Wed, Jun 20, 2012 at 06:23:58AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2012-06-19 at 15:42 +0200, Gerd Hoffmann wrote:
> > Well, that isn't guaranteed ...
> > 
> > > - otherwise the guest
> > > might invalidate IOMMU mappings while they are still in use by the device
> > > code.
> > 
> > Guest tearing down mapping while usb packets using them are still in
> > flight would be a guest bug.  Still not impossible to happen though. How
> > is this case supposed to be handled? 
> 
> Like with any other device, it's hard ... what would happen on real
> hardware is that the USB controller will get a target abort, which will
> result in the controller reporting an error (typically in the PCI status
> register) and stopping.
> 
> In qemu we tend not to deal with DMA failures at all.
> 
> If the scenario above happens, we will potentially continue accessing
> the guest memory after it has been unmapped. While this is bad, in
> practice, it's not a huge deal because the USB controller is only
> accessed by the guest kernel so it's a matter of the guest kernel
> shooting itself in the foot.
> 
> So we don't have to fix it as a pre-req to merging the patches, though
> it would be nice if we did in the long run.
> 
> The way to fix it is to register a cancel callback
> (dma_memory_map_with_cancel), which will be called by the iommu code
> when the translation is invalidated, and which can be used to cancel
> pending transactions etc... and generally prevent further access to the
> memory.

So, in fact the original comment is a bit out of date.  With the
current version of this series, then a guest attempt to invalidate
will be delayed until the unmap occurs.  If we discover that leads to
delays which are too long then we can add the cancel callback to
handle this.  However, the USB case should be ok - it may not be
theoretically guaranteed that the calls are close, but it's certainly
the case at the moment.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 07/13] usb: Convert usb_packet_{map, unmap} to universal DMA helpers
  2012-06-20  3:14       ` David Gibson
@ 2012-06-20  3:52         ` Benjamin Herrenschmidt
  2012-06-21  1:42           ` David Gibson
  2012-06-20  6:25         ` Gerd Hoffmann
  1 sibling, 1 reply; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-20  3:52 UTC (permalink / raw)
  To: David Gibson; +Cc: Gerd Hoffmann, anthony, qemu-devel

On Wed, 2012-06-20 at 13:14 +1000, David Gibson wrote:
> So, in fact the original comment is a bit out of date.  With the
> current version of this series, then a guest attempt to invalidate
> will be delayed until the unmap occurs. 

No, this code was dropped, including the tracking of the maps, following
comments from Anthony and others. The API for providing a cancel
callback is still there but nothing will call it unless the backend does
its own tracking and decides to do so.

As it is, the race exists but:

 - It will only hurt the guest

 - And only for a very buggy guest

So the worst case is that it hurts something like kdump.


I plan to re-introduce some of the mechanisms for cancellation
eventually, but we agreed that it wasn't going to be a show stopper and
that we could work on getting that sorted in a second phase. I'm looking
at a more efficient way to deal with the tracking of the maps as well
since some devices use them often.

Cheers,
Ben.


* Re: [Qemu-devel] [PATCH 07/13] usb: Convert usb_packet_{map, unmap} to universal DMA helpers
  2012-06-20  3:14       ` David Gibson
  2012-06-20  3:52         ` Benjamin Herrenschmidt
@ 2012-06-20  6:25         ` Gerd Hoffmann
  2012-06-20  9:25           ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 63+ messages in thread
From: Gerd Hoffmann @ 2012-06-20  6:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, qemu-devel, anthony

  Hi,

>> Like with any other device, it's hard ... what would happen on real
>> hardware is that the USB controller will get a target abort, which will
>> result in the controller reporting an error (typically in the PCI status
>> register) and stopping.

Not that hard, code to cancel in-flight transactions is in place already
as this can happen for other reasons too.

> handle this.  However, the USB case should be ok - it may not be
> theoretically guaranteed that the calls are close, but it's certainly
> the case at the moment.

Depends on the device.  For the usb hid devices (which is the most
important use case for power I think) packets will be processed
synchronously, so there is no problem here.

usb-storage can keep packets in flight without holding the qemu lock
(waiting for async block I/O to finish).  Shouldn't be too long though.

usb-host keeps pretty much every packet in flight without holding the
qemu lock as it passes on the requests to the host's usbfs, then waits
asynchronously for the request to finish before returning the result to the
guest.  Depending on the kind of device you are passing through this can
be *very* long (minutes).

cheers,
  Gerd


* Re: [Qemu-devel] [PATCH 07/13] usb: Convert usb_packet_{map, unmap} to universal DMA helpers
  2012-06-20  6:25         ` Gerd Hoffmann
@ 2012-06-20  9:25           ` Benjamin Herrenschmidt
  2012-06-20  9:54             ` Gerd Hoffmann
  0 siblings, 1 reply; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-20  9:25 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: qemu-devel, anthony

On Wed, 2012-06-20 at 08:25 +0200, Gerd Hoffmann wrote:
> Hi,
> 
> >> Like with any other device, it's hard ... what would happen on real
> >> hardware is that the USB controller will get a target abort, which will
> >> result in the controller reporting an error (typically in the PCI status
> >> register) and stopping.
> 
> Not that hard, code to cancel in-flight transactions is in place already
> as this can happen for other reasons too.

Ok, good.

> > handle this.  However, the USB case should be ok - it may not be
> > theoretically guaranteed that the calls are close, but it's certainly
> > the case at the moment.
> 
> Depends on the device.  For the usb hid devices (which is the most
> important use case for power I think) packets will be processed
> synchronously, so there is no problem here.
> 
> usb-storage can keep packets in flight without holding the qemu lock
> (waiting for async block I/O to finish).  Shouldn't be too long though.
> 
> usb-host keeps pretty much every packet in flight without holding the
> qemu lock as it passes on the requests to the host's usbfs, then waits
> asynchronously for the request to finish before returning the result to the
> guest.  Depending on the kind of device you are passing through this can
> be *very* long (minutes).

Right, so with the initial patch series I sent, nothing will happen: we
don't actually keep track of mappings, we don't call the cancel callback
and, anyway, OHCI/EHCI don't register a cancel callback.

As I wrote earlier, this is not very harmful so it's good to get merged,
and we can look into improving it and add the cancellation mechanism on
top. There was some original invalidation code from David that was
trying to wait on all pending maps but that had issues, Anthony wasn't
too happy about it, so I decided to attempt to submit/merge the patch
series without solving that issue.

To properly implement cancel without too much overhead, we need some
tracking of qemu maps and we need a quick way to know when the guest
invalidates a translation, if that translation has maps associated with
it.

The best way to do that, from my little experience messing around with
it, is going to essentially be implementation specific (ie depends on
the actual iommu backend).

For example, on TCEs, I could keep a parallel bitmap indicating when a
map is present for a given entry. That could be very efficient if I know
that there won't be more than one qemu map at a time for a given entry
though, so we should discuss whether that's an acceptable limitation.
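
To make the idea concrete, here is a minimal sketch of such a parallel
bitmap.  The table size and all names (ToyTceMapBitmap, toy_tce_*) are
assumptions for illustration, not code from this series, and the sketch
ignores the shared-memory and locking questions entirely.  One bit per
entry is exactly what imposes the "at most one qemu map per entry"
limitation mentioned above.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_TCE_ENTRIES 1024   /* hypothetical table size */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define TOY_BITMAP_WORDS ((TOY_TCE_ENTRIES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per TCE entry: set while a qemu map exists for that entry. */
typedef struct ToyTceMapBitmap {
    unsigned long bits[TOY_BITMAP_WORDS];
} ToyTceMapBitmap;

/* Record that a qemu map now covers this entry. */
static void toy_tce_mark_mapped(ToyTceMapBitmap *b, size_t entry)
{
    assert(entry < TOY_TCE_ENTRIES);
    b->bits[entry / BITS_PER_WORD] |= 1UL << (entry % BITS_PER_WORD);
}

/* Record that the map covering this entry has been released. */
static void toy_tce_mark_unmapped(ToyTceMapBitmap *b, size_t entry)
{
    assert(entry < TOY_TCE_ENTRIES);
    b->bits[entry / BITS_PER_WORD] &= ~(1UL << (entry % BITS_PER_WORD));
}

/* Fast check on the invalidation path: does this entry have a live map? */
static bool toy_tce_has_map(const ToyTceMapBitmap *b, size_t entry)
{
    assert(entry < TOY_TCE_ENTRIES);
    return (b->bits[entry / BITS_PER_WORD] >> (entry % BITS_PER_WORD)) & 1UL;
}
```

Supporting several simultaneous maps per entry would need a counter per
entry instead of a bit, which is where the space/efficiency trade-off
comes from.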

There's also some "interesting" issues due to the fact that we populate
the TCE tables directly from KVM in "real mode" for speed, so that
bitmap would need to be some kind of shared memory with the kernel
(without locks !) and the kernel would have to be updated to fallback to
sending the invalidation hypercalls to qemu when it collides with a
populated map entry.

It's all doable, it's also a bit tricky, potentially quite a bit of
code, new KVM/qemu interfaces, etc... for a problem that's going to be a
non-issue pretty much 99.9% of the time :-) We still need to address it,
but I haven't convinced myself yet that I have come up with the best
solution :-)

Cheers,
Ben. 


* Re: [Qemu-devel] [PATCH 07/13] usb: Convert usb_packet_{map, unmap} to universal DMA helpers
  2012-06-20  9:25           ` Benjamin Herrenschmidt
@ 2012-06-20  9:54             ` Gerd Hoffmann
  0 siblings, 0 replies; 63+ messages in thread
From: Gerd Hoffmann @ 2012-06-20  9:54 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: qemu-devel, anthony

  Hi,

> [ lots of interesting background info snipped ]

> It's all doable, it's also a bit tricky, potentially quite a bit of
> code, new KVM/qemu interfaces, etc... for a problem that's going to be a
> non-issue pretty much 99.9% of the time :-) We still need to address it,
> but I haven't convinced myself yet that I have come up with the best
> solution :-)

Yea, sure, it's in the "nice-to-have" not "must-have" category.  And
adding complex code which almost never actually runs needs some care
indeed.  Not having that for the initial merge is perfectly fine with
me, just wanted to know what dragons might be lurking there ;)

Oh, and: Acked-by: Gerd Hoffmann <kraxel@redhat.com>

cheers,
  Gerd


* Re: [Qemu-devel] [PATCH 00/13] iommu series
  2012-06-19  6:39 [Qemu-devel] [PATCH 00/13] iommu series Benjamin Herrenschmidt
                   ` (12 preceding siblings ...)
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 13/13] Add a memory barrier to DMA functions Benjamin Herrenschmidt
@ 2012-06-20 21:12 ` Anthony Liguori
  13 siblings, 0 replies; 63+ messages in thread
From: Anthony Liguori @ 2012-06-20 21:12 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: qemu-devel

On 06/19/2012 01:39 AM, Benjamin Herrenschmidt wrote:
> This is a rebase of the iommu series and the barrier patch together
> on top of current qemu.
>
> As for our discussions about doing things with Memory Regions etc
> I eventually came to the conclusion that we should just apply this
> first :-)
>
> My reasons (other than it makes my life much easier which it does)
> are that:

This series sucks pretty bad.  I don't think it can be a lot better though 
without major rearchitecting so I'm in favor of applying this now and dealing 
with the fall-out later.

I'll respond in detail where all of the problems are.  I don't have easy 
solutions to offer though.

Regards,

Anthony Liguori

>
>   - We already have PCI DMA accessors, so devices using those
> will be unaffected by further changes
>
>   - The few devices that are modified in this series to use the
> DMA accessors directly are ... few, and need to do it essentially
> because they either deal with multiple bus types (AHCI, EHCI,...)
> or because they are in a separate layer (bdev). Fixing them to
> use some other interfaces would be easy (they are few) and might
> be unnecessary as well as we might want (or can) easily keep an
> object of type "DMAContext" to represent the DMA capabilities of a
> device as the head of the chain of MemoryRegions in a future
> more flexible design.
>
>   - It provides a good spot to stick our memory barrier
>
>   - It gives us something working now for 1.2, I know that at least
> freescale powerpc and a number of ARM folks are waiting for it.
>
> Cheers,
> Ben.
>


* Re: [Qemu-devel] [PATCH 01/13] Better support for dma_addr_t variables
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 01/13] Better support for dma_addr_t variables Benjamin Herrenschmidt
@ 2012-06-20 21:14   ` Anthony Liguori
  2012-06-20 21:29     ` Benjamin Herrenschmidt
                       ` (2 more replies)
  0 siblings, 3 replies; 63+ messages in thread
From: Anthony Liguori @ 2012-06-20 21:14 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: qemu-devel, David Gibson

On 06/19/2012 01:39 AM, Benjamin Herrenschmidt wrote:
> From: David Gibson <david@gibson.dropbear.id.au>
>
> A while back, we introduced the dma_addr_t type, which is supposed to
> be used for bus visible memory addresses.  At present, this is an
> alias for target_phys_addr_t, but this will change when we eventually
> add support for guest visible IOMMUs.
>
> There are some instances of target_phys_addr_t in the code now which
> should really be dma_addr_t, but can't be trivially converted due to
> missing features which this patch corrects.
>
>   * We add DMA_ADDR_BITS analogous to TARGET_PHYS_ADDR_BITS.  This is
>     important where we need to make a compile-time (#if) decision based
>     on the size of dma_addr_t.
>
>   * We add a new helper macro to create device properties which take a
>     dma_addr_t, currently an alias to DEFINE_PROP_TADDR().
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>   dma.h         |    1 +
>   hw/qdev-dma.h |   12 ++++++++++++
>   2 files changed, 13 insertions(+)
>   create mode 100644 hw/qdev-dma.h
>
> diff --git a/dma.h b/dma.h
> index 8c1ec8f..fe08b72 100644
> --- a/dma.h
> +++ b/dma.h
> @@ -31,6 +31,7 @@ struct QEMUSGList {
>   #if defined(TARGET_PHYS_ADDR_BITS)
>   typedef target_phys_addr_t dma_addr_t;
>
> +#define DMA_ADDR_BITS TARGET_PHYS_ADDR_BITS
>   #define DMA_ADDR_FMT TARGET_FMT_plx
>
>   struct ScatterGatherEntry {
> diff --git a/hw/qdev-dma.h b/hw/qdev-dma.h
> new file mode 100644
> index 0000000..f0ff558
> --- /dev/null
> +++ b/hw/qdev-dma.h
> @@ -0,0 +1,12 @@
> +/*
> + * Support for dma_addr_t typed properties
> + *
> + * Copyright (C) 2012 David Gibson, IBM Corporation.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +#include "qdev-addr.h"
> +
> +#define DEFINE_PROP_DMAADDR(_n, _s, _f, _d)                               \
> +    DEFINE_PROP_TADDR(_n, _s, _f, _d)

Why not make life easy and fix dma_addr_t to 64-bit?
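
A minimal sketch of what that could look like, assuming a fixed-width
definition along these lines (the format macro and the helper
format_dma_addr() are illustrative, not taken from the tree):

```c
#include <assert.h>
#include <inttypes.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: bus addresses are always 64-bit, regardless of target. */
typedef uint64_t dma_addr_t;
#define DMA_ADDR_BITS 64
#define DMA_ADDR_FMT "%" PRIx64

/* With a fixed width, size checks no longer depend on the target. */
static_assert(sizeof(dma_addr_t) * CHAR_BIT == DMA_ADDR_BITS,
              "dma_addr_t must match DMA_ADDR_BITS");

/* Hypothetical helper: print a bus address using the fixed format. */
static int format_dma_addr(char *buf, size_t len, dma_addr_t addr)
{
    return snprintf(buf, len, DMA_ADDR_FMT, addr);
}
```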

Regards,

Anthony Liguori


* Re: [Qemu-devel] [PATCH 02/13] Implement cpu_physical_memory_set()
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 02/13] Implement cpu_physical_memory_set() Benjamin Herrenschmidt
@ 2012-06-20 21:15   ` Anthony Liguori
  2012-06-20 21:30     ` Benjamin Herrenschmidt
                       ` (2 more replies)
  0 siblings, 3 replies; 63+ messages in thread
From: Anthony Liguori @ 2012-06-20 21:15 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: qemu-devel, David Gibson

On 06/19/2012 01:39 AM, Benjamin Herrenschmidt wrote:
> From: David Gibson <david@gibson.dropbear.id.au>
>
> This patch adds cpu_physical_memory_set() function.  This is equivalent to
> calling cpu_physical_memory_write() with a buffer filled with a character,
> ie, a memset of target memory.
>
> It uses a small temporary buffer on the stack.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Why should this be in the core API?  Shouldn't this be a helper on top of the 
DMA API?

Regards,

Anthony Liguori

> ---
>   cpu-common.h |    1 +
>   exec.c       |   15 +++++++++++++++
>   2 files changed, 16 insertions(+)
>
> diff --git a/cpu-common.h b/cpu-common.h
> index 1fe3280..8d3596a 100644
> --- a/cpu-common.h
> +++ b/cpu-common.h
> @@ -53,6 +53,7 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev);
>
>   void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>                               int len, int is_write);
> +void cpu_physical_memory_set(target_phys_addr_t addr, uint8_t c, int len);
>   static inline void cpu_physical_memory_read(target_phys_addr_t addr,
>                                               void *buf, int len)
>   {
> diff --git a/exec.c b/exec.c
> index b5d6885..cfd7008 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -3601,6 +3601,21 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
>       }
>   }
>
> +void cpu_physical_memory_set(target_phys_addr_t addr, uint8_t c, int len)
> +{
> +#define FILLBUF_SIZE 512
> +    uint8_t fillbuf[FILLBUF_SIZE];
> +    int l;
> +
> +    memset(fillbuf, c, FILLBUF_SIZE);
> +    while (len > 0) {
> +        l = len < FILLBUF_SIZE ? len : FILLBUF_SIZE;
> +        cpu_physical_memory_rw(addr, fillbuf, l, true);
> +        len -= l;
> +        addr += l;
> +    }
> +}
> +
>   /* used for ROM loading : can write in RAM and ROM */
>   void cpu_physical_memory_write_rom(target_phys_addr_t addr,
>                                      const uint8_t *buf, int len)
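
For reference, the chunked-fill technique in this patch can be exercised
standalone with a sketch like the one below.  The toy_ram array and
toy_memory_write() are stand-ins I am assuming in place of guest memory and
cpu_physical_memory_rw(); they are not qemu interfaces.  Each iteration
advances both the remaining length and the address by the size of the chunk
just written (l).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_RAM_SIZE 4096
#define FILLBUF_SIZE 512

static uint8_t toy_ram[TOY_RAM_SIZE];   /* stand-in for guest memory */

/* Stand-in for cpu_physical_memory_rw() in write mode. */
static void toy_memory_write(uint64_t addr, const uint8_t *buf, int len)
{
    assert(len >= 0 && addr + (uint64_t)len <= TOY_RAM_SIZE);
    memcpy(&toy_ram[addr], buf, (size_t)len);
}

/* memset of "guest" memory using a small temporary buffer on the stack. */
static void toy_memory_set(uint64_t addr, uint8_t c, int len)
{
    uint8_t fillbuf[FILLBUF_SIZE];
    int l;

    memset(fillbuf, c, FILLBUF_SIZE);
    while (len > 0) {
        l = len < FILLBUF_SIZE ? len : FILLBUF_SIZE;
        toy_memory_write(addr, fillbuf, l);
        len -= l;    /* advance by the chunk just written */
        addr += l;
    }
}
```

A fill longer than FILLBUF_SIZE is the interesting test case, since it is
the only one that goes around the loop more than once.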


* Re: [Qemu-devel] [PATCH 03/13] iommu: Add universal DMA helper functions
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 03/13] iommu: Add universal DMA helper functions Benjamin Herrenschmidt
@ 2012-06-20 21:16   ` Anthony Liguori
  2012-06-20 21:32     ` Michael S. Tsirkin
                       ` (4 more replies)
  0 siblings, 5 replies; 63+ messages in thread
From: Anthony Liguori @ 2012-06-20 21:16 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Eduard - Gabriel Munteanu, Richard Henderson, Michael S. Tsirkin,
	qemu-devel, David Gibson

On 06/19/2012 01:39 AM, Benjamin Herrenschmidt wrote:
> From: David Gibson <david@gibson.dropbear.id.au>
>
> Not that long ago, every device implementation using DMA directly
> accessed guest memory using cpu_physical_memory_*().  This meant that
> adding support for a guest visible IOMMU would require changing every
> one of these devices to go through IOMMU translation.
>
> Shortly before qemu 1.0, I made a start on fixing this by providing
> helper functions for PCI DMA.  These are currently just stubs which
> call the direct access functions, but mean that an IOMMU can be
> implemented in one place, rather than for every PCI device.
>
> Clearly, this doesn't help for non PCI devices, which could also be
> IOMMU translated on some platforms.  It is also problematic for the
> devices which have both PCI and non-PCI version (e.g. OHCI, AHCI) - we
> cannot use the pci_dma_*() functions, because they assume the
> presence of a PCIDevice, but we don't want to have to check between
> pci_dma_*() and cpu_physical_memory_*() every time we do a DMA in the
> device code.
>
> This patch makes the first step on addressing both these problems, by
> introducing new (stub) dma helper functions which can be used for any
> DMA capable device.
>
> These dma functions take a DMAContext *, a new (currently empty)
> variable describing the DMA address space in which the operation is to
> take place.  NULL indicates untranslated DMA directly into guest
> physical address space.  The intention is that in future non-NULL
> values will give information about any necessary IOMMU translation.
>
> DMA using devices must obtain a DMAContext (or, potentially, contexts)
> from their bus or platform.  For now this patch just converts the PCI
> wrappers to be implemented in terms of the universal wrappers,
> converting other drivers can take place over time.
>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> Cc: Richard Henderson <rth@twiddle.net>
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>   dma.h         |  100 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   hw/pci.h      |   21 ++++++------
>   qemu-common.h |    1 +
>   3 files changed, 113 insertions(+), 9 deletions(-)
>
> diff --git a/dma.h b/dma.h
> index fe08b72..4449a0c 100644
> --- a/dma.h
> +++ b/dma.h
> @@ -34,6 +34,106 @@ typedef target_phys_addr_t dma_addr_t;
>   #define DMA_ADDR_BITS TARGET_PHYS_ADDR_BITS
>   #define DMA_ADDR_FMT TARGET_FMT_plx
>
> +/* Checks that the given range of addresses is valid for DMA.  This is
> + * useful for certain cases, but usually you should just use
> + * dma_memory_{read,write}() and check for errors */
> +static inline bool dma_memory_valid(DMAContext *dma, dma_addr_t addr,
> +                                    dma_addr_t len, DMADirection dir)
> +{
> +    /* Stub version, with no iommu we assume all bus addresses are valid */
> +    return true;
> +}
> +
> +static inline int dma_memory_rw(DMAContext *dma, dma_addr_t addr,
> +                                void *buf, dma_addr_t len, DMADirection dir)
> +{
> +    /* Stub version when we have no iommu support */
> +    cpu_physical_memory_rw(addr, buf, (target_phys_addr_t)len,
> +                           dir == DMA_DIRECTION_FROM_DEVICE);
> +    return 0;
> +}
> +
> +static inline int dma_memory_read(DMAContext *dma, dma_addr_t addr,
> +                                  void *buf, dma_addr_t len)
> +{
> +    return dma_memory_rw(dma, addr, buf, len, DMA_DIRECTION_TO_DEVICE);
> +}
> +
> +static inline int dma_memory_write(DMAContext *dma, dma_addr_t addr,
> +                                   const void *buf, dma_addr_t len)
> +{
> +    return dma_memory_rw(dma, addr, (void *)buf, len,
> +                         DMA_DIRECTION_FROM_DEVICE);
> +}
> +
> +static inline int dma_memory_set(DMAContext *dma, dma_addr_t addr,
> +                                 uint8_t c, dma_addr_t len)
> +{
> +    /* Stub version when we have no iommu support */
> +    cpu_physical_memory_set(addr, c, len);
> +    return 0;
> +}
> +
> +static inline void *dma_memory_map(DMAContext *dma,
> +                                   dma_addr_t addr, dma_addr_t *len,
> +                                   DMADirection dir)
> +{
> +    target_phys_addr_t xlen = *len;
> +    void *p;
> +
> +    p = cpu_physical_memory_map(addr, &xlen,
> +                                dir == DMA_DIRECTION_FROM_DEVICE);
> +    *len = xlen;
> +    return p;
> +}
> +
> +static inline void dma_memory_unmap(DMAContext *dma,
> +                                    void *buffer, dma_addr_t len,
> +                                    DMADirection dir, dma_addr_t access_len)
> +{
> +    return cpu_physical_memory_unmap(buffer, (target_phys_addr_t)len,
> +                                     dir == DMA_DIRECTION_FROM_DEVICE,
> +                                     access_len);
> +}
> +
> +#define DEFINE_LDST_DMA(_lname, _sname, _bits, _end) \
> +    static inline uint##_bits##_t ld##_lname##_##_end##_dma(DMAContext *dma, \
> +                                                            dma_addr_t addr) \
> +    {                                                                   \
> +        uint##_bits##_t val;                                            \
> +        dma_memory_read(dma, addr, &val, (_bits) / 8);                  \
> +        return _end##_bits##_to_cpu(val);                               \
> +    }                                                                   \
> +    static inline void st##_sname##_##_end##_dma(DMAContext *dma,       \
> +                                                 dma_addr_t addr,       \
> +                                                 uint##_bits##_t val)   \
> +    {                                                                   \
> +        val = cpu_to_##_end##_bits(val);                                \
> +        dma_memory_write(dma, addr, &val, (_bits) / 8);                 \
> +    }
> +
> +static inline uint8_t ldub_dma(DMAContext *dma, dma_addr_t addr)
> +{
> +    uint8_t val;
> +
> +    dma_memory_read(dma, addr, &val, 1);
> +    return val;
> +}
> +
> +static inline void stb_dma(DMAContext *dma, dma_addr_t addr, uint8_t val)
> +{
> +    dma_memory_write(dma, addr, &val, 1);
> +}
> +
> +DEFINE_LDST_DMA(uw, w, 16, le);
> +DEFINE_LDST_DMA(l, l, 32, le);
> +DEFINE_LDST_DMA(q, q, 64, le);
> +DEFINE_LDST_DMA(uw, w, 16, be);
> +DEFINE_LDST_DMA(l, l, 32, be);
> +DEFINE_LDST_DMA(q, q, 64, be);
> +
> +#undef DEFINE_LDST_DMA
> +
>   struct ScatterGatherEntry {
>       dma_addr_t base;
>       dma_addr_t len;
> diff --git a/hw/pci.h b/hw/pci.h
> index 7f223c0..ee669d9 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -558,10 +558,16 @@ static inline uint32_t pci_config_size(const PCIDevice *d)
>   }
>
>   /* DMA access functions */
> +static inline DMAContext *pci_dma_context(PCIDevice *dev)
> +{
> +    /* Stub for when we have no PCI iommu support */
> +    return NULL;
> +}

Why is all of this stuff static inline?

> +
>   static inline int pci_dma_rw(PCIDevice *dev, dma_addr_t addr,
>                                void *buf, dma_addr_t len, DMADirection dir)
>   {
> -    cpu_physical_memory_rw(addr, buf, len, dir == DMA_DIRECTION_FROM_DEVICE);
> +    dma_memory_rw(pci_dma_context(dev), addr, buf, len, dir);
>       return 0;
>   }
>
> @@ -581,12 +587,12 @@ static inline int pci_dma_write(PCIDevice *dev, dma_addr_t addr,
>       static inline uint##_bits##_t ld##_l##_pci_dma(PCIDevice *dev,      \
>                                                      dma_addr_t addr)     \
>       {                                                                   \
> -        return ld##_l##_phys(addr);                                     \
> +        return ld##_l##_dma(pci_dma_context(dev), addr);                \
>       }                                                                   \
>       static inline void st##_s##_pci_dma(PCIDevice *dev,                 \
> -                          dma_addr_t addr, uint##_bits##_t val)         \
> +                                        dma_addr_t addr, uint##_bits##_t val) \
>       {                                                                   \
> -        st##_s##_phys(addr, val);                                       \
> +        st##_s##_dma(pci_dma_context(dev), addr, val);                  \
>       }
>
>   PCI_DMA_DEFINE_LDST(ub, b, 8);
> @@ -602,19 +608,16 @@ PCI_DMA_DEFINE_LDST(q_be, q_be, 64);
>   static inline void *pci_dma_map(PCIDevice *dev, dma_addr_t addr,
>                                   dma_addr_t *plen, DMADirection dir)
>   {
> -    target_phys_addr_t len = *plen;
>       void *buf;
>
> -    buf = cpu_physical_memory_map(addr, &len, dir == DMA_DIRECTION_FROM_DEVICE);
> -    *plen = len;
> +    buf = dma_memory_map(pci_dma_context(dev), addr, plen, dir);
>       return buf;
>   }
>
>   static inline void pci_dma_unmap(PCIDevice *dev, void *buffer, dma_addr_t len,
>                                    DMADirection dir, dma_addr_t access_len)
>   {
> -    cpu_physical_memory_unmap(buffer, len, dir == DMA_DIRECTION_FROM_DEVICE,
> -                              access_len);
> +    dma_memory_unmap(pci_dma_context(dev), buffer, len, dir, access_len);
>   }
>
>   static inline void pci_dma_sglist_init(QEMUSGList *qsg, PCIDevice *dev,
> diff --git a/qemu-common.h b/qemu-common.h
> index 8f87e41..80026af 100644
> --- a/qemu-common.h
> +++ b/qemu-common.h
> @@ -264,6 +264,7 @@ typedef struct EventNotifier EventNotifier;
>   typedef struct VirtIODevice VirtIODevice;
>   typedef struct QEMUSGList QEMUSGList;
>   typedef struct SHPCDevice SHPCDevice;
> +typedef struct DMAContext DMAContext;

Please don't put this in qemu-common.h.  Stick it in a dma-specific header.

Regards,

Anthony Liguori

>   typedef uint64_t pcibus_t;
>


* Re: [Qemu-devel] [PATCH 04/13] usb-ohci: Use universal DMA helper functions
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 04/13] usb-ohci: Use " Benjamin Herrenschmidt
@ 2012-06-20 21:18   ` Anthony Liguori
  2012-06-20 21:36     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 63+ messages in thread
From: Anthony Liguori @ 2012-06-20 21:18 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Michael S. Tsirkin, Gerd Hoffmann, qemu-devel, David Gibson

On 06/19/2012 01:39 AM, Benjamin Herrenschmidt wrote:
> From: David Gibson <david@gibson.dropbear.id.au>
>
> The OHCI device emulation can provide both PCI and SysBus OHCI
> implementations.  Because of this, it was not previously converted to
> use the PCI DMA helper functions.
>
> This patch converts it to use the new universal DMA helper functions.
> In the PCI case, it obtains its DMAContext from pci_dma_context(), in
> the SysBus case, it uses NULL - i.e. assumes for now that there will
> be no IOMMU translation for a SysBus OHCI.
>
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


So...  the DMA api is designed to allow for partial result returns which I 
presume an implementation would use as a simplification.

But none of these callers actually check the return code?

Either errors are important and we need to adjust callers to check them or
errors aren't important and we should return void.
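
As a rough sketch of the first option (callers actually checking the
result), under the assumption of a read helper that returns a negative
value on failure.  toy_dma_read(), toy_get_dwords() and the 256-byte window
are hypothetical stand-ins, not the functions in this patch, and byte-order
conversion is left out to keep the example short:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_DMA_WINDOW 256   /* hypothetical: only this much is mapped */

static uint8_t toy_guest_mem[TOY_DMA_WINDOW];

/* Returns 0 on success, -1 if the access falls outside the mapped window. */
static int toy_dma_read(uint64_t addr, void *buf, uint64_t len)
{
    if (addr > TOY_DMA_WINDOW || len > TOY_DMA_WINDOW - addr) {
        return -1;
    }
    memcpy(buf, &toy_guest_mem[addr], (size_t)len);
    return 0;
}

/* Caller in the style of get_dwords(): propagate the failure instead of
 * ignoring it.  Returns 1 on success, 0 on the first failed access. */
static int toy_get_dwords(uint64_t addr, uint32_t *buf, int num)
{
    int i;

    assert(buf != NULL);
    for (i = 0; i < num; i++, buf++, addr += sizeof(*buf)) {
        if (toy_dma_read(addr, buf, sizeof(*buf)) < 0) {
            return 0;
        }
    }
    return 1;
}
```

The alternative is to make the accessors return void, in which case a
helper like toy_get_dwords() would have nothing to propagate.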

Why leave pci accessors and not implement usb_memory_rw() wrappers?

Regards,

Anthony Liguori

> ---
>   hw/usb/hcd-ohci.c |   93 +++++++++++++++++++++++++++++------------------------
>   1 file changed, 51 insertions(+), 42 deletions(-)
>
> diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
> index 1a1cc88..844e7ed 100644
> --- a/hw/usb/hcd-ohci.c
> +++ b/hw/usb/hcd-ohci.c
> @@ -31,7 +31,7 @@
>   #include "hw/usb.h"
>   #include "hw/pci.h"
>   #include "hw/sysbus.h"
> -#include "hw/qdev-addr.h"
> +#include "hw/qdev-dma.h"
>
>   //#define DEBUG_OHCI
>   /* Dump packet contents.  */
> @@ -62,6 +62,7 @@ typedef struct {
>       USBBus bus;
>       qemu_irq irq;
>       MemoryRegion mem;
> +    DMAContext *dma;
>       int num_ports;
>       const char *name;
>
> @@ -104,7 +105,7 @@ typedef struct {
>       uint32_t htest;
>
>       /* SM501 local memory offset */
> -    target_phys_addr_t localmem_base;
> +    dma_addr_t localmem_base;
>
>       /* Active packets.  */
>       uint32_t old_ctl;
> @@ -482,14 +483,14 @@ static void ohci_reset(void *opaque)
>
>   /* Get an array of dwords from main memory */
>   static inline int get_dwords(OHCIState *ohci,
> -                             uint32_t addr, uint32_t *buf, int num)
> +                             dma_addr_t addr, uint32_t *buf, int num)
>   {
>       int i;
>
>       addr += ohci->localmem_base;
>
>       for (i = 0; i<  num; i++, buf++, addr += sizeof(*buf)) {
> -        cpu_physical_memory_read(addr, buf, sizeof(*buf));
> +        dma_memory_read(ohci->dma, addr, buf, sizeof(*buf));
>           *buf = le32_to_cpu(*buf);
>       }
>
> @@ -498,7 +499,7 @@ static inline int get_dwords(OHCIState *ohci,
>
>   /* Put an array of dwords in to main memory */
>   static inline int put_dwords(OHCIState *ohci,
> -                             uint32_t addr, uint32_t *buf, int num)
> +                             dma_addr_t addr, uint32_t *buf, int num)
>   {
>       int i;
>
> @@ -506,7 +507,7 @@ static inline int put_dwords(OHCIState *ohci,
>
>       for (i = 0; i<  num; i++, buf++, addr += sizeof(*buf)) {
>           uint32_t tmp = cpu_to_le32(*buf);
> -        cpu_physical_memory_write(addr,&tmp, sizeof(tmp));
> +        dma_memory_write(ohci->dma, addr,&tmp, sizeof(tmp));
>       }
>
>       return 1;
> @@ -514,14 +515,14 @@ static inline int put_dwords(OHCIState *ohci,
>
>   /* Get an array of words from main memory */
>   static inline int get_words(OHCIState *ohci,
> -                            uint32_t addr, uint16_t *buf, int num)
> +                            dma_addr_t addr, uint16_t *buf, int num)
>   {
>       int i;
>
>       addr += ohci->localmem_base;
>
>       for (i = 0; i<  num; i++, buf++, addr += sizeof(*buf)) {
> -        cpu_physical_memory_read(addr, buf, sizeof(*buf));
> +        dma_memory_read(ohci->dma, addr, buf, sizeof(*buf));
>           *buf = le16_to_cpu(*buf);
>       }
>
> @@ -530,7 +531,7 @@ static inline int get_words(OHCIState *ohci,
>
>   /* Put an array of words in to main memory */
>   static inline int put_words(OHCIState *ohci,
> -                            uint32_t addr, uint16_t *buf, int num)
> +                            dma_addr_t addr, uint16_t *buf, int num)
>   {
>       int i;
>
> @@ -538,40 +539,40 @@ static inline int put_words(OHCIState *ohci,
>
>       for (i = 0; i<  num; i++, buf++, addr += sizeof(*buf)) {
>           uint16_t tmp = cpu_to_le16(*buf);
> -        cpu_physical_memory_write(addr,&tmp, sizeof(tmp));
> +        dma_memory_write(ohci->dma, addr,&tmp, sizeof(tmp));
>       }
>
>       return 1;
>   }
>
>   static inline int ohci_read_ed(OHCIState *ohci,
> -                               uint32_t addr, struct ohci_ed *ed)
> +                               dma_addr_t addr, struct ohci_ed *ed)
>   {
>       return get_dwords(ohci, addr, (uint32_t *)ed, sizeof(*ed)>>  2);
>   }
>
>   static inline int ohci_read_td(OHCIState *ohci,
> -                               uint32_t addr, struct ohci_td *td)
> +                               dma_addr_t addr, struct ohci_td *td)
>   {
>       return get_dwords(ohci, addr, (uint32_t *)td, sizeof(*td)>>  2);
>   }
>
>   static inline int ohci_read_iso_td(OHCIState *ohci,
> -                                   uint32_t addr, struct ohci_iso_td *td)
> +                                   dma_addr_t addr, struct ohci_iso_td *td)
>   {
>       return (get_dwords(ohci, addr, (uint32_t *)td, 4)&&
>               get_words(ohci, addr + 16, td->offset, 8));
>   }
>
>   static inline int ohci_read_hcca(OHCIState *ohci,
> -                                 uint32_t addr, struct ohci_hcca *hcca)
> +                                 dma_addr_t addr, struct ohci_hcca *hcca)
>   {
> -    cpu_physical_memory_read(addr + ohci->localmem_base, hcca, sizeof(*hcca));
> +    dma_memory_read(ohci->dma, addr + ohci->localmem_base, hcca, sizeof(*hcca));
>       return 1;
>   }
>
>   static inline int ohci_put_ed(OHCIState *ohci,
> -                              uint32_t addr, struct ohci_ed *ed)
> +                              dma_addr_t addr, struct ohci_ed *ed)
>   {
>       /* ed->tail is under control of the HCD.
>        * Since just ed->head is changed by HC, just write back this
> @@ -583,64 +584,63 @@ static inline int ohci_put_ed(OHCIState *ohci,
>   }
>
>   static inline int ohci_put_td(OHCIState *ohci,
> -                              uint32_t addr, struct ohci_td *td)
> +                              dma_addr_t addr, struct ohci_td *td)
>   {
>       return put_dwords(ohci, addr, (uint32_t *)td, sizeof(*td)>>  2);
>   }
>
>   static inline int ohci_put_iso_td(OHCIState *ohci,
> -                                  uint32_t addr, struct ohci_iso_td *td)
> +                                  dma_addr_t addr, struct ohci_iso_td *td)
>   {
>       return (put_dwords(ohci, addr, (uint32_t *)td, 4)&&
>               put_words(ohci, addr + 16, td->offset, 8));
>   }
>
>   static inline int ohci_put_hcca(OHCIState *ohci,
> -                                uint32_t addr, struct ohci_hcca *hcca)
> +                                dma_addr_t addr, struct ohci_hcca *hcca)
>   {
> -    cpu_physical_memory_write(addr + ohci->localmem_base + HCCA_WRITEBACK_OFFSET,
> -                              (char *)hcca + HCCA_WRITEBACK_OFFSET,
> -                              HCCA_WRITEBACK_SIZE);
> +    dma_memory_write(ohci->dma,
> +                     addr + ohci->localmem_base + HCCA_WRITEBACK_OFFSET,
> +                     (char *)hcca + HCCA_WRITEBACK_OFFSET,
> +                     HCCA_WRITEBACK_SIZE);
>       return 1;
>   }
>
>   /* Read/Write the contents of a TD from/to main memory.  */
>   static void ohci_copy_td(OHCIState *ohci, struct ohci_td *td,
> -                         uint8_t *buf, int len, int write)
> +                         uint8_t *buf, int len, DMADirection dir)
>   {
> -    uint32_t ptr;
> -    uint32_t n;
> +    dma_addr_t ptr, n;
>
>       ptr = td->cbp;
>       n = 0x1000 - (ptr&  0xfff);
>       if (n>  len)
>           n = len;
> -    cpu_physical_memory_rw(ptr + ohci->localmem_base, buf, n, write);
> +    dma_memory_rw(ohci->dma, ptr + ohci->localmem_base, buf, n, dir);
>       if (n == len)
>           return;
>       ptr = td->be&  ~0xfffu;
>       buf += n;
> -    cpu_physical_memory_rw(ptr + ohci->localmem_base, buf, len - n, write);
> +    dma_memory_rw(ohci->dma, ptr + ohci->localmem_base, buf, len - n, dir);
>   }
>
>   /* Read/Write the contents of an ISO TD from/to main memory.  */
>   static void ohci_copy_iso_td(OHCIState *ohci,
>                                uint32_t start_addr, uint32_t end_addr,
> -                             uint8_t *buf, int len, int write)
> +                             uint8_t *buf, int len, DMADirection dir)
>   {
> -    uint32_t ptr;
> -    uint32_t n;
> +    dma_addr_t ptr, n;
>
>       ptr = start_addr;
>       n = 0x1000 - (ptr&  0xfff);
>       if (n>  len)
>           n = len;
> -    cpu_physical_memory_rw(ptr + ohci->localmem_base, buf, n, write);
> +    dma_memory_rw(ohci->dma, ptr + ohci->localmem_base, buf, n, dir);
>       if (n == len)
>           return;
>       ptr = end_addr&  ~0xfffu;
>       buf += n;
> -    cpu_physical_memory_rw(ptr + ohci->localmem_base, buf, len - n, write);
> +    dma_memory_rw(ohci->dma, ptr + ohci->localmem_base, buf, len - n, dir);
>   }
>
>   static void ohci_process_lists(OHCIState *ohci, int completion);
> @@ -803,7 +803,8 @@ static int ohci_service_iso_td(OHCIState *ohci, struct ohci_ed *ed,
>       }
>
>       if (len&&  dir != OHCI_TD_DIR_IN) {
> -        ohci_copy_iso_td(ohci, start_addr, end_addr, ohci->usb_buf, len, 0);
> +        ohci_copy_iso_td(ohci, start_addr, end_addr, ohci->usb_buf, len,
> +                         DMA_DIRECTION_TO_DEVICE);
>       }
>
>       if (completion) {
> @@ -827,7 +828,8 @@ static int ohci_service_iso_td(OHCIState *ohci, struct ohci_ed *ed,
>       /* Writeback */
>       if (dir == OHCI_TD_DIR_IN&&  ret>= 0&&  ret<= len) {
>           /* IN transfer succeeded */
> -        ohci_copy_iso_td(ohci, start_addr, end_addr, ohci->usb_buf, ret, 1);
> +        ohci_copy_iso_td(ohci, start_addr, end_addr, ohci->usb_buf, ret,
> +                         DMA_DIRECTION_FROM_DEVICE);
>           OHCI_SET_BM(iso_td.offset[relative_frame_number], TD_PSW_CC,
>                       OHCI_CC_NOERROR);
>           OHCI_SET_BM(iso_td.offset[relative_frame_number], TD_PSW_SIZE, ret);
> @@ -971,7 +973,8 @@ static int ohci_service_td(OHCIState *ohci, struct ohci_ed *ed)
>                   pktlen = len;
>               }
>               if (!completion) {
> -                ohci_copy_td(ohci,&td, ohci->usb_buf, pktlen, 0);
> +                ohci_copy_td(ohci,&td, ohci->usb_buf, pktlen,
> +                             DMA_DIRECTION_TO_DEVICE);
>               }
>           }
>       }
> @@ -1021,7 +1024,8 @@ static int ohci_service_td(OHCIState *ohci, struct ohci_ed *ed)
>       }
>       if (ret>= 0) {
>           if (dir == OHCI_TD_DIR_IN) {
> -            ohci_copy_td(ohci,&td, ohci->usb_buf, ret, 1);
> +            ohci_copy_td(ohci,&td, ohci->usb_buf, ret,
> +                         DMA_DIRECTION_FROM_DEVICE);
>   #ifdef DEBUG_PACKET
>               DPRINTF("  data:");
>               for (i = 0; i<  ret; i++)
> @@ -1748,11 +1752,14 @@ static USBBusOps ohci_bus_ops = {
>   };
>
>   static int usb_ohci_init(OHCIState *ohci, DeviceState *dev,
> -                         int num_ports, uint32_t localmem_base,
> -                         char *masterbus, uint32_t firstport)
> +                         int num_ports, dma_addr_t localmem_base,
> +                         char *masterbus, uint32_t firstport,
> +                         DMAContext *dma)
>   {
>       int i;
>
> +    ohci->dma = dma;
> +
>       if (usb_frame_time == 0) {
>   #ifdef OHCI_TIME_WARP
>           usb_frame_time = get_ticks_per_sec();
> @@ -1817,7 +1824,8 @@ static int usb_ohci_initfn_pci(struct PCIDevice *dev)
>       ohci->pci_dev.config[PCI_INTERRUPT_PIN] = 0x01; /* interrupt pin A */
>
>       if (usb_ohci_init(&ohci->state,&dev->qdev, ohci->num_ports, 0,
> -                      ohci->masterbus, ohci->firstport) != 0) {
> +                      ohci->masterbus, ohci->firstport,
> +                      pci_dma_context(dev)) != 0) {
>           return -1;
>       }
>       ohci->state.irq = ohci->pci_dev.irq[0];
> @@ -1831,7 +1839,7 @@ typedef struct {
>       SysBusDevice busdev;
>       OHCIState ohci;
>       uint32_t num_ports;
> -    target_phys_addr_t dma_offset;
> +    dma_addr_t dma_offset;
>   } OHCISysBusState;
>
>   static int ohci_init_pxa(SysBusDevice *dev)
> @@ -1839,7 +1847,8 @@ static int ohci_init_pxa(SysBusDevice *dev)
>       OHCISysBusState *s = FROM_SYSBUS(OHCISysBusState, dev);
>
>       /* Cannot fail as we pass NULL for masterbus */
> -    usb_ohci_init(&s->ohci,&dev->qdev, s->num_ports, s->dma_offset, NULL, 0);
> +    usb_ohci_init(&s->ohci,&dev->qdev, s->num_ports, s->dma_offset, NULL, 0,
> +                  NULL);
>       sysbus_init_irq(dev,&s->ohci.irq);
>       sysbus_init_mmio(dev,&s->ohci.mem);
>
> @@ -1875,7 +1884,7 @@ static TypeInfo ohci_pci_info = {
>
>   static Property ohci_sysbus_properties[] = {
>       DEFINE_PROP_UINT32("num-ports", OHCISysBusState, num_ports, 3),
> -    DEFINE_PROP_TADDR("dma-offset", OHCISysBusState, dma_offset, 3),
> +    DEFINE_PROP_DMAADDR("dma-offset", OHCISysBusState, dma_offset, 3),
>       DEFINE_PROP_END_OF_LIST(),
>   };
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH 05/13] iommu: Make sglists and dma_bdrv helpers use new universal DMA helpers
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 05/13] iommu: Make sglists and dma_bdrv helpers use new universal DMA helpers Benjamin Herrenschmidt
@ 2012-06-20 21:21   ` Anthony Liguori
  2012-06-20 21:37     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 63+ messages in thread
From: Anthony Liguori @ 2012-06-20 21:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Kevin Wolf, Paolo Bonzini, Michael S. Tsirkin, qemu-devel, David Gibson

On 06/19/2012 01:39 AM, Benjamin Herrenschmidt wrote:
> From: David Gibson <david@gibson.dropbear.id.au>
>
> dma-helpers.c contains a number of helper functions for doing
> scatter/gather DMA, and various block device related DMA.  Currently,
> these directly access guest memory using cpu_physical_memory_*(),
> assuming no IOMMU translation.
>
> This patch updates this code to use the new universal DMA helper
> functions.  qemu_sglist_init() now takes a DMAContext * to describe
> the DMA address space in which the scatter/gather will take place.
>
> We minimally update the callers of qemu_sglist_init() to pass NULL
> (i.e. no translation, same as current behaviour).  Some of those
> callers should pass something else in some cases to allow proper IOMMU
> translation in future, but that will be fixed in later patches.
>
> Cc: Kevin Wolf <kwolf@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>   dma-helpers.c  |   24 ++++++++++++------------
>   dma.h          |    3 ++-
>   hw/ide/ahci.c  |    3 ++-
>   hw/ide/macio.c |    4 ++--
>   hw/pci.h       |    2 +-
>   5 files changed, 19 insertions(+), 17 deletions(-)
>
> diff --git a/dma-helpers.c b/dma-helpers.c
> index 7971a89..2dc4691 100644
> --- a/dma-helpers.c
> +++ b/dma-helpers.c
> @@ -10,12 +10,13 @@
>   #include "dma.h"
>   #include "trace.h"
>
> -void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint)
> +void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint, DMAContext *dma)
>   {
>       qsg->sg = g_malloc(alloc_hint * sizeof(ScatterGatherEntry));
>       qsg->nsg = 0;
>       qsg->nalloc = alloc_hint;
>       qsg->size = 0;
> +    qsg->dma = dma;
>   }
>
>   void qemu_sglist_add(QEMUSGList *qsg, dma_addr_t base, dma_addr_t len)
> @@ -74,10 +75,9 @@ static void dma_bdrv_unmap(DMAAIOCB *dbs)
>       int i;
>
>       for (i = 0; i<  dbs->iov.niov; ++i) {
> -        cpu_physical_memory_unmap(dbs->iov.iov[i].iov_base,
> -                                  dbs->iov.iov[i].iov_len,
> -                                  dbs->dir != DMA_DIRECTION_TO_DEVICE,
> -                                  dbs->iov.iov[i].iov_len);
> +        dma_memory_unmap(dbs->sg->dma, dbs->iov.iov[i].iov_base,
> +                         dbs->iov.iov[i].iov_len, dbs->dir,
> +                         dbs->iov.iov[i].iov_len);
>       }
>       qemu_iovec_reset(&dbs->iov);
>   }
> @@ -106,7 +106,7 @@ static void dma_complete(DMAAIOCB *dbs, int ret)
>   static void dma_bdrv_cb(void *opaque, int ret)
>   {
>       DMAAIOCB *dbs = (DMAAIOCB *)opaque;
> -    target_phys_addr_t cur_addr, cur_len;
> +    dma_addr_t cur_addr, cur_len;
>       void *mem;
>
>       trace_dma_bdrv_cb(dbs, ret);
> @@ -123,8 +123,7 @@ static void dma_bdrv_cb(void *opaque, int ret)
>       while (dbs->sg_cur_index<  dbs->sg->nsg) {
>           cur_addr = dbs->sg->sg[dbs->sg_cur_index].base + dbs->sg_cur_byte;
>           cur_len = dbs->sg->sg[dbs->sg_cur_index].len - dbs->sg_cur_byte;
> -        mem = cpu_physical_memory_map(cur_addr,&cur_len,
> -                                      dbs->dir != DMA_DIRECTION_TO_DEVICE);
> +        mem = dma_memory_map(dbs->sg->dma, cur_addr,&cur_len, dbs->dir);
>           if (!mem)
>               break;
>           qemu_iovec_add(&dbs->iov, mem, cur_len);
> @@ -209,7 +208,8 @@ BlockDriverAIOCB *dma_bdrv_write(BlockDriverState *bs,
>   }
>
>
> -static uint64_t dma_buf_rw(uint8_t *ptr, int32_t len, QEMUSGList *sg, bool to_dev)
> +static uint64_t dma_buf_rw(uint8_t *ptr, int32_t len, QEMUSGList *sg,
> +                           DMADirection dir)
>   {
>       uint64_t resid;
>       int sg_cur_index;
> @@ -220,7 +220,7 @@ static uint64_t dma_buf_rw(uint8_t *ptr, int32_t len, QEMUSGList *sg, bool to_de
>       while (len>  0) {
>           ScatterGatherEntry entry = sg->sg[sg_cur_index++];
>           int32_t xfer = MIN(len, entry.len);
> -        cpu_physical_memory_rw(entry.base, ptr, xfer, !to_dev);
> +        dma_memory_rw(sg->dma, entry.base, ptr, xfer, dir);

Again, you return an error but ignore it now.

At the very least, on error you should scrub the passed-in buffer to avoid
leaking data to the guest.

You can imagine a malicious guest programming the IOMMU with invalid mappings 
and then doing DMA operations in order to read memory from the host QEMU process.
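
A minimal sketch of the scrubbing idea, under stated assumptions: mock_dma_memory_read() is a hypothetical accessor that returns nonzero and leaves the destination untouched when the address has no valid mapping, and dma_read_scrubbed() is an invented wrapper name. Neither is part of the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical guest-visible window; addresses outside it have no mapping. */
#define MOCK_WINDOW_SIZE 32
static uint8_t mock_guest_mem[MOCK_WINDOW_SIZE];

/* Hypothetical DMA read from guest memory: 0 on success, -1 on a bad
 * mapping. On failure the destination buffer is left untouched. */
static int mock_dma_memory_read(uint64_t addr, void *buf, size_t len)
{
    if (addr > MOCK_WINDOW_SIZE || len > MOCK_WINDOW_SIZE - addr) {
        return -1;
    }
    memcpy(buf, mock_guest_mem + addr, len);
    return 0;
}

/* Read with scrubbing: on error, zero the buffer so whatever stale host
 * data it held cannot be passed on to the guest. */
static int dma_read_scrubbed(uint64_t addr, uint8_t *buf, size_t len)
{
    int ret;

    assert(buf != NULL);
    ret = mock_dma_memory_read(addr, buf, len);
    if (ret != 0) {
        memset(buf, 0, len);
    }
    return ret;
}
```

The error is still returned, so a caller that does check it can fail the request; the scrub only bounds the damage for callers that do not.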

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH 09/13] iommu: Add facility to cancel in-use dma memory maps
  2012-06-19  6:39 ` [Qemu-devel] [PATCH 09/13] iommu: Add facility to cancel in-use dma memory maps Benjamin Herrenschmidt
@ 2012-06-20 21:25   ` Anthony Liguori
  2012-06-20 21:52     ` Benjamin Herrenschmidt
  2012-06-22  3:18     ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 63+ messages in thread
From: Anthony Liguori @ 2012-06-20 21:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: qemu-devel, David Gibson

On 06/19/2012 01:39 AM, Benjamin Herrenschmidt wrote:
> From: David Gibson <david@gibson.dropbear.id.au>
>
> One new complication raised by IOMMU support over only handling DMA
> directly to physical addresses is handling the dma_memory_map() case
> (replacing cpu_physical_memory_map()) when the IOMMU translations for
> the IOVAs covered by such a map are invalidated or changed while the map
> is active.  This should never happen with correct guest software, but
> we do need to handle buggy guests.  This case might also occur during
> handovers between different guest software stages if the handover
> protocols aren't fully seamless.
>
> The iommu implementation will have to wait for maps to be removed
> before it can "complete" an invalidation of a translation, which
> can take a long time. In order to make it possible to speed that
> process up, we add a "Cancel" callback to the map function which
> the clients can optionally provide.
>
> The core makes no use of that, but the iommu backend implementation
> may choose to keep track of maps and call the respective cancel
> callback whenever a translation within a map is removed, allowing
> the driver to do things like cancel async IOs etc.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>   dma-helpers.c |   49 ++++++++++++++++++++++++++++---------------------
>   dma.h         |   23 +++++++++++++++++++----
>   2 files changed, 47 insertions(+), 25 deletions(-)
>
> diff --git a/dma-helpers.c b/dma-helpers.c
> index b4ee827..6e6c7b3 100644
> --- a/dma-helpers.c
> +++ b/dma-helpers.c
> @@ -107,6 +107,28 @@ static void dma_complete(DMAAIOCB *dbs, int ret)
>       }
>   }
>
> +static void dma_aio_cancel(BlockDriverAIOCB *acb)
> +{
> +    DMAAIOCB *dbs = container_of(acb, DMAAIOCB, common);
> +
> +    trace_dma_aio_cancel(dbs);
> +
> +    if (dbs->acb) {
> +        BlockDriverAIOCB *acb = dbs->acb;
> +        dbs->acb = NULL;
> +        dbs->in_cancel = true;
> +        bdrv_aio_cancel(acb);
> +        dbs->in_cancel = false;
> +    }
> +    dbs->common.cb = NULL;
> +    dma_complete(dbs, 0);

So this cancellation stuff is hopelessly broken.

It's simply not possible to fully cancel pending DMA in a synchronous callback.

Indeed, bdrv_aio_cancel ends up having a nasty little loop in it:

     if (active) {
         /* fail safe: if the aio could not be canceled, we wait for
            it */
         while (qemu_paio_error(acb) == EINPROGRESS)
             ;
     }

That spins at 100% CPU.

Can you explain when DMA cancellation really happens and what the effect would 
be if we simply ignored it?

Can we do something more clever like use an asynchronous callback to handle 
flushing active DMA mappings?

There's just no way a callback like this is going to work.
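
One possible shape for the asynchronous alternative, sketched with entirely hypothetical names (MockIOMMU, mock_iommu_invalidate() and friends are not from the series): the IOMMU counts active maps, and an invalidation request records a completion callback that fires only when the last map is released, so nothing has to spin or cancel synchronously.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*MockInvalidateDone)(void *opaque);

/* Hypothetical IOMMU state: a count of live maps plus at most one
 * pending invalidation completion. */
typedef struct MockIOMMU {
    int active_maps;
    MockInvalidateDone done;   /* non-NULL while an invalidation is pending */
    void *opaque;
} MockIOMMU;

static void mock_iommu_map(MockIOMMU *iommu)
{
    iommu->active_maps++;
}

/* Request an invalidation. Completes immediately if nothing is mapped,
 * otherwise defers completion until the last unmap. */
static void mock_iommu_invalidate(MockIOMMU *iommu,
                                  MockInvalidateDone done, void *opaque)
{
    assert(done != NULL);
    if (iommu->active_maps == 0) {
        done(opaque);
        return;
    }
    iommu->done = done;
    iommu->opaque = opaque;
}

static void mock_iommu_unmap(MockIOMMU *iommu)
{
    assert(iommu->active_maps > 0);
    iommu->active_maps--;
    if (iommu->active_maps == 0 && iommu->done != NULL) {
        MockInvalidateDone done = iommu->done;

        iommu->done = NULL;
        done(iommu->opaque);
    }
}

/* Example completion callback: bumps a counter passed through opaque. */
static void mock_count_done(void *opaque)
{
    (*(int *)opaque)++;
}
```

This only models the bookkeeping. Whether a guest-visible invalidation may legitimately complete late is exactly the open question raised above, and the sketch does not answer it.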

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH 01/13] Better support for dma_addr_t variables
  2012-06-20 21:14   ` Anthony Liguori
@ 2012-06-20 21:29     ` Benjamin Herrenschmidt
  2012-06-21  1:44       ` David Gibson
  2012-06-20 22:26     ` Peter Maydell
  2012-06-22  1:58     ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-20 21:29 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, David Gibson

On Wed, 2012-06-20 at 16:14 -0500, Anthony Liguori wrote:

> Why not make life easy and fix dma_addr_t to 64-bit?

No opinion on my side, that's from the original patch series, I suppose
the goal was to avoid the overhead/bloat on 32-bit only
platforms/targets.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH 02/13] Implement cpu_physical_memory_set()
  2012-06-20 21:15   ` Anthony Liguori
@ 2012-06-20 21:30     ` Benjamin Herrenschmidt
  2012-06-20 21:37       ` Anthony Liguori
  2012-06-21  1:45     ` David Gibson
  2012-06-22  1:58     ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-20 21:30 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, David Gibson

On Wed, 2012-06-20 at 16:15 -0500, Anthony Liguori wrote:
> On 06/19/2012 01:39 AM, Benjamin Herrenschmidt wrote:
> > From: David Gibson <david@gibson.dropbear.id.au>
> >
> > This patch adds cpu_physical_memory_set() function.  This is equivalent to
> > calling cpu_physical_memory_write() with a buffer filled with a character,
> > ie, a memset of target memory.
> >
> > It uses a small temporary buffer on the stack.
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> 
> Why should this be in the core API?  Shouldn't this be a helper on top of the 
> DMA API?

This comes from the original patch which hand-implemented the "set" by
reproducing the logic inside cpu_physical_memory_rw(). I turned it into a
wrapper on top of the latter based on (your?) previous reviews on this
list. I don't care enough to argue to keep it if you want it gone. We do
have a "clear" accessor in the PAPR vio dma accessors which is handy
but I could implement it locally.
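
For reference, the wrapper approach described in the patch (a memset of target memory done through the write path with a small temporary buffer on the stack) can be sketched as below. This is a standalone approximation, not the code from the series: mock_physical_memory_write(), mock_physical_memory_set(), the 16-byte chunk size and the backing array are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SET_CHUNK 16          /* assumed size of the on-stack buffer */
#define MOCK_RAM_SIZE 100
static uint8_t mock_ram[MOCK_RAM_SIZE];
static int mock_write_calls;  /* counts calls to the write path */

/* Hypothetical stand-in for the physical memory write path. */
static void mock_physical_memory_write(uint64_t addr, const void *buf,
                                       size_t len)
{
    assert(addr <= MOCK_RAM_SIZE && len <= MOCK_RAM_SIZE - addr);
    memcpy(mock_ram + addr, buf, len);
    mock_write_calls++;
}

/* Fill len bytes at addr with c by repeatedly writing one pre-filled
 * stack buffer, at most SET_CHUNK bytes per write. */
static void mock_physical_memory_set(uint64_t addr, uint8_t c, size_t len)
{
    uint8_t chunk[SET_CHUNK];

    memset(chunk, c, sizeof(chunk));
    while (len > 0) {
        size_t l = len < sizeof(chunk) ? len : sizeof(chunk);

        mock_physical_memory_write(addr, chunk, l);
        addr += l;
        len -= l;
    }
}
```

Because the helper is built only on the write call, it could sit on top of the DMA accessors just as easily as in the core API, which is the placement question asked above.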

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH 03/13] iommu: Add universal DMA helper functions
  2012-06-20 21:16   ` Anthony Liguori
@ 2012-06-20 21:32     ` Michael S. Tsirkin
  2012-06-20 21:38       ` Anthony Liguori
  2012-06-20 21:33     ` Benjamin Herrenschmidt
                       ` (3 subsequent siblings)
  4 siblings, 1 reply; 63+ messages in thread
From: Michael S. Tsirkin @ 2012-06-20 21:32 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Richard Henderson, Eduard - Gabriel Munteanu, qemu-devel, David Gibson

On Wed, Jun 20, 2012 at 04:16:47PM -0500, Anthony Liguori wrote:
> >diff --git a/hw/pci.h b/hw/pci.h
> >index 7f223c0..ee669d9 100644
> >--- a/hw/pci.h
> >+++ b/hw/pci.h
> >@@ -558,10 +558,16 @@ static inline uint32_t pci_config_size(const PCIDevice *d)
> >  }
> >
> >  /* DMA access functions */
> >+static inline DMAContext *pci_dma_context(PCIDevice *dev)
> >+{
> >+    /* Stub for when we have no PCI iommu support */
> >+    return NULL;
> >+}
> 
> Why is all of this stuff static inline?

Let's face it, most people don't need an IOMMU in their VM.
inline stubs help make double sure we are not adding
overhead for the sake of this niche case.
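
A small sketch of why an inline stub is cheap, assuming (as the series description suggests) that a NULL context means "no IOMMU translation". MockDMAContext, mock_pci_dma_context() and mock_dma_translate() are hypothetical names, and the offset-based translation is invented purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DMA context: here just a fixed offset applied to addresses. */
typedef struct MockDMAContext {
    uint64_t offset;
} MockDMAContext;

/* Stub for a build without IOMMU support: always "no context". */
static inline MockDMAContext *mock_pci_dma_context(void)
{
    return NULL;
}

/* With the stub inlined, the compiler can see dma == NULL at the call
 * site and reduce this to the untranslated fast path. */
static inline uint64_t mock_dma_translate(MockDMAContext *dma, uint64_t addr)
{
    assert(addr <= UINT64_MAX / 2);   /* keep the example free of wraparound */
    if (dma == NULL) {
        return addr;                  /* no IOMMU: identity mapping */
    }
    return addr + dma->offset;
}
```

With an out-of-line stub the NULL check would remain a runtime branch behind a function call; that is the overhead the inline form avoids.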

-- 
MST

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH 03/13] iommu: Add universal DMA helper functions
  2012-06-20 21:16   ` Anthony Liguori
  2012-06-20 21:32     ` Michael S. Tsirkin
@ 2012-06-20 21:33     ` Benjamin Herrenschmidt
  2012-06-20 21:40     ` Michael S. Tsirkin
                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-20 21:33 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Eduard - Gabriel Munteanu, Richard Henderson, Michael S. Tsirkin,
	qemu-devel, David Gibson

> >   /* DMA access functions */
> > +static inline DMAContext *pci_dma_context(PCIDevice *dev)
> > +{
> > +    /* Stub for when we have no PCI iommu support */
> > +    return NULL;
> > +}
> 
> Why is all of this stuff static inline?

Why not? Not doing so is gratuitous bloat & overhead...

> >   static inline void pci_dma_sglist_init(QEMUSGList *qsg, PCIDevice *dev,
> > diff --git a/qemu-common.h b/qemu-common.h
> > index 8f87e41..80026af 100644
> > --- a/qemu-common.h
> > +++ b/qemu-common.h
> > @@ -264,6 +264,7 @@ typedef struct EventNotifier EventNotifier;
> >   typedef struct VirtIODevice VirtIODevice;
> >   typedef struct QEMUSGList QEMUSGList;
> >   typedef struct SHPCDevice SHPCDevice;
> > +typedef struct DMAContext DMAContext;
> 
> Please don't put this in qemu-common.h.  Stick it in a dma-specific header.

Hrm, the followup ISA DMA patches from Jason Baron seem to have some
cleanups based on the fact that this is in qemu-common.h :-)

The other typedefs in there don't seem to have any more reason to be
there either, to be honest. I can try to move it, I don't care much :-)

dma.h sounds like the right place?

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH 04/13] usb-ohci: Use universal DMA helper functions
  2012-06-20 21:18   ` Anthony Liguori
@ 2012-06-20 21:36     ` Benjamin Herrenschmidt
  2012-06-20 21:40       ` Anthony Liguori
  0 siblings, 1 reply; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-20 21:36 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Michael S. Tsirkin, Gerd Hoffmann, qemu-devel, David Gibson

> > Cc: Gerd Hoffmann <kraxel@redhat.com>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> 
> 
> So...  the DMA api is designed to allow for partial result returns which I 
> presume an implementation would use as a simplification.
> 
> But none of these callers actually check the return code?
> 
> Either errors are important and we need to adjust callers to check them, or
> errors aren't important and we should return void.

Errors should matter and I agree callers should check them. This is the
series I inherited and I believe it does need improvements in those
areas (though it would be easier if the driver authors were the ones to
do those improvements).

> Why leave pci accessors and not implement usb_memory_rw() wrappers?

Well, "usb" is a bit too generic; EHCI and OHCI would each need to have
their own sets of wrappers. But yes, that's possible... is it really
worth it? There's nothing fundamentally wrong with using the dma_*
accessors.
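
For comparison, per-controller wrappers of the kind discussed would look roughly like this. It is a standalone sketch with hypothetical names (MockOHCIState, mock_ohci_memory_read(), mock_ohci_memory_write(), and the mock bus accessors); the only idea taken from the patch is that every OHCI access adds localmem_base before reaching the DMA layer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MOCK_BUS_SIZE 128
static uint8_t mock_bus_mem[MOCK_BUS_SIZE];

/* Hypothetical controller state: only the field the wrappers need. */
typedef struct MockOHCIState {
    uint64_t localmem_base;
} MockOHCIState;

/* Hypothetical bus accessors: 0 on success, -1 if out of range. */
static int mock_dma_read(uint64_t addr, void *buf, size_t len)
{
    if (addr > MOCK_BUS_SIZE || len > MOCK_BUS_SIZE - addr) {
        return -1;
    }
    memcpy(buf, mock_bus_mem + addr, len);
    return 0;
}

static int mock_dma_write(uint64_t addr, const void *buf, size_t len)
{
    if (addr > MOCK_BUS_SIZE || len > MOCK_BUS_SIZE - addr) {
        return -1;
    }
    memcpy(mock_bus_mem + addr, buf, len);
    return 0;
}

/* Controller-level wrappers: apply the local memory offset in one place
 * and pass the accessor's result straight back to the caller. */
static inline int mock_ohci_memory_read(MockOHCIState *s, uint64_t addr,
                                        void *buf, size_t len)
{
    assert(s != NULL);
    return mock_dma_read(addr + s->localmem_base, buf, len);
}

static inline int mock_ohci_memory_write(MockOHCIState *s, uint64_t addr,
                                         const void *buf, size_t len)
{
    assert(s != NULL);
    return mock_dma_write(addr + s->localmem_base, buf, len);
}
```

The gain is mostly cosmetic: the offset arithmetic and the choice of context live in one place, at the cost of one more layer per controller.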

Cheers,
Ben.

> Regards,
> 
> Anthony Liguori
> 
> > ---
> >   hw/usb/hcd-ohci.c |   93 +++++++++++++++++++++++++++++------------------------
> >   1 file changed, 51 insertions(+), 42 deletions(-)
> >
> > diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
> > index 1a1cc88..844e7ed 100644
> > --- a/hw/usb/hcd-ohci.c
> > +++ b/hw/usb/hcd-ohci.c
> > @@ -31,7 +31,7 @@
> >   #include "hw/usb.h"
> >   #include "hw/pci.h"
> >   #include "hw/sysbus.h"
> > -#include "hw/qdev-addr.h"
> > +#include "hw/qdev-dma.h"
> >
> >   //#define DEBUG_OHCI
> >   /* Dump packet contents.  */
> > @@ -62,6 +62,7 @@ typedef struct {
> >       USBBus bus;
> >       qemu_irq irq;
> >       MemoryRegion mem;
> > +    DMAContext *dma;
> >       int num_ports;
> >       const char *name;
> >
> > @@ -104,7 +105,7 @@ typedef struct {
> >       uint32_t htest;
> >
> >       /* SM501 local memory offset */
> > -    target_phys_addr_t localmem_base;
> > +    dma_addr_t localmem_base;
> >
> >       /* Active packets.  */
> >       uint32_t old_ctl;
> > @@ -482,14 +483,14 @@ static void ohci_reset(void *opaque)
> >
> >   /* Get an array of dwords from main memory */
> >   static inline int get_dwords(OHCIState *ohci,
> > -                             uint32_t addr, uint32_t *buf, int num)
> > +                             dma_addr_t addr, uint32_t *buf, int num)
> >   {
> >       int i;
> >
> >       addr += ohci->localmem_base;
> >
> >       for (i = 0; i<  num; i++, buf++, addr += sizeof(*buf)) {
> > -        cpu_physical_memory_read(addr, buf, sizeof(*buf));
> > +        dma_memory_read(ohci->dma, addr, buf, sizeof(*buf));
> >           *buf = le32_to_cpu(*buf);
> >       }
> >
> > @@ -498,7 +499,7 @@ static inline int get_dwords(OHCIState *ohci,
> >
> >   /* Put an array of dwords in to main memory */
> >   static inline int put_dwords(OHCIState *ohci,
> > -                             uint32_t addr, uint32_t *buf, int num)
> > +                             dma_addr_t addr, uint32_t *buf, int num)
> >   {
> >       int i;
> >
> > @@ -506,7 +507,7 @@ static inline int put_dwords(OHCIState *ohci,
> >
> >       for (i = 0; i<  num; i++, buf++, addr += sizeof(*buf)) {
> >           uint32_t tmp = cpu_to_le32(*buf);
> > -        cpu_physical_memory_write(addr,&tmp, sizeof(tmp));
> > +        dma_memory_write(ohci->dma, addr,&tmp, sizeof(tmp));
> >       }
> >
> >       return 1;
> > @@ -514,14 +515,14 @@ static inline int put_dwords(OHCIState *ohci,
> >
> >   /* Get an array of words from main memory */
> >   static inline int get_words(OHCIState *ohci,
> > -                            uint32_t addr, uint16_t *buf, int num)
> > +                            dma_addr_t addr, uint16_t *buf, int num)
> >   {
> >       int i;
> >
> >       addr += ohci->localmem_base;
> >
> >       for (i = 0; i<  num; i++, buf++, addr += sizeof(*buf)) {
> > -        cpu_physical_memory_read(addr, buf, sizeof(*buf));
> > +        dma_memory_read(ohci->dma, addr, buf, sizeof(*buf));
> >           *buf = le16_to_cpu(*buf);
> >       }
> >
> > @@ -530,7 +531,7 @@ static inline int get_words(OHCIState *ohci,
> >
> >   /* Put an array of words in to main memory */
> >   static inline int put_words(OHCIState *ohci,
> > -                            uint32_t addr, uint16_t *buf, int num)
> > +                            dma_addr_t addr, uint16_t *buf, int num)
> >   {
> >       int i;
> >
> > @@ -538,40 +539,40 @@ static inline int put_words(OHCIState *ohci,
> >
> >       for (i = 0; i<  num; i++, buf++, addr += sizeof(*buf)) {
> >           uint16_t tmp = cpu_to_le16(*buf);
> > -        cpu_physical_memory_write(addr,&tmp, sizeof(tmp));
> > +        dma_memory_write(ohci->dma, addr,&tmp, sizeof(tmp));
> >       }
> >
> >       return 1;
> >   }
> >
> >   static inline int ohci_read_ed(OHCIState *ohci,
> > -                               uint32_t addr, struct ohci_ed *ed)
> > +                               dma_addr_t addr, struct ohci_ed *ed)
> >   {
> >       return get_dwords(ohci, addr, (uint32_t *)ed, sizeof(*ed)>>  2);
> >   }
> >
> >   static inline int ohci_read_td(OHCIState *ohci,
> > -                               uint32_t addr, struct ohci_td *td)
> > +                               dma_addr_t addr, struct ohci_td *td)
> >   {
> >       return get_dwords(ohci, addr, (uint32_t *)td, sizeof(*td)>>  2);
> >   }
> >
> >   static inline int ohci_read_iso_td(OHCIState *ohci,
> > -                                   uint32_t addr, struct ohci_iso_td *td)
> > +                                   dma_addr_t addr, struct ohci_iso_td *td)
> >   {
> >       return (get_dwords(ohci, addr, (uint32_t *)td, 4)&&
> >               get_words(ohci, addr + 16, td->offset, 8));
> >   }
> >
> >   static inline int ohci_read_hcca(OHCIState *ohci,
> > -                                 uint32_t addr, struct ohci_hcca *hcca)
> > +                                 dma_addr_t addr, struct ohci_hcca *hcca)
> >   {
> > -    cpu_physical_memory_read(addr + ohci->localmem_base, hcca, sizeof(*hcca));
> > +    dma_memory_read(ohci->dma, addr + ohci->localmem_base, hcca, sizeof(*hcca));
> >       return 1;
> >   }
> >
> >   static inline int ohci_put_ed(OHCIState *ohci,
> > -                              uint32_t addr, struct ohci_ed *ed)
> > +                              dma_addr_t addr, struct ohci_ed *ed)
> >   {
> >       /* ed->tail is under control of the HCD.
> >        * Since just ed->head is changed by HC, just write back this
> > @@ -583,64 +584,63 @@ static inline int ohci_put_ed(OHCIState *ohci,
> >   }
> >
> >   static inline int ohci_put_td(OHCIState *ohci,
> > -                              uint32_t addr, struct ohci_td *td)
> > +                              dma_addr_t addr, struct ohci_td *td)
> >   {
> >       return put_dwords(ohci, addr, (uint32_t *)td, sizeof(*td)>>  2);
> >   }
> >
> >   static inline int ohci_put_iso_td(OHCIState *ohci,
> > -                                  uint32_t addr, struct ohci_iso_td *td)
> > +                                  dma_addr_t addr, struct ohci_iso_td *td)
> >   {
> >       return (put_dwords(ohci, addr, (uint32_t *)td, 4)&&
> >               put_words(ohci, addr + 16, td->offset, 8));
> >   }
> >
> >   static inline int ohci_put_hcca(OHCIState *ohci,
> > -                                uint32_t addr, struct ohci_hcca *hcca)
> > +                                dma_addr_t addr, struct ohci_hcca *hcca)
> >   {
> > -    cpu_physical_memory_write(addr + ohci->localmem_base + HCCA_WRITEBACK_OFFSET,
> > -                              (char *)hcca + HCCA_WRITEBACK_OFFSET,
> > -                              HCCA_WRITEBACK_SIZE);
> > +    dma_memory_write(ohci->dma,
> > +                     addr + ohci->localmem_base + HCCA_WRITEBACK_OFFSET,
> > +                     (char *)hcca + HCCA_WRITEBACK_OFFSET,
> > +                     HCCA_WRITEBACK_SIZE);
> >       return 1;
> >   }
> >
> >   /* Read/Write the contents of a TD from/to main memory.  */
> >   static void ohci_copy_td(OHCIState *ohci, struct ohci_td *td,
> > -                         uint8_t *buf, int len, int write)
> > +                         uint8_t *buf, int len, DMADirection dir)
> >   {
> > -    uint32_t ptr;
> > -    uint32_t n;
> > +    dma_addr_t ptr, n;
> >
> >       ptr = td->cbp;
> >       n = 0x1000 - (ptr&  0xfff);
> >       if (n>  len)
> >           n = len;
> > -    cpu_physical_memory_rw(ptr + ohci->localmem_base, buf, n, write);
> > +    dma_memory_rw(ohci->dma, ptr + ohci->localmem_base, buf, n, dir);
> >       if (n == len)
> >           return;
> >       ptr = td->be&  ~0xfffu;
> >       buf += n;
> > -    cpu_physical_memory_rw(ptr + ohci->localmem_base, buf, len - n, write);
> > +    dma_memory_rw(ohci->dma, ptr + ohci->localmem_base, buf, len - n, dir);
> >   }
> >
> >   /* Read/Write the contents of an ISO TD from/to main memory.  */
> >   static void ohci_copy_iso_td(OHCIState *ohci,
> >                                uint32_t start_addr, uint32_t end_addr,
> > -                             uint8_t *buf, int len, int write)
> > +                             uint8_t *buf, int len, DMADirection dir)
> >   {
> > -    uint32_t ptr;
> > -    uint32_t n;
> > +    dma_addr_t ptr, n;
> >
> >       ptr = start_addr;
> >       n = 0x1000 - (ptr&  0xfff);
> >       if (n>  len)
> >           n = len;
> > -    cpu_physical_memory_rw(ptr + ohci->localmem_base, buf, n, write);
> > +    dma_memory_rw(ohci->dma, ptr + ohci->localmem_base, buf, n, dir);
> >       if (n == len)
> >           return;
> >       ptr = end_addr&  ~0xfffu;
> >       buf += n;
> > -    cpu_physical_memory_rw(ptr + ohci->localmem_base, buf, len - n, write);
> > +    dma_memory_rw(ohci->dma, ptr + ohci->localmem_base, buf, len - n, dir);
> >   }
> >
> >   static void ohci_process_lists(OHCIState *ohci, int completion);
> > @@ -803,7 +803,8 @@ static int ohci_service_iso_td(OHCIState *ohci, struct ohci_ed *ed,
> >       }
> >
> >       if (len&&  dir != OHCI_TD_DIR_IN) {
> > -        ohci_copy_iso_td(ohci, start_addr, end_addr, ohci->usb_buf, len, 0);
> > +        ohci_copy_iso_td(ohci, start_addr, end_addr, ohci->usb_buf, len,
> > +                         DMA_DIRECTION_TO_DEVICE);
> >       }
> >
> >       if (completion) {
> > @@ -827,7 +828,8 @@ static int ohci_service_iso_td(OHCIState *ohci, struct ohci_ed *ed,
> >       /* Writeback */
> >       if (dir == OHCI_TD_DIR_IN&&  ret>= 0&&  ret<= len) {
> >           /* IN transfer succeeded */
> > -        ohci_copy_iso_td(ohci, start_addr, end_addr, ohci->usb_buf, ret, 1);
> > +        ohci_copy_iso_td(ohci, start_addr, end_addr, ohci->usb_buf, ret,
> > +                         DMA_DIRECTION_FROM_DEVICE);
> >           OHCI_SET_BM(iso_td.offset[relative_frame_number], TD_PSW_CC,
> >                       OHCI_CC_NOERROR);
> >           OHCI_SET_BM(iso_td.offset[relative_frame_number], TD_PSW_SIZE, ret);
> > @@ -971,7 +973,8 @@ static int ohci_service_td(OHCIState *ohci, struct ohci_ed *ed)
> >                   pktlen = len;
> >               }
> >               if (!completion) {
> > -                ohci_copy_td(ohci,&td, ohci->usb_buf, pktlen, 0);
> > +                ohci_copy_td(ohci,&td, ohci->usb_buf, pktlen,
> > +                             DMA_DIRECTION_TO_DEVICE);
> >               }
> >           }
> >       }
> > @@ -1021,7 +1024,8 @@ static int ohci_service_td(OHCIState *ohci, struct ohci_ed *ed)
> >       }
> >       if (ret>= 0) {
> >           if (dir == OHCI_TD_DIR_IN) {
> > -            ohci_copy_td(ohci,&td, ohci->usb_buf, ret, 1);
> > +            ohci_copy_td(ohci,&td, ohci->usb_buf, ret,
> > +                         DMA_DIRECTION_FROM_DEVICE);
> >   #ifdef DEBUG_PACKET
> >               DPRINTF("  data:");
> >               for (i = 0; i<  ret; i++)
> > @@ -1748,11 +1752,14 @@ static USBBusOps ohci_bus_ops = {
> >   };
> >
> >   static int usb_ohci_init(OHCIState *ohci, DeviceState *dev,
> > -                         int num_ports, uint32_t localmem_base,
> > -                         char *masterbus, uint32_t firstport)
> > +                         int num_ports, dma_addr_t localmem_base,
> > +                         char *masterbus, uint32_t firstport,
> > +                         DMAContext *dma)
> >   {
> >       int i;
> >
> > +    ohci->dma = dma;
> > +
> >       if (usb_frame_time == 0) {
> >   #ifdef OHCI_TIME_WARP
> >           usb_frame_time = get_ticks_per_sec();
> > @@ -1817,7 +1824,8 @@ static int usb_ohci_initfn_pci(struct PCIDevice *dev)
> >       ohci->pci_dev.config[PCI_INTERRUPT_PIN] = 0x01; /* interrupt pin A */
> >
> >       if (usb_ohci_init(&ohci->state,&dev->qdev, ohci->num_ports, 0,
> > -                      ohci->masterbus, ohci->firstport) != 0) {
> > +                      ohci->masterbus, ohci->firstport,
> > +                      pci_dma_context(dev)) != 0) {
> >           return -1;
> >       }
> >       ohci->state.irq = ohci->pci_dev.irq[0];
> > @@ -1831,7 +1839,7 @@ typedef struct {
> >       SysBusDevice busdev;
> >       OHCIState ohci;
> >       uint32_t num_ports;
> > -    target_phys_addr_t dma_offset;
> > +    dma_addr_t dma_offset;
> >   } OHCISysBusState;
> >
> >   static int ohci_init_pxa(SysBusDevice *dev)
> > @@ -1839,7 +1847,8 @@ static int ohci_init_pxa(SysBusDevice *dev)
> >       OHCISysBusState *s = FROM_SYSBUS(OHCISysBusState, dev);
> >
> >       /* Cannot fail as we pass NULL for masterbus */
> > -    usb_ohci_init(&s->ohci,&dev->qdev, s->num_ports, s->dma_offset, NULL, 0);
> > +    usb_ohci_init(&s->ohci,&dev->qdev, s->num_ports, s->dma_offset, NULL, 0,
> > +                  NULL);
> >       sysbus_init_irq(dev,&s->ohci.irq);
> >       sysbus_init_mmio(dev,&s->ohci.mem);
> >
> > @@ -1875,7 +1884,7 @@ static TypeInfo ohci_pci_info = {
> >
> >   static Property ohci_sysbus_properties[] = {
> >       DEFINE_PROP_UINT32("num-ports", OHCISysBusState, num_ports, 3),
> > -    DEFINE_PROP_TADDR("dma-offset", OHCISysBusState, dma_offset, 3),
> > +    DEFINE_PROP_DMAADDR("dma-offset", OHCISysBusState, dma_offset, 3),
> >       DEFINE_PROP_END_OF_LIST(),
> >   };
> >

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH 05/13] iommu: Make sglists and dma_bdrv helpers use new universal DMA helpers
  2012-06-20 21:21   ` Anthony Liguori
@ 2012-06-20 21:37     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-20 21:37 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Paolo Bonzini, Michael S. Tsirkin, qemu-devel, David Gibson

On Wed, 2012-06-20 at 16:21 -0500, Anthony Liguori wrote:

> Again, you return an error but ignore it now.
> 
> At the very least, on error you should scrub the passed-in buffer to avoid
> leaking data to the guest.
> 
> You can imagine a malicious guest programming the IOMMU with invalid mappings 
> and then doing DMA operations in order to read memory from the host QEMU process.

Cleaning the buffer is easy, I'll add that. Returning an error sounds
non-trivial with the current interface.
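
A minimal sketch of what scrubbing on error could look like, assuming a
translated read primitive that reports failure. The names demo_translated_read
and demo_dma_read_scrubbed and the small guest_mem array are hypothetical
stand-ins, not functions from this series:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for guest memory reachable through an IOMMU. */
#define GUEST_MEM_SIZE 64
static uint8_t guest_mem[GUEST_MEM_SIZE];

/* Hypothetical translated read: 0 on success, -1 if the range is unmapped. */
static int demo_translated_read(uint64_t addr, void *buf, size_t len)
{
    if (addr > GUEST_MEM_SIZE || len > GUEST_MEM_SIZE - addr) {
        return -1;
    }
    memcpy(buf, guest_mem + addr, len);
    return 0;
}

/* Read wrapper that never leaves stale host data in buf on failure. */
static int demo_dma_read_scrubbed(uint64_t addr, void *buf, size_t len)
{
    int ret;

    assert(buf != NULL);
    ret = demo_translated_read(addr, buf, len);
    if (ret != 0) {
        /* Zero the buffer so a later copy to the guest leaks nothing. */
        memset(buf, 0, len);
    }
    return ret;
}
```

The error is still returned, so a caller that does care can act on it; a
caller that ignores it at least ends up handing zeroes to the guest.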

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH 02/13] Implement cpu_physical_memory_set()
  2012-06-20 21:30     ` Benjamin Herrenschmidt
@ 2012-06-20 21:37       ` Anthony Liguori
  0 siblings, 0 replies; 63+ messages in thread
From: Anthony Liguori @ 2012-06-20 21:37 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: qemu-devel, David Gibson

On 06/20/2012 04:30 PM, Benjamin Herrenschmidt wrote:
> On Wed, 2012-06-20 at 16:15 -0500, Anthony Liguori wrote:
>> On 06/19/2012 01:39 AM, Benjamin Herrenschmidt wrote:
>>> From: David Gibson<david@gibson.dropbear.id.au>
>>>
>>> This patch adds cpu_physical_memory_set() function.  This is equivalent to
>>> calling cpu_physical_memory_write() with a buffer filled with a character,
>>> ie, a memset of target memory.
>>>
>>> It uses a small temporary buffer on the stack.
>>>
>>> Signed-off-by: David Gibson<david@gibson.dropbear.id.au>
>>> Signed-off-by: Benjamin Herrenschmidt<benh@kernel.crashing.org>
>>
>> Why should this be in the core API?  Shouldn't this be a helper on top of the
>> DMA API?
>
> This comes from the original patch, which hand-implemented the "set" by
> reproducing the logic inside cpu_physical_memory_rw(). I turned it into a
> wrapper on top of the latter based on (your?) previous reviews on this
> list. I don't care enough to argue to keep it if you want it gone. We do
> have "clear" accessors in the PAPR vio dma accessors, which is handy,
> but I could implement it locally.

I think it's better to just stick this with the other DMA helpers and not have a 
cpu_physical version.  People shouldn't use that API directly so not having it 
there encourages people to look elsewhere.
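
A rough sketch of a "set" helper layered on a DMA write primitive, using a
small on-stack fill buffer as the patch description mentions. The names
demo_dma_write and demo_dma_set, the 512-byte chunk size, and the flat
demo_mem array are assumptions made for illustration only:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flat target memory standing in for the DMA address space. */
#define DEMO_MEM_SIZE 4096
static uint8_t demo_mem[DEMO_MEM_SIZE];

/* Hypothetical write primitive: 0 on success, -1 if out of range. */
static int demo_dma_write(uint64_t addr, const void *buf, size_t len)
{
    assert(buf != NULL);
    if (addr > DEMO_MEM_SIZE || len > DEMO_MEM_SIZE - addr) {
        return -1;
    }
    memcpy(demo_mem + addr, buf, len);
    return 0;
}

#define DEMO_FILLBUF_SIZE 512

/* memset of target memory, built only on top of the write helper. */
static int demo_dma_set(uint64_t addr, uint8_t c, size_t len)
{
    uint8_t fillbuf[DEMO_FILLBUF_SIZE];

    memset(fillbuf, c, sizeof(fillbuf));
    while (len > 0) {
        /* Write at most one fill buffer per iteration. */
        size_t l = len < sizeof(fillbuf) ? len : sizeof(fillbuf);
        int ret = demo_dma_write(addr, fillbuf, l);

        if (ret != 0) {
            return ret;
        }
        addr += l;
        len -= l;
    }
    return 0;
}
```

Because the helper only calls the write primitive, it needs no knowledge of
how addresses are translated underneath.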

Regards,

Anthony Liguori

>
> Cheers,
> Ben.
>
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH 03/13] iommu: Add universal DMA helper functions
  2012-06-20 21:32     ` Michael S. Tsirkin
@ 2012-06-20 21:38       ` Anthony Liguori
  2012-06-20 21:42         ` Michael S. Tsirkin
  0 siblings, 1 reply; 63+ messages in thread
From: Anthony Liguori @ 2012-06-20 21:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Richard Henderson, Eduard - Gabriel Munteanu, qemu-devel, David Gibson

On 06/20/2012 04:32 PM, Michael S. Tsirkin wrote:
> On Wed, Jun 20, 2012 at 04:16:47PM -0500, Anthony Liguori wrote:
>>> diff --git a/hw/pci.h b/hw/pci.h
>>> index 7f223c0..ee669d9 100644
>>> --- a/hw/pci.h
>>> +++ b/hw/pci.h
>>> @@ -558,10 +558,16 @@ static inline uint32_t pci_config_size(const PCIDevice *d)
>>>   }
>>>
>>>   /* DMA access functions */
>>> +static inline DMAContext *pci_dma_context(PCIDevice *dev)
>>> +{
>>> +    /* Stub for when we have no PCI iommu support */
>>> +    return NULL;
>>> +}
>>
>> Why is all of this stuff static inline?
>
> Let's face it, most people don't need an IOMMU in their VM.
> inline stubs help make double sure we are not adding
> overhead for the sake of this niche case.

It also makes for an overly complex pci.h with no obvious performance justification.

Let's not prematurely optimize here.

Regards,

Anthony Liguori

>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH 03/13] iommu: Add universal DMA helper functions
  2012-06-20 21:16   ` Anthony Liguori
  2012-06-20 21:32     ` Michael S. Tsirkin
  2012-06-20 21:33     ` Benjamin Herrenschmidt
@ 2012-06-20 21:40     ` Michael S. Tsirkin
  2012-06-20 22:01       ` Anthony Liguori
  2012-06-21  1:48     ` David Gibson
  2012-06-22  2:02     ` Benjamin Herrenschmidt
  4 siblings, 1 reply; 63+ messages in thread
From: Michael S. Tsirkin @ 2012-06-20 21:40 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Richard Henderson, Eduard - Gabriel Munteanu, qemu-devel, David Gibson

On Wed, Jun 20, 2012 at 04:16:47PM -0500, Anthony Liguori wrote:
> >diff --git a/qemu-common.h b/qemu-common.h
> >index 8f87e41..80026af 100644
> >--- a/qemu-common.h
> >+++ b/qemu-common.h
> >@@ -264,6 +264,7 @@ typedef struct EventNotifier EventNotifier;
> >  typedef struct VirtIODevice VirtIODevice;
> >  typedef struct QEMUSGList QEMUSGList;
> >  typedef struct SHPCDevice SHPCDevice;
> >+typedef struct DMAContext DMAContext;
> 
> Please don't put this in qemu-common.h.  Stick it in a dma-specific header.

Weird.

The point of typedefs in qemu-common.h is so people can
use pointers to a type *without pulling in the relevant header*.
If we put a typedef in a specific header, it defeats the purpose.

It even used to say this somewhere, but I don't remember where.
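
As a small illustration of the forward-declaration argument (a sketch with
hypothetical Demo* names, not code from the series): a common header carries
only the typedef, users store and pass the pointer, and only the
implementation sees the struct layout.

```c
#include <assert.h>
#include <stddef.h>

/* What a common header would carry: a forward declaration only. */
typedef struct DemoContext DemoContext;

/* A device can store and pass the pointer without the struct layout. */
typedef struct DemoDevice {
    DemoContext *dma;
} DemoDevice;

static void demo_device_init(DemoDevice *dev, DemoContext *dma)
{
    assert(dev != NULL);
    dev->dma = dma; /* NULL is allowed here, meaning "no translation" */
}

/* Only the implementation file needs the full definition. */
struct DemoContext {
    int translations;
};

static int demo_context_count(const DemoContext *ctx)
{
    return ctx ? ctx->translations : 0;
}
```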

-- 
MST

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH 04/13] usb-ohci: Use universal DMA helper functions
  2012-06-20 21:36     ` Benjamin Herrenschmidt
@ 2012-06-20 21:40       ` Anthony Liguori
  2012-06-20 22:02         ` Benjamin Herrenschmidt
  2012-06-21  6:43         ` Gerd Hoffmann
  0 siblings, 2 replies; 63+ messages in thread
From: Anthony Liguori @ 2012-06-20 21:40 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Michael S. Tsirkin, Gerd Hoffmann, qemu-devel, David Gibson

On 06/20/2012 04:36 PM, Benjamin Herrenschmidt wrote:
>>> Cc: Gerd Hoffmann<kraxel@redhat.com>
>>> Cc: Michael S. Tsirkin<mst@redhat.com>
>>>
>>> Signed-off-by: David Gibson<david@gibson.dropbear.id.au>
>>> Signed-off-by: Benjamin Herrenschmidt<benh@kernel.crashing.org>
>>
>>
>> So...  the DMA api is designed to allow for partial result returns which I
>> presume an implementation would use as a simplification.
>>
>> But none of these callers actually check the return code?
>>
>> Either errors are important and we need to adjust callees to check them or
>> errors aren't important and we should return void.
>
> Errors should matter and I agree callers should check them. This is the
> series I inherited and I believe it does need improvements in those
> areas (though it would be easier if the driver authors were the ones to
> do those improvements).

Well, let's return void in the DMA methods and let the IOMMUs assert on error.
At least that will avoid surprises until someone decides they care enough about 
errors to touch all callers.

I think silently failing a memcpy() can potentially lead to a vulnerability so 
I'd rather avoid that.

>
>> Why leave pci accessors and not implement usb_memory_rw() wrappers?
>
> Well, "usb" is a bit too generic, ehci and ohci would each need to have
> their own sets of wrappers. But yes, that's possible... is it really
> worth it ? There's nothing fundamentally wrong with using the dma_*
> accessors.

So is using the pci accessors wrong?

I'm not saying you should go and convert every caller of the pci_ functions, I 
just want a clear policy on what interface devices should use.

Regards,

Anthony Liguori

>
> Cheers,
> Ben.
>
>> Regards,
>>
>> Anthony Liguori
>>
>>> ---
>>>    hw/usb/hcd-ohci.c |   93 +++++++++++++++++++++++++++++------------------------
>>>    1 file changed, 51 insertions(+), 42 deletions(-)
>>>
>>> diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
>>> index 1a1cc88..844e7ed 100644
>>> --- a/hw/usb/hcd-ohci.c
>>> +++ b/hw/usb/hcd-ohci.c
>>> @@ -31,7 +31,7 @@
>>>    #include "hw/usb.h"
>>>    #include "hw/pci.h"
>>>    #include "hw/sysbus.h"
>>> -#include "hw/qdev-addr.h"
>>> +#include "hw/qdev-dma.h"
>>>
>>>    //#define DEBUG_OHCI
>>>    /* Dump packet contents.  */
>>> @@ -62,6 +62,7 @@ typedef struct {
>>>        USBBus bus;
>>>        qemu_irq irq;
>>>        MemoryRegion mem;
>>> +    DMAContext *dma;
>>>        int num_ports;
>>>        const char *name;
>>>
>>> @@ -104,7 +105,7 @@ typedef struct {
>>>        uint32_t htest;
>>>
>>>        /* SM501 local memory offset */
>>> -    target_phys_addr_t localmem_base;
>>> +    dma_addr_t localmem_base;
>>>
>>>        /* Active packets.  */
>>>        uint32_t old_ctl;
>>> @@ -482,14 +483,14 @@ static void ohci_reset(void *opaque)
>>>
>>>    /* Get an array of dwords from main memory */
>>>    static inline int get_dwords(OHCIState *ohci,
>>> -                             uint32_t addr, uint32_t *buf, int num)
>>> +                             dma_addr_t addr, uint32_t *buf, int num)
>>>    {
>>>        int i;
>>>
>>>        addr += ohci->localmem_base;
>>>
>>>        for (i = 0; i<   num; i++, buf++, addr += sizeof(*buf)) {
>>> -        cpu_physical_memory_read(addr, buf, sizeof(*buf));
>>> +        dma_memory_read(ohci->dma, addr, buf, sizeof(*buf));
>>>            *buf = le32_to_cpu(*buf);
>>>        }
>>>
>>> @@ -498,7 +499,7 @@ static inline int get_dwords(OHCIState *ohci,
>>>
>>>    /* Put an array of dwords in to main memory */
>>>    static inline int put_dwords(OHCIState *ohci,
>>> -                             uint32_t addr, uint32_t *buf, int num)
>>> +                             dma_addr_t addr, uint32_t *buf, int num)
>>>    {
>>>        int i;
>>>
>>> @@ -506,7 +507,7 @@ static inline int put_dwords(OHCIState *ohci,
>>>
>>>        for (i = 0; i<   num; i++, buf++, addr += sizeof(*buf)) {
>>>            uint32_t tmp = cpu_to_le32(*buf);
>>> -        cpu_physical_memory_write(addr,&tmp, sizeof(tmp));
>>> +        dma_memory_write(ohci->dma, addr,&tmp, sizeof(tmp));
>>>        }
>>>
>>>        return 1;
>>> @@ -514,14 +515,14 @@ static inline int put_dwords(OHCIState *ohci,
>>>
>>>    /* Get an array of words from main memory */
>>>    static inline int get_words(OHCIState *ohci,
>>> -                            uint32_t addr, uint16_t *buf, int num)
>>> +                            dma_addr_t addr, uint16_t *buf, int num)
>>>    {
>>>        int i;
>>>
>>>        addr += ohci->localmem_base;
>>>
>>>        for (i = 0; i<   num; i++, buf++, addr += sizeof(*buf)) {
>>> -        cpu_physical_memory_read(addr, buf, sizeof(*buf));
>>> +        dma_memory_read(ohci->dma, addr, buf, sizeof(*buf));
>>>            *buf = le16_to_cpu(*buf);
>>>        }
>>>
>>> @@ -530,7 +531,7 @@ static inline int get_words(OHCIState *ohci,
>>>
>>>    /* Put an array of words in to main memory */
>>>    static inline int put_words(OHCIState *ohci,
>>> -                            uint32_t addr, uint16_t *buf, int num)
>>> +                            dma_addr_t addr, uint16_t *buf, int num)
>>>    {
>>>        int i;
>>>
>>> @@ -538,40 +539,40 @@ static inline int put_words(OHCIState *ohci,
>>>
>>>        for (i = 0; i<   num; i++, buf++, addr += sizeof(*buf)) {
>>>            uint16_t tmp = cpu_to_le16(*buf);
>>> -        cpu_physical_memory_write(addr,&tmp, sizeof(tmp));
>>> +        dma_memory_write(ohci->dma, addr,&tmp, sizeof(tmp));
>>>        }
>>>
>>>        return 1;
>>>    }
>>>
>>>    static inline int ohci_read_ed(OHCIState *ohci,
>>> -                               uint32_t addr, struct ohci_ed *ed)
>>> +                               dma_addr_t addr, struct ohci_ed *ed)
>>>    {
>>>        return get_dwords(ohci, addr, (uint32_t *)ed, sizeof(*ed)>>   2);
>>>    }
>>>
>>>    static inline int ohci_read_td(OHCIState *ohci,
>>> -                               uint32_t addr, struct ohci_td *td)
>>> +                               dma_addr_t addr, struct ohci_td *td)
>>>    {
>>>        return get_dwords(ohci, addr, (uint32_t *)td, sizeof(*td)>>   2);
>>>    }
>>>
>>>    static inline int ohci_read_iso_td(OHCIState *ohci,
>>> -                                   uint32_t addr, struct ohci_iso_td *td)
>>> +                                   dma_addr_t addr, struct ohci_iso_td *td)
>>>    {
>>>        return (get_dwords(ohci, addr, (uint32_t *)td, 4)&&
>>>                get_words(ohci, addr + 16, td->offset, 8));
>>>    }
>>>
>>>    static inline int ohci_read_hcca(OHCIState *ohci,
>>> -                                 uint32_t addr, struct ohci_hcca *hcca)
>>> +                                 dma_addr_t addr, struct ohci_hcca *hcca)
>>>    {
>>> -    cpu_physical_memory_read(addr + ohci->localmem_base, hcca, sizeof(*hcca));
>>> +    dma_memory_read(ohci->dma, addr + ohci->localmem_base, hcca, sizeof(*hcca));
>>>        return 1;
>>>    }
>>>
>>>    static inline int ohci_put_ed(OHCIState *ohci,
>>> -                              uint32_t addr, struct ohci_ed *ed)
>>> +                              dma_addr_t addr, struct ohci_ed *ed)
>>>    {
>>>        /* ed->tail is under control of the HCD.
>>>         * Since just ed->head is changed by HC, just write back this
>>> @@ -583,64 +584,63 @@ static inline int ohci_put_ed(OHCIState *ohci,
>>>    }
>>>
>>>    static inline int ohci_put_td(OHCIState *ohci,
>>> -                              uint32_t addr, struct ohci_td *td)
>>> +                              dma_addr_t addr, struct ohci_td *td)
>>>    {
>>>        return put_dwords(ohci, addr, (uint32_t *)td, sizeof(*td)>>   2);
>>>    }
>>>
>>>    static inline int ohci_put_iso_td(OHCIState *ohci,
>>> -                                  uint32_t addr, struct ohci_iso_td *td)
>>> +                                  dma_addr_t addr, struct ohci_iso_td *td)
>>>    {
>>>        return (put_dwords(ohci, addr, (uint32_t *)td, 4)&&
>>>                put_words(ohci, addr + 16, td->offset, 8));
>>>    }
>>>
>>>    static inline int ohci_put_hcca(OHCIState *ohci,
>>> -                                uint32_t addr, struct ohci_hcca *hcca)
>>> +                                dma_addr_t addr, struct ohci_hcca *hcca)
>>>    {
>>> -    cpu_physical_memory_write(addr + ohci->localmem_base + HCCA_WRITEBACK_OFFSET,
>>> -                              (char *)hcca + HCCA_WRITEBACK_OFFSET,
>>> -                              HCCA_WRITEBACK_SIZE);
>>> +    dma_memory_write(ohci->dma,
>>> +                     addr + ohci->localmem_base + HCCA_WRITEBACK_OFFSET,
>>> +                     (char *)hcca + HCCA_WRITEBACK_OFFSET,
>>> +                     HCCA_WRITEBACK_SIZE);
>>>        return 1;
>>>    }
>>>
>>>    /* Read/Write the contents of a TD from/to main memory.  */
>>>    static void ohci_copy_td(OHCIState *ohci, struct ohci_td *td,
>>> -                         uint8_t *buf, int len, int write)
>>> +                         uint8_t *buf, int len, DMADirection dir)
>>>    {
>>> -    uint32_t ptr;
>>> -    uint32_t n;
>>> +    dma_addr_t ptr, n;
>>>
>>>        ptr = td->cbp;
>>>        n = 0x1000 - (ptr&   0xfff);
>>>        if (n>   len)
>>>            n = len;
>>> -    cpu_physical_memory_rw(ptr + ohci->localmem_base, buf, n, write);
>>> +    dma_memory_rw(ohci->dma, ptr + ohci->localmem_base, buf, n, dir);
>>>        if (n == len)
>>>            return;
>>>        ptr = td->be&   ~0xfffu;
>>>        buf += n;
>>> -    cpu_physical_memory_rw(ptr + ohci->localmem_base, buf, len - n, write);
>>> +    dma_memory_rw(ohci->dma, ptr + ohci->localmem_base, buf, len - n, dir);
>>>    }
>>>
>>>    /* Read/Write the contents of an ISO TD from/to main memory.  */
>>>    static void ohci_copy_iso_td(OHCIState *ohci,
>>>                                 uint32_t start_addr, uint32_t end_addr,
>>> -                             uint8_t *buf, int len, int write)
>>> +                             uint8_t *buf, int len, DMADirection dir)
>>>    {
>>> -    uint32_t ptr;
>>> -    uint32_t n;
>>> +    dma_addr_t ptr, n;
>>>
>>>        ptr = start_addr;
>>>        n = 0x1000 - (ptr&   0xfff);
>>>        if (n>   len)
>>>            n = len;
>>> -    cpu_physical_memory_rw(ptr + ohci->localmem_base, buf, n, write);
>>> +    dma_memory_rw(ohci->dma, ptr + ohci->localmem_base, buf, n, dir);
>>>        if (n == len)
>>>            return;
>>>        ptr = end_addr&   ~0xfffu;
>>>        buf += n;
>>> -    cpu_physical_memory_rw(ptr + ohci->localmem_base, buf, len - n, write);
>>> +    dma_memory_rw(ohci->dma, ptr + ohci->localmem_base, buf, len - n, dir);
>>>    }
>>>
>>>    static void ohci_process_lists(OHCIState *ohci, int completion);
>>> @@ -803,7 +803,8 @@ static int ohci_service_iso_td(OHCIState *ohci, struct ohci_ed *ed,
>>>        }
>>>
>>>        if (len&&   dir != OHCI_TD_DIR_IN) {
>>> -        ohci_copy_iso_td(ohci, start_addr, end_addr, ohci->usb_buf, len, 0);
>>> +        ohci_copy_iso_td(ohci, start_addr, end_addr, ohci->usb_buf, len,
>>> +                         DMA_DIRECTION_TO_DEVICE);
>>>        }
>>>
>>>        if (completion) {
>>> @@ -827,7 +828,8 @@ static int ohci_service_iso_td(OHCIState *ohci, struct ohci_ed *ed,
>>>        /* Writeback */
>>>        if (dir == OHCI_TD_DIR_IN&&   ret>= 0&&   ret<= len) {
>>>            /* IN transfer succeeded */
>>> -        ohci_copy_iso_td(ohci, start_addr, end_addr, ohci->usb_buf, ret, 1);
>>> +        ohci_copy_iso_td(ohci, start_addr, end_addr, ohci->usb_buf, ret,
>>> +                         DMA_DIRECTION_FROM_DEVICE);
>>>            OHCI_SET_BM(iso_td.offset[relative_frame_number], TD_PSW_CC,
>>>                        OHCI_CC_NOERROR);
>>>            OHCI_SET_BM(iso_td.offset[relative_frame_number], TD_PSW_SIZE, ret);
>>> @@ -971,7 +973,8 @@ static int ohci_service_td(OHCIState *ohci, struct ohci_ed *ed)
>>>                    pktlen = len;
>>>                }
>>>                if (!completion) {
>>> -                ohci_copy_td(ohci,&td, ohci->usb_buf, pktlen, 0);
>>> +                ohci_copy_td(ohci,&td, ohci->usb_buf, pktlen,
>>> +                             DMA_DIRECTION_TO_DEVICE);
>>>                }
>>>            }
>>>        }
>>> @@ -1021,7 +1024,8 @@ static int ohci_service_td(OHCIState *ohci, struct ohci_ed *ed)
>>>        }
>>>        if (ret>= 0) {
>>>            if (dir == OHCI_TD_DIR_IN) {
>>> -            ohci_copy_td(ohci,&td, ohci->usb_buf, ret, 1);
>>> +            ohci_copy_td(ohci,&td, ohci->usb_buf, ret,
>>> +                         DMA_DIRECTION_FROM_DEVICE);
>>>    #ifdef DEBUG_PACKET
>>>                DPRINTF("  data:");
>>>                for (i = 0; i<   ret; i++)
>>> @@ -1748,11 +1752,14 @@ static USBBusOps ohci_bus_ops = {
>>>    };
>>>
>>>    static int usb_ohci_init(OHCIState *ohci, DeviceState *dev,
>>> -                         int num_ports, uint32_t localmem_base,
>>> -                         char *masterbus, uint32_t firstport)
>>> +                         int num_ports, dma_addr_t localmem_base,
>>> +                         char *masterbus, uint32_t firstport,
>>> +                         DMAContext *dma)
>>>    {
>>>        int i;
>>>
>>> +    ohci->dma = dma;
>>> +
>>>        if (usb_frame_time == 0) {
>>>    #ifdef OHCI_TIME_WARP
>>>            usb_frame_time = get_ticks_per_sec();
>>> @@ -1817,7 +1824,8 @@ static int usb_ohci_initfn_pci(struct PCIDevice *dev)
>>>        ohci->pci_dev.config[PCI_INTERRUPT_PIN] = 0x01; /* interrupt pin A */
>>>
>>>        if (usb_ohci_init(&ohci->state,&dev->qdev, ohci->num_ports, 0,
>>> -                      ohci->masterbus, ohci->firstport) != 0) {
>>> +                      ohci->masterbus, ohci->firstport,
>>> +                      pci_dma_context(dev)) != 0) {
>>>            return -1;
>>>        }
>>>        ohci->state.irq = ohci->pci_dev.irq[0];
>>> @@ -1831,7 +1839,7 @@ typedef struct {
>>>        SysBusDevice busdev;
>>>        OHCIState ohci;
>>>        uint32_t num_ports;
>>> -    target_phys_addr_t dma_offset;
>>> +    dma_addr_t dma_offset;
>>>    } OHCISysBusState;
>>>
>>>    static int ohci_init_pxa(SysBusDevice *dev)
>>> @@ -1839,7 +1847,8 @@ static int ohci_init_pxa(SysBusDevice *dev)
>>>        OHCISysBusState *s = FROM_SYSBUS(OHCISysBusState, dev);
>>>
>>>        /* Cannot fail as we pass NULL for masterbus */
>>> -    usb_ohci_init(&s->ohci,&dev->qdev, s->num_ports, s->dma_offset, NULL, 0);
>>> +    usb_ohci_init(&s->ohci,&dev->qdev, s->num_ports, s->dma_offset, NULL, 0,
>>> +                  NULL);
>>>        sysbus_init_irq(dev,&s->ohci.irq);
>>>        sysbus_init_mmio(dev,&s->ohci.mem);
>>>
>>> @@ -1875,7 +1884,7 @@ static TypeInfo ohci_pci_info = {
>>>
>>>    static Property ohci_sysbus_properties[] = {
>>>        DEFINE_PROP_UINT32("num-ports", OHCISysBusState, num_ports, 3),
>>> -    DEFINE_PROP_TADDR("dma-offset", OHCISysBusState, dma_offset, 3),
>>> +    DEFINE_PROP_DMAADDR("dma-offset", OHCISysBusState, dma_offset, 3),
>>>        DEFINE_PROP_END_OF_LIST(),
>>>    };
>>>
>
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH 03/13] iommu: Add universal DMA helper functions
  2012-06-20 21:38       ` Anthony Liguori
@ 2012-06-20 21:42         ` Michael S. Tsirkin
  2012-06-20 21:46           ` Anthony Liguori
  0 siblings, 1 reply; 63+ messages in thread
From: Michael S. Tsirkin @ 2012-06-20 21:42 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Richard Henderson, Eduard - Gabriel Munteanu, qemu-devel, David Gibson

On Wed, Jun 20, 2012 at 04:38:30PM -0500, Anthony Liguori wrote:
> On 06/20/2012 04:32 PM, Michael S. Tsirkin wrote:
> >On Wed, Jun 20, 2012 at 04:16:47PM -0500, Anthony Liguori wrote:
> >>>diff --git a/hw/pci.h b/hw/pci.h
> >>>index 7f223c0..ee669d9 100644
> >>>--- a/hw/pci.h
> >>>+++ b/hw/pci.h
> >>>@@ -558,10 +558,16 @@ static inline uint32_t pci_config_size(const PCIDevice *d)
> >>>  }
> >>>
> >>>  /* DMA access functions */
> >>>+static inline DMAContext *pci_dma_context(PCIDevice *dev)
> >>>+{
> >>>+    /* Stub for when we have no PCI iommu support */
> >>>+    return NULL;
> >>>+}
> >>
> >>Why is all of this stuff static inline?
> >
> >Let's face it, most people don't need an IOMMU in their VM.
> >inline stubs help make double sure we are not adding
> >overhead for the sake of this niche case.
> 
> It also makes for an overly complex pci.h with no obvious performance justification.
> 
A stub in a header plus an out-of-line empty function is even more useless
code. Inline stubs are standard procedure.

> Let's not prematurely optimize here.
> 
> Regards,
> 
> Anthony Liguori

It's not just an optimization.  It is easier to see what's going on this
way.
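
For reference, the inline-stub pattern under discussion looks roughly like
this. It is a sketch only: DEMO_HAVE_IOMMU, DemoPCIDevice and
demo_pci_dma_context are hypothetical names, not the ones in hw/pci.h.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoContext DemoContext;

typedef struct DemoPCIDevice {
    DemoContext *iommu_ctx;
} DemoPCIDevice;

#ifdef DEMO_HAVE_IOMMU
/* Real accessor once IOMMU support exists. */
static inline DemoContext *demo_pci_dma_context(DemoPCIDevice *dev)
{
    return dev->iommu_ctx;
}
#else
/* Stub: the compiler can fold this to a constant NULL at each call site. */
static inline DemoContext *demo_pci_dma_context(DemoPCIDevice *dev)
{
    (void)dev;
    return NULL;
}
#endif

/* Callers are written once and work with either variant. */
static inline int demo_uses_translation(DemoPCIDevice *dev)
{
    assert(dev != NULL);
    return demo_pci_dma_context(dev) != NULL;
}
```

The stub and the real accessor sit side by side in the header, which is the
readability point; whether that stays manageable once the bodies grow is the
other side of the argument.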

-- 
MST

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH 03/13] iommu: Add universal DMA helper functions
  2012-06-20 21:42         ` Michael S. Tsirkin
@ 2012-06-20 21:46           ` Anthony Liguori
  2012-06-20 22:00             ` Michael S. Tsirkin
  0 siblings, 1 reply; 63+ messages in thread
From: Anthony Liguori @ 2012-06-20 21:46 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Richard Henderson, Eduard - Gabriel Munteanu, qemu-devel, David Gibson

On 06/20/2012 04:42 PM, Michael S. Tsirkin wrote:
> On Wed, Jun 20, 2012 at 04:38:30PM -0500, Anthony Liguori wrote:
>> On 06/20/2012 04:32 PM, Michael S. Tsirkin wrote:
>>> On Wed, Jun 20, 2012 at 04:16:47PM -0500, Anthony Liguori wrote:
>>>>> diff --git a/hw/pci.h b/hw/pci.h
>>>>> index 7f223c0..ee669d9 100644
>>>>> --- a/hw/pci.h
>>>>> +++ b/hw/pci.h
>>>>> @@ -558,10 +558,16 @@ static inline uint32_t pci_config_size(const PCIDevice *d)
>>>>>   }
>>>>>
>>>>>   /* DMA access functions */
>>>>> +static inline DMAContext *pci_dma_context(PCIDevice *dev)
>>>>> +{
>>>>> +    /* Stub for when we have no PCI iommu support */
>>>>> +    return NULL;
>>>>> +}
>>>>
>>>> Why is all of this stuff static inline?
>>>
>>> Let's face it, most people don't need an IOMMU in their VM.
>>> inline stubs help make double sure we are not adding
>>> overhead for the sake of this niche case.
>>
>> It also makes for an overly complex pci.h with no obvious performance justification.
>>
> A stub in a header plus an out-of-line empty function is even more useless
> code. Inline stubs are standard procedure.

Look at 8/13.  They don't stay stubs for long.

Regards,

Anthony Liguori

>
>> Let's not prematurely optimize here.
>>
>> Regards,
>>
>> Anthony Liguori
>
> It's not just an optimization.  It is easier to see what's going on this
> way.
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH 09/13] iommu: Add facility to cancel in-use dma memory maps
  2012-06-20 21:25   ` Anthony Liguori
@ 2012-06-20 21:52     ` Benjamin Herrenschmidt
  2012-06-22  3:18     ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-20 21:52 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, David Gibson

On Wed, 2012-06-20 at 16:25 -0500, Anthony Liguori wrote:

> So this cancellation stuff is hopelessly broken
> 
> It's simply not possible to fully cancel pending DMA in a synchronous callback.

Well, at least for PAPR H_PUT_TCE, cancellation must be synchronous, ie
the hypercall must not return until the cancellation is complete.

> Indeed, bdrv_aio_cancel ends up having a nasty little loop in it:
> 
>      if (active) {
>          /* fail safe: if the aio could not be canceled, we wait for
>             it */
>          while (qemu_paio_error(acb) == EINPROGRESS)
>              ;
>      }
> 
> That spins w/100% CPU.
> 
> Can you explain when DMA cancellation really happens and what the effect would 
> be if we simply ignored it?

It will almost never happen in practice. It will actually never happen
with the current patch series. Where it -will- eventually happen in the
long run is if the guest removes a translation that is "in use" by a
dma_map() mapping established by a device. It's always a guest
programming error though and it's not an attack vector since the guest
can only shoot itself in the foot, but it might make things like kdump
less reliable inside the guest.

We need a way to signal the device that the translation is going away
and we need -a- way to synchronize though it could be a two step process
(see below).

> Can we do something more clever like use an asynchronous callback to handle 
> flushing active DMA mappings?
> 
> There's just no way a callback like this is going to work.

Ok so first let's see what happens in real HW: One of the DMA accesses
gets a target abort return from the host bridge. The device interrupts
its operations and signals an error.

Now, I agree that requiring a cancel callback to act synchronously might
be a bit fishy, so what about we define the following semantics:

 - First this assumes our iommu backend decides to implement that level
of correctness, as I said above, none do in that patch series (yet)

 - The basic idea is that for most iommu, there's an MMIO to start a TLB
flush and an MMIO the guest uses to spin on to get the status as to
whether the TLB flush has completed, so we can do things asynchronously
that way. However we -still- need to do things synchronously for the
hypercall used by PAPR, but as we discussed earlier that can be done
without spinning, by delaying the completion of the hypercall.

 - So step 1, no callback at all. When an iommu TLB flush operation is
started, we tag all pending maps (see below). We signal completion when
all those maps have been unmapped.

 - The above tagging can be done using some kind of generation count
along with an ordered list of maps, we keep track of the "oldest" map
still active, that sort of thing. Not too hard.

 - step 2, because some maps can be long lived and we don't want to
delay invalidations forever, we add a cancel callback which a device can
optionally install along with a map. This callback is only meant to
-initiate- a cancellation in order to speed up when the unmap will
occur.
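
For illustration, the tagging scheme in steps 1 and 2 could look roughly
like the sketch below. It is not code from this series: MapTracker and the
tracker_* functions are hypothetical names, and a fixed-size array scan
stands in for the ordered list of maps mentioned above. It also ignores
which address range each map covers, so a flush waits for every older map
rather than only the colliding ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPS 16   /* arbitrary limit for this sketch */

/* Hypothetical per-iommu tracker; none of these names exist in the series. */
typedef struct MapTracker {
    uint64_t generation;          /* bumped each time a flush is started */
    uint64_t map_gen[MAX_MAPS];   /* generation in which each map was made */
    bool     in_use[MAX_MAPS];
} MapTracker;

/* Record a new mapping; returns a slot index, or -1 if the table is full. */
static int tracker_map(MapTracker *t)
{
    for (int i = 0; i < MAX_MAPS; i++) {
        if (!t->in_use[i]) {
            t->in_use[i] = true;
            t->map_gen[i] = t->generation;
            return i;
        }
    }
    return -1;
}

/* Drop a mapping previously returned by tracker_map(). */
static void tracker_unmap(MapTracker *t, int slot)
{
    assert(slot >= 0 && slot < MAX_MAPS && t->in_use[slot]);
    t->in_use[slot] = false;
}

/*
 * Start a flush. Every map that exists now carries a generation lower
 * than the returned value, which is what "tags" it as pending.
 */
static uint64_t tracker_flush_start(MapTracker *t)
{
    return ++t->generation;
}

/* The flush may be signalled complete once no live map predates it. */
static bool tracker_flush_done(const MapTracker *t, uint64_t flush_gen)
{
    for (int i = 0; i < MAX_MAPS; i++) {
        if (t->in_use[i] && t->map_gen[i] < flush_gen) {
            return false;
        }
    }
    return true;
}
```

Maps created after tracker_flush_start() get the new generation and so do
not delay that flush; an optional cancel callback would only be a hint to
make the older maps reach tracker_unmap() sooner.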

What do you think ? Would that work ? As I explained in my email
exchange with Gerd, there's quite a few issues in actually implementing
cancellation properly anyway, for example, today on PAPR, H_PUT_TCE is
implemented by the kernel KVM in real mode for performance reasons. So
we would need to change the KVM API to be able to keep the kernel
informed that there are maps covering portions of the iommu space (a
bitmap ?) to force exits to qemu when an invalidation collides with a
map for example.

Additionally, to be totally -correct-, we would need synchronization
with qemu to a much larger extent. IE. Any invalidation must also make
sure that anything that used a previous translation has completed, ie,
even the simple dma_ld/st ops must be synchronized in theory.

My conclusion is that the complexity of solving the problem is huge,
while the actual problem scope is close to non-existent. So I think we
can safely merge the series and ignore the issue for the time being.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH 03/13] iommu: Add universal DMA helper functions
  2012-06-20 21:46           ` Anthony Liguori
@ 2012-06-20 22:00             ` Michael S. Tsirkin
  0 siblings, 0 replies; 63+ messages in thread
From: Michael S. Tsirkin @ 2012-06-20 22:00 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Richard Henderson, Eduard - Gabriel Munteanu, qemu-devel, David Gibson

On Wed, Jun 20, 2012 at 04:46:49PM -0500, Anthony Liguori wrote:
> On 06/20/2012 04:42 PM, Michael S. Tsirkin wrote:
> >On Wed, Jun 20, 2012 at 04:38:30PM -0500, Anthony Liguori wrote:
> >>On 06/20/2012 04:32 PM, Michael S. Tsirkin wrote:
> >>>On Wed, Jun 20, 2012 at 04:16:47PM -0500, Anthony Liguori wrote:
> >>>>>diff --git a/hw/pci.h b/hw/pci.h
> >>>>>index 7f223c0..ee669d9 100644
> >>>>>--- a/hw/pci.h
> >>>>>+++ b/hw/pci.h
> >>>>>@@ -558,10 +558,16 @@ static inline uint32_t pci_config_size(const PCIDevice *d)
> >>>>>  }
> >>>>>
> >>>>>  /* DMA access functions */
> >>>>>+static inline DMAContext *pci_dma_context(PCIDevice *dev)
> >>>>>+{
> >>>>>+    /* Stub for when we have no PCI iommu support */
> >>>>>+    return NULL;
> >>>>>+}
> >>>>
> >>>>Why is all of this stuff static inline?
> >>>
> >>>Let's face it, most people don't need an MMU in their VM.
> >>>inline stubs help make double sure we are not adding
> >>>overhead for the sake of this niche case.
> >>
> >>It also makes for an overly complex pci.h with no obvious performance justification.
> >>
> >A stub in a header plus an offline empty function is even more useless
> >code. inline stubs is standard procedure.
> 
> Look at 8/13.  They don't stay stubs for long.

That does not seem to touch pci.h?

inlines in dma.h make sense too: a small inline wrapper that selects
between the iommu/non iommu variant and an offline implementation for
the iommu one.
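
To make that shape concrete, here is a minimal sketch of such a wrapper.
It is an illustration under stated assumptions, not the code in dma.h: the
DMAContext layout, toy_translate(), fake_ram and the *_sketch / fake_*
names are all made up, a bool stands in for the series' DMADirection, and
the slow path translates the whole range in one go instead of walking it
page by page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t dma_addr_t;

/* Hypothetical context: translate maps a bus address to a RAM offset. */
typedef struct DMAContext DMAContext;
struct DMAContext {
    bool (*translate)(DMAContext *dma, dma_addr_t addr, dma_addr_t *paddr);
};

/* Stand-in for guest RAM and for cpu_physical_memory_rw(). */
static uint8_t fake_ram[4096];

static int fake_physical_memory_rw(dma_addr_t addr, void *buf,
                                   size_t len, bool is_write)
{
    if (addr > sizeof(fake_ram) || len > sizeof(fake_ram) - addr) {
        return -1;
    }
    if (is_write) {
        memcpy(fake_ram + addr, buf, len);
    } else {
        memcpy(buf, fake_ram + addr, len);
    }
    return 0;
}

/* Toy translation: fixed 0x100 offset, faults at or above 0x800. */
static bool toy_translate(DMAContext *dma, dma_addr_t addr, dma_addr_t *paddr)
{
    (void)dma;
    if (addr >= 0x800) {
        return false;
    }
    *paddr = addr + 0x100;
    return true;
}

/* Slow path: in a real tree this would live out of line in a .c file. */
static int iommu_dma_memory_rw(DMAContext *dma, dma_addr_t addr,
                               void *buf, size_t len, bool is_write)
{
    dma_addr_t paddr;

    assert(dma && dma->translate);
    if (!dma->translate(dma, addr, &paddr)) {
        return -1;
    }
    return fake_physical_memory_rw(paddr, buf, len, is_write);
}

/* Small inline wrapper: a NULL context means "no iommu", go direct. */
static inline int dma_memory_rw_sketch(DMAContext *dma, dma_addr_t addr,
                                       void *buf, size_t len, bool is_write)
{
    if (!dma) {
        return fake_physical_memory_rw(addr, buf, len, is_write);
    }
    return iommu_dma_memory_rw(dma, addr, buf, len, is_write);
}
```

With a NULL context the compiler only has to emit the check and the direct
call at each call site; the translation logic stays in one place.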


> Regards,
> 
> Anthony Liguori
> 
> >
> >>Let's not prematurely optimize here.
> >>
> >>Regards,
> >>
> >>Anthony Liguori
> >
> >It's not just an optimization.  It is easier to see what's going on this
> >way.
> >

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH 03/13] iommu: Add universal DMA helper functions
  2012-06-20 21:40     ` Michael S. Tsirkin
@ 2012-06-20 22:01       ` Anthony Liguori
  0 siblings, 0 replies; 63+ messages in thread
From: Anthony Liguori @ 2012-06-20 22:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Richard Henderson, Eduard - Gabriel Munteanu, qemu-devel, David Gibson

On 06/20/2012 04:40 PM, Michael S. Tsirkin wrote:
> On Wed, Jun 20, 2012 at 04:16:47PM -0500, Anthony Liguori wrote:
>>> diff --git a/qemu-common.h b/qemu-common.h
>>> index 8f87e41..80026af 100644
>>> --- a/qemu-common.h
>>> +++ b/qemu-common.h
>>> @@ -264,6 +264,7 @@ typedef struct EventNotifier EventNotifier;
>>>   typedef struct VirtIODevice VirtIODevice;
>>>   typedef struct QEMUSGList QEMUSGList;
>>>   typedef struct SHPCDevice SHPCDevice;
>>> +typedef struct DMAContext DMAContext;
>>
>> Please don't put this in qemu-common.h.  Stick it in a dma-specific header.
>
> Weird.
>
> The point of typedefs in qemu-common.h is so people can
> use type pointer *without pulling in the relevant header*.

You're providing a back explanation to something that was completely unrelated...

qemu-common.h was created because everything (literally everything) was in a 
single vl.h.

So qemu-common.h was simply the left over crap from vl.h that didn't have a home 
elsewhere.

It was never intended that we'd keep adding more stuff to qemu-common.h.

Regards,

Anthony Liguori

> If we put a typedef in specific header it defeats the purpose.
>
> It used to even say this somewhere so I don't remember where.
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH 04/13] usb-ohci: Use universal DMA helper functions
  2012-06-20 21:40       ` Anthony Liguori
@ 2012-06-20 22:02         ` Benjamin Herrenschmidt
  2012-06-21  7:33           ` Michael S. Tsirkin
  2012-06-21  6:43         ` Gerd Hoffmann
  1 sibling, 1 reply; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-20 22:02 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Michael S. Tsirkin, Gerd Hoffmann, qemu-devel, David Gibson

On Wed, 2012-06-20 at 16:40 -0500, Anthony Liguori wrote:

> Well let's return void in the DMA methods and let the IOMMUs assert on error. 
> At least that will avoid surprises until someone decides they care enough about 
> errors to touch all callers.
> 
> I think silently failing a memcpy() can potentially lead to a vulnerability so 
> I'd rather avoid that.

No I'd rather keep the error returns, really, even if that means fixing
a few devices. I can look at making sure we don't pass random qemu data
on error; that's reasonably easy.

assert on error means guest code can assert qemu ... not a great idea
but maybe we can add a warning.
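
As a rough illustration of keeping the error return while not leaking
stale host data, something along these lines would do. This is only a
sketch: backend_read(), dma_read_checked(), read_desc_word() and fake_ram
are hypothetical stand-ins, not functions from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t dma_addr_t;

/* Stand-in for guest memory behind an iommu. */
static uint8_t fake_ram[256];

/* Hypothetical backend read: fails when the range is not mapped. */
static int backend_read(dma_addr_t addr, void *buf, size_t len)
{
    if (addr > sizeof(fake_ram) || len > sizeof(fake_ram) - addr) {
        return -1;
    }
    memcpy(buf, fake_ram + addr, len);
    return 0;
}

/*
 * Keep the error return, but on failure zero the buffer so that a
 * caller which ignores the result never consumes leftover host data.
 */
static int dma_read_checked(dma_addr_t addr, void *buf, size_t len)
{
    int ret;

    assert(buf != NULL);
    ret = backend_read(addr, buf, len);
    if (ret != 0) {
        memset(buf, 0, len);
    }
    return ret;
}

/* Device-side caller that does look at the result. */
static int read_desc_word(dma_addr_t addr, uint32_t *val)
{
    if (dma_read_checked(addr, val, sizeof(*val)) != 0) {
        return -1;   /* device would flag an error to the guest here */
    }
    return 0;
}
```

A device that has not been converted yet sees zeroes instead of whatever
was on the stack, and a converted one can report the failure to the guest.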

> >> Why leave pci accessors and not implement usb_memory_rw() wrappers?
> >
> > Well, "usb" is a bit too generic, ehci and ohci would each need to have
> > their own sets of wrappers. But yes, that's possible... is it really
> > worth it ? There's nothing fundamentally wrong with using the dma_*
> > accessors.
> 
> So is using the pci accessors wrong?

Not really either, I don't think it matters :-)

> I'm not saying you should go and convert every caller of the pci_ functions, I 
> just want a clear policy on what interface devices should use.

Ideally the bus interface for the bus they sit on so they don't have to
bother digging out the DMAContext and are immune to changes we might make
in that area.

Devices that mix multiple bus types however are a bit more tricky, but
so far are few, and those can use dma_* and know where to get the
DMAContext from.

If we ever replace DMAContext with something else we can probably just
change the field to that "something else" with a very simple
search/replace on those devices (at least that's the best case :-)

I think anything else is just not worth bothering.

Cheers,
Ben.

> Regards,
> 
> Anthony Liguori
> 
> >
> > Cheers,
> > Ben.
> >
> >> Regards,
> >>
> >> Anthony Liguori
> >>
> >>> ---
> >>>    hw/usb/hcd-ohci.c |   93 +++++++++++++++++++++++++++++------------------------
> >>>    1 file changed, 51 insertions(+), 42 deletions(-)
> >>>
> >>> diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
> >>> index 1a1cc88..844e7ed 100644
> >>> --- a/hw/usb/hcd-ohci.c
> >>> +++ b/hw/usb/hcd-ohci.c
> >>> @@ -31,7 +31,7 @@
> >>>    #include "hw/usb.h"
> >>>    #include "hw/pci.h"
> >>>    #include "hw/sysbus.h"
> >>> -#include "hw/qdev-addr.h"
> >>> +#include "hw/qdev-dma.h"
> >>>
> >>>    //#define DEBUG_OHCI
> >>>    /* Dump packet contents.  */
> >>> @@ -62,6 +62,7 @@ typedef struct {
> >>>        USBBus bus;
> >>>        qemu_irq irq;
> >>>        MemoryRegion mem;
> >>> +    DMAContext *dma;
> >>>        int num_ports;
> >>>        const char *name;
> >>>
> >>> @@ -104,7 +105,7 @@ typedef struct {
> >>>        uint32_t htest;
> >>>
> >>>        /* SM501 local memory offset */
> >>> -    target_phys_addr_t localmem_base;
> >>> +    dma_addr_t localmem_base;
> >>>
> >>>        /* Active packets.  */
> >>>        uint32_t old_ctl;
> >>> @@ -482,14 +483,14 @@ static void ohci_reset(void *opaque)
> >>>
> >>>    /* Get an array of dwords from main memory */
> >>>    static inline int get_dwords(OHCIState *ohci,
> >>> -                             uint32_t addr, uint32_t *buf, int num)
> >>> +                             dma_addr_t addr, uint32_t *buf, int num)
> >>>    {
> >>>        int i;
> >>>
> >>>        addr += ohci->localmem_base;
> >>>
> >>>        for (i = 0; i<   num; i++, buf++, addr += sizeof(*buf)) {
> >>> -        cpu_physical_memory_read(addr, buf, sizeof(*buf));
> >>> +        dma_memory_read(ohci->dma, addr, buf, sizeof(*buf));
> >>>            *buf = le32_to_cpu(*buf);
> >>>        }
> >>>
> >>> @@ -498,7 +499,7 @@ static inline int get_dwords(OHCIState *ohci,
> >>>
> >>>    /* Put an array of dwords in to main memory */
> >>>    static inline int put_dwords(OHCIState *ohci,
> >>> -                             uint32_t addr, uint32_t *buf, int num)
> >>> +                             dma_addr_t addr, uint32_t *buf, int num)
> >>>    {
> >>>        int i;
> >>>
> >>> @@ -506,7 +507,7 @@ static inline int put_dwords(OHCIState *ohci,
> >>>
> >>>        for (i = 0; i<   num; i++, buf++, addr += sizeof(*buf)) {
> >>>            uint32_t tmp = cpu_to_le32(*buf);
> >>> -        cpu_physical_memory_write(addr,&tmp, sizeof(tmp));
> >>> +        dma_memory_write(ohci->dma, addr,&tmp, sizeof(tmp));
> >>>        }
> >>>
> >>>        return 1;
> >>> @@ -514,14 +515,14 @@ static inline int put_dwords(OHCIState *ohci,
> >>>
> >>>    /* Get an array of words from main memory */
> >>>    static inline int get_words(OHCIState *ohci,
> >>> -                            uint32_t addr, uint16_t *buf, int num)
> >>> +                            dma_addr_t addr, uint16_t *buf, int num)
> >>>    {
> >>>        int i;
> >>>
> >>>        addr += ohci->localmem_base;
> >>>
> >>>        for (i = 0; i<   num; i++, buf++, addr += sizeof(*buf)) {
> >>> -        cpu_physical_memory_read(addr, buf, sizeof(*buf));
> >>> +        dma_memory_read(ohci->dma, addr, buf, sizeof(*buf));
> >>>            *buf = le16_to_cpu(*buf);
> >>>        }
> >>>
> >>> @@ -530,7 +531,7 @@ static inline int get_words(OHCIState *ohci,
> >>>
> >>>    /* Put an array of words in to main memory */
> >>>    static inline int put_words(OHCIState *ohci,
> >>> -                            uint32_t addr, uint16_t *buf, int num)
> >>> +                            dma_addr_t addr, uint16_t *buf, int num)
> >>>    {
> >>>        int i;
> >>>
> >>> @@ -538,40 +539,40 @@ static inline int put_words(OHCIState *ohci,
> >>>
> >>>        for (i = 0; i<   num; i++, buf++, addr += sizeof(*buf)) {
> >>>            uint16_t tmp = cpu_to_le16(*buf);
> >>> -        cpu_physical_memory_write(addr,&tmp, sizeof(tmp));
> >>> +        dma_memory_write(ohci->dma, addr,&tmp, sizeof(tmp));
> >>>        }
> >>>
> >>>        return 1;
> >>>    }
> >>>
> >>>    static inline int ohci_read_ed(OHCIState *ohci,
> >>> -                               uint32_t addr, struct ohci_ed *ed)
> >>> +                               dma_addr_t addr, struct ohci_ed *ed)
> >>>    {
> >>>        return get_dwords(ohci, addr, (uint32_t *)ed, sizeof(*ed)>>   2);
> >>>    }
> >>>
> >>>    static inline int ohci_read_td(OHCIState *ohci,
> >>> -                               uint32_t addr, struct ohci_td *td)
> >>> +                               dma_addr_t addr, struct ohci_td *td)
> >>>    {
> >>>        return get_dwords(ohci, addr, (uint32_t *)td, sizeof(*td)>>   2);
> >>>    }
> >>>
> >>>    static inline int ohci_read_iso_td(OHCIState *ohci,
> >>> -                                   uint32_t addr, struct ohci_iso_td *td)
> >>> +                                   dma_addr_t addr, struct ohci_iso_td *td)
> >>>    {
> >>>        return (get_dwords(ohci, addr, (uint32_t *)td, 4)&&
> >>>                get_words(ohci, addr + 16, td->offset, 8));
> >>>    }
> >>>
> >>>    static inline int ohci_read_hcca(OHCIState *ohci,
> >>> -                                 uint32_t addr, struct ohci_hcca *hcca)
> >>> +                                 dma_addr_t addr, struct ohci_hcca *hcca)
> >>>    {
> >>> -    cpu_physical_memory_read(addr + ohci->localmem_base, hcca, sizeof(*hcca));
> >>> +    dma_memory_read(ohci->dma, addr + ohci->localmem_base, hcca, sizeof(*hcca));
> >>>        return 1;
> >>>    }
> >>>
> >>>    static inline int ohci_put_ed(OHCIState *ohci,
> >>> -                              uint32_t addr, struct ohci_ed *ed)
> >>> +                              dma_addr_t addr, struct ohci_ed *ed)
> >>>    {
> >>>        /* ed->tail is under control of the HCD.
> >>>         * Since just ed->head is changed by HC, just write back this
> >>> @@ -583,64 +584,63 @@ static inline int ohci_put_ed(OHCIState *ohci,
> >>>    }
> >>>
> >>>    static inline int ohci_put_td(OHCIState *ohci,
> >>> -                              uint32_t addr, struct ohci_td *td)
> >>> +                              dma_addr_t addr, struct ohci_td *td)
> >>>    {
> >>>        return put_dwords(ohci, addr, (uint32_t *)td, sizeof(*td)>>   2);
> >>>    }
> >>>
> >>>    static inline int ohci_put_iso_td(OHCIState *ohci,
> >>> -                                  uint32_t addr, struct ohci_iso_td *td)
> >>> +                                  dma_addr_t addr, struct ohci_iso_td *td)
> >>>    {
> >>>        return (put_dwords(ohci, addr, (uint32_t *)td, 4)&&
> >>>                put_words(ohci, addr + 16, td->offset, 8));
> >>>    }
> >>>
> >>>    static inline int ohci_put_hcca(OHCIState *ohci,
> >>> -                                uint32_t addr, struct ohci_hcca *hcca)
> >>> +                                dma_addr_t addr, struct ohci_hcca *hcca)
> >>>    {
> >>> -    cpu_physical_memory_write(addr + ohci->localmem_base + HCCA_WRITEBACK_OFFSET,
> >>> -                              (char *)hcca + HCCA_WRITEBACK_OFFSET,
> >>> -                              HCCA_WRITEBACK_SIZE);
> >>> +    dma_memory_write(ohci->dma,
> >>> +                     addr + ohci->localmem_base + HCCA_WRITEBACK_OFFSET,
> >>> +                     (char *)hcca + HCCA_WRITEBACK_OFFSET,
> >>> +                     HCCA_WRITEBACK_SIZE);
> >>>        return 1;
> >>>    }
> >>>
> >>>    /* Read/Write the contents of a TD from/to main memory.  */
> >>>    static void ohci_copy_td(OHCIState *ohci, struct ohci_td *td,
> >>> -                         uint8_t *buf, int len, int write)
> >>> +                         uint8_t *buf, int len, DMADirection dir)
> >>>    {
> >>> -    uint32_t ptr;
> >>> -    uint32_t n;
> >>> +    dma_addr_t ptr, n;
> >>>
> >>>        ptr = td->cbp;
> >>>        n = 0x1000 - (ptr&   0xfff);
> >>>        if (n>   len)
> >>>            n = len;
> >>> -    cpu_physical_memory_rw(ptr + ohci->localmem_base, buf, n, write);
> >>> +    dma_memory_rw(ohci->dma, ptr + ohci->localmem_base, buf, n, dir);
> >>>        if (n == len)
> >>>            return;
> >>>        ptr = td->be&   ~0xfffu;
> >>>        buf += n;
> >>> -    cpu_physical_memory_rw(ptr + ohci->localmem_base, buf, len - n, write);
> >>> +    dma_memory_rw(ohci->dma, ptr + ohci->localmem_base, buf, len - n, dir);
> >>>    }
> >>>
> >>>    /* Read/Write the contents of an ISO TD from/to main memory.  */
> >>>    static void ohci_copy_iso_td(OHCIState *ohci,
> >>>                                 uint32_t start_addr, uint32_t end_addr,
> >>> -                             uint8_t *buf, int len, int write)
> >>> +                             uint8_t *buf, int len, DMADirection dir)
> >>>    {
> >>> -    uint32_t ptr;
> >>> -    uint32_t n;
> >>> +    dma_addr_t ptr, n;
> >>>
> >>>        ptr = start_addr;
> >>>        n = 0x1000 - (ptr&   0xfff);
> >>>        if (n>   len)
> >>>            n = len;
> >>> -    cpu_physical_memory_rw(ptr + ohci->localmem_base, buf, n, write);
> >>> +    dma_memory_rw(ohci->dma, ptr + ohci->localmem_base, buf, n, dir);
> >>>        if (n == len)
> >>>            return;
> >>>        ptr = end_addr&   ~0xfffu;
> >>>        buf += n;
> >>> -    cpu_physical_memory_rw(ptr + ohci->localmem_base, buf, len - n, write);
> >>> +    dma_memory_rw(ohci->dma, ptr + ohci->localmem_base, buf, len - n, dir);
> >>>    }
> >>>
> >>>    static void ohci_process_lists(OHCIState *ohci, int completion);
> >>> @@ -803,7 +803,8 @@ static int ohci_service_iso_td(OHCIState *ohci, struct ohci_ed *ed,
> >>>        }
> >>>
> >>>        if (len&&   dir != OHCI_TD_DIR_IN) {
> >>> -        ohci_copy_iso_td(ohci, start_addr, end_addr, ohci->usb_buf, len, 0);
> >>> +        ohci_copy_iso_td(ohci, start_addr, end_addr, ohci->usb_buf, len,
> >>> +                         DMA_DIRECTION_TO_DEVICE);
> >>>        }
> >>>
> >>>        if (completion) {
> >>> @@ -827,7 +828,8 @@ static int ohci_service_iso_td(OHCIState *ohci, struct ohci_ed *ed,
> >>>        /* Writeback */
> >>>        if (dir == OHCI_TD_DIR_IN&&   ret>= 0&&   ret<= len) {
> >>>            /* IN transfer succeeded */
> >>> -        ohci_copy_iso_td(ohci, start_addr, end_addr, ohci->usb_buf, ret, 1);
> >>> +        ohci_copy_iso_td(ohci, start_addr, end_addr, ohci->usb_buf, ret,
> >>> +                         DMA_DIRECTION_FROM_DEVICE);
> >>>            OHCI_SET_BM(iso_td.offset[relative_frame_number], TD_PSW_CC,
> >>>                        OHCI_CC_NOERROR);
> >>>            OHCI_SET_BM(iso_td.offset[relative_frame_number], TD_PSW_SIZE, ret);
> >>> @@ -971,7 +973,8 @@ static int ohci_service_td(OHCIState *ohci, struct ohci_ed *ed)
> >>>                    pktlen = len;
> >>>                }
> >>>                if (!completion) {
> >>> -                ohci_copy_td(ohci,&td, ohci->usb_buf, pktlen, 0);
> >>> +                ohci_copy_td(ohci,&td, ohci->usb_buf, pktlen,
> >>> +                             DMA_DIRECTION_TO_DEVICE);
> >>>                }
> >>>            }
> >>>        }
> >>> @@ -1021,7 +1024,8 @@ static int ohci_service_td(OHCIState *ohci, struct ohci_ed *ed)
> >>>        }
> >>>        if (ret>= 0) {
> >>>            if (dir == OHCI_TD_DIR_IN) {
> >>> -            ohci_copy_td(ohci,&td, ohci->usb_buf, ret, 1);
> >>> +            ohci_copy_td(ohci,&td, ohci->usb_buf, ret,
> >>> +                         DMA_DIRECTION_FROM_DEVICE);
> >>>    #ifdef DEBUG_PACKET
> >>>                DPRINTF("  data:");
> >>>                for (i = 0; i<   ret; i++)
> >>> @@ -1748,11 +1752,14 @@ static USBBusOps ohci_bus_ops = {
> >>>    };
> >>>
> >>>    static int usb_ohci_init(OHCIState *ohci, DeviceState *dev,
> >>> -                         int num_ports, uint32_t localmem_base,
> >>> -                         char *masterbus, uint32_t firstport)
> >>> +                         int num_ports, dma_addr_t localmem_base,
> >>> +                         char *masterbus, uint32_t firstport,
> >>> +                         DMAContext *dma)
> >>>    {
> >>>        int i;
> >>>
> >>> +    ohci->dma = dma;
> >>> +
> >>>        if (usb_frame_time == 0) {
> >>>    #ifdef OHCI_TIME_WARP
> >>>            usb_frame_time = get_ticks_per_sec();
> >>> @@ -1817,7 +1824,8 @@ static int usb_ohci_initfn_pci(struct PCIDevice *dev)
> >>>        ohci->pci_dev.config[PCI_INTERRUPT_PIN] = 0x01; /* interrupt pin A */
> >>>
> >>>        if (usb_ohci_init(&ohci->state,&dev->qdev, ohci->num_ports, 0,
> >>> -                      ohci->masterbus, ohci->firstport) != 0) {
> >>> +                      ohci->masterbus, ohci->firstport,
> >>> +                      pci_dma_context(dev)) != 0) {
> >>>            return -1;
> >>>        }
> >>>        ohci->state.irq = ohci->pci_dev.irq[0];
> >>> @@ -1831,7 +1839,7 @@ typedef struct {
> >>>        SysBusDevice busdev;
> >>>        OHCIState ohci;
> >>>        uint32_t num_ports;
> >>> -    target_phys_addr_t dma_offset;
> >>> +    dma_addr_t dma_offset;
> >>>    } OHCISysBusState;
> >>>
> >>>    static int ohci_init_pxa(SysBusDevice *dev)
> >>> @@ -1839,7 +1847,8 @@ static int ohci_init_pxa(SysBusDevice *dev)
> >>>        OHCISysBusState *s = FROM_SYSBUS(OHCISysBusState, dev);
> >>>
> >>>        /* Cannot fail as we pass NULL for masterbus */
> >>> -    usb_ohci_init(&s->ohci,&dev->qdev, s->num_ports, s->dma_offset, NULL, 0);
> >>> +    usb_ohci_init(&s->ohci,&dev->qdev, s->num_ports, s->dma_offset, NULL, 0,
> >>> +                  NULL);
> >>>        sysbus_init_irq(dev,&s->ohci.irq);
> >>>        sysbus_init_mmio(dev,&s->ohci.mem);
> >>>
> >>> @@ -1875,7 +1884,7 @@ static TypeInfo ohci_pci_info = {
> >>>
> >>>    static Property ohci_sysbus_properties[] = {
> >>>        DEFINE_PROP_UINT32("num-ports", OHCISysBusState, num_ports, 3),
> >>> -    DEFINE_PROP_TADDR("dma-offset", OHCISysBusState, dma_offset, 3),
> >>> +    DEFINE_PROP_DMAADDR("dma-offset", OHCISysBusState, dma_offset, 3),
> >>>        DEFINE_PROP_END_OF_LIST(),
> >>>    };
> >>>
> >
> >

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH 01/13] Better support for dma_addr_t variables
  2012-06-20 21:14   ` Anthony Liguori
  2012-06-20 21:29     ` Benjamin Herrenschmidt
@ 2012-06-20 22:26     ` Peter Maydell
  2012-06-20 22:59       ` Anthony Liguori
  2012-06-22  1:58     ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 63+ messages in thread
From: Peter Maydell @ 2012-06-20 22:26 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, David Gibson

On 20 June 2012 22:14, Anthony Liguori <anthony@codemonkey.ws> wrote:
> Why not make life easy and fix dma_addr_t to 64-bit?

...for that matter weren't we tossing around the idea of just
making target_phys_addr_t 64 bits for everything? (I actually
want to do this for target-arm anyway; last time I did some
quick smoke-tests of performance it didn't seem to hurt really
even on a 32 bit host, and it avoids having to put the A15 in
a different qemu-system-* binary to the other cores.)

-- PMM

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH 01/13] Better support for dma_addr_t variables
  2012-06-20 22:26     ` Peter Maydell
@ 2012-06-20 22:59       ` Anthony Liguori
  2012-06-21  7:54         ` Peter Maydell
  0 siblings, 1 reply; 63+ messages in thread
From: Anthony Liguori @ 2012-06-20 22:59 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel, David Gibson

On 06/20/2012 05:26 PM, Peter Maydell wrote:
> On 20 June 2012 22:14, Anthony Liguori<anthony@codemonkey.ws>  wrote:
>> Why not make life easy and fix dma_addr_t to 64-bit?
>
> ...for that matter weren't we tossing around the idea of just
> making target_phys_addr_t 64 bits for everything? (I actually
> want to do this for target-arm anyway; last time I did some
> quick smoke-tests of performance it didn't seem to hurt really
> even on a 32 bit host, and it avoids having to put the A15 in
> a different qemu-system-* binary to the other cores.)

Didn't you whine and moan about the impact to printf()s last time I did this? ;-)

I can refresh my patch and resubmit..

Regards,

Anthony Liguori

>
> -- PMM

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH 07/13] usb: Convert usb_packet_{map, unmap} to universal DMA helpers
  2012-06-20  3:52         ` Benjamin Herrenschmidt
@ 2012-06-21  1:42           ` David Gibson
  0 siblings, 0 replies; 63+ messages in thread
From: David Gibson @ 2012-06-21  1:42 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Gerd Hoffmann, anthony, qemu-devel

On Wed, Jun 20, 2012 at 01:52:12PM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2012-06-20 at 13:14 +1000, David Gibson wrote:
> > So, in fact the original comment is a bit out of date.  With the
> > current version of this series, then a guest attempt to invalidate
> > will be delayed until the unmap occurs. 
> 
> No, this code was dropped, including the tracking of the maps, following
> comments from Anthony and others. The API for providing a cancel
> callback is still there but nothing will call it unless the backend does
> its own tracking and decides to do so.

Ah, right.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH 01/13] Better support for dma_addr_t variables
  2012-06-20 21:29     ` Benjamin Herrenschmidt
@ 2012-06-21  1:44       ` David Gibson
  0 siblings, 0 replies; 63+ messages in thread
From: David Gibson @ 2012-06-21  1:44 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: qemu-devel, Anthony Liguori

On Thu, Jun 21, 2012 at 07:29:23AM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2012-06-20 at 16:14 -0500, Anthony Liguori wrote:
> 
> > Why not make life easy and fix dma_addr_t to 64-bit?
> 
> No opinion on my side, that's from the original patch series, I suppose
> the goal was to avoid the overhead/bloat on 32-bit only
> platforms/targets.

More or less.  I think I do set it to 64-bit later in the series.
Particularly when IOMMU was configurable off, I had it set to
target_phys_addr_t given people's propensity to bitch and moan about
the slightest theoretical bloat.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH 02/13] Implement cpu_physical_memory_set()
  2012-06-20 21:15   ` Anthony Liguori
  2012-06-20 21:30     ` Benjamin Herrenschmidt
@ 2012-06-21  1:45     ` David Gibson
  2012-06-21  1:46       ` David Gibson
  2012-06-21  2:50       ` Benjamin Herrenschmidt
  2012-06-22  1:58     ` Benjamin Herrenschmidt
  2 siblings, 2 replies; 63+ messages in thread
From: David Gibson @ 2012-06-21  1:45 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel

On Wed, Jun 20, 2012 at 04:15:13PM -0500, Anthony Liguori wrote:
> On 06/19/2012 01:39 AM, Benjamin Herrenschmidt wrote:
> >From: David Gibson<david@gibson.dropbear.id.au>
> >
> >This patch adds cpu_physical_memory_set() function.  This is equivalent to
> >calling cpu_physical_memory_write() with a buffer filled with a character,
> >ie, a memset of target memory.
> >
> >It uses a small temporary buffer on the stack.
> >
> >Signed-off-by: David Gibson<david@gibson.dropbear.id.au>
> >Signed-off-by: Benjamin Herrenschmidt<benh@kernel.crashing.org>
> 
> Why should this be in the core API?  Shouldn't this be a helper on
> top of the DMA API?

Well, I was hoping to avoid having to allocate a temporary buffer of
zeroes, which is necessary to do this in terms of the existing
cpu_physical_memory_write() api.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH 02/13] Implement cpu_physical_memory_set()
  2012-06-21  1:45     ` David Gibson
@ 2012-06-21  1:46       ` David Gibson
  2012-06-21  2:50       ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 63+ messages in thread
From: David Gibson @ 2012-06-21  1:46 UTC (permalink / raw)
  To: Anthony Liguori, Benjamin Herrenschmidt, qemu-devel

On Thu, Jun 21, 2012 at 11:45:14AM +1000, David Gibson wrote:
> On Wed, Jun 20, 2012 at 04:15:13PM -0500, Anthony Liguori wrote:
> > On 06/19/2012 01:39 AM, Benjamin Herrenschmidt wrote:
> > >From: David Gibson<david@gibson.dropbear.id.au>
> > >
> > >This patch adds cpu_physical_memory_set() function.  This is equivalent to
> > >calling cpu_physical_memory_write() with a buffer filled with a character,
> > >ie, a memset of target memory.
> > >
> > >It uses a small temporary buffer on the stack.
> > >
> > >Signed-off-by: David Gibson<david@gibson.dropbear.id.au>
> > >Signed-off-by: Benjamin Herrenschmidt<benh@kernel.crashing.org>
> > 
> > Why should this be in the core API?  Shouldn't this be a helper on
> > top of the DMA API?
> 
> Well, I was hoping to avoid having to allocate a temporary buffer of
> zeroes, which is necessary to do this in terms of the existing
> cpu_physical_memory_write() api.

Ugh, sorry, I'm out of date again.  That's what I did do; now it does
have a temp buf because you already asked ben to get rid of the
duplicated memory write logic, so I guess it could be at the dma layer
instead.  I'm pretty sure at least one person suggested it be at this
layer though.
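
For readers following the thread, here is a minimal sketch of the approach
described above: a memset of guest memory built on top of the ordinary write
path, using a small temporary buffer on the stack. The names guest_ram,
guest_write() and guest_memset() and the 512-byte buffer size are assumptions
made for illustration; they are not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for guest RAM; not a QEMU structure. */
#define GUEST_RAM_SIZE 4096
static uint8_t guest_ram[GUEST_RAM_SIZE];

/* Hypothetical stand-in for a cpu_physical_memory_write()-style call. */
static void guest_write(uint64_t addr, const void *buf, size_t len)
{
    assert(addr <= GUEST_RAM_SIZE && len <= GUEST_RAM_SIZE - addr);
    memcpy(guest_ram + addr, buf, len);
}

/* Size of the on-stack fill buffer (an arbitrary choice for this sketch). */
#define FILL_BUF_SIZE 512

/*
 * Fill len bytes of guest memory at addr with the byte c.  The fill goes
 * through the normal write path in chunks no larger than the stack
 * buffer, so no memory-access logic is duplicated.
 */
static void guest_memset(uint64_t addr, uint8_t c, size_t len)
{
    uint8_t fillbuf[FILL_BUF_SIZE];

    memset(fillbuf, c, sizeof(fillbuf));
    while (len > 0) {
        size_t l = len < sizeof(fillbuf) ? len : sizeof(fillbuf);

        guest_write(addr, fillbuf, l);
        addr += l;
        len -= l;
    }
}
```

The cost of this design is one extra copy per chunk; the benefit is that any
translation or checking done by the write path applies to the fill as well.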

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 03/13] iommu: Add universal DMA helper functions
  2012-06-20 21:16   ` Anthony Liguori
                       ` (2 preceding siblings ...)
  2012-06-20 21:40     ` Michael S. Tsirkin
@ 2012-06-21  1:48     ` David Gibson
  2012-06-22  2:02     ` Benjamin Herrenschmidt
  4 siblings, 0 replies; 63+ messages in thread
From: David Gibson @ 2012-06-21  1:48 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Richard Henderson, Eduard - Gabriel Munteanu, qemu-devel,
	Michael S. Tsirkin

On Wed, Jun 20, 2012 at 04:16:47PM -0500, Anthony Liguori wrote:
[snip]
> >diff --git a/qemu-common.h b/qemu-common.h
> >index 8f87e41..80026af 100644
> >--- a/qemu-common.h
> >+++ b/qemu-common.h
> >@@ -264,6 +264,7 @@ typedef struct EventNotifier EventNotifier;
> >  typedef struct VirtIODevice VirtIODevice;
> >  typedef struct QEMUSGList QEMUSGList;
> >  typedef struct SHPCDevice SHPCDevice;
> >+typedef struct DMAContext DMAContext;
> 
> Please don't put this in qemu-common.h.  Stick it in a dma-specific
> header.

I'm pretty sure I started to hit circular include hell without the
forward declaration here.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 02/13] Implement cpu_physical_memory_set()
  2012-06-21  1:45     ` David Gibson
  2012-06-21  1:46       ` David Gibson
@ 2012-06-21  2:50       ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-21  2:50 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-devel, Anthony Liguori

On Thu, 2012-06-21 at 11:45 +1000, David Gibson wrote:
> > Why should this be in the core API?  Shouldn't this be a helper on
> > top of the DMA API?
> 
> Well, I was hoping to avoid having to allocate a temporary buffer of
> zeroes, which is necessary to do this in terms of the existing
> cpu_physical_memory_write() api.

and which I ended up doing anyway in the latest patch following previous
reviews where people barfed at having a duplication of the code to
access the guest memory :-)

Cheers,
Ben.


* Re: [Qemu-devel] [PATCH 04/13] usb-ohci: Use universal DMA helper functions
  2012-06-20 21:40       ` Anthony Liguori
  2012-06-20 22:02         ` Benjamin Herrenschmidt
@ 2012-06-21  6:43         ` Gerd Hoffmann
  1 sibling, 0 replies; 63+ messages in thread
From: Gerd Hoffmann @ 2012-06-21  6:43 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Michael S. Tsirkin, qemu-devel, David Gibson

  Hi,

>>> Why leave pci accessors and not implement usb_memory_rw() wrappers?
>>
>> Well, "usb" is a bit too generic, ehci and ohci would each need to have
>> their own sets of wrappers. But yes, that's possible... is it really
>> worth it ? There's nothing fundamentally wrong with using the dma_*
>> accessors.
> 
> So is using the pci accessors wrong?
> 
> I'm not saying you should go and convert every caller of the pci_
> functions, I just want a clear policy on what interface devices should use.

usb device emulations should use usb_packet_copy()

usb host adapter emulations should use either usb_packet_map() +
usb_packet_unmap(), or use usb_packet_addbuf(), then copy from/to the
buffer using whatever is appropriate.  For a pci host controller that is
pci_memory_rw().  For ohci which exists in both pci and non-pci variants
it looks reasonable to me to get an iommu handle in bus-specific code,
then use dma_memory_rw with that handle directly in the common code paths.
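
A rough sketch of that last arrangement, for illustration only: the
bus-specific init code stores a DMA handle in the shared controller state, and
the common code paths only ever use the stored handle. Every type and function
name here (DMACtx, dma_rw(), HostCtrl, ctrl_init_pci() and so on) is a
hypothetical stand-in, not QEMU's DMAContext or dma_memory_rw().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical DMA handle: a window of memory reachable by the device. */
typedef struct DMACtx {
    uint8_t *base;
    size_t size;
} DMACtx;

/* Stand-in for a dma_memory_rw()-style accessor: 0 on success, -1 on error. */
static int dma_rw(DMACtx *dma, uint64_t addr, void *buf, size_t len,
                  int is_write)
{
    if (addr > dma->size || len > dma->size - addr) {
        return -1;
    }
    if (is_write) {
        memcpy(dma->base + addr, buf, len);
    } else {
        memcpy(buf, dma->base + addr, len);
    }
    return 0;
}

/* Controller state shared by the PCI and non-PCI variants. */
typedef struct HostCtrl {
    DMACtx *dma;
} HostCtrl;

/* Bus-specific code: each variant knows where its handle comes from. */
static void ctrl_init_pci(HostCtrl *c, DMACtx *pci_dev_dma)
{
    c->dma = pci_dev_dma;
}

static void ctrl_init_sysbus(HostCtrl *c, DMACtx *sysbus_dma)
{
    c->dma = sysbus_dma;
}

/* Common code path: bus-agnostic, uses only the stored handle. */
static int ctrl_read_descriptor(HostCtrl *c, uint64_t addr, uint32_t *desc)
{
    return dma_rw(c->dma, addr, desc, sizeof(*desc), 0);
}
```

With this split, replacing the handle type later would only touch the field
and the two init functions, not the common code.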

cheers,
  Gerd


* Re: [Qemu-devel] [PATCH 04/13] usb-ohci: Use universal DMA helper functions
  2012-06-20 22:02         ` Benjamin Herrenschmidt
@ 2012-06-21  7:33           ` Michael S. Tsirkin
  2012-06-21 12:55             ` Anthony Liguori
  0 siblings, 1 reply; 63+ messages in thread
From: Michael S. Tsirkin @ 2012-06-21  7:33 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Gerd Hoffmann, qemu-devel, Anthony Liguori, David Gibson

On Thu, Jun 21, 2012 at 08:02:06AM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2012-06-20 at 16:40 -0500, Anthony Liguori wrote:
> 
> > Well let's return void in the DMA methods and let the IOMMUs assert on error. 
> > At least that will avoid surprises until someone decides they care enough about 
> > errors to touch all callers.
> > 
> > I think silently failing a memcpy() can potentially lead to a vulnerability so 
> > I'd rather avoid that.
> 
> No I'd rather keep the error returns, really, even if that means fixing
> a few devices. I can look at making sure we don't pass random qemu data,
> on error that's reasonably easy.
> 
> assert on error means guest code can assert qemu ... not a great idea
> but maybe we can add a warning.

Why not?  Guest can always just halt if it wants to anyway.
On the other hand, warnings can fill up host logs so
represent a security problem.

> > >> Why leave pci accessors and not implement usb_memory_rw() wrappers?
> > >
> > > Well, "usb" is a bit too generic, ehci and ohci would each need to have
> > > their own sets of wrappers. But yes, that's possible... is it really
> > > worth it ? There's nothing fundamentally wrong with using the dma_*
> > > accessors.
> > 
> > So is using the pci accessors wrong?
> 
> Not really either, I don't think it matters :-)
> 
> > I'm not saying you should go and convert every caller of the pci_ functions, I 
> > just want a clear policy on what interface devices should use.
> 
> Ideally the bus interface for the bus they sit on so they don't have to
> bother digging the DMAContext and are immune to change we would do in
> that area.
> 
> Devices that mix multiple bus types however are a bit more tricky, but
> so far are few, and those can use dma_* and know where to get the
> DMAContext from.
> 
> If we ever replace DMAContext with something else we can probably just
> change the field to that "something else" with a very simple
> search/replace on those devices (at least that's the best case :-)
> 
> I think anything else is just not worth bothering.
> 
> Cheers,
> Ben.


* Re: [Qemu-devel] [PATCH 01/13] Better support for dma_addr_t variables
  2012-06-20 22:59       ` Anthony Liguori
@ 2012-06-21  7:54         ` Peter Maydell
  0 siblings, 0 replies; 63+ messages in thread
From: Peter Maydell @ 2012-06-21  7:54 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, David Gibson

On 20 June 2012 23:59, Anthony Liguori <anthony@codemonkey.ws> wrote:
> On 06/20/2012 05:26 PM, Peter Maydell wrote:
>> On 20 June 2012 22:14, Anthony Liguori<anthony@codemonkey.ws>  wrote:
>> ...for that matter weren't we tossing around the idea of just
>> making target_phys_addr_t 64 bits for everything? (I actually
>> want to do this for target-arm anyway; last time I did some
>> quick smoke-tests of performance it didn't seem to hurt really
>> even on a 32 bit host, and it avoids having to put the A15 in
>> a different qemu-system-* binary to the other cores.)

> Didn't you whine and moan about the impact to printf()s last time I did
> this? ;-)

IIRC somebody in the review thread came up with a nice
solution to that, though I forget what it was exactly.

-- PMM


* Re: [Qemu-devel] [PATCH 04/13] usb-ohci: Use universal DMA helper functions
  2012-06-21  7:33           ` Michael S. Tsirkin
@ 2012-06-21 12:55             ` Anthony Liguori
  2012-06-21 14:10               ` Michael S. Tsirkin
  2012-06-22  2:28               ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 63+ messages in thread
From: Anthony Liguori @ 2012-06-21 12:55 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Gerd Hoffmann, qemu-devel, David Gibson

On 06/21/2012 02:33 AM, Michael S. Tsirkin wrote:
> On Thu, Jun 21, 2012 at 08:02:06AM +1000, Benjamin Herrenschmidt wrote:
>> On Wed, 2012-06-20 at 16:40 -0500, Anthony Liguori wrote:
>>
>>> Well let's return void in the DMA methods and let the IOMMUs assert on error.
>>> At least that will avoid surprises until someone decides they care enough about
>>> errors to touch all callers.
>>>
>>> I think silently failing a memcpy() can potentially lead to a vulnerability so
>>> I'd rather avoid that.
>>
>> No I'd rather keep the error returns, really, even if that means fixing
>> a few devices. I can look at making sure we don't pass random qemu data,
>> on error that's reasonably easy.
>>
>> assert on error means guest code can assert qemu ... not a great idea
>> but maybe we can add a warning.
>
> Why not?  Guest can always just halt if it wants to anyway.
> On the other hand, warnings can fill up host logs so
> represent a security problem.

As long as we scrub the buffers, returning an unhandled error seems okay to me.

I've long thought we should have some sort of generic way to throw an error and 
effectively pause a single device.  I'm not sure how it would work in practice 
though.

Regards,

Anthony Liguori


* Re: [Qemu-devel] [PATCH 04/13] usb-ohci: Use universal DMA helper functions
  2012-06-21 12:55             ` Anthony Liguori
@ 2012-06-21 14:10               ` Michael S. Tsirkin
  2012-06-22  2:28               ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 63+ messages in thread
From: Michael S. Tsirkin @ 2012-06-21 14:10 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Gerd Hoffmann, qemu-devel, David Gibson

On Thu, Jun 21, 2012 at 07:55:58AM -0500, Anthony Liguori wrote:
> On 06/21/2012 02:33 AM, Michael S. Tsirkin wrote:
> >On Thu, Jun 21, 2012 at 08:02:06AM +1000, Benjamin Herrenschmidt wrote:
> >>On Wed, 2012-06-20 at 16:40 -0500, Anthony Liguori wrote:
> >>
> >>>Well let's return void in the DMA methods and let the IOMMUs assert on error.
> >>>At least that will avoid surprises until someone decides they care enough about
> >>>errors to touch all callers.
> >>>
> >>>I think silently failing a memcpy() can potentially lead to a vulnerability so
> >>>I'd rather avoid that.
> >>
> >>No I'd rather keep the error returns, really, even if that means fixing
> >>a few devices. I can look at making sure we don't pass random qemu data,
> >>on error that's reasonably easy.
> >>
> >>assert on error means guest code can assert qemu ... not a great idea
> >>but maybe we can add a warning.
> >
> >Why not?  Guest can always just halt if it wants to anyway.
> >On the other hand, warnings can fill up host logs so
> >represent a security problem.
> 
> As long as we scrub the buffers, returning an unhandled error seems okay to me.
> 
> I've long thought we should have some sort of generic way to throw
> an error and effectively pause a single device.  I'm not sure how it
> would work in practice though.
> 
> Regards,
> 
> Anthony Liguori

I think we should add an API to log a message and pause the VM.
Later admin can resume the VM, save it to file for debugging etc.

-- 
MST


* Re: [Qemu-devel] [PATCH 01/13] Better support for dma_addr_t variables
  2012-06-20 21:14   ` Anthony Liguori
  2012-06-20 21:29     ` Benjamin Herrenschmidt
  2012-06-20 22:26     ` Peter Maydell
@ 2012-06-22  1:58     ` Benjamin Herrenschmidt
  2 siblings, 0 replies; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-22  1:58 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, David Gibson

On Wed, 2012-06-20 at 16:14 -0500, Anthony Liguori wrote:

>  diff --git a/hw/qdev-dma.h b/hw/qdev-dma.h
> > new file mode 100644
> > index 0000000..f0ff558
> > --- /dev/null
> > +++ b/hw/qdev-dma.h
> > @@ -0,0 +1,12 @@
> > +/*
> > + * Support for dma_addr_t typed properties
> > + *
> > + * Copyright (C) 2012 David Gibson, IBM Corporation.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +#include "qdev-addr.h"
> > +
> > +#define DEFINE_PROP_DMAADDR(_n, _s, _f, _d)                               \
> > +    DEFINE_PROP_TADDR(_n, _s, _f, _d)
> 
> Why not make life easy and fix dma_addr_t to 64-bit?

So you guys haven't come to a firm conclusion yet :) I'll leave that as
it is for now, it's trivial anyway. If you want to change target_addr to
be 64-bit that will make dma_addr_t 64-bit too. If you want to make it
64-bit unconditionally, just drop this patch (and change the
definition).

Cheers,
Ben.


* Re: [Qemu-devel] [PATCH 02/13] Implement cpu_physical_memory_set()
  2012-06-20 21:15   ` Anthony Liguori
  2012-06-20 21:30     ` Benjamin Herrenschmidt
  2012-06-21  1:45     ` David Gibson
@ 2012-06-22  1:58     ` Benjamin Herrenschmidt
  2 siblings, 0 replies; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-22  1:58 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, David Gibson

On Wed, 2012-06-20 at 16:15 -0500, Anthony Liguori wrote:
> On 06/19/2012 01:39 AM, Benjamin Herrenschmidt wrote:
> > From: David Gibson<david@gibson.dropbear.id.au>
> >
> > This patch adds cpu_physical_memory_set() function.  This is equivalent to
> > calling cpu_physical_memory_write() with a buffer filled with a character,
> > ie, a memset of target memory.
> >
> > It uses a small temporary buffer on the stack.
> >
> > Signed-off-by: David Gibson<david@gibson.dropbear.id.au>
> > Signed-off-by: Benjamin Herrenschmidt<benh@kernel.crashing.org>
> 
> Why should this be in the core API?  Shouldn't this be a helper on top of the 
> DMA API?

Moved it into the dma_* helpers.

Cheers,
Ben.


* Re: [Qemu-devel] [PATCH 03/13] iommu: Add universal DMA helper functions
  2012-06-20 21:16   ` Anthony Liguori
                       ` (3 preceding siblings ...)
  2012-06-21  1:48     ` David Gibson
@ 2012-06-22  2:02     ` Benjamin Herrenschmidt
  4 siblings, 0 replies; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-22  2:02 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Eduard - Gabriel Munteanu, Richard Henderson, Michael S. Tsirkin,
	qemu-devel, David Gibson

On Wed, 2012-06-20 at 16:16 -0500, Anthony Liguori wrote:
> > diff --git a/qemu-common.h b/qemu-common.h
> > index 8f87e41..80026af 100644
> > --- a/qemu-common.h
> > +++ b/qemu-common.h
> > @@ -264,6 +264,7 @@ typedef struct EventNotifier EventNotifier;
> >   typedef struct VirtIODevice VirtIODevice;
> >   typedef struct QEMUSGList QEMUSGList;
> >   typedef struct SHPCDevice SHPCDevice;
> > +typedef struct DMAContext DMAContext;
> 
> Please don't put this in qemu-common.h.  Stick it in a dma-specific
> header.

Ok so I just removed it from qemu-common.h (it's already in dma.h). It
causes a minor breakage in spapr which is easily fixed by adding the
right include.

I'm building all the targets now and .... it seems to pass.

So it's gone. That does mean that Jason will probably have to change
something in his patches though.

Cheers,
Ben.


* Re: [Qemu-devel] [PATCH 04/13] usb-ohci: Use universal DMA helper functions
  2012-06-21 12:55             ` Anthony Liguori
  2012-06-21 14:10               ` Michael S. Tsirkin
@ 2012-06-22  2:28               ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-22  2:28 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: David Gibson, Gerd Hoffmann, qemu-devel, Michael S. Tsirkin

On Thu, 2012-06-21 at 07:55 -0500, Anthony Liguori wrote:

> As long as we scrub the buffers, returning an unhandled error seems okay to me.
> 
> I've long thought we should have some sort of generic way to throw an error and 
> effectively pause a single device.  I'm not sure how it would work in practice 
> though.

I'll scrub the buffer in dma_memory_rw when reading from the guest in
all failure cases to ensure that we don't use a buffer of stale qemu
data.
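
As an illustration of the scrubbing idea, here is a minimal sketch: a read
from guest memory that zeroes the caller's buffer on every failure path, so a
caller that ignores the error never consumes stale host data. GuestWindow and
guest_read_scrubbed() are hypothetical names for this sketch and the bounds
check stands in for whatever failure the real translation can report; this is
not the dma_memory_rw() change itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of the guest memory a device is allowed to read. */
typedef struct GuestWindow {
    const uint8_t *base;
    size_t size;
} GuestWindow;

/*
 * Read len bytes at addr into buf.  Returns 0 on success.  On failure
 * returns -1 and fills buf with zeroes, so the previous contents of buf
 * cannot leak into device emulation.
 */
static int guest_read_scrubbed(const GuestWindow *w, uint64_t addr,
                               void *buf, size_t len)
{
    if (addr > w->size || len > w->size - addr) {
        memset(buf, 0, len);    /* scrub: no stale data on error */
        return -1;
    }
    memcpy(buf, w->base + addr, len);
    return 0;
}
```

Callers that do check the return value are unaffected; the scrub only changes
what an unchecked caller sees.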

In addition I'm doing a separate patch that I'll post later (needs a bit
more time) that improves OHCI's error handling.

Cheers,
Ben.


* Re: [Qemu-devel] [PATCH 09/13] iommu: Add facility to cancel in-use dma memory maps
  2012-06-20 21:25   ` Anthony Liguori
  2012-06-20 21:52     ` Benjamin Herrenschmidt
@ 2012-06-22  3:18     ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-22  3:18 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, David Gibson

On Wed, 2012-06-20 at 16:25 -0500, Anthony Liguori wrote:

> > +static void dma_aio_cancel(BlockDriverAIOCB *acb)
> > +{
> > +    DMAAIOCB *dbs = container_of(acb, DMAAIOCB, common);
> > +
> > +    trace_dma_aio_cancel(dbs);
> > +
> > +    if (dbs->acb) {
> > +        BlockDriverAIOCB *acb = dbs->acb;
> > +        dbs->acb = NULL;
> > +        dbs->in_cancel = true;
> > +        bdrv_aio_cancel(acb);
> > +        dbs->in_cancel = false;
> > +    }
> > +    dbs->common.cb = NULL;
> > +    dma_complete(dbs, 0);
> 
> So this cancellation stuff is hopelessly broken
> 
> It's simply not possible to fully cancel pending DMA in a synchronous callback.
> 
> Indeed, bdrv_aio_cancel ends up having a nasty little loop in it:

Yes, it's broken. Note that the patch didn't add the above function,
only moved it around.

In any case, I've decided to just drop that patch completely from the
series. IE. I'm not adding the dma_memory_map_with_cancel() variant,
there's no point since:

 - Nothing will call the cancel callback today and possibly for a while

 - Nothing passes a cancel callback other than the bdrv stuff and that
callback is hopelessly broken as you mentioned above.

So there's just no point. We will add an optional cancel callback again
later when we eventually decide to sort that problem out properly, it
will be an asynchronous cancel, ie, just "initiate" the cancellation,
and I'll probably add that as an argument to the normal dma_memory_map()
(adding NULL to all callers that don't care) at that point.

For now, let's not add a known to be broken and unused interface.

Cheers,
Ben.


* [Qemu-devel] [PATCH 12/13] pseries: Implement IOMMU and DMA for PAPR PCI devices
  2012-05-10  4:48 [Qemu-devel] [PATCH 00/13] IOMMU infrastructure Benjamin Herrenschmidt
@ 2012-05-10  4:49 ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 63+ messages in thread
From: Benjamin Herrenschmidt @ 2012-05-10  4:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, Alex Graf, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

Currently the pseries machine emulation does not support DMA for emulated
PCI devices, because the PAPR spec always requires a (guest visible,
paravirtualized) IOMMU which was not implemented.  Now that we have
infrastructure for IOMMU emulation, we can correct this and allow PCI DMA
for pseries.

With the existing PAPR IOMMU code used for VIO devices, this is almost
trivial. We use a single DMAContext for each (virtual) PCI host bridge,
which is the usual configuration on real PAPR machines (which often have
_many_ PCI host bridges).

Cc: Alex Graf <agraf@suse.de>

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 hw/spapr.h       |    1 +
 hw/spapr_iommu.c |   52 ++++++++++++++++++++++++++++------------------------
 hw/spapr_pci.c   |   15 +++++++++++++++
 hw/spapr_pci.h   |    1 +
 4 files changed, 45 insertions(+), 24 deletions(-)

diff --git a/hw/spapr.h b/hw/spapr.h
index df3e8b1..7c497aa 100644
--- a/hw/spapr.h
+++ b/hw/spapr.h
@@ -328,6 +328,7 @@ typedef struct sPAPRTCE {
 } sPAPRTCE;
 
 #define SPAPR_VIO_BASE_LIOBN    0x00000000
+#define SPAPR_PCI_BASE_LIOBN    0x80000000
 
 void spapr_iommu_init(void);
 DMAContext *spapr_tce_new_dma_context(uint32_t liobn, size_t window_size);
diff --git a/hw/spapr_iommu.c b/hw/spapr_iommu.c
index 87ed09c..79c2a06 100644
--- a/hw/spapr_iommu.c
+++ b/hw/spapr_iommu.c
@@ -162,6 +162,28 @@ void spapr_tce_free(DMAContext *dma)
     }
 }
 
+static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
+                                target_ulong tce)
+{
+    sPAPRTCE *tcep;
+    target_ulong oldtce;
+
+    if (ioba >= tcet->window_size) {
+        hcall_dprintf("spapr_vio_put_tce on out-of-boards IOBA 0x"
+                      TARGET_FMT_lx "\n", ioba);
+        return H_PARAMETER;
+    }
+
+    tcep = tcet->table + (ioba >> SPAPR_TCE_PAGE_SHIFT);
+    oldtce = tcep->tce;
+    tcep->tce = tce;
+
+    if (oldtce != 0) {
+        iommu_wait_for_invalidated_maps(&tcet->dma, ioba, SPAPR_TCE_PAGE_SIZE);
+    }
+
+    return H_SUCCESS;
+}
 
 static target_ulong h_put_tce(CPUPPCState *env, sPAPREnvironment *spapr,
                               target_ulong opcode, target_ulong *args)
@@ -170,43 +192,25 @@ static target_ulong h_put_tce(CPUPPCState *env, sPAPREnvironment *spapr,
     target_ulong ioba = args[1];
     target_ulong tce = args[2];
     sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
-    sPAPRTCE *tcep;
-    target_ulong oldtce;
 
     if (liobn & 0xFFFFFFFF00000000ULL) {
         hcall_dprintf("spapr_vio_put_tce on out-of-boundsw LIOBN "
                       TARGET_FMT_lx "\n", liobn);
         return H_PARAMETER;
     }
-    if (!tcet) {
-        hcall_dprintf("spapr_vio_put_tce on non-existent LIOBN "
-                      TARGET_FMT_lx "\n", liobn);
-        return H_PARAMETER;
-    }
 
     ioba &= ~(SPAPR_TCE_PAGE_SIZE - 1);
 
+    if (tcet) {
+        return put_tce_emu(tcet, ioba, tce);
+    }
 #ifdef DEBUG_TCE
-    fprintf(stderr, "spapr_vio_put_tce on liobn=" TARGET_FMT_lx /*%s*/
+    fprintf(stderr, "%s on liobn=" TARGET_FMT_lx /*%s*/
             "  ioba 0x" TARGET_FMT_lx "  TCE 0x" TARGET_FMT_lx "\n",
-            liobn, /*dev->qdev.id, */ioba, tce);
+            __func__, liobn, /*dev->qdev.id, */ioba, tce);
 #endif
 
-    if (ioba >= tcet->window_size) {
-        hcall_dprintf("spapr_vio_put_tce on out-of-boards IOBA 0x"
-                      TARGET_FMT_lx "\n", ioba);
-        return H_PARAMETER;
-    }
-
-    tcep = tcet->table + (ioba >> SPAPR_TCE_PAGE_SHIFT);
-    oldtce = tcep->tce;
-    tcep->tce = tce;
-
-    if (oldtce != 0) {
-        iommu_wait_for_invalidated_maps(&tcet->dma, ioba, SPAPR_TCE_PAGE_SIZE);
-    }
-
-    return H_SUCCESS;
+    return H_PARAMETER;
 }
 
 void spapr_iommu_init(void)
diff --git a/hw/spapr_pci.c b/hw/spapr_pci.c
index 25b400a..7b9973c 100644
--- a/hw/spapr_pci.c
+++ b/hw/spapr_pci.c
@@ -265,12 +265,21 @@ static const MemoryRegionOps spapr_io_ops = {
 /*
  * PHB PCI device
  */
+static DMAContext *spapr_pci_dma_context_fn(PCIBus *bus, void *opaque,
+                                            int devfn)
+{
+    sPAPRPHBState *phb = opaque;
+
+    return phb->dma;
+}
+
 static int spapr_phb_init(SysBusDevice *s)
 {
     sPAPRPHBState *phb = FROM_SYSBUS(sPAPRPHBState, s);
     char *namebuf;
     int i;
     PCIBus *bus;
+    uint32_t liobn;
 
     phb->dtbusname = g_strdup_printf("pci@%" PRIx64, phb->buid);
     namebuf = alloca(strlen(phb->dtbusname) + 32);
@@ -311,6 +320,10 @@ static int spapr_phb_init(SysBusDevice *s)
                            PCI_DEVFN(0, 0), PCI_NUM_PINS);
     phb->host_state.bus = bus;
 
+    liobn = SPAPR_PCI_BASE_LIOBN | (pci_find_domain(bus) << 16);
+    phb->dma = spapr_tce_new_dma_context(liobn, 0x40000000);
+    pci_setup_iommu(bus, spapr_pci_dma_context_fn, phb);
+
     QLIST_INSERT_HEAD(&spapr->phbs, phb, list);
 
     /* Initialize the LSI table */
@@ -471,6 +484,8 @@ int spapr_populate_pci_devices(sPAPRPHBState *phb,
     _FDT(fdt_setprop(fdt, bus_off, "interrupt-map", &interrupt_map,
                      sizeof(interrupt_map)));
 
+    spapr_dma_dt(fdt, bus_off, "ibm,dma-window", phb->dma);
+
     return 0;
 }
 
diff --git a/hw/spapr_pci.h b/hw/spapr_pci.h
index f54c2e8..d9e46e2 100644
--- a/hw/spapr_pci.h
+++ b/hw/spapr_pci.h
@@ -38,6 +38,7 @@ typedef struct sPAPRPHBState {
     MemoryRegion memspace, iospace;
     target_phys_addr_t mem_win_addr, mem_win_size, io_win_addr, io_win_size;
     MemoryRegion memwindow, iowindow;
+    DMAContext *dma;
 
     struct {
         uint32_t dt_irq;
-- 
1.7.9.5


Thread overview: 63+ messages
2012-06-19  6:39 [Qemu-devel] [PATCH 00/13] iommu series Benjamin Herrenschmidt
2012-06-19  6:39 ` [Qemu-devel] [PATCH 01/13] Better support for dma_addr_t variables Benjamin Herrenschmidt
2012-06-20 21:14   ` Anthony Liguori
2012-06-20 21:29     ` Benjamin Herrenschmidt
2012-06-21  1:44       ` David Gibson
2012-06-20 22:26     ` Peter Maydell
2012-06-20 22:59       ` Anthony Liguori
2012-06-21  7:54         ` Peter Maydell
2012-06-22  1:58     ` Benjamin Herrenschmidt
2012-06-19  6:39 ` [Qemu-devel] [PATCH 02/13] Implement cpu_physical_memory_set() Benjamin Herrenschmidt
2012-06-20 21:15   ` Anthony Liguori
2012-06-20 21:30     ` Benjamin Herrenschmidt
2012-06-20 21:37       ` Anthony Liguori
2012-06-21  1:45     ` David Gibson
2012-06-21  1:46       ` David Gibson
2012-06-21  2:50       ` Benjamin Herrenschmidt
2012-06-22  1:58     ` Benjamin Herrenschmidt
2012-06-19  6:39 ` [Qemu-devel] [PATCH 03/13] iommu: Add universal DMA helper functions Benjamin Herrenschmidt
2012-06-20 21:16   ` Anthony Liguori
2012-06-20 21:32     ` Michael S. Tsirkin
2012-06-20 21:38       ` Anthony Liguori
2012-06-20 21:42         ` Michael S. Tsirkin
2012-06-20 21:46           ` Anthony Liguori
2012-06-20 22:00             ` Michael S. Tsirkin
2012-06-20 21:33     ` Benjamin Herrenschmidt
2012-06-20 21:40     ` Michael S. Tsirkin
2012-06-20 22:01       ` Anthony Liguori
2012-06-21  1:48     ` David Gibson
2012-06-22  2:02     ` Benjamin Herrenschmidt
2012-06-19  6:39 ` [Qemu-devel] [PATCH 04/13] usb-ohci: Use " Benjamin Herrenschmidt
2012-06-20 21:18   ` Anthony Liguori
2012-06-20 21:36     ` Benjamin Herrenschmidt
2012-06-20 21:40       ` Anthony Liguori
2012-06-20 22:02         ` Benjamin Herrenschmidt
2012-06-21  7:33           ` Michael S. Tsirkin
2012-06-21 12:55             ` Anthony Liguori
2012-06-21 14:10               ` Michael S. Tsirkin
2012-06-22  2:28               ` Benjamin Herrenschmidt
2012-06-21  6:43         ` Gerd Hoffmann
2012-06-19  6:39 ` [Qemu-devel] [PATCH 05/13] iommu: Make sglists and dma_bdrv helpers use new universal DMA helpers Benjamin Herrenschmidt
2012-06-20 21:21   ` Anthony Liguori
2012-06-20 21:37     ` Benjamin Herrenschmidt
2012-06-19  6:39 ` [Qemu-devel] [PATCH 06/13] ide/ahci: Use universal DMA helper functions Benjamin Herrenschmidt
2012-06-19  6:39 ` [Qemu-devel] [PATCH 07/13] usb: Convert usb_packet_{map, unmap} to universal DMA helpers Benjamin Herrenschmidt
2012-06-19 13:42   ` Gerd Hoffmann
2012-06-19 20:23     ` Benjamin Herrenschmidt
2012-06-20  3:14       ` David Gibson
2012-06-20  3:52         ` Benjamin Herrenschmidt
2012-06-21  1:42           ` David Gibson
2012-06-20  6:25         ` Gerd Hoffmann
2012-06-20  9:25           ` Benjamin Herrenschmidt
2012-06-20  9:54             ` Gerd Hoffmann
2012-06-19  6:39 ` [Qemu-devel] [PATCH 08/13] iommu: Introduce IOMMU emulation infrastructure Benjamin Herrenschmidt
2012-06-19  6:39 ` [Qemu-devel] [PATCH 09/13] iommu: Add facility to cancel in-use dma memory maps Benjamin Herrenschmidt
2012-06-20 21:25   ` Anthony Liguori
2012-06-20 21:52     ` Benjamin Herrenschmidt
2012-06-22  3:18     ` Benjamin Herrenschmidt
2012-06-19  6:39 ` [Qemu-devel] [PATCH 10/13] pseries: Convert sPAPR TCEs to use generic IOMMU infrastructure Benjamin Herrenschmidt
2012-06-19  6:39 ` [Qemu-devel] [PATCH 11/13] iommu: Allow PCI to use " Benjamin Herrenschmidt
2012-06-19  6:39 ` [Qemu-devel] [PATCH 12/13] pseries: Implement IOMMU and DMA for PAPR PCI devices Benjamin Herrenschmidt
2012-06-19  6:39 ` [Qemu-devel] [PATCH 13/13] Add a memory barrier to DMA functions Benjamin Herrenschmidt
2012-06-20 21:12 ` [Qemu-devel] [PATCH 00/13] iommu series Anthony Liguori
  -- strict thread matches above, loose matches on Subject: below --
2012-05-10  4:48 [Qemu-devel] [PATCH 00/13] IOMMU infrastructure Benjamin Herrenschmidt
2012-05-10  4:49 ` [Qemu-devel] [PATCH 12/13] pseries: Implement IOMMU and DMA for PAPR PCI devices Benjamin Herrenschmidt
