* [PATCH 00/13] AMD IOMMU emulation patchset
From: Eduard - Gabriel Munteanu @ 2011-01-29 17:40 UTC (permalink / raw)
  To: joro
  Cc: blauwirbel, paul, avi, anthony, av1474, yamahata, kvm,
	qemu-devel, Eduard - Gabriel Munteanu

Hi everybody,

I'm a bit late, I know; school kept me busy.

But here it is. I hope this patchset addresses your previous concerns. Let
me know what you think; I hope it gets merged soon. Some testing would be
great.

The patchset is based on mst/pci. I'll send the SeaBIOS patches soon.


    Cheers,
    Eduard

Eduard - Gabriel Munteanu (13):
  Generic DMA memory access interface
  pci: add IOMMU support via the generic DMA layer
  AMD IOMMU emulation
  ide: use the DMA memory access interface for PCI IDE controllers
  rtl8139: use the DMA memory access interface
  eepro100: use the DMA memory access interface
  ac97: use the DMA memory access interface
  es1370: use the DMA memory access interface
  e1000: use the DMA memory access interface
  lsi53c895a: use the DMA memory access interface
  pcnet: use the DMA memory access interface
  usb-uhci: use the DMA memory access interface
  usb-ohci: use the DMA memory access interface

 Makefile.target    |    2 +-
 dma-helpers.c      |   23 ++-
 dma.h              |    4 +-
 hw/ac97.c          |    6 +-
 hw/amd_iommu.c     |  694 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/dma_rw.c        |  124 ++++++++++
 hw/dma_rw.h        |  157 ++++++++++++
 hw/e1000.c         |   26 ++-
 hw/eepro100.c      |   97 +++++---
 hw/es1370.c        |    4 +-
 hw/ide/ahci.c      |    3 +-
 hw/ide/internal.h  |    1 +
 hw/ide/macio.c     |    4 +-
 hw/ide/pci.c       |   18 +-
 hw/lsi53c895a.c    |   24 +-
 hw/pc.c            |    2 +
 hw/pci.c           |    7 +
 hw/pci.h           |    7 +
 hw/pci_ids.h       |    2 +
 hw/pci_internals.h |    1 +
 hw/pci_regs.h      |    1 +
 hw/pcnet-pci.c     |    5 +-
 hw/rtl8139.c       |  100 +++++----
 hw/usb-ohci.c      |   54 +++--
 hw/usb-uhci.c      |   26 +-
 25 files changed, 1233 insertions(+), 159 deletions(-)
 create mode 100644 hw/amd_iommu.c
 create mode 100644 hw/dma_rw.c
 create mode 100644 hw/dma_rw.h

-- 
1.7.3.4



* [PATCH 01/13] Generic DMA memory access interface
From: Eduard - Gabriel Munteanu @ 2011-01-29 17:40 UTC (permalink / raw)
  To: joro
  Cc: blauwirbel, paul, avi, anthony, av1474, yamahata, kvm,
	qemu-devel, Eduard - Gabriel Munteanu

This introduces replacements for memory access functions like
cpu_physical_memory_read(). The new interface can handle address
translation and access checking through an IOMMU.
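
As a rough usage sketch (hypothetical device code, not part of this patch):
a device model that currently reads a descriptor with

    cpu_physical_memory_read(desc_addr, &desc, sizeof(desc));

would instead route the access through its DMADevice, so an IOMMU can
translate and check it when one is present:

    dma_memory_read(&s->dma, desc_addr, &desc, sizeof(desc));

Here s->dma stands for the DMADevice embedded in the device state (the PCI
glue for that is added later in this series); desc_addr and desc are made-up
names. With no MMU attached (dev == NULL or dev->mmu == NULL), the helpers
simply fall back to cpu_physical_memory_rw().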

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 Makefile.target |    2 +-
 hw/dma_rw.c     |  124 +++++++++++++++++++++++++++++++++++++++++++
 hw/dma_rw.h     |  157 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 282 insertions(+), 1 deletions(-)
 create mode 100644 hw/dma_rw.c
 create mode 100644 hw/dma_rw.h

diff --git a/Makefile.target b/Makefile.target
index e15b1c4..e5817ab 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -218,7 +218,7 @@ obj-i386-y += cirrus_vga.o apic.o ioapic.o piix_pci.o
 obj-i386-y += vmmouse.o vmport.o hpet.o applesmc.o
 obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
-obj-i386-y += pc_piix.o
+obj-i386-y += pc_piix.o dma_rw.o
 obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 
 # shared objects
diff --git a/hw/dma_rw.c b/hw/dma_rw.c
new file mode 100644
index 0000000..ef8e7f8
--- /dev/null
+++ b/hw/dma_rw.c
@@ -0,0 +1,124 @@
+/*
+ * Generic DMA memory access interface.
+ *
+ * Copyright (c) 2011 Eduard - Gabriel Munteanu
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "dma_rw.h"
+#include "range.h"
+
+static void dma_register_memory_map(DMADevice *dev,
+                                    dma_addr_t addr,
+                                    dma_addr_t len,
+                                    target_phys_addr_t paddr,
+                                    DMAInvalidateMapFunc *invalidate,
+                                    void *invalidate_opaque)
+{
+    DMAMemoryMap *map;
+
+    map = qemu_malloc(sizeof(DMAMemoryMap));
+    map->addr               = addr;
+    map->len                = len;
+    map->paddr              = paddr;
+    map->invalidate         = invalidate;
+    map->invalidate_opaque  = invalidate_opaque;
+
+    QLIST_INSERT_HEAD(&dev->mmu->memory_maps, map, list);
+}
+
+static void dma_unregister_memory_map(DMADevice *dev,
+                                      target_phys_addr_t paddr,
+                                      dma_addr_t len)
+{
+    DMAMemoryMap *map, *next_map;
+
+    QLIST_FOREACH_SAFE(map, &dev->mmu->memory_maps, list, next_map) {
+        if (map->paddr == paddr && map->len == len) {
+            QLIST_REMOVE(map, list);
+            qemu_free(map);
+        }
+    }
+}
+
+void dma_invalidate_memory_range(DMADevice *dev,
+                                 dma_addr_t addr,
+                                 dma_addr_t len)
+{
+    DMAMemoryMap *map, *next_map;
+
+    QLIST_FOREACH_SAFE(map, &dev->mmu->memory_maps, list, next_map) {
+        if (ranges_overlap(addr, len, map->addr, map->len)) {
+            map->invalidate(map->invalidate_opaque);
+            QLIST_REMOVE(map, list);
+            qemu_free(map);
+        }
+    }
+}
+
+void *dma_memory_map(DMADevice *dev,
+                     DMAInvalidateMapFunc *cb,
+                     void *opaque,
+                     dma_addr_t addr,
+                     dma_addr_t *len,
+                     int is_write)
+{
+    int err;
+    dma_addr_t paddr, plen;
+
+    if (!dev || !dev->mmu) {
+        return cpu_physical_memory_map(addr, len, is_write);
+    }
+
+    plen = *len;
+    err = dev->mmu->translate(dev, addr, &paddr, &plen, is_write);
+    if (err) {
+        return NULL;
+    }
+
+    /*
+     * If this is true, the virtual region is contiguous,
+     * but the translated physical region isn't. We just
+     * clamp *len, much like cpu_physical_memory_map() does.
+     */
+    if (plen < *len) {
+        *len = plen;
+    }
+
+    /* We treat maps as remote TLBs to cope with stuff like AIO. */
+    if (cb) {
+        dma_register_memory_map(dev, addr, *len, paddr, cb, opaque);
+    }
+
+    return cpu_physical_memory_map(paddr, len, is_write);
+}
+
+void dma_memory_unmap(DMADevice *dev,
+                      void *buffer,
+                      dma_addr_t len,
+                      int is_write,
+                      dma_addr_t access_len)
+{
+    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
+    if (dev && dev->mmu) {
+        dma_unregister_memory_map(dev, (target_phys_addr_t) buffer, len);
+    }
+}
+
diff --git a/hw/dma_rw.h b/hw/dma_rw.h
new file mode 100644
index 0000000..bc93511
--- /dev/null
+++ b/hw/dma_rw.h
@@ -0,0 +1,157 @@
+#ifndef DMA_RW_H
+#define DMA_RW_H
+
+#include "qemu-common.h"
+
+typedef uint64_t dma_addr_t;
+
+typedef struct DMAMmu DMAMmu;
+typedef struct DMADevice DMADevice;
+typedef struct DMAMemoryMap DMAMemoryMap;
+
+typedef int DMATranslateFunc(DMADevice *dev,
+                             dma_addr_t addr,
+                             dma_addr_t *paddr,
+                             dma_addr_t *len,
+                             int is_write);
+
+typedef void DMAInvalidateMapFunc(void *);
+
+struct DMAMmu {
+    DeviceState *iommu;
+    DMATranslateFunc *translate;
+    QLIST_HEAD(memory_maps, DMAMemoryMap) memory_maps;
+};
+
+struct DMADevice {
+    DMAMmu *mmu;
+};
+
+struct DMAMemoryMap {
+    dma_addr_t              addr;
+    dma_addr_t              len;
+    target_phys_addr_t      paddr;
+    DMAInvalidateMapFunc    *invalidate;
+    void                    *invalidate_opaque;
+
+    QLIST_ENTRY(DMAMemoryMap) list;
+};
+
+static inline void dma_memory_rw(DMADevice *dev,
+                                 dma_addr_t addr,
+                                 void *buf,
+                                 dma_addr_t len,
+                                 int is_write)
+{
+    dma_addr_t paddr, plen;
+    int err;
+
+    /*
+     * Fast-path non-iommu.
+     * More importantly, makes it obvious what this function does.
+     */
+    if (!dev || !dev->mmu) {
+        cpu_physical_memory_rw(addr, buf, len, is_write);
+        return;
+    }
+
+    while (len) {
+        err = dev->mmu->translate(dev, addr, &paddr, &plen, is_write);
+        if (err) {
+            return;
+        }
+
+        /* The translation might be valid for larger regions. */
+        if (plen > len) {
+            plen = len;
+        }
+
+        cpu_physical_memory_rw(paddr, buf, plen, is_write);
+
+        len -= plen;
+        addr += plen;
+        buf += plen;
+    }
+}
+
+static inline void dma_memory_read(DMADevice *dev,
+                                   dma_addr_t addr,
+                                   void *buf,
+                                   dma_addr_t len)
+{
+    dma_memory_rw(dev, addr, buf, len, 0);
+}
+
+static inline void dma_memory_write(DMADevice *dev,
+                                    dma_addr_t addr,
+                                    const void *buf,
+                                    dma_addr_t len)
+{
+    dma_memory_rw(dev, addr, (void *) buf, len, 1);
+}
+
+void *dma_memory_map(DMADevice *dev,
+                     DMAInvalidateMapFunc *cb,
+                     void *opaque,
+                     dma_addr_t addr,
+                     dma_addr_t *len,
+                     int is_write);
+void dma_memory_unmap(DMADevice *dev,
+                      void *buffer,
+                      dma_addr_t len,
+                      int is_write,
+                      dma_addr_t access_len);
+
+
+void dma_invalidate_memory_range(DMADevice *dev,
+                                 dma_addr_t addr,
+                                 dma_addr_t len);
+
+
+#define DEFINE_DMA_LD(suffix, size)                                       \
+static inline uint##size##_t                                              \
+dma_ld##suffix(DMADevice *dev, dma_addr_t addr)                           \
+{                                                                         \
+    int err;                                                              \
+    dma_addr_t paddr, plen;                                               \
+                                                                          \
+    if (!dev || !dev->mmu) {                                              \
+        return ld##suffix##_phys(addr);                                   \
+    }                                                                     \
+                                                                          \
+    err = dev->mmu->translate(dev, addr, &paddr, &plen, 0);               \
+    if (err || (plen < size / 8))                                         \
+        return 0;                                                         \
+                                                                          \
+    return ld##suffix##_phys(paddr);                                      \
+}
+
+#define DEFINE_DMA_ST(suffix, size)                                       \
+static inline void                                                        \
+dma_st##suffix(DMADevice *dev, dma_addr_t addr, uint##size##_t val)       \
+{                                                                         \
+    int err;                                                              \
+    dma_addr_t paddr, plen;                                               \
+                                                                          \
+    if (!dev || !dev->mmu) {                                              \
+        st##suffix##_phys(addr, val);                                     \
+        return;                                                           \
+    }                                                                     \
+    err = dev->mmu->translate(dev, addr, &paddr, &plen, 1);               \
+    if (err || (plen < size / 8))                                         \
+        return;                                                           \
+                                                                          \
+    st##suffix##_phys(paddr, val);                                        \
+}
+
+DEFINE_DMA_LD(ub, 8)
+DEFINE_DMA_LD(uw, 16)
+DEFINE_DMA_LD(l, 32)
+DEFINE_DMA_LD(q, 64)
+
+DEFINE_DMA_ST(b, 8)
+DEFINE_DMA_ST(w, 16)
+DEFINE_DMA_ST(l, 32)
+DEFINE_DMA_ST(q, 64)
+
+#endif
-- 
1.7.3.4



* [PATCH 02/13] pci: add IOMMU support via the generic DMA layer
From: Eduard - Gabriel Munteanu @ 2011-01-29 17:40 UTC (permalink / raw)
  To: joro
  Cc: blauwirbel, paul, avi, anthony, av1474, yamahata, kvm,
	qemu-devel, Eduard - Gabriel Munteanu

IOMMUs can now be hooked onto the PCI bus. This makes use of the generic
DMA layer.
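
As a minimal sketch of how an IOMMU model would use this hook (hypothetical
code, not part of this patch), the translation callback follows the
DMATranslateFunc signature from the previous patch and is registered from
the IOMMU's PCI init function:

    static int my_iommu_translate(DMADevice *dev, dma_addr_t addr,
                                  dma_addr_t *paddr, dma_addr_t *len,
                                  int is_write)
    {
        *paddr = addr;   /* identity mapping in this sketch */
        *len = -1;       /* translation valid for any length */
        return 0;
    }

    static int my_iommu_initfn(PCIDevice *dev)
    {
        pci_register_iommu(dev, my_iommu_translate);
        return 0;
    }

Since do_pci_register_device() points every device's dma.mmu at the bus-wide
DMAMmu, DMA from any device on that bus then goes through the registered
translate callback.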

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/pci.c           |    7 +++++++
 hw/pci.h           |    7 +++++++
 hw/pci_internals.h |    1 +
 3 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 612ccaa..0a32a93 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -738,6 +738,7 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
         return NULL;
     }
     pci_dev->bus = bus;
+    pci_dev->dma.mmu = &bus->mmu;
     pci_dev->devfn = devfn;
     pstrcpy(pci_dev->name, sizeof(pci_dev->name), name);
     pci_dev->irq_state = 0;
@@ -2163,3 +2164,9 @@ int pci_qdev_find_device(const char *id, PCIDevice **pdev)
 
     return rc;
 }
+
+void pci_register_iommu(PCIDevice *dev, DMATranslateFunc *translate)
+{
+    dev->bus->mmu.iommu = &dev->qdev;
+    dev->bus->mmu.translate = translate;
+}
diff --git a/hw/pci.h b/hw/pci.h
index 550531b..4bb0a94 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -5,6 +5,7 @@
 #include "qobject.h"
 
 #include "qdev.h"
+#include "dma_rw.h"
 
 /* PCI includes legacy ISA access.  */
 #include "isa.h"
@@ -128,6 +129,10 @@ enum {
 
 struct PCIDevice {
     DeviceState qdev;
+
+    /* For devices which do DMA. */
+    DMADevice dma;
+
     /* PCI config space */
     uint8_t *config;
 
@@ -267,6 +272,8 @@ void pci_bridge_update_mappings(PCIBus *b);
 
 void pci_device_deassert_intx(PCIDevice *dev);
 
+void pci_register_iommu(PCIDevice *dev, DMATranslateFunc *translate);
+
 static inline void
 pci_set_byte(uint8_t *config, uint8_t val)
 {
diff --git a/hw/pci_internals.h b/hw/pci_internals.h
index fbe1866..6452e8c 100644
--- a/hw/pci_internals.h
+++ b/hw/pci_internals.h
@@ -16,6 +16,7 @@ extern struct BusInfo pci_bus_info;
 
 struct PCIBus {
     BusState qbus;
+    DMAMmu mmu;
     uint8_t devfn_min;
     pci_set_irq_fn set_irq;
     pci_map_irq_fn map_irq;
-- 
1.7.3.4



* [PATCH 03/13] AMD IOMMU emulation
From: Eduard - Gabriel Munteanu @ 2011-01-29 17:40 UTC (permalink / raw)
  To: joro
  Cc: blauwirbel, paul, avi, anthony, av1474, yamahata, kvm,
	qemu-devel, Eduard - Gabriel Munteanu

This introduces emulation for the AMD IOMMU, described in "AMD I/O
Virtualization Technology (IOMMU) Specification".
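
As a rough sketch of the translation path implemented below: the device
table entry selects a page-table root and a starting level, and each level
of the AMD IOMMU page table then consumes 9 bits of the device (virtual)
address, with 8-byte PTEs:

    /* Mirrors amd_iommu_translate() below; pte, addr and level are its
       local variables, not new API. */
    pte_addr  = pte & DEV_PT_ROOT_MASK;
    pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;

For a 3-level walk this picks bits 38:30, 29:21 and 20:12 of addr in turn;
the final PTE supplies the 4 KiB-aligned physical page and addr & 4095 the
offset into it.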

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 Makefile.target |    2 +-
 hw/amd_iommu.c  |  694 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/pc.c         |    2 +
 hw/pci_ids.h    |    2 +
 hw/pci_regs.h   |    1 +
 5 files changed, 700 insertions(+), 1 deletions(-)
 create mode 100644 hw/amd_iommu.c

diff --git a/Makefile.target b/Makefile.target
index e5817ab..4b650bd 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -218,7 +218,7 @@ obj-i386-y += cirrus_vga.o apic.o ioapic.o piix_pci.o
 obj-i386-y += vmmouse.o vmport.o hpet.o applesmc.o
 obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
-obj-i386-y += pc_piix.o dma_rw.o
+obj-i386-y += pc_piix.o dma_rw.o amd_iommu.o
 obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 
 # shared objects
diff --git a/hw/amd_iommu.c b/hw/amd_iommu.c
new file mode 100644
index 0000000..6c6346a
--- /dev/null
+++ b/hw/amd_iommu.c
@@ -0,0 +1,694 @@
+/*
+ * AMD IOMMU emulation
+ *
+ * Copyright (c) 2011 Eduard - Gabriel Munteanu
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "pc.h"
+#include "hw.h"
+#include "pci.h"
+#include "qlist.h"
+#include "dma_rw.h"
+
+/* Capability registers */
+#define CAPAB_HEADER            0x00
+#define   CAPAB_REV_TYPE        0x02
+#define   CAPAB_FLAGS           0x03
+#define CAPAB_BAR_LOW           0x04
+#define CAPAB_BAR_HIGH          0x08
+#define CAPAB_RANGE             0x0C
+#define CAPAB_MISC              0x10
+
+#define CAPAB_SIZE              0x14
+#define CAPAB_REG_SIZE          0x04
+
+/* Capability header data */
+#define CAPAB_FLAG_IOTLBSUP     (1 << 0)
+#define CAPAB_FLAG_HTTUNNEL     (1 << 1)
+#define CAPAB_FLAG_NPCACHE      (1 << 2)
+#define CAPAB_INIT_REV          (1 << 3)
+#define CAPAB_INIT_TYPE         3
+#define CAPAB_INIT_REV_TYPE     (CAPAB_INIT_REV | CAPAB_INIT_TYPE)
+#define CAPAB_INIT_FLAGS        (CAPAB_FLAG_NPCACHE | CAPAB_FLAG_HTTUNNEL)
+#define CAPAB_INIT_MISC         ((64 << 15) | (48 << 8))
+#define CAPAB_BAR_MASK          (~((1ULL << 14) - 1))
+
+/* MMIO registers */
+#define MMIO_DEVICE_TABLE       0x0000
+#define MMIO_COMMAND_BASE       0x0008
+#define MMIO_EVENT_BASE         0x0010
+#define MMIO_CONTROL            0x0018
+#define MMIO_EXCL_BASE          0x0020
+#define MMIO_EXCL_LIMIT         0x0028
+#define MMIO_COMMAND_HEAD       0x2000
+#define MMIO_COMMAND_TAIL       0x2008
+#define MMIO_EVENT_HEAD         0x2010
+#define MMIO_EVENT_TAIL         0x2018
+#define MMIO_STATUS             0x2020
+
+#define MMIO_SIZE               0x4000
+
+#define MMIO_DEVTAB_SIZE_MASK   ((1ULL << 12) - 1)
+#define MMIO_DEVTAB_BASE_MASK   (((1ULL << 52) - 1) & ~MMIO_DEVTAB_SIZE_MASK)
+#define MMIO_DEVTAB_ENTRY_SIZE  32
+#define MMIO_DEVTAB_SIZE_UNIT   4096
+
+#define MMIO_CMDBUF_SIZE_BYTE       (MMIO_COMMAND_BASE + 7)
+#define MMIO_CMDBUF_SIZE_MASK       0x0F
+#define MMIO_CMDBUF_BASE_MASK       MMIO_DEVTAB_BASE_MASK
+#define MMIO_CMDBUF_DEFAULT_SIZE    8
+#define MMIO_CMDBUF_HEAD_MASK       (((1ULL << 19) - 1) & ~0x0F)
+#define MMIO_CMDBUF_TAIL_MASK       MMIO_EVTLOG_HEAD_MASK
+
+#define MMIO_EVTLOG_SIZE_BYTE       (MMIO_EVENT_BASE + 7)
+#define MMIO_EVTLOG_SIZE_MASK       MMIO_CMDBUF_SIZE_MASK
+#define MMIO_EVTLOG_BASE_MASK       MMIO_CMDBUF_BASE_MASK
+#define MMIO_EVTLOG_DEFAULT_SIZE    MMIO_CMDBUF_DEFAULT_SIZE
+#define MMIO_EVTLOG_HEAD_MASK       (((1ULL << 19) - 1) & ~0x0F)
+#define MMIO_EVTLOG_TAIL_MASK       MMIO_EVTLOG_HEAD_MASK
+
+#define MMIO_EXCL_BASE_MASK         MMIO_DEVTAB_BASE_MASK
+#define MMIO_EXCL_ENABLED_MASK      (1ULL << 0)
+#define MMIO_EXCL_ALLOW_MASK        (1ULL << 1)
+#define MMIO_EXCL_LIMIT_MASK        MMIO_DEVTAB_BASE_MASK
+#define MMIO_EXCL_LIMIT_LOW         0xFFF
+
+#define MMIO_CONTROL_IOMMUEN        (1ULL << 0)
+#define MMIO_CONTROL_HTTUNEN        (1ULL << 1)
+#define MMIO_CONTROL_EVENTLOGEN     (1ULL << 2)
+#define MMIO_CONTROL_EVENTINTEN     (1ULL << 3)
+#define MMIO_CONTROL_COMWAITINTEN   (1ULL << 4)
+#define MMIO_CONTROL_CMDBUFEN       (1ULL << 12)
+
+#define MMIO_STATUS_EVTLOG_OF       (1ULL << 0)
+#define MMIO_STATUS_EVTLOG_INTR     (1ULL << 1)
+#define MMIO_STATUS_COMWAIT_INTR    (1ULL << 2)
+#define MMIO_STATUS_EVTLOG_RUN      (1ULL << 3)
+#define MMIO_STATUS_CMDBUF_RUN      (1ULL << 4)
+
+#define CMDBUF_ID_BYTE              0x07
+#define CMDBUF_ID_RSHIFT            4
+#define CMDBUF_ENTRY_SIZE           0x10
+
+#define CMD_COMPLETION_WAIT         0x01
+#define CMD_INVAL_DEVTAB_ENTRY      0x02
+#define CMD_INVAL_IOMMU_PAGES       0x03
+#define CMD_INVAL_IOTLB_PAGES       0x04
+#define CMD_INVAL_INTR_TABLE        0x05
+
+#define DEVTAB_ENTRY_SIZE           32
+
+/* Device table entry bits 0:63 */
+#define DEV_VALID                   (1ULL << 0)
+#define DEV_TRANSLATION_VALID       (1ULL << 1)
+#define DEV_MODE_MASK               0x7
+#define DEV_MODE_RSHIFT             9
+#define DEV_PT_ROOT_MASK            0xFFFFFFFFFF000
+#define DEV_PT_ROOT_RSHIFT          12
+#define DEV_PERM_SHIFT              61
+#define DEV_PERM_READ               (1ULL << 61)
+#define DEV_PERM_WRITE              (1ULL << 62)
+
+/* Device table entry bits 64:127 */
+#define DEV_DOMAIN_ID_MASK          ((1ULL << 16) - 1)
+#define DEV_IOTLB_SUPPORT           (1ULL << 17)
+#define DEV_SUPPRESS_PF             (1ULL << 18)
+#define DEV_SUPPRESS_ALL_PF         (1ULL << 19)
+#define DEV_IOCTL_MASK              ~3
+#define DEV_IOCTL_RSHIFT            20
+#define   DEV_IOCTL_DENY            0
+#define   DEV_IOCTL_PASSTHROUGH     1
+#define   DEV_IOCTL_TRANSLATE       2
+#define DEV_CACHE                   (1ULL << 37)
+#define DEV_SNOOP_DISABLE           (1ULL << 38)
+#define DEV_EXCL                    (1ULL << 39)
+
+/* Event codes and flags, as stored in the info field */
+#define EVENT_ILLEGAL_DEVTAB_ENTRY  (0x1U << 24)
+#define EVENT_IOPF                  (0x2U << 24)
+#define   EVENT_IOPF_I              (1U << 3)
+#define   EVENT_IOPF_PR             (1U << 4)
+#define   EVENT_IOPF_RW             (1U << 5)
+#define   EVENT_IOPF_PE             (1U << 6)
+#define   EVENT_IOPF_RZ             (1U << 7)
+#define   EVENT_IOPF_TR             (1U << 8)
+#define EVENT_DEV_TAB_HW_ERROR      (0x3U << 24)
+#define EVENT_PAGE_TAB_HW_ERROR     (0x4U << 24)
+#define EVENT_ILLEGAL_COMMAND_ERROR (0x5U << 24)
+#define EVENT_COMMAND_HW_ERROR      (0x6U << 24)
+#define EVENT_IOTLB_INV_TIMEOUT     (0x7U << 24)
+#define EVENT_INVALID_DEV_REQUEST   (0x8U << 24)
+
+#define EVENT_LEN                   16
+
+#define IOMMU_PERM_READ             (1 << 0)
+#define IOMMU_PERM_WRITE            (1 << 1)
+#define IOMMU_PERM_RW               (IOMMU_PERM_READ | IOMMU_PERM_WRITE)
+
+typedef struct AMDIOMMUState {
+    PCIDevice                   dev;
+
+    int                         capab_offset;
+    unsigned char               *capab;
+
+    int                         mmio_index;
+    target_phys_addr_t          mmio_addr;
+    unsigned char               *mmio_buf;
+    int                         mmio_enabled;
+
+    int                         enabled;
+    int                         ats_enabled;
+
+    target_phys_addr_t          devtab;
+    size_t                      devtab_len;
+
+    target_phys_addr_t          cmdbuf;
+    int                         cmdbuf_enabled;
+    size_t                      cmdbuf_len;
+    size_t                      cmdbuf_head;
+    size_t                      cmdbuf_tail;
+    int                         completion_wait_intr;
+
+    target_phys_addr_t          evtlog;
+    int                         evtlog_enabled;
+    int                         evtlog_intr;
+    target_phys_addr_t          evtlog_len;
+    target_phys_addr_t          evtlog_head;
+    target_phys_addr_t          evtlog_tail;
+
+    target_phys_addr_t          excl_base;
+    target_phys_addr_t          excl_limit;
+    int                         excl_enabled;
+    int                         excl_allow;
+} AMDIOMMUState;
+
+typedef struct AMDIOMMUEvent {
+    uint16_t    devfn;
+    uint16_t    reserved;
+    uint16_t    domid;
+    uint16_t    info;
+    uint64_t    addr;
+} __attribute__((packed)) AMDIOMMUEvent;
+
+static void amd_iommu_completion_wait(AMDIOMMUState *st,
+                                      uint8_t *cmd)
+{
+    uint64_t addr;
+
+    if (cmd[0] & 1) {
+        addr = le64_to_cpu(*(uint64_t *) cmd) & 0xFFFFFFFFFFFF8;
+        cpu_physical_memory_write(addr, cmd + 8, 8);
+    }
+
+    if (cmd[0] & 2)
+        st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_COMWAIT_INTR;
+}
+
+static void amd_iommu_invalidate_iotlb(AMDIOMMUState *st,
+                                       uint8_t *cmd)
+{
+    PCIDevice *dev;
+    PCIBus *bus = st->dev.bus;
+    int bus_num = pci_bus_num(bus);
+    int devfn = *(uint16_t *) cmd;
+
+    dev = pci_find_device(bus, bus_num, devfn);
+    if (dev) {
+        dma_invalidate_memory_range(&dev->dma, 0, -1);
+    }
+}
+
+static void amd_iommu_cmdbuf_exec(AMDIOMMUState *st)
+{
+    uint8_t cmd[16];
+    int type;
+
+    cpu_physical_memory_read(st->cmdbuf + st->cmdbuf_head, cmd, 16);
+    type = cmd[CMDBUF_ID_BYTE] >> CMDBUF_ID_RSHIFT;
+    switch (type) {
+        case CMD_COMPLETION_WAIT:
+            amd_iommu_completion_wait(st, cmd);
+            break;
+        case CMD_INVAL_DEVTAB_ENTRY:
+            break;
+        case CMD_INVAL_IOMMU_PAGES:
+            break;
+        case CMD_INVAL_IOTLB_PAGES:
+            amd_iommu_invalidate_iotlb(st, cmd);
+            break;
+        case CMD_INVAL_INTR_TABLE:
+            break;
+        default:
+            break;
+    }
+}
+
+static void amd_iommu_cmdbuf_run(AMDIOMMUState *st)
+{
+    if (!st->cmdbuf_enabled) {
+        return;
+    }
+
+    /* Check if there's work to do. */
+    while (st->cmdbuf_head != st->cmdbuf_tail) {
+        /* Wrap head pointer. */
+        if (st->cmdbuf_head >= st->cmdbuf_len * CMDBUF_ENTRY_SIZE) {
+            st->cmdbuf_head = 0;
+        }
+
+        amd_iommu_cmdbuf_exec(st);
+
+        /* Increment head pointer. */
+        st->cmdbuf_head += CMDBUF_ENTRY_SIZE;
+    }
+
+    *((uint64_t *) (st->mmio_buf + MMIO_COMMAND_HEAD)) = cpu_to_le64(st->cmdbuf_head);
+}
+
+static uint32_t amd_iommu_mmio_buf_read(AMDIOMMUState *st,
+                                        size_t offset,
+                                        size_t size)
+{
+    ssize_t i;
+    uint32_t ret;
+
+    if (!size) {
+        return 0;
+    }
+
+    ret = st->mmio_buf[offset + size - 1];
+    for (i = size - 2; i >= 0; i--) {
+        ret <<= 8;
+        ret |= st->mmio_buf[offset + i];
+    }
+
+    return ret;
+}
+
+static void amd_iommu_mmio_buf_write(AMDIOMMUState *st,
+                                     size_t offset,
+                                     size_t size,
+                                     uint32_t val)
+{
+    size_t i;
+
+    for (i = 0; i < size; i++) {
+        st->mmio_buf[offset + i] = val & 0xFF;
+        val >>= 8;
+    }
+}
+
+static void amd_iommu_update_mmio(AMDIOMMUState *st,
+                                  target_phys_addr_t addr)
+{
+    size_t reg = addr & ~0x07;
+    uint64_t *base = (uint64_t *) &st->mmio_buf[reg];
+    uint64_t val = le64_to_cpu(*base);
+
+    switch (reg) {
+        case MMIO_CONTROL:
+            st->enabled              = !!(val & MMIO_CONTROL_IOMMUEN);
+            st->ats_enabled          = !!(val & MMIO_CONTROL_HTTUNEN);
+            st->evtlog_enabled       = st->enabled &&
+                                       !!(val & MMIO_CONTROL_EVENTLOGEN);
+            st->evtlog_intr          = !!(val & MMIO_CONTROL_EVENTINTEN);
+            st->completion_wait_intr = !!(val & MMIO_CONTROL_COMWAITINTEN);
+            st->cmdbuf_enabled       = st->enabled &&
+                                       !!(val & MMIO_CONTROL_CMDBUFEN);
+
+            /* Update status flags depending on the control register. */
+            if (st->cmdbuf_enabled) {
+                st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_CMDBUF_RUN;
+            } else {
+                st->mmio_buf[MMIO_STATUS] &= ~MMIO_STATUS_CMDBUF_RUN;
+            }
+            if (st->evtlog_enabled) {
+                st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_RUN;
+            } else {
+                st->mmio_buf[MMIO_STATUS] &= ~MMIO_STATUS_EVTLOG_RUN;
+            }
+
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_DEVICE_TABLE:
+            st->devtab = (target_phys_addr_t) (val & MMIO_DEVTAB_BASE_MASK);
+            st->devtab_len = ((val & MMIO_DEVTAB_SIZE_MASK) + 1) *
+                             (MMIO_DEVTAB_SIZE_UNIT / MMIO_DEVTAB_ENTRY_SIZE);
+            break;
+        case MMIO_COMMAND_BASE:
+            st->cmdbuf = (target_phys_addr_t) (val & MMIO_CMDBUF_BASE_MASK);
+            st->cmdbuf_len = 1UL << (st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] &
+                                     MMIO_CMDBUF_SIZE_MASK);
+
+            /* We must reset the head and tail pointers. */
+            st->cmdbuf_head = st->cmdbuf_tail = 0;
+            memset(st->mmio_buf + MMIO_COMMAND_HEAD, 0, 8);
+            memset(st->mmio_buf + MMIO_COMMAND_TAIL, 0, 8);
+            break;
+        case MMIO_COMMAND_HEAD:
+            st->cmdbuf_head = val & MMIO_CMDBUF_HEAD_MASK;
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_COMMAND_TAIL:
+            st->cmdbuf_tail = val & MMIO_CMDBUF_TAIL_MASK;
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_EVENT_BASE:
+            st->evtlog = (target_phys_addr_t) (val & MMIO_EVTLOG_BASE_MASK);
+            st->evtlog_len = 1UL << (st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] &
+                                     MMIO_EVTLOG_SIZE_MASK);
+            break;
+        case MMIO_EVENT_HEAD:
+            st->evtlog_head = val & MMIO_EVTLOG_HEAD_MASK;
+            break;
+        case MMIO_EVENT_TAIL:
+            st->evtlog_tail = val & MMIO_EVTLOG_TAIL_MASK;
+            break;
+        case MMIO_EXCL_BASE:
+            st->excl_base = (target_phys_addr_t) (val & MMIO_EXCL_BASE_MASK);
+            st->excl_enabled = val & MMIO_EXCL_ENABLED_MASK;
+            st->excl_allow = val & MMIO_EXCL_ALLOW_MASK;
+            break;
+        case MMIO_EXCL_LIMIT:
+            st->excl_limit = (target_phys_addr_t) ((val & MMIO_EXCL_LIMIT_MASK) |
+                                                   MMIO_EXCL_LIMIT_LOW);
+            break;
+        default:
+            break;
+    }
+}
+
+static uint32_t amd_iommu_mmio_readb(void *opaque, target_phys_addr_t addr)
+{
+    AMDIOMMUState *st = opaque;
+
+    return amd_iommu_mmio_buf_read(st, addr, 1);
+}
+
+static uint32_t amd_iommu_mmio_readw(void *opaque, target_phys_addr_t addr)
+{
+    AMDIOMMUState *st = opaque;
+
+    return amd_iommu_mmio_buf_read(st, addr, 2);
+}
+
+static uint32_t amd_iommu_mmio_readl(void *opaque, target_phys_addr_t addr)
+{
+    AMDIOMMUState *st = opaque;
+
+    return amd_iommu_mmio_buf_read(st, addr, 4);
+}
+
+static void amd_iommu_mmio_writeb(void *opaque,
+                                  target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    AMDIOMMUState *st = opaque;
+
+    amd_iommu_mmio_buf_write(st, addr, 1, val);
+    amd_iommu_update_mmio(st, addr);
+}
+
+static void amd_iommu_mmio_writew(void *opaque,
+                                  target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    AMDIOMMUState *st = opaque;
+
+    amd_iommu_mmio_buf_write(st, addr, 2, val);
+    amd_iommu_update_mmio(st, addr);
+}
+
+static void amd_iommu_mmio_writel(void *opaque,
+                                  target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    AMDIOMMUState *st = opaque;
+
+    amd_iommu_mmio_buf_write(st, addr, 4, val);
+    amd_iommu_update_mmio(st, addr);
+}
+
+static CPUReadMemoryFunc * const amd_iommu_mmio_read[] = {
+    amd_iommu_mmio_readb,
+    amd_iommu_mmio_readw,
+    amd_iommu_mmio_readl,
+};
+
+static CPUWriteMemoryFunc * const amd_iommu_mmio_write[] = {
+    amd_iommu_mmio_writeb,
+    amd_iommu_mmio_writew,
+    amd_iommu_mmio_writel,
+};
+
+static void amd_iommu_enable_mmio(AMDIOMMUState *st)
+{
+    target_phys_addr_t addr;
+    uint8_t *capab_wmask = st->dev.wmask + st->capab_offset;
+
+    st->mmio_index = cpu_register_io_memory(amd_iommu_mmio_read,
+                                            amd_iommu_mmio_write,
+                                            st, DEVICE_LITTLE_ENDIAN);
+    if (st->mmio_index < 0) {
+        return;
+    }
+
+    addr = le64_to_cpu(*(uint64_t *) &st->capab[CAPAB_BAR_LOW]) & CAPAB_BAR_MASK;
+    cpu_register_physical_memory(addr, MMIO_SIZE, st->mmio_index);
+
+    st->mmio_addr = addr;
+    st->mmio_enabled = 1;
+
+    /* Further changes to the capability are prohibited. */
+    memset(capab_wmask + CAPAB_BAR_LOW, 0x00, CAPAB_REG_SIZE);
+    memset(capab_wmask + CAPAB_BAR_HIGH, 0x00, CAPAB_REG_SIZE);
+}
+
+static void amd_iommu_write_capab(PCIDevice *dev,
+                                  uint32_t addr, uint32_t val, int len)
+{
+    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, dev);
+
+    pci_default_write_config(dev, addr, val, len);
+
+    if (!st->mmio_enabled && (st->capab[CAPAB_BAR_LOW] & 0x1)) {
+        amd_iommu_enable_mmio(st);
+    }
+}
+
+static void amd_iommu_reset(DeviceState *dev)
+{
+    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev.qdev, dev);
+    unsigned char *capab = st->capab;
+    uint8_t *capab_wmask = st->dev.wmask + st->capab_offset;
+
+    st->enabled      = 0;
+    st->ats_enabled  = 0;
+    st->mmio_enabled = 0;
+
+    capab[CAPAB_REV_TYPE]  = CAPAB_INIT_REV_TYPE;
+    capab[CAPAB_FLAGS]     = CAPAB_INIT_FLAGS;
+    capab[CAPAB_BAR_LOW]   = 0;
+    capab[CAPAB_BAR_HIGH]  = 0;
+    capab[CAPAB_RANGE]     = 0;
+    *((uint32_t *) &capab[CAPAB_MISC]) = cpu_to_le32(CAPAB_INIT_MISC);
+
+    /* Changes to the capability are allowed after system reset. */
+    memset(capab_wmask + CAPAB_BAR_LOW, 0xFF, CAPAB_REG_SIZE);
+    memset(capab_wmask + CAPAB_BAR_HIGH, 0xFF, CAPAB_REG_SIZE);
+
+    memset(st->mmio_buf, 0, MMIO_SIZE);
+    st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] = MMIO_CMDBUF_DEFAULT_SIZE;
+    st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] = MMIO_EVTLOG_DEFAULT_SIZE;
+}
+
+static void amd_iommu_log_event(AMDIOMMUState *st, AMDIOMMUEvent *evt)
+{
+    if (!st->evtlog_enabled ||
+        (st->mmio_buf[MMIO_STATUS] & MMIO_STATUS_EVTLOG_OF)) {
+        return;
+    }
+
+    if (st->evtlog_tail >= st->evtlog_len * EVENT_LEN) {
+        st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_OF;
+    }
+
+    cpu_physical_memory_write(st->evtlog + st->evtlog_tail,
+                              (uint8_t *) evt, EVENT_LEN);
+
+    st->evtlog_tail += EVENT_LEN;
+    st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_INTR;
+}
+
+static void amd_iommu_page_fault(AMDIOMMUState *st,
+                                 int devfn,
+                                 unsigned domid,
+                                 target_phys_addr_t addr,
+                                 int present,
+                                 int is_write)
+{
+    AMDIOMMUEvent evt;
+    unsigned info;
+
+    evt.devfn = cpu_to_le16(devfn);
+    evt.reserved = 0;
+    evt.domid = cpu_to_le16(domid);
+    evt.addr = cpu_to_le64(addr);
+
+    info = EVENT_IOPF;
+    if (present) {
+        info |= EVENT_IOPF_PR;
+    }
+    if (is_write) {
+        info |= EVENT_IOPF_RW;
+    }
+    evt.info = cpu_to_le16(info);
+
+    amd_iommu_log_event(st, &evt);
+}
+
+static inline uint64_t amd_iommu_get_perms(uint64_t entry)
+{
+    return (entry & (DEV_PERM_READ | DEV_PERM_WRITE)) >> DEV_PERM_SHIFT;
+}
+
+static inline AMDIOMMUState *amd_iommu_dma_to_state(DMADevice *dev)
+{
+    PCIDevice *pci_dev = DO_UPCAST(PCIDevice, qdev, dev->mmu->iommu);
+
+    return DO_UPCAST(AMDIOMMUState, dev, pci_dev);
+}
+
+static int amd_iommu_translate(DMADevice *dev,
+                               dma_addr_t addr,
+                               dma_addr_t *paddr,
+                               dma_addr_t *len,
+                               int is_write)
+{
+    PCIDevice *pci_dev = container_of(dev, PCIDevice, dma);
+    PCIDevice *iommu_dev = DO_UPCAST(PCIDevice, qdev, dev->mmu->iommu);
+    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, iommu_dev);
+    int devfn, present;
+    target_phys_addr_t entry_addr, pte_addr;
+    uint64_t entry[4], pte, page_offset, pte_perms;
+    unsigned level, domid;
+    unsigned perms;
+
+    if (!st->enabled) {
+        goto no_translation;
+    }
+
+    /*
+     * It's okay to check for either read or write permissions
+     * even for memory maps, since we don't support R/W maps.
+     */
+    perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
+
+    /* Get device table entry. */
+    devfn = pci_dev->devfn;
+    entry_addr = st->devtab + devfn * DEVTAB_ENTRY_SIZE;
+    cpu_physical_memory_read(entry_addr, (uint8_t *) entry, 32);
+
+    pte = entry[0];
+    if (!(pte & DEV_VALID) || !(pte & DEV_TRANSLATION_VALID)) {
+        goto no_translation;
+    }
+    domid = entry[1] & DEV_DOMAIN_ID_MASK;
+    level = (pte >> DEV_MODE_RSHIFT) & DEV_MODE_MASK;
+    while (level > 0) {
+        /*
+         * Check permissions: the bitwise
+         * implication perms -> entry_perms must be true.
+         */
+        pte_perms = amd_iommu_get_perms(pte);
+        present = pte & 1;
+        if (!present || perms != (perms & pte_perms)) {
+            amd_iommu_page_fault(st, devfn, domid, addr,
+                                 present, !!(perms & IOMMU_PERM_WRITE));
+            return -EPERM;
+        }
+
+        /* Go to the next lower level. */
+        pte_addr = pte & DEV_PT_ROOT_MASK;
+        pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;
+        pte = ldq_phys(pte_addr);
+        level = (pte >> DEV_MODE_RSHIFT) & DEV_MODE_MASK;
+    }
+    page_offset = addr & 4095;
+    *paddr = (pte & DEV_PT_ROOT_MASK) + page_offset;
+    *len = 4096 - page_offset;
+
+    return 0;
+
+no_translation:
+    *paddr = addr;
+    *len = -1;
+    return 0;
+}
+
+static int amd_iommu_pci_initfn(PCIDevice *dev)
+{
+    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, dev);
+
+    pci_config_set_vendor_id(st->dev.config, PCI_VENDOR_ID_AMD);
+    pci_config_set_device_id(st->dev.config, PCI_DEVICE_ID_AMD_IOMMU);
+    pci_config_set_class(st->dev.config, PCI_CLASS_SYSTEM_IOMMU);
+
+    /* Secure Device capability */
+    st->capab_offset = pci_add_capability(&st->dev,
+                                          PCI_CAP_ID_SEC, 0, CAPAB_SIZE);
+    st->capab = st->dev.config + st->capab_offset;
+    dev->config_write = amd_iommu_write_capab;
+
+    /* Allocate backing space for the MMIO registers. */
+    st->mmio_buf = qemu_malloc(MMIO_SIZE);
+
+    pci_register_iommu(dev, amd_iommu_translate);
+
+    return 0;
+}
+
+static const VMStateDescription vmstate_amd_iommu = {
+    .name                       = "amd-iommu",
+    .version_id                 = 1,
+    .minimum_version_id         = 1,
+    .minimum_version_id_old     = 1,
+    .fields                     = (VMStateField []) {
+        VMSTATE_PCI_DEVICE(dev, AMDIOMMUState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static PCIDeviceInfo amd_iommu_pci_info = {
+    .qdev.name    = "amd-iommu",
+    .qdev.desc    = "AMD IOMMU",
+    .qdev.size    = sizeof(AMDIOMMUState),
+    .qdev.reset   = amd_iommu_reset,
+    .qdev.vmsd    = &vmstate_amd_iommu,
+    .init         = amd_iommu_pci_initfn,
+};
+
+static void amd_iommu_register(void)
+{
+    pci_qdev_register(&amd_iommu_pci_info);
+}
+
+device_init(amd_iommu_register);
diff --git a/hw/pc.c b/hw/pc.c
index fface7d..9f51e95 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1163,6 +1163,8 @@ void pc_pci_device_init(PCIBus *pci_bus)
     int max_bus;
     int bus;
 
+    pci_create_simple(pci_bus, -1, "amd-iommu");
+
     max_bus = drive_get_max_bus(IF_SCSI);
     for (bus = 0; bus <= max_bus; bus++) {
         pci_create_simple(pci_bus, -1, "lsi53c895a");
diff --git a/hw/pci_ids.h b/hw/pci_ids.h
index ea3418c..5dbe281 100644
--- a/hw/pci_ids.h
+++ b/hw/pci_ids.h
@@ -27,6 +27,7 @@
 
 #define PCI_CLASS_MEMORY_RAM             0x0500
 
+#define PCI_CLASS_SYSTEM_IOMMU           0x0806
 #define PCI_CLASS_SYSTEM_OTHER           0x0880
 
 #define PCI_CLASS_SERIAL_USB             0x0c03
@@ -57,6 +58,7 @@
 
 #define PCI_VENDOR_ID_AMD                0x1022
 #define PCI_DEVICE_ID_AMD_LANCE          0x2000
+#define PCI_DEVICE_ID_AMD_IOMMU          0x0000     /* FIXME */
 
 #define PCI_VENDOR_ID_TI                 0x104c
 
diff --git a/hw/pci_regs.h b/hw/pci_regs.h
index dd0bed4..3d098aa 100644
--- a/hw/pci_regs.h
+++ b/hw/pci_regs.h
@@ -209,6 +209,7 @@
 #define  PCI_CAP_ID_SHPC 	0x0C	/* PCI Standard Hot-Plug Controller */
 #define  PCI_CAP_ID_SSVID	0x0D	/* Bridge subsystem vendor/device ID */
 #define  PCI_CAP_ID_AGP3	0x0E	/* AGP Target PCI-PCI bridge */
+#define  PCI_CAP_ID_SEC		0x0F	/* Secure Device (AMD IOMMU) */
 #define  PCI_CAP_ID_EXP 	0x10	/* PCI Express */
 #define  PCI_CAP_ID_MSIX	0x11	/* MSI-X */
 #define  PCI_CAP_ID_AF		0x13	/* PCI Advanced Features */
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 58+ messages in thread
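
For reference, a minimal sketch of a device table entry that the walk in
amd_iommu_translate() above accepts, built from the DEV_* layout defined in
hw/amd_iommu.c; the page-table root, mode and domain id values below are
made up for illustration and are not taken from the patch:

    uint64_t dte[4] = { 0 };
    uint64_t pt_root = 0x12345000ULL;   /* 4 KiB-aligned I/O page table root */
    unsigned mode    = 3;               /* 3-level table; mode field is bits 11:9 */
    unsigned domid   = 1;

    dte[0] = DEV_VALID | DEV_TRANSLATION_VALID
           | ((uint64_t)mode << DEV_MODE_RSHIFT)
           | (pt_root & DEV_PT_ROOT_MASK)       /* bits 51:12: table root */
           | DEV_PERM_READ | DEV_PERM_WRITE;    /* allow both directions */
    dte[1] = domid & DEV_DOMAIN_ID_MASK;        /* reported in IO_PAGE_FAULT events */

The walker reads 32 bytes at st->devtab + devfn * DEVTAB_ENTRY_SIZE, so dte[]
here corresponds to entry[] in amd_iommu_translate().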

* [PATCH 04/13] ide: use the DMA memory access interface for PCI IDE controllers
  2011-01-29 17:40 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2011-01-29 17:40   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 58+ messages in thread
From: Eduard - Gabriel Munteanu @ 2011-01-29 17:40 UTC (permalink / raw)
  To: joro
  Cc: blauwirbel, paul, avi, anthony, av1474, yamahata, kvm,
	qemu-devel, Eduard - Gabriel Munteanu

Emulated PCI IDE controllers now use the memory access interface. This
also allows an emulated IOMMU to translate and check accesses.

Map invalidation results in cancelling DMA transfers. Since the guest OS
can't reliably recover the DMA results once a mapping changes under an
in-flight transfer, cancelling is a fair approximation.

Note this doesn't handle AHCI emulation yet!
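
For reference, a minimal sketch of how a device is expected to use the new
interface (this assumes the dma_rw.h API introduced earlier in the series;
MyDevState, mydev_dma_cancel and the addresses are made-up names, not code
from this patch):

    /* Hypothetical device state; only the embedded PCIDevice matters here. */
    typedef struct MyDevState {
        PCIDevice dev;
        /* ... device-specific fields ... */
    } MyDevState;

    /* Invoked by the DMA layer when a mapping held by this device is
     * invalidated; the device should cancel the transfer using it. */
    static void mydev_dma_cancel(void *opaque)
    {
        MyDevState *s = opaque;

        (void)s;
        /* cancel the in-flight transfer here */
    }

    static void mydev_do_dma(MyDevState *s, dma_addr_t guest_addr, size_t size)
    {
        dma_addr_t len = size;
        uint8_t hdr[8];
        void *buf;

        /* Small fixed-size accesses are translated (and checked) by the
         * IOMMU, if one is present. */
        dma_memory_read(&s->dev.dma, guest_addr, hdr, sizeof(hdr));

        /* Larger accesses are mapped, registering a cancellation callback. */
        buf = dma_memory_map(&s->dev.dma, mydev_dma_cancel, s,
                             guest_addr, &len, 1 /* is_write */);
        if (buf) {
            /* ... fill buf ... */
            dma_memory_unmap(&s->dev.dma, buf, len, 1, len);
        }
    }

This is what dma-helpers.c does below on behalf of the IDE code: the QEMUSGList
now carries the DMADevice, and dma_bdrv_cb() maps through it with
dma_bdrv_cancel() as the invalidation callback.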

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 dma-helpers.c     |   23 ++++++++++++++++++-----
 dma.h             |    4 +++-
 hw/ide/ahci.c     |    3 ++-
 hw/ide/internal.h |    1 +
 hw/ide/macio.c    |    4 ++--
 hw/ide/pci.c      |   18 +++++++++++-------
 6 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/dma-helpers.c b/dma-helpers.c
index 712ed89..29a74a4 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -10,12 +10,13 @@
 #include "dma.h"
 #include "block_int.h"
 
-void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint)
+void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint, DMADevice *dma)
 {
     qsg->sg = qemu_malloc(alloc_hint * sizeof(ScatterGatherEntry));
     qsg->nsg = 0;
     qsg->nalloc = alloc_hint;
     qsg->size = 0;
+    qsg->dma = dma;
 }
 
 void qemu_sglist_add(QEMUSGList *qsg, target_phys_addr_t base,
@@ -73,12 +74,23 @@ static void dma_bdrv_unmap(DMAAIOCB *dbs)
     int i;
 
     for (i = 0; i < dbs->iov.niov; ++i) {
-        cpu_physical_memory_unmap(dbs->iov.iov[i].iov_base,
-                                  dbs->iov.iov[i].iov_len, !dbs->is_write,
-                                  dbs->iov.iov[i].iov_len);
+        dma_memory_unmap(dbs->sg->dma,
+                         dbs->iov.iov[i].iov_base,
+                         dbs->iov.iov[i].iov_len, !dbs->is_write,
+                         dbs->iov.iov[i].iov_len);
     }
 }
 
+static void dma_bdrv_cancel(void *opaque)
+{
+    DMAAIOCB *dbs = opaque;
+
+    bdrv_aio_cancel(dbs->acb);
+    dma_bdrv_unmap(dbs);
+    qemu_iovec_destroy(&dbs->iov);
+    qemu_aio_release(dbs);
+}
+
 static void dma_bdrv_cb(void *opaque, int ret)
 {
     DMAAIOCB *dbs = (DMAAIOCB *)opaque;
@@ -100,7 +112,8 @@ static void dma_bdrv_cb(void *opaque, int ret)
     while (dbs->sg_cur_index < dbs->sg->nsg) {
         cur_addr = dbs->sg->sg[dbs->sg_cur_index].base + dbs->sg_cur_byte;
         cur_len = dbs->sg->sg[dbs->sg_cur_index].len - dbs->sg_cur_byte;
-        mem = cpu_physical_memory_map(cur_addr, &cur_len, !dbs->is_write);
+        mem = dma_memory_map(dbs->sg->dma, dma_bdrv_cancel, dbs,
+                             cur_addr, &cur_len, !dbs->is_write);
         if (!mem)
             break;
         qemu_iovec_add(&dbs->iov, mem, cur_len);
diff --git a/dma.h b/dma.h
index f3bb275..2417b32 100644
--- a/dma.h
+++ b/dma.h
@@ -14,6 +14,7 @@
 //#include "cpu.h"
 #include "hw/hw.h"
 #include "block.h"
+#include "hw/dma_rw.h"
 
 typedef struct {
     target_phys_addr_t base;
@@ -25,9 +26,10 @@ typedef struct {
     int nsg;
     int nalloc;
     target_phys_addr_t size;
+    DMADevice *dma;
 } QEMUSGList;
 
-void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint);
+void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint, DMADevice *dma);
 void qemu_sglist_add(QEMUSGList *qsg, target_phys_addr_t base,
                      target_phys_addr_t len);
 void qemu_sglist_destroy(QEMUSGList *qsg);
diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 968fdce..aea06a9 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -993,7 +993,8 @@ static int ahci_populate_sglist(AHCIDevice *ad, QEMUSGList *sglist)
     if (sglist_alloc_hint > 0) {
         AHCI_SG *tbl = (AHCI_SG *)prdt;
 
-        qemu_sglist_init(sglist, sglist_alloc_hint);
+        /* FIXME: pass a proper DMADevice. */
+        qemu_sglist_init(sglist, sglist_alloc_hint, NULL);
         for (i = 0; i < sglist_alloc_hint; i++) {
             /* flags_size is zero-based */
             qemu_sglist_add(sglist, le64_to_cpu(tbl[i].addr),
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index 697c3b4..3d3d5db 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -468,6 +468,7 @@ struct IDEDMA {
     struct iovec iov;
     QEMUIOVector qiov;
     BlockDriverAIOCB *aiocb;
+    DMADevice *dev;
 };
 
 struct IDEBus {
diff --git a/hw/ide/macio.c b/hw/ide/macio.c
index c1b4caa..654ae7c 100644
--- a/hw/ide/macio.c
+++ b/hw/ide/macio.c
@@ -79,7 +79,7 @@ static void pmac_ide_atapi_transfer_cb(void *opaque, int ret)
 
     s->io_buffer_size = io->len;
 
-    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1);
+    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1, NULL);
     qemu_sglist_add(&s->sg, io->addr, io->len);
     io->addr += io->len;
     io->len = 0;
@@ -141,7 +141,7 @@ static void pmac_ide_transfer_cb(void *opaque, int ret)
     s->io_buffer_index = 0;
     s->io_buffer_size = io->len;
 
-    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1);
+    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1, NULL);
     qemu_sglist_add(&s->sg, io->addr, io->len);
     io->addr += io->len;
     io->len = 0;
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 510b2de..e3432c4 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -64,7 +64,8 @@ static int bmdma_prepare_buf(IDEDMA *dma, int is_write)
     } prd;
     int l, len;
 
-    qemu_sglist_init(&s->sg, s->nsector / (BMDMA_PAGE_SIZE / 512) + 1);
+    qemu_sglist_init(&s->sg,
+                     s->nsector / (BMDMA_PAGE_SIZE / 512) + 1, dma->dev);
     s->io_buffer_size = 0;
     for(;;) {
         if (bm->cur_prd_len == 0) {
@@ -72,7 +73,7 @@ static int bmdma_prepare_buf(IDEDMA *dma, int is_write)
             if (bm->cur_prd_last ||
                 (bm->cur_addr - bm->addr) >= BMDMA_PAGE_SIZE)
                 return s->io_buffer_size != 0;
-            cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
+            dma_memory_read(dma->dev, bm->cur_addr, (uint8_t *)&prd, 8);
             bm->cur_addr += 8;
             prd.addr = le32_to_cpu(prd.addr);
             prd.size = le32_to_cpu(prd.size);
@@ -114,7 +115,7 @@ static int bmdma_rw_buf(IDEDMA *dma, int is_write)
             if (bm->cur_prd_last ||
                 (bm->cur_addr - bm->addr) >= BMDMA_PAGE_SIZE)
                 return 0;
-            cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
+            dma_memory_read(dma->dev, bm->cur_addr, (uint8_t *)&prd, 8);
             bm->cur_addr += 8;
             prd.addr = le32_to_cpu(prd.addr);
             prd.size = le32_to_cpu(prd.size);
@@ -129,11 +130,11 @@ static int bmdma_rw_buf(IDEDMA *dma, int is_write)
             l = bm->cur_prd_len;
         if (l > 0) {
             if (is_write) {
-                cpu_physical_memory_write(bm->cur_prd_addr,
-                                          s->io_buffer + s->io_buffer_index, l);
+                dma_memory_write(dma->dev, bm->cur_prd_addr,
+                                 s->io_buffer + s->io_buffer_index, l);
             } else {
-                cpu_physical_memory_read(bm->cur_prd_addr,
-                                          s->io_buffer + s->io_buffer_index, l);
+                dma_memory_read(dma->dev, bm->cur_prd_addr,
+                                s->io_buffer + s->io_buffer_index, l);
             }
             bm->cur_prd_addr += l;
             bm->cur_prd_len -= l;
@@ -444,6 +445,9 @@ void pci_ide_create_devs(PCIDevice *dev, DriveInfo **hd_table)
             continue;
         ide_create_drive(d->bus+bus[i], unit[i], hd_table[i]);
     }
+
+    d->bmdma[0].dma.dev = &dev->dma;
+    d->bmdma[1].dma.dev = &dev->dma;
 }
 
 static const struct IDEDMAOps bmdma_ops = {
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 05/13] rtl8139: use the DMA memory access interface
  2011-01-29 17:40 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2011-01-29 17:40   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 58+ messages in thread
From: Eduard - Gabriel Munteanu @ 2011-01-29 17:40 UTC (permalink / raw)
  To: joro
  Cc: blauwirbel, paul, avi, anthony, av1474, yamahata, kvm,
	qemu-devel, Eduard - Gabriel Munteanu

This allows the device to work properly with an emulated IOMMU.
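
The conversion is mechanical: each direct physical-memory access becomes a
DMA-layer access through the device's DMADevice (s->dev.dma, where s->dev is
the embedded PCIDevice), roughly:

    cpu_physical_memory_read(addr, buf, len);      /* before */
    dma_memory_read(&s->dev.dma, addr, buf, len);  /* after  */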

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/rtl8139.c |  100 +++++++++++++++++++++++++++++++++-------------------------
 1 files changed, 57 insertions(+), 43 deletions(-)

diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index a22530c..75f4e64 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -49,6 +49,7 @@
 
 #include "hw.h"
 #include "pci.h"
+#include "dma_rw.h"
 #include "qemu-timer.h"
 #include "net.h"
 #include "loader.h"
@@ -413,12 +414,6 @@ typedef struct RTL8139TallyCounters
     uint16_t   TxUndrn;
 } RTL8139TallyCounters;
 
-/* Clears all tally counters */
-static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters);
-
-/* Writes tally counters to specified physical memory address */
-static void RTL8139TallyCounters_physical_memory_write(target_phys_addr_t tc_addr, RTL8139TallyCounters* counters);
-
 typedef struct RTL8139State {
     PCIDevice dev;
     uint8_t phys[8]; /* mac address */
@@ -499,6 +494,14 @@ typedef struct RTL8139State {
     int rtl8139_mmio_io_addr_dummy;
 } RTL8139State;
 
+/* Clears all tally counters */
+static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters);
+
+/* Writes tally counters to specified physical memory address */
+static void
+RTL8139TallyCounters_physical_memory_write(RTL8139State *s,
+                                           target_phys_addr_t tc_addr);
+
 static void rtl8139_set_next_tctr_time(RTL8139State *s, int64_t current_time);
 
 static void prom9346_decode_command(EEprom9346 *eeprom, uint8_t command)
@@ -749,6 +752,8 @@ static int rtl8139_cp_transmitter_enabled(RTL8139State *s)
 
 static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size)
 {
+    DMADevice *dma = &s->dev.dma;
+
     if (s->RxBufAddr + size > s->RxBufferSize)
     {
         int wrapped = MOD2(s->RxBufAddr + size, s->RxBufferSize);
@@ -760,15 +765,15 @@ static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size)
 
             if (size > wrapped)
             {
-                cpu_physical_memory_write( s->RxBuf + s->RxBufAddr,
-                                           buf, size-wrapped );
+                dma_memory_write(dma, s->RxBuf + s->RxBufAddr,
+                                 buf, size-wrapped);
             }
 
             /* reset buffer pointer */
             s->RxBufAddr = 0;
 
-            cpu_physical_memory_write( s->RxBuf + s->RxBufAddr,
-                                       buf + (size-wrapped), wrapped );
+            dma_memory_write(dma, s->RxBuf + s->RxBufAddr,
+                             buf + (size-wrapped), wrapped);
 
             s->RxBufAddr = wrapped;
 
@@ -777,7 +782,7 @@ static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size)
     }
 
     /* non-wrapping path or overwrapping enabled */
-    cpu_physical_memory_write( s->RxBuf + s->RxBufAddr, buf, size );
+    dma_memory_write(dma, s->RxBuf + s->RxBufAddr, buf, size);
 
     s->RxBufAddr += size;
 }
@@ -817,6 +822,7 @@ static int rtl8139_can_receive(VLANClientState *nc)
 static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_t size_, int do_interrupt)
 {
     RTL8139State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    DMADevice *dma = &s->dev.dma;
     int size = size_;
 
     uint32_t packet_header = 0;
@@ -971,13 +977,13 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 
         uint32_t val, rxdw0,rxdw1,rxbufLO,rxbufHI;
 
-        cpu_physical_memory_read(cplus_rx_ring_desc,    (uint8_t *)&val, 4);
+        dma_memory_read(dma, cplus_rx_ring_desc,    (uint8_t *)&val, 4);
         rxdw0 = le32_to_cpu(val);
-        cpu_physical_memory_read(cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
+        dma_memory_read(dma, cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
         rxdw1 = le32_to_cpu(val);
-        cpu_physical_memory_read(cplus_rx_ring_desc+8,  (uint8_t *)&val, 4);
+        dma_memory_read(dma, cplus_rx_ring_desc+8,  (uint8_t *)&val, 4);
         rxbufLO = le32_to_cpu(val);
-        cpu_physical_memory_read(cplus_rx_ring_desc+12, (uint8_t *)&val, 4);
+        dma_memory_read(dma, cplus_rx_ring_desc+12, (uint8_t *)&val, 4);
         rxbufHI = le32_to_cpu(val);
 
         DEBUG_PRINT(("RTL8139: +++ C+ mode RX descriptor %d %08x %08x %08x %08x\n",
@@ -1022,7 +1028,7 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
         target_phys_addr_t rx_addr = rtl8139_addr64(rxbufLO, rxbufHI);
 
         /* receive/copy to target memory */
-        cpu_physical_memory_write( rx_addr, buf, size );
+        dma_memory_write(dma, rx_addr, buf, size);
 
         if (s->CpCmd & CPlusRxChkSum)
         {
@@ -1035,7 +1041,7 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 #else
         val = 0;
 #endif
-        cpu_physical_memory_write( rx_addr+size, (uint8_t *)&val, 4);
+        dma_memory_write(dma, rx_addr + size, (uint8_t *)&val, 4);
 
 /* first segment of received packet flag */
 #define CP_RX_STATUS_FS (1<<29)
@@ -1084,9 +1090,9 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 
         /* update ring data */
         val = cpu_to_le32(rxdw0);
-        cpu_physical_memory_write(cplus_rx_ring_desc,    (uint8_t *)&val, 4);
+        dma_memory_write(dma, cplus_rx_ring_desc,    (uint8_t *)&val, 4);
         val = cpu_to_le32(rxdw1);
-        cpu_physical_memory_write(cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
+        dma_memory_write(dma, cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
 
         /* update tally counter */
         ++s->tally_counters.RxOk;
@@ -1282,50 +1288,54 @@ static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters)
     counters->TxUndrn = 0;
 }
 
-static void RTL8139TallyCounters_physical_memory_write(target_phys_addr_t tc_addr, RTL8139TallyCounters* tally_counters)
+static void
+RTL8139TallyCounters_physical_memory_write(RTL8139State *s,
+                                           target_phys_addr_t tc_addr)
 {
+    DMADevice *dma = &s->dev.dma;
+    RTL8139TallyCounters *tally_counters = &s->tally_counters;
     uint16_t val16;
     uint32_t val32;
     uint64_t val64;
 
     val64 = cpu_to_le64(tally_counters->TxOk);
-    cpu_physical_memory_write(tc_addr + 0,    (uint8_t *)&val64, 8);
+    dma_memory_write(dma, tc_addr + 0,    (uint8_t *)&val64, 8);
 
     val64 = cpu_to_le64(tally_counters->RxOk);
-    cpu_physical_memory_write(tc_addr + 8,    (uint8_t *)&val64, 8);
+    dma_memory_write(dma, tc_addr + 8,    (uint8_t *)&val64, 8);
 
     val64 = cpu_to_le64(tally_counters->TxERR);
-    cpu_physical_memory_write(tc_addr + 16,    (uint8_t *)&val64, 8);
+    dma_memory_write(dma, tc_addr + 16,    (uint8_t *)&val64, 8);
 
     val32 = cpu_to_le32(tally_counters->RxERR);
-    cpu_physical_memory_write(tc_addr + 24,    (uint8_t *)&val32, 4);
+    dma_memory_write(dma, tc_addr + 24,    (uint8_t *)&val32, 4);
 
     val16 = cpu_to_le16(tally_counters->MissPkt);
-    cpu_physical_memory_write(tc_addr + 28,    (uint8_t *)&val16, 2);
+    dma_memory_write(dma, tc_addr + 28,    (uint8_t *)&val16, 2);
 
     val16 = cpu_to_le16(tally_counters->FAE);
-    cpu_physical_memory_write(tc_addr + 30,    (uint8_t *)&val16, 2);
+    dma_memory_write(dma, tc_addr + 30,    (uint8_t *)&val16, 2);
 
     val32 = cpu_to_le32(tally_counters->Tx1Col);
-    cpu_physical_memory_write(tc_addr + 32,    (uint8_t *)&val32, 4);
+    dma_memory_write(dma, tc_addr + 32,    (uint8_t *)&val32, 4);
 
     val32 = cpu_to_le32(tally_counters->TxMCol);
-    cpu_physical_memory_write(tc_addr + 36,    (uint8_t *)&val32, 4);
+    dma_memory_write(dma, tc_addr + 36,    (uint8_t *)&val32, 4);
 
     val64 = cpu_to_le64(tally_counters->RxOkPhy);
-    cpu_physical_memory_write(tc_addr + 40,    (uint8_t *)&val64, 8);
+    dma_memory_write(dma, tc_addr + 40,    (uint8_t *)&val64, 8);
 
     val64 = cpu_to_le64(tally_counters->RxOkBrd);
-    cpu_physical_memory_write(tc_addr + 48,    (uint8_t *)&val64, 8);
+    dma_memory_write(dma, tc_addr + 48,    (uint8_t *)&val64, 8);
 
     val32 = cpu_to_le32(tally_counters->RxOkMul);
-    cpu_physical_memory_write(tc_addr + 56,    (uint8_t *)&val32, 4);
+    dma_memory_write(dma, tc_addr + 56,    (uint8_t *)&val32, 4);
 
     val16 = cpu_to_le16(tally_counters->TxAbt);
-    cpu_physical_memory_write(tc_addr + 60,    (uint8_t *)&val16, 2);
+    dma_memory_write(dma, tc_addr + 60,    (uint8_t *)&val16, 2);
 
     val16 = cpu_to_le16(tally_counters->TxUndrn);
-    cpu_physical_memory_write(tc_addr + 62,    (uint8_t *)&val16, 2);
+    dma_memory_write(dma, tc_addr + 62,    (uint8_t *)&val16, 2);
 }
 
 /* Loads values of tally counters from VM state file */
@@ -1761,6 +1771,8 @@ static void rtl8139_transfer_frame(RTL8139State *s, const uint8_t *buf, int size
 
 static int rtl8139_transmit_one(RTL8139State *s, int descriptor)
 {
+    DMADevice *dma = &s->dev.dma;
+
     if (!rtl8139_transmitter_enabled(s))
     {
         DEBUG_PRINT(("RTL8139: +++ cannot transmit from descriptor %d: transmitter disabled\n",
@@ -1783,7 +1795,7 @@ static int rtl8139_transmit_one(RTL8139State *s, int descriptor)
     DEBUG_PRINT(("RTL8139: +++ transmit reading %d bytes from host memory at 0x%08x\n",
                  txsize, s->TxAddr[descriptor]));
 
-    cpu_physical_memory_read(s->TxAddr[descriptor], txbuffer, txsize);
+    dma_memory_read(dma, s->TxAddr[descriptor], txbuffer, txsize);
 
     /* Mark descriptor as transferred */
     s->TxStatus[descriptor] |= TxHostOwns;
@@ -1889,6 +1901,8 @@ static uint16_t ip_checksum(void *data, size_t len)
 
 static int rtl8139_cplus_transmit_one(RTL8139State *s)
 {
+    DMADevice *dma = &s->dev.dma;
+
     if (!rtl8139_transmitter_enabled(s))
     {
         DEBUG_PRINT(("RTL8139: +++ C+ mode: transmitter disabled\n"));
@@ -1914,14 +1928,14 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
 
     uint32_t val, txdw0,txdw1,txbufLO,txbufHI;
 
-    cpu_physical_memory_read(cplus_tx_ring_desc,    (uint8_t *)&val, 4);
+    dma_memory_read(dma, cplus_tx_ring_desc,    (uint8_t *)&val, 4);
     txdw0 = le32_to_cpu(val);
     /* TODO: implement VLAN tagging support, VLAN tag data is read to txdw1 */
-    cpu_physical_memory_read(cplus_tx_ring_desc+4,  (uint8_t *)&val, 4);
+    dma_memory_read(dma, cplus_tx_ring_desc+4,  (uint8_t *)&val, 4);
     txdw1 = le32_to_cpu(val);
-    cpu_physical_memory_read(cplus_tx_ring_desc+8,  (uint8_t *)&val, 4);
+    dma_memory_read(dma, cplus_tx_ring_desc+8,  (uint8_t *)&val, 4);
     txbufLO = le32_to_cpu(val);
-    cpu_physical_memory_read(cplus_tx_ring_desc+12, (uint8_t *)&val, 4);
+    dma_memory_read(dma, cplus_tx_ring_desc+12, (uint8_t *)&val, 4);
     txbufHI = le32_to_cpu(val);
 
     DEBUG_PRINT(("RTL8139: +++ C+ mode TX descriptor %d %08x %08x %08x %08x\n",
@@ -2028,7 +2042,8 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
     DEBUG_PRINT(("RTL8139: +++ C+ mode transmit reading %d bytes from host memory at %016" PRIx64 " to offset %d\n",
                  txsize, (uint64_t)tx_addr, s->cplus_txbuffer_offset));
 
-    cpu_physical_memory_read(tx_addr, s->cplus_txbuffer + s->cplus_txbuffer_offset, txsize);
+    dma_memory_read(dma, tx_addr,
+                    s->cplus_txbuffer + s->cplus_txbuffer_offset, txsize);
     s->cplus_txbuffer_offset += txsize;
 
     /* seek to next Rx descriptor */
@@ -2055,10 +2070,10 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
 
     /* update ring data */
     val = cpu_to_le32(txdw0);
-    cpu_physical_memory_write(cplus_tx_ring_desc,    (uint8_t *)&val, 4);
+    dma_memory_write(dma, cplus_tx_ring_desc,    (uint8_t *)&val, 4);
     /* TODO: implement VLAN tagging support, VLAN tag data is read to txdw1 */
 //    val = cpu_to_le32(txdw1);
-//    cpu_physical_memory_write(cplus_tx_ring_desc+4,  &val, 4);
+//    dma_memory_write(dma, cplus_tx_ring_desc+4,  &val, 4);
 
     /* Now decide if descriptor being processed is holding the last segment of packet */
     if (txdw0 & CP_TX_LS)
@@ -2367,7 +2382,6 @@ static void rtl8139_transmit(RTL8139State *s)
 
 static void rtl8139_TxStatus_write(RTL8139State *s, uint32_t txRegOffset, uint32_t val)
 {
-
     int descriptor = txRegOffset/4;
 
     /* handle C+ transmit mode register configuration */
@@ -2384,7 +2398,7 @@ static void rtl8139_TxStatus_write(RTL8139State *s, uint32_t txRegOffset, uint32
             target_phys_addr_t tc_addr = rtl8139_addr64(s->TxStatus[0] & ~0x3f, s->TxStatus[1]);
 
             /* dump tally counters to specified memory location */
-            RTL8139TallyCounters_physical_memory_write( tc_addr, &s->tally_counters);
+            RTL8139TallyCounters_physical_memory_write(s, tc_addr);
 
             /* mark dump completed */
             s->TxStatus[0] &= ~0x8;
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH 05/13] rtl8139: use the DMA memory access interface
@ 2011-01-29 17:40   ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 58+ messages in thread
From: Eduard - Gabriel Munteanu @ 2011-01-29 17:40 UTC (permalink / raw)
  To: joro
  Cc: kvm, qemu-devel, blauwirbel, yamahata, paul,
	Eduard - Gabriel Munteanu, avi

This allows the device to work properly with an emulated IOMMU.
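
For reference, the conversion is mechanical: each cpu_physical_memory_read()/
cpu_physical_memory_write() call gains the device's DMA context as a first
argument, so an emulated IOMMU can translate and restrict the access per
device. A minimal sketch of the pattern, assuming the hw/dma_rw.h prototypes
from patch 01/13 look roughly like this (the example function is hypothetical,
for illustration only):

    void dma_memory_read(DMADevice *dma, target_phys_addr_t addr,
                         void *buf, size_t len);
    void dma_memory_write(DMADevice *dma, target_phys_addr_t addr,
                          const void *buf, size_t len);

    static void rtl8139_dma_write_example(RTL8139State *s,
                                          target_phys_addr_t addr,
                                          const void *buf, int len)
    {
        DMADevice *dma = &s->dev.dma;   /* DMA context embedded in PCIDevice */

        /* was: cpu_physical_memory_write(addr, buf, len); */
        dma_memory_write(dma, addr, buf, len);
    }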

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/rtl8139.c |  100 +++++++++++++++++++++++++++++++++-------------------------
 1 files changed, 57 insertions(+), 43 deletions(-)

diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index a22530c..75f4e64 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -49,6 +49,7 @@
 
 #include "hw.h"
 #include "pci.h"
+#include "dma_rw.h"
 #include "qemu-timer.h"
 #include "net.h"
 #include "loader.h"
@@ -413,12 +414,6 @@ typedef struct RTL8139TallyCounters
     uint16_t   TxUndrn;
 } RTL8139TallyCounters;
 
-/* Clears all tally counters */
-static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters);
-
-/* Writes tally counters to specified physical memory address */
-static void RTL8139TallyCounters_physical_memory_write(target_phys_addr_t tc_addr, RTL8139TallyCounters* counters);
-
 typedef struct RTL8139State {
     PCIDevice dev;
     uint8_t phys[8]; /* mac address */
@@ -499,6 +494,14 @@ typedef struct RTL8139State {
     int rtl8139_mmio_io_addr_dummy;
 } RTL8139State;
 
+/* Clears all tally counters */
+static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters);
+
+/* Writes tally counters to specified physical memory address */
+static void
+RTL8139TallyCounters_physical_memory_write(RTL8139State *s,
+                                           target_phys_addr_t tc_addr);
+
 static void rtl8139_set_next_tctr_time(RTL8139State *s, int64_t current_time);
 
 static void prom9346_decode_command(EEprom9346 *eeprom, uint8_t command)
@@ -749,6 +752,8 @@ static int rtl8139_cp_transmitter_enabled(RTL8139State *s)
 
 static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size)
 {
+    DMADevice *dma = &s->dev.dma;
+
     if (s->RxBufAddr + size > s->RxBufferSize)
     {
         int wrapped = MOD2(s->RxBufAddr + size, s->RxBufferSize);
@@ -760,15 +765,15 @@ static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size)
 
             if (size > wrapped)
             {
-                cpu_physical_memory_write( s->RxBuf + s->RxBufAddr,
-                                           buf, size-wrapped );
+                dma_memory_write(dma, s->RxBuf + s->RxBufAddr,
+                                 buf, size-wrapped);
             }
 
             /* reset buffer pointer */
             s->RxBufAddr = 0;
 
-            cpu_physical_memory_write( s->RxBuf + s->RxBufAddr,
-                                       buf + (size-wrapped), wrapped );
+            dma_memory_write(dma, s->RxBuf + s->RxBufAddr,
+                             buf + (size-wrapped), wrapped);
 
             s->RxBufAddr = wrapped;
 
@@ -777,7 +782,7 @@ static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size)
     }
 
     /* non-wrapping path or overwrapping enabled */
-    cpu_physical_memory_write( s->RxBuf + s->RxBufAddr, buf, size );
+    dma_memory_write(dma, s->RxBuf + s->RxBufAddr, buf, size);
 
     s->RxBufAddr += size;
 }
@@ -817,6 +822,7 @@ static int rtl8139_can_receive(VLANClientState *nc)
 static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_t size_, int do_interrupt)
 {
     RTL8139State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    DMADevice *dma = &s->dev.dma;
     int size = size_;
 
     uint32_t packet_header = 0;
@@ -971,13 +977,13 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 
         uint32_t val, rxdw0,rxdw1,rxbufLO,rxbufHI;
 
-        cpu_physical_memory_read(cplus_rx_ring_desc,    (uint8_t *)&val, 4);
+        dma_memory_read(dma, cplus_rx_ring_desc,    (uint8_t *)&val, 4);
         rxdw0 = le32_to_cpu(val);
-        cpu_physical_memory_read(cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
+        dma_memory_read(dma, cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
         rxdw1 = le32_to_cpu(val);
-        cpu_physical_memory_read(cplus_rx_ring_desc+8,  (uint8_t *)&val, 4);
+        dma_memory_read(dma, cplus_rx_ring_desc+8,  (uint8_t *)&val, 4);
         rxbufLO = le32_to_cpu(val);
-        cpu_physical_memory_read(cplus_rx_ring_desc+12, (uint8_t *)&val, 4);
+        dma_memory_read(dma, cplus_rx_ring_desc+12, (uint8_t *)&val, 4);
         rxbufHI = le32_to_cpu(val);
 
         DEBUG_PRINT(("RTL8139: +++ C+ mode RX descriptor %d %08x %08x %08x %08x\n",
@@ -1022,7 +1028,7 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
         target_phys_addr_t rx_addr = rtl8139_addr64(rxbufLO, rxbufHI);
 
         /* receive/copy to target memory */
-        cpu_physical_memory_write( rx_addr, buf, size );
+        dma_memory_write(dma, rx_addr, buf, size);
 
         if (s->CpCmd & CPlusRxChkSum)
         {
@@ -1035,7 +1041,7 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 #else
         val = 0;
 #endif
-        cpu_physical_memory_write( rx_addr+size, (uint8_t *)&val, 4);
+        dma_memory_write(dma, rx_addr + size, (uint8_t *)&val, 4);
 
 /* first segment of received packet flag */
 #define CP_RX_STATUS_FS (1<<29)
@@ -1084,9 +1090,9 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 
         /* update ring data */
         val = cpu_to_le32(rxdw0);
-        cpu_physical_memory_write(cplus_rx_ring_desc,    (uint8_t *)&val, 4);
+        dma_memory_write(dma, cplus_rx_ring_desc,    (uint8_t *)&val, 4);
         val = cpu_to_le32(rxdw1);
-        cpu_physical_memory_write(cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
+        dma_memory_write(dma, cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
 
         /* update tally counter */
         ++s->tally_counters.RxOk;
@@ -1282,50 +1288,54 @@ static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters)
     counters->TxUndrn = 0;
 }
 
-static void RTL8139TallyCounters_physical_memory_write(target_phys_addr_t tc_addr, RTL8139TallyCounters* tally_counters)
+static void
+RTL8139TallyCounters_physical_memory_write(RTL8139State *s,
+                                           target_phys_addr_t tc_addr)
 {
+    DMADevice *dma = &s->dev.dma;
+    RTL8139TallyCounters *tally_counters = &s->tally_counters;
     uint16_t val16;
     uint32_t val32;
     uint64_t val64;
 
     val64 = cpu_to_le64(tally_counters->TxOk);
-    cpu_physical_memory_write(tc_addr + 0,    (uint8_t *)&val64, 8);
+    dma_memory_write(dma, tc_addr + 0,    (uint8_t *)&val64, 8);
 
     val64 = cpu_to_le64(tally_counters->RxOk);
-    cpu_physical_memory_write(tc_addr + 8,    (uint8_t *)&val64, 8);
+    dma_memory_write(dma, tc_addr + 8,    (uint8_t *)&val64, 8);
 
     val64 = cpu_to_le64(tally_counters->TxERR);
-    cpu_physical_memory_write(tc_addr + 16,    (uint8_t *)&val64, 8);
+    dma_memory_write(dma, tc_addr + 16,    (uint8_t *)&val64, 8);
 
     val32 = cpu_to_le32(tally_counters->RxERR);
-    cpu_physical_memory_write(tc_addr + 24,    (uint8_t *)&val32, 4);
+    dma_memory_write(dma, tc_addr + 24,    (uint8_t *)&val32, 4);
 
     val16 = cpu_to_le16(tally_counters->MissPkt);
-    cpu_physical_memory_write(tc_addr + 28,    (uint8_t *)&val16, 2);
+    dma_memory_write(dma, tc_addr + 28,    (uint8_t *)&val16, 2);
 
     val16 = cpu_to_le16(tally_counters->FAE);
-    cpu_physical_memory_write(tc_addr + 30,    (uint8_t *)&val16, 2);
+    dma_memory_write(dma, tc_addr + 30,    (uint8_t *)&val16, 2);
 
     val32 = cpu_to_le32(tally_counters->Tx1Col);
-    cpu_physical_memory_write(tc_addr + 32,    (uint8_t *)&val32, 4);
+    dma_memory_write(dma, tc_addr + 32,    (uint8_t *)&val32, 4);
 
     val32 = cpu_to_le32(tally_counters->TxMCol);
-    cpu_physical_memory_write(tc_addr + 36,    (uint8_t *)&val32, 4);
+    dma_memory_write(dma, tc_addr + 36,    (uint8_t *)&val32, 4);
 
     val64 = cpu_to_le64(tally_counters->RxOkPhy);
-    cpu_physical_memory_write(tc_addr + 40,    (uint8_t *)&val64, 8);
+    dma_memory_write(dma, tc_addr + 40,    (uint8_t *)&val64, 8);
 
     val64 = cpu_to_le64(tally_counters->RxOkBrd);
-    cpu_physical_memory_write(tc_addr + 48,    (uint8_t *)&val64, 8);
+    dma_memory_write(dma, tc_addr + 48,    (uint8_t *)&val64, 8);
 
     val32 = cpu_to_le32(tally_counters->RxOkMul);
-    cpu_physical_memory_write(tc_addr + 56,    (uint8_t *)&val32, 4);
+    dma_memory_write(dma, tc_addr + 56,    (uint8_t *)&val32, 4);
 
     val16 = cpu_to_le16(tally_counters->TxAbt);
-    cpu_physical_memory_write(tc_addr + 60,    (uint8_t *)&val16, 2);
+    dma_memory_write(dma, tc_addr + 60,    (uint8_t *)&val16, 2);
 
     val16 = cpu_to_le16(tally_counters->TxUndrn);
-    cpu_physical_memory_write(tc_addr + 62,    (uint8_t *)&val16, 2);
+    dma_memory_write(dma, tc_addr + 62,    (uint8_t *)&val16, 2);
 }
 
 /* Loads values of tally counters from VM state file */
@@ -1761,6 +1771,8 @@ static void rtl8139_transfer_frame(RTL8139State *s, const uint8_t *buf, int size
 
 static int rtl8139_transmit_one(RTL8139State *s, int descriptor)
 {
+    DMADevice *dma = &s->dev.dma;
+
     if (!rtl8139_transmitter_enabled(s))
     {
         DEBUG_PRINT(("RTL8139: +++ cannot transmit from descriptor %d: transmitter disabled\n",
@@ -1783,7 +1795,7 @@ static int rtl8139_transmit_one(RTL8139State *s, int descriptor)
     DEBUG_PRINT(("RTL8139: +++ transmit reading %d bytes from host memory at 0x%08x\n",
                  txsize, s->TxAddr[descriptor]));
 
-    cpu_physical_memory_read(s->TxAddr[descriptor], txbuffer, txsize);
+    dma_memory_read(dma, s->TxAddr[descriptor], txbuffer, txsize);
 
     /* Mark descriptor as transferred */
     s->TxStatus[descriptor] |= TxHostOwns;
@@ -1889,6 +1901,8 @@ static uint16_t ip_checksum(void *data, size_t len)
 
 static int rtl8139_cplus_transmit_one(RTL8139State *s)
 {
+    DMADevice *dma = &s->dev.dma;
+
     if (!rtl8139_transmitter_enabled(s))
     {
         DEBUG_PRINT(("RTL8139: +++ C+ mode: transmitter disabled\n"));
@@ -1914,14 +1928,14 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
 
     uint32_t val, txdw0,txdw1,txbufLO,txbufHI;
 
-    cpu_physical_memory_read(cplus_tx_ring_desc,    (uint8_t *)&val, 4);
+    dma_memory_read(dma, cplus_tx_ring_desc,    (uint8_t *)&val, 4);
     txdw0 = le32_to_cpu(val);
     /* TODO: implement VLAN tagging support, VLAN tag data is read to txdw1 */
-    cpu_physical_memory_read(cplus_tx_ring_desc+4,  (uint8_t *)&val, 4);
+    dma_memory_read(dma, cplus_tx_ring_desc+4,  (uint8_t *)&val, 4);
     txdw1 = le32_to_cpu(val);
-    cpu_physical_memory_read(cplus_tx_ring_desc+8,  (uint8_t *)&val, 4);
+    dma_memory_read(dma, cplus_tx_ring_desc+8,  (uint8_t *)&val, 4);
     txbufLO = le32_to_cpu(val);
-    cpu_physical_memory_read(cplus_tx_ring_desc+12, (uint8_t *)&val, 4);
+    dma_memory_read(dma, cplus_tx_ring_desc+12, (uint8_t *)&val, 4);
     txbufHI = le32_to_cpu(val);
 
     DEBUG_PRINT(("RTL8139: +++ C+ mode TX descriptor %d %08x %08x %08x %08x\n",
@@ -2028,7 +2042,8 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
     DEBUG_PRINT(("RTL8139: +++ C+ mode transmit reading %d bytes from host memory at %016" PRIx64 " to offset %d\n",
                  txsize, (uint64_t)tx_addr, s->cplus_txbuffer_offset));
 
-    cpu_physical_memory_read(tx_addr, s->cplus_txbuffer + s->cplus_txbuffer_offset, txsize);
+    dma_memory_read(dma, tx_addr,
+                    s->cplus_txbuffer + s->cplus_txbuffer_offset, txsize);
     s->cplus_txbuffer_offset += txsize;
 
     /* seek to next Rx descriptor */
@@ -2055,10 +2070,10 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
 
     /* update ring data */
     val = cpu_to_le32(txdw0);
-    cpu_physical_memory_write(cplus_tx_ring_desc,    (uint8_t *)&val, 4);
+    dma_memory_write(dma, cplus_tx_ring_desc,    (uint8_t *)&val, 4);
     /* TODO: implement VLAN tagging support, VLAN tag data is read to txdw1 */
 //    val = cpu_to_le32(txdw1);
-//    cpu_physical_memory_write(cplus_tx_ring_desc+4,  &val, 4);
+//    dma_memory_write(dma, cplus_tx_ring_desc+4,  &val, 4);
 
     /* Now decide if descriptor being processed is holding the last segment of packet */
     if (txdw0 & CP_TX_LS)
@@ -2367,7 +2382,6 @@ static void rtl8139_transmit(RTL8139State *s)
 
 static void rtl8139_TxStatus_write(RTL8139State *s, uint32_t txRegOffset, uint32_t val)
 {
-
     int descriptor = txRegOffset/4;
 
     /* handle C+ transmit mode register configuration */
@@ -2384,7 +2398,7 @@ static void rtl8139_TxStatus_write(RTL8139State *s, uint32_t txRegOffset, uint32
             target_phys_addr_t tc_addr = rtl8139_addr64(s->TxStatus[0] & ~0x3f, s->TxStatus[1]);
 
             /* dump tally counters to specified memory location */
-            RTL8139TallyCounters_physical_memory_write( tc_addr, &s->tally_counters);
+            RTL8139TallyCounters_physical_memory_write(s, tc_addr);
 
             /* mark dump completed */
             s->TxStatus[0] &= ~0x8;
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 06/13] eepro100: use the DMA memory access interface
  2011-01-29 17:40 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2011-01-29 17:40   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 58+ messages in thread
From: Eduard - Gabriel Munteanu @ 2011-01-29 17:40 UTC (permalink / raw)
  To: joro
  Cc: blauwirbel, paul, avi, anthony, av1474, yamahata, kvm,
	qemu-devel, Eduard - Gabriel Munteanu

This allows the device to work properly with an emulated IOMMU.
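
Besides the plain dma_memory_read()/dma_memory_write() conversion, this patch
also switches descriptor field accesses from the ldl_phys()/lduw_phys()/
stw_phys() helpers to their DMA-aware counterparts. A rough sketch of those
helpers as used below, assuming they are declared in hw/dma_rw.h (patch 01/13)
and behave like the *_phys() variants apart from taking the per-device DMA
context -- the exact prototypes are an assumption here:

    uint32_t dma_ldl(DMADevice *dma, target_phys_addr_t addr);
    uint16_t dma_lduw(DMADevice *dma, target_phys_addr_t addr);
    void     dma_stw(DMADevice *dma, target_phys_addr_t addr, uint16_t val);

    /* Hypothetical usage, mirroring the TBD layout parsed below
     * (buffer address at offset 0, buffer size at offset 4): */
    static void read_tbd_entry(EEPRO100State *s, target_phys_addr_t tbd_address,
                               uint32_t *buf_addr, uint16_t *buf_size)
    {
        DMADevice *dma = &s->dev.dma;

        *buf_addr = dma_ldl(dma, tbd_address);
        *buf_size = dma_lduw(dma, tbd_address + 4);
    }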

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/eepro100.c |   97 +++++++++++++++++++++++++++++++++-----------------------
 1 files changed, 57 insertions(+), 40 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index edf48f6..58defcf 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -308,10 +308,12 @@ static const uint16_t eepro100_mdi_mask[] = {
 };
 
 /* XXX: optimize */
-static void stl_le_phys(target_phys_addr_t addr, uint32_t val)
+static void stl_le_phys(EEPRO100State * s, pcibus_t addr, uint32_t val)
 {
+    DMADevice *dma = &s->dev.dma;
+
     val = cpu_to_le32(val);
-    cpu_physical_memory_write(addr, (const uint8_t *)&val, sizeof(val));
+    dma_memory_write(dma, addr, (const uint8_t *)&val, sizeof(val));
 }
 
 #define POLYNOMIAL 0x04c11db6
@@ -689,17 +691,19 @@ static void set_ru_state(EEPRO100State * s, ru_state_t state)
 
 static void dump_statistics(EEPRO100State * s)
 {
+    DMADevice *dma = &s->dev.dma;
+
     /* Dump statistical data. Most data is never changed by the emulation
      * and always 0, so we first just copy the whole block and then those
      * values which really matter.
      * Number of data should check configuration!!!
      */
-    cpu_physical_memory_write(s->statsaddr,
-                              (uint8_t *) & s->statistics, s->stats_size);
-    stl_le_phys(s->statsaddr + 0, s->statistics.tx_good_frames);
-    stl_le_phys(s->statsaddr + 36, s->statistics.rx_good_frames);
-    stl_le_phys(s->statsaddr + 48, s->statistics.rx_resource_errors);
-    stl_le_phys(s->statsaddr + 60, s->statistics.rx_short_frame_errors);
+    dma_memory_write(dma, s->statsaddr,
+                     (uint8_t *) & s->statistics, s->stats_size);
+    stl_le_phys(s, s->statsaddr + 0, s->statistics.tx_good_frames);
+    stl_le_phys(s, s->statsaddr + 36, s->statistics.rx_good_frames);
+    stl_le_phys(s, s->statsaddr + 48, s->statistics.rx_resource_errors);
+    stl_le_phys(s, s->statsaddr + 60, s->statistics.rx_short_frame_errors);
 #if 0
     stw_le_phys(s->statsaddr + 76, s->statistics.xmt_tco_frames);
     stw_le_phys(s->statsaddr + 78, s->statistics.rcv_tco_frames);
@@ -709,7 +713,9 @@ static void dump_statistics(EEPRO100State * s)
 
 static void read_cb(EEPRO100State *s)
 {
-    cpu_physical_memory_read(s->cb_address, (uint8_t *) &s->tx, sizeof(s->tx));
+    DMADevice *dma = &s->dev.dma;
+
+    dma_memory_read(dma, s->cb_address, (uint8_t *) &s->tx, sizeof(s->tx));
     s->tx.status = le16_to_cpu(s->tx.status);
     s->tx.command = le16_to_cpu(s->tx.command);
     s->tx.link = le32_to_cpu(s->tx.link);
@@ -719,6 +725,7 @@ static void read_cb(EEPRO100State *s)
 
 static void tx_command(EEPRO100State *s)
 {
+    DMADevice *dma = &s->dev.dma;
     uint32_t tbd_array = le32_to_cpu(s->tx.tbd_array_addr);
     uint16_t tcb_bytes = (le16_to_cpu(s->tx.tcb_bytes) & 0x3fff);
     /* Sends larger than MAX_ETH_FRAME_SIZE are allowed, up to 2600 bytes. */
@@ -739,18 +746,18 @@ static void tx_command(EEPRO100State *s)
     }
     assert(tcb_bytes <= sizeof(buf));
     while (size < tcb_bytes) {
-        uint32_t tx_buffer_address = ldl_phys(tbd_address);
-        uint16_t tx_buffer_size = lduw_phys(tbd_address + 4);
+        uint32_t tx_buffer_address = dma_ldl(dma, tbd_address);
+        uint16_t tx_buffer_size = dma_lduw(dma, tbd_address + 4);
 #if 0
-        uint16_t tx_buffer_el = lduw_phys(tbd_address + 6);
+        uint16_t tx_buffer_el = dma_lduw(dma, tbd_address + 6);
 #endif
         tbd_address += 8;
         TRACE(RXTX, logout
             ("TBD (simplified mode): buffer address 0x%08x, size 0x%04x\n",
              tx_buffer_address, tx_buffer_size));
         tx_buffer_size = MIN(tx_buffer_size, sizeof(buf) - size);
-        cpu_physical_memory_read(tx_buffer_address, &buf[size],
-                                 tx_buffer_size);
+        dma_memory_read(dma,
+                        tx_buffer_address, &buf[size], tx_buffer_size);
         size += tx_buffer_size;
     }
     if (tbd_array == 0xffffffff) {
@@ -761,16 +768,16 @@ static void tx_command(EEPRO100State *s)
         if (s->has_extended_tcb_support && !(s->configuration[6] & BIT(4))) {
             /* Extended Flexible TCB. */
             for (; tbd_count < 2; tbd_count++) {
-                uint32_t tx_buffer_address = ldl_phys(tbd_address);
-                uint16_t tx_buffer_size = lduw_phys(tbd_address + 4);
-                uint16_t tx_buffer_el = lduw_phys(tbd_address + 6);
+                uint32_t tx_buffer_address = dma_ldl(dma, tbd_address);
+                uint16_t tx_buffer_size = dma_lduw(dma, tbd_address + 4);
+                uint16_t tx_buffer_el = dma_lduw(dma, tbd_address + 6);
                 tbd_address += 8;
                 TRACE(RXTX, logout
                     ("TBD (extended flexible mode): buffer address 0x%08x, size 0x%04x\n",
                      tx_buffer_address, tx_buffer_size));
                 tx_buffer_size = MIN(tx_buffer_size, sizeof(buf) - size);
-                cpu_physical_memory_read(tx_buffer_address, &buf[size],
-                                         tx_buffer_size);
+                dma_memory_read(dma,
+                                tx_buffer_address, &buf[size], tx_buffer_size);
                 size += tx_buffer_size;
                 if (tx_buffer_el & 1) {
                     break;
@@ -779,16 +786,16 @@ static void tx_command(EEPRO100State *s)
         }
         tbd_address = tbd_array;
         for (; tbd_count < s->tx.tbd_count; tbd_count++) {
-            uint32_t tx_buffer_address = ldl_phys(tbd_address);
-            uint16_t tx_buffer_size = lduw_phys(tbd_address + 4);
-            uint16_t tx_buffer_el = lduw_phys(tbd_address + 6);
+            uint32_t tx_buffer_address = dma_ldl(dma, tbd_address);
+            uint16_t tx_buffer_size = dma_lduw(dma, tbd_address + 4);
+            uint16_t tx_buffer_el = dma_lduw(dma, tbd_address + 6);
             tbd_address += 8;
             TRACE(RXTX, logout
                 ("TBD (flexible mode): buffer address 0x%08x, size 0x%04x\n",
                  tx_buffer_address, tx_buffer_size));
             tx_buffer_size = MIN(tx_buffer_size, sizeof(buf) - size);
-            cpu_physical_memory_read(tx_buffer_address, &buf[size],
-                                     tx_buffer_size);
+            dma_memory_read(dma,
+                            tx_buffer_address, &buf[size], tx_buffer_size);
             size += tx_buffer_size;
             if (tx_buffer_el & 1) {
                 break;
@@ -807,13 +814,14 @@ static void tx_command(EEPRO100State *s)
 
 static void set_multicast_list(EEPRO100State *s)
 {
+    DMADevice *dma = &s->dev.dma;
     uint16_t multicast_count = s->tx.tbd_array_addr & BITS(13, 0);
     uint16_t i;
     memset(&s->mult[0], 0, sizeof(s->mult));
     TRACE(OTHER, logout("multicast list, multicast count = %u\n", multicast_count));
     for (i = 0; i < multicast_count; i += 6) {
         uint8_t multicast_addr[6];
-        cpu_physical_memory_read(s->cb_address + 10 + i, multicast_addr, 6);
+        dma_memory_read(dma, s->cb_address + 10 + i, multicast_addr, 6);
         TRACE(OTHER, logout("multicast entry %s\n", nic_dump(multicast_addr, 6)));
         unsigned mcast_idx = compute_mcast_idx(multicast_addr);
         assert(mcast_idx < 64);
@@ -823,6 +831,8 @@ static void set_multicast_list(EEPRO100State *s)
 
 static void action_command(EEPRO100State *s)
 {
+    DMADevice *dma = &s->dev.dma;
+
     for (;;) {
         bool bit_el;
         bool bit_s;
@@ -847,12 +857,14 @@ static void action_command(EEPRO100State *s)
             /* Do nothing. */
             break;
         case CmdIASetup:
-            cpu_physical_memory_read(s->cb_address + 8, &s->conf.macaddr.a[0], 6);
+            dma_memory_read(dma,
+                            s->cb_address + 8, &s->conf.macaddr.a[0], 6);
             TRACE(OTHER, logout("macaddr: %s\n", nic_dump(&s->conf.macaddr.a[0], 6)));
             break;
         case CmdConfigure:
-            cpu_physical_memory_read(s->cb_address + 8, &s->configuration[0],
-                                     sizeof(s->configuration));
+            dma_memory_read(dma,
+                            s->cb_address + 8,
+                            &s->configuration[0], sizeof(s->configuration));
             TRACE(OTHER, logout("configuration: %s\n",
                                 nic_dump(&s->configuration[0], 16)));
             TRACE(OTHER, logout("configuration: %s\n",
@@ -889,7 +901,7 @@ static void action_command(EEPRO100State *s)
             break;
         }
         /* Write new status. */
-        stw_phys(s->cb_address, s->tx.status | ok_status | STATUS_C);
+        dma_stw(dma, s->cb_address, s->tx.status | ok_status | STATUS_C);
         if (bit_i) {
             /* CU completed action. */
             eepro100_cx_interrupt(s);
@@ -956,7 +968,7 @@ static void eepro100_cu_command(EEPRO100State * s, uint8_t val)
         /* Dump statistical counters. */
         TRACE(OTHER, logout("val=0x%02x (dump stats)\n", val));
         dump_statistics(s);
-        stl_le_phys(s->statsaddr + s->stats_size, 0xa005);
+        stl_le_phys(s, s->statsaddr + s->stats_size, 0xa005);
         break;
     case CU_CMD_BASE:
         /* Load CU base. */
@@ -967,7 +979,7 @@ static void eepro100_cu_command(EEPRO100State * s, uint8_t val)
         /* Dump and reset statistical counters. */
         TRACE(OTHER, logout("val=0x%02x (dump stats and reset)\n", val));
         dump_statistics(s);
-        stl_le_phys(s->statsaddr + s->stats_size, 0xa007);
+        stl_le_phys(s, s->statsaddr + s->stats_size, 0xa007);
         memset(&s->statistics, 0, sizeof(s->statistics));
         break;
     case CU_SRESUME:
@@ -1259,6 +1271,7 @@ static uint32_t eepro100_read_port(EEPRO100State * s)
 static void eepro100_write_port(EEPRO100State * s, uint32_t val)
 {
     val = le32_to_cpu(val);
+    DMADevice *dma = &s->dev.dma;
     uint32_t address = (val & ~PORT_SELECTION_MASK);
     uint8_t selection = (val & PORT_SELECTION_MASK);
     switch (selection) {
@@ -1268,10 +1281,10 @@ static void eepro100_write_port(EEPRO100State * s, uint32_t val)
     case PORT_SELFTEST:
         TRACE(OTHER, logout("selftest address=0x%08x\n", address));
         eepro100_selftest_t data;
-        cpu_physical_memory_read(address, (uint8_t *) & data, sizeof(data));
+        dma_memory_read(dma, address, (uint8_t *) & data, sizeof(data));
         data.st_sign = 0xffffffff;
         data.st_result = 0;
-        cpu_physical_memory_write(address, (uint8_t *) & data, sizeof(data));
+        dma_memory_write(dma, address, (uint8_t *) & data, sizeof(data));
         break;
     case PORT_SELECTIVE_RESET:
         TRACE(OTHER, logout("selective reset, selftest address=0x%08x\n", address));
@@ -1652,6 +1665,7 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
      * - Interesting packets should set bit 29 in power management driver register.
      */
     EEPRO100State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    DMADevice *dma = &s->dev.dma;
     uint16_t rfd_status = 0xa000;
     static const uint8_t broadcast_macaddr[6] =
         { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
@@ -1734,8 +1748,9 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
     }
     /* !!! */
     eepro100_rx_t rx;
-    cpu_physical_memory_read(s->ru_base + s->ru_offset, (uint8_t *) & rx,
-                             offsetof(eepro100_rx_t, packet));
+    dma_memory_read(dma,
+                    s->ru_base + s->ru_offset,
+                    (uint8_t *) & rx, offsetof(eepro100_rx_t, packet));
     uint16_t rfd_command = le16_to_cpu(rx.command);
     uint16_t rfd_size = le16_to_cpu(rx.size);
 
@@ -1749,9 +1764,11 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
     }
     TRACE(OTHER, logout("command 0x%04x, link 0x%08x, addr 0x%08x, size %u\n",
           rfd_command, rx.link, rx.rx_buf_addr, rfd_size));
-    stw_phys(s->ru_base + s->ru_offset + offsetof(eepro100_rx_t, status),
-             rfd_status);
-    stw_phys(s->ru_base + s->ru_offset + offsetof(eepro100_rx_t, count), size);
+    dma_stw(dma,
+            s->ru_base + s->ru_offset + offsetof(eepro100_rx_t, status),
+            rfd_status);
+    dma_stw(dma,
+            s->ru_base + s->ru_offset + offsetof(eepro100_rx_t, count), size);
     /* Early receive interrupt not supported. */
 #if 0
     eepro100_er_interrupt(s);
@@ -1765,8 +1782,8 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
 #if 0
     assert(!(s->configuration[17] & BIT(0)));
 #endif
-    cpu_physical_memory_write(s->ru_base + s->ru_offset +
-                              offsetof(eepro100_rx_t, packet), buf, size);
+    dma_memory_write(dma, s->ru_base + s->ru_offset +
+                     offsetof(eepro100_rx_t, packet), buf, size);
     s->statistics.rx_good_frames++;
     eepro100_fr_interrupt(s);
     s->ru_offset = le32_to_cpu(rx.link);
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH 06/13] eepro100: use the DMA memory access interface
@ 2011-01-29 17:40   ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 58+ messages in thread
From: Eduard - Gabriel Munteanu @ 2011-01-29 17:40 UTC (permalink / raw)
  To: joro
  Cc: kvm, qemu-devel, blauwirbel, yamahata, paul,
	Eduard - Gabriel Munteanu, avi

This allows the device to work properly with an emulated IOMMU.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/eepro100.c |   97 +++++++++++++++++++++++++++++++++-----------------------
 1 files changed, 57 insertions(+), 40 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index edf48f6..58defcf 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -308,10 +308,12 @@ static const uint16_t eepro100_mdi_mask[] = {
 };
 
 /* XXX: optimize */
-static void stl_le_phys(target_phys_addr_t addr, uint32_t val)
+static void stl_le_phys(EEPRO100State * s, pcibus_t addr, uint32_t val)
 {
+    DMADevice *dma = &s->dev.dma;
+
     val = cpu_to_le32(val);
-    cpu_physical_memory_write(addr, (const uint8_t *)&val, sizeof(val));
+    dma_memory_write(dma, addr, (const uint8_t *)&val, sizeof(val));
 }
 
 #define POLYNOMIAL 0x04c11db6
@@ -689,17 +691,19 @@ static void set_ru_state(EEPRO100State * s, ru_state_t state)
 
 static void dump_statistics(EEPRO100State * s)
 {
+    DMADevice *dma = &s->dev.dma;
+
     /* Dump statistical data. Most data is never changed by the emulation
      * and always 0, so we first just copy the whole block and then those
      * values which really matter.
      * Number of data should check configuration!!!
      */
-    cpu_physical_memory_write(s->statsaddr,
-                              (uint8_t *) & s->statistics, s->stats_size);
-    stl_le_phys(s->statsaddr + 0, s->statistics.tx_good_frames);
-    stl_le_phys(s->statsaddr + 36, s->statistics.rx_good_frames);
-    stl_le_phys(s->statsaddr + 48, s->statistics.rx_resource_errors);
-    stl_le_phys(s->statsaddr + 60, s->statistics.rx_short_frame_errors);
+    dma_memory_write(dma, s->statsaddr,
+                     (uint8_t *) & s->statistics, s->stats_size);
+    stl_le_phys(s, s->statsaddr + 0, s->statistics.tx_good_frames);
+    stl_le_phys(s, s->statsaddr + 36, s->statistics.rx_good_frames);
+    stl_le_phys(s, s->statsaddr + 48, s->statistics.rx_resource_errors);
+    stl_le_phys(s, s->statsaddr + 60, s->statistics.rx_short_frame_errors);
 #if 0
     stw_le_phys(s->statsaddr + 76, s->statistics.xmt_tco_frames);
     stw_le_phys(s->statsaddr + 78, s->statistics.rcv_tco_frames);
@@ -709,7 +713,9 @@ static void dump_statistics(EEPRO100State * s)
 
 static void read_cb(EEPRO100State *s)
 {
-    cpu_physical_memory_read(s->cb_address, (uint8_t *) &s->tx, sizeof(s->tx));
+    DMADevice *dma = &s->dev.dma;
+
+    dma_memory_read(dma, s->cb_address, (uint8_t *) &s->tx, sizeof(s->tx));
     s->tx.status = le16_to_cpu(s->tx.status);
     s->tx.command = le16_to_cpu(s->tx.command);
     s->tx.link = le32_to_cpu(s->tx.link);
@@ -719,6 +725,7 @@ static void read_cb(EEPRO100State *s)
 
 static void tx_command(EEPRO100State *s)
 {
+    DMADevice *dma = &s->dev.dma;
     uint32_t tbd_array = le32_to_cpu(s->tx.tbd_array_addr);
     uint16_t tcb_bytes = (le16_to_cpu(s->tx.tcb_bytes) & 0x3fff);
     /* Sends larger than MAX_ETH_FRAME_SIZE are allowed, up to 2600 bytes. */
@@ -739,18 +746,18 @@ static void tx_command(EEPRO100State *s)
     }
     assert(tcb_bytes <= sizeof(buf));
     while (size < tcb_bytes) {
-        uint32_t tx_buffer_address = ldl_phys(tbd_address);
-        uint16_t tx_buffer_size = lduw_phys(tbd_address + 4);
+        uint32_t tx_buffer_address = dma_ldl(dma, tbd_address);
+        uint16_t tx_buffer_size = dma_lduw(dma, tbd_address + 4);
 #if 0
-        uint16_t tx_buffer_el = lduw_phys(tbd_address + 6);
+        uint16_t tx_buffer_el = dma_lduw(dma, tbd_address + 6);
 #endif
         tbd_address += 8;
         TRACE(RXTX, logout
             ("TBD (simplified mode): buffer address 0x%08x, size 0x%04x\n",
              tx_buffer_address, tx_buffer_size));
         tx_buffer_size = MIN(tx_buffer_size, sizeof(buf) - size);
-        cpu_physical_memory_read(tx_buffer_address, &buf[size],
-                                 tx_buffer_size);
+        dma_memory_read(dma,
+                        tx_buffer_address, &buf[size], tx_buffer_size);
         size += tx_buffer_size;
     }
     if (tbd_array == 0xffffffff) {
@@ -761,16 +768,16 @@ static void tx_command(EEPRO100State *s)
         if (s->has_extended_tcb_support && !(s->configuration[6] & BIT(4))) {
             /* Extended Flexible TCB. */
             for (; tbd_count < 2; tbd_count++) {
-                uint32_t tx_buffer_address = ldl_phys(tbd_address);
-                uint16_t tx_buffer_size = lduw_phys(tbd_address + 4);
-                uint16_t tx_buffer_el = lduw_phys(tbd_address + 6);
+                uint32_t tx_buffer_address = dma_ldl(dma, tbd_address);
+                uint16_t tx_buffer_size = dma_lduw(dma, tbd_address + 4);
+                uint16_t tx_buffer_el = dma_lduw(dma, tbd_address + 6);
                 tbd_address += 8;
                 TRACE(RXTX, logout
                     ("TBD (extended flexible mode): buffer address 0x%08x, size 0x%04x\n",
                      tx_buffer_address, tx_buffer_size));
                 tx_buffer_size = MIN(tx_buffer_size, sizeof(buf) - size);
-                cpu_physical_memory_read(tx_buffer_address, &buf[size],
-                                         tx_buffer_size);
+                dma_memory_read(dma,
+                                tx_buffer_address, &buf[size], tx_buffer_size);
                 size += tx_buffer_size;
                 if (tx_buffer_el & 1) {
                     break;
@@ -779,16 +786,16 @@ static void tx_command(EEPRO100State *s)
         }
         tbd_address = tbd_array;
         for (; tbd_count < s->tx.tbd_count; tbd_count++) {
-            uint32_t tx_buffer_address = ldl_phys(tbd_address);
-            uint16_t tx_buffer_size = lduw_phys(tbd_address + 4);
-            uint16_t tx_buffer_el = lduw_phys(tbd_address + 6);
+            uint32_t tx_buffer_address = dma_ldl(dma, tbd_address);
+            uint16_t tx_buffer_size = dma_lduw(dma, tbd_address + 4);
+            uint16_t tx_buffer_el = dma_lduw(dma, tbd_address + 6);
             tbd_address += 8;
             TRACE(RXTX, logout
                 ("TBD (flexible mode): buffer address 0x%08x, size 0x%04x\n",
                  tx_buffer_address, tx_buffer_size));
             tx_buffer_size = MIN(tx_buffer_size, sizeof(buf) - size);
-            cpu_physical_memory_read(tx_buffer_address, &buf[size],
-                                     tx_buffer_size);
+            dma_memory_read(dma,
+                            tx_buffer_address, &buf[size], tx_buffer_size);
             size += tx_buffer_size;
             if (tx_buffer_el & 1) {
                 break;
@@ -807,13 +814,14 @@ static void tx_command(EEPRO100State *s)
 
 static void set_multicast_list(EEPRO100State *s)
 {
+    DMADevice *dma = &s->dev.dma;
     uint16_t multicast_count = s->tx.tbd_array_addr & BITS(13, 0);
     uint16_t i;
     memset(&s->mult[0], 0, sizeof(s->mult));
     TRACE(OTHER, logout("multicast list, multicast count = %u\n", multicast_count));
     for (i = 0; i < multicast_count; i += 6) {
         uint8_t multicast_addr[6];
-        cpu_physical_memory_read(s->cb_address + 10 + i, multicast_addr, 6);
+        dma_memory_read(dma, s->cb_address + 10 + i, multicast_addr, 6);
         TRACE(OTHER, logout("multicast entry %s\n", nic_dump(multicast_addr, 6)));
         unsigned mcast_idx = compute_mcast_idx(multicast_addr);
         assert(mcast_idx < 64);
@@ -823,6 +831,8 @@ static void set_multicast_list(EEPRO100State *s)
 
 static void action_command(EEPRO100State *s)
 {
+    DMADevice *dma = &s->dev.dma;
+
     for (;;) {
         bool bit_el;
         bool bit_s;
@@ -847,12 +857,14 @@ static void action_command(EEPRO100State *s)
             /* Do nothing. */
             break;
         case CmdIASetup:
-            cpu_physical_memory_read(s->cb_address + 8, &s->conf.macaddr.a[0], 6);
+            dma_memory_read(dma,
+                            s->cb_address + 8, &s->conf.macaddr.a[0], 6);
             TRACE(OTHER, logout("macaddr: %s\n", nic_dump(&s->conf.macaddr.a[0], 6)));
             break;
         case CmdConfigure:
-            cpu_physical_memory_read(s->cb_address + 8, &s->configuration[0],
-                                     sizeof(s->configuration));
+            dma_memory_read(dma,
+                            s->cb_address + 8,
+                            &s->configuration[0], sizeof(s->configuration));
             TRACE(OTHER, logout("configuration: %s\n",
                                 nic_dump(&s->configuration[0], 16)));
             TRACE(OTHER, logout("configuration: %s\n",
@@ -889,7 +901,7 @@ static void action_command(EEPRO100State *s)
             break;
         }
         /* Write new status. */
-        stw_phys(s->cb_address, s->tx.status | ok_status | STATUS_C);
+        dma_stw(dma, s->cb_address, s->tx.status | ok_status | STATUS_C);
         if (bit_i) {
             /* CU completed action. */
             eepro100_cx_interrupt(s);
@@ -956,7 +968,7 @@ static void eepro100_cu_command(EEPRO100State * s, uint8_t val)
         /* Dump statistical counters. */
         TRACE(OTHER, logout("val=0x%02x (dump stats)\n", val));
         dump_statistics(s);
-        stl_le_phys(s->statsaddr + s->stats_size, 0xa005);
+        stl_le_phys(s, s->statsaddr + s->stats_size, 0xa005);
         break;
     case CU_CMD_BASE:
         /* Load CU base. */
@@ -967,7 +979,7 @@ static void eepro100_cu_command(EEPRO100State * s, uint8_t val)
         /* Dump and reset statistical counters. */
         TRACE(OTHER, logout("val=0x%02x (dump stats and reset)\n", val));
         dump_statistics(s);
-        stl_le_phys(s->statsaddr + s->stats_size, 0xa007);
+        stl_le_phys(s, s->statsaddr + s->stats_size, 0xa007);
         memset(&s->statistics, 0, sizeof(s->statistics));
         break;
     case CU_SRESUME:
@@ -1259,6 +1271,7 @@ static uint32_t eepro100_read_port(EEPRO100State * s)
 static void eepro100_write_port(EEPRO100State * s, uint32_t val)
 {
     val = le32_to_cpu(val);
+    DMADevice *dma = &s->dev.dma;
     uint32_t address = (val & ~PORT_SELECTION_MASK);
     uint8_t selection = (val & PORT_SELECTION_MASK);
     switch (selection) {
@@ -1268,10 +1281,10 @@ static void eepro100_write_port(EEPRO100State * s, uint32_t val)
     case PORT_SELFTEST:
         TRACE(OTHER, logout("selftest address=0x%08x\n", address));
         eepro100_selftest_t data;
-        cpu_physical_memory_read(address, (uint8_t *) & data, sizeof(data));
+        dma_memory_read(dma, address, (uint8_t *) & data, sizeof(data));
         data.st_sign = 0xffffffff;
         data.st_result = 0;
-        cpu_physical_memory_write(address, (uint8_t *) & data, sizeof(data));
+        dma_memory_write(dma, address, (uint8_t *) & data, sizeof(data));
         break;
     case PORT_SELECTIVE_RESET:
         TRACE(OTHER, logout("selective reset, selftest address=0x%08x\n", address));
@@ -1652,6 +1665,7 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
      * - Interesting packets should set bit 29 in power management driver register.
      */
     EEPRO100State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    DMADevice *dma = &s->dev.dma;
     uint16_t rfd_status = 0xa000;
     static const uint8_t broadcast_macaddr[6] =
         { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
@@ -1734,8 +1748,9 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
     }
     /* !!! */
     eepro100_rx_t rx;
-    cpu_physical_memory_read(s->ru_base + s->ru_offset, (uint8_t *) & rx,
-                             offsetof(eepro100_rx_t, packet));
+    dma_memory_read(dma,
+                    s->ru_base + s->ru_offset,
+                    (uint8_t *) & rx, offsetof(eepro100_rx_t, packet));
     uint16_t rfd_command = le16_to_cpu(rx.command);
     uint16_t rfd_size = le16_to_cpu(rx.size);
 
@@ -1749,9 +1764,11 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
     }
     TRACE(OTHER, logout("command 0x%04x, link 0x%08x, addr 0x%08x, size %u\n",
           rfd_command, rx.link, rx.rx_buf_addr, rfd_size));
-    stw_phys(s->ru_base + s->ru_offset + offsetof(eepro100_rx_t, status),
-             rfd_status);
-    stw_phys(s->ru_base + s->ru_offset + offsetof(eepro100_rx_t, count), size);
+    dma_stw(dma,
+            s->ru_base + s->ru_offset + offsetof(eepro100_rx_t, status),
+            rfd_status);
+    dma_stw(dma,
+            s->ru_base + s->ru_offset + offsetof(eepro100_rx_t, count), size);
     /* Early receive interrupt not supported. */
 #if 0
     eepro100_er_interrupt(s);
@@ -1765,8 +1782,8 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
 #if 0
     assert(!(s->configuration[17] & BIT(0)));
 #endif
-    cpu_physical_memory_write(s->ru_base + s->ru_offset +
-                              offsetof(eepro100_rx_t, packet), buf, size);
+    dma_memory_write(dma, s->ru_base + s->ru_offset +
+                     offsetof(eepro100_rx_t, packet), buf, size);
     s->statistics.rx_good_frames++;
     eepro100_fr_interrupt(s);
     s->ru_offset = le32_to_cpu(rx.link);
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 07/13] ac97: use the DMA memory access interface
  2011-01-29 17:40 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2011-01-29 17:40   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 58+ messages in thread
From: Eduard - Gabriel Munteanu @ 2011-01-29 17:40 UTC (permalink / raw)
  To: joro
  Cc: blauwirbel, paul, avi, anthony, av1474, yamahata, kvm,
	qemu-devel, Eduard - Gabriel Munteanu

This allows the device to work properly with an emulated IOMMU.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/ac97.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/ac97.c b/hw/ac97.c
index d71072d..383c1b3 100644
--- a/hw/ac97.c
+++ b/hw/ac97.c
@@ -223,7 +223,7 @@ static void fetch_bd (AC97LinkState *s, AC97BusMasterRegs *r)
 {
     uint8_t b[8];
 
-    cpu_physical_memory_read (r->bdbar + r->civ * 8, b, 8);
+    dma_memory_read (&s->dev.dma, r->bdbar + r->civ * 8, b, 8);
     r->bd_valid = 1;
     r->bd.addr = le32_to_cpu (*(uint32_t *) &b[0]) & ~3;
     r->bd.ctl_len = le32_to_cpu (*(uint32_t *) &b[4]);
@@ -972,7 +972,7 @@ static int write_audio (AC97LinkState *s, AC97BusMasterRegs *r,
     while (temp) {
         int copied;
         to_copy = audio_MIN (temp, sizeof (tmpbuf));
-        cpu_physical_memory_read (addr, tmpbuf, to_copy);
+        dma_memory_read (&s->dev.dma, addr, tmpbuf, to_copy);
         copied = AUD_write (s->voice_po, tmpbuf, to_copy);
         dolog ("write_audio max=%x to_copy=%x copied=%x\n",
                max, to_copy, copied);
@@ -1056,7 +1056,7 @@ static int read_audio (AC97LinkState *s, AC97BusMasterRegs *r,
             *stop = 1;
             break;
         }
-        cpu_physical_memory_write (addr, tmpbuf, acquired);
+        dma_memory_write (&s->dev.dma, addr, tmpbuf, acquired);
         temp -= acquired;
         addr += acquired;
         nread += acquired;
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH 07/13] ac97: use the DMA memory access interface
@ 2011-01-29 17:40   ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 58+ messages in thread
From: Eduard - Gabriel Munteanu @ 2011-01-29 17:40 UTC (permalink / raw)
  To: joro
  Cc: kvm, qemu-devel, blauwirbel, yamahata, paul,
	Eduard - Gabriel Munteanu, avi

This allows the device to work properly with an emulated IOMMU.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/ac97.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/ac97.c b/hw/ac97.c
index d71072d..383c1b3 100644
--- a/hw/ac97.c
+++ b/hw/ac97.c
@@ -223,7 +223,7 @@ static void fetch_bd (AC97LinkState *s, AC97BusMasterRegs *r)
 {
     uint8_t b[8];
 
-    cpu_physical_memory_read (r->bdbar + r->civ * 8, b, 8);
+    dma_memory_read (&s->dev.dma, r->bdbar + r->civ * 8, b, 8);
     r->bd_valid = 1;
     r->bd.addr = le32_to_cpu (*(uint32_t *) &b[0]) & ~3;
     r->bd.ctl_len = le32_to_cpu (*(uint32_t *) &b[4]);
@@ -972,7 +972,7 @@ static int write_audio (AC97LinkState *s, AC97BusMasterRegs *r,
     while (temp) {
         int copied;
         to_copy = audio_MIN (temp, sizeof (tmpbuf));
-        cpu_physical_memory_read (addr, tmpbuf, to_copy);
+        dma_memory_read (&s->dev.dma, addr, tmpbuf, to_copy);
         copied = AUD_write (s->voice_po, tmpbuf, to_copy);
         dolog ("write_audio max=%x to_copy=%x copied=%x\n",
                max, to_copy, copied);
@@ -1056,7 +1056,7 @@ static int read_audio (AC97LinkState *s, AC97BusMasterRegs *r,
             *stop = 1;
             break;
         }
-        cpu_physical_memory_write (addr, tmpbuf, acquired);
+        dma_memory_write (&s->dev.dma, addr, tmpbuf, acquired);
         temp -= acquired;
         addr += acquired;
         nread += acquired;
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 08/13] es1370: use the DMA memory access interface
  2011-01-29 17:40 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2011-01-29 17:40   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 58+ messages in thread
From: Eduard - Gabriel Munteanu @ 2011-01-29 17:40 UTC (permalink / raw)
  To: joro
  Cc: blauwirbel, paul, avi, anthony, av1474, yamahata, kvm,
	qemu-devel, Eduard - Gabriel Munteanu

This allows the device to work properly with an emulated IOMMU.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/es1370.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/es1370.c b/hw/es1370.c
index 40cb48c..8b1a405 100644
--- a/hw/es1370.c
+++ b/hw/es1370.c
@@ -802,7 +802,7 @@ static void es1370_transfer_audio (ES1370State *s, struct chan *d, int loop_sel,
             if (!acquired)
                 break;
 
-            cpu_physical_memory_write (addr, tmpbuf, acquired);
+            dma_memory_write (&s->dev.dma, addr, tmpbuf, acquired);
 
             temp -= acquired;
             addr += acquired;
@@ -816,7 +816,7 @@ static void es1370_transfer_audio (ES1370State *s, struct chan *d, int loop_sel,
             int copied, to_copy;
 
             to_copy = audio_MIN ((size_t) temp, sizeof (tmpbuf));
-            cpu_physical_memory_read (addr, tmpbuf, to_copy);
+            dma_memory_read (&s->dev.dma, addr, tmpbuf, to_copy);
             copied = AUD_write (voice, tmpbuf, to_copy);
             if (!copied)
                 break;
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH 08/13] es1370: use the DMA memory access interface
@ 2011-01-29 17:40   ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 58+ messages in thread
From: Eduard - Gabriel Munteanu @ 2011-01-29 17:40 UTC (permalink / raw)
  To: joro
  Cc: kvm, qemu-devel, blauwirbel, yamahata, paul,
	Eduard - Gabriel Munteanu, avi

This allows the device to work properly with an emulated IOMMU.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/es1370.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/es1370.c b/hw/es1370.c
index 40cb48c..8b1a405 100644
--- a/hw/es1370.c
+++ b/hw/es1370.c
@@ -802,7 +802,7 @@ static void es1370_transfer_audio (ES1370State *s, struct chan *d, int loop_sel,
             if (!acquired)
                 break;
 
-            cpu_physical_memory_write (addr, tmpbuf, acquired);
+            dma_memory_write (&s->dev.dma, addr, tmpbuf, acquired);
 
             temp -= acquired;
             addr += acquired;
@@ -816,7 +816,7 @@ static void es1370_transfer_audio (ES1370State *s, struct chan *d, int loop_sel,
             int copied, to_copy;
 
             to_copy = audio_MIN ((size_t) temp, sizeof (tmpbuf));
-            cpu_physical_memory_read (addr, tmpbuf, to_copy);
+            dma_memory_read (&s->dev.dma, addr, tmpbuf, to_copy);
             copied = AUD_write (voice, tmpbuf, to_copy);
             if (!copied)
                 break;
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 09/13] e1000: use the DMA memory access interface
  2011-01-29 17:40 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2011-01-29 17:40   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 58+ messages in thread
From: Eduard - Gabriel Munteanu @ 2011-01-29 17:40 UTC (permalink / raw)
  To: joro
  Cc: blauwirbel, paul, avi, anthony, av1474, yamahata, kvm,
	qemu-devel, Eduard - Gabriel Munteanu

This allows the device to work properly with an emulated IOMMU.
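
One structural consequence worth noting: helpers that previously took only a
guest physical address, such as txdesc_writeback(), now also take the device
state, since the DMA context has to be reached through the embedded PCIDevice.
A hypothetical sketch of that shape (illustration only, not part of the patch):

    static void e1000_writeback_example(E1000State *s, target_phys_addr_t base,
                                        uint32_t status)
    {
        uint32_t val = cpu_to_le32(status);

        /* was: cpu_physical_memory_write(base, (void *)&val, sizeof(val)); */
        dma_memory_write(&s->dev.dma, base, (void *)&val, sizeof(val));
    }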

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/e1000.c |   26 +++++++++++++++-----------
 1 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/hw/e1000.c b/hw/e1000.c
index af101bd..0d71650 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -470,7 +470,7 @@ process_tx_desc(E1000State *s, struct e1000_tx_desc *dp)
             bytes = split_size;
             if (tp->size + bytes > msh)
                 bytes = msh - tp->size;
-            cpu_physical_memory_read(addr, tp->data + tp->size, bytes);
+            dma_memory_read(&s->dev.dma, addr, tp->data + tp->size, bytes);
             if ((sz = tp->size + bytes) >= hdr && tp->size < hdr)
                 memmove(tp->header, tp->data, hdr);
             tp->size = sz;
@@ -485,7 +485,7 @@ process_tx_desc(E1000State *s, struct e1000_tx_desc *dp)
         // context descriptor TSE is not set, while data descriptor TSE is set
         DBGOUT(TXERR, "TCP segmentaion Error\n");
     } else {
-        cpu_physical_memory_read(addr, tp->data + tp->size, split_size);
+        dma_memory_read(&s->dev.dma, addr, tp->data + tp->size, split_size);
         tp->size += split_size;
     }
 
@@ -501,7 +501,9 @@ process_tx_desc(E1000State *s, struct e1000_tx_desc *dp)
 }
 
 static uint32_t
-txdesc_writeback(target_phys_addr_t base, struct e1000_tx_desc *dp)
+txdesc_writeback(E1000State *s,
+                 target_phys_addr_t base,
+                 struct e1000_tx_desc *dp)
 {
     uint32_t txd_upper, txd_lower = le32_to_cpu(dp->lower.data);
 
@@ -510,8 +512,9 @@ txdesc_writeback(target_phys_addr_t base, struct e1000_tx_desc *dp)
     txd_upper = (le32_to_cpu(dp->upper.data) | E1000_TXD_STAT_DD) &
                 ~(E1000_TXD_STAT_EC | E1000_TXD_STAT_LC | E1000_TXD_STAT_TU);
     dp->upper.data = cpu_to_le32(txd_upper);
-    cpu_physical_memory_write(base + ((char *)&dp->upper - (char *)dp),
-                              (void *)&dp->upper, sizeof(dp->upper));
+    dma_memory_write(&s->dev.dma,
+                     base + ((char *)&dp->upper - (char *)dp),
+                     (void *)&dp->upper, sizeof(dp->upper));
     return E1000_ICR_TXDW;
 }
 
@@ -530,14 +533,14 @@ start_xmit(E1000State *s)
     while (s->mac_reg[TDH] != s->mac_reg[TDT]) {
         base = ((uint64_t)s->mac_reg[TDBAH] << 32) + s->mac_reg[TDBAL] +
                sizeof(struct e1000_tx_desc) * s->mac_reg[TDH];
-        cpu_physical_memory_read(base, (void *)&desc, sizeof(desc));
+        dma_memory_read(&s->dev.dma, base, (void *)&desc, sizeof(desc));
 
         DBGOUT(TX, "index %d: %p : %x %x\n", s->mac_reg[TDH],
                (void *)(intptr_t)desc.buffer_addr, desc.lower.data,
                desc.upper.data);
 
         process_tx_desc(s, &desc);
-        cause |= txdesc_writeback(base, &desc);
+        cause |= txdesc_writeback(s, base, &desc);
 
         if (++s->mac_reg[TDH] * sizeof(desc) >= s->mac_reg[TDLEN])
             s->mac_reg[TDH] = 0;
@@ -679,18 +682,19 @@ e1000_receive(VLANClientState *nc, const uint8_t *buf, size_t size)
         }
         base = ((uint64_t)s->mac_reg[RDBAH] << 32) + s->mac_reg[RDBAL] +
                sizeof(desc) * s->mac_reg[RDH];
-        cpu_physical_memory_read(base, (void *)&desc, sizeof(desc));
+        dma_memory_read(&s->dev.dma, base, (void *)&desc, sizeof(desc));
         desc.special = vlan_special;
         desc.status |= (vlan_status | E1000_RXD_STAT_DD);
         if (desc.buffer_addr) {
-            cpu_physical_memory_write(le64_to_cpu(desc.buffer_addr),
-                                      (void *)(buf + vlan_offset), size);
+            dma_memory_write(&s->dev.dma,
+                             le64_to_cpu(desc.buffer_addr),
+                             (void *)(buf + vlan_offset), size);
             desc.length = cpu_to_le16(size + fcs_len(s));
             desc.status |= E1000_RXD_STAT_EOP|E1000_RXD_STAT_IXSM;
         } else { // as per intel docs; skip descriptors with null buf addr
             DBGOUT(RX, "Null RX descriptor!!\n");
         }
-        cpu_physical_memory_write(base, (void *)&desc, sizeof(desc));
+        dma_memory_write(&s->dev.dma, base, (void *)&desc, sizeof(desc));
 
         if (++s->mac_reg[RDH] * sizeof(desc) >= s->mac_reg[RDLEN])
             s->mac_reg[RDH] = 0;
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 10/13] lsi53c895a: use the DMA memory access interface
  2011-01-29 17:40 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2011-01-29 17:40   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 58+ messages in thread
From: Eduard - Gabriel Munteanu @ 2011-01-29 17:40 UTC (permalink / raw)
  To: joro
  Cc: blauwirbel, paul, avi, anthony, av1474, yamahata, kvm,
	qemu-devel, Eduard - Gabriel Munteanu

This allows the device to work properly with an emulated IOMMU.
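
All of these conversions funnel through the dma_memory_read()/dma_memory_write()
helpers introduced by the generic DMA layer earlier in the series. A minimal
standalone sketch of the fallback behaviour those helpers give a device when no
IOMMU is attached follows; the translate callback and the struct below are
illustrative assumptions, not the actual dma_rw.h definitions.

    /* Standalone sketch (not QEMU code): a per-device DMA helper that
     * applies an optional IOMMU translation and otherwise behaves like a
     * plain physical-memory read. */
    #include <stddef.h>
    #include <stdint.h>

    typedef uint64_t (*dma_translate_fn)(void *opaque, uint64_t addr);

    typedef struct {
        dma_translate_fn translate;   /* NULL when no IOMMU is present */
        void *opaque;                 /* IOMMU state, if any           */
    } DMADeviceSketch;

    static void dma_memory_read_sketch(DMADeviceSketch *dev, uint64_t addr,
                                       void *buf, size_t len)
    {
        if (dev->translate)
            addr = dev->translate(dev->opaque, addr);  /* IOMMU remapping */
        /* ...then read guest RAM at 'addr', exactly as the old
         * cpu_physical_memory_read(addr, buf, len) call did. */
        (void)buf; (void)len;
    }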

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/lsi53c895a.c |   24 ++++++++++++------------
 1 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index 0129ae3..76bd631 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -394,7 +394,7 @@ static inline uint32_t read_dword(LSIState *s, uint32_t addr)
     if ((addr & 0xffffe000) == s->script_ram_base) {
         return s->script_ram[(addr & 0x1fff) >> 2];
     }
-    cpu_physical_memory_read(addr, (uint8_t *)&buf, 4);
+    dma_memory_read(&s->dev.dma, addr, (uint8_t *)&buf, 4);
     return cpu_to_le32(buf);
 }
 
@@ -574,9 +574,9 @@ static void lsi_do_dma(LSIState *s, int out)
 
     /* ??? Set SFBR to first data byte.  */
     if (out) {
-        cpu_physical_memory_read(addr, s->current->dma_buf, count);
+        dma_memory_read(&s->dev.dma, addr, s->current->dma_buf, count);
     } else {
-        cpu_physical_memory_write(addr, s->current->dma_buf, count);
+        dma_memory_write(&s->dev.dma, addr, s->current->dma_buf, count);
     }
     s->current->dma_len -= count;
     if (s->current->dma_len == 0) {
@@ -741,7 +741,7 @@ static void lsi_do_command(LSIState *s)
     DPRINTF("Send command len=%d\n", s->dbc);
     if (s->dbc > 16)
         s->dbc = 16;
-    cpu_physical_memory_read(s->dnad, buf, s->dbc);
+    dma_memory_read(&s->dev.dma, s->dnad, buf, s->dbc);
     s->sfbr = buf[0];
     s->command_complete = 0;
 
@@ -790,7 +790,7 @@ static void lsi_do_status(LSIState *s)
     s->dbc = 1;
     sense = s->sense;
     s->sfbr = sense;
-    cpu_physical_memory_write(s->dnad, &sense, 1);
+    dma_memory_write(&s->dev.dma, s->dnad, &sense, 1);
     lsi_set_phase(s, PHASE_MI);
     s->msg_action = 1;
     lsi_add_msg_byte(s, 0); /* COMMAND COMPLETE */
@@ -804,7 +804,7 @@ static void lsi_do_msgin(LSIState *s)
     len = s->msg_len;
     if (len > s->dbc)
         len = s->dbc;
-    cpu_physical_memory_write(s->dnad, s->msg, len);
+    dma_memory_write(&s->dev.dma, s->dnad, s->msg, len);
     /* Linux drivers rely on the last byte being in the SIDL.  */
     s->sidl = s->msg[len - 1];
     s->msg_len -= len;
@@ -836,7 +836,7 @@ static void lsi_do_msgin(LSIState *s)
 static uint8_t lsi_get_msgbyte(LSIState *s)
 {
     uint8_t data;
-    cpu_physical_memory_read(s->dnad, &data, 1);
+    dma_memory_read(&s->dev.dma, s->dnad, &data, 1);
     s->dnad++;
     s->dbc--;
     return data;
@@ -924,8 +924,8 @@ static void lsi_memcpy(LSIState *s, uint32_t dest, uint32_t src, int count)
     DPRINTF("memcpy dest 0x%08x src 0x%08x count %d\n", dest, src, count);
     while (count) {
         n = (count > LSI_BUF_SIZE) ? LSI_BUF_SIZE : count;
-        cpu_physical_memory_read(src, buf, n);
-        cpu_physical_memory_write(dest, buf, n);
+        dma_memory_read(&s->dev.dma, src, buf, n);
+        dma_memory_write(&s->dev.dma, dest, buf, n);
         src += n;
         dest += n;
         count -= n;
@@ -993,7 +993,7 @@ again:
 
             /* 32-bit Table indirect */
             offset = sxt24(addr);
-            cpu_physical_memory_read(s->dsa + offset, (uint8_t *)buf, 8);
+            dma_memory_read(&s->dev.dma, s->dsa + offset, (uint8_t *)buf, 8);
             /* byte count is stored in bits 0:23 only */
             s->dbc = cpu_to_le32(buf[0]) & 0xffffff;
             s->rbc = s->dbc;
@@ -1352,7 +1352,7 @@ again:
             n = (insn & 7);
             reg = (insn >> 16) & 0xff;
             if (insn & (1 << 24)) {
-                cpu_physical_memory_read(addr, data, n);
+                dma_memory_read(&s->dev.dma, addr, data, n);
                 DPRINTF("Load reg 0x%x size %d addr 0x%08x = %08x\n", reg, n,
                         addr, *(int *)data);
                 for (i = 0; i < n; i++) {
@@ -1363,7 +1363,7 @@ again:
                 for (i = 0; i < n; i++) {
                     data[i] = lsi_reg_readb(s, reg + i);
                 }
-                cpu_physical_memory_write(addr, data, n);
+                dma_memory_write(&s->dev.dma, addr, data, n);
             }
         }
     }
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 11/13] pcnet: use the DMA memory access interface
  2011-01-29 17:40 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2011-01-29 17:40   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 58+ messages in thread
From: Eduard - Gabriel Munteanu @ 2011-01-29 17:40 UTC (permalink / raw)
  To: joro
  Cc: blauwirbel, paul, avi, anthony, av1474, yamahata, kvm,
	qemu-devel, Eduard - Gabriel Munteanu

This allows the device to work properly with an emulated IOMMU.
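
The pcnet core reaches guest memory through the phys_mem_read/phys_mem_write
hooks and the dma_opaque cookie, so pointing dma_opaque at the PCIDevice's
DMADevice is enough to route its DMA through the IOMMU. An illustrative call
shape, mirroring the hook prototypes in this diff (the real call sites live in
the shared hw/pcnet.c core and are not part of this patch):

    /* hypothetical descriptor fetch inside the pcnet core; 'tmd' stands in
     * for whatever descriptor structure the core happens to be reading */
    s->phys_mem_read(s->dma_opaque, tmd_addr, (uint8_t *)&tmd, sizeof(tmd), 0);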

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/pcnet-pci.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/pcnet-pci.c b/hw/pcnet-pci.c
index 339a401..3f55c42 100644
--- a/hw/pcnet-pci.c
+++ b/hw/pcnet-pci.c
@@ -230,13 +230,13 @@ static void pcnet_mmio_map(PCIDevice *pci_dev, int region_num,
 static void pci_physical_memory_write(void *dma_opaque, target_phys_addr_t addr,
                                       uint8_t *buf, int len, int do_bswap)
 {
-    cpu_physical_memory_write(addr, buf, len);
+    dma_memory_write(dma_opaque, addr, buf, len);
 }
 
 static void pci_physical_memory_read(void *dma_opaque, target_phys_addr_t addr,
                                      uint8_t *buf, int len, int do_bswap)
 {
-    cpu_physical_memory_read(addr, buf, len);
+    dma_memory_read(dma_opaque, addr, buf, len);
 }
 
 static void pci_pcnet_cleanup(VLANClientState *nc)
@@ -306,6 +306,7 @@ static int pci_pcnet_init(PCIDevice *pci_dev)
     s->irq = pci_dev->irq[0];
     s->phys_mem_read = pci_physical_memory_read;
     s->phys_mem_write = pci_physical_memory_write;
+    s->dma_opaque = &pci_dev->dma;
 
     if (!pci_dev->qdev.hotplugged) {
         static int loaded = 0;
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 12/13] usb-uhci: use the DMA memory access interface
  2011-01-29 17:40 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2011-01-29 17:40   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 58+ messages in thread
From: Eduard - Gabriel Munteanu @ 2011-01-29 17:40 UTC (permalink / raw)
  To: joro
  Cc: blauwirbel, paul, avi, anthony, av1474, yamahata, kvm,
	qemu-devel, Eduard - Gabriel Munteanu

This allows the device to work properly with an emulated IOMMU.
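
A note on the partial write-backs below: the controller re-reads a whole
transfer descriptor but only stores the control/status dword back, at offset
(link & ~0xf) + 4. That offset comes from the TD layout the code assumes, i.e.
four little-endian dwords, sketched here (field order per the UHCI spec and the
td.link/td.ctrl/td.token/td->buffer accesses in this file):

    #include <stdint.h>

    /* assumed guest-memory layout of a UHCI transfer descriptor */
    typedef struct {
        uint32_t link;    /* +0x0  link pointer to the next TD/QH          */
        uint32_t ctrl;    /* +0x4  control/status: the only dword that is
                                   written back after completion           */
        uint32_t token;   /* +0x8  PID, device address, endpoint, max len  */
        uint32_t buffer;  /* +0xc  data buffer pointer                     */
    } UHCI_TD_sketch;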

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/usb-uhci.c |   26 ++++++++++++++------------
 1 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/hw/usb-uhci.c b/hw/usb-uhci.c
index b9b822f..01b7f8b 100644
--- a/hw/usb-uhci.c
+++ b/hw/usb-uhci.c
@@ -705,7 +705,7 @@ static int uhci_complete_td(UHCIState *s, UHCI_TD *td, UHCIAsync *async, uint32_
 
         if (len > 0) {
             /* write the data back */
-            cpu_physical_memory_write(td->buffer, async->buffer, len);
+            dma_memory_write(&s->dev.dma, td->buffer, async->buffer, len);
         }
 
         if ((td->ctrl & TD_CTRL_SPD) && len < max_len) {
@@ -823,7 +823,7 @@ static int uhci_handle_td(UHCIState *s, uint32_t addr, UHCI_TD *td, uint32_t *in
     switch(pid) {
     case USB_TOKEN_OUT:
     case USB_TOKEN_SETUP:
-        cpu_physical_memory_read(td->buffer, async->buffer, max_len);
+        dma_memory_read(&s->dev.dma, td->buffer, async->buffer, max_len);
         len = uhci_broadcast_packet(s, &async->packet);
         if (len >= 0)
             len = max_len;
@@ -866,7 +866,7 @@ static void uhci_async_complete(USBPacket *packet, void *opaque)
         uint32_t link = async->td;
         uint32_t int_mask = 0, val;
 
-        cpu_physical_memory_read(link & ~0xf, (uint8_t *) &td, sizeof(td));
+        dma_memory_read(&s->dev.dma, link & ~0xf, (uint8_t *) &td, sizeof(td));
         le32_to_cpus(&td.link);
         le32_to_cpus(&td.ctrl);
         le32_to_cpus(&td.token);
@@ -878,8 +878,8 @@ static void uhci_async_complete(USBPacket *packet, void *opaque)
 
         /* update the status bits of the TD */
         val = cpu_to_le32(td.ctrl);
-        cpu_physical_memory_write((link & ~0xf) + 4,
-                                  (const uint8_t *)&val, sizeof(val));
+        dma_memory_write(&s->dev.dma, (link & ~0xf) + 4,
+                         (const uint8_t *)&val, sizeof(val));
         uhci_async_free(s, async);
     } else {
         async->done = 1;
@@ -942,7 +942,7 @@ static void uhci_process_frame(UHCIState *s)
 
     DPRINTF("uhci: processing frame %d addr 0x%x\n" , s->frnum, frame_addr);
 
-    cpu_physical_memory_read(frame_addr, (uint8_t *)&link, 4);
+    dma_memory_read(&s->dev.dma, frame_addr, (uint8_t *)&link, 4);
     le32_to_cpus(&link);
 
     int_mask = 0;
@@ -966,7 +966,8 @@ static void uhci_process_frame(UHCIState *s)
                 break;
             }
 
-            cpu_physical_memory_read(link & ~0xf, (uint8_t *) &qh, sizeof(qh));
+            dma_memory_read(&s->dev.dma,
+                            link & ~0xf, (uint8_t *) &qh, sizeof(qh));
             le32_to_cpus(&qh.link);
             le32_to_cpus(&qh.el_link);
 
@@ -986,7 +987,8 @@ static void uhci_process_frame(UHCIState *s)
         }
 
         /* TD */
-        cpu_physical_memory_read(link & ~0xf, (uint8_t *) &td, sizeof(td));
+        dma_memory_read(&s->dev.dma,
+                        link & ~0xf, (uint8_t *) &td, sizeof(td));
         le32_to_cpus(&td.link);
         le32_to_cpus(&td.ctrl);
         le32_to_cpus(&td.token);
@@ -1000,8 +1002,8 @@ static void uhci_process_frame(UHCIState *s)
         if (old_td_ctrl != td.ctrl) {
             /* update the status bits of the TD */
             val = cpu_to_le32(td.ctrl);
-            cpu_physical_memory_write((link & ~0xf) + 4,
-                                      (const uint8_t *)&val, sizeof(val));
+            dma_memory_write(&s->dev.dma, (link & ~0xf) + 4,
+                             (const uint8_t *)&val, sizeof(val));
         }
 
         if (ret < 0) {
@@ -1029,8 +1031,8 @@ static void uhci_process_frame(UHCIState *s)
 	    /* update QH element link */
             qh.el_link = link;
             val = cpu_to_le32(qh.el_link);
-            cpu_physical_memory_write((curr_qh & ~0xf) + 4,
-                                          (const uint8_t *)&val, sizeof(val));
+            dma_memory_write(&s->dev.dma, (curr_qh & ~0xf) + 4,
+                             (const uint8_t *)&val, sizeof(val));
 
             if (!depth_first(link)) {
                /* done with this QH */
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 13/13] usb-ohci: use the DMA memory access interface
  2011-01-29 17:40 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2011-01-29 17:40   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 58+ messages in thread
From: Eduard - Gabriel Munteanu @ 2011-01-29 17:40 UTC (permalink / raw)
  To: joro
  Cc: blauwirbel, paul, avi, anthony, av1474, yamahata, kvm,
	qemu-devel, Eduard - Gabriel Munteanu

This allows the device to work properly with an emulated IOMMU.
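
The twist in this conversion is that the memory helpers only receive an
OHCIState, so the patch moves the OHCIPCIState typedef up and recovers the
embedding PCI device with container_of() before reaching pci_dev.dma. For
reference, the usual C formulation of that idiom (QEMU supplies container_of
in its own headers; the definition below is just the textbook version):

    #include <stddef.h>

    /* recover a pointer to the enclosing struct from a pointer to one of
     * its members */
    #define container_of_sketch(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    /* conceptually, as used in the helpers below:
     *   OHCIPCIState *s = container_of(ohci, OHCIPCIState, state);
     *   DMADevice *dma = &s->pci_dev.dma;
     */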

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/usb-ohci.c |   54 +++++++++++++++++++++++++++++++++++++-----------------
 1 files changed, 37 insertions(+), 17 deletions(-)

diff --git a/hw/usb-ohci.c b/hw/usb-ohci.c
index 240e840..ff27024 100644
--- a/hw/usb-ohci.c
+++ b/hw/usb-ohci.c
@@ -116,6 +116,11 @@ typedef struct {
 
 } OHCIState;
 
+typedef struct {
+    PCIDevice pci_dev;
+    OHCIState state;
+} OHCIPCIState;
+
 /* Host Controller Communications Area */
 struct ohci_hcca {
     uint32_t intr[32];
@@ -427,12 +432,14 @@ static void ohci_reset(void *opaque)
 static inline int get_dwords(OHCIState *ohci,
                              uint32_t addr, uint32_t *buf, int num)
 {
+    OHCIPCIState *s = container_of(ohci, OHCIPCIState, state);
+    DMADevice *dma = &s->pci_dev.dma;
     int i;
 
     addr += ohci->localmem_base;
 
     for (i = 0; i < num; i++, buf++, addr += sizeof(*buf)) {
-        cpu_physical_memory_rw(addr, (uint8_t *)buf, sizeof(*buf), 0);
+        dma_memory_rw(dma, addr, (uint8_t *)buf, sizeof(*buf), 0);
         *buf = le32_to_cpu(*buf);
     }
 
@@ -443,13 +450,15 @@ static inline int get_dwords(OHCIState *ohci,
 static inline int put_dwords(OHCIState *ohci,
                              uint32_t addr, uint32_t *buf, int num)
 {
+    OHCIPCIState *s = container_of(ohci, OHCIPCIState, state);
+    DMADevice *dma = &s->pci_dev.dma;
     int i;
 
     addr += ohci->localmem_base;
 
     for (i = 0; i < num; i++, buf++, addr += sizeof(*buf)) {
         uint32_t tmp = cpu_to_le32(*buf);
-        cpu_physical_memory_rw(addr, (uint8_t *)&tmp, sizeof(tmp), 1);
+        dma_memory_rw(dma, addr, (uint8_t *)&tmp, sizeof(tmp), 1);
     }
 
     return 1;
@@ -459,12 +468,14 @@ static inline int put_dwords(OHCIState *ohci,
 static inline int get_words(OHCIState *ohci,
                             uint32_t addr, uint16_t *buf, int num)
 {
+    OHCIPCIState *s = container_of(ohci, OHCIPCIState, state);
+    DMADevice *dma = &s->pci_dev.dma;
     int i;
 
     addr += ohci->localmem_base;
 
     for (i = 0; i < num; i++, buf++, addr += sizeof(*buf)) {
-        cpu_physical_memory_rw(addr, (uint8_t *)buf, sizeof(*buf), 0);
+        dma_memory_rw(dma, addr, (uint8_t *)buf, sizeof(*buf), 0);
         *buf = le16_to_cpu(*buf);
     }
 
@@ -475,13 +486,15 @@ static inline int get_words(OHCIState *ohci,
 static inline int put_words(OHCIState *ohci,
                             uint32_t addr, uint16_t *buf, int num)
 {
+    OHCIPCIState *s = container_of(ohci, OHCIPCIState, state);
+    DMADevice *dma = &s->pci_dev.dma;
     int i;
 
     addr += ohci->localmem_base;
 
     for (i = 0; i < num; i++, buf++, addr += sizeof(*buf)) {
         uint16_t tmp = cpu_to_le16(*buf);
-        cpu_physical_memory_rw(addr, (uint8_t *)&tmp, sizeof(tmp), 1);
+        dma_memory_rw(dma, addr, (uint8_t *)&tmp, sizeof(tmp), 1);
     }
 
     return 1;
@@ -509,8 +522,12 @@ static inline int ohci_read_iso_td(OHCIState *ohci,
 static inline int ohci_read_hcca(OHCIState *ohci,
                                  uint32_t addr, struct ohci_hcca *hcca)
 {
-    cpu_physical_memory_rw(addr + ohci->localmem_base,
-                           (uint8_t *)hcca, sizeof(*hcca), 0);
+    OHCIPCIState *s = container_of(ohci, OHCIPCIState, state);
+
+    dma_memory_rw(&s->pci_dev.dma,
+                  addr + ohci->localmem_base,
+                  (uint8_t *)hcca, sizeof(*hcca), 0);
+
     return 1;
 }
 
@@ -536,8 +553,12 @@ static inline int ohci_put_iso_td(OHCIState *ohci,
 static inline int ohci_put_hcca(OHCIState *ohci,
                                 uint32_t addr, struct ohci_hcca *hcca)
 {
-    cpu_physical_memory_rw(addr + ohci->localmem_base,
-                           (uint8_t *)hcca, sizeof(*hcca), 1);
+    OHCIPCIState *s = container_of(ohci, OHCIPCIState, state);
+
+    dma_memory_rw(&s->pci_dev.dma,
+                  addr + ohci->localmem_base,
+                  (uint8_t *)hcca, sizeof(*hcca), 1);
+
     return 1;
 }
 
@@ -545,6 +566,8 @@ static inline int ohci_put_hcca(OHCIState *ohci,
 static void ohci_copy_td(OHCIState *ohci, struct ohci_td *td,
                          uint8_t *buf, int len, int write)
 {
+    OHCIPCIState *s = container_of(ohci, OHCIPCIState, state);
+    DMADevice *dma = &s->pci_dev.dma;
     uint32_t ptr;
     uint32_t n;
 
@@ -552,12 +575,12 @@ static void ohci_copy_td(OHCIState *ohci, struct ohci_td *td,
     n = 0x1000 - (ptr & 0xfff);
     if (n > len)
         n = len;
-    cpu_physical_memory_rw(ptr + ohci->localmem_base, buf, n, write);
+    dma_memory_rw(dma, ptr + ohci->localmem_base, buf, n, write);
     if (n == len)
         return;
     ptr = td->be & ~0xfffu;
     buf += n;
-    cpu_physical_memory_rw(ptr + ohci->localmem_base, buf, len - n, write);
+    dma_memory_rw(dma, ptr + ohci->localmem_base, buf, len - n, write);
 }
 
 /* Read/Write the contents of an ISO TD from/to main memory.  */
@@ -565,6 +588,8 @@ static void ohci_copy_iso_td(OHCIState *ohci,
                              uint32_t start_addr, uint32_t end_addr,
                              uint8_t *buf, int len, int write)
 {
+    OHCIPCIState *s = container_of(ohci, OHCIPCIState, state);
+    DMADevice *dma = &s->pci_dev.dma;
     uint32_t ptr;
     uint32_t n;
 
@@ -572,12 +597,12 @@ static void ohci_copy_iso_td(OHCIState *ohci,
     n = 0x1000 - (ptr & 0xfff);
     if (n > len)
         n = len;
-    cpu_physical_memory_rw(ptr + ohci->localmem_base, buf, n, write);
+    dma_memory_rw(dma, ptr + ohci->localmem_base, buf, n, write);
     if (n == len)
         return;
     ptr = end_addr & ~0xfffu;
     buf += n;
-    cpu_physical_memory_rw(ptr + ohci->localmem_base, buf, len - n, write);
+    dma_memory_rw(dma, ptr + ohci->localmem_base, buf, len - n, write);
 }
 
 static void ohci_process_lists(OHCIState *ohci, int completion);
@@ -1706,11 +1731,6 @@ static void usb_ohci_init(OHCIState *ohci, DeviceState *dev,
     qemu_register_reset(ohci_reset, ohci);
 }
 
-typedef struct {
-    PCIDevice pci_dev;
-    OHCIState state;
-} OHCIPCIState;
-
 static void ohci_mapfunc(PCIDevice *pci_dev, int i,
             pcibus_t addr, pcibus_t size, int type)
 {
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH 00/13] AMD IOMMU emulation patchset
  2011-01-29 17:40 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2011-01-29 20:19   ` malc
  -1 siblings, 0 replies; 58+ messages in thread
From: malc @ 2011-01-29 20:19 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: kvm, joro, qemu-devel, blauwirbel, yamahata, paul, avi

On Sat, 29 Jan 2011, Eduard - Gabriel Munteanu wrote:

> Hi everybody,
> 
> I'm a bit late, I know, school kept me busy.
> 
> But here it is. I hope I answered your previous concerns in this patchset. Let
> me know what you think, I hope this gets merged soon. Some testing would be
> great.
> 
> The patchset is based on mst/pci. I'll send the SeaBIOS patches soon.
> 

Audio bits look fine to me.

> 
>     Cheers,
>     Eduard
> 
> Eduard - Gabriel Munteanu (13):
>   Generic DMA memory access interface
>   pci: add IOMMU support via the generic DMA layer
>   AMD IOMMU emulation
>   ide: use the DMA memory access interface for PCI IDE controllers
>   rtl8139: use the DMA memory access interface
>   eepro100: use the DMA memory access interface
>   ac97: use the DMA memory access interface
>   es1370: use the DMA memory access interface
>   e1000: use the DMA memory access interface
>   lsi53c895a: use the DMA memory access interface
>   pcnet: use the DMA memory access interface
>   usb-uhci: use the DMA memory access interface
>   usb-ohci: use the DMA memory access interface
> 
>  Makefile.target    |    2 +-
>  dma-helpers.c      |   23 ++-
>  dma.h              |    4 +-
>  hw/ac97.c          |    6 +-
>  hw/amd_iommu.c     |  694 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/dma_rw.c        |  124 ++++++++++
>  hw/dma_rw.h        |  157 ++++++++++++
>  hw/e1000.c         |   26 ++-
>  hw/eepro100.c      |   97 +++++---
>  hw/es1370.c        |    4 +-
>  hw/ide/ahci.c      |    3 +-
>  hw/ide/internal.h  |    1 +
>  hw/ide/macio.c     |    4 +-
>  hw/ide/pci.c       |   18 +-
>  hw/lsi53c895a.c    |   24 +-
>  hw/pc.c            |    2 +
>  hw/pci.c           |    7 +
>  hw/pci.h           |    7 +
>  hw/pci_ids.h       |    2 +
>  hw/pci_internals.h |    1 +
>  hw/pci_regs.h      |    1 +
>  hw/pcnet-pci.c     |    5 +-
>  hw/rtl8139.c       |  100 +++++----
>  hw/usb-ohci.c      |   54 +++--
>  hw/usb-uhci.c      |   26 +-
>  25 files changed, 1233 insertions(+), 159 deletions(-)
>  create mode 100644 hw/amd_iommu.c
>  create mode 100644 hw/dma_rw.c
>  create mode 100644 hw/dma_rw.h
> 
> 

-- 
mailto:av1474@comtv.ru

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 0/3] SeaBIOS AMD IOMMU initialization patches
  2011-01-29 17:40 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2011-02-03 23:24   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 58+ messages in thread
From: Eduard - Gabriel Munteanu @ 2011-02-03 23:24 UTC (permalink / raw)
  To: seabios
  Cc: kevin, mst, joro, blauwirbel, paul, avi, anthony, av1474,
	yamahata, kvm, qemu-devel, Eduard - Gabriel Munteanu

Hi,

Here are the SeaBIOS parts that initialize the AMD IOMMU.

I was told an ack from other QEMU/KVM developers would be nice, so please have
a look.


    Thanks,
    Eduard

Eduard - Gabriel Munteanu (3):
  pci: add pci_find_capability() helper
  AMD IOMMU support
  Clarify address space layout.

 src/acpi.c     |   84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/config.h   |    8 ++++-
 src/pci.c      |   15 ++++++++++
 src/pci.h      |    1 +
 src/pci_ids.h  |    1 +
 src/pci_regs.h |    1 +
 src/pciinit.c  |   29 +++++++++++++++++++
 7 files changed, 137 insertions(+), 2 deletions(-)

-- 
1.7.3.4


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 1/3] pci: add pci_find_capability() helper
  2011-02-03 23:24   ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2011-02-03 23:24     ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 58+ messages in thread
From: Eduard - Gabriel Munteanu @ 2011-02-03 23:24 UTC (permalink / raw)
  To: seabios
  Cc: kevin, mst, joro, blauwirbel, paul, avi, anthony, av1474,
	yamahata, kvm, qemu-devel, Eduard - Gabriel Munteanu

pci_find_capability() looks up a given capability and returns its
offset. This is needed by AMD IOMMU initialization code.
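
Typical use, as in the IOMMU bring-up later in this series: look up the
Secure Device capability on a candidate device and bail out when it is
absent (sketch mirroring patch 2/3):

    int cap = pci_find_capability(bdf, PCI_CAP_ID_SEC);
    if (cap < 0)
        return;    /* no AMD IOMMU capability on this device */
    /* 'cap' is now the config-space offset of the capability header */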

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 src/pci.c |   15 +++++++++++++++
 src/pci.h |    1 +
 2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/src/pci.c b/src/pci.c
index 944a393..57caba6 100644
--- a/src/pci.c
+++ b/src/pci.c
@@ -185,6 +185,21 @@ pci_find_class(u16 classid)
     return -1;
 }
 
+int pci_find_capability(int bdf, u8 capid)
+{
+    int ptr, cap;
+
+    ptr = PCI_CAPABILITY_LIST;
+    do {
+        cap = pci_config_readb(bdf, ptr);
+        if (pci_config_readb(bdf, cap) == capid)
+            return cap;
+        ptr = cap + PCI_CAP_LIST_NEXT;
+    } while (cap);
+
+    return -1;
+}
+
 int *PCIpaths;
 
 // Build the PCI path designations.
diff --git a/src/pci.h b/src/pci.h
index 9869a26..bf0d1b8 100644
--- a/src/pci.h
+++ b/src/pci.h
@@ -46,6 +46,7 @@ void pci_config_maskw(u16 bdf, u32 addr, u16 off, u16 on);
 int pci_find_vga(void);
 int pci_find_device(u16 vendid, u16 devid);
 int pci_find_class(u16 classid);
+int pci_find_capability(int bdf, u8 capid);
 
 #define PP_ROOT      (1<<17)
 #define PP_PCIBRIDGE (1<<18)
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 2/3] AMD IOMMU support
  2011-02-03 23:24   ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2011-02-03 23:24     ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 58+ messages in thread
From: Eduard - Gabriel Munteanu @ 2011-02-03 23:24 UTC (permalink / raw)
  To: seabios
  Cc: kevin, mst, joro, blauwirbel, paul, avi, anthony, av1474,
	yamahata, kvm, qemu-devel, Eduard - Gabriel Munteanu

This initializes the AMD IOMMU and creates ACPI tables for it.
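
Two concrete points may help when reading the code below. First, the MMIO
window reserved in config.h spans 0xfed00000-0xfee00000 and each IOMMU claims
0x4000 bytes of it, so the allocator in pciinit.c can place up to 64 IOMMUs
before running out of space. Second, each covered device gets a 4-byte IVHD
entry; a worked example of what the build_ivrs() loop emits for a device at
bus 0, slot 2, function 0 (bdf 0x0010), assuming the type/devid/settings
4-byte entry layout of the spec revision cited in the comment:

    /* worked example (sketch): IVHD entry bytes for bdf = 0x0010 */
    u8 entry[4];
    entry[0] = 2;                     /* 4-byte device-select entry type */
    entry[1] = 0x0010 & 0xFF;         /* device id, low byte  -> 0x10    */
    entry[2] = (0x0010 >> 8) & 0xFF;  /* device id, high byte -> 0x00    */
    entry[3] = (u8)~(1 << 3);         /* data settings byte   -> 0xf7    */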

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 src/acpi.c     |   84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/config.h   |    3 ++
 src/pci_ids.h  |    1 +
 src/pci_regs.h |    1 +
 src/pciinit.c  |   29 +++++++++++++++++++
 5 files changed, 118 insertions(+), 0 deletions(-)

diff --git a/src/acpi.c b/src/acpi.c
index 18830dc..fca152c 100644
--- a/src/acpi.c
+++ b/src/acpi.c
@@ -196,6 +196,36 @@ struct srat_memory_affinity
     u32    reserved3[2];
 } PACKED;
 
+/*
+ * IVRS (I/O Virtualization Reporting Structure) table.
+ *
+ * Describes the AMD IOMMU, as per:
+ * "AMD I/O Virtualization Technology (IOMMU) Specification", rev 1.26
+ */
+
+struct ivrs_ivhd
+{
+    u8    type;
+    u8    flags;
+    u16   length;
+    u16   devid;
+    u16   capab_off;
+    u32   iommu_base_low;
+    u32   iommu_base_high;
+    u16   pci_seg_group;
+    u16   iommu_info;
+    u32   reserved;
+    u8    entry[0];
+} PACKED;
+
+struct ivrs_table
+{
+    ACPI_TABLE_HEADER_DEF    /* ACPI common table header. */
+    u32                iv_info;
+    u32                reserved[2];
+    struct ivrs_ivhd   ivhd;
+} PACKED;
+
 #include "acpi-dsdt.hex"
 
 static void
@@ -579,6 +609,59 @@ build_srat(void)
     return srat;
 }
 
+#define IVRS_SIGNATURE 0x53525649 // IVRS
+#define IVRS_MAX_DEVS  32
+static void *
+build_ivrs(void)
+{
+    int iommu_bdf, iommu_cap;
+    int bdf, max, i;
+    struct ivrs_table *ivrs;
+    struct ivrs_ivhd *ivhd;
+
+    /* Note this currently works for a single IOMMU! */
+    iommu_bdf = pci_find_class(PCI_CLASS_SYSTEM_IOMMU);
+    if (iommu_bdf < 0)
+        return NULL;
+    iommu_cap = pci_find_capability(iommu_bdf, PCI_CAP_ID_SEC);
+    if (iommu_cap < 0)
+        return NULL;
+
+    ivrs = malloc_high(sizeof(struct ivrs_table) + 4 * IVRS_MAX_DEVS);
+    ivrs->iv_info = pci_config_readw(iommu_bdf, iommu_cap + 0x12) & ~0x000F;
+
+    ivhd = &ivrs->ivhd;
+    ivhd->type              = 0x10;
+    ivhd->flags             = 0;
+    ivhd->length            = sizeof(struct ivrs_ivhd);
+    ivhd->devid             = iommu_bdf;
+    ivhd->capab_off         = iommu_cap;
+    ivhd->iommu_base_low    = pci_config_readl(iommu_bdf, iommu_cap + 0x04) &
+                              0xFFFFFFFE;
+    ivhd->iommu_base_high   = pci_config_readl(iommu_bdf, iommu_cap + 0x08);
+    ivhd->pci_seg_group     = 0;
+    ivhd->iommu_info        = 0;
+    ivhd->reserved          = 0;
+
+    i = 0;
+    foreachpci(bdf, max) {
+        if (bdf == ivhd->devid)
+            continue;
+        ivhd->entry[4 * i + 0] = 2;
+        ivhd->entry[4 * i + 1] = bdf & 0xFF;
+        ivhd->entry[4 * i + 2] = (bdf >> 8) & 0xFF;
+        ivhd->entry[4 * i + 3] = ~(1 << 3);
+        ivhd->length += 4;
+        if (++i >= IVRS_MAX_DEVS)
+            break;
+    }
+
+    build_header((void *) ivrs, IVRS_SIGNATURE,
+                 sizeof(struct ivrs_table) + 4 * i, 1);
+
+    return ivrs;
+}
+
 static const struct pci_device_id acpi_find_tbl[] = {
     /* PIIX4 Power Management device. */
     PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82371AB_3, NULL),
@@ -625,6 +708,7 @@ acpi_bios_init(void)
     ACPI_INIT_TABLE(build_madt());
     ACPI_INIT_TABLE(build_hpet());
     ACPI_INIT_TABLE(build_srat());
+    ACPI_INIT_TABLE(build_ivrs());
 
     u16 i, external_tables = qemu_cfg_acpi_additional_tables();
 
diff --git a/src/config.h b/src/config.h
index 6356941..0ba5723 100644
--- a/src/config.h
+++ b/src/config.h
@@ -172,6 +172,9 @@
 #define BUILD_APIC_ADDR           0xfee00000
 #define BUILD_IOAPIC_ADDR         0xfec00000
 
+#define BUILD_AMD_IOMMU_START     0xfed00000
+#define BUILD_AMD_IOMMU_END       0xfee00000    /* BUILD_APIC_ADDR */
+
 #define BUILD_SMM_INIT_ADDR       0x38000
 #define BUILD_SMM_ADDR            0xa8000
 #define BUILD_SMM_SIZE            0x8000
diff --git a/src/pci_ids.h b/src/pci_ids.h
index e1cded2..3cc3c6e 100644
--- a/src/pci_ids.h
+++ b/src/pci_ids.h
@@ -72,6 +72,7 @@
 #define PCI_CLASS_SYSTEM_RTC		0x0803
 #define PCI_CLASS_SYSTEM_PCI_HOTPLUG	0x0804
 #define PCI_CLASS_SYSTEM_SDHCI		0x0805
+#define PCI_CLASS_SYSTEM_IOMMU		0x0806
 #define PCI_CLASS_SYSTEM_OTHER		0x0880
 
 #define PCI_BASE_CLASS_INPUT		0x09
diff --git a/src/pci_regs.h b/src/pci_regs.h
index e5effd4..bfac824 100644
--- a/src/pci_regs.h
+++ b/src/pci_regs.h
@@ -208,6 +208,7 @@
 #define  PCI_CAP_ID_SHPC 	0x0C	/* PCI Standard Hot-Plug Controller */
 #define  PCI_CAP_ID_SSVID	0x0D	/* Bridge subsystem vendor/device ID */
 #define  PCI_CAP_ID_AGP3	0x0E	/* AGP Target PCI-PCI bridge */
+#define  PCI_CAP_ID_SEC		0x0F	/* Secure Device (AMD IOMMU) */
 #define  PCI_CAP_ID_EXP 	0x10	/* PCI Express */
 #define  PCI_CAP_ID_MSIX	0x11	/* MSI-X */
 #define PCI_CAP_LIST_NEXT	1	/* Next capability in the list */
diff --git a/src/pciinit.c b/src/pciinit.c
index ee2e72d..4ebcfbe 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -21,6 +21,8 @@ static struct pci_region pci_bios_io_region;
 static struct pci_region pci_bios_mem_region;
 static struct pci_region pci_bios_prefmem_region;
 
+static u32 amd_iommu_addr;
+
 /* host irqs corresponding to PCI irqs A-D */
 const u8 pci_irqs[4] = {
     10, 10, 11, 11
@@ -256,6 +258,27 @@ static void apple_macio_init(u16 bdf, void *arg)
     pci_set_io_region_addr(bdf, 0, 0x80800000);
 }
 
+static void amd_iommu_init(u16 bdf, void *arg)
+{
+    int cap;
+    u32 base_addr;
+
+    cap = pci_find_capability(bdf, PCI_CAP_ID_SEC);
+    if (cap < 0) {
+        return;
+    }
+
+    if (amd_iommu_addr >= BUILD_AMD_IOMMU_END) {
+        return;
+    }
+    base_addr = amd_iommu_addr;
+    amd_iommu_addr += 0x4000;
+
+    pci_config_writel(bdf, cap + 0x0C, 0);
+    pci_config_writel(bdf, cap + 0x08, 0);
+    pci_config_writel(bdf, cap + 0x04, base_addr | 1);
+}
+
 static const struct pci_device_id pci_class_tbl[] = {
     /* STORAGE IDE */
     PCI_DEVICE_CLASS(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82371SB_1,
@@ -279,6 +302,10 @@ static const struct pci_device_id pci_class_tbl[] = {
     PCI_DEVICE_CLASS(PCI_ANY_ID, PCI_ANY_ID, PCI_CLASS_BRIDGE_PCI,
                      pci_bios_init_device_bridge),
 
+    /* AMD IOMMU */
+    PCI_DEVICE_CLASS(PCI_ANY_ID, PCI_ANY_ID, PCI_CLASS_SYSTEM_IOMMU,
+                     amd_iommu_init),
+
     /* default */
     PCI_DEVICE(PCI_ANY_ID, PCI_ANY_ID, pci_bios_allocate_regions),
 
@@ -408,6 +435,8 @@ pci_setup(void)
     pci_region_init(&pci_bios_prefmem_region,
                     BUILD_PCIPREFMEM_START, BUILD_PCIPREFMEM_END - 1);
 
+    amd_iommu_addr = BUILD_AMD_IOMMU_START;
+
     pci_bios_init_bus();
 
     int bdf, max;
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 3/3] Clarify address space layout.
  2011-02-03 23:24   ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2011-02-03 23:24     ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 58+ messages in thread
From: Eduard - Gabriel Munteanu @ 2011-02-03 23:24 UTC (permalink / raw)
  To: seabios
  Cc: kevin, mst, joro, blauwirbel, paul, avi, anthony, av1474,
	yamahata, kvm, qemu-devel, Eduard - Gabriel Munteanu

This clarifies the address space layout by commenting on where the APIC,
IOAPIC and AMD IOMMU BUILD_* regions end.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 src/config.h |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/config.h b/src/config.h
index 0ba5723..6ab2071 100644
--- a/src/config.h
+++ b/src/config.h
@@ -169,11 +169,12 @@
 #define BUILD_PCIPREFMEM_END      0xfec00000    /* IOAPIC is mapped at */
 #endif
 
-#define BUILD_APIC_ADDR           0xfee00000
-#define BUILD_IOAPIC_ADDR         0xfec00000
+#define BUILD_IOAPIC_ADDR         0xfec00000    /* Ends at +0x100000. */
 
-#define BUILD_AMD_IOMMU_START     0xfed00000
-#define BUILD_AMD_IOMMU_END       0xfee00000    /* BUILD_APIC_ADDR */
+#define BUILD_AMD_IOMMU_START     0xfed00000    /* Can be safely relocated. */
+#define BUILD_AMD_IOMMU_END       0xfee00000
+
+#define BUILD_APIC_ADDR           0xfee00000    /* Ends at +0x100000. */
 
 #define BUILD_SMM_INIT_ADDR       0x38000
 #define BUILD_SMM_ADDR            0xa8000
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/3] AMD IOMMU support
  2011-02-03 23:24     ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2011-02-04  2:37       ` Isaku Yamahata
  -1 siblings, 0 replies; 58+ messages in thread
From: Isaku Yamahata @ 2011-02-04  2:37 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: seabios, kevin, mst, joro, blauwirbel, paul, avi, anthony,
	av1474, kvm, qemu-devel

On Fri, Feb 04, 2011 at 01:24:14AM +0200, Eduard - Gabriel Munteanu wrote:
> This initializes the AMD IOMMU and creates ACPI tables for it.
> 
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
>  src/acpi.c     |   84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  src/config.h   |    3 ++
>  src/pci_ids.h  |    1 +
>  src/pci_regs.h |    1 +
>  src/pciinit.c  |   29 +++++++++++++++++++
>  5 files changed, 118 insertions(+), 0 deletions(-)
> 
> diff --git a/src/acpi.c b/src/acpi.c
> index 18830dc..fca152c 100644
> --- a/src/acpi.c
> +++ b/src/acpi.c
> @@ -196,6 +196,36 @@ struct srat_memory_affinity
>      u32    reserved3[2];
>  } PACKED;
>  
> +/*
> + * IVRS (I/O Virtualization Reporting Structure) table.
> + *
> + * Describes the AMD IOMMU, as per:
> + * "AMD I/O Virtualization Technology (IOMMU) Specification", rev 1.26
> + */
> +
> +struct ivrs_ivhd
> +{
> +    u8    type;
> +    u8    flags;
> +    u16   length;
> +    u16   devid;
> +    u16   capab_off;
> +    u32   iommu_base_low;
> +    u32   iommu_base_high;
> +    u16   pci_seg_group;
> +    u16   iommu_info;
> +    u32   reserved;
> +    u8    entry[0];
> +} PACKED;
> +
> +struct ivrs_table
> +{
> +    ACPI_TABLE_HEADER_DEF    /* ACPI common table header. */
> +    u32                iv_info;
> +    u32                reserved[2];
> +    struct ivrs_ivhd   ivhd;
> +} PACKED;
> +
>  #include "acpi-dsdt.hex"
>  
>  static void
> @@ -579,6 +609,59 @@ build_srat(void)
>      return srat;
>  }
>  
> +#define IVRS_SIGNATURE 0x53525649 // IVRS
> +#define IVRS_MAX_DEVS  32
> +static void *
> +build_ivrs(void)
> +{
> +    int iommu_bdf, iommu_cap;
> +    int bdf, max, i;
> +    struct ivrs_table *ivrs;
> +    struct ivrs_ivhd *ivhd;
> +
> +    /* Note this currently works for a single IOMMU! */
> +    iommu_bdf = pci_find_class(PCI_CLASS_SYSTEM_IOMMU);
> +    if (iommu_bdf < 0)
> +        return NULL;
> +    iommu_cap = pci_find_capability(iommu_bdf, PCI_CAP_ID_SEC);
> +    if (iommu_cap < 0)
> +        return NULL;
> +
> +    ivrs = malloc_high(sizeof(struct ivrs_table) + 4 * IVRS_MAX_DEVS);
> +    ivrs->iv_info = pci_config_readw(iommu_bdf, iommu_cap + 0x12) & ~0x000F;
> +
> +    ivhd = &ivrs->ivhd;
> +    ivhd->type              = 0x10;
> +    ivhd->flags             = 0;
> +    ivhd->length            = sizeof(struct ivrs_ivhd);
> +    ivhd->devid             = iommu_bdf;
> +    ivhd->capab_off         = iommu_cap;
> +    ivhd->iommu_base_low    = pci_config_readl(iommu_bdf, iommu_cap + 0x04) &
> +                              0xFFFFFFFE;
> +    ivhd->iommu_base_high   = pci_config_readl(iommu_bdf, iommu_cap + 0x08);
> +    ivhd->pci_seg_group     = 0;
> +    ivhd->iommu_info        = 0;
> +    ivhd->reserved          = 0;
> +
> +    i = 0;
> +    foreachpci(bdf, max) {
> +        if (bdf == ivhd->devid)
> +            continue;
> +        ivhd->entry[4 * i + 0] = 2;
> +        ivhd->entry[4 * i + 1] = bdf & 0xFF;
> +        ivhd->entry[4 * i + 2] = (bdf >> 8) & 0xFF;
> +        ivhd->entry[4 * i + 3] = ~(1 << 3);
> +        ivhd->length += 4;
> +        if (++i >= IVRS_MAX_DEVS)
> +            break;
> +    }
> +
> +    build_header((void *) ivrs, IVRS_SIGNATURE,
> +                 sizeof(struct ivrs_table) + 4 * i, 1);
> +
> +    return ivrs;
> +}
> +
>  static const struct pci_device_id acpi_find_tbl[] = {
>      /* PIIX4 Power Management device. */
>      PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82371AB_3, NULL),
> @@ -625,6 +708,7 @@ acpi_bios_init(void)
>      ACPI_INIT_TABLE(build_madt());
>      ACPI_INIT_TABLE(build_hpet());
>      ACPI_INIT_TABLE(build_srat());
> +    ACPI_INIT_TABLE(build_ivrs());
>  
>      u16 i, external_tables = qemu_cfg_acpi_additional_tables();
>  
> diff --git a/src/config.h b/src/config.h
> index 6356941..0ba5723 100644
> --- a/src/config.h
> +++ b/src/config.h
> @@ -172,6 +172,9 @@
>  #define BUILD_APIC_ADDR           0xfee00000
>  #define BUILD_IOAPIC_ADDR         0xfec00000
>  
> +#define BUILD_AMD_IOMMU_START     0xfed00000
> +#define BUILD_AMD_IOMMU_END       0xfee00000    /* BUILD_APIC_ADDR */
> +
>  #define BUILD_SMM_INIT_ADDR       0x38000
>  #define BUILD_SMM_ADDR            0xa8000
>  #define BUILD_SMM_SIZE            0x8000
> diff --git a/src/pci_ids.h b/src/pci_ids.h
> index e1cded2..3cc3c6e 100644
> --- a/src/pci_ids.h
> +++ b/src/pci_ids.h
> @@ -72,6 +72,7 @@
>  #define PCI_CLASS_SYSTEM_RTC		0x0803
>  #define PCI_CLASS_SYSTEM_PCI_HOTPLUG	0x0804
>  #define PCI_CLASS_SYSTEM_SDHCI		0x0805
> +#define PCI_CLASS_SYSTEM_IOMMU		0x0806
>  #define PCI_CLASS_SYSTEM_OTHER		0x0880
>  
>  #define PCI_BASE_CLASS_INPUT		0x09
> diff --git a/src/pci_regs.h b/src/pci_regs.h
> index e5effd4..bfac824 100644
> --- a/src/pci_regs.h
> +++ b/src/pci_regs.h
> @@ -208,6 +208,7 @@
>  #define  PCI_CAP_ID_SHPC 	0x0C	/* PCI Standard Hot-Plug Controller */
>  #define  PCI_CAP_ID_SSVID	0x0D	/* Bridge subsystem vendor/device ID */
>  #define  PCI_CAP_ID_AGP3	0x0E	/* AGP Target PCI-PCI bridge */
> +#define  PCI_CAP_ID_SEC		0x0F	/* Secure Device (AMD IOMMU) */
>  #define  PCI_CAP_ID_EXP 	0x10	/* PCI Express */
>  #define  PCI_CAP_ID_MSIX	0x11	/* MSI-X */
>  #define PCI_CAP_LIST_NEXT	1	/* Next capability in the list */
> diff --git a/src/pciinit.c b/src/pciinit.c
> index ee2e72d..4ebcfbe 100644
> --- a/src/pciinit.c
> +++ b/src/pciinit.c
> @@ -21,6 +21,8 @@ static struct pci_region pci_bios_io_region;
>  static struct pci_region pci_bios_mem_region;
>  static struct pci_region pci_bios_prefmem_region;
>  
> +static u32 amd_iommu_addr;
> +
>  /* host irqs corresponding to PCI irqs A-D */
>  const u8 pci_irqs[4] = {
>      10, 10, 11, 11
> @@ -256,6 +258,27 @@ static void apple_macio_init(u16 bdf, void *arg)
>      pci_set_io_region_addr(bdf, 0, 0x80800000);
>  }
>  
> +static void amd_iommu_init(u16 bdf, void *arg)
> +{
> +    int cap;
> +    u32 base_addr;
> +
> +    cap = pci_find_capability(bdf, PCI_CAP_ID_SEC);
> +    if (cap < 0) {
> +        return;
> +    }
> +
> +    if (amd_iommu_addr >= BUILD_AMD_IOMMU_END) {
> +        return;
> +    }
> +    base_addr = amd_iommu_addr;
> +    amd_iommu_addr += 0x4000;
> +
> +    pci_config_writel(bdf, cap + 0x0C, 0);
> +    pci_config_writel(bdf, cap + 0x08, 0);
> +    pci_config_writel(bdf, cap + 0x04, base_addr | 1);
> +}
> +
>  static const struct pci_device_id pci_class_tbl[] = {
>      /* STORAGE IDE */
>      PCI_DEVICE_CLASS(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82371SB_1,
> @@ -279,6 +302,10 @@ static const struct pci_device_id pci_class_tbl[] = {
>      PCI_DEVICE_CLASS(PCI_ANY_ID, PCI_ANY_ID, PCI_CLASS_BRIDGE_PCI,
>                       pci_bios_init_device_bridge),
>  
> +    /* AMD IOMMU */
> +    PCI_DEVICE_CLASS(PCI_ANY_ID, PCI_ANY_ID, PCI_CLASS_SYSTEM_IOMMU,
> +                     amd_iommu_init),
> +
>      /* default */
>      PCI_DEVICE(PCI_ANY_ID, PCI_ANY_ID, pci_bios_allocate_regions),
>  
> @@ -408,6 +435,8 @@ pci_setup(void)
>      pci_region_init(&pci_bios_prefmem_region,
>                      BUILD_PCIPREFMEM_START, BUILD_PCIPREFMEM_END - 1);
>  
> +    amd_iommu_addr = BUILD_AMD_IOMMU_START;
> +

Minor nit. How about static initialization?
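
(A minimal sketch of that suggestion, assuming the variable keeps its
current name; the explicit assignment in pci_setup() would then go away:

    static u32 amd_iommu_addr = BUILD_AMD_IOMMU_START;

BUILD_AMD_IOMMU_START is a compile-time constant, so a static
initializer is sufficient here.)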


>      pci_bios_init_bus();
>  
>      int bdf, max;
> -- 
> 1.7.3.4
> 

-- 
yamahata

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 01/13] Generic DMA memory access interface
  2011-01-29 17:40   ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2011-02-05 10:20     ` Blue Swirl
  -1 siblings, 0 replies; 58+ messages in thread
From: Blue Swirl @ 2011-02-05 10:20 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: mst, seabios, kevin, joro, paul, avi, anthony, av1474, yamahata,
	kvm, qemu-devel

On Thu, Feb 3, 2011 at 11:32 PM, Eduard - Gabriel Munteanu
<eduard.munteanu@linux360.ro> wrote:
> This introduces replacements for memory access functions like
> cpu_physical_memory_read(). The new interface can handle address
> translation and access checking through an IOMMU.
>
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
>  Makefile.target |    2 +-
>  hw/dma_rw.c     |  124 +++++++++++++++++++++++++++++++++++++++++++
>  hw/dma_rw.h     |  157 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 282 insertions(+), 1 deletions(-)
>  create mode 100644 hw/dma_rw.c
>  create mode 100644 hw/dma_rw.h
>
> diff --git a/Makefile.target b/Makefile.target
> index e15b1c4..e5817ab 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -218,7 +218,7 @@ obj-i386-y += cirrus_vga.o apic.o ioapic.o piix_pci.o
>  obj-i386-y += vmmouse.o vmport.o hpet.o applesmc.o
>  obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
>  obj-i386-y += debugcon.o multiboot.o
> -obj-i386-y += pc_piix.o
> +obj-i386-y += pc_piix.o dma_rw.o
>  obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
>
>  # shared objects
> diff --git a/hw/dma_rw.c b/hw/dma_rw.c
> new file mode 100644
> index 0000000..ef8e7f8
> --- /dev/null
> +++ b/hw/dma_rw.c
> @@ -0,0 +1,124 @@
> +/*
> + * Generic DMA memory access interface.
> + *
> + * Copyright (c) 2011 Eduard - Gabriel Munteanu
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "dma_rw.h"
> +#include "range.h"
> +
> +static void dma_register_memory_map(DMADevice *dev,
> +                                    dma_addr_t addr,
> +                                    dma_addr_t len,
> +                                    target_phys_addr_t paddr,
> +                                    DMAInvalidateMapFunc *invalidate,
> +                                    void *invalidate_opaque)
> +{
> +    DMAMemoryMap *map;
> +
> +    map = qemu_malloc(sizeof(DMAMemoryMap));
> +    map->addr               = addr;
> +    map->len                = len;
> +    map->paddr              = paddr;
> +    map->invalidate         = invalidate;
> +    map->invalidate_opaque  = invalidate_opaque;
> +
> +    QLIST_INSERT_HEAD(&dev->mmu->memory_maps, map, list);
> +}
> +
> +static void dma_unregister_memory_map(DMADevice *dev,
> +                                      target_phys_addr_t paddr,
> +                                      dma_addr_t len)
> +{
> +    DMAMemoryMap *map;
> +
> +    QLIST_FOREACH(map, &dev->mmu->memory_maps, list) {
> +        if (map->paddr == paddr && map->len == len) {
> +            QLIST_REMOVE(map, list);
> +            free(map);
> +        }
> +    }
> +}
> +
> +void dma_invalidate_memory_range(DMADevice *dev,
> +                                 dma_addr_t addr,
> +                                 dma_addr_t len)
> +{
> +    DMAMemoryMap *map;
> +
> +    QLIST_FOREACH(map, &dev->mmu->memory_maps, list) {
> +        if (ranges_overlap(addr, len, map->addr, map->len)) {
> +            map->invalidate(map->invalidate_opaque);
> +            QLIST_REMOVE(map, list);
> +            free(map);
> +        }
> +    }
> +}
> +
> +void *dma_memory_map(DMADevice *dev,
> +                     DMAInvalidateMapFunc *cb,
> +                     void *opaque,
> +                     dma_addr_t addr,
> +                     dma_addr_t *len,
> +                     int is_write)
> +{
> +    int err;
> +    target_phys_addr_t paddr, plen;
> +
> +    if (!dev || !dev->mmu) {
> +        return cpu_physical_memory_map(addr, len, is_write);
> +    }
> +
> +    plen = *len;
> +    err = dev->mmu->translate(dev, addr, &paddr, &plen, is_write);
> +    if (err) {
> +        return NULL;
> +    }
> +
> +    /*
> +     * If this is true, the virtual region is contiguous,
> +     * but the translated physical region isn't. We just
> +     * clamp *len, much like cpu_physical_memory_map() does.
> +     */
> +    if (plen < *len) {
> +        *len = plen;
> +    }
> +
> +    /* We treat maps as remote TLBs to cope with stuff like AIO. */
> +    if (cb) {
> +        dma_register_memory_map(dev, addr, *len, paddr, cb, opaque);
> +    }
> +
> +    return cpu_physical_memory_map(paddr, len, is_write);
> +}
> +
> +void dma_memory_unmap(DMADevice *dev,
> +                      void *buffer,
> +                      dma_addr_t len,
> +                      int is_write,
> +                      dma_addr_t access_len)
> +{
> +    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
> +    if (dev && dev->mmu) {
> +        dma_unregister_memory_map(dev, (target_phys_addr_t) buffer, len);
> +    }
> +}
> +
> diff --git a/hw/dma_rw.h b/hw/dma_rw.h
> new file mode 100644
> index 0000000..bc93511
> --- /dev/null
> +++ b/hw/dma_rw.h
> @@ -0,0 +1,157 @@
> +#ifndef DMA_RW_H
> +#define DMA_RW_H
> +
> +#include "qemu-common.h"
> +
> +typedef uint64_t dma_addr_t;
> +
> +typedef struct DMAMmu DMAMmu;
> +typedef struct DMADevice DMADevice;
> +typedef struct DMAMemoryMap DMAMemoryMap;
> +
> +typedef int DMATranslateFunc(DMADevice *dev,
> +                             dma_addr_t addr,
> +                             dma_addr_t *paddr,
> +                             dma_addr_t *len,
> +                             int is_write);
> +
> +typedef void DMAInvalidateMapFunc(void *);
> +
> +struct DMAMmu {
> +    DeviceState *iommu;
> +    DMATranslateFunc *translate;
> +    QLIST_HEAD(memory_maps, DMAMemoryMap) memory_maps;
> +};
> +
> +struct DMADevice {
> +    DMAMmu *mmu;
> +};
> +
> +struct DMAMemoryMap {
> +    dma_addr_t              addr;
> +    dma_addr_t              len;
> +    target_phys_addr_t      paddr;
> +    DMAInvalidateMapFunc    *invalidate;
> +    void                    *invalidate_opaque;
> +
> +    QLIST_ENTRY(DMAMemoryMap) list;
> +};
> +
> +static inline void dma_memory_rw(DMADevice *dev,
> +                                 dma_addr_t addr,
> +                                 void *buf,
> +                                 dma_addr_t len,
> +                                 int is_write)
> +{
> +    dma_addr_t paddr, plen;
> +    int err;
> +
> +    /*
> +     * Fast-path non-iommu.
> +     * More importantly, makes it obvious what this function does.
> +     */
> +    if (!dev || !dev->mmu) {
> +        cpu_physical_memory_rw(addr, buf, plen, is_write);
> +        return;
> +    }
> +
> +    while (len) {
> +        err = dev->mmu->translate(dev, addr, &paddr, &plen, is_write);
> +        if (err) {
> +            return;
> +        }
> +
> +        /* The translation might be valid for larger regions. */
> +        if (plen > len) {
> +            plen = len;
> +        }
> +
> +        cpu_physical_memory_rw(paddr, buf, plen, is_write);
> +
> +        len -= plen;
> +        addr += plen;
> +        buf += plen;
> +    }
> +}
> +
> +static inline void dma_memory_read(DMADevice *dev,
> +                                   dma_addr_t addr,
> +                                   void *buf,
> +                                   dma_addr_t len)
> +{
> +    dma_memory_rw(dev, addr, buf, len, 0);
> +}
> +
> +static inline void dma_memory_write(DMADevice *dev,
> +                                    dma_addr_t addr,
> +                                    const void *buf,
> +                                    dma_addr_t len)
> +{
> +    dma_memory_rw(dev, addr, (void *) buf, len, 1);
> +}
> +
> +void *dma_memory_map(DMADevice *dev,
> +                     DMAInvalidateMapFunc *cb,
> +                     void *opaque,
> +                     dma_addr_t addr,
> +                     dma_addr_t *len,
> +                     int is_write);
> +void dma_memory_unmap(DMADevice *dev,
> +                      void *buffer,
> +                      dma_addr_t len,
> +                      int is_write,
> +                      dma_addr_t access_len);
> +
> +
> +void dma_invalidate_memory_range(DMADevice *dev,
> +                                 dma_addr_t addr,
> +                                 dma_addr_t len);
> +
> +
> +#define DEFINE_DMA_LD(suffix, size)                                       \
> +static inline uint##size##_t                                              \
> +dma_ld##suffix(DMADevice *dev, dma_addr_t addr)                           \
> +{                                                                         \
> +    int err;                                                              \
> +    dma_addr_t paddr, plen;                                               \
> +                                                                          \
> +    if (!dev || !dev->mmu) {                                              \
> +        return ld##suffix##_phys(addr);                                   \
> +    }                                                                     \
> +                                                                          \
> +    err = dev->mmu->translate(dev, addr, &paddr, &plen, 0);               \
> +    if (err || (plen < size / 8))                                         \

If the access is unaligned and the translation splits it in two (for
example, because of a page boundary), the access is ignored, which can't
be correct.

Do we have such cases? If yes, should this be handled by the caller
instead (maybe not)?
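
(One way to cope with a load that the translation cannot cover in one
piece is to fall back to the splitting read path and reassemble the
value from a bounce buffer. A rough sketch, not part of the patch,
assuming a little-endian value and the helpers introduced above:

    static inline uint32_t dma_ldl_split(DMADevice *dev, dma_addr_t addr)
    {
        uint32_t val;

        /* dma_memory_read() loops over translation boundaries, so an
         * access that straddles a page is carried out piecewise. */
        dma_memory_read(dev, addr, &val, sizeof(val));
        return le32_to_cpu(val);
    }

The name dma_ldl_split and the endianness choice are illustrative only.)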

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 00/13] AMD IOMMU emulation patchset (reworked cc/to)
  2011-01-29 17:40 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2011-02-05 13:07   ` Blue Swirl
  -1 siblings, 0 replies; 58+ messages in thread
From: Blue Swirl @ 2011-02-05 13:07 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: mst, seabios, kevin, joro, paul, avi, anthony, av1474, yamahata,
	kvm, qemu-devel

On Thu, Feb 3, 2011 at 11:32 PM, Eduard - Gabriel Munteanu
<eduard.munteanu@linux360.ro> wrote:
> Hi again,
>
> Sorry for the mess, I forgot to cc Michael and this should go through his tree.
> I'm also cc-ing the SeaBIOS people.
>
> malc already ack-ed the audio bits.

Please use scripts/checkpatch.pl to check for whitespace, brace, etc.
issues. The patches (except for 01) look fine to me otherwise.
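
For instance, it can be run directly on the generated patch files before
posting; the file name here is only illustrative:

    scripts/checkpatch.pl 0001-Generic-DMA-memory-access-interface.patch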

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 03/13] AMD IOMMU emulation
  2011-01-29 17:40   ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2011-02-06 10:54     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 58+ messages in thread
From: Michael S. Tsirkin @ 2011-02-06 10:54 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: seabios, kevin, joro, blauwirbel, paul, avi, anthony, av1474,
	yamahata, kvm, qemu-devel

On Fri, Feb 04, 2011 at 01:32:57AM +0200, Eduard - Gabriel Munteanu wrote:
> This introduces emulation for the AMD IOMMU, described in "AMD I/O
> Virtualization Technology (IOMMU) Specification".
> 
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
>  Makefile.target |    2 +-
>  hw/amd_iommu.c  |  694 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/pc.c         |    2 +
>  hw/pci_ids.h    |    2 +
>  hw/pci_regs.h   |    1 +
>  5 files changed, 700 insertions(+), 1 deletions(-)
>  create mode 100644 hw/amd_iommu.c
> 
> diff --git a/Makefile.target b/Makefile.target
> index e5817ab..4b650bd 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -218,7 +218,7 @@ obj-i386-y += cirrus_vga.o apic.o ioapic.o piix_pci.o
>  obj-i386-y += vmmouse.o vmport.o hpet.o applesmc.o
>  obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
>  obj-i386-y += debugcon.o multiboot.o
> -obj-i386-y += pc_piix.o dma_rw.o
> +obj-i386-y += pc_piix.o dma_rw.o amd_iommu.o
>  obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
>  
>  # shared objects
> diff --git a/hw/amd_iommu.c b/hw/amd_iommu.c
> new file mode 100644
> index 0000000..6c6346a
> --- /dev/null
> +++ b/hw/amd_iommu.c
> @@ -0,0 +1,694 @@
> +/*
> + * AMD IOMMU emulation
> + *
> + * Copyright (c) 2011 Eduard - Gabriel Munteanu
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "pc.h"
> +#include "hw.h"
> +#include "pci.h"
> +#include "qlist.h"
> +#include "dma_rw.h"
> +
> +/* Capability registers */
> +#define CAPAB_HEADER            0x00
> +#define   CAPAB_REV_TYPE        0x02
> +#define   CAPAB_FLAGS           0x03
> +#define CAPAB_BAR_LOW           0x04
> +#define CAPAB_BAR_HIGH          0x08
> +#define CAPAB_RANGE             0x0C
> +#define CAPAB_MISC              0x10
> +
> +#define CAPAB_SIZE              0x14
> +#define CAPAB_REG_SIZE          0x04
> +
> +/* Capability header data */
> +#define CAPAB_FLAG_IOTLBSUP     (1 << 0)
> +#define CAPAB_FLAG_HTTUNNEL     (1 << 1)
> +#define CAPAB_FLAG_NPCACHE      (1 << 2)
> +#define CAPAB_INIT_REV          (1 << 3)
> +#define CAPAB_INIT_TYPE         3
> +#define CAPAB_INIT_REV_TYPE     (CAPAB_REV | CAPAB_TYPE)
> +#define CAPAB_INIT_FLAGS        (CAPAB_FLAG_NPCACHE | CAPAB_FLAG_HTTUNNEL)
> +#define CAPAB_INIT_MISC         (64 << 15) | (48 << 8)
> +#define CAPAB_BAR_MASK          ~((1UL << 14) - 1)
> +
> +/* MMIO registers */
> +#define MMIO_DEVICE_TABLE       0x0000
> +#define MMIO_COMMAND_BASE       0x0008
> +#define MMIO_EVENT_BASE         0x0010
> +#define MMIO_CONTROL            0x0018
> +#define MMIO_EXCL_BASE          0x0020
> +#define MMIO_EXCL_LIMIT         0x0028
> +#define MMIO_COMMAND_HEAD       0x2000
> +#define MMIO_COMMAND_TAIL       0x2008
> +#define MMIO_EVENT_HEAD         0x2010
> +#define MMIO_EVENT_TAIL         0x2018
> +#define MMIO_STATUS             0x2020
> +
> +#define MMIO_SIZE               0x4000
> +
> +#define MMIO_DEVTAB_SIZE_MASK   ((1ULL << 12) - 1)
> +#define MMIO_DEVTAB_BASE_MASK   (((1ULL << 52) - 1) & ~MMIO_DEVTAB_SIZE_MASK)
> +#define MMIO_DEVTAB_ENTRY_SIZE  32
> +#define MMIO_DEVTAB_SIZE_UNIT   4096
> +
> +#define MMIO_CMDBUF_SIZE_BYTE       (MMIO_COMMAND_BASE + 7)
> +#define MMIO_CMDBUF_SIZE_MASK       0x0F
> +#define MMIO_CMDBUF_BASE_MASK       MMIO_DEVTAB_BASE_MASK
> +#define MMIO_CMDBUF_DEFAULT_SIZE    8
> +#define MMIO_CMDBUF_HEAD_MASK       (((1ULL << 19) - 1) & ~0x0F)
> +#define MMIO_CMDBUF_TAIL_MASK       MMIO_EVTLOG_HEAD_MASK
> +
> +#define MMIO_EVTLOG_SIZE_BYTE       (MMIO_EVENT_BASE + 7)
> +#define MMIO_EVTLOG_SIZE_MASK       MMIO_CMDBUF_SIZE_MASK
> +#define MMIO_EVTLOG_BASE_MASK       MMIO_CMDBUF_BASE_MASK
> +#define MMIO_EVTLOG_DEFAULT_SIZE    MMIO_CMDBUF_DEFAULT_SIZE
> +#define MMIO_EVTLOG_HEAD_MASK       (((1ULL << 19) - 1) & ~0x0F)
> +#define MMIO_EVTLOG_TAIL_MASK       MMIO_EVTLOG_HEAD_MASK
> +
> +#define MMIO_EXCL_BASE_MASK         MMIO_DEVTAB_BASE_MASK
> +#define MMIO_EXCL_ENABLED_MASK      (1ULL << 0)
> +#define MMIO_EXCL_ALLOW_MASK        (1ULL << 1)
> +#define MMIO_EXCL_LIMIT_MASK        MMIO_DEVTAB_BASE_MASK
> +#define MMIO_EXCL_LIMIT_LOW         0xFFF
> +
> +#define MMIO_CONTROL_IOMMUEN        (1ULL << 0)
> +#define MMIO_CONTROL_HTTUNEN        (1ULL << 1)
> +#define MMIO_CONTROL_EVENTLOGEN     (1ULL << 2)
> +#define MMIO_CONTROL_EVENTINTEN     (1ULL << 3)
> +#define MMIO_CONTROL_COMWAITINTEN   (1ULL << 4)
> +#define MMIO_CONTROL_CMDBUFEN       (1ULL << 12)
> +
> +#define MMIO_STATUS_EVTLOG_OF       (1ULL << 0)
> +#define MMIO_STATUS_EVTLOG_INTR     (1ULL << 1)
> +#define MMIO_STATUS_COMWAIT_INTR    (1ULL << 2)
> +#define MMIO_STATUS_EVTLOG_RUN      (1ULL << 3)
> +#define MMIO_STATUS_CMDBUF_RUN      (1ULL << 4)
> +
> +#define CMDBUF_ID_BYTE              0x07
> +#define CMDBUF_ID_RSHIFT            4
> +#define CMDBUF_ENTRY_SIZE           0x10
> +
> +#define CMD_COMPLETION_WAIT         0x01
> +#define CMD_INVAL_DEVTAB_ENTRY      0x02
> +#define CMD_INVAL_IOMMU_PAGES       0x03
> +#define CMD_INVAL_IOTLB_PAGES       0x04
> +#define CMD_INVAL_INTR_TABLE        0x05
> +
> +#define DEVTAB_ENTRY_SIZE           32
> +
> +/* Device table entry bits 0:63 */
> +#define DEV_VALID                   (1ULL << 0)
> +#define DEV_TRANSLATION_VALID       (1ULL << 1)
> +#define DEV_MODE_MASK               0x7
> +#define DEV_MODE_RSHIFT             9
> +#define DEV_PT_ROOT_MASK            0xFFFFFFFFFF000
> +#define DEV_PT_ROOT_RSHIFT          12
> +#define DEV_PERM_SHIFT              61
> +#define DEV_PERM_READ               (1ULL << 61)
> +#define DEV_PERM_WRITE              (1ULL << 62)
> +
> +/* Device table entry bits 64:127 */
> +#define DEV_DOMAIN_ID_MASK          ((1ULL << 16) - 1)
> +#define DEV_IOTLB_SUPPORT           (1ULL << 17)
> +#define DEV_SUPPRESS_PF             (1ULL << 18)
> +#define DEV_SUPPRESS_ALL_PF         (1ULL << 19)
> +#define DEV_IOCTL_MASK              ~3
> +#define DEV_IOCTL_RSHIFT            20
> +#define   DEV_IOCTL_DENY            0
> +#define   DEV_IOCTL_PASSTHROUGH     1
> +#define   DEV_IOCTL_TRANSLATE       2
> +#define DEV_CACHE                   (1ULL << 37)
> +#define DEV_SNOOP_DISABLE           (1ULL << 38)
> +#define DEV_EXCL                    (1ULL << 39)
> +
> +/* Event codes and flags, as stored in the info field */
> +#define EVENT_ILLEGAL_DEVTAB_ENTRY  (0x1U << 24)
> +#define EVENT_IOPF                  (0x2U << 24)
> +#define   EVENT_IOPF_I              (1U << 3)
> +#define   EVENT_IOPF_PR             (1U << 4)
> +#define   EVENT_IOPF_RW             (1U << 5)
> +#define   EVENT_IOPF_PE             (1U << 6)
> +#define   EVENT_IOPF_RZ             (1U << 7)
> +#define   EVENT_IOPF_TR             (1U << 8)
> +#define EVENT_DEV_TAB_HW_ERROR      (0x3U << 24)
> +#define EVENT_PAGE_TAB_HW_ERROR     (0x4U << 24)
> +#define EVENT_ILLEGAL_COMMAND_ERROR (0x5U << 24)
> +#define EVENT_COMMAND_HW_ERROR      (0x6U << 24)
> +#define EVENT_IOTLB_INV_TIMEOUT     (0x7U << 24)
> +#define EVENT_INVALID_DEV_REQUEST   (0x8U << 24)
> +
> +#define EVENT_LEN                   16
> +
> +#define IOMMU_PERM_READ             (1 << 0)
> +#define IOMMU_PERM_WRITE            (1 << 1)
> +#define IOMMU_PERM_RW               (IOMMU_PERM_READ | IOMMU_PERM_WRITE)
> +
> +typedef struct AMDIOMMUState {
> +    PCIDevice                   dev;
> +
> +    int                         capab_offset;
> +    unsigned char               *capab;
> +
> +    int                         mmio_index;
> +    target_phys_addr_t          mmio_addr;
> +    unsigned char               *mmio_buf;
> +    int                         mmio_enabled;
> +
> +    int                         enabled;
> +    int                         ats_enabled;
> +
> +    target_phys_addr_t          devtab;
> +    size_t                      devtab_len;
> +
> +    target_phys_addr_t          cmdbuf;
> +    int                         cmdbuf_enabled;
> +    size_t                      cmdbuf_len;
> +    size_t                      cmdbuf_head;
> +    size_t                      cmdbuf_tail;
> +    int                         completion_wait_intr;
> +
> +    target_phys_addr_t          evtlog;
> +    int                         evtlog_enabled;
> +    int                         evtlog_intr;
> +    target_phys_addr_t          evtlog_len;
> +    target_phys_addr_t          evtlog_head;
> +    target_phys_addr_t          evtlog_tail;
> +
> +    target_phys_addr_t          excl_base;
> +    target_phys_addr_t          excl_limit;
> +    int                         excl_enabled;
> +    int                         excl_allow;
> +} AMDIOMMUState;
> +
> +typedef struct AMDIOMMUEvent {
> +    uint16_t    devfn;
> +    uint16_t    reserved;
> +    uint16_t    domid;
> +    uint16_t    info;
> +    uint64_t    addr;
> +} __attribute__((packed)) AMDIOMMUEvent;
> +
> +static void amd_iommu_completion_wait(AMDIOMMUState *st,
> +                                      uint8_t *cmd)
> +{
> +    uint64_t addr;
> +
> +    if (cmd[0] & 1) {
> +        addr = le64_to_cpu(*(uint64_t *) cmd) & 0xFFFFFFFFFFFF8;
> +        cpu_physical_memory_write(addr, cmd + 8, 8);
> +    }
> +
> +    if (cmd[0] & 2)
> +        st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_COMWAIT_INTR;
> +}
> +
> +static void amd_iommu_invalidate_iotlb(AMDIOMMUState *st,
> +                                       uint8_t *cmd)
> +{
> +    PCIDevice *dev;
> +    PCIBus *bus = st->dev.bus;
> +    int bus_num = pci_bus_num(bus);
> +    int devfn = *(uint16_t *) cmd;
> +
> +    dev = pci_find_device(bus, bus_num, devfn);
> +    if (dev) {
> +        dma_invalidate_memory_range(&dev->dma, 0, -1);
> +    }
> +}
> +
> +static void amd_iommu_cmdbuf_exec(AMDIOMMUState *st)
> +{
> +    uint8_t cmd[16];
> +    int type;
> +
> +    cpu_physical_memory_read(st->cmdbuf + st->cmdbuf_head, cmd, 16);
> +    type = cmd[CMDBUF_ID_BYTE] >> CMDBUF_ID_RSHIFT;
> +    switch (type) {
> +        case CMD_COMPLETION_WAIT:
> +            amd_iommu_completion_wait(st, cmd);
> +            break;
> +        case CMD_INVAL_DEVTAB_ENTRY:
> +            break;
> +        case CMD_INVAL_IOMMU_PAGES:
> +            break;
> +        case CMD_INVAL_IOTLB_PAGES:
> +            amd_iommu_invalidate_iotlb(st, cmd);
> +            break;
> +        case CMD_INVAL_INTR_TABLE:
> +            break;
> +        default:
> +            break;
> +    }
> +}
> +
> +static void amd_iommu_cmdbuf_run(AMDIOMMUState *st)
> +{
> +    if (!st->cmdbuf_enabled) {
> +        return;
> +    }
> +
> +    /* Check if there's work to do. */
> +    while (st->cmdbuf_head != st->cmdbuf_tail) {
> +        /* Wrap head pointer. */
> +        if (st->cmdbuf_head >= st->cmdbuf_len * CMDBUF_ENTRY_SIZE) {
> +            st->cmdbuf_head = 0;
> +        }
> +
> +        amd_iommu_cmdbuf_exec(st);
> +
> +        /* Increment head pointer. */
> +        st->cmdbuf_head += CMDBUF_ENTRY_SIZE;
> +    }
> +
> +    *((uint64_t *) (st->mmio_buf + MMIO_COMMAND_HEAD)) = cpu_to_le64(st->cmdbuf_head);
> +}
> +
> +static uint32_t amd_iommu_mmio_buf_read(AMDIOMMUState *st,
> +                                        size_t offset,
> +                                        size_t size)
> +{
> +    ssize_t i;
> +    uint32_t ret;
> +
> +    if (!size) {
> +        return 0;
> +    }
> +
> +    ret = st->mmio_buf[offset + size - 1];
> +    for (i = size - 2; i >= 0; i--) {
> +        ret <<= 8;
> +        ret |= st->mmio_buf[offset + i];
> +    }
> +
> +    return ret;
> +}
> +
> +static void amd_iommu_mmio_buf_write(AMDIOMMUState *st,
> +                                     size_t offset,
> +                                     size_t size,
> +                                     uint32_t val)
> +{
> +    size_t i;
> +
> +    for (i = 0; i < size; i++) {
> +        st->mmio_buf[offset + i] = val & 0xFF;
> +        val >>= 8;
> +    }
> +}

The above seem to do something like LE/BE conversion?
If yes, it's better to use the appropriate macros for this.
To support unaligned access, memcpy the data to an
aligned buffer.
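
For example (untested sketch along those lines, reusing the struct and
buffer from your patch; callers only pass sizes 1, 2 and 4):

    static uint32_t amd_iommu_mmio_buf_read(AMDIOMMUState *st,
                                            size_t offset, size_t size)
    {
        uint32_t tmp = 0;

        /* mmio_buf is kept little-endian; memcpy copes with unaligned offsets. */
        memcpy(&tmp, st->mmio_buf + offset, size);
        return le32_to_cpu(tmp);
    }

    static void amd_iommu_mmio_buf_write(AMDIOMMUState *st,
                                         size_t offset, size_t size,
                                         uint32_t val)
    {
        uint32_t tmp = cpu_to_le32(val);

        /* Write the low 'size' bytes of val into the little-endian buffer. */
        memcpy(st->mmio_buf + offset, &tmp, size);
    }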

> +
> +static void amd_iommu_update_mmio(AMDIOMMUState *st,
> +                                  target_phys_addr_t addr)
> +{
> +    size_t reg = addr & ~0x07;
> +    uint64_t *base = (uint64_t *) &st->mmio_buf[reg];
> +    uint64_t val = le64_to_cpu(*base);
> +
> +    switch (reg) {
> +        case MMIO_CONTROL:
> +            st->enabled              = !!(val & MMIO_CONTROL_IOMMUEN);
> +            st->ats_enabled          = !!(val & MMIO_CONTROL_HTTUNEN);
> +            st->evtlog_enabled       = st->enabled &&
> +                                       !!(val & MMIO_CONTROL_EVENTLOGEN);
> +            st->evtlog_intr          = !!(val & MMIO_CONTROL_EVENTINTEN);
> +            st->completion_wait_intr = !!(val & MMIO_CONTROL_COMWAITINTEN);
> +            st->cmdbuf_enabled       = st->enabled &&
> +                                       !!(val & MMIO_CONTROL_CMDBUFEN);
> +            
> +            /* Update status flags depending on the control register. */
> +            if (st->cmdbuf_enabled) {
> +                st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_CMDBUF_RUN;
> +            } else {
> +                st->mmio_buf[MMIO_STATUS] &= ~MMIO_STATUS_CMDBUF_RUN;
> +            }
> +            if (st->evtlog_enabled) {
> +                st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_RUN;
> +            } else {
> +                st->mmio_buf[MMIO_STATUS] &= ~MMIO_STATUS_EVTLOG_RUN;
> +            }
> +
> +            amd_iommu_cmdbuf_run(st);
> +            break;
> +        case MMIO_DEVICE_TABLE:
> +            st->devtab = (target_phys_addr_t) (val & MMIO_DEVTAB_BASE_MASK);
> +            st->devtab_len = ((val & MMIO_DEVTAB_SIZE_MASK) + 1) *
> +                             (MMIO_DEVTAB_SIZE_UNIT / MMIO_DEVTAB_ENTRY_SIZE);
> +            break;
> +        case MMIO_COMMAND_BASE:
> +            st->cmdbuf = (target_phys_addr_t) (val & MMIO_CMDBUF_BASE_MASK);
> +            st->cmdbuf_len = 1UL << (st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] &
> +                                     MMIO_CMDBUF_SIZE_MASK);
> +
> +            /* We must reset the head and tail pointers. */
> +            st->cmdbuf_head = st->cmdbuf_tail = 0;
> +            memset(st->mmio_buf + MMIO_COMMAND_HEAD, 0, 8);
> +            memset(st->mmio_buf + MMIO_COMMAND_TAIL, 0, 8);
> +            break;
> +        case MMIO_COMMAND_HEAD:
> +            st->cmdbuf_head = val & MMIO_CMDBUF_HEAD_MASK;
> +            amd_iommu_cmdbuf_run(st);
> +            break;
> +        case MMIO_COMMAND_TAIL:
> +            st->cmdbuf_tail = val & MMIO_CMDBUF_TAIL_MASK;
> +            amd_iommu_cmdbuf_run(st);
> +            break;
> +        case MMIO_EVENT_BASE:
> +            st->evtlog = (target_phys_addr_t) (val & MMIO_EVTLOG_BASE_MASK);
> +            st->evtlog_len = 1UL << (st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] &
> +                                     MMIO_EVTLOG_SIZE_MASK);
> +            break;
> +        case MMIO_EVENT_HEAD:
> +            st->evtlog_head = val & MMIO_EVTLOG_HEAD_MASK;
> +            break;
> +        case MMIO_EVENT_TAIL:
> +            st->evtlog_tail = val & MMIO_EVTLOG_TAIL_MASK;
> +            break;
> +        case MMIO_EXCL_BASE:
> +            st->excl_base = (target_phys_addr_t) (val & MMIO_EXCL_BASE_MASK);
> +            st->excl_enabled = val & MMIO_EXCL_ENABLED_MASK;
> +            st->excl_allow = val & MMIO_EXCL_ALLOW_MASK;
> +            break;
> +        case MMIO_EXCL_LIMIT:
> +            st->excl_limit = (target_phys_addr_t) ((val & MMIO_EXCL_LIMIT_MASK) |
> +                                                   MMIO_EXCL_LIMIT_LOW);
> +            break;
> +        default:
> +            break;
> +    }
> +}
> +
> +static uint32_t amd_iommu_mmio_readb(void *opaque, target_phys_addr_t addr)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    return amd_iommu_mmio_buf_read(st, addr, 1);
> +}
> +
> +static uint32_t amd_iommu_mmio_readw(void *opaque, target_phys_addr_t addr)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    return amd_iommu_mmio_buf_read(st, addr, 2);
> +}
> +
> +static uint32_t amd_iommu_mmio_readl(void *opaque, target_phys_addr_t addr)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    return amd_iommu_mmio_buf_read(st, addr, 4);
> +}
> +
> +static void amd_iommu_mmio_writeb(void *opaque,
> +                                  target_phys_addr_t addr,
> +                                  uint32_t val)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    amd_iommu_mmio_buf_write(st, addr, 1, val);
> +    amd_iommu_update_mmio(st, addr);
> +}
> +
> +static void amd_iommu_mmio_writew(void *opaque,
> +                                  target_phys_addr_t addr,
> +                                  uint32_t val)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    amd_iommu_mmio_buf_write(st, addr, 2, val);
> +    amd_iommu_update_mmio(st, addr);
> +}
> +
> +static void amd_iommu_mmio_writel(void *opaque,
> +                                  target_phys_addr_t addr,
> +                                  uint32_t val)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    amd_iommu_mmio_buf_write(st, addr, 4, val);
> +    amd_iommu_update_mmio(st, addr);
> +}
> +
> +static CPUReadMemoryFunc * const amd_iommu_mmio_read[] = {
> +    amd_iommu_mmio_readb,
> +    amd_iommu_mmio_readw,
> +    amd_iommu_mmio_readl,
> +};
> +
> +static CPUWriteMemoryFunc * const amd_iommu_mmio_write[] = {
> +    amd_iommu_mmio_writeb,
> +    amd_iommu_mmio_writew,
> +    amd_iommu_mmio_writel,
> +};
> +
> +static void amd_iommu_enable_mmio(AMDIOMMUState *st)
> +{
> +    target_phys_addr_t addr;
> +    uint8_t *capab_wmask = st->dev.wmask + st->capab_offset;
> +
> +    st->mmio_index = cpu_register_io_memory(amd_iommu_mmio_read,
> +                                            amd_iommu_mmio_write,
> +                                            st, DEVICE_LITTLE_ENDIAN);
> +    if (st->mmio_index < 0) {
> +        return;
> +    }
> +
> +    addr = le64_to_cpu(*(uint64_t *) &st->capab[CAPAB_BAR_LOW]) & CAPAB_BAR_MASK;

remove space before &.

> +    cpu_register_physical_memory(addr, MMIO_SIZE, st->mmio_index);
> +
> +    st->mmio_addr = addr;
> +    st->mmio_enabled = 1;
> +
> +    /* Further changes to the capability are prohibited. */
> +    memset(capab_wmask + CAPAB_BAR_LOW, 0x00, CAPAB_REG_SIZE);
> +    memset(capab_wmask + CAPAB_BAR_HIGH, 0x00, CAPAB_REG_SIZE);
> +}
> +
> +static void amd_iommu_write_capab(PCIDevice *dev,
> +                                  uint32_t addr, uint32_t val, int len)
> +{
> +    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, dev);
> +
> +    pci_default_write_config(dev, addr, val, len);
> +
> +    if (!st->mmio_enabled && st->capab[CAPAB_BAR_LOW] & 0x1) {
> +        amd_iommu_enable_mmio(st);
> +    }
> +}
> +
> +static void amd_iommu_reset(DeviceState *dev)
> +{
> +    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev.qdev, dev);
> +    unsigned char *capab = st->capab;
> +    uint8_t *capab_wmask = st->dev.wmask + st->capab_offset;
> +
> +    st->enabled      = 0;
> +    st->ats_enabled  = 0;
> +    st->mmio_enabled = 0;
> +
> +    capab[CAPAB_REV_TYPE]  = CAPAB_REV_TYPE;
> +    capab[CAPAB_FLAGS]     = CAPAB_FLAGS;
> +    capab[CAPAB_BAR_LOW]   = 0;
> +    capab[CAPAB_BAR_HIGH]  = 0;
> +    capab[CAPAB_RANGE]     = 0;
> +    *((uint32_t *) &capab[CAPAB_MISC]) = cpu_to_le32(CAPAB_INIT_MISC);
> +
> +    /* Changes to the capability are allowed after system reset. */
> +    memset(capab_wmask + CAPAB_BAR_LOW, 0xFF, CAPAB_REG_SIZE);
> +    memset(capab_wmask + CAPAB_BAR_HIGH, 0xFF, CAPAB_REG_SIZE);
> +
> +    memset(st->mmio_buf, 0, MMIO_SIZE);
> +    st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] = MMIO_CMDBUF_DEFAULT_SIZE;
> +    st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] = MMIO_EVTLOG_DEFAULT_SIZE;
> +}
> +
> +static void amd_iommu_log_event(AMDIOMMUState *st, AMDIOMMUEvent *evt)
> +{
> +    if (!st->evtlog_enabled ||
> +        (st->mmio_buf[MMIO_STATUS] | MMIO_STATUS_EVTLOG_OF)) {
> +        return;
> +    }
> +
> +    if (st->evtlog_tail >= st->evtlog_len) {
> +        st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_OF;
> +    }
> +
> +    cpu_physical_memory_write(st->evtlog + st->evtlog_tail,
> +                              (uint8_t *) evt, EVENT_LEN);
> +
> +    st->evtlog_tail += EVENT_LEN;
> +    st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_INTR;
> +}
> +
> +static void amd_iommu_page_fault(AMDIOMMUState *st,
> +                                 int devfn,
> +                                 unsigned domid,
> +                                 target_phys_addr_t addr,
> +                                 int present,
> +                                 int is_write)
> +{
> +    AMDIOMMUEvent evt;
> +    unsigned info;
> +
> +    evt.devfn = cpu_to_le16(devfn);
> +    evt.reserved = 0;
> +    evt.domid = cpu_to_le16(domid);
> +    evt.addr = cpu_to_le64(addr);
> +
> +    info = EVENT_IOPF;
> +    if (present) {
> +        info |= EVENT_IOPF_PR;
> +    }
> +    if (is_write) {
> +        info |= EVENT_IOPF_RW;
> +    }
> +    evt.info = cpu_to_le16(info);
> +
> +    amd_iommu_log_event(st, &evt);
> +}
> +
> +static inline uint64_t amd_iommu_get_perms(uint64_t entry)
> +{
> +    return (entry & (DEV_PERM_READ | DEV_PERM_WRITE)) >> DEV_PERM_SHIFT;
> +}
> +
> +static inline AMDIOMMUState *amd_iommu_dma_to_state(DMADevice *dev)
> +{
> +    PCIDevice *pci_dev = DO_UPCAST(PCIDevice, qdev, dev->mmu->iommu); 
> +
> +    return DO_UPCAST(AMDIOMMUState, dev, pci_dev);
> +}
> +
> +static int amd_iommu_translate(DMADevice *dev,
> +                               dma_addr_t addr,
> +                               dma_addr_t *paddr,
> +                               dma_addr_t *len,
> +                               int is_write)
> +{
> +    PCIDevice *pci_dev = container_of(dev, PCIDevice, dma);
> +    PCIDevice *iommu_dev = DO_UPCAST(PCIDevice, qdev, dev->mmu->iommu);
> +    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, iommu_dev);
> +    int devfn, present;
> +    target_phys_addr_t entry_addr, pte_addr;
> +    uint64_t entry[4], pte, page_offset, pte_perms;
> +    unsigned level, domid;
> +    unsigned perms;
> +
> +    if (!st->enabled) {
> +        goto no_translation;
> +    }
> +
> +    /*
> +     * It's okay to check for either read or write permissions
> +     * even for memory maps, since we don't support R/W maps.
> +     */
> +    perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
> +
> +    /* Get device table entry. */
> +    devfn = pci_dev->devfn;
> +    entry_addr = st->devtab + devfn * DEVTAB_ENTRY_SIZE;
> +    cpu_physical_memory_read(entry_addr, (uint8_t *) entry, 32);
> +
> +    pte = entry[0];
> +    if (!(pte & DEV_VALID) || !(pte & DEV_TRANSLATION_VALID)) {
> +        goto no_translation;
> +    }
> +    domid = entry[1] & DEV_DOMAIN_ID_MASK;
> +    level = (pte >> DEV_MODE_RSHIFT) & DEV_MODE_MASK;
> +    while (level > 0) {
> +        /*
> +         * Check permissions: the bitwise
> +         * implication perms -> entry_perms must be true.
> +         */
> +        pte_perms = amd_iommu_get_perms(pte);
> +        present = pte & 1;
> +        if (!present || perms != (perms & pte_perms)) {
> +            amd_iommu_page_fault(st, devfn, domid, addr,
> +                                 present, !!(perms & IOMMU_PERM_WRITE));
> +            return -EPERM;
> +        }
> +
> +        /* Go to the next lower level. */
> +        pte_addr = pte & DEV_PT_ROOT_MASK;
> +        pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;
> +        pte = ldq_phys(pte_addr);
> +        level = (pte >> DEV_MODE_RSHIFT) & DEV_MODE_MASK;
> +    }
> +    page_offset = addr & 4095;
> +    *paddr = (pte & DEV_PT_ROOT_MASK) + page_offset;
> +    *len = 4096 - page_offset;
> +
> +    return 0;
> +
> +no_translation:
> +    *paddr = addr;
> +    *len = -1;
> +    return 0;

The spec seems to specify that errors
should signal a target abort (in PCI config space).

Is the IOMMU ever a PCI Express device?
If yes, this could interact with AER.
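
E.g. something along these lines in the failure path (untested sketch;
PCI_STATUS and PCI_STATUS_SIG_TARGET_ABORT already come from pci_regs.h,
but whether a target abort is exactly what the spec wants here still
needs checking):

    /* Report the failed translation as a target abort in config space. */
    pci_set_word(st->dev.config + PCI_STATUS,
                 pci_get_word(st->dev.config + PCI_STATUS) |
                 PCI_STATUS_SIG_TARGET_ABORT);
    return -EPERM;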

> +}
> +
> +static int amd_iommu_pci_initfn(PCIDevice *dev)
> +{
> +    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, dev);
> +
> +    pci_config_set_vendor_id(st->dev.config, PCI_VENDOR_ID_AMD);
> +    pci_config_set_device_id(st->dev.config, PCI_DEVICE_ID_AMD_IOMMU);
> +    pci_config_set_class(st->dev.config, PCI_CLASS_SYSTEM_IOMMU);
> +
> +    /* Secure Device capability */
> +    st->capab_offset = pci_add_capability(&st->dev,
> +                                          PCI_CAP_ID_SEC, 0, CAPAB_SIZE);
> +    st->capab = st->dev.config + st->capab_offset;
> +    dev->config_write = amd_iommu_write_capab;
> +
> +    /* Allocate backing space for the MMIO registers. */
> +    st->mmio_buf = qemu_malloc(MMIO_SIZE);
> +
> +    pci_register_iommu(dev, amd_iommu_translate);
> +
> +    return 0;
> +}
> +
> +static const VMStateDescription vmstate_amd_iommu = {
> +    .name                       = "amd-iommu",
> +    .version_id                 = 1,
> +    .minimum_version_id         = 1,
> +    .minimum_version_id_old     = 1,
> +    .fields                     = (VMStateField []) {
> +        VMSTATE_PCI_DEVICE(dev, AMDIOMMUState),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static PCIDeviceInfo amd_iommu_pci_info = {
> +    .qdev.name    = "amd-iommu",
> +    .qdev.desc    = "AMD IOMMU",
> +    .qdev.size    = sizeof(AMDIOMMUState),
> +    .qdev.reset   = amd_iommu_reset,
> +    .qdev.vmsd    = &vmstate_amd_iommu,
> +    .init         = amd_iommu_pci_initfn,
> +};
> +
> +static void amd_iommu_register(void)
> +{
> +    pci_qdev_register(&amd_iommu_pci_info);
> +}
> +
> +device_init(amd_iommu_register);
> diff --git a/hw/pc.c b/hw/pc.c
> index fface7d..9f51e95 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -1163,6 +1163,8 @@ void pc_pci_device_init(PCIBus *pci_bus)
>      int max_bus;
>      int bus;
>  
> +    pci_create_simple(pci_bus, -1, "amd-iommu");
> +
>      max_bus = drive_get_max_bus(IF_SCSI);
>      for (bus = 0; bus <= max_bus; bus++) {
>          pci_create_simple(pci_bus, -1, "lsi53c895a");

Will this affect the default pc?
If yes, it's probably not a good idea.
I'd prefer new devices to use qdev exclusively.
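
I.e. drop the pci_create_simple() call from pc.c and let the device be
added explicitly on the command line (assuming the qdev name from your
patch):

    qemu-system-x86_64 ... -device amd-iommu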

> diff --git a/hw/pci_ids.h b/hw/pci_ids.h
> index ea3418c..5dbe281 100644
> --- a/hw/pci_ids.h
> +++ b/hw/pci_ids.h
> @@ -27,6 +27,7 @@
>  
>  #define PCI_CLASS_MEMORY_RAM             0x0500
>  
> +#define PCI_CLASS_SYSTEM_IOMMU           0x0806
>  #define PCI_CLASS_SYSTEM_OTHER           0x0880
>  
>  #define PCI_CLASS_SERIAL_USB             0x0c03
> @@ -57,6 +58,7 @@
>  
>  #define PCI_VENDOR_ID_AMD                0x1022
>  #define PCI_DEVICE_ID_AMD_LANCE          0x2000
> +#define PCI_DEVICE_ID_AMD_IOMMU          0x0000     /* FIXME */

ID 0?
Also, our ids file is a copy from linux. Add the id there
or put it in your .c file.

>  
>  #define PCI_VENDOR_ID_TI                 0x104c
>  
> diff --git a/hw/pci_regs.h b/hw/pci_regs.h
> index dd0bed4..3d098aa 100644
> --- a/hw/pci_regs.h
> +++ b/hw/pci_regs.h
> @@ -209,6 +209,7 @@
>  #define  PCI_CAP_ID_SHPC 	0x0C	/* PCI Standard Hot-Plug Controller */
>  #define  PCI_CAP_ID_SSVID	0x0D	/* Bridge subsystem vendor/device ID */
>  #define  PCI_CAP_ID_AGP3	0x0E	/* AGP Target PCI-PCI bridge */
> +#define  PCI_CAP_ID_SEC		0x0F	/* Secure Device (AMD IOMMU) */

Our pci_regs.h is a copy of the one in linux. Either add the id there,
or put it in your .c file.
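
I.e. something like this at the top of hw/amd_iommu.c, using the value
from your patch:

    /* Secure Device capability ID; not in the Linux pci_regs.h copy. */
    #define PCI_CAP_ID_SEC          0x0F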

>  #define  PCI_CAP_ID_EXP 	0x10	/* PCI Express */
>  #define  PCI_CAP_ID_MSIX	0x11	/* MSI-X */
>  #define  PCI_CAP_ID_AF		0x13	/* PCI Advanced Features */
> -- 
> 1.7.3.4

^ permalink raw reply	[flat|nested] 58+ messages in thread


* Re: [PATCH 01/13] Generic DMA memory access interface
  2011-01-29 17:40   ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2011-02-06 11:13     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 58+ messages in thread
From: Michael S. Tsirkin @ 2011-02-06 11:13 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: kvm, joro, seabios, qemu-devel, blauwirbel, yamahata, av1474, avi, paul

On Fri, Feb 04, 2011 at 01:32:55AM +0200, Eduard - Gabriel Munteanu wrote:
> This introduces replacements for memory access functions like
> cpu_physical_memory_read(). The new interface can handle address
> translation and access checking through an IOMMU.
> 
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
>  Makefile.target |    2 +-
>  hw/dma_rw.c     |  124 +++++++++++++++++++++++++++++++++++++++++++
>  hw/dma_rw.h     |  157 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 282 insertions(+), 1 deletions(-)
>  create mode 100644 hw/dma_rw.c
>  create mode 100644 hw/dma_rw.h
> 
> diff --git a/Makefile.target b/Makefile.target
> index e15b1c4..e5817ab 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -218,7 +218,7 @@ obj-i386-y += cirrus_vga.o apic.o ioapic.o piix_pci.o
>  obj-i386-y += vmmouse.o vmport.o hpet.o applesmc.o
>  obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
>  obj-i386-y += debugcon.o multiboot.o
> -obj-i386-y += pc_piix.o
> +obj-i386-y += pc_piix.o dma_rw.o
>  obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
>  
>  # shared objects
> diff --git a/hw/dma_rw.c b/hw/dma_rw.c
> new file mode 100644
> index 0000000..ef8e7f8
> --- /dev/null
> +++ b/hw/dma_rw.c
> @@ -0,0 +1,124 @@
> +/*
> + * Generic DMA memory access interface.
> + *
> + * Copyright (c) 2011 Eduard - Gabriel Munteanu
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "dma_rw.h"
> +#include "range.h"
> +
> +static void dma_register_memory_map(DMADevice *dev,
> +                                    dma_addr_t addr,
> +                                    dma_addr_t len,
> +                                    target_phys_addr_t paddr,
> +                                    DMAInvalidateMapFunc *invalidate,
> +                                    void *invalidate_opaque)
> +{
> +    DMAMemoryMap *map;
> +
> +    map = qemu_malloc(sizeof(DMAMemoryMap));
> +    map->addr               = addr;
> +    map->len                = len;
> +    map->paddr              = paddr;
> +    map->invalidate         = invalidate;
> +    map->invalidate_opaque  = invalidate_opaque;
> +
> +    QLIST_INSERT_HEAD(&dev->mmu->memory_maps, map, list);
> +}
> +
> +static void dma_unregister_memory_map(DMADevice *dev,
> +                                      target_phys_addr_t paddr,
> +                                      dma_addr_t len)
> +{
> +    DMAMemoryMap *map;
> +
> +    QLIST_FOREACH(map, &dev->mmu->memory_maps, list) {
> +        if (map->paddr == paddr && map->len == len) {
> +            QLIST_REMOVE(map, list);
> +            free(map);
> +        }
> +    }
> +}
> +
> +void dma_invalidate_memory_range(DMADevice *dev,
> +                                 dma_addr_t addr,
> +                                 dma_addr_t len)
> +{
> +    DMAMemoryMap *map;
> +
> +    QLIST_FOREACH(map, &dev->mmu->memory_maps, list) {
> +        if (ranges_overlap(addr, len, map->addr, map->len)) {
> +            map->invalidate(map->invalidate_opaque);
> +            QLIST_REMOVE(map, list);
> +            free(map);
> +        }
> +    }
> +}
> +
> +void *dma_memory_map(DMADevice *dev,
> +                     DMAInvalidateMapFunc *cb,
> +                     void *opaque,
> +                     dma_addr_t addr,
> +                     dma_addr_t *len,
> +                     int is_write)
> +{
> +    int err;
> +    target_phys_addr_t paddr, plen;
> +
> +    if (!dev || !dev->mmu) {
> +        return cpu_physical_memory_map(addr, len, is_write);
> +    }
> +
> +    plen = *len;
> +    err = dev->mmu->translate(dev, addr, &paddr, &plen, is_write);
> +    if (err) {
> +        return NULL;
> +    }
> +
> +    /*
> +     * If this is true, the virtual region is contiguous,
> +     * but the translated physical region isn't. We just
> +     * clamp *len, much like cpu_physical_memory_map() does.
> +     */
> +    if (plen < *len) {
> +        *len = plen;
> +    }
> +
> +    /* We treat maps as remote TLBs to cope with stuff like AIO. */
> +    if (cb) {
> +        dma_register_memory_map(dev, addr, *len, paddr, cb, opaque);
> +    }
> +
> +    return cpu_physical_memory_map(paddr, len, is_write);
> +}
> +
> +void dma_memory_unmap(DMADevice *dev,
> +                      void *buffer,
> +                      dma_addr_t len,
> +                      int is_write,
> +                      dma_addr_t access_len)
> +{
> +    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
> +    if (dev && dev->mmu) {
> +        dma_unregister_memory_map(dev, (target_phys_addr_t) buffer, len);
> +    }
> +}
> +
> diff --git a/hw/dma_rw.h b/hw/dma_rw.h
> new file mode 100644
> index 0000000..bc93511
> --- /dev/null
> +++ b/hw/dma_rw.h
> @@ -0,0 +1,157 @@
> +#ifndef DMA_RW_H
> +#define DMA_RW_H
> +
> +#include "qemu-common.h"
> +
> +typedef uint64_t dma_addr_t;
> +
> +typedef struct DMAMmu DMAMmu;
> +typedef struct DMADevice DMADevice;
> +typedef struct DMAMemoryMap DMAMemoryMap;
> +
> +typedef int DMATranslateFunc(DMADevice *dev,
> +                             dma_addr_t addr,
> +                             dma_addr_t *paddr,
> +                             dma_addr_t *len,
> +                             int is_write);

So len is in/out here, which is a bit confusing
and apparently not documented until you look at the usage.
I also don't think it needs to be dma_addr_t - it's not
an address. I don't believe we ever need to
translate more than 2G in one go: how about returning
the length on success and a negative value on error?

Or add a comment.
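
Something like this (sketch only, names as in your patch):

    /*
     * Returns the number of contiguous bytes translated starting at
     * addr, or a negative errno value on failure.
     */
    typedef int64_t DMATranslateFunc(DMADevice *dev,
                                     dma_addr_t addr,
                                     target_phys_addr_t *paddr,
                                     int is_write);

would get rid of the in/out length and document the contract at the
same time.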

> +
> +typedef void DMAInvalidateMapFunc(void *);
> +
> +struct DMAMmu {
> +    DeviceState *iommu;
> +    DMATranslateFunc *translate;
> +    QLIST_HEAD(memory_maps, DMAMemoryMap) memory_maps;
> +};
> +
> +struct DMADevice {
> +    DMAMmu *mmu;
> +};
> +
> +struct DMAMemoryMap {
> +    dma_addr_t              addr;
> +    dma_addr_t              len;
> +    target_phys_addr_t      paddr;
> +    DMAInvalidateMapFunc    *invalidate;
> +    void                    *invalidate_opaque;
> +
> +    QLIST_ENTRY(DMAMemoryMap) list;
> +};
> +
> +static inline void dma_memory_rw(DMADevice *dev,
> +                                 dma_addr_t addr,
> +                                 void *buf,
> +                                 dma_addr_t len,
> +                                 int is_write)
> +{
> +    dma_addr_t paddr, plen;
> +    int err;
> +
> +    /*
> +     * Fast-path non-iommu.
> +     * More importantly, makes it obvious what this function does.
> +     */
> +    if (!dev || !dev->mmu) {
> +        cpu_physical_memory_rw(addr, buf, plen, is_write);
> +        return;
> +    }
> +
> +    while (len) {
> +        err = dev->mmu->translate(dev, addr, &paddr, &plen, is_write);
> +        if (err) {
> +            return;
> +        }
> +
> +        /* The translation might be valid for larger regions. */
> +        if (plen > len) {
> +            plen = len;
> +        }
> +
> +        cpu_physical_memory_rw(paddr, buf, plen, is_write);
> +
> +        len -= plen;
> +        addr += plen;
> +        buf += plen;
> +    }
> +}
> +
> +static inline void dma_memory_read(DMADevice *dev,
> +                                   dma_addr_t addr,
> +                                   void *buf,
> +                                   dma_addr_t len)
> +{
> +    dma_memory_rw(dev, addr, buf, len, 0);
> +}
> +
> +static inline void dma_memory_write(DMADevice *dev,
> +                                    dma_addr_t addr,
> +                                    const void *buf,
> +                                    dma_addr_t len)
> +{
> +    dma_memory_rw(dev, addr, (void *) buf, len, 1);
> +}
> +
> +void *dma_memory_map(DMADevice *dev,
> +                     DMAInvalidateMapFunc *cb,
> +                     void *opaque,
> +                     dma_addr_t addr,
> +                     dma_addr_t *len,
> +                     int is_write);
> +void dma_memory_unmap(DMADevice *dev,
> +                      void *buffer,
> +                      dma_addr_t len,
> +                      int is_write,
> +                      dma_addr_t access_len);
> +
> +
> +void dma_invalidate_memory_range(DMADevice *dev,
> +                                 dma_addr_t addr,
> +                                 dma_addr_t len);
> +
> +
> +#define DEFINE_DMA_LD(suffix, size)                                       \
> +static inline uint##size##_t                                              \
> +dma_ld##suffix(DMADevice *dev, dma_addr_t addr)                           \
> +{                                                                         \
> +    int err;                                                              \
> +    dma_addr_t paddr, plen;                                               \
> +                                                                          \
> +    if (!dev || !dev->mmu) {                                              \
> +        return ld##suffix##_phys(addr);                                   \
> +    }                                                                     \
> +                                                                          \
> +    err = dev->mmu->translate(dev, addr, &paddr, &plen, 0);               \
> +    if (err || (plen < size / 8))                                         \
> +        return 0;                                                         \
> +                                                                          \
> +    return ld##suffix##_phys(paddr);                                      \
> +}
> +
> +#define DEFINE_DMA_ST(suffix, size)                                       \
> +static inline void                                                        \
> +dma_st##suffix(DMADevice *dev, dma_addr_t addr, uint##size##_t val)       \
> +{                                                                         \
> +    int err;                                                              \
> +    target_phys_addr_t paddr, plen;                                       \
> +                                                                          \
> +    if (!dev || !dev->mmu) {                                              \
> +        st##suffix##_phys(addr, val);                                     \
> +        return;                                                           \
> +    }                                                                     \
> +    err = dev->mmu->translate(dev, addr, &paddr, &plen, 1);               \
> +    if (err || (plen < size / 8))                                         \
> +        return;                                                           \
> +                                                                          \
> +    st##suffix##_phys(paddr, val);                                        \
> +}
> +
> +DEFINE_DMA_LD(ub, 8)
> +DEFINE_DMA_LD(uw, 16)
> +DEFINE_DMA_LD(l, 32)
> +DEFINE_DMA_LD(q, 64)
> +
> +DEFINE_DMA_ST(b, 8)
> +DEFINE_DMA_ST(w, 16)
> +DEFINE_DMA_ST(l, 32)
> +DEFINE_DMA_ST(q, 64)
> +
> +#endif

I am guessing the assumption is that the address is size-aligned
(which is right), so the translation either fails or succeeds for
the whole access. But in that case, wouldn't an assert() be better?
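
I.e. inside DEFINE_DMA_LD, roughly (sketch, the macro's line-continuation
backslashes omitted):

    err = dev->mmu->translate(dev, addr, &paddr, &plen, 0);
    if (err) {
        return 0;
    }
    /* addr is size-aligned, so a successful translation covers the whole
       access; a shorter plen here would be a bug in the translate hook. */
    assert(plen >= size / 8);
    return ld##suffix##_phys(paddr);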

> -- 
> 1.7.3.4

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Qemu-devel] Re: [PATCH 01/13] Generic DMA memory access interface
@ 2011-02-06 11:13     ` Michael S. Tsirkin
  0 siblings, 0 replies; 58+ messages in thread
From: Michael S. Tsirkin @ 2011-02-06 11:13 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: kvm, joro, seabios, qemu-devel, blauwirbel, yamahata, kevin, avi, paul

On Fri, Feb 04, 2011 at 01:32:55AM +0200, Eduard - Gabriel Munteanu wrote:
> This introduces replacements for memory access functions like
> cpu_physical_memory_read(). The new interface can handle address
> translation and access checking through an IOMMU.
> 
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
>  Makefile.target |    2 +-
>  hw/dma_rw.c     |  124 +++++++++++++++++++++++++++++++++++++++++++
>  hw/dma_rw.h     |  157 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 282 insertions(+), 1 deletions(-)
>  create mode 100644 hw/dma_rw.c
>  create mode 100644 hw/dma_rw.h
> 
> diff --git a/Makefile.target b/Makefile.target
> index e15b1c4..e5817ab 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -218,7 +218,7 @@ obj-i386-y += cirrus_vga.o apic.o ioapic.o piix_pci.o
>  obj-i386-y += vmmouse.o vmport.o hpet.o applesmc.o
>  obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
>  obj-i386-y += debugcon.o multiboot.o
> -obj-i386-y += pc_piix.o
> +obj-i386-y += pc_piix.o dma_rw.o
>  obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
>  
>  # shared objects
> diff --git a/hw/dma_rw.c b/hw/dma_rw.c
> new file mode 100644
> index 0000000..ef8e7f8
> --- /dev/null
> +++ b/hw/dma_rw.c
> @@ -0,0 +1,124 @@
> +/*
> + * Generic DMA memory access interface.
> + *
> + * Copyright (c) 2011 Eduard - Gabriel Munteanu
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "dma_rw.h"
> +#include "range.h"
> +
> +static void dma_register_memory_map(DMADevice *dev,
> +                                    dma_addr_t addr,
> +                                    dma_addr_t len,
> +                                    target_phys_addr_t paddr,
> +                                    DMAInvalidateMapFunc *invalidate,
> +                                    void *invalidate_opaque)
> +{
> +    DMAMemoryMap *map;
> +
> +    map = qemu_malloc(sizeof(DMAMemoryMap));
> +    map->addr               = addr;
> +    map->len                = len;
> +    map->paddr              = paddr;
> +    map->invalidate         = invalidate;
> +    map->invalidate_opaque  = invalidate_opaque;
> +
> +    QLIST_INSERT_HEAD(&dev->mmu->memory_maps, map, list);
> +}
> +
> +static void dma_unregister_memory_map(DMADevice *dev,
> +                                      target_phys_addr_t paddr,
> +                                      dma_addr_t len)
> +{
> +    DMAMemoryMap *map;
> +
> +    QLIST_FOREACH(map, &dev->mmu->memory_maps, list) {
> +        if (map->paddr == paddr && map->len == len) {
> +            QLIST_REMOVE(map, list);
> +            free(map);
> +        }
> +    }
> +}
> +
> +void dma_invalidate_memory_range(DMADevice *dev,
> +                                 dma_addr_t addr,
> +                                 dma_addr_t len)
> +{
> +    DMAMemoryMap *map;
> +
> +    QLIST_FOREACH(map, &dev->mmu->memory_maps, list) {
> +        if (ranges_overlap(addr, len, map->addr, map->len)) {
> +            map->invalidate(map->invalidate_opaque);
> +            QLIST_REMOVE(map, list);
> +            free(map);
> +        }
> +    }
> +}
> +
> +void *dma_memory_map(DMADevice *dev,
> +                     DMAInvalidateMapFunc *cb,
> +                     void *opaque,
> +                     dma_addr_t addr,
> +                     dma_addr_t *len,
> +                     int is_write)
> +{
> +    int err;
> +    target_phys_addr_t paddr, plen;
> +
> +    if (!dev || !dev->mmu) {
> +        return cpu_physical_memory_map(addr, len, is_write);
> +    }
> +
> +    plen = *len;
> +    err = dev->mmu->translate(dev, addr, &paddr, &plen, is_write);
> +    if (err) {
> +        return NULL;
> +    }
> +
> +    /*
> +     * If this is true, the virtual region is contiguous,
> +     * but the translated physical region isn't. We just
> +     * clamp *len, much like cpu_physical_memory_map() does.
> +     */
> +    if (plen < *len) {
> +        *len = plen;
> +    }
> +
> +    /* We treat maps as remote TLBs to cope with stuff like AIO. */
> +    if (cb) {
> +        dma_register_memory_map(dev, addr, *len, paddr, cb, opaque);
> +    }
> +
> +    return cpu_physical_memory_map(paddr, len, is_write);
> +}
> +
> +void dma_memory_unmap(DMADevice *dev,
> +                      void *buffer,
> +                      dma_addr_t len,
> +                      int is_write,
> +                      dma_addr_t access_len)
> +{
> +    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
> +    if (dev && dev->mmu) {
> +        dma_unregister_memory_map(dev, (target_phys_addr_t) buffer, len);
> +    }
> +}
> +
> diff --git a/hw/dma_rw.h b/hw/dma_rw.h
> new file mode 100644
> index 0000000..bc93511
> --- /dev/null
> +++ b/hw/dma_rw.h
> @@ -0,0 +1,157 @@
> +#ifndef DMA_RW_H
> +#define DMA_RW_H
> +
> +#include "qemu-common.h"
> +
> +typedef uint64_t dma_addr_t;
> +
> +typedef struct DMAMmu DMAMmu;
> +typedef struct DMADevice DMADevice;
> +typedef struct DMAMemoryMap DMAMemoryMap;
> +
> +typedef int DMATranslateFunc(DMADevice *dev,
> +                             dma_addr_t addr,
> +                             dma_addr_t *paddr,
> +                             dma_addr_t *len,
> +                             int is_write);

So len is in/out here, which is a bit confusing and not documented
anywhere - you only find out by looking at the usage.
I also don't think it needs to be dma_addr_t - it's not an address,
and I don't believe we ever need to translate more than 2G in one go.
How about returning the length on success and a negative value on error?

Or add a comment.

> +
> +typedef void DMAInvalidateMapFunc(void *);
> +
> +struct DMAMmu {
> +    DeviceState *iommu;
> +    DMATranslateFunc *translate;
> +    QLIST_HEAD(memory_maps, DMAMemoryMap) memory_maps;
> +};
> +
> +struct DMADevice {
> +    DMAMmu *mmu;
> +};
> +
> +struct DMAMemoryMap {
> +    dma_addr_t              addr;
> +    dma_addr_t              len;
> +    target_phys_addr_t      paddr;
> +    DMAInvalidateMapFunc    *invalidate;
> +    void                    *invalidate_opaque;
> +
> +    QLIST_ENTRY(DMAMemoryMap) list;
> +};
> +
> +static inline void dma_memory_rw(DMADevice *dev,
> +                                 dma_addr_t addr,
> +                                 void *buf,
> +                                 dma_addr_t len,
> +                                 int is_write)
> +{
> +    dma_addr_t paddr, plen;
> +    int err;
> +
> +    /*
> +     * Fast-path non-iommu.
> +     * More importantly, makes it obvious what this function does.
> +     */
> +    if (!dev || !dev->mmu) {
> +        cpu_physical_memory_rw(addr, buf, len, is_write);
> +        return;
> +    }
> +
> +    while (len) {
> +        err = dev->mmu->translate(dev, addr, &paddr, &plen, is_write);
> +        if (err) {
> +            return;
> +        }
> +
> +        /* The translation might be valid for larger regions. */
> +        if (plen > len) {
> +            plen = len;
> +        }
> +
> +        cpu_physical_memory_rw(paddr, buf, plen, is_write);
> +
> +        len -= plen;
> +        addr += plen;
> +        buf += plen;
> +    }
> +}
> +
> +static inline void dma_memory_read(DMADevice *dev,
> +                                   dma_addr_t addr,
> +                                   void *buf,
> +                                   dma_addr_t len)
> +{
> +    dma_memory_rw(dev, addr, buf, len, 0);
> +}
> +
> +static inline void dma_memory_write(DMADevice *dev,
> +                                    dma_addr_t addr,
> +                                    const void *buf,
> +                                    dma_addr_t len)
> +{
> +    dma_memory_rw(dev, addr, (void *) buf, len, 1);
> +}
> +
> +void *dma_memory_map(DMADevice *dev,
> +                     DMAInvalidateMapFunc *cb,
> +                     void *opaque,
> +                     dma_addr_t addr,
> +                     dma_addr_t *len,
> +                     int is_write);
> +void dma_memory_unmap(DMADevice *dev,
> +                      void *buffer,
> +                      dma_addr_t len,
> +                      int is_write,
> +                      dma_addr_t access_len);
> +
> +
> +void dma_invalidate_memory_range(DMADevice *dev,
> +                                 dma_addr_t addr,
> +                                 dma_addr_t len);
> +
> +
> +#define DEFINE_DMA_LD(suffix, size)                                       \
> +static inline uint##size##_t                                              \
> +dma_ld##suffix(DMADevice *dev, dma_addr_t addr)                           \
> +{                                                                         \
> +    int err;                                                              \
> +    dma_addr_t paddr, plen;                                               \
> +                                                                          \
> +    if (!dev || !dev->mmu) {                                              \
> +        return ld##suffix##_phys(addr);                                   \
> +    }                                                                     \
> +                                                                          \
> +    err = dev->mmu->translate(dev, addr, &paddr, &plen, 0);               \
> +    if (err || (plen < size / 8))                                         \
> +        return 0;                                                         \
> +                                                                          \
> +    return ld##suffix##_phys(paddr);                                      \
> +}
> +
> +#define DEFINE_DMA_ST(suffix, size)                                       \
> +static inline void                                                        \
> +dma_st##suffix(DMADevice *dev, dma_addr_t addr, uint##size##_t val)       \
> +{                                                                         \
> +    int err;                                                              \
> +    target_phys_addr_t paddr, plen;                                       \
> +                                                                          \
> +    if (!dev || !dev->mmu) {                                              \
> +        st##suffix##_phys(addr, val);                                     \
> +        return;                                                           \
> +    }                                                                     \
> +    err = dev->mmu->translate(dev, addr, &paddr, &plen, 1);               \
> +    if (err || (plen < size / 8))                                         \
> +        return;                                                           \
> +                                                                          \
> +    st##suffix##_phys(paddr, val);                                        \
> +}
> +
> +DEFINE_DMA_LD(ub, 8)
> +DEFINE_DMA_LD(uw, 16)
> +DEFINE_DMA_LD(l, 32)
> +DEFINE_DMA_LD(q, 64)
> +
> +DEFINE_DMA_ST(b, 8)
> +DEFINE_DMA_ST(w, 16)
> +DEFINE_DMA_ST(l, 32)
> +DEFINE_DMA_ST(q, 64)
> +
> +#endif

I am guessing the assumption is that the address is size-aligned
(which is right), so the translation either fails or succeeds for
the whole access. But in that case, wouldn't an assert() be better?

> -- 
> 1.7.3.4

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 04/13] ide: use the DMA memory access interface for PCI IDE controllers
  2011-01-29 17:40   ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2011-02-06 11:14     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 58+ messages in thread
From: Michael S. Tsirkin @ 2011-02-06 11:14 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: seabios, kevin, joro, blauwirbel, paul, avi, anthony, av1474,
	yamahata, kvm, qemu-devel

On Fri, Feb 04, 2011 at 01:32:58AM +0200, Eduard - Gabriel Munteanu wrote:
> Emulated PCI IDE controllers now use the memory access interface. This
> also allows an emulated IOMMU to translate and check accesses.
> 
> Map invalidation results in cancelling DMA transfers. Since the guest OS
> can't properly recover the DMA results once the mapping has changed,
> cancelling is a fairly good approximation.
> 
> Note this doesn't handle AHCI emulation yet!
> 
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>

How about not changing ahci then, and failing initialization
if an mmu is present?
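
Something along these lines, say (a sketch only -- ahci_check_iommu() is a
made-up helper and it relies on the PCIDevice dma field this series adds;
the AHCI PCI init function would call it and fail on a negative return):

static int ahci_check_iommu(PCIDevice *dev)
{
    /* AHCI emulation still does untranslated DMA, so refuse to come up
       behind an emulated IOMMU rather than silently bypassing it. */
    if (dev->dma.mmu) {
        fprintf(stderr, "ahci: emulated IOMMU is not supported yet\n");
        return -1;
    }
    return 0;
}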

> ---
>  dma-helpers.c     |   23 ++++++++++++++++++-----
>  dma.h             |    4 +++-
>  hw/ide/ahci.c     |    3 ++-
>  hw/ide/internal.h |    1 +
>  hw/ide/macio.c    |    4 ++--
>  hw/ide/pci.c      |   18 +++++++++++-------
>  6 files changed, 37 insertions(+), 16 deletions(-)
> 
> diff --git a/dma-helpers.c b/dma-helpers.c
> index 712ed89..29a74a4 100644
> --- a/dma-helpers.c
> +++ b/dma-helpers.c
> @@ -10,12 +10,13 @@
>  #include "dma.h"
>  #include "block_int.h"
>  
> -void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint)
> +void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint, DMADevice *dma)
>  {
>      qsg->sg = qemu_malloc(alloc_hint * sizeof(ScatterGatherEntry));
>      qsg->nsg = 0;
>      qsg->nalloc = alloc_hint;
>      qsg->size = 0;
> +    qsg->dma = dma;
>  }
>  
>  void qemu_sglist_add(QEMUSGList *qsg, target_phys_addr_t base,
> @@ -73,12 +74,23 @@ static void dma_bdrv_unmap(DMAAIOCB *dbs)
>      int i;
>  
>      for (i = 0; i < dbs->iov.niov; ++i) {
> -        cpu_physical_memory_unmap(dbs->iov.iov[i].iov_base,
> -                                  dbs->iov.iov[i].iov_len, !dbs->is_write,
> -                                  dbs->iov.iov[i].iov_len);
> +        dma_memory_unmap(dbs->sg->dma,
> +                         dbs->iov.iov[i].iov_base,
> +                         dbs->iov.iov[i].iov_len, !dbs->is_write,
> +                         dbs->iov.iov[i].iov_len);
>      }
>  }
>  
> +static void dma_bdrv_cancel(void *opaque)
> +{
> +    DMAAIOCB *dbs = opaque;
> +
> +    bdrv_aio_cancel(dbs->acb);
> +    dma_bdrv_unmap(dbs);
> +    qemu_iovec_destroy(&dbs->iov);
> +    qemu_aio_release(dbs);
> +}
> +
>  static void dma_bdrv_cb(void *opaque, int ret)
>  {
>      DMAAIOCB *dbs = (DMAAIOCB *)opaque;
> @@ -100,7 +112,8 @@ static void dma_bdrv_cb(void *opaque, int ret)
>      while (dbs->sg_cur_index < dbs->sg->nsg) {
>          cur_addr = dbs->sg->sg[dbs->sg_cur_index].base + dbs->sg_cur_byte;
>          cur_len = dbs->sg->sg[dbs->sg_cur_index].len - dbs->sg_cur_byte;
> -        mem = cpu_physical_memory_map(cur_addr, &cur_len, !dbs->is_write);
> +        mem = dma_memory_map(dbs->sg->dma, dma_bdrv_cancel, dbs,
> +                             cur_addr, &cur_len, !dbs->is_write);
>          if (!mem)
>              break;
>          qemu_iovec_add(&dbs->iov, mem, cur_len);
> diff --git a/dma.h b/dma.h
> index f3bb275..2417b32 100644
> --- a/dma.h
> +++ b/dma.h
> @@ -14,6 +14,7 @@
>  //#include "cpu.h"
>  #include "hw/hw.h"
>  #include "block.h"
> +#include "hw/dma_rw.h"
>  
>  typedef struct {
>      target_phys_addr_t base;
> @@ -25,9 +26,10 @@ typedef struct {
>      int nsg;
>      int nalloc;
>      target_phys_addr_t size;
> +    DMADevice *dma;
>  } QEMUSGList;
>  
> -void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint);
> +void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint, DMADevice *dma);
>  void qemu_sglist_add(QEMUSGList *qsg, target_phys_addr_t base,
>                       target_phys_addr_t len);
>  void qemu_sglist_destroy(QEMUSGList *qsg);
> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> index 968fdce..aea06a9 100644
> --- a/hw/ide/ahci.c
> +++ b/hw/ide/ahci.c
> @@ -993,7 +993,8 @@ static int ahci_populate_sglist(AHCIDevice *ad, QEMUSGList *sglist)
>      if (sglist_alloc_hint > 0) {
>          AHCI_SG *tbl = (AHCI_SG *)prdt;
>  
> -        qemu_sglist_init(sglist, sglist_alloc_hint);
> +        /* FIXME: pass a proper DMADevice. */
> +        qemu_sglist_init(sglist, sglist_alloc_hint, NULL);
>          for (i = 0; i < sglist_alloc_hint; i++) {
>              /* flags_size is zero-based */
>              qemu_sglist_add(sglist, le64_to_cpu(tbl[i].addr),
> diff --git a/hw/ide/internal.h b/hw/ide/internal.h
> index 697c3b4..3d3d5db 100644
> --- a/hw/ide/internal.h
> +++ b/hw/ide/internal.h
> @@ -468,6 +468,7 @@ struct IDEDMA {
>      struct iovec iov;
>      QEMUIOVector qiov;
>      BlockDriverAIOCB *aiocb;
> +    DMADevice *dev;
>  };
>  
>  struct IDEBus {
> diff --git a/hw/ide/macio.c b/hw/ide/macio.c
> index c1b4caa..654ae7c 100644
> --- a/hw/ide/macio.c
> +++ b/hw/ide/macio.c
> @@ -79,7 +79,7 @@ static void pmac_ide_atapi_transfer_cb(void *opaque, int ret)
>  
>      s->io_buffer_size = io->len;
>  
> -    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1);
> +    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1, NULL);
>      qemu_sglist_add(&s->sg, io->addr, io->len);
>      io->addr += io->len;
>      io->len = 0;
> @@ -141,7 +141,7 @@ static void pmac_ide_transfer_cb(void *opaque, int ret)
>      s->io_buffer_index = 0;
>      s->io_buffer_size = io->len;
>  
> -    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1);
> +    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1, NULL);
>      qemu_sglist_add(&s->sg, io->addr, io->len);
>      io->addr += io->len;
>      io->len = 0;
> diff --git a/hw/ide/pci.c b/hw/ide/pci.c
> index 510b2de..e3432c4 100644
> --- a/hw/ide/pci.c
> +++ b/hw/ide/pci.c
> @@ -64,7 +64,8 @@ static int bmdma_prepare_buf(IDEDMA *dma, int is_write)
>      } prd;
>      int l, len;
>  
> -    qemu_sglist_init(&s->sg, s->nsector / (BMDMA_PAGE_SIZE / 512) + 1);
> +    qemu_sglist_init(&s->sg,
> +                     s->nsector / (BMDMA_PAGE_SIZE / 512) + 1, dma->dev);
>      s->io_buffer_size = 0;
>      for(;;) {
>          if (bm->cur_prd_len == 0) {
> @@ -72,7 +73,7 @@ static int bmdma_prepare_buf(IDEDMA *dma, int is_write)
>              if (bm->cur_prd_last ||
>                  (bm->cur_addr - bm->addr) >= BMDMA_PAGE_SIZE)
>                  return s->io_buffer_size != 0;
> -            cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
> +            dma_memory_read(dma->dev, bm->cur_addr, (uint8_t *)&prd, 8);
>              bm->cur_addr += 8;
>              prd.addr = le32_to_cpu(prd.addr);
>              prd.size = le32_to_cpu(prd.size);
> @@ -114,7 +115,7 @@ static int bmdma_rw_buf(IDEDMA *dma, int is_write)
>              if (bm->cur_prd_last ||
>                  (bm->cur_addr - bm->addr) >= BMDMA_PAGE_SIZE)
>                  return 0;
> -            cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
> +            dma_memory_read(dma->dev, bm->cur_addr, (uint8_t *)&prd, 8);
>              bm->cur_addr += 8;
>              prd.addr = le32_to_cpu(prd.addr);
>              prd.size = le32_to_cpu(prd.size);
> @@ -129,11 +130,11 @@ static int bmdma_rw_buf(IDEDMA *dma, int is_write)
>              l = bm->cur_prd_len;
>          if (l > 0) {
>              if (is_write) {
> -                cpu_physical_memory_write(bm->cur_prd_addr,
> -                                          s->io_buffer + s->io_buffer_index, l);
> +                dma_memory_write(dma->dev, bm->cur_prd_addr,
> +                                 s->io_buffer + s->io_buffer_index, l);
>              } else {
> -                cpu_physical_memory_read(bm->cur_prd_addr,
> -                                          s->io_buffer + s->io_buffer_index, l);
> +                dma_memory_read(dma->dev, bm->cur_prd_addr,
> +                                s->io_buffer + s->io_buffer_index, l);
>              }
>              bm->cur_prd_addr += l;
>              bm->cur_prd_len -= l;
> @@ -444,6 +445,9 @@ void pci_ide_create_devs(PCIDevice *dev, DriveInfo **hd_table)
>              continue;
>          ide_create_drive(d->bus+bus[i], unit[i], hd_table[i]);
>      }
> +
> +    d->bmdma[0].dma.dev = &dev->dma;
> +    d->bmdma[1].dma.dev = &dev->dma;
>  }
>  
>  static const struct IDEDMAOps bmdma_ops = {
> -- 
> 1.7.3.4

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Qemu-devel] Re: [PATCH 04/13] ide: use the DMA memory access interface for PCI IDE controllers
@ 2011-02-06 11:14     ` Michael S. Tsirkin
  0 siblings, 0 replies; 58+ messages in thread
From: Michael S. Tsirkin @ 2011-02-06 11:14 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: kvm, joro, seabios, qemu-devel, blauwirbel, yamahata, kevin, avi, paul

On Fri, Feb 04, 2011 at 01:32:58AM +0200, Eduard - Gabriel Munteanu wrote:
> Emulated PCI IDE controllers now use the memory access interface. This
> also allows an emulated IOMMU to translate and check accesses.
> 
> Map invalidation results in cancelling DMA transfers. Since the guest OS
> can't properly recover the DMA results once the mapping has changed,
> cancelling is a fairly good approximation.
> 
> Note this doesn't handle AHCI emulation yet!
> 
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>

How about not changing ahci then, and failing initialization
if an mmu is present?

> ---
>  dma-helpers.c     |   23 ++++++++++++++++++-----
>  dma.h             |    4 +++-
>  hw/ide/ahci.c     |    3 ++-
>  hw/ide/internal.h |    1 +
>  hw/ide/macio.c    |    4 ++--
>  hw/ide/pci.c      |   18 +++++++++++-------
>  6 files changed, 37 insertions(+), 16 deletions(-)
> 
> diff --git a/dma-helpers.c b/dma-helpers.c
> index 712ed89..29a74a4 100644
> --- a/dma-helpers.c
> +++ b/dma-helpers.c
> @@ -10,12 +10,13 @@
>  #include "dma.h"
>  #include "block_int.h"
>  
> -void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint)
> +void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint, DMADevice *dma)
>  {
>      qsg->sg = qemu_malloc(alloc_hint * sizeof(ScatterGatherEntry));
>      qsg->nsg = 0;
>      qsg->nalloc = alloc_hint;
>      qsg->size = 0;
> +    qsg->dma = dma;
>  }
>  
>  void qemu_sglist_add(QEMUSGList *qsg, target_phys_addr_t base,
> @@ -73,12 +74,23 @@ static void dma_bdrv_unmap(DMAAIOCB *dbs)
>      int i;
>  
>      for (i = 0; i < dbs->iov.niov; ++i) {
> -        cpu_physical_memory_unmap(dbs->iov.iov[i].iov_base,
> -                                  dbs->iov.iov[i].iov_len, !dbs->is_write,
> -                                  dbs->iov.iov[i].iov_len);
> +        dma_memory_unmap(dbs->sg->dma,
> +                         dbs->iov.iov[i].iov_base,
> +                         dbs->iov.iov[i].iov_len, !dbs->is_write,
> +                         dbs->iov.iov[i].iov_len);
>      }
>  }
>  
> +static void dma_bdrv_cancel(void *opaque)
> +{
> +    DMAAIOCB *dbs = opaque;
> +
> +    bdrv_aio_cancel(dbs->acb);
> +    dma_bdrv_unmap(dbs);
> +    qemu_iovec_destroy(&dbs->iov);
> +    qemu_aio_release(dbs);
> +}
> +
>  static void dma_bdrv_cb(void *opaque, int ret)
>  {
>      DMAAIOCB *dbs = (DMAAIOCB *)opaque;
> @@ -100,7 +112,8 @@ static void dma_bdrv_cb(void *opaque, int ret)
>      while (dbs->sg_cur_index < dbs->sg->nsg) {
>          cur_addr = dbs->sg->sg[dbs->sg_cur_index].base + dbs->sg_cur_byte;
>          cur_len = dbs->sg->sg[dbs->sg_cur_index].len - dbs->sg_cur_byte;
> -        mem = cpu_physical_memory_map(cur_addr, &cur_len, !dbs->is_write);
> +        mem = dma_memory_map(dbs->sg->dma, dma_bdrv_cancel, dbs,
> +                             cur_addr, &cur_len, !dbs->is_write);
>          if (!mem)
>              break;
>          qemu_iovec_add(&dbs->iov, mem, cur_len);
> diff --git a/dma.h b/dma.h
> index f3bb275..2417b32 100644
> --- a/dma.h
> +++ b/dma.h
> @@ -14,6 +14,7 @@
>  //#include "cpu.h"
>  #include "hw/hw.h"
>  #include "block.h"
> +#include "hw/dma_rw.h"
>  
>  typedef struct {
>      target_phys_addr_t base;
> @@ -25,9 +26,10 @@ typedef struct {
>      int nsg;
>      int nalloc;
>      target_phys_addr_t size;
> +    DMADevice *dma;
>  } QEMUSGList;
>  
> -void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint);
> +void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint, DMADevice *dma);
>  void qemu_sglist_add(QEMUSGList *qsg, target_phys_addr_t base,
>                       target_phys_addr_t len);
>  void qemu_sglist_destroy(QEMUSGList *qsg);
> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> index 968fdce..aea06a9 100644
> --- a/hw/ide/ahci.c
> +++ b/hw/ide/ahci.c
> @@ -993,7 +993,8 @@ static int ahci_populate_sglist(AHCIDevice *ad, QEMUSGList *sglist)
>      if (sglist_alloc_hint > 0) {
>          AHCI_SG *tbl = (AHCI_SG *)prdt;
>  
> -        qemu_sglist_init(sglist, sglist_alloc_hint);
> +        /* FIXME: pass a proper DMADevice. */
> +        qemu_sglist_init(sglist, sglist_alloc_hint, NULL);
>          for (i = 0; i < sglist_alloc_hint; i++) {
>              /* flags_size is zero-based */
>              qemu_sglist_add(sglist, le64_to_cpu(tbl[i].addr),
> diff --git a/hw/ide/internal.h b/hw/ide/internal.h
> index 697c3b4..3d3d5db 100644
> --- a/hw/ide/internal.h
> +++ b/hw/ide/internal.h
> @@ -468,6 +468,7 @@ struct IDEDMA {
>      struct iovec iov;
>      QEMUIOVector qiov;
>      BlockDriverAIOCB *aiocb;
> +    DMADevice *dev;
>  };
>  
>  struct IDEBus {
> diff --git a/hw/ide/macio.c b/hw/ide/macio.c
> index c1b4caa..654ae7c 100644
> --- a/hw/ide/macio.c
> +++ b/hw/ide/macio.c
> @@ -79,7 +79,7 @@ static void pmac_ide_atapi_transfer_cb(void *opaque, int ret)
>  
>      s->io_buffer_size = io->len;
>  
> -    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1);
> +    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1, NULL);
>      qemu_sglist_add(&s->sg, io->addr, io->len);
>      io->addr += io->len;
>      io->len = 0;
> @@ -141,7 +141,7 @@ static void pmac_ide_transfer_cb(void *opaque, int ret)
>      s->io_buffer_index = 0;
>      s->io_buffer_size = io->len;
>  
> -    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1);
> +    qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1, NULL);
>      qemu_sglist_add(&s->sg, io->addr, io->len);
>      io->addr += io->len;
>      io->len = 0;
> diff --git a/hw/ide/pci.c b/hw/ide/pci.c
> index 510b2de..e3432c4 100644
> --- a/hw/ide/pci.c
> +++ b/hw/ide/pci.c
> @@ -64,7 +64,8 @@ static int bmdma_prepare_buf(IDEDMA *dma, int is_write)
>      } prd;
>      int l, len;
>  
> -    qemu_sglist_init(&s->sg, s->nsector / (BMDMA_PAGE_SIZE / 512) + 1);
> +    qemu_sglist_init(&s->sg,
> +                     s->nsector / (BMDMA_PAGE_SIZE / 512) + 1, dma->dev);
>      s->io_buffer_size = 0;
>      for(;;) {
>          if (bm->cur_prd_len == 0) {
> @@ -72,7 +73,7 @@ static int bmdma_prepare_buf(IDEDMA *dma, int is_write)
>              if (bm->cur_prd_last ||
>                  (bm->cur_addr - bm->addr) >= BMDMA_PAGE_SIZE)
>                  return s->io_buffer_size != 0;
> -            cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
> +            dma_memory_read(dma->dev, bm->cur_addr, (uint8_t *)&prd, 8);
>              bm->cur_addr += 8;
>              prd.addr = le32_to_cpu(prd.addr);
>              prd.size = le32_to_cpu(prd.size);
> @@ -114,7 +115,7 @@ static int bmdma_rw_buf(IDEDMA *dma, int is_write)
>              if (bm->cur_prd_last ||
>                  (bm->cur_addr - bm->addr) >= BMDMA_PAGE_SIZE)
>                  return 0;
> -            cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
> +            dma_memory_read(dma->dev, bm->cur_addr, (uint8_t *)&prd, 8);
>              bm->cur_addr += 8;
>              prd.addr = le32_to_cpu(prd.addr);
>              prd.size = le32_to_cpu(prd.size);
> @@ -129,11 +130,11 @@ static int bmdma_rw_buf(IDEDMA *dma, int is_write)
>              l = bm->cur_prd_len;
>          if (l > 0) {
>              if (is_write) {
> -                cpu_physical_memory_write(bm->cur_prd_addr,
> -                                          s->io_buffer + s->io_buffer_index, l);
> +                dma_memory_write(dma->dev, bm->cur_prd_addr,
> +                                 s->io_buffer + s->io_buffer_index, l);
>              } else {
> -                cpu_physical_memory_read(bm->cur_prd_addr,
> -                                          s->io_buffer + s->io_buffer_index, l);
> +                dma_memory_read(dma->dev, bm->cur_prd_addr,
> +                                s->io_buffer + s->io_buffer_index, l);
>              }
>              bm->cur_prd_addr += l;
>              bm->cur_prd_len -= l;
> @@ -444,6 +445,9 @@ void pci_ide_create_devs(PCIDevice *dev, DriveInfo **hd_table)
>              continue;
>          ide_create_drive(d->bus+bus[i], unit[i], hd_table[i]);
>      }
> +
> +    d->bmdma[0].dma.dev = &dev->dma;
> +    d->bmdma[1].dma.dev = &dev->dma;
>  }
>  
>  static const struct IDEDMAOps bmdma_ops = {
> -- 
> 1.7.3.4

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 01/13] Generic DMA memory access interface
  2011-01-29 17:40   ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2011-02-06 11:16     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 58+ messages in thread
From: Michael S. Tsirkin @ 2011-02-06 11:16 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: kvm, joro, seabios, qemu-devel, blauwirbel, yamahata, av1474, avi, paul

On Fri, Feb 04, 2011 at 01:32:55AM +0200, Eduard - Gabriel Munteanu wrote:
> This introduces replacements for memory access functions like
> cpu_physical_memory_read(). The new interface can handle address
> translation and access checking through an IOMMU.
> 
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
>  Makefile.target |    2 +-
>  hw/dma_rw.c     |  124 +++++++++++++++++++++++++++++++++++++++++++
>  hw/dma_rw.h     |  157 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 282 insertions(+), 1 deletions(-)
>  create mode 100644 hw/dma_rw.c
>  create mode 100644 hw/dma_rw.h
> 
> diff --git a/Makefile.target b/Makefile.target
> index e15b1c4..e5817ab 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -218,7 +218,7 @@ obj-i386-y += cirrus_vga.o apic.o ioapic.o piix_pci.o
>  obj-i386-y += vmmouse.o vmport.o hpet.o applesmc.o
>  obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
>  obj-i386-y += debugcon.o multiboot.o
> -obj-i386-y += pc_piix.o
> +obj-i386-y += pc_piix.o dma_rw.o

Does this need to be target-specific?

>  obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
>  
>  # shared objects
> diff --git a/hw/dma_rw.c b/hw/dma_rw.c
> new file mode 100644
> index 0000000..ef8e7f8
> --- /dev/null
> +++ b/hw/dma_rw.c
> @@ -0,0 +1,124 @@
> +/*
> + * Generic DMA memory access interface.
> + *
> + * Copyright (c) 2011 Eduard - Gabriel Munteanu
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "dma_rw.h"
> +#include "range.h"
> +
> +static void dma_register_memory_map(DMADevice *dev,
> +                                    dma_addr_t addr,
> +                                    dma_addr_t len,
> +                                    target_phys_addr_t paddr,
> +                                    DMAInvalidateMapFunc *invalidate,
> +                                    void *invalidate_opaque)
> +{
> +    DMAMemoryMap *map;
> +
> +    map = qemu_malloc(sizeof(DMAMemoryMap));
> +    map->addr               = addr;
> +    map->len                = len;
> +    map->paddr              = paddr;
> +    map->invalidate         = invalidate;
> +    map->invalidate_opaque  = invalidate_opaque;
> +
> +    QLIST_INSERT_HEAD(&dev->mmu->memory_maps, map, list);
> +}
> +
> +static void dma_unregister_memory_map(DMADevice *dev,
> +                                      target_phys_addr_t paddr,
> +                                      dma_addr_t len)
> +{
> +    DMAMemoryMap *map;
> +
> +    QLIST_FOREACH(map, &dev->mmu->memory_maps, list) {
> +        if (map->paddr == paddr && map->len == len) {
> +            QLIST_REMOVE(map, list);
> +            free(map);
> +        }
> +    }
> +}
> +
> +void dma_invalidate_memory_range(DMADevice *dev,
> +                                 dma_addr_t addr,
> +                                 dma_addr_t len)
> +{
> +    DMAMemoryMap *map;
> +
> +    QLIST_FOREACH(map, &dev->mmu->memory_maps, list) {
> +        if (ranges_overlap(addr, len, map->addr, map->len)) {
> +            map->invalidate(map->invalidate_opaque);
> +            QLIST_REMOVE(map, list);
> +            free(map);
> +        }
> +    }
> +}
> +
> +void *dma_memory_map(DMADevice *dev,
> +                     DMAInvalidateMapFunc *cb,
> +                     void *opaque,
> +                     dma_addr_t addr,
> +                     dma_addr_t *len,
> +                     int is_write)
> +{
> +    int err;
> +    target_phys_addr_t paddr, plen;
> +
> +    if (!dev || !dev->mmu) {
> +        return cpu_physical_memory_map(addr, len, is_write);
> +    }
> +
> +    plen = *len;
> +    err = dev->mmu->translate(dev, addr, &paddr, &plen, is_write);
> +    if (err) {
> +        return NULL;
> +    }
> +
> +    /*
> +     * If this is true, the virtual region is contiguous,
> +     * but the translated physical region isn't. We just
> +     * clamp *len, much like cpu_physical_memory_map() does.
> +     */
> +    if (plen < *len) {
> +        *len = plen;
> +    }
> +
> +    /* We treat maps as remote TLBs to cope with stuff like AIO. */
> +    if (cb) {
> +        dma_register_memory_map(dev, addr, *len, paddr, cb, opaque);
> +    }
> +
> +    return cpu_physical_memory_map(paddr, len, is_write);
> +}
> +
> +void dma_memory_unmap(DMADevice *dev,
> +                      void *buffer,
> +                      dma_addr_t len,
> +                      int is_write,
> +                      dma_addr_t access_len)
> +{
> +    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
> +    if (dev && dev->mmu) {
> +        dma_unregister_memory_map(dev, (target_phys_addr_t) buffer, len);
> +    }
> +}
> +
> diff --git a/hw/dma_rw.h b/hw/dma_rw.h

Can we have a configure option to disable this
at compile time? Add stubs to avoid propagating ifdefs
all over the code.
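
E.g. the stubs could live in dma_rw.h itself so callers never see an ifdef
(sketch; CONFIG_IOMMU is a made-up configure symbol):

#if defined(CONFIG_IOMMU)
/* the real inlines and prototypes from this patch */
#else
/* With IOMMU support compiled out, nothing ever sets dev->mmu, so all
   accesses just fall through to the plain cpu_physical_memory_*() calls. */
static inline void dma_memory_rw(DMADevice *dev, dma_addr_t addr,
                                 void *buf, dma_addr_t len, int is_write)
{
    cpu_physical_memory_rw(addr, buf, len, is_write);
}

static inline void *dma_memory_map(DMADevice *dev, DMAInvalidateMapFunc *cb,
                                   void *opaque, dma_addr_t addr,
                                   dma_addr_t *len, int is_write)
{
    return cpu_physical_memory_map(addr, len, is_write);
}

static inline void dma_memory_unmap(DMADevice *dev, void *buffer,
                                    dma_addr_t len, int is_write,
                                    dma_addr_t access_len)
{
    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
}
#endif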


> new file mode 100644
> index 0000000..bc93511
> --- /dev/null
> +++ b/hw/dma_rw.h
> @@ -0,0 +1,157 @@
> +#ifndef DMA_RW_H
> +#define DMA_RW_H
> +
> +#include "qemu-common.h"
> +
> +typedef uint64_t dma_addr_t;
> +
> +typedef struct DMAMmu DMAMmu;
> +typedef struct DMADevice DMADevice;
> +typedef struct DMAMemoryMap DMAMemoryMap;
> +
> +typedef int DMATranslateFunc(DMADevice *dev,
> +                             dma_addr_t addr,
> +                             dma_addr_t *paddr,
> +                             dma_addr_t *len,
> +                             int is_write);
> +
> +typedef void DMAInvalidateMapFunc(void *);
> +
> +struct DMAMmu {
> +    DeviceState *iommu;
> +    DMATranslateFunc *translate;
> +    QLIST_HEAD(memory_maps, DMAMemoryMap) memory_maps;
> +};
> +
> +struct DMADevice {
> +    DMAMmu *mmu;
> +};
> +
> +struct DMAMemoryMap {
> +    dma_addr_t              addr;
> +    dma_addr_t              len;
> +    target_phys_addr_t      paddr;
> +    DMAInvalidateMapFunc    *invalidate;
> +    void                    *invalidate_opaque;
> +
> +    QLIST_ENTRY(DMAMemoryMap) list;
> +};
> +
> +static inline void dma_memory_rw(DMADevice *dev,
> +                                 dma_addr_t addr,
> +                                 void *buf,
> +                                 dma_addr_t len,
> +                                 int is_write)
> +{
> +    dma_addr_t paddr, plen;
> +    int err;
> +
> +    /*
> +     * Fast-path non-iommu.
> +     * More importantly, makes it obvious what this function does.
> +     */
> +    if (!dev || !dev->mmu) {
> +        cpu_physical_memory_rw(addr, buf, len, is_write);
> +        return;
> +    }
> +
> +    while (len) {
> +        err = dev->mmu->translate(dev, addr, &paddr, &plen, is_write);
> +        if (err) {
> +            return;
> +        }
> +
> +        /* The translation might be valid for larger regions. */
> +        if (plen > len) {
> +            plen = len;
> +        }
> +
> +        cpu_physical_memory_rw(paddr, buf, plen, is_write);
> +
> +        len -= plen;
> +        addr += plen;
> +        buf += plen;
> +    }
> +}
> +
> +static inline void dma_memory_read(DMADevice *dev,
> +                                   dma_addr_t addr,
> +                                   void *buf,
> +                                   dma_addr_t len)
> +{
> +    dma_memory_rw(dev, addr, buf, len, 0);
> +}
> +
> +static inline void dma_memory_write(DMADevice *dev,
> +                                    dma_addr_t addr,
> +                                    const void *buf,
> +                                    dma_addr_t len)
> +{
> +    dma_memory_rw(dev, addr, (void *) buf, len, 1);
> +}
> +
> +void *dma_memory_map(DMADevice *dev,
> +                     DMAInvalidateMapFunc *cb,
> +                     void *opaque,
> +                     dma_addr_t addr,
> +                     dma_addr_t *len,
> +                     int is_write);
> +void dma_memory_unmap(DMADevice *dev,
> +                      void *buffer,
> +                      dma_addr_t len,
> +                      int is_write,
> +                      dma_addr_t access_len);
> +
> +
> +void dma_invalidate_memory_range(DMADevice *dev,
> +                                 dma_addr_t addr,
> +                                 dma_addr_t len);
> +
> +
> +#define DEFINE_DMA_LD(suffix, size)                                       \
> +static inline uint##size##_t                                              \
> +dma_ld##suffix(DMADevice *dev, dma_addr_t addr)                           \
> +{                                                                         \
> +    int err;                                                              \
> +    dma_addr_t paddr, plen;                                               \
> +                                                                          \
> +    if (!dev || !dev->mmu) {                                              \
> +        return ld##suffix##_phys(addr);                                   \
> +    }                                                                     \
> +                                                                          \
> +    err = dev->mmu->translate(dev, addr, &paddr, &plen, 0);               \
> +    if (err || (plen < size / 8))                                         \
> +        return 0;                                                         \
> +                                                                          \
> +    return ld##suffix##_phys(paddr);                                      \
> +}
> +
> +#define DEFINE_DMA_ST(suffix, size)                                       \
> +static inline void                                                        \
> +dma_st##suffix(DMADevice *dev, dma_addr_t addr, uint##size##_t val)       \
> +{                                                                         \
> +    int err;                                                              \
> +    target_phys_addr_t paddr, plen;                                       \
> +                                                                          \
> +    if (!dev || !dev->mmu) {                                              \
> +        st##suffix##_phys(addr, val);                                     \
> +        return;                                                           \
> +    }                                                                     \
> +    err = dev->mmu->translate(dev, addr, &paddr, &plen, 1);               \
> +    if (err || (plen < size / 8))                                         \
> +        return;                                                           \
> +                                                                          \
> +    st##suffix##_phys(paddr, val);                                        \
> +}
> +
> +DEFINE_DMA_LD(ub, 8)
> +DEFINE_DMA_LD(uw, 16)
> +DEFINE_DMA_LD(l, 32)
> +DEFINE_DMA_LD(q, 64)
> +
> +DEFINE_DMA_ST(b, 8)
> +DEFINE_DMA_ST(w, 16)
> +DEFINE_DMA_ST(l, 32)
> +DEFINE_DMA_ST(q, 64)
> +
> +#endif
> -- 
> 1.7.3.4

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Qemu-devel] Re: [PATCH 01/13] Generic DMA memory access interface
@ 2011-02-06 11:16     ` Michael S. Tsirkin
  0 siblings, 0 replies; 58+ messages in thread
From: Michael S. Tsirkin @ 2011-02-06 11:16 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: kvm, joro, seabios, qemu-devel, blauwirbel, yamahata, kevin, avi, paul

On Fri, Feb 04, 2011 at 01:32:55AM +0200, Eduard - Gabriel Munteanu wrote:
> This introduces replacements for memory access functions like
> cpu_physical_memory_read(). The new interface can handle address
> translation and access checking through an IOMMU.
> 
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
>  Makefile.target |    2 +-
>  hw/dma_rw.c     |  124 +++++++++++++++++++++++++++++++++++++++++++
>  hw/dma_rw.h     |  157 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 282 insertions(+), 1 deletions(-)
>  create mode 100644 hw/dma_rw.c
>  create mode 100644 hw/dma_rw.h
> 
> diff --git a/Makefile.target b/Makefile.target
> index e15b1c4..e5817ab 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -218,7 +218,7 @@ obj-i386-y += cirrus_vga.o apic.o ioapic.o piix_pci.o
>  obj-i386-y += vmmouse.o vmport.o hpet.o applesmc.o
>  obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
>  obj-i386-y += debugcon.o multiboot.o
> -obj-i386-y += pc_piix.o
> +obj-i386-y += pc_piix.o dma_rw.o

Does this need to be target-specific?

>  obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
>  
>  # shared objects
> diff --git a/hw/dma_rw.c b/hw/dma_rw.c
> new file mode 100644
> index 0000000..ef8e7f8
> --- /dev/null
> +++ b/hw/dma_rw.c
> @@ -0,0 +1,124 @@
> +/*
> + * Generic DMA memory access interface.
> + *
> + * Copyright (c) 2011 Eduard - Gabriel Munteanu
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "dma_rw.h"
> +#include "range.h"
> +
> +static void dma_register_memory_map(DMADevice *dev,
> +                                    dma_addr_t addr,
> +                                    dma_addr_t len,
> +                                    target_phys_addr_t paddr,
> +                                    DMAInvalidateMapFunc *invalidate,
> +                                    void *invalidate_opaque)
> +{
> +    DMAMemoryMap *map;
> +
> +    map = qemu_malloc(sizeof(DMAMemoryMap));
> +    map->addr               = addr;
> +    map->len                = len;
> +    map->paddr              = paddr;
> +    map->invalidate         = invalidate;
> +    map->invalidate_opaque  = invalidate_opaque;
> +
> +    QLIST_INSERT_HEAD(&dev->mmu->memory_maps, map, list);
> +}
> +
> +static void dma_unregister_memory_map(DMADevice *dev,
> +                                      target_phys_addr_t paddr,
> +                                      dma_addr_t len)
> +{
> +    DMAMemoryMap *map;
> +
> +    QLIST_FOREACH(map, &dev->mmu->memory_maps, list) {
> +        if (map->paddr == paddr && map->len == len) {
> +            QLIST_REMOVE(map, list);
> +            free(map);
> +        }
> +    }
> +}
> +
> +void dma_invalidate_memory_range(DMADevice *dev,
> +                                 dma_addr_t addr,
> +                                 dma_addr_t len)
> +{
> +    DMAMemoryMap *map;
> +
> +    QLIST_FOREACH(map, &dev->mmu->memory_maps, list) {
> +        if (ranges_overlap(addr, len, map->addr, map->len)) {
> +            map->invalidate(map->invalidate_opaque);
> +            QLIST_REMOVE(map, list);
> +            free(map);
> +        }
> +    }
> +}
> +
> +void *dma_memory_map(DMADevice *dev,
> +                     DMAInvalidateMapFunc *cb,
> +                     void *opaque,
> +                     dma_addr_t addr,
> +                     dma_addr_t *len,
> +                     int is_write)
> +{
> +    int err;
> +    target_phys_addr_t paddr, plen;
> +
> +    if (!dev || !dev->mmu) {
> +        return cpu_physical_memory_map(addr, len, is_write);
> +    }
> +
> +    plen = *len;
> +    err = dev->mmu->translate(dev, addr, &paddr, &plen, is_write);
> +    if (err) {
> +        return NULL;
> +    }
> +
> +    /*
> +     * If this is true, the virtual region is contiguous,
> +     * but the translated physical region isn't. We just
> +     * clamp *len, much like cpu_physical_memory_map() does.
> +     */
> +    if (plen < *len) {
> +        *len = plen;
> +    }
> +
> +    /* We treat maps as remote TLBs to cope with stuff like AIO. */
> +    if (cb) {
> +        dma_register_memory_map(dev, addr, *len, paddr, cb, opaque);
> +    }
> +
> +    return cpu_physical_memory_map(paddr, len, is_write);
> +}
> +
> +void dma_memory_unmap(DMADevice *dev,
> +                      void *buffer,
> +                      dma_addr_t len,
> +                      int is_write,
> +                      dma_addr_t access_len)
> +{
> +    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
> +    if (dev && dev->mmu) {
> +        dma_unregister_memory_map(dev, (target_phys_addr_t) buffer, len);
> +    }
> +}
> +
> diff --git a/hw/dma_rw.h b/hw/dma_rw.h

Can we have a configure option to disable this
at compile time? Add stubs to avoid propagating ifdefs
all over the code.


> new file mode 100644
> index 0000000..bc93511
> --- /dev/null
> +++ b/hw/dma_rw.h
> @@ -0,0 +1,157 @@
> +#ifndef DMA_RW_H
> +#define DMA_RW_H
> +
> +#include "qemu-common.h"
> +
> +typedef uint64_t dma_addr_t;
> +
> +typedef struct DMAMmu DMAMmu;
> +typedef struct DMADevice DMADevice;
> +typedef struct DMAMemoryMap DMAMemoryMap;
> +
> +typedef int DMATranslateFunc(DMADevice *dev,
> +                             dma_addr_t addr,
> +                             dma_addr_t *paddr,
> +                             dma_addr_t *len,
> +                             int is_write);
> +
> +typedef void DMAInvalidateMapFunc(void *);
> +
> +struct DMAMmu {
> +    DeviceState *iommu;
> +    DMATranslateFunc *translate;
> +    QLIST_HEAD(memory_maps, DMAMemoryMap) memory_maps;
> +};
> +
> +struct DMADevice {
> +    DMAMmu *mmu;
> +};
> +
> +struct DMAMemoryMap {
> +    dma_addr_t              addr;
> +    dma_addr_t              len;
> +    target_phys_addr_t      paddr;
> +    DMAInvalidateMapFunc    *invalidate;
> +    void                    *invalidate_opaque;
> +
> +    QLIST_ENTRY(DMAMemoryMap) list;
> +};
> +
> +static inline void dma_memory_rw(DMADevice *dev,
> +                                 dma_addr_t addr,
> +                                 void *buf,
> +                                 dma_addr_t len,
> +                                 int is_write)
> +{
> +    dma_addr_t paddr, plen;
> +    int err;
> +
> +    /*
> +     * Fast-path non-iommu.
> +     * More importantly, makes it obvious what this function does.
> +     */
> +    if (!dev || !dev->mmu) {
> +        cpu_physical_memory_rw(addr, buf, len, is_write);
> +        return;
> +    }
> +
> +    while (len) {
> +        err = dev->mmu->translate(dev, addr, &paddr, &plen, is_write);
> +        if (err) {
> +            return;
> +        }
> +
> +        /* The translation might be valid for larger regions. */
> +        if (plen > len) {
> +            plen = len;
> +        }
> +
> +        cpu_physical_memory_rw(paddr, buf, plen, is_write);
> +
> +        len -= plen;
> +        addr += plen;
> +        buf += plen;
> +    }
> +}
> +
> +static inline void dma_memory_read(DMADevice *dev,
> +                                   dma_addr_t addr,
> +                                   void *buf,
> +                                   dma_addr_t len)
> +{
> +    dma_memory_rw(dev, addr, buf, len, 0);
> +}
> +
> +static inline void dma_memory_write(DMADevice *dev,
> +                                    dma_addr_t addr,
> +                                    const void *buf,
> +                                    dma_addr_t len)
> +{
> +    dma_memory_rw(dev, addr, (void *) buf, len, 1);
> +}
> +
> +void *dma_memory_map(DMADevice *dev,
> +                     DMAInvalidateMapFunc *cb,
> +                     void *opaque,
> +                     dma_addr_t addr,
> +                     dma_addr_t *len,
> +                     int is_write);
> +void dma_memory_unmap(DMADevice *dev,
> +                      void *buffer,
> +                      dma_addr_t len,
> +                      int is_write,
> +                      dma_addr_t access_len);
> +
> +
> +void dma_invalidate_memory_range(DMADevice *dev,
> +                                 dma_addr_t addr,
> +                                 dma_addr_t len);
> +
> +
> +#define DEFINE_DMA_LD(suffix, size)                                       \
> +static inline uint##size##_t                                              \
> +dma_ld##suffix(DMADevice *dev, dma_addr_t addr)                           \
> +{                                                                         \
> +    int err;                                                              \
> +    dma_addr_t paddr, plen;                                               \
> +                                                                          \
> +    if (!dev || !dev->mmu) {                                              \
> +        return ld##suffix##_phys(addr);                                   \
> +    }                                                                     \
> +                                                                          \
> +    err = dev->mmu->translate(dev, addr, &paddr, &plen, 0);               \
> +    if (err || (plen < size / 8))                                         \
> +        return 0;                                                         \
> +                                                                          \
> +    return ld##suffix##_phys(paddr);                                      \
> +}
> +
> +#define DEFINE_DMA_ST(suffix, size)                                       \
> +static inline void                                                        \
> +dma_st##suffix(DMADevice *dev, dma_addr_t addr, uint##size##_t val)       \
> +{                                                                         \
> +    int err;                                                              \
> +    target_phys_addr_t paddr, plen;                                       \
> +                                                                          \
> +    if (!dev || !dev->mmu) {                                              \
> +        st##suffix##_phys(addr, val);                                     \
> +        return;                                                           \
> +    }                                                                     \
> +    err = dev->mmu->translate(dev, addr, &paddr, &plen, 1);               \
> +    if (err || (plen < size / 8))                                         \
> +        return;                                                           \
> +                                                                          \
> +    st##suffix##_phys(paddr, val);                                        \
> +}
> +
> +DEFINE_DMA_LD(ub, 8)
> +DEFINE_DMA_LD(uw, 16)
> +DEFINE_DMA_LD(l, 32)
> +DEFINE_DMA_LD(q, 64)
> +
> +DEFINE_DMA_ST(b, 8)
> +DEFINE_DMA_ST(w, 16)
> +DEFINE_DMA_ST(l, 32)
> +DEFINE_DMA_ST(q, 64)
> +
> +#endif
> -- 
> 1.7.3.4
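
As a usage illustration only (not part of the patch): a device model holding a
DMADevice pointer -- for instance the one the pci patch in this series attaches
to PCIDevice -- could move a descriptor through the translated path roughly as
below. The descriptor layout and function names are hypothetical, and byte-order
handling is omitted for brevity.

    /* Hypothetical descriptor read and written via the DMA layer. */
    struct my_desc {
        uint64_t buf_addr;   /* guest bus address of the data buffer */
        uint32_t len;
        uint32_t status;
    };

    static void my_device_complete(DMADevice *dma, dma_addr_t desc_addr)
    {
        struct my_desc d;

        /* Both calls go through dev->mmu->translate() when an IOMMU is
         * present and fall back to cpu_physical_memory_rw() otherwise. */
        dma_memory_read(dma, desc_addr, &d, sizeof(d));
        d.status = 1;
        dma_memory_write(dma, desc_addr, &d, sizeof(d));
    }

For single aligned loads and stores, the dma_ld*/dma_st* helpers above cover the
same translation without the loop in dma_memory_rw().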

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/3] AMD IOMMU support
  2011-02-03 23:24     ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2011-02-06 11:47       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 58+ messages in thread
From: Michael S. Tsirkin @ 2011-02-06 11:47 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: kvm, joro, seabios, qemu-devel, blauwirbel, yamahata, av1474, avi, paul

On Fri, Feb 04, 2011 at 01:24:14AM +0200, Eduard - Gabriel Munteanu wrote:
> This initializes the AMD IOMMU and creates ACPI tables for it.
> 
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
>  src/acpi.c     |   84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  src/config.h   |    3 ++
>  src/pci_ids.h  |    1 +
>  src/pci_regs.h |    1 +
>  src/pciinit.c  |   29 +++++++++++++++++++
>  5 files changed, 118 insertions(+), 0 deletions(-)
> 
> diff --git a/src/acpi.c b/src/acpi.c
> index 18830dc..fca152c 100644
> --- a/src/acpi.c
> +++ b/src/acpi.c
> @@ -196,6 +196,36 @@ struct srat_memory_affinity
>      u32    reserved3[2];
>  } PACKED;
>  
> +/*
> + * IVRS (I/O Virtualization Reporting Structure) table.
> + *
> + * Describes the AMD IOMMU, as per:
> + * "AMD I/O Virtualization Technology (IOMMU) Specification", rev 1.26
> + */
> +
> +struct ivrs_ivhd
> +{
> +    u8    type;
> +    u8    flags;
> +    u16   length;
> +    u16   devid;
> +    u16   capab_off;
> +    u32   iommu_base_low;
> +    u32   iommu_base_high;
> +    u16   pci_seg_group;
> +    u16   iommu_info;
> +    u32   reserved;
> +    u8    entry[0];
> +} PACKED;
> +
> +struct ivrs_table
> +{
> +    ACPI_TABLE_HEADER_DEF    /* ACPI common table header. */
> +    u32                iv_info;
> +    u32                reserved[2];
> +    struct ivrs_ivhd   ivhd;
> +} PACKED;
> +

prefix with amd_iommu_ or amd_ then ?

>  #include "acpi-dsdt.hex"
>  
>  static void
> @@ -579,6 +609,59 @@ build_srat(void)
>      return srat;
>  }
>  
> +#define IVRS_SIGNATURE 0x53525649 // IVRS
> +#define IVRS_MAX_DEVS  32
> +static void *
> +build_ivrs(void)
> +{
> +    int iommu_bdf, iommu_cap;
> +    int bdf, max, i;
> +    struct ivrs_table *ivrs;
> +    struct ivrs_ivhd *ivhd;
> +
> +    /* Note this currently works for a single IOMMU! */

Meant to be a FIXME?
How hard is it to fix? Just stick this in a loop?

> +    iommu_bdf = pci_find_class(PCI_CLASS_SYSTEM_IOMMU);
> +    if (iommu_bdf < 0)
> +        return NULL;
> +    iommu_cap = pci_find_capability(iommu_bdf, PCI_CAP_ID_SEC);
> +    if (iommu_cap < 0)
> +        return NULL;
> +
> +    ivrs = malloc_high(sizeof(struct ivrs_table) + 4 * IVRS_MAX_DEVS);
> +    ivrs->iv_info = pci_config_readw(iommu_bdf, iommu_cap + 0x12) & ~0x000F;
> +
> +    ivhd = &ivrs->ivhd;
> +    ivhd->type              = 0x10;
> +    ivhd->flags             = 0;
> +    ivhd->length            = sizeof(struct ivrs_ivhd);
> +    ivhd->devid             = iommu_bdf;
> +    ivhd->capab_off         = iommu_cap;
> +    ivhd->iommu_base_low    = pci_config_readl(iommu_bdf, iommu_cap + 0x04) &
> +                              0xFFFFFFFE;
> +    ivhd->iommu_base_high   = pci_config_readl(iommu_bdf, iommu_cap + 0x08);
> +    ivhd->pci_seg_group     = 0;
> +    ivhd->iommu_info        = 0;
> +    ivhd->reserved          = 0;
> +
> +    i = 0;
> +    foreachpci(bdf, max) {
> +        if (bdf == ivhd->devid)
> +            continue;
> +        ivhd->entry[4 * i + 0] = 2;
> +        ivhd->entry[4 * i + 1] = bdf & 0xFF;
> +        ivhd->entry[4 * i + 2] = (bdf >> 8) & 0xFF;
> +        ivhd->entry[4 * i + 3] = ~(1 << 3);
> +        ivhd->length += 4;
> +        if (++i >= IVRS_MAX_DEVS)
> +            break;
> +    }
> +
> +    build_header((void *) ivrs, IVRS_SIGNATURE,
> +                 sizeof(struct ivrs_table) + 4 * i, 1);
> +
> +    return ivrs;
> +}
> +
>  static const struct pci_device_id acpi_find_tbl[] = {
>      /* PIIX4 Power Management device. */
>      PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82371AB_3, NULL),
> @@ -625,6 +708,7 @@ acpi_bios_init(void)
>      ACPI_INIT_TABLE(build_madt());
>      ACPI_INIT_TABLE(build_hpet());
>      ACPI_INIT_TABLE(build_srat());
> +    ACPI_INIT_TABLE(build_ivrs());
>  
>      u16 i, external_tables = qemu_cfg_acpi_additional_tables();
>  
> diff --git a/src/config.h b/src/config.h
> index 6356941..0ba5723 100644
> --- a/src/config.h
> +++ b/src/config.h
> @@ -172,6 +172,9 @@
>  #define BUILD_APIC_ADDR           0xfee00000
>  #define BUILD_IOAPIC_ADDR         0xfec00000
>  
> +#define BUILD_AMD_IOMMU_START     0xfed00000
> +#define BUILD_AMD_IOMMU_END       0xfee00000    /* BUILD_APIC_ADDR */
> +
>  #define BUILD_SMM_INIT_ADDR       0x38000
>  #define BUILD_SMM_ADDR            0xa8000
>  #define BUILD_SMM_SIZE            0x8000
> diff --git a/src/pci_ids.h b/src/pci_ids.h
> index e1cded2..3cc3c6e 100644
> --- a/src/pci_ids.h
> +++ b/src/pci_ids.h
> @@ -72,6 +72,7 @@
>  #define PCI_CLASS_SYSTEM_RTC		0x0803
>  #define PCI_CLASS_SYSTEM_PCI_HOTPLUG	0x0804
>  #define PCI_CLASS_SYSTEM_SDHCI		0x0805
> +#define PCI_CLASS_SYSTEM_IOMMU		0x0806
>  #define PCI_CLASS_SYSTEM_OTHER		0x0880
>  
>  #define PCI_BASE_CLASS_INPUT		0x09
> diff --git a/src/pci_regs.h b/src/pci_regs.h
> index e5effd4..bfac824 100644
> --- a/src/pci_regs.h
> +++ b/src/pci_regs.h
> @@ -208,6 +208,7 @@
>  #define  PCI_CAP_ID_SHPC 	0x0C	/* PCI Standard Hot-Plug Controller */
>  #define  PCI_CAP_ID_SSVID	0x0D	/* Bridge subsystem vendor/device ID */
>  #define  PCI_CAP_ID_AGP3	0x0E	/* AGP Target PCI-PCI bridge */
> +#define  PCI_CAP_ID_SEC		0x0F	/* Secure Device (AMD IOMMU) */
>  #define  PCI_CAP_ID_EXP 	0x10	/* PCI Express */
>  #define  PCI_CAP_ID_MSIX	0x11	/* MSI-X */
>  #define PCI_CAP_LIST_NEXT	1	/* Next capability in the list */
> diff --git a/src/pciinit.c b/src/pciinit.c
> index ee2e72d..4ebcfbe 100644
> --- a/src/pciinit.c
> +++ b/src/pciinit.c
> @@ -21,6 +21,8 @@ static struct pci_region pci_bios_io_region;
>  static struct pci_region pci_bios_mem_region;
>  static struct pci_region pci_bios_prefmem_region;
>  
> +static u32 amd_iommu_addr;
> +
>  /* host irqs corresponding to PCI irqs A-D */
>  const u8 pci_irqs[4] = {
>      10, 10, 11, 11
> @@ -256,6 +258,27 @@ static void apple_macio_init(u16 bdf, void *arg)
>      pci_set_io_region_addr(bdf, 0, 0x80800000);
>  }
>  
> +static void amd_iommu_init(u16 bdf, void *arg)
> +{
> +    int cap;
> +    u32 base_addr;
> +
> +    cap = pci_find_capability(bdf, PCI_CAP_ID_SEC);
> +    if (cap < 0) {
> +        return;
> +    }

There actually can be multiple instances of this
capability according to spec.
Do we care?


> +
> +    if (amd_iommu_addr >= BUILD_AMD_IOMMU_END) {
> +        return;
> +    }
> +    base_addr = amd_iommu_addr;
> +    amd_iommu_addr += 0x4000;
> +
> +    pci_config_writel(bdf, cap + 0x0C, 0);
> +    pci_config_writel(bdf, cap + 0x08, 0);
> +    pci_config_writel(bdf, cap + 0x04, base_addr | 1);
> +}
> +
>  static const struct pci_device_id pci_class_tbl[] = {
>      /* STORAGE IDE */
>      PCI_DEVICE_CLASS(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82371SB_1,
> @@ -279,6 +302,10 @@ static const struct pci_device_id pci_class_tbl[] = {
>      PCI_DEVICE_CLASS(PCI_ANY_ID, PCI_ANY_ID, PCI_CLASS_BRIDGE_PCI,
>                       pci_bios_init_device_bridge),
>  
> +    /* AMD IOMMU */

Makes sense to limit to AMD vendor id?

> +    PCI_DEVICE_CLASS(PCI_ANY_ID, PCI_ANY_ID, PCI_CLASS_SYSTEM_IOMMU,
> +                     amd_iommu_init),
> +
>      /* default */
>      PCI_DEVICE(PCI_ANY_ID, PCI_ANY_ID, pci_bios_allocate_regions),
>  
> @@ -408,6 +435,8 @@ pci_setup(void)
>      pci_region_init(&pci_bios_prefmem_region,
>                      BUILD_PCIPREFMEM_START, BUILD_PCIPREFMEM_END - 1);
>  
> +    amd_iommu_addr = BUILD_AMD_IOMMU_START;
> +
>      pci_bios_init_bus();
>  
>      int bdf, max;
> -- 
> 1.7.3.4
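
Regarding the single-IOMMU note flagged above, a rough sketch of looping the
table build over every IOMMU-class function follows; it assumes PCI_CLASS_DEVICE
(the class/subclass word at config offset 0x0a) is available from pci_regs.h,
and ivhd_fill() is a hypothetical helper standing in for the per-IOMMU body of
the current build_ivrs(). Allocation sizing is elided.

    int bdf, max, n = 0;

    foreachpci(bdf, max) {
        if (pci_config_readw(bdf, PCI_CLASS_DEVICE) != PCI_CLASS_SYSTEM_IOMMU)
            continue;
        /* Hypothetical: emits one IVHD for this IOMMU, returns the next slot. */
        ivhd = ivhd_fill(ivhd, bdf);
        n++;
    }

The MMIO side already scales: amd_iommu_init() hands out 0x4000-byte (16 KiB)
windows, so the 1 MiB BUILD_AMD_IOMMU_START..BUILD_AMD_IOMMU_END range has room
for 64 register sets.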

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/3] AMD IOMMU support
  2011-02-06 11:47       ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-02-06 13:41         ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 58+ messages in thread
From: Eduard - Gabriel Munteanu @ 2011-02-06 13:41 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: seabios, kevin, joro, blauwirbel, paul, avi, anthony, av1474,
	yamahata, kvm, qemu-devel

On Sun, Feb 06, 2011 at 01:47:57PM +0200, Michael S. Tsirkin wrote:
> On Fri, Feb 04, 2011 at 01:24:14AM +0200, Eduard - Gabriel Munteanu wrote:

Hi,

[snip]

> > +/*
> > + * IVRS (I/O Virtualization Reporting Structure) table.
> > + *
> > + * Describes the AMD IOMMU, as per:
> > + * "AMD I/O Virtualization Technology (IOMMU) Specification", rev 1.26
> > + */
> > +
> > +struct ivrs_ivhd
> > +{
> > +    u8    type;
> > +    u8    flags;
> > +    u16   length;
> > +    u16   devid;
> > +    u16   capab_off;
> > +    u32   iommu_base_low;
> > +    u32   iommu_base_high;
> > +    u16   pci_seg_group;
> > +    u16   iommu_info;
> > +    u32   reserved;
> > +    u8    entry[0];
> > +} PACKED;
> > +
> > +struct ivrs_table
> > +{
> > +    ACPI_TABLE_HEADER_DEF    /* ACPI common table header. */
> > +    u32                iv_info;
> > +    u32                reserved[2];
> > +    struct ivrs_ivhd   ivhd;
> > +} PACKED;
> > +
> 
> prefix with amd_iommu_ or amd_ then ?
>

This should be standard nomenclature already, even if IVRS is AMD
IOMMU-specific.

> >  #include "acpi-dsdt.hex"
> >  
> >  static void
> > @@ -579,6 +609,59 @@ build_srat(void)
> >      return srat;
> >  }
> >  
> > +#define IVRS_SIGNATURE 0x53525649 // IVRS
> > +#define IVRS_MAX_DEVS  32
> > +static void *
> > +build_ivrs(void)
> > +{
> > +    int iommu_bdf, iommu_cap;
> > +    int bdf, max, i;
> > +    struct ivrs_table *ivrs;
> > +    struct ivrs_ivhd *ivhd;
> > +
> > +    /* Note this currently works for a single IOMMU! */
> 
> Meant to be a FIXME?
> How hard is it to fix? Just stick this in a loop?
>

I suspect a real BIOS would have these values hardcoded anyway,
according to the topology of the PCI bus and which IOMMUs sit where. You
already mentioned the possibility of multiple IOMMU capabilities in the
same function/bus, in which case there's probably no easy way to guess
it from SeaBIOS.

[snip]

> > +static void amd_iommu_init(u16 bdf, void *arg)
> > +{
> > +    int cap;
> > +    u32 base_addr;
> > +
> > +    cap = pci_find_capability(bdf, PCI_CAP_ID_SEC);
> > +    if (cap < 0) {
> > +        return;
> > +    }
> 
> There actually can be multiple instances of this
> capability according to spec.
> Do we care?
> 

Hm, perhaps we should at least assign a base address there, that's easy.
As for QEMU/KVM usage we probably don't need it. 
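If it ever does need handling, a plain capability-list walk would cover it; for
illustration only, a sketch assuming pci_config_readb() is available alongside
the readw/readl helpers used in the patch, and reusing the PCI_CAPABILITY_LIST
and PCI_CAP_LIST_* offsets from pci_regs.h:

    u8 cap = pci_config_readb(bdf, PCI_CAPABILITY_LIST);

    while (cap) {
        if (pci_config_readb(bdf, cap + PCI_CAP_LIST_ID) == PCI_CAP_ID_SEC
            && amd_iommu_addr < BUILD_AMD_IOMMU_END) {
            /* Same programming sequence as amd_iommu_init(), once per instance. */
            pci_config_writel(bdf, cap + 0x0C, 0);
            pci_config_writel(bdf, cap + 0x08, 0);
            pci_config_writel(bdf, cap + 0x04, amd_iommu_addr | 1);
            amd_iommu_addr += 0x4000;
        }
        cap = pci_config_readb(bdf, cap + PCI_CAP_LIST_NEXT);
    }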

> > +
> > +    if (amd_iommu_addr >= BUILD_AMD_IOMMU_END) {
> > +        return;
> > +    }
> > +    base_addr = amd_iommu_addr;
> > +    amd_iommu_addr += 0x4000;
> > +
> > +    pci_config_writel(bdf, cap + 0x0C, 0);
> > +    pci_config_writel(bdf, cap + 0x08, 0);
> > +    pci_config_writel(bdf, cap + 0x04, base_addr | 1);
> > +}
> > +
> >  static const struct pci_device_id pci_class_tbl[] = {
> >      /* STORAGE IDE */
> >      PCI_DEVICE_CLASS(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82371SB_1,
> > @@ -279,6 +302,10 @@ static const struct pci_device_id pci_class_tbl[] = {
> >      PCI_DEVICE_CLASS(PCI_ANY_ID, PCI_ANY_ID, PCI_CLASS_BRIDGE_PCI,
> >                       pci_bios_init_device_bridge),
> >  
> > +    /* AMD IOMMU */
> 
> Makes sense to limit to AMD vendor id?
> 

I don't think so, I assume any PCI_CLASS_SYSTEM_IOMMU device would
implement the same specification, considering these ids have been
assigned by PCI-SIG.

> > +    PCI_DEVICE_CLASS(PCI_ANY_ID, PCI_ANY_ID, PCI_CLASS_SYSTEM_IOMMU,
> > +                     amd_iommu_init),
> > +
> >      /* default */
> >      PCI_DEVICE(PCI_ANY_ID, PCI_ANY_ID, pci_bios_allocate_regions),
> >  
> > @@ -408,6 +435,8 @@ pci_setup(void)
> >      pci_region_init(&pci_bios_prefmem_region,
> >                      BUILD_PCIPREFMEM_START, BUILD_PCIPREFMEM_END - 1);
> >  
> > +    amd_iommu_addr = BUILD_AMD_IOMMU_START;
> > +
> >      pci_bios_init_bus();
> >  
> >      int bdf, max;
> > -- 
> > 1.7.3.4

Thanks for your review, I read your other comments and will resubmit
once I fix those issues.


	Eduard


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/3] AMD IOMMU support
  2011-02-06 13:41         ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2011-02-06 15:22           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 58+ messages in thread
From: Michael S. Tsirkin @ 2011-02-06 15:22 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: kvm, joro, seabios, qemu-devel, blauwirbel, yamahata, av1474, avi, paul

On Sun, Feb 06, 2011 at 03:41:45PM +0200, Eduard - Gabriel Munteanu wrote:
> On Sun, Feb 06, 2011 at 01:47:57PM +0200, Michael S. Tsirkin wrote:
> > On Fri, Feb 04, 2011 at 01:24:14AM +0200, Eduard - Gabriel Munteanu wrote:
> 
> Hi,
> 
> [snip]
> 
> > > +/*
> > > + * IVRS (I/O Virtualization Reporting Structure) table.
> > > + *
> > > + * Describes the AMD IOMMU, as per:
> > > + * "AMD I/O Virtualization Technology (IOMMU) Specification", rev 1.26
> > > + */
> > > +
> > > +struct ivrs_ivhd
> > > +{
> > > +    u8    type;
> > > +    u8    flags;
> > > +    u16   length;
> > > +    u16   devid;
> > > +    u16   capab_off;
> > > +    u32   iommu_base_low;
> > > +    u32   iommu_base_high;
> > > +    u16   pci_seg_group;
> > > +    u16   iommu_info;
> > > +    u32   reserved;
> > > +    u8    entry[0];
> > > +} PACKED;
> > > +
> > > +struct ivrs_table
> > > +{
> > > +    ACPI_TABLE_HEADER_DEF    /* ACPI common table header. */
> > > +    u32                iv_info;
> > > +    u32                reserved[2];
> > > +    struct ivrs_ivhd   ivhd;
> > > +} PACKED;
> > > +
> > 
> > prefix with amd_iommu_ or amd_ then ?
> >
> 
> This should be standard nomenclature already, even if IVRS is AMD
> IOMMU-specific.

Yes, but the specific structure is AMD-specific, isn't it?

> > >  #include "acpi-dsdt.hex"
> > >  
> > >  static void
> > > @@ -579,6 +609,59 @@ build_srat(void)
> > >      return srat;
> > >  }
> > >  
> > > +#define IVRS_SIGNATURE 0x53525649 // IVRS
> > > +#define IVRS_MAX_DEVS  32
> > > +static void *
> > > +build_ivrs(void)
> > > +{
> > > +    int iommu_bdf, iommu_cap;
> > > +    int bdf, max, i;
> > > +    struct ivrs_table *ivrs;
> > > +    struct ivrs_ivhd *ivhd;
> > > +
> > > +    /* Note this currently works for a single IOMMU! */
> > 
> > Meant to be a FIXME?
> > How hard is it to fix? Just stick this in a loop?
> >
> 
> I suspect a real BIOS would have these values hardcoded anyway,
> according to the topology of the PCI bus and which IOMMUs sit where.

Which values exactly?

> You
> already mentioned the possibility of multiple IOMMU capabilities in the
> same function/bus, in which case there's probably no easy way to guess
> it from SeaBIOS.

It's easy enough to enumerate capabilities and PCI devices, isn't it?

> [snip]
> 
> > > +static void amd_iommu_init(u16 bdf, void *arg)
> > > +{
> > > +    int cap;
> > > +    u32 base_addr;
> > > +
> > > +    cap = pci_find_capability(bdf, PCI_CAP_ID_SEC);
> > > +    if (cap < 0) {
> > > +        return;
> > > +    }
> > 
> > There actually can be multiple instances of this
> > capability according to spec.
> > Do we care?
> > 
> 
> Hm, perhaps we should at least assign a base address there, that's easy.
> As for QEMU/KVM usage we probably don't need it. 

I expect assigning multiple domains will be useful.
I'm guessing multiple devices are what systems have
in this case? If so, I'm not really sure why there is a need
for multiple IOMMU capabilities per device.

> > > +
> > > +    if (amd_iommu_addr >= BUILD_AMD_IOMMU_END) {
> > > +        return;
> > > +    }
> > > +    base_addr = amd_iommu_addr;
> > > +    amd_iommu_addr += 0x4000;
> > > +
> > > +    pci_config_writel(bdf, cap + 0x0C, 0);
> > > +    pci_config_writel(bdf, cap + 0x08, 0);
> > > +    pci_config_writel(bdf, cap + 0x04, base_addr | 1);
> > > +}
> > > +
> > >  static const struct pci_device_id pci_class_tbl[] = {
> > >      /* STORAGE IDE */
> > >      PCI_DEVICE_CLASS(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82371SB_1,
> > > @@ -279,6 +302,10 @@ static const struct pci_device_id pci_class_tbl[] = {
> > >      PCI_DEVICE_CLASS(PCI_ANY_ID, PCI_ANY_ID, PCI_CLASS_BRIDGE_PCI,
> > >                       pci_bios_init_device_bridge),
> > >  
> > > +    /* AMD IOMMU */
> > 
> > Makes sense to limit to AMD vendor id?
> > 
> 
> I don't think so, I assume any PCI_CLASS_SYSTEM_IOMMU device would
> implement the same specification, considering these ids have been
> assigned by PCI-SIG.

This hasn't been the case in the past, e.g. with
PCI_CLASS_NETWORK_ETHERNET, so I see no reason to assume
it here.
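
Concretely, restricting the match would be a one-line change to the class table;
a sketch, assuming PCI_VENDOR_ID_AMD is (or gets) defined in pci_ids.h:

    /* Match the IOMMU class only for AMD parts. */
    PCI_DEVICE_CLASS(PCI_VENDOR_ID_AMD, PCI_ANY_ID, PCI_CLASS_SYSTEM_IOMMU,
                     amd_iommu_init),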


> > > +    PCI_DEVICE_CLASS(PCI_ANY_ID, PCI_ANY_ID, PCI_CLASS_SYSTEM_IOMMU,
> > > +                     amd_iommu_init),
> > > +
> > >      /* default */
> > >      PCI_DEVICE(PCI_ANY_ID, PCI_ANY_ID, pci_bios_allocate_regions),
> > >  
> > > @@ -408,6 +435,8 @@ pci_setup(void)
> > >      pci_region_init(&pci_bios_prefmem_region,
> > >                      BUILD_PCIPREFMEM_START, BUILD_PCIPREFMEM_END - 1);
> > >  
> > > +    amd_iommu_addr = BUILD_AMD_IOMMU_START;
> > > +
> > >      pci_bios_init_bus();
> > >  
> > >      int bdf, max;
> > > -- 
> > > 1.7.3.4
> 
> Thanks for your review, I read your other comments and will resubmit
> once I fix those issues.
> 
> 
> 	Eduard

^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2011-02-06 15:23 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-29 17:40 [PATCH 00/13] AMD IOMMU emulation patchset Eduard - Gabriel Munteanu
2011-01-29 17:40 ` [Qemu-devel] " Eduard - Gabriel Munteanu
2011-01-29 17:40 ` [PATCH 01/13] Generic DMA memory access interface Eduard - Gabriel Munteanu
2011-01-29 17:40   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2011-02-05 10:20   ` Blue Swirl
2011-02-05 10:20     ` [Qemu-devel] " Blue Swirl
2011-02-06 11:13   ` Michael S. Tsirkin
2011-02-06 11:13     ` [Qemu-devel] " Michael S. Tsirkin
2011-02-06 11:16   ` Michael S. Tsirkin
2011-02-06 11:16     ` [Qemu-devel] " Michael S. Tsirkin
2011-01-29 17:40 ` [PATCH 02/13] pci: add IOMMU support via the generic DMA layer Eduard - Gabriel Munteanu
2011-01-29 17:40   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2011-01-29 17:40 ` [PATCH 03/13] AMD IOMMU emulation Eduard - Gabriel Munteanu
2011-01-29 17:40   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2011-02-06 10:54   ` Michael S. Tsirkin
2011-02-06 10:54     ` [Qemu-devel] " Michael S. Tsirkin
2011-01-29 17:40 ` [PATCH 04/13] ide: use the DMA memory access interface for PCI IDE controllers Eduard - Gabriel Munteanu
2011-01-29 17:40   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2011-02-06 11:14   ` Michael S. Tsirkin
2011-02-06 11:14     ` [Qemu-devel] " Michael S. Tsirkin
2011-01-29 17:40 ` [PATCH 05/13] rtl8139: use the DMA memory access interface Eduard - Gabriel Munteanu
2011-01-29 17:40   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2011-01-29 17:40 ` [PATCH 06/13] eepro100: " Eduard - Gabriel Munteanu
2011-01-29 17:40   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2011-01-29 17:40 ` [PATCH 07/13] ac97: " Eduard - Gabriel Munteanu
2011-01-29 17:40   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2011-01-29 17:40 ` [PATCH 08/13] es1370: " Eduard - Gabriel Munteanu
2011-01-29 17:40   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2011-01-29 17:40 ` [PATCH 09/13] e1000: " Eduard - Gabriel Munteanu
2011-01-29 17:40   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2011-01-29 17:40 ` [PATCH 10/13] lsi53c895a: " Eduard - Gabriel Munteanu
2011-01-29 17:40   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2011-01-29 17:40 ` [PATCH 11/13] pcnet: " Eduard - Gabriel Munteanu
2011-01-29 17:40   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2011-01-29 17:40 ` [PATCH 12/13] usb-uhci: " Eduard - Gabriel Munteanu
2011-01-29 17:40   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2011-01-29 17:40 ` [PATCH 13/13] usb-ohci: " Eduard - Gabriel Munteanu
2011-01-29 17:40   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2011-01-29 20:19 ` [PATCH 00/13] AMD IOMMU emulation patchset malc
2011-01-29 20:19   ` [Qemu-devel] " malc
2011-02-03 23:24 ` [PATCH 0/3] SeaBIOS AMD IOMMU initialization patches Eduard - Gabriel Munteanu
2011-02-03 23:24   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2011-02-03 23:24   ` [PATCH 1/3] pci: add pci_find_capability() helper Eduard - Gabriel Munteanu
2011-02-03 23:24     ` [Qemu-devel] " Eduard - Gabriel Munteanu
2011-02-03 23:24   ` [PATCH 2/3] AMD IOMMU support Eduard - Gabriel Munteanu
2011-02-03 23:24     ` [Qemu-devel] " Eduard - Gabriel Munteanu
2011-02-04  2:37     ` Isaku Yamahata
2011-02-04  2:37       ` [Qemu-devel] " Isaku Yamahata
2011-02-06 11:47     ` Michael S. Tsirkin
2011-02-06 11:47       ` [Qemu-devel] " Michael S. Tsirkin
2011-02-06 13:41       ` Eduard - Gabriel Munteanu
2011-02-06 13:41         ` [Qemu-devel] " Eduard - Gabriel Munteanu
2011-02-06 15:22         ` Michael S. Tsirkin
2011-02-06 15:22           ` [Qemu-devel] " Michael S. Tsirkin
2011-02-03 23:24   ` [PATCH 3/3] Clarify address space layout Eduard - Gabriel Munteanu
2011-02-03 23:24     ` [Qemu-devel] " Eduard - Gabriel Munteanu
2011-02-05 13:07 ` [PATCH 00/13] AMD IOMMU emulation patchset (reworked cc/to) Blue Swirl
2011-02-05 13:07   ` [Qemu-devel] " Blue Swirl
