* [PATCH 0/7] AMD IOMMU emulation patches v3
From: Eduard - Gabriel Munteanu @ 2010-08-15 19:27 UTC
  To: joro
  Cc: paul, blauwirbel, anthony, avi, kvm, qemu-devel,
	Eduard - Gabriel Munteanu

Hi,

Please have a look at these and merge if you wish. I hope I've addressed the
issues people have raised.

Some changes from the previous RFC:
- included and updated the other two device patches
- moved map registration and invalidation management into PCI code
- AMD IOMMU emulation is always enabled (no more configure options)
- cleaned up code, I now use typedefs as suggested
- event logging cleanups

BTW, the change to pci_regs.h is properly aligned; it only looks misaligned in
the diff because the original file contains tabs.


        Cheers,
        Eduard

Eduard - Gabriel Munteanu (7):
  pci: add range_covers_range()
  pci: memory access API and IOMMU support
  AMD IOMMU emulation
  ide: use the PCI memory access interface
  rtl8139: use the PCI memory access interface
  eepro100: use the PCI memory access interface
  ac97: use the PCI memory access interface

 Makefile.target   |    2 +
 dma-helpers.c     |   46 ++++-
 dma.h             |   21 ++-
 hw/ac97.c         |    6 +-
 hw/amd_iommu.c    |  688 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/eepro100.c     |   78 ++++---
 hw/ide/core.c     |   15 +-
 hw/ide/internal.h |   39 +++
 hw/ide/pci.c      |    7 +
 hw/pc.c           |    2 +
 hw/pci.c          |  197 +++++++++++++++-
 hw/pci.h          |   84 +++++++
 hw/pci_ids.h      |    2 +
 hw/pci_regs.h     |    1 +
 hw/rtl8139.c      |   99 +++++----
 qemu-common.h     |    1 +
 16 files changed, 1191 insertions(+), 97 deletions(-)
 create mode 100644 hw/amd_iommu.c



* [PATCH 1/7] pci: add range_covers_range()
From: Eduard - Gabriel Munteanu @ 2010-08-15 19:27 UTC
  To: joro
  Cc: paul, blauwirbel, anthony, avi, kvm, qemu-devel,
	Eduard - Gabriel Munteanu

This helper function allows map invalidation code to determine which
maps must be invalidated.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/pci.h |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)
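
To illustrate the intended semantics, here is a minimal standalone sketch.
range_get_last() is reproduced as I understand the existing helper in hw/pci.h
(offset + len - 1), and range_covers_range_selfcheck() is a hypothetical name
used only for this example; neither the self-check nor the sample addresses are
part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Last byte of a range; assumed to match the existing helper in hw/pci.h. */
static inline uint64_t range_get_last(uint64_t offset, uint64_t len)
{
    return offset + len - 1;
}

/* Same body as the helper added by this patch. */
static inline int range_covers_range(uint64_t first_big, uint64_t len_big,
                                     uint64_t first_small, uint64_t len_small)
{
    uint64_t last_big = range_get_last(first_big, len_big);
    uint64_t last_small = range_get_last(first_small, len_small);

    return first_big <= first_small && last_small <= last_big;
}

/* Hypothetical self-check, for illustration only. */
static void range_covers_range_selfcheck(void)
{
    /* [0x1000, 0x2fff] fully covers a map at [0x2000, 0x2fff]. */
    assert(range_covers_range(0x1000, 0x2000, 0x2000, 0x1000));
    /* A map that extends past the end is overlapped, not covered. */
    assert(!range_covers_range(0x1000, 0x2000, 0x2800, 0x1000));
    /* A range covers itself. */
    assert(range_covers_range(0x1000, 0x1000, 0x1000, 0x1000));
}
```

Note that a map which only partially overlaps the invalidated range does not
count as covered, so callers relying on this helper alone would leave such a
map in place.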

diff --git a/hw/pci.h b/hw/pci.h
index 4bd8a1a..5a6cdb5 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -419,6 +419,16 @@ static inline int range_covers_byte(uint64_t offset, uint64_t len,
     return offset <= byte && byte <= range_get_last(offset, len);
 }
 
+/* Check whether a given range completely covers another. */
+static inline int range_covers_range(uint64_t first_big, uint64_t len_big,
+                                     uint64_t first_small, uint64_t len_small)
+{
+    uint64_t last_big = range_get_last(first_big, len_big);
+    uint64_t last_small = range_get_last(first_small, len_small);
+
+    return first_big <= first_small && last_small <= last_big;
+}
+
 /* Check whether 2 given ranges overlap.
  * Undefined if ranges that wrap around 0. */
 static inline int ranges_overlap(uint64_t first1, uint64_t len1,
-- 
1.7.1



* [PATCH 2/7] pci: memory access API and IOMMU support
From: Eduard - Gabriel Munteanu @ 2010-08-15 19:27 UTC
  To: joro
  Cc: paul, blauwirbel, anthony, avi, kvm, qemu-devel,
	Eduard - Gabriel Munteanu

PCI devices should access memory through pci_memory_*() instead of
cpu_physical_memory_*(). This also provides support for translation and
access checking in case an IOMMU is emulated.

Memory maps are treated as remote IOTLBs (that is, translation caches
belonging to the IOMMU-aware device itself). Clients (devices) must
provide callbacks for map invalidation in case these maps are
persistent beyond the current I/O context, e.g. AIO DMA transfers.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/pci.c      |  197 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 hw/pci.h      |   74 +++++++++++++++++++++
 qemu-common.h |    1 +
 3 files changed, 271 insertions(+), 1 deletions(-)
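
The translate/clamp/copy/advance loop used by pci_memory_rw() in this patch can
be sketched in isolation as follows. Everything prefixed toy_ or TOY_ is
hypothetical and exists only for this example: a 16-byte "page" size, a fixed
four-entry page table, and a flat array standing in for guest RAM. Unlike
pci_memory_rw(), which returns void, the sketch returns an error code so that a
failed translation is visible to the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 16u   /* tiny pages keep the example readable */
#define TOY_NUM_PAGES 4u

static uint8_t toy_ram[TOY_PAGE_SIZE * TOY_NUM_PAGES];
/* Bus page i is backed by physical page toy_page_table[i]. */
static const unsigned toy_page_table[TOY_NUM_PAGES] = { 2, 0, 3, 1 };

/* Returns 0 on success; *len is how many bytes stay valid from addr on. */
static int toy_translate(uint64_t addr, uint64_t *paddr, uint64_t *len)
{
    uint64_t page = addr / TOY_PAGE_SIZE;
    uint64_t off = addr % TOY_PAGE_SIZE;

    if (page >= TOY_NUM_PAGES)
        return -1;
    *paddr = toy_page_table[page] * TOY_PAGE_SIZE + off;
    *len = TOY_PAGE_SIZE - off;
    return 0;
}

/* Same loop shape as pci_memory_rw(): translate, clamp, copy, advance. */
static int toy_memory_rw(uint64_t addr, uint8_t *buf, uint64_t len,
                         int is_write)
{
    uint64_t paddr, plen;

    while (len) {
        if (toy_translate(addr, &paddr, &plen))
            return -1;          /* the patch returns silently here */

        /* The translation might be valid for more than we need. */
        if (plen > len)
            plen = len;

        if (is_write)
            memcpy(&toy_ram[paddr], buf, plen);
        else
            memcpy(buf, &toy_ram[paddr], plen);

        len -= plen;
        addr += plen;
        buf += plen;
    }
    return 0;
}
```

A transfer that crosses a page boundary is split into one copy per translated
chunk, so bus-contiguous data may end up in non-adjacent physical pages.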

diff --git a/hw/pci.c b/hw/pci.c
index 6871728..8668e06 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -58,6 +58,18 @@ struct PCIBus {
        Keep a count of the number of devices with raised IRQs.  */
     int nirq;
     int *irq_count;
+
+    PCIDevice                       *iommu;
+    PCITranslateFunc                *translate;
+};
+
+struct PCIMemoryMap {
+    pcibus_t                        addr;
+    pcibus_t                        len;
+    target_phys_addr_t              paddr;
+    PCIInvalidateMapFunc            *invalidate;
+    void                            *invalidate_opaque;
+    QLIST_ENTRY(PCIMemoryMap)       list;
 };
 
 static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent);
@@ -166,6 +178,19 @@ static void pci_device_reset(PCIDevice *dev)
     pci_update_mappings(dev);
 }
 
+static int pci_no_translate(PCIDevice *iommu,
+                            PCIDevice *dev,
+                            pcibus_t addr,
+                            target_phys_addr_t *paddr,
+                            target_phys_addr_t *len,
+                            unsigned perms)
+{
+    *paddr = addr;
+    *len = -1;
+
+    return 0;
+}
+
 static void pci_bus_reset(void *opaque)
 {
     PCIBus *bus = opaque;
@@ -227,7 +252,10 @@ void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
                          const char *name, int devfn_min)
 {
     qbus_create_inplace(&bus->qbus, &pci_bus_info, parent, name);
-    bus->devfn_min = devfn_min;
+
+    bus->devfn_min              = devfn_min;
+    bus->iommu                  = NULL;
+    bus->translate              = pci_no_translate;
 
     /* host bridge */
     QLIST_INIT(&bus->child);
@@ -2029,6 +2057,173 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent)
     }
 }
 
+void pci_register_iommu(PCIDevice *iommu,
+                        PCITranslateFunc *translate)
+{
+    iommu->bus->iommu = iommu;
+    iommu->bus->translate = translate;
+}
+
+void pci_memory_rw(PCIDevice *dev,
+                   pcibus_t addr,
+                   uint8_t *buf,
+                   pcibus_t len,
+                   int is_write)
+{
+    int err;
+    unsigned perms;
+    PCIDevice *iommu = dev->bus->iommu;
+    target_phys_addr_t paddr, plen;
+
+    perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
+
+    while (len) {
+        err = dev->bus->translate(iommu, dev, addr, &paddr, &plen, perms);
+        if (err)
+            return;
+
+        /* The translation might be valid for larger regions. */
+        if (plen > len)
+            plen = len;
+
+        cpu_physical_memory_rw(paddr, buf, plen, is_write);
+
+        len -= plen;
+        addr += plen;
+        buf += plen;
+    }
+}
+
+static void pci_memory_register_map(PCIDevice *dev,
+                                    pcibus_t addr,
+                                    pcibus_t len,
+                                    target_phys_addr_t paddr,
+                                    PCIInvalidateMapFunc *invalidate,
+                                    void *invalidate_opaque)
+{
+    PCIMemoryMap *map;
+
+    map = qemu_malloc(sizeof(PCIMemoryMap));
+    map->addr               = addr;
+    map->len                = len;
+    map->paddr              = paddr;
+    map->invalidate         = invalidate;
+    map->invalidate_opaque  = invalidate_opaque;
+
+    QLIST_INSERT_HEAD(&dev->memory_maps, map, list);
+}
+
+static void pci_memory_unregister_map(PCIDevice *dev,
+                                      target_phys_addr_t paddr,
+                                      target_phys_addr_t len)
+{
+    PCIMemoryMap *map;
+
+    QLIST_FOREACH(map, &dev->memory_maps, list) {
+        if (map->paddr == paddr && map->len == len) {
+            QLIST_REMOVE(map, list);
+            free(map);
+        }
+    }
+}
+
+void pci_memory_invalidate_range(PCIDevice *dev,
+                                 pcibus_t addr,
+                                 pcibus_t len)
+{
+    PCIMemoryMap *map;
+
+    QLIST_FOREACH(map, &dev->memory_maps, list) {
+        if (range_covers_range(addr, len, map->addr, map->len)) {
+            map->invalidate(map->invalidate_opaque);
+            QLIST_REMOVE(map, list);
+            free(map);
+        }
+    }
+}
+
+void *pci_memory_map(PCIDevice *dev,
+                     PCIInvalidateMapFunc *cb,
+                     void *opaque,
+                     pcibus_t addr,
+                     target_phys_addr_t *len,
+                     int is_write)
+{
+    int err;
+    unsigned perms;
+    PCIDevice *iommu = dev->bus->iommu;
+    target_phys_addr_t paddr, plen;
+
+    perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
+
+    plen = *len;
+    err = dev->bus->translate(iommu, dev, addr, &paddr, &plen, perms);
+    if (err)
+        return NULL;
+
+    /*
+     * If this is true, the virtual region is contiguous,
+     * but the translated physical region isn't. We just
+     * clamp *len, much like cpu_physical_memory_map() does.
+     */
+    if (plen < *len)
+        *len = plen;
+
+    /* We treat maps as remote TLBs to cope with stuff like AIO. */
+    if (cb)
+        pci_memory_register_map(dev, addr, *len, paddr, cb, opaque);
+
+    return cpu_physical_memory_map(paddr, len, is_write);
+}
+
+void pci_memory_unmap(PCIDevice *dev,
+                      void *buffer,
+                      target_phys_addr_t len,
+                      int is_write,
+                      target_phys_addr_t access_len)
+{
+    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
+    pci_memory_unregister_map(dev, (target_phys_addr_t) buffer, len);
+}
+
+#define DEFINE_PCI_LD(suffix, size)                                       \
+uint##size##_t pci_ld##suffix(PCIDevice *dev, pcibus_t addr)              \
+{                                                                         \
+    int err;                                                              \
+    target_phys_addr_t paddr, plen;                                       \
+                                                                          \
+    err = dev->bus->translate(dev->bus->iommu, dev,                       \
+                              addr, &paddr, &plen, IOMMU_PERM_READ);      \
+    if (err || (plen < size / 8))                                         \
+        return 0;                                                         \
+                                                                          \
+    return ld##suffix##_phys(paddr);                                      \
+}
+
+#define DEFINE_PCI_ST(suffix, size)                                       \
+void pci_st##suffix(PCIDevice *dev, pcibus_t addr, uint##size##_t val)    \
+{                                                                         \
+    int err;                                                              \
+    target_phys_addr_t paddr, plen;                                       \
+                                                                          \
+    err = dev->bus->translate(dev->bus->iommu, dev,                       \
+                              addr, &paddr, &plen, IOMMU_PERM_WRITE);     \
+    if (err || (plen < size / 8))                                         \
+        return;                                                           \
+                                                                          \
+    st##suffix##_phys(paddr, val);                                        \
+}
+
+DEFINE_PCI_LD(ub, 8)
+DEFINE_PCI_LD(uw, 16)
+DEFINE_PCI_LD(l, 32)
+DEFINE_PCI_LD(q, 64)
+
+DEFINE_PCI_ST(b, 8)
+DEFINE_PCI_ST(w, 16)
+DEFINE_PCI_ST(l, 32)
+DEFINE_PCI_ST(q, 64)
+
 static PCIDeviceInfo bridge_info = {
     .qdev.name    = "pci-bridge",
     .qdev.size    = sizeof(PCIBridge),
diff --git a/hw/pci.h b/hw/pci.h
index 5a6cdb5..a62bc8e 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -203,6 +203,8 @@ struct PCIDevice {
         PCICapConfigReadFunc *config_read;
         PCICapConfigWriteFunc *config_write;
     } cap;
+
+    QLIST_HEAD(, PCIMemoryMap) memory_maps;
 };
 
 PCIDevice *pci_register_device(PCIBus *bus, const char *name,
@@ -440,4 +442,76 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
     return !(last2 < first1 || last1 < first2);
 }
 
+/*
+ * Memory I/O and PCI IOMMU definitions.
+ */
+
+#define IOMMU_PERM_READ     (1 << 0)
+#define IOMMU_PERM_WRITE    (1 << 1)
+#define IOMMU_PERM_RW       (IOMMU_PERM_READ | IOMMU_PERM_WRITE)
+
+typedef int PCIInvalidateMapFunc(void *opaque);
+typedef int PCITranslateFunc(PCIDevice *iommu,
+                             PCIDevice *dev,
+                             pcibus_t addr,
+                             target_phys_addr_t *paddr,
+                             target_phys_addr_t *len,
+                             unsigned perms);
+
+extern void pci_memory_rw(PCIDevice *dev,
+                          pcibus_t addr,
+                          uint8_t *buf,
+                          pcibus_t len,
+                          int is_write);
+extern void *pci_memory_map(PCIDevice *dev,
+                            PCIInvalidateMapFunc *cb,
+                            void *opaque,
+                            pcibus_t addr,
+                            target_phys_addr_t *len,
+                            int is_write);
+extern void pci_memory_unmap(PCIDevice *dev,
+                             void *buffer,
+                             target_phys_addr_t len,
+                             int is_write,
+                             target_phys_addr_t access_len);
+extern void pci_register_iommu(PCIDevice *dev,
+                               PCITranslateFunc *translate);
+extern void pci_memory_invalidate_range(PCIDevice *dev,
+                                        pcibus_t addr,
+                                        pcibus_t len);
+
+#define DECLARE_PCI_LD(suffix, size)                                    \
+extern uint##size##_t pci_ld##suffix(PCIDevice *dev, pcibus_t addr);
+
+#define DECLARE_PCI_ST(suffix, size)                                    \
+extern void pci_st##suffix(PCIDevice *dev,                              \
+                           pcibus_t addr,                               \
+                           uint##size##_t val);
+
+DECLARE_PCI_LD(ub, 8)
+DECLARE_PCI_LD(uw, 16)
+DECLARE_PCI_LD(l, 32)
+DECLARE_PCI_LD(q, 64)
+
+DECLARE_PCI_ST(b, 8)
+DECLARE_PCI_ST(w, 16)
+DECLARE_PCI_ST(l, 32)
+DECLARE_PCI_ST(q, 64)
+
+static inline void pci_memory_read(PCIDevice *dev,
+                                   pcibus_t addr,
+                                   uint8_t *buf,
+                                   pcibus_t len)
+{
+    pci_memory_rw(dev, addr, buf, len, 0);
+}
+
+static inline void pci_memory_write(PCIDevice *dev,
+                                    pcibus_t addr,
+                                    const uint8_t *buf,
+                                    pcibus_t len)
+{
+    pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
+}
+
 #endif
diff --git a/qemu-common.h b/qemu-common.h
index 3fb2f0b..40c6d58 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -218,6 +218,7 @@ typedef struct SMBusDevice SMBusDevice;
 typedef struct PCIHostState PCIHostState;
 typedef struct PCIExpressHost PCIExpressHost;
 typedef struct PCIBus PCIBus;
+typedef struct PCIMemoryMap PCIMemoryMap;
 typedef struct PCIDevice PCIDevice;
 typedef struct SerialState SerialState;
 typedef struct IRQState *qemu_irq;
-- 
1.7.1
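
One detail that may deserve a second look: pci_memory_unregister_map() and
pci_memory_invalidate_range() remove and free the current element inside
QLIST_FOREACH, and a plain FOREACH-style loop typically reads the element's
link again to advance. The sketch below shows the unlink-then-free pattern on a
plain singly linked list. ToyMap and the toy_ functions are hypothetical
stand-ins, not QEMU code; a QLIST_FOREACH_SAFE-style macro, if available in
qemu-queue.h, would presumably serve the same purpose (that is an assumption).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for PCIMemoryMap on a plain singly linked list. */
typedef struct ToyMap {
    uint64_t addr;
    uint64_t len;
    struct ToyMap *next;
} ToyMap;

/* Push a new map at the head of the list and return the new head. */
static ToyMap *toy_map_add(ToyMap *head, uint64_t addr, uint64_t len)
{
    ToyMap *map = malloc(sizeof(*map));

    if (!map)
        abort();
    map->addr = addr;
    map->len = len;
    map->next = head;
    return map;
}

/* Drop every map fully inside [addr, addr + len - 1]; returns how many. */
static int toy_invalidate_range(ToyMap **head, uint64_t addr, uint64_t len)
{
    ToyMap **link = head;
    uint64_t last = addr + len - 1;
    int dropped = 0;

    while (*link) {
        ToyMap *map = *link;
        uint64_t map_last = map->addr + map->len - 1;

        if (addr <= map->addr && map_last <= last) {
            *link = map->next;  /* unlink first ... */
            free(map);          /* ... then free; map is not touched again */
            dropped++;
        } else {
            link = &map->next;
        }
    }
    return dropped;
}
```

The successor is captured through the link pointer before the element is
freed, so the walk never dereferences released memory.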



* [PATCH 3/7] AMD IOMMU emulation
  2010-08-15 19:27 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-08-15 19:27   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 42+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-08-15 19:27 UTC (permalink / raw)
  To: joro
  Cc: paul, blauwirbel, anthony, avi, kvm, qemu-devel,
	Eduard - Gabriel Munteanu

This introduces emulation for the AMD IOMMU, described in "AMD I/O
Virtualization Technology (IOMMU) Specification".

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 Makefile.target |    2 +
 hw/amd_iommu.c  |  688 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/pc.c         |    2 +
 hw/pci_ids.h    |    2 +
 hw/pci_regs.h   |    1 +
 5 files changed, 695 insertions(+), 0 deletions(-)
 create mode 100644 hw/amd_iommu.c

diff --git a/Makefile.target b/Makefile.target
index 70a9c1b..6b80a37 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -219,6 +219,8 @@ obj-i386-y += pcspk.o i8254.o
 obj-i386-$(CONFIG_KVM_PIT) += i8254-kvm.o
 obj-i386-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += device-assignment.o
 
+obj-i386-y += amd_iommu.o
+
 # Hardware support
 obj-ia64-y += ide.o pckbd.o vga.o $(SOUND_HW) dma.o $(AUDIODRV)
 obj-ia64-y += fdc.o mc146818rtc.o serial.o i8259.o ipf.o
diff --git a/hw/amd_iommu.c b/hw/amd_iommu.c
new file mode 100644
index 0000000..2e20888
--- /dev/null
+++ b/hw/amd_iommu.c
@@ -0,0 +1,688 @@
+/*
+ * AMD IOMMU emulation
+ *
+ * Copyright (c) 2010 Eduard - Gabriel Munteanu
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "pc.h"
+#include "hw.h"
+#include "pci.h"
+#include "qlist.h"
+
+/* Capability registers */
+#define CAPAB_HEADER            0x00
+#define   CAPAB_REV_TYPE        0x02
+#define   CAPAB_FLAGS           0x03
+#define CAPAB_BAR_LOW           0x04
+#define CAPAB_BAR_HIGH          0x08
+#define CAPAB_RANGE             0x0C
+#define CAPAB_MISC              0x10
+
+#define CAPAB_SIZE              0x14
+
+/* Capability header data */
+#define CAPAB_FLAG_IOTLBSUP     (1 << 0)
+#define CAPAB_FLAG_HTTUNNEL     (1 << 1)
+#define CAPAB_FLAG_NPCACHE      (1 << 2)
+#define CAPAB_INIT_REV          (1 << 3)
+#define CAPAB_INIT_TYPE         3
+#define CAPAB_INIT_REV_TYPE     (CAPAB_INIT_REV | CAPAB_INIT_TYPE)
+#define CAPAB_INIT_FLAGS        (CAPAB_FLAG_NPCACHE | CAPAB_FLAG_HTTUNNEL)
+#define CAPAB_INIT_MISC         ((64 << 15) | (48 << 8))
+#define CAPAB_BAR_MASK          ~((1UL << 14) - 1)
+
+/* MMIO registers */
+#define MMIO_DEVICE_TABLE       0x0000
+#define MMIO_COMMAND_BASE       0x0008
+#define MMIO_EVENT_BASE         0x0010
+#define MMIO_CONTROL            0x0018
+#define MMIO_EXCL_BASE          0x0020
+#define MMIO_EXCL_LIMIT         0x0028
+#define MMIO_COMMAND_HEAD       0x2000
+#define MMIO_COMMAND_TAIL       0x2008
+#define MMIO_EVENT_HEAD         0x2010
+#define MMIO_EVENT_TAIL         0x2018
+#define MMIO_STATUS             0x2020
+
+#define MMIO_SIZE               0x4000
+
+#define MMIO_DEVTAB_SIZE_MASK   ((1ULL << 12) - 1)
+#define MMIO_DEVTAB_BASE_MASK   (((1ULL << 52) - 1) & ~MMIO_DEVTAB_SIZE_MASK)
+#define MMIO_DEVTAB_ENTRY_SIZE  32
+#define MMIO_DEVTAB_SIZE_UNIT   4096
+
+#define MMIO_CMDBUF_SIZE_BYTE       (MMIO_COMMAND_BASE + 7)
+#define MMIO_CMDBUF_SIZE_MASK       0x0F
+#define MMIO_CMDBUF_BASE_MASK       MMIO_DEVTAB_BASE_MASK
+#define MMIO_CMDBUF_DEFAULT_SIZE    8
+#define MMIO_CMDBUF_HEAD_MASK       (((1ULL << 19) - 1) & ~0x0F)
+#define MMIO_CMDBUF_TAIL_MASK       MMIO_CMDBUF_HEAD_MASK
+
+#define MMIO_EVTLOG_SIZE_BYTE       (MMIO_EVENT_BASE + 7)
+#define MMIO_EVTLOG_SIZE_MASK       MMIO_CMDBUF_SIZE_MASK
+#define MMIO_EVTLOG_BASE_MASK       MMIO_CMDBUF_BASE_MASK
+#define MMIO_EVTLOG_DEFAULT_SIZE    MMIO_CMDBUF_DEFAULT_SIZE
+#define MMIO_EVTLOG_HEAD_MASK       (((1ULL << 19) - 1) & ~0x0F)
+#define MMIO_EVTLOG_TAIL_MASK       MMIO_EVTLOG_HEAD_MASK
+
+#define MMIO_EXCL_BASE_MASK         MMIO_DEVTAB_BASE_MASK
+#define MMIO_EXCL_ENABLED_MASK      (1ULL << 0)
+#define MMIO_EXCL_ALLOW_MASK        (1ULL << 1)
+#define MMIO_EXCL_LIMIT_MASK        MMIO_DEVTAB_BASE_MASK
+#define MMIO_EXCL_LIMIT_LOW         0xFFF
+
+#define MMIO_CONTROL_IOMMUEN        (1ULL << 0)
+#define MMIO_CONTROL_HTTUNEN        (1ULL << 1)
+#define MMIO_CONTROL_EVENTLOGEN     (1ULL << 2)
+#define MMIO_CONTROL_EVENTINTEN     (1ULL << 3)
+#define MMIO_CONTROL_COMWAITINTEN   (1ULL << 4)
+#define MMIO_CONTROL_CMDBUFEN       (1ULL << 12)
+
+#define MMIO_STATUS_EVTLOG_OF       (1ULL << 0)
+#define MMIO_STATUS_EVTLOG_INTR     (1ULL << 1)
+#define MMIO_STATUS_COMWAIT_INTR    (1ULL << 2)
+#define MMIO_STATUS_EVTLOG_RUN      (1ULL << 3)
+#define MMIO_STATUS_CMDBUF_RUN      (1ULL << 4)
+
+#define CMDBUF_ID_BYTE              0x07
+#define CMDBUF_ID_RSHIFT            4
+#define CMDBUF_ENTRY_SIZE           0x10
+
+#define CMD_COMPLETION_WAIT         0x01
+#define CMD_INVAL_DEVTAB_ENTRY      0x02
+#define CMD_INVAL_IOMMU_PAGES       0x03
+#define CMD_INVAL_IOTLB_PAGES       0x04
+#define CMD_INVAL_INTR_TABLE        0x05
+
+#define DEVTAB_ENTRY_SIZE           32
+
+/* Device table entry bits 0:63 */
+#define DEV_VALID                   (1ULL << 0)
+#define DEV_TRANSLATION_VALID       (1ULL << 1)
+#define DEV_MODE_MASK               0x7
+#define DEV_MODE_RSHIFT             9
+#define DEV_PT_ROOT_MASK            0xFFFFFFFFFF000
+#define DEV_PT_ROOT_RSHIFT          12
+#define DEV_PERM_SHIFT              61
+#define DEV_PERM_READ               (1ULL << 61)
+#define DEV_PERM_WRITE              (1ULL << 62)
+
+/* Device table entry bits 64:127 */
+#define DEV_DOMAIN_ID_MASK          ((1ULL << 16) - 1)
+#define DEV_IOTLB_SUPPORT           (1ULL << 17)
+#define DEV_SUPPRESS_PF             (1ULL << 18)
+#define DEV_SUPPRESS_ALL_PF         (1ULL << 19)
+#define DEV_IOCTL_MASK              0x3
+#define DEV_IOCTL_RSHIFT            20
+#define   DEV_IOCTL_DENY            0
+#define   DEV_IOCTL_PASSTHROUGH     1
+#define   DEV_IOCTL_TRANSLATE       2
+#define DEV_CACHE                   (1ULL << 37)
+#define DEV_SNOOP_DISABLE           (1ULL << 38)
+#define DEV_EXCL                    (1ULL << 39)
+
+/* Event codes and flags, as stored in the info field */
+#define EVENT_ILLEGAL_DEVTAB_ENTRY  (0x1U << 12)
+#define EVENT_IOPF                  (0x2U << 12)
+#define   EVENT_IOPF_I              (1U << 3)
+#define   EVENT_IOPF_PR             (1U << 4)
+#define   EVENT_IOPF_RW             (1U << 5)
+#define   EVENT_IOPF_PE             (1U << 6)
+#define   EVENT_IOPF_RZ             (1U << 7)
+#define   EVENT_IOPF_TR             (1U << 8)
+#define EVENT_DEV_TAB_HW_ERROR      (0x3U << 12)
+#define EVENT_PAGE_TAB_HW_ERROR     (0x4U << 12)
+#define EVENT_ILLEGAL_COMMAND_ERROR (0x5U << 12)
+#define EVENT_COMMAND_HW_ERROR      (0x6U << 12)
+#define EVENT_IOTLB_INV_TIMEOUT     (0x7U << 12)
+#define EVENT_INVALID_DEV_REQUEST   (0x8U << 12)
+
+#define EVENT_LEN                   16
+
+typedef struct AMDIOMMUState {
+    PCIDevice                   dev;
+
+    int                         capab_offset;
+    unsigned char               *capab;
+
+    int                         mmio_index;
+    target_phys_addr_t          mmio_addr;
+    unsigned char               *mmio_buf;
+    int                         mmio_enabled;
+
+    int                         enabled;
+    int                         ats_enabled;
+
+    target_phys_addr_t          devtab;
+    size_t                      devtab_len;
+
+    target_phys_addr_t          cmdbuf;
+    int                         cmdbuf_enabled;
+    size_t                      cmdbuf_len;
+    size_t                      cmdbuf_head;
+    size_t                      cmdbuf_tail;
+    int                         completion_wait_intr;
+
+    target_phys_addr_t          evtlog;
+    int                         evtlog_enabled;
+    int                         evtlog_intr;
+    target_phys_addr_t          evtlog_len;
+    target_phys_addr_t          evtlog_head;
+    target_phys_addr_t          evtlog_tail;
+
+    target_phys_addr_t          excl_base;
+    target_phys_addr_t          excl_limit;
+    int                         excl_enabled;
+    int                         excl_allow;
+} AMDIOMMUState;
+
+typedef struct AMDIOMMUEvent {
+    uint16_t    devfn;
+    uint16_t    reserved;
+    uint16_t    domid;
+    uint16_t    info;
+    uint64_t    addr;
+} __attribute__((packed)) AMDIOMMUEvent;
+
+static void amd_iommu_completion_wait(AMDIOMMUState *st,
+                                      uint8_t *cmd)
+{
+    uint64_t addr;
+
+    if (cmd[0] & 1) {
+        addr = le64_to_cpu(*(uint64_t *) cmd) & 0xFFFFFFFFFFFF8;
+        cpu_physical_memory_write(addr, cmd + 8, 8);
+    }
+
+    if (cmd[0] & 2)
+        st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_COMWAIT_INTR;
+}
+
+static void amd_iommu_invalidate_iotlb(AMDIOMMUState *st,
+                                       uint8_t *cmd)
+{
+    PCIDevice *dev;
+    PCIBus *bus = st->dev.bus;
+    int bus_num = pci_bus_num(bus);
+    int devfn = le16_to_cpu(*(uint16_t *) cmd);
+
+    dev = pci_find_device(bus, bus_num, PCI_SLOT(devfn), PCI_FUNC(devfn));
+    if (dev) {
+        pci_memory_invalidate_range(dev, 0, -1);
+    }
+}
+
+static void amd_iommu_cmdbuf_run(AMDIOMMUState *st)
+{
+    uint8_t cmd[16];
+    int type;
+
+    if (!st->cmdbuf_enabled) {
+        return;
+    }
+
+    /* Check if there's work to do. */
+    if (st->cmdbuf_head == st->cmdbuf_tail) {
+        return;
+    }
+
+    cpu_physical_memory_read(st->cmdbuf + st->cmdbuf_head, cmd, 16);
+    type = cmd[CMDBUF_ID_BYTE] >> CMDBUF_ID_RSHIFT;
+    switch (type) {
+        case CMD_COMPLETION_WAIT:
+            amd_iommu_completion_wait(st, cmd);
+            break;
+        case CMD_INVAL_DEVTAB_ENTRY:
+            break;
+        case CMD_INVAL_IOMMU_PAGES:
+            break;
+        case CMD_INVAL_IOTLB_PAGES:
+            amd_iommu_invalidate_iotlb(st, cmd);
+            break;
+        case CMD_INVAL_INTR_TABLE:
+            break;
+        default:
+            break;
+    }
+
+    /* Increment and wrap head pointer. */
+    st->cmdbuf_head += CMDBUF_ENTRY_SIZE;
+    if (st->cmdbuf_head >= st->cmdbuf_len * CMDBUF_ENTRY_SIZE) {
+        st->cmdbuf_head = 0;
+    }
+}
+
+static uint32_t amd_iommu_mmio_buf_read(AMDIOMMUState *st,
+                                        size_t offset,
+                                        size_t size)
+{
+    ssize_t i;
+    uint32_t ret;
+
+    if (!size) {
+        return 0;
+    }
+
+    ret = st->mmio_buf[offset + size - 1];
+    for (i = size - 2; i >= 0; i--) {
+        ret <<= 8;
+        ret |= st->mmio_buf[offset + i];
+    }
+
+    return ret;
+}
+
+static void amd_iommu_mmio_buf_write(AMDIOMMUState *st,
+                                     size_t offset,
+                                     size_t size,
+                                     uint32_t val)
+{
+    size_t i;
+
+    for (i = 0; i < size; i++) {
+        st->mmio_buf[offset + i] = val & 0xFF;
+        val >>= 8;
+    }
+}
+
+static void amd_iommu_update_mmio(AMDIOMMUState *st,
+                                  target_phys_addr_t addr)
+{
+    size_t reg = addr & ~0x07;
+    uint64_t *base = (uint64_t *) &st->mmio_buf[reg];
+    uint64_t val = le64_to_cpu(*base);
+
+    switch (reg) {
+        case MMIO_CONTROL:
+            st->enabled              = !!(val & MMIO_CONTROL_IOMMUEN);
+            st->ats_enabled          = !!(val & MMIO_CONTROL_HTTUNEN);
+            st->evtlog_enabled       = st->enabled &&
+                                       !!(val & MMIO_CONTROL_EVENTLOGEN);
+            st->evtlog_intr          = !!(val & MMIO_CONTROL_EVENTINTEN);
+            st->completion_wait_intr = !!(val & MMIO_CONTROL_COMWAITINTEN);
+            st->cmdbuf_enabled       = st->enabled &&
+                                       !!(val & MMIO_CONTROL_CMDBUFEN);
+
+            /* Update status flags depending on the control register. */
+            if (st->cmdbuf_enabled) {
+                st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_CMDBUF_RUN;
+            } else {
+                st->mmio_buf[MMIO_STATUS] &= ~MMIO_STATUS_CMDBUF_RUN;
+            }
+            if (st->evtlog_enabled) {
+                st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_RUN;
+            } else {
+                st->mmio_buf[MMIO_STATUS] &= ~MMIO_STATUS_EVTLOG_RUN;
+            }
+
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_DEVICE_TABLE:
+            st->devtab = (target_phys_addr_t) (val & MMIO_DEVTAB_BASE_MASK);
+            st->devtab_len = ((val & MMIO_DEVTAB_SIZE_MASK) + 1) *
+                             (MMIO_DEVTAB_SIZE_UNIT / MMIO_DEVTAB_ENTRY_SIZE);
+            break;
+        case MMIO_COMMAND_BASE:
+            st->cmdbuf = (target_phys_addr_t) (val & MMIO_CMDBUF_BASE_MASK);
+            st->cmdbuf_len = 1UL << (st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] &
+                                     MMIO_CMDBUF_SIZE_MASK);
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_COMMAND_HEAD:
+            st->cmdbuf_head = val & MMIO_CMDBUF_HEAD_MASK;
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_COMMAND_TAIL:
+            st->cmdbuf_tail = val & MMIO_CMDBUF_TAIL_MASK;
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_EVENT_BASE:
+            st->evtlog = (target_phys_addr_t) (val & MMIO_EVTLOG_BASE_MASK);
+            st->evtlog_len = 1UL << (st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] &
+                                     MMIO_EVTLOG_SIZE_MASK);
+            break;
+        case MMIO_EVENT_HEAD:
+            st->evtlog_head = val & MMIO_EVTLOG_HEAD_MASK;
+            break;
+        case MMIO_EVENT_TAIL:
+            st->evtlog_tail = val & MMIO_EVTLOG_TAIL_MASK;
+            break;
+        case MMIO_EXCL_BASE:
+            st->excl_base = (target_phys_addr_t) (val & MMIO_EXCL_BASE_MASK);
+            st->excl_enabled = val & MMIO_EXCL_ENABLED_MASK;
+            st->excl_allow = val & MMIO_EXCL_ALLOW_MASK;
+            break;
+        case MMIO_EXCL_LIMIT:
+            st->excl_limit = (target_phys_addr_t) ((val & MMIO_EXCL_LIMIT_MASK) |
+                                                   MMIO_EXCL_LIMIT_LOW);
+            break;
+        default:
+            break;
+    }
+}
+
+static uint32_t amd_iommu_mmio_readb(void *opaque, target_phys_addr_t addr)
+{
+    AMDIOMMUState *st = opaque;
+
+    return amd_iommu_mmio_buf_read(st, addr, 1);
+}
+
+static uint32_t amd_iommu_mmio_readw(void *opaque, target_phys_addr_t addr)
+{
+    AMDIOMMUState *st = opaque;
+
+    return amd_iommu_mmio_buf_read(st, addr, 2);
+}
+
+static uint32_t amd_iommu_mmio_readl(void *opaque, target_phys_addr_t addr)
+{
+    AMDIOMMUState *st = opaque;
+
+    return amd_iommu_mmio_buf_read(st, addr, 4);
+}
+
+static void amd_iommu_mmio_writeb(void *opaque,
+                                  target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    AMDIOMMUState *st = opaque;
+
+    amd_iommu_mmio_buf_write(st, addr, 1, val);
+    amd_iommu_update_mmio(st, addr);
+}
+
+static void amd_iommu_mmio_writew(void *opaque,
+                                  target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    AMDIOMMUState *st = opaque;
+
+    amd_iommu_mmio_buf_write(st, addr, 2, val);
+    amd_iommu_update_mmio(st, addr);
+}
+
+static void amd_iommu_mmio_writel(void *opaque,
+                                  target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    AMDIOMMUState *st = opaque;
+
+    amd_iommu_mmio_buf_write(st, addr, 4, val);
+    amd_iommu_update_mmio(st, addr);
+}
+
+static CPUReadMemoryFunc * const amd_iommu_mmio_read[] = {
+    amd_iommu_mmio_readb,
+    amd_iommu_mmio_readw,
+    amd_iommu_mmio_readl,
+};
+
+static CPUWriteMemoryFunc * const amd_iommu_mmio_write[] = {
+    amd_iommu_mmio_writeb,
+    amd_iommu_mmio_writew,
+    amd_iommu_mmio_writel,
+};
+
+static void amd_iommu_init_mmio(AMDIOMMUState *st)
+{
+    st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] = MMIO_CMDBUF_DEFAULT_SIZE;
+    st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] = MMIO_EVTLOG_DEFAULT_SIZE;
+}
+
+static void amd_iommu_enable_mmio(AMDIOMMUState *st)
+{
+    target_phys_addr_t addr;
+
+    st->mmio_index = cpu_register_io_memory(amd_iommu_mmio_read,
+                                            amd_iommu_mmio_write, st);
+    if (st->mmio_index < 0) {
+        return;
+    }
+
+    addr = le64_to_cpu(*(uint64_t *) &st->capab[CAPAB_BAR_LOW]) & CAPAB_BAR_MASK;
+    cpu_register_physical_memory(addr, MMIO_SIZE, st->mmio_index);
+
+    st->mmio_addr = addr;
+    st->mmio_buf = qemu_mallocz(MMIO_SIZE);
+    st->mmio_enabled = 1;
+    amd_iommu_init_mmio(st);
+}
+
+static uint32_t amd_iommu_read_capab(PCIDevice *pci_dev,
+                                     uint32_t addr, int len)
+{
+    return pci_default_cap_read_config(pci_dev, addr, len);
+}
+
+static void amd_iommu_write_capab(PCIDevice *dev,
+                                  uint32_t addr, uint32_t val, int len)
+{
+    AMDIOMMUState *st;
+    unsigned char *capab;
+    int reg;
+
+    st = DO_UPCAST(AMDIOMMUState, dev, dev);
+    capab = st->capab;
+    reg = (addr - st->capab_offset) & ~0x3;  /* Get the 32-bit register. */
+
+    switch (reg) {
+        case CAPAB_HEADER:
+        case CAPAB_MISC:
+            /* Read-only. */
+            return;
+        case CAPAB_BAR_LOW:
+        case CAPAB_BAR_HIGH:
+        case CAPAB_RANGE:
+            if (st->mmio_enabled)
+                return;
+            pci_default_cap_write_config(dev, addr, val, len);
+            break;
+        default:
+            return;
+    }
+
+    if (capab[CAPAB_BAR_LOW] & 0x1) {
+        amd_iommu_enable_mmio(st);
+    }
+}
+
+static int amd_iommu_init_capab(PCIDevice *dev)
+{
+    AMDIOMMUState *st;
+    unsigned char *capab;
+
+    st = DO_UPCAST(AMDIOMMUState, dev, dev);
+    capab = st->dev.config + st->capab_offset;
+
+    capab[CAPAB_REV_TYPE]  = CAPAB_INIT_REV_TYPE;
+    capab[CAPAB_FLAGS]     = CAPAB_INIT_FLAGS;
+    capab[CAPAB_BAR_LOW]   = 0;
+    capab[CAPAB_BAR_HIGH]  = 0;
+    capab[CAPAB_RANGE]     = 0;
+    *((uint32_t *) &capab[CAPAB_MISC]) = cpu_to_le32(CAPAB_INIT_MISC);
+
+    st->capab = capab;
+    st->dev.cap.length = CAPAB_SIZE;
+
+    return 0;
+}
+
+static void amd_iommu_log_event(AMDIOMMUState *st, AMDIOMMUEvent *evt)
+{
+    if (!st->evtlog_enabled ||
+        (st->mmio_buf[MMIO_STATUS] & MMIO_STATUS_EVTLOG_OF)) {
+        return;
+    }
+
+    if (st->evtlog_tail >= st->evtlog_len) {
+        st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_OF;
+    }
+
+    cpu_physical_memory_write(st->evtlog + st->evtlog_tail,
+                              (uint8_t *) evt, EVENT_LEN);
+
+    st->evtlog_tail += EVENT_LEN;
+    st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_INTR;
+}
+
+static void amd_iommu_page_fault(AMDIOMMUState *st,
+                                 int devfn,
+                                 unsigned domid,
+                                 target_phys_addr_t addr,
+                                 int present,
+                                 int is_write)
+{
+    AMDIOMMUEvent evt;
+    unsigned info;
+
+    evt.devfn = cpu_to_le16(devfn);
+    evt.reserved = 0;
+    evt.domid = cpu_to_le16(domid);
+    evt.addr = cpu_to_le64(addr);
+
+    info = EVENT_IOPF;
+    if (present) {
+        info |= EVENT_IOPF_PR;
+    }
+    if (is_write) {
+        info |= EVENT_IOPF_RW;
+    }
+    evt.info = cpu_to_le16(info);
+
+    amd_iommu_log_event(st, &evt);
+}
+
+static inline uint64_t amd_iommu_get_perms(uint64_t entry)
+{
+    return (entry & (DEV_PERM_READ | DEV_PERM_WRITE)) >> DEV_PERM_SHIFT;
+}
+
+static int amd_iommu_translate(PCIDevice *iommu,
+                               PCIDevice *dev,
+                               pcibus_t addr,
+                               target_phys_addr_t *paddr,
+                               target_phys_addr_t *len,
+                               unsigned perms)
+{
+    int devfn, present;
+    target_phys_addr_t entry_addr, pte_addr;
+    uint64_t entry[4], pte, page_offset, pte_perms;
+    unsigned level, domid;
+    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, iommu);
+
+    if (!st->enabled) {
+        goto no_translation;
+    }
+
+    /* Get device table entry. */
+    devfn = dev->devfn;
+    entry_addr = st->devtab + devfn * DEVTAB_ENTRY_SIZE;
+    cpu_physical_memory_read(entry_addr, (uint8_t *) entry, 32);
+
+    pte = le64_to_cpu(entry[0]);
+    if (!(pte & DEV_VALID) || !(pte & DEV_TRANSLATION_VALID)) {
+        goto no_translation;
+    }
+    domid = le64_to_cpu(entry[1]) & DEV_DOMAIN_ID_MASK;
+    level = (pte >> DEV_MODE_RSHIFT) & DEV_MODE_MASK;
+    while (level > 0) {
+        /*
+         * Check permissions: the bitwise
+         * implication perms -> pte_perms must be true.
+         */
+        pte_perms = amd_iommu_get_perms(pte);
+        present = pte & 1;
+        if (!present || perms != (perms & pte_perms)) {
+            amd_iommu_page_fault(st, devfn, domid, addr,
+                                 present, !!(perms & IOMMU_PERM_WRITE));
+            return -EPERM;
+        }
+
+        /* Go to the next lower level. */
+        pte_addr = pte & DEV_PT_ROOT_MASK;
+        pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;
+        pte = ldq_phys(pte_addr);
+        level = (pte >> DEV_MODE_RSHIFT) & DEV_MODE_MASK;
+    }
+    page_offset = addr & 4095;
+    *paddr = (pte & DEV_PT_ROOT_MASK) + page_offset;
+    *len = 4096 - page_offset;
+
+    return 0;
+
+no_translation:
+    *paddr = addr;
+    *len = -1;
+    return 0;
+}
+
+static int amd_iommu_pci_initfn(PCIDevice *dev)
+{
+    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, dev);
+    int err;
+
+    pci_config_set_vendor_id(st->dev.config, PCI_VENDOR_ID_AMD);
+    pci_config_set_device_id(st->dev.config, PCI_DEVICE_ID_AMD_IOMMU);
+    pci_config_set_class(st->dev.config, PCI_CLASS_SYSTEM_IOMMU);
+
+    st->capab_offset = pci_add_capability(&st->dev,
+                                          PCI_CAP_ID_SEC,
+                                          CAPAB_SIZE);
+    err = pci_enable_capability_support(&st->dev, st->capab_offset,
+                                        amd_iommu_read_capab,
+                                        amd_iommu_write_capab,
+                                        amd_iommu_init_capab);
+    if (err) {
+        return err;
+    }
+
+    pci_register_iommu(dev, amd_iommu_translate);
+
+    return 0;
+}
+
+static const VMStateDescription vmstate_amd_iommu = {
+    .name                       = "amd-iommu",
+    .version_id                 = 1,
+    .minimum_version_id         = 1,
+    .minimum_version_id_old     = 1,
+    .fields                     = (VMStateField []) {
+        VMSTATE_PCI_DEVICE(dev, AMDIOMMUState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static PCIDeviceInfo amd_iommu_pci_info = {
+    .qdev.name    = "amd-iommu",
+    .qdev.desc    = "AMD IOMMU",
+    .qdev.size    = sizeof(AMDIOMMUState),
+    .qdev.vmsd    = &vmstate_amd_iommu,
+    .init         = amd_iommu_pci_initfn,
+};
+
+static void amd_iommu_register(void)
+{
+    pci_qdev_register(&amd_iommu_pci_info);
+}
+
+device_init(amd_iommu_register);
+
diff --git a/hw/pc.c b/hw/pc.c
index 186e322..4616318 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1066,6 +1066,8 @@ void pc_pci_device_init(PCIBus *pci_bus)
     int max_bus;
     int bus;
 
+    pci_create_simple(pci_bus, -1, "amd-iommu");
+
     max_bus = drive_get_max_bus(IF_SCSI);
     for (bus = 0; bus <= max_bus; bus++) {
         pci_create_simple(pci_bus, -1, "lsi53c895a");
diff --git a/hw/pci_ids.h b/hw/pci_ids.h
index 39e9f1d..d790312 100644
--- a/hw/pci_ids.h
+++ b/hw/pci_ids.h
@@ -26,6 +26,7 @@
 
 #define PCI_CLASS_MEMORY_RAM             0x0500
 
+#define PCI_CLASS_SYSTEM_IOMMU           0x0806
 #define PCI_CLASS_SYSTEM_OTHER           0x0880
 
 #define PCI_CLASS_SERIAL_USB             0x0c03
@@ -56,6 +57,7 @@
 
 #define PCI_VENDOR_ID_AMD                0x1022
 #define PCI_DEVICE_ID_AMD_LANCE          0x2000
+#define PCI_DEVICE_ID_AMD_IOMMU          0x0000     /* FIXME */
 
 #define PCI_VENDOR_ID_MOTOROLA           0x1057
 #define PCI_DEVICE_ID_MOTOROLA_MPC106    0x0002
diff --git a/hw/pci_regs.h b/hw/pci_regs.h
index 1c675dc..0fb942b 100644
--- a/hw/pci_regs.h
+++ b/hw/pci_regs.h
@@ -216,6 +216,7 @@
 #define  PCI_CAP_ID_SHPC 	0x0C	/* PCI Standard Hot-Plug Controller */
 #define  PCI_CAP_ID_SSVID	0x0D	/* Bridge subsystem vendor/device ID */
 #define  PCI_CAP_ID_AGP3	0x0E	/* AGP Target PCI-PCI bridge */
+#define  PCI_CAP_ID_SEC		0x0F	/* Secure Device (AMD IOMMU) */
 #define  PCI_CAP_ID_EXP 	0x10	/* PCI Express */
 #define  PCI_CAP_ID_MSIX	0x11	/* MSI-X */
 #define  PCI_CAP_ID_AF		0x13	/* PCI Advanced Features */
-- 
1.7.1
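
As a reading aid for the page table walk in amd_iommu_translate(), the index
arithmetic and the permission test can be sketched on their own. This is a
minimal illustration, not the patch's code: the sketch_* names and the
SKETCH_PERM_* values are hypothetical (the patch uses IOMMU_PERM_* from
pci.h, whose values are not shown here), and no guest memory is touched.

```c
#include <stdint.h>

/* Hypothetical permission bits; the real ones are IOMMU_PERM_* in pci.h. */
#define SKETCH_PERM_READ   (1u << 0)
#define SKETCH_PERM_WRITE  (1u << 1)

/*
 * Index into the table at the given level (1 is the lowest level).
 * Each level resolves 9 address bits above the 12-bit page offset.
 */
static unsigned sketch_pt_index(uint64_t addr, unsigned level)
{
    return (unsigned) ((addr >> (3 + 9 * level)) & 0x1FF);
}

/* Byte offset of that 8-byte entry inside its 4 KiB table. */
static uint64_t sketch_pte_offset(uint64_t addr, unsigned level)
{
    return (uint64_t) sketch_pt_index(addr, level) << 3;
}

/* Every requested permission bit must also be granted by the entry. */
static int sketch_perms_ok(unsigned requested, unsigned granted)
{
    return (requested & granted) == requested;
}
```

With level 1 the shift is 12, so the index comes from address bits 12..20;
level 2 uses bits 21..29, and so on. The permission helper is the "bitwise
implication" that the comment in the walk loop refers to.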


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH 3/7] AMD IOMMU emulation
@ 2010-08-15 19:27   ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 42+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-08-15 19:27 UTC (permalink / raw)
  To: joro; +Cc: kvm, qemu-devel, blauwirbel, paul, Eduard - Gabriel Munteanu, avi

This introduces emulation for the AMD IOMMU, described in "AMD I/O
Virtualization Technology (IOMMU) Specification".

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 Makefile.target |    2 +
 hw/amd_iommu.c  |  688 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/pc.c         |    2 +
 hw/pci_ids.h    |    2 +
 hw/pci_regs.h   |    1 +
 5 files changed, 695 insertions(+), 0 deletions(-)
 create mode 100644 hw/amd_iommu.c

diff --git a/Makefile.target b/Makefile.target
index 70a9c1b..6b80a37 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -219,6 +219,8 @@ obj-i386-y += pcspk.o i8254.o
 obj-i386-$(CONFIG_KVM_PIT) += i8254-kvm.o
 obj-i386-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += device-assignment.o
 
+obj-i386-y += amd_iommu.o
+
 # Hardware support
 obj-ia64-y += ide.o pckbd.o vga.o $(SOUND_HW) dma.o $(AUDIODRV)
 obj-ia64-y += fdc.o mc146818rtc.o serial.o i8259.o ipf.o
diff --git a/hw/amd_iommu.c b/hw/amd_iommu.c
new file mode 100644
index 0000000..2e20888
--- /dev/null
+++ b/hw/amd_iommu.c
@@ -0,0 +1,688 @@
+/*
+ * AMD IOMMU emulation
+ *
+ * Copyright (c) 2010 Eduard - Gabriel Munteanu
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "pc.h"
+#include "hw.h"
+#include "pci.h"
+#include "qlist.h"
+
+/* Capability registers */
+#define CAPAB_HEADER            0x00
+#define   CAPAB_REV_TYPE        0x02
+#define   CAPAB_FLAGS           0x03
+#define CAPAB_BAR_LOW           0x04
+#define CAPAB_BAR_HIGH          0x08
+#define CAPAB_RANGE             0x0C
+#define CAPAB_MISC              0x10
+
+#define CAPAB_SIZE              0x14
+
+/* Capability header data */
+#define CAPAB_FLAG_IOTLBSUP     (1 << 0)
+#define CAPAB_FLAG_HTTUNNEL     (1 << 1)
+#define CAPAB_FLAG_NPCACHE      (1 << 2)
+#define CAPAB_INIT_REV          (1 << 3)
+#define CAPAB_INIT_TYPE         3
+#define CAPAB_INIT_REV_TYPE     (CAPAB_INIT_REV | CAPAB_INIT_TYPE)
+#define CAPAB_INIT_FLAGS        (CAPAB_FLAG_NPCACHE | CAPAB_FLAG_HTTUNNEL)
+#define CAPAB_INIT_MISC         ((64 << 15) | (48 << 8))
+#define CAPAB_BAR_MASK          ~((1UL << 14) - 1)
+
+/* MMIO registers */
+#define MMIO_DEVICE_TABLE       0x0000
+#define MMIO_COMMAND_BASE       0x0008
+#define MMIO_EVENT_BASE         0x0010
+#define MMIO_CONTROL            0x0018
+#define MMIO_EXCL_BASE          0x0020
+#define MMIO_EXCL_LIMIT         0x0028
+#define MMIO_COMMAND_HEAD       0x2000
+#define MMIO_COMMAND_TAIL       0x2008
+#define MMIO_EVENT_HEAD         0x2010
+#define MMIO_EVENT_TAIL         0x2018
+#define MMIO_STATUS             0x2020
+
+#define MMIO_SIZE               0x4000
+
+#define MMIO_DEVTAB_SIZE_MASK   ((1ULL << 12) - 1)
+#define MMIO_DEVTAB_BASE_MASK   (((1ULL << 52) - 1) & ~MMIO_DEVTAB_SIZE_MASK)
+#define MMIO_DEVTAB_ENTRY_SIZE  32
+#define MMIO_DEVTAB_SIZE_UNIT   4096
+
+#define MMIO_CMDBUF_SIZE_BYTE       (MMIO_COMMAND_BASE + 7)
+#define MMIO_CMDBUF_SIZE_MASK       0x0F
+#define MMIO_CMDBUF_BASE_MASK       MMIO_DEVTAB_BASE_MASK
+#define MMIO_CMDBUF_DEFAULT_SIZE    8
+#define MMIO_CMDBUF_HEAD_MASK       (((1ULL << 19) - 1) & ~0x0F)
+#define MMIO_CMDBUF_TAIL_MASK       MMIO_CMDBUF_HEAD_MASK
+
+#define MMIO_EVTLOG_SIZE_BYTE       (MMIO_EVENT_BASE + 7)
+#define MMIO_EVTLOG_SIZE_MASK       MMIO_CMDBUF_SIZE_MASK
+#define MMIO_EVTLOG_BASE_MASK       MMIO_CMDBUF_BASE_MASK
+#define MMIO_EVTLOG_DEFAULT_SIZE    MMIO_CMDBUF_DEFAULT_SIZE
+#define MMIO_EVTLOG_HEAD_MASK       (((1ULL << 19) - 1) & ~0x0F)
+#define MMIO_EVTLOG_TAIL_MASK       MMIO_EVTLOG_HEAD_MASK
+
+#define MMIO_EXCL_BASE_MASK         MMIO_DEVTAB_BASE_MASK
+#define MMIO_EXCL_ENABLED_MASK      (1ULL << 0)
+#define MMIO_EXCL_ALLOW_MASK        (1ULL << 1)
+#define MMIO_EXCL_LIMIT_MASK        MMIO_DEVTAB_BASE_MASK
+#define MMIO_EXCL_LIMIT_LOW         0xFFF
+
+#define MMIO_CONTROL_IOMMUEN        (1ULL << 0)
+#define MMIO_CONTROL_HTTUNEN        (1ULL << 1)
+#define MMIO_CONTROL_EVENTLOGEN     (1ULL << 2)
+#define MMIO_CONTROL_EVENTINTEN     (1ULL << 3)
+#define MMIO_CONTROL_COMWAITINTEN   (1ULL << 4)
+#define MMIO_CONTROL_CMDBUFEN       (1ULL << 12)
+
+#define MMIO_STATUS_EVTLOG_OF       (1ULL << 0)
+#define MMIO_STATUS_EVTLOG_INTR     (1ULL << 1)
+#define MMIO_STATUS_COMWAIT_INTR    (1ULL << 2)
+#define MMIO_STATUS_EVTLOG_RUN      (1ULL << 3)
+#define MMIO_STATUS_CMDBUF_RUN      (1ULL << 4)
+
+#define CMDBUF_ID_BYTE              0x07
+#define CMDBUF_ID_RSHIFT            4
+#define CMDBUF_ENTRY_SIZE           0x10
+
+#define CMD_COMPLETION_WAIT         0x01
+#define CMD_INVAL_DEVTAB_ENTRY      0x02
+#define CMD_INVAL_IOMMU_PAGES       0x03
+#define CMD_INVAL_IOTLB_PAGES       0x04
+#define CMD_INVAL_INTR_TABLE        0x05
+
+#define DEVTAB_ENTRY_SIZE           32
+
+/* Device table entry bits 0:63 */
+#define DEV_VALID                   (1ULL << 0)
+#define DEV_TRANSLATION_VALID       (1ULL << 1)
+#define DEV_MODE_MASK               0x7
+#define DEV_MODE_RSHIFT             9
+#define DEV_PT_ROOT_MASK            0xFFFFFFFFFF000
+#define DEV_PT_ROOT_RSHIFT          12
+#define DEV_PERM_SHIFT              61
+#define DEV_PERM_READ               (1ULL << 61)
+#define DEV_PERM_WRITE              (1ULL << 62)
+
+/* Device table entry bits 64:127 */
+#define DEV_DOMAIN_ID_MASK          ((1ULL << 16) - 1)
+#define DEV_IOTLB_SUPPORT           (1ULL << 17)
+#define DEV_SUPPRESS_PF             (1ULL << 18)
+#define DEV_SUPPRESS_ALL_PF         (1ULL << 19)
+#define DEV_IOCTL_MASK              0x3
+#define DEV_IOCTL_RSHIFT            20
+#define   DEV_IOCTL_DENY            0
+#define   DEV_IOCTL_PASSTHROUGH     1
+#define   DEV_IOCTL_TRANSLATE       2
+#define DEV_CACHE                   (1ULL << 37)
+#define DEV_SNOOP_DISABLE           (1ULL << 38)
+#define DEV_EXCL                    (1ULL << 39)
+
+/* Event codes and flags, as stored in the info field */
+#define EVENT_ILLEGAL_DEVTAB_ENTRY  (0x1U << 24)
+#define EVENT_IOPF                  (0x2U << 24)
+#define   EVENT_IOPF_I              (1U << 3)
+#define   EVENT_IOPF_PR             (1U << 4)
+#define   EVENT_IOPF_RW             (1U << 5)
+#define   EVENT_IOPF_PE             (1U << 6)
+#define   EVENT_IOPF_RZ             (1U << 7)
+#define   EVENT_IOPF_TR             (1U << 8)
+#define EVENT_DEV_TAB_HW_ERROR      (0x3U << 24)
+#define EVENT_PAGE_TAB_HW_ERROR     (0x4U << 24)
+#define EVENT_ILLEGAL_COMMAND_ERROR (0x5U << 24)
+#define EVENT_COMMAND_HW_ERROR      (0x6U << 24)
+#define EVENT_IOTLB_INV_TIMEOUT     (0x7U << 24)
+#define EVENT_INVALID_DEV_REQUEST   (0x8U << 24)
+
+#define EVENT_LEN                   16
+
+typedef struct AMDIOMMUState {
+    PCIDevice                   dev;
+
+    int                         capab_offset;
+    unsigned char               *capab;
+
+    int                         mmio_index;
+    target_phys_addr_t          mmio_addr;
+    unsigned char               *mmio_buf;
+    int                         mmio_enabled;
+
+    int                         enabled;
+    int                         ats_enabled;
+
+    target_phys_addr_t          devtab;
+    size_t                      devtab_len;
+
+    target_phys_addr_t          cmdbuf;
+    int                         cmdbuf_enabled;
+    size_t                      cmdbuf_len;
+    size_t                      cmdbuf_head;
+    size_t                      cmdbuf_tail;
+    int                         completion_wait_intr;
+
+    target_phys_addr_t          evtlog;
+    int                         evtlog_enabled;
+    int                         evtlog_intr;
+    target_phys_addr_t          evtlog_len;
+    target_phys_addr_t          evtlog_head;
+    target_phys_addr_t          evtlog_tail;
+
+    target_phys_addr_t          excl_base;
+    target_phys_addr_t          excl_limit;
+    int                         excl_enabled;
+    int                         excl_allow;
+} AMDIOMMUState;
+
+typedef struct AMDIOMMUEvent {
+    uint16_t    devfn;
+    uint16_t    reserved;
+    uint16_t    domid;
+    uint16_t    info;
+    uint64_t    addr;
+} __attribute__((packed)) AMDIOMMUEvent;
+
+static void amd_iommu_completion_wait(AMDIOMMUState *st,
+                                      uint8_t *cmd)
+{
+    uint64_t addr;
+
+    if (cmd[0] & 1) {
+        addr = le64_to_cpu(*(uint64_t *) cmd) & 0xFFFFFFFFFFFF8;
+        cpu_physical_memory_write(addr, cmd + 8, 8);
+    }
+
+    if (cmd[0] & 2)
+        st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_COMWAIT_INTR;
+}
+
+static void amd_iommu_invalidate_iotlb(AMDIOMMUState *st,
+                                       uint8_t *cmd)
+{
+    PCIDevice *dev;
+    PCIBus *bus = st->dev.bus;
+    int bus_num = pci_bus_num(bus);
+    int devfn = le16_to_cpu(*(uint16_t *) cmd);
+
+    dev = pci_find_device(bus, bus_num, PCI_SLOT(devfn), PCI_FUNC(devfn));
+    if (dev) {
+        pci_memory_invalidate_range(dev, 0, -1);
+    }
+}
+
+static void amd_iommu_cmdbuf_run(AMDIOMMUState *st)
+{
+    uint8_t cmd[16];
+    int type;
+
+    if (!st->cmdbuf_enabled) {
+        return;
+    }
+
+    /* Check if there's work to do. */
+    if (st->cmdbuf_head == st->cmdbuf_tail) {
+        return;
+    }
+
+    cpu_physical_memory_read(st->cmdbuf + st->cmdbuf_head, cmd, 16);
+    type = cmd[CMDBUF_ID_BYTE] >> CMDBUF_ID_RSHIFT;
+    switch (type) {
+        case CMD_COMPLETION_WAIT:
+            amd_iommu_completion_wait(st, cmd);
+            break;
+        case CMD_INVAL_DEVTAB_ENTRY:
+            break;
+        case CMD_INVAL_IOMMU_PAGES:
+            break;
+        case CMD_INVAL_IOTLB_PAGES:
+            amd_iommu_invalidate_iotlb(st, cmd);
+            break;
+        case CMD_INVAL_INTR_TABLE:
+            break;
+        default:
+            break;
+    }
+
+    /* Increment and wrap head pointer. */
+    st->cmdbuf_head += CMDBUF_ENTRY_SIZE;
+    if (st->cmdbuf_head >= st->cmdbuf_len * CMDBUF_ENTRY_SIZE) {
+        st->cmdbuf_head = 0;
+    }
+}
+
+static uint32_t amd_iommu_mmio_buf_read(AMDIOMMUState *st,
+                                        size_t offset,
+                                        size_t size)
+{
+    ssize_t i;
+    uint32_t ret;
+
+    if (!size) {
+        return 0;
+    }
+
+    ret = st->mmio_buf[offset + size - 1];
+    for (i = size - 2; i >= 0; i--) {
+        ret <<= 8;
+        ret |= st->mmio_buf[offset + i];
+    }
+
+    return ret;
+}
+
+static void amd_iommu_mmio_buf_write(AMDIOMMUState *st,
+                                     size_t offset,
+                                     size_t size,
+                                     uint32_t val)
+{
+    size_t i;
+
+    for (i = 0; i < size; i++) {
+        st->mmio_buf[offset + i] = val & 0xFF;
+        val >>= 8;
+    }
+}
+
+static void amd_iommu_update_mmio(AMDIOMMUState *st,
+                                  target_phys_addr_t addr)
+{
+    size_t reg = addr & ~0x07;
+    uint64_t *base = (uint64_t *) &st->mmio_buf[reg];
+    uint64_t val = *base;
+
+    switch (reg) {
+        case MMIO_CONTROL:
+            st->enabled              = !!(val & MMIO_CONTROL_IOMMUEN);
+            st->ats_enabled          = !!(val & MMIO_CONTROL_HTTUNEN);
+            st->evtlog_enabled       = st->enabled &&
+                                       !!(val & MMIO_CONTROL_EVENTLOGEN);
+            st->evtlog_intr          = !!(val & MMIO_CONTROL_EVENTINTEN);
+            st->completion_wait_intr = !!(val & MMIO_CONTROL_COMWAITINTEN);
+            st->cmdbuf_enabled       = st->enabled &&
+                                       !!(val & MMIO_CONTROL_CMDBUFEN);
+
+            /* Update status flags depending on the control register. */
+            if (st->cmdbuf_enabled) {
+                st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_CMDBUF_RUN;
+            } else {
+                st->mmio_buf[MMIO_STATUS] &= ~MMIO_STATUS_CMDBUF_RUN;
+            }
+            if (st->evtlog_enabled) {
+                st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_RUN;
+            } else {
+                st->mmio_buf[MMIO_STATUS] &= ~MMIO_STATUS_EVTLOG_RUN;
+            }
+
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_DEVICE_TABLE:
+            st->devtab = (target_phys_addr_t) (val & MMIO_DEVTAB_BASE_MASK);
+            st->devtab_len = ((val & MMIO_DEVTAB_SIZE_MASK) + 1) *
+                             (MMIO_DEVTAB_SIZE_UNIT / MMIO_DEVTAB_ENTRY_SIZE);
+            break;
+        case MMIO_COMMAND_BASE:
+            st->cmdbuf = (target_phys_addr_t) (val & MMIO_CMDBUF_BASE_MASK);
+            st->cmdbuf_len = 1UL << (st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] &
+                                     MMIO_CMDBUF_SIZE_MASK);
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_COMMAND_HEAD:
+            st->cmdbuf_head = val & MMIO_CMDBUF_HEAD_MASK;
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_COMMAND_TAIL:
+            st->cmdbuf_tail = val & MMIO_CMDBUF_TAIL_MASK;
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_EVENT_BASE:
+            st->evtlog = (target_phys_addr_t) (val & MMIO_EVTLOG_BASE_MASK);
+            st->evtlog_len = 1UL << (st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] &
+                                     MMIO_EVTLOG_SIZE_MASK);
+            break;
+        case MMIO_EVENT_HEAD:
+            st->evtlog_head = val & MMIO_EVTLOG_HEAD_MASK;
+            break;
+        case MMIO_EVENT_TAIL:
+            st->evtlog_tail = val & MMIO_EVTLOG_TAIL_MASK;
+            break;
+        case MMIO_EXCL_BASE:
+            st->excl_base = (target_phys_addr_t) (val & MMIO_EXCL_BASE_MASK);
+            st->excl_enabled = val & MMIO_EXCL_ENABLED_MASK;
+            st->excl_allow = val & MMIO_EXCL_ALLOW_MASK;
+            break;
+        case MMIO_EXCL_LIMIT:
+            st->excl_limit = (target_phys_addr_t) ((val & MMIO_EXCL_LIMIT_MASK) |
+                                                   MMIO_EXCL_LIMIT_LOW);
+            break;
+        default:
+            break;
+    }
+}
+
+static uint32_t amd_iommu_mmio_readb(void *opaque, target_phys_addr_t addr)
+{
+    AMDIOMMUState *st = opaque;
+
+    return amd_iommu_mmio_buf_read(st, addr, 1);
+}
+
+static uint32_t amd_iommu_mmio_readw(void *opaque, target_phys_addr_t addr)
+{
+    AMDIOMMUState *st = opaque;
+
+    return amd_iommu_mmio_buf_read(st, addr, 2);
+}
+
+static uint32_t amd_iommu_mmio_readl(void *opaque, target_phys_addr_t addr)
+{
+    AMDIOMMUState *st = opaque;
+
+    return amd_iommu_mmio_buf_read(st, addr, 4);
+}
+
+static void amd_iommu_mmio_writeb(void *opaque,
+                                  target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    AMDIOMMUState *st = opaque;
+
+    amd_iommu_mmio_buf_write(st, addr, 1, val);
+    amd_iommu_update_mmio(st, addr);
+}
+
+static void amd_iommu_mmio_writew(void *opaque,
+                                  target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    AMDIOMMUState *st = opaque;
+
+    amd_iommu_mmio_buf_write(st, addr, 2, val);
+    amd_iommu_update_mmio(st, addr);
+}
+
+static void amd_iommu_mmio_writel(void *opaque,
+                                  target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    AMDIOMMUState *st = opaque;
+
+    amd_iommu_mmio_buf_write(st, addr, 4, val);
+    amd_iommu_update_mmio(st, addr);
+}
+
+static CPUReadMemoryFunc * const amd_iommu_mmio_read[] = {
+    amd_iommu_mmio_readb,
+    amd_iommu_mmio_readw,
+    amd_iommu_mmio_readl,
+};
+
+static CPUWriteMemoryFunc * const amd_iommu_mmio_write[] = {
+    amd_iommu_mmio_writeb,
+    amd_iommu_mmio_writew,
+    amd_iommu_mmio_writel,
+};
+
+static void amd_iommu_init_mmio(AMDIOMMUState *st)
+{
+    st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] = MMIO_CMDBUF_DEFAULT_SIZE;
+    st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] = MMIO_EVTLOG_DEFAULT_SIZE;
+}
+
+static void amd_iommu_enable_mmio(AMDIOMMUState *st)
+{
+    target_phys_addr_t addr;
+
+    st->mmio_index = cpu_register_io_memory(amd_iommu_mmio_read,
+                                            amd_iommu_mmio_write, st);
+    if (st->mmio_index < 0) {
+        return;
+    }
+
+    addr = le64_to_cpu(*(uint64_t *) &st->capab[CAPAB_BAR_LOW]) & CAPAB_BAR_MASK;
+    cpu_register_physical_memory(addr, MMIO_SIZE, st->mmio_index);
+
+    st->mmio_addr = addr;
+    st->mmio_buf = qemu_mallocz(MMIO_SIZE);
+    st->mmio_enabled = 1;
+    amd_iommu_init_mmio(st);
+}
+
+static uint32_t amd_iommu_read_capab(PCIDevice *pci_dev,
+                                     uint32_t addr, int len)
+{
+    return pci_default_cap_read_config(pci_dev, addr, len);
+}
+
+static void amd_iommu_write_capab(PCIDevice *dev,
+                                  uint32_t addr, uint32_t val, int len)
+{
+    AMDIOMMUState *st;
+    unsigned char *capab;
+    int reg;
+
+    st = DO_UPCAST(AMDIOMMUState, dev, dev);
+    capab = st->capab;
+    reg = (addr - 0x40) & ~0x3;  /* Get the 32-bit register. */
+
+    switch (reg) {
+        case CAPAB_HEADER:
+        case CAPAB_MISC:
+            /* Read-only. */
+            return;
+        case CAPAB_BAR_LOW:
+        case CAPAB_BAR_HIGH:
+        case CAPAB_RANGE:
+            if (st->mmio_enabled)
+                return;
+            pci_default_cap_write_config(dev, addr, val, len);
+            break;
+        default:
+            return;
+    }
+
+    if (capab[CAPAB_BAR_LOW] & 0x1) {
+        amd_iommu_enable_mmio(st);
+    }
+}
+
+static int amd_iommu_init_capab(PCIDevice *dev)
+{
+    AMDIOMMUState *st;
+    unsigned char *capab;
+
+    st = DO_UPCAST(AMDIOMMUState, dev, dev);
+    capab = st->dev.config + st->capab_offset;
+
+    capab[CAPAB_REV_TYPE]  = CAPAB_REV_TYPE;
+    capab[CAPAB_FLAGS]     = CAPAB_FLAGS;
+    capab[CAPAB_BAR_LOW]   = 0;
+    capab[CAPAB_BAR_HIGH]  = 0;
+    capab[CAPAB_RANGE]     = 0;
+    *((uint32_t *) &capab[CAPAB_MISC]) = cpu_to_le32(CAPAB_INIT_MISC);
+
+    st->capab = capab;
+    st->dev.cap.length = CAPAB_SIZE;
+
+    return 0;
+}
+
+static void amd_iommu_log_event(AMDIOMMUState *st, AMDIOMMUEvent *evt)
+{
+    if (!st->evtlog_enabled ||
+        (st->mmio_buf[MMIO_STATUS] & MMIO_STATUS_EVTLOG_OF)) {
+        return;
+    }
+
+    if (st->evtlog_tail >= st->evtlog_len * EVENT_LEN) {
+        st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_OF;
+    }
+
+    cpu_physical_memory_write(st->evtlog + st->evtlog_tail,
+                              (uint8_t *) evt, EVENT_LEN);
+
+    st->evtlog_tail += EVENT_LEN;
+    st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_INTR;
+}
+
+static void amd_iommu_page_fault(AMDIOMMUState *st,
+                                 int devfn,
+                                 unsigned domid,
+                                 target_phys_addr_t addr,
+                                 int present,
+                                 int is_write)
+{
+    AMDIOMMUEvent evt;
+    unsigned info;
+
+    evt.devfn = cpu_to_le16(devfn);
+    evt.reserved = 0;
+    evt.domid = cpu_to_le16(domid);
+    evt.addr = cpu_to_le64(addr);
+
+    info = EVENT_IOPF;
+    if (present) {
+        info |= EVENT_IOPF_PR;
+    }
+    if (is_write) {
+        info |= EVENT_IOPF_RW;
+    }
+    evt.info = cpu_to_le16(info);
+
+    amd_iommu_log_event(st, &evt);
+}
+
+static inline uint64_t amd_iommu_get_perms(uint64_t entry)
+{
+    return (entry & (DEV_PERM_READ | DEV_PERM_WRITE)) >> DEV_PERM_SHIFT;
+}
+
+static int amd_iommu_translate(PCIDevice *iommu,
+                               PCIDevice *dev,
+                               pcibus_t addr,
+                               target_phys_addr_t *paddr,
+                               target_phys_addr_t *len,
+                               unsigned perms)
+{
+    int devfn, present;
+    target_phys_addr_t entry_addr, pte_addr;
+    uint64_t entry[4], pte, page_offset, pte_perms;
+    unsigned level, domid;
+    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, iommu);
+
+    if (!st->enabled) {
+        goto no_translation;
+    }
+
+    /* Get device table entry. */
+    devfn = dev->devfn;
+    entry_addr = st->devtab + devfn * DEVTAB_ENTRY_SIZE;
+    cpu_physical_memory_read(entry_addr, (uint8_t *) entry, 32);
+
+    pte = entry[0];
+    if (!(pte & DEV_VALID) || !(pte & DEV_TRANSLATION_VALID)) {
+        goto no_translation;
+    }
+    domid = entry[1] & DEV_DOMAIN_ID_MASK;
+    level = (pte >> DEV_MODE_RSHIFT) & DEV_MODE_MASK;
+    while (level > 0) {
+        /*
+         * Check permissions: the bitwise
+         * implication perms -> pte_perms must be true.
+         */
+        pte_perms = amd_iommu_get_perms(pte);
+        present = pte & 1;
+        if (!present || perms != (perms & pte_perms)) {
+            amd_iommu_page_fault(st, devfn, domid, addr,
+                                 present, !!(perms & IOMMU_PERM_WRITE));
+            return -EPERM;
+        }
+
+        /* Go to the next lower level. */
+        pte_addr = pte & DEV_PT_ROOT_MASK;
+        pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;
+        pte = ldq_phys(pte_addr);
+        level = (pte >> DEV_MODE_RSHIFT) & DEV_MODE_MASK;
+    }
+    page_offset = addr & 4095;
+    *paddr = (pte & DEV_PT_ROOT_MASK) + page_offset;
+    *len = 4096 - page_offset;
+
+    return 0;
+
+no_translation:
+    *paddr = addr;
+    *len = -1;
+    return 0;
+}
+
+static int amd_iommu_pci_initfn(PCIDevice *dev)
+{
+    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, dev);
+    int err;
+
+    pci_config_set_vendor_id(st->dev.config, PCI_VENDOR_ID_AMD);
+    pci_config_set_device_id(st->dev.config, PCI_DEVICE_ID_AMD_IOMMU);
+    pci_config_set_class(st->dev.config, PCI_CLASS_SYSTEM_IOMMU);
+
+    st->capab_offset = pci_add_capability(&st->dev,
+                                          PCI_CAP_ID_SEC,
+                                          CAPAB_SIZE);
+    err = pci_enable_capability_support(&st->dev, st->capab_offset,
+                                        amd_iommu_read_capab,
+                                        amd_iommu_write_capab,
+                                        amd_iommu_init_capab);
+    if (err) {
+        return err;
+    }
+
+    pci_register_iommu(dev, amd_iommu_translate);
+
+    return 0;
+}
+
+static const VMStateDescription vmstate_amd_iommu = {
+    .name                       = "amd-iommu",
+    .version_id                 = 1,
+    .minimum_version_id         = 1,
+    .minimum_version_id_old     = 1,
+    .fields                     = (VMStateField []) {
+        VMSTATE_PCI_DEVICE(dev, AMDIOMMUState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static PCIDeviceInfo amd_iommu_pci_info = {
+    .qdev.name    = "amd-iommu",
+    .qdev.desc    = "AMD IOMMU",
+    .qdev.size    = sizeof(AMDIOMMUState),
+    .qdev.vmsd    = &vmstate_amd_iommu,
+    .init         = amd_iommu_pci_initfn,
+};
+
+static void amd_iommu_register(void)
+{
+    pci_qdev_register(&amd_iommu_pci_info);
+}
+
+device_init(amd_iommu_register);
+
diff --git a/hw/pc.c b/hw/pc.c
index 186e322..4616318 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1066,6 +1066,8 @@ void pc_pci_device_init(PCIBus *pci_bus)
     int max_bus;
     int bus;
 
+    pci_create_simple(pci_bus, -1, "amd-iommu");
+
     max_bus = drive_get_max_bus(IF_SCSI);
     for (bus = 0; bus <= max_bus; bus++) {
         pci_create_simple(pci_bus, -1, "lsi53c895a");
diff --git a/hw/pci_ids.h b/hw/pci_ids.h
index 39e9f1d..d790312 100644
--- a/hw/pci_ids.h
+++ b/hw/pci_ids.h
@@ -26,6 +26,7 @@
 
 #define PCI_CLASS_MEMORY_RAM             0x0500
 
+#define PCI_CLASS_SYSTEM_IOMMU           0x0806
 #define PCI_CLASS_SYSTEM_OTHER           0x0880
 
 #define PCI_CLASS_SERIAL_USB             0x0c03
@@ -56,6 +57,7 @@
 
 #define PCI_VENDOR_ID_AMD                0x1022
 #define PCI_DEVICE_ID_AMD_LANCE          0x2000
+#define PCI_DEVICE_ID_AMD_IOMMU          0x0000     /* FIXME */
 
 #define PCI_VENDOR_ID_MOTOROLA           0x1057
 #define PCI_DEVICE_ID_MOTOROLA_MPC106    0x0002
diff --git a/hw/pci_regs.h b/hw/pci_regs.h
index 1c675dc..0fb942b 100644
--- a/hw/pci_regs.h
+++ b/hw/pci_regs.h
@@ -216,6 +216,7 @@
 #define  PCI_CAP_ID_SHPC 	0x0C	/* PCI Standard Hot-Plug Controller */
 #define  PCI_CAP_ID_SSVID	0x0D	/* Bridge subsystem vendor/device ID */
 #define  PCI_CAP_ID_AGP3	0x0E	/* AGP Target PCI-PCI bridge */
+#define  PCI_CAP_ID_SEC		0x0F	/* Secure Device (AMD IOMMU) */
 #define  PCI_CAP_ID_EXP 	0x10	/* PCI Express */
 #define  PCI_CAP_ID_MSIX	0x11	/* MSI-X */
 #define  PCI_CAP_ID_AF		0x13	/* PCI Advanced Features */
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 4/7] ide: use the PCI memory access interface
  2010-08-15 19:27 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-08-15 19:27   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 42+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-08-15 19:27 UTC (permalink / raw)
  To: joro
  Cc: paul, blauwirbel, anthony, avi, kvm, qemu-devel,
	Eduard - Gabriel Munteanu

Emulated PCI IDE controllers now use the memory access interface. This
also allows an emulated IOMMU to translate and check accesses.
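
For readers unfamiliar with the pattern, here is a minimal standalone sketch
of the indirection described above: the DMA engine never touches memory
directly, it calls an rw callback supplied by its bus, so the bus is free to
translate or check the address first. All names here (ToyBus, DMAEngine,
toy_bus_rw, engine_read, engine_write) are hypothetical stand-ins and not part
of this patch or of QEMU; the fixed offset only imitates what a real IOMMU
translation would do.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a guest physical address type. */
typedef uint64_t phys_addr_t;

/* Same shape as the rw callback in the patch: one function for both
 * directions, selected by is_write. */
typedef void RWFunc(void *opaque, phys_addr_t addr,
                    uint8_t *buf, phys_addr_t len, int is_write);

/* The DMA engine only knows about the callback and its opaque pointer. */
typedef struct {
    RWFunc *rw;
    void *opaque;
} DMAEngine;

/* Toy bus: a small RAM plus a fixed offset imitating address translation. */
typedef struct {
    uint8_t ram[256];
    phys_addr_t offset;
} ToyBus;

static void toy_bus_rw(void *opaque, phys_addr_t addr,
                       uint8_t *buf, phys_addr_t len, int is_write)
{
    ToyBus *bus = opaque;
    phys_addr_t paddr = addr + bus->offset;   /* "translate" */

    assert(paddr + len <= sizeof(bus->ram));  /* "check" */
    if (is_write) {
        memcpy(bus->ram + paddr, buf, (size_t) len);
    } else {
        memcpy(buf, bus->ram + paddr, (size_t) len);
    }
}

/* Thin wrappers, analogous in spirit to bmdma_memory_read/write. */
static inline void engine_read(DMAEngine *e, phys_addr_t addr,
                               uint8_t *buf, phys_addr_t len)
{
    e->rw(e->opaque, addr, buf, len, 0);
}

static inline void engine_write(DMAEngine *e, phys_addr_t addr,
                                uint8_t *buf, phys_addr_t len)
{
    e->rw(e->opaque, addr, buf, len, 1);
}
```

The engine code stays the same whether or not a translation layer is present;
only the callback installed at device creation time changes.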

Map invalidation cancels the DMA transfers that use the affected mapping.
Since the guest OS can't reliably recover DMA results when a mapping changes
during a transfer, this is a reasonable approximation.
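
As an illustration of the invalidation path, here is a minimal standalone
sketch assuming a small fixed-size table of live mappings. Each mapping
records the callback it was registered with, and invalidating an address range
fires the callback of every overlapping mapping, which is where a transfer
would be cancelled. The names (MapTable, MapEntry, Transfer, map_register,
map_invalidate_range, transfer_cancel) are hypothetical and are not the
functions added by this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void InvalMapFunc(void *opaque);

/* One live mapping and the callback to run if it gets invalidated. */
typedef struct {
    uint64_t addr;
    uint64_t len;
    InvalMapFunc *inval_cb;
    void *inval_opaque;
    int in_use;
} MapEntry;

#define MAX_MAPS 4

typedef struct {
    MapEntry maps[MAX_MAPS];
} MapTable;

/* Toy transfer state; cancelling just sets a flag. */
typedef struct {
    int cancelled;
} Transfer;

static void transfer_cancel(void *opaque)
{
    Transfer *t = opaque;
    t->cancelled = 1;
}

/* Record a mapping; returns its slot index, or -1 if the table is full. */
static int map_register(MapTable *tbl, uint64_t addr, uint64_t len,
                        InvalMapFunc *cb, void *opaque)
{
    int i;

    assert(len > 0);
    for (i = 0; i < MAX_MAPS; i++) {
        MapEntry *m = &tbl->maps[i];
        if (!m->in_use) {
            m->addr = addr;
            m->len = len;
            m->inval_cb = cb;
            m->inval_opaque = opaque;
            m->in_use = 1;
            return i;
        }
    }
    return -1;
}

/* Fire the callback of every mapping overlapping [addr, addr + len) and
 * drop it. Returns the number of mappings invalidated. Address wraparound
 * is ignored in this toy version. */
static int map_invalidate_range(MapTable *tbl, uint64_t addr, uint64_t len)
{
    int i, n = 0;

    for (i = 0; i < MAX_MAPS; i++) {
        MapEntry *m = &tbl->maps[i];
        if (m->in_use && m->addr < addr + len && addr < m->addr + m->len) {
            if (m->inval_cb) {
                m->inval_cb(m->inval_opaque);
            }
            m->in_use = 0;
            n++;
        }
    }
    return n;
}
```

Only mappings that overlap the invalidated range are affected, so unrelated
transfers on the same device keep running.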

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 dma-helpers.c     |   46 +++++++++++++++++++++++++++++++++++++++++-----
 dma.h             |   21 ++++++++++++++++++++-
 hw/ide/core.c     |   15 ++++++++-------
 hw/ide/internal.h |   39 +++++++++++++++++++++++++++++++++++++++
 hw/ide/pci.c      |    7 +++++++
 5 files changed, 115 insertions(+), 13 deletions(-)

diff --git a/dma-helpers.c b/dma-helpers.c
index d4fc077..9c3a21a 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -10,12 +10,36 @@
 #include "dma.h"
 #include "block_int.h"
 
-void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint)
+static void *qemu_sglist_default_map(void *opaque,
+                                     QEMUSGInvalMapFunc *inval_cb,
+                                     void *inval_opaque,
+                                     target_phys_addr_t addr,
+                                     target_phys_addr_t *len,
+                                     int is_write)
+{
+    return cpu_physical_memory_map(addr, len, is_write);
+}
+
+static void qemu_sglist_default_unmap(void *opaque,
+                                      void *buffer,
+                                      target_phys_addr_t len,
+                                      int is_write,
+                                      target_phys_addr_t access_len)
+{
+    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
+}
+
+void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint,
+                      QEMUSGMapFunc *map, QEMUSGUnmapFunc *unmap, void *opaque)
 {
     qsg->sg = qemu_malloc(alloc_hint * sizeof(ScatterGatherEntry));
     qsg->nsg = 0;
     qsg->nalloc = alloc_hint;
     qsg->size = 0;
+
+    qsg->map = map ? map : qemu_sglist_default_map;
+    qsg->unmap = unmap ? unmap : qemu_sglist_default_unmap;
+    qsg->opaque = opaque;
 }
 
 void qemu_sglist_add(QEMUSGList *qsg, target_phys_addr_t base,
@@ -73,12 +97,23 @@ static void dma_bdrv_unmap(DMAAIOCB *dbs)
     int i;
 
     for (i = 0; i < dbs->iov.niov; ++i) {
-        cpu_physical_memory_unmap(dbs->iov.iov[i].iov_base,
-                                  dbs->iov.iov[i].iov_len, !dbs->is_write,
-                                  dbs->iov.iov[i].iov_len);
+        dbs->sg->unmap(dbs->sg->opaque,
+                       dbs->iov.iov[i].iov_base,
+                       dbs->iov.iov[i].iov_len, !dbs->is_write,
+                       dbs->iov.iov[i].iov_len);
     }
 }
 
+static void dma_bdrv_cancel(void *opaque)
+{
+    DMAAIOCB *dbs = opaque;
+
+    bdrv_aio_cancel(dbs->acb);
+    dma_bdrv_unmap(dbs);
+    qemu_iovec_destroy(&dbs->iov);
+    qemu_aio_release(dbs);
+}
+
 static void dma_bdrv_cb(void *opaque, int ret)
 {
     DMAAIOCB *dbs = (DMAAIOCB *)opaque;
@@ -100,7 +135,8 @@ static void dma_bdrv_cb(void *opaque, int ret)
     while (dbs->sg_cur_index < dbs->sg->nsg) {
         cur_addr = dbs->sg->sg[dbs->sg_cur_index].base + dbs->sg_cur_byte;
         cur_len = dbs->sg->sg[dbs->sg_cur_index].len - dbs->sg_cur_byte;
-        mem = cpu_physical_memory_map(cur_addr, &cur_len, !dbs->is_write);
+        mem = dbs->sg->map(dbs->sg->opaque, dma_bdrv_cancel, dbs,
+                           cur_addr, &cur_len, !dbs->is_write);
         if (!mem)
             break;
         qemu_iovec_add(&dbs->iov, mem, cur_len);
diff --git a/dma.h b/dma.h
index f3bb275..d48f35c 100644
--- a/dma.h
+++ b/dma.h
@@ -15,6 +15,19 @@
 #include "hw/hw.h"
 #include "block.h"
 
+typedef void QEMUSGInvalMapFunc(void *opaque);
+typedef void *QEMUSGMapFunc(void *opaque,
+                            QEMUSGInvalMapFunc *inval_cb,
+                            void *inval_opaque,
+                            target_phys_addr_t addr,
+                            target_phys_addr_t *len,
+                            int is_write);
+typedef void QEMUSGUnmapFunc(void *opaque,
+                             void *buffer,
+                             target_phys_addr_t len,
+                             int is_write,
+                             target_phys_addr_t access_len);
+
 typedef struct {
     target_phys_addr_t base;
     target_phys_addr_t len;
@@ -25,9 +38,15 @@ typedef struct {
     int nsg;
     int nalloc;
     target_phys_addr_t size;
+
+    QEMUSGMapFunc *map;
+    QEMUSGUnmapFunc *unmap;
+    void *opaque;
 } QEMUSGList;
 
-void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint);
+void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint,
+                      QEMUSGMapFunc *map, QEMUSGUnmapFunc *unmap,
+                      void *opaque);
 void qemu_sglist_add(QEMUSGList *qsg, target_phys_addr_t base,
                      target_phys_addr_t len);
 void qemu_sglist_destroy(QEMUSGList *qsg);
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 0b3b7c2..c19013a 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -435,7 +435,8 @@ static int dma_buf_prepare(BMDMAState *bm, int is_write)
     } prd;
     int l, len;
 
-    qemu_sglist_init(&s->sg, s->nsector / (IDE_PAGE_SIZE / 512) + 1);
+    qemu_sglist_init(&s->sg, s->nsector / (IDE_PAGE_SIZE / 512) + 1,
+                     bm->map, bm->unmap, bm->opaque);
     s->io_buffer_size = 0;
     for(;;) {
         if (bm->cur_prd_len == 0) {
@@ -443,7 +444,7 @@ static int dma_buf_prepare(BMDMAState *bm, int is_write)
             if (bm->cur_prd_last ||
                 (bm->cur_addr - bm->addr) >= IDE_PAGE_SIZE)
                 return s->io_buffer_size != 0;
-            cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
+            bmdma_memory_read(bm, bm->cur_addr, (uint8_t *)&prd, 8);
             bm->cur_addr += 8;
             prd.addr = le32_to_cpu(prd.addr);
             prd.size = le32_to_cpu(prd.size);
@@ -526,7 +527,7 @@ static int dma_buf_rw(BMDMAState *bm, int is_write)
             if (bm->cur_prd_last ||
                 (bm->cur_addr - bm->addr) >= IDE_PAGE_SIZE)
                 return 0;
-            cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
+            bmdma_memory_read(bm, bm->cur_addr, (uint8_t *)&prd, 8);
             bm->cur_addr += 8;
             prd.addr = le32_to_cpu(prd.addr);
             prd.size = le32_to_cpu(prd.size);
@@ -541,11 +542,11 @@ static int dma_buf_rw(BMDMAState *bm, int is_write)
             l = bm->cur_prd_len;
         if (l > 0) {
             if (is_write) {
-                cpu_physical_memory_write(bm->cur_prd_addr,
-                                          s->io_buffer + s->io_buffer_index, l);
+                bmdma_memory_write(bm, bm->cur_prd_addr,
+                                   s->io_buffer + s->io_buffer_index, l);
             } else {
-                cpu_physical_memory_read(bm->cur_prd_addr,
-                                          s->io_buffer + s->io_buffer_index, l);
+                bmdma_memory_read(bm, bm->cur_prd_addr,
+                                  s->io_buffer + s->io_buffer_index, l);
             }
             bm->cur_prd_addr += l;
             bm->cur_prd_len -= l;
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index eef1ee1..0f3b707 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -476,6 +476,24 @@ struct IDEDeviceInfo {
 #define BM_CMD_START     0x01
 #define BM_CMD_READ      0x08
 
+typedef void BMDMAInvalMapFunc(void *opaque);
+typedef void BMDMARWFunc(void *opaque,
+                         target_phys_addr_t addr,
+                         uint8_t *buf,
+                         target_phys_addr_t len,
+                         int is_write);
+typedef void *BMDMAMapFunc(void *opaque,
+                           BMDMAInvalMapFunc *inval_cb,
+                           void *inval_opaque,
+                           target_phys_addr_t addr,
+                           target_phys_addr_t *len,
+                           int is_write);
+typedef void BMDMAUnmapFunc(void *opaque,
+                            void *buffer,
+                            target_phys_addr_t len,
+                            int is_write,
+                            target_phys_addr_t access_len);
+
 struct BMDMAState {
     uint8_t cmd;
     uint8_t status;
@@ -495,8 +513,29 @@ struct BMDMAState {
     int64_t sector_num;
     uint32_t nsector;
     QEMUBH *bh;
+
+    BMDMARWFunc *rw;
+    BMDMAMapFunc *map;
+    BMDMAUnmapFunc *unmap;
+    void *opaque;
 };
 
+static inline void bmdma_memory_read(BMDMAState *bm,
+                                     target_phys_addr_t addr,
+                                     uint8_t *buf,
+                                     target_phys_addr_t len)
+{
+    bm->rw(bm->opaque, addr, buf, len, 0);
+}
+
+static inline void bmdma_memory_write(BMDMAState *bm,
+                                      target_phys_addr_t addr,
+                                      uint8_t *buf,
+                                      target_phys_addr_t len)
+{
+    bm->rw(bm->opaque, addr, buf, len, 1);
+}
+
 static inline IDEState *idebus_active_if(IDEBus *bus)
 {
     return bus->ifs + bus->unit;
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 4d95cc5..5879044 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -183,4 +183,11 @@ void pci_ide_create_devs(PCIDevice *dev, DriveInfo **hd_table)
             continue;
         ide_create_drive(d->bus+bus[i], unit[i], hd_table[i]);
     }
+
+    for (i = 0; i < 2; i++) {
+        d->bmdma[i].rw = (void *) pci_memory_rw;
+        d->bmdma[i].map = (void *) pci_memory_map;
+        d->bmdma[i].unmap = (void *) pci_memory_unmap;
+        d->bmdma[i].opaque = dev;
+    }
 }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH 4/7] ide: use the PCI memory access interface
@ 2010-08-15 19:27   ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 42+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-08-15 19:27 UTC (permalink / raw)
  To: joro; +Cc: kvm, qemu-devel, blauwirbel, paul, Eduard - Gabriel Munteanu, avi

Emulated PCI IDE controllers now use the memory access interface. This
also allows an emulated IOMMU to translate and check accesses.

Map invalidation cancels the DMA transfers that use the affected mapping.
Since the guest OS can't reliably recover DMA results when a mapping changes
during a transfer, this is a reasonable approximation.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 dma-helpers.c     |   46 +++++++++++++++++++++++++++++++++++++++++-----
 dma.h             |   21 ++++++++++++++++++++-
 hw/ide/core.c     |   15 ++++++++-------
 hw/ide/internal.h |   39 +++++++++++++++++++++++++++++++++++++++
 hw/ide/pci.c      |    7 +++++++
 5 files changed, 115 insertions(+), 13 deletions(-)

diff --git a/dma-helpers.c b/dma-helpers.c
index d4fc077..9c3a21a 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -10,12 +10,36 @@
 #include "dma.h"
 #include "block_int.h"
 
-void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint)
+static void *qemu_sglist_default_map(void *opaque,
+                                     QEMUSGInvalMapFunc *inval_cb,
+                                     void *inval_opaque,
+                                     target_phys_addr_t addr,
+                                     target_phys_addr_t *len,
+                                     int is_write)
+{
+    return cpu_physical_memory_map(addr, len, is_write);
+}
+
+static void qemu_sglist_default_unmap(void *opaque,
+                                      void *buffer,
+                                      target_phys_addr_t len,
+                                      int is_write,
+                                      target_phys_addr_t access_len)
+{
+    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
+}
+
+void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint,
+                      QEMUSGMapFunc *map, QEMUSGUnmapFunc *unmap, void *opaque)
 {
     qsg->sg = qemu_malloc(alloc_hint * sizeof(ScatterGatherEntry));
     qsg->nsg = 0;
     qsg->nalloc = alloc_hint;
     qsg->size = 0;
+
+    qsg->map = map ? map : qemu_sglist_default_map;
+    qsg->unmap = unmap ? unmap : qemu_sglist_default_unmap;
+    qsg->opaque = opaque;
 }
 
 void qemu_sglist_add(QEMUSGList *qsg, target_phys_addr_t base,
@@ -73,12 +97,23 @@ static void dma_bdrv_unmap(DMAAIOCB *dbs)
     int i;
 
     for (i = 0; i < dbs->iov.niov; ++i) {
-        cpu_physical_memory_unmap(dbs->iov.iov[i].iov_base,
-                                  dbs->iov.iov[i].iov_len, !dbs->is_write,
-                                  dbs->iov.iov[i].iov_len);
+        dbs->sg->unmap(dbs->sg->opaque,
+                       dbs->iov.iov[i].iov_base,
+                       dbs->iov.iov[i].iov_len, !dbs->is_write,
+                       dbs->iov.iov[i].iov_len);
     }
 }
 
+static void dma_bdrv_cancel(void *opaque)
+{
+    DMAAIOCB *dbs = opaque;
+
+    bdrv_aio_cancel(dbs->acb);
+    dma_bdrv_unmap(dbs);
+    qemu_iovec_destroy(&dbs->iov);
+    qemu_aio_release(dbs);
+}
+
 static void dma_bdrv_cb(void *opaque, int ret)
 {
     DMAAIOCB *dbs = (DMAAIOCB *)opaque;
@@ -100,7 +135,8 @@ static void dma_bdrv_cb(void *opaque, int ret)
     while (dbs->sg_cur_index < dbs->sg->nsg) {
         cur_addr = dbs->sg->sg[dbs->sg_cur_index].base + dbs->sg_cur_byte;
         cur_len = dbs->sg->sg[dbs->sg_cur_index].len - dbs->sg_cur_byte;
-        mem = cpu_physical_memory_map(cur_addr, &cur_len, !dbs->is_write);
+        mem = dbs->sg->map(dbs->sg->opaque, dma_bdrv_cancel, dbs,
+                           cur_addr, &cur_len, !dbs->is_write);
         if (!mem)
             break;
         qemu_iovec_add(&dbs->iov, mem, cur_len);
diff --git a/dma.h b/dma.h
index f3bb275..d48f35c 100644
--- a/dma.h
+++ b/dma.h
@@ -15,6 +15,19 @@
 #include "hw/hw.h"
 #include "block.h"
 
+typedef void QEMUSGInvalMapFunc(void *opaque);
+typedef void *QEMUSGMapFunc(void *opaque,
+                            QEMUSGInvalMapFunc *inval_cb,
+                            void *inval_opaque,
+                            target_phys_addr_t addr,
+                            target_phys_addr_t *len,
+                            int is_write);
+typedef void QEMUSGUnmapFunc(void *opaque,
+                             void *buffer,
+                             target_phys_addr_t len,
+                             int is_write,
+                             target_phys_addr_t access_len);
+
 typedef struct {
     target_phys_addr_t base;
     target_phys_addr_t len;
@@ -25,9 +38,15 @@ typedef struct {
     int nsg;
     int nalloc;
     target_phys_addr_t size;
+
+    QEMUSGMapFunc *map;
+    QEMUSGUnmapFunc *unmap;
+    void *opaque;
 } QEMUSGList;
 
-void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint);
+void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint,
+                      QEMUSGMapFunc *map, QEMUSGUnmapFunc *unmap,
+                      void *opaque);
 void qemu_sglist_add(QEMUSGList *qsg, target_phys_addr_t base,
                      target_phys_addr_t len);
 void qemu_sglist_destroy(QEMUSGList *qsg);
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 0b3b7c2..c19013a 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -435,7 +435,8 @@ static int dma_buf_prepare(BMDMAState *bm, int is_write)
     } prd;
     int l, len;
 
-    qemu_sglist_init(&s->sg, s->nsector / (IDE_PAGE_SIZE / 512) + 1);
+    qemu_sglist_init(&s->sg, s->nsector / (IDE_PAGE_SIZE / 512) + 1,
+                     bm->map, bm->unmap, bm->opaque);
     s->io_buffer_size = 0;
     for(;;) {
         if (bm->cur_prd_len == 0) {
@@ -443,7 +444,7 @@ static int dma_buf_prepare(BMDMAState *bm, int is_write)
             if (bm->cur_prd_last ||
                 (bm->cur_addr - bm->addr) >= IDE_PAGE_SIZE)
                 return s->io_buffer_size != 0;
-            cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
+            bmdma_memory_read(bm, bm->cur_addr, (uint8_t *)&prd, 8);
             bm->cur_addr += 8;
             prd.addr = le32_to_cpu(prd.addr);
             prd.size = le32_to_cpu(prd.size);
@@ -526,7 +527,7 @@ static int dma_buf_rw(BMDMAState *bm, int is_write)
             if (bm->cur_prd_last ||
                 (bm->cur_addr - bm->addr) >= IDE_PAGE_SIZE)
                 return 0;
-            cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
+            bmdma_memory_read(bm, bm->cur_addr, (uint8_t *)&prd, 8);
             bm->cur_addr += 8;
             prd.addr = le32_to_cpu(prd.addr);
             prd.size = le32_to_cpu(prd.size);
@@ -541,11 +542,11 @@ static int dma_buf_rw(BMDMAState *bm, int is_write)
             l = bm->cur_prd_len;
         if (l > 0) {
             if (is_write) {
-                cpu_physical_memory_write(bm->cur_prd_addr,
-                                          s->io_buffer + s->io_buffer_index, l);
+                bmdma_memory_write(bm, bm->cur_prd_addr,
+                                   s->io_buffer + s->io_buffer_index, l);
             } else {
-                cpu_physical_memory_read(bm->cur_prd_addr,
-                                          s->io_buffer + s->io_buffer_index, l);
+                bmdma_memory_read(bm, bm->cur_prd_addr,
+                                  s->io_buffer + s->io_buffer_index, l);
             }
             bm->cur_prd_addr += l;
             bm->cur_prd_len -= l;
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index eef1ee1..0f3b707 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -476,6 +476,24 @@ struct IDEDeviceInfo {
 #define BM_CMD_START     0x01
 #define BM_CMD_READ      0x08
 
+typedef void BMDMAInvalMapFunc(void *opaque);
+typedef void BMDMARWFunc(void *opaque,
+                         target_phys_addr_t addr,
+                         uint8_t *buf,
+                         target_phys_addr_t len,
+                         int is_write);
+typedef void *BMDMAMapFunc(void *opaque,
+                           BMDMAInvalMapFunc *inval_cb,
+                           void *inval_opaque,
+                           target_phys_addr_t addr,
+                           target_phys_addr_t *len,
+                           int is_write);
+typedef void BMDMAUnmapFunc(void *opaque,
+                            void *buffer,
+                            target_phys_addr_t len,
+                            int is_write,
+                            target_phys_addr_t access_len);
+
 struct BMDMAState {
     uint8_t cmd;
     uint8_t status;
@@ -495,8 +513,29 @@ struct BMDMAState {
     int64_t sector_num;
     uint32_t nsector;
     QEMUBH *bh;
+
+    BMDMARWFunc *rw;
+    BMDMAMapFunc *map;
+    BMDMAUnmapFunc *unmap;
+    void *opaque;
 };
 
+static inline void bmdma_memory_read(BMDMAState *bm,
+                                     target_phys_addr_t addr,
+                                     uint8_t *buf,
+                                     target_phys_addr_t len)
+{
+    bm->rw(bm->opaque, addr, buf, len, 0);
+}
+
+static inline void bmdma_memory_write(BMDMAState *bm,
+                                      target_phys_addr_t addr,
+                                      uint8_t *buf,
+                                      target_phys_addr_t len)
+{
+    bm->rw(bm->opaque, addr, buf, len, 1);
+}
+
 static inline IDEState *idebus_active_if(IDEBus *bus)
 {
     return bus->ifs + bus->unit;
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 4d95cc5..5879044 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -183,4 +183,11 @@ void pci_ide_create_devs(PCIDevice *dev, DriveInfo **hd_table)
             continue;
         ide_create_drive(d->bus+bus[i], unit[i], hd_table[i]);
     }
+
+    for (i = 0; i < 2; i++) {
+        d->bmdma[i].rw = (void *) pci_memory_rw;
+        d->bmdma[i].map = (void *) pci_memory_map;
+        d->bmdma[i].unmap = (void *) pci_memory_unmap;
+        d->bmdma[i].opaque = dev;
+    }
 }
-- 
1.7.1


* [PATCH 5/7] rtl8139: use the PCI memory access interface
  2010-08-15 19:27 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-08-15 19:27   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 42+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-08-15 19:27 UTC (permalink / raw)
  To: joro
  Cc: paul, blauwirbel, anthony, avi, kvm, qemu-devel,
	Eduard - Gabriel Munteanu

Route the device's DMA accesses through the PCI memory access interface so
that an emulated IOMMU can translate and check them.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/rtl8139.c |   99 ++++++++++++++++++++++++++++++++-------------------------
 1 files changed, 56 insertions(+), 43 deletions(-)
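
As a rough illustration of why each access now takes the device as its
first argument, the sketch below routes writes through a per-device
translation hook and splits a receive-ring write at the wrap point, so
both halves go through the same hook. It is a sketch under stated
assumptions, not the code in this patch: every demo_* and Demo* name is
hypothetical, the hook translates a whole access at once (a real IOMMU
would work page by page), and the helper returns an error code purely for
the sake of the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t demo_addr_t;

/* Hypothetical device handle; a real PCIDevice is far richer. */
typedef struct DemoDevice {
    /* Fills *out and returns 0, or returns -1 to refuse the access.
     * A NULL hook means "no IOMMU": addresses are used unchanged. */
    int (*translate)(struct DemoDevice *dev, demo_addr_t addr,
                     demo_addr_t *out);
    demo_addr_t offset;             /* used by the toy hook below */
} DemoDevice;

static uint8_t demo_guest_ram[1024];

/* Device-scoped write: translate first, then bounds-check and copy. */
static int demo_memory_write(DemoDevice *dev, demo_addr_t addr,
                             const void *buf, size_t len)
{
    demo_addr_t phys = addr;

    if (dev->translate && dev->translate(dev, addr, &phys) < 0) {
        return -1;
    }
    if (phys > sizeof(demo_guest_ram) ||
        len > sizeof(demo_guest_ram) - phys) {
        return -1;
    }
    memcpy(demo_guest_ram + phys, buf, len);
    return 0;
}

/* Toy translation hook: shift every device address by a fixed offset. */
static int demo_offset_translate(DemoDevice *dev, demo_addr_t addr,
                                 demo_addr_t *out)
{
    *out = addr + dev->offset;
    return 0;
}

/* Ring write that wraps: the tail of the packet restarts at ring_base.
 * Assumes *pos < ring_size on entry. */
static int demo_ring_write(DemoDevice *dev, demo_addr_t ring_base,
                           size_t ring_size, size_t *pos,
                           const uint8_t *buf, size_t size)
{
    size_t first = ring_size - *pos;    /* room left before the wrap */

    if (size > ring_size) {
        return -1;
    }
    if (first > size) {
        first = size;
    }
    if (demo_memory_write(dev, ring_base + *pos, buf, first) < 0) {
        return -1;
    }
    if (size > first &&
        demo_memory_write(dev, ring_base, buf + first, size - first) < 0) {
        return -1;
    }
    *pos = (*pos + size) % ring_size;
    return 0;
}
```

With an 8-byte ring, a write position of 6 and a 4-byte packet, the first
two bytes land at the end of the ring and the last two at its start, each
through the device's own translation hook.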

diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index 72e2242..99d5f69 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -412,12 +412,6 @@ typedef struct RTL8139TallyCounters
     uint16_t   TxUndrn;
 } RTL8139TallyCounters;
 
-/* Clears all tally counters */
-static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters);
-
-/* Writes tally counters to specified physical memory address */
-static void RTL8139TallyCounters_physical_memory_write(target_phys_addr_t tc_addr, RTL8139TallyCounters* counters);
-
 typedef struct RTL8139State {
     PCIDevice dev;
     uint8_t phys[8]; /* mac address */
@@ -496,6 +490,14 @@ typedef struct RTL8139State {
 
 } RTL8139State;
 
+/* Clears all tally counters */
+static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters);
+
+/* Writes tally counters to specified physical memory address */
+static void
+RTL8139TallyCounters_physical_memory_write(RTL8139State *s,
+                                           target_phys_addr_t tc_addr);
+
 static void rtl8139_set_next_tctr_time(RTL8139State *s, int64_t current_time);
 
 static void prom9346_decode_command(EEprom9346 *eeprom, uint8_t command)
@@ -746,6 +748,8 @@ static int rtl8139_cp_transmitter_enabled(RTL8139State *s)
 
 static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size)
 {
+    PCIDevice *dev = &s->dev;
+
     if (s->RxBufAddr + size > s->RxBufferSize)
     {
         int wrapped = MOD2(s->RxBufAddr + size, s->RxBufferSize);
@@ -757,15 +761,15 @@ static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size)
 
             if (size > wrapped)
             {
-                cpu_physical_memory_write( s->RxBuf + s->RxBufAddr,
-                                           buf, size-wrapped );
+                pci_memory_write(dev, s->RxBuf + s->RxBufAddr,
+                                 buf, size-wrapped);
             }
 
             /* reset buffer pointer */
             s->RxBufAddr = 0;
 
-            cpu_physical_memory_write( s->RxBuf + s->RxBufAddr,
-                                       buf + (size-wrapped), wrapped );
+            pci_memory_write(dev, s->RxBuf + s->RxBufAddr,
+                             buf + (size-wrapped), wrapped);
 
             s->RxBufAddr = wrapped;
 
@@ -774,7 +778,7 @@ static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size)
     }
 
     /* non-wrapping path or overwrapping enabled */
-    cpu_physical_memory_write( s->RxBuf + s->RxBufAddr, buf, size );
+    pci_memory_write(dev, s->RxBuf + s->RxBufAddr, buf, size);
 
     s->RxBufAddr += size;
 }
@@ -814,6 +818,7 @@ static int rtl8139_can_receive(VLANClientState *nc)
 static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_t size_, int do_interrupt)
 {
     RTL8139State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    PCIDevice *dev = &s->dev;
     int size = size_;
 
     uint32_t packet_header = 0;
@@ -968,13 +973,13 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 
         uint32_t val, rxdw0,rxdw1,rxbufLO,rxbufHI;
 
-        cpu_physical_memory_read(cplus_rx_ring_desc,    (uint8_t *)&val, 4);
+        pci_memory_read(dev, cplus_rx_ring_desc,    (uint8_t *)&val, 4);
         rxdw0 = le32_to_cpu(val);
-        cpu_physical_memory_read(cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
+        pci_memory_read(dev, cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
         rxdw1 = le32_to_cpu(val);
-        cpu_physical_memory_read(cplus_rx_ring_desc+8,  (uint8_t *)&val, 4);
+        pci_memory_read(dev, cplus_rx_ring_desc+8,  (uint8_t *)&val, 4);
         rxbufLO = le32_to_cpu(val);
-        cpu_physical_memory_read(cplus_rx_ring_desc+12, (uint8_t *)&val, 4);
+        pci_memory_read(dev, cplus_rx_ring_desc+12, (uint8_t *)&val, 4);
         rxbufHI = le32_to_cpu(val);
 
         DEBUG_PRINT(("RTL8139: +++ C+ mode RX descriptor %d %08x %08x %08x %08x\n",
@@ -1019,7 +1024,7 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
         target_phys_addr_t rx_addr = rtl8139_addr64(rxbufLO, rxbufHI);
 
         /* receive/copy to target memory */
-        cpu_physical_memory_write( rx_addr, buf, size );
+        pci_memory_write(dev, rx_addr, buf, size);
 
         if (s->CpCmd & CPlusRxChkSum)
         {
@@ -1032,7 +1037,7 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 #else
         val = 0;
 #endif
-        cpu_physical_memory_write( rx_addr+size, (uint8_t *)&val, 4);
+        pci_memory_write(dev, rx_addr + size, (uint8_t *)&val, 4);
 
 /* first segment of received packet flag */
 #define CP_RX_STATUS_FS (1<<29)
@@ -1081,9 +1086,9 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 
         /* update ring data */
         val = cpu_to_le32(rxdw0);
-        cpu_physical_memory_write(cplus_rx_ring_desc,    (uint8_t *)&val, 4);
+        pci_memory_write(dev, cplus_rx_ring_desc,    (uint8_t *)&val, 4);
         val = cpu_to_le32(rxdw1);
-        cpu_physical_memory_write(cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
+        pci_memory_write(dev, cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
 
         /* update tally counter */
         ++s->tally_counters.RxOk;
@@ -1279,50 +1284,54 @@ static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters)
     counters->TxUndrn = 0;
 }
 
-static void RTL8139TallyCounters_physical_memory_write(target_phys_addr_t tc_addr, RTL8139TallyCounters* tally_counters)
+static void
+RTL8139TallyCounters_physical_memory_write(RTL8139State *s,
+                                           target_phys_addr_t tc_addr)
 {
+    PCIDevice *dev = &s->dev;
+    RTL8139TallyCounters *tally_counters = &s->tally_counters;
     uint16_t val16;
     uint32_t val32;
     uint64_t val64;
 
     val64 = cpu_to_le64(tally_counters->TxOk);
-    cpu_physical_memory_write(tc_addr + 0,    (uint8_t *)&val64, 8);
+    pci_memory_write(dev, tc_addr + 0,    (uint8_t *)&val64, 8);
 
     val64 = cpu_to_le64(tally_counters->RxOk);
-    cpu_physical_memory_write(tc_addr + 8,    (uint8_t *)&val64, 8);
+    pci_memory_write(dev, tc_addr + 8,    (uint8_t *)&val64, 8);
 
     val64 = cpu_to_le64(tally_counters->TxERR);
-    cpu_physical_memory_write(tc_addr + 16,    (uint8_t *)&val64, 8);
+    pci_memory_write(dev, tc_addr + 16,    (uint8_t *)&val64, 8);
 
     val32 = cpu_to_le32(tally_counters->RxERR);
-    cpu_physical_memory_write(tc_addr + 24,    (uint8_t *)&val32, 4);
+    pci_memory_write(dev, tc_addr + 24,    (uint8_t *)&val32, 4);
 
     val16 = cpu_to_le16(tally_counters->MissPkt);
-    cpu_physical_memory_write(tc_addr + 28,    (uint8_t *)&val16, 2);
+    pci_memory_write(dev, tc_addr + 28,    (uint8_t *)&val16, 2);
 
     val16 = cpu_to_le16(tally_counters->FAE);
-    cpu_physical_memory_write(tc_addr + 30,    (uint8_t *)&val16, 2);
+    pci_memory_write(dev, tc_addr + 30,    (uint8_t *)&val16, 2);
 
     val32 = cpu_to_le32(tally_counters->Tx1Col);
-    cpu_physical_memory_write(tc_addr + 32,    (uint8_t *)&val32, 4);
+    pci_memory_write(dev, tc_addr + 32,    (uint8_t *)&val32, 4);
 
     val32 = cpu_to_le32(tally_counters->TxMCol);
-    cpu_physical_memory_write(tc_addr + 36,    (uint8_t *)&val32, 4);
+    pci_memory_write(dev, tc_addr + 36,    (uint8_t *)&val32, 4);
 
     val64 = cpu_to_le64(tally_counters->RxOkPhy);
-    cpu_physical_memory_write(tc_addr + 40,    (uint8_t *)&val64, 8);
+    pci_memory_write(dev, tc_addr + 40,    (uint8_t *)&val64, 8);
 
     val64 = cpu_to_le64(tally_counters->RxOkBrd);
-    cpu_physical_memory_write(tc_addr + 48,    (uint8_t *)&val64, 8);
+    pci_memory_write(dev, tc_addr + 48,    (uint8_t *)&val64, 8);
 
     val32 = cpu_to_le32(tally_counters->RxOkMul);
-    cpu_physical_memory_write(tc_addr + 56,    (uint8_t *)&val32, 4);
+    pci_memory_write(dev, tc_addr + 56,    (uint8_t *)&val32, 4);
 
     val16 = cpu_to_le16(tally_counters->TxAbt);
-    cpu_physical_memory_write(tc_addr + 60,    (uint8_t *)&val16, 2);
+    pci_memory_write(dev, tc_addr + 60,    (uint8_t *)&val16, 2);
 
     val16 = cpu_to_le16(tally_counters->TxUndrn);
-    cpu_physical_memory_write(tc_addr + 62,    (uint8_t *)&val16, 2);
+    pci_memory_write(dev, tc_addr + 62,    (uint8_t *)&val16, 2);
 }
 
 /* Loads values of tally counters from VM state file */
@@ -1758,6 +1767,8 @@ static void rtl8139_transfer_frame(RTL8139State *s, const uint8_t *buf, int size
 
 static int rtl8139_transmit_one(RTL8139State *s, int descriptor)
 {
+    PCIDevice *dev = &s->dev;
+
     if (!rtl8139_transmitter_enabled(s))
     {
         DEBUG_PRINT(("RTL8139: +++ cannot transmit from descriptor %d: transmitter disabled\n",
@@ -1780,7 +1791,7 @@ static int rtl8139_transmit_one(RTL8139State *s, int descriptor)
     DEBUG_PRINT(("RTL8139: +++ transmit reading %d bytes from host memory at 0x%08x\n",
                  txsize, s->TxAddr[descriptor]));
 
-    cpu_physical_memory_read(s->TxAddr[descriptor], txbuffer, txsize);
+    pci_memory_read(dev, s->TxAddr[descriptor], txbuffer, txsize);
 
     /* Mark descriptor as transferred */
     s->TxStatus[descriptor] |= TxHostOwns;
@@ -1886,6 +1897,8 @@ static uint16_t ip_checksum(void *data, size_t len)
 
 static int rtl8139_cplus_transmit_one(RTL8139State *s)
 {
+    PCIDevice *dev = &s->dev;
+
     if (!rtl8139_transmitter_enabled(s))
     {
         DEBUG_PRINT(("RTL8139: +++ C+ mode: transmitter disabled\n"));
@@ -1911,14 +1924,14 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
 
     uint32_t val, txdw0,txdw1,txbufLO,txbufHI;
 
-    cpu_physical_memory_read(cplus_tx_ring_desc,    (uint8_t *)&val, 4);
+    pci_memory_read(dev, cplus_tx_ring_desc,    (uint8_t *)&val, 4);
     txdw0 = le32_to_cpu(val);
     /* TODO: implement VLAN tagging support, VLAN tag data is read to txdw1 */
-    cpu_physical_memory_read(cplus_tx_ring_desc+4,  (uint8_t *)&val, 4);
+    pci_memory_read(dev, cplus_tx_ring_desc+4,  (uint8_t *)&val, 4);
     txdw1 = le32_to_cpu(val);
-    cpu_physical_memory_read(cplus_tx_ring_desc+8,  (uint8_t *)&val, 4);
+    pci_memory_read(dev, cplus_tx_ring_desc+8,  (uint8_t *)&val, 4);
     txbufLO = le32_to_cpu(val);
-    cpu_physical_memory_read(cplus_tx_ring_desc+12, (uint8_t *)&val, 4);
+    pci_memory_read(dev, cplus_tx_ring_desc+12, (uint8_t *)&val, 4);
     txbufHI = le32_to_cpu(val);
 
     DEBUG_PRINT(("RTL8139: +++ C+ mode TX descriptor %d %08x %08x %08x %08x\n",
@@ -2025,7 +2038,8 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
     DEBUG_PRINT(("RTL8139: +++ C+ mode transmit reading %d bytes from host memory at %016" PRIx64 " to offset %d\n",
                  txsize, (uint64_t)tx_addr, s->cplus_txbuffer_offset));
 
-    cpu_physical_memory_read(tx_addr, s->cplus_txbuffer + s->cplus_txbuffer_offset, txsize);
+    pci_memory_read(dev, tx_addr,
+                    s->cplus_txbuffer + s->cplus_txbuffer_offset, txsize);
     s->cplus_txbuffer_offset += txsize;
 
     /* seek to next Rx descriptor */
@@ -2052,10 +2066,10 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
 
     /* update ring data */
     val = cpu_to_le32(txdw0);
-    cpu_physical_memory_write(cplus_tx_ring_desc,    (uint8_t *)&val, 4);
+    pci_memory_write(dev, cplus_tx_ring_desc,    (uint8_t *)&val, 4);
     /* TODO: implement VLAN tagging support, VLAN tag data is read to txdw1 */
 //    val = cpu_to_le32(txdw1);
-//    cpu_physical_memory_write(cplus_tx_ring_desc+4,  &val, 4);
+//    pci_memory_write(dev, cplus_tx_ring_desc+4,  &val, 4);
 
     /* Now decide if descriptor being processed is holding the last segment of packet */
     if (txdw0 & CP_TX_LS)
@@ -2364,7 +2378,6 @@ static void rtl8139_transmit(RTL8139State *s)
 
 static void rtl8139_TxStatus_write(RTL8139State *s, uint32_t txRegOffset, uint32_t val)
 {
-
     int descriptor = txRegOffset/4;
 
     /* handle C+ transmit mode register configuration */
@@ -2381,7 +2394,7 @@ static void rtl8139_TxStatus_write(RTL8139State *s, uint32_t txRegOffset, uint32
             target_phys_addr_t tc_addr = rtl8139_addr64(s->TxStatus[0] & ~0x3f, s->TxStatus[1]);
 
             /* dump tally counters to specified memory location */
-            RTL8139TallyCounters_physical_memory_write( tc_addr, &s->tally_counters);
+            RTL8139TallyCounters_physical_memory_write(s, tc_addr);
 
             /* mark dump completed */
             s->TxStatus[0] &= ~0x8;
-- 
1.7.1



* [Qemu-devel] [PATCH 5/7] rtl8139: use the PCI memory access interface
@ 2010-08-15 19:27   ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 42+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-08-15 19:27 UTC (permalink / raw)
  To: joro; +Cc: kvm, qemu-devel, blauwirbel, paul, Eduard - Gabriel Munteanu, avi

This allows the device to work properly with an emulated IOMMU.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/rtl8139.c |   99 ++++++++++++++++++++++++++++++++-------------------------
 1 files changed, 56 insertions(+), 43 deletions(-)

diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index 72e2242..99d5f69 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -412,12 +412,6 @@ typedef struct RTL8139TallyCounters
     uint16_t   TxUndrn;
 } RTL8139TallyCounters;
 
-/* Clears all tally counters */
-static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters);
-
-/* Writes tally counters to specified physical memory address */
-static void RTL8139TallyCounters_physical_memory_write(target_phys_addr_t tc_addr, RTL8139TallyCounters* counters);
-
 typedef struct RTL8139State {
     PCIDevice dev;
     uint8_t phys[8]; /* mac address */
@@ -496,6 +490,14 @@ typedef struct RTL8139State {
 
 } RTL8139State;
 
+/* Clears all tally counters */
+static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters);
+
+/* Writes tally counters to specified physical memory address */
+static void
+RTL8139TallyCounters_physical_memory_write(RTL8139State *s,
+                                           target_phys_addr_t tc_addr);
+
 static void rtl8139_set_next_tctr_time(RTL8139State *s, int64_t current_time);
 
 static void prom9346_decode_command(EEprom9346 *eeprom, uint8_t command)
@@ -746,6 +748,8 @@ static int rtl8139_cp_transmitter_enabled(RTL8139State *s)
 
 static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size)
 {
+    PCIDevice *dev = &s->dev;
+
     if (s->RxBufAddr + size > s->RxBufferSize)
     {
         int wrapped = MOD2(s->RxBufAddr + size, s->RxBufferSize);
@@ -757,15 +761,15 @@ static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size)
 
             if (size > wrapped)
             {
-                cpu_physical_memory_write( s->RxBuf + s->RxBufAddr,
-                                           buf, size-wrapped );
+                pci_memory_write(dev, s->RxBuf + s->RxBufAddr,
+                                 buf, size-wrapped);
             }
 
             /* reset buffer pointer */
             s->RxBufAddr = 0;
 
-            cpu_physical_memory_write( s->RxBuf + s->RxBufAddr,
-                                       buf + (size-wrapped), wrapped );
+            pci_memory_write(dev, s->RxBuf + s->RxBufAddr,
+                             buf + (size-wrapped), wrapped);
 
             s->RxBufAddr = wrapped;
 
@@ -774,7 +778,7 @@ static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size)
     }
 
     /* non-wrapping path or overwrapping enabled */
-    cpu_physical_memory_write( s->RxBuf + s->RxBufAddr, buf, size );
+    pci_memory_write(dev, s->RxBuf + s->RxBufAddr, buf, size);
 
     s->RxBufAddr += size;
 }
@@ -814,6 +818,7 @@ static int rtl8139_can_receive(VLANClientState *nc)
 static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_t size_, int do_interrupt)
 {
     RTL8139State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+    PCIDevice *dev = &s->dev;
     int size = size_;
 
     uint32_t packet_header = 0;
@@ -968,13 +973,13 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 
         uint32_t val, rxdw0,rxdw1,rxbufLO,rxbufHI;
 
-        cpu_physical_memory_read(cplus_rx_ring_desc,    (uint8_t *)&val, 4);
+        pci_memory_read(dev, cplus_rx_ring_desc,    (uint8_t *)&val, 4);
         rxdw0 = le32_to_cpu(val);
-        cpu_physical_memory_read(cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
+        pci_memory_read(dev, cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
         rxdw1 = le32_to_cpu(val);
-        cpu_physical_memory_read(cplus_rx_ring_desc+8,  (uint8_t *)&val, 4);
+        pci_memory_read(dev, cplus_rx_ring_desc+8,  (uint8_t *)&val, 4);
         rxbufLO = le32_to_cpu(val);
-        cpu_physical_memory_read(cplus_rx_ring_desc+12, (uint8_t *)&val, 4);
+        pci_memory_read(dev, cplus_rx_ring_desc+12, (uint8_t *)&val, 4);
         rxbufHI = le32_to_cpu(val);
 
         DEBUG_PRINT(("RTL8139: +++ C+ mode RX descriptor %d %08x %08x %08x %08x\n",
@@ -1019,7 +1024,7 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
         target_phys_addr_t rx_addr = rtl8139_addr64(rxbufLO, rxbufHI);
 
         /* receive/copy to target memory */
-        cpu_physical_memory_write( rx_addr, buf, size );
+        pci_memory_write(dev, rx_addr, buf, size);
 
         if (s->CpCmd & CPlusRxChkSum)
         {
@@ -1032,7 +1037,7 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 #else
         val = 0;
 #endif
-        cpu_physical_memory_write( rx_addr+size, (uint8_t *)&val, 4);
+        pci_memory_write(dev, rx_addr + size, (uint8_t *)&val, 4);
 
 /* first segment of received packet flag */
 #define CP_RX_STATUS_FS (1<<29)
@@ -1081,9 +1086,9 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 
         /* update ring data */
         val = cpu_to_le32(rxdw0);
-        cpu_physical_memory_write(cplus_rx_ring_desc,    (uint8_t *)&val, 4);
+        pci_memory_write(dev, cplus_rx_ring_desc,    (uint8_t *)&val, 4);
         val = cpu_to_le32(rxdw1);
-        cpu_physical_memory_write(cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
+        pci_memory_write(dev, cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
 
         /* update tally counter */
         ++s->tally_counters.RxOk;
@@ -1279,50 +1284,54 @@ static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters)
     counters->TxUndrn = 0;
 }
 
-static void RTL8139TallyCounters_physical_memory_write(target_phys_addr_t tc_addr, RTL8139TallyCounters* tally_counters)
+static void
+RTL8139TallyCounters_physical_memory_write(RTL8139State *s,
+                                           target_phys_addr_t tc_addr)
 {
+    PCIDevice *dev = &s->dev;
+    RTL8139TallyCounters *tally_counters = &s->tally_counters;
     uint16_t val16;
     uint32_t val32;
     uint64_t val64;
 
     val64 = cpu_to_le64(tally_counters->TxOk);
-    cpu_physical_memory_write(tc_addr + 0,    (uint8_t *)&val64, 8);
+    pci_memory_write(dev, tc_addr + 0,    (uint8_t *)&val64, 8);
 
     val64 = cpu_to_le64(tally_counters->RxOk);
-    cpu_physical_memory_write(tc_addr + 8,    (uint8_t *)&val64, 8);
+    pci_memory_write(dev, tc_addr + 8,    (uint8_t *)&val64, 8);
 
     val64 = cpu_to_le64(tally_counters->TxERR);
-    cpu_physical_memory_write(tc_addr + 16,    (uint8_t *)&val64, 8);
+    pci_memory_write(dev, tc_addr + 16,    (uint8_t *)&val64, 8);
 
     val32 = cpu_to_le32(tally_counters->RxERR);
-    cpu_physical_memory_write(tc_addr + 24,    (uint8_t *)&val32, 4);
+    pci_memory_write(dev, tc_addr + 24,    (uint8_t *)&val32, 4);
 
     val16 = cpu_to_le16(tally_counters->MissPkt);
-    cpu_physical_memory_write(tc_addr + 28,    (uint8_t *)&val16, 2);
+    pci_memory_write(dev, tc_addr + 28,    (uint8_t *)&val16, 2);
 
     val16 = cpu_to_le16(tally_counters->FAE);
-    cpu_physical_memory_write(tc_addr + 30,    (uint8_t *)&val16, 2);
+    pci_memory_write(dev, tc_addr + 30,    (uint8_t *)&val16, 2);
 
     val32 = cpu_to_le32(tally_counters->Tx1Col);
-    cpu_physical_memory_write(tc_addr + 32,    (uint8_t *)&val32, 4);
+    pci_memory_write(dev, tc_addr + 32,    (uint8_t *)&val32, 4);
 
     val32 = cpu_to_le32(tally_counters->TxMCol);
-    cpu_physical_memory_write(tc_addr + 36,    (uint8_t *)&val32, 4);
+    pci_memory_write(dev, tc_addr + 36,    (uint8_t *)&val32, 4);
 
     val64 = cpu_to_le64(tally_counters->RxOkPhy);
-    cpu_physical_memory_write(tc_addr + 40,    (uint8_t *)&val64, 8);
+    pci_memory_write(dev, tc_addr + 40,    (uint8_t *)&val64, 8);
 
     val64 = cpu_to_le64(tally_counters->RxOkBrd);
-    cpu_physical_memory_write(tc_addr + 48,    (uint8_t *)&val64, 8);
+    pci_memory_write(dev, tc_addr + 48,    (uint8_t *)&val64, 8);
 
     val32 = cpu_to_le32(tally_counters->RxOkMul);
-    cpu_physical_memory_write(tc_addr + 56,    (uint8_t *)&val32, 4);
+    pci_memory_write(dev, tc_addr + 56,    (uint8_t *)&val32, 4);
 
     val16 = cpu_to_le16(tally_counters->TxAbt);
-    cpu_physical_memory_write(tc_addr + 60,    (uint8_t *)&val16, 2);
+    pci_memory_write(dev, tc_addr + 60,    (uint8_t *)&val16, 2);
 
     val16 = cpu_to_le16(tally_counters->TxUndrn);
-    cpu_physical_memory_write(tc_addr + 62,    (uint8_t *)&val16, 2);
+    pci_memory_write(dev, tc_addr + 62,    (uint8_t *)&val16, 2);
 }
 
 /* Loads values of tally counters from VM state file */
@@ -1758,6 +1767,8 @@ static void rtl8139_transfer_frame(RTL8139State *s, const uint8_t *buf, int size
 
 static int rtl8139_transmit_one(RTL8139State *s, int descriptor)
 {
+    PCIDevice *dev = &s->dev;
+
     if (!rtl8139_transmitter_enabled(s))
     {
         DEBUG_PRINT(("RTL8139: +++ cannot transmit from descriptor %d: transmitter disabled\n",
@@ -1780,7 +1791,7 @@ static int rtl8139_transmit_one(RTL8139State *s, int descriptor)
     DEBUG_PRINT(("RTL8139: +++ transmit reading %d bytes from host memory at 0x%08x\n",
                  txsize, s->TxAddr[descriptor]));
 
-    cpu_physical_memory_read(s->TxAddr[descriptor], txbuffer, txsize);
+    pci_memory_read(dev, s->TxAddr[descriptor], txbuffer, txsize);
 
     /* Mark descriptor as transferred */
     s->TxStatus[descriptor] |= TxHostOwns;
@@ -1886,6 +1897,8 @@ static uint16_t ip_checksum(void *data, size_t len)
 
 static int rtl8139_cplus_transmit_one(RTL8139State *s)
 {
+    PCIDevice *dev = &s->dev;
+
     if (!rtl8139_transmitter_enabled(s))
     {
         DEBUG_PRINT(("RTL8139: +++ C+ mode: transmitter disabled\n"));
@@ -1911,14 +1924,14 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
 
     uint32_t val, txdw0,txdw1,txbufLO,txbufHI;
 
-    cpu_physical_memory_read(cplus_tx_ring_desc,    (uint8_t *)&val, 4);
+    pci_memory_read(dev, cplus_tx_ring_desc,    (uint8_t *)&val, 4);
     txdw0 = le32_to_cpu(val);
     /* TODO: implement VLAN tagging support, VLAN tag data is read to txdw1 */
-    cpu_physical_memory_read(cplus_tx_ring_desc+4,  (uint8_t *)&val, 4);
+    pci_memory_read(dev, cplus_tx_ring_desc+4,  (uint8_t *)&val, 4);
     txdw1 = le32_to_cpu(val);
-    cpu_physical_memory_read(cplus_tx_ring_desc+8,  (uint8_t *)&val, 4);
+    pci_memory_read(dev, cplus_tx_ring_desc+8,  (uint8_t *)&val, 4);
     txbufLO = le32_to_cpu(val);
-    cpu_physical_memory_read(cplus_tx_ring_desc+12, (uint8_t *)&val, 4);
+    pci_memory_read(dev, cplus_tx_ring_desc+12, (uint8_t *)&val, 4);
     txbufHI = le32_to_cpu(val);
 
     DEBUG_PRINT(("RTL8139: +++ C+ mode TX descriptor %d %08x %08x %08x %08x\n",
@@ -2025,7 +2038,8 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
     DEBUG_PRINT(("RTL8139: +++ C+ mode transmit reading %d bytes from host memory at %016" PRIx64 " to offset %d\n",
                  txsize, (uint64_t)tx_addr, s->cplus_txbuffer_offset));
 
-    cpu_physical_memory_read(tx_addr, s->cplus_txbuffer + s->cplus_txbuffer_offset, txsize);
+    pci_memory_read(dev, tx_addr,
+                    s->cplus_txbuffer + s->cplus_txbuffer_offset, txsize);
     s->cplus_txbuffer_offset += txsize;
 
     /* seek to next Rx descriptor */
@@ -2052,10 +2066,10 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
 
     /* update ring data */
     val = cpu_to_le32(txdw0);
-    cpu_physical_memory_write(cplus_tx_ring_desc,    (uint8_t *)&val, 4);
+    pci_memory_write(dev, cplus_tx_ring_desc,    (uint8_t *)&val, 4);
     /* TODO: implement VLAN tagging support, VLAN tag data is read to txdw1 */
 //    val = cpu_to_le32(txdw1);
-//    cpu_physical_memory_write(cplus_tx_ring_desc+4,  &val, 4);
+//    pci_memory_write(dev, cplus_tx_ring_desc+4,  &val, 4);
 
     /* Now decide if descriptor being processed is holding the last segment of packet */
     if (txdw0 & CP_TX_LS)
@@ -2364,7 +2378,6 @@ static void rtl8139_transmit(RTL8139State *s)
 
 static void rtl8139_TxStatus_write(RTL8139State *s, uint32_t txRegOffset, uint32_t val)
 {
-
     int descriptor = txRegOffset/4;
 
     /* handle C+ transmit mode register configuration */
@@ -2381,7 +2394,7 @@ static void rtl8139_TxStatus_write(RTL8139State *s, uint32_t txRegOffset, uint32
             target_phys_addr_t tc_addr = rtl8139_addr64(s->TxStatus[0] & ~0x3f, s->TxStatus[1]);
 
             /* dump tally counters to specified memory location */
-            RTL8139TallyCounters_physical_memory_write( tc_addr, &s->tally_counters);
+            RTL8139TallyCounters_physical_memory_write(s, tc_addr);
 
             /* mark dump completed */
             s->TxStatus[0] &= ~0x8;
-- 
1.7.1

* [PATCH 6/7] eepro100: use the PCI memory access interface
  2010-08-15 19:27 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-08-15 19:27   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 42+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-08-15 19:27 UTC (permalink / raw)
  To: joro
  Cc: paul, blauwirbel, anthony, avi, kvm, qemu-devel,
	Eduard - Gabriel Munteanu

This allows the device to work properly with an emulated IOMMU.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/eepro100.c |   78 ++++++++++++++++++++++++++++++---------------------------
 1 files changed, 41 insertions(+), 37 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index 97afa2c..6e23271 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -306,10 +306,10 @@ static const uint16_t eepro100_mdi_mask[] = {
 };
 
 /* XXX: optimize */
-static void stl_le_phys(target_phys_addr_t addr, uint32_t val)
+static void stl_le_phys(EEPRO100State * s, pcibus_t addr, uint32_t val)
 {
     val = cpu_to_le32(val);
-    cpu_physical_memory_write(addr, (const uint8_t *)&val, sizeof(val));
+    pci_memory_write(&s->dev, addr, (const uint8_t *)&val, sizeof(val));
 }
 
 #define POLYNOMIAL 0x04c11db6
@@ -692,12 +692,12 @@ static void dump_statistics(EEPRO100State * s)
      * values which really matter.
      * Number of data should check configuration!!!
      */
-    cpu_physical_memory_write(s->statsaddr,
-                              (uint8_t *) & s->statistics, s->stats_size);
-    stl_le_phys(s->statsaddr + 0, s->statistics.tx_good_frames);
-    stl_le_phys(s->statsaddr + 36, s->statistics.rx_good_frames);
-    stl_le_phys(s->statsaddr + 48, s->statistics.rx_resource_errors);
-    stl_le_phys(s->statsaddr + 60, s->statistics.rx_short_frame_errors);
+    pci_memory_write(&s->dev, s->statsaddr,
+                     (uint8_t *) & s->statistics, s->stats_size);
+    stl_le_phys(s, s->statsaddr + 0, s->statistics.tx_good_frames);
+    stl_le_phys(s, s->statsaddr + 36, s->statistics.rx_good_frames);
+    stl_le_phys(s, s->statsaddr + 48, s->statistics.rx_resource_errors);
+    stl_le_phys(s, s->statsaddr + 60, s->statistics.rx_short_frame_errors);
 #if 0
     stw_le_phys(s->statsaddr + 76, s->statistics.xmt_tco_frames);
     stw_le_phys(s->statsaddr + 78, s->statistics.rcv_tco_frames);
@@ -707,7 +707,8 @@ static void dump_statistics(EEPRO100State * s)
 
 static void read_cb(EEPRO100State *s)
 {
-    cpu_physical_memory_read(s->cb_address, (uint8_t *) &s->tx, sizeof(s->tx));
+    pci_memory_read(&s->dev,
+                    s->cb_address, (uint8_t *) &s->tx, sizeof(s->tx));
     s->tx.status = le16_to_cpu(s->tx.status);
     s->tx.command = le16_to_cpu(s->tx.command);
     s->tx.link = le32_to_cpu(s->tx.link);
@@ -737,18 +738,18 @@ static void tx_command(EEPRO100State *s)
     }
     assert(tcb_bytes <= sizeof(buf));
     while (size < tcb_bytes) {
-        uint32_t tx_buffer_address = ldl_phys(tbd_address);
-        uint16_t tx_buffer_size = lduw_phys(tbd_address + 4);
+        uint32_t tx_buffer_address = pci_ldl(&s->dev, tbd_address);
+        uint16_t tx_buffer_size = pci_lduw(&s->dev, tbd_address + 4);
 #if 0
-        uint16_t tx_buffer_el = lduw_phys(tbd_address + 6);
+        uint16_t tx_buffer_el = pci_lduw(&s->dev, tbd_address + 6);
 #endif
         tbd_address += 8;
         TRACE(RXTX, logout
             ("TBD (simplified mode): buffer address 0x%08x, size 0x%04x\n",
              tx_buffer_address, tx_buffer_size));
         tx_buffer_size = MIN(tx_buffer_size, sizeof(buf) - size);
-        cpu_physical_memory_read(tx_buffer_address, &buf[size],
-                                 tx_buffer_size);
+        pci_memory_read(&s->dev,
+                        tx_buffer_address, &buf[size], tx_buffer_size);
         size += tx_buffer_size;
     }
     if (tbd_array == 0xffffffff) {
@@ -759,16 +760,16 @@ static void tx_command(EEPRO100State *s)
         if (s->has_extended_tcb_support && !(s->configuration[6] & BIT(4))) {
             /* Extended Flexible TCB. */
             for (; tbd_count < 2; tbd_count++) {
-                uint32_t tx_buffer_address = ldl_phys(tbd_address);
-                uint16_t tx_buffer_size = lduw_phys(tbd_address + 4);
-                uint16_t tx_buffer_el = lduw_phys(tbd_address + 6);
+                uint32_t tx_buffer_address = pci_ldl(&s->dev, tbd_address);
+                uint16_t tx_buffer_size = pci_lduw(&s->dev, tbd_address + 4);
+                uint16_t tx_buffer_el = pci_lduw(&s->dev, tbd_address + 6);
                 tbd_address += 8;
                 TRACE(RXTX, logout
                     ("TBD (extended flexible mode): buffer address 0x%08x, size 0x%04x\n",
                      tx_buffer_address, tx_buffer_size));
                 tx_buffer_size = MIN(tx_buffer_size, sizeof(buf) - size);
-                cpu_physical_memory_read(tx_buffer_address, &buf[size],
-                                         tx_buffer_size);
+                pci_memory_read(&s->dev,
+                                tx_buffer_address, &buf[size], tx_buffer_size);
                 size += tx_buffer_size;
                 if (tx_buffer_el & 1) {
                     break;
@@ -777,16 +778,16 @@ static void tx_command(EEPRO100State *s)
         }
         tbd_address = tbd_array;
         for (; tbd_count < s->tx.tbd_count; tbd_count++) {
-            uint32_t tx_buffer_address = ldl_phys(tbd_address);
-            uint16_t tx_buffer_size = lduw_phys(tbd_address + 4);
-            uint16_t tx_buffer_el = lduw_phys(tbd_address + 6);
+            uint32_t tx_buffer_address = pci_ldl(&s->dev, tbd_address);
+            uint16_t tx_buffer_size = pci_lduw(&s->dev, tbd_address + 4);
+            uint16_t tx_buffer_el = pci_lduw(&s->dev, tbd_address + 6);
             tbd_address += 8;
             TRACE(RXTX, logout
                 ("TBD (flexible mode): buffer address 0x%08x, size 0x%04x\n",
                  tx_buffer_address, tx_buffer_size));
             tx_buffer_size = MIN(tx_buffer_size, sizeof(buf) - size);
-            cpu_physical_memory_read(tx_buffer_address, &buf[size],
-                                     tx_buffer_size);
+            pci_memory_read(&s->dev,
+                            tx_buffer_address, &buf[size], tx_buffer_size);
             size += tx_buffer_size;
             if (tx_buffer_el & 1) {
                 break;
@@ -811,7 +812,7 @@ static void set_multicast_list(EEPRO100State *s)
     TRACE(OTHER, logout("multicast list, multicast count = %u\n", multicast_count));
     for (i = 0; i < multicast_count; i += 6) {
         uint8_t multicast_addr[6];
-        cpu_physical_memory_read(s->cb_address + 10 + i, multicast_addr, 6);
+        pci_memory_read(&s->dev, s->cb_address + 10 + i, multicast_addr, 6);
         TRACE(OTHER, logout("multicast entry %s\n", nic_dump(multicast_addr, 6)));
         unsigned mcast_idx = compute_mcast_idx(multicast_addr);
         assert(mcast_idx < 64);
@@ -845,12 +846,14 @@ static void action_command(EEPRO100State *s)
             /* Do nothing. */
             break;
         case CmdIASetup:
-            cpu_physical_memory_read(s->cb_address + 8, &s->conf.macaddr.a[0], 6);
+            pci_memory_read(&s->dev,
+                            s->cb_address + 8, &s->conf.macaddr.a[0], 6);
             TRACE(OTHER, logout("macaddr: %s\n", nic_dump(&s->conf.macaddr.a[0], 6)));
             break;
         case CmdConfigure:
-            cpu_physical_memory_read(s->cb_address + 8, &s->configuration[0],
-                                     sizeof(s->configuration));
+            pci_memory_read(&s->dev,
+                            s->cb_address + 8,
+                            &s->configuration[0], sizeof(s->configuration));
             TRACE(OTHER, logout("configuration: %s\n", nic_dump(&s->configuration[0], 16)));
             break;
         case CmdMulticastList:
@@ -880,7 +883,7 @@ static void action_command(EEPRO100State *s)
             break;
         }
         /* Write new status. */
-        stw_phys(s->cb_address, s->tx.status | ok_status | STATUS_C);
+        pci_stw(&s->dev, s->cb_address, s->tx.status | ok_status | STATUS_C);
         if (bit_i) {
             /* CU completed action. */
             eepro100_cx_interrupt(s);
@@ -947,7 +950,7 @@ static void eepro100_cu_command(EEPRO100State * s, uint8_t val)
         /* Dump statistical counters. */
         TRACE(OTHER, logout("val=0x%02x (dump stats)\n", val));
         dump_statistics(s);
-        stl_le_phys(s->statsaddr + s->stats_size, 0xa005);
+        stl_le_phys(s, s->statsaddr + s->stats_size, 0xa005);
         break;
     case CU_CMD_BASE:
         /* Load CU base. */
@@ -958,7 +961,7 @@ static void eepro100_cu_command(EEPRO100State * s, uint8_t val)
         /* Dump and reset statistical counters. */
         TRACE(OTHER, logout("val=0x%02x (dump stats and reset)\n", val));
         dump_statistics(s);
-        stl_le_phys(s->statsaddr + s->stats_size, 0xa007);
+        stl_le_phys(s, s->statsaddr + s->stats_size, 0xa007);
         memset(&s->statistics, 0, sizeof(s->statistics));
         break;
     case CU_SRESUME:
@@ -1259,10 +1262,10 @@ static void eepro100_write_port(EEPRO100State * s, uint32_t val)
     case PORT_SELFTEST:
         TRACE(OTHER, logout("selftest address=0x%08x\n", address));
         eepro100_selftest_t data;
-        cpu_physical_memory_read(address, (uint8_t *) & data, sizeof(data));
+        pci_memory_read(&s->dev, address, (uint8_t *) & data, sizeof(data));
         data.st_sign = 0xffffffff;
         data.st_result = 0;
-        cpu_physical_memory_write(address, (uint8_t *) & data, sizeof(data));
+        pci_memory_write(&s->dev, address, (uint8_t *) & data, sizeof(data));
         break;
     case PORT_SELECTIVE_RESET:
         TRACE(OTHER, logout("selective reset, selftest address=0x%08x\n", address));
@@ -1721,8 +1724,9 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
     }
     /* !!! */
     eepro100_rx_t rx;
-    cpu_physical_memory_read(s->ru_base + s->ru_offset, (uint8_t *) & rx,
-                             offsetof(eepro100_rx_t, packet));
+    pci_memory_read(&s->dev,
+                    s->ru_base + s->ru_offset,
+                    (uint8_t *) & rx, offsetof(eepro100_rx_t, packet));
     uint16_t rfd_command = le16_to_cpu(rx.command);
     uint16_t rfd_size = le16_to_cpu(rx.size);
 
@@ -1752,8 +1756,8 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
 #if 0
     assert(!(s->configuration[17] & BIT(0)));
 #endif
-    cpu_physical_memory_write(s->ru_base + s->ru_offset +
-                              offsetof(eepro100_rx_t, packet), buf, size);
+    pci_memory_write(&s->dev, s->ru_base + s->ru_offset +
+                     offsetof(eepro100_rx_t, packet), buf, size);
     s->statistics.rx_good_frames++;
     eepro100_fr_interrupt(s);
     s->ru_offset = le32_to_cpu(rx.link);
-- 
1.7.1


* [PATCH 7/7] ac97: use the PCI memory access interface
  2010-08-15 19:27 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-08-15 19:27   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 42+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-08-15 19:27 UTC (permalink / raw)
  To: joro
  Cc: paul, blauwirbel, anthony, avi, kvm, qemu-devel,
	Eduard - Gabriel Munteanu

This allows the device to work properly with an emulated IOMMU.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/ac97.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/ac97.c b/hw/ac97.c
index 4319bc8..9ee4894 100644
--- a/hw/ac97.c
+++ b/hw/ac97.c
@@ -223,7 +223,7 @@ static void fetch_bd (AC97LinkState *s, AC97BusMasterRegs *r)
 {
     uint8_t b[8];
 
-    cpu_physical_memory_read (r->bdbar + r->civ * 8, b, 8);
+    pci_memory_read (&s->dev, r->bdbar + r->civ * 8, b, 8);
     r->bd_valid = 1;
     r->bd.addr = le32_to_cpu (*(uint32_t *) &b[0]) & ~3;
     r->bd.ctl_len = le32_to_cpu (*(uint32_t *) &b[4]);
@@ -972,7 +972,7 @@ static int write_audio (AC97LinkState *s, AC97BusMasterRegs *r,
     while (temp) {
         int copied;
         to_copy = audio_MIN (temp, sizeof (tmpbuf));
-        cpu_physical_memory_read (addr, tmpbuf, to_copy);
+        pci_memory_read (&s->dev, addr, tmpbuf, to_copy);
         copied = AUD_write (s->voice_po, tmpbuf, to_copy);
         dolog ("write_audio max=%x to_copy=%x copied=%x\n",
                max, to_copy, copied);
@@ -1056,7 +1056,7 @@ static int read_audio (AC97LinkState *s, AC97BusMasterRegs *r,
             *stop = 1;
             break;
         }
-        cpu_physical_memory_write (addr, tmpbuf, acquired);
+        pci_memory_write (&s->dev, addr, tmpbuf, acquired);
         temp -= acquired;
         addr += acquired;
         nread += acquired;
-- 
1.7.1


* Re: [Qemu-devel] [PATCH 7/7] ac97: use the PCI memory access interface
  2010-08-15 19:27   ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-08-15 20:42     ` malc
  -1 siblings, 0 replies; 42+ messages in thread
From: malc @ 2010-08-15 20:42 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu; +Cc: joro, kvm, qemu-devel, blauwirbel, paul, avi

On Sun, 15 Aug 2010, Eduard - Gabriel Munteanu wrote:

> This allows the device to work properly with an emulated IOMMU.

Fine with me.

[..snip..]

-- 
mailto:av1474@comtv.ru

* Re: [PATCH 0/7] AMD IOMMU emulation patches v3
  2010-08-15 19:27 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-08-16  1:47   ` Anthony Liguori
  -1 siblings, 0 replies; 42+ messages in thread
From: Anthony Liguori @ 2010-08-16  1:47 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu; +Cc: joro, paul, blauwirbel, avi, kvm, qemu-devel

On 08/15/2010 02:27 PM, Eduard - Gabriel Munteanu wrote:
> Hi,
>
> Please have a look at these and merge if you wish. I hope I've addressed the
> issues people have raised.
>    

It's looking pretty good so far.  I'm very happy with the modifications 
to the PCI layer.

It looks like, given the helpers you've added, converting the PCI 
devices is more or less programmatic.  IOW, it just requires an 
appropriate sed.
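
To illustrate why the conversion is mechanical, here is a minimal sketch of the call shape only. It assumes a device-aware read that takes the same arguments as the old call plus the device. Every `mock_*` name, the `translate` hook, and the window translation are hypothetical stand-ins for illustration, not the code from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names come from the series. */
#define MOCK_RAM_SIZE 256
static uint8_t mock_ram[MOCK_RAM_SIZE];

typedef struct MockPCIDevice MockPCIDevice;
typedef uint64_t (*mock_translate_fn)(MockPCIDevice *dev, uint64_t addr);

struct MockPCIDevice {
    mock_translate_fn translate;  /* NULL means no IOMMU: identity mapping */
    uint64_t window_base;         /* used only by the toy translation below */
};

/* Old style: the caller passes a raw address and no device context. */
static void mock_physical_memory_read(uint64_t addr, uint8_t *buf, size_t len)
{
    assert(addr + len <= MOCK_RAM_SIZE);
    memcpy(buf, mock_ram + addr, len);
}

/* New style: same arguments plus the device, so a per-device
 * translation can be applied before the memory is touched. */
static void mock_pci_memory_read(MockPCIDevice *dev, uint64_t addr,
                                 uint8_t *buf, size_t len)
{
    if (dev->translate) {
        addr = dev->translate(dev, addr);
    }
    mock_physical_memory_read(addr, buf, len);
}

/* Toy translation: device addresses are offsets into a per-device window. */
static uint64_t mock_window_translate(MockPCIDevice *dev, uint64_t addr)
{
    return dev->window_base + addr;
}
```

Since the only difference at a call site is the extra leading device argument, a textual substitution can cover most of it, provided each device has an obvious expression for its own PCI device (such as `&s->dev` in the patches above).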

I'd rather see an all-at-once conversion of the PCI devices than a 
conversion of just a couple of functions.  In fact, we can go a step 
further after that and start poisoning symbols to prevent the wrong 
interfaces from being used.
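
A minimal sketch of what poisoning could look like, assuming the GCC preprocessor directive `#pragma GCC poison` (also accepted by Clang). All `demo_*` names are hypothetical and only mirror the shape of the interfaces discussed here. The pragma has to come after the wrapper that legitimately uses the old function, because it rejects every later appearance of the identifier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical guest memory and accessors, for illustration only. */
static uint8_t demo_ram[64];

/* The "wrong" interface for device code: no device context. */
static void demo_physical_memory_read(uint64_t addr, uint8_t *buf, size_t len)
{
    assert(addr + len <= sizeof(demo_ram));
    memcpy(buf, demo_ram + addr, len);
}

typedef struct DemoPCIDevice {
    int devfn;
} DemoPCIDevice;

/* The "right" interface: it is the last place allowed to use the old one. */
static void demo_pci_memory_read(DemoPCIDevice *dev, uint64_t addr,
                                 uint8_t *buf, size_t len)
{
    (void)dev;  /* a real implementation would translate per device here */
    demo_physical_memory_read(addr, buf, len);
}

/* From this point on, any mention of the old name is a compile error. */
#pragma GCC poison demo_physical_memory_read

/* Device code below can only reach memory through the device-aware call. */
static uint8_t demo_device_fetch_byte(DemoPCIDevice *dev, uint64_t addr)
{
    uint8_t b = 0;
    demo_pci_memory_read(dev, addr, &b, 1);
    return b;
}
```

In a real tree the pragma would more likely sit in a header included only by device code, so that core code can keep using the unpoisoned name; where exactly to put it is a design choice this sketch does not settle.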

Regards,

Anthony Liguori

> Some changes from the previous RFC:
> - included and updated the other two device patches
> - moved map registration and invalidation management into PCI code
> - AMD IOMMU emulation is always enabled (no more configure options)
> - cleaned up code, I now use typedefs as suggested
> - event logging cleanups
>
> BTW, the change to pci_regs.h is properly aligned but the original file contains
> tabs.
>
>
>          Cheers,
>          Eduard
>
> Eduard - Gabriel Munteanu (7):
>    pci: add range_covers_range()
>    pci: memory access API and IOMMU support
>    AMD IOMMU emulation
>    ide: use the PCI memory access interface
>    rtl8139: use the PCI memory access interface
>    eepro100: use the PCI memory access interface
>    ac97: use the PCI memory access interface
>
>   Makefile.target   |    2 +
>   dma-helpers.c     |   46 ++++-
>   dma.h             |   21 ++-
>   hw/ac97.c         |    6 +-
>   hw/amd_iommu.c    |  688 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>   hw/eepro100.c     |   78 ++++---
>   hw/ide/core.c     |   15 +-
>   hw/ide/internal.h |   39 +++
>   hw/ide/pci.c      |    7 +
>   hw/pc.c           |    2 +
>   hw/pci.c          |  197 +++++++++++++++-
>   hw/pci.h          |   84 +++++++
>   hw/pci_ids.h      |    2 +
>   hw/pci_regs.h     |    1 +
>   hw/rtl8139.c      |   99 +++++----
>   qemu-common.h     |    1 +
>   16 files changed, 1191 insertions(+), 97 deletions(-)
>   create mode 100644 hw/amd_iommu.c
>
>    


* [Qemu-devel] Re: [PATCH 0/7] AMD IOMMU emulation patches v3
@ 2010-08-16  1:47   ` Anthony Liguori
  0 siblings, 0 replies; 42+ messages in thread
From: Anthony Liguori @ 2010-08-16  1:47 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu; +Cc: kvm, joro, qemu-devel, blauwirbel, paul, avi

On 08/15/2010 02:27 PM, Eduard - Gabriel Munteanu wrote:
> Hi,
>
> Please have a look at these and merge if you wish. I hope I've addressed the
> issues people have raised.
>    

It's looking pretty good so far.  I'm very happy with the modifications 
to the PCI layer.

Given the helpers you've added, converting the PCI devices looks more or
less programmatic.  IOW, it just requires an appropriate sed.

I'd rather see an all-at-once conversion of the PCI devices than a
conversion of just a couple of functions.  In fact, we can go a step
further after that and start poisoning symbols to prevent the wrong
interfaces from being used.
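
To illustrate what poisoning symbols could look like, here is a minimal
sketch using the GCC/Clang "#pragma GCC poison" directive.  All names in it
(fake_ram, raw_memory_read, dev_memory_read, device_fetch_byte) are
hypothetical stand-ins and not the interfaces from this series; the point is
only that, once the wrapper exists, any later use of the old call fails to
compile.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for guest RAM; not QEMU's real memory model. */
static unsigned char fake_ram[64];

/* Hypothetical "old" interface that bypasses the IOMMU. */
static void raw_memory_read(size_t addr, void *buf, size_t len)
{
    assert(addr <= sizeof(fake_ram) && len <= sizeof(fake_ram) - addr);
    memcpy(buf, fake_ram + addr, len);
}

/* Hypothetical "new" interface that device models are meant to call. */
static void dev_memory_read(size_t addr, void *buf, size_t len)
{
    /* A real wrapper would translate addr through the IOMMU first. */
    raw_memory_read(addr, buf, len);
}

/*
 * From this directive onwards, any use of the old identifier is a
 * compile-time error, so converted device code cannot fall back to the
 * old call by accident.
 */
#pragma GCC poison raw_memory_read

/* Converted device code: only the wrapper is reachable here. */
static unsigned char device_fetch_byte(size_t addr)
{
    unsigned char b;

    dev_memory_read(addr, &b, 1);
    return b;
}
```

The directive has to come after the wrapper's own definition, since the
wrapper is the one place that still needs the old call.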

Regards,

Anthony Liguori

> Some changes from the previous RFC:
> - included and updated the other two device patches
> - moved map registration and invalidation management into PCI code
> - AMD IOMMU emulation is always enabled (no more configure options)
> - cleaned up code, I now use typedefs as suggested
> - event logging cleanups
>
> BTW, the change to pci_regs.h is properly aligned but the original file contains
> tabs.
>
>
>          Cheers,
>          Eduard
>
> Eduard - Gabriel Munteanu (7):
>    pci: add range_covers_range()
>    pci: memory access API and IOMMU support
>    AMD IOMMU emulation
>    ide: use the PCI memory access interface
>    rtl8139: use the PCI memory access interface
>    eepro100: use the PCI memory access interface
>    ac97: use the PCI memory access interface
>
>   Makefile.target   |    2 +
>   dma-helpers.c     |   46 ++++-
>   dma.h             |   21 ++-
>   hw/ac97.c         |    6 +-
>   hw/amd_iommu.c    |  688 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>   hw/eepro100.c     |   78 ++++---
>   hw/ide/core.c     |   15 +-
>   hw/ide/internal.h |   39 +++
>   hw/ide/pci.c      |    7 +
>   hw/pc.c           |    2 +
>   hw/pci.c          |  197 +++++++++++++++-
>   hw/pci.h          |   84 +++++++
>   hw/pci_ids.h      |    2 +
>   hw/pci_regs.h     |    1 +
>   hw/rtl8139.c      |   99 +++++----
>   qemu-common.h     |    1 +
>   16 files changed, 1191 insertions(+), 97 deletions(-)
>   create mode 100644 hw/amd_iommu.c
>
>    


* Re: [PATCH 3/7] AMD IOMMU emulation
  2010-08-15 19:27   ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-08-16 17:57     ` Blue Swirl
  -1 siblings, 0 replies; 42+ messages in thread
From: Blue Swirl @ 2010-08-16 17:57 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu; +Cc: joro, paul, anthony, avi, kvm, qemu-devel

On Sun, Aug 15, 2010 at 7:27 PM, Eduard - Gabriel Munteanu
<eduard.munteanu@linux360.ro> wrote:
> This introduces emulation for the AMD IOMMU, described in "AMD I/O
> Virtualization Technology (IOMMU) Specification".
>
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
>  Makefile.target |    2 +
>  hw/amd_iommu.c  |  688 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/pc.c         |    2 +
>  hw/pci_ids.h    |    2 +
>  hw/pci_regs.h   |    1 +
>  5 files changed, 695 insertions(+), 0 deletions(-)
>  create mode 100644 hw/amd_iommu.c
>
> diff --git a/Makefile.target b/Makefile.target
> index 70a9c1b..6b80a37 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -219,6 +219,8 @@ obj-i386-y += pcspk.o i8254.o
>  obj-i386-$(CONFIG_KVM_PIT) += i8254-kvm.o
>  obj-i386-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += device-assignment.o
>
> +obj-i386-y += amd_iommu.o
> +
>  # Hardware support
>  obj-ia64-y += ide.o pckbd.o vga.o $(SOUND_HW) dma.o $(AUDIODRV)
>  obj-ia64-y += fdc.o mc146818rtc.o serial.o i8259.o ipf.o
> diff --git a/hw/amd_iommu.c b/hw/amd_iommu.c
> new file mode 100644
> index 0000000..2e20888
> --- /dev/null
> +++ b/hw/amd_iommu.c
> @@ -0,0 +1,688 @@
> +/*
> + * AMD IOMMU emulation
> + *
> + * Copyright (c) 2010 Eduard - Gabriel Munteanu
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "pc.h"
> +#include "hw.h"
> +#include "pci.h"
> +#include "qlist.h"
> +
> +/* Capability registers */
> +#define CAPAB_HEADER            0x00
> +#define   CAPAB_REV_TYPE        0x02
> +#define   CAPAB_FLAGS           0x03
> +#define CAPAB_BAR_LOW           0x04
> +#define CAPAB_BAR_HIGH          0x08
> +#define CAPAB_RANGE             0x0C
> +#define CAPAB_MISC              0x10
> +
> +#define CAPAB_SIZE              0x14
> +
> +/* Capability header data */
> +#define CAPAB_FLAG_IOTLBSUP     (1 << 0)
> +#define CAPAB_FLAG_HTTUNNEL     (1 << 1)
> +#define CAPAB_FLAG_NPCACHE      (1 << 2)
> +#define CAPAB_INIT_REV          (1 << 3)
> +#define CAPAB_INIT_TYPE         3
> +#define CAPAB_INIT_REV_TYPE     (CAPAB_REV | CAPAB_TYPE)
> +#define CAPAB_INIT_FLAGS        (CAPAB_FLAG_NPCACHE | CAPAB_FLAG_HTTUNNEL)
> +#define CAPAB_INIT_MISC         (64 << 15) | (48 << 8)
> +#define CAPAB_BAR_MASK          ~((1UL << 14) - 1)
> +
> +/* MMIO registers */
> +#define MMIO_DEVICE_TABLE       0x0000
> +#define MMIO_COMMAND_BASE       0x0008
> +#define MMIO_EVENT_BASE         0x0010
> +#define MMIO_CONTROL            0x0018
> +#define MMIO_EXCL_BASE          0x0020
> +#define MMIO_EXCL_LIMIT         0x0028
> +#define MMIO_COMMAND_HEAD       0x2000
> +#define MMIO_COMMAND_TAIL       0x2008
> +#define MMIO_EVENT_HEAD         0x2010
> +#define MMIO_EVENT_TAIL         0x2018
> +#define MMIO_STATUS             0x2020
> +
> +#define MMIO_SIZE               0x4000
> +
> +#define MMIO_DEVTAB_SIZE_MASK   ((1ULL << 12) - 1)
> +#define MMIO_DEVTAB_BASE_MASK   (((1ULL << 52) - 1) & ~MMIO_DEVTAB_SIZE_MASK)
> +#define MMIO_DEVTAB_ENTRY_SIZE  32
> +#define MMIO_DEVTAB_SIZE_UNIT   4096
> +
> +#define MMIO_CMDBUF_SIZE_BYTE       (MMIO_COMMAND_BASE + 7)
> +#define MMIO_CMDBUF_SIZE_MASK       0x0F
> +#define MMIO_CMDBUF_BASE_MASK       MMIO_DEVTAB_BASE_MASK
> +#define MMIO_CMDBUF_DEFAULT_SIZE    8
> +#define MMIO_CMDBUF_HEAD_MASK       (((1ULL << 19) - 1) & ~0x0F)
> +#define MMIO_CMDBUF_TAIL_MASK       MMIO_EVTLOG_HEAD_MASK
> +
> +#define MMIO_EVTLOG_SIZE_BYTE       (MMIO_EVENT_BASE + 7)
> +#define MMIO_EVTLOG_SIZE_MASK       MMIO_CMDBUF_SIZE_MASK
> +#define MMIO_EVTLOG_BASE_MASK       MMIO_CMDBUF_BASE_MASK
> +#define MMIO_EVTLOG_DEFAULT_SIZE    MMIO_CMDBUF_DEFAULT_SIZE
> +#define MMIO_EVTLOG_HEAD_MASK       (((1ULL << 19) - 1) & ~0x0F)
> +#define MMIO_EVTLOG_TAIL_MASK       MMIO_EVTLOG_HEAD_MASK
> +
> +#define MMIO_EXCL_BASE_MASK         MMIO_DEVTAB_BASE_MASK
> +#define MMIO_EXCL_ENABLED_MASK      (1ULL << 0)
> +#define MMIO_EXCL_ALLOW_MASK        (1ULL << 1)
> +#define MMIO_EXCL_LIMIT_MASK        MMIO_DEVTAB_BASE_MASK
> +#define MMIO_EXCL_LIMIT_LOW         0xFFF
> +
> +#define MMIO_CONTROL_IOMMUEN        (1ULL << 0)
> +#define MMIO_CONTROL_HTTUNEN        (1ULL << 1)
> +#define MMIO_CONTROL_EVENTLOGEN     (1ULL << 2)
> +#define MMIO_CONTROL_EVENTINTEN     (1ULL << 3)
> +#define MMIO_CONTROL_COMWAITINTEN   (1ULL << 4)
> +#define MMIO_CONTROL_CMDBUFEN       (1ULL << 12)
> +
> +#define MMIO_STATUS_EVTLOG_OF       (1ULL << 0)
> +#define MMIO_STATUS_EVTLOG_INTR     (1ULL << 1)
> +#define MMIO_STATUS_COMWAIT_INTR    (1ULL << 2)
> +#define MMIO_STATUS_EVTLOG_RUN      (1ULL << 3)
> +#define MMIO_STATUS_CMDBUF_RUN      (1ULL << 4)
> +
> +#define CMDBUF_ID_BYTE              0x07
> +#define CMDBUF_ID_RSHIFT            4
> +#define CMDBUF_ENTRY_SIZE           0x10
> +
> +#define CMD_COMPLETION_WAIT         0x01
> +#define CMD_INVAL_DEVTAB_ENTRY      0x02
> +#define CMD_INVAL_IOMMU_PAGES       0x03
> +#define CMD_INVAL_IOTLB_PAGES       0x04
> +#define CMD_INVAL_INTR_TABLE        0x05
> +
> +#define DEVTAB_ENTRY_SIZE           32
> +
> +/* Device table entry bits 0:63 */
> +#define DEV_VALID                   (1ULL << 0)
> +#define DEV_TRANSLATION_VALID       (1ULL << 1)
> +#define DEV_MODE_MASK               0x7
> +#define DEV_MODE_RSHIFT             9
> +#define DEV_PT_ROOT_MASK            0xFFFFFFFFFF000
> +#define DEV_PT_ROOT_RSHIFT          12
> +#define DEV_PERM_SHIFT              61
> +#define DEV_PERM_READ               (1ULL << 61)
> +#define DEV_PERM_WRITE              (1ULL << 62)
> +
> +/* Device table entry bits 64:127 */
> +#define DEV_DOMAIN_ID_MASK          ((1ULL << 16) - 1)
> +#define DEV_IOTLB_SUPPORT           (1ULL << 17)
> +#define DEV_SUPPRESS_PF             (1ULL << 18)
> +#define DEV_SUPPRESS_ALL_PF         (1ULL << 19)
> +#define DEV_IOCTL_MASK              ~3
> +#define DEV_IOCTL_RSHIFT            20
> +#define   DEV_IOCTL_DENY            0
> +#define   DEV_IOCTL_PASSTHROUGH     1
> +#define   DEV_IOCTL_TRANSLATE       2
> +#define DEV_CACHE                   (1ULL << 37)
> +#define DEV_SNOOP_DISABLE           (1ULL << 38)
> +#define DEV_EXCL                    (1ULL << 39)
> +
> +/* Event codes and flags, as stored in the info field */
> +#define EVENT_ILLEGAL_DEVTAB_ENTRY  (0x1U << 24)
> +#define EVENT_IOPF                  (0x2U << 24)
> +#define   EVENT_IOPF_I              (1U << 3)
> +#define   EVENT_IOPF_PR             (1U << 4)
> +#define   EVENT_IOPF_RW             (1U << 5)
> +#define   EVENT_IOPF_PE             (1U << 6)
> +#define   EVENT_IOPF_RZ             (1U << 7)
> +#define   EVENT_IOPF_TR             (1U << 8)
> +#define EVENT_DEV_TAB_HW_ERROR      (0x3U << 24)
> +#define EVENT_PAGE_TAB_HW_ERROR     (0x4U << 24)
> +#define EVENT_ILLEGAL_COMMAND_ERROR (0x5U << 24)
> +#define EVENT_COMMAND_HW_ERROR      (0x6U << 24)
> +#define EVENT_IOTLB_INV_TIMEOUT     (0x7U << 24)
> +#define EVENT_INVALID_DEV_REQUEST   (0x8U << 24)
> +
> +#define EVENT_LEN                   16
> +
> +typedef struct AMDIOMMUState {
> +    PCIDevice                   dev;
> +
> +    int                         capab_offset;
> +    unsigned char               *capab;
> +
> +    int                         mmio_index;
> +    target_phys_addr_t          mmio_addr;
> +    unsigned char               *mmio_buf;
> +    int                         mmio_enabled;
> +
> +    int                         enabled;
> +    int                         ats_enabled;
> +
> +    target_phys_addr_t          devtab;
> +    size_t                      devtab_len;
> +
> +    target_phys_addr_t          cmdbuf;
> +    int                         cmdbuf_enabled;
> +    size_t                      cmdbuf_len;
> +    size_t                      cmdbuf_head;
> +    size_t                      cmdbuf_tail;
> +    int                         completion_wait_intr;
> +
> +    target_phys_addr_t          evtlog;
> +    int                         evtlog_enabled;
> +    int                         evtlog_intr;
> +    target_phys_addr_t          evtlog_len;
> +    target_phys_addr_t          evtlog_head;
> +    target_phys_addr_t          evtlog_tail;
> +
> +    target_phys_addr_t          excl_base;
> +    target_phys_addr_t          excl_limit;
> +    int                         excl_enabled;
> +    int                         excl_allow;
> +} AMDIOMMUState;
> +
> +typedef struct AMDIOMMUEvent {
> +    uint16_t    devfn;
> +    uint16_t    reserved;
> +    uint16_t    domid;
> +    uint16_t    info;
> +    uint64_t    addr;
> +} __attribute__((packed)) AMDIOMMUEvent;
> +
> +static void amd_iommu_completion_wait(AMDIOMMUState *st,
> +                                      uint8_t *cmd)
> +{
> +    uint64_t addr;
> +
> +    if (cmd[0] & 1) {
> +        addr = le64_to_cpu(*(uint64_t *) cmd) & 0xFFFFFFFFFFFF8;
> +        cpu_physical_memory_write(addr, cmd + 8, 8);
> +    }
> +
> +    if (cmd[0] & 2)
> +        st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_COMWAIT_INTR;
> +}
> +
> +static void amd_iommu_invalidate_iotlb(AMDIOMMUState *st,
> +                                       uint8_t *cmd)
> +{
> +    PCIDevice *dev;
> +    PCIBus *bus = st->dev.bus;
> +    int bus_num = pci_bus_num(bus);
> +    int devfn = *(uint16_t *) cmd;
> +
> +    dev = pci_find_device(bus, bus_num, PCI_SLOT(devfn), PCI_FUNC(devfn));
> +    if (dev) {
> +        pci_memory_invalidate_range(dev, 0, -1);
> +    }
> +}
> +
> +static void amd_iommu_cmdbuf_run(AMDIOMMUState *st)
> +{
> +    uint8_t cmd[16];
> +    int type;
> +
> +    if (!st->cmdbuf_enabled) {
> +        return;
> +    }
> +
> +    /* Check if there's work to do. */
> +    if (st->cmdbuf_head == st->cmdbuf_tail) {
> +        return;
> +    }
> +
> +    cpu_physical_memory_read(st->cmdbuf + st->cmdbuf_head, cmd, 16);
> +    type = cmd[CMDBUF_ID_BYTE] >> CMDBUF_ID_RSHIFT;
> +    switch (type) {
> +        case CMD_COMPLETION_WAIT:
> +            amd_iommu_completion_wait(st, cmd);
> +            break;
> +        case CMD_INVAL_DEVTAB_ENTRY:
> +            break;
> +        case CMD_INVAL_IOMMU_PAGES:
> +            break;
> +        case CMD_INVAL_IOTLB_PAGES:
> +            amd_iommu_invalidate_iotlb(st, cmd);
> +            break;
> +        case CMD_INVAL_INTR_TABLE:
> +            break;
> +        default:
> +            break;
> +    }
> +
> +    /* Increment and wrap head pointer. */
> +    st->cmdbuf_head += CMDBUF_ENTRY_SIZE;
> +    if (st->cmdbuf_head >= st->cmdbuf_len) {
> +        st->cmdbuf_head = 0;
> +    }
> +}
> +
> +static uint32_t amd_iommu_mmio_buf_read(AMDIOMMUState *st,
> +                                        size_t offset,
> +                                        size_t size)
> +{
> +    ssize_t i;
> +    uint32_t ret;
> +
> +    if (!size) {
> +        return 0;
> +    }
> +
> +    ret = st->mmio_buf[offset + size - 1];
> +    for (i = size - 2; i >= 0; i--) {
> +        ret <<= 8;
> +        ret |= st->mmio_buf[offset + i];
> +    }
> +
> +    return ret;
> +}
> +
> +static void amd_iommu_mmio_buf_write(AMDIOMMUState *st,
> +                                     size_t offset,
> +                                     size_t size,
> +                                     uint32_t val)
> +{
> +    size_t i;
> +
> +    for (i = 0; i < size; i++) {
> +        st->mmio_buf[offset + i] = val & 0xFF;
> +        val >>= 8;
> +    }
> +}
> +
> +static void amd_iommu_update_mmio(AMDIOMMUState *st,
> +                                  target_phys_addr_t addr)
> +{
> +    size_t reg = addr & ~0x07;
> +    uint64_t *base = (uint64_t *) &st->mmio_buf[reg];

Since mmio_buf holds the register contents in little-endian byte order,
reading it through a uint64_t pointer will not work on a big-endian host.
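
As a rough sketch of a host-independent alternative, the 64-bit register
could be assembled byte by byte instead of dereferencing a cast pointer.
The helper names load_le64 and mmio_reg_read64 are made up for this example;
inside QEMU, the le64_to_cpu() call the patch already uses elsewhere would
presumably serve the same purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assemble a little-endian 64-bit value byte by byte.  The result is the
 * same on little- and big-endian hosts, unlike *(uint64_t *) buf.
 * load_le64 is a hypothetical helper name, not an existing QEMU function.
 */
static uint64_t load_le64(const unsigned char *buf)
{
    uint64_t val = 0;
    int i;

    for (i = 7; i >= 0; i--) {
        val = (val << 8) | buf[i];
    }
    return val;
}

/* Read the 8-byte register containing addr from a byte-array register file. */
static uint64_t mmio_reg_read64(const unsigned char *mmio_buf,
                                size_t mmio_size, size_t addr)
{
    size_t reg = addr & ~(size_t) 0x07;   /* align down to 8 bytes */

    assert(reg + 8 <= mmio_size);
    return load_le64(&mmio_buf[reg]);
}
```

This mirrors the byte-wise loop the patch already has in
amd_iommu_mmio_buf_read(), just widened to 64 bits.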

> +    uint64_t val = *base;

Don't you need ldq_phys?

> +
> +    switch (reg) {
> +        case MMIO_CONTROL:
> +            st->enabled              = !!(val & MMIO_CONTROL_IOMMUEN);
> +            st->ats_enabled          = !!(val & MMIO_CONTROL_HTTUNEN);
> +            st->evtlog_enabled       = st->enabled &&
> +                                       !!(val & MMIO_CONTROL_EVENTLOGEN);
> +            st->evtlog_intr          = !!(val & MMIO_CONTROL_EVENTINTEN);
> +            st->completion_wait_intr = !!(val & MMIO_CONTROL_COMWAITINTEN);
> +            st->cmdbuf_enabled       = st->enabled &&
> +                                       !!(val & MMIO_CONTROL_CMDBUFEN);
> +
> +            /* Update status flags depending on the control register. */
> +            if (st->cmdbuf_enabled) {
> +                st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_CMDBUF_RUN;
> +            } else {
> +                st->mmio_buf[MMIO_STATUS] &= ~MMIO_STATUS_CMDBUF_RUN;
> +            }
> +            if (st->evtlog_enabled) {
> +                st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_RUN;
> +            } else {
> +                st->mmio_buf[MMIO_STATUS] &= ~MMIO_STATUS_EVTLOG_RUN;
> +            }
> +
> +            amd_iommu_cmdbuf_run(st);
> +            break;
> +        case MMIO_DEVICE_TABLE:
> +            st->devtab = (target_phys_addr_t) (val & MMIO_DEVTAB_BASE_MASK);
> +            st->devtab_len = ((val & MMIO_DEVTAB_SIZE_MASK) + 1) *
> +                             (MMIO_DEVTAB_SIZE_UNIT / MMIO_DEVTAB_ENTRY_SIZE);
> +            break;
> +        case MMIO_COMMAND_BASE:
> +            st->cmdbuf = (target_phys_addr_t) (val & MMIO_CMDBUF_BASE_MASK);
> +            st->cmdbuf_len = 1UL << (st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] &
> +                                     MMIO_CMDBUF_SIZE_MASK);
> +            amd_iommu_cmdbuf_run(st);
> +            break;
> +        case MMIO_COMMAND_HEAD:
> +            st->cmdbuf_head = val & MMIO_CMDBUF_HEAD_MASK;
> +            amd_iommu_cmdbuf_run(st);
> +            break;
> +        case MMIO_COMMAND_TAIL:
> +            st->cmdbuf_tail = val & MMIO_CMDBUF_TAIL_MASK;
> +            amd_iommu_cmdbuf_run(st);
> +            break;
> +        case MMIO_EVENT_BASE:
> +            st->evtlog = (target_phys_addr_t) (val & MMIO_EVTLOG_BASE_MASK);
> +            st->evtlog_len = 1UL << (st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] &
> +                                     MMIO_EVTLOG_SIZE_MASK);
> +            break;
> +        case MMIO_EVENT_HEAD:
> +            st->evtlog_head = val & MMIO_EVTLOG_HEAD_MASK;
> +            break;
> +        case MMIO_EVENT_TAIL:
> +            st->evtlog_tail = val & MMIO_EVTLOG_TAIL_MASK;
> +            break;
> +        case MMIO_EXCL_BASE:
> +            st->excl_base = (target_phys_addr_t) (val & MMIO_EXCL_BASE_MASK);
> +            st->excl_enabled = val & MMIO_EXCL_ENABLED_MASK;
> +            st->excl_allow = val & MMIO_EXCL_ALLOW_MASK;
> +            break;
> +        case MMIO_EXCL_LIMIT:
> +            st->excl_limit = (target_phys_addr_t) ((val & MMIO_EXCL_LIMIT_MASK) |
> +                                                   MMIO_EXCL_LIMIT_LOW);
> +            break;
> +        default:
> +            break;
> +    }
> +}
> +
> +static uint32_t amd_iommu_mmio_readb(void *opaque, target_phys_addr_t addr)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    return amd_iommu_mmio_buf_read(st, addr, 1);
> +}
> +
> +static uint32_t amd_iommu_mmio_readw(void *opaque, target_phys_addr_t addr)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    return amd_iommu_mmio_buf_read(st, addr, 2);
> +}
> +
> +static uint32_t amd_iommu_mmio_readl(void *opaque, target_phys_addr_t addr)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    return amd_iommu_mmio_buf_read(st, addr, 4);
> +}
> +
> +static void amd_iommu_mmio_writeb(void *opaque,
> +                                  target_phys_addr_t addr,
> +                                  uint32_t val)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    amd_iommu_mmio_buf_write(st, addr, 1, val);
> +    amd_iommu_update_mmio(st, addr);
> +}
> +
> +static void amd_iommu_mmio_writew(void *opaque,
> +                                  target_phys_addr_t addr,
> +                                  uint32_t val)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    amd_iommu_mmio_buf_write(st, addr, 2, val);
> +    amd_iommu_update_mmio(st, addr);
> +}
> +
> +static void amd_iommu_mmio_writel(void *opaque,
> +                                  target_phys_addr_t addr,
> +                                  uint32_t val)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    amd_iommu_mmio_buf_write(st, addr, 4, val);
> +    amd_iommu_update_mmio(st, addr);
> +}
> +
> +static CPUReadMemoryFunc * const amd_iommu_mmio_read[] = {
> +    amd_iommu_mmio_readb,
> +    amd_iommu_mmio_readw,
> +    amd_iommu_mmio_readl,
> +};
> +
> +static CPUWriteMemoryFunc * const amd_iommu_mmio_write[] = {
> +    amd_iommu_mmio_writeb,
> +    amd_iommu_mmio_writew,
> +    amd_iommu_mmio_writel,
> +};
> +
> +static void amd_iommu_init_mmio(AMDIOMMUState *st)
> +{
> +    st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] = MMIO_CMDBUF_DEFAULT_SIZE;
> +    st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] = MMIO_EVTLOG_DEFAULT_SIZE;
> +}

Could this be turned into a reset function?
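
A minimal sketch of what such a reset function might look like, assuming a
cut-down state structure.  SketchIOMMUState, sketch_iommu_reset and the
SKETCH_* macros are hypothetical; only the offsets and default sizes are
taken from the defines in this patch.  How the handler gets hooked up (for
instance through a registration call such as qemu_register_reset) is left
out here.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MMIO_SIZE          0x4000
#define SKETCH_CMDBUF_SIZE_BYTE   (0x0008 + 7)   /* MMIO_COMMAND_BASE + 7 */
#define SKETCH_EVTLOG_SIZE_BYTE   (0x0010 + 7)   /* MMIO_EVENT_BASE + 7 */
#define SKETCH_DEFAULT_SIZE       8

/* Cut-down, hypothetical stand-in for the patch's AMDIOMMUState. */
typedef struct SketchIOMMUState {
    unsigned char mmio_buf[SKETCH_MMIO_SIZE];
    int enabled;
    int cmdbuf_enabled;
    int evtlog_enabled;
} SketchIOMMUState;

/*
 * Reset handler shaped like a void (*)(void *opaque) callback: clear the
 * whole register file and the cached flags, then reapply the power-on
 * defaults that amd_iommu_init_mmio() sets in the patch.
 */
static void sketch_iommu_reset(void *opaque)
{
    SketchIOMMUState *st = opaque;

    assert(st != NULL);

    memset(st->mmio_buf, 0, sizeof(st->mmio_buf));
    st->enabled = 0;
    st->cmdbuf_enabled = 0;
    st->evtlog_enabled = 0;

    st->mmio_buf[SKETCH_CMDBUF_SIZE_BYTE] = SKETCH_DEFAULT_SIZE;
    st->mmio_buf[SKETCH_EVTLOG_SIZE_BYTE] = SKETCH_DEFAULT_SIZE;
}
```

Calling the same function from the init path and on system reset would keep
the two sets of defaults from drifting apart.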

> +
> +static void amd_iommu_enable_mmio(AMDIOMMUState *st)
> +{
> +    target_phys_addr_t addr;
> +
> +    st->mmio_index = cpu_register_io_memory(amd_iommu_mmio_read,
> +                                            amd_iommu_mmio_write, st);
> +    if (st->mmio_index < 0) {
> +        return;
> +    }
> +
> +    addr = le64_to_cpu(*(uint64_t *) &st->capab[CAPAB_BAR_LOW]) & CAPAB_BAR_MASK;
> +    cpu_register_physical_memory(addr, MMIO_SIZE, st->mmio_index);
> +
> +    st->mmio_addr = addr;
> +    st->mmio_buf = qemu_mallocz(MMIO_SIZE);
> +    st->mmio_enabled = 1;
> +    amd_iommu_init_mmio(st);
> +}
> +
> +static uint32_t amd_iommu_read_capab(PCIDevice *pci_dev,
> +                                     uint32_t addr, int len)
> +{
> +    return pci_default_cap_read_config(pci_dev, addr, len);
> +}
> +
> +static void amd_iommu_write_capab(PCIDevice *dev,
> +                                  uint32_t addr, uint32_t val, int len)
> +{
> +    AMDIOMMUState *st;
> +    unsigned char *capab;
> +    int reg;
> +
> +    st = DO_UPCAST(AMDIOMMUState, dev, dev);
> +    capab = st->capab;
> +    reg = (addr - 0x40) & ~0x3;  /* Get the 32-bits register. */
> +
> +    switch (reg) {
> +        case CAPAB_HEADER:
> +        case CAPAB_MISC:
> +            /* Read-only. */
> +            return;
> +        case CAPAB_BAR_LOW:
> +        case CAPAB_BAR_HIGH:
> +        case CAPAB_RANGE:
> +            if (st->mmio_enabled)
> +                return;
> +            pci_default_cap_write_config(dev, addr, val, len);
> +            break;
> +        default:
> +            return;
> +    }
> +
> +    if (capab[CAPAB_BAR_LOW] & 0x1) {
> +        amd_iommu_enable_mmio(st);
> +    }
> +}
> +
> +static int amd_iommu_init_capab(PCIDevice *dev)
> +{
> +    AMDIOMMUState *st;
> +    unsigned char *capab;
> +
> +    st = DO_UPCAST(AMDIOMMUState, dev, dev);
> +    capab = st->dev.config + st->capab_offset;
> +
> +    capab[CAPAB_REV_TYPE]  = CAPAB_REV_TYPE;
> +    capab[CAPAB_FLAGS]     = CAPAB_FLAGS;
> +    capab[CAPAB_BAR_LOW]   = 0;
> +    capab[CAPAB_BAR_HIGH]  = 0;
> +    capab[CAPAB_RANGE]     = 0;
> +    *((uint32_t *) &capab[CAPAB_MISC]) = cpu_to_le32(CAPAB_INIT_MISC);
> +
> +    st->capab = capab;
> +    st->dev.cap.length = CAPAB_SIZE;
> +
> +    return 0;
> +}
> +
> +static void amd_iommu_log_event(AMDIOMMUState *st, AMDIOMMUEvent *evt)
> +{
> +    if (!st->evtlog_enabled ||
> +        (st->mmio_buf[MMIO_STATUS] | MMIO_STATUS_EVTLOG_OF)) {
> +        return;
> +    }
> +
> +    if (st->evtlog_tail >= st->evtlog_len) {
> +        st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_OF;
> +    }
> +
> +    cpu_physical_memory_write(st->evtlog + st->evtlog_tail,
> +                              (uint8_t *) evt, EVENT_LEN);
> +
> +    st->evtlog_tail += EVENT_LEN;
> +    st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_INTR;
> +}
> +
> +static void amd_iommu_page_fault(AMDIOMMUState *st,
> +                                 int devfn,
> +                                 unsigned domid,
> +                                 target_phys_addr_t addr,
> +                                 int present,
> +                                 int is_write)
> +{
> +    AMDIOMMUEvent evt;
> +    unsigned info;
> +
> +    evt.devfn = cpu_to_le16(devfn);
> +    evt.reserved = 0;
> +    evt.domid = cpu_to_le16(domid);
> +    evt.addr = cpu_to_le64(addr);
> +
> +    info = EVENT_IOPF;
> +    if (present) {
> +        info |= EVENT_IOPF_PR;
> +    }
> +    if (is_write) {
> +        info |= EVENT_IOPF_RW;
> +    }
> +    evt.info = cpu_to_le16(info);
> +
> +    amd_iommu_log_event(st, &evt);
> +}
> +
> +static inline uint64_t amd_iommu_get_perms(uint64_t entry)
> +{
> +    return (entry & (DEV_PERM_READ | DEV_PERM_WRITE)) >> DEV_PERM_SHIFT;
> +}
> +
> +static int amd_iommu_translate(PCIDevice *iommu,
> +                               PCIDevice *dev,
> +                               pcibus_t addr,
> +                               target_phys_addr_t *paddr,
> +                               target_phys_addr_t *len,
> +                               unsigned perms)
> +{
> +    int devfn, present;
> +    target_phys_addr_t entry_addr, pte_addr;
> +    uint64_t entry[4], pte, page_offset, pte_perms;
> +    unsigned level, domid;
> +    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, iommu);
> +
> +    if (!st->enabled) {
> +        goto no_translation;
> +    }
> +
> +    /* Get device table entry. */
> +    devfn = dev->devfn;
> +    entry_addr = st->devtab + devfn * DEVTAB_ENTRY_SIZE;
> +    cpu_physical_memory_read(entry_addr, (uint8_t *) entry, 32);
> +
> +    pte = entry[0];
> +    if (!(pte & DEV_VALID) || !(pte & DEV_TRANSLATION_VALID)) {
> +        goto no_translation;
> +    }
> +    domid = entry[1] & DEV_DOMAIN_ID_MASK;
> +    level = (pte >> DEV_MODE_RSHIFT) & DEV_MODE_MASK;
> +    while (level > 0) {
> +        /*
> +         * Check permissions: the bitwise
> +         * implication perms -> entry_perms must be true.
> +         */
> +        pte_perms = amd_iommu_get_perms(pte);
> +        present = pte & 1;
> +        if (!present || perms != (perms & pte_perms)) {
> +            amd_iommu_page_fault(st, devfn, domid, addr,
> +                                 present, !!(perms & IOMMU_PERM_WRITE));
> +            return -EPERM;
> +        }
> +
> +        /* Go to the next lower level. */
> +        pte_addr = pte & DEV_PT_ROOT_MASK;
> +        pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;
> +        pte = ldq_phys(pte_addr);
> +        level = (pte >> DEV_MODE_RSHIFT) & DEV_MODE_MASK;
> +    }
> +    page_offset = addr & 4095;
> +    *paddr = (pte & DEV_PT_ROOT_MASK) + page_offset;
> +    *len = 4096 - page_offset;
> +
> +    return 0;
> +
> +no_translation:
> +    *paddr = addr;
> +    *len = -1;
> +    return 0;
> +}
> +
> +static int amd_iommu_pci_initfn(PCIDevice *dev)
> +{
> +    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, dev);
> +    int err;
> +
> +    pci_config_set_vendor_id(st->dev.config, PCI_VENDOR_ID_AMD);
> +    pci_config_set_device_id(st->dev.config, PCI_DEVICE_ID_AMD_IOMMU);
> +    pci_config_set_class(st->dev.config, PCI_CLASS_SYSTEM_IOMMU);
> +
> +    st->capab_offset = pci_add_capability(&st->dev,
> +                                          PCI_CAP_ID_SEC,
> +                                          CAPAB_SIZE);
> +    err = pci_enable_capability_support(&st->dev, st->capab_offset,
> +                                        amd_iommu_read_capab,
> +                                        amd_iommu_write_capab,
> +                                        amd_iommu_init_capab);
> +    if (err) {
> +        return err;
> +    }
> +
> +    pci_register_iommu(dev, amd_iommu_translate);
> +
> +    return 0;
> +}
> +
> +static const VMStateDescription vmstate_amd_iommu = {
> +    .name                       = "amd-iommu",
> +    .version_id                 = 1,
> +    .minimum_version_id         = 1,
> +    .minimum_version_id_old     = 1,
> +    .fields                     = (VMStateField []) {
> +        VMSTATE_PCI_DEVICE(dev, AMDIOMMUState),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static PCIDeviceInfo amd_iommu_pci_info = {
> +    .qdev.name    = "amd-iommu",
> +    .qdev.desc    = "AMD IOMMU",
> +    .qdev.size    = sizeof(AMDIOMMUState),
> +    .qdev.vmsd    = &vmstate_amd_iommu,
> +    .init         = amd_iommu_pci_initfn,
> +};
> +
> +static void amd_iommu_register(void)
> +{
> +    pci_qdev_register(&amd_iommu_pci_info);
> +}
> +
> +device_init(amd_iommu_register);
> +
> diff --git a/hw/pc.c b/hw/pc.c
> index 186e322..4616318 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -1066,6 +1066,8 @@ void pc_pci_device_init(PCIBus *pci_bus)
>     int max_bus;
>     int bus;
>
> +    pci_create_simple(bus, -1, "amd-iommu");
> +
>     max_bus = drive_get_max_bus(IF_SCSI);
>     for (bus = 0; bus <= max_bus; bus++) {
>         pci_create_simple(pci_bus, -1, "lsi53c895a");
> diff --git a/hw/pci_ids.h b/hw/pci_ids.h
> index 39e9f1d..d790312 100644
> --- a/hw/pci_ids.h
> +++ b/hw/pci_ids.h
> @@ -26,6 +26,7 @@
>
>  #define PCI_CLASS_MEMORY_RAM             0x0500
>
> +#define PCI_CLASS_SYSTEM_IOMMU           0x0806
>  #define PCI_CLASS_SYSTEM_OTHER           0x0880
>
>  #define PCI_CLASS_SERIAL_USB             0x0c03
> @@ -56,6 +57,7 @@
>
>  #define PCI_VENDOR_ID_AMD                0x1022
>  #define PCI_DEVICE_ID_AMD_LANCE          0x2000
> +#define PCI_DEVICE_ID_AMD_IOMMU          0x0000     /* FIXME */
>
>  #define PCI_VENDOR_ID_MOTOROLA           0x1057
>  #define PCI_DEVICE_ID_MOTOROLA_MPC106    0x0002
> diff --git a/hw/pci_regs.h b/hw/pci_regs.h
> index 1c675dc..0fb942b 100644
> --- a/hw/pci_regs.h
> +++ b/hw/pci_regs.h
> @@ -216,6 +216,7 @@
>  #define  PCI_CAP_ID_SHPC       0x0C    /* PCI Standard Hot-Plug Controller */
>  #define  PCI_CAP_ID_SSVID      0x0D    /* Bridge subsystem vendor/device ID */
>  #define  PCI_CAP_ID_AGP3       0x0E    /* AGP Target PCI-PCI bridge */
> +#define  PCI_CAP_ID_SEC                0x0F    /* Secure Device (AMD IOMMU) */
>  #define  PCI_CAP_ID_EXP        0x10    /* PCI Express */
>  #define  PCI_CAP_ID_MSIX       0x11    /* MSI-X */
>  #define  PCI_CAP_ID_AF         0x13    /* PCI Advanced Features */
> --
> 1.7.1
>
>


* [Qemu-devel] Re: [PATCH 3/7] AMD IOMMU emulation
@ 2010-08-16 17:57     ` Blue Swirl
  0 siblings, 0 replies; 42+ messages in thread
From: Blue Swirl @ 2010-08-16 17:57 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu; +Cc: kvm, joro, qemu-devel, paul, avi

On Sun, Aug 15, 2010 at 7:27 PM, Eduard - Gabriel Munteanu
<eduard.munteanu@linux360.ro> wrote:
> This introduces emulation for the AMD IOMMU, described in "AMD I/O
> Virtualization Technology (IOMMU) Specification".
>
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
>  Makefile.target |    2 +
>  hw/amd_iommu.c  |  688 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/pc.c         |    2 +
>  hw/pci_ids.h    |    2 +
>  hw/pci_regs.h   |    1 +
>  5 files changed, 695 insertions(+), 0 deletions(-)
>  create mode 100644 hw/amd_iommu.c
>
> diff --git a/Makefile.target b/Makefile.target
> index 70a9c1b..6b80a37 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -219,6 +219,8 @@ obj-i386-y += pcspk.o i8254.o
>  obj-i386-$(CONFIG_KVM_PIT) += i8254-kvm.o
>  obj-i386-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += device-assignment.o
>
> +obj-i386-y += amd_iommu.o
> +
>  # Hardware support
>  obj-ia64-y += ide.o pckbd.o vga.o $(SOUND_HW) dma.o $(AUDIODRV)
>  obj-ia64-y += fdc.o mc146818rtc.o serial.o i8259.o ipf.o
> diff --git a/hw/amd_iommu.c b/hw/amd_iommu.c
> new file mode 100644
> index 0000000..2e20888
> --- /dev/null
> +++ b/hw/amd_iommu.c
> @@ -0,0 +1,688 @@
> +/*
> + * AMD IOMMU emulation
> + *
> + * Copyright (c) 2010 Eduard - Gabriel Munteanu
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "pc.h"
> +#include "hw.h"
> +#include "pci.h"
> +#include "qlist.h"
> +
> +/* Capability registers */
> +#define CAPAB_HEADER            0x00
> +#define   CAPAB_REV_TYPE        0x02
> +#define   CAPAB_FLAGS           0x03
> +#define CAPAB_BAR_LOW           0x04
> +#define CAPAB_BAR_HIGH          0x08
> +#define CAPAB_RANGE             0x0C
> +#define CAPAB_MISC              0x10
> +
> +#define CAPAB_SIZE              0x14
> +
> +/* Capability header data */
> +#define CAPAB_FLAG_IOTLBSUP     (1 << 0)
> +#define CAPAB_FLAG_HTTUNNEL     (1 << 1)
> +#define CAPAB_FLAG_NPCACHE      (1 << 2)
> +#define CAPAB_INIT_REV          (1 << 3)
> +#define CAPAB_INIT_TYPE         3
> +#define CAPAB_INIT_REV_TYPE     (CAPAB_INIT_REV | CAPAB_INIT_TYPE)
> +#define CAPAB_INIT_FLAGS        (CAPAB_FLAG_NPCACHE | CAPAB_FLAG_HTTUNNEL)
> +#define CAPAB_INIT_MISC         ((64 << 15) | (48 << 8))
> +#define CAPAB_BAR_MASK          ~((1UL << 14) - 1)
> +
> +/* MMIO registers */
> +#define MMIO_DEVICE_TABLE       0x0000
> +#define MMIO_COMMAND_BASE       0x0008
> +#define MMIO_EVENT_BASE         0x0010
> +#define MMIO_CONTROL            0x0018
> +#define MMIO_EXCL_BASE          0x0020
> +#define MMIO_EXCL_LIMIT         0x0028
> +#define MMIO_COMMAND_HEAD       0x2000
> +#define MMIO_COMMAND_TAIL       0x2008
> +#define MMIO_EVENT_HEAD         0x2010
> +#define MMIO_EVENT_TAIL         0x2018
> +#define MMIO_STATUS             0x2020
> +
> +#define MMIO_SIZE               0x4000
> +
> +#define MMIO_DEVTAB_SIZE_MASK   ((1ULL << 12) - 1)
> +#define MMIO_DEVTAB_BASE_MASK   (((1ULL << 52) - 1) & ~MMIO_DEVTAB_SIZE_MASK)
> +#define MMIO_DEVTAB_ENTRY_SIZE  32
> +#define MMIO_DEVTAB_SIZE_UNIT   4096
> +
> +#define MMIO_CMDBUF_SIZE_BYTE       (MMIO_COMMAND_BASE + 7)
> +#define MMIO_CMDBUF_SIZE_MASK       0x0F
> +#define MMIO_CMDBUF_BASE_MASK       MMIO_DEVTAB_BASE_MASK
> +#define MMIO_CMDBUF_DEFAULT_SIZE    8
> +#define MMIO_CMDBUF_HEAD_MASK       (((1ULL << 19) - 1) & ~0x0F)
> +#define MMIO_CMDBUF_TAIL_MASK       MMIO_EVTLOG_HEAD_MASK
> +
> +#define MMIO_EVTLOG_SIZE_BYTE       (MMIO_EVENT_BASE + 7)
> +#define MMIO_EVTLOG_SIZE_MASK       MMIO_CMDBUF_SIZE_MASK
> +#define MMIO_EVTLOG_BASE_MASK       MMIO_CMDBUF_BASE_MASK
> +#define MMIO_EVTLOG_DEFAULT_SIZE    MMIO_CMDBUF_DEFAULT_SIZE
> +#define MMIO_EVTLOG_HEAD_MASK       (((1ULL << 19) - 1) & ~0x0F)
> +#define MMIO_EVTLOG_TAIL_MASK       MMIO_EVTLOG_HEAD_MASK
> +
> +#define MMIO_EXCL_BASE_MASK         MMIO_DEVTAB_BASE_MASK
> +#define MMIO_EXCL_ENABLED_MASK      (1ULL << 0)
> +#define MMIO_EXCL_ALLOW_MASK        (1ULL << 1)
> +#define MMIO_EXCL_LIMIT_MASK        MMIO_DEVTAB_BASE_MASK
> +#define MMIO_EXCL_LIMIT_LOW         0xFFF
> +
> +#define MMIO_CONTROL_IOMMUEN        (1ULL << 0)
> +#define MMIO_CONTROL_HTTUNEN        (1ULL << 1)
> +#define MMIO_CONTROL_EVENTLOGEN     (1ULL << 2)
> +#define MMIO_CONTROL_EVENTINTEN     (1ULL << 3)
> +#define MMIO_CONTROL_COMWAITINTEN   (1ULL << 4)
> +#define MMIO_CONTROL_CMDBUFEN       (1ULL << 12)
> +
> +#define MMIO_STATUS_EVTLOG_OF       (1ULL << 0)
> +#define MMIO_STATUS_EVTLOG_INTR     (1ULL << 1)
> +#define MMIO_STATUS_COMWAIT_INTR    (1ULL << 2)
> +#define MMIO_STATUS_EVTLOG_RUN      (1ULL << 3)
> +#define MMIO_STATUS_CMDBUF_RUN      (1ULL << 4)
> +
> +#define CMDBUF_ID_BYTE              0x07
> +#define CMDBUF_ID_RSHIFT            4
> +#define CMDBUF_ENTRY_SIZE           0x10
> +
> +#define CMD_COMPLETION_WAIT         0x01
> +#define CMD_INVAL_DEVTAB_ENTRY      0x02
> +#define CMD_INVAL_IOMMU_PAGES       0x03
> +#define CMD_INVAL_IOTLB_PAGES       0x04
> +#define CMD_INVAL_INTR_TABLE        0x05
> +
> +#define DEVTAB_ENTRY_SIZE           32
> +
> +/* Device table entry bits 0:63 */
> +#define DEV_VALID                   (1ULL << 0)
> +#define DEV_TRANSLATION_VALID       (1ULL << 1)
> +#define DEV_MODE_MASK               0x7
> +#define DEV_MODE_RSHIFT             9
> +#define DEV_PT_ROOT_MASK            0xFFFFFFFFFF000
> +#define DEV_PT_ROOT_RSHIFT          12
> +#define DEV_PERM_SHIFT              61
> +#define DEV_PERM_READ               (1ULL << 61)
> +#define DEV_PERM_WRITE              (1ULL << 62)
> +
> +/* Device table entry bits 64:127 */
> +#define DEV_DOMAIN_ID_MASK          ((1ULL << 16) - 1)
> +#define DEV_IOTLB_SUPPORT           (1ULL << 17)
> +#define DEV_SUPPRESS_PF             (1ULL << 18)
> +#define DEV_SUPPRESS_ALL_PF         (1ULL << 19)
> +#define DEV_IOCTL_MASK              ~3
> +#define DEV_IOCTL_RSHIFT            20
> +#define   DEV_IOCTL_DENY            0
> +#define   DEV_IOCTL_PASSTHROUGH     1
> +#define   DEV_IOCTL_TRANSLATE       2
> +#define DEV_CACHE                   (1ULL << 37)
> +#define DEV_SNOOP_DISABLE           (1ULL << 38)
> +#define DEV_EXCL                    (1ULL << 39)
> +
> +/* Event codes and flags, as stored in the info field */
> +#define EVENT_ILLEGAL_DEVTAB_ENTRY  (0x1U << 24)
> +#define EVENT_IOPF                  (0x2U << 24)
> +#define   EVENT_IOPF_I              (1U << 3)
> +#define   EVENT_IOPF_PR             (1U << 4)
> +#define   EVENT_IOPF_RW             (1U << 5)
> +#define   EVENT_IOPF_PE             (1U << 6)
> +#define   EVENT_IOPF_RZ             (1U << 7)
> +#define   EVENT_IOPF_TR             (1U << 8)
> +#define EVENT_DEV_TAB_HW_ERROR      (0x3U << 24)
> +#define EVENT_PAGE_TAB_HW_ERROR     (0x4U << 24)
> +#define EVENT_ILLEGAL_COMMAND_ERROR (0x5U << 24)
> +#define EVENT_COMMAND_HW_ERROR      (0x6U << 24)
> +#define EVENT_IOTLB_INV_TIMEOUT     (0x7U << 24)
> +#define EVENT_INVALID_DEV_REQUEST   (0x8U << 24)
> +
> +#define EVENT_LEN                   16
> +
> +typedef struct AMDIOMMUState {
> +    PCIDevice                   dev;
> +
> +    int                         capab_offset;
> +    unsigned char               *capab;
> +
> +    int                         mmio_index;
> +    target_phys_addr_t          mmio_addr;
> +    unsigned char               *mmio_buf;
> +    int                         mmio_enabled;
> +
> +    int                         enabled;
> +    int                         ats_enabled;
> +
> +    target_phys_addr_t          devtab;
> +    size_t                      devtab_len;
> +
> +    target_phys_addr_t          cmdbuf;
> +    int                         cmdbuf_enabled;
> +    size_t                      cmdbuf_len;
> +    size_t                      cmdbuf_head;
> +    size_t                      cmdbuf_tail;
> +    int                         completion_wait_intr;
> +
> +    target_phys_addr_t          evtlog;
> +    int                         evtlog_enabled;
> +    int                         evtlog_intr;
> +    target_phys_addr_t          evtlog_len;
> +    target_phys_addr_t          evtlog_head;
> +    target_phys_addr_t          evtlog_tail;
> +
> +    target_phys_addr_t          excl_base;
> +    target_phys_addr_t          excl_limit;
> +    int                         excl_enabled;
> +    int                         excl_allow;
> +} AMDIOMMUState;
> +
> +typedef struct AMDIOMMUEvent {
> +    uint16_t    devfn;
> +    uint16_t    reserved;
> +    uint16_t    domid;
> +    uint16_t    info;
> +    uint64_t    addr;
> +} __attribute__((packed)) AMDIOMMUEvent;
> +
> +static void amd_iommu_completion_wait(AMDIOMMUState *st,
> +                                      uint8_t *cmd)
> +{
> +    uint64_t addr;
> +
> +    if (cmd[0] & 1) {
> +        addr = le64_to_cpu(*(uint64_t *) cmd) & 0xFFFFFFFFFFFF8;
> +        cpu_physical_memory_write(addr, cmd + 8, 8);
> +    }
> +
> +    if (cmd[0] & 2)
> +        st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_COMWAIT_INTR;
> +}
> +
> +static void amd_iommu_invalidate_iotlb(AMDIOMMUState *st,
> +                                       uint8_t *cmd)
> +{
> +    PCIDevice *dev;
> +    PCIBus *bus = st->dev.bus;
> +    int bus_num = pci_bus_num(bus);
> +    int devfn = *(uint16_t *) cmd;
> +
> +    dev = pci_find_device(bus, bus_num, PCI_SLOT(devfn), PCI_FUNC(devfn));
> +    if (dev) {
> +        pci_memory_invalidate_range(dev, 0, -1);
> +    }
> +}
> +
> +static void amd_iommu_cmdbuf_run(AMDIOMMUState *st)
> +{
> +    uint8_t cmd[16];
> +    int type;
> +
> +    if (!st->cmdbuf_enabled) {
> +        return;
> +    }
> +
> +    /* Check if there's work to do. */
> +    if (st->cmdbuf_head == st->cmdbuf_tail) {
> +        return;
> +    }
> +
> +    cpu_physical_memory_read(st->cmdbuf + st->cmdbuf_head, cmd, 16);
> +    type = cmd[CMDBUF_ID_BYTE] >> CMDBUF_ID_RSHIFT;
> +    switch (type) {
> +        case CMD_COMPLETION_WAIT:
> +            amd_iommu_completion_wait(st, cmd);
> +            break;
> +        case CMD_INVAL_DEVTAB_ENTRY:
> +            break;
> +        case CMD_INVAL_IOMMU_PAGES:
> +            break;
> +        case CMD_INVAL_IOTLB_PAGES:
> +            amd_iommu_invalidate_iotlb(st, cmd);
> +            break;
> +        case CMD_INVAL_INTR_TABLE:
> +            break;
> +        default:
> +            break;
> +    }
> +
> +    /* Increment and wrap head pointer. */
> +    st->cmdbuf_head += CMDBUF_ENTRY_SIZE;
> +    if (st->cmdbuf_head >= st->cmdbuf_len) {
> +        st->cmdbuf_head = 0;
> +    }
> +}
> +
> +static uint32_t amd_iommu_mmio_buf_read(AMDIOMMUState *st,
> +                                        size_t offset,
> +                                        size_t size)
> +{
> +    ssize_t i;
> +    uint32_t ret;
> +
> +    if (!size) {
> +        return 0;
> +    }
> +
> +    ret = st->mmio_buf[offset + size - 1];
> +    for (i = size - 2; i >= 0; i--) {
> +        ret <<= 8;
> +        ret |= st->mmio_buf[offset + i];
> +    }
> +
> +    return ret;
> +}
> +
> +static void amd_iommu_mmio_buf_write(AMDIOMMUState *st,
> +                                     size_t offset,
> +                                     size_t size,
> +                                     uint32_t val)
> +{
> +    size_t i;
> +
> +    for (i = 0; i < size; i++) {
> +        st->mmio_buf[offset + i] = val & 0xFF;
> +        val >>= 8;
> +    }
> +}
> +
> +static void amd_iommu_update_mmio(AMDIOMMUState *st,
> +                                  target_phys_addr_t addr)
> +{
> +    size_t reg = addr & ~0x07;
> +    uint64_t *base = (uint64_t *) &st->mmio_buf[reg];

Since mmio_buf holds the registers in little-endian byte order,
dereferencing it as a native uint64_t will not work on a big-endian host.

> +    uint64_t val = *base;

Don't you need ldq_phys?

> +
> +    switch (reg) {
> +        case MMIO_CONTROL:
> +            st->enabled              = !!(val & MMIO_CONTROL_IOMMUEN);
> +            st->ats_enabled          = !!(val & MMIO_CONTROL_HTTUNEN);
> +            st->evtlog_enabled       = st->enabled &&
> +                                       !!(val & MMIO_CONTROL_EVENTLOGEN);
> +            st->evtlog_intr          = !!(val & MMIO_CONTROL_EVENTINTEN);
> +            st->completion_wait_intr = !!(val & MMIO_CONTROL_COMWAITINTEN);
> +            st->cmdbuf_enabled       = st->enabled &&
> +                                       !!(val & MMIO_CONTROL_CMDBUFEN);
> +
> +            /* Update status flags depending on the control register. */
> +            if (st->cmdbuf_enabled) {
> +                st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_CMDBUF_RUN;
> +            } else {
> +                st->mmio_buf[MMIO_STATUS] &= ~MMIO_STATUS_CMDBUF_RUN;
> +            }
> +            if (st->evtlog_enabled) {
> +                st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_RUN;
> +            } else {
> +                st->mmio_buf[MMIO_STATUS] &= ~MMIO_STATUS_EVTLOG_RUN;
> +            }
> +
> +            amd_iommu_cmdbuf_run(st);
> +            break;
> +        case MMIO_DEVICE_TABLE:
> +            st->devtab = (target_phys_addr_t) (val & MMIO_DEVTAB_BASE_MASK);
> +            st->devtab_len = ((val & MMIO_DEVTAB_SIZE_MASK) + 1) *
> +                             (MMIO_DEVTAB_SIZE_UNIT / MMIO_DEVTAB_ENTRY_SIZE);
> +            break;
> +        case MMIO_COMMAND_BASE:
> +            st->cmdbuf = (target_phys_addr_t) (val & MMIO_CMDBUF_BASE_MASK);
> +            st->cmdbuf_len = 1UL << (st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] &
> +                                     MMIO_CMDBUF_SIZE_MASK);
> +            amd_iommu_cmdbuf_run(st);
> +            break;
> +        case MMIO_COMMAND_HEAD:
> +            st->cmdbuf_head = val & MMIO_CMDBUF_HEAD_MASK;
> +            amd_iommu_cmdbuf_run(st);
> +            break;
> +        case MMIO_COMMAND_TAIL:
> +            st->cmdbuf_tail = val & MMIO_CMDBUF_TAIL_MASK;
> +            amd_iommu_cmdbuf_run(st);
> +            break;
> +        case MMIO_EVENT_BASE:
> +            st->evtlog = (target_phys_addr_t) (val & MMIO_EVTLOG_BASE_MASK);
> +            st->evtlog_len = 1UL << (st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] &
> +                                     MMIO_EVTLOG_SIZE_MASK);
> +            break;
> +        case MMIO_EVENT_HEAD:
> +            st->evtlog_head = val & MMIO_EVTLOG_HEAD_MASK;
> +            break;
> +        case MMIO_EVENT_TAIL:
> +            st->evtlog_tail = val & MMIO_EVTLOG_TAIL_MASK;
> +            break;
> +        case MMIO_EXCL_BASE:
> +            st->excl_base = (target_phys_addr_t) (val & MMIO_EXCL_BASE_MASK);
> +            st->excl_enabled = val & MMIO_EXCL_ENABLED_MASK;
> +            st->excl_allow = val & MMIO_EXCL_ALLOW_MASK;
> +            break;
> +        case MMIO_EXCL_LIMIT:
> +            st->excl_limit = (target_phys_addr_t) ((val & MMIO_EXCL_LIMIT_MASK) |
> +                                                   MMIO_EXCL_LIMIT_LOW);
> +            break;
> +        default:
> +            break;
> +    }
> +}
> +
> +static uint32_t amd_iommu_mmio_readb(void *opaque, target_phys_addr_t addr)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    return amd_iommu_mmio_buf_read(st, addr, 1);
> +}
> +
> +static uint32_t amd_iommu_mmio_readw(void *opaque, target_phys_addr_t addr)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    return amd_iommu_mmio_buf_read(st, addr, 2);
> +}
> +
> +static uint32_t amd_iommu_mmio_readl(void *opaque, target_phys_addr_t addr)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    return amd_iommu_mmio_buf_read(st, addr, 4);
> +}
> +
> +static void amd_iommu_mmio_writeb(void *opaque,
> +                                  target_phys_addr_t addr,
> +                                  uint32_t val)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    amd_iommu_mmio_buf_write(st, addr, 1, val);
> +    amd_iommu_update_mmio(st, addr);
> +}
> +
> +static void amd_iommu_mmio_writew(void *opaque,
> +                                  target_phys_addr_t addr,
> +                                  uint32_t val)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    amd_iommu_mmio_buf_write(st, addr, 2, val);
> +    amd_iommu_update_mmio(st, addr);
> +}
> +
> +static void amd_iommu_mmio_writel(void *opaque,
> +                                  target_phys_addr_t addr,
> +                                  uint32_t val)
> +{
> +    AMDIOMMUState *st = opaque;
> +
> +    amd_iommu_mmio_buf_write(st, addr, 4, val);
> +    amd_iommu_update_mmio(st, addr);
> +}
> +
> +static CPUReadMemoryFunc * const amd_iommu_mmio_read[] = {
> +    amd_iommu_mmio_readb,
> +    amd_iommu_mmio_readw,
> +    amd_iommu_mmio_readl,
> +};
> +
> +static CPUWriteMemoryFunc * const amd_iommu_mmio_write[] = {
> +    amd_iommu_mmio_writeb,
> +    amd_iommu_mmio_writew,
> +    amd_iommu_mmio_writel,
> +};
> +
> +static void amd_iommu_init_mmio(AMDIOMMUState *st)
> +{
> +    st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] = MMIO_CMDBUF_DEFAULT_SIZE;
> +    st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] = MMIO_EVTLOG_DEFAULT_SIZE;
> +}

Could this be turned into a reset function?

> +
> +static void amd_iommu_enable_mmio(AMDIOMMUState *st)
> +{
> +    target_phys_addr_t addr;
> +
> +    st->mmio_index = cpu_register_io_memory(amd_iommu_mmio_read,
> +                                            amd_iommu_mmio_write, st);
> +    if (st->mmio_index < 0) {
> +        return;
> +    }
> +
> +    addr = le64_to_cpu(*(uint64_t *) &st->capab[CAPAB_BAR_LOW]) & CAPAB_BAR_MASK;
> +    cpu_register_physical_memory(addr, MMIO_SIZE, st->mmio_index);
> +
> +    st->mmio_addr = addr;
> +    st->mmio_buf = qemu_mallocz(MMIO_SIZE);
> +    st->mmio_enabled = 1;
> +    amd_iommu_init_mmio(st);
> +}
> +
> +static uint32_t amd_iommu_read_capab(PCIDevice *pci_dev,
> +                                     uint32_t addr, int len)
> +{
> +    return pci_default_cap_read_config(pci_dev, addr, len);
> +}
> +
> +static void amd_iommu_write_capab(PCIDevice *dev,
> +                                  uint32_t addr, uint32_t val, int len)
> +{
> +    AMDIOMMUState *st;
> +    unsigned char *capab;
> +    int reg;
> +
> +    st = DO_UPCAST(AMDIOMMUState, dev, dev);
> +    capab = st->capab;
> +    reg = (addr - 0x40) & ~0x3;  /* Get the 32-bits register. */
> +
> +    switch (reg) {
> +        case CAPAB_HEADER:
> +        case CAPAB_MISC:
> +            /* Read-only. */
> +            return;
> +        case CAPAB_BAR_LOW:
> +        case CAPAB_BAR_HIGH:
> +        case CAPAB_RANGE:
> +            if (st->mmio_enabled)
> +                return;
> +            pci_default_cap_write_config(dev, addr, val, len);
> +            break;
> +        default:
> +            return;
> +    }
> +
> +    if (capab[CAPAB_BAR_LOW] & 0x1) {
> +        amd_iommu_enable_mmio(st);
> +    }
> +}
> +
> +static int amd_iommu_init_capab(PCIDevice *dev)
> +{
> +    AMDIOMMUState *st;
> +    unsigned char *capab;
> +
> +    st = DO_UPCAST(AMDIOMMUState, dev, dev);
> +    capab = st->dev.config + st->capab_offset;
> +
> +    capab[CAPAB_REV_TYPE]  = CAPAB_INIT_REV_TYPE;
> +    capab[CAPAB_FLAGS]     = CAPAB_INIT_FLAGS;
> +    capab[CAPAB_BAR_LOW]   = 0;
> +    capab[CAPAB_BAR_HIGH]  = 0;
> +    capab[CAPAB_RANGE]     = 0;
> +    *((uint32_t *) &capab[CAPAB_MISC]) = cpu_to_le32(CAPAB_INIT_MISC);
> +
> +    st->capab = capab;
> +    st->dev.cap.length = CAPAB_SIZE;
> +
> +    return 0;
> +}
> +
> +static void amd_iommu_log_event(AMDIOMMUState *st, AMDIOMMUEvent *evt)
> +{
> +    if (!st->evtlog_enabled ||
> +        (st->mmio_buf[MMIO_STATUS] & MMIO_STATUS_EVTLOG_OF)) {
> +        return;
> +    }
> +
> +    if (st->evtlog_tail >= st->evtlog_len) {
> +        st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_OF;
> +    }
> +
> +    cpu_physical_memory_write(st->evtlog + st->evtlog_tail,
> +                              (uint8_t *) evt, EVENT_LEN);
> +
> +    st->evtlog_tail += EVENT_LEN;
> +    st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_INTR;
> +}
> +
> +static void amd_iommu_page_fault(AMDIOMMUState *st,
> +                                 int devfn,
> +                                 unsigned domid,
> +                                 target_phys_addr_t addr,
> +                                 int present,
> +                                 int is_write)
> +{
> +    AMDIOMMUEvent evt;
> +    unsigned info;
> +
> +    evt.devfn = cpu_to_le16(devfn);
> +    evt.reserved = 0;
> +    evt.domid = cpu_to_le16(domid);
> +    evt.addr = cpu_to_le64(addr);
> +
> +    info = EVENT_IOPF;
> +    if (present) {
> +        info |= EVENT_IOPF_PR;
> +    }
> +    if (is_write) {
> +        info |= EVENT_IOPF_RW;
> +    }
> +    evt.info = cpu_to_le16(info);
> +
> +    amd_iommu_log_event(st, &evt);
> +}
> +
> +static inline uint64_t amd_iommu_get_perms(uint64_t entry)
> +{
> +    return (entry & (DEV_PERM_READ | DEV_PERM_WRITE)) >> DEV_PERM_SHIFT;
> +}
> +
> +static int amd_iommu_translate(PCIDevice *iommu,
> +                               PCIDevice *dev,
> +                               pcibus_t addr,
> +                               target_phys_addr_t *paddr,
> +                               target_phys_addr_t *len,
> +                               unsigned perms)
> +{
> +    int devfn, present;
> +    target_phys_addr_t entry_addr, pte_addr;
> +    uint64_t entry[4], pte, page_offset, pte_perms;
> +    unsigned level, domid;
> +    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, iommu);
> +
> +    if (!st->enabled) {
> +        goto no_translation;
> +    }
> +
> +    /* Get device table entry. */
> +    devfn = dev->devfn;
> +    entry_addr = st->devtab + devfn * DEVTAB_ENTRY_SIZE;
> +    cpu_physical_memory_read(entry_addr, (uint8_t *) entry, 32);
> +
> +    pte = entry[0];
> +    if (!(pte & DEV_VALID) || !(pte & DEV_TRANSLATION_VALID)) {
> +        goto no_translation;
> +    }
> +    domid = entry[1] & DEV_DOMAIN_ID_MASK;
> +    level = (pte >> DEV_MODE_RSHIFT) & DEV_MODE_MASK;
> +    while (level > 0) {
> +        /*
> +         * Check permissions: the bitwise
> +         * implication perms -> entry_perms must be true.
> +         */
> +        pte_perms = amd_iommu_get_perms(pte);
> +        present = pte & 1;
> +        if (!present || perms != (perms & pte_perms)) {
> +            amd_iommu_page_fault(st, devfn, domid, addr,
> +                                 present, !!(perms & IOMMU_PERM_WRITE));
> +            return -EPERM;
> +        }
> +
> +        /* Go to the next lower level. */
> +        pte_addr = pte & DEV_PT_ROOT_MASK;
> +        pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;
> +        pte = ldq_phys(pte_addr);
> +        level = (pte >> DEV_MODE_RSHIFT) & DEV_MODE_MASK;
> +    }
> +    page_offset = addr & 4095;
> +    *paddr = (pte & DEV_PT_ROOT_MASK) + page_offset;
> +    *len = 4096 - page_offset;
> +
> +    return 0;
> +
> +no_translation:
> +    *paddr = addr;
> +    *len = -1;
> +    return 0;
> +}
> +
> +static int amd_iommu_pci_initfn(PCIDevice *dev)
> +{
> +    AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, dev);
> +    int err;
> +
> +    pci_config_set_vendor_id(st->dev.config, PCI_VENDOR_ID_AMD);
> +    pci_config_set_device_id(st->dev.config, PCI_DEVICE_ID_AMD_IOMMU);
> +    pci_config_set_class(st->dev.config, PCI_CLASS_SYSTEM_IOMMU);
> +
> +    st->capab_offset = pci_add_capability(&st->dev,
> +                                          PCI_CAP_ID_SEC,
> +                                          CAPAB_SIZE);
> +    err = pci_enable_capability_support(&st->dev, st->capab_offset,
> +                                        amd_iommu_read_capab,
> +                                        amd_iommu_write_capab,
> +                                        amd_iommu_init_capab);
> +    if (err) {
> +        return err;
> +    }
> +
> +    pci_register_iommu(dev, amd_iommu_translate);
> +
> +    return 0;
> +}
> +
> +static const VMStateDescription vmstate_amd_iommu = {
> +    .name                       = "amd-iommu",
> +    .version_id                 = 1,
> +    .minimum_version_id         = 1,
> +    .minimum_version_id_old     = 1,
> +    .fields                     = (VMStateField []) {
> +        VMSTATE_PCI_DEVICE(dev, AMDIOMMUState),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static PCIDeviceInfo amd_iommu_pci_info = {
> +    .qdev.name    = "amd-iommu",
> +    .qdev.desc    = "AMD IOMMU",
> +    .qdev.size    = sizeof(AMDIOMMUState),
> +    .qdev.vmsd    = &vmstate_amd_iommu,
> +    .init         = amd_iommu_pci_initfn,
> +};
> +
> +static void amd_iommu_register(void)
> +{
> +    pci_qdev_register(&amd_iommu_pci_info);
> +}
> +
> +device_init(amd_iommu_register);
> +
> diff --git a/hw/pc.c b/hw/pc.c
> index 186e322..4616318 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -1066,6 +1066,8 @@ void pc_pci_device_init(PCIBus *pci_bus)
>     int max_bus;
>     int bus;
>
> +    pci_create_simple(pci_bus, -1, "amd-iommu");
> +
>     max_bus = drive_get_max_bus(IF_SCSI);
>     for (bus = 0; bus <= max_bus; bus++) {
>         pci_create_simple(pci_bus, -1, "lsi53c895a");
> diff --git a/hw/pci_ids.h b/hw/pci_ids.h
> index 39e9f1d..d790312 100644
> --- a/hw/pci_ids.h
> +++ b/hw/pci_ids.h
> @@ -26,6 +26,7 @@
>
>  #define PCI_CLASS_MEMORY_RAM             0x0500
>
> +#define PCI_CLASS_SYSTEM_IOMMU           0x0806
>  #define PCI_CLASS_SYSTEM_OTHER           0x0880
>
>  #define PCI_CLASS_SERIAL_USB             0x0c03
> @@ -56,6 +57,7 @@
>
>  #define PCI_VENDOR_ID_AMD                0x1022
>  #define PCI_DEVICE_ID_AMD_LANCE          0x2000
> +#define PCI_DEVICE_ID_AMD_IOMMU          0x0000     /* FIXME */
>
>  #define PCI_VENDOR_ID_MOTOROLA           0x1057
>  #define PCI_DEVICE_ID_MOTOROLA_MPC106    0x0002
> diff --git a/hw/pci_regs.h b/hw/pci_regs.h
> index 1c675dc..0fb942b 100644
> --- a/hw/pci_regs.h
> +++ b/hw/pci_regs.h
> @@ -216,6 +216,7 @@
>  #define  PCI_CAP_ID_SHPC       0x0C    /* PCI Standard Hot-Plug Controller */
>  #define  PCI_CAP_ID_SSVID      0x0D    /* Bridge subsystem vendor/device ID */
>  #define  PCI_CAP_ID_AGP3       0x0E    /* AGP Target PCI-PCI bridge */
> +#define  PCI_CAP_ID_SEC                0x0F    /* Secure Device (AMD IOMMU) */
>  #define  PCI_CAP_ID_EXP        0x10    /* PCI Express */
>  #define  PCI_CAP_ID_MSIX       0x11    /* MSI-X */
>  #define  PCI_CAP_ID_AF         0x13    /* PCI Advanced Features */
> --
> 1.7.1
>
>


* Re: [Qemu-devel] [PATCH 1/7] pci: add range_covers_range()
  2010-08-15 19:27   ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-08-18  4:39     ` Isaku Yamahata
  -1 siblings, 0 replies; 42+ messages in thread
From: Isaku Yamahata @ 2010-08-18  4:39 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu; +Cc: joro, kvm, qemu-devel, blauwirbel, paul, avi

This function seems to be the same as ranges_overlap().
Please use ranges_overlap() instead.
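
For reference, here is a standalone sketch of the two predicates side by
side. The range_get_last() and ranges_overlap() bodies are assumptions
written from memory (their definitions are not part of the quoted hunk),
so please check them against hw/pci.h.

```c
#include <stdint.h>

/* Assumed definition: last byte of a range (len > 0, no wrap-around). */
static inline uint64_t range_get_last(uint64_t offset, uint64_t len)
{
    return offset + len - 1;
}

/* Containment test, as proposed in the patch. */
static inline int range_covers_range(uint64_t first_big, uint64_t len_big,
                                     uint64_t first_small, uint64_t len_small)
{
    uint64_t last_big = range_get_last(first_big, len_big);
    uint64_t last_small = range_get_last(first_small, len_small);

    return first_big <= first_small && last_small <= last_big;
}

/* Assumed definition: true if the two ranges share at least one byte. */
static inline int ranges_overlap(uint64_t first1, uint64_t len1,
                                 uint64_t first2, uint64_t len2)
{
    uint64_t last1 = range_get_last(first1, len1);
    uint64_t last2 = range_get_last(first2, len2);

    return !(last2 < first1 || last1 < first2);
}
```

Under these assumptions the pair (0, 8) and (4, 8) overlaps but is not
covered, so the choice depends on whether the invalidation code needs
containment or just intersection.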

On Sun, Aug 15, 2010 at 10:27:16PM +0300, Eduard - Gabriel Munteanu wrote:
> This helper function allows map invalidation code to determine which
> maps must be invalidated.
> 
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
>  hw/pci.h |   10 ++++++++++
>  1 files changed, 10 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/pci.h b/hw/pci.h
> index 4bd8a1a..5a6cdb5 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -419,6 +419,16 @@ static inline int range_covers_byte(uint64_t offset, uint64_t len,
>      return offset <= byte && byte <= range_get_last(offset, len);
>  }
>  
> +/* Check whether a given range completely covers another. */
> +static inline int range_covers_range(uint64_t first_big, uint64_t len_big,
> +                                     uint64_t first_small, uint64_t len_small)
> +{
> +    uint64_t last_big = range_get_last(first_big, len_big);
> +    uint64_t last_small = range_get_last(first_small, len_small);
> +
> +    return first_big <= first_small && last_small <= last_big;
> +}
> +
>  /* Check whether 2 given ranges overlap.
>   * Undefined if ranges that wrap around 0. */
>  static inline int ranges_overlap(uint64_t first1, uint64_t len1,
> -- 
> 1.7.1
> 
> 

-- 
yamahata


* Re: [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support
  2010-08-15 19:27   ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-08-18  5:02     ` Isaku Yamahata
  -1 siblings, 0 replies; 42+ messages in thread
From: Isaku Yamahata @ 2010-08-18  5:02 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: joro, kvm, qemu-devel, blauwirbel, paul, avi, mst

Added Cc: mst@redhat.com
This patch doesn't apply to MST's pci tree.
git://git.kernel.org/pub/scm/linux/kernel/git/mst/qemu.git pci

Since Michael is the PCI maintainer, could you please rebase onto that
tree and resend?

On Sun, Aug 15, 2010 at 10:27:17PM +0300, Eduard - Gabriel Munteanu wrote:
> PCI devices should access memory through pci_memory_*() instead of
> cpu_physical_memory_*(). This also provides support for translation and
> access checking in case an IOMMU is emulated.
> 
> Memory maps are treated as remote IOTLBs (that is, translation caches
> belonging to the IOMMU-aware device itself). Clients (devices) must
> provide callbacks for map invalidation in case these maps are
> persistent beyond the current I/O context, e.g. AIO DMA transfers.
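
To illustrate the access path described here, a minimal self-contained
sketch of a translate-and-copy loop. toy_translate(), dma_read() and
fake_ram are hypothetical names, not the patch's API. Unlike
pci_memory_rw() in this patch, the sketch reports a failed translation
to the caller instead of returning silently.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_NUM_PAGES 4u

/* Stand-in for guest physical memory. */
static uint8_t fake_ram[TOY_NUM_PAGES * TOY_PAGE_SIZE];

/*
 * Hypothetical translate callback: on success returns 0, stores the
 * translated address in *paddr and the number of bytes the translation
 * is valid for in *plen.
 */
typedef int (*translate_fn)(uint64_t addr, uint64_t *paddr, uint64_t *plen);

/* Toy mapping: swap pages 0 and 1, identity for the rest. */
static int toy_translate(uint64_t addr, uint64_t *paddr, uint64_t *plen)
{
    uint64_t page = addr / TOY_PAGE_SIZE;
    uint64_t off = addr % TOY_PAGE_SIZE;

    if (page >= TOY_NUM_PAGES) {
        return -1;                      /* No mapping: fault. */
    }
    if (page < 2) {
        page ^= 1;
    }
    *paddr = page * TOY_PAGE_SIZE + off;
    *plen = TOY_PAGE_SIZE - off;        /* Valid up to the end of the page. */
    return 0;
}

/* Read len bytes at bus address addr, one translated chunk at a time. */
static int dma_read(translate_fn translate, uint64_t addr,
                    uint8_t *buf, uint64_t len)
{
    uint64_t paddr, plen;

    while (len) {
        if (translate(addr, &paddr, &plen)) {
            return -1;
        }
        /* The translation might be valid for more than we need. */
        if (plen > len) {
            plen = len;
        }
        memcpy(buf, &fake_ram[paddr], (size_t) plen);

        len -= plen;
        addr += plen;
        buf += plen;
    }
    return 0;
}
```

A read that crosses a page boundary is split into two chunks, each
translated separately, which is why the callback has to report how far
its result is valid.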
> 
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
>  hw/pci.c      |  197 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  hw/pci.h      |   74 +++++++++++++++++++++
>  qemu-common.h |    1 +
>  3 files changed, 271 insertions(+), 1 deletions(-)
> 
> diff --git a/hw/pci.c b/hw/pci.c
> index 6871728..8668e06 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -58,6 +58,18 @@ struct PCIBus {
>         Keep a count of the number of devices with raised IRQs.  */
>      int nirq;
>      int *irq_count;
> +
> +    PCIDevice                       *iommu;
> +    PCITranslateFunc                *translate;
> +};
> +
> +struct PCIMemoryMap {
> +    pcibus_t                        addr;
> +    pcibus_t                        len;
> +    target_phys_addr_t              paddr;
> +    PCIInvalidateMapFunc            *invalidate;
> +    void                            *invalidate_opaque;
> +    QLIST_ENTRY(PCIMemoryMap)       list;
>  };
>  
>  static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent);
> @@ -166,6 +178,19 @@ static void pci_device_reset(PCIDevice *dev)
>      pci_update_mappings(dev);
>  }
>  
> +static int pci_no_translate(PCIDevice *iommu,
> +                            PCIDevice *dev,
> +                            pcibus_t addr,
> +                            target_phys_addr_t *paddr,
> +                            target_phys_addr_t *len,
> +                            unsigned perms)
> +{
> +    *paddr = addr;
> +    *len = -1;
> +
> +    return 0;
> +}
> +
>  static void pci_bus_reset(void *opaque)
>  {
>      PCIBus *bus = opaque;
> @@ -227,7 +252,10 @@ void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
>                           const char *name, int devfn_min)
>  {
>      qbus_create_inplace(&bus->qbus, &pci_bus_info, parent, name);
> -    bus->devfn_min = devfn_min;
> +
> +    bus->devfn_min              = devfn_min;
> +    bus->iommu                  = NULL;
> +    bus->translate              = pci_no_translate;
>  
>      /* host bridge */
>      QLIST_INIT(&bus->child);
> @@ -2029,6 +2057,173 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent)
>      }
>  }
>  
> +void pci_register_iommu(PCIDevice *iommu,
> +                        PCITranslateFunc *translate)
> +{
> +    iommu->bus->iommu = iommu;
> +    iommu->bus->translate = translate;
> +}
> +
> +void pci_memory_rw(PCIDevice *dev,
> +                   pcibus_t addr,
> +                   uint8_t *buf,
> +                   pcibus_t len,
> +                   int is_write)
> +{
> +    int err;
> +    unsigned perms;
> +    PCIDevice *iommu = dev->bus->iommu;
> +    target_phys_addr_t paddr, plen;
> +
> +    perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
> +
> +    while (len) {
> +        err = dev->bus->translate(iommu, dev, addr, &paddr, &plen, perms);
> +        if (err)
> +            return;
> +
> +        /* The translation might be valid for larger regions. */
> +        if (plen > len)
> +            plen = len;
> +
> +        cpu_physical_memory_rw(paddr, buf, plen, is_write);
> +
> +        len -= plen;
> +        addr += plen;
> +        buf += plen;
> +    }
> +}
> +
> +static void pci_memory_register_map(PCIDevice *dev,
> +                                    pcibus_t addr,
> +                                    pcibus_t len,
> +                                    target_phys_addr_t paddr,
> +                                    PCIInvalidateMapFunc *invalidate,
> +                                    void *invalidate_opaque)
> +{
> +    PCIMemoryMap *map;
> +
> +    map = qemu_malloc(sizeof(PCIMemoryMap));
> +    map->addr               = addr;
> +    map->len                = len;
> +    map->paddr              = paddr;
> +    map->invalidate         = invalidate;
> +    map->invalidate_opaque  = invalidate_opaque;
> +
> +    QLIST_INSERT_HEAD(&dev->memory_maps, map, list);
> +}
> +
> +static void pci_memory_unregister_map(PCIDevice *dev,
> +                                      target_phys_addr_t paddr,
> +                                      target_phys_addr_t len)
> +{
> +    PCIMemoryMap *map;
> +
> +    QLIST_FOREACH(map, &dev->memory_maps, list) {
> +        if (map->paddr == paddr && map->len == len) {
> +            QLIST_REMOVE(map, list);
> +            free(map);
> +        }
> +    }
> +}
> +
> +void pci_memory_invalidate_range(PCIDevice *dev,
> +                                 pcibus_t addr,
> +                                 pcibus_t len)
> +{
> +    PCIMemoryMap *map;
> +
> +    QLIST_FOREACH(map, &dev->memory_maps, list) {
> +        if (range_covers_range(addr, len, map->addr, map->len)) {
> +            map->invalidate(map->invalidate_opaque);
> +            QLIST_REMOVE(map, list);
> +            free(map);
> +        }
> +    }
> +}
> +
> +void *pci_memory_map(PCIDevice *dev,
> +                     PCIInvalidateMapFunc *cb,
> +                     void *opaque,
> +                     pcibus_t addr,
> +                     target_phys_addr_t *len,
> +                     int is_write)
> +{
> +    int err;
> +    unsigned perms;
> +    PCIDevice *iommu = dev->bus->iommu;
> +    target_phys_addr_t paddr, plen;
> +
> +    perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
> +
> +    plen = *len;
> +    err = dev->bus->translate(iommu, dev, addr, &paddr, &plen, perms);
> +    if (err)
> +        return NULL;
> +
> +    /*
> +     * If this is true, the virtual region is contiguous,
> +     * but the translated physical region isn't. We just
> +     * clamp *len, much like cpu_physical_memory_map() does.
> +     */
> +    if (plen < *len)
> +        *len = plen;
> +
> +    /* We treat maps as remote TLBs to cope with stuff like AIO. */
> +    if (cb)
> +        pci_memory_register_map(dev, addr, *len, paddr, cb, opaque);
> +
> +    return cpu_physical_memory_map(paddr, len, is_write);
> +}
> +
> +void pci_memory_unmap(PCIDevice *dev,
> +                      void *buffer,
> +                      target_phys_addr_t len,
> +                      int is_write,
> +                      target_phys_addr_t access_len)
> +{
> +    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
> +    pci_memory_unregister_map(dev, (target_phys_addr_t) buffer, len);
> +}
> +
> +#define DEFINE_PCI_LD(suffix, size)                                       \
> +uint##size##_t pci_ld##suffix(PCIDevice *dev, pcibus_t addr)              \
> +{                                                                         \
> +    int err;                                                              \
> +    target_phys_addr_t paddr, plen;                                       \
> +                                                                          \
> +    err = dev->bus->translate(dev->bus->iommu, dev,                       \
> +                              addr, &paddr, &plen, IOMMU_PERM_READ);      \
> +    if (err || (plen < size / 8))                                         \
> +        return 0;                                                         \
> +                                                                          \
> +    return ld##suffix##_phys(paddr);                                      \
> +}
> +
> +#define DEFINE_PCI_ST(suffix, size)                                       \
> +void pci_st##suffix(PCIDevice *dev, pcibus_t addr, uint##size##_t val)    \
> +{                                                                         \
> +    int err;                                                              \
> +    target_phys_addr_t paddr, plen;                                       \
> +                                                                          \
> +    err = dev->bus->translate(dev->bus->iommu, dev,                       \
> +                              addr, &paddr, &plen, IOMMU_PERM_WRITE);     \
> +    if (err || (plen < size / 8))                                         \
> +        return;                                                           \
> +                                                                          \
> +    st##suffix##_phys(paddr, val);                                        \
> +}
> +
> +DEFINE_PCI_LD(ub, 8)
> +DEFINE_PCI_LD(uw, 16)
> +DEFINE_PCI_LD(l, 32)
> +DEFINE_PCI_LD(q, 64)                  
> +
> +DEFINE_PCI_ST(b, 8)
> +DEFINE_PCI_ST(w, 16)
> +DEFINE_PCI_ST(l, 32)
> +DEFINE_PCI_ST(q, 64)
> +
>  static PCIDeviceInfo bridge_info = {
>      .qdev.name    = "pci-bridge",
>      .qdev.size    = sizeof(PCIBridge),
> diff --git a/hw/pci.h b/hw/pci.h
> index 5a6cdb5..a62bc8e 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -203,6 +203,8 @@ struct PCIDevice {
>          PCICapConfigReadFunc *config_read;
>          PCICapConfigWriteFunc *config_write;
>      } cap;
> +
> +    QLIST_HEAD(, PCIMemoryMap) memory_maps;
>  };
>  
>  PCIDevice *pci_register_device(PCIBus *bus, const char *name,
> @@ -440,4 +442,76 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
>      return !(last2 < first1 || last1 < first2);
>  }
>  
> +/*
> + * Memory I/O and PCI IOMMU definitions.
> + */
> +
> +#define IOMMU_PERM_READ     (1 << 0)
> +#define IOMMU_PERM_WRITE    (1 << 1)
> +#define IOMMU_PERM_RW       (IOMMU_PERM_READ | IOMMU_PERM_WRITE)
> +
> +typedef int PCIInvalidateMapFunc(void *opaque);
> +typedef int PCITranslateFunc(PCIDevice *iommu,
> +                             PCIDevice *dev,
> +                             pcibus_t addr,
> +                             target_phys_addr_t *paddr,
> +                             target_phys_addr_t *len,
> +                             unsigned perms);
> +
> +extern void pci_memory_rw(PCIDevice *dev,
> +                          pcibus_t addr,
> +                          uint8_t *buf,
> +                          pcibus_t len,
> +                          int is_write);
> +extern void *pci_memory_map(PCIDevice *dev,
> +                            PCIInvalidateMapFunc *cb,
> +                            void *opaque,
> +                            pcibus_t addr,
> +                            target_phys_addr_t *len,
> +                            int is_write);
> +extern void pci_memory_unmap(PCIDevice *dev,
> +                             void *buffer,
> +                             target_phys_addr_t len,
> +                             int is_write,
> +                             target_phys_addr_t access_len);
> +extern void pci_register_iommu(PCIDevice *dev,
> +                               PCITranslateFunc *translate);
> +extern void pci_memory_invalidate_range(PCIDevice *dev,
> +                                        pcibus_t addr,
> +                                        pcibus_t len);
> +
> +#define DECLARE_PCI_LD(suffix, size)                                    \
> +extern uint##size##_t pci_ld##suffix(PCIDevice *dev, pcibus_t addr);
> +
> +#define DECLARE_PCI_ST(suffix, size)                                    \
> +extern void pci_st##suffix(PCIDevice *dev,                              \
> +                           pcibus_t addr,                               \
> +                           uint##size##_t val);
> +
> +DECLARE_PCI_LD(ub, 8)
> +DECLARE_PCI_LD(uw, 16)
> +DECLARE_PCI_LD(l, 32)
> +DECLARE_PCI_LD(q, 64)                  
> +
> +DECLARE_PCI_ST(b, 8)
> +DECLARE_PCI_ST(w, 16)
> +DECLARE_PCI_ST(l, 32)
> +DECLARE_PCI_ST(q, 64)
> +
> +static inline void pci_memory_read(PCIDevice *dev,
> +                                   pcibus_t addr,
> +                                   uint8_t *buf,
> +                                   pcibus_t len)
> +{
> +    pci_memory_rw(dev, addr, buf, len, 0);
> +}
> +
> +static inline void pci_memory_write(PCIDevice *dev,
> +                                    pcibus_t addr,
> +                                    const uint8_t *buf,
> +                                    pcibus_t len)
> +{
> +    pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
> +}
> +
>  #endif
> diff --git a/qemu-common.h b/qemu-common.h
> index 3fb2f0b..40c6d58 100644
> --- a/qemu-common.h
> +++ b/qemu-common.h
> @@ -218,6 +218,7 @@ typedef struct SMBusDevice SMBusDevice;
>  typedef struct PCIHostState PCIHostState;
>  typedef struct PCIExpressHost PCIExpressHost;
>  typedef struct PCIBus PCIBus;
> +typedef struct PCIMemoryMap PCIMemoryMap;
>  typedef struct PCIDevice PCIDevice;
>  typedef struct SerialState SerialState;
>  typedef struct IRQState *qemu_irq;
> -- 
> 1.7.1
> 
> 

-- 
yamahata
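
For readers following the quoted hunk, the loop in pci_memory_rw() can be
sketched in isolation. This is a minimal, self-contained illustration and
not QEMU code: toy_translate(), toy_memory_rw(), guest_ram and page_map are
hypothetical stand-ins. Unlike the patch, the sketch returns an error code,
and its toy translation always reports a valid length.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_NUM_PAGES 4u

/* Hypothetical stand-ins for guest memory and an IOMMU page table. */
static uint8_t guest_ram[TOY_NUM_PAGES * TOY_PAGE_SIZE];
static const uint64_t page_map[TOY_NUM_PAGES] = { 2, 0, 3, 1 };

/*
 * Toy translate callback: returns 0 on success and reports how many
 * bytes stay valid from *paddr up to the end of the current page.
 */
static int toy_translate(uint64_t addr, uint64_t *paddr, uint64_t *len)
{
    uint64_t page = addr / TOY_PAGE_SIZE;
    uint64_t off = addr % TOY_PAGE_SIZE;

    if (page >= TOY_NUM_PAGES)
        return -1;                      /* no mapping: report a fault */

    *paddr = page_map[page] * TOY_PAGE_SIZE + off;
    *len = TOY_PAGE_SIZE - off;
    return 0;
}

/* Same shape as the loop in the patch: translate, clamp, copy, advance. */
int toy_memory_rw(uint64_t addr, void *buf, uint64_t len, int is_write)
{
    uint8_t *p = buf;
    uint64_t paddr, plen;

    while (len) {
        if (toy_translate(addr, &paddr, &plen))
            return -1;

        /* The translation might be valid for more than we still need. */
        if (plen > len)
            plen = len;

        assert(paddr + plen <= sizeof(guest_ram));
        if (is_write)
            memcpy(&guest_ram[paddr], p, plen);
        else
            memcpy(p, &guest_ram[paddr], plen);

        len -= plen;
        addr += plen;
        p += plen;
    }
    return 0;
}
```

With this toy mapping, a 16-byte write at bus address TOY_PAGE_SIZE - 8 is
split into two 8-byte copies: one at the end of physical page 2 and one at
the start of physical page 0.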

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support
  2010-09-02 16:05         ` Stefan Weil
@ 2010-09-02 16:14           ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 42+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-09-02 16:14 UTC (permalink / raw)
  To: Stefan Weil; +Cc: mst, kvm, joro, qemu-devel, blauwirbel, yamahata, paul, avi

On Thu, Sep 02, 2010 at 06:05:39PM +0200, Stefan Weil wrote:
> Am 02.09.2010 10:51, schrieb Eduard - Gabriel Munteanu:

[snip]

> >> The functions pci_memory_read and pci_memory_write not only read
> >> or write byte data but many different data types which leads to
> >> a lot of type casts in your other patches.
> >>
> >> I'd prefer "void *buf" and "const void *buf" in the argument lists.
> >> Then all those type casts could be removed.
> >>
> >> Regards
> >> Stefan Weil
> >>      
> > I only followed an approach similar to how cpu_physical_memory_{read,write}()
> > is defined. I think I should change both the cpu_physical_memory_* and the
> > pci_memory_* functions, not only the latter, if I decide to go with that
> > approach.
> >
> >
> > 	Eduard
> >    
> 
> 
> Yes, cpu_physical_memory_read, cpu_physical_memory_write
> and cpu_physical_memory_rw should be changed, too.
> 
> They also require several type casts today.
> 
> But this change can be done in an independent patch.
> 
> Stefan

Roger, I'm on it. The existing casts could remain there AFAICT, so it's
a pretty simple change.


	Eduard
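
The point about existing casts can be checked with a small sketch. In C any
object pointer converts implicitly to void *, so a call site that keeps its
(uint8_t *) cast still compiles against a void * prototype. The names below
(sketch_ram, sketch_read, sketch_old_and_new_callers) are hypothetical and
stand in for the real QEMU functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical backing store standing in for guest memory. */
static uint8_t sketch_ram[64];

/* Prototype after the proposed change: the buffer is a void *. */
void sketch_read(uint64_t addr, void *buf, uint64_t len)
{
    assert(addr + len <= sizeof(sketch_ram));
    memcpy(buf, &sketch_ram[addr], len);
}

uint32_t sketch_old_and_new_callers(void)
{
    uint32_t a = 0, b = 0;

    /* Same byte everywhere, so the result does not depend on endianness. */
    memset(sketch_ram, 0x11, sizeof(uint32_t));

    sketch_read(0, (uint8_t *) &a, sizeof(a));  /* old call site, cast kept */
    sketch_read(0, &b, sizeof(b));              /* new call site, no cast */

    return a == b ? a : 0;
}
```

Both call styles read the same value, which is why the casts can be removed
gradually rather than in the same patch.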


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support
  2010-09-02  8:51       ` Eduard - Gabriel Munteanu
@ 2010-09-02 16:05         ` Stefan Weil
  -1 siblings, 0 replies; 42+ messages in thread
From: Stefan Weil @ 2010-09-02 16:05 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: mst, kvm, joro, qemu-devel, blauwirbel, yamahata, paul, avi

Am 02.09.2010 10:51, schrieb Eduard - Gabriel Munteanu:
> On Wed, Sep 01, 2010 at 10:10:30PM +0200, Stefan Weil wrote:
>    
>> Please see my comments at the end of this mail.
>>
>>
>> Am 30.08.2010 00:08, schrieb Eduard - Gabriel Munteanu:
>>      
>>> PCI devices should access memory through pci_memory_*() instead of
>>> cpu_physical_memory_*(). This also provides support for translation and
>>> access checking in case an IOMMU is emulated.
>>>
>>> Memory maps are treated as remote IOTLBs (that is, translation caches
>>> belonging to the IOMMU-aware device itself). Clients (devices) must
>>> provide callbacks for map invalidation in case these maps are
>>> persistent beyond the current I/O context, e.g. AIO DMA transfers.
>>>
>>> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
>>> ---
>>>        
> [snip]
>
>    
>>> +static inline void pci_memory_read(PCIDevice *dev,
>>> +                                   pcibus_t addr,
>>> +                                   uint8_t *buf,
>>> +                                   pcibus_t len)
>>> +{
>>> +    pci_memory_rw(dev, addr, buf, len, 0);
>>> +}
>>> +
>>> +static inline void pci_memory_write(PCIDevice *dev,
>>> +                                    pcibus_t addr,
>>> +                                    const uint8_t *buf,
>>> +                                    pcibus_t len)
>>> +{
>>> +    pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
>>> +}
>>> +
>>> #endif
>>>        
>> The functions pci_memory_read and pci_memory_write not only read
>> or write byte data but many different data types which leads to
>> a lot of type casts in your other patches.
>>
>> I'd prefer "void *buf" and "const void *buf" in the argument lists.
>> Then all those type casts could be removed.
>>
>> Regards
>> Stefan Weil
>>      
> I only followed an approach similar to how cpu_physical_memory_{read,write}()
> is defined. I think I should change both the cpu_physical_memory_* and the
> pci_memory_* functions, not only the latter, if I decide to go with that
> approach.
>
>
> 	Eduard
>    


Yes, cpu_physical_memory_read, cpu_physical_memory_write
and cpu_physical_memory_rw should be changed, too.

They also require several type casts today.

But this change can be done in an independent patch.

Stefan
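
A minimal sketch of the void * suggestion discussed above, under stated
assumptions: fake_ram, the sketch_* functions and struct sketch_desc are
hypothetical, the PCIDevice argument and address translation are left out,
and nothing here is taken from the actual patches.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for guest memory; not part of the patch. */
static uint8_t fake_ram[256];

/* void * variant of the rw primitive: only the implementation sees bytes. */
void sketch_memory_rw(uint64_t addr, void *buf, uint64_t len, int is_write)
{
    assert(addr + len <= sizeof(fake_ram));
    if (is_write)
        memcpy(&fake_ram[addr], buf, len);
    else
        memcpy(buf, &fake_ram[addr], len);
}

static inline void sketch_memory_read(uint64_t addr, void *buf, uint64_t len)
{
    sketch_memory_rw(addr, buf, len, 0);
}

static inline void sketch_memory_write(uint64_t addr, const void *buf,
                                       uint64_t len)
{
    /* One cast remains, inside the wrapper, to drop const for the rw path. */
    sketch_memory_rw(addr, (void *) buf, len, 1);
}

/* Hypothetical descriptor that a device model might transfer as a whole. */
struct sketch_desc {
    uint32_t status;
    uint32_t length;
};

/* Call sites pass struct pointers directly, with no (uint8_t *) casts. */
uint32_t sketch_roundtrip(uint32_t status, uint32_t length)
{
    struct sketch_desc out = { status, length };
    struct sketch_desc in = { 0, 0 };

    sketch_memory_write(16, &out, sizeof(out));
    sketch_memory_read(16, &in, sizeof(in));
    return in.status + in.length;
}
```

The only remaining cast lives in the write wrapper, so device code that
moves descriptors or other structured data needs none.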


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support
  2010-09-02  9:08         ` Eduard - Gabriel Munteanu
@ 2010-09-02 13:24           ` Anthony Liguori
  -1 siblings, 0 replies; 42+ messages in thread
From: Anthony Liguori @ 2010-09-02 13:24 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: Michael S. Tsirkin, Stefan Weil, kvm, joro, qemu-devel,
	blauwirbel, yamahata, paul, avi

On 09/02/2010 04:08 AM, Eduard - Gabriel Munteanu wrote:
> On Thu, Sep 02, 2010 at 09:00:46AM +0300, Michael S. Tsirkin wrote:
>    
>> On Wed, Sep 01, 2010 at 10:10:30PM +0200, Stefan Weil wrote:
>>      
>>>> +static inline void pci_memory_read(PCIDevice *dev,
>>>> + pcibus_t addr,
>>>> + uint8_t *buf,
>>>> + pcibus_t len)
>>>> +{
>>>> + pci_memory_rw(dev, addr, buf, len, 0);
>>>> +}
>>>> +
>>>> +static inline void pci_memory_write(PCIDevice *dev,
>>>> + pcibus_t addr,
>>>> + const uint8_t *buf,
>>>> + pcibus_t len)
>>>> +{
>>>> + pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
>>>> +}
>>>> +
>>>> #endif
>>>>          
>>> The functions pci_memory_read and pci_memory_write not only read
>>> or write byte data but many different data types which leads to
>>> a lot of type casts in your other patches.
>>>
>>> I'd prefer "void *buf" and "const void *buf" in the argument lists.
>>> Then all those type casts could be removed.
>>>
>>> Regards
>>> Stefan Weil
>>>        
>> Further, I am not sure pcibus_t is a good type to use here.
>> This also forces use of pci specific types in e.g. ide, or resorting to
>> casts as this patch does. We probably should use a more generic type
>> for this.
>>      
> It only forces use of PCI-specific types in the IDE controller, which is
> already a PCI device.
>    

But IDE controllers are not always PCI devices...  This isn't an issue 
with your patch, per se, but with how we're modelling the IDE
controller today.  There's no great solution but I think your patch is 
an improvement over what we have today.

I do agree with Stefan though that void * would make a lot more sense.

Regards,

Anthony Liguori

> 	Eduard
>
>    
>> -- 
>> MST
>>      


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support
@ 2010-09-02 13:24           ` Anthony Liguori
  0 siblings, 0 replies; 42+ messages in thread
From: Anthony Liguori @ 2010-09-02 13:24 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: kvm, Michael S. Tsirkin, joro, qemu-devel, blauwirbel, paul, avi,
	yamahata

On 09/02/2010 04:08 AM, Eduard - Gabriel Munteanu wrote:
> On Thu, Sep 02, 2010 at 09:00:46AM +0300, Michael S. Tsirkin wrote:
>    
>> On Wed, Sep 01, 2010 at 10:10:30PM +0200, Stefan Weil wrote:
>>      
>>>> +static inline void pci_memory_read(PCIDevice *dev,
>>>> + pcibus_t addr,
>>>> + uint8_t *buf,
>>>> + pcibus_t len)
>>>> +{
>>>> + pci_memory_rw(dev, addr, buf, len, 0);
>>>> +}
>>>> +
>>>> +static inline void pci_memory_write(PCIDevice *dev,
>>>> + pcibus_t addr,
>>>> + const uint8_t *buf,
>>>> + pcibus_t len)
>>>> +{
>>>> + pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
>>>> +}
>>>> +
>>>> #endif
>>>>          
>>> The functions pci_memory_read and pci_memory_write not only read
>>> or write byte data but many different data types which leads to
>>> a lot of type casts in your other patches.
>>>
>>> I'd prefer "void *buf" and "const void *buf" in the argument lists.
>>> Then all those type casts could be removed.
>>>
>>> Regards
>>> Stefan Weil
>>>        
>> Further, I am not sure pcibus_t is a good type to use here.
>> This also forces use of pci specific types in e.g. ide, or resorting to
>> casts as this patch does. We probably should use a more generic type
>> for this.
>>      
> It only forces use of PCI-specific types in the IDE controller, which is
> already a PCI device.
>    

But IDE controllers are not always PCI devices...  This isn't an issue 
with your patch, per se, but with how we're modelling the IDE
controller today.  There's no great solution but I think your patch is 
an improvement over what we have today.

I do agree with Stefan though that void * would make a lot more sense.

Regards,

Anthony Liguori

> 	Eduard
>
>    
>> -- 
>> MST
>>      

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support
  2010-09-02  6:00       ` Michael S. Tsirkin
@ 2010-09-02  9:08         ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 42+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-09-02  9:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stefan Weil, kvm, joro, qemu-devel, blauwirbel, yamahata, paul, avi

On Thu, Sep 02, 2010 at 09:00:46AM +0300, Michael S. Tsirkin wrote:
> On Wed, Sep 01, 2010 at 10:10:30PM +0200, Stefan Weil wrote:
> > >+static inline void pci_memory_read(PCIDevice *dev,
> > >+ pcibus_t addr,
> > >+ uint8_t *buf,
> > >+ pcibus_t len)
> > >+{
> > >+ pci_memory_rw(dev, addr, buf, len, 0);
> > >+}
> > >+
> > >+static inline void pci_memory_write(PCIDevice *dev,
> > >+ pcibus_t addr,
> > >+ const uint8_t *buf,
> > >+ pcibus_t len)
> > >+{
> > >+ pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
> > >+}
> > >+
> > >#endif
> > 
> > The functions pci_memory_read and pci_memory_write not only read
> > or write byte data but many different data types which leads to
> > a lot of type casts in your other patches.
> > 
> > I'd prefer "void *buf" and "const void *buf" in the argument lists.
> > Then all those type casts could be removed.
> > 
> > Regards
> > Stefan Weil
> 
> Further, I am not sure pcibus_t is a good type to use here.
> This also forces use of pci specific types in e.g. ide, or resorting to
> casts as this patch does. We probably should use a more generic type
> for this.

It only forces use of PCI-specific types in the IDE controller, which is
already a PCI device.


	Eduard

> -- 
> MST

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support
@ 2010-09-02  9:08         ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 42+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-09-02  9:08 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, yamahata, joro, qemu-devel, blauwirbel, paul, avi

On Thu, Sep 02, 2010 at 09:00:46AM +0300, Michael S. Tsirkin wrote:
> On Wed, Sep 01, 2010 at 10:10:30PM +0200, Stefan Weil wrote:
> > >+static inline void pci_memory_read(PCIDevice *dev,
> > >+ pcibus_t addr,
> > >+ uint8_t *buf,
> > >+ pcibus_t len)
> > >+{
> > >+ pci_memory_rw(dev, addr, buf, len, 0);
> > >+}
> > >+
> > >+static inline void pci_memory_write(PCIDevice *dev,
> > >+ pcibus_t addr,
> > >+ const uint8_t *buf,
> > >+ pcibus_t len)
> > >+{
> > >+ pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
> > >+}
> > >+
> > >#endif
> > 
> > The functions pci_memory_read and pci_memory_write not only read
> > or write byte data but many different data types which leads to
> > a lot of type casts in your other patches.
> > 
> > I'd prefer "void *buf" and "const void *buf" in the argument lists.
> > Then all those type casts could be removed.
> > 
> > Regards
> > Stefan Weil
> 
> Further, I am not sure pcibus_t is a good type to use here.
> This also forces use of pci specific types in e.g. ide, or resorting to
> casts as this patch does. We probably should use a more generic type
> for this.

It only forces use of PCI-specific types in the IDE controller, which is
already a PCI device.


	Eduard

> -- 
> MST

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support
  2010-09-01 20:10     ` Stefan Weil
@ 2010-09-02  8:51       ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 42+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-09-02  8:51 UTC (permalink / raw)
  To: Stefan Weil; +Cc: mst, kvm, joro, qemu-devel, blauwirbel, yamahata, paul, avi

On Wed, Sep 01, 2010 at 10:10:30PM +0200, Stefan Weil wrote:
> Please see my comments at the end of this mail.
> 
> 
> Am 30.08.2010 00:08, schrieb Eduard - Gabriel Munteanu:
> > PCI devices should access memory through pci_memory_*() instead of
> > cpu_physical_memory_*(). This also provides support for translation and
> > access checking in case an IOMMU is emulated.
> >
> > Memory maps are treated as remote IOTLBs (that is, translation caches
> > belonging to the IOMMU-aware device itself). Clients (devices) must
> > provide callbacks for map invalidation in case these maps are
> > persistent beyond the current I/O context, e.g. AIO DMA transfers.
> >
> > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> > ---

[snip]

> > +static inline void pci_memory_read(PCIDevice *dev,
> > + pcibus_t addr,
> > + uint8_t *buf,
> > + pcibus_t len)
> > +{
> > + pci_memory_rw(dev, addr, buf, len, 0);
> > +}
> > +
> > +static inline void pci_memory_write(PCIDevice *dev,
> > + pcibus_t addr,
> > + const uint8_t *buf,
> > + pcibus_t len)
> > +{
> > + pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
> > +}
> > +
> > #endif
> 
> The functions pci_memory_read and pci_memory_write not only read
> or write byte data but many different data types which leads to
> a lot of type casts in your other patches.
> 
> I'd prefer "void *buf" and "const void *buf" in the argument lists.
> Then all those type casts could be removed.
> 
> Regards
> Stefan Weil

I only followed an approach similar to how cpu_physical_memory_{read,write}()
is defined. I think I should change both cpu_physical_memory_* stuff and
pci_memory_* stuff, not only the latter, if I decide to go with that
approach.


	Eduard


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support
@ 2010-09-02  8:51       ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 42+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-09-02  8:51 UTC (permalink / raw)
  To: Stefan Weil; +Cc: kvm, mst, joro, qemu-devel, blauwirbel, yamahata, paul, avi

On Wed, Sep 01, 2010 at 10:10:30PM +0200, Stefan Weil wrote:
> Please see my comments at the end of this mail.
> 
> 
> Am 30.08.2010 00:08, schrieb Eduard - Gabriel Munteanu:
> > PCI devices should access memory through pci_memory_*() instead of
> > cpu_physical_memory_*(). This also provides support for translation and
> > access checking in case an IOMMU is emulated.
> >
> > Memory maps are treated as remote IOTLBs (that is, translation caches
> > belonging to the IOMMU-aware device itself). Clients (devices) must
> > provide callbacks for map invalidation in case these maps are
> > persistent beyond the current I/O context, e.g. AIO DMA transfers.
> >
> > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> > ---

[snip]

> > +static inline void pci_memory_read(PCIDevice *dev,
> > + pcibus_t addr,
> > + uint8_t *buf,
> > + pcibus_t len)
> > +{
> > + pci_memory_rw(dev, addr, buf, len, 0);
> > +}
> > +
> > +static inline void pci_memory_write(PCIDevice *dev,
> > + pcibus_t addr,
> > + const uint8_t *buf,
> > + pcibus_t len)
> > +{
> > + pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
> > +}
> > +
> > #endif
> 
> The functions pci_memory_read and pci_memory_write not only read
> or write byte data but many different data types which leads to
> a lot of type casts in your other patches.
> 
> I'd prefer "void *buf" and "const void *buf" in the argument lists.
> Then all those type casts could be removed.
> 
> Regards
> Stefan Weil

I only followed an approach similar to how cpu_physical_memory_{read,write}()
is defined. I think I should change both cpu_physical_memory_* stuff and
pci_memory_* stuff, not only the latter, if I decide to go with that
approach.


	Eduard

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support
  2010-09-01 20:10     ` Stefan Weil
@ 2010-09-02  6:00       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2010-09-02  6:00 UTC (permalink / raw)
  To: Stefan Weil
  Cc: Eduard - Gabriel Munteanu, kvm, joro, qemu-devel, blauwirbel,
	yamahata, paul, avi

On Wed, Sep 01, 2010 at 10:10:30PM +0200, Stefan Weil wrote:
> >+static inline void pci_memory_read(PCIDevice *dev,
> >+ pcibus_t addr,
> >+ uint8_t *buf,
> >+ pcibus_t len)
> >+{
> >+ pci_memory_rw(dev, addr, buf, len, 0);
> >+}
> >+
> >+static inline void pci_memory_write(PCIDevice *dev,
> >+ pcibus_t addr,
> >+ const uint8_t *buf,
> >+ pcibus_t len)
> >+{
> >+ pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
> >+}
> >+
> >#endif
> 
> The functions pci_memory_read and pci_memory_write not only read
> or write byte data but many different data types which leads to
> a lot of type casts in your other patches.
> 
> I'd prefer "void *buf" and "const void *buf" in the argument lists.
> Then all those type casts could be removed.
> 
> Regards
> Stefan Weil

Further, I am not sure pcibus_t is a good type to use here.
This also forces use of pci specific types in e.g. ide, or resorting to
casts as this patch does. We probably should use a more generic type
for this.
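
One possible reading of "a more generic type", as a standalone sketch and
not a proposal from this thread: a bus-neutral address typedef plus a small
context that hides the bus behind a callback, so shared device cores never
mention pcibus_t. Every name below (demo_dma_addr_t, DemoDMAContext,
demo_dma_read/write, demo_pci_backend_rw) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bus-neutral address type; the name is an assumption. */
typedef uint64_t demo_dma_addr_t;

/* Hypothetical per-device context hiding the bus behind a callback. */
typedef struct DemoDMAContext {
    void (*rw)(void *opaque, demo_dma_addr_t addr, void *buf,
               demo_dma_addr_t len, int is_write);
    void *opaque;
} DemoDMAContext;

/* Bus-neutral helpers that a shared device core (e.g. IDE) could call. */
static inline void demo_dma_read(DemoDMAContext *ctx, demo_dma_addr_t addr,
                                 void *buf, demo_dma_addr_t len)
{
    ctx->rw(ctx->opaque, addr, buf, len, 0);
}

static inline void demo_dma_write(DemoDMAContext *ctx, demo_dma_addr_t addr,
                                  const void *buf, demo_dma_addr_t len)
{
    ctx->rw(ctx->opaque, addr, (void *) buf, len, 1);
}

/* Stand-in for guest memory; a real PCI backend would instead forward
 * to something like pci_memory_rw(). */
static uint8_t demo_ram[128];

static void demo_pci_backend_rw(void *opaque, demo_dma_addr_t addr,
                                void *buf, demo_dma_addr_t len, int is_write)
{
    (void) opaque;
    assert(addr <= sizeof(demo_ram) && len <= sizeof(demo_ram) - addr);
    if (is_write) {
        memcpy(&demo_ram[addr], buf, len);
    } else {
        memcpy(buf, &demo_ram[addr], len);
    }
}

/* "Device core" code: uses only the generic type and helpers. */
static uint32_t demo_core_roundtrip(uint32_t value)
{
    DemoDMAContext ctx = { demo_pci_backend_rw, NULL };
    uint32_t out = 0;

    demo_dma_write(&ctx, 8, &value, sizeof(value));
    demo_dma_read(&ctx, 8, &out, sizeof(out));
    return out;
}
```

With this shape, only the backend knows which bus it sits on; an ISA-style
backend could supply a different rw callback without touching the core.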

-- 
MST

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support
@ 2010-09-02  6:00       ` Michael S. Tsirkin
  0 siblings, 0 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2010-09-02  6:00 UTC (permalink / raw)
  To: Stefan Weil
  Cc: kvm, joro, qemu-devel, blauwirbel, yamahata, paul,
	Eduard - Gabriel Munteanu, avi

On Wed, Sep 01, 2010 at 10:10:30PM +0200, Stefan Weil wrote:
> >+static inline void pci_memory_read(PCIDevice *dev,
> >+ pcibus_t addr,
> >+ uint8_t *buf,
> >+ pcibus_t len)
> >+{
> >+ pci_memory_rw(dev, addr, buf, len, 0);
> >+}
> >+
> >+static inline void pci_memory_write(PCIDevice *dev,
> >+ pcibus_t addr,
> >+ const uint8_t *buf,
> >+ pcibus_t len)
> >+{
> >+ pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
> >+}
> >+
> >#endif
> 
> The functions pci_memory_read and pci_memory_write not only read
> or write byte data but many different data types which leads to
> a lot of type casts in your other patches.
> 
> I'd prefer "void *buf" and "const void *buf" in the argument lists.
> Then all those type casts could be removed.
> 
> Regards
> Stefan Weil

Further, I am not sure pcibus_t is a good type to use here.
This also forces use of pci specific types in e.g. ide, or resorting to
casts as this patch does. We probably should use a more generic type
for this.

-- 
MST

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support
  2010-08-29 22:08 ` [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support Eduard - Gabriel Munteanu
@ 2010-09-01 20:10     ` Stefan Weil
  0 siblings, 0 replies; 42+ messages in thread
From: Stefan Weil @ 2010-09-01 20:10 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: mst, kvm, joro, qemu-devel, blauwirbel, yamahata, paul, avi

Please see my comments at the end of this mail.


Am 30.08.2010 00:08, schrieb Eduard - Gabriel Munteanu:
> PCI devices should access memory through pci_memory_*() instead of
> cpu_physical_memory_*(). This also provides support for translation and
> access checking in case an IOMMU is emulated.
>
> Memory maps are treated as remote IOTLBs (that is, translation caches
> belonging to the IOMMU-aware device itself). Clients (devices) must
> provide callbacks for map invalidation in case these maps are
> persistent beyond the current I/O context, e.g. AIO DMA transfers.
>
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
> hw/pci.c | 191 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> hw/pci.h | 69 +++++++++++++++++++
> hw/pci_internals.h | 12 +++
> qemu-common.h | 1 +
> 4 files changed, 272 insertions(+), 1 deletions(-)
>
> diff --git a/hw/pci.c b/hw/pci.c
> index 2dc1577..afcb33c 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
>
> ...
>
> diff --git a/hw/pci.h b/hw/pci.h
> index c551f96..c95863a 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -172,6 +172,8 @@ struct PCIDevice {
> char *romfile;
> ram_addr_t rom_offset;
> uint32_t rom_bar;
> +
> + QLIST_HEAD(, PCIMemoryMap) memory_maps;
> };
>
> PCIDevice *pci_register_device(PCIBus *bus, const char *name,
> @@ -391,4 +393,71 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
> return !(last2 < first1 || last1 < first2);
> }
>
> +/*
> + * Memory I/O and PCI IOMMU definitions.
> + */
> +
> +#define IOMMU_PERM_READ (1 << 0)
> +#define IOMMU_PERM_WRITE (1 << 1)
> +#define IOMMU_PERM_RW (IOMMU_PERM_READ | IOMMU_PERM_WRITE)
> +
> +typedef int PCIInvalidateMapFunc(void *opaque);
> +typedef int PCITranslateFunc(PCIDevice *iommu,
> + PCIDevice *dev,
> + pcibus_t addr,
> + target_phys_addr_t *paddr,
> + target_phys_addr_t *len,
> + unsigned perms);
> +
> +void pci_memory_rw(PCIDevice *dev,
> + pcibus_t addr,
> + uint8_t *buf,
> + pcibus_t len,
> + int is_write);
> +void *pci_memory_map(PCIDevice *dev,
> + PCIInvalidateMapFunc *cb,
> + void *opaque,
> + pcibus_t addr,
> + target_phys_addr_t *len,
> + int is_write);
> +void pci_memory_unmap(PCIDevice *dev,
> + void *buffer,
> + target_phys_addr_t len,
> + int is_write,
> + target_phys_addr_t access_len);
> +void pci_register_iommu(PCIDevice *dev, PCITranslateFunc *translate);
> +void pci_memory_invalidate_range(PCIDevice *dev, pcibus_t addr, pcibus_t len);
> +
> +#define DECLARE_PCI_LD(suffix, size) \
> +uint##size##_t pci_ld##suffix(PCIDevice *dev, pcibus_t addr);
> +
> +#define DECLARE_PCI_ST(suffix, size) \
> +void pci_st##suffix(PCIDevice *dev, pcibus_t addr, uint##size##_t val);
> +
> +DECLARE_PCI_LD(ub, 8)
> +DECLARE_PCI_LD(uw, 16)
> +DECLARE_PCI_LD(l, 32)
> +DECLARE_PCI_LD(q, 64)
> +
> +DECLARE_PCI_ST(b, 8)
> +DECLARE_PCI_ST(w, 16)
> +DECLARE_PCI_ST(l, 32)
> +DECLARE_PCI_ST(q, 64)
> +
> +static inline void pci_memory_read(PCIDevice *dev,
> + pcibus_t addr,
> + uint8_t *buf,
> + pcibus_t len)
> +{
> + pci_memory_rw(dev, addr, buf, len, 0);
> +}
> +
> +static inline void pci_memory_write(PCIDevice *dev,
> + pcibus_t addr,
> + const uint8_t *buf,
> + pcibus_t len)
> +{
> + pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
> +}
> +
> #endif

The functions pci_memory_read and pci_memory_write not only read
or write byte data but many different data types which leads to
a lot of type casts in your other patches.

I'd prefer "void *buf" and "const void *buf" in the argument lists.
Then all those type casts could be removed.
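
A minimal standalone sketch of that suggestion, not the patch itself:
fake_pci_memory_rw() and guest_ram are hypothetical stand-ins for
pci_memory_rw() and guest memory, and addresses are plain uint64_t rather
than pcibus_t. With "void *" and "const void *" parameters, callers can pass
structures directly:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for guest memory; not part of QEMU. */
static uint8_t guest_ram[256];

/* Hypothetical stand-in for pci_memory_rw(); copies to/from guest_ram. */
static void fake_pci_memory_rw(uint64_t addr, void *buf, uint64_t len,
                               int is_write)
{
    assert(addr <= sizeof(guest_ram) && len <= sizeof(guest_ram) - addr);
    if (is_write) {
        memcpy(&guest_ram[addr], buf, len);
    } else {
        memcpy(buf, &guest_ram[addr], len);
    }
}

/* Wrappers taking void pointers, as suggested above. */
static inline void fake_pci_memory_read(uint64_t addr, void *buf,
                                        uint64_t len)
{
    fake_pci_memory_rw(addr, buf, len, 0);
}

static inline void fake_pci_memory_write(uint64_t addr, const void *buf,
                                         uint64_t len)
{
    /* Only the wrapper drops const; callers stay cast-free. */
    fake_pci_memory_rw(addr, (void *) buf, len, 1);
}

/* Example descriptor type a device model might transfer. */
struct demo_desc {
    uint32_t status;
    uint32_t length;
};

/* Round-trips a descriptor through guest_ram without any caller casts. */
static struct demo_desc demo_roundtrip(struct demo_desc in)
{
    struct demo_desc out;

    fake_pci_memory_write(16, &in, sizeof(in));
    fake_pci_memory_read(16, &out, sizeof(out));
    return out;
}
```

With uint8_t * parameters, both calls in demo_roundtrip() would need an
explicit (uint8_t *) cast on the struct pointer.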

Regards
Stefan Weil


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support
@ 2010-09-01 20:10     ` Stefan Weil
  0 siblings, 0 replies; 42+ messages in thread
From: Stefan Weil @ 2010-09-01 20:10 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu
  Cc: kvm, mst, joro, qemu-devel, blauwirbel, yamahata, paul, avi

Please see my comments at the end of this mail.


Am 30.08.2010 00:08, schrieb Eduard - Gabriel Munteanu:
> PCI devices should access memory through pci_memory_*() instead of
> cpu_physical_memory_*(). This also provides support for translation and
> access checking in case an IOMMU is emulated.
>
> Memory maps are treated as remote IOTLBs (that is, translation caches
> belonging to the IOMMU-aware device itself). Clients (devices) must
> provide callbacks for map invalidation in case these maps are
> persistent beyond the current I/O context, e.g. AIO DMA transfers.
>
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
> hw/pci.c | 191 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> hw/pci.h | 69 +++++++++++++++++++
> hw/pci_internals.h | 12 +++
> qemu-common.h | 1 +
> 4 files changed, 272 insertions(+), 1 deletions(-)
>
> diff --git a/hw/pci.c b/hw/pci.c
> index 2dc1577..afcb33c 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
>
> ...
>
> diff --git a/hw/pci.h b/hw/pci.h
> index c551f96..c95863a 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -172,6 +172,8 @@ struct PCIDevice {
> char *romfile;
> ram_addr_t rom_offset;
> uint32_t rom_bar;
> +
> + QLIST_HEAD(, PCIMemoryMap) memory_maps;
> };
>
> PCIDevice *pci_register_device(PCIBus *bus, const char *name,
> @@ -391,4 +393,71 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
> return !(last2 < first1 || last1 < first2);
> }
>
> +/*
> + * Memory I/O and PCI IOMMU definitions.
> + */
> +
> +#define IOMMU_PERM_READ (1 << 0)
> +#define IOMMU_PERM_WRITE (1 << 1)
> +#define IOMMU_PERM_RW (IOMMU_PERM_READ | IOMMU_PERM_WRITE)
> +
> +typedef int PCIInvalidateMapFunc(void *opaque);
> +typedef int PCITranslateFunc(PCIDevice *iommu,
> + PCIDevice *dev,
> + pcibus_t addr,
> + target_phys_addr_t *paddr,
> + target_phys_addr_t *len,
> + unsigned perms);
> +
> +void pci_memory_rw(PCIDevice *dev,
> + pcibus_t addr,
> + uint8_t *buf,
> + pcibus_t len,
> + int is_write);
> +void *pci_memory_map(PCIDevice *dev,
> + PCIInvalidateMapFunc *cb,
> + void *opaque,
> + pcibus_t addr,
> + target_phys_addr_t *len,
> + int is_write);
> +void pci_memory_unmap(PCIDevice *dev,
> + void *buffer,
> + target_phys_addr_t len,
> + int is_write,
> + target_phys_addr_t access_len);
> +void pci_register_iommu(PCIDevice *dev, PCITranslateFunc *translate);
> +void pci_memory_invalidate_range(PCIDevice *dev, pcibus_t addr, pcibus_t len);
> +
> +#define DECLARE_PCI_LD(suffix, size) \
> +uint##size##_t pci_ld##suffix(PCIDevice *dev, pcibus_t addr);
> +
> +#define DECLARE_PCI_ST(suffix, size) \
> +void pci_st##suffix(PCIDevice *dev, pcibus_t addr, uint##size##_t val);
> +
> +DECLARE_PCI_LD(ub, 8)
> +DECLARE_PCI_LD(uw, 16)
> +DECLARE_PCI_LD(l, 32)
> +DECLARE_PCI_LD(q, 64)
> +
> +DECLARE_PCI_ST(b, 8)
> +DECLARE_PCI_ST(w, 16)
> +DECLARE_PCI_ST(l, 32)
> +DECLARE_PCI_ST(q, 64)
> +
> +static inline void pci_memory_read(PCIDevice *dev,
> + pcibus_t addr,
> + uint8_t *buf,
> + pcibus_t len)
> +{
> + pci_memory_rw(dev, addr, buf, len, 0);
> +}
> +
> +static inline void pci_memory_write(PCIDevice *dev,
> + pcibus_t addr,
> + const uint8_t *buf,
> + pcibus_t len)
> +{
> + pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
> +}
> +
> #endif

The functions pci_memory_read and pci_memory_write not only read
or write byte data but many different data types which leads to
a lot of type casts in your other patches.

I'd prefer "void *buf" and "const void *buf" in the argument lists.
Then all those type casts could be removed.

Regards
Stefan Weil

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support
  2010-08-29 20:44 [PATCH 0/7] AMD IOMMU emulation patchset v4 Blue Swirl
@ 2010-08-29 22:08 ` Eduard - Gabriel Munteanu
  2010-09-01 20:10     ` Stefan Weil
  0 siblings, 1 reply; 42+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-08-29 22:08 UTC (permalink / raw)
  To: mst
  Cc: kvm, joro, qemu-devel, blauwirbel, yamahata, paul,
	Eduard - Gabriel Munteanu, avi

PCI devices should access memory through pci_memory_*() instead of
cpu_physical_memory_*(). This also provides support for translation and
access checking in case an IOMMU is emulated.

Memory maps are treated as remote IOTLBs (that is, translation caches
belonging to the IOMMU-aware device itself). Clients (devices) must
provide callbacks for map invalidation in case these maps are
persistent beyond the current I/O context, e.g. AIO DMA transfers.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
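
For readers who want the translated access path in isolation, here is a
standalone sketch of the same loop shape as pci_memory_rw(): translate,
clamp to the length the translation covers, copy, advance. It assumes a toy
page table and a flat array as physical memory; demo_translate(),
demo_memory_rw() and the DEMO_* constants are hypothetical, and unlike the
patch the sketch returns the number of bytes transferred.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE 16u
#define DEMO_NUM_PAGES 4u

/* Toy physical memory and a table mapping bus pages to physical pages. */
static uint8_t demo_phys[DEMO_PAGE_SIZE * DEMO_NUM_PAGES];
static const uint32_t demo_page_table[DEMO_NUM_PAGES] = { 2, 0, 3, 1 };

/*
 * Toy translate callback. On success returns 0, stores the physical
 * address in *paddr and the number of bytes the translation is valid
 * for (up to the end of the page) in *len.
 */
static int demo_translate(uint64_t addr, uint64_t *paddr, uint64_t *len)
{
    uint64_t page = addr / DEMO_PAGE_SIZE;
    uint64_t off = addr % DEMO_PAGE_SIZE;

    if (page >= DEMO_NUM_PAGES) {
        return -1;
    }
    *paddr = (uint64_t) demo_page_table[page] * DEMO_PAGE_SIZE + off;
    *len = DEMO_PAGE_SIZE - off;
    return 0;
}

/* Chunked, translated copy; stops at the first failed translation. */
static uint64_t demo_memory_rw(uint64_t addr, void *buf, uint64_t len,
                               int is_write)
{
    uint8_t *p = buf;
    uint64_t done = 0;

    while (len) {
        uint64_t paddr, plen;

        if (demo_translate(addr, &paddr, &plen)) {
            break;
        }
        /* The translation might be valid for more than we need. */
        if (plen > len) {
            plen = len;
        }
        if (is_write) {
            memcpy(&demo_phys[paddr], p, plen);
        } else {
            memcpy(p, &demo_phys[paddr], plen);
        }
        len -= plen;
        addr += plen;
        p += plen;
        done += plen;
    }
    return done;
}
```

A transfer that crosses a page boundary is split into one copy per page,
so contiguous bus addresses may land in non-contiguous physical pages.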
 hw/pci.c           |  191 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 hw/pci.h           |   69 +++++++++++++++++++
 hw/pci_internals.h |   12 +++
 qemu-common.h      |    1 +
 4 files changed, 272 insertions(+), 1 deletions(-)
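
The "maps as remote IOTLBs" contract from the description above can be
sketched on its own as follows. This is an illustration under assumptions,
not the code in this patch: it uses a plain singly linked list instead of
QLIST, and DemoMap, demo_register_map(), demo_invalidate_range() and
demo_cancel_transfer() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef int DemoInvalidateFunc(void *opaque);

/* One outstanding map, similar in spirit to PCIMemoryMap. */
typedef struct DemoMap {
    uint64_t addr;
    uint64_t len;
    DemoInvalidateFunc *invalidate;
    void *opaque;
    struct DemoMap *next;
} DemoMap;

static DemoMap *demo_maps;

/* Called by a device that keeps a map beyond the current I/O context. */
static void demo_register_map(uint64_t addr, uint64_t len,
                              DemoInvalidateFunc *cb, void *opaque)
{
    DemoMap *map = malloc(sizeof(*map));

    assert(map != NULL);
    map->addr = addr;
    map->len = len;
    map->invalidate = cb;
    map->opaque = opaque;
    map->next = demo_maps;
    demo_maps = map;
}

/* Nonzero if the two ranges intersect; assumes nonzero lengths. */
static int demo_ranges_overlap(uint64_t a, uint64_t alen,
                               uint64_t b, uint64_t blen)
{
    return !(b + blen - 1 < a || a + alen - 1 < b);
}

/*
 * Called when the IOMMU invalidates a bus address range: notify and
 * drop every overlapping map. The pointer-to-pointer walk keeps removal
 * safe while iterating. Returns the number of maps dropped.
 */
static int demo_invalidate_range(uint64_t addr, uint64_t len)
{
    DemoMap **link = &demo_maps;
    int dropped = 0;

    while (*link) {
        DemoMap *map = *link;

        if (demo_ranges_overlap(addr, len, map->addr, map->len)) {
            map->invalidate(map->opaque);
            *link = map->next;
            free(map);
            dropped++;
        } else {
            link = &map->next;
        }
    }
    return dropped;
}

/* Example client callback: counts how often a transfer was cancelled. */
static int demo_cancel_transfer(void *opaque)
{
    int *cancelled = opaque;

    (*cancelled)++;
    return 0;
}
```

A device doing AIO DMA would register its map together with a callback
like demo_cancel_transfer(), so that a later invalidation of that range
stops the in-flight transfer from using a stale translation.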

diff --git a/hw/pci.c b/hw/pci.c
index 2dc1577..afcb33c 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -158,6 +158,19 @@ static void pci_device_reset(PCIDevice *dev)
     pci_update_mappings(dev);
 }
 
+static int pci_no_translate(PCIDevice *iommu,
+                            PCIDevice *dev,
+                            pcibus_t addr,
+                            target_phys_addr_t *paddr,
+                            target_phys_addr_t *len,
+                            unsigned perms)
+{
+    *paddr = addr;
+    *len = -1;
+
+    return 0;
+}
+
 static void pci_bus_reset(void *opaque)
 {
     PCIBus *bus = opaque;
@@ -220,7 +233,10 @@ void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
 {
     qbus_create_inplace(&bus->qbus, &pci_bus_info, parent, name);
     assert(PCI_FUNC(devfn_min) == 0);
-    bus->devfn_min = devfn_min;
+
+    bus->devfn_min  = devfn_min;
+    bus->iommu      = NULL;
+    bus->translate  = pci_no_translate;
 
     /* host bridge */
     QLIST_INIT(&bus->child);
@@ -1789,3 +1805,176 @@ static char *pcibus_get_dev_path(DeviceState *dev)
     return strdup(path);
 }
 
+void pci_register_iommu(PCIDevice *iommu,
+                        PCITranslateFunc *translate)
+{
+    iommu->bus->iommu = iommu;
+    iommu->bus->translate = translate;
+}
+
+void pci_memory_rw(PCIDevice *dev,
+                   pcibus_t addr,
+                   uint8_t *buf,
+                   pcibus_t len,
+                   int is_write)
+{
+    int err;
+    unsigned perms;
+    PCIDevice *iommu = dev->bus->iommu;
+    target_phys_addr_t paddr, plen;
+
+    perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
+
+    while (len) {
+        err = dev->bus->translate(iommu, dev, addr, &paddr, &plen, perms);
+        if (err) {
+            return;
+        }
+
+        /* The translation might be valid for larger regions. */
+        if (plen > len) {
+            plen = len;
+        }
+
+        cpu_physical_memory_rw(paddr, buf, plen, is_write);
+
+        len -= plen;
+        addr += plen;
+        buf += plen;
+    }
+}
+
+static void pci_memory_register_map(PCIDevice *dev,
+                                    pcibus_t addr,
+                                    pcibus_t len,
+                                    target_phys_addr_t paddr,
+                                    PCIInvalidateMapFunc *invalidate,
+                                    void *invalidate_opaque)
+{
+    PCIMemoryMap *map;
+
+    map = qemu_malloc(sizeof(PCIMemoryMap));
+    map->addr               = addr;
+    map->len                = len;
+    map->paddr              = paddr;
+    map->invalidate         = invalidate;
+    map->invalidate_opaque  = invalidate_opaque;
+
+    QLIST_INSERT_HEAD(&dev->memory_maps, map, list);
+}
+
+static void pci_memory_unregister_map(PCIDevice *dev,
+                                      target_phys_addr_t paddr,
+                                      target_phys_addr_t len)
+{
+    PCIMemoryMap *map;
+
+    QLIST_FOREACH(map, &dev->memory_maps, list) {
+        if (map->paddr == paddr && map->len == len) {
+            QLIST_REMOVE(map, list);
+            qemu_free(map);
+        }
+    }
+}
+
+void pci_memory_invalidate_range(PCIDevice *dev,
+                                 pcibus_t addr,
+                                 pcibus_t len)
+{
+    PCIMemoryMap *map;
+
+    QLIST_FOREACH(map, &dev->memory_maps, list) {
+        if (ranges_overlap(addr, len, map->addr, map->len)) {
+            map->invalidate(map->invalidate_opaque);
+            QLIST_REMOVE(map, list);
+            qemu_free(map);
+        }
+    }
+}
+
+void *pci_memory_map(PCIDevice *dev,
+                     PCIInvalidateMapFunc *cb,
+                     void *opaque,
+                     pcibus_t addr,
+                     target_phys_addr_t *len,
+                     int is_write)
+{
+    int err;
+    unsigned perms;
+    PCIDevice *iommu = dev->bus->iommu;
+    target_phys_addr_t paddr, plen;
+
+    perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
+
+    plen = *len;
+    err = dev->bus->translate(iommu, dev, addr, &paddr, &plen, perms);
+    if (err) {
+        return NULL;
+    }
+
+    /*
+     * If this is true, the virtual region is contiguous,
+     * but the translated physical region isn't. We just
+     * clamp *len, much like cpu_physical_memory_map() does.
+     */
+    if (plen < *len) {
+        *len = plen;
+    }
+
+    /* We treat maps as remote TLBs to cope with stuff like AIO. */
+    if (cb) {
+        pci_memory_register_map(dev, addr, *len, paddr, cb, opaque);
+    }
+
+    return cpu_physical_memory_map(paddr, len, is_write);
+}
+
+void pci_memory_unmap(PCIDevice *dev,
+                      void *buffer,
+                      target_phys_addr_t len,
+                      int is_write,
+                      target_phys_addr_t access_len)
+{
+    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
+    pci_memory_unregister_map(dev, (target_phys_addr_t) buffer, len);
+}
+
+#define DEFINE_PCI_LD(suffix, size)                                       \
+uint##size##_t pci_ld##suffix(PCIDevice *dev, pcibus_t addr)              \
+{                                                                         \
+    int err;                                                              \
+    target_phys_addr_t paddr, plen;                                       \
+                                                                          \
+    err = dev->bus->translate(dev->bus->iommu, dev,                       \
+                              addr, &paddr, &plen, IOMMU_PERM_READ);      \
+    if (err || (plen < size / 8)) {                                       \
+        return 0;                                                         \
+    }                                                                     \
+                                                                          \
+    return ld##suffix##_phys(paddr);                                      \
+}
+
+#define DEFINE_PCI_ST(suffix, size)                                       \
+void pci_st##suffix(PCIDevice *dev, pcibus_t addr, uint##size##_t val)    \
+{                                                                         \
+    int err;                                                              \
+    target_phys_addr_t paddr, plen;                                       \
+                                                                          \
+    err = dev->bus->translate(dev->bus->iommu, dev,                       \
+                              addr, &paddr, &plen, IOMMU_PERM_WRITE);     \
+    if (err || (plen < size / 8)) {                                       \
+        return;                                                           \
+    }                                                                     \
+                                                                          \
+    st##suffix##_phys(paddr, val);                                        \
+}
+
+DEFINE_PCI_LD(ub, 8)
+DEFINE_PCI_LD(uw, 16)
+DEFINE_PCI_LD(l, 32)
+DEFINE_PCI_LD(q, 64)
+
+DEFINE_PCI_ST(b, 8)
+DEFINE_PCI_ST(w, 16)
+DEFINE_PCI_ST(l, 32)
+DEFINE_PCI_ST(q, 64)
diff --git a/hw/pci.h b/hw/pci.h
index c551f96..c95863a 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -172,6 +172,8 @@ struct PCIDevice {
     char *romfile;
     ram_addr_t rom_offset;
     uint32_t rom_bar;
+
+    QLIST_HEAD(, PCIMemoryMap) memory_maps;
 };
 
 PCIDevice *pci_register_device(PCIBus *bus, const char *name,
@@ -391,4 +393,71 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
     return !(last2 < first1 || last1 < first2);
 }
 
+/*
+ * Memory I/O and PCI IOMMU definitions.
+ */
+
+#define IOMMU_PERM_READ     (1 << 0)
+#define IOMMU_PERM_WRITE    (1 << 1)
+#define IOMMU_PERM_RW       (IOMMU_PERM_READ | IOMMU_PERM_WRITE)
+
+typedef int PCIInvalidateMapFunc(void *opaque);
+typedef int PCITranslateFunc(PCIDevice *iommu,
+                             PCIDevice *dev,
+                             pcibus_t addr,
+                             target_phys_addr_t *paddr,
+                             target_phys_addr_t *len,
+                             unsigned perms);
+
+void pci_memory_rw(PCIDevice *dev,
+                   pcibus_t addr,
+                   uint8_t *buf,
+                   pcibus_t len,
+                   int is_write);
+void *pci_memory_map(PCIDevice *dev,
+                     PCIInvalidateMapFunc *cb,
+                     void *opaque,
+                     pcibus_t addr,
+                     target_phys_addr_t *len,
+                     int is_write);
+void pci_memory_unmap(PCIDevice *dev,
+                      void *buffer,
+                      target_phys_addr_t len,
+                      int is_write,
+                      target_phys_addr_t access_len);
+void pci_register_iommu(PCIDevice *dev, PCITranslateFunc *translate);
+void pci_memory_invalidate_range(PCIDevice *dev, pcibus_t addr, pcibus_t len);
+
+#define DECLARE_PCI_LD(suffix, size)                                    \
+uint##size##_t pci_ld##suffix(PCIDevice *dev, pcibus_t addr);
+
+#define DECLARE_PCI_ST(suffix, size)                                    \
+void pci_st##suffix(PCIDevice *dev, pcibus_t addr, uint##size##_t val);
+
+DECLARE_PCI_LD(ub, 8)
+DECLARE_PCI_LD(uw, 16)
+DECLARE_PCI_LD(l, 32)
+DECLARE_PCI_LD(q, 64)
+
+DECLARE_PCI_ST(b, 8)
+DECLARE_PCI_ST(w, 16)
+DECLARE_PCI_ST(l, 32)
+DECLARE_PCI_ST(q, 64)
+
+static inline void pci_memory_read(PCIDevice *dev,
+                                   pcibus_t addr,
+                                   uint8_t *buf,
+                                   pcibus_t len)
+{
+    pci_memory_rw(dev, addr, buf, len, 0);
+}
+
+static inline void pci_memory_write(PCIDevice *dev,
+                                    pcibus_t addr,
+                                    const uint8_t *buf,
+                                    pcibus_t len)
+{
+    pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
+}
+
 #endif
diff --git a/hw/pci_internals.h b/hw/pci_internals.h
index e3c93a3..fb134b9 100644
--- a/hw/pci_internals.h
+++ b/hw/pci_internals.h
@@ -33,6 +33,9 @@ struct PCIBus {
        Keep a count of the number of devices with raised IRQs.  */
     int nirq;
     int *irq_count;
+
+    PCIDevice                       *iommu;
+    PCITranslateFunc                *translate;
 };
 
 struct PCIBridge {
@@ -44,4 +47,13 @@ struct PCIBridge {
     const char *bus_name;
 };
 
+struct PCIMemoryMap {
+    pcibus_t                        addr;
+    pcibus_t                        len;
+    target_phys_addr_t              paddr;
+    PCIInvalidateMapFunc            *invalidate;
+    void                            *invalidate_opaque;
+    QLIST_ENTRY(PCIMemoryMap)       list;
+};
+
 #endif /* QEMU_PCI_INTERNALS_H */
diff --git a/qemu-common.h b/qemu-common.h
index d735235..8b060e8 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -218,6 +218,7 @@ typedef struct SMBusDevice SMBusDevice;
 typedef struct PCIHostState PCIHostState;
 typedef struct PCIExpressHost PCIExpressHost;
 typedef struct PCIBus PCIBus;
+typedef struct PCIMemoryMap PCIMemoryMap;
 typedef struct PCIDevice PCIDevice;
 typedef struct PCIBridge PCIBridge;
 typedef struct SerialState SerialState;
-- 
1.7.1


* [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support
  2010-08-28 14:54 [PATCH 0/7] AMD IOMMU emulation patchset v4 Eduard - Gabriel Munteanu
@ 2010-08-28 14:54 ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 42+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-08-28 14:54 UTC (permalink / raw)
  To: mst
  Cc: kvm, joro, qemu-devel, blauwirbel, yamahata, paul,
	Eduard - Gabriel Munteanu, avi

PCI devices should access memory through pci_memory_*() instead of
cpu_physical_memory_*(). This also provides support for translation and
access checking in case an IOMMU is emulated.
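
As an illustration only, the translate-then-copy loop that pci_memory_rw()
performs can be sketched in standalone C. Everything prefixed with toy_ is
hypothetical and not part of QEMU: a 16-byte "page", a four-entry page table
standing in for the IOMMU, and a byte array standing in for guest RAM. Unlike
pci_memory_rw(), which returns void and silently stops on a translation error,
this sketch returns the number of bytes transferred so the behaviour can be
checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 16u
#define TOY_NUM_PAGES 4u

/* Stand-in for guest physical memory (hypothetical, not a QEMU object). */
static uint8_t toy_ram[TOY_PAGE_SIZE * TOY_NUM_PAGES];

/* Bus page -> physical page; -1 marks a page the toy IOMMU refuses. */
static const int toy_page_table[TOY_NUM_PAGES] = { 2, 0, -1, 1 };

/*
 * Translate one bus address. On success, *paddr is the physical address
 * and *len is how many bytes remain valid from there (to the page end).
 */
static int toy_translate(uint64_t addr, uint64_t *paddr, uint64_t *len)
{
    uint64_t page = addr / TOY_PAGE_SIZE;
    uint64_t off = addr % TOY_PAGE_SIZE;

    if (page >= TOY_NUM_PAGES || toy_page_table[page] < 0) {
        return -1;
    }
    *paddr = (uint64_t)toy_page_table[page] * TOY_PAGE_SIZE + off;
    *len = TOY_PAGE_SIZE - off;
    return 0;
}

/* Copy len bytes, re-translating whenever a translation runs out. */
static size_t toy_memory_rw(uint64_t addr, uint8_t *buf, uint64_t len,
                            int is_write)
{
    size_t done = 0;

    while (len) {
        uint64_t paddr, plen;

        if (toy_translate(addr, &paddr, &plen)) {
            break;                  /* untranslatable: stop the transfer */
        }
        if (plen > len) {
            plen = len;             /* translation covers more than needed */
        }
        assert(paddr + plen <= sizeof(toy_ram));
        if (is_write) {
            memcpy(&toy_ram[paddr], buf, plen);
        } else {
            memcpy(buf, &toy_ram[paddr], plen);
        }
        len -= plen;
        addr += plen;
        buf += plen;
        done += plen;
    }
    return done;
}
```

A transfer that crosses a page boundary is split into one copy per
translation, which is why a bus-contiguous buffer may land in physically
discontiguous memory.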

Memory maps are treated as remote IOTLBs (that is, translation caches
belonging to the IOMMU-aware device itself). Clients (devices) must
provide callbacks for map invalidation in case these maps are
persistent beyond the current I/O context, e.g. AIO DMA transfers.
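
The "maps as remote IOTLBs" idea can be sketched outside QEMU as follows.
This is a simplified model under stated assumptions, not the code in this
patch: the sketch_ names are hypothetical, a plain singly linked list replaces
QLIST, there is one global list rather than one per device, and ranges are
assumed to have a nonzero length. Each registered map carries a callback; when
the IOMMU invalidates a range, every overlapping map is unlinked, its owner is
notified, and the entry is freed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef int sketch_invalidate_fn(void *opaque);

struct sketch_map {
    uint64_t addr;                    /* bus address of the mapping */
    uint64_t len;                     /* length in bytes, assumed > 0 */
    sketch_invalidate_fn *invalidate; /* called when the map goes stale */
    void *opaque;
    struct sketch_map *next;
};

static struct sketch_map *sketch_maps;

static int sketch_ranges_overlap(uint64_t a, uint64_t alen,
                                 uint64_t b, uint64_t blen)
{
    uint64_t alast = a + alen - 1;
    uint64_t blast = b + blen - 1;

    return !(blast < a || alast < b);
}

/* Remember a long-lived mapping so it can be revoked later. */
static int sketch_register_map(uint64_t addr, uint64_t len,
                               sketch_invalidate_fn *cb, void *opaque)
{
    struct sketch_map *m = malloc(sizeof(*m));

    if (!m) {
        return -1;
    }
    assert(len > 0 && cb != NULL);
    m->addr = addr;
    m->len = len;
    m->invalidate = cb;
    m->opaque = opaque;
    m->next = sketch_maps;
    sketch_maps = m;
    return 0;
}

/* Revoke every map overlapping [addr, addr + len); returns how many. */
static unsigned sketch_invalidate_range(uint64_t addr, uint64_t len)
{
    unsigned revoked = 0;
    struct sketch_map **link = &sketch_maps;

    while (*link) {
        struct sketch_map *m = *link;

        if (sketch_ranges_overlap(addr, len, m->addr, m->len)) {
            *link = m->next;          /* unlink first, so freeing is safe */
            m->invalidate(m->opaque);
            free(m);
            revoked++;
        } else {
            link = &m->next;
        }
    }
    return revoked;
}

/* Example callback: counts how often its owner was told to drop a map. */
static int sketch_count_cb(void *opaque)
{
    ++*(int *)opaque;
    return 0;
}
```

Unlinking an entry before freeing it, and advancing through a saved link
rather than through the freed node, is the design point: the walk never reads
memory that has already been released.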

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/pci.c           |  185 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 hw/pci.h           |   74 +++++++++++++++++++++
 hw/pci_internals.h |   12 ++++
 qemu-common.h      |    1 +
 4 files changed, 271 insertions(+), 1 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 2dc1577..b460905 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -158,6 +158,19 @@ static void pci_device_reset(PCIDevice *dev)
     pci_update_mappings(dev);
 }
 
+static int pci_no_translate(PCIDevice *iommu,
+                            PCIDevice *dev,
+                            pcibus_t addr,
+                            target_phys_addr_t *paddr,
+                            target_phys_addr_t *len,
+                            unsigned perms)
+{
+    *paddr = addr;
+    *len = -1;
+
+    return 0;
+}
+
 static void pci_bus_reset(void *opaque)
 {
     PCIBus *bus = opaque;
@@ -220,7 +233,10 @@ void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
 {
     qbus_create_inplace(&bus->qbus, &pci_bus_info, parent, name);
     assert(PCI_FUNC(devfn_min) == 0);
-    bus->devfn_min = devfn_min;
+
+    bus->devfn_min  = devfn_min;
+    bus->iommu      = NULL;
+    bus->translate  = pci_no_translate;
 
     /* host bridge */
     QLIST_INIT(&bus->child);
@@ -1789,3 +1805,170 @@ static char *pcibus_get_dev_path(DeviceState *dev)
     return strdup(path);
 }
 
+void pci_register_iommu(PCIDevice *iommu,
+                        PCITranslateFunc *translate)
+{
+    iommu->bus->iommu = iommu;
+    iommu->bus->translate = translate;
+}
+
+void pci_memory_rw(PCIDevice *dev,
+                   pcibus_t addr,
+                   uint8_t *buf,
+                   pcibus_t len,
+                   int is_write)
+{
+    int err;
+    unsigned perms;
+    PCIDevice *iommu = dev->bus->iommu;
+    target_phys_addr_t paddr, plen;
+
+    perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
+
+    while (len) {
+        err = dev->bus->translate(iommu, dev, addr, &paddr, &plen, perms);
+        if (err)
+            return;
+
+        /* The translation might be valid for larger regions. */
+        if (plen > len)
+            plen = len;
+
+        cpu_physical_memory_rw(paddr, buf, plen, is_write);
+
+        len -= plen;
+        addr += plen;
+        buf += plen;
+    }
+}
+
+static void pci_memory_register_map(PCIDevice *dev,
+                                    pcibus_t addr,
+                                    pcibus_t len,
+                                    target_phys_addr_t paddr,
+                                    PCIInvalidateMapFunc *invalidate,
+                                    void *invalidate_opaque)
+{
+    PCIMemoryMap *map;
+
+    map = qemu_malloc(sizeof(PCIMemoryMap));
+    map->addr               = addr;
+    map->len                = len;
+    map->paddr              = paddr;
+    map->invalidate         = invalidate;
+    map->invalidate_opaque  = invalidate_opaque;
+
+    QLIST_INSERT_HEAD(&dev->memory_maps, map, list);
+}
+
+static void pci_memory_unregister_map(PCIDevice *dev,
+                                      target_phys_addr_t paddr,
+                                      target_phys_addr_t len)
+{
+    PCIMemoryMap *map, *next;
+
+    QLIST_FOREACH_SAFE(map, &dev->memory_maps, list, next) {
+        if (map->paddr == paddr && map->len == len) {
+            QLIST_REMOVE(map, list);
+            qemu_free(map);
+        }
+    }
+}
+
+void pci_memory_invalidate_range(PCIDevice *dev,
+                                 pcibus_t addr,
+                                 pcibus_t len)
+{
+    PCIMemoryMap *map, *next;
+
+    QLIST_FOREACH_SAFE(map, &dev->memory_maps, list, next) {
+        if (ranges_overlap(addr, len, map->addr, map->len)) {
+            map->invalidate(map->invalidate_opaque);
+            QLIST_REMOVE(map, list);
+            qemu_free(map);
+        }
+    }
+}
+
+void *pci_memory_map(PCIDevice *dev,
+                     PCIInvalidateMapFunc *cb,
+                     void *opaque,
+                     pcibus_t addr,
+                     target_phys_addr_t *len,
+                     int is_write)
+{
+    int err;
+    unsigned perms;
+    PCIDevice *iommu = dev->bus->iommu;
+    target_phys_addr_t paddr, plen;
+
+    perms = is_write ? IOMMU_PERM_WRITE : IOMMU_PERM_READ;
+
+    plen = *len;
+    err = dev->bus->translate(iommu, dev, addr, &paddr, &plen, perms);
+    if (err)
+        return NULL;
+
+    /*
+     * If this is true, the virtual region is contiguous,
+     * but the translated physical region isn't. We just
+     * clamp *len, much like cpu_physical_memory_map() does.
+     */
+    if (plen < *len)
+        *len = plen;
+
+    /* We treat maps as remote TLBs to cope with stuff like AIO. */
+    if (cb)
+        pci_memory_register_map(dev, addr, *len, paddr, cb, opaque);
+
+    return cpu_physical_memory_map(paddr, len, is_write);
+}
+
+void pci_memory_unmap(PCIDevice *dev,
+                      void *buffer,
+                      target_phys_addr_t len,
+                      int is_write,
+                      target_phys_addr_t access_len)
+{
+    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
+    pci_memory_unregister_map(dev, (target_phys_addr_t) buffer, len);
+}
+
+#define DEFINE_PCI_LD(suffix, size)                                       \
+uint##size##_t pci_ld##suffix(PCIDevice *dev, pcibus_t addr)              \
+{                                                                         \
+    int err;                                                              \
+    target_phys_addr_t paddr, plen;                                       \
+                                                                          \
+    err = dev->bus->translate(dev->bus->iommu, dev,                       \
+                              addr, &paddr, &plen, IOMMU_PERM_READ);      \
+    if (err || (plen < size / 8))                                         \
+        return 0;                                                         \
+                                                                          \
+    return ld##suffix##_phys(paddr);                                      \
+}
+
+#define DEFINE_PCI_ST(suffix, size)                                       \
+void pci_st##suffix(PCIDevice *dev, pcibus_t addr, uint##size##_t val)    \
+{                                                                         \
+    int err;                                                              \
+    target_phys_addr_t paddr, plen;                                       \
+                                                                          \
+    err = dev->bus->translate(dev->bus->iommu, dev,                       \
+                              addr, &paddr, &plen, IOMMU_PERM_WRITE);     \
+    if (err || (plen < size / 8))                                         \
+        return;                                                           \
+                                                                          \
+    st##suffix##_phys(paddr, val);                                        \
+}
+
+DEFINE_PCI_LD(ub, 8)
+DEFINE_PCI_LD(uw, 16)
+DEFINE_PCI_LD(l, 32)
+DEFINE_PCI_LD(q, 64)
+
+DEFINE_PCI_ST(b, 8)
+DEFINE_PCI_ST(w, 16)
+DEFINE_PCI_ST(l, 32)
+DEFINE_PCI_ST(q, 64)
+
diff --git a/hw/pci.h b/hw/pci.h
index c551f96..3131016 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -172,6 +172,8 @@ struct PCIDevice {
     char *romfile;
     ram_addr_t rom_offset;
     uint32_t rom_bar;
+
+    QLIST_HEAD(, PCIMemoryMap) memory_maps;
 };
 
 PCIDevice *pci_register_device(PCIBus *bus, const char *name,
@@ -391,4 +393,76 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
     return !(last2 < first1 || last1 < first2);
 }
 
+/*
+ * Memory I/O and PCI IOMMU definitions.
+ */
+
+#define IOMMU_PERM_READ     (1 << 0)
+#define IOMMU_PERM_WRITE    (1 << 1)
+#define IOMMU_PERM_RW       (IOMMU_PERM_READ | IOMMU_PERM_WRITE)
+
+typedef int PCIInvalidateMapFunc(void *opaque);
+typedef int PCITranslateFunc(PCIDevice *iommu,
+                             PCIDevice *dev,
+                             pcibus_t addr,
+                             target_phys_addr_t *paddr,
+                             target_phys_addr_t *len,
+                             unsigned perms);
+
+extern void pci_memory_rw(PCIDevice *dev,
+                          pcibus_t addr,
+                          uint8_t *buf,
+                          pcibus_t len,
+                          int is_write);
+extern void *pci_memory_map(PCIDevice *dev,
+                            PCIInvalidateMapFunc *cb,
+                            void *opaque,
+                            pcibus_t addr,
+                            target_phys_addr_t *len,
+                            int is_write);
+extern void pci_memory_unmap(PCIDevice *dev,
+                             void *buffer,
+                             target_phys_addr_t len,
+                             int is_write,
+                             target_phys_addr_t access_len);
+extern void pci_register_iommu(PCIDevice *dev,
+                               PCITranslateFunc *translate);
+extern void pci_memory_invalidate_range(PCIDevice *dev,
+                                        pcibus_t addr,
+                                        pcibus_t len);
+
+#define DECLARE_PCI_LD(suffix, size)                                    \
+extern uint##size##_t pci_ld##suffix(PCIDevice *dev, pcibus_t addr);
+
+#define DECLARE_PCI_ST(suffix, size)                                    \
+extern void pci_st##suffix(PCIDevice *dev,                              \
+                           pcibus_t addr,                               \
+                           uint##size##_t val);
+
+DECLARE_PCI_LD(ub, 8)
+DECLARE_PCI_LD(uw, 16)
+DECLARE_PCI_LD(l, 32)
+DECLARE_PCI_LD(q, 64)
+
+DECLARE_PCI_ST(b, 8)
+DECLARE_PCI_ST(w, 16)
+DECLARE_PCI_ST(l, 32)
+DECLARE_PCI_ST(q, 64)
+
+static inline void pci_memory_read(PCIDevice *dev,
+                                   pcibus_t addr,
+                                   uint8_t *buf,
+                                   pcibus_t len)
+{
+    pci_memory_rw(dev, addr, buf, len, 0);
+}
+
+static inline void pci_memory_write(PCIDevice *dev,
+                                    pcibus_t addr,
+                                    const uint8_t *buf,
+                                    pcibus_t len)
+{
+    pci_memory_rw(dev, addr, (uint8_t *) buf, len, 1);
+}
+
 #endif
diff --git a/hw/pci_internals.h b/hw/pci_internals.h
index e3c93a3..fb134b9 100644
--- a/hw/pci_internals.h
+++ b/hw/pci_internals.h
@@ -33,6 +33,9 @@ struct PCIBus {
        Keep a count of the number of devices with raised IRQs.  */
     int nirq;
     int *irq_count;
+
+    PCIDevice                       *iommu;
+    PCITranslateFunc                *translate;
 };
 
 struct PCIBridge {
@@ -44,4 +47,13 @@ struct PCIBridge {
     const char *bus_name;
 };
 
+struct PCIMemoryMap {
+    pcibus_t                        addr;
+    pcibus_t                        len;
+    target_phys_addr_t              paddr;
+    PCIInvalidateMapFunc            *invalidate;
+    void                            *invalidate_opaque;
+    QLIST_ENTRY(PCIMemoryMap)       list;
+};
+
 #endif /* QEMU_PCI_INTERNALS_H */
diff --git a/qemu-common.h b/qemu-common.h
index d735235..8b060e8 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -218,6 +218,7 @@ typedef struct SMBusDevice SMBusDevice;
 typedef struct PCIHostState PCIHostState;
 typedef struct PCIExpressHost PCIExpressHost;
 typedef struct PCIBus PCIBus;
+typedef struct PCIMemoryMap PCIMemoryMap;
 typedef struct PCIDevice PCIDevice;
 typedef struct PCIBridge PCIBridge;
 typedef struct SerialState SerialState;
-- 
1.7.1


end of thread, other threads:[~2010-09-02 16:16 UTC | newest]

Thread overview: 42+ messages
2010-08-15 19:27 [PATCH 0/7] AMD IOMMU emulation patches v3 Eduard - Gabriel Munteanu
2010-08-15 19:27 ` [Qemu-devel] " Eduard - Gabriel Munteanu
2010-08-15 19:27 ` [PATCH 1/7] pci: add range_covers_range() Eduard - Gabriel Munteanu
2010-08-15 19:27   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2010-08-18  4:39   ` Isaku Yamahata
2010-08-18  4:39     ` Isaku Yamahata
2010-08-15 19:27 ` [PATCH 2/7] pci: memory access API and IOMMU support Eduard - Gabriel Munteanu
2010-08-15 19:27   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2010-08-18  5:02   ` Isaku Yamahata
2010-08-18  5:02     ` Isaku Yamahata
2010-08-15 19:27 ` [PATCH 3/7] AMD IOMMU emulation Eduard - Gabriel Munteanu
2010-08-15 19:27   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2010-08-16 17:57   ` Blue Swirl
2010-08-16 17:57     ` [Qemu-devel] " Blue Swirl
2010-08-15 19:27 ` [PATCH 4/7] ide: use the PCI memory access interface Eduard - Gabriel Munteanu
2010-08-15 19:27   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2010-08-15 19:27 ` [PATCH 5/7] rtl8139: " Eduard - Gabriel Munteanu
2010-08-15 19:27   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2010-08-15 19:27 ` [PATCH 6/7] eepro100: " Eduard - Gabriel Munteanu
2010-08-15 19:27   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2010-08-15 19:27 ` [PATCH 7/7] ac97: " Eduard - Gabriel Munteanu
2010-08-15 19:27   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2010-08-15 20:42   ` malc
2010-08-15 20:42     ` malc
2010-08-16  1:47 ` [PATCH 0/7] AMD IOMMU emulation patches v3 Anthony Liguori
2010-08-16  1:47   ` [Qemu-devel] " Anthony Liguori
2010-08-28 14:54 [PATCH 0/7] AMD IOMMU emulation patchset v4 Eduard - Gabriel Munteanu
2010-08-28 14:54 ` [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support Eduard - Gabriel Munteanu
2010-08-29 20:44 [PATCH 0/7] AMD IOMMU emulation patchset v4 Blue Swirl
2010-08-29 22:08 ` [Qemu-devel] [PATCH 2/7] pci: memory access API and IOMMU support Eduard - Gabriel Munteanu
2010-09-01 20:10   ` Stefan Weil
2010-09-01 20:10     ` Stefan Weil
2010-09-02  6:00     ` Michael S. Tsirkin
2010-09-02  6:00       ` Michael S. Tsirkin
2010-09-02  9:08       ` Eduard - Gabriel Munteanu
2010-09-02  9:08         ` Eduard - Gabriel Munteanu
2010-09-02 13:24         ` Anthony Liguori
2010-09-02 13:24           ` Anthony Liguori
2010-09-02  8:51     ` Eduard - Gabriel Munteanu
2010-09-02  8:51       ` Eduard - Gabriel Munteanu
2010-09-02 16:05       ` Stefan Weil
2010-09-02 16:05         ` Stefan Weil
2010-09-02 16:14         ` Eduard - Gabriel Munteanu
2010-09-02 16:14           ` Eduard - Gabriel Munteanu
