All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/7] AMD IOMMU emulation patchset
@ 2010-07-14  5:45 ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-14  5:45 UTC (permalink / raw)
  To: joro; +Cc: avi, paul, kvm, qemu-devel, Eduard - Gabriel Munteanu

Hi everybody,

This is my work on the AMD IOMMU emulation project. I've put this, along
with the SeaBIOS patches (which you need to test), in my Git repos here:

http://repo.or.cz/w/qemu-kvm/amd-iommu.git
http://repo.or.cz/w/seabios/amd-iommu.git

While it works for Linux guests (didn't try anything else), it's not yet
complete. But the ported devices work, and so may others that either
don't do DMA (VGA for example) or they're not on a PCI bus.

Passing devices from a guest to a nested guest also works. It seems no
modifications were needed for that (well, except getting a host kernel
on which nested SVM works right ;), and I'd recommend 2.6.34.1), but I
only tested with the rtl8139. Will test soon with a doubly nested guest
to cover that scenario as well, along with testing other emulated hw.

I'd like your opinions on this. Perhaps you could test it and report if
you find any issues. Some things aren't done/complete yet:
- fixing the theoretical AIO issue, but it's not a priority right now
- actually testing that access checking works (should inject faults)
- implementing skipped translation levels
- a translation cache might be a good idea
- implementing features not used by Linux (e.g. interrupt remapping)

That being said, any feedback is welcome.


	Thanks,
	Eduard

P.S.: I'd also like to thank Paul Brook for his help on figuring out
some aspects of the IOMMU layer.


Eduard - Gabriel Munteanu (7):
  Generic IOMMU layer
  AMD IOMMU emulation
  pci: call IOMMU hooks
  ide: IOMMU support
  rtl8139: IOMMU support
  eepro100: IOMMU support
  ac97: IOMMU support

 Makefile.target |    3 +
 configure       |   11 +
 hw/ac97.c       |   20 ++-
 hw/amd_iommu.c  |  621 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/eepro100.c   |  141 +++++++++----
 hw/ide/core.c   |   46 +++--
 hw/iommu.c      |   82 ++++++++
 hw/iommu.h      |  260 +++++++++++++++++++++++
 hw/pc.c         |    4 +
 hw/pc.h         |    3 +
 hw/pci.c        |   21 ++
 hw/pci_ids.h    |    2 +
 hw/pci_regs.h   |    1 +
 hw/qdev.h       |    6 +
 hw/rtl8139.c    |   98 ++++++----
 15 files changed, 1225 insertions(+), 94 deletions(-)
 create mode 100644 hw/amd_iommu.c
 create mode 100644 hw/iommu.c
 create mode 100644 hw/iommu.h


^ permalink raw reply	[flat|nested] 83+ messages in thread

* [Qemu-devel] [RFC PATCH 0/7] AMD IOMMU emulation patchset
@ 2010-07-14  5:45 ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-14  5:45 UTC (permalink / raw)
  To: joro; +Cc: qemu-devel, Eduard - Gabriel Munteanu, avi, kvm, paul

Hi everybody,

This is my work on the AMD IOMMU emulation project. I've put this, along
with the SeaBIOS patches (which you need to test), in my Git repos here:

http://repo.or.cz/w/qemu-kvm/amd-iommu.git
http://repo.or.cz/w/seabios/amd-iommu.git

While it works for Linux guests (didn't try anything else), it's not yet
complete. But the ported devices work, and so may others that either
don't do DMA (VGA for example) or they're not on a PCI bus.

Passing devices from a guest to a nested guest also works. It seems no
modifications were needed for that (well, except getting a host kernel
on which nested SVM works right ;), and I'd recommend 2.6.34.1), but I
only tested with the rtl8139. Will test soon with a doubly nested guest
to cover that scenario as well, along with testing other emulated hw.

I'd like your opinions on this. Perhaps you could test it and report if
you find any issues. Some things aren't done/complete yet:
- fixing the theoretical AIO issue, but it's not a priority right now
- actually testing that access checking works (should inject faults)
- implementing skipped translation levels
- a translation cache might be a good idea
- implementing features not used by Linux (e.g. interrupt remapping)

That being said, any feedback is welcome.


	Thanks,
	Eduard

P.S.: I'd also like to thank Paul Brook for his help on figuring out
some aspects of the IOMMU layer.


Eduard - Gabriel Munteanu (7):
  Generic IOMMU layer
  AMD IOMMU emulation
  pci: call IOMMU hooks
  ide: IOMMU support
  rtl8139: IOMMU support
  eepro100: IOMMU support
  ac97: IOMMU support

 Makefile.target |    3 +
 configure       |   11 +
 hw/ac97.c       |   20 ++-
 hw/amd_iommu.c  |  621 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/eepro100.c   |  141 +++++++++----
 hw/ide/core.c   |   46 +++--
 hw/iommu.c      |   82 ++++++++
 hw/iommu.h      |  260 +++++++++++++++++++++++
 hw/pc.c         |    4 +
 hw/pc.h         |    3 +
 hw/pci.c        |   21 ++
 hw/pci_ids.h    |    2 +
 hw/pci_regs.h   |    1 +
 hw/qdev.h       |    6 +
 hw/rtl8139.c    |   98 ++++++----
 15 files changed, 1225 insertions(+), 94 deletions(-)
 create mode 100644 hw/amd_iommu.c
 create mode 100644 hw/iommu.c
 create mode 100644 hw/iommu.h

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [RFC PATCH 1/7] Generic IOMMU layer
  2010-07-14  5:45 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-07-14  5:45   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-14  5:45 UTC (permalink / raw)
  To: joro; +Cc: avi, paul, kvm, qemu-devel, Eduard - Gabriel Munteanu

This provides an API for abstracting IOMMU functions. Hardware emulation
code can use it to request address translation and access checking. In
the absence of an emulated IOMMU, no translation/checking happens and
I/O goes through as before.

IOMMU emulation code must provide implementation-specific hooks for this
layer.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 Makefile.target |    1 +
 hw/iommu.c      |   82 +++++++++++++++++
 hw/iommu.h      |  260 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/qdev.h       |    6 ++
 4 files changed, 349 insertions(+), 0 deletions(-)
 create mode 100644 hw/iommu.c
 create mode 100644 hw/iommu.h

diff --git a/Makefile.target b/Makefile.target
index 70a9c1b..3f895ae 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -183,6 +183,7 @@ obj-$(CONFIG_VIRTFS) += virtio-9p.o
 obj-y += rwhandler.o
 obj-$(CONFIG_KVM) += kvm.o kvm-all.o
 obj-$(CONFIG_NO_KVM) += kvm-stub.o
+obj-$(CONFIG_IOMMU) += iommu.o
 
 # MSI-X depends on kvm for interrupt injection,
 # so moved it from Makefile.objs to Makefile.target for now
diff --git a/hw/iommu.c b/hw/iommu.c
new file mode 100644
index 0000000..511756b
--- /dev/null
+++ b/hw/iommu.c
@@ -0,0 +1,82 @@
+/*
+ * Generic IOMMU layer
+ *
+ * Copyright (c) 2010 Eduard - Gabriel Munteanu
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include <errno.h>
+
+#include "iommu.h"
+
+struct iommu *iommu_get(DeviceState *dev, DeviceState **real_dev)
+{
+    BusState *bus;
+
+    while (dev) {
+        bus = dev->parent_bus;
+        if (!bus)
+            goto out;
+
+        if (bus->iommu) {
+            *real_dev = dev;
+            return bus->iommu;
+        }
+
+        dev = bus->parent;
+    }
+
+out:
+    *real_dev = NULL;
+    return NULL;
+}
+
+int __iommu_rw(struct iommu *iommu,
+               DeviceState *dev,
+               target_phys_addr_t addr,
+               uint8_t *buf,
+               int len,
+               int is_write)
+{
+    int plen, err;
+    target_phys_addr_t paddr;
+    unsigned perms;
+
+    if (!is_write)
+        perms = IOMMU_PERM_READ;
+    else
+        perms = IOMMU_PERM_WRITE;
+
+    do {
+        err = iommu->translate(iommu, dev, addr, &paddr, &plen, perms);
+        if (err)
+            return err;
+        if (plen > len)
+            plen = len;
+
+        cpu_physical_memory_rw(paddr, buf, plen, is_write);
+
+        len -= plen;
+        addr += plen;
+        buf += plen;
+    } while (len);
+
+    return 0;
+}
diff --git a/hw/iommu.h b/hw/iommu.h
new file mode 100644
index 0000000..01996a6
--- /dev/null
+++ b/hw/iommu.h
@@ -0,0 +1,260 @@
+#ifndef QEMU_IOMMU_H
+#define QEMU_IOMMU_H
+
+#include "pci.h"
+#include "targphys.h"
+#include "qdev.h"
+
+/* Don't use directly. */
+struct iommu {
+    void *opaque;
+
+    void (*register_device)(struct iommu *iommu,
+                            DeviceState *dev);
+    int (*translate)(struct iommu *iommu,
+                     DeviceState *dev,
+                     target_phys_addr_t addr,
+                     target_phys_addr_t *paddr,
+                     int *len,
+                     unsigned perms);
+    int (*start_transaction)(struct iommu *iommu,
+                             DeviceState *dev);
+    void (*end_transaction)(struct iommu *iommu,
+                            DeviceState *dev);
+};
+
+#define IOMMU_PERM_READ   (1 << 0)
+#define IOMMU_PERM_WRITE  (1 << 1)
+
+#define IOMMU_PERM_RW     (IOMMU_PERM_READ | IOMMU_PERM_WRITE)
+
+static inline int iommu_nop_translate(struct iommu *iommu,
+                                      DeviceState *dev,
+                                      target_phys_addr_t addr,
+                                      target_phys_addr_t *paddr,
+                                      int *len,
+                                      unsigned perms)
+{
+    *paddr = addr;
+    *len = INT_MAX;
+
+    return 0;
+}
+
+static inline int iommu_nop_rw(struct iommu *iommu,
+                               DeviceState *dev,
+                               target_phys_addr_t addr,
+                               uint8_t *buf,
+                               int len,
+                               int is_write)
+{
+    cpu_physical_memory_rw(addr, buf, len, is_write);
+
+    return 0;
+}
+
+static inline int iommu_register_device(struct iommu *iommu,
+                                        DeviceState *dev)
+{
+    if (iommu && iommu->register_device)
+        iommu->register_device(iommu, dev);
+
+    return 0;
+}
+
+#ifdef CONFIG_IOMMU
+
+extern struct iommu *iommu_get(DeviceState *dev, DeviceState **real_dev);
+
+/**
+ * Translates an address for the given device and performs access checking.
+ *
+ * Defined in implementation-specific IOMMU code.
+ *
+ * @iommu   IOMMU
+ * @dev     qdev device
+ * @addr    address to be translated
+ * @paddr   translated address
+ * @len     number of bytes for which the translation is valid
+ * @rw      read or write?
+ *
+ * Returns 0 iff translation and access checking succeeded.
+ */
+static inline int iommu_translate(struct iommu *iommu,
+                                  DeviceState *dev,
+                                  target_phys_addr_t addr,
+                                  target_phys_addr_t *paddr,
+                                  int *len,
+                                  unsigned perms)
+{
+    if (iommu && iommu->translate)
+        return iommu->translate(iommu, dev, addr, paddr, len, perms);
+
+    return iommu_nop_translate(iommu, dev, addr, paddr, len, perms);
+}
+
+extern int __iommu_rw(struct iommu *iommu,
+                      DeviceState *dev,
+                      target_phys_addr_t addr,
+                      uint8_t *buf,
+                      int len,
+                      int is_write);
+
+/**
+ * Performs I/O with address translation and access checking.
+ *
+ * Defined in generic IOMMU code.
+ *
+ * @iommu   IOMMU
+ * @dev     qdev device
+ * @addr    address where to perform I/O
+ * @buf     buffer to read from or write to
+ * @len     length of the operation
+ * @rw      read or write?
+ *
+ * Returns 0 iff the I/O operation succeeded.
+ */
+static inline int iommu_rw(struct iommu *iommu,
+                           DeviceState *dev,
+                           target_phys_addr_t addr,
+                           uint8_t *buf,
+                           int len,
+                           int is_write)
+{
+    if (iommu && iommu->translate)
+        return __iommu_rw(iommu, dev, addr, buf, len, is_write);
+
+    return iommu_nop_rw(iommu, dev, addr, buf, len, is_write);
+}
+
+static inline int iommu_start_transaction(struct iommu *iommu,
+                                          DeviceState *dev)
+{
+    if (iommu && iommu->start_transaction)
+        return iommu->start_transaction(iommu, dev);
+
+    return 0;
+}
+
+static inline void iommu_end_transaction(struct iommu *iommu,
+                                         DeviceState *dev)
+{
+    if (iommu && iommu->end_transaction)
+        iommu->end_transaction(iommu, dev);
+}
+
+#define DEFINE_LD_PHYS(suffix, size)                                        \
+static inline uint##size##_t iommu_ld##suffix(struct iommu *iommu,          \
+                                             DeviceState *dev,              \
+                                             target_phys_addr_t addr)       \
+{                                                                           \
+    int len, err;                                                           \
+    target_phys_addr_t paddr;                                               \
+                                                                            \
+    err = iommu_translate(iommu, dev, addr, &paddr, &len, IOMMU_PERM_READ); \
+    if (err || (len < size / 8))                                            \
+        return err;                                                         \
+    return ld##suffix##_phys(paddr);                                        \
+}
+
+#define DEFINE_ST_PHYS(suffix, size)                                        \
+static inline void iommu_st##suffix(struct iommu *iommu,                    \
+                                    DeviceState *dev,                       \
+                                    target_phys_addr_t addr,                \
+                                    uint##size##_t val)                     \
+{                                                                           \
+    int len, err;                                                           \
+    target_phys_addr_t paddr;                                               \
+                                                                            \
+    err = iommu_translate(iommu, dev, addr, &paddr, &len, IOMMU_PERM_WRITE);\
+    if (err || (len < size / 8))                                            \
+        return;                                                             \
+    st##suffix##_phys(paddr, val);                                          \
+}
+
+#else /* CONFIG_IOMMU */
+
+static inline struct iommu *iommu_get(DeviceState *dev, DeviceState **real_dev)
+{
+    return NULL;
+}
+
+static inline int iommu_translate(struct iommu *iommu,
+                                  DeviceState *dev,
+                                  target_phys_addr_t addr,
+                                  target_phys_addr_t *paddr,
+                                  int *len,
+                                  unsigned perms)
+{
+    return iommu_nop_translate(iommu, dev, addr, paddr, len, perms);
+}
+
+static inline int iommu_rw(struct iommu *iommu,
+                           DeviceState *dev,
+                           target_phys_addr_t addr,
+                           uint8_t *buf,
+                           int len,
+                           int is_write)
+{
+    return iommu_nop_rw(iommu, dev, addr, buf, len, is_write);
+}
+
+static inline int iommu_start_transaction(struct iommu *iommu,
+                                          DeviceState *dev)
+{
+    return 0;
+}
+
+static inline void iommu_end_transaction(struct iommu *iommu,
+                                         DeviceState *dev)
+{
+}
+
+#define DEFINE_LD_PHYS(suffix, size)                                        \
+static inline uint##size##_t iommu_ld##suffix(struct iommu *iommu,          \
+                                             DeviceState *dev,              \
+                                             target_phys_addr_t addr)       \
+{                                                                           \
+    return ld##suffix##_phys(addr);                                         \
+}
+
+#define DEFINE_ST_PHYS(suffix, size)                                        \
+static inline void iommu_st##suffix(struct iommu *iommu,                    \
+                                    DeviceState *dev,                       \
+                                    target_phys_addr_t addr,                \
+                                    uint##size##_t val)                     \
+{                                                                           \
+    st##suffix##_phys(addr, val);                                           \
+}
+
+#endif /* CONFIG_IOMMU */
+
+static inline int iommu_read(struct iommu *iommu,
+                             DeviceState *dev,
+                             target_phys_addr_t addr,
+                             uint8_t *buf,
+                             int len)
+{
+    return iommu_rw(iommu, dev, addr, buf, len, 0);
+}
+
+static inline int iommu_write(struct iommu *iommu,
+                              DeviceState *dev,
+                              target_phys_addr_t addr,
+                              const uint8_t *buf,
+                              int len)
+{
+    return iommu_rw(iommu, dev, addr, (uint8_t *) buf, len, 1);
+}
+
+DEFINE_LD_PHYS(ub, 8)
+DEFINE_LD_PHYS(uw, 16)
+DEFINE_LD_PHYS(l, 32)
+DEFINE_LD_PHYS(q, 64)
+
+DEFINE_ST_PHYS(b, 8)
+DEFINE_ST_PHYS(w, 16)
+DEFINE_ST_PHYS(l, 32)
+DEFINE_ST_PHYS(q, 64)
+
+#endif
diff --git a/hw/qdev.h b/hw/qdev.h
index be5ad67..deb71fd 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -56,6 +56,8 @@ struct BusInfo {
     Property *props;
 };
 
+struct iommu;
+
 struct BusState {
     DeviceState *parent;
     BusInfo *info;
@@ -64,6 +66,10 @@ struct BusState {
     int qdev_allocated;
     QLIST_HEAD(, DeviceState) children;
     QLIST_ENTRY(BusState) sibling;
+
+#ifdef CONFIG_IOMMU
+    struct iommu *iommu;
+#endif
 };
 
 struct Property {
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [Qemu-devel] [RFC PATCH 1/7] Generic IOMMU layer
@ 2010-07-14  5:45   ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-14  5:45 UTC (permalink / raw)
  To: joro; +Cc: qemu-devel, Eduard - Gabriel Munteanu, avi, kvm, paul

This provides an API for abstracting IOMMU functions. Hardware emulation
code can use it to request address translation and access checking. In
the absence of an emulated IOMMU, no translation/checking happens and
I/O goes through as before.

IOMMU emulation code must provide implementation-specific hooks for this
layer.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 Makefile.target |    1 +
 hw/iommu.c      |   82 +++++++++++++++++
 hw/iommu.h      |  260 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/qdev.h       |    6 ++
 4 files changed, 349 insertions(+), 0 deletions(-)
 create mode 100644 hw/iommu.c
 create mode 100644 hw/iommu.h

diff --git a/Makefile.target b/Makefile.target
index 70a9c1b..3f895ae 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -183,6 +183,7 @@ obj-$(CONFIG_VIRTFS) += virtio-9p.o
 obj-y += rwhandler.o
 obj-$(CONFIG_KVM) += kvm.o kvm-all.o
 obj-$(CONFIG_NO_KVM) += kvm-stub.o
+obj-$(CONFIG_IOMMU) += iommu.o
 
 # MSI-X depends on kvm for interrupt injection,
 # so moved it from Makefile.objs to Makefile.target for now
diff --git a/hw/iommu.c b/hw/iommu.c
new file mode 100644
index 0000000..511756b
--- /dev/null
+++ b/hw/iommu.c
@@ -0,0 +1,82 @@
+/*
+ * Generic IOMMU layer
+ *
+ * Copyright (c) 2010 Eduard - Gabriel Munteanu
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include <errno.h>
+
+#include "iommu.h"
+
+struct iommu *iommu_get(DeviceState *dev, DeviceState **real_dev)
+{
+    BusState *bus;
+
+    while (dev) {
+        bus = dev->parent_bus;
+        if (!bus)
+            goto out;
+
+        if (bus->iommu) {
+            *real_dev = dev;
+            return bus->iommu;
+        }
+
+        dev = bus->parent;
+    }
+
+out:
+    *real_dev = NULL;
+    return NULL;
+}
+
+int __iommu_rw(struct iommu *iommu,
+               DeviceState *dev,
+               target_phys_addr_t addr,
+               uint8_t *buf,
+               int len,
+               int is_write)
+{
+    int plen, err;
+    target_phys_addr_t paddr;
+    unsigned perms;
+
+    if (!is_write)
+        perms = IOMMU_PERM_READ;
+    else
+        perms = IOMMU_PERM_WRITE;
+
+    do {
+        err = iommu->translate(iommu, dev, addr, &paddr, &plen, perms);
+        if (err)
+            return err;
+        if (plen > len)
+            plen = len;
+
+        cpu_physical_memory_rw(paddr, buf, plen, is_write);
+
+        len -= plen;
+        addr += plen;
+        buf += plen;
+    } while (len);
+
+    return 0;
+}
diff --git a/hw/iommu.h b/hw/iommu.h
new file mode 100644
index 0000000..01996a6
--- /dev/null
+++ b/hw/iommu.h
@@ -0,0 +1,260 @@
+#ifndef QEMU_IOMMU_H
+#define QEMU_IOMMU_H
+
+#include "pci.h"
+#include "targphys.h"
+#include "qdev.h"
+
+/* Don't use directly. */
+struct iommu {
+    void *opaque;
+
+    void (*register_device)(struct iommu *iommu,
+                            DeviceState *dev);
+    int (*translate)(struct iommu *iommu,
+                     DeviceState *dev,
+                     target_phys_addr_t addr,
+                     target_phys_addr_t *paddr,
+                     int *len,
+                     unsigned perms);
+    int (*start_transaction)(struct iommu *iommu,
+                             DeviceState *dev);
+    void (*end_transaction)(struct iommu *iommu,
+                            DeviceState *dev);
+};
+
+#define IOMMU_PERM_READ   (1 << 0)
+#define IOMMU_PERM_WRITE  (1 << 1)
+
+#define IOMMU_PERM_RW     (IOMMU_PERM_READ | IOMMU_PERM_WRITE)
+
+static inline int iommu_nop_translate(struct iommu *iommu,
+                                      DeviceState *dev,
+                                      target_phys_addr_t addr,
+                                      target_phys_addr_t *paddr,
+                                      int *len,
+                                      unsigned perms)
+{
+    *paddr = addr;
+    *len = INT_MAX;
+
+    return 0;
+}
+
+static inline int iommu_nop_rw(struct iommu *iommu,
+                               DeviceState *dev,
+                               target_phys_addr_t addr,
+                               uint8_t *buf,
+                               int len,
+                               int is_write)
+{
+    cpu_physical_memory_rw(addr, buf, len, is_write);
+
+    return 0;
+}
+
+static inline int iommu_register_device(struct iommu *iommu,
+                                        DeviceState *dev)
+{
+    if (iommu && iommu->register_device)
+        iommu->register_device(iommu, dev);
+
+    return 0;
+}
+
+#ifdef CONFIG_IOMMU
+
+extern struct iommu *iommu_get(DeviceState *dev, DeviceState **real_dev);
+
+/**
+ * Translates an address for the given device and performs access checking.
+ *
+ * Defined in implementation-specific IOMMU code.
+ *
+ * @iommu   IOMMU
+ * @dev     qdev device
+ * @addr    address to be translated
+ * @paddr   translated address
+ * @len     number of bytes for which the translation is valid
+ * @rw      read or write?
+ *
+ * Returns 0 iff translation and access checking succeeded.
+ */
+static inline int iommu_translate(struct iommu *iommu,
+                                  DeviceState *dev,
+                                  target_phys_addr_t addr,
+                                  target_phys_addr_t *paddr,
+                                  int *len,
+                                  unsigned perms)
+{
+    if (iommu && iommu->translate)
+        return iommu->translate(iommu, dev, addr, paddr, len, perms);
+
+    return iommu_nop_translate(iommu, dev, addr, paddr, len, perms);
+}
+
+extern int __iommu_rw(struct iommu *iommu,
+                      DeviceState *dev,
+                      target_phys_addr_t addr,
+                      uint8_t *buf,
+                      int len,
+                      int is_write);
+
+/**
+ * Performs I/O with address translation and access checking.
+ *
+ * Defined in generic IOMMU code.
+ *
+ * @iommu   IOMMU
+ * @dev     qdev device
+ * @addr    address where to perform I/O
+ * @buf     buffer to read from or write to
+ * @len     length of the operation
+ * @rw      read or write?
+ *
+ * Returns 0 iff the I/O operation succeeded.
+ */
+static inline int iommu_rw(struct iommu *iommu,
+                           DeviceState *dev,
+                           target_phys_addr_t addr,
+                           uint8_t *buf,
+                           int len,
+                           int is_write)
+{
+    if (iommu && iommu->translate)
+        return __iommu_rw(iommu, dev, addr, buf, len, is_write);
+
+    return iommu_nop_rw(iommu, dev, addr, buf, len, is_write);
+}
+
+static inline int iommu_start_transaction(struct iommu *iommu,
+                                          DeviceState *dev)
+{
+    if (iommu && iommu->start_transaction)
+        return iommu->start_transaction(iommu, dev);
+
+    return 0;
+}
+
+static inline void iommu_end_transaction(struct iommu *iommu,
+                                         DeviceState *dev)
+{
+    if (iommu && iommu->end_transaction)
+        iommu->end_transaction(iommu, dev);
+}
+
+#define DEFINE_LD_PHYS(suffix, size)                                        \
+static inline uint##size##_t iommu_ld##suffix(struct iommu *iommu,          \
+                                             DeviceState *dev,              \
+                                             target_phys_addr_t addr)       \
+{                                                                           \
+    int len, err;                                                           \
+    target_phys_addr_t paddr;                                               \
+                                                                            \
+    err = iommu_translate(iommu, dev, addr, &paddr, &len, IOMMU_PERM_READ); \
+    if (err || (len < size / 8))                                            \
+        return err;                                                         \
+    return ld##suffix##_phys(paddr);                                        \
+}
+
+#define DEFINE_ST_PHYS(suffix, size)                                        \
+static inline void iommu_st##suffix(struct iommu *iommu,                    \
+                                    DeviceState *dev,                       \
+                                    target_phys_addr_t addr,                \
+                                    uint##size##_t val)                     \
+{                                                                           \
+    int len, err;                                                           \
+    target_phys_addr_t paddr;                                               \
+                                                                            \
+    err = iommu_translate(iommu, dev, addr, &paddr, &len, IOMMU_PERM_WRITE);\
+    if (err || (len < size / 8))                                            \
+        return;                                                             \
+    st##suffix##_phys(paddr, val);                                          \
+}
+
+#else /* CONFIG_IOMMU */
+
+static inline struct iommu *iommu_get(DeviceState *dev, DeviceState **real_dev)
+{
+    return NULL;
+}
+
+static inline int iommu_translate(struct iommu *iommu,
+                                  DeviceState *dev,
+                                  target_phys_addr_t addr,
+                                  target_phys_addr_t *paddr,
+                                  int *len,
+                                  unsigned perms)
+{
+    return iommu_nop_translate(iommu, dev, addr, paddr, len, perms);
+}
+
+static inline int iommu_rw(struct iommu *iommu,
+                           DeviceState *dev,
+                           target_phys_addr_t addr,
+                           uint8_t *buf,
+                           int len,
+                           int is_write)
+{
+    return iommu_nop_rw(iommu, dev, addr, buf, len, is_write);
+}
+
+static inline int iommu_start_transaction(struct iommu *iommu,
+                                          DeviceState *dev)
+{
+    return 0;
+}
+
+static inline void iommu_end_transaction(struct iommu *iommu,
+                                         DeviceState *dev)
+{
+}
+
+#define DEFINE_LD_PHYS(suffix, size)                                        \
+static inline uint##size##_t iommu_ld##suffix(struct iommu *iommu,          \
+                                             DeviceState *dev,              \
+                                             target_phys_addr_t addr)       \
+{                                                                           \
+    return ld##suffix##_phys(addr);                                         \
+}
+
+#define DEFINE_ST_PHYS(suffix, size)                                        \
+static inline void iommu_st##suffix(struct iommu *iommu,                    \
+                                    DeviceState *dev,                       \
+                                    target_phys_addr_t addr,                \
+                                    uint##size##_t val)                     \
+{                                                                           \
+    st##suffix##_phys(addr, val);                                           \
+}
+
+#endif /* CONFIG_IOMMU */
+
+static inline int iommu_read(struct iommu *iommu,
+                             DeviceState *dev,
+                             target_phys_addr_t addr,
+                             uint8_t *buf,
+                             int len)
+{
+    return iommu_rw(iommu, dev, addr, buf, len, 0);
+}
+
+static inline int iommu_write(struct iommu *iommu,
+                              DeviceState *dev,
+                              target_phys_addr_t addr,
+                              const uint8_t *buf,
+                              int len)
+{
+    return iommu_rw(iommu, dev, addr, (uint8_t *) buf, len, 1);
+}
+
+DEFINE_LD_PHYS(ub, 8)
+DEFINE_LD_PHYS(uw, 16)
+DEFINE_LD_PHYS(l, 32)
+DEFINE_LD_PHYS(q, 64)
+
+DEFINE_ST_PHYS(b, 8)
+DEFINE_ST_PHYS(w, 16)
+DEFINE_ST_PHYS(l, 32)
+DEFINE_ST_PHYS(q, 64)
+
+#endif
diff --git a/hw/qdev.h b/hw/qdev.h
index be5ad67..deb71fd 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -56,6 +56,8 @@ struct BusInfo {
     Property *props;
 };
 
+struct iommu;
+
 struct BusState {
     DeviceState *parent;
     BusInfo *info;
@@ -64,6 +66,10 @@ struct BusState {
     int qdev_allocated;
     QLIST_HEAD(, DeviceState) children;
     QLIST_ENTRY(BusState) sibling;
+
+#ifdef CONFIG_IOMMU
+    struct iommu *iommu;
+#endif
 };
 
 struct Property {
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [RFC PATCH 2/7] AMD IOMMU emulation
  2010-07-14  5:45 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-07-14  5:45   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-14  5:45 UTC (permalink / raw)
  To: joro; +Cc: avi, paul, kvm, qemu-devel, Eduard - Gabriel Munteanu

This introduces emulation for the AMD IOMMU, described in "AMD I/O
Virtualization Technology (IOMMU) Specification".

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 Makefile.target |    2 +
 configure       |   11 +
 hw/amd_iommu.c  |  621 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/pc.c         |    4 +
 hw/pc.h         |    3 +
 hw/pci_ids.h    |    2 +
 hw/pci_regs.h   |    1 +
 7 files changed, 644 insertions(+), 0 deletions(-)
 create mode 100644 hw/amd_iommu.c

diff --git a/Makefile.target b/Makefile.target
index 3f895ae..eb164ba 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -220,6 +220,8 @@ obj-i386-y += pcspk.o i8254.o
 obj-i386-$(CONFIG_KVM_PIT) += i8254-kvm.o
 obj-i386-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += device-assignment.o
 
+obj-i386-$(CONFIG_AMD_IOMMU) += amd_iommu.o
+
 # Hardware support
 obj-ia64-y += ide.o pckbd.o vga.o $(SOUND_HW) dma.o $(AUDIODRV)
 obj-ia64-y += fdc.o mc146818rtc.o serial.o i8259.o ipf.o
diff --git a/configure b/configure
index af50607..a0730b7 100755
--- a/configure
+++ b/configure
@@ -317,6 +317,7 @@ io_thread="no"
 mixemu="no"
 kvm_cap_pit=""
 kvm_cap_device_assignment=""
+amd_iommu="no"
 kerneldir=""
 aix="no"
 blobs="yes"
@@ -629,6 +630,8 @@ for opt do
   ;;
   --enable-kvm-device-assignment) kvm_cap_device_assignment="yes"
   ;;
+  --enable-amd-iommu-emul) amd_iommu="yes"
+  ;;
   --enable-profiler) profiler="yes"
   ;;
   --enable-cocoa)
@@ -871,6 +874,8 @@ echo "  --disable-kvm-pit        disable KVM pit support"
 echo "  --enable-kvm-pit         enable KVM pit support"
 echo "  --disable-kvm-device-assignment  disable KVM device assignment support"
 echo "  --enable-kvm-device-assignment   enable KVM device assignment support"
+echo "  --disable-amd-iommu-emul disable AMD IOMMU emulation"
+echo "  --enable-amd-iommu-emul  enable AMD IOMMU emulation"
 echo "  --disable-nptl           disable usermode NPTL support"
 echo "  --enable-nptl            enable usermode NPTL support"
 echo "  --enable-system          enable all system emulation targets"
@@ -2251,6 +2256,7 @@ echo "Install blobs     $blobs"
 echo "KVM support       $kvm"
 echo "KVM PIT support   $kvm_cap_pit"
 echo "KVM device assig. $kvm_cap_device_assignment"
+echo "AMD IOMMU emul.   $amd_iommu"
 echo "fdt support       $fdt"
 echo "preadv support    $preadv"
 echo "fdatasync         $fdatasync"
@@ -2645,6 +2651,11 @@ case "$target_arch2" in
   x86_64)
     TARGET_BASE_ARCH=i386
     target_phys_bits=64
+    if test "$amd_iommu" = "yes"; then
+      echo "CONFIG_AMD_IOMMU=y" >> $config_target_mak
+      echo "CONFIG_IOMMU=y" >> $config_target_mak
+      echo "CONFIG_IOMMU=y" >> $config_host_mak
+    fi
   ;;
   ia64)
     target_phys_bits=64
diff --git a/hw/amd_iommu.c b/hw/amd_iommu.c
new file mode 100644
index 0000000..e72f0c0
--- /dev/null
+++ b/hw/amd_iommu.c
@@ -0,0 +1,621 @@
+/*
+ * AMD IOMMU emulation
+ *
+ * Copyright (c) 2010 Eduard - Gabriel Munteanu
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "pc.h"
+#include "hw.h"
+#include "pci.h"
+#include "iommu.h"
+
+/* Capability registers */
+#define CAPAB_HEADER            0x00
+#define   CAPAB_REV_TYPE        0x02
+#define   CAPAB_FLAGS           0x03
+#define CAPAB_BAR_LOW           0x04
+#define CAPAB_BAR_HIGH          0x08
+#define CAPAB_RANGE             0x0C
+#define CAPAB_MISC              0x10
+
+#define CAPAB_SIZE              0x14
+
+/* Capability header data */
+#define CAPAB_FLAG_IOTLBSUP     (1 << 0)
+#define CAPAB_FLAG_HTTUNNEL     (1 << 1)
+#define CAPAB_FLAG_NPCACHE      (1 << 2)
+#define CAPAB_INIT_REV          (1 << 3)
+#define CAPAB_INIT_TYPE         3
+#define CAPAB_INIT_REV_TYPE     (CAPAB_REV | CAPAB_TYPE)
+#define CAPAB_INIT_FLAGS        (CAPAB_FLAG_NPCACHE | CAPAB_FLAG_HTTUNNEL)
+#define CAPAB_INIT_MISC         (64 << 15) | (48 << 8)
+#define CAPAB_BAR_MASK          ~((1UL << 14) - 1)
+
+/* MMIO registers */
+#define MMIO_DEVICE_TABLE       0x0000
+#define MMIO_COMMAND_BASE       0x0008
+#define MMIO_EVENT_BASE         0x0010
+#define MMIO_CONTROL            0x0018
+#define MMIO_EXCL_BASE          0x0020
+#define MMIO_EXCL_LIMIT         0x0028
+#define MMIO_COMMAND_HEAD       0x2000
+#define MMIO_COMMAND_TAIL       0x2008
+#define MMIO_EVENT_HEAD         0x2010
+#define MMIO_EVENT_TAIL         0x2018
+#define MMIO_STATUS             0x2020
+
+#define MMIO_SIZE               0x4000
+
+#define MMIO_DEVTAB_SIZE_MASK   ((1ULL << 12) - 1)
+#define MMIO_DEVTAB_BASE_MASK   (((1ULL << 52) - 1) & ~MMIO_DEVTAB_SIZE_MASK)
+#define MMIO_DEVTAB_ENTRY_SIZE  32
+#define MMIO_DEVTAB_SIZE_UNIT   4096
+
+#define MMIO_CMDBUF_SIZE_BYTE       (MMIO_COMMAND_BASE + 7)
+#define MMIO_CMDBUF_SIZE_MASK       0x0F
+#define MMIO_CMDBUF_BASE_MASK       MMIO_DEVTAB_BASE_MASK
+#define MMIO_CMDBUF_DEFAULT_SIZE    8
+#define MMIO_CMDBUF_HEAD_MASK       (((1ULL << 19) - 1) & ~0x0F)
+#define MMIO_CMDBUF_TAIL_MASK       MMIO_EVTLOG_HEAD_MASK
+
+#define MMIO_EVTLOG_SIZE_BYTE       (MMIO_EVENT_BASE + 7)
+#define MMIO_EVTLOG_SIZE_MASK       MMIO_CMDBUF_SIZE_MASK
+#define MMIO_EVTLOG_BASE_MASK       MMIO_CMDBUF_BASE_MASK
+#define MMIO_EVTLOG_DEFAULT_SIZE    MMIO_CMDBUF_DEFAULT_SIZE
+#define MMIO_EVTLOG_HEAD_MASK       (((1ULL << 19) - 1) & ~0x0F)
+#define MMIO_EVTLOG_TAIL_MASK       MMIO_EVTLOG_HEAD_MASK
+
+#define MMIO_EXCL_BASE_MASK         MMIO_DEVTAB_BASE_MASK
+#define MMIO_EXCL_ENABLED_MASK      (1ULL << 0)
+#define MMIO_EXCL_ALLOW_MASK        (1ULL << 1)
+#define MMIO_EXCL_LIMIT_MASK        MMIO_DEVTAB_BASE_MASK
+#define MMIO_EXCL_LIMIT_LOW         0xFFF
+
+#define MMIO_CONTROL_IOMMUEN        (1ULL << 0)
+#define MMIO_CONTROL_HTTUNEN        (1ULL << 1)
+#define MMIO_CONTROL_EVENTLOGEN     (1ULL << 2)
+#define MMIO_CONTROL_EVENTINTEN     (1ULL << 3)
+#define MMIO_CONTROL_COMWAITINTEN   (1ULL << 4)
+#define MMIO_CONTROL_CMDBUFEN       (1ULL << 12)
+
+#define MMIO_STATUS_EVTLOG_OF       (1ULL << 0)
+#define MMIO_STATUS_EVTLOG_INTR     (1ULL << 1)
+#define MMIO_STATUS_COMWAIT_INTR    (1ULL << 2)
+#define MMIO_STATUS_EVTLOG_RUN      (1ULL << 3)
+#define MMIO_STATUS_CMDBUF_RUN      (1ULL << 4)
+
+#define CMDBUF_ID_BYTE              0x07
+#define CMDBUF_ID_RSHIFT            4
+#define CMDBUF_ENTRY_SIZE           0x10
+
+#define CMD_COMPLETION_WAIT         0x01
+#define CMD_INVAL_DEVTAB_ENTRY      0x02
+#define CMD_INVAL_IOMMU_PAGES       0x03
+#define CMD_INVAL_IOTLB_PAGES       0x04
+#define CMD_INVAL_INTR_TABLE        0x05
+
+#define DEVTAB_ENTRY_SIZE           32
+
+/* Device table entry bits 0:63 */
+#define DEV_VALID                   (1ULL << 0)
+#define DEV_TRANSLATION_VALID       (1ULL << 1)
+#define DEV_MODE_MASK               0x7
+#define DEV_MODE_RSHIFT             9
+#define DEV_PT_ROOT_MASK            0xFFFFFFFFFF000
+#define DEV_PT_ROOT_RSHIFT          12
+#define DEV_PERM_SHIFT              61
+#define DEV_PERM_READ               (1ULL << 61)
+#define DEV_PERM_WRITE              (1ULL << 62)
+
+/* Device table entry bits 64:127 */
+#define DEV_DOMAIN_ID_MASK          ((1ULL << 16) - 1)
+#define DEV_IOTLB_SUPPORT           (1ULL << 17)
+#define DEV_SUPPRESS_PF             (1ULL << 18)
+#define DEV_SUPPRESS_ALL_PF         (1ULL << 19)
+#define DEV_IOCTL_MASK              ~3
+#define DEV_IOCTL_RSHIFT            20
+#define   DEV_IOCTL_DENY            0
+#define   DEV_IOCTL_PASSTHROUGH     1
+#define   DEV_IOCTL_TRANSLATE       2
+#define DEV_CACHE                   (1ULL << 37)
+#define DEV_SNOOP_DISABLE           (1ULL << 38)
+#define DEV_EXCL                    (1ULL << 39)
+
+struct amd_iommu_state {
+    PCIDevice                   dev;
+
+    int                         capab_offset;
+    unsigned char               *capab;
+
+    int                         mmio_index;
+    target_phys_addr_t          mmio_addr;
+    unsigned char               *mmio_buf;
+    int                         mmio_enabled;
+
+    int                         enabled;
+    int                         ats_enabled;
+
+    target_phys_addr_t          devtab;
+    size_t                      devtab_len;
+
+    target_phys_addr_t          cmdbuf;
+    int                         cmdbuf_enabled;
+    size_t                      cmdbuf_len;
+    size_t                      cmdbuf_head;
+    size_t                      cmdbuf_tail;
+    int                         completion_wait_intr;
+
+    target_phys_addr_t          evtlog;
+    int                         evtlog_enabled;
+    int                         evtlog_intr;
+    size_t                      evtlog_len;
+    size_t                      evtlog_head;
+    size_t                      evtlog_tail;
+
+    target_phys_addr_t          excl_base;
+    target_phys_addr_t          excl_limit;
+    int                         excl_enabled;
+    int                         excl_allow;
+};
+
+static void amd_iommu_completion_wait(struct amd_iommu_state *st,
+                                      uint8_t *cmd)
+{
+    uint64_t addr;
+
+    if (cmd[0] & 1) {
+        addr = le64_to_cpu(*(uint64_t *) cmd) & 0xFFFFFFFFFFFF8;
+        cpu_physical_memory_write(addr, cmd + 8, 8);
+    }
+
+    if (cmd[0] & 2)
+        st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_COMWAIT_INTR;
+}
+
+static void amd_iommu_cmdbuf_run(struct amd_iommu_state *st)
+{
+    uint8_t cmd[16];
+    int type;
+
+    if (!st->cmdbuf_enabled)
+        return;
+
+    st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_CMDBUF_RUN;
+    st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_RUN;
+    st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_COMWAIT_INTR;
+
+    if (st->cmdbuf_head == st->cmdbuf_tail)
+        return;
+
+    cpu_physical_memory_read(st->cmdbuf, cmd, 16);
+    type = cmd[CMDBUF_ID_BYTE] >> CMDBUF_ID_RSHIFT;
+    switch (type) {
+        case CMD_COMPLETION_WAIT:
+            amd_iommu_completion_wait(st, cmd);
+            break;
+        case CMD_INVAL_DEVTAB_ENTRY:
+            break;
+        case CMD_INVAL_IOMMU_PAGES:
+            break;
+        case CMD_INVAL_IOTLB_PAGES:
+            break;
+        case CMD_INVAL_INTR_TABLE:
+            break;
+        default:
+            break;
+    }
+
+    st->cmdbuf_head += CMDBUF_ENTRY_SIZE;
+}
+
+static uint32_t amd_iommu_mmio_buf_read(struct amd_iommu_state *st,
+                                        size_t offset,
+                                        size_t size)
+{
+    ssize_t i;
+    uint32_t ret;
+
+    if (!size)
+        return 0;
+
+    ret = st->mmio_buf[offset + size - 1];
+    for (i = size - 2; i >= 0; i--) {
+        ret <<= 8;
+        ret |= st->mmio_buf[offset + i];
+    }
+
+    return ret;
+}
+
+static void amd_iommu_mmio_buf_write(struct amd_iommu_state *st,
+                                     size_t offset,
+                                     size_t size,
+                                     uint32_t val)
+{
+    size_t i;
+
+    for (i = 0; i < size; i++) {
+        st->mmio_buf[offset + i] = val & 0xFF;
+        val >>= 8;
+    }
+}
+
+static void amd_iommu_update_mmio(struct amd_iommu_state *st,
+                                  target_phys_addr_t addr)
+{
+    size_t reg = addr & ~0x07;
+    uint64_t *base = (uint64_t *) &st->mmio_buf[reg];
+    uint64_t val = *base;
+
+    switch (reg) {
+        case MMIO_CONTROL:
+            st->enabled              = !!(val & MMIO_CONTROL_IOMMUEN);
+            st->ats_enabled          = !!(val & MMIO_CONTROL_HTTUNEN);
+            st->evtlog_enabled       = !!(val & MMIO_CONTROL_EVENTLOGEN);
+            st->evtlog_intr          = !!(val & MMIO_CONTROL_EVENTINTEN);
+            st->completion_wait_intr = !!(val & MMIO_CONTROL_COMWAITINTEN);
+            st->cmdbuf_enabled       = !!(val & MMIO_CONTROL_CMDBUFEN);
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_DEVICE_TABLE:
+            st->devtab = (target_phys_addr_t) (val & MMIO_DEVTAB_BASE_MASK);
+            st->devtab_len = ((val & MMIO_DEVTAB_SIZE_MASK) + 1) *
+                             (MMIO_DEVTAB_SIZE_UNIT / MMIO_DEVTAB_ENTRY_SIZE);
+            break;
+        case MMIO_COMMAND_BASE:
+            st->cmdbuf = (target_phys_addr_t) (val & MMIO_CMDBUF_BASE_MASK);
+            st->cmdbuf_len = 1UL << (st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] &
+                                     MMIO_CMDBUF_SIZE_MASK);
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_COMMAND_HEAD:
+            st->cmdbuf_head = val & MMIO_CMDBUF_HEAD_MASK;
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_COMMAND_TAIL:
+            st->cmdbuf_tail = val & MMIO_CMDBUF_TAIL_MASK;
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_EVENT_BASE:
+            st->evtlog = (target_phys_addr_t) (val & MMIO_EVTLOG_BASE_MASK);
+            st->evtlog_len = 1UL << (st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] &
+                                     MMIO_EVTLOG_SIZE_MASK);
+            break;
+        case MMIO_EVENT_HEAD:
+            st->evtlog_head = val & MMIO_EVTLOG_HEAD_MASK;
+            break;
+        case MMIO_EVENT_TAIL:
+            st->evtlog_tail = val & MMIO_EVTLOG_TAIL_MASK;
+            break;
+        case MMIO_EXCL_BASE:
+            st->excl_base = (target_phys_addr_t) (val & MMIO_EXCL_BASE_MASK);
+            st->excl_enabled = val & MMIO_EXCL_ENABLED_MASK;
+            st->excl_allow = val & MMIO_EXCL_ALLOW_MASK;
+            break;
+        case MMIO_EXCL_LIMIT:
+            st->excl_limit = (target_phys_addr_t) ((val & MMIO_EXCL_LIMIT_MASK) |
+                                                   MMIO_EXCL_LIMIT_LOW);
+            break;
+        default:
+            break;
+    }
+}
+
+static uint32_t amd_iommu_mmio_readb(void *opaque, target_phys_addr_t addr)
+{
+    struct amd_iommu_state *st = opaque;
+
+    return amd_iommu_mmio_buf_read(st, addr, 1);
+}
+
+static uint32_t amd_iommu_mmio_readw(void *opaque, target_phys_addr_t addr)
+{
+    struct amd_iommu_state *st = opaque;
+
+    return amd_iommu_mmio_buf_read(st, addr, 2);
+}
+
+static uint32_t amd_iommu_mmio_readl(void *opaque, target_phys_addr_t addr)
+{
+    struct amd_iommu_state *st = opaque;
+
+    return amd_iommu_mmio_buf_read(st, addr, 4);
+}
+
+static void amd_iommu_mmio_writeb(void *opaque,
+                                  target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    struct amd_iommu_state *st = opaque;
+
+    amd_iommu_mmio_buf_write(st, addr, 1, val);
+    amd_iommu_update_mmio(st, addr);
+}
+
+static void amd_iommu_mmio_writew(void *opaque,
+                                  target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    struct amd_iommu_state *st = opaque;
+
+    amd_iommu_mmio_buf_write(st, addr, 2, val);
+    amd_iommu_update_mmio(st, addr);
+}
+
+static void amd_iommu_mmio_writel(void *opaque,
+                                  target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    struct amd_iommu_state *st = opaque;
+
+    amd_iommu_mmio_buf_write(st, addr, 4, val);
+    amd_iommu_update_mmio(st, addr);
+}
+
+static CPUReadMemoryFunc * const amd_iommu_mmio_read[] = {
+    amd_iommu_mmio_readb,
+    amd_iommu_mmio_readw,
+    amd_iommu_mmio_readl,
+};
+
+static CPUWriteMemoryFunc * const amd_iommu_mmio_write[] = {
+    amd_iommu_mmio_writeb,
+    amd_iommu_mmio_writew,
+    amd_iommu_mmio_writel,
+};
+
+static void amd_iommu_init_mmio(struct amd_iommu_state *st)
+{
+    st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] = MMIO_CMDBUF_DEFAULT_SIZE;
+    st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] = MMIO_EVTLOG_DEFAULT_SIZE;
+}
+
+static void amd_iommu_enable_mmio(struct amd_iommu_state *st)
+{
+    target_phys_addr_t addr;
+
+    st->mmio_index = cpu_register_io_memory(amd_iommu_mmio_read,
+                                            amd_iommu_mmio_write, st);
+    if (st->mmio_index < 0)
+        return;
+
+    addr = le64_to_cpu(*(uint64_t *) &st->capab[CAPAB_BAR_LOW]) & CAPAB_BAR_MASK;
+    cpu_register_physical_memory(addr, MMIO_SIZE, st->mmio_index);
+
+    st->mmio_addr = addr;
+    st->mmio_buf = qemu_mallocz(MMIO_SIZE);
+    st->mmio_enabled = 1;
+    amd_iommu_init_mmio(st);
+}
+
+static uint32_t amd_iommu_read_capab(PCIDevice *pci_dev,
+                                     uint32_t addr, int len)
+{
+    return pci_default_cap_read_config(pci_dev, addr, len);
+}
+
+static void amd_iommu_write_capab(PCIDevice *dev,
+                                  uint32_t addr, uint32_t val, int len)
+{
+    struct amd_iommu_state *st;
+    unsigned char *capab;
+    int reg;
+
+    st = DO_UPCAST(struct amd_iommu_state, dev, dev);
+    capab = st->capab;
+    reg = (addr - 0x40) & ~0x3;  /* Get the 32-bits register. */
+
+    switch (reg) {
+        case CAPAB_HEADER:
+        case CAPAB_MISC:
+            /* Read-only. */
+            return;
+        case CAPAB_BAR_LOW:
+        case CAPAB_BAR_HIGH:
+        case CAPAB_RANGE:
+            if (st->mmio_enabled)
+                return;
+            pci_default_cap_write_config(dev, addr, val, len);
+            break;
+        default:
+            return;
+    }
+
+    if (capab[CAPAB_BAR_LOW] & 0x1)
+        amd_iommu_enable_mmio(st);
+}
+
+static int amd_iommu_init_capab(PCIDevice *dev)
+{
+    struct amd_iommu_state *st;
+    unsigned char *capab;
+
+    st = DO_UPCAST(struct amd_iommu_state, dev, dev);
+    capab = st->dev.config + st->capab_offset;
+
+    capab[CAPAB_REV_TYPE]  = CAPAB_REV_TYPE;
+    capab[CAPAB_FLAGS]     = CAPAB_FLAGS;
+    capab[CAPAB_BAR_LOW]   = 0;
+    capab[CAPAB_BAR_HIGH]  = 0;
+    capab[CAPAB_RANGE]     = 0;
+    *((uint32_t *) &capab[CAPAB_MISC]) = cpu_to_le32(CAPAB_INIT_MISC);
+
+    st->capab = capab;
+    st->dev.cap.length = CAPAB_SIZE;
+
+    return 0;
+}
+
+static int amd_iommu_translate(struct iommu *iommu,
+                               DeviceState *dev,
+                               target_phys_addr_t addr,
+                               target_phys_addr_t *paddr,
+                               int *len,
+                               unsigned perms);
+
+static int amd_iommu_pci_initfn(PCIDevice *dev)
+{
+    struct amd_iommu_state *st;
+    struct iommu *iommu;
+    int err;
+
+    st = DO_UPCAST(struct amd_iommu_state, dev, dev);
+
+    pci_config_set_vendor_id(st->dev.config, PCI_VENDOR_ID_AMD);
+    pci_config_set_device_id(st->dev.config, PCI_DEVICE_ID_AMD_IOMMU);
+    pci_config_set_class(st->dev.config, PCI_CLASS_SYSTEM_IOMMU);
+
+    st->capab_offset = pci_add_capability(&st->dev,
+                                          PCI_CAP_ID_SEC,
+                                          CAPAB_SIZE);
+    err = pci_enable_capability_support(&st->dev, st->capab_offset,
+                                        amd_iommu_read_capab,
+                                        amd_iommu_write_capab,
+                                        amd_iommu_init_capab);
+    if (err)
+        return err;
+
+    iommu = qemu_mallocz(sizeof(struct iommu));
+    iommu->opaque = st;
+    iommu->translate = amd_iommu_translate;
+    st->dev.qdev.parent_bus->iommu = iommu;
+
+    return 0;
+}
+
+static const VMStateDescription vmstate_amd_iommu = {
+    .name                       = "amd-iommu",
+    .version_id                 = 1,
+    .minimum_version_id         = 1,
+    .minimum_version_id_old     = 1,
+    .fields                     = (VMStateField []) {
+        VMSTATE_PCI_DEVICE(dev, struct amd_iommu_state),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static PCIDeviceInfo amd_iommu_pci_info = {
+    .qdev.name    = "amd-iommu",
+    .qdev.desc    = "AMD IOMMU",
+    .qdev.size    = sizeof(struct amd_iommu_state),
+    .qdev.vmsd    = &vmstate_amd_iommu,
+    .init         = amd_iommu_pci_initfn,
+};
+
+void amd_iommu_init(PCIBus *bus)
+{
+    pci_create_simple(bus, -1, "amd-iommu");
+}
+
+static void amd_iommu_register(void)
+{
+    pci_qdev_register(&amd_iommu_pci_info);
+}
+
+device_init(amd_iommu_register);
+
+static void amd_iommu_page_fault(struct amd_iommu_state *st,
+                                 int devfn,
+                                 unsigned domid,
+                                 target_phys_addr_t addr,
+                                 int present,
+                                 int is_write)
+{
+    uint16_t entry[8];
+    uint64_t *entry_addr = (uint64_t *) &entry[4];
+
+    entry[0] = cpu_to_le16(devfn);
+    entry[1] = 0;
+    entry[2] = cpu_to_le16(domid);
+    entry[3] = (2UL << 12) | (!!present << 4) | (!!is_write << 5);
+    *entry_addr = cpu_to_le64(addr);
+
+    cpu_physical_memory_write((target_phys_addr_t) st->evtlog + st->evtlog_tail, (uint8_t *) &entry, 128);
+    st->evtlog_tail += 128;
+}
+
+static int amd_iommu_qdev_to_devfn(DeviceState *dev)
+{
+    PCIDevice *pci_dev = DO_UPCAST(PCIDevice, qdev, dev);
+
+    return pci_dev->devfn;
+}
+
+static inline uint64_t amd_iommu_get_perms(uint64_t entry)
+{
+    return (entry & (DEV_PERM_READ | DEV_PERM_WRITE)) >> DEV_PERM_SHIFT;
+}
+
+static int amd_iommu_translate(struct iommu *iommu,
+                               DeviceState *dev,
+                               target_phys_addr_t addr,
+                               target_phys_addr_t *paddr,
+                               int *len,
+                               unsigned perms)
+{
+    int devfn, present;
+    target_phys_addr_t entry_addr, pte_addr;
+    uint64_t entry[4], pte, page_offset, pte_perms;
+    unsigned level, domid;
+    struct amd_iommu_state *st = iommu->opaque;
+
+    if (!st->enabled)
+        goto no_translation;
+
+    /* Get device table entry. */
+    devfn = amd_iommu_qdev_to_devfn(dev);
+    entry_addr = st->devtab + devfn * DEVTAB_ENTRY_SIZE;
+    cpu_physical_memory_read(entry_addr, (uint8_t *) entry, 32);
+
+    pte = entry[0];
+    if (!(pte & DEV_VALID) || !(pte & DEV_TRANSLATION_VALID)) {
+        goto no_translation;
+    }
+    domid = entry[1] & DEV_DOMAIN_ID_MASK;
+    level = (pte >> DEV_MODE_RSHIFT) & DEV_MODE_MASK;
+    while (level > 0) {
+        /*
+         * Check permissions: the bitwise
+         * implication perms -> entry_perms must be true.
+         */
+        pte_perms = amd_iommu_get_perms(pte);
+        present = pte & 1;
+        if (!present || perms != (perms & pte_perms)) {
+            amd_iommu_page_fault(st, devfn, domid, addr,
+                                 present, !!(perms & IOMMU_PERM_WRITE));
+            return -EPERM;
+        }
+
+        /* Go to the next lower level. */
+        pte_addr = pte & DEV_PT_ROOT_MASK;
+        pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;
+        pte = ldq_phys(pte_addr);
+        level = (pte >> DEV_MODE_RSHIFT) & DEV_MODE_MASK;
+    }
+    page_offset = addr & 4095;
+    *paddr = (pte & DEV_PT_ROOT_MASK) + page_offset;
+    *len = 4096 - page_offset;
+
+    return 0;
+
+no_translation:
+    return iommu_nop_translate(iommu, dev, addr, paddr, len, perms);
+}
diff --git a/hw/pc.c b/hw/pc.c
index 186e322..4c929f9 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1066,6 +1066,10 @@ void pc_pci_device_init(PCIBus *pci_bus)
     int max_bus;
     int bus;
 
+#ifdef CONFIG_AMD_IOMMU
+    amd_iommu_init(pci_bus);
+#endif
+
     max_bus = drive_get_max_bus(IF_SCSI);
     for (bus = 0; bus <= max_bus; bus++) {
         pci_create_simple(pci_bus, -1, "lsi53c895a");
diff --git a/hw/pc.h b/hw/pc.h
index 3ef2f75..255ad93 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -191,4 +191,7 @@ void extboot_init(BlockDriverState *bs);
 
 int e820_add_entry(uint64_t, uint64_t, uint32_t);
 
+/* amd_iommu.c */
+void amd_iommu_init(PCIBus *bus);
+
 #endif
diff --git a/hw/pci_ids.h b/hw/pci_ids.h
index 39e9f1d..d790312 100644
--- a/hw/pci_ids.h
+++ b/hw/pci_ids.h
@@ -26,6 +26,7 @@
 
 #define PCI_CLASS_MEMORY_RAM             0x0500
 
+#define PCI_CLASS_SYSTEM_IOMMU           0x0806
 #define PCI_CLASS_SYSTEM_OTHER           0x0880
 
 #define PCI_CLASS_SERIAL_USB             0x0c03
@@ -56,6 +57,7 @@
 
 #define PCI_VENDOR_ID_AMD                0x1022
 #define PCI_DEVICE_ID_AMD_LANCE          0x2000
+#define PCI_DEVICE_ID_AMD_IOMMU          0x0000     /* FIXME */
 
 #define PCI_VENDOR_ID_MOTOROLA           0x1057
 #define PCI_DEVICE_ID_MOTOROLA_MPC106    0x0002
diff --git a/hw/pci_regs.h b/hw/pci_regs.h
index 1c675dc..6399b5d 100644
--- a/hw/pci_regs.h
+++ b/hw/pci_regs.h
@@ -216,6 +216,7 @@
 #define  PCI_CAP_ID_SHPC 	0x0C	/* PCI Standard Hot-Plug Controller */
 #define  PCI_CAP_ID_SSVID	0x0D	/* Bridge subsystem vendor/device ID */
 #define  PCI_CAP_ID_AGP3	0x0E	/* AGP Target PCI-PCI bridge */
+#define  PCI_CAP_ID_SEC     0x0F    /* Secure Device (AMD IOMMU) */
 #define  PCI_CAP_ID_EXP 	0x10	/* PCI Express */
 #define  PCI_CAP_ID_MSIX	0x11	/* MSI-X */
 #define  PCI_CAP_ID_AF		0x13	/* PCI Advanced Features */
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [Qemu-devel] [RFC PATCH 2/7] AMD IOMMU emulation
@ 2010-07-14  5:45   ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-14  5:45 UTC (permalink / raw)
  To: joro; +Cc: qemu-devel, Eduard - Gabriel Munteanu, avi, kvm, paul

This introduces emulation for the AMD IOMMU, described in "AMD I/O
Virtualization Technology (IOMMU) Specification".

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 Makefile.target |    2 +
 configure       |   11 +
 hw/amd_iommu.c  |  621 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/pc.c         |    4 +
 hw/pc.h         |    3 +
 hw/pci_ids.h    |    2 +
 hw/pci_regs.h   |    1 +
 7 files changed, 644 insertions(+), 0 deletions(-)
 create mode 100644 hw/amd_iommu.c

diff --git a/Makefile.target b/Makefile.target
index 3f895ae..eb164ba 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -220,6 +220,8 @@ obj-i386-y += pcspk.o i8254.o
 obj-i386-$(CONFIG_KVM_PIT) += i8254-kvm.o
 obj-i386-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += device-assignment.o
 
+obj-i386-$(CONFIG_AMD_IOMMU) += amd_iommu.o
+
 # Hardware support
 obj-ia64-y += ide.o pckbd.o vga.o $(SOUND_HW) dma.o $(AUDIODRV)
 obj-ia64-y += fdc.o mc146818rtc.o serial.o i8259.o ipf.o
diff --git a/configure b/configure
index af50607..a0730b7 100755
--- a/configure
+++ b/configure
@@ -317,6 +317,7 @@ io_thread="no"
 mixemu="no"
 kvm_cap_pit=""
 kvm_cap_device_assignment=""
+amd_iommu="no"
 kerneldir=""
 aix="no"
 blobs="yes"
@@ -629,6 +630,8 @@ for opt do
   ;;
   --enable-kvm-device-assignment) kvm_cap_device_assignment="yes"
   ;;
+  --enable-amd-iommu-emul) amd_iommu="yes"
+  ;;
   --enable-profiler) profiler="yes"
   ;;
   --enable-cocoa)
@@ -871,6 +874,8 @@ echo "  --disable-kvm-pit        disable KVM pit support"
 echo "  --enable-kvm-pit         enable KVM pit support"
 echo "  --disable-kvm-device-assignment  disable KVM device assignment support"
 echo "  --enable-kvm-device-assignment   enable KVM device assignment support"
+echo "  --disable-amd-iommu-emul disable AMD IOMMU emulation"
+echo "  --enable-amd-iommu-emul  enable AMD IOMMU emulation"
 echo "  --disable-nptl           disable usermode NPTL support"
 echo "  --enable-nptl            enable usermode NPTL support"
 echo "  --enable-system          enable all system emulation targets"
@@ -2251,6 +2256,7 @@ echo "Install blobs     $blobs"
 echo "KVM support       $kvm"
 echo "KVM PIT support   $kvm_cap_pit"
 echo "KVM device assig. $kvm_cap_device_assignment"
+echo "AMD IOMMU emul.   $amd_iommu"
 echo "fdt support       $fdt"
 echo "preadv support    $preadv"
 echo "fdatasync         $fdatasync"
@@ -2645,6 +2651,11 @@ case "$target_arch2" in
   x86_64)
     TARGET_BASE_ARCH=i386
     target_phys_bits=64
+    if test "$amd_iommu" = "yes"; then
+      echo "CONFIG_AMD_IOMMU=y" >> $config_target_mak
+      echo "CONFIG_IOMMU=y" >> $config_target_mak
+      echo "CONFIG_IOMMU=y" >> $config_host_mak
+    fi
   ;;
   ia64)
     target_phys_bits=64
diff --git a/hw/amd_iommu.c b/hw/amd_iommu.c
new file mode 100644
index 0000000..e72f0c0
--- /dev/null
+++ b/hw/amd_iommu.c
@@ -0,0 +1,621 @@
+/*
+ * AMD IOMMU emulation
+ *
+ * Copyright (c) 2010 Eduard - Gabriel Munteanu
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "pc.h"
+#include "hw.h"
+#include "pci.h"
+#include "iommu.h"
+
+/* Capability registers */
+#define CAPAB_HEADER            0x00
+#define   CAPAB_REV_TYPE        0x02
+#define   CAPAB_FLAGS           0x03
+#define CAPAB_BAR_LOW           0x04
+#define CAPAB_BAR_HIGH          0x08
+#define CAPAB_RANGE             0x0C
+#define CAPAB_MISC              0x10
+
+#define CAPAB_SIZE              0x14
+
+/* Capability header data */
+#define CAPAB_FLAG_IOTLBSUP     (1 << 0)
+#define CAPAB_FLAG_HTTUNNEL     (1 << 1)
+#define CAPAB_FLAG_NPCACHE      (1 << 2)
+#define CAPAB_INIT_REV          (1 << 3)
+#define CAPAB_INIT_TYPE         3
+#define CAPAB_INIT_REV_TYPE     (CAPAB_REV | CAPAB_TYPE)
+#define CAPAB_INIT_FLAGS        (CAPAB_FLAG_NPCACHE | CAPAB_FLAG_HTTUNNEL)
+#define CAPAB_INIT_MISC         (64 << 15) | (48 << 8)
+#define CAPAB_BAR_MASK          ~((1UL << 14) - 1)
+
+/* MMIO registers */
+#define MMIO_DEVICE_TABLE       0x0000
+#define MMIO_COMMAND_BASE       0x0008
+#define MMIO_EVENT_BASE         0x0010
+#define MMIO_CONTROL            0x0018
+#define MMIO_EXCL_BASE          0x0020
+#define MMIO_EXCL_LIMIT         0x0028
+#define MMIO_COMMAND_HEAD       0x2000
+#define MMIO_COMMAND_TAIL       0x2008
+#define MMIO_EVENT_HEAD         0x2010
+#define MMIO_EVENT_TAIL         0x2018
+#define MMIO_STATUS             0x2020
+
+#define MMIO_SIZE               0x4000
+
+#define MMIO_DEVTAB_SIZE_MASK   ((1ULL << 12) - 1)
+#define MMIO_DEVTAB_BASE_MASK   (((1ULL << 52) - 1) & ~MMIO_DEVTAB_SIZE_MASK)
+#define MMIO_DEVTAB_ENTRY_SIZE  32
+#define MMIO_DEVTAB_SIZE_UNIT   4096
+
+#define MMIO_CMDBUF_SIZE_BYTE       (MMIO_COMMAND_BASE + 7)
+#define MMIO_CMDBUF_SIZE_MASK       0x0F
+#define MMIO_CMDBUF_BASE_MASK       MMIO_DEVTAB_BASE_MASK
+#define MMIO_CMDBUF_DEFAULT_SIZE    8
+#define MMIO_CMDBUF_HEAD_MASK       (((1ULL << 19) - 1) & ~0x0F)
+#define MMIO_CMDBUF_TAIL_MASK       MMIO_EVTLOG_HEAD_MASK
+
+#define MMIO_EVTLOG_SIZE_BYTE       (MMIO_EVENT_BASE + 7)
+#define MMIO_EVTLOG_SIZE_MASK       MMIO_CMDBUF_SIZE_MASK
+#define MMIO_EVTLOG_BASE_MASK       MMIO_CMDBUF_BASE_MASK
+#define MMIO_EVTLOG_DEFAULT_SIZE    MMIO_CMDBUF_DEFAULT_SIZE
+#define MMIO_EVTLOG_HEAD_MASK       (((1ULL << 19) - 1) & ~0x0F)
+#define MMIO_EVTLOG_TAIL_MASK       MMIO_EVTLOG_HEAD_MASK
+
+#define MMIO_EXCL_BASE_MASK         MMIO_DEVTAB_BASE_MASK
+#define MMIO_EXCL_ENABLED_MASK      (1ULL << 0)
+#define MMIO_EXCL_ALLOW_MASK        (1ULL << 1)
+#define MMIO_EXCL_LIMIT_MASK        MMIO_DEVTAB_BASE_MASK
+#define MMIO_EXCL_LIMIT_LOW         0xFFF
+
+#define MMIO_CONTROL_IOMMUEN        (1ULL << 0)
+#define MMIO_CONTROL_HTTUNEN        (1ULL << 1)
+#define MMIO_CONTROL_EVENTLOGEN     (1ULL << 2)
+#define MMIO_CONTROL_EVENTINTEN     (1ULL << 3)
+#define MMIO_CONTROL_COMWAITINTEN   (1ULL << 4)
+#define MMIO_CONTROL_CMDBUFEN       (1ULL << 12)
+
+#define MMIO_STATUS_EVTLOG_OF       (1ULL << 0)
+#define MMIO_STATUS_EVTLOG_INTR     (1ULL << 1)
+#define MMIO_STATUS_COMWAIT_INTR    (1ULL << 2)
+#define MMIO_STATUS_EVTLOG_RUN      (1ULL << 3)
+#define MMIO_STATUS_CMDBUF_RUN      (1ULL << 4)
+
+#define CMDBUF_ID_BYTE              0x07
+#define CMDBUF_ID_RSHIFT            4
+#define CMDBUF_ENTRY_SIZE           0x10
+
+#define CMD_COMPLETION_WAIT         0x01
+#define CMD_INVAL_DEVTAB_ENTRY      0x02
+#define CMD_INVAL_IOMMU_PAGES       0x03
+#define CMD_INVAL_IOTLB_PAGES       0x04
+#define CMD_INVAL_INTR_TABLE        0x05
+
+#define DEVTAB_ENTRY_SIZE           32
+
+/* Device table entry bits 0:63 */
+#define DEV_VALID                   (1ULL << 0)
+#define DEV_TRANSLATION_VALID       (1ULL << 1)
+#define DEV_MODE_MASK               0x7
+#define DEV_MODE_RSHIFT             9
+#define DEV_PT_ROOT_MASK            0xFFFFFFFFFF000
+#define DEV_PT_ROOT_RSHIFT          12
+#define DEV_PERM_SHIFT              61
+#define DEV_PERM_READ               (1ULL << 61)
+#define DEV_PERM_WRITE              (1ULL << 62)
+
+/* Device table entry bits 64:127 */
+#define DEV_DOMAIN_ID_MASK          ((1ULL << 16) - 1)
+#define DEV_IOTLB_SUPPORT           (1ULL << 17)
+#define DEV_SUPPRESS_PF             (1ULL << 18)
+#define DEV_SUPPRESS_ALL_PF         (1ULL << 19)
+#define DEV_IOCTL_MASK              ~3
+#define DEV_IOCTL_RSHIFT            20
+#define   DEV_IOCTL_DENY            0
+#define   DEV_IOCTL_PASSTHROUGH     1
+#define   DEV_IOCTL_TRANSLATE       2
+#define DEV_CACHE                   (1ULL << 37)
+#define DEV_SNOOP_DISABLE           (1ULL << 38)
+#define DEV_EXCL                    (1ULL << 39)
+
+struct amd_iommu_state {
+    PCIDevice                   dev;
+
+    int                         capab_offset;
+    unsigned char               *capab;
+
+    int                         mmio_index;
+    target_phys_addr_t          mmio_addr;
+    unsigned char               *mmio_buf;
+    int                         mmio_enabled;
+
+    int                         enabled;
+    int                         ats_enabled;
+
+    target_phys_addr_t          devtab;
+    size_t                      devtab_len;
+
+    target_phys_addr_t          cmdbuf;
+    int                         cmdbuf_enabled;
+    size_t                      cmdbuf_len;
+    size_t                      cmdbuf_head;
+    size_t                      cmdbuf_tail;
+    int                         completion_wait_intr;
+
+    target_phys_addr_t          evtlog;
+    int                         evtlog_enabled;
+    int                         evtlog_intr;
+    size_t                      evtlog_len;
+    size_t                      evtlog_head;
+    size_t                      evtlog_tail;
+
+    target_phys_addr_t          excl_base;
+    target_phys_addr_t          excl_limit;
+    int                         excl_enabled;
+    int                         excl_allow;
+};
+
+static void amd_iommu_completion_wait(struct amd_iommu_state *st,
+                                      uint8_t *cmd)
+{
+    uint64_t addr;
+
+    if (cmd[0] & 1) {
+        addr = le64_to_cpu(*(uint64_t *) cmd) & 0xFFFFFFFFFFFF8;
+        cpu_physical_memory_write(addr, cmd + 8, 8);
+    }
+
+    if (cmd[0] & 2)
+        st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_COMWAIT_INTR;
+}
+
+static void amd_iommu_cmdbuf_run(struct amd_iommu_state *st)
+{
+    uint8_t cmd[16];
+    int type;
+
+    if (!st->cmdbuf_enabled)
+        return;
+
+    st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_CMDBUF_RUN;
+    st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_EVTLOG_RUN;
+    st->mmio_buf[MMIO_STATUS] |= MMIO_STATUS_COMWAIT_INTR;
+
+    if (st->cmdbuf_head == st->cmdbuf_tail)
+        return;
+
+    cpu_physical_memory_read(st->cmdbuf, cmd, 16);
+    type = cmd[CMDBUF_ID_BYTE] >> CMDBUF_ID_RSHIFT;
+    switch (type) {
+        case CMD_COMPLETION_WAIT:
+            amd_iommu_completion_wait(st, cmd);
+            break;
+        case CMD_INVAL_DEVTAB_ENTRY:
+            break;
+        case CMD_INVAL_IOMMU_PAGES:
+            break;
+        case CMD_INVAL_IOTLB_PAGES:
+            break;
+        case CMD_INVAL_INTR_TABLE:
+            break;
+        default:
+            break;
+    }
+
+    st->cmdbuf_head += CMDBUF_ENTRY_SIZE;
+}
+
+static uint32_t amd_iommu_mmio_buf_read(struct amd_iommu_state *st,
+                                        size_t offset,
+                                        size_t size)
+{
+    ssize_t i;
+    uint32_t ret;
+
+    if (!size)
+        return 0;
+
+    ret = st->mmio_buf[offset + size - 1];
+    for (i = size - 2; i >= 0; i--) {
+        ret <<= 8;
+        ret |= st->mmio_buf[offset + i];
+    }
+
+    return ret;
+}
+
+static void amd_iommu_mmio_buf_write(struct amd_iommu_state *st,
+                                     size_t offset,
+                                     size_t size,
+                                     uint32_t val)
+{
+    size_t i;
+
+    for (i = 0; i < size; i++) {
+        st->mmio_buf[offset + i] = val & 0xFF;
+        val >>= 8;
+    }
+}
+
+static void amd_iommu_update_mmio(struct amd_iommu_state *st,
+                                  target_phys_addr_t addr)
+{
+    size_t reg = addr & ~0x07;
+    uint64_t *base = (uint64_t *) &st->mmio_buf[reg];
+    uint64_t val = *base;
+
+    switch (reg) {
+        case MMIO_CONTROL:
+            st->enabled              = !!(val & MMIO_CONTROL_IOMMUEN);
+            st->ats_enabled          = !!(val & MMIO_CONTROL_HTTUNEN);
+            st->evtlog_enabled       = !!(val & MMIO_CONTROL_EVENTLOGEN);
+            st->evtlog_intr          = !!(val & MMIO_CONTROL_EVENTINTEN);
+            st->completion_wait_intr = !!(val & MMIO_CONTROL_COMWAITINTEN);
+            st->cmdbuf_enabled       = !!(val & MMIO_CONTROL_CMDBUFEN);
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_DEVICE_TABLE:
+            st->devtab = (target_phys_addr_t) (val & MMIO_DEVTAB_BASE_MASK);
+            st->devtab_len = ((val & MMIO_DEVTAB_SIZE_MASK) + 1) *
+                             (MMIO_DEVTAB_SIZE_UNIT / MMIO_DEVTAB_ENTRY_SIZE);
+            break;
+        case MMIO_COMMAND_BASE:
+            st->cmdbuf = (target_phys_addr_t) (val & MMIO_CMDBUF_BASE_MASK);
+            st->cmdbuf_len = 1UL << (st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] &
+                                     MMIO_CMDBUF_SIZE_MASK);
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_COMMAND_HEAD:
+            st->cmdbuf_head = val & MMIO_CMDBUF_HEAD_MASK;
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_COMMAND_TAIL:
+            st->cmdbuf_tail = val & MMIO_CMDBUF_TAIL_MASK;
+            amd_iommu_cmdbuf_run(st);
+            break;
+        case MMIO_EVENT_BASE:
+            st->evtlog = (target_phys_addr_t) (val & MMIO_EVTLOG_BASE_MASK);
+            st->evtlog_len = 1UL << (st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] &
+                                     MMIO_EVTLOG_SIZE_MASK);
+            break;
+        case MMIO_EVENT_HEAD:
+            st->evtlog_head = val & MMIO_EVTLOG_HEAD_MASK;
+            break;
+        case MMIO_EVENT_TAIL:
+            st->evtlog_tail = val & MMIO_EVTLOG_TAIL_MASK;
+            break;
+        case MMIO_EXCL_BASE:
+            st->excl_base = (target_phys_addr_t) (val & MMIO_EXCL_BASE_MASK);
+            st->excl_enabled = val & MMIO_EXCL_ENABLED_MASK;
+            st->excl_allow = val & MMIO_EXCL_ALLOW_MASK;
+            break;
+        case MMIO_EXCL_LIMIT:
+            st->excl_limit = (target_phys_addr_t) ((val & MMIO_EXCL_LIMIT_MASK) |
+                                                   MMIO_EXCL_LIMIT_LOW);
+            break;
+        default:
+            break;
+    }
+}
+
+static uint32_t amd_iommu_mmio_readb(void *opaque, target_phys_addr_t addr)
+{
+    struct amd_iommu_state *st = opaque;
+
+    return amd_iommu_mmio_buf_read(st, addr, 1);
+}
+
+static uint32_t amd_iommu_mmio_readw(void *opaque, target_phys_addr_t addr)
+{
+    struct amd_iommu_state *st = opaque;
+
+    return amd_iommu_mmio_buf_read(st, addr, 2);
+}
+
+static uint32_t amd_iommu_mmio_readl(void *opaque, target_phys_addr_t addr)
+{
+    struct amd_iommu_state *st = opaque;
+
+    return amd_iommu_mmio_buf_read(st, addr, 4);
+}
+
+static void amd_iommu_mmio_writeb(void *opaque,
+                                  target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    struct amd_iommu_state *st = opaque;
+
+    amd_iommu_mmio_buf_write(st, addr, 1, val);
+    amd_iommu_update_mmio(st, addr);
+}
+
+static void amd_iommu_mmio_writew(void *opaque,
+                                  target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    struct amd_iommu_state *st = opaque;
+
+    amd_iommu_mmio_buf_write(st, addr, 2, val);
+    amd_iommu_update_mmio(st, addr);
+}
+
+static void amd_iommu_mmio_writel(void *opaque,
+                                  target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    struct amd_iommu_state *st = opaque;
+
+    amd_iommu_mmio_buf_write(st, addr, 4, val);
+    amd_iommu_update_mmio(st, addr);
+}
+
+static CPUReadMemoryFunc * const amd_iommu_mmio_read[] = {
+    amd_iommu_mmio_readb,
+    amd_iommu_mmio_readw,
+    amd_iommu_mmio_readl,
+};
+
+static CPUWriteMemoryFunc * const amd_iommu_mmio_write[] = {
+    amd_iommu_mmio_writeb,
+    amd_iommu_mmio_writew,
+    amd_iommu_mmio_writel,
+};
+
+static void amd_iommu_init_mmio(struct amd_iommu_state *st)
+{
+    st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] = MMIO_CMDBUF_DEFAULT_SIZE;
+    st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] = MMIO_EVTLOG_DEFAULT_SIZE;
+}
+
+static void amd_iommu_enable_mmio(struct amd_iommu_state *st)
+{
+    target_phys_addr_t addr;
+
+    st->mmio_index = cpu_register_io_memory(amd_iommu_mmio_read,
+                                            amd_iommu_mmio_write, st);
+    if (st->mmio_index < 0)
+        return;
+
+    addr = le64_to_cpu(*(uint64_t *) &st->capab[CAPAB_BAR_LOW]) & CAPAB_BAR_MASK;
+    cpu_register_physical_memory(addr, MMIO_SIZE, st->mmio_index);
+
+    st->mmio_addr = addr;
+    st->mmio_buf = qemu_mallocz(MMIO_SIZE);
+    st->mmio_enabled = 1;
+    amd_iommu_init_mmio(st);
+}
+
+static uint32_t amd_iommu_read_capab(PCIDevice *pci_dev,
+                                     uint32_t addr, int len)
+{
+    return pci_default_cap_read_config(pci_dev, addr, len);
+}
+
+static void amd_iommu_write_capab(PCIDevice *dev,
+                                  uint32_t addr, uint32_t val, int len)
+{
+    struct amd_iommu_state *st;
+    unsigned char *capab;
+    int reg;
+
+    st = DO_UPCAST(struct amd_iommu_state, dev, dev);
+    capab = st->capab;
+    reg = (addr - 0x40) & ~0x3;  /* Get the 32-bits register. */
+
+    switch (reg) {
+        case CAPAB_HEADER:
+        case CAPAB_MISC:
+            /* Read-only. */
+            return;
+        case CAPAB_BAR_LOW:
+        case CAPAB_BAR_HIGH:
+        case CAPAB_RANGE:
+            if (st->mmio_enabled)
+                return;
+            pci_default_cap_write_config(dev, addr, val, len);
+            break;
+        default:
+            return;
+    }
+
+    if (capab[CAPAB_BAR_LOW] & 0x1)
+        amd_iommu_enable_mmio(st);
+}
+
+static int amd_iommu_init_capab(PCIDevice *dev)
+{
+    struct amd_iommu_state *st;
+    unsigned char *capab;
+
+    st = DO_UPCAST(struct amd_iommu_state, dev, dev);
+    capab = st->dev.config + st->capab_offset;
+
+    capab[CAPAB_REV_TYPE]  = CAPAB_REV_TYPE;
+    capab[CAPAB_FLAGS]     = CAPAB_FLAGS;
+    capab[CAPAB_BAR_LOW]   = 0;
+    capab[CAPAB_BAR_HIGH]  = 0;
+    capab[CAPAB_RANGE]     = 0;
+    *((uint32_t *) &capab[CAPAB_MISC]) = cpu_to_le32(CAPAB_INIT_MISC);
+
+    st->capab = capab;
+    st->dev.cap.length = CAPAB_SIZE;
+
+    return 0;
+}
+
+static int amd_iommu_translate(struct iommu *iommu,
+                               DeviceState *dev,
+                               target_phys_addr_t addr,
+                               target_phys_addr_t *paddr,
+                               int *len,
+                               unsigned perms);
+
+static int amd_iommu_pci_initfn(PCIDevice *dev)
+{
+    struct amd_iommu_state *st;
+    struct iommu *iommu;
+    int err;
+
+    st = DO_UPCAST(struct amd_iommu_state, dev, dev);
+
+    pci_config_set_vendor_id(st->dev.config, PCI_VENDOR_ID_AMD);
+    pci_config_set_device_id(st->dev.config, PCI_DEVICE_ID_AMD_IOMMU);
+    pci_config_set_class(st->dev.config, PCI_CLASS_SYSTEM_IOMMU);
+
+    st->capab_offset = pci_add_capability(&st->dev,
+                                          PCI_CAP_ID_SEC,
+                                          CAPAB_SIZE);
+    err = pci_enable_capability_support(&st->dev, st->capab_offset,
+                                        amd_iommu_read_capab,
+                                        amd_iommu_write_capab,
+                                        amd_iommu_init_capab);
+    if (err)
+        return err;
+
+    iommu = qemu_mallocz(sizeof(struct iommu));
+    iommu->opaque = st;
+    iommu->translate = amd_iommu_translate;
+    st->dev.qdev.parent_bus->iommu = iommu;
+
+    return 0;
+}
+
+static const VMStateDescription vmstate_amd_iommu = {
+    .name                       = "amd-iommu",
+    .version_id                 = 1,
+    .minimum_version_id         = 1,
+    .minimum_version_id_old     = 1,
+    .fields                     = (VMStateField []) {
+        VMSTATE_PCI_DEVICE(dev, struct amd_iommu_state),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static PCIDeviceInfo amd_iommu_pci_info = {
+    .qdev.name    = "amd-iommu",
+    .qdev.desc    = "AMD IOMMU",
+    .qdev.size    = sizeof(struct amd_iommu_state),
+    .qdev.vmsd    = &vmstate_amd_iommu,
+    .init         = amd_iommu_pci_initfn,
+};
+
+void amd_iommu_init(PCIBus *bus)
+{
+    pci_create_simple(bus, -1, "amd-iommu");
+}
+
+static void amd_iommu_register(void)
+{
+    pci_qdev_register(&amd_iommu_pci_info);
+}
+
+device_init(amd_iommu_register);
+
+static void amd_iommu_page_fault(struct amd_iommu_state *st,
+                                 int devfn,
+                                 unsigned domid,
+                                 target_phys_addr_t addr,
+                                 int present,
+                                 int is_write)
+{
+    uint16_t entry[8];
+    uint64_t *entry_addr = (uint64_t *) &entry[4];
+
+    entry[0] = cpu_to_le16(devfn);
+    entry[1] = 0;
+    entry[2] = cpu_to_le16(domid);
+    entry[3] = (2UL << 12) | (!!present << 4) | (!!is_write << 5);
+    *entry_addr = cpu_to_le64(addr);
+
+    cpu_physical_memory_write((target_phys_addr_t) st->evtlog + st->evtlog_tail, (uint8_t *) &entry, 128);
+    st->evtlog_tail += 128;
+}
+
+static int amd_iommu_qdev_to_devfn(DeviceState *dev)
+{
+    PCIDevice *pci_dev = DO_UPCAST(PCIDevice, qdev, dev);
+
+    return pci_dev->devfn;
+}
+
+static inline uint64_t amd_iommu_get_perms(uint64_t entry)
+{
+    return (entry & (DEV_PERM_READ | DEV_PERM_WRITE)) >> DEV_PERM_SHIFT;
+}
+
+static int amd_iommu_translate(struct iommu *iommu,
+                               DeviceState *dev,
+                               target_phys_addr_t addr,
+                               target_phys_addr_t *paddr,
+                               int *len,
+                               unsigned perms)
+{
+    int devfn, present;
+    target_phys_addr_t entry_addr, pte_addr;
+    uint64_t entry[4], pte, page_offset, pte_perms;
+    unsigned level, domid;
+    struct amd_iommu_state *st = iommu->opaque;
+
+    if (!st->enabled)
+        goto no_translation;
+
+    /* Get device table entry. */
+    devfn = amd_iommu_qdev_to_devfn(dev);
+    entry_addr = st->devtab + devfn * DEVTAB_ENTRY_SIZE;
+    cpu_physical_memory_read(entry_addr, (uint8_t *) entry, 32);
+
+    pte = entry[0];
+    if (!(pte & DEV_VALID) || !(pte & DEV_TRANSLATION_VALID)) {
+        goto no_translation;
+    }
+    domid = entry[1] & DEV_DOMAIN_ID_MASK;
+    level = (pte >> DEV_MODE_RSHIFT) & DEV_MODE_MASK;
+    while (level > 0) {
+        /*
+         * Check permissions: the bitwise
+         * implication perms -> entry_perms must be true.
+         */
+        pte_perms = amd_iommu_get_perms(pte);
+        present = pte & 1;
+        if (!present || perms != (perms & pte_perms)) {
+            amd_iommu_page_fault(st, devfn, domid, addr,
+                                 present, !!(perms & IOMMU_PERM_WRITE));
+            return -EPERM;
+        }
+
+        /* Go to the next lower level. */
+        pte_addr = pte & DEV_PT_ROOT_MASK;
+        pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;
+        pte = ldq_phys(pte_addr);
+        level = (pte >> DEV_MODE_RSHIFT) & DEV_MODE_MASK;
+    }
+    page_offset = addr & 4095;
+    *paddr = (pte & DEV_PT_ROOT_MASK) + page_offset;
+    *len = 4096 - page_offset;
+
+    return 0;
+
+no_translation:
+    return iommu_nop_translate(iommu, dev, addr, paddr, len, perms);
+}
diff --git a/hw/pc.c b/hw/pc.c
index 186e322..4c929f9 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1066,6 +1066,10 @@ void pc_pci_device_init(PCIBus *pci_bus)
     int max_bus;
     int bus;
 
+#ifdef CONFIG_AMD_IOMMU
+    amd_iommu_init(pci_bus);
+#endif
+
     max_bus = drive_get_max_bus(IF_SCSI);
     for (bus = 0; bus <= max_bus; bus++) {
         pci_create_simple(pci_bus, -1, "lsi53c895a");
diff --git a/hw/pc.h b/hw/pc.h
index 3ef2f75..255ad93 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -191,4 +191,7 @@ void extboot_init(BlockDriverState *bs);
 
 int e820_add_entry(uint64_t, uint64_t, uint32_t);
 
+/* amd_iommu.c */
+void amd_iommu_init(PCIBus *bus);
+
 #endif
diff --git a/hw/pci_ids.h b/hw/pci_ids.h
index 39e9f1d..d790312 100644
--- a/hw/pci_ids.h
+++ b/hw/pci_ids.h
@@ -26,6 +26,7 @@
 
 #define PCI_CLASS_MEMORY_RAM             0x0500
 
+#define PCI_CLASS_SYSTEM_IOMMU           0x0806
 #define PCI_CLASS_SYSTEM_OTHER           0x0880
 
 #define PCI_CLASS_SERIAL_USB             0x0c03
@@ -56,6 +57,7 @@
 
 #define PCI_VENDOR_ID_AMD                0x1022
 #define PCI_DEVICE_ID_AMD_LANCE          0x2000
+#define PCI_DEVICE_ID_AMD_IOMMU          0x0000     /* FIXME */
 
 #define PCI_VENDOR_ID_MOTOROLA           0x1057
 #define PCI_DEVICE_ID_MOTOROLA_MPC106    0x0002
diff --git a/hw/pci_regs.h b/hw/pci_regs.h
index 1c675dc..6399b5d 100644
--- a/hw/pci_regs.h
+++ b/hw/pci_regs.h
@@ -216,6 +216,7 @@
 #define  PCI_CAP_ID_SHPC 	0x0C	/* PCI Standard Hot-Plug Controller */
 #define  PCI_CAP_ID_SSVID	0x0D	/* Bridge subsystem vendor/device ID */
 #define  PCI_CAP_ID_AGP3	0x0E	/* AGP Target PCI-PCI bridge */
+#define  PCI_CAP_ID_SEC     0x0F    /* Secure Device (AMD IOMMU) */
 #define  PCI_CAP_ID_EXP 	0x10	/* PCI Express */
 #define  PCI_CAP_ID_MSIX	0x11	/* MSI-X */
 #define  PCI_CAP_ID_AF		0x13	/* PCI Advanced Features */
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [RFC PATCH 3/7] pci: call IOMMU hooks
  2010-07-14  5:45 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-07-14  5:45   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-14  5:45 UTC (permalink / raw)
  To: joro; +Cc: avi, paul, kvm, qemu-devel, Eduard - Gabriel Munteanu

Memory accesses must go through the IOMMU layer.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/pci.c |   21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 6871728..9c5d706 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -22,6 +22,7 @@
  * THE SOFTWARE.
  */
 #include "hw.h"
+#include "iommu.h"
 #include "pci.h"
 #include "monitor.h"
 #include "net.h"
@@ -733,12 +734,25 @@ static void do_pci_unregister_device(PCIDevice *pci_dev)
     pci_config_free(pci_dev);
 }
 
+#ifdef CONFIG_IOMMU
+static inline int pci_iommu_register_device(PCIBus *bus, PCIDevice *dev)
+{
+    return iommu_register_device(bus->qbus.iommu, &dev->qdev);
+}
+#else
+static inline int pci_iommu_register_device(PCIBus *bus, PCIDevice *dev)
+{
+    return 0;
+}
+#endif
+
 PCIDevice *pci_register_device(PCIBus *bus, const char *name,
                                int instance_size, int devfn,
                                PCIConfigReadFunc *config_read,
                                PCIConfigWriteFunc *config_write)
 {
     PCIDevice *pci_dev;
+    int err;
 
     pci_dev = qemu_mallocz(instance_size);
     pci_dev = do_pci_register_device(pci_dev, bus, name, devfn,
@@ -747,6 +761,13 @@ PCIDevice *pci_register_device(PCIBus *bus, const char *name,
     if (pci_dev == NULL) {
         hw_error("PCI: can't register device\n");
     }
+
+    err = pci_iommu_register_device(bus, pci_dev);
+    if (err) {
+        hw_error("PCI: can't register device with IOMMU\n");
+        return NULL;
+    }
+
     return pci_dev;
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [Qemu-devel] [RFC PATCH 3/7] pci: call IOMMU hooks
@ 2010-07-14  5:45   ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-14  5:45 UTC (permalink / raw)
  To: joro; +Cc: qemu-devel, Eduard - Gabriel Munteanu, avi, kvm, paul

Memory accesses must go through the IOMMU layer.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/pci.c |   21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 6871728..9c5d706 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -22,6 +22,7 @@
  * THE SOFTWARE.
  */
 #include "hw.h"
+#include "iommu.h"
 #include "pci.h"
 #include "monitor.h"
 #include "net.h"
@@ -733,12 +734,25 @@ static void do_pci_unregister_device(PCIDevice *pci_dev)
     pci_config_free(pci_dev);
 }
 
+#ifdef CONFIG_IOMMU
+static inline int pci_iommu_register_device(PCIBus *bus, PCIDevice *dev)
+{
+    return iommu_register_device(bus->qbus.iommu, &dev->qdev);
+}
+#else
+static inline int pci_iommu_register_device(PCIBus *bus, PCIDevice *dev)
+{
+    return 0;
+}
+#endif
+
 PCIDevice *pci_register_device(PCIBus *bus, const char *name,
                                int instance_size, int devfn,
                                PCIConfigReadFunc *config_read,
                                PCIConfigWriteFunc *config_write)
 {
     PCIDevice *pci_dev;
+    int err;
 
     pci_dev = qemu_mallocz(instance_size);
     pci_dev = do_pci_register_device(pci_dev, bus, name, devfn,
@@ -747,6 +761,13 @@ PCIDevice *pci_register_device(PCIBus *bus, const char *name,
     if (pci_dev == NULL) {
         hw_error("PCI: can't register device\n");
     }
+
+    err = pci_iommu_register_device(bus, pci_dev);
+    if (err) {
+        hw_error("PCI: can't register device with IOMMU\n");
+        return NULL;
+    }
+
     return pci_dev;
 }
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [RFC PATCH 4/7] ide: IOMMU support
  2010-07-14  5:45 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-07-14  5:45   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-14  5:45 UTC (permalink / raw)
  To: joro; +Cc: avi, paul, kvm, qemu-devel, Eduard - Gabriel Munteanu

Memory accesses must go through the IOMMU layer.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/ide/core.c |   46 +++++++++++++++++++++++++++++++---------------
 1 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 0b3b7c2..7f8f7df 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -26,6 +26,7 @@
 #include <hw/pc.h>
 #include <hw/pci.h>
 #include <hw/scsi.h>
+#include <hw/iommu.h>
 #include "qemu-timer.h"
 #include "sysemu.h"
 #include "dma.h"
@@ -433,7 +434,12 @@ static int dma_buf_prepare(BMDMAState *bm, int is_write)
         uint32_t addr;
         uint32_t size;
     } prd;
-    int l, len;
+    int l, len, err, io_len;
+    struct iommu *iommu;
+    DeviceState *dev;
+    target_phys_addr_t io_addr;
+
+    iommu = iommu_get(s->bus->qbus.parent, &dev);
 
     qemu_sglist_init(&s->sg, s->nsector / (IDE_PAGE_SIZE / 512) + 1);
     s->io_buffer_size = 0;
@@ -443,7 +449,7 @@ static int dma_buf_prepare(BMDMAState *bm, int is_write)
             if (bm->cur_prd_last ||
                 (bm->cur_addr - bm->addr) >= IDE_PAGE_SIZE)
                 return s->io_buffer_size != 0;
-            cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
+            err = iommu_read(iommu, dev, bm->cur_addr, (uint8_t *)&prd, 8);
             bm->cur_addr += 8;
             prd.addr = le32_to_cpu(prd.addr);
             prd.size = le32_to_cpu(prd.size);
@@ -455,11 +461,22 @@ static int dma_buf_prepare(BMDMAState *bm, int is_write)
             bm->cur_prd_last = (prd.size & 0x80000000);
         }
         l = bm->cur_prd_len;
-        if (l > 0) {
-            qemu_sglist_add(&s->sg, bm->cur_prd_addr, l);
-            bm->cur_prd_addr += l;
-            bm->cur_prd_len -= l;
-            s->io_buffer_size += l;
+        while (l > 0) {
+            /*
+             * In case translation / access checking fails no
+             * transfer happens but we pretend it went through.
+             */
+            err = iommu_translate(iommu, dev, bm->cur_prd_addr,
+                                  &io_addr, &io_len, !is_write);
+            if (!err) {
+                if (io_len > l)
+                    io_len = l;
+                qemu_sglist_add(&s->sg, io_addr, io_len);
+            }
+            bm->cur_prd_addr += io_len;
+            bm->cur_prd_len -= io_len;
+            s->io_buffer_size += io_len;
+            l -= io_len;
         }
     }
     return 1;
@@ -516,6 +533,10 @@ static int dma_buf_rw(BMDMAState *bm, int is_write)
         uint32_t size;
     } prd;
     int l, len;
+    struct iommu *iommu;
+    DeviceState *dev;
+
+    iommu = iommu_get(s->bus->qbus.parent, &dev);
 
     for(;;) {
         l = s->io_buffer_size - s->io_buffer_index;
@@ -526,7 +547,7 @@ static int dma_buf_rw(BMDMAState *bm, int is_write)
             if (bm->cur_prd_last ||
                 (bm->cur_addr - bm->addr) >= IDE_PAGE_SIZE)
                 return 0;
-            cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
+            iommu_read(iommu, dev, bm->cur_addr, (uint8_t *)&prd, 8);
             bm->cur_addr += 8;
             prd.addr = le32_to_cpu(prd.addr);
             prd.size = le32_to_cpu(prd.size);
@@ -540,13 +561,8 @@ static int dma_buf_rw(BMDMAState *bm, int is_write)
         if (l > bm->cur_prd_len)
             l = bm->cur_prd_len;
         if (l > 0) {
-            if (is_write) {
-                cpu_physical_memory_write(bm->cur_prd_addr,
-                                          s->io_buffer + s->io_buffer_index, l);
-            } else {
-                cpu_physical_memory_read(bm->cur_prd_addr,
-                                          s->io_buffer + s->io_buffer_index, l);
-            }
+            iommu_rw(iommu, dev, bm->cur_prd_addr,
+                     s->io_buffer + s->io_buffer_index, l, is_write);
             bm->cur_prd_addr += l;
             bm->cur_prd_len -= l;
             s->io_buffer_index += l;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [Qemu-devel] [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-14  5:45   ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-14  5:45 UTC (permalink / raw)
  To: joro; +Cc: qemu-devel, Eduard - Gabriel Munteanu, avi, kvm, paul

Memory accesses must go through the IOMMU layer.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/ide/core.c |   46 +++++++++++++++++++++++++++++++---------------
 1 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 0b3b7c2..7f8f7df 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -26,6 +26,7 @@
 #include <hw/pc.h>
 #include <hw/pci.h>
 #include <hw/scsi.h>
+#include <hw/iommu.h>
 #include "qemu-timer.h"
 #include "sysemu.h"
 #include "dma.h"
@@ -433,7 +434,12 @@ static int dma_buf_prepare(BMDMAState *bm, int is_write)
         uint32_t addr;
         uint32_t size;
     } prd;
-    int l, len;
+    int l, len, err, io_len;
+    struct iommu *iommu;
+    DeviceState *dev;
+    target_phys_addr_t io_addr;
+
+    iommu = iommu_get(s->bus->qbus.parent, &dev);
 
     qemu_sglist_init(&s->sg, s->nsector / (IDE_PAGE_SIZE / 512) + 1);
     s->io_buffer_size = 0;
@@ -443,7 +449,7 @@ static int dma_buf_prepare(BMDMAState *bm, int is_write)
             if (bm->cur_prd_last ||
                 (bm->cur_addr - bm->addr) >= IDE_PAGE_SIZE)
                 return s->io_buffer_size != 0;
-            cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
+            err = iommu_read(iommu, dev, bm->cur_addr, (uint8_t *)&prd, 8);
             bm->cur_addr += 8;
             prd.addr = le32_to_cpu(prd.addr);
             prd.size = le32_to_cpu(prd.size);
@@ -455,11 +461,22 @@ static int dma_buf_prepare(BMDMAState *bm, int is_write)
             bm->cur_prd_last = (prd.size & 0x80000000);
         }
         l = bm->cur_prd_len;
-        if (l > 0) {
-            qemu_sglist_add(&s->sg, bm->cur_prd_addr, l);
-            bm->cur_prd_addr += l;
-            bm->cur_prd_len -= l;
-            s->io_buffer_size += l;
+        while (l > 0) {
+            /*
+             * In case translation / access checking fails no
+             * transfer happens but we pretend it went through.
+             */
+            err = iommu_translate(iommu, dev, bm->cur_prd_addr,
+                                  &io_addr, &io_len, !is_write);
+            if (!err) {
+                if (io_len > l)
+                    io_len = l;
+                qemu_sglist_add(&s->sg, io_addr, io_len);
+            }
+            bm->cur_prd_addr += io_len;
+            bm->cur_prd_len -= io_len;
+            s->io_buffer_size += io_len;
+            l -= io_len;
         }
     }
     return 1;
@@ -516,6 +533,10 @@ static int dma_buf_rw(BMDMAState *bm, int is_write)
         uint32_t size;
     } prd;
     int l, len;
+    struct iommu *iommu;
+    DeviceState *dev;
+
+    iommu = iommu_get(s->bus->qbus.parent, &dev);
 
     for(;;) {
         l = s->io_buffer_size - s->io_buffer_index;
@@ -526,7 +547,7 @@ static int dma_buf_rw(BMDMAState *bm, int is_write)
             if (bm->cur_prd_last ||
                 (bm->cur_addr - bm->addr) >= IDE_PAGE_SIZE)
                 return 0;
-            cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
+            iommu_read(iommu, dev, bm->cur_addr, (uint8_t *)&prd, 8);
             bm->cur_addr += 8;
             prd.addr = le32_to_cpu(prd.addr);
             prd.size = le32_to_cpu(prd.size);
@@ -540,13 +561,8 @@ static int dma_buf_rw(BMDMAState *bm, int is_write)
         if (l > bm->cur_prd_len)
             l = bm->cur_prd_len;
         if (l > 0) {
-            if (is_write) {
-                cpu_physical_memory_write(bm->cur_prd_addr,
-                                          s->io_buffer + s->io_buffer_index, l);
-            } else {
-                cpu_physical_memory_read(bm->cur_prd_addr,
-                                          s->io_buffer + s->io_buffer_index, l);
-            }
+            iommu_rw(iommu, dev, bm->cur_prd_addr,
+                     s->io_buffer + s->io_buffer_index, l, is_write);
             bm->cur_prd_addr += l;
             bm->cur_prd_len -= l;
             s->io_buffer_index += l;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [RFC PATCH 5/7] rtl8139: IOMMU support
  2010-07-14  5:45 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-07-14  5:45   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-14  5:45 UTC (permalink / raw)
  To: joro; +Cc: avi, paul, kvm, qemu-devel, Eduard - Gabriel Munteanu

Memory accesses must go through the IOMMU layer.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/rtl8139.c |   98 ++++++++++++++++++++++++++++++++++++---------------------
 1 files changed, 62 insertions(+), 36 deletions(-)

diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index 72e2242..0f78a69 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -48,6 +48,7 @@
  */
 
 #include "hw.h"
+#include "iommu.h"
 #include "pci.h"
 #include "qemu-timer.h"
 #include "net.h"
@@ -416,7 +417,7 @@ typedef struct RTL8139TallyCounters
 static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters);
 
 /* Writes tally counters to specified physical memory address */
-static void RTL8139TallyCounters_physical_memory_write(target_phys_addr_t tc_addr, RTL8139TallyCounters* counters);
+static void RTL8139TallyCounters_physical_memory_write(DeviceState *qdev, target_phys_addr_t tc_addr, RTL8139TallyCounters* counters);
 
 typedef struct RTL8139State {
     PCIDevice dev;
@@ -746,6 +747,11 @@ static int rtl8139_cp_transmitter_enabled(RTL8139State *s)
 
 static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size)
 {
+    struct iommu *iommu;
+    DeviceState *dev;
+
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
     if (s->RxBufAddr + size > s->RxBufferSize)
     {
         int wrapped = MOD2(s->RxBufAddr + size, s->RxBufferSize);
@@ -757,15 +763,15 @@ static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size)
 
             if (size > wrapped)
             {
-                cpu_physical_memory_write( s->RxBuf + s->RxBufAddr,
-                                           buf, size-wrapped );
+                iommu_write(iommu, dev, s->RxBuf + s->RxBufAddr,
+                            buf, size - wrapped);
             }
 
             /* reset buffer pointer */
             s->RxBufAddr = 0;
 
-            cpu_physical_memory_write( s->RxBuf + s->RxBufAddr,
-                                       buf + (size-wrapped), wrapped );
+            iommu_write(iommu, dev, s->RxBuf + s->RxBufAddr,
+                        buf + (size - wrapped), wrapped);
 
             s->RxBufAddr = wrapped;
 
@@ -774,7 +780,7 @@ static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size)
     }
 
     /* non-wrapping path or overwrapping enabled */
-    cpu_physical_memory_write( s->RxBuf + s->RxBufAddr, buf, size );
+    iommu_write(iommu, dev, s->RxBuf + s->RxBufAddr, buf, size);
 
     s->RxBufAddr += size;
 }
@@ -822,6 +828,11 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
     static const uint8_t broadcast_macaddr[6] =
         { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
 
+    struct iommu *iommu;
+    DeviceState *dev;
+
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
     DEBUG_PRINT((">>> RTL8139: received len=%d\n", size));
 
     /* test if board clock is stopped */
@@ -968,13 +979,13 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 
         uint32_t val, rxdw0,rxdw1,rxbufLO,rxbufHI;
 
-        cpu_physical_memory_read(cplus_rx_ring_desc,    (uint8_t *)&val, 4);
+        iommu_read(iommu, dev, cplus_rx_ring_desc, (uint8_t *) &val, 4);
         rxdw0 = le32_to_cpu(val);
-        cpu_physical_memory_read(cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
+        iommu_read(iommu, dev, cplus_rx_ring_desc + 4, (uint8_t *) &val, 4);
         rxdw1 = le32_to_cpu(val);
-        cpu_physical_memory_read(cplus_rx_ring_desc+8,  (uint8_t *)&val, 4);
+        iommu_read(iommu, dev, cplus_rx_ring_desc + 8,  (uint8_t *) &val, 4);
         rxbufLO = le32_to_cpu(val);
-        cpu_physical_memory_read(cplus_rx_ring_desc+12, (uint8_t *)&val, 4);
+        iommu_read(iommu, dev, cplus_rx_ring_desc + 12, (uint8_t *) &val, 4);
         rxbufHI = le32_to_cpu(val);
 
         DEBUG_PRINT(("RTL8139: +++ C+ mode RX descriptor %d %08x %08x %08x %08x\n",
@@ -1019,7 +1030,7 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
         target_phys_addr_t rx_addr = rtl8139_addr64(rxbufLO, rxbufHI);
 
         /* receive/copy to target memory */
-        cpu_physical_memory_write( rx_addr, buf, size );
+        iommu_write(iommu, dev, rx_addr, buf, size);
 
         if (s->CpCmd & CPlusRxChkSum)
         {
@@ -1032,7 +1043,7 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 #else
         val = 0;
 #endif
-        cpu_physical_memory_write( rx_addr+size, (uint8_t *)&val, 4);
+        iommu_write(iommu, dev, rx_addr + size, (uint8_t *) &val, 4);
 
 /* first segment of received packet flag */
 #define CP_RX_STATUS_FS (1<<29)
@@ -1081,9 +1092,9 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 
         /* update ring data */
         val = cpu_to_le32(rxdw0);
-        cpu_physical_memory_write(cplus_rx_ring_desc,    (uint8_t *)&val, 4);
+        iommu_write(iommu, dev, cplus_rx_ring_desc, (uint8_t *) &val, 4);
         val = cpu_to_le32(rxdw1);
-        cpu_physical_memory_write(cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
+        iommu_write(iommu, dev, cplus_rx_ring_desc + 4, (uint8_t *) &val, 4);
 
         /* update tally counter */
         ++s->tally_counters.RxOk;
@@ -1279,50 +1290,54 @@ static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters)
     counters->TxUndrn = 0;
 }
 
-static void RTL8139TallyCounters_physical_memory_write(target_phys_addr_t tc_addr, RTL8139TallyCounters* tally_counters)
+static void RTL8139TallyCounters_physical_memory_write(DeviceState *qdev, target_phys_addr_t tc_addr, RTL8139TallyCounters* tally_counters)
 {
+    struct iommu *iommu;
+    DeviceState *dev;
     uint16_t val16;
     uint32_t val32;
     uint64_t val64;
 
+    iommu = iommu_get(qdev, &dev);
+
     val64 = cpu_to_le64(tally_counters->TxOk);
-    cpu_physical_memory_write(tc_addr + 0,    (uint8_t *)&val64, 8);
+    iommu_write(iommu, dev, tc_addr + 0,     (uint8_t *) &val64, 8);
 
     val64 = cpu_to_le64(tally_counters->RxOk);
-    cpu_physical_memory_write(tc_addr + 8,    (uint8_t *)&val64, 8);
+    iommu_write(iommu, dev, tc_addr + 8,     (uint8_t *) &val64, 8);
 
     val64 = cpu_to_le64(tally_counters->TxERR);
-    cpu_physical_memory_write(tc_addr + 16,    (uint8_t *)&val64, 8);
+    iommu_write(iommu, dev, tc_addr + 16,    (uint8_t *) &val64, 8);
 
     val32 = cpu_to_le32(tally_counters->RxERR);
-    cpu_physical_memory_write(tc_addr + 24,    (uint8_t *)&val32, 4);
+    iommu_write(iommu, dev, tc_addr + 24,    (uint8_t *) &val32, 4);
 
     val16 = cpu_to_le16(tally_counters->MissPkt);
-    cpu_physical_memory_write(tc_addr + 28,    (uint8_t *)&val16, 2);
+    iommu_write(iommu, dev, tc_addr + 28,    (uint8_t *) &val16, 2);
 
     val16 = cpu_to_le16(tally_counters->FAE);
-    cpu_physical_memory_write(tc_addr + 30,    (uint8_t *)&val16, 2);
+    iommu_write(iommu, dev, tc_addr + 30,    (uint8_t *) &val16, 2);
 
     val32 = cpu_to_le32(tally_counters->Tx1Col);
-    cpu_physical_memory_write(tc_addr + 32,    (uint8_t *)&val32, 4);
+    iommu_write(iommu, dev, tc_addr + 32,    (uint8_t *) &val32, 4);
 
     val32 = cpu_to_le32(tally_counters->TxMCol);
-    cpu_physical_memory_write(tc_addr + 36,    (uint8_t *)&val32, 4);
+    iommu_write(iommu, dev, tc_addr + 36,    (uint8_t *) &val32, 4);
 
     val64 = cpu_to_le64(tally_counters->RxOkPhy);
-    cpu_physical_memory_write(tc_addr + 40,    (uint8_t *)&val64, 8);
+    iommu_write(iommu, dev, tc_addr + 40,    (uint8_t *) &val64, 8);
 
     val64 = cpu_to_le64(tally_counters->RxOkBrd);
-    cpu_physical_memory_write(tc_addr + 48,    (uint8_t *)&val64, 8);
+    iommu_write(iommu, dev, tc_addr + 48,    (uint8_t *) &val64, 8);
 
     val32 = cpu_to_le32(tally_counters->RxOkMul);
-    cpu_physical_memory_write(tc_addr + 56,    (uint8_t *)&val32, 4);
+    iommu_write(iommu, dev, tc_addr + 56,    (uint8_t *) &val32, 4);
 
     val16 = cpu_to_le16(tally_counters->TxAbt);
-    cpu_physical_memory_write(tc_addr + 60,    (uint8_t *)&val16, 2);
+    iommu_write(iommu, dev, tc_addr + 60,    (uint8_t *) &val16, 2);
 
     val16 = cpu_to_le16(tally_counters->TxUndrn);
-    cpu_physical_memory_write(tc_addr + 62,    (uint8_t *)&val16, 2);
+    iommu_write(iommu, dev, tc_addr + 62,    (uint8_t *) &val16, 2);
 }
 
 /* Loads values of tally counters from VM state file */
@@ -1758,6 +1773,11 @@ static void rtl8139_transfer_frame(RTL8139State *s, const uint8_t *buf, int size
 
 static int rtl8139_transmit_one(RTL8139State *s, int descriptor)
 {
+    struct iommu *iommu;
+    DeviceState *dev;
+
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
     if (!rtl8139_transmitter_enabled(s))
     {
         DEBUG_PRINT(("RTL8139: +++ cannot transmit from descriptor %d: transmitter disabled\n",
@@ -1780,7 +1800,7 @@ static int rtl8139_transmit_one(RTL8139State *s, int descriptor)
     DEBUG_PRINT(("RTL8139: +++ transmit reading %d bytes from host memory at 0x%08x\n",
                  txsize, s->TxAddr[descriptor]));
 
-    cpu_physical_memory_read(s->TxAddr[descriptor], txbuffer, txsize);
+    iommu_read(iommu, dev, s->TxAddr[descriptor], txbuffer, txsize);
 
     /* Mark descriptor as transferred */
     s->TxStatus[descriptor] |= TxHostOwns;
@@ -1886,6 +1906,11 @@ static uint16_t ip_checksum(void *data, size_t len)
 
 static int rtl8139_cplus_transmit_one(RTL8139State *s)
 {
+    struct iommu *iommu;
+    DeviceState *dev;
+
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
     if (!rtl8139_transmitter_enabled(s))
     {
         DEBUG_PRINT(("RTL8139: +++ C+ mode: transmitter disabled\n"));
@@ -1911,14 +1936,14 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
 
     uint32_t val, txdw0,txdw1,txbufLO,txbufHI;
 
-    cpu_physical_memory_read(cplus_tx_ring_desc,    (uint8_t *)&val, 4);
+    iommu_read(iommu, dev, cplus_tx_ring_desc,    (uint8_t *) &val, 4);
     txdw0 = le32_to_cpu(val);
     /* TODO: implement VLAN tagging support, VLAN tag data is read to txdw1 */
-    cpu_physical_memory_read(cplus_tx_ring_desc+4,  (uint8_t *)&val, 4);
+    iommu_read(iommu, dev, cplus_tx_ring_desc+4,  (uint8_t *) &val, 4);
     txdw1 = le32_to_cpu(val);
-    cpu_physical_memory_read(cplus_tx_ring_desc+8,  (uint8_t *)&val, 4);
+    iommu_read(iommu, dev, cplus_tx_ring_desc+8,  (uint8_t *) &val, 4);
     txbufLO = le32_to_cpu(val);
-    cpu_physical_memory_read(cplus_tx_ring_desc+12, (uint8_t *)&val, 4);
+    iommu_read(iommu, dev, cplus_tx_ring_desc+12, (uint8_t *) &val, 4);
     txbufHI = le32_to_cpu(val);
 
     DEBUG_PRINT(("RTL8139: +++ C+ mode TX descriptor %d %08x %08x %08x %08x\n",
@@ -2025,7 +2050,8 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
     DEBUG_PRINT(("RTL8139: +++ C+ mode transmit reading %d bytes from host memory at %016" PRIx64 " to offset %d\n",
                  txsize, (uint64_t)tx_addr, s->cplus_txbuffer_offset));
 
-    cpu_physical_memory_read(tx_addr, s->cplus_txbuffer + s->cplus_txbuffer_offset, txsize);
+    iommu_read(iommu, dev, tx_addr,
+               s->cplus_txbuffer + s->cplus_txbuffer_offset, txsize);
     s->cplus_txbuffer_offset += txsize;
 
     /* seek to next Rx descriptor */
@@ -2052,7 +2078,7 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
 
     /* update ring data */
     val = cpu_to_le32(txdw0);
-    cpu_physical_memory_write(cplus_tx_ring_desc,    (uint8_t *)&val, 4);
+    iommu_write(iommu, dev, cplus_tx_ring_desc, (uint8_t *) &val, 4);
     /* TODO: implement VLAN tagging support, VLAN tag data is read to txdw1 */
 //    val = cpu_to_le32(txdw1);
 //    cpu_physical_memory_write(cplus_tx_ring_desc+4,  &val, 4);
@@ -2381,7 +2407,7 @@ static void rtl8139_TxStatus_write(RTL8139State *s, uint32_t txRegOffset, uint32
             target_phys_addr_t tc_addr = rtl8139_addr64(s->TxStatus[0] & ~0x3f, s->TxStatus[1]);
 
             /* dump tally counters to specified memory location */
-            RTL8139TallyCounters_physical_memory_write( tc_addr, &s->tally_counters);
+            RTL8139TallyCounters_physical_memory_write(&s->dev.qdev, tc_addr, &s->tally_counters);
 
             /* mark dump completed */
             s->TxStatus[0] &= ~0x8;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [Qemu-devel] [RFC PATCH 5/7] rtl8139: IOMMU support
@ 2010-07-14  5:45   ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-14  5:45 UTC (permalink / raw)
  To: joro; +Cc: qemu-devel, Eduard - Gabriel Munteanu, avi, kvm, paul

Memory accesses must go through the IOMMU layer.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/rtl8139.c |   98 ++++++++++++++++++++++++++++++++++++---------------------
 1 files changed, 62 insertions(+), 36 deletions(-)

diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index 72e2242..0f78a69 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -48,6 +48,7 @@
  */
 
 #include "hw.h"
+#include "iommu.h"
 #include "pci.h"
 #include "qemu-timer.h"
 #include "net.h"
@@ -416,7 +417,7 @@ typedef struct RTL8139TallyCounters
 static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters);
 
 /* Writes tally counters to specified physical memory address */
-static void RTL8139TallyCounters_physical_memory_write(target_phys_addr_t tc_addr, RTL8139TallyCounters* counters);
+static void RTL8139TallyCounters_physical_memory_write(DeviceState *qdev, target_phys_addr_t tc_addr, RTL8139TallyCounters* counters);
 
 typedef struct RTL8139State {
     PCIDevice dev;
@@ -746,6 +747,11 @@ static int rtl8139_cp_transmitter_enabled(RTL8139State *s)
 
 static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size)
 {
+    struct iommu *iommu;
+    DeviceState *dev;
+
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
     if (s->RxBufAddr + size > s->RxBufferSize)
     {
         int wrapped = MOD2(s->RxBufAddr + size, s->RxBufferSize);
@@ -757,15 +763,15 @@ static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size)
 
             if (size > wrapped)
             {
-                cpu_physical_memory_write( s->RxBuf + s->RxBufAddr,
-                                           buf, size-wrapped );
+                iommu_write(iommu, dev, s->RxBuf + s->RxBufAddr,
+                            buf, size - wrapped);
             }
 
             /* reset buffer pointer */
             s->RxBufAddr = 0;
 
-            cpu_physical_memory_write( s->RxBuf + s->RxBufAddr,
-                                       buf + (size-wrapped), wrapped );
+            iommu_write(iommu, dev, s->RxBuf + s->RxBufAddr,
+                        buf + (size - wrapped), wrapped);
 
             s->RxBufAddr = wrapped;
 
@@ -774,7 +780,7 @@ static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size)
     }
 
     /* non-wrapping path or overwrapping enabled */
-    cpu_physical_memory_write( s->RxBuf + s->RxBufAddr, buf, size );
+    iommu_write(iommu, dev, s->RxBuf + s->RxBufAddr, buf, size);
 
     s->RxBufAddr += size;
 }
@@ -822,6 +828,11 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
     static const uint8_t broadcast_macaddr[6] =
         { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
 
+    struct iommu *iommu;
+    DeviceState *dev;
+
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
     DEBUG_PRINT((">>> RTL8139: received len=%d\n", size));
 
     /* test if board clock is stopped */
@@ -968,13 +979,13 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 
         uint32_t val, rxdw0,rxdw1,rxbufLO,rxbufHI;
 
-        cpu_physical_memory_read(cplus_rx_ring_desc,    (uint8_t *)&val, 4);
+        iommu_read(iommu, dev, cplus_rx_ring_desc, (uint8_t *) &val, 4);
         rxdw0 = le32_to_cpu(val);
-        cpu_physical_memory_read(cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
+        iommu_read(iommu, dev, cplus_rx_ring_desc + 4, (uint8_t *) &val, 4);
         rxdw1 = le32_to_cpu(val);
-        cpu_physical_memory_read(cplus_rx_ring_desc+8,  (uint8_t *)&val, 4);
+        iommu_read(iommu, dev, cplus_rx_ring_desc + 8,  (uint8_t *) &val, 4);
         rxbufLO = le32_to_cpu(val);
-        cpu_physical_memory_read(cplus_rx_ring_desc+12, (uint8_t *)&val, 4);
+        iommu_read(iommu, dev, cplus_rx_ring_desc + 12, (uint8_t *) &val, 4);
         rxbufHI = le32_to_cpu(val);
 
         DEBUG_PRINT(("RTL8139: +++ C+ mode RX descriptor %d %08x %08x %08x %08x\n",
@@ -1019,7 +1030,7 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
         target_phys_addr_t rx_addr = rtl8139_addr64(rxbufLO, rxbufHI);
 
         /* receive/copy to target memory */
-        cpu_physical_memory_write( rx_addr, buf, size );
+        iommu_write(iommu, dev, rx_addr, buf, size);
 
         if (s->CpCmd & CPlusRxChkSum)
         {
@@ -1032,7 +1043,7 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 #else
         val = 0;
 #endif
-        cpu_physical_memory_write( rx_addr+size, (uint8_t *)&val, 4);
+        iommu_write(iommu, dev, rx_addr + size, (uint8_t *) &val, 4);
 
 /* first segment of received packet flag */
 #define CP_RX_STATUS_FS (1<<29)
@@ -1081,9 +1092,9 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 
         /* update ring data */
         val = cpu_to_le32(rxdw0);
-        cpu_physical_memory_write(cplus_rx_ring_desc,    (uint8_t *)&val, 4);
+        iommu_write(iommu, dev, cplus_rx_ring_desc, (uint8_t *) &val, 4);
         val = cpu_to_le32(rxdw1);
-        cpu_physical_memory_write(cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
+        iommu_write(iommu, dev, cplus_rx_ring_desc + 4, (uint8_t *) &val, 4);
 
         /* update tally counter */
         ++s->tally_counters.RxOk;
@@ -1279,50 +1290,54 @@ static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters)
     counters->TxUndrn = 0;
 }
 
-static void RTL8139TallyCounters_physical_memory_write(target_phys_addr_t tc_addr, RTL8139TallyCounters* tally_counters)
+static void RTL8139TallyCounters_physical_memory_write(DeviceState *qdev, target_phys_addr_t tc_addr, RTL8139TallyCounters* tally_counters)
 {
+    struct iommu *iommu;
+    DeviceState *dev;
     uint16_t val16;
     uint32_t val32;
     uint64_t val64;
 
+    iommu = iommu_get(qdev, &dev);
+
     val64 = cpu_to_le64(tally_counters->TxOk);
-    cpu_physical_memory_write(tc_addr + 0,    (uint8_t *)&val64, 8);
+    iommu_write(iommu, dev, tc_addr + 0,     (uint8_t *) &val64, 8);
 
     val64 = cpu_to_le64(tally_counters->RxOk);
-    cpu_physical_memory_write(tc_addr + 8,    (uint8_t *)&val64, 8);
+    iommu_write(iommu, dev, tc_addr + 8,     (uint8_t *) &val64, 8);
 
     val64 = cpu_to_le64(tally_counters->TxERR);
-    cpu_physical_memory_write(tc_addr + 16,    (uint8_t *)&val64, 8);
+    iommu_write(iommu, dev, tc_addr + 16,    (uint8_t *) &val64, 8);
 
     val32 = cpu_to_le32(tally_counters->RxERR);
-    cpu_physical_memory_write(tc_addr + 24,    (uint8_t *)&val32, 4);
+    iommu_write(iommu, dev, tc_addr + 24,    (uint8_t *) &val32, 4);
 
     val16 = cpu_to_le16(tally_counters->MissPkt);
-    cpu_physical_memory_write(tc_addr + 28,    (uint8_t *)&val16, 2);
+    iommu_write(iommu, dev, tc_addr + 28,    (uint8_t *) &val16, 2);
 
     val16 = cpu_to_le16(tally_counters->FAE);
-    cpu_physical_memory_write(tc_addr + 30,    (uint8_t *)&val16, 2);
+    iommu_write(iommu, dev, tc_addr + 30,    (uint8_t *) &val16, 2);
 
     val32 = cpu_to_le32(tally_counters->Tx1Col);
-    cpu_physical_memory_write(tc_addr + 32,    (uint8_t *)&val32, 4);
+    iommu_write(iommu, dev, tc_addr + 32,    (uint8_t *) &val32, 4);
 
     val32 = cpu_to_le32(tally_counters->TxMCol);
-    cpu_physical_memory_write(tc_addr + 36,    (uint8_t *)&val32, 4);
+    iommu_write(iommu, dev, tc_addr + 36,    (uint8_t *) &val32, 4);
 
     val64 = cpu_to_le64(tally_counters->RxOkPhy);
-    cpu_physical_memory_write(tc_addr + 40,    (uint8_t *)&val64, 8);
+    iommu_write(iommu, dev, tc_addr + 40,    (uint8_t *) &val64, 8);
 
     val64 = cpu_to_le64(tally_counters->RxOkBrd);
-    cpu_physical_memory_write(tc_addr + 48,    (uint8_t *)&val64, 8);
+    iommu_write(iommu, dev, tc_addr + 48,    (uint8_t *) &val64, 8);
 
     val32 = cpu_to_le32(tally_counters->RxOkMul);
-    cpu_physical_memory_write(tc_addr + 56,    (uint8_t *)&val32, 4);
+    iommu_write(iommu, dev, tc_addr + 56,    (uint8_t *) &val32, 4);
 
     val16 = cpu_to_le16(tally_counters->TxAbt);
-    cpu_physical_memory_write(tc_addr + 60,    (uint8_t *)&val16, 2);
+    iommu_write(iommu, dev, tc_addr + 60,    (uint8_t *) &val16, 2);
 
     val16 = cpu_to_le16(tally_counters->TxUndrn);
-    cpu_physical_memory_write(tc_addr + 62,    (uint8_t *)&val16, 2);
+    iommu_write(iommu, dev, tc_addr + 62,    (uint8_t *) &val16, 2);
 }
 
 /* Loads values of tally counters from VM state file */
@@ -1758,6 +1773,11 @@ static void rtl8139_transfer_frame(RTL8139State *s, const uint8_t *buf, int size
 
 static int rtl8139_transmit_one(RTL8139State *s, int descriptor)
 {
+    struct iommu *iommu;
+    DeviceState *dev;
+
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
     if (!rtl8139_transmitter_enabled(s))
     {
         DEBUG_PRINT(("RTL8139: +++ cannot transmit from descriptor %d: transmitter disabled\n",
@@ -1780,7 +1800,7 @@ static int rtl8139_transmit_one(RTL8139State *s, int descriptor)
     DEBUG_PRINT(("RTL8139: +++ transmit reading %d bytes from host memory at 0x%08x\n",
                  txsize, s->TxAddr[descriptor]));
 
-    cpu_physical_memory_read(s->TxAddr[descriptor], txbuffer, txsize);
+    iommu_read(iommu, dev, s->TxAddr[descriptor], txbuffer, txsize);
 
     /* Mark descriptor as transferred */
     s->TxStatus[descriptor] |= TxHostOwns;
@@ -1886,6 +1906,11 @@ static uint16_t ip_checksum(void *data, size_t len)
 
 static int rtl8139_cplus_transmit_one(RTL8139State *s)
 {
+    struct iommu *iommu;
+    DeviceState *dev;
+
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
     if (!rtl8139_transmitter_enabled(s))
     {
         DEBUG_PRINT(("RTL8139: +++ C+ mode: transmitter disabled\n"));
@@ -1911,14 +1936,14 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
 
     uint32_t val, txdw0,txdw1,txbufLO,txbufHI;
 
-    cpu_physical_memory_read(cplus_tx_ring_desc,    (uint8_t *)&val, 4);
+    iommu_read(iommu, dev, cplus_tx_ring_desc,    (uint8_t *) &val, 4);
     txdw0 = le32_to_cpu(val);
     /* TODO: implement VLAN tagging support, VLAN tag data is read to txdw1 */
-    cpu_physical_memory_read(cplus_tx_ring_desc+4,  (uint8_t *)&val, 4);
+    iommu_read(iommu, dev, cplus_tx_ring_desc+4,  (uint8_t *) &val, 4);
     txdw1 = le32_to_cpu(val);
-    cpu_physical_memory_read(cplus_tx_ring_desc+8,  (uint8_t *)&val, 4);
+    iommu_read(iommu, dev, cplus_tx_ring_desc+8,  (uint8_t *) &val, 4);
     txbufLO = le32_to_cpu(val);
-    cpu_physical_memory_read(cplus_tx_ring_desc+12, (uint8_t *)&val, 4);
+    iommu_read(iommu, dev, cplus_tx_ring_desc+12, (uint8_t *) &val, 4);
     txbufHI = le32_to_cpu(val);
 
     DEBUG_PRINT(("RTL8139: +++ C+ mode TX descriptor %d %08x %08x %08x %08x\n",
@@ -2025,7 +2050,8 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
     DEBUG_PRINT(("RTL8139: +++ C+ mode transmit reading %d bytes from host memory at %016" PRIx64 " to offset %d\n",
                  txsize, (uint64_t)tx_addr, s->cplus_txbuffer_offset));
 
-    cpu_physical_memory_read(tx_addr, s->cplus_txbuffer + s->cplus_txbuffer_offset, txsize);
+    iommu_read(iommu, dev, tx_addr,
+               s->cplus_txbuffer + s->cplus_txbuffer_offset, txsize);
     s->cplus_txbuffer_offset += txsize;
 
     /* seek to next Rx descriptor */
@@ -2052,7 +2078,7 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
 
     /* update ring data */
     val = cpu_to_le32(txdw0);
-    cpu_physical_memory_write(cplus_tx_ring_desc,    (uint8_t *)&val, 4);
+    iommu_write(iommu, dev, cplus_tx_ring_desc, (uint8_t *) &val, 4);
     /* TODO: implement VLAN tagging support, VLAN tag data is read to txdw1 */
 //    val = cpu_to_le32(txdw1);
 //    cpu_physical_memory_write(cplus_tx_ring_desc+4,  &val, 4);
@@ -2381,7 +2407,7 @@ static void rtl8139_TxStatus_write(RTL8139State *s, uint32_t txRegOffset, uint32
             target_phys_addr_t tc_addr = rtl8139_addr64(s->TxStatus[0] & ~0x3f, s->TxStatus[1]);
 
             /* dump tally counters to specified memory location */
-            RTL8139TallyCounters_physical_memory_write( tc_addr, &s->tally_counters);
+            RTL8139TallyCounters_physical_memory_write(&s->dev.qdev, tc_addr, &s->tally_counters);
 
             /* mark dump completed */
             s->TxStatus[0] &= ~0x8;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [RFC PATCH 6/7] eepro100: IOMMU support
  2010-07-14  5:45 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-07-14  5:45   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-14  5:45 UTC (permalink / raw)
  To: joro; +Cc: avi, paul, kvm, qemu-devel, Eduard - Gabriel Munteanu

Memory accesses must go through the IOMMU layer.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/eepro100.c |  141 +++++++++++++++++++++++++++++++++++++++++----------------
 1 files changed, 101 insertions(+), 40 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index 97afa2c..74e1d15 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -43,6 +43,7 @@
 
 #include <stddef.h>             /* offsetof */
 #include "hw.h"
+#include "iommu.h"
 #include "pci.h"
 #include "net.h"
 #include "eeprom93xx.h"
@@ -306,10 +307,13 @@ static const uint16_t eepro100_mdi_mask[] = {
 };
 
 /* XXX: optimize */
-static void stl_le_phys(target_phys_addr_t addr, uint32_t val)
+static void stl_le_phys(struct iommu *iommu,
+                        DeviceState *dev,
+                        target_phys_addr_t addr,
+                        uint32_t val)
 {
     val = cpu_to_le32(val);
-    cpu_physical_memory_write(addr, (const uint8_t *)&val, sizeof(val));
+    iommu_write(iommu, dev, addr, (const uint8_t *)&val, sizeof(val));
 }
 
 #define POLYNOMIAL 0x04c11db6
@@ -687,17 +691,25 @@ static void set_ru_state(EEPRO100State * s, ru_state_t state)
 
 static void dump_statistics(EEPRO100State * s)
 {
+    struct iommu *iommu;
+    DeviceState *dev;
+    int err;
+    
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
     /* Dump statistical data. Most data is never changed by the emulation
      * and always 0, so we first just copy the whole block and then those
      * values which really matter.
      * Number of data should check configuration!!!
      */
-    cpu_physical_memory_write(s->statsaddr,
-                              (uint8_t *) & s->statistics, s->stats_size);
-    stl_le_phys(s->statsaddr + 0, s->statistics.tx_good_frames);
-    stl_le_phys(s->statsaddr + 36, s->statistics.rx_good_frames);
-    stl_le_phys(s->statsaddr + 48, s->statistics.rx_resource_errors);
-    stl_le_phys(s->statsaddr + 60, s->statistics.rx_short_frame_errors);
+    err = iommu_write(iommu, dev, s->statsaddr,
+                      (uint8_t *) &s->statistics, s->stats_size);
+    stl_le_phys(iommu, dev, s->statsaddr + 0, s->statistics.tx_good_frames);
+    stl_le_phys(iommu, dev, s->statsaddr + 36, s->statistics.rx_good_frames);
+    stl_le_phys(iommu, dev,
+                s->statsaddr + 48, s->statistics.rx_resource_errors);
+    stl_le_phys(iommu, dev,
+                s->statsaddr + 60, s->statistics.rx_short_frame_errors);
 #if 0
     stw_le_phys(s->statsaddr + 76, s->statistics.xmt_tco_frames);
     stw_le_phys(s->statsaddr + 78, s->statistics.rcv_tco_frames);
@@ -707,7 +719,13 @@ static void dump_statistics(EEPRO100State * s)
 
 static void read_cb(EEPRO100State *s)
 {
-    cpu_physical_memory_read(s->cb_address, (uint8_t *) &s->tx, sizeof(s->tx));
+    struct iommu *iommu;
+    DeviceState *dev;
+    int err;
+
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
+    err = iommu_read(iommu, dev, s->cb_address, (uint8_t *) &s->tx, sizeof(s->tx));
     s->tx.status = le16_to_cpu(s->tx.status);
     s->tx.command = le16_to_cpu(s->tx.command);
     s->tx.link = le32_to_cpu(s->tx.link);
@@ -723,6 +741,12 @@ static void tx_command(EEPRO100State *s)
     uint8_t buf[2600];
     uint16_t size = 0;
     uint32_t tbd_address = s->cb_address + 0x10;
+    struct iommu *iommu;
+    DeviceState *dev;
+    int err;
+
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
     TRACE(RXTX, logout
         ("transmit, TBD array address 0x%08x, TCB byte count 0x%04x, TBD count %u\n",
          tbd_array, tcb_bytes, s->tx.tbd_count));
@@ -737,18 +761,18 @@ static void tx_command(EEPRO100State *s)
     }
     assert(tcb_bytes <= sizeof(buf));
     while (size < tcb_bytes) {
-        uint32_t tx_buffer_address = ldl_phys(tbd_address);
-        uint16_t tx_buffer_size = lduw_phys(tbd_address + 4);
+        uint32_t tx_buffer_address = iommu_ldl(iommu, dev, tbd_address);
+        uint16_t tx_buffer_size = iommu_lduw(iommu, dev, tbd_address + 4);
 #if 0
-        uint16_t tx_buffer_el = lduw_phys(tbd_address + 6);
+        uint16_t tx_buffer_el = iommu_lduw(tbd_address + 6);
 #endif
         tbd_address += 8;
         TRACE(RXTX, logout
             ("TBD (simplified mode): buffer address 0x%08x, size 0x%04x\n",
              tx_buffer_address, tx_buffer_size));
         tx_buffer_size = MIN(tx_buffer_size, sizeof(buf) - size);
-        cpu_physical_memory_read(tx_buffer_address, &buf[size],
-                                 tx_buffer_size);
+        err = iommu_read(iommu, dev,
+                         tx_buffer_address, &buf[size], tx_buffer_size);
         size += tx_buffer_size;
     }
     if (tbd_array == 0xffffffff) {
@@ -759,16 +783,18 @@ static void tx_command(EEPRO100State *s)
         if (s->has_extended_tcb_support && !(s->configuration[6] & BIT(4))) {
             /* Extended Flexible TCB. */
             for (; tbd_count < 2; tbd_count++) {
-                uint32_t tx_buffer_address = ldl_phys(tbd_address);
-                uint16_t tx_buffer_size = lduw_phys(tbd_address + 4);
-                uint16_t tx_buffer_el = lduw_phys(tbd_address + 6);
+                uint32_t tx_buffer_address = iommu_ldl(iommu, dev,
+                                                       tbd_address);
+                uint16_t tx_buffer_size = iommu_lduw(iommu, dev,
+                                                     tbd_address + 4);
+                uint16_t tx_buffer_el = iommu_lduw(iommu, dev, tbd_address + 6);
                 tbd_address += 8;
                 TRACE(RXTX, logout
                     ("TBD (extended flexible mode): buffer address 0x%08x, size 0x%04x\n",
                      tx_buffer_address, tx_buffer_size));
                 tx_buffer_size = MIN(tx_buffer_size, sizeof(buf) - size);
-                cpu_physical_memory_read(tx_buffer_address, &buf[size],
-                                         tx_buffer_size);
+                err = iommu_read(iommu, dev, tx_buffer_address,
+                                 &buf[size], tx_buffer_size);
                 size += tx_buffer_size;
                 if (tx_buffer_el & 1) {
                     break;
@@ -777,16 +803,16 @@ static void tx_command(EEPRO100State *s)
         }
         tbd_address = tbd_array;
         for (; tbd_count < s->tx.tbd_count; tbd_count++) {
-            uint32_t tx_buffer_address = ldl_phys(tbd_address);
-            uint16_t tx_buffer_size = lduw_phys(tbd_address + 4);
-            uint16_t tx_buffer_el = lduw_phys(tbd_address + 6);
+            uint32_t tx_buffer_address = iommu_ldl(iommu, dev, tbd_address);
+            uint16_t tx_buffer_size = iommu_lduw(iommu, dev, tbd_address + 4);
+            uint16_t tx_buffer_el = iommu_lduw(iommu, dev, tbd_address + 6);
             tbd_address += 8;
             TRACE(RXTX, logout
                 ("TBD (flexible mode): buffer address 0x%08x, size 0x%04x\n",
                  tx_buffer_address, tx_buffer_size));
             tx_buffer_size = MIN(tx_buffer_size, sizeof(buf) - size);
-            cpu_physical_memory_read(tx_buffer_address, &buf[size],
-                                     tx_buffer_size);
+            err = iommu_read(iommu, dev,
+                             tx_buffer_address, &buf[size], tx_buffer_size);
             size += tx_buffer_size;
             if (tx_buffer_el & 1) {
                 break;
@@ -807,11 +833,17 @@ static void set_multicast_list(EEPRO100State *s)
 {
     uint16_t multicast_count = s->tx.tbd_array_addr & BITS(13, 0);
     uint16_t i;
+    struct iommu *iommu;
+    DeviceState *dev;
+    int err;
+
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
     memset(&s->mult[0], 0, sizeof(s->mult));
     TRACE(OTHER, logout("multicast list, multicast count = %u\n", multicast_count));
     for (i = 0; i < multicast_count; i += 6) {
         uint8_t multicast_addr[6];
-        cpu_physical_memory_read(s->cb_address + 10 + i, multicast_addr, 6);
+        err = iommu_read(iommu, dev, s->cb_address + 10 + i, multicast_addr, 6);
         TRACE(OTHER, logout("multicast entry %s\n", nic_dump(multicast_addr, 6)));
         unsigned mcast_idx = compute_mcast_idx(multicast_addr);
         assert(mcast_idx < 64);
@@ -821,6 +853,11 @@ static void set_multicast_list(EEPRO100State *s)
 
 static void action_command(EEPRO100State *s)
 {
+    struct iommu *iommu;
+    DeviceState *dev;
+
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
     for (;;) {
         bool bit_el;
         bool bit_s;
@@ -845,12 +882,14 @@ static void action_command(EEPRO100State *s)
             /* Do nothing. */
             break;
         case CmdIASetup:
-            cpu_physical_memory_read(s->cb_address + 8, &s->conf.macaddr.a[0], 6);
+            iommu_read(iommu, dev,
+                       s->cb_address + 8, &s->conf.macaddr.a[0], 6);
             TRACE(OTHER, logout("macaddr: %s\n", nic_dump(&s->conf.macaddr.a[0], 6)));
             break;
         case CmdConfigure:
-            cpu_physical_memory_read(s->cb_address + 8, &s->configuration[0],
-                                     sizeof(s->configuration));
+            iommu_read(iommu, dev,
+                       s->cb_address + 8, &s->configuration[0],
+                       sizeof(s->configuration));
             TRACE(OTHER, logout("configuration: %s\n", nic_dump(&s->configuration[0], 16)));
             break;
         case CmdMulticastList:
@@ -880,7 +919,8 @@ static void action_command(EEPRO100State *s)
             break;
         }
         /* Write new status. */
-        stw_phys(s->cb_address, s->tx.status | ok_status | STATUS_C);
+        iommu_stw(iommu, dev,
+                  s->cb_address, s->tx.status | ok_status | STATUS_C);
         if (bit_i) {
             /* CU completed action. */
             eepro100_cx_interrupt(s);
@@ -907,6 +947,11 @@ static void action_command(EEPRO100State *s)
 static void eepro100_cu_command(EEPRO100State * s, uint8_t val)
 {
     cu_state_t cu_state;
+    struct iommu *iommu;
+    DeviceState *dev;
+
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
     switch (val) {
     case CU_NOP:
         /* No operation. */
@@ -947,7 +992,7 @@ static void eepro100_cu_command(EEPRO100State * s, uint8_t val)
         /* Dump statistical counters. */
         TRACE(OTHER, logout("val=0x%02x (dump stats)\n", val));
         dump_statistics(s);
-        stl_le_phys(s->statsaddr + s->stats_size, 0xa005);
+        stl_le_phys(iommu, dev, s->statsaddr + s->stats_size, 0xa005);
         break;
     case CU_CMD_BASE:
         /* Load CU base. */
@@ -958,7 +1003,7 @@ static void eepro100_cu_command(EEPRO100State * s, uint8_t val)
         /* Dump and reset statistical counters. */
         TRACE(OTHER, logout("val=0x%02x (dump stats and reset)\n", val));
         dump_statistics(s);
-        stl_le_phys(s->statsaddr + s->stats_size, 0xa007);
+        stl_le_phys(iommu, dev, s->statsaddr + s->stats_size, 0xa007);
         memset(&s->statistics, 0, sizeof(s->statistics));
         break;
     case CU_SRESUME:
@@ -1252,6 +1297,11 @@ static void eepro100_write_port(EEPRO100State * s, uint32_t val)
     val = le32_to_cpu(val);
     uint32_t address = (val & ~PORT_SELECTION_MASK);
     uint8_t selection = (val & PORT_SELECTION_MASK);
+    struct iommu *iommu;
+    DeviceState *dev;
+
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
     switch (selection) {
     case PORT_SOFTWARE_RESET:
         nic_reset(s);
@@ -1259,10 +1309,12 @@ static void eepro100_write_port(EEPRO100State * s, uint32_t val)
     case PORT_SELFTEST:
         TRACE(OTHER, logout("selftest address=0x%08x\n", address));
         eepro100_selftest_t data;
-        cpu_physical_memory_read(address, (uint8_t *) & data, sizeof(data));
+        iommu_read(iommu, dev,
+                   address, (uint8_t *) &data, sizeof(data));
         data.st_sign = 0xffffffff;
         data.st_result = 0;
-        cpu_physical_memory_write(address, (uint8_t *) & data, sizeof(data));
+        iommu_write(iommu, dev,
+                    address, (uint8_t *) &data, sizeof(data));
         break;
     case PORT_SELECTIVE_RESET:
         TRACE(OTHER, logout("selective reset, selftest address=0x%08x\n", address));
@@ -1646,6 +1698,10 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
     uint16_t rfd_status = 0xa000;
     static const uint8_t broadcast_macaddr[6] =
         { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
+    struct iommu *iommu;
+    DeviceState *dev;
+
+    iommu = iommu_get(&s->dev.qdev, &dev);
 
     /* TODO: check multiple IA bit. */
     if (s->configuration[20] & BIT(6)) {
@@ -1721,8 +1777,9 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
     }
     /* !!! */
     eepro100_rx_t rx;
-    cpu_physical_memory_read(s->ru_base + s->ru_offset, (uint8_t *) & rx,
-                             offsetof(eepro100_rx_t, packet));
+    iommu_read(iommu, dev,
+               s->ru_base + s->ru_offset, (uint8_t *) & rx,
+               offsetof(eepro100_rx_t, packet));
     uint16_t rfd_command = le16_to_cpu(rx.command);
     uint16_t rfd_size = le16_to_cpu(rx.size);
 
@@ -1736,9 +1793,12 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
     }
     TRACE(OTHER, logout("command 0x%04x, link 0x%08x, addr 0x%08x, size %u\n",
           rfd_command, rx.link, rx.rx_buf_addr, rfd_size));
-    stw_phys(s->ru_base + s->ru_offset + offsetof(eepro100_rx_t, status),
-             rfd_status);
-    stw_phys(s->ru_base + s->ru_offset + offsetof(eepro100_rx_t, count), size);
+    iommu_stw(iommu, dev,
+              s->ru_base + s->ru_offset + offsetof(eepro100_rx_t, status),
+              rfd_status);
+    iommu_stw(iommu, dev,
+              s->ru_base + s->ru_offset + offsetof(eepro100_rx_t, count),
+              size);
     /* Early receive interrupt not supported. */
 #if 0
     eepro100_er_interrupt(s);
@@ -1752,8 +1812,9 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
 #if 0
     assert(!(s->configuration[17] & BIT(0)));
 #endif
-    cpu_physical_memory_write(s->ru_base + s->ru_offset +
-                              offsetof(eepro100_rx_t, packet), buf, size);
+    iommu_write(iommu, dev,
+                s->ru_base + s->ru_offset +
+                offsetof(eepro100_rx_t, packet), buf, size);
     s->statistics.rx_good_frames++;
     eepro100_fr_interrupt(s);
     s->ru_offset = le32_to_cpu(rx.link);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [Qemu-devel] [RFC PATCH 6/7] eepro100: IOMMU support
@ 2010-07-14  5:45   ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-14  5:45 UTC (permalink / raw)
  To: joro; +Cc: qemu-devel, Eduard - Gabriel Munteanu, avi, kvm, paul

Memory accesses must go through the IOMMU layer.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/eepro100.c |  141 +++++++++++++++++++++++++++++++++++++++++----------------
 1 files changed, 101 insertions(+), 40 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index 97afa2c..74e1d15 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -43,6 +43,7 @@
 
 #include <stddef.h>             /* offsetof */
 #include "hw.h"
+#include "iommu.h"
 #include "pci.h"
 #include "net.h"
 #include "eeprom93xx.h"
@@ -306,10 +307,13 @@ static const uint16_t eepro100_mdi_mask[] = {
 };
 
 /* XXX: optimize */
-static void stl_le_phys(target_phys_addr_t addr, uint32_t val)
+static void stl_le_phys(struct iommu *iommu,
+                        DeviceState *dev,
+                        target_phys_addr_t addr,
+                        uint32_t val)
 {
     val = cpu_to_le32(val);
-    cpu_physical_memory_write(addr, (const uint8_t *)&val, sizeof(val));
+    iommu_write(iommu, dev, addr, (const uint8_t *)&val, sizeof(val));
 }
 
 #define POLYNOMIAL 0x04c11db6
@@ -687,17 +691,25 @@ static void set_ru_state(EEPRO100State * s, ru_state_t state)
 
 static void dump_statistics(EEPRO100State * s)
 {
+    struct iommu *iommu;
+    DeviceState *dev;
+    int err;
+    
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
     /* Dump statistical data. Most data is never changed by the emulation
      * and always 0, so we first just copy the whole block and then those
      * values which really matter.
      * Number of data should check configuration!!!
      */
-    cpu_physical_memory_write(s->statsaddr,
-                              (uint8_t *) & s->statistics, s->stats_size);
-    stl_le_phys(s->statsaddr + 0, s->statistics.tx_good_frames);
-    stl_le_phys(s->statsaddr + 36, s->statistics.rx_good_frames);
-    stl_le_phys(s->statsaddr + 48, s->statistics.rx_resource_errors);
-    stl_le_phys(s->statsaddr + 60, s->statistics.rx_short_frame_errors);
+    err = iommu_write(iommu, dev, s->statsaddr,
+                      (uint8_t *) &s->statistics, s->stats_size);
+    stl_le_phys(iommu, dev, s->statsaddr + 0, s->statistics.tx_good_frames);
+    stl_le_phys(iommu, dev, s->statsaddr + 36, s->statistics.rx_good_frames);
+    stl_le_phys(iommu, dev,
+                s->statsaddr + 48, s->statistics.rx_resource_errors);
+    stl_le_phys(iommu, dev,
+                s->statsaddr + 60, s->statistics.rx_short_frame_errors);
 #if 0
     stw_le_phys(s->statsaddr + 76, s->statistics.xmt_tco_frames);
     stw_le_phys(s->statsaddr + 78, s->statistics.rcv_tco_frames);
@@ -707,7 +719,13 @@ static void dump_statistics(EEPRO100State * s)
 
 static void read_cb(EEPRO100State *s)
 {
-    cpu_physical_memory_read(s->cb_address, (uint8_t *) &s->tx, sizeof(s->tx));
+    struct iommu *iommu;
+    DeviceState *dev;
+    int err;
+
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
+    err = iommu_read(iommu, dev, s->cb_address, (uint8_t *) &s->tx, sizeof(s->tx));
     s->tx.status = le16_to_cpu(s->tx.status);
     s->tx.command = le16_to_cpu(s->tx.command);
     s->tx.link = le32_to_cpu(s->tx.link);
@@ -723,6 +741,12 @@ static void tx_command(EEPRO100State *s)
     uint8_t buf[2600];
     uint16_t size = 0;
     uint32_t tbd_address = s->cb_address + 0x10;
+    struct iommu *iommu;
+    DeviceState *dev;
+    int err;
+
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
     TRACE(RXTX, logout
         ("transmit, TBD array address 0x%08x, TCB byte count 0x%04x, TBD count %u\n",
          tbd_array, tcb_bytes, s->tx.tbd_count));
@@ -737,18 +761,18 @@ static void tx_command(EEPRO100State *s)
     }
     assert(tcb_bytes <= sizeof(buf));
     while (size < tcb_bytes) {
-        uint32_t tx_buffer_address = ldl_phys(tbd_address);
-        uint16_t tx_buffer_size = lduw_phys(tbd_address + 4);
+        uint32_t tx_buffer_address = iommu_ldl(iommu, dev, tbd_address);
+        uint16_t tx_buffer_size = iommu_lduw(iommu, dev, tbd_address + 4);
 #if 0
-        uint16_t tx_buffer_el = lduw_phys(tbd_address + 6);
+        uint16_t tx_buffer_el = iommu_lduw(tbd_address + 6);
 #endif
         tbd_address += 8;
         TRACE(RXTX, logout
             ("TBD (simplified mode): buffer address 0x%08x, size 0x%04x\n",
              tx_buffer_address, tx_buffer_size));
         tx_buffer_size = MIN(tx_buffer_size, sizeof(buf) - size);
-        cpu_physical_memory_read(tx_buffer_address, &buf[size],
-                                 tx_buffer_size);
+        err = iommu_read(iommu, dev,
+                         tx_buffer_address, &buf[size], tx_buffer_size);
         size += tx_buffer_size;
     }
     if (tbd_array == 0xffffffff) {
@@ -759,16 +783,18 @@ static void tx_command(EEPRO100State *s)
         if (s->has_extended_tcb_support && !(s->configuration[6] & BIT(4))) {
             /* Extended Flexible TCB. */
             for (; tbd_count < 2; tbd_count++) {
-                uint32_t tx_buffer_address = ldl_phys(tbd_address);
-                uint16_t tx_buffer_size = lduw_phys(tbd_address + 4);
-                uint16_t tx_buffer_el = lduw_phys(tbd_address + 6);
+                uint32_t tx_buffer_address = iommu_ldl(iommu, dev,
+                                                       tbd_address);
+                uint16_t tx_buffer_size = iommu_lduw(iommu, dev,
+                                                     tbd_address + 4);
+                uint16_t tx_buffer_el = iommu_lduw(iommu, dev, tbd_address + 6);
                 tbd_address += 8;
                 TRACE(RXTX, logout
                     ("TBD (extended flexible mode): buffer address 0x%08x, size 0x%04x\n",
                      tx_buffer_address, tx_buffer_size));
                 tx_buffer_size = MIN(tx_buffer_size, sizeof(buf) - size);
-                cpu_physical_memory_read(tx_buffer_address, &buf[size],
-                                         tx_buffer_size);
+                err = iommu_read(iommu, dev, tx_buffer_address,
+                                 &buf[size], tx_buffer_size);
                 size += tx_buffer_size;
                 if (tx_buffer_el & 1) {
                     break;
@@ -777,16 +803,16 @@ static void tx_command(EEPRO100State *s)
         }
         tbd_address = tbd_array;
         for (; tbd_count < s->tx.tbd_count; tbd_count++) {
-            uint32_t tx_buffer_address = ldl_phys(tbd_address);
-            uint16_t tx_buffer_size = lduw_phys(tbd_address + 4);
-            uint16_t tx_buffer_el = lduw_phys(tbd_address + 6);
+            uint32_t tx_buffer_address = iommu_ldl(iommu, dev, tbd_address);
+            uint16_t tx_buffer_size = iommu_lduw(iommu, dev, tbd_address + 4);
+            uint16_t tx_buffer_el = iommu_lduw(iommu, dev, tbd_address + 6);
             tbd_address += 8;
             TRACE(RXTX, logout
                 ("TBD (flexible mode): buffer address 0x%08x, size 0x%04x\n",
                  tx_buffer_address, tx_buffer_size));
             tx_buffer_size = MIN(tx_buffer_size, sizeof(buf) - size);
-            cpu_physical_memory_read(tx_buffer_address, &buf[size],
-                                     tx_buffer_size);
+            err = iommu_read(iommu, dev,
+                             tx_buffer_address, &buf[size], tx_buffer_size);
             size += tx_buffer_size;
             if (tx_buffer_el & 1) {
                 break;
@@ -807,11 +833,17 @@ static void set_multicast_list(EEPRO100State *s)
 {
     uint16_t multicast_count = s->tx.tbd_array_addr & BITS(13, 0);
     uint16_t i;
+    struct iommu *iommu;
+    DeviceState *dev;
+    int err;
+
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
     memset(&s->mult[0], 0, sizeof(s->mult));
     TRACE(OTHER, logout("multicast list, multicast count = %u\n", multicast_count));
     for (i = 0; i < multicast_count; i += 6) {
         uint8_t multicast_addr[6];
-        cpu_physical_memory_read(s->cb_address + 10 + i, multicast_addr, 6);
+        err = iommu_read(iommu, dev, s->cb_address + 10 + i, multicast_addr, 6);
         TRACE(OTHER, logout("multicast entry %s\n", nic_dump(multicast_addr, 6)));
         unsigned mcast_idx = compute_mcast_idx(multicast_addr);
         assert(mcast_idx < 64);
@@ -821,6 +853,11 @@ static void set_multicast_list(EEPRO100State *s)
 
 static void action_command(EEPRO100State *s)
 {
+    struct iommu *iommu;
+    DeviceState *dev;
+
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
     for (;;) {
         bool bit_el;
         bool bit_s;
@@ -845,12 +882,14 @@ static void action_command(EEPRO100State *s)
             /* Do nothing. */
             break;
         case CmdIASetup:
-            cpu_physical_memory_read(s->cb_address + 8, &s->conf.macaddr.a[0], 6);
+            iommu_read(iommu, dev,
+                       s->cb_address + 8, &s->conf.macaddr.a[0], 6);
             TRACE(OTHER, logout("macaddr: %s\n", nic_dump(&s->conf.macaddr.a[0], 6)));
             break;
         case CmdConfigure:
-            cpu_physical_memory_read(s->cb_address + 8, &s->configuration[0],
-                                     sizeof(s->configuration));
+            iommu_read(iommu, dev,
+                       s->cb_address + 8, &s->configuration[0],
+                       sizeof(s->configuration));
             TRACE(OTHER, logout("configuration: %s\n", nic_dump(&s->configuration[0], 16)));
             break;
         case CmdMulticastList:
@@ -880,7 +919,8 @@ static void action_command(EEPRO100State *s)
             break;
         }
         /* Write new status. */
-        stw_phys(s->cb_address, s->tx.status | ok_status | STATUS_C);
+        iommu_stw(iommu, dev,
+                  s->cb_address, s->tx.status | ok_status | STATUS_C);
         if (bit_i) {
             /* CU completed action. */
             eepro100_cx_interrupt(s);
@@ -907,6 +947,11 @@ static void action_command(EEPRO100State *s)
 static void eepro100_cu_command(EEPRO100State * s, uint8_t val)
 {
     cu_state_t cu_state;
+    struct iommu *iommu;
+    DeviceState *dev;
+
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
     switch (val) {
     case CU_NOP:
         /* No operation. */
@@ -947,7 +992,7 @@ static void eepro100_cu_command(EEPRO100State * s, uint8_t val)
         /* Dump statistical counters. */
         TRACE(OTHER, logout("val=0x%02x (dump stats)\n", val));
         dump_statistics(s);
-        stl_le_phys(s->statsaddr + s->stats_size, 0xa005);
+        stl_le_phys(iommu, dev, s->statsaddr + s->stats_size, 0xa005);
         break;
     case CU_CMD_BASE:
         /* Load CU base. */
@@ -958,7 +1003,7 @@ static void eepro100_cu_command(EEPRO100State * s, uint8_t val)
         /* Dump and reset statistical counters. */
         TRACE(OTHER, logout("val=0x%02x (dump stats and reset)\n", val));
         dump_statistics(s);
-        stl_le_phys(s->statsaddr + s->stats_size, 0xa007);
+        stl_le_phys(iommu, dev, s->statsaddr + s->stats_size, 0xa007);
         memset(&s->statistics, 0, sizeof(s->statistics));
         break;
     case CU_SRESUME:
@@ -1252,6 +1297,11 @@ static void eepro100_write_port(EEPRO100State * s, uint32_t val)
     val = le32_to_cpu(val);
     uint32_t address = (val & ~PORT_SELECTION_MASK);
     uint8_t selection = (val & PORT_SELECTION_MASK);
+    struct iommu *iommu;
+    DeviceState *dev;
+
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
     switch (selection) {
     case PORT_SOFTWARE_RESET:
         nic_reset(s);
@@ -1259,10 +1309,12 @@ static void eepro100_write_port(EEPRO100State * s, uint32_t val)
     case PORT_SELFTEST:
         TRACE(OTHER, logout("selftest address=0x%08x\n", address));
         eepro100_selftest_t data;
-        cpu_physical_memory_read(address, (uint8_t *) & data, sizeof(data));
+        iommu_read(iommu, dev,
+                   address, (uint8_t *) &data, sizeof(data));
         data.st_sign = 0xffffffff;
         data.st_result = 0;
-        cpu_physical_memory_write(address, (uint8_t *) & data, sizeof(data));
+        iommu_write(iommu, dev,
+                    address, (uint8_t *) &data, sizeof(data));
         break;
     case PORT_SELECTIVE_RESET:
         TRACE(OTHER, logout("selective reset, selftest address=0x%08x\n", address));
@@ -1646,6 +1698,10 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
     uint16_t rfd_status = 0xa000;
     static const uint8_t broadcast_macaddr[6] =
         { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
+    struct iommu *iommu;
+    DeviceState *dev;
+
+    iommu = iommu_get(&s->dev.qdev, &dev);
 
     /* TODO: check multiple IA bit. */
     if (s->configuration[20] & BIT(6)) {
@@ -1721,8 +1777,9 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
     }
     /* !!! */
     eepro100_rx_t rx;
-    cpu_physical_memory_read(s->ru_base + s->ru_offset, (uint8_t *) & rx,
-                             offsetof(eepro100_rx_t, packet));
+    iommu_read(iommu, dev,
+               s->ru_base + s->ru_offset, (uint8_t *) & rx,
+               offsetof(eepro100_rx_t, packet));
     uint16_t rfd_command = le16_to_cpu(rx.command);
     uint16_t rfd_size = le16_to_cpu(rx.size);
 
@@ -1736,9 +1793,12 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
     }
     TRACE(OTHER, logout("command 0x%04x, link 0x%08x, addr 0x%08x, size %u\n",
           rfd_command, rx.link, rx.rx_buf_addr, rfd_size));
-    stw_phys(s->ru_base + s->ru_offset + offsetof(eepro100_rx_t, status),
-             rfd_status);
-    stw_phys(s->ru_base + s->ru_offset + offsetof(eepro100_rx_t, count), size);
+    iommu_stw(iommu, dev,
+              s->ru_base + s->ru_offset + offsetof(eepro100_rx_t, status),
+              rfd_status);
+    iommu_stw(iommu, dev,
+              s->ru_base + s->ru_offset + offsetof(eepro100_rx_t, count),
+              size);
     /* Early receive interrupt not supported. */
 #if 0
     eepro100_er_interrupt(s);
@@ -1752,8 +1812,9 @@ static ssize_t nic_receive(VLANClientState *nc, const uint8_t * buf, size_t size
 #if 0
     assert(!(s->configuration[17] & BIT(0)));
 #endif
-    cpu_physical_memory_write(s->ru_base + s->ru_offset +
-                              offsetof(eepro100_rx_t, packet), buf, size);
+    iommu_write(iommu, dev,
+                s->ru_base + s->ru_offset +
+                offsetof(eepro100_rx_t, packet), buf, size);
     s->statistics.rx_good_frames++;
     eepro100_fr_interrupt(s);
     s->ru_offset = le32_to_cpu(rx.link);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [RFC PATCH 7/7] ac97: IOMMU support
  2010-07-14  5:45 ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-07-14  5:45   ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-14  5:45 UTC (permalink / raw)
  To: joro; +Cc: avi, paul, kvm, qemu-devel, Eduard - Gabriel Munteanu

Memory accesses must go through the IOMMU layer.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/ac97.c |   20 +++++++++++++++++---
 1 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/hw/ac97.c b/hw/ac97.c
index 4319bc8..0e30d80 100644
--- a/hw/ac97.c
+++ b/hw/ac97.c
@@ -15,6 +15,7 @@
  */
 
 #include "hw.h"
+#include "iommu.h"
 #include "audiodev.h"
 #include "audio/audio.h"
 #include "pci.h"
@@ -221,9 +222,13 @@ static void cold_reset (AC97LinkState * s)
 
 static void fetch_bd (AC97LinkState *s, AC97BusMasterRegs *r)
 {
+    struct iommu *iommu;
+    DeviceState *dev;
     uint8_t b[8];
 
-    cpu_physical_memory_read (r->bdbar + r->civ * 8, b, 8);
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
+    iommu_read (iommu, dev, r->bdbar + r->civ * 8, b, 8);
     r->bd_valid = 1;
     r->bd.addr = le32_to_cpu (*(uint32_t *) &b[0]) & ~3;
     r->bd.ctl_len = le32_to_cpu (*(uint32_t *) &b[4]);
@@ -962,6 +967,9 @@ static int write_audio (AC97LinkState *s, AC97BusMasterRegs *r,
     uint32_t temp = r->picb << 1;
     uint32_t written = 0;
     int to_copy = 0;
+    struct iommu *iommu;
+    DeviceState *dev;
+
     temp = audio_MIN (temp, max);
 
     if (!temp) {
@@ -969,10 +977,12 @@ static int write_audio (AC97LinkState *s, AC97BusMasterRegs *r,
         return 0;
     }
 
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
     while (temp) {
         int copied;
         to_copy = audio_MIN (temp, sizeof (tmpbuf));
-        cpu_physical_memory_read (addr, tmpbuf, to_copy);
+        iommu_read (iommu, dev, addr, tmpbuf, to_copy);
         copied = AUD_write (s->voice_po, tmpbuf, to_copy);
         dolog ("write_audio max=%x to_copy=%x copied=%x\n",
                max, to_copy, copied);
@@ -1040,6 +1050,10 @@ static int read_audio (AC97LinkState *s, AC97BusMasterRegs *r,
     uint32_t nread = 0;
     int to_copy = 0;
     SWVoiceIn *voice = (r - s->bm_regs) == MC_INDEX ? s->voice_mc : s->voice_pi;
+    struct iommu *iommu;
+    DeviceState *dev;
+
+    iommu = iommu_get(&s->dev.qdev, &dev);
 
     temp = audio_MIN (temp, max);
 
@@ -1056,7 +1070,7 @@ static int read_audio (AC97LinkState *s, AC97BusMasterRegs *r,
             *stop = 1;
             break;
         }
-        cpu_physical_memory_write (addr, tmpbuf, acquired);
+        iommu_write (iommu, dev, addr, tmpbuf, acquired);
         temp -= acquired;
         addr += acquired;
         nread += acquired;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [Qemu-devel] [RFC PATCH 7/7] ac97: IOMMU support
@ 2010-07-14  5:45   ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-14  5:45 UTC (permalink / raw)
  To: joro; +Cc: qemu-devel, Eduard - Gabriel Munteanu, avi, kvm, paul

Memory accesses must go through the IOMMU layer.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 hw/ac97.c |   20 +++++++++++++++++---
 1 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/hw/ac97.c b/hw/ac97.c
index 4319bc8..0e30d80 100644
--- a/hw/ac97.c
+++ b/hw/ac97.c
@@ -15,6 +15,7 @@
  */
 
 #include "hw.h"
+#include "iommu.h"
 #include "audiodev.h"
 #include "audio/audio.h"
 #include "pci.h"
@@ -221,9 +222,13 @@ static void cold_reset (AC97LinkState * s)
 
 static void fetch_bd (AC97LinkState *s, AC97BusMasterRegs *r)
 {
+    struct iommu *iommu;
+    DeviceState *dev;
     uint8_t b[8];
 
-    cpu_physical_memory_read (r->bdbar + r->civ * 8, b, 8);
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
+    iommu_read (iommu, dev, r->bdbar + r->civ * 8, b, 8);
     r->bd_valid = 1;
     r->bd.addr = le32_to_cpu (*(uint32_t *) &b[0]) & ~3;
     r->bd.ctl_len = le32_to_cpu (*(uint32_t *) &b[4]);
@@ -962,6 +967,9 @@ static int write_audio (AC97LinkState *s, AC97BusMasterRegs *r,
     uint32_t temp = r->picb << 1;
     uint32_t written = 0;
     int to_copy = 0;
+    struct iommu *iommu;
+    DeviceState *dev;
+
     temp = audio_MIN (temp, max);
 
     if (!temp) {
@@ -969,10 +977,12 @@ static int write_audio (AC97LinkState *s, AC97BusMasterRegs *r,
         return 0;
     }
 
+    iommu = iommu_get(&s->dev.qdev, &dev);
+
     while (temp) {
         int copied;
         to_copy = audio_MIN (temp, sizeof (tmpbuf));
-        cpu_physical_memory_read (addr, tmpbuf, to_copy);
+        iommu_read (iommu, dev, addr, tmpbuf, to_copy);
         copied = AUD_write (s->voice_po, tmpbuf, to_copy);
         dolog ("write_audio max=%x to_copy=%x copied=%x\n",
                max, to_copy, copied);
@@ -1040,6 +1050,10 @@ static int read_audio (AC97LinkState *s, AC97BusMasterRegs *r,
     uint32_t nread = 0;
     int to_copy = 0;
     SWVoiceIn *voice = (r - s->bm_regs) == MC_INDEX ? s->voice_mc : s->voice_pi;
+    struct iommu *iommu;
+    DeviceState *dev;
+
+    iommu = iommu_get(&s->dev.qdev, &dev);
 
     temp = audio_MIN (temp, max);
 
@@ -1056,7 +1070,7 @@ static int read_audio (AC97LinkState *s, AC97BusMasterRegs *r,
             *stop = 1;
             break;
         }
-        cpu_physical_memory_write (addr, tmpbuf, acquired);
+        iommu_write (iommu, dev, addr, tmpbuf, acquired);
         temp -= acquired;
         addr += acquired;
         nread += acquired;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH 1/7] Generic IOMMU layer
  2010-07-14  5:45   ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-07-14  6:07     ` malc
  -1 siblings, 0 replies; 83+ messages in thread
From: malc @ 2010-07-14  6:07 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu; +Cc: joro, paul, qemu-devel, kvm, avi

On Wed, 14 Jul 2010, Eduard - Gabriel Munteanu wrote:

> This provides an API for abstracting IOMMU functions. Hardware emulation
> code can use it to request address translation and access checking. In
> the absence of an emulated IOMMU, no translation/checking happens and
> I/O goes through as before.
> 
> IOMMU emulation code must provide implementation-specific hooks for this
> layer.
> 

[..snip..]

> +int __iommu_rw(struct iommu *iommu,
> +               DeviceState *dev,
> +               target_phys_addr_t addr,
> +               uint8_t *buf,
> +               int len,
> +               int is_write)

Do not use leading double underscore.

[..snip..]

-- 
mailto:av1474@comtv.ru

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 1/7] Generic IOMMU layer
@ 2010-07-14  6:07     ` malc
  0 siblings, 0 replies; 83+ messages in thread
From: malc @ 2010-07-14  6:07 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu; +Cc: joro, paul, qemu-devel, kvm, avi

On Wed, 14 Jul 2010, Eduard - Gabriel Munteanu wrote:

> This provides an API for abstracting IOMMU functions. Hardware emulation
> code can use it to request address translation and access checking. In
> the absence of an emulated IOMMU, no translation/checking happens and
> I/O goes through as before.
> 
> IOMMU emulation code must provide implementation-specific hooks for this
> layer.
> 

[..snip..]

> +int __iommu_rw(struct iommu *iommu,
> +               DeviceState *dev,
> +               target_phys_addr_t addr,
> +               uint8_t *buf,
> +               int len,
> +               int is_write)

Do not use leading double underscore.

[..snip..]

-- 
mailto:av1474@comtv.ru

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH 7/7] ac97: IOMMU support
  2010-07-14  5:45   ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-07-14  6:09     ` malc
  -1 siblings, 0 replies; 83+ messages in thread
From: malc @ 2010-07-14  6:09 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu; +Cc: joro, paul, qemu-devel, kvm, avi

On Wed, 14 Jul 2010, Eduard - Gabriel Munteanu wrote:

> Memory accesses must go through the IOMMU layer.
> 
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
>  hw/ac97.c |   20 +++++++++++++++++---
>  1 files changed, 17 insertions(+), 3 deletions(-)

Fine with me.

[..snip..]

-- 
mailto:av1474@comtv.ru

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 7/7] ac97: IOMMU support
@ 2010-07-14  6:09     ` malc
  0 siblings, 0 replies; 83+ messages in thread
From: malc @ 2010-07-14  6:09 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu; +Cc: joro, paul, qemu-devel, kvm, avi

On Wed, 14 Jul 2010, Eduard - Gabriel Munteanu wrote:

> Memory accesses must go through the IOMMU layer.
> 
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
>  hw/ac97.c |   20 +++++++++++++++++---
>  1 files changed, 17 insertions(+), 3 deletions(-)

Fine with me.

[..snip..]

-- 
mailto:av1474@comtv.ru

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 3/7] pci: call IOMMU hooks
  2010-07-14  5:45   ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-07-14  7:37     ` Isaku Yamahata
  -1 siblings, 0 replies; 83+ messages in thread
From: Isaku Yamahata @ 2010-07-14  7:37 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu; +Cc: joro, qemu-devel, avi, kvm, paul

On Wed, Jul 14, 2010 at 08:45:03AM +0300, Eduard - Gabriel Munteanu wrote:
> Memory accesses must go through the IOMMU layer.
> 
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
>  hw/pci.c |   21 +++++++++++++++++++++
>  1 files changed, 21 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/pci.c b/hw/pci.c
> index 6871728..9c5d706 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -22,6 +22,7 @@
>   * THE SOFTWARE.
>   */
>  #include "hw.h"
> +#include "iommu.h"
>  #include "pci.h"
>  #include "monitor.h"
>  #include "net.h"
> @@ -733,12 +734,25 @@ static void do_pci_unregister_device(PCIDevice *pci_dev)
>      pci_config_free(pci_dev);
>  }
>  
> +#ifdef CONFIG_IOMMU
> +static inline int pci_iommu_register_device(PCIBus *bus, PCIDevice *dev)
> +{
> +    return iommu_register_device(bus->qbus.iommu, &dev->qdev);
> +}
> +#else
> +static inline int pci_iommu_register_device(PCIBus *bus, PCIDevice *dev)
> +{
> +    return 0;
> +}
> +#endif
> +
>  PCIDevice *pci_register_device(PCIBus *bus, const char *name,
>                                 int instance_size, int devfn,
>                                 PCIConfigReadFunc *config_read,
>                                 PCIConfigWriteFunc *config_write)
>  {
>      PCIDevice *pci_dev;
> +    int err;
>  
>      pci_dev = qemu_mallocz(instance_size);
>      pci_dev = do_pci_register_device(pci_dev, bus, name, devfn,
> @@ -747,6 +761,13 @@ PCIDevice *pci_register_device(PCIBus *bus, const char *name,
>      if (pci_dev == NULL) {
>          hw_error("PCI: can't register device\n");
>      }
> +
> +    err = pci_iommu_register_device(bus, pci_dev);
> +    if (err) {
> +        hw_error("PCI: can't register device with IOMMU\n");
> +        return NULL;
> +    }
> +
>      return pci_dev;
>  }

pci_register_device() is pre-qdev api.
qdev'fied device doesn't call pci_register_device().
So please move the initialization hook into do_pci_register_device()
which are commonly used by pci_register_device() and pci_qdev_init().
-- 
yamahata

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 3/7] pci: call IOMMU hooks
@ 2010-07-14  7:37     ` Isaku Yamahata
  0 siblings, 0 replies; 83+ messages in thread
From: Isaku Yamahata @ 2010-07-14  7:37 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu; +Cc: joro, paul, qemu-devel, kvm, avi

On Wed, Jul 14, 2010 at 08:45:03AM +0300, Eduard - Gabriel Munteanu wrote:
> Memory accesses must go through the IOMMU layer.
> 
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
> ---
>  hw/pci.c |   21 +++++++++++++++++++++
>  1 files changed, 21 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/pci.c b/hw/pci.c
> index 6871728..9c5d706 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -22,6 +22,7 @@
>   * THE SOFTWARE.
>   */
>  #include "hw.h"
> +#include "iommu.h"
>  #include "pci.h"
>  #include "monitor.h"
>  #include "net.h"
> @@ -733,12 +734,25 @@ static void do_pci_unregister_device(PCIDevice *pci_dev)
>      pci_config_free(pci_dev);
>  }
>  
> +#ifdef CONFIG_IOMMU
> +static inline int pci_iommu_register_device(PCIBus *bus, PCIDevice *dev)
> +{
> +    return iommu_register_device(bus->qbus.iommu, &dev->qdev);
> +}
> +#else
> +static inline int pci_iommu_register_device(PCIBus *bus, PCIDevice *dev)
> +{
> +    return 0;
> +}
> +#endif
> +
>  PCIDevice *pci_register_device(PCIBus *bus, const char *name,
>                                 int instance_size, int devfn,
>                                 PCIConfigReadFunc *config_read,
>                                 PCIConfigWriteFunc *config_write)
>  {
>      PCIDevice *pci_dev;
> +    int err;
>  
>      pci_dev = qemu_mallocz(instance_size);
>      pci_dev = do_pci_register_device(pci_dev, bus, name, devfn,
> @@ -747,6 +761,13 @@ PCIDevice *pci_register_device(PCIBus *bus, const char *name,
>      if (pci_dev == NULL) {
>          hw_error("PCI: can't register device\n");
>      }
> +
> +    err = pci_iommu_register_device(bus, pci_dev);
> +    if (err) {
> +        hw_error("PCI: can't register device with IOMMU\n");
> +        return NULL;
> +    }
> +
>      return pci_dev;
>  }

pci_register_device() is pre-qdev api.
qdev'fied device doesn't call pci_register_device().
So please move the initialization hook into do_pci_register_device()
which are commonly used by pci_register_device() and pci_qdev_init().
-- 
yamahata

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-14  5:45   ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-07-14 13:53     ` Paul Brook
  -1 siblings, 0 replies; 83+ messages in thread
From: Paul Brook @ 2010-07-14 13:53 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu; +Cc: joro, avi, kvm, qemu-devel

> Memory accesses must go through the IOMMU layer.

No. Devices should not know or care whether an IOMMU is present.

You should be adding a DeviceState argument to cpu_physical_memory_{rw,map}. 
This should then handle IOMMU translation transparently.

You also need to accomodate the the case where multiple IOMMU are present.

Paul

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-14 13:53     ` Paul Brook
  0 siblings, 0 replies; 83+ messages in thread
From: Paul Brook @ 2010-07-14 13:53 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu; +Cc: joro, avi, kvm, qemu-devel

> Memory accesses must go through the IOMMU layer.

No. Devices should not know or care whether an IOMMU is present.

You should be adding a DeviceState argument to cpu_physical_memory_{rw,map}. 
This should then handle IOMMU translation transparently.

You also need to accomodate the the case where multiple IOMMU are present.

Paul

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-14 13:53     ` [Qemu-devel] " Paul Brook
@ 2010-07-14 18:33       ` Joerg Roedel
  -1 siblings, 0 replies; 83+ messages in thread
From: Joerg Roedel @ 2010-07-14 18:33 UTC (permalink / raw)
  To: Paul Brook; +Cc: Eduard - Gabriel Munteanu, avi, kvm, qemu-devel

On Wed, Jul 14, 2010 at 02:53:03PM +0100, Paul Brook wrote:
> > Memory accesses must go through the IOMMU layer.
> 
> No. Devices should not know or care whether an IOMMU is present.

There are real devices that care very much about an IOMMU. Basically all
devices supporting ATS care about that. So I don't see a problem if the
device emulation code of qemu also cares about present IOMMUs.

> You should be adding a DeviceState argument to cpu_physical_memory_{rw,map}. 
> This should then handle IOMMU translation transparently.

That's not a good idea imho. With an IOMMU the device no longer accesses
cpu physical memory. It accesses device virtual memory. Using
cpu_physical_memory* functions in device code becomes misleading when
the device virtual address space differs from cpu physical. So different
functions for devices make a lot of sense here. Another reason for
seperate functions is that we can extend them later to support emulation
of ATS devices.

> You also need to accomodate the the case where multiple IOMMU are present.

This, indeed, is something transparent to the device. This should be
handled inside the iommu emulation code.

	Joerg


^ permalink raw reply	[flat|nested] 83+ messages in thread

* [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-14 18:33       ` Joerg Roedel
  0 siblings, 0 replies; 83+ messages in thread
From: Joerg Roedel @ 2010-07-14 18:33 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel, avi, kvm, Eduard - Gabriel Munteanu

On Wed, Jul 14, 2010 at 02:53:03PM +0100, Paul Brook wrote:
> > Memory accesses must go through the IOMMU layer.
> 
> No. Devices should not know or care whether an IOMMU is present.

There are real devices that care very much about an IOMMU. Basically all
devices supporting ATS care about that. So I don't see a problem if the
device emulation code of qemu also cares about present IOMMUs.

> You should be adding a DeviceState argument to cpu_physical_memory_{rw,map}. 
> This should then handle IOMMU translation transparently.

That's not a good idea imho. With an IOMMU the device no longer accesses
cpu physical memory. It accesses device virtual memory. Using
cpu_physical_memory* functions in device code becomes misleading when
the device virtual address space differs from cpu physical. So different
functions for devices make a lot of sense here. Another reason for
seperate functions is that we can extend them later to support emulation
of ATS devices.

> You also need to accomodate the the case where multiple IOMMU are present.

This, indeed, is something transparent to the device. This should be
handled inside the iommu emulation code.

	Joerg

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-14 18:33       ` [Qemu-devel] " Joerg Roedel
  (?)
@ 2010-07-14 20:13       ` Paul Brook
  2010-07-14 21:29           ` Anthony Liguori
                           ` (2 more replies)
  -1 siblings, 3 replies; 83+ messages in thread
From: Paul Brook @ 2010-07-14 20:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: Joerg Roedel, avi, kvm, Eduard - Gabriel Munteanu

> On Wed, Jul 14, 2010 at 02:53:03PM +0100, Paul Brook wrote:
> > > Memory accesses must go through the IOMMU layer.
> > 
> > No. Devices should not know or care whether an IOMMU is present.
> 
> There are real devices that care very much about an IOMMU. Basically all
> devices supporting ATS care about that. So I don't see a problem if the
> device emulation code of qemu also cares about present IOMMUs.
> 
> > You should be adding a DeviceState argument to
> > cpu_physical_memory_{rw,map}. This should then handle IOMMU translation
> > transparently.
> 
> That's not a good idea imho. With an IOMMU the device no longer accesses
> cpu physical memory. It accesses device virtual memory. Using
> cpu_physical_memory* functions in device code becomes misleading when
> the device virtual address space differs from cpu physical. 

Well, ok, the function name needs fixing too.  However I think the only thing 
missing from the current API is that it does not provide a way to determine 
which device is performing the access.

Depending how the we decide to handle IOMMU invalidation, it may also be 
necessary to augment the memory_map API to allow the system to request a 
mapping be revoked.  However this issue is not specific to the IOMMU 
implementation. Such bugs are already present on any system that allows 
dynamic reconfiguration of the address space, e.g. by changing PCI BARs.

> So different
> functions for devices make a lot of sense here. Another reason for
> seperate functions is that we can extend them later to support emulation
> of ATS devices.

I disagree. ATS should be an independent feature, and is inherently bus 
specific.  As usual the PCI spec is not publicly available, but based on the 
AMD IOMMU docs I'd say that ATS is completely independent of memory accesses - 
the convention being that you trust an ATS capable device to DTRT, and 
configure the bus IOMMU to apply a flat mapping for accesses from such 
devices.

> > You also need to accomodate the the case where multiple IOMMU are
> > present.
> 
> This, indeed, is something transparent to the device. This should be
> handled inside the iommu emulation code.

I think you've got the abstraction boundaries all wrong.

A device performs a memory access on its local bus. It has no knowledge of how 
that access is routed to its destination.  The device should not be aware of 
any IOMMUs, in the same way that it doesn't know whether it happens to be 
accessing RAM or memory mapped peripherals on another device.

Each IOMMU is fundamentally part of a bus bridge. For example the bridge 
between a PCI bus and the system bus. It provides a address mapping from one 
bus to another. 

There should be no direct interaction between an IOMMU and a device (ignoring 
ATS, which is effectively a separate data channel).  Everything should be done 
via the cpu_phsycial_memory_* code.  Likewise on a system with multiple nested 
IOMMUs there should be no direct interatcion between these. 
cpu_physical_memory_* should walk the device/bus tree to determine where the 
access terminates, applying mappings appropriately.

Paul

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 2/7] AMD IOMMU emulation
  2010-07-14  5:45   ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-07-14 20:16     ` Paul Brook
  -1 siblings, 0 replies; 83+ messages in thread
From: Paul Brook @ 2010-07-14 20:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: Eduard - Gabriel Munteanu, joro, avi, kvm

> +  --enable-amd-iommu-emul) amd_iommu="yes"

> +#ifdef CONFIG_AMD_IOMMU
> +    amd_iommu_init(pci_bus);
> +#endif

This should not be a configure option.

Paul

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 2/7] AMD IOMMU emulation
@ 2010-07-14 20:16     ` Paul Brook
  0 siblings, 0 replies; 83+ messages in thread
From: Paul Brook @ 2010-07-14 20:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: joro, avi, kvm, Eduard - Gabriel Munteanu

> +  --enable-amd-iommu-emul) amd_iommu="yes"

> +#ifdef CONFIG_AMD_IOMMU
> +    amd_iommu_init(pci_bus);
> +#endif

This should not be a configure option.

Paul

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-14 20:13       ` Paul Brook
@ 2010-07-14 21:29           ` Anthony Liguori
  2010-07-14 23:39           ` Eduard - Gabriel Munteanu
  2010-07-15  9:22           ` Joerg Roedel
  2 siblings, 0 replies; 83+ messages in thread
From: Anthony Liguori @ 2010-07-14 21:29 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel, Joerg Roedel, avi, kvm, Eduard - Gabriel Munteanu

On 07/14/2010 03:13 PM, Paul Brook wrote:
>> On Wed, Jul 14, 2010 at 02:53:03PM +0100, Paul Brook wrote:
>>      
>>>> Memory accesses must go through the IOMMU layer.
>>>>          
>>> No. Devices should not know or care whether an IOMMU is present.
>>>        
>> There are real devices that care very much about an IOMMU. Basically all
>> devices supporting ATS care about that. So I don't see a problem if the
>> device emulation code of qemu also cares about present IOMMUs.
>>
>>      
>>> You should be adding a DeviceState argument to
>>> cpu_physical_memory_{rw,map}. This should then handle IOMMU translation
>>> transparently.
>>>        
>> That's not a good idea imho. With an IOMMU the device no longer accesses
>> cpu physical memory. It accesses device virtual memory. Using
>> cpu_physical_memory* functions in device code becomes misleading when
>> the device virtual address space differs from cpu physical.
>>      
> Well, ok, the function name needs fixing too.  However I think the only thing
> missing from the current API is that it does not provide a way to determine
> which device is performing the access.
>    

I agree with Paul.

The right approach IMHO is to convert devices to use bus-specific 
functions to access memory.  The bus specific functions should have a 
device argument as the first parameter.

For PCI-based IOMMUs, the implementation exists solely within the PCI 
bus.  For platforms (like SPARC) that have lower level IOMMUs, we would 
need to probably introduce a sysbus memory access layer and then provide 
a hook to implement an IOMMU there.

> Depending how the we decide to handle IOMMU invalidation, it may also be
> necessary to augment the memory_map API to allow the system to request a
> mapping be revoked.  However this issue is not specific to the IOMMU
> implementation. Such bugs are already present on any system that allows
> dynamic reconfiguration of the address space, e.g. by changing PCI BARs.
>    

That's why the memory_map API today does not allow mappings to persist 
after trips back to the main loop.

Regards,

Anthony Liguori

>> So different
>> functions for devices make a lot of sense here. Another reason for
>> seperate functions is that we can extend them later to support emulation
>> of ATS devices.
>>      
> I disagree. ATS should be an independent feature, and is inherently bus
> specific.  As usual the PCI spec is not publicly available, but based on the
> AMD IOMMU docs I'd say that ATS is completely independent of memory accesses -
> the convention being that you trust an ATS capable device to DTRT, and
> configure the bus IOMMU to apply a flat mapping for accesses from such
> devices.
>
>    
>>> You also need to accomodate the the case where multiple IOMMU are
>>> present.
>>>        
>> This, indeed, is something transparent to the device. This should be
>> handled inside the iommu emulation code.
>>      
> I think you've got the abstraction boundaries all wrong.
>
> A device performs a memory access on its local bus. It has no knowledge of how
> that access is routed to its destination.  The device should not be aware of
> any IOMMUs, in the same way that it doesn't know whether it happens to be
> accessing RAM or memory mapped peripherals on another device.
>
> Each IOMMU is fundamentally part of a bus bridge. For example the bridge
> between a PCI bus and the system bus. It provides a address mapping from one
> bus to another.
>
> There should be no direct interaction between an IOMMU and a device (ignoring
> ATS, which is effectively a separate data channel).  Everything should be done
> via the cpu_phsycial_memory_* code.  Likewise on a system with multiple nested
> IOMMUs there should be no direct interatcion between these.
> cpu_physical_memory_* should walk the device/bus tree to determine where the
> access terminates, applying mappings appropriately.
>
> Paul
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>    


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-14 21:29           ` Anthony Liguori
  0 siblings, 0 replies; 83+ messages in thread
From: Anthony Liguori @ 2010-07-14 21:29 UTC (permalink / raw)
  To: Paul Brook; +Cc: Joerg Roedel, Eduard - Gabriel Munteanu, qemu-devel, kvm, avi

On 07/14/2010 03:13 PM, Paul Brook wrote:
>> On Wed, Jul 14, 2010 at 02:53:03PM +0100, Paul Brook wrote:
>>      
>>>> Memory accesses must go through the IOMMU layer.
>>>>          
>>> No. Devices should not know or care whether an IOMMU is present.
>>>        
>> There are real devices that care very much about an IOMMU. Basically all
>> devices supporting ATS care about that. So I don't see a problem if the
>> device emulation code of qemu also cares about present IOMMUs.
>>
>>      
>>> You should be adding a DeviceState argument to
>>> cpu_physical_memory_{rw,map}. This should then handle IOMMU translation
>>> transparently.
>>>        
>> That's not a good idea imho. With an IOMMU the device no longer accesses
>> cpu physical memory. It accesses device virtual memory. Using
>> cpu_physical_memory* functions in device code becomes misleading when
>> the device virtual address space differs from cpu physical.
>>      
> Well, ok, the function name needs fixing too.  However I think the only thing
> missing from the current API is that it does not provide a way to determine
> which device is performing the access.
>    

I agree with Paul.

The right approach IMHO is to convert devices to use bus-specific 
functions to access memory.  The bus specific functions should have a 
device argument as the first parameter.

For PCI-based IOMMUs, the implementation exists solely within the PCI 
bus.  For platforms (like SPARC) that have lower level IOMMUs, we would 
need to probably introduce a sysbus memory access layer and then provide 
a hook to implement an IOMMU there.

> Depending how the we decide to handle IOMMU invalidation, it may also be
> necessary to augment the memory_map API to allow the system to request a
> mapping be revoked.  However this issue is not specific to the IOMMU
> implementation. Such bugs are already present on any system that allows
> dynamic reconfiguration of the address space, e.g. by changing PCI BARs.
>    

That's why the memory_map API today does not allow mappings to persist 
after trips back to the main loop.

Regards,

Anthony Liguori

>> So different
>> functions for devices make a lot of sense here. Another reason for
>> seperate functions is that we can extend them later to support emulation
>> of ATS devices.
>>      
> I disagree. ATS should be an independent feature, and is inherently bus
> specific.  As usual the PCI spec is not publicly available, but based on the
> AMD IOMMU docs I'd say that ATS is completely independent of memory accesses -
> the convention being that you trust an ATS capable device to DTRT, and
> configure the bus IOMMU to apply a flat mapping for accesses from such
> devices.
>
>    
>>> You also need to accomodate the the case where multiple IOMMU are
>>> present.
>>>        
>> This, indeed, is something transparent to the device. This should be
>> handled inside the iommu emulation code.
>>      
> I think you've got the abstraction boundaries all wrong.
>
> A device performs a memory access on its local bus. It has no knowledge of how
> that access is routed to its destination.  The device should not be aware of
> any IOMMUs, in the same way that it doesn't know whether it happens to be
> accessing RAM or memory mapped peripherals on another device.
>
> Each IOMMU is fundamentally part of a bus bridge. For example the bridge
> between a PCI bus and the system bus. It provides a address mapping from one
> bus to another.
>
> There should be no direct interaction between an IOMMU and a device (ignoring
> ATS, which is effectively a separate data channel).  Everything should be done
> via the cpu_phsycial_memory_* code.  Likewise on a system with multiple nested
> IOMMUs there should be no direct interatcion between these.
> cpu_physical_memory_* should walk the device/bus tree to determine where the
> access terminates, applying mappings appropriately.
>
> Paul
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>    

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-14 21:29           ` Anthony Liguori
@ 2010-07-14 22:24             ` Chris Wright
  -1 siblings, 0 replies; 83+ messages in thread
From: Chris Wright @ 2010-07-14 22:24 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Paul Brook, Joerg Roedel, Eduard - Gabriel Munteanu, qemu-devel,
	kvm, avi

* Anthony Liguori (anthony@codemonkey.ws) wrote:
> On 07/14/2010 03:13 PM, Paul Brook wrote:
> >>On Wed, Jul 14, 2010 at 02:53:03PM +0100, Paul Brook wrote:
> >>>>Memory accesses must go through the IOMMU layer.
> >>>No. Devices should not know or care whether an IOMMU is present.
> >>There are real devices that care very much about an IOMMU. Basically all
> >>devices supporting ATS care about that. So I don't see a problem if the
> >>device emulation code of qemu also cares about present IOMMUs.
> >>
> >>>You should be adding a DeviceState argument to
> >>>cpu_physical_memory_{rw,map}. This should then handle IOMMU translation
> >>>transparently.
> >>That's not a good idea imho. With an IOMMU the device no longer accesses
> >>cpu physical memory. It accesses device virtual memory. Using
> >>cpu_physical_memory* functions in device code becomes misleading when
> >>the device virtual address space differs from cpu physical.
> >Well, ok, the function name needs fixing too.  However I think the only thing
> >missing from the current API is that it does not provide a way to determine
> >which device is performing the access.
> 
> I agree with Paul.

I do too.

> The right approach IMHO is to convert devices to use bus-specific
> functions to access memory.  The bus specific functions should have
> a device argument as the first parameter.

As for ATS, the internal api to handle the device's dma reqeust needs
a notion of a translated vs. an untranslated request.  IOW, if qemu ever
had a device with ATS support, the device would use its local cache to
translate the dma address and then submit a translated request to the
pci bus (effectively doing a raw cpu physical memory* in that case).

thanks,
-chris

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-14 22:24             ` Chris Wright
  0 siblings, 0 replies; 83+ messages in thread
From: Chris Wright @ 2010-07-14 22:24 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: kvm, Joerg Roedel, qemu-devel, avi, Eduard - Gabriel Munteanu,
	Paul Brook

* Anthony Liguori (anthony@codemonkey.ws) wrote:
> On 07/14/2010 03:13 PM, Paul Brook wrote:
> >>On Wed, Jul 14, 2010 at 02:53:03PM +0100, Paul Brook wrote:
> >>>>Memory accesses must go through the IOMMU layer.
> >>>No. Devices should not know or care whether an IOMMU is present.
> >>There are real devices that care very much about an IOMMU. Basically all
> >>devices supporting ATS care about that. So I don't see a problem if the
> >>device emulation code of qemu also cares about present IOMMUs.
> >>
> >>>You should be adding a DeviceState argument to
> >>>cpu_physical_memory_{rw,map}. This should then handle IOMMU translation
> >>>transparently.
> >>That's not a good idea imho. With an IOMMU the device no longer accesses
> >>cpu physical memory. It accesses device virtual memory. Using
> >>cpu_physical_memory* functions in device code becomes misleading when
> >>the device virtual address space differs from cpu physical.
> >Well, ok, the function name needs fixing too.  However I think the only thing
> >missing from the current API is that it does not provide a way to determine
> >which device is performing the access.
> 
> I agree with Paul.

I do too.

> The right approach IMHO is to convert devices to use bus-specific
> functions to access memory.  The bus specific functions should have
> a device argument as the first parameter.

As for ATS, the internal api to handle the device's dma reqeust needs
a notion of a translated vs. an untranslated request.  IOW, if qemu ever
had a device with ATS support, the device would use its local cache to
translate the dma address and then submit a translated request to the
pci bus (effectively doing a raw cpu physical memory* in that case).

thanks,
-chris

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 1/7] Generic IOMMU layer
  2010-07-14  6:07     ` [Qemu-devel] " malc
@ 2010-07-14 22:47       ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-14 22:47 UTC (permalink / raw)
  To: malc; +Cc: joro, qemu-devel, avi, kvm, paul

On Wed, Jul 14, 2010 at 10:07:20AM +0400, malc wrote:
> On Wed, 14 Jul 2010, Eduard - Gabriel Munteanu wrote:
> 
> > This provides an API for abstracting IOMMU functions. Hardware emulation
> > code can use it to request address translation and access checking. In
> > the absence of an emulated IOMMU, no translation/checking happens and
> > I/O goes through as before.
> > 
> > IOMMU emulation code must provide implementation-specific hooks for this
> > layer.
> > 
> 
> [..snip..]
> 
> > +int __iommu_rw(struct iommu *iommu,
> > +               DeviceState *dev,
> > +               target_phys_addr_t addr,
> > +               uint8_t *buf,
> > +               int len,
> > +               int is_write)
> 
> Do not use leading double underscore.
> 
> [..snip..]
> 
> -- 
> mailto:av1474@comtv.ru

Thanks, will fix it.


	Eduard


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 1/7] Generic IOMMU layer
@ 2010-07-14 22:47       ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-14 22:47 UTC (permalink / raw)
  To: malc; +Cc: joro, paul, qemu-devel, kvm, avi

On Wed, Jul 14, 2010 at 10:07:20AM +0400, malc wrote:
> On Wed, 14 Jul 2010, Eduard - Gabriel Munteanu wrote:
> 
> > This provides an API for abstracting IOMMU functions. Hardware emulation
> > code can use it to request address translation and access checking. In
> > the absence of an emulated IOMMU, no translation/checking happens and
> > I/O goes through as before.
> > 
> > IOMMU emulation code must provide implementation-specific hooks for this
> > layer.
> > 
> 
> [..snip..]
> 
> > +int __iommu_rw(struct iommu *iommu,
> > +               DeviceState *dev,
> > +               target_phys_addr_t addr,
> > +               uint8_t *buf,
> > +               int len,
> > +               int is_write)
> 
> Do not use leading double underscore.
> 
> [..snip..]
> 
> -- 
> mailto:av1474@comtv.ru

Thanks, will fix it.


	Eduard

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 3/7] pci: call IOMMU hooks
  2010-07-14  7:37     ` Isaku Yamahata
@ 2010-07-14 22:50       ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-14 22:50 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: joro, qemu-devel, avi, kvm, paul

On Wed, Jul 14, 2010 at 04:37:39PM +0900, Isaku Yamahata wrote:
> On Wed, Jul 14, 2010 at 08:45:03AM +0300, Eduard - Gabriel Munteanu wrote:

[snip]

> >  PCIDevice *pci_register_device(PCIBus *bus, const char *name,
> >                                 int instance_size, int devfn,
> >                                 PCIConfigReadFunc *config_read,
> >                                 PCIConfigWriteFunc *config_write)
> >  {
> >      PCIDevice *pci_dev;
> > +    int err;
> >  
> >      pci_dev = qemu_mallocz(instance_size);
> >      pci_dev = do_pci_register_device(pci_dev, bus, name, devfn,
> > @@ -747,6 +761,13 @@ PCIDevice *pci_register_device(PCIBus *bus, const char *name,
> >      if (pci_dev == NULL) {
> >          hw_error("PCI: can't register device\n");
> >      }
> > +
> > +    err = pci_iommu_register_device(bus, pci_dev);
> > +    if (err) {
> > +        hw_error("PCI: can't register device with IOMMU\n");
> > +        return NULL;
> > +    }
> > +
> >      return pci_dev;
> >  }
> 
> pci_register_device() is pre-qdev api.
> qdev'fied device doesn't call pci_register_device().
> So please move the initialization hook into do_pci_register_device()
> which are commonly used by pci_register_device() and pci_qdev_init().
> -- 
> yamahata

Thanks, I didn't need the functionality and missed this.


	Eduard


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 3/7] pci: call IOMMU hooks
@ 2010-07-14 22:50       ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-14 22:50 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: joro, paul, qemu-devel, kvm, avi

On Wed, Jul 14, 2010 at 04:37:39PM +0900, Isaku Yamahata wrote:
> On Wed, Jul 14, 2010 at 08:45:03AM +0300, Eduard - Gabriel Munteanu wrote:

[snip]

> >  PCIDevice *pci_register_device(PCIBus *bus, const char *name,
> >                                 int instance_size, int devfn,
> >                                 PCIConfigReadFunc *config_read,
> >                                 PCIConfigWriteFunc *config_write)
> >  {
> >      PCIDevice *pci_dev;
> > +    int err;
> >  
> >      pci_dev = qemu_mallocz(instance_size);
> >      pci_dev = do_pci_register_device(pci_dev, bus, name, devfn,
> > @@ -747,6 +761,13 @@ PCIDevice *pci_register_device(PCIBus *bus, const char *name,
> >      if (pci_dev == NULL) {
> >          hw_error("PCI: can't register device\n");
> >      }
> > +
> > +    err = pci_iommu_register_device(bus, pci_dev);
> > +    if (err) {
> > +        hw_error("PCI: can't register device with IOMMU\n");
> > +        return NULL;
> > +    }
> > +
> >      return pci_dev;
> >  }
> 
> pci_register_device() is pre-qdev api.
> qdev'fied device doesn't call pci_register_device().
> So please move the initialization hook into do_pci_register_device()
> which are commonly used by pci_register_device() and pci_qdev_init().
> -- 
> yamahata

Thanks, I didn't need the functionality and missed this.


	Eduard

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-14 13:53     ` [Qemu-devel] " Paul Brook
@ 2010-07-14 23:11       ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-14 23:11 UTC (permalink / raw)
  To: Paul Brook; +Cc: joro, avi, kvm, qemu-devel

On Wed, Jul 14, 2010 at 02:53:03PM +0100, Paul Brook wrote:
> > Memory accesses must go through the IOMMU layer.
> 
> No. Devices should not know or care whether an IOMMU is present.

They don't really care. iommu_get() et al. are convenience functions
which can and do return NULL when there's no IOMMU and device code can
pass that NULL around without checking. I could've probably made the r/w
functions take only the DeviceState in addition to normal args, but
wanted to avoid looking up the related structures on each I/O operation.

> You should be adding a DeviceState argument to cpu_physical_memory_{rw,map}. 
> This should then handle IOMMU translation transparently.
> 
> You also need to accomodate the the case where multiple IOMMU are present.
> 
> Paul

We don't assume there's a single IOMMU in the generic layer. The
callbacks within 'struct iommu' could very well dispatch the request to
one of multiple, coexisting IOMMUs.


	Eduard


^ permalink raw reply	[flat|nested] 83+ messages in thread

* [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-14 23:11       ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-14 23:11 UTC (permalink / raw)
  To: Paul Brook; +Cc: joro, avi, kvm, qemu-devel

On Wed, Jul 14, 2010 at 02:53:03PM +0100, Paul Brook wrote:
> > Memory accesses must go through the IOMMU layer.
> 
> No. Devices should not know or care whether an IOMMU is present.

They don't really care. iommu_get() et al. are convenience functions
which can and do return NULL when there's no IOMMU and device code can
pass that NULL around without checking. I could've probably made the r/w
functions take only the DeviceState in addition to normal args, but
wanted to avoid looking up the related structures on each I/O operation.

> You should be adding a DeviceState argument to cpu_physical_memory_{rw,map}. 
> This should then handle IOMMU translation transparently.
> 
> You also need to accomodate the the case where multiple IOMMU are present.
> 
> Paul

We don't assume there's a single IOMMU in the generic layer. The
callbacks within 'struct iommu' could very well dispatch the request to
one of multiple, coexisting IOMMUs.


	Eduard

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-14 20:13       ` Paul Brook
@ 2010-07-14 23:39           ` Eduard - Gabriel Munteanu
  2010-07-14 23:39           ` Eduard - Gabriel Munteanu
  2010-07-15  9:22           ` Joerg Roedel
  2 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-14 23:39 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel, Joerg Roedel, avi, kvm

On Wed, Jul 14, 2010 at 09:13:44PM +0100, Paul Brook wrote:

> Well, ok, the function name needs fixing too.  However I think the only thing 
> missing from the current API is that it does not provide a way to determine 
> which device is performing the access.
> 
> Depending how the we decide to handle IOMMU invalidation, it may also be 
> necessary to augment the memory_map API to allow the system to request a 
> mapping be revoked.  However this issue is not specific to the IOMMU 
> implementation. Such bugs are already present on any system that allows 
> dynamic reconfiguration of the address space, e.g. by changing PCI BARs.

Yeah, having a way to alter existing maps would be good. Basically it's
the only place where we truly care about the existence of an IOMMU, and
that's due to AIO.

But it's really tricky to do, unfortunately. I also think having an
abort_doing_io_on_map() kind of notifier would be sufficient. It should
notify back when I/O has been completely stopped.

This should be enough, since we don't expect the results of IDE-like DMA
to be recoverable in case a mapping change occurs, even on real hardware.

> I disagree. ATS should be an independent feature, and is inherently bus 
> specific.  As usual the PCI spec is not publicly available, but based on the 
> AMD IOMMU docs I'd say that ATS is completely independent of memory accesses - 
> the convention being that you trust an ATS capable device to DTRT, and 
> configure the bus IOMMU to apply a flat mapping for accesses from such 
> devices.

ATS is documented in the Hypertransport specs which are publicly
available.

[snip]

> A device performs a memory access on its local bus. It has no knowledge of how 
> that access is routed to its destination.  The device should not be aware of 
> any IOMMUs, in the same way that it doesn't know whether it happens to be 
> accessing RAM or memory mapped peripherals on another device.
> 
> Each IOMMU is fundamentally part of a bus bridge. For example the bridge 
> between a PCI bus and the system bus. It provides a address mapping from one 
> bus to another. 
> 
> There should be no direct interaction between an IOMMU and a device (ignoring 
> ATS, which is effectively a separate data channel).  Everything should be done 
> via the cpu_phsycial_memory_* code.  Likewise on a system with multiple nested 
> IOMMUs there should be no direct interatcion between these. 
> cpu_physical_memory_* should walk the device/bus tree to determine where the 
> access terminates, applying mappings appropriately.
> 
> Paul

Admittedly I could make __iommu_rw() repeateadly call itself instead of
doing cpu_physical_memory_rw(). That's what I actually intended, and it
should handle IOMMU nesting. It's a trivial change to do so.

Note that emulating hardware realistically defeats some performance
purposes. It'd make AIO impossible, if we imagine some sort of message
passing scenario (which would be just like the real thing, but a lot
slower).


	Eduard


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-14 23:39           ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-14 23:39 UTC (permalink / raw)
  To: Paul Brook; +Cc: Joerg Roedel, qemu-devel, kvm, avi

On Wed, Jul 14, 2010 at 09:13:44PM +0100, Paul Brook wrote:

> Well, ok, the function name needs fixing too.  However I think the only thing 
> missing from the current API is that it does not provide a way to determine 
> which device is performing the access.
> 
> Depending how the we decide to handle IOMMU invalidation, it may also be 
> necessary to augment the memory_map API to allow the system to request a 
> mapping be revoked.  However this issue is not specific to the IOMMU 
> implementation. Such bugs are already present on any system that allows 
> dynamic reconfiguration of the address space, e.g. by changing PCI BARs.

Yeah, having a way to alter existing maps would be good. Basically it's
the only place where we truly care about the existence of an IOMMU, and
that's due to AIO.

But it's really tricky to do, unfortunately. I also think having an
abort_doing_io_on_map() kind of notifier would be sufficient. It should
notify back when I/O has been completely stopped.

This should be enough, since we don't expect the results of IDE-like DMA
to be recoverable in case a mapping change occurs, even on real hardware.

> I disagree. ATS should be an independent feature, and is inherently bus 
> specific.  As usual the PCI spec is not publicly available, but based on the 
> AMD IOMMU docs I'd say that ATS is completely independent of memory accesses - 
> the convention being that you trust an ATS capable device to DTRT, and 
> configure the bus IOMMU to apply a flat mapping for accesses from such 
> devices.

ATS is documented in the Hypertransport specs which are publicly
available.

[snip]

> A device performs a memory access on its local bus. It has no knowledge of how 
> that access is routed to its destination.  The device should not be aware of 
> any IOMMUs, in the same way that it doesn't know whether it happens to be 
> accessing RAM or memory mapped peripherals on another device.
> 
> Each IOMMU is fundamentally part of a bus bridge. For example the bridge 
> between a PCI bus and the system bus. It provides a address mapping from one 
> bus to another. 
> 
> There should be no direct interaction between an IOMMU and a device (ignoring 
> ATS, which is effectively a separate data channel).  Everything should be done 
> via the cpu_phsycial_memory_* code.  Likewise on a system with multiple nested 
> IOMMUs there should be no direct interatcion between these. 
> cpu_physical_memory_* should walk the device/bus tree to determine where the 
> access terminates, applying mappings appropriately.
> 
> Paul

Admittedly I could make __iommu_rw() repeateadly call itself instead of
doing cpu_physical_memory_rw(). That's what I actually intended, and it
should handle IOMMU nesting. It's a trivial change to do so.

Note that emulating hardware realistically defeats some performance
purposes. It'd make AIO impossible, if we imagine some sort of message
passing scenario (which would be just like the real thing, but a lot
slower).


	Eduard

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-14 21:29           ` Anthony Liguori
@ 2010-07-15  9:10             ` Joerg Roedel
  -1 siblings, 0 replies; 83+ messages in thread
From: Joerg Roedel @ 2010-07-15  9:10 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Paul Brook, qemu-devel, avi, kvm, Eduard - Gabriel Munteanu

On Wed, Jul 14, 2010 at 04:29:18PM -0500, Anthony Liguori wrote:
> On 07/14/2010 03:13 PM, Paul Brook wrote:
>> Well, ok, the function name needs fixing too.  However I think the only thing
>> missing from the current API is that it does not provide a way to determine
>> which device is performing the access.
>
> I agree with Paul.
>
> The right approach IMHO is to convert devices to use bus-specific  
> functions to access memory.  The bus specific functions should have a  
> device argument as the first parameter.

If this means a seperate interface for device dma accesses and not fold
that functionality into the cpu_physical_memory* interface I agree too :-)

		Joerg


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-15  9:10             ` Joerg Roedel
  0 siblings, 0 replies; 83+ messages in thread
From: Joerg Roedel @ 2010-07-15  9:10 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: avi, Eduard - Gabriel Munteanu, Paul Brook, kvm, qemu-devel

On Wed, Jul 14, 2010 at 04:29:18PM -0500, Anthony Liguori wrote:
> On 07/14/2010 03:13 PM, Paul Brook wrote:
>> Well, ok, the function name needs fixing too.  However I think the only thing
>> missing from the current API is that it does not provide a way to determine
>> which device is performing the access.
>
> I agree with Paul.
>
> The right approach IMHO is to convert devices to use bus-specific  
> functions to access memory.  The bus specific functions should have a  
> device argument as the first parameter.

If this means a seperate interface for device dma accesses and not fold
that functionality into the cpu_physical_memory* interface I agree too :-)

		Joerg

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-14 20:13       ` Paul Brook
@ 2010-07-15  9:22           ` Joerg Roedel
  2010-07-14 23:39           ` Eduard - Gabriel Munteanu
  2010-07-15  9:22           ` Joerg Roedel
  2 siblings, 0 replies; 83+ messages in thread
From: Joerg Roedel @ 2010-07-15  9:22 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel, avi, kvm, Eduard - Gabriel Munteanu

On Wed, Jul 14, 2010 at 09:13:44PM +0100, Paul Brook wrote:

> A device performs a memory access on its local bus. It has no knowledge of how 
> that access is routed to its destination.  The device should not be aware of 
> any IOMMUs, in the same way that it doesn't know whether it happens to be 
> accessing RAM or memory mapped peripherals on another device.

Right.

> Each IOMMU is fundamentally part of a bus bridge. For example the bridge 
> between a PCI bus and the system bus. It provides a address mapping from one 
> bus to another.

An IOMMU is not necessarily part of a bus bridge. By concept an IOMMU
can also be implemented on a plugin-card translating only that card.
Real implementations that I am aware of always implement the IOMMU in
the PCI root bridge, though.

> There should be no direct interaction between an IOMMU and a device (ignoring 
> ATS, which is effectively a separate data channel).  Everything should be done 
> via the cpu_phsycial_memory_* code.  Likewise on a system with multiple nested 
> IOMMUs there should be no direct interatcion between these. 
> cpu_physical_memory_* should walk the device/bus tree to determine where the 
> access terminates, applying mappings appropriately.

Thats the point where I disagree. I think there should be a seperate set
of functions independent from cpu_physical_memory_* to handle device
memory accesses. This would keep the changes small and non-intrusive.
Beside that, real memory controlers can also handle cpu memory accesses
different from device memory accesses. The AMD northbridge GART uses
this to decide whether it needs to remap a request or not. The GART can
be configured to translate cpu and device accesses seperatly.


		Joerg


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-15  9:22           ` Joerg Roedel
  0 siblings, 0 replies; 83+ messages in thread
From: Joerg Roedel @ 2010-07-15  9:22 UTC (permalink / raw)
  To: Paul Brook; +Cc: Eduard - Gabriel Munteanu, qemu-devel, kvm, avi

On Wed, Jul 14, 2010 at 09:13:44PM +0100, Paul Brook wrote:

> A device performs a memory access on its local bus. It has no knowledge of how 
> that access is routed to its destination.  The device should not be aware of 
> any IOMMUs, in the same way that it doesn't know whether it happens to be 
> accessing RAM or memory mapped peripherals on another device.

Right.

> Each IOMMU is fundamentally part of a bus bridge. For example the bridge 
> between a PCI bus and the system bus. It provides a address mapping from one 
> bus to another.

An IOMMU is not necessarily part of a bus bridge. By concept an IOMMU
can also be implemented on a plugin-card translating only that card.
Real implementations that I am aware of always implement the IOMMU in
the PCI root bridge, though.

> There should be no direct interaction between an IOMMU and a device (ignoring 
> ATS, which is effectively a separate data channel).  Everything should be done 
> via the cpu_phsycial_memory_* code.  Likewise on a system with multiple nested 
> IOMMUs there should be no direct interatcion between these. 
> cpu_physical_memory_* should walk the device/bus tree to determine where the 
> access terminates, applying mappings appropriately.

Thats the point where I disagree. I think there should be a seperate set
of functions independent from cpu_physical_memory_* to handle device
memory accesses. This would keep the changes small and non-intrusive.
Beside that, real memory controlers can also handle cpu memory accesses
different from device memory accesses. The AMD northbridge GART uses
this to decide whether it needs to remap a request or not. The GART can
be configured to translate cpu and device accesses seperatly.


		Joerg

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-14 22:24             ` Chris Wright
@ 2010-07-15 10:28               ` Paul Brook
  -1 siblings, 0 replies; 83+ messages in thread
From: Paul Brook @ 2010-07-15 10:28 UTC (permalink / raw)
  To: Chris Wright
  Cc: Anthony Liguori, Joerg Roedel, Eduard - Gabriel Munteanu,
	qemu-devel, kvm, avi

> > The right approach IMHO is to convert devices to use bus-specific
> > functions to access memory.  The bus specific functions should have
> > a device argument as the first parameter.
> 
> As for ATS, the internal api to handle the device's dma reqeust needs
> a notion of a translated vs. an untranslated request.  IOW, if qemu ever
> had a device with ATS support, the device would use its local cache to
> translate the dma address and then submit a translated request to the
> pci bus (effectively doing a raw cpu physical memory* in that case).

Really? Can you provide an documentation to support this claim?
My impression is that there is no difference between translated and 
untranslated devices, and the translation is explicitly disabled by software.

Paul

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-15 10:28               ` Paul Brook
  0 siblings, 0 replies; 83+ messages in thread
From: Paul Brook @ 2010-07-15 10:28 UTC (permalink / raw)
  To: Chris Wright
  Cc: kvm, Joerg Roedel, qemu-devel, avi, Eduard - Gabriel Munteanu

> > The right approach IMHO is to convert devices to use bus-specific
> > functions to access memory.  The bus specific functions should have
> > a device argument as the first parameter.
> 
> As for ATS, the internal api to handle the device's dma reqeust needs
> a notion of a translated vs. an untranslated request.  IOW, if qemu ever
> had a device with ATS support, the device would use its local cache to
> translate the dma address and then submit a translated request to the
> pci bus (effectively doing a raw cpu physical memory* in that case).

Really? Can you provide an documentation to support this claim?
My impression is that there is no difference between translated and 
untranslated devices, and the translation is explicitly disabled by software.

Paul

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-14 21:29           ` Anthony Liguori
@ 2010-07-15 10:33             ` Paul Brook
  -1 siblings, 0 replies; 83+ messages in thread
From: Paul Brook @ 2010-07-15 10:33 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: qemu-devel, Joerg Roedel, avi, kvm, Eduard - Gabriel Munteanu

> > Depending how the we decide to handle IOMMU invalidation, it may also be
> > necessary to augment the memory_map API to allow the system to request a
> > mapping be revoked.  However this issue is not specific to the IOMMU
> > implementation. Such bugs are already present on any system that allows
> > dynamic reconfiguration of the address space, e.g. by changing PCI BARs.
> 
> That's why the memory_map API today does not allow mappings to persist
> after trips back to the main loop.

Sure it does.  If you can't combine zero-copy memory access with asynchronous 
IO then IMO it's fairly useless. See e.g. dma-helpers.c

Paul

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-15 10:33             ` Paul Brook
  0 siblings, 0 replies; 83+ messages in thread
From: Paul Brook @ 2010-07-15 10:33 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Joerg Roedel, Eduard - Gabriel Munteanu, qemu-devel, kvm, avi

> > Depending how the we decide to handle IOMMU invalidation, it may also be
> > necessary to augment the memory_map API to allow the system to request a
> > mapping be revoked.  However this issue is not specific to the IOMMU
> > implementation. Such bugs are already present on any system that allows
> > dynamic reconfiguration of the address space, e.g. by changing PCI BARs.
> 
> That's why the memory_map API today does not allow mappings to persist
> after trips back to the main loop.

Sure it does.  If you can't combine zero-copy memory access with asynchronous 
IO then IMO it's fairly useless. See e.g. dma-helpers.c

Paul

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-15  9:22           ` Joerg Roedel
@ 2010-07-15 10:49             ` Paul Brook
  -1 siblings, 0 replies; 83+ messages in thread
From: Paul Brook @ 2010-07-15 10:49 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: qemu-devel, avi, kvm, Eduard - Gabriel Munteanu

> On Wed, Jul 14, 2010 at 09:13:44PM +0100, Paul Brook wrote:
> > A device performs a memory access on its local bus. It has no knowledge
> > of how that access is routed to its destination.  The device should not
> > be aware of any IOMMUs, in the same way that it doesn't know whether it
> > happens to be accessing RAM or memory mapped peripherals on another
> > device.
> 
> Right.
> 
> > Each IOMMU is fundamentally part of a bus bridge. For example the bridge
> > between a PCI bus and the system bus. It provides a address mapping from
> > one bus to another.
> 
> An IOMMU is not necessarily part of a bus bridge. By concept an IOMMU
> can also be implemented on a plugin-card translating only that card.
> Real implementations that I am aware of always implement the IOMMU in
> the PCI root bridge, though.

If the IOMMU is implemented on the card, then it isn't an interesting case. 
It's effectively just a complex form of scatter-gather.

If the on-card MMU can delegate pagetable walks to an external device then IMO 
that's also an unrelated feature, and requires an additional data channel.

> > There should be no direct interaction between an IOMMU and a device
> > (ignoring ATS, which is effectively a separate data channel). 
> > Everything should be done via the cpu_phsycial_memory_* code.  Likewise
> > on a system with multiple nested IOMMUs there should be no direct
> > interatcion between these.
> > cpu_physical_memory_* should walk the device/bus tree to determine where
> > the access terminates, applying mappings appropriately.
> 
> Thats the point where I disagree. I think there should be a seperate set
> of functions independent from cpu_physical_memory_* to handle device
> memory accesses. This would keep the changes small and non-intrusive.
> Beside that, real memory controlers can also handle cpu memory accesses
> different from device memory accesses. The AMD northbridge GART uses
> this to decide whether it needs to remap a request or not. The GART can
> be configured to translate cpu and device accesses seperatly.

My point still stands. You should not be pushing the IOMMU handling into 
device specific code. All you need to do is make the memory access routines 
aware of which device caused the access.

The fact that the GART can translate CPU accesses proves my point.  If you 
have separate code for CPU and devices, then you need to duplicate the GART 
handling code.

Paul

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-15 10:49             ` Paul Brook
  0 siblings, 0 replies; 83+ messages in thread
From: Paul Brook @ 2010-07-15 10:49 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Eduard - Gabriel Munteanu, qemu-devel, kvm, avi

> On Wed, Jul 14, 2010 at 09:13:44PM +0100, Paul Brook wrote:
> > A device performs a memory access on its local bus. It has no knowledge
> > of how that access is routed to its destination.  The device should not
> > be aware of any IOMMUs, in the same way that it doesn't know whether it
> > happens to be accessing RAM or memory mapped peripherals on another
> > device.
> 
> Right.
> 
> > Each IOMMU is fundamentally part of a bus bridge. For example the bridge
> > between a PCI bus and the system bus. It provides a address mapping from
> > one bus to another.
> 
> An IOMMU is not necessarily part of a bus bridge. By concept an IOMMU
> can also be implemented on a plugin-card translating only that card.
> Real implementations that I am aware of always implement the IOMMU in
> the PCI root bridge, though.

If the IOMMU is implemented on the card, then it isn't an interesting case. 
It's effectively just a complex form of scatter-gather.

If the on-card MMU can delegate pagetable walks to an external device then IMO 
that's also an unrelated feature, and requires an additional data channel.

> > There should be no direct interaction between an IOMMU and a device
> > (ignoring ATS, which is effectively a separate data channel). 
> > Everything should be done via the cpu_phsycial_memory_* code.  Likewise
> > on a system with multiple nested IOMMUs there should be no direct
> > interatcion between these.
> > cpu_physical_memory_* should walk the device/bus tree to determine where
> > the access terminates, applying mappings appropriately.
> 
> Thats the point where I disagree. I think there should be a seperate set
> of functions independent from cpu_physical_memory_* to handle device
> memory accesses. This would keep the changes small and non-intrusive.
> Beside that, real memory controlers can also handle cpu memory accesses
> different from device memory accesses. The AMD northbridge GART uses
> this to decide whether it needs to remap a request or not. The GART can
> be configured to translate cpu and device accesses seperatly.

My point still stands. You should not be pushing the IOMMU handling into 
device specific code. All you need to do is make the memory access routines 
aware of which device caused the access.

The fact that the GART can translate CPU accesses proves my point.  If you 
have separate code for CPU and devices, then you need to duplicate the GART 
handling code.

Paul

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-14 23:11       ` [Qemu-devel] " Eduard - Gabriel Munteanu
@ 2010-07-15 10:58         ` Paul Brook
  -1 siblings, 0 replies; 83+ messages in thread
From: Paul Brook @ 2010-07-15 10:58 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu; +Cc: joro, avi, kvm, qemu-devel

> On Wed, Jul 14, 2010 at 02:53:03PM +0100, Paul Brook wrote:
> > > Memory accesses must go through the IOMMU layer.
> > 
> > No. Devices should not know or care whether an IOMMU is present.
> 
> They don't really care. iommu_get() et al. are convenience functions
> which can and do return NULL when there's no IOMMU and device code can
> pass that NULL around without checking. 

Devices should not need to know any of this. You're introducing a significant 
amount of duplication and complexity into every device.

The assumption that all accesses will go through the same IOMMU is also false. 
Accesses to devices on the same bus will not be translated by the IOMMU. 
Currently there are probably also other things that will break in this case, 
but your API seems fundamentally incapable of handling this.

> I could've probably made the r/w
> functions take only the DeviceState in addition to normal args, but
> wanted to avoid looking up the related structures on each I/O operation.

That's exactly what you should be doing.  If this is inefficient then there 
are much better ways of fixing this. e.g. by not having the device perform so 
many accesses, or by adding some sort of translation cache.

> > You should be adding a DeviceState argument to
> > cpu_physical_memory_{rw,map}. This should then handle IOMMU translation
> > transparently.
> > 
> > You also need to accomodate the the case where multiple IOMMU are
> > present.
> 
> We don't assume there's a single IOMMU in the generic layer. The
> callbacks within 'struct iommu' could very well dispatch the request to
> one of multiple, coexisting IOMMUs.

So you've now introduced yet another copy of the translation code. Not only 
does every device have to be IOMMU aware, but every IOMMU also has to be aware 
of nested IOMMUs.

Paul

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-15 10:58         ` Paul Brook
  0 siblings, 0 replies; 83+ messages in thread
From: Paul Brook @ 2010-07-15 10:58 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu; +Cc: joro, avi, kvm, qemu-devel

> On Wed, Jul 14, 2010 at 02:53:03PM +0100, Paul Brook wrote:
> > > Memory accesses must go through the IOMMU layer.
> > 
> > No. Devices should not know or care whether an IOMMU is present.
> 
> They don't really care. iommu_get() et al. are convenience functions
> which can and do return NULL when there's no IOMMU and device code can
> pass that NULL around without checking. 

Devices should not need to know any of this. You're introducing a significant 
amount of duplication and complexity into every device.

The assumption that all accesses will go through the same IOMMU is also false. 
Accesses to devices on the same bus will not be translated by the IOMMU. 
Currently there are probably also other things that will break in this case, 
but your API seems fundamentally incapable of handling this.

> I could've probably made the r/w
> functions take only the DeviceState in addition to normal args, but
> wanted to avoid looking up the related structures on each I/O operation.

That's exactly what you should be doing.  If this is inefficient then there 
are much better ways of fixing this. e.g. by not having the device perform so 
many accesses, or by adding some sort of translation cache.

> > You should be adding a DeviceState argument to
> > cpu_physical_memory_{rw,map}. This should then handle IOMMU translation
> > transparently.
> > 
> > You also need to accomodate the the case where multiple IOMMU are
> > present.
> 
> We don't assume there's a single IOMMU in the generic layer. The
> callbacks within 'struct iommu' could very well dispatch the request to
> one of multiple, coexisting IOMMUs.

So you've now introduced yet another copy of the translation code. Not only 
does every device have to be IOMMU aware, but every IOMMU also has to be aware 
of nested IOMMUs.

Paul

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-15 10:33             ` Paul Brook
@ 2010-07-15 12:42               ` Anthony Liguori
  -1 siblings, 0 replies; 83+ messages in thread
From: Anthony Liguori @ 2010-07-15 12:42 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel, Joerg Roedel, avi, kvm, Eduard - Gabriel Munteanu

On 07/15/2010 05:33 AM, Paul Brook wrote:
>>> Depending how the we decide to handle IOMMU invalidation, it may also be
>>> necessary to augment the memory_map API to allow the system to request a
>>> mapping be revoked.  However this issue is not specific to the IOMMU
>>> implementation. Such bugs are already present on any system that allows
>>> dynamic reconfiguration of the address space, e.g. by changing PCI BARs.
>>>        
>> That's why the memory_map API today does not allow mappings to persist
>> after trips back to the main loop.
>>      
> Sure it does.  If you can't combine zero-copy memory access with asynchronous
> IO then IMO it's fairly useless. See e.g. dma-helpers.c
>    

DMA's a very special case.  DMA is performed asynchronously to the 
execution of the CPU so you generally can't make any guarantees about 
what state the transaction is in until it's completed.  That gives us a 
fair bit of wiggle room when dealing with a DMA operation to a region of 
physical memory where the physical memory mapping is altered in some way 
during the transaction.

However, that is not true in the general case.

Regards,

Anthony Liguori

> Paul
>    


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-15 12:42               ` Anthony Liguori
  0 siblings, 0 replies; 83+ messages in thread
From: Anthony Liguori @ 2010-07-15 12:42 UTC (permalink / raw)
  To: Paul Brook; +Cc: Joerg Roedel, Eduard - Gabriel Munteanu, qemu-devel, kvm, avi

On 07/15/2010 05:33 AM, Paul Brook wrote:
>>> Depending how the we decide to handle IOMMU invalidation, it may also be
>>> necessary to augment the memory_map API to allow the system to request a
>>> mapping be revoked.  However this issue is not specific to the IOMMU
>>> implementation. Such bugs are already present on any system that allows
>>> dynamic reconfiguration of the address space, e.g. by changing PCI BARs.
>>>        
>> That's why the memory_map API today does not allow mappings to persist
>> after trips back to the main loop.
>>      
> Sure it does.  If you can't combine zero-copy memory access with asynchronous
> IO then IMO it's fairly useless. See e.g. dma-helpers.c
>    

DMA's a very special case.  DMA is performed asynchronously to the 
execution of the CPU so you generally can't make any guarantees about 
what state the transaction is in until it's completed.  That gives us a 
fair bit of wiggle room when dealing with a DMA operation to a region of 
physical memory where the physical memory mapping is altered in some way 
during the transaction.

However, that is not true in the general case.

Regards,

Anthony Liguori

> Paul
>    

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-15  9:10             ` Joerg Roedel
@ 2010-07-15 12:45               ` Anthony Liguori
  -1 siblings, 0 replies; 83+ messages in thread
From: Anthony Liguori @ 2010-07-15 12:45 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Paul Brook, qemu-devel, avi, kvm, Eduard - Gabriel Munteanu

On 07/15/2010 04:10 AM, Joerg Roedel wrote:
> On Wed, Jul 14, 2010 at 04:29:18PM -0500, Anthony Liguori wrote:
>    
>> On 07/14/2010 03:13 PM, Paul Brook wrote:
>>      
>>> Well, ok, the function name needs fixing too.  However I think the only thing
>>> missing from the current API is that it does not provide a way to determine
>>> which device is performing the access.
>>>        
>> I agree with Paul.
>>
>> The right approach IMHO is to convert devices to use bus-specific
>> functions to access memory.  The bus specific functions should have a
>> device argument as the first parameter.
>>      
> If this means a seperate interface for device dma accesses and not fold
> that functionality into the cpu_physical_memory* interface I agree too :-)
>    

No.  PCI devices should never call cpu_physical_memory*.

PCI devices should call pci_memory*.

ISA devices should call isa_memory*.

All device memory accesses should go through their respective buses.  
There can be multiple IOMMUs at different levels of the device 
hierarchy.  If you don't provide bus-level memory access functions that 
chain through the hierarchy, it's extremely difficult to implement all 
the necessary hooks to perform the translations at different places.

Regards,

Anthony Liguori

> 		Joerg
>
>    


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-15 12:45               ` Anthony Liguori
  0 siblings, 0 replies; 83+ messages in thread
From: Anthony Liguori @ 2010-07-15 12:45 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: avi, Eduard - Gabriel Munteanu, Paul Brook, kvm, qemu-devel

On 07/15/2010 04:10 AM, Joerg Roedel wrote:
> On Wed, Jul 14, 2010 at 04:29:18PM -0500, Anthony Liguori wrote:
>    
>> On 07/14/2010 03:13 PM, Paul Brook wrote:
>>      
>>> Well, ok, the function name needs fixing too.  However I think the only thing
>>> missing from the current API is that it does not provide a way to determine
>>> which device is performing the access.
>>>        
>> I agree with Paul.
>>
>> The right approach IMHO is to convert devices to use bus-specific
>> functions to access memory.  The bus specific functions should have a
>> device argument as the first parameter.
>>      
> If this means a seperate interface for device dma accesses and not fold
> that functionality into the cpu_physical_memory* interface I agree too :-)
>    

No.  PCI devices should never call cpu_physical_memory*.

PCI devices should call pci_memory*.

ISA devices should call isa_memory*.

All device memory accesses should go through their respective buses.  
There can be multiple IOMMUs at different levels of the device 
hierarchy.  If you don't provide bus-level memory access functions that 
chain through the hierarchy, it's extremely difficult to implement all 
the necessary hooks to perform the translations at different places.

Regards,

Anthony Liguori

> 		Joerg
>
>    

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-15 12:42               ` Anthony Liguori
@ 2010-07-15 14:02                 ` Paul Brook
  -1 siblings, 0 replies; 83+ messages in thread
From: Paul Brook @ 2010-07-15 14:02 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: qemu-devel, Joerg Roedel, avi, kvm, Eduard - Gabriel Munteanu

> >>> Depending how the we decide to handle IOMMU invalidation, it may also
> >>> be necessary to augment the memory_map API to allow the system to
> >>> request a mapping be revoked.  However this issue is not specific to
> >>> the IOMMU implementation. Such bugs are already present on any system
> >>> that allows dynamic reconfiguration of the address space, e.g. by
> >>> changing PCI BARs.
> >> 
> >> That's why the memory_map API today does not allow mappings to persist
> >> after trips back to the main loop.
> > 
> > Sure it does.  If you can't combine zero-copy memory access with
> > asynchronous IO then IMO it's fairly useless. See e.g. dma-helpers.c
> 
> DMA's a very special case.  

Special compared to what?  The whole purpose of this API is to provide DMA.

> DMA is performed asynchronously to the
> execution of the CPU so you generally can't make any guarantees about
> what state the transaction is in until it's completed.  That gives us a
> fair bit of wiggle room when dealing with a DMA operation to a region of
> physical memory where the physical memory mapping is altered in some way
> during the transaction.

You do have ordering constraints though. While it may not be possible to 
directly determine whether the DMA completed before or after the remapping, 
and you might not be able to make any assumptions about the atomicity of the 
transaction as a whole, it is reasonable to assume that any writes to the old 
mapping will occur before the remapping operation completes.

While things like store buffers potentially allows reordering and deferral of 
accesses, there are generally fairly tight constraints on this. For example a 
PCI hast bridge may buffer CPU writes. However it will guarantee that those 
writes have been flushed out before a subsequent read operation completes.

Consider the case where the hypervisor allows passthough of a device, using 
the IOMMU to support DMA from that device into virtual machine RAM. When that 
virtual machine is destroyed the IOMMU mapping for that device will be 
invalidated. Once the invalidation has completed that RAM can be reused by the 
hypervisor for other purposes. This may happen before the device is reset.  We 
probably don't really care what happens to the device in this case, but we do 
need to prevent the device stomping on ram it no longer owns.

There are two ways this can be handled:

If your address translation mechanism allows updates to be deferred 
indefinitely then we can stall until all relevant DMA transactions have 
completed.  This is probably sufficient for well behaved guests, but 
potentially opens up a significant window for DoS attacks. 

If you need the remapping to occur in a finite timeframe (in the PCI BAR case 
this is probably before the next CPU access to that bus) then you need some 
mechanism for revoking the host mapping provided by cpu_physical_memory_map.

Note that a QEMU DMA transaction typically encompasses a whole block of data. 
The transaction is started when the AIO request is issued, and remains live 
until the transfer completes. This includes the time taken to fetch the data 
from external media/devices.

On real hardware a DMA transaction typically only covers a single burst memory 
write (maybe 16 bytes). This will generally not start until the device has 
buffered sufficient data to satisfy the burst (or has sufficient buffer space 
to receive the whole burst).

Paul

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-15 14:02                 ` Paul Brook
  0 siblings, 0 replies; 83+ messages in thread
From: Paul Brook @ 2010-07-15 14:02 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Joerg Roedel, Eduard - Gabriel Munteanu, qemu-devel, kvm, avi

> >>> Depending how the we decide to handle IOMMU invalidation, it may also
> >>> be necessary to augment the memory_map API to allow the system to
> >>> request a mapping be revoked.  However this issue is not specific to
> >>> the IOMMU implementation. Such bugs are already present on any system
> >>> that allows dynamic reconfiguration of the address space, e.g. by
> >>> changing PCI BARs.
> >> 
> >> That's why the memory_map API today does not allow mappings to persist
> >> after trips back to the main loop.
> > 
> > Sure it does.  If you can't combine zero-copy memory access with
> > asynchronous IO then IMO it's fairly useless. See e.g. dma-helpers.c
> 
> DMA's a very special case.  

Special compared to what?  The whole purpose of this API is to provide DMA.

> DMA is performed asynchronously to the
> execution of the CPU so you generally can't make any guarantees about
> what state the transaction is in until it's completed.  That gives us a
> fair bit of wiggle room when dealing with a DMA operation to a region of
> physical memory where the physical memory mapping is altered in some way
> during the transaction.

You do have ordering constraints though. While it may not be possible to 
directly determine whether the DMA completed before or after the remapping, 
and you might not be able to make any assumptions about the atomicity of the 
transaction as a whole, it is reasonable to assume that any writes to the old 
mapping will occur before the remapping operation completes.

While things like store buffers potentially allows reordering and deferral of 
accesses, there are generally fairly tight constraints on this. For example a 
PCI hast bridge may buffer CPU writes. However it will guarantee that those 
writes have been flushed out before a subsequent read operation completes.

Consider the case where the hypervisor allows passthough of a device, using 
the IOMMU to support DMA from that device into virtual machine RAM. When that 
virtual machine is destroyed the IOMMU mapping for that device will be 
invalidated. Once the invalidation has completed that RAM can be reused by the 
hypervisor for other purposes. This may happen before the device is reset.  We 
probably don't really care what happens to the device in this case, but we do 
need to prevent the device stomping on ram it no longer owns.

There are two ways this can be handled:

If your address translation mechanism allows updates to be deferred 
indefinitely then we can stall until all relevant DMA transactions have 
completed.  This is probably sufficient for well behaved guests, but 
potentially opens up a significant window for DoS attacks. 

If you need the remapping to occur in a finite timeframe (in the PCI BAR case 
this is probably before the next CPU access to that bus) then you need some 
mechanism for revoking the host mapping provided by cpu_physical_memory_map.

Note that a QEMU DMA transaction typically encompasses a whole block of data. 
The transaction is started when the AIO request is issued, and remains live 
until the transfer completes. This includes the time taken to fetch the data 
from external media/devices.

On real hardware a DMA transaction typically only covers a single burst memory 
write (maybe 16 bytes). This will generally not start until the device has 
buffered sufficient data to satisfy the burst (or has sufficient buffer space 
to receive the whole burst).

Paul

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-15 12:45               ` Anthony Liguori
@ 2010-07-15 14:45                 ` Joerg Roedel
  -1 siblings, 0 replies; 83+ messages in thread
From: Joerg Roedel @ 2010-07-15 14:45 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Paul Brook, qemu-devel, avi, kvm, Eduard - Gabriel Munteanu

On Thu, Jul 15, 2010 at 07:45:06AM -0500, Anthony Liguori wrote:
> On 07/15/2010 04:10 AM, Joerg Roedel wrote:

>> If this means a seperate interface for device dma accesses and not fold
>> that functionality into the cpu_physical_memory* interface I agree too :-)
>>
> No.  PCI devices should never call cpu_physical_memory*.

Fully agreed.

> PCI devices should call pci_memory*.
>
> ISA devices should call isa_memory*.

This is a seperate interface. I like the idea and as you stated below it
has clear advantages, so lets go this way.

> All device memory accesses should go through their respective buses.   
> There can be multiple IOMMUs at different levels of the device  
> hierarchy.  If you don't provide bus-level memory access functions that  
> chain through the hierarchy, it's extremely difficult to implement all  
> the necessary hooks to perform the translations at different places.


		Joerg


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-15 14:45                 ` Joerg Roedel
  0 siblings, 0 replies; 83+ messages in thread
From: Joerg Roedel @ 2010-07-15 14:45 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: avi, Eduard - Gabriel Munteanu, Paul Brook, kvm, qemu-devel

On Thu, Jul 15, 2010 at 07:45:06AM -0500, Anthony Liguori wrote:
> On 07/15/2010 04:10 AM, Joerg Roedel wrote:

>> If this means a seperate interface for device dma accesses and not fold
>> that functionality into the cpu_physical_memory* interface I agree too :-)
>>
> No.  PCI devices should never call cpu_physical_memory*.

Fully agreed.

> PCI devices should call pci_memory*.
>
> ISA devices should call isa_memory*.

This is a seperate interface. I like the idea and as you stated below it
has clear advantages, so lets go this way.

> All device memory accesses should go through their respective buses.   
> There can be multiple IOMMUs at different levels of the device  
> hierarchy.  If you don't provide bus-level memory access functions that  
> chain through the hierarchy, it's extremely difficult to implement all  
> the necessary hooks to perform the translations at different places.


		Joerg

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-15 10:49             ` Paul Brook
@ 2010-07-15 14:59               ` Joerg Roedel
  -1 siblings, 0 replies; 83+ messages in thread
From: Joerg Roedel @ 2010-07-15 14:59 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel, avi, kvm, Eduard - Gabriel Munteanu

On Thu, Jul 15, 2010 at 11:49:20AM +0100, Paul Brook wrote:

> > An IOMMU is not necessarily part of a bus bridge. By concept an IOMMU
> > can also be implemented on a plugin-card translating only that card.
> > Real implementations that I am aware of always implement the IOMMU in
> > the PCI root bridge, though.
> 
> If the IOMMU is implemented on the card, then it isn't an interesting case. 
> It's effectively just a complex form of scatter-gather.
> 
> If the on-card MMU can delegate pagetable walks to an external device then IMO 
> that's also an unrelated feature, and requires an additional data channel.

But that would be handled by the same IOMMU emulation code, so the
hooks need to be usable there too.

> My point still stands. You should not be pushing the IOMMU handling into 
> device specific code. All you need to do is make the memory access routines 
> aware of which device caused the access.

Right, the device does not need to know too much about the IOMMU in the
general case. The iommu_get/iommu_read/iommu_write interface should
replaced by the pci_memory* functions like suggested by Anthony.

> The fact that the GART can translate CPU accesses proves my point.  If you 
> have separate code for CPU and devices, then you need to duplicate the GART 
> handling code.

You can configure the GART to translate device accesses only, cpu
accesses only, or to translate both. This is hard to handle if cpu and
device emulation use the same memory access functions.


		Joerg


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-15 14:59               ` Joerg Roedel
  0 siblings, 0 replies; 83+ messages in thread
From: Joerg Roedel @ 2010-07-15 14:59 UTC (permalink / raw)
  To: Paul Brook; +Cc: Eduard - Gabriel Munteanu, qemu-devel, kvm, avi

On Thu, Jul 15, 2010 at 11:49:20AM +0100, Paul Brook wrote:

> > An IOMMU is not necessarily part of a bus bridge. By concept an IOMMU
> > can also be implemented on a plugin-card translating only that card.
> > Real implementations that I am aware of always implement the IOMMU in
> > the PCI root bridge, though.
> 
> If the IOMMU is implemented on the card, then it isn't an interesting case. 
> It's effectively just a complex form of scatter-gather.
> 
> If the on-card MMU can delegate pagetable walks to an external device then IMO 
> that's also an unrelated feature, and requires an additional data channel.

But that would be handled by the same IOMMU emulation code, so the
hooks need to be usable there too.

> My point still stands. You should not be pushing the IOMMU handling into 
> device specific code. All you need to do is make the memory access routines 
> aware of which device caused the access.

Right, the device does not need to know too much about the IOMMU in the
general case. The iommu_get/iommu_read/iommu_write interface should
replaced by the pci_memory* functions like suggested by Anthony.

> The fact that the GART can translate CPU accesses proves my point.  If you 
> have separate code for CPU and devices, then you need to duplicate the GART 
> handling code.

You can configure the GART to translate device accesses only, cpu
accesses only, or to translate both. This is hard to handle if cpu and
device emulation use the same memory access functions.


		Joerg

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-15 12:45               ` Anthony Liguori
@ 2010-07-15 16:45                 ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-15 16:45 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Joerg Roedel, Paul Brook, qemu-devel, avi, kvm

On Thu, Jul 15, 2010 at 07:45:06AM -0500, Anthony Liguori wrote:
> 
> No.  PCI devices should never call cpu_physical_memory*.
> 
> PCI devices should call pci_memory*.
> 
> ISA devices should call isa_memory*.
> 
> All device memory accesses should go through their respective buses.  
> There can be multiple IOMMUs at different levels of the device 
> hierarchy.  If you don't provide bus-level memory access functions that 
> chain through the hierarchy, it's extremely difficult to implement all 
> the necessary hooks to perform the translations at different places.
> 
> Regards,
> 
> Anthony Liguori
> 

I liked Paul's initial approach more, at least if I understood him
correctly. Basically I'm suggesting a single memory_* function that
simply asks the bus for I/O and translation. Say you have something like
this:

+ Bus 1
|
---- Memory 1
|
---+ Bus 2 bridge
   |
   ---- Memory 2
   |
   ---+ Bus 3 bridge
      |
      ---- Device

Say Device wants to write to memory. If we have the DeviceState we
needn't concern whether this is a BusOneDevice or BusTwoDevice from
device code itself. We would just call

memory_rw(dev_state, addr, buf, size, is_write);

which simply recurses through DeviceState's and BusState's through their
parent pointers. The actual bus can set up those to provide
identification information and perhaps hooks for translation and access
checking. So memory_rw() looks like this (pseudocode):

static void memory_rw(DeviceState *dev,
                      target_phys_addr_t addr,
		      uint8_t *buf,
		      int size,
		      int is_write)
{
	BusState *bus = get_parent_bus_of_dev(dev);
	DeviceState *pdev = get_parent_dev(dev);
	target_phys_addr_t taddr;

	if (!bus) {
		/* This shouldn't happen. */
		assert(0);
	}

	if (bus->responsible_for(addr)) {
		raw_physical_memory_rw(addr, buf, size, is_write);
		return;
	}

	taddr = bus->translate(dev, addr);
	memory_rw(pdev, taddr, buf, size, is_write);
}

If we do this, it seems there's no need to provide separate
functions. The actual buses must instead initialize those hooks
properly. Translation here is something inherent to the bus, that
handles arbitration between possibly multiple IOMMUs. Our memory would
normally reside on / belong to the top-level bus.

What do you think? (Naming could be better though.)


	Eduard


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-15 16:45                 ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-15 16:45 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Joerg Roedel, avi, Paul Brook, kvm, qemu-devel

On Thu, Jul 15, 2010 at 07:45:06AM -0500, Anthony Liguori wrote:
> 
> No.  PCI devices should never call cpu_physical_memory*.
> 
> PCI devices should call pci_memory*.
> 
> ISA devices should call isa_memory*.
> 
> All device memory accesses should go through their respective buses.  
> There can be multiple IOMMUs at different levels of the device 
> hierarchy.  If you don't provide bus-level memory access functions that 
> chain through the hierarchy, it's extremely difficult to implement all 
> the necessary hooks to perform the translations at different places.
> 
> Regards,
> 
> Anthony Liguori
> 

I liked Paul's initial approach more, at least if I understood him
correctly. Basically I'm suggesting a single memory_* function that
simply asks the bus for I/O and translation. Say you have something like
this:

+ Bus 1
|
---- Memory 1
|
---+ Bus 2 bridge
   |
   ---- Memory 2
   |
   ---+ Bus 3 bridge
      |
      ---- Device

Say Device wants to write to memory. If we have the DeviceState we
needn't concern whether this is a BusOneDevice or BusTwoDevice from
device code itself. We would just call

memory_rw(dev_state, addr, buf, size, is_write);

which simply recurses through DeviceState's and BusState's through their
parent pointers. The actual bus can set up those to provide
identification information and perhaps hooks for translation and access
checking. So memory_rw() looks like this (pseudocode):

static void memory_rw(DeviceState *dev,
                      target_phys_addr_t addr,
		      uint8_t *buf,
		      int size,
		      int is_write)
{
	BusState *bus = get_parent_bus_of_dev(dev);
	DeviceState *pdev = get_parent_dev(dev);
	target_phys_addr_t taddr;

	if (!bus) {
		/* This shouldn't happen. */
		assert(0);
	}

	if (bus->responsible_for(addr)) {
		raw_physical_memory_rw(addr, buf, size, is_write);
		return;
	}

	taddr = bus->translate(dev, addr);
	memory_rw(pdev, taddr, buf, size, is_write);
}

If we do this, it seems there's no need to provide separate
functions. The actual buses must instead initialize those hooks
properly. Translation here is something inherent to the bus, that
handles arbitration between possibly multiple IOMMUs. Our memory would
normally reside on / belong to the top-level bus.

What do you think? (Naming could be better though.)


	Eduard

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-15 10:28               ` Paul Brook
@ 2010-07-15 16:52                 ` Chris Wright
  -1 siblings, 0 replies; 83+ messages in thread
From: Chris Wright @ 2010-07-15 16:52 UTC (permalink / raw)
  To: Paul Brook
  Cc: Chris Wright, Anthony Liguori, Joerg Roedel,
	Eduard - Gabriel Munteanu, qemu-devel, kvm, avi

* Paul Brook (paul@codesourcery.com) wrote:
> > > The right approach IMHO is to convert devices to use bus-specific
> > > functions to access memory.  The bus specific functions should have
> > > a device argument as the first parameter.
> > 
> > As for ATS, the internal api to handle the device's dma reqeust needs
> > a notion of a translated vs. an untranslated request.  IOW, if qemu ever
> > had a device with ATS support, the device would use its local cache to
> > translate the dma address and then submit a translated request to the
> > pci bus (effectively doing a raw cpu physical memory* in that case).
> 
> Really? Can you provide an documentation to support this claim?
> My impression is that there is no difference between translated and 
> untranslated devices, and the translation is explicitly disabled by software.

ATS allows an I/O device to request a translation from the IOMMU.
The device can then cache that translation and use the translated address
in a PCIe memory transaction.  PCIe uses a couple of previously reserved
bits in the transaction layer packet header to describe the address
type for memory transactions.  The default (00) maps to legacy PCIe and
describes the memory address as untranslated.  This is the normal mode,
and could then incur a translation if an IOMMU is present and programmed
w/ page tables, etc. as is passes through the host bridge.

Another type is simply a transaction requesting a translation.  This is
new, and allows a device to request (and cache) a translation from the
IOMMU for subsequent use.

The third type is a memory transaction tagged as already translated.
This is the type of transaction an ATS capable I/O device will generate
when it was able to translate the memory address from its own cache.

Of course, there's also an invalidation request that the IOMMU can send
to ATS capable I/O devices to invalidate the cached translation.

thanks,
-chris

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-15 16:52                 ` Chris Wright
  0 siblings, 0 replies; 83+ messages in thread
From: Chris Wright @ 2010-07-15 16:52 UTC (permalink / raw)
  To: Paul Brook
  Cc: kvm, Joerg Roedel, qemu-devel, Chris Wright, avi,
	Eduard - Gabriel Munteanu

* Paul Brook (paul@codesourcery.com) wrote:
> > > The right approach IMHO is to convert devices to use bus-specific
> > > functions to access memory.  The bus specific functions should have
> > > a device argument as the first parameter.
> > 
> > As for ATS, the internal api to handle the device's dma reqeust needs
> > a notion of a translated vs. an untranslated request.  IOW, if qemu ever
> > had a device with ATS support, the device would use its local cache to
> > translate the dma address and then submit a translated request to the
> > pci bus (effectively doing a raw cpu physical memory* in that case).
> 
> Really? Can you provide an documentation to support this claim?
> My impression is that there is no difference between translated and 
> untranslated devices, and the translation is explicitly disabled by software.

ATS allows an I/O device to request a translation from the IOMMU.
The device can then cache that translation and use the translated address
in a PCIe memory transaction.  PCIe uses a couple of previously reserved
bits in the transaction layer packet header to describe the address
type for memory transactions.  The default (00) maps to legacy PCIe and
describes the memory address as untranslated.  This is the normal mode,
and could then incur a translation if an IOMMU is present and programmed
w/ page tables, etc. as is passes through the host bridge.

Another type is simply a transaction requesting a translation.  This is
new, and allows a device to request (and cache) a translation from the
IOMMU for subsequent use.

The third type is a memory transaction tagged as already translated.
This is the type of transaction an ATS capable I/O device will generate
when it was able to translate the memory address from its own cache.

Of course, there's also an invalidation request that the IOMMU can send
to ATS capable I/O devices to invalidate the cached translation.

thanks,
-chris

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-15 16:52                 ` Chris Wright
@ 2010-07-15 17:02                   ` Avi Kivity
  -1 siblings, 0 replies; 83+ messages in thread
From: Avi Kivity @ 2010-07-15 17:02 UTC (permalink / raw)
  To: Chris Wright
  Cc: Paul Brook, Anthony Liguori, Joerg Roedel,
	Eduard - Gabriel Munteanu, qemu-devel, kvm

On 07/15/2010 07:52 PM, Chris Wright wrote:
>
>> Really? Can you provide an documentation to support this claim?
>> My impression is that there is no difference between translated and
>> untranslated devices, and the translation is explicitly disabled by software.
>>      
> ATS allows an I/O device to request a translation from the IOMMU.
> The device can then cache that translation and use the translated address
> in a PCIe memory transaction.  PCIe uses a couple of previously reserved
> bits in the transaction layer packet header to describe the address
> type for memory transactions.  The default (00) maps to legacy PCIe and
> describes the memory address as untranslated.  This is the normal mode,
> and could then incur a translation if an IOMMU is present and programmed
> w/ page tables, etc. as is passes through the host bridge.
>
> Another type is simply a transaction requesting a translation.  This is
> new, and allows a device to request (and cache) a translation from the
> IOMMU for subsequent use.
>
> The third type is a memory transaction tagged as already translated.
> This is the type of transaction an ATS capable I/O device will generate
> when it was able to translate the memory address from its own cache.
>
> Of course, there's also an invalidation request that the IOMMU can send
> to ATS capable I/O devices to invalidate the cached translation.
>    

For emulated device, it seems like we can ignore ATS completely, no?

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-15 17:02                   ` Avi Kivity
  0 siblings, 0 replies; 83+ messages in thread
From: Avi Kivity @ 2010-07-15 17:02 UTC (permalink / raw)
  To: Chris Wright
  Cc: kvm, Joerg Roedel, qemu-devel, Paul Brook, Eduard - Gabriel Munteanu

On 07/15/2010 07:52 PM, Chris Wright wrote:
>
>> Really? Can you provide an documentation to support this claim?
>> My impression is that there is no difference between translated and
>> untranslated devices, and the translation is explicitly disabled by software.
>>      
> ATS allows an I/O device to request a translation from the IOMMU.
> The device can then cache that translation and use the translated address
> in a PCIe memory transaction.  PCIe uses a couple of previously reserved
> bits in the transaction layer packet header to describe the address
> type for memory transactions.  The default (00) maps to legacy PCIe and
> describes the memory address as untranslated.  This is the normal mode,
> and could then incur a translation if an IOMMU is present and programmed
> w/ page tables, etc. as is passes through the host bridge.
>
> Another type is simply a transaction requesting a translation.  This is
> new, and allows a device to request (and cache) a translation from the
> IOMMU for subsequent use.
>
> The third type is a memory transaction tagged as already translated.
> This is the type of transaction an ATS capable I/O device will generate
> when it was able to translate the memory address from its own cache.
>
> Of course, there's also an invalidation request that the IOMMU can send
> to ATS capable I/O devices to invalidate the cached translation.
>    

For emulated device, it seems like we can ignore ATS completely, no?

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-15 16:52                 ` Chris Wright
@ 2010-07-15 17:14                   ` Chris Wright
  -1 siblings, 0 replies; 83+ messages in thread
From: Chris Wright @ 2010-07-15 17:14 UTC (permalink / raw)
  To: Chris Wright
  Cc: Paul Brook, Anthony Liguori, Joerg Roedel,
	Eduard - Gabriel Munteanu, qemu-devel, kvm, avi

* Chris Wright (chrisw@sous-sol.org) wrote:
> * Paul Brook (paul@codesourcery.com) wrote:
> > > > The right approach IMHO is to convert devices to use bus-specific
> > > > functions to access memory.  The bus specific functions should have
> > > > a device argument as the first parameter.
> > > 
> > > As for ATS, the internal api to handle the device's dma reqeust needs
> > > a notion of a translated vs. an untranslated request.  IOW, if qemu ever
> > > had a device with ATS support, the device would use its local cache to
> > > translate the dma address and then submit a translated request to the
> > > pci bus (effectively doing a raw cpu physical memory* in that case).
> > 
> > Really? Can you provide an documentation to support this claim?

Wow...color me surprised...there's actually some apparently public
"training" docs that might help give a more complete view:

http://www.pcisig.com/developers/main/training_materials/get_document?doc_id=0ab681ba7001e40cdb297ddaf279a8de82a7dc40

ATS discussion starts on slide 23.

> > My impression is that there is no difference between translated and 
> > untranslated devices, and the translation is explicitly disabled by software.

And now that I re-read that sentence, I see what you are talking about.
Yes, there is the above notion as well.

A device can live in a 1:1 mapping of device address space to physical
memory.  This could be achieved in a few ways (all done by the OS software
programming the IOMMU).

One is to simply create a set of page tables that map 1:1 all of device
memory to physical memory.  Another is to somehow mark the device as
special (either omit translation tables or mark the translation entry
as effectively "do not translate").  This is often referred to as Pass
Through mode.  But this is not the same as ATS.

Pass Through mode is the functional equivalent of disabling the
translation/isolation capabilities of the IOMMU.  It's typically used
when an OS wants to keep a device for itself and isn't interested in
the isolation properties of the IOMMU.  It then only creates isolating
translation tables for devices it's giving to unprivileged software
(e.g. Linux/KVM giving a device to a guest, Linux giving a device to
user space process, etc.)

> ATS allows an I/O device to request a translation from the IOMMU.
> The device can then cache that translation and use the translated address
> in a PCIe memory transaction.  PCIe uses a couple of previously reserved
> bits in the transaction layer packet header to describe the address
> type for memory transactions.  The default (00) maps to legacy PCIe and
> describes the memory address as untranslated.  This is the normal mode,
> and could then incur a translation if an IOMMU is present and programmed
> w/ page tables, etc. as is passes through the host bridge.
> 
> Another type is simply a transaction requesting a translation.  This is
> new, and allows a device to request (and cache) a translation from the
> IOMMU for subsequent use.
> 
> The third type is a memory transaction tagged as already translated.
> This is the type of transaction an ATS capable I/O device will generate
> when it was able to translate the memory address from its own cache.
> 
> Of course, there's also an invalidation request that the IOMMU can send
> to ATS capable I/O devices to invalidate the cached translation.

thanks,
-chris

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-15 17:14                   ` Chris Wright
  0 siblings, 0 replies; 83+ messages in thread
From: Chris Wright @ 2010-07-15 17:14 UTC (permalink / raw)
  To: Chris Wright
  Cc: kvm, Joerg Roedel, qemu-devel, Paul Brook,
	Eduard - Gabriel Munteanu, avi

* Chris Wright (chrisw@sous-sol.org) wrote:
> * Paul Brook (paul@codesourcery.com) wrote:
> > > > The right approach IMHO is to convert devices to use bus-specific
> > > > functions to access memory.  The bus specific functions should have
> > > > a device argument as the first parameter.
> > > 
> > > As for ATS, the internal api to handle the device's dma reqeust needs
> > > a notion of a translated vs. an untranslated request.  IOW, if qemu ever
> > > had a device with ATS support, the device would use its local cache to
> > > translate the dma address and then submit a translated request to the
> > > pci bus (effectively doing a raw cpu physical memory* in that case).
> > 
> > Really? Can you provide an documentation to support this claim?

Wow...color me surprised...there's actually some apparently public
"training" docs that might help give a more complete view:

http://www.pcisig.com/developers/main/training_materials/get_document?doc_id=0ab681ba7001e40cdb297ddaf279a8de82a7dc40

ATS discussion starts on slide 23.

> > My impression is that there is no difference between translated and 
> > untranslated devices, and the translation is explicitly disabled by software.

And now that I re-read that sentence, I see what you are talking about.
Yes, there is the above notion as well.

A device can live in a 1:1 mapping of device address space to physical
memory.  This could be achieved in a few ways (all done by the OS software
programming the IOMMU).

One is to simply create a set of page tables that map 1:1 all of device
memory to physical memory.  Another is to somehow mark the device as
special (either omit translation tables or mark the translation entry
as effectively "do not translate").  This is often referred to as Pass
Through mode.  But this is not the same as ATS.

Pass Through mode is the functional equivalent of disabling the
translation/isolation capabilities of the IOMMU.  It's typically used
when an OS wants to keep a device for itself and isn't interested in
the isolation properties of the IOMMU.  It then only creates isolating
translation tables for devices it's giving to unprivileged software
(e.g. Linux/KVM giving a device to a guest, Linux giving a device to
user space process, etc.)

> ATS allows an I/O device to request a translation from the IOMMU.
> The device can then cache that translation and use the translated address
> in a PCIe memory transaction.  PCIe uses a couple of previously reserved
> bits in the transaction layer packet header to describe the address
> type for memory transactions.  The default (00) maps to legacy PCIe and
> describes the memory address as untranslated.  This is the normal mode,
> and could then incur a translation if an IOMMU is present and programmed
> w/ page tables, etc. as is passes through the host bridge.
> 
> Another type is simply a transaction requesting a translation.  This is
> new, and allows a device to request (and cache) a translation from the
> IOMMU for subsequent use.
> 
> The third type is a memory transaction tagged as already translated.
> This is the type of transaction an ATS capable I/O device will generate
> when it was able to translate the memory address from its own cache.
> 
> Of course, there's also an invalidation request that the IOMMU can send
> to ATS capable I/O devices to invalidate the cached translation.

thanks,
-chris

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-15 17:02                   ` Avi Kivity
@ 2010-07-15 17:17                     ` Chris Wright
  -1 siblings, 0 replies; 83+ messages in thread
From: Chris Wright @ 2010-07-15 17:17 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Chris Wright, Paul Brook, Anthony Liguori, Joerg Roedel,
	Eduard - Gabriel Munteanu, qemu-devel, kvm

* Avi Kivity (avi@redhat.com) wrote:
> On 07/15/2010 07:52 PM, Chris Wright wrote:
> >
> >>Really? Can you provide an documentation to support this claim?
> >>My impression is that there is no difference between translated and
> >>untranslated devices, and the translation is explicitly disabled by software.
> >ATS allows an I/O device to request a translation from the IOMMU.
> >The device can then cache that translation and use the translated address
> >in a PCIe memory transaction.  PCIe uses a couple of previously reserved
> >bits in the transaction layer packet header to describe the address
> >type for memory transactions.  The default (00) maps to legacy PCIe and
> >describes the memory address as untranslated.  This is the normal mode,
> >and could then incur a translation if an IOMMU is present and programmed
> >w/ page tables, etc. as is passes through the host bridge.
> >
> >Another type is simply a transaction requesting a translation.  This is
> >new, and allows a device to request (and cache) a translation from the
> >IOMMU for subsequent use.
> >
> >The third type is a memory transaction tagged as already translated.
> >This is the type of transaction an ATS capable I/O device will generate
> >when it was able to translate the memory address from its own cache.
> >
> >Of course, there's also an invalidation request that the IOMMU can send
> >to ATS capable I/O devices to invalidate the cached translation.
> 
> For emulated device, it seems like we can ignore ATS completely, no?

Not if you want to emulate an ATS capable device ;)

Eariler upthread I said:

  IOW, if qemu ever had a device with ATS support...

So, that should've been a much bigger _IF_

thanks,
-chris

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-15 17:17                     ` Chris Wright
  0 siblings, 0 replies; 83+ messages in thread
From: Chris Wright @ 2010-07-15 17:17 UTC (permalink / raw)
  To: Avi Kivity
  Cc: kvm, Joerg Roedel, qemu-devel, Chris Wright, Paul Brook,
	Eduard - Gabriel Munteanu

* Avi Kivity (avi@redhat.com) wrote:
> On 07/15/2010 07:52 PM, Chris Wright wrote:
> >
> >>Really? Can you provide an documentation to support this claim?
> >>My impression is that there is no difference between translated and
> >>untranslated devices, and the translation is explicitly disabled by software.
> >ATS allows an I/O device to request a translation from the IOMMU.
> >The device can then cache that translation and use the translated address
> >in a PCIe memory transaction.  PCIe uses a couple of previously reserved
> >bits in the transaction layer packet header to describe the address
> >type for memory transactions.  The default (00) maps to legacy PCIe and
> >describes the memory address as untranslated.  This is the normal mode,
> >and could then incur a translation if an IOMMU is present and programmed
> >w/ page tables, etc. as is passes through the host bridge.
> >
> >Another type is simply a transaction requesting a translation.  This is
> >new, and allows a device to request (and cache) a translation from the
> >IOMMU for subsequent use.
> >
> >The third type is a memory transaction tagged as already translated.
> >This is the type of transaction an ATS capable I/O device will generate
> >when it was able to translate the memory address from its own cache.
> >
> >Of course, there's also an invalidation request that the IOMMU can send
> >to ATS capable I/O devices to invalidate the cached translation.
> 
> For emulated device, it seems like we can ignore ATS completely, no?

Not if you want to emulate an ATS capable device ;)

Eariler upthread I said:

  IOW, if qemu ever had a device with ATS support...

So, that should've been a much bigger _IF_

thanks,
-chris

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-15 17:17                     ` Chris Wright
@ 2010-07-15 17:22                       ` Avi Kivity
  -1 siblings, 0 replies; 83+ messages in thread
From: Avi Kivity @ 2010-07-15 17:22 UTC (permalink / raw)
  To: Chris Wright
  Cc: Paul Brook, Anthony Liguori, Joerg Roedel,
	Eduard - Gabriel Munteanu, qemu-devel, kvm

On 07/15/2010 08:17 PM, Chris Wright wrote:
>
>> For emulated device, it seems like we can ignore ATS completely, no?
>>      
> Not if you want to emulate an ATS capable device ;)
>    

What I meant was that the whole request translation, invalidate, dma 
using a translated address thing is invisible to software.  We can 
emulate an ATS capable device by going through the iommu every time.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-15 17:22                       ` Avi Kivity
  0 siblings, 0 replies; 83+ messages in thread
From: Avi Kivity @ 2010-07-15 17:22 UTC (permalink / raw)
  To: Chris Wright
  Cc: kvm, Joerg Roedel, qemu-devel, Paul Brook, Eduard - Gabriel Munteanu

On 07/15/2010 08:17 PM, Chris Wright wrote:
>
>> For emulated device, it seems like we can ignore ATS completely, no?
>>      
> Not if you want to emulate an ATS capable device ;)
>    

What I meant was that the whole request translation, invalidate, dma 
using a translated address thing is invisible to software.  We can 
emulate an ATS capable device by going through the iommu every time.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-15 17:02                   ` Avi Kivity
@ 2010-07-15 17:22                     ` Joerg Roedel
  -1 siblings, 0 replies; 83+ messages in thread
From: Joerg Roedel @ 2010-07-15 17:22 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Chris Wright, Paul Brook, Anthony Liguori,
	Eduard - Gabriel Munteanu, qemu-devel, kvm

On Thu, Jul 15, 2010 at 08:02:05PM +0300, Avi Kivity wrote:

> For emulated device, it seems like we can ignore ATS completely, no?

An important use-case for emulation is software testing and caching of
iommu's is an important part that needs to be handled in software. For
this purpose it makes sense to emulate the behavior of caches too. So we
probably should not ignore the possibility of an emulated ATS device
completly.

		Joerg


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-15 17:22                     ` Joerg Roedel
  0 siblings, 0 replies; 83+ messages in thread
From: Joerg Roedel @ 2010-07-15 17:22 UTC (permalink / raw)
  To: Avi Kivity
  Cc: kvm, qemu-devel, Chris Wright, Paul Brook, Eduard - Gabriel Munteanu

On Thu, Jul 15, 2010 at 08:02:05PM +0300, Avi Kivity wrote:

> For emulated device, it seems like we can ignore ATS completely, no?

An important use-case for emulation is software testing and caching of
iommu's is an important part that needs to be handled in software. For
this purpose it makes sense to emulate the behavior of caches too. So we
probably should not ignore the possibility of an emulated ATS device
completly.

		Joerg

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-15 17:22                       ` Avi Kivity
@ 2010-07-15 17:25                         ` Chris Wright
  -1 siblings, 0 replies; 83+ messages in thread
From: Chris Wright @ 2010-07-15 17:25 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Chris Wright, Paul Brook, Anthony Liguori, Joerg Roedel,
	Eduard - Gabriel Munteanu, qemu-devel, kvm

* Avi Kivity (avi@redhat.com) wrote:
> On 07/15/2010 08:17 PM, Chris Wright wrote:
> >
> >>For emulated device, it seems like we can ignore ATS completely, no?
> >Not if you want to emulate an ATS capable device ;)
> 
> What I meant was that the whole request translation, invalidate, dma
> using a translated address thing is invisible to software.  We can
> emulate an ATS capable device by going through the iommu every time.

Well, I don't see any reason to completely ignore it.  It'd be really
useful for testing (I'd use it that way).  Esp to verify the
invalidation of the device IOTLBs.

But I think it's not a difficult thing to emulate once we have a proper
api encapsulating a device's dma request.

thanks,
-chris

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-15 17:25                         ` Chris Wright
  0 siblings, 0 replies; 83+ messages in thread
From: Chris Wright @ 2010-07-15 17:25 UTC (permalink / raw)
  To: Avi Kivity
  Cc: kvm, Joerg Roedel, qemu-devel, Chris Wright, Paul Brook,
	Eduard - Gabriel Munteanu

* Avi Kivity (avi@redhat.com) wrote:
> On 07/15/2010 08:17 PM, Chris Wright wrote:
> >
> >>For emulated device, it seems like we can ignore ATS completely, no?
> >Not if you want to emulate an ATS capable device ;)
> 
> What I meant was that the whole request translation, invalidate, dma
> using a translated address thing is invisible to software.  We can
> emulate an ATS capable device by going through the iommu every time.

Well, I don't see any reason to completely ignore it.  It'd be really
useful for testing (I'd use it that way).  Esp to verify the
invalidation of the device IOTLBs.

But I think it's not a difficult thing to emulate once we have a proper
api encapsulating a device's dma request.

thanks,
-chris

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-15 17:17                     ` Chris Wright
@ 2010-07-15 17:27                       ` Eduard - Gabriel Munteanu
  -1 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-15 17:27 UTC (permalink / raw)
  To: Chris Wright
  Cc: Avi Kivity, Paul Brook, Anthony Liguori, Joerg Roedel, qemu-devel, kvm

On Thu, Jul 15, 2010 at 10:17:10AM -0700, Chris Wright wrote:
> * Avi Kivity (avi@redhat.com) wrote:
> > 
> > For emulated device, it seems like we can ignore ATS completely, no?
> 
> Not if you want to emulate an ATS capable device ;)
> 
> Eariler upthread I said:
> 
>   IOW, if qemu ever had a device with ATS support...
> 
> So, that should've been a much bigger _IF_
> 
> thanks,
> -chris

I think we can augment some devices with ATS capability if there are
performance gains in doing so. This doesn't seem to be a detail the
actual guest OS would be interested in, so we can do it even for devices
that existed long before the AMD IOMMU came into existence. But I'm not
really sure about this, it's just a thought.

Linux seems to be issuing IOTLB invalidation commands anyway.


	Eduard


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-15 17:27                       ` Eduard - Gabriel Munteanu
  0 siblings, 0 replies; 83+ messages in thread
From: Eduard - Gabriel Munteanu @ 2010-07-15 17:27 UTC (permalink / raw)
  To: Chris Wright; +Cc: kvm, Joerg Roedel, qemu-devel, Avi Kivity, Paul Brook

On Thu, Jul 15, 2010 at 10:17:10AM -0700, Chris Wright wrote:
> * Avi Kivity (avi@redhat.com) wrote:
> > 
> > For emulated device, it seems like we can ignore ATS completely, no?
> 
> Not if you want to emulate an ATS capable device ;)
> 
> Eariler upthread I said:
> 
>   IOW, if qemu ever had a device with ATS support...
> 
> So, that should've been a much bigger _IF_
> 
> thanks,
> -chris

I think we can augment some devices with ATS capability if there are
performance gains in doing so. This doesn't seem to be a detail the
actual guest OS would be interested in, so we can do it even for devices
that existed long before the AMD IOMMU came into existence. But I'm not
really sure about this, it's just a thought.

Linux seems to be issuing IOTLB invalidation commands anyway.


	Eduard

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
  2010-07-15 16:45                 ` Eduard - Gabriel Munteanu
@ 2010-07-15 17:42                   ` Anthony Liguori
  -1 siblings, 0 replies; 83+ messages in thread
From: Anthony Liguori @ 2010-07-15 17:42 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu; +Cc: Joerg Roedel, Paul Brook, qemu-devel, avi, kvm

On 07/15/2010 11:45 AM, Eduard - Gabriel Munteanu wrote:
> On Thu, Jul 15, 2010 at 07:45:06AM -0500, Anthony Liguori wrote:
>    
>> No.  PCI devices should never call cpu_physical_memory*.
>>
>> PCI devices should call pci_memory*.
>>
>> ISA devices should call isa_memory*.
>>
>> All device memory accesses should go through their respective buses.
>> There can be multiple IOMMUs at different levels of the device
>> hierarchy.  If you don't provide bus-level memory access functions that
>> chain through the hierarchy, it's extremely difficult to implement all
>> the necessary hooks to perform the translations at different places.
>>
>> Regards,
>>
>> Anthony Liguori
>>
>>      
> I liked Paul's initial approach more, at least if I understood him
> correctly. Basically I'm suggesting a single memory_* function that
> simply asks the bus for I/O and translation. Say you have something like
> this:
>
> + Bus 1
> |
> ---- Memory 1
> |
> ---+ Bus 2 bridge
>     |
>     ---- Memory 2
>     |
>     ---+ Bus 3 bridge
>        |
>        ---- Device
>
> Say Device wants to write to memory. If we have the DeviceState we
> needn't concern whether this is a BusOneDevice or BusTwoDevice from
> device code itself. We would just call
>
> memory_rw(dev_state, addr, buf, size, is_write);
>    

I dislike this API for a few reasons:

1) buses have different types of addresses with different address 
ranges.  this api would have to take a generic address type.
2) dev_state would be the qdev device state.  this means qdev needs to 
have memory hook mechanisms that's chainable.  I think it's unnecessary 
at the qdev level
3) users have upcasted device states, so it's more natural to pass 
PCIDevice than DeviceState.
4) there's an assumption that all devices can get to DeviceState.  
that's not always true today.

> which simply recurses through DeviceState's and BusState's through their
> parent pointers. The actual bus can set up those to provide
> identification information and perhaps hooks for translation and access
> checking. So memory_rw() looks like this (pseudocode):
>
> static void memory_rw(DeviceState *dev,
>                        target_phys_addr_t addr,
> 		      uint8_t *buf,
> 		      int size,
> 		      int is_write)
> {
> 	BusState *bus = get_parent_bus_of_dev(dev);
> 	DeviceState *pdev = get_parent_dev(dev);
> 	target_phys_addr_t taddr;
>
> 	if (!bus) {
> 		/* This shouldn't happen. */
> 		assert(0);
> 	}
>
> 	if (bus->responsible_for(addr)) {
> 		raw_physical_memory_rw(addr, buf, size, is_write);
> 		return;
> 	}
>
> 	taddr = bus->translate(dev, addr);
> 	memory_rw(pdev, taddr, buf, size, is_write);
>    

This is too simplistic because you sometimes have layering that doesn't 
fit into the bus model.  For instance, virtio + pci.

We really want a virtio_memory_rw that calls either syborg_memory_rw or 
pci_memory_rw based on the transport.  In your proposal, we would have 
to model virtio-pci as a bus with a single device which appears awkward 
to me.

Regards,

Anthony Liguori

> }
>
> If we do this, it seems there's no need to provide separate
> functions. The actual buses must instead initialize those hooks
> properly. Translation here is something inherent to the bus, that
> handles arbitration between possibly multiple IOMMUs. Our memory would
> normally reside on / belong to the top-level bus.
>
> What do you think? (Naming could be better though.)
>
>
> 	Eduard
>
>    


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Qemu-devel] Re: [RFC PATCH 4/7] ide: IOMMU support
@ 2010-07-15 17:42                   ` Anthony Liguori
  0 siblings, 0 replies; 83+ messages in thread
From: Anthony Liguori @ 2010-07-15 17:42 UTC (permalink / raw)
  To: Eduard - Gabriel Munteanu; +Cc: Joerg Roedel, avi, Paul Brook, kvm, qemu-devel

On 07/15/2010 11:45 AM, Eduard - Gabriel Munteanu wrote:
> On Thu, Jul 15, 2010 at 07:45:06AM -0500, Anthony Liguori wrote:
>    
>> No.  PCI devices should never call cpu_physical_memory*.
>>
>> PCI devices should call pci_memory*.
>>
>> ISA devices should call isa_memory*.
>>
>> All device memory accesses should go through their respective buses.
>> There can be multiple IOMMUs at different levels of the device
>> hierarchy.  If you don't provide bus-level memory access functions that
>> chain through the hierarchy, it's extremely difficult to implement all
>> the necessary hooks to perform the translations at different places.
>>
>> Regards,
>>
>> Anthony Liguori
>>
>>      
> I liked Paul's initial approach more, at least if I understood him
> correctly. Basically I'm suggesting a single memory_* function that
> simply asks the bus for I/O and translation. Say you have something like
> this:
>
> + Bus 1
> |
> ---- Memory 1
> |
> ---+ Bus 2 bridge
>     |
>     ---- Memory 2
>     |
>     ---+ Bus 3 bridge
>        |
>        ---- Device
>
> Say Device wants to write to memory. If we have the DeviceState we
> needn't concern whether this is a BusOneDevice or BusTwoDevice from
> device code itself. We would just call
>
> memory_rw(dev_state, addr, buf, size, is_write);
>    

I dislike this API for a few reasons:

1) buses have different types of addresses with different address 
ranges.  this api would have to take a generic address type.
2) dev_state would be the qdev device state.  this means qdev needs to 
have memory hook mechanisms that's chainable.  I think it's unnecessary 
at the qdev level
3) users have upcasted device states, so it's more natural to pass 
PCIDevice than DeviceState.
4) there's an assumption that all devices can get to DeviceState.  
that's not always true today.

> which simply recurses through DeviceState's and BusState's through their
> parent pointers. The actual bus can set up those to provide
> identification information and perhaps hooks for translation and access
> checking. So memory_rw() looks like this (pseudocode):
>
> static void memory_rw(DeviceState *dev,
>                        target_phys_addr_t addr,
> 		      uint8_t *buf,
> 		      int size,
> 		      int is_write)
> {
> 	BusState *bus = get_parent_bus_of_dev(dev);
> 	DeviceState *pdev = get_parent_dev(dev);
> 	target_phys_addr_t taddr;
>
> 	if (!bus) {
> 		/* This shouldn't happen. */
> 		assert(0);
> 	}
>
> 	if (bus->responsible_for(addr)) {
> 		raw_physical_memory_rw(addr, buf, size, is_write);
> 		return;
> 	}
>
> 	taddr = bus->translate(dev, addr);
> 	memory_rw(pdev, taddr, buf, size, is_write);
>    

This is too simplistic because you sometimes have layering that doesn't 
fit into the bus model.  For instance, virtio + pci.

We really want a virtio_memory_rw that calls either syborg_memory_rw or 
pci_memory_rw based on the transport.  In your proposal, we would have 
to model virtio-pci as a bus with a single device which appears awkward 
to me.

Regards,

Anthony Liguori

> }
>
> If we do this, it seems there's no need to provide separate
> functions. The actual buses must instead initialize those hooks
> properly. Translation here is something inherent to the bus, that
> handles arbitration between possibly multiple IOMMUs. Our memory would
> normally reside on / belong to the top-level bus.
>
> What do you think? (Naming could be better though.)
>
>
> 	Eduard
>
>    

^ permalink raw reply	[flat|nested] 83+ messages in thread

end of thread, other threads:[~2010-07-15 17:42 UTC | newest]

Thread overview: 83+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-14  5:45 [RFC PATCH 0/7] AMD IOMMU emulation patchset Eduard - Gabriel Munteanu
2010-07-14  5:45 ` [Qemu-devel] " Eduard - Gabriel Munteanu
2010-07-14  5:45 ` [RFC PATCH 1/7] Generic IOMMU layer Eduard - Gabriel Munteanu
2010-07-14  5:45   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2010-07-14  6:07   ` malc
2010-07-14  6:07     ` [Qemu-devel] " malc
2010-07-14 22:47     ` Eduard - Gabriel Munteanu
2010-07-14 22:47       ` Eduard - Gabriel Munteanu
2010-07-14  5:45 ` [RFC PATCH 2/7] AMD IOMMU emulation Eduard - Gabriel Munteanu
2010-07-14  5:45   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2010-07-14 20:16   ` Paul Brook
2010-07-14 20:16     ` Paul Brook
2010-07-14  5:45 ` [RFC PATCH 3/7] pci: call IOMMU hooks Eduard - Gabriel Munteanu
2010-07-14  5:45   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2010-07-14  7:37   ` Isaku Yamahata
2010-07-14  7:37     ` Isaku Yamahata
2010-07-14 22:50     ` Eduard - Gabriel Munteanu
2010-07-14 22:50       ` Eduard - Gabriel Munteanu
2010-07-14  5:45 ` [RFC PATCH 4/7] ide: IOMMU support Eduard - Gabriel Munteanu
2010-07-14  5:45   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2010-07-14 13:53   ` Paul Brook
2010-07-14 13:53     ` [Qemu-devel] " Paul Brook
2010-07-14 18:33     ` Joerg Roedel
2010-07-14 18:33       ` [Qemu-devel] " Joerg Roedel
2010-07-14 20:13       ` Paul Brook
2010-07-14 21:29         ` Anthony Liguori
2010-07-14 21:29           ` Anthony Liguori
2010-07-14 22:24           ` Chris Wright
2010-07-14 22:24             ` Chris Wright
2010-07-15 10:28             ` Paul Brook
2010-07-15 10:28               ` Paul Brook
2010-07-15 16:52               ` Chris Wright
2010-07-15 16:52                 ` Chris Wright
2010-07-15 17:02                 ` Avi Kivity
2010-07-15 17:02                   ` Avi Kivity
2010-07-15 17:17                   ` Chris Wright
2010-07-15 17:17                     ` Chris Wright
2010-07-15 17:22                     ` Avi Kivity
2010-07-15 17:22                       ` Avi Kivity
2010-07-15 17:25                       ` Chris Wright
2010-07-15 17:25                         ` Chris Wright
2010-07-15 17:27                     ` Eduard - Gabriel Munteanu
2010-07-15 17:27                       ` Eduard - Gabriel Munteanu
2010-07-15 17:22                   ` Joerg Roedel
2010-07-15 17:22                     ` Joerg Roedel
2010-07-15 17:14                 ` Chris Wright
2010-07-15 17:14                   ` Chris Wright
2010-07-15  9:10           ` Joerg Roedel
2010-07-15  9:10             ` Joerg Roedel
2010-07-15 12:45             ` Anthony Liguori
2010-07-15 12:45               ` Anthony Liguori
2010-07-15 14:45               ` Joerg Roedel
2010-07-15 14:45                 ` Joerg Roedel
2010-07-15 16:45               ` Eduard - Gabriel Munteanu
2010-07-15 16:45                 ` Eduard - Gabriel Munteanu
2010-07-15 17:42                 ` Anthony Liguori
2010-07-15 17:42                   ` Anthony Liguori
2010-07-15 10:33           ` Paul Brook
2010-07-15 10:33             ` Paul Brook
2010-07-15 12:42             ` Anthony Liguori
2010-07-15 12:42               ` Anthony Liguori
2010-07-15 14:02               ` Paul Brook
2010-07-15 14:02                 ` Paul Brook
2010-07-14 23:39         ` Eduard - Gabriel Munteanu
2010-07-14 23:39           ` Eduard - Gabriel Munteanu
2010-07-15  9:22         ` Joerg Roedel
2010-07-15  9:22           ` Joerg Roedel
2010-07-15 10:49           ` Paul Brook
2010-07-15 10:49             ` Paul Brook
2010-07-15 14:59             ` Joerg Roedel
2010-07-15 14:59               ` Joerg Roedel
2010-07-14 23:11     ` Eduard - Gabriel Munteanu
2010-07-14 23:11       ` [Qemu-devel] " Eduard - Gabriel Munteanu
2010-07-15 10:58       ` Paul Brook
2010-07-15 10:58         ` [Qemu-devel] " Paul Brook
2010-07-14  5:45 ` [RFC PATCH 5/7] rtl8139: " Eduard - Gabriel Munteanu
2010-07-14  5:45   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2010-07-14  5:45 ` [RFC PATCH 6/7] eepro100: " Eduard - Gabriel Munteanu
2010-07-14  5:45   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2010-07-14  5:45 ` [RFC PATCH 7/7] ac97: " Eduard - Gabriel Munteanu
2010-07-14  5:45   ` [Qemu-devel] " Eduard - Gabriel Munteanu
2010-07-14  6:09   ` malc
2010-07-14  6:09     ` [Qemu-devel] " malc

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.