All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 0/7] IOMMU support
@ 2012-10-30 11:47 Avi Kivity
  2012-10-30 11:47 ` [Qemu-devel] [PATCH v2 1/7] memory: fix address space initialization/destruction Avi Kivity
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Avi Kivity @ 2012-10-30 11:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Alexander Graf, Blue Swirl, Alex Williamson,
	Anthony Liguori

Changes from RFC:
 - change ->translate to return read/write permissions in IOTLBEntry (was:
   ->translate received is_write parameter)
 - add support for iommu fault reporting

Avi Kivity (7):
  memory: fix address space initialization/destruction
  memory: limit sections in the radix tree to the actual address space
    size
  memory: iommu support
  memory: provide a MemoryRegion for IOMMUs to log faults
  pci: use memory core for iommu support
  vfio: abort if an emulated iommu is used
  i440fx: add an iommu

 exec.c             |  43 +++++++++++++++++---
 hw/pci.c           |  59 +++++++++++++++++-----------
 hw/pci.h           |   7 +++-
 hw/pci_internals.h |   5 ++-
 hw/piix_pci.c      |  77 ++++++++++++++++++++++++++++++++++++
 hw/spapr.h         |   1 +
 hw/spapr_iommu.c   |  45 +++++++++------------
 hw/spapr_pci.c     |  27 +++++++++++--
 hw/spapr_pci.h     |   2 +
 hw/vfio_pci.c      |   2 +
 memory.c           | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 memory.h           |  49 +++++++++++++++++++++++
 12 files changed, 366 insertions(+), 63 deletions(-)

-- 
1.7.12

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 1/7] memory: fix address space initialization/destruction
  2012-10-30 11:47 [Qemu-devel] [PATCH v2 0/7] IOMMU support Avi Kivity
@ 2012-10-30 11:47 ` Avi Kivity
  2012-10-30 11:47 ` [Qemu-devel] [PATCH v2 2/7] memory: limit sections in the radix tree to the actual address space size Avi Kivity
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Avi Kivity @ 2012-10-30 11:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Alexander Graf, Blue Swirl, Alex Williamson,
	Anthony Liguori

A couple of fields were left uninitialized.  This was not observed earlier
because all address spaces were statically allocated.  Also free allocation
for those fields.

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 memory.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/memory.c b/memory.c
index 243cb23..ae3552b 100644
--- a/memory.c
+++ b/memory.c
@@ -1541,6 +1541,8 @@ void address_space_init(AddressSpace *as, MemoryRegion *root)
     as->root = root;
     as->current_map = g_new(FlatView, 1);
     flatview_init(as->current_map);
+    as->ioeventfd_nb = 0;
+    as->ioeventfds = NULL;
     QTAILQ_INSERT_TAIL(&address_spaces, as, address_spaces_link);
     as->name = NULL;
     memory_region_transaction_commit();
@@ -1557,6 +1559,7 @@ void address_space_destroy(AddressSpace *as)
     address_space_destroy_dispatch(as);
     flatview_destroy(as->current_map);
     g_free(as->current_map);
+    g_free(as->ioeventfds);
 }
 
 uint64_t io_mem_read(MemoryRegion *mr, hwaddr addr, unsigned size)
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 2/7] memory: limit sections in the radix tree to the actual address space size
  2012-10-30 11:47 [Qemu-devel] [PATCH v2 0/7] IOMMU support Avi Kivity
  2012-10-30 11:47 ` [Qemu-devel] [PATCH v2 1/7] memory: fix address space initialization/destruction Avi Kivity
@ 2012-10-30 11:47 ` Avi Kivity
  2012-10-30 11:47 ` [Qemu-devel] [PATCH v2 3/7] memory: iommu support Avi Kivity
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Avi Kivity @ 2012-10-30 11:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Alexander Graf, Blue Swirl, Alex Williamson,
	Anthony Liguori

The radix tree is statically sized to fit TARGET_PHYS_ADDR_SPACE_BITS.
If a larger memory region is registered, it will overflow.

Fix by limiting any section in the radix tree to the supported size.

This problem was not observed earlier since artificial regions (containers
and aliases) are eliminated by the memory core, leaving only device regions
which have reasonable sizes.  An IOMMU however cannot be eliminated by the
memory core, and may have an artificial size.

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 exec.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/exec.c b/exec.c
index b0ed593..deee8ec 100644
--- a/exec.c
+++ b/exec.c
@@ -2280,10 +2280,23 @@ static void register_multipage(AddressSpaceDispatch *d, MemoryRegionSection *sec
                   section_index);
 }
 
+static MemoryRegionSection limit(MemoryRegionSection section)
+{
+    unsigned practical_as_bits = MIN(TARGET_PHYS_ADDR_SPACE_BITS, 62);
+    hwaddr as_limit;
+
+    as_limit = (hwaddr)1 << practical_as_bits;
+
+    section.size = MIN(section.offset_within_address_space + section.size, as_limit)
+                   - section.offset_within_address_space;
+
+    return section;
+}
+
 static void mem_add(MemoryListener *listener, MemoryRegionSection *section)
 {
     AddressSpaceDispatch *d = container_of(listener, AddressSpaceDispatch, listener);
-    MemoryRegionSection now = *section, remain = *section;
+    MemoryRegionSection now = limit(*section), remain = limit(*section);
 
     if ((now.offset_within_address_space & ~TARGET_PAGE_MASK)
         || (now.size < TARGET_PAGE_SIZE)) {
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 3/7] memory: iommu support
  2012-10-30 11:47 [Qemu-devel] [PATCH v2 0/7] IOMMU support Avi Kivity
  2012-10-30 11:47 ` [Qemu-devel] [PATCH v2 1/7] memory: fix address space initialization/destruction Avi Kivity
  2012-10-30 11:47 ` [Qemu-devel] [PATCH v2 2/7] memory: limit sections in the radix tree to the actual address space size Avi Kivity
@ 2012-10-30 11:47 ` Avi Kivity
  2012-10-30 19:11   ` Blue Swirl
  2012-10-30 11:47 ` [Qemu-devel] [PATCH v2 4/7] memory: provide a MemoryRegion for IOMMUs to log faults Avi Kivity
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2012-10-30 11:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Alexander Graf, Blue Swirl, Alex Williamson,
	Anthony Liguori

Add a new memory region type that translates addresses it is given,
then forwards them to a target address space.  This is similar to
an alias, except that the mapping is more flexible than a linear
translation and trucation, and also less efficient since the
translation happens at runtime.

The implementation uses an AddressSpace mapping the target region to
avoid hierarchical dispatch all the way to the resolved region; only
iommu regions are looked up dynamically.

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 exec.c   |  28 ++++++++++++++---
 memory.c | 106 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 memory.h |  45 +++++++++++++++++++++++++++
 3 files changed, 175 insertions(+), 4 deletions(-)

diff --git a/exec.c b/exec.c
index deee8ec..c3914fc 100644
--- a/exec.c
+++ b/exec.c
@@ -3507,6 +3507,7 @@ void cpu_physical_memory_write_rom(hwaddr addr,
 
 typedef struct {
     void *buffer;
+    AddressSpace *as;
     hwaddr addr;
     hwaddr len;
 } BounceBuffer;
@@ -3572,23 +3573,42 @@ void *address_space_map(AddressSpace *as,
     ram_addr_t raddr = RAM_ADDR_MAX;
     ram_addr_t rlen;
     void *ret;
+    IOMMUTLBEntry iotlb;
+    hwaddr xlat;
+    AddressSpace *as_xlat;
 
     while (len > 0) {
+        xlat = addr;
+        as_xlat = as;
         page = addr & TARGET_PAGE_MASK;
         l = (page + TARGET_PAGE_SIZE) - addr;
         if (l > len)
             l = len;
         section = phys_page_find(d, page >> TARGET_PAGE_BITS);
 
+        while (section->mr->iommu_ops) {
+            iotlb = section->mr->iommu_ops->translate(section->mr, xlat);
+            if (iotlb.perm[is_write]) {
+                xlat = ((iotlb.translated_addr & ~iotlb.addr_mask)
+                        | (addr & iotlb.addr_mask));
+                as_xlat = section->mr->iommu_target_as;
+                l = (MIN(xlat + l - 1, xlat | iotlb.addr_mask) - xlat) + 1;
+                section = phys_page_find(as_xlat->dispatch, xlat >> TARGET_PAGE_BITS);
+            } else {
+                section = &phys_sections[phys_section_unassigned];
+            }
+        }
+
         if (!(memory_region_is_ram(section->mr) && !section->readonly)) {
             if (todo || bounce.buffer) {
                 break;
             }
             bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, TARGET_PAGE_SIZE);
-            bounce.addr = addr;
+            bounce.addr = xlat;
+            bounce.as = as_xlat;
             bounce.len = l;
             if (!is_write) {
-                address_space_read(as, addr, bounce.buffer, l);
+                address_space_read(as_xlat, xlat, bounce.buffer, l);
             }
 
             *plen = l;
@@ -3596,7 +3616,7 @@ void *address_space_map(AddressSpace *as,
         }
         if (!todo) {
             raddr = memory_region_get_ram_addr(section->mr)
-                + memory_region_section_addr(section, addr);
+                + memory_region_section_addr(section, xlat);
         }
 
         len -= l;
@@ -3635,7 +3655,7 @@ void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len,
         return;
     }
     if (is_write) {
-        address_space_write(as, bounce.addr, bounce.buffer, access_len);
+        address_space_write(bounce.as, bounce.addr, bounce.buffer, access_len);
     }
     qemu_vfree(bounce.buffer);
     bounce.buffer = NULL;
diff --git a/memory.c b/memory.c
index ae3552b..ba2d4a0 100644
--- a/memory.c
+++ b/memory.c
@@ -775,6 +775,12 @@ static void memory_region_destructor_rom_device(MemoryRegion *mr)
     qemu_ram_free(mr->ram_addr & TARGET_PAGE_MASK);
 }
 
+static void memory_region_destructor_iommu(MemoryRegion *mr)
+{
+    address_space_destroy(mr->iommu_target_as);
+    g_free(mr->iommu_target_as);
+}
+
 static bool memory_region_wrong_endianness(MemoryRegion *mr)
 {
 #ifdef TARGET_WORDS_BIGENDIAN
@@ -789,6 +795,7 @@ void memory_region_init(MemoryRegion *mr,
                         uint64_t size)
 {
     mr->ops = NULL;
+    mr->iommu_ops = NULL;
     mr->parent = NULL;
     mr->size = int128_make64(size);
     if (size == UINT64_MAX) {
@@ -980,6 +987,100 @@ void memory_region_init_rom_device(MemoryRegion *mr,
     mr->ram_addr = qemu_ram_alloc(size, mr);
 }
 
+static void memory_region_iommu_rw(MemoryRegion *iommu, hwaddr addr,
+                                   uint8_t *buf, unsigned len, bool is_write)
+{
+    IOMMUTLBEntry tlb;
+    unsigned clen;
+    hwaddr xlat;
+
+    while (len) {
+        tlb = iommu->iommu_ops->translate(iommu, addr);
+        clen = (MIN(addr | tlb.addr_mask, addr + len - 1) - addr) + 1;
+        if (tlb.perm[is_write]) {
+            xlat = (tlb.translated_addr & ~tlb.addr_mask) | (addr & tlb.addr_mask);
+            address_space_rw(iommu->iommu_target_as, xlat, buf, clen, is_write);
+        } else {
+            if (!is_write) {
+                memset(buf, 0xff, clen);
+            }
+        }
+        buf += clen;
+        addr += clen;
+        len -= clen;
+    }
+}
+
+static uint64_t memory_region_iommu_read(void *opaque, hwaddr addr,
+                                         unsigned size)
+{
+    MemoryRegion *iommu = opaque;
+    union {
+        uint8_t buf[8];
+        uint8_t u8;
+        uint16_t u16;
+        uint32_t u32;
+        uint64_t u64;
+    } ret;
+
+    memory_region_iommu_rw(iommu, addr, ret.buf, size, false);
+    switch (size) {
+    case 1: return ret.u8;
+    case 2: return ret.u16;
+    case 4: return ret.u32;
+    case 8: return ret.u64;
+    default: abort();
+    }
+}
+
+static void memory_region_iommu_write(void *opaque, hwaddr addr,
+                                      uint64_t data, unsigned size)
+{
+    MemoryRegion *iommu = opaque;
+    union {
+        uint8_t buf[8];
+        uint8_t u8;
+        uint16_t u16;
+        uint32_t u32;
+        uint64_t u64;
+    } in;
+
+    switch (size) {
+    case 1: in.u8 = data; break;
+    case 2: in.u16 = data; break;
+    case 4: in.u32 = data; break;
+    case 8: in.u64 = data; break;
+    default: abort();
+    }
+    memory_region_iommu_rw(iommu, addr, in.buf, size, true);
+}
+
+static MemoryRegionOps memory_region_iommu_ops = {
+    .read = memory_region_iommu_read,
+    .write = memory_region_iommu_write,
+#ifdef HOST_BIGENDIAN
+    .endianness = DEVICE_BIG_ENDIAN,
+#else
+    .endianness = DEVICE_LITTLE_ENDIAN,
+#endif
+};
+
+void memory_region_init_iommu(MemoryRegion *mr,
+                              MemoryRegionIOMMUOps *ops,
+                              MemoryRegion *target,
+                              const char *name,
+                              uint64_t size)
+{
+    memory_region_init(mr, name, size);
+    mr->ops = &memory_region_iommu_ops;
+    mr->iommu_ops = ops,
+    mr->opaque = mr;
+    mr->terminates = true;  /* then re-forwards */
+    mr->destructor = memory_region_destructor_iommu;
+    mr->iommu_target_as = g_new(AddressSpace, 1);
+    address_space_init(mr->iommu_target_as, target);
+}
+
 static uint64_t invalid_read(void *opaque, hwaddr addr,
                              unsigned size)
 {
@@ -1054,6 +1155,11 @@ bool memory_region_is_rom(MemoryRegion *mr)
     return mr->ram && mr->readonly;
 }
 
+bool memory_region_is_iommu(MemoryRegion *mr)
+{
+    return mr->iommu_ops;
+}
+
 void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
 {
     uint8_t mask = 1 << client;
diff --git a/memory.h b/memory.h
index 9462bfd..47362c9 100644
--- a/memory.h
+++ b/memory.h
@@ -113,12 +113,28 @@ struct MemoryRegionOps {
     const MemoryRegionMmio old_mmio;
 };
 
+typedef struct IOMMUTLBEntry IOMMUTLBEntry;
+typedef struct MemoryRegionIOMMUOps MemoryRegionIOMMUOps;
+
+struct IOMMUTLBEntry {
+    hwaddr device_addr;
+    hwaddr translated_addr;
+    hwaddr addr_mask;  /* 0xfff = 4k translation */
+    bool perm[2]; /* read/write permissions */
+};
+
+struct MemoryRegionIOMMUOps {
+    /* Returns a TLB entry that contains a given address. */
+    IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr);
+};
+
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
 typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
 
 struct MemoryRegion {
     /* All fields are private - violators will be prosecuted */
     const MemoryRegionOps *ops;
+    const MemoryRegionIOMMUOps *iommu_ops;
     void *opaque;
     MemoryRegion *parent;
     Int128 size;
@@ -145,6 +161,7 @@ struct MemoryRegion {
     uint8_t dirty_log_mask;
     unsigned ioeventfd_nb;
     MemoryRegionIoeventfd *ioeventfds;
+    struct AddressSpace *iommu_target_as;
 };
 
 struct MemoryRegionPortio {
@@ -334,6 +351,25 @@ void memory_region_init_rom_device(MemoryRegion *mr,
 void memory_region_init_reservation(MemoryRegion *mr,
                                     const char *name,
                                     uint64_t size);
+
+/**
+ * memory_region_init_iommu: Initialize a memory region that translates addresses
+ *
+ * An IOMMU region translates addresses and forwards accesses to a target memory region.
+ *
+ * @mr: the #MemoryRegion to be initialized
+ * @ops: a function that translates addresses into the @target region
+ * @target: a #MemoryRegion that will be used to satisfy accesses to translated
+ *          addresses
+ * @name: used for debugging; not visible to the user or ABI
+ * @size: size of the region.
+ */
+void memory_region_init_iommu(MemoryRegion *mr,
+                              MemoryRegionIOMMUOps *ops,
+                              MemoryRegion *target,
+                              const char *name,
+                              uint64_t size);
+
 /**
  * memory_region_destroy: Destroy a memory region and reclaim all resources.
  *
@@ -373,6 +409,15 @@ static inline bool memory_region_is_romd(MemoryRegion *mr)
 }
 
 /**
+ * memory_region_is_ram: check whether a memory region is an iommu
+ *
+ * Returns %true is a memory region is an iommu.
+ *
+ * @mr: the memory region being queried
+ */
+bool memory_region_is_iommu(MemoryRegion *mr);
+
+/**
  * memory_region_name: get a memory region's name
  *
  * Returns the string that was used to initialize the memory region.
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 4/7] memory: provide a MemoryRegion for IOMMUs to log faults
  2012-10-30 11:47 [Qemu-devel] [PATCH v2 0/7] IOMMU support Avi Kivity
                   ` (2 preceding siblings ...)
  2012-10-30 11:47 ` [Qemu-devel] [PATCH v2 3/7] memory: iommu support Avi Kivity
@ 2012-10-30 11:47 ` Avi Kivity
  2012-10-30 19:14   ` Blue Swirl
  2012-10-30 11:47 ` [Qemu-devel] [PATCH v2 5/7] pci: use memory core for iommu support Avi Kivity
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2012-10-30 11:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Alexander Graf, Blue Swirl, Alex Williamson,
	Anthony Liguori

Accesses which do not translate will hit the fault region, which
can then log the access.

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 memory.c | 9 ++++++---
 memory.h | 4 ++++
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/memory.c b/memory.c
index ba2d4a0..d6b46fd 100644
--- a/memory.c
+++ b/memory.c
@@ -778,7 +778,9 @@ static void memory_region_destructor_rom_device(MemoryRegion *mr)
 static void memory_region_destructor_iommu(MemoryRegion *mr)
 {
     address_space_destroy(mr->iommu_target_as);
+    address_space_destroy(mr->iommu_fault_as);
     g_free(mr->iommu_target_as);
+    g_free(mr->iommu_fault_as);
 }
 
 static bool memory_region_wrong_endianness(MemoryRegion *mr)
@@ -1001,9 +1003,7 @@ static void memory_region_iommu_rw(MemoryRegion *iommu, hwaddr addr,
             xlat = (tlb.translated_addr & ~tlb.addr_mask) | (addr & tlb.addr_mask);
             address_space_rw(iommu->iommu_target_as, xlat, buf, clen, is_write);
         } else {
-            if (!is_write) {
-                memset(buf, 0xff, clen);
-            }
+            address_space_rw(iommu->iommu_fault_as, addr, buf, clen, is_write);
         }
         buf += clen;
         addr += clen;
@@ -1068,6 +1068,7 @@ static void memory_region_iommu_write(void *opaque, hwaddr addr,
 void memory_region_init_iommu(MemoryRegion *mr,
                               MemoryRegionIOMMUOps *ops,
                               MemoryRegion *target,
+                              MemoryRegion *fault,
                               const char *name,
                               uint64_t size)
 {
@@ -1079,6 +1080,8 @@ void memory_region_init_iommu(MemoryRegion *mr,
     mr->destructor = memory_region_destructor_iommu;
     mr->iommu_target_as = g_new(AddressSpace, 1);
     address_space_init(mr->iommu_target_as, target);
+    mr->iommu_fault_as = g_new(AddressSpace, 1);
+    address_space_init(mr->iommu_fault_as, fault);
 }
 
 static uint64_t invalid_read(void *opaque, hwaddr addr,
diff --git a/memory.h b/memory.h
index 47362c9..6312197 100644
--- a/memory.h
+++ b/memory.h
@@ -162,6 +162,7 @@ struct MemoryRegion {
     unsigned ioeventfd_nb;
     MemoryRegionIoeventfd *ioeventfds;
     struct AddressSpace *iommu_target_as;
+    struct AddressSpace *iommu_fault_as;
 };
 
 struct MemoryRegionPortio {
@@ -361,12 +362,15 @@ void memory_region_init_reservation(MemoryRegion *mr,
  * @ops: a function that translates addresses into the @target region
  * @target: a #MemoryRegion that will be used to satisfy accesses to translated
  *          addresses
+ * @fault: a #MemoryRegion for servicing failed translations; accessed with
+ *         untranslated addresses
  * @name: used for debugging; not visible to the user or ABI
  * @size: size of the region.
  */
 void memory_region_init_iommu(MemoryRegion *mr,
                               MemoryRegionIOMMUOps *ops,
                               MemoryRegion *target,
+                              MemoryRegion *fault,
                               const char *name,
                               uint64_t size);
 
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 5/7] pci: use memory core for iommu support
  2012-10-30 11:47 [Qemu-devel] [PATCH v2 0/7] IOMMU support Avi Kivity
                   ` (3 preceding siblings ...)
  2012-10-30 11:47 ` [Qemu-devel] [PATCH v2 4/7] memory: provide a MemoryRegion for IOMMUs to log faults Avi Kivity
@ 2012-10-30 11:47 ` Avi Kivity
  2012-10-30 11:47 ` [Qemu-devel] [PATCH v2 6/7] vfio: abort if an emulated iommu is used Avi Kivity
  2012-10-30 11:47 ` [Qemu-devel] [PATCH v2 7/7] i440fx: add an iommu Avi Kivity
  6 siblings, 0 replies; 19+ messages in thread
From: Avi Kivity @ 2012-10-30 11:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Alexander Graf, Blue Swirl, Alex Williamson,
	Anthony Liguori

Use the new iommu support in the memory core for iommu support.
The only user, spapr, is also converted.

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 hw/pci.c           | 59 +++++++++++++++++++++++++++++++++---------------------
 hw/pci.h           |  7 +++++--
 hw/pci_internals.h |  5 +++--
 hw/spapr.h         |  1 +
 hw/spapr_iommu.c   | 45 +++++++++++++++++------------------------
 hw/spapr_pci.c     | 27 +++++++++++++++++++++----
 hw/spapr_pci.h     |  2 ++
 7 files changed, 88 insertions(+), 58 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index dceda0b..5056ddb 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -274,6 +274,21 @@ int pci_find_domain(const PCIBus *bus)
     return -1;
 }
 
+static MemoryRegion *pci_default_iommu(PCIBus *bus, void *opaque, int devfn)
+{
+    MemoryRegion *mr = g_new(MemoryRegion, 1);
+
+    /* FIXME: inherit memory region from bus creator */
+    memory_region_init_alias(mr, "iommu-nop", get_system_memory(), 0, INT64_MAX);
+    return mr;
+}
+
+static void pci_default_iommu_dtor(MemoryRegion *mr)
+{
+    memory_region_destroy(mr);
+    g_free(mr);
+}
+
 void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
                          const char *name,
                          MemoryRegion *address_space_mem,
@@ -285,6 +300,7 @@ void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
     bus->devfn_min = devfn_min;
     bus->address_space_mem = address_space_mem;
     bus->address_space_io = address_space_io;
+    pci_setup_iommu(bus, pci_default_iommu, pci_default_iommu_dtor, NULL);
 
     /* host bridge */
     QLIST_INIT(&bus->child);
@@ -775,21 +791,16 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
                      PCI_SLOT(devfn), PCI_FUNC(devfn), name, bus->devices[devfn]->name);
         return NULL;
     }
+
     pci_dev->bus = bus;
-    if (bus->dma_context_fn) {
-        pci_dev->dma = bus->dma_context_fn(bus, bus->dma_context_opaque, devfn);
-    } else {
-        /* FIXME: Make dma_context_fn use MemoryRegions instead, so this path is
-         * taken unconditionally */
-        /* FIXME: inherit memory region from bus creator */
-        memory_region_init_alias(&pci_dev->bus_master_enable_region, "bus master",
-                                 get_system_memory(), 0,
-                                 memory_region_size(get_system_memory()));
-        memory_region_set_enabled(&pci_dev->bus_master_enable_region, false);
-        address_space_init(&pci_dev->bus_master_as, &pci_dev->bus_master_enable_region);
-        pci_dev->dma = g_new(DMAContext, 1);
-        dma_context_init(pci_dev->dma, &pci_dev->bus_master_as, NULL, NULL, NULL);
-    }
+    pci_dev->iommu = bus->iommu_fn(bus, bus->iommu_opaque, devfn);
+    memory_region_init_alias(&pci_dev->bus_master_enable_region, "bus master",
+                             pci_dev->iommu, 0, memory_region_size(pci_dev->iommu));
+    memory_region_set_enabled(&pci_dev->bus_master_enable_region, false);
+    address_space_init(&pci_dev->bus_master_as, &pci_dev->bus_master_enable_region);
+    pci_dev->dma = g_new(DMAContext, 1);
+    dma_context_init(pci_dev->dma, &pci_dev->bus_master_as, NULL, NULL, NULL);
+
     pci_dev->devfn = devfn;
     pstrcpy(pci_dev->name, sizeof(pci_dev->name), name);
     pci_dev->irq_state = 0;
@@ -843,12 +854,12 @@ static void do_pci_unregister_device(PCIDevice *pci_dev)
     pci_dev->bus->devices[pci_dev->devfn] = NULL;
     pci_config_free(pci_dev);
 
-    if (!pci_dev->bus->dma_context_fn) {
-        address_space_destroy(&pci_dev->bus_master_as);
-        memory_region_destroy(&pci_dev->bus_master_enable_region);
-        g_free(pci_dev->dma);
-        pci_dev->dma = NULL;
-    }
+    address_space_destroy(&pci_dev->bus_master_as);
+    memory_region_del_subregion(&pci_dev->bus_master_enable_region, pci_dev->iommu);
+    pci_dev->bus->iommu_dtor_fn(pci_dev->iommu);
+    memory_region_destroy(&pci_dev->bus_master_enable_region);
+    g_free(pci_dev->dma);
+    pci_dev->dma = NULL;
 }
 
 static void pci_unregister_io_regions(PCIDevice *pci_dev)
@@ -2140,10 +2151,12 @@ static void pci_device_class_init(ObjectClass *klass, void *data)
     k->props = pci_props;
 }
 
-void pci_setup_iommu(PCIBus *bus, PCIDMAContextFunc fn, void *opaque)
+void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, PCIIOMMUDestructorFunc dtor,
+                     void *opaque)
 {
-    bus->dma_context_fn = fn;
-    bus->dma_context_opaque = opaque;
+    bus->iommu_fn = fn;
+    bus->iommu_dtor_fn = dtor;
+    bus->iommu_opaque = opaque;
 }
 
 static TypeInfo pci_device_type_info = {
diff --git a/hw/pci.h b/hw/pci.h
index 241c1d8..c3dd282 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -213,6 +213,7 @@ struct PCIDevice {
     PCIIORegion io_regions[PCI_NUM_REGIONS];
     AddressSpace bus_master_as;
     MemoryRegion bus_master_enable_region;
+    MemoryRegion *iommu;
     DMAContext *dma;
 
     /* do not access the following fields */
@@ -357,9 +358,11 @@ int pci_read_devaddr(Monitor *mon, const char *addr, int *domp, int *busp,
 
 void pci_device_deassert_intx(PCIDevice *dev);
 
-typedef DMAContext *(*PCIDMAContextFunc)(PCIBus *, void *, int);
+typedef MemoryRegion *(*PCIIOMMUFunc)(PCIBus *, void *, int);
+typedef void (*PCIIOMMUDestructorFunc)(MemoryRegion *mr);
 
-void pci_setup_iommu(PCIBus *bus, PCIDMAContextFunc fn, void *opaque);
+void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, PCIIOMMUDestructorFunc dtor,
+                     void *opaque);
 
 static inline void
 pci_set_byte(uint8_t *config, uint8_t val)
diff --git a/hw/pci_internals.h b/hw/pci_internals.h
index 21d0ce6..ab64a0e 100644
--- a/hw/pci_internals.h
+++ b/hw/pci_internals.h
@@ -17,8 +17,9 @@
 
 struct PCIBus {
     BusState qbus;
-    PCIDMAContextFunc dma_context_fn;
-    void *dma_context_opaque;
+    PCIIOMMUFunc iommu_fn;
+    PCIIOMMUDestructorFunc iommu_dtor_fn;
+    void *iommu_opaque;
     uint8_t devfn_min;
     pci_set_irq_fn set_irq;
     pci_map_irq_fn map_irq;
diff --git a/hw/spapr.h b/hw/spapr.h
index 51c709e..4456d1b 100644
--- a/hw/spapr.h
+++ b/hw/spapr.h
@@ -342,6 +342,7 @@ typedef struct sPAPRTCE {
 
 
 void spapr_iommu_init(void);
+IOMMUTLBEntry spapr_tce_translate(DMAContext *dma, hwaddr addr);
 void spapr_events_init(sPAPREnvironment *spapr);
 void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq);
 DMAContext *spapr_tce_new_dma_context(uint32_t liobn, size_t window_size);
diff --git a/hw/spapr_iommu.c b/hw/spapr_iommu.c
index 86dc8f9..08cbc72 100644
--- a/hw/spapr_iommu.c
+++ b/hw/spapr_iommu.c
@@ -64,15 +64,9 @@ static sPAPRTCETable *spapr_tce_find_by_liobn(uint32_t liobn)
     return NULL;
 }
 
-static int spapr_tce_translate(DMAContext *dma,
-                               dma_addr_t addr,
-                               hwaddr *paddr,
-                               hwaddr *len,
-                               DMADirection dir)
+IOMMUTLBEntry spapr_tce_translate(DMAContext *dma, hwaddr addr)
 {
     sPAPRTCETable *tcet = DO_UPCAST(sPAPRTCETable, dma, dma);
-    enum sPAPRTCEAccess access = (dir == DMA_DIRECTION_FROM_DEVICE)
-        ? SPAPR_TCE_WO : SPAPR_TCE_RO;
     uint64_t tce;
 
 #ifdef DEBUG_TCE
@@ -81,9 +75,12 @@ static int spapr_tce_translate(DMAContext *dma,
 #endif
 
     if (tcet->bypass) {
-        *paddr = addr;
-        *len = (hwaddr)-1;
-        return 0;
+        return (IOMMUTLBEntry) {
+            .device_addr = 0,
+            .translated_addr = 0,
+            .addr_mask = ~(hwaddr)0,
+            .perm = { true, true },
+        };
     }
 
     /* Check if we are in bound */
@@ -91,29 +88,22 @@ static int spapr_tce_translate(DMAContext *dma,
 #ifdef DEBUG_TCE
         fprintf(stderr, "spapr_tce_translate out of bounds\n");
 #endif
-        return -EFAULT;
+        return (IOMMUTLBEntry) { .perm = { false, false } };
     }
 
     tce = tcet->table[addr >> SPAPR_TCE_PAGE_SHIFT].tce;
 
-    /* Check TCE */
-    if (!(tce & access)) {
-        return -EPERM;
-    }
-
-    /* How much til end of page ? */
-    *len = ((~addr) & SPAPR_TCE_PAGE_MASK) + 1;
-
-    /* Translate */
-    *paddr = (tce & ~SPAPR_TCE_PAGE_MASK) |
-        (addr & SPAPR_TCE_PAGE_MASK);
-
 #ifdef DEBUG_TCE
     fprintf(stderr, " ->  *paddr=0x" TARGET_FMT_plx ", *len=0x"
-            TARGET_FMT_plx "\n", *paddr, *len);
+            TARGET_FMT_plx "\n", (tce & ~SPAPR_TCE_PAGE_MASK), SPAPR_TCE_PAGE_MASK + 1);
 #endif
 
-    return 0;
+    return (IOMMUTLBEntry) {
+        .device_addr = addr & SPAPR_TCE_PAGE_MASK,
+        .translated_addr = (tce & ~SPAPR_TCE_PAGE_MASK),
+        .addr_mask = SPAPR_TCE_PAGE_MASK,
+        .perm = { [0] = tce & SPAPR_TCE_RO, [1] = tce & SPAPR_TCE_WO },
+    };
 }
 
 DMAContext *spapr_tce_new_dma_context(uint32_t liobn, size_t window_size)
@@ -125,7 +115,7 @@ DMAContext *spapr_tce_new_dma_context(uint32_t liobn, size_t window_size)
     }
 
     tcet = g_malloc0(sizeof(*tcet));
-    dma_context_init(&tcet->dma, &address_space_memory, spapr_tce_translate, NULL, NULL);
+    dma_context_init(&tcet->dma, &address_space_memory, NULL, NULL, NULL);
 
     tcet->liobn = liobn;
     tcet->window_size = window_size;
@@ -277,7 +267,8 @@ int spapr_tcet_dma_dt(void *fdt, int node_off, const char *propname,
         return 0;
     }
 
-    if (iommu->translate == spapr_tce_translate) {
+    /* FIXME: WHAT?? */
+    if (true) {
         sPAPRTCETable *tcet = DO_UPCAST(sPAPRTCETable, dma, iommu);
         return spapr_dma_dt(fdt, node_off, propname,
                 tcet->liobn, 0, tcet->window_size);
diff --git a/hw/spapr_pci.c b/hw/spapr_pci.c
index c2c3079..892a6ef 100644
--- a/hw/spapr_pci.c
+++ b/hw/spapr_pci.c
@@ -469,14 +469,29 @@ static void spapr_msi_write(void *opaque, hwaddr addr,
 /*
  * PHB PCI device
  */
-static DMAContext *spapr_pci_dma_context_fn(PCIBus *bus, void *opaque,
-                                            int devfn)
+static MemoryRegion *spapr_pci_dma_iommu_new(PCIBus *bus, void *opaque, int devfn)
 {
     sPAPRPHBState *phb = opaque;
 
-    return phb->dma;
+    return &phb->iommu;
 }
 
+static void spapr_pci_dma_iommu_destroy(MemoryRegion *iommu)
+{
+    /* iommu is shared among devices, do nothing */
+}
+
+static IOMMUTLBEntry spapr_phb_translate(MemoryRegion *iommu, hwaddr addr)
+{
+    sPAPRPHBState *phb = container_of(iommu, sPAPRPHBState, iommu);
+
+    return spapr_tce_translate(phb->dma, addr);
+}
+
+static MemoryRegionIOMMUOps spapr_iommu_ops = {
+    .translate = spapr_phb_translate,
+};
+
 static int spapr_phb_init(SysBusDevice *s)
 {
     sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(s);
@@ -534,7 +549,11 @@ static int spapr_phb_init(SysBusDevice *s)
     sphb->dma_window_start = 0;
     sphb->dma_window_size = 0x40000000;
     sphb->dma = spapr_tce_new_dma_context(sphb->dma_liobn, sphb->dma_window_size);
-    pci_setup_iommu(bus, spapr_pci_dma_context_fn, sphb);
+    memory_region_init(&sphb->iommu_fault, "iommu-spapr-fault", INT64_MAX);
+    memory_region_init_iommu(&sphb->iommu, &spapr_iommu_ops,
+                             get_system_memory(), &sphb->iommu_fault,
+                             "iommu-spapr", INT64_MAX);
+    pci_setup_iommu(bus, spapr_pci_dma_iommu_new, spapr_pci_dma_iommu_destroy, sphb);
 
     QLIST_INSERT_HEAD(&spapr->phbs, sphb, list);
 
diff --git a/hw/spapr_pci.h b/hw/spapr_pci.h
index a77d7d5..b904dcf 100644
--- a/hw/spapr_pci.h
+++ b/hw/spapr_pci.h
@@ -45,6 +45,8 @@ typedef struct sPAPRPHBState {
     hwaddr mem_win_addr, mem_win_size, io_win_addr, io_win_size;
     hwaddr msi_win_addr;
     MemoryRegion memwindow, msiwindow;
+    MemoryRegion iommu;
+    MemoryRegion iommu_fault;
 
     uint32_t dma_liobn;
     uint64_t dma_window_start;
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 6/7] vfio: abort if an emulated iommu is used
  2012-10-30 11:47 [Qemu-devel] [PATCH v2 0/7] IOMMU support Avi Kivity
                   ` (4 preceding siblings ...)
  2012-10-30 11:47 ` [Qemu-devel] [PATCH v2 5/7] pci: use memory core for iommu support Avi Kivity
@ 2012-10-30 11:47 ` Avi Kivity
  2012-10-30 11:47 ` [Qemu-devel] [PATCH v2 7/7] i440fx: add an iommu Avi Kivity
  6 siblings, 0 replies; 19+ messages in thread
From: Avi Kivity @ 2012-10-30 11:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Alexander Graf, Blue Swirl, Alex Williamson,
	Anthony Liguori

vfio doesn't support guest iommus yet, indicate it to the user
by gently depositing a core on their disk.

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 hw/vfio_pci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c
index 0473ae8..bd7a075 100644
--- a/hw/vfio_pci.c
+++ b/hw/vfio_pci.c
@@ -943,6 +943,8 @@ static void vfio_listener_region_add(MemoryListener *listener,
     void *vaddr;
     int ret;
 
+    assert(!memory_region_is_iommu(section->mr));
+
     if (vfio_listener_skipped_section(section)) {
         DPRINTF("vfio: SKIPPING region_add %"HWADDR_PRIx" - %"PRIx64"\n",
                 section->offset_within_address_space,
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 7/7] i440fx: add an iommu
  2012-10-30 11:47 [Qemu-devel] [PATCH v2 0/7] IOMMU support Avi Kivity
                   ` (5 preceding siblings ...)
  2012-10-30 11:47 ` [Qemu-devel] [PATCH v2 6/7] vfio: abort if an emulated iommu is used Avi Kivity
@ 2012-10-30 11:47 ` Avi Kivity
  2012-10-30 19:18   ` Blue Swirl
  6 siblings, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2012-10-30 11:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Alexander Graf, Blue Swirl, Alex Williamson,
	Anthony Liguori

This iommu encrypts addresses on the device bus to avoid divuling information
to hackers equipped with bus analyzers.  Following 3DES, addresses are encrypted
multiple times.  A XOR cypher is employed for efficiency.

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 hw/piix_pci.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/hw/piix_pci.c b/hw/piix_pci.c
index 9af5847..99601f4 100644
--- a/hw/piix_pci.c
+++ b/hw/piix_pci.c
@@ -30,6 +30,7 @@
 #include "sysbus.h"
 #include "range.h"
 #include "xen.h"
+#include "exec-memory.h"
 
 /*
  * I440FX chipset data sheet.
@@ -248,6 +249,81 @@ static int i440fx_initfn(PCIDevice *dev)
     return 0;
 }
 
+typedef struct SillyIOMMU SillyIOMMU;
+
+struct SillyIOMMU {
+    MemoryRegion fault;
+    MemoryRegion l1;
+    MemoryRegion l2;
+    hwaddr mask;
+    hwaddr secret;
+};
+
+static IOMMUTLBEntry silly_l1_translate(MemoryRegion *l1, hwaddr addr)
+{
+    SillyIOMMU *s = container_of(l1, SillyIOMMU, l1);
+    hwaddr xlat = addr ^ s->secret;
+
+    printf("l1: %" HWADDR_PRIx " -> %" HWADDR_PRIx "\n", addr, xlat);
+
+    return (IOMMUTLBEntry) {
+        .device_addr = addr & ~s->mask,
+        .translated_addr = xlat & ~s->mask,
+        .addr_mask = s->mask,
+        .perm = { true, true },
+    };
+}
+
+static MemoryRegionIOMMUOps silly_l1_iommu_ops = {
+    .translate = silly_l1_translate,
+};
+
+static IOMMUTLBEntry silly_l2_translate(MemoryRegion *l2, hwaddr addr)
+{
+    SillyIOMMU *s = container_of(l2, SillyIOMMU, l2);
+    hwaddr xlat = addr ^ s->secret;
+
+    printf("l2: %" HWADDR_PRIx " -> %" HWADDR_PRIx "\n", addr, xlat);
+
+    return (IOMMUTLBEntry) {
+        .device_addr = addr & ~s->mask,
+        .translated_addr = xlat & ~s->mask,
+        .addr_mask = s->mask,
+        .perm = { true, true },
+    };
+}
+
+static MemoryRegionIOMMUOps silly_l2_iommu_ops = {
+    .translate = silly_l2_translate,
+};
+
+static MemoryRegion *silly_iommu_new(PCIBus *bus, void *opaque, int devfn)
+{
+    SillyIOMMU *s = g_new(SillyIOMMU, 1);
+    MemoryRegion *sysmem = get_system_memory();
+
+    s->mask = (0x1000 << (devfn >> 3)) - 1;
+    s->secret = ((devfn << 24) | 0x00aabbccdd) & ~s->mask;
+    memory_region_init(&s->fault, "silly-fault", INT64_MAX);
+    memory_region_init_iommu(&s->l2, &silly_l2_iommu_ops, sysmem, &s->fault,
+                             "silly-l2", INT64_MAX);
+    memory_region_init_iommu(&s->l1, &silly_l1_iommu_ops, &s->l2, &s->fault,
+                             "silly-l1", INT64_MAX);
+    return &s->l1;
+}
+
+static void silly_iommu_del(MemoryRegion *l1)
+{
+    SillyIOMMU *s = container_of(l1, SillyIOMMU, l1);
+
+    memory_region_del_subregion(&s->l2, get_system_memory());
+    memory_region_del_subregion(&s->l1, &s->l2);
+    memory_region_destroy(&s->l2);
+    memory_region_destroy(&s->l1);
+    memory_region_destroy(&s->fault);
+    g_free(s);
+}
+
 static PCIBus *i440fx_common_init(const char *device_name,
                                   PCII440FXState **pi440fx_state,
                                   int *piix3_devfn,
@@ -275,6 +351,7 @@ static PCIBus *i440fx_common_init(const char *device_name,
     s->address_space = address_space_mem;
     b = pci_bus_new(dev, NULL, pci_address_space,
                     address_space_io, 0);
+    pci_setup_iommu(b, silly_iommu_new, silly_iommu_del, NULL);
     s->bus = b;
     object_property_add_child(qdev_get_machine(), "i440fx", OBJECT(dev), NULL);
     qdev_init_nofail(dev);
-- 
1.7.12

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/7] memory: iommu support
  2012-10-30 11:47 ` [Qemu-devel] [PATCH v2 3/7] memory: iommu support Avi Kivity
@ 2012-10-30 19:11   ` Blue Swirl
  2012-10-30 20:03     ` Benjamin Herrenschmidt
  2012-10-31 10:32     ` Avi Kivity
  0 siblings, 2 replies; 19+ messages in thread
From: Blue Swirl @ 2012-10-30 19:11 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Michael S. Tsirkin, qemu-devel, Alexander Graf, Alex Williamson,
	Anthony Liguori

On Tue, Oct 30, 2012 at 11:47 AM, Avi Kivity <avi@redhat.com> wrote:
> Add a new memory region type that translates addresses it is given,
> then forwards them to a target address space.  This is similar to
> an alias, except that the mapping is more flexible than a linear
> translation and trucation, and also less efficient since the
> translation happens at runtime.
>
> The implementation uses an AddressSpace mapping the target region to
> avoid hierarchical dispatch all the way to the resolved region; only
> iommu regions are looked up dynamically.
>
> Signed-off-by: Avi Kivity <avi@redhat.com>
> ---
>  exec.c   |  28 ++++++++++++++---
>  memory.c | 106 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  memory.h |  45 +++++++++++++++++++++++++++
>  3 files changed, 175 insertions(+), 4 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index deee8ec..c3914fc 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -3507,6 +3507,7 @@ void cpu_physical_memory_write_rom(hwaddr addr,
>
>  typedef struct {
>      void *buffer;
> +    AddressSpace *as;
>      hwaddr addr;
>      hwaddr len;
>  } BounceBuffer;
> @@ -3572,23 +3573,42 @@ void *address_space_map(AddressSpace *as,
>      ram_addr_t raddr = RAM_ADDR_MAX;
>      ram_addr_t rlen;
>      void *ret;
> +    IOMMUTLBEntry iotlb;
> +    hwaddr xlat;
> +    AddressSpace *as_xlat;
>
>      while (len > 0) {
> +        xlat = addr;
> +        as_xlat = as;
>          page = addr & TARGET_PAGE_MASK;
>          l = (page + TARGET_PAGE_SIZE) - addr;
>          if (l > len)
>              l = len;
>          section = phys_page_find(d, page >> TARGET_PAGE_BITS);
>
> +        while (section->mr->iommu_ops) {
> +            iotlb = section->mr->iommu_ops->translate(section->mr, xlat);
> +            if (iotlb.perm[is_write]) {
> +                xlat = ((iotlb.translated_addr & ~iotlb.addr_mask)
> +                        | (addr & iotlb.addr_mask));
> +                as_xlat = section->mr->iommu_target_as;
> +                l = (MIN(xlat + l - 1, xlat | iotlb.addr_mask) - xlat) + 1;
> +                section = phys_page_find(as_xlat->dispatch, xlat >> TARGET_PAGE_BITS);
> +            } else {
> +                section = &phys_sections[phys_section_unassigned];
> +            }
> +        }
> +
>          if (!(memory_region_is_ram(section->mr) && !section->readonly)) {
>              if (todo || bounce.buffer) {
>                  break;
>              }
>              bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, TARGET_PAGE_SIZE);
> -            bounce.addr = addr;
> +            bounce.addr = xlat;
> +            bounce.as = as_xlat;
>              bounce.len = l;
>              if (!is_write) {
> -                address_space_read(as, addr, bounce.buffer, l);
> +                address_space_read(as_xlat, xlat, bounce.buffer, l);
>              }
>
>              *plen = l;
> @@ -3596,7 +3616,7 @@ void *address_space_map(AddressSpace *as,
>          }
>          if (!todo) {
>              raddr = memory_region_get_ram_addr(section->mr)
> -                + memory_region_section_addr(section, addr);
> +                + memory_region_section_addr(section, xlat);
>          }
>
>          len -= l;
> @@ -3635,7 +3655,7 @@ void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len,
>          return;
>      }
>      if (is_write) {
> -        address_space_write(as, bounce.addr, bounce.buffer, access_len);
> +        address_space_write(bounce.as, bounce.addr, bounce.buffer, access_len);
>      }
>      qemu_vfree(bounce.buffer);
>      bounce.buffer = NULL;
> diff --git a/memory.c b/memory.c
> index ae3552b..ba2d4a0 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -775,6 +775,12 @@ static void memory_region_destructor_rom_device(MemoryRegion *mr)
>      qemu_ram_free(mr->ram_addr & TARGET_PAGE_MASK);
>  }
>
> +static void memory_region_destructor_iommu(MemoryRegion *mr)
> +{
> +    address_space_destroy(mr->iommu_target_as);
> +    g_free(mr->iommu_target_as);
> +}
> +
>  static bool memory_region_wrong_endianness(MemoryRegion *mr)
>  {
>  #ifdef TARGET_WORDS_BIGENDIAN
> @@ -789,6 +795,7 @@ void memory_region_init(MemoryRegion *mr,
>                          uint64_t size)
>  {
>      mr->ops = NULL;
> +    mr->iommu_ops = NULL;
>      mr->parent = NULL;
>      mr->size = int128_make64(size);
>      if (size == UINT64_MAX) {
> @@ -980,6 +987,100 @@ void memory_region_init_rom_device(MemoryRegion *mr,
>      mr->ram_addr = qemu_ram_alloc(size, mr);
>  }
>
> +static void memory_region_iommu_rw(MemoryRegion *iommu, hwaddr addr,
> +                                   uint8_t *buf, unsigned len, bool is_write)
> +{
> +    IOMMUTLBEntry tlb;
> +    unsigned clen;
> +    hwaddr xlat;
> +
> +    while (len) {
> +        tlb = iommu->iommu_ops->translate(iommu, addr);
> +        clen = (MIN(addr | tlb.addr_mask, addr + len - 1) - addr) + 1;
> +        if (tlb.perm[is_write]) {
> +            xlat = (tlb.translated_addr & ~tlb.addr_mask) | (addr & tlb.addr_mask);
> +            address_space_rw(iommu->iommu_target_as, xlat, buf, clen, is_write);
> +        } else {
> +            if (!is_write) {
> +                memset(buf, 0xff, clen);
> +            }
> +        }
> +        buf += clen;
> +        addr += clen;
> +        len -= clen;
> +    }
> +}
> +
> +static uint64_t memory_region_iommu_read(void *opaque, hwaddr addr,
> +                                         unsigned size)
> +{
> +    MemoryRegion *iommu = opaque;
> +    union {
> +        uint8_t buf[8];
> +        uint8_t u8;
> +        uint16_t u16;
> +        uint32_t u32;
> +        uint64_t u64;
> +    } ret;
> +
> +    memory_region_iommu_rw(iommu, addr, ret.buf, size, false);
> +    switch (size) {
> +    case 1: return ret.u8;
> +    case 2: return ret.u16;
> +    case 4: return ret.u32;
> +    case 8: return ret.u64;
> +    default: abort();
> +    }
> +}
> +
> +static void memory_region_iommu_write(void *opaque, hwaddr addr,
> +                                      uint64_t data, unsigned size)
> +{
> +    MemoryRegion *iommu = opaque;
> +    union {
> +        uint8_t buf[8];
> +        uint8_t u8;
> +        uint16_t u16;
> +        uint32_t u32;
> +        uint64_t u64;
> +    } in;
> +
> +    switch (size) {
> +    case 1: in.u8 = data; break;
> +    case 2: in.u16 = data; break;
> +    case 4: in.u32 = data; break;
> +    case 8: in.u64 = data; break;
> +    default: abort();
> +    }
> +    memory_region_iommu_rw(iommu, addr, in.buf, size, true);
> +}
> +
> +static MemoryRegionOps memory_region_iommu_ops = {
> +    .read = memory_region_iommu_read,
> +    .write = memory_region_iommu_write,
> +#ifdef HOST_BIGENDIAN
> +    .endianness = DEVICE_BIG_ENDIAN,

Why couple this with host endianness? I'd expect IOMMU to operate at
target bus endianness, for example LE for PCI on PPC guest.

> +#else
> +    .endianness = DEVICE_LITTLE_ENDIAN,
> +#endif
> +};
> +
> +void memory_region_init_iommu(MemoryRegion *mr,
> +                              MemoryRegionIOMMUOps *ops,
> +                              MemoryRegion *target,
> +                              const char *name,
> +                              uint64_t size)
> +{
> +    memory_region_init(mr, name, size);
> +    mr->ops = &memory_region_iommu_ops;
> +    mr->iommu_ops = ops,
> +    mr->opaque = mr;
> +    mr->terminates = true;  /* then re-forwards */
> +    mr->destructor = memory_region_destructor_iommu;
> +    mr->iommu_target_as = g_new(AddressSpace, 1);
> +    address_space_init(mr->iommu_target_as, target);
> +}
> +
>  static uint64_t invalid_read(void *opaque, hwaddr addr,
>                               unsigned size)
>  {
> @@ -1054,6 +1155,11 @@ bool memory_region_is_rom(MemoryRegion *mr)
>      return mr->ram && mr->readonly;
>  }
>
> +bool memory_region_is_iommu(MemoryRegion *mr)
> +{
> +    return mr->iommu_ops;
> +}
> +
>  void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
>  {
>      uint8_t mask = 1 << client;
> diff --git a/memory.h b/memory.h
> index 9462bfd..47362c9 100644
> --- a/memory.h
> +++ b/memory.h
> @@ -113,12 +113,28 @@ struct MemoryRegionOps {
>      const MemoryRegionMmio old_mmio;
>  };
>
> +typedef struct IOMMUTLBEntry IOMMUTLBEntry;
> +typedef struct MemoryRegionIOMMUOps MemoryRegionIOMMUOps;
> +
> +struct IOMMUTLBEntry {
> +    hwaddr device_addr;
> +    hwaddr translated_addr;
> +    hwaddr addr_mask;  /* 0xfff = 4k translation */
> +    bool perm[2]; /* read/write permissions */

Please document that bit 1 is write and 0 read.

> +};
> +
> +struct MemoryRegionIOMMUOps {
> +    /* Returns a TLB entry that contains a given address. */
> +    IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr);

Maybe the MemoryRegion could be const to declare that the translation
does not change mappings.

> +};
> +
>  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
>  typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
>
>  struct MemoryRegion {
>      /* All fields are private - violators will be prosecuted */
>      const MemoryRegionOps *ops;
> +    const MemoryRegionIOMMUOps *iommu_ops;
>      void *opaque;
>      MemoryRegion *parent;
>      Int128 size;
> @@ -145,6 +161,7 @@ struct MemoryRegion {
>      uint8_t dirty_log_mask;
>      unsigned ioeventfd_nb;
>      MemoryRegionIoeventfd *ioeventfds;
> +    struct AddressSpace *iommu_target_as;
>  };
>
>  struct MemoryRegionPortio {
> @@ -334,6 +351,25 @@ void memory_region_init_rom_device(MemoryRegion *mr,
>  void memory_region_init_reservation(MemoryRegion *mr,
>                                      const char *name,
>                                      uint64_t size);
> +
> +/**
> + * memory_region_init_iommu: Initialize a memory region that translates addresses
> + *
> + * An IOMMU region translates addresses and forwards accesses to a target memory region.
> + *
> + * @mr: the #MemoryRegion to be initialized
> + * @ops: a function that translates addresses into the @target region
> + * @target: a #MemoryRegion that will be used to satisfy accesses to translated
> + *          addresses
> + * @name: used for debugging; not visible to the user or ABI
> + * @size: size of the region.
> + */
> +void memory_region_init_iommu(MemoryRegion *mr,
> +                              MemoryRegionIOMMUOps *ops,
> +                              MemoryRegion *target,
> +                              const char *name,
> +                              uint64_t size);
> +
>  /**
>   * memory_region_destroy: Destroy a memory region and reclaim all resources.
>   *
> @@ -373,6 +409,15 @@ static inline bool memory_region_is_romd(MemoryRegion *mr)
>  }
>
>  /**
> + * memory_region_is_ram: check whether a memory region is an iommu
> + *
> + * Returns %true is a memory region is an iommu.
> + *
> + * @mr: the memory region being queried
> + */
> +bool memory_region_is_iommu(MemoryRegion *mr);
> +
> +/**
>   * memory_region_name: get a memory region's name
>   *
>   * Returns the string that was used to initialize the memory region.
> --
> 1.7.12
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v2 4/7] memory: provide a MemoryRegion for IOMMUs to log faults
  2012-10-30 11:47 ` [Qemu-devel] [PATCH v2 4/7] memory: provide a MemoryRegion for IOMMUs to log faults Avi Kivity
@ 2012-10-30 19:14   ` Blue Swirl
  2012-10-31 10:33     ` Avi Kivity
  0 siblings, 1 reply; 19+ messages in thread
From: Blue Swirl @ 2012-10-30 19:14 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Michael S. Tsirkin, qemu-devel, Alexander Graf, Alex Williamson,
	Anthony Liguori

On Tue, Oct 30, 2012 at 11:47 AM, Avi Kivity <avi@redhat.com> wrote:
> Accesses which do not translate will hit the fault region, which
> can then log the access.

Maybe special casing the fault region should be avoided, the IOMMU
could set up suitable fault regions by itself.

>
> Signed-off-by: Avi Kivity <avi@redhat.com>
> ---
>  memory.c | 9 ++++++---
>  memory.h | 4 ++++
>  2 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/memory.c b/memory.c
> index ba2d4a0..d6b46fd 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -778,7 +778,9 @@ static void memory_region_destructor_rom_device(MemoryRegion *mr)
>  static void memory_region_destructor_iommu(MemoryRegion *mr)
>  {
>      address_space_destroy(mr->iommu_target_as);
> +    address_space_destroy(mr->iommu_fault_as);
>      g_free(mr->iommu_target_as);
> +    g_free(mr->iommu_fault_as);
>  }
>
>  static bool memory_region_wrong_endianness(MemoryRegion *mr)
> @@ -1001,9 +1003,7 @@ static void memory_region_iommu_rw(MemoryRegion *iommu, hwaddr addr,
>              xlat = (tlb.translated_addr & ~tlb.addr_mask) | (addr & tlb.addr_mask);
>              address_space_rw(iommu->iommu_target_as, xlat, buf, clen, is_write);
>          } else {
> -            if (!is_write) {
> -                memset(buf, 0xff, clen);
> -            }
> +            address_space_rw(iommu->iommu_fault_as, addr, buf, clen, is_write);
>          }
>          buf += clen;
>          addr += clen;
> @@ -1068,6 +1068,7 @@ static void memory_region_iommu_write(void *opaque, hwaddr addr,
>  void memory_region_init_iommu(MemoryRegion *mr,
>                                MemoryRegionIOMMUOps *ops,
>                                MemoryRegion *target,
> +                              MemoryRegion *fault,
>                                const char *name,
>                                uint64_t size)
>  {
> @@ -1079,6 +1080,8 @@ void memory_region_init_iommu(MemoryRegion *mr,
>      mr->destructor = memory_region_destructor_iommu;
>      mr->iommu_target_as = g_new(AddressSpace, 1);
>      address_space_init(mr->iommu_target_as, target);
> +    mr->iommu_fault_as = g_new(AddressSpace, 1);
> +    address_space_init(mr->iommu_fault_as, fault);
>  }
>
>  static uint64_t invalid_read(void *opaque, hwaddr addr,
> diff --git a/memory.h b/memory.h
> index 47362c9..6312197 100644
> --- a/memory.h
> +++ b/memory.h
> @@ -162,6 +162,7 @@ struct MemoryRegion {
>      unsigned ioeventfd_nb;
>      MemoryRegionIoeventfd *ioeventfds;
>      struct AddressSpace *iommu_target_as;
> +    struct AddressSpace *iommu_fault_as;
>  };
>
>  struct MemoryRegionPortio {
> @@ -361,12 +362,15 @@ void memory_region_init_reservation(MemoryRegion *mr,
>   * @ops: a function that translates addresses into the @target region
>   * @target: a #MemoryRegion that will be used to satisfy accesses to translated
>   *          addresses
> + * @fault: a #MemoryRegion for servicing failed translations; accessed with
> + *         untranslated addresses
>   * @name: used for debugging; not visible to the user or ABI
>   * @size: size of the region.
>   */
>  void memory_region_init_iommu(MemoryRegion *mr,
>                                MemoryRegionIOMMUOps *ops,
>                                MemoryRegion *target,
> +                              MemoryRegion *fault,
>                                const char *name,
>                                uint64_t size);
>
> --
> 1.7.12
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] i440fx: add an iommu
  2012-10-30 11:47 ` [Qemu-devel] [PATCH v2 7/7] i440fx: add an iommu Avi Kivity
@ 2012-10-30 19:18   ` Blue Swirl
  2012-10-31 10:34     ` Avi Kivity
  0 siblings, 1 reply; 19+ messages in thread
From: Blue Swirl @ 2012-10-30 19:18 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Michael S. Tsirkin, qemu-devel, Alexander Graf, Alex Williamson,
	Anthony Liguori

On Tue, Oct 30, 2012 at 11:47 AM, Avi Kivity <avi@redhat.com> wrote:
> This iommu encrypts addresses on the device bus to avoid divuling information
> to hackers equipped with bus analyzers.  Following 3DES, addresses are encrypted
> multiple times.  A XOR cypher is employed for efficiency.

If this is not useful as a test device or other purposes, I'd drop it.

>
> Signed-off-by: Avi Kivity <avi@redhat.com>
> ---
>  hw/piix_pci.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 77 insertions(+)
>
> diff --git a/hw/piix_pci.c b/hw/piix_pci.c
> index 9af5847..99601f4 100644
> --- a/hw/piix_pci.c
> +++ b/hw/piix_pci.c
> @@ -30,6 +30,7 @@
>  #include "sysbus.h"
>  #include "range.h"
>  #include "xen.h"
> +#include "exec-memory.h"
>
>  /*
>   * I440FX chipset data sheet.
> @@ -248,6 +249,81 @@ static int i440fx_initfn(PCIDevice *dev)
>      return 0;
>  }
>
> +typedef struct SillyIOMMU SillyIOMMU;
> +
> +struct SillyIOMMU {
> +    MemoryRegion fault;
> +    MemoryRegion l1;
> +    MemoryRegion l2;
> +    hwaddr mask;
> +    hwaddr secret;
> +};
> +
> +static IOMMUTLBEntry silly_l1_translate(MemoryRegion *l1, hwaddr addr)
> +{
> +    SillyIOMMU *s = container_of(l1, SillyIOMMU, l1);
> +    hwaddr xlat = addr ^ s->secret;
> +
> +    printf("l1: %" HWADDR_PRIx " -> %" HWADDR_PRIx "\n", addr, xlat);
> +
> +    return (IOMMUTLBEntry) {
> +        .device_addr = addr & ~s->mask,
> +        .translated_addr = xlat & ~s->mask,
> +        .addr_mask = s->mask,
> +        .perm = { true, true },
> +    };
> +}
> +
> +static MemoryRegionIOMMUOps silly_l1_iommu_ops = {
> +    .translate = silly_l1_translate,
> +};
> +
> +static IOMMUTLBEntry silly_l2_translate(MemoryRegion *l2, hwaddr addr)
> +{
> +    SillyIOMMU *s = container_of(l2, SillyIOMMU, l2);
> +    hwaddr xlat = addr ^ s->secret;
> +
> +    printf("l2: %" HWADDR_PRIx " -> %" HWADDR_PRIx "\n", addr, xlat);
> +
> +    return (IOMMUTLBEntry) {
> +        .device_addr = addr & ~s->mask,
> +        .translated_addr = xlat & ~s->mask,
> +        .addr_mask = s->mask,
> +        .perm = { true, true },
> +    };
> +}
> +
> +static MemoryRegionIOMMUOps silly_l2_iommu_ops = {
> +    .translate = silly_l2_translate,
> +};
> +
> +static MemoryRegion *silly_iommu_new(PCIBus *bus, void *opaque, int devfn)
> +{
> +    SillyIOMMU *s = g_new(SillyIOMMU, 1);
> +    MemoryRegion *sysmem = get_system_memory();
> +
> +    s->mask = (0x1000 << (devfn >> 3)) - 1;
> +    s->secret = ((devfn << 24) | 0x00aabbccdd) & ~s->mask;
> +    memory_region_init(&s->fault, "silly-fault", INT64_MAX);
> +    memory_region_init_iommu(&s->l2, &silly_l2_iommu_ops, sysmem, &s->fault,
> +                             "silly-l2", INT64_MAX);
> +    memory_region_init_iommu(&s->l1, &silly_l1_iommu_ops, &s->l2, &s->fault,
> +                             "silly-l1", INT64_MAX);
> +    return &s->l1;
> +}
> +
> +static void silly_iommu_del(MemoryRegion *l1)
> +{
> +    SillyIOMMU *s = container_of(l1, SillyIOMMU, l1);
> +
> +    memory_region_del_subregion(&s->l2, get_system_memory());
> +    memory_region_del_subregion(&s->l1, &s->l2);
> +    memory_region_destroy(&s->l2);
> +    memory_region_destroy(&s->l1);
> +    memory_region_destroy(&s->fault);
> +    g_free(s);
> +}
> +
>  static PCIBus *i440fx_common_init(const char *device_name,
>                                    PCII440FXState **pi440fx_state,
>                                    int *piix3_devfn,
> @@ -275,6 +351,7 @@ static PCIBus *i440fx_common_init(const char *device_name,
>      s->address_space = address_space_mem;
>      b = pci_bus_new(dev, NULL, pci_address_space,
>                      address_space_io, 0);
> +    pci_setup_iommu(b, silly_iommu_new, silly_iommu_del, NULL);
>      s->bus = b;
>      object_property_add_child(qdev_get_machine(), "i440fx", OBJECT(dev), NULL);
>      qdev_init_nofail(dev);
> --
> 1.7.12
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/7] memory: iommu support
  2012-10-30 19:11   ` Blue Swirl
@ 2012-10-30 20:03     ` Benjamin Herrenschmidt
  2012-10-30 21:13       ` Blue Swirl
  2012-10-31 10:32     ` Avi Kivity
  1 sibling, 1 reply; 19+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-30 20:03 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Michael S. Tsirkin, Alexander Graf, qemu-devel, Alex Williamson,
	Avi Kivity, Anthony Liguori

On Tue, 2012-10-30 at 19:11 +0000, Blue Swirl wrote:

> Why couple this with host endianness? I'd expect IOMMU to operate at
> target bus endianness, for example LE for PCI on PPC guest.

I'm not sure about putting the iommu "in charge" of endianness ...

On one hand it's fishy. It should be 'transparent', the device is what
controls its own endianness and playing games at the iommu level is
going to result into tears... on the other hand I can see the appeal of
not bothering at the device level and letting the iommu do it for you...
but I still think it's risky.

Besides not all PCI devices are little endian :-) Also that won't deal
with map/unmap etc...

So I'd vote for sanity here and make the iommu not affect endianness in
any way and let the devices do the "right thing" as is the expectation
today.

Ben.

> > +#else
> > +    .endianness = DEVICE_LITTLE_ENDIAN,
> > +#endif
> > +};
> > +
> > +void memory_region_init_iommu(MemoryRegion *mr,
> > +                              MemoryRegionIOMMUOps *ops,
> > +                              MemoryRegion *target,
> > +                              const char *name,
> > +                              uint64_t size)
> > +{
> > +    memory_region_init(mr, name, size);
> > +    mr->ops = &memory_region_iommu_ops;
> > +    mr->iommu_ops = ops,
> > +    mr->opaque = mr;
> > +    mr->terminates = true;  /* then re-forwards */
> > +    mr->destructor = memory_region_destructor_iommu;
> > +    mr->iommu_target_as = g_new(AddressSpace, 1);
> > +    address_space_init(mr->iommu_target_as, target);
> > +}
> > +
> >  static uint64_t invalid_read(void *opaque, hwaddr addr,
> >                               unsigned size)
> >  {
> > @@ -1054,6 +1155,11 @@ bool memory_region_is_rom(MemoryRegion *mr)
> >      return mr->ram && mr->readonly;
> >  }
> >
> > +bool memory_region_is_iommu(MemoryRegion *mr)
> > +{
> > +    return mr->iommu_ops;
> > +}
> > +
> >  void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
> >  {
> >      uint8_t mask = 1 << client;
> > diff --git a/memory.h b/memory.h
> > index 9462bfd..47362c9 100644
> > --- a/memory.h
> > +++ b/memory.h
> > @@ -113,12 +113,28 @@ struct MemoryRegionOps {
> >      const MemoryRegionMmio old_mmio;
> >  };
> >
> > +typedef struct IOMMUTLBEntry IOMMUTLBEntry;
> > +typedef struct MemoryRegionIOMMUOps MemoryRegionIOMMUOps;
> > +
> > +struct IOMMUTLBEntry {
> > +    hwaddr device_addr;
> > +    hwaddr translated_addr;
> > +    hwaddr addr_mask;  /* 0xfff = 4k translation */
> > +    bool perm[2]; /* read/write permissions */
> 
> Please document that bit 1 is write and 0 read.
> 
> > +};
> > +
> > +struct MemoryRegionIOMMUOps {
> > +    /* Returns a TLB entry that contains a given address. */
> > +    IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr);
> 
> Maybe the MemoryRegion could be const to declare that the translation
> does not change mappings.
> 
> > +};
> > +
> >  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> >  typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
> >
> >  struct MemoryRegion {
> >      /* All fields are private - violators will be prosecuted */
> >      const MemoryRegionOps *ops;
> > +    const MemoryRegionIOMMUOps *iommu_ops;
> >      void *opaque;
> >      MemoryRegion *parent;
> >      Int128 size;
> > @@ -145,6 +161,7 @@ struct MemoryRegion {
> >      uint8_t dirty_log_mask;
> >      unsigned ioeventfd_nb;
> >      MemoryRegionIoeventfd *ioeventfds;
> > +    struct AddressSpace *iommu_target_as;
> >  };
> >
> >  struct MemoryRegionPortio {
> > @@ -334,6 +351,25 @@ void memory_region_init_rom_device(MemoryRegion *mr,
> >  void memory_region_init_reservation(MemoryRegion *mr,
> >                                      const char *name,
> >                                      uint64_t size);
> > +
> > +/**
> > + * memory_region_init_iommu: Initialize a memory region that translates addresses
> > + *
> > + * An IOMMU region translates addresses and forwards accesses to a target memory region.
> > + *
> > + * @mr: the #MemoryRegion to be initialized
> > + * @ops: a function that translates addresses into the @target region
> > + * @target: a #MemoryRegion that will be used to satisfy accesses to translated
> > + *          addresses
> > + * @name: used for debugging; not visible to the user or ABI
> > + * @size: size of the region.
> > + */
> > +void memory_region_init_iommu(MemoryRegion *mr,
> > +                              MemoryRegionIOMMUOps *ops,
> > +                              MemoryRegion *target,
> > +                              const char *name,
> > +                              uint64_t size);
> > +
> >  /**
> >   * memory_region_destroy: Destroy a memory region and reclaim all resources.
> >   *
> > @@ -373,6 +409,15 @@ static inline bool memory_region_is_romd(MemoryRegion *mr)
> >  }
> >
> >  /**
> > + * memory_region_is_ram: check whether a memory region is an iommu
> > + *
> > + * Returns %true is a memory region is an iommu.
> > + *
> > + * @mr: the memory region being queried
> > + */
> > +bool memory_region_is_iommu(MemoryRegion *mr);
> > +
> > +/**
> >   * memory_region_name: get a memory region's name
> >   *
> >   * Returns the string that was used to initialize the memory region.
> > --
> > 1.7.12
> >

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/7] memory: iommu support
  2012-10-30 20:03     ` Benjamin Herrenschmidt
@ 2012-10-30 21:13       ` Blue Swirl
  0 siblings, 0 replies; 19+ messages in thread
From: Blue Swirl @ 2012-10-30 21:13 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Michael S. Tsirkin, Alexander Graf, qemu-devel, Alex Williamson,
	Avi Kivity, Anthony Liguori

On Tue, Oct 30, 2012 at 8:03 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Tue, 2012-10-30 at 19:11 +0000, Blue Swirl wrote:
>
>> Why couple this with host endianness? I'd expect IOMMU to operate at
>> target bus endianness, for example LE for PCI on PPC guest.
>
> I'm not sure about putting the iommu "in charge" of endianness ...
>
> On one hand it's fishy. It should be 'transparent', the device is what
> controls its own endianness and playing games at the iommu level is
> going to result into tears... on the other hand I can see the appeal of
> not bothering at the device level and letting the iommu do it for you...
> but I still think it's risky.
>
> Besides not all PCI devices are little endian :-) Also that won't deal
> with map/unmap etc...
>
> So I'd vote for sanity here and make the iommu not affect endianness in
> any way and let the devices do the "right thing" as is the expectation
> today.

Yes. Does the IOMMU even need to touch the data bus at all, because it
only translates address bus bits (ignoring caches/TLBs)?

>
> Ben.
>
>> > +#else
>> > +    .endianness = DEVICE_LITTLE_ENDIAN,
>> > +#endif
>> > +};
>> > +
>> > +void memory_region_init_iommu(MemoryRegion *mr,
>> > +                              MemoryRegionIOMMUOps *ops,
>> > +                              MemoryRegion *target,
>> > +                              const char *name,
>> > +                              uint64_t size)
>> > +{
>> > +    memory_region_init(mr, name, size);
>> > +    mr->ops = &memory_region_iommu_ops;
>> > +    mr->iommu_ops = ops,
>> > +    mr->opaque = mr;
>> > +    mr->terminates = true;  /* then re-forwards */
>> > +    mr->destructor = memory_region_destructor_iommu;
>> > +    mr->iommu_target_as = g_new(AddressSpace, 1);
>> > +    address_space_init(mr->iommu_target_as, target);
>> > +}
>> > +
>> >  static uint64_t invalid_read(void *opaque, hwaddr addr,
>> >                               unsigned size)
>> >  {
>> > @@ -1054,6 +1155,11 @@ bool memory_region_is_rom(MemoryRegion *mr)
>> >      return mr->ram && mr->readonly;
>> >  }
>> >
>> > +bool memory_region_is_iommu(MemoryRegion *mr)
>> > +{
>> > +    return mr->iommu_ops;
>> > +}
>> > +
>> >  void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
>> >  {
>> >      uint8_t mask = 1 << client;
>> > diff --git a/memory.h b/memory.h
>> > index 9462bfd..47362c9 100644
>> > --- a/memory.h
>> > +++ b/memory.h
>> > @@ -113,12 +113,28 @@ struct MemoryRegionOps {
>> >      const MemoryRegionMmio old_mmio;
>> >  };
>> >
>> > +typedef struct IOMMUTLBEntry IOMMUTLBEntry;
>> > +typedef struct MemoryRegionIOMMUOps MemoryRegionIOMMUOps;
>> > +
>> > +struct IOMMUTLBEntry {
>> > +    hwaddr device_addr;
>> > +    hwaddr translated_addr;
>> > +    hwaddr addr_mask;  /* 0xfff = 4k translation */
>> > +    bool perm[2]; /* read/write permissions */
>>
>> Please document that bit 1 is write and 0 read.
>>
>> > +};
>> > +
>> > +struct MemoryRegionIOMMUOps {
>> > +    /* Returns a TLB entry that contains a given address. */
>> > +    IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr);
>>
>> Maybe the MemoryRegion could be const to declare that the translation
>> does not change mappings.
>>
>> > +};
>> > +
>> >  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
>> >  typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
>> >
>> >  struct MemoryRegion {
>> >      /* All fields are private - violators will be prosecuted */
>> >      const MemoryRegionOps *ops;
>> > +    const MemoryRegionIOMMUOps *iommu_ops;
>> >      void *opaque;
>> >      MemoryRegion *parent;
>> >      Int128 size;
>> > @@ -145,6 +161,7 @@ struct MemoryRegion {
>> >      uint8_t dirty_log_mask;
>> >      unsigned ioeventfd_nb;
>> >      MemoryRegionIoeventfd *ioeventfds;
>> > +    struct AddressSpace *iommu_target_as;
>> >  };
>> >
>> >  struct MemoryRegionPortio {
>> > @@ -334,6 +351,25 @@ void memory_region_init_rom_device(MemoryRegion *mr,
>> >  void memory_region_init_reservation(MemoryRegion *mr,
>> >                                      const char *name,
>> >                                      uint64_t size);
>> > +
>> > +/**
>> > + * memory_region_init_iommu: Initialize a memory region that translates addresses
>> > + *
>> > + * An IOMMU region translates addresses and forwards accesses to a target memory region.
>> > + *
>> > + * @mr: the #MemoryRegion to be initialized
>> > + * @ops: a function that translates addresses into the @target region
>> > + * @target: a #MemoryRegion that will be used to satisfy accesses to translated
>> > + *          addresses
>> > + * @name: used for debugging; not visible to the user or ABI
>> > + * @size: size of the region.
>> > + */
>> > +void memory_region_init_iommu(MemoryRegion *mr,
>> > +                              MemoryRegionIOMMUOps *ops,
>> > +                              MemoryRegion *target,
>> > +                              const char *name,
>> > +                              uint64_t size);
>> > +
>> >  /**
>> >   * memory_region_destroy: Destroy a memory region and reclaim all resources.
>> >   *
>> > @@ -373,6 +409,15 @@ static inline bool memory_region_is_romd(MemoryRegion *mr)
>> >  }
>> >
>> >  /**
>> > + * memory_region_is_ram: check whether a memory region is an iommu
>> > + *
>> > + * Returns %true is a memory region is an iommu.
>> > + *
>> > + * @mr: the memory region being queried
>> > + */
>> > +bool memory_region_is_iommu(MemoryRegion *mr);
>> > +
>> > +/**
>> >   * memory_region_name: get a memory region's name
>> >   *
>> >   * Returns the string that was used to initialize the memory region.
>> > --
>> > 1.7.12
>> >
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/7] memory: iommu support
  2012-10-30 19:11   ` Blue Swirl
  2012-10-30 20:03     ` Benjamin Herrenschmidt
@ 2012-10-31 10:32     ` Avi Kivity
  2012-10-31 18:59       ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2012-10-31 10:32 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Michael S. Tsirkin, qemu-devel, Alexander Graf, Alex Williamson,
	Anthony Liguori

On 10/30/2012 09:11 PM, Blue Swirl wrote:
> On Tue, Oct 30, 2012 at 11:47 AM, Avi Kivity <avi@redhat.com> wrote:
>> Add a new memory region type that translates addresses it is given,
>> then forwards them to a target address space.  This is similar to
>> an alias, except that the mapping is more flexible than a linear
>> translation and trucation, and also less efficient since the
>> translation happens at runtime.
>>
>> The implementation uses an AddressSpace mapping the target region to
>> avoid hierarchical dispatch all the way to the resolved region; only
>> iommu regions are looked up dynamically.
>>
>> +
>> +static MemoryRegionOps memory_region_iommu_ops = {
>> +    .read = memory_region_iommu_read,
>> +    .write = memory_region_iommu_write,
>> +#ifdef HOST_BIGENDIAN
>> +    .endianness = DEVICE_BIG_ENDIAN,
> 
> Why couple this with host endianness? I'd expect IOMMU to operate at
> target bus endianness, for example LE for PCI on PPC guest.

This has nothing to do with device endianness; we're translating from a
byte buffer (address_space_rw()) to a uint64_t
(MemoryRegionOps::read/write()) so we need to take host endianess into
account.

This code cleverly makes use of memory core endian support to do the
conversion for us.  But it's probably too clever and should be replaced
by an explicit call to a helper.



-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v2 4/7] memory: provide a MemoryRegion for IOMMUs to log faults
  2012-10-30 19:14   ` Blue Swirl
@ 2012-10-31 10:33     ` Avi Kivity
  0 siblings, 0 replies; 19+ messages in thread
From: Avi Kivity @ 2012-10-31 10:33 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Michael S. Tsirkin, qemu-devel, Alexander Graf, Alex Williamson,
	Anthony Liguori

On 10/30/2012 09:14 PM, Blue Swirl wrote:
> On Tue, Oct 30, 2012 at 11:47 AM, Avi Kivity <avi@redhat.com> wrote:
>> Accesses which do not translate will hit the fault region, which
>> can then log the access.
> 
> Maybe special casing the fault region should be avoided, the IOMMU
> could set up suitable fault regions by itself.

How can it?  Please elaborate.

I'm not too pleased with using a MemoryRegion for faults, perhaps a
MemoryRegionIOMMUOps::fault() callback would be better.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v2 7/7] i440fx: add an iommu
  2012-10-30 19:18   ` Blue Swirl
@ 2012-10-31 10:34     ` Avi Kivity
  0 siblings, 0 replies; 19+ messages in thread
From: Avi Kivity @ 2012-10-31 10:34 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Michael S. Tsirkin, qemu-devel, Alexander Graf, Alex Williamson,
	Anthony Liguori

On 10/30/2012 09:18 PM, Blue Swirl wrote:
> On Tue, Oct 30, 2012 at 11:47 AM, Avi Kivity <avi@redhat.com> wrote:
>> This iommu encrypts addresses on the device bus to avoid divuling information
>> to hackers equipped with bus analyzers.  Following 3DES, addresses are encrypted
>> multiple times.  A XOR cypher is employed for efficiency.
> 
> If this is not useful as a test device or other purposes, I'd drop it.

It's just for demonstrating the API.  I won't propose it for merging.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/7] memory: iommu support
  2012-10-31 10:32     ` Avi Kivity
@ 2012-10-31 18:59       ` Benjamin Herrenschmidt
  2012-11-01 13:44         ` Avi Kivity
  0 siblings, 1 reply; 19+ messages in thread
From: Benjamin Herrenschmidt @ 2012-10-31 18:59 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Michael S. Tsirkin, qemu-devel, Alexander Graf, Blue Swirl,
	Alex Williamson, Anthony Liguori

On Wed, 2012-10-31 at 12:32 +0200, Avi Kivity wrote:
> This has nothing to do with device endianness; we're translating from a
> byte buffer (address_space_rw()) to a uint64_t
> (MemoryRegionOps::read/write()) so we need to take host endianess into
> account.
> 
> This code cleverly makes use of memory core endian support to do the
> conversion for us.  But it's probably too clever and should be replaced
> by an explicit call to a helper.

Well, the whole idea of providing sized-type accessors (u8/u16/...) is
very fishy for DMA. I understand it comes from the MemoryRegion ops, and
It made somewhat sense in CPU to device (though even then endianness had
always gotten wrong) but for DMA it's going to be a can of worms.

Taking "host endianness" into account means nothing if you don't define
what endianness those are expected to use to write to memory. If you add
yet another case of "guest endianness" in the mix, then you make yet
another mistake akin to the virtio trainwreck since things like powerpc
or ARM can operate in different endianness.

The best thing to do is to forbid use of those functions :-) And
basically mandate the use of endian explicit accessor that specifically
indicate what endianness the value should have once written in target
memory. The most sensible expectation when using the "raw" Memory ops is
to have the value go "as is" ie in host endianness.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/7] memory: iommu support
  2012-10-31 18:59       ` Benjamin Herrenschmidt
@ 2012-11-01 13:44         ` Avi Kivity
  2012-11-01 13:45           ` Avi Kivity
  0 siblings, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2012-11-01 13:44 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Michael S. Tsirkin, qemu-devel, Alexander Graf, Blue Swirl,
	Alex Williamson, Anthony Liguori

On 10/31/2012 08:59 PM, Benjamin Herrenschmidt wrote:
> On Wed, 2012-10-31 at 12:32 +0200, Avi Kivity wrote:
>> This has nothing to do with device endianness; we're translating from a
>> byte buffer (address_space_rw()) to a uint64_t
>> (MemoryRegionOps::read/write()) so we need to take host endianess into
>> account.
>> 
>> This code cleverly makes use of memory core endian support to do the
>> conversion for us.  But it's probably too clever and should be replaced
>> by an explicit call to a helper.
> 
> Well, the whole idea of providing sized-type accessors (u8/u16/...) is
> very fishy for DMA. I understand it comes from the MemoryRegion ops, and
> It made somewhat sense in CPU to device (though even then endianness had
> always gotten wrong) but for DMA it's going to be a can of worms.

Endianness here has no effect on the result.  An address_space_rw()
causes a lookup of a memory region, which happens to be an iommu memory
region.  Because of the way MemoryRegionOps are done, it is converted to
a 1/2/4/8 accessor, and then converted immediately back to a byte array.
 As long as we're consistent there's no change to the data path.

However we do have a problem with non-1/2/4/8 byte writes.  Right now
any mismatched access ends up as an 8 byte write, we need an extra
accessor for arbitrary writes, or rather better use of the .impl members
of MemoryRegionOps.

> 
> Taking "host endianness" into account means nothing if you don't define
> what endianness those are expected to use to write to memory. If you add
> yet another case of "guest endianness" in the mix, then you make yet
> another mistake akin to the virtio trainwreck since things like powerpc
> or ARM can operate in different endianness.
> 
> The best thing to do is to forbid use of those functions :-) And
> basically mandate the use of endian explicit accessor that specifically
> indicate what endianness the value should have once written in target
> memory. The most sensible expectation when using the "raw" Memory ops is
> to have the value go "as is" ie in host endianness.
-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/7] memory: iommu support
  2012-11-01 13:44         ` Avi Kivity
@ 2012-11-01 13:45           ` Avi Kivity
  0 siblings, 0 replies; 19+ messages in thread
From: Avi Kivity @ 2012-11-01 13:45 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Michael S. Tsirkin, qemu-devel, Alexander Graf, Blue Swirl,
	Alex Williamson, Anthony Liguori

On 11/01/2012 03:44 PM, Avi Kivity wrote:
> 
> However we do have a problem with non-1/2/4/8 byte writes.  Right now
> any mismatched access ends up as an 8 byte write, we need an extra
> accessor for arbitrary writes, or rather better use of the .impl members
> of MemoryRegionOps.

Sorry, it's converted into a series of 8-bit writes.  So the code is
correct now, if inefficient.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2012-11-01 13:45 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-30 11:47 [Qemu-devel] [PATCH v2 0/7] IOMMU support Avi Kivity
2012-10-30 11:47 ` [Qemu-devel] [PATCH v2 1/7] memory: fix address space initialization/destruction Avi Kivity
2012-10-30 11:47 ` [Qemu-devel] [PATCH v2 2/7] memory: limit sections in the radix tree to the actual address space size Avi Kivity
2012-10-30 11:47 ` [Qemu-devel] [PATCH v2 3/7] memory: iommu support Avi Kivity
2012-10-30 19:11   ` Blue Swirl
2012-10-30 20:03     ` Benjamin Herrenschmidt
2012-10-30 21:13       ` Blue Swirl
2012-10-31 10:32     ` Avi Kivity
2012-10-31 18:59       ` Benjamin Herrenschmidt
2012-11-01 13:44         ` Avi Kivity
2012-11-01 13:45           ` Avi Kivity
2012-10-30 11:47 ` [Qemu-devel] [PATCH v2 4/7] memory: provide a MemoryRegion for IOMMUs to log faults Avi Kivity
2012-10-30 19:14   ` Blue Swirl
2012-10-31 10:33     ` Avi Kivity
2012-10-30 11:47 ` [Qemu-devel] [PATCH v2 5/7] pci: use memory core for iommu support Avi Kivity
2012-10-30 11:47 ` [Qemu-devel] [PATCH v2 6/7] vfio: abort if an emulated iommu is used Avi Kivity
2012-10-30 11:47 ` [Qemu-devel] [PATCH v2 7/7] i440fx: add an iommu Avi Kivity
2012-10-30 19:18   ` Blue Swirl
2012-10-31 10:34     ` Avi Kivity

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.