* [Qemu-devel] [PATCH 0/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation to q35 chipset
@ 2014-07-22 15:47 Le Tan
  2014-07-22 15:47 ` [Qemu-devel] [PATCH 1/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation Le Tan
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Le Tan @ 2014-07-22 15:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Knut Omang, Le Tan, Alex Williamson,
	Jan Kiszka, Anthony Liguori, Paolo Bonzini

Hi,

These patches introduce Intel IOMMU (VT-d) emulation to the q35 chipset. The
bulk of the work is adding support for emulating the Intel IOMMU according to
the VT-d specification, including basic responses to CSR accesses, the logic
of DMAR (DMA remapping), and DMA memory address translation.

Features implemented so far:
1. Responses to important CSR accesses;
2. DMAR (DMA remapping) without PASID support;
3. Register-based invalidation for the IOTLB and the context cache;
4. A DMAR table added to the ACPI tables to expose VT-d to the BIOS;
5. A "-machine vtd=on|off" option to enable/disable VT-d;
6. A single DMAR unit covering all devices of PCI Segment 0.

Testing:
1. An L1 Linux guest with intel_iommu=on can interact with VT-d and boots
smoothly, and I can see VT-d information in the kernel log;
2. Running L1 with VT-d, an L2 Linux guest boots smoothly without PCI device
passthrough;
3. Running L1 with VT-d and "-soundhw ac97" (QEMU_AUDIO_DRV=none), then
assigning the sound card to L2, L2 boots smoothly with legacy PCI assignment;
4. The Jailhouse hypervisor seems to run smoothly for now (tested by Jan).
5. Running L1 with VT-d and an e1000 network card, then assigning the e1000 to
L2, L2 gets STUCK while booting. This remains unsolved. As far as I can tell,
L2 crashes in e1000_probe(). The QEMU of L1 dumps "KVM: entry failed, hardware
error 0x0", and the KVM of the host prints "nested_vmx_exit_handled failed vm
entry 7". Unlike the sound card case, after the e1000 is assigned to L2 there
is no translation entry for it through VT-d, which I think means the e1000
does not issue any DMA access during the boot of L2. Sometimes the kernel of
L2 also prints "divide error" during boot. Can someone help me with this? Any
help is appreciated! :)
6. VFIO has not been tested yet; I will do that soon.

I have a few questions to ask here:
1. Right now struct IntelIOMMUState is a member of MCHPCIState. VT-d is
registered as TYPE_SYS_BUS_DEVICE but registers its configuration MemoryRegion
as a subregion of mch->pci_address_space. Is this correct? Another thought that
comes to mind is using sysbus_mmio_map() to map the MemoryRegion of VT-d, but I
am not sure. And maybe there are more improper uses of the QOM.
2. For a pointer-to-pointer declaration like VTDAddressSpace **address_spaces,
checkpatch.pl warns "ERROR: need consistent spacing around '*' (ctx:WxO)".
Is checkpatch.pl wrong? If so, what is the proper declaration?

TODO:
1. Fix the bug of legacy PCI assignment;
2. Test VFIO;
3. Queued Invalidation;
4. Basic fault reporting;
5. Caching properties of the IOTLB;
6. Clean up the migration-related code.

Thanks very much!

Git trees:
https://github.com/tamlok/qemu/commits/q35-iommu-v2

Le Tan (3):
  intel-iommu: introduce Intel IOMMU (VT-d) emulation
  intel-iommu: add DMAR table to ACPI tables
  intel-iommu: add Intel IOMMU emulation to q35 and add a machine option
    "vtd" as a switch

 hw/core/machine.c             |   27 +-
 hw/i386/Makefile.objs         |    1 +
 hw/i386/acpi-build.c          |   41 ++
 hw/i386/acpi-defs.h           |   70 +++
 hw/i386/intel_iommu.c         | 1139 +++++++++++++++++++++++++++++++++++++++++
 hw/pci-host/q35.c             |   72 ++-
 include/hw/boards.h           |    1 +
 include/hw/i386/intel_iommu.h |  350 +++++++++++++
 include/hw/pci-host/q35.h     |    2 +
 qemu-options.hx               |    5 +-
 vl.c                          |    4 +
 11 files changed, 1703 insertions(+), 9 deletions(-)
 create mode 100644 hw/i386/intel_iommu.c
 create mode 100644 include/hw/i386/intel_iommu.h

-- 
1.9.1


* [Qemu-devel] [PATCH 1/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation
  2014-07-22 15:47 [Qemu-devel] [PATCH 0/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation to q35 chipset Le Tan
@ 2014-07-22 15:47 ` Le Tan
  2014-07-22 20:05   ` Michael S. Tsirkin
                     ` (2 more replies)
  2014-07-22 15:47 ` [Qemu-devel] [PATCH 2/3] intel-iommu: add DMAR table to ACPI tables Le Tan
  2014-07-22 15:47 ` [Qemu-devel] [PATCH 3/3] intel-iommu: add Intel IOMMU emulation to q35 and add a machine option "vtd" as a switch Le Tan
  2 siblings, 3 replies; 12+ messages in thread
From: Le Tan @ 2014-07-22 15:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Knut Omang, Le Tan, Alex Williamson,
	Jan Kiszka, Anthony Liguori, Paolo Bonzini

Add support for emulating the Intel IOMMU according to the VT-d specification
for the q35 machine. Implement the logic for DMAR (DMA remapping) without
PASID support. Use register-based invalidation for context-cache invalidation
and IOTLB invalidation.
Basic fault reporting and caching are not implemented yet.

Signed-off-by: Le Tan <tamlokveer@gmail.com>
---
 hw/i386/Makefile.objs         |    1 +
 hw/i386/intel_iommu.c         | 1139 +++++++++++++++++++++++++++++++++++++++++
 include/hw/i386/intel_iommu.h |  350 +++++++++++++
 3 files changed, 1490 insertions(+)
 create mode 100644 hw/i386/intel_iommu.c
 create mode 100644 include/hw/i386/intel_iommu.h

diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index 48014ab..6936111 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -2,6 +2,7 @@ obj-$(CONFIG_KVM) += kvm/
 obj-y += multiboot.o smbios.o
 obj-y += pc.o pc_piix.o pc_q35.o
 obj-y += pc_sysfw.o
+obj-y += intel_iommu.o
 obj-$(CONFIG_XEN) += ../xenpv/ xen/
 
 obj-y += kvmvapic.o
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
new file mode 100644
index 0000000..3ba0e1e
--- /dev/null
+++ b/hw/i386/intel_iommu.c
@@ -0,0 +1,1139 @@
+/*
+ * QEMU emulation of an Intel IOMMU (VT-d)
+ *   (DMA Remapping device)
+ *
+ * Copyright (c) 2013 Knut Omang, Oracle <knut.omang@oracle.com>
+ * Copyright (C) 2014 Le Tan, <tamlokveer@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include "hw/sysbus.h"
+#include "exec/address-spaces.h"
+#include "hw/i386/intel_iommu.h"
+
+/* #define DEBUG_INTEL_IOMMU */
+#ifdef DEBUG_INTEL_IOMMU
+#define D(fmt, ...) \
+    do { fprintf(stderr, "(vtd)%s: " fmt "\n", __func__, \
+                 ## __VA_ARGS__); } while (0)
+#else
+#define D(fmt, ...) \
+    do { } while (0)
+#endif
+
+
+static inline void define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
+                        uint64_t wmask, uint64_t w1cmask)
+{
+    *((uint64_t *)&s->csr[addr]) = val;
+    *((uint64_t *)&s->wmask[addr]) = wmask;
+    *((uint64_t *)&s->w1cmask[addr]) = w1cmask;
+}
+
+static inline void define_quad_wo(IntelIOMMUState *s, hwaddr addr,
+                                  uint64_t mask)
+{
+    *((uint64_t *)&s->womask[addr]) = mask;
+}
+
+static inline void define_long(IntelIOMMUState *s, hwaddr addr, uint32_t val,
+                        uint32_t wmask, uint32_t w1cmask)
+{
+    *((uint32_t *)&s->csr[addr]) = val;
+    *((uint32_t *)&s->wmask[addr]) = wmask;
+    *((uint32_t *)&s->w1cmask[addr]) = w1cmask;
+}
+
+static inline void define_long_wo(IntelIOMMUState *s, hwaddr addr,
+                                  uint32_t mask)
+{
+    *((uint32_t *)&s->womask[addr]) = mask;
+}
+
+/* "External" get/set operations */
+static inline void set_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val)
+{
+    uint64_t oldval = *((uint64_t *)&s->csr[addr]);
+    uint64_t wmask = *((uint64_t *)&s->wmask[addr]);
+    uint64_t w1cmask = *((uint64_t *)&s->w1cmask[addr]);
+    *((uint64_t *)&s->csr[addr]) =
+        ((oldval & ~wmask) | (val & wmask)) & ~(w1cmask & val);
+}
+
+static inline void set_long(IntelIOMMUState *s, hwaddr addr, uint32_t val)
+{
+    uint32_t oldval = *((uint32_t *)&s->csr[addr]);
+    uint32_t wmask = *((uint32_t *)&s->wmask[addr]);
+    uint32_t w1cmask = *((uint32_t *)&s->w1cmask[addr]);
+    *((uint32_t *)&s->csr[addr]) =
+        ((oldval & ~wmask) | (val & wmask)) & ~(w1cmask & val);
+}
+
+static inline uint64_t get_quad(IntelIOMMUState *s, hwaddr addr)
+{
+    uint64_t val = *((uint64_t *)&s->csr[addr]);
+    uint64_t womask = *((uint64_t *)&s->womask[addr]);
+    return val & ~womask;
+}
+
+
+static inline uint32_t get_long(IntelIOMMUState *s, hwaddr addr)
+{
+    uint32_t val = *((uint32_t *)&s->csr[addr]);
+    uint32_t womask = *((uint32_t *)&s->womask[addr]);
+    return val & ~womask;
+}
+
+
+
+/* "Internal" get/set operations */
+static inline uint64_t __get_quad(IntelIOMMUState *s, hwaddr addr)
+{
+    return *((uint64_t *)&s->csr[addr]);
+}
+
+static inline uint32_t __get_long(IntelIOMMUState *s, hwaddr addr)
+{
+    return *((uint32_t *)&s->csr[addr]);
+}
+
+
+/* val = (val & ~clear) | mask */
+static inline uint32_t set_mask_long(IntelIOMMUState *s, hwaddr addr,
+                                     uint32_t clear, uint32_t mask)
+{
+    uint32_t *ptr = (uint32_t *)&s->csr[addr];
+    uint32_t val = (*ptr & ~clear) | mask;
+    *ptr = val;
+    return val;
+}
+
+/* val = (val & ~clear) | mask */
+static inline uint64_t set_mask_quad(IntelIOMMUState *s, hwaddr addr,
+                                     uint64_t clear, uint64_t mask)
+{
+    uint64_t *ptr = (uint64_t *)&s->csr[addr];
+    uint64_t val = (*ptr & ~clear) | mask;
+    *ptr = val;
+    return val;
+}
+
+
+
+
+
+static inline bool root_entry_present(vtd_root_entry *root)
+{
+    return root->val & ROOT_ENTRY_P;
+}
+
+
+static bool get_root_entry(IntelIOMMUState *s, int index, vtd_root_entry *re)
+{
+    dma_addr_t addr;
+
+    assert(index >= 0 && index < ROOT_ENTRY_NR);
+
+    addr = s->root + index * sizeof(*re);
+    if (dma_memory_read(&address_space_memory, addr, re, sizeof(*re))) {
+        fprintf(stderr, "(vtd)error: fail to read root table\n");
+        return false;
+    }
+    re->val = le64_to_cpu(re->val);
+    return true;
+}
+
+
+static inline bool context_entry_present(vtd_context_entry *context)
+{
+    return context->lo & CONTEXT_ENTRY_P;
+}
+
+static bool get_context_entry_from_root(vtd_root_entry *root, int index,
+                                        vtd_context_entry *ce)
+{
+    dma_addr_t addr;
+
+    if (!root_entry_present(root)) {
+        ce->lo = 0;
+        ce->hi = 0;
+        return false;
+    }
+
+    assert(index >= 0 && index < CONTEXT_ENTRY_NR);
+
+    addr = (root->val & ROOT_ENTRY_CTP) + index * sizeof(*ce);
+    if (dma_memory_read(&address_space_memory, addr, ce, sizeof(*ce))) {
+        fprintf(stderr, "(vtd)error: fail to read context table\n");
+        return false;
+    }
+    ce->lo = le64_to_cpu(ce->lo);
+    ce->hi = le64_to_cpu(ce->hi);
+    return true;
+}
+
+static inline dma_addr_t get_slpt_base_from_context(vtd_context_entry *ce)
+{
+    return ce->lo & CONTEXT_ENTRY_SLPTPTR;
+}
+
+
+/* The shift of an addr for a certain level of paging structure */
+static inline int slpt_level_shift(int level)
+{
+    return VTD_PAGE_SHIFT_4K + (level - 1) * SL_LEVEL_BITS;
+}
+
+static inline bool slpte_present(uint64_t slpte)
+{
+    return slpte & 3;
+}
+
+/* Calculate the GPA given the base address, the index in the page table and
+ * the level of this page table.
+ */
+static inline uint64_t get_slpt_gpa(uint64_t addr, int index, int level)
+{
+    return addr + (((uint64_t)index) << slpt_level_shift(level));
+}
+
+static inline uint64_t get_slpte_addr(uint64_t slpte)
+{
+    return slpte & SL_PT_BASE_ADDR_MASK;
+}
+
+/* Whether the pte points to a large page */
+static inline bool is_large_pte(uint64_t pte)
+{
+    return pte & SL_PT_PAGE_SIZE_MASK;
+}
+
+/* Whether the pte indicates the address of the page frame */
+static inline bool is_last_slpte(uint64_t slpte, int level)
+{
+    if (level == SL_PT_LEVEL) {
+        return true;
+    }
+    if (is_large_pte(slpte)) {
+        return true;
+    }
+    return false;
+}
+
+/* Get the content of an slpte located at @base_addr[@index] */
+static inline uint64_t get_slpte(dma_addr_t base_addr, int index)
+{
+    uint64_t slpte;
+
+    assert(index >= 0 && index < SL_PT_ENTRY_NR);
+
+    if (dma_memory_read(&address_space_memory,
+                        base_addr + index * sizeof(slpte), &slpte,
+                        sizeof(slpte))) {
+        fprintf(stderr, "(vtd)error: fail to read slpte\n");
+        return (uint64_t)-1;
+    }
+    slpte = le64_to_cpu(slpte);
+    return slpte;
+}
+
+#if 0
+static inline void print_slpt(dma_addr_t base_addr)
+{
+    int i;
+    uint64_t slpte;
+
+    D("slpt at addr 0x%"PRIx64 "===========", base_addr);
+    for (i = 0; i < SL_PT_ENTRY_NR; i++) {
+        slpte = get_slpte(base_addr, i);
+        D("slpte #%d 0x%"PRIx64, i, slpte);
+    }
+}
+
+static void print_root_table(IntelIOMMUState *s)
+{
+    int i;
+    vtd_root_entry re;
+    D("root-table=====================");
+    for (i = 0; i < ROOT_ENTRY_NR; ++i) {
+        get_root_entry(s, i, &re);
+        if (root_entry_present(&re)) {
+            D("root-entry #%d hi 0x%"PRIx64 " low 0x%"PRIx64, i, re.rsvd,
+              re.val);
+        }
+    }
+}
+
+static void print_context_table(vtd_root_entry *re)
+{
+    int i;
+    vtd_context_entry ce;
+    D("context-table==================");
+    for (i = 0; i < CONTEXT_ENTRY_NR; ++i) {
+        get_context_entry_from_root(re, i, &ce);
+        if (context_entry_present(&ce)) {
+            D("context-entry #%d hi 0x%"PRIx64 " low 0x%"PRIx64, i, ce.hi,
+              ce.lo);
+        }
+    }
+}
+#endif
+
+/* Given a gpa and the level of the paging structure, return the offset
+ * within the current level.
+ */
+static inline int gpa_level_offset(uint64_t gpa, int level)
+{
+    return (gpa >> slpt_level_shift(level)) & ((1ULL << SL_LEVEL_BITS) - 1);
+}
+
+/* Get the page-table level that hardware should use for the second-level
+ * page-table walk from the Address Width field of context-entry.
+ */
+static inline int get_level_from_context_entry(vtd_context_entry *ce)
+{
+    return 2 + (ce->hi & CONTEXT_ENTRY_AW);
+}
+
+/* Given the @gpa, return the relevant slpte. @slpte_level will be the last
+ * level of the translation, which can be used to determine large-page size.
+ */
+static uint64_t gpa_to_slpte(vtd_context_entry *ce, uint64_t gpa,
+                             int *slpte_level)
+{
+    dma_addr_t addr = get_slpt_base_from_context(ce);
+    int level = get_level_from_context_entry(ce);
+    int offset;
+    uint64_t slpte;
+
+    /* D("slpt_base 0x%"PRIx64, addr); */
+    while (true) {
+        offset = gpa_level_offset(gpa, level);
+        slpte = get_slpte(addr, offset);
+        /* D("level %d slpte 0x%"PRIx64, level, slpte); */
+        if (!slpte_present(slpte)) {
+            D("error: slpte 0x%"PRIx64 " is not present", slpte);
+            slpte = (uint64_t)-1;
+            *slpte_level = level;
+            break;
+        }
+        if (is_last_slpte(slpte, level)) {
+            *slpte_level = level;
+            break;
+        }
+        addr = get_slpte_addr(slpte);
+        level--;
+    }
+    return slpte;
+}
+
+/* Do a paging-structure walk to perform an IOMMU translation
+ * @bus_num: The bus number
+ * @devfn: The devfn, the combined device and function number
+ * @entry: IOMMUTLBEntry that contains the address to be translated and result
+ */
+static void iommu_translate(IntelIOMMUState *s, int bus_num, int devfn,
+                            hwaddr addr, IOMMUTLBEntry *entry)
+{
+    vtd_root_entry re;
+    vtd_context_entry ce;
+    uint64_t slpte;
+    int level;
+    uint64_t page_mask = VTD_PAGE_MASK_4K;
+
+
+    if (!get_root_entry(s, bus_num, &re)) {
+        /* FIXME */
+        return;
+    }
+    if (!root_entry_present(&re)) {
+        /* FIXME */
+        D("error: root-entry #%d is not present", bus_num);
+        return;
+    }
+    /* D("root-entry low 0x%"PRIx64, re.val); */
+    if (!get_context_entry_from_root(&re, devfn, &ce)) {
+        /* FIXME */
+        return;
+    }
+    if (!context_entry_present(&ce)) {
+        /* FIXME */
+        D("error: context-entry #%d(bus #%d) is not present", devfn, bus_num);
+        return;
+    }
+    /* D("context-entry hi 0x%"PRIx64 " low 0x%"PRIx64, ce.hi, ce.lo); */
+    slpte = gpa_to_slpte(&ce, addr, &level);
+    if (slpte == (uint64_t)-1) {
+        /* FIXME */
+        D("error: can't get slpte for gpa %"PRIx64, addr);
+        return;
+    }
+
+    if (is_large_pte(slpte)) {
+        if (level == SL_PDP_LEVEL) {
+            /* 1-GB page */
+            page_mask = VTD_PAGE_MASK_1G;
+        } else {
+            /* 2-MB page */
+            page_mask = VTD_PAGE_MASK_2M;
+        }
+    }
+
+    entry->iova = addr & page_mask;
+    entry->translated_addr = get_slpte_addr(slpte) & page_mask;
+    entry->addr_mask = ~page_mask;
+    entry->perm = IOMMU_RW;
+}
+
+#if 0
+/* Iterate a Second Level Page Table */
+static void __walk_slpt(dma_addr_t table_addr, int level, uint64_t gpa)
+{
+    int index;
+    uint64_t next_gpa;
+    dma_addr_t next_table_addr;
+    uint64_t slpte;
+
+    if (level < SL_PT_LEVEL) {
+        return;
+    }
+
+
+    for (index = 0; index < SL_PT_ENTRY_NR; ++index) {
+        slpte = get_slpte(table_addr, index);
+        if (!slpte_present(slpte)) {
+            continue;
+        }
+        D("level %d, index 0x%x, gpa 0x%"PRIx64, level, index, gpa);
+        next_gpa = get_slpt_gpa(gpa, index, level);
+        next_table_addr = get_slpte_addr(slpte);
+
+        if (is_last_slpte(slpte, level)) {
+            D("slpte gpa 0x%"PRIx64 ", hpa 0x%"PRIx64, next_gpa,
+              next_table_addr);
+            if (next_gpa != next_table_addr) {
+                D("Not 1:1 mapping, slpte 0x%"PRIx64, slpte);
+            }
+        } else {
+            __walk_slpt(next_table_addr, level - 1, next_gpa);
+        }
+    }
+}
+
+
+static void print_paging_structure_from_context(vtd_context_entry *ce)
+{
+    dma_addr_t table_addr;
+
+    table_addr = get_slpt_base_from_context(ce);
+    __walk_slpt(table_addr, SL_PDP_LEVEL, 0);
+}
+
+
+static void print_root_table_all(IntelIOMMUState *s)
+{
+    int i;
+    int j;
+    vtd_root_entry re;
+    vtd_context_entry ce;
+
+    for (i = 0; i < ROOT_ENTRY_NR; ++i) {
+        if (get_root_entry(s, i, &re) && root_entry_present(&re)) {
+            D("root_entry 0x%x: hi 0x%"PRIx64 " low 0x%"PRIx64, i, re.rsvd,
+              re.val);
+
+            for (j = 0; j < CONTEXT_ENTRY_NR; ++j) {
+                if (get_context_entry_from_root(&re, j, &ce)
+                    && context_entry_present(&ce)) {
+                    D("context_entry 0x%x: hi 0x%"PRIx64 " low 0x%"PRIx64,
+                      j, ce.hi, ce.lo);
+                    D("--------------------------------");
+                    print_paging_structure_from_context(&ce);
+                }
+            }
+        }
+    }
+}
+#endif
+
+
+
+
+static void vtd_root_table_setup(IntelIOMMUState *s)
+{
+    s->root = *((uint64_t *)&s->csr[DMAR_RTADDR_REG]);
+    s->extended = s->root & VTD_RTADDR_RTT;
+    s->root &= ~0xfff;
+    D("root_table addr 0x%"PRIx64 " %s", s->root,
+      (s->extended ? "(Extended)" : ""));
+}
+
+/* Context-cache invalidation
+ * Returns the Context Actual Invalidation Granularity.
+ * @val: the content of the CCMD_REG
+ */
+static uint64_t vtd_context_cache_invalidate(IntelIOMMUState *s, uint64_t val)
+{
+    uint64_t caig;
+    uint64_t type = val & VTD_CCMD_CIRG_MASK;
+
+    switch (type) {
+    case VTD_CCMD_GLOBAL_INVL:
+        D("Global invalidation request");
+        caig = VTD_CCMD_GLOBAL_INVL_A;
+        break;
+
+    case VTD_CCMD_DOMAIN_INVL:
+        D("Domain-selective invalidation request");
+        caig = VTD_CCMD_DOMAIN_INVL_A;
+        break;
+
+    case VTD_CCMD_DEVICE_INVL:
+        D("Device-selective invalidation request");
+        caig = VTD_CCMD_DEVICE_INVL_A;
+        break;
+
+    default:
+        D("error: wrong context-cache invalidation granularity");
+        caig = 0;
+    }
+
+    return caig;
+}
+
+
+/* Flush IOTLB
+ * Returns the IOTLB Actual Invalidation Granularity.
+ * @val: the content of the IOTLB_REG
+ */
+static uint64_t vtd_iotlb_flush(IntelIOMMUState *s, uint64_t val)
+{
+    uint64_t iaig;
+    uint64_t type = val & VTD_TLB_FLUSH_GRANU_MASK;
+
+    switch (type) {
+    case VTD_TLB_GLOBAL_FLUSH:
+        D("Global IOTLB flush");
+        iaig = VTD_TLB_GLOBAL_FLUSH_A;
+        break;
+
+    case VTD_TLB_DSI_FLUSH:
+        D("Domain-selective IOTLB flush");
+        iaig = VTD_TLB_DSI_FLUSH_A;
+        break;
+
+    case VTD_TLB_PSI_FLUSH:
+        D("Page-selective-within-domain IOTLB flush");
+        iaig = VTD_TLB_PSI_FLUSH_A;
+        break;
+
+    default:
+        D("error: wrong iotlb flush granularity");
+        iaig = 0;
+    }
+
+    return iaig;
+}
+
+
+#if 0
+static void iommu_inv_queue_setup(IntelIOMMUState *s)
+{
+    uint64_t tail_val;
+    s->iq = *((uint64_t *)&s->csr[DMAR_IQA_REG]);
+    s->iq_sz = 0x100 << (s->iq & 0x7);  /* 256 entries per page */
+    s->iq &= ~0x7;
+    s->iq_enable = true;
+
+    /* Init head pointers */
+    tail_val = *((uint64_t *)&s->csr[DMAR_IQT_REG]);
+    *((uint64_t *)&s->csr[DMAR_IQH_REG]) = tail_val;
+    s->iq_head = s->iq_tail = (tail_val >> 4) & 0x7fff;
+    D(" -- address: 0x%lx size 0x%lx", s->iq, s->iq_sz);
+}
+
+
+static int handle_invalidate(IntelIOMMUState *s, uint16_t i)
+{
+    intel_iommu_inv_desc entry;
+    uint8_t type;
+    dma_memory_read(&address_space_memory, s->iq + sizeof(entry) * i, &entry,
+                    sizeof(entry));
+    type = entry.lower & 0xf;
+    D("Processing invalidate request %d - desc: %016lx.%016lx", i,
+      entry.upper, entry.lower);
+    switch (type) {
+    case CONTEXT_CACHE_INV_DESC:
+        D("Context-cache Invalidate");
+        break;
+    case IOTLB_INV_DESC:
+        D("IOTLB Invalidate");
+        break;
+    case INV_WAIT_DESC:
+        D("Invalidate Wait");
+        if (status_write(entry.lower)) {
+            dma_memory_write(&address_space_memory, entry.upper,
+                             (uint8_t *)&entry.lower + 4, 4);
+        }
+        break;
+    default:
+        D(" - not impl - ");
+    }
+    return 0;
+}
+
+
+static void handle_iqt_write(IntelIOMMUState *s, uint64_t val)
+{
+    s->iq_tail = (val >> 4) & 0x7fff;
+    D("Write to IQT_REG new tail = %d", s->iq_tail);
+
+    if (!s->iq_enable) {
+        return;
+    }
+
+    /* Process the invalidation queue */
+    while (s->iq_head != s->iq_tail) {
+        handle_invalidate(s, s->iq_head++);
+        if (s->iq_head == s->iq_sz) {
+            s->iq_head = 0;
+        }
+    }
+    *((uint64_t *)&s->csr[DMAR_IQH_REG]) = s->iq_head << 4;
+
+    set_quad(s, DMAR_IQT_REG, val);
+}
+#endif
+
+/* FIXME: Not implemented yet */
+static void handle_gcmd_qie(IntelIOMMUState *s, bool en)
+{
+    D("Queued Invalidation Enable %s", (en ? "on" : "off"));
+
+    /*if (en) {
+        iommu_inv_queue_setup(s);
+    }*/
+
+    /* Ok - report back to driver */
+    set_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_QIES);
+}
+
+
+/* Set Root Table Pointer */
+static void handle_gcmd_srtp(IntelIOMMUState *s)
+{
+    D("set Root Table Pointer");
+
+    vtd_root_table_setup(s);
+    /* Ok - report back to driver */
+    set_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_RTPS);
+}
+
+
+/* Handle Translation Enable/Disable */
+static void handle_gcmd_te(IntelIOMMUState *s, bool en)
+{
+    D("Translation Enable %s", (en ? "on" : "off"));
+
+    if (en) {
+        /* Ok - report back to driver */
+        set_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_TES);
+    } else {
+        /* Ok - report back to driver */
+        set_mask_long(s, DMAR_GSTS_REG, VTD_GSTS_TES, 0);
+    }
+}
+
+/* Handle write to Global Command Register */
+static void handle_gcmd_write(IntelIOMMUState *s)
+{
+    uint32_t status = __get_long(s, DMAR_GSTS_REG);
+    uint32_t val = __get_long(s, DMAR_GCMD_REG);
+    uint32_t changed = status ^ val;
+
+    D("value 0x%x status 0x%x", val, status);
+    if (changed & VTD_GCMD_TE) {
+        /* Translation enable/disable */
+        handle_gcmd_te(s, val & VTD_GCMD_TE);
+    } else if (val & VTD_GCMD_SRTP) {
+        /* Set/update the root-table pointer */
+        handle_gcmd_srtp(s);
+    } else if (changed & VTD_GCMD_QIE) {
+        /* Queued Invalidation Enable */
+        handle_gcmd_qie(s, val & VTD_GCMD_QIE);
+    } else {
+        D("error: unhandled gcmd write");
+    }
+}
+
+/* Handle write to Context Command Register */
+static void handle_ccmd_write(IntelIOMMUState *s)
+{
+    uint64_t ret;
+    uint64_t val = __get_quad(s, DMAR_CCMD_REG);
+
+    /* Context-cache invalidation request */
+    if (val & VTD_CCMD_ICC) {
+        ret = vtd_context_cache_invalidate(s, val);
+
+        /* Invalidation completed: clear ICC and report actual granularity */
+        set_mask_quad(s, DMAR_CCMD_REG, VTD_CCMD_ICC, 0ULL);
+        ret = set_mask_quad(s, DMAR_CCMD_REG, VTD_CCMD_CAIG_MASK, ret);
+        D("CCMD_REG write-back val: 0x%"PRIx64, ret);
+    }
+}
+
+/* Handle write to IOTLB Invalidation Register */
+static void handle_iotlb_write(IntelIOMMUState *s)
+{
+    uint64_t ret;
+    uint64_t val = __get_quad(s, DMAR_IOTLB_REG);
+
+    /* IOTLB invalidation request */
+    if (val & VTD_TLB_IVT) {
+        ret = vtd_iotlb_flush(s, val);
+
+        /* Invalidation completed: clear IVT and report actual granularity */
+        set_mask_quad(s, DMAR_IOTLB_REG, VTD_TLB_IVT, 0ULL);
+        ret = set_mask_quad(s, DMAR_IOTLB_REG, VTD_TLB_FLUSH_GRANU_MASK_A, ret);
+        D("IOTLB_REG write-back val: 0x%"PRIx64, ret);
+    }
+}
+
+
+static uint64_t vtd_mem_read(void *opaque, hwaddr addr, unsigned size)
+{
+    IntelIOMMUState *s = opaque;
+    uint64_t val;
+
+    if (addr + size > DMAR_REG_SIZE) {
+        D("error: addr outside region: max 0x%x, got 0x%"PRIx64 " %d",
+          DMAR_REG_SIZE, addr, size);
+        return (uint64_t)-1;
+    }
+
+    assert(size == 4 || size == 8);
+
+    switch (addr) {
+    /* Root Table Address Register, 64-bit */
+    case DMAR_RTADDR_REG:
+        if (size == 4) {
+            val = (uint32_t)s->root;
+        } else {
+            val = s->root;
+        }
+        break;
+
+    case DMAR_RTADDR_REG_HI:
+        assert(size == 4);
+        val = s->root >> 32;
+        break;
+
+    default:
+        if (size == 4) {
+            val = get_long(s, addr);
+        } else {
+            val = get_quad(s, addr);
+        }
+    }
+
+    D("addr 0x%"PRIx64 " size %d val 0x%"PRIx64, addr, size, val);
+    return val;
+}
+
+static void vtd_mem_write(void *opaque, hwaddr addr,
+                          uint64_t val, unsigned size)
+{
+    IntelIOMMUState *s = opaque;
+
+    if (addr + size > DMAR_REG_SIZE) {
+        D("error: addr outside region: max 0x%x, got 0x%"PRIx64 " %d",
+          DMAR_REG_SIZE, addr, size);
+        return;
+    }
+
+    assert(size == 4 || size == 8);
+
+    /* Val should be written into csr within the handler */
+    switch (addr) {
+    /* Global Command Register, 32-bit */
+    case DMAR_GCMD_REG:
+        D("DMAR_GCMD_REG write addr 0x%"PRIx64 ", size %d, val 0x%"PRIx64,
+          addr, size, val);
+        set_long(s, addr, val);
+        handle_gcmd_write(s);
+        break;
+
+    /* Invalidation Queue Tail Register, 64-bit */
+    /*case DMAR_IQT_REG:
+        if (size == 4) {
+
+        }
+        if (size == 4) {
+            if (former_size == 0) {
+                former_size = size;
+                former_value = val;
+                goto out;
+            } else {
+                val = (val << 32) + former_value;
+                former_size = 0;
+                former_value = 0;
+            }
+        }
+        handle_iqt_write(s, val);
+        break;*/
+
+    /* Context Command Register, 64-bit */
+    case DMAR_CCMD_REG:
+        D("DMAR_CCMD_REG write addr 0x%"PRIx64 ", size %d, val 0x%"PRIx64,
+          addr, size, val);
+        if (size == 4) {
+            set_long(s, addr, val);
+        } else {
+            set_quad(s, addr, val);
+            handle_ccmd_write(s);
+        }
+        break;
+
+    case DMAR_CCMD_REG_HI:
+        D("DMAR_CCMD_REG_HI write addr 0x%"PRIx64 ", size %d, val 0x%"PRIx64,
+          addr, size, val);
+        assert(size == 4);
+        set_long(s, addr, val);
+        handle_ccmd_write(s);
+        break;
+
+
+    /* IOTLB Invalidation Register, 64-bit */
+    case DMAR_IOTLB_REG:
+        D("DMAR_IOTLB_REG write addr 0x%"PRIx64 ", size %d, val 0x%"PRIx64,
+          addr, size, val);
+        if (size == 4) {
+            set_long(s, addr, val);
+        } else {
+            set_quad(s, addr, val);
+            handle_iotlb_write(s);
+        }
+        break;
+
+    case DMAR_IOTLB_REG_HI:
+        D("DMAR_IOTLB_REG_HI write addr 0x%"PRIx64 ", size %d, val 0x%"PRIx64,
+          addr, size, val);
+        assert(size == 4);
+        set_long(s, addr, val);
+        handle_iotlb_write(s);
+        break;
+
+    /* Fault Status Register, 32-bit */
+    case DMAR_FSTS_REG:
+    /* Fault Event Data Register, 32-bit */
+    case DMAR_FEDATA_REG:
+    /* Fault Event Address Register, 32-bit */
+    case DMAR_FEADDR_REG:
+    /* Fault Event Upper Address Register, 32-bit */
+    case DMAR_FEUADDR_REG:
+    /* Fault Event Control Register, 32-bit */
+    case DMAR_FECTL_REG:
+    /* Protected Memory Enable Register, 32-bit */
+    case DMAR_PMEN_REG:
+        D("known reg write addr 0x%"PRIx64 ", size %d, val 0x%"PRIx64,
+          addr, size, val);
+        set_long(s, addr, val);
+        break;
+
+
+    /* Root Table Address Register, 64-bit */
+    case DMAR_RTADDR_REG:
+        D("DMAR_RTADDR_REG write addr 0x%"PRIx64 ", size %d, val 0x%"PRIx64,
+          addr, size, val);
+        if (size == 4) {
+            set_long(s, addr, val);
+        } else {
+            set_quad(s, addr, val);
+        }
+        break;
+
+    case DMAR_RTADDR_REG_HI:
+        D("DMAR_RTADDR_REG_HI write addr 0x%"PRIx64 ", size %d, val 0x%"PRIx64,
+          addr, size, val);
+        assert(size == 4);
+        set_long(s, addr, val);
+        break;
+
+    default:
+        D("error: unhandled reg write addr 0x%"PRIx64
+          ", size %d, val 0x%"PRIx64, addr, size, val);
+        if (size == 4) {
+            set_long(s, addr, val);
+        } else {
+            set_quad(s, addr, val);
+        }
+    }
+
+}
+
+
+static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion *iommu, hwaddr addr)
+{
+    VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
+    IntelIOMMUState *s = vtd_as->iommu_state;
+    int bus_num = vtd_as->bus_num;
+    int devfn = vtd_as->devfn;
+    IOMMUTLBEntry ret = {
+        .target_as = &address_space_memory,
+        .iova = 0,
+        .translated_addr = 0,
+        .addr_mask = ~(hwaddr)0,
+        .perm = IOMMU_NONE,
+    };
+
+    if (!(__get_long(s, DMAR_GSTS_REG) & VTD_GSTS_TES)) {
+        /* DMAR disabled, passthrough, use 4K pages */
+        ret.iova = addr & VTD_PAGE_MASK_4K;
+        ret.translated_addr = addr & VTD_PAGE_MASK_4K;
+        ret.addr_mask = ~VTD_PAGE_MASK_4K;
+        ret.perm = IOMMU_RW;
+        return ret;
+    }
+
+    iommu_translate(s, bus_num, devfn, addr, &ret);
+
+    D("bus %d slot %d func %d devfn %d gpa %"PRIx64 " hpa %"PRIx64,
+      bus_num, VTD_PCI_SLOT(devfn), VTD_PCI_FUNC(devfn), devfn, addr,
+      ret.translated_addr);
+    return ret;
+}
+
+static const VMStateDescription vtd_vmstate = {
+    .name = "iommu_intel",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT8_ARRAY(csr, IntelIOMMUState, DMAR_REG_SIZE),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+
+static const MemoryRegionOps vtd_mem_ops = {
+    .read = vtd_mem_read,
+    .write = vtd_mem_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .impl = {
+        .min_access_size = 4,
+        .max_access_size = 8,
+    },
+    .valid = {
+        .min_access_size = 4,
+        .max_access_size = 8,
+    },
+};
+
+
+static Property iommu_properties[] = {
+    DEFINE_PROP_UINT32("version", IntelIOMMUState, version, 0),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+/* Do the real initialization. This is also called on reset, so take care
+ * when adding new initialization code.
+ */
+static void do_vtd_init(IntelIOMMUState *s)
+{
+    memset(s->csr, 0, DMAR_REG_SIZE);
+    memset(s->wmask, 0, DMAR_REG_SIZE);
+    memset(s->w1cmask, 0, DMAR_REG_SIZE);
+    memset(s->womask, 0, DMAR_REG_SIZE);
+
+    s->iommu_ops.translate = vtd_iommu_translate;
+    s->root = 0;
+    s->extended = false;
+
+    /* b.0:2 = 6: Number of domains supported: 64K using 16-bit ids
+     * b.3   = 0: No advanced fault logging
+     * b.4   = 0: No required write buffer flushing
+     * b.5   = 0: Protected low memory region not supported
+     * b.6   = 0: Protected high memory region not supported
+     * b.8:12 = 2: SAGAW(Supported Adjusted Guest Address Widths), 39-bit,
+     *             3-level page-table
+     * b.16:21 = 38: MGAW(Maximum Guest Address Width) = 39
+     * b.22 = 1: ZLR(Zero Length Read) supports zero length DMA read requests
+     *           to write-only pages
+     * b.24:33 = 34: FRO(Fault-recording Register offset)
+     * b.54 = 0: DWD(Write Draining), draining of write requests not supported
+     * b.55 = 0: DRD(Read Draining), draining of read requests not supported
+     */
+    const uint64_t dmar_cap_reg_value = 0x00400000ULL | VTD_CAP_FRO |
+                                        VTD_CAP_NFR | VTD_CAP_ND |
+                                        VTD_CAP_MGAW | VTD_SAGAW_39bit;
+
+    /* b.1 = 0: QI(Queued Invalidation support) not supported
+     * b.2 = 0: DT(Device-TLB support)
+     * b.3 = 0: IR(Interrupt Remapping support) not supported
+     * b.4 = 0: EIM(Extended Interrupt Mode) not supported
+     * b.8:17 = 15: IRO(IOTLB Register Offset)
+     * b.20:23 = 15: MHMV(Maximum Handle Mask Value)
+     */
+    const uint64_t dmar_ecap_reg_value = 0xf00000ULL | VTD_ECAP_IRO;
+
+    /* Define registers with default values and bit semantics */
+    define_long(s, DMAR_VER_REG, 0x10UL, 0, 0);  /* set MAX = 1, RO */
+    define_quad(s, DMAR_CAP_REG, dmar_cap_reg_value, 0, 0);
+    define_quad(s, DMAR_ECAP_REG, dmar_ecap_reg_value, 0, 0);
+    define_long(s, DMAR_GCMD_REG, 0, 0xff800000UL, 0);
+    define_long_wo(s, DMAR_GCMD_REG, 0xff800000UL);
+    define_long(s, DMAR_GSTS_REG, 0, 0, 0); /* All bits RO, default 0 */
+    define_quad(s, DMAR_RTADDR_REG, 0, 0xfffffffffffff000ULL, 0);
+    define_quad(s, DMAR_CCMD_REG, 0, 0xe0000003ffffffffULL, 0);
+    define_quad_wo(s, DMAR_CCMD_REG, 0x3ffff0000ULL);
+    define_long(s, DMAR_FSTS_REG, 0, 0, 0xfdUL);
+    define_long(s, DMAR_FECTL_REG, 0x80000000UL, 0x80000000UL, 0);
+    define_long(s, DMAR_FEDATA_REG, 0, 0xffffffffUL, 0); /* All bits RW */
+    define_long(s, DMAR_FEADDR_REG, 0, 0xfffffffcUL, 0); /* 31:2 RW */
+    define_long(s, DMAR_FEUADDR_REG, 0, 0xffffffffUL, 0); /* 31:0 RW */
+
+    define_quad(s, DMAR_AFLOG_REG, 0, 0xffffffffffffff00ULL, 0);
+
+    /* Treated as RO for implementations that report the PLMR and PHMR
+     * fields as Clear in CAP_REG.
+     * define_long(s, DMAR_PMEN_REG, 0, 0x80000000UL, 0);
+     */
+    define_long(s, DMAR_PMEN_REG, 0, 0, 0);
+
+    /* TBD: The definition of these are dynamic:
+     * DMAR_PLMBASE_REG, DMAR_PLMLIMIT_REG, DMAR_PHMBASE_REG, DMAR_PHMLIMIT_REG
+     */
+
+    /* Bits 18:4 (0x7fff0) are RO, rest is RsvdZ
+     * IQH_REG is treated as RsvdZ when not supported in ECAP_REG
+     * define_quad(s, DMAR_IQH_REG, 0, 0, 0);
+     */
+    define_quad(s, DMAR_IQH_REG, 0, 0, 0);
+
+    /* IQT_REG and IQA_REG are treated as RsvdZ when not supported in ECAP_REG
+     * define_quad(s, DMAR_IQT_REG, 0, 0x7fff0ULL, 0);
+     * define_quad(s, DMAR_IQA_REG, 0, 0xfffffffffffff007ULL, 0);
+     */
+    define_quad(s, DMAR_IQT_REG, 0, 0, 0);
+    define_quad(s, DMAR_IQA_REG, 0, 0, 0);
+
+    /* Bit 0 is RW1CS - rest is RsvdZ */
+    define_long(s, DMAR_ICS_REG, 0, 0, 0x1UL);
+
+    /* b.31 is RW, b.30 RO, rest: RsvdZ */
+    define_long(s, DMAR_IECTL_REG, 0x80000000UL, 0x80000000UL, 0);
+
+    define_long(s, DMAR_IEDATA_REG, 0, 0xffffffffUL, 0);
+    define_long(s, DMAR_IEADDR_REG, 0, 0xfffffffcUL, 0);
+    define_long(s, DMAR_IEUADDR_REG, 0, 0xffffffffUL, 0);
+    define_quad(s, DMAR_IRTA_REG, 0, 0xfffffffffffff80fULL, 0);
+    define_quad(s, DMAR_PQH_REG, 0, 0x7fff0ULL, 0);
+    define_quad(s, DMAR_PQT_REG, 0, 0x7fff0ULL, 0);
+    define_quad(s, DMAR_PQA_REG, 0, 0xfffffffffffff007ULL, 0);
+    define_long(s, DMAR_PRS_REG, 0, 0, 0x1UL);
+    define_long(s, DMAR_PECTL_REG, 0x80000000UL, 0x80000000UL, 0);
+    define_long(s, DMAR_PEDATA_REG, 0, 0xffffffffUL, 0);
+    define_long(s, DMAR_PEADDR_REG, 0, 0xfffffffcUL, 0);
+    define_long(s, DMAR_PEUADDR_REG, 0, 0xffffffffUL, 0);
+
+    /* When MTS not supported in ECAP_REG, these regs are RsvdZ */
+    define_long(s, DMAR_MTRRCAP_REG, 0, 0, 0);
+    define_long(s, DMAR_MTRRDEF_REG, 0, 0, 0);
+
+    /* IOTLB registers */
+    define_quad(s, DMAR_IOTLB_REG, 0, 0xb003ffff00000000ULL, 0);
+    define_quad(s, DMAR_IVA_REG, 0, 0xfffffffffffff07fULL, 0);
+    define_quad_wo(s, DMAR_IVA_REG, 0xfffffffffffff07fULL);
+}
+
+#if 0
+/* Iterate IntelIOMMUState->address_spaces[] and free any allocated memory */
+static void clean_address_space(IntelIOMMUState *s)
+{
+    VTDAddressSpace **pvtd_as;
+    VTDAddressSpace *vtd_as;
+    int i;
+    int j;
+    const int MAX_DEVFN = VTD_PCI_SLOT_MAX * VTD_PCI_FUNC_MAX;
+
+    for (i = 0; i < VTD_PCI_BUS_MAX; ++i) {
+        pvtd_as = s->address_spaces[i];
+        if (!pvtd_as) {
+            continue;
+        }
+        for (j = 0; j < MAX_DEVFN; ++j) {
+            vtd_as = *(pvtd_as + j);
+            if (!vtd_as) {
+                continue;
+            }
+            g_free(vtd_as);
+            *(pvtd_as + j) = 0;
+        }
+        g_free(pvtd_as);
+        s->address_spaces[i] = 0;
+    }
+}
+#endif
+
+/* QOM reset function.
+ * Should not reset address_spaces on reset.
+ */
+static void vtd_reset(DeviceState *dev)
+{
+    IntelIOMMUState *s = INTEL_IOMMU_DEVICE(dev);
+
+    D("");
+    do_vtd_init(s);
+}
+
+/* Initialization function of QOM */
+static void vtd_realize(DeviceState *dev, Error **errp)
+{
+    IntelIOMMUState *s = INTEL_IOMMU_DEVICE(dev);
+
+    D("");
+    memset(s->address_spaces, 0, sizeof(s->address_spaces));
+    memory_region_init_io(&s->csrmem, OBJECT(s), &vtd_mem_ops, s,
+                          "intel_iommu", DMAR_REG_SIZE);
+
+    do_vtd_init(s);
+}
+
+static void vtd_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->reset = vtd_reset;
+    dc->realize = vtd_realize;
+    dc->vmsd = &vtd_vmstate;
+    dc->props = iommu_properties;
+}
+
+static const TypeInfo vtd_info = {
+    .name          = TYPE_INTEL_IOMMU_DEVICE,
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(IntelIOMMUState),
+    .class_init    = vtd_class_init,
+};
+
+static void vtd_register_types(void)
+{
+    D("");
+    type_register_static(&vtd_info);
+}
+
+type_init(vtd_register_types)
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
new file mode 100644
index 0000000..af5aff8
--- /dev/null
+++ b/include/hw/i386/intel_iommu.h
@@ -0,0 +1,350 @@
+/*
+ * QEMU emulation of an Intel IOMMU (VT-d)
+ *   (DMA Remapping device)
+ *
+ * Copyright (c) 2013 Knut Omang, Oracle <knut.omang@oracle.com>
+ * Copyright (C) 2014 Le Tan, <tamlokveer@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ * Lots of defines copied from kernel/include/linux/intel-iommu.h:
+ *   Copyright (C) 2006-2008 Intel Corporation
+ *   Author: Ashok Raj <ashok.raj@intel.com>
+ *   Author: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
+ *
+ */
+
+#ifndef _INTEL_IOMMU_H
+#define _INTEL_IOMMU_H
+#include "hw/qdev.h"
+#include "sysemu/dma.h"
+
+#define TYPE_INTEL_IOMMU_DEVICE "intel-iommu"
+#define INTEL_IOMMU_DEVICE(obj) \
+     OBJECT_CHECK(IntelIOMMUState, (obj), TYPE_INTEL_IOMMU_DEVICE)
+
+/* DMAR Hardware Unit Definition address (IOMMU unit), set in BIOS */
+#define Q35_HOST_BRIDGE_IOMMU_ADDR 0xfed90000ULL
+
+/*
+ * Intel IOMMU register specification per version 1.0 public spec.
+ */
+
+#define DMAR_VER_REG    0x0 /* Arch version supported by this IOMMU */
+#define DMAR_CAP_REG    0x8 /* Hardware supported capabilities */
+#define DMAR_CAP_REG_HI 0xc /* High 32-bit of DMAR_CAP_REG */
+#define DMAR_ECAP_REG   0x10    /* Extended capabilities supported */
+#define DMAR_ECAP_REG_HI    0x14
+#define DMAR_GCMD_REG   0x18    /* Global command register */
+#define DMAR_GSTS_REG   0x1c    /* Global status register */
+#define DMAR_RTADDR_REG 0x20    /* Root entry table */
+#define DMAR_RTADDR_REG_HI  0x24
+#define DMAR_CCMD_REG   0x28  /* Context command reg */
+#define DMAR_CCMD_REG_HI    0x2c
+#define DMAR_FSTS_REG   0x34  /* Fault Status register */
+#define DMAR_FECTL_REG  0x38 /* Fault control register */
+#define DMAR_FEDATA_REG 0x3c    /* Fault event interrupt data register */
+#define DMAR_FEADDR_REG 0x40    /* Fault event interrupt addr register */
+#define DMAR_FEUADDR_REG    0x44   /* Upper address register */
+#define DMAR_AFLOG_REG  0x58 /* Advanced Fault control */
+#define DMAR_AFLOG_REG_HI   0x5c
+#define DMAR_PMEN_REG   0x64  /* Enable Protected Memory Region */
+#define DMAR_PLMBASE_REG    0x68    /* PMRR Low addr */
+#define DMAR_PLMLIMIT_REG 0x6c  /* PMRR low limit */
+#define DMAR_PHMBASE_REG 0x70   /* pmrr high base addr */
+#define DMAR_PHMBASE_REG_HI 0x74
+#define DMAR_PHMLIMIT_REG 0x78  /* pmrr high limit */
+#define DMAR_PHMLIMIT_REG_HI 0x7c
+#define DMAR_IQH_REG    0x80   /* Invalidation queue head register */
+#define DMAR_IQH_REG_HI 0x84
+#define DMAR_IQT_REG    0x88   /* Invalidation queue tail register */
+#define DMAR_IQT_REG_HI 0x8c
+#define DMAR_IQ_SHIFT   4 /* Invalidation queue head/tail shift */
+#define DMAR_IQA_REG    0x90   /* Invalidation queue addr register */
+#define DMAR_IQA_REG_HI 0x94
+#define DMAR_ICS_REG    0x9c   /* Invalidation complete status register */
+#define DMAR_IRTA_REG   0xb8    /* Interrupt remapping table addr register */
+#define DMAR_IRTA_REG_HI    0xbc
+
+/* From Vt-d 2.2 spec */
+#define DMAR_IECTL_REG  0xa0    /* Invalidation event control register */
+#define DMAR_IEDATA_REG 0xa4    /* Invalidation event data register */
+#define DMAR_IEADDR_REG 0xa8    /* Invalidation event address register */
+#define DMAR_IEUADDR_REG 0xac    /* Invalidation event address register */
+#define DMAR_PQH_REG    0xc0    /* Page request queue head register */
+#define DMAR_PQH_REG_HI 0xc4
+#define DMAR_PQT_REG    0xc8    /* Page request queue tail register */
+#define DMAR_PQT_REG_HI     0xcc
+#define DMAR_PQA_REG    0xd0    /* Page request queue address register */
+#define DMAR_PQA_REG_HI 0xd4
+#define DMAR_PRS_REG    0xdc    /* Page request status register */
+#define DMAR_PECTL_REG  0xe0    /* Page request event control register */
+#define DMAR_PEDATA_REG 0xe4    /* Page request event data register */
+#define DMAR_PEADDR_REG 0xe8    /* Page request event address register */
+#define DMAR_PEUADDR_REG  0xec  /* Page event upper address register */
+#define DMAR_MTRRCAP_REG 0x100  /* MTRR capability register */
+#define DMAR_MTRRCAP_REG_HI 0x104
+#define DMAR_MTRRDEF_REG 0x108  /* MTRR default type register */
+#define DMAR_MTRRDEF_REG_HI 0x10c
+
+/* IOTLB */
+#define DMAR_IOTLB_REG_OFFSET 0xf0  /* Offset to the IOTLB registers */
+#define DMAR_IVA_REG DMAR_IOTLB_REG_OFFSET  /* Invalidate Address Register */
+#define DMAR_IVA_REG_HI (DMAR_IVA_REG + 4)
+/* IOTLB Invalidate Register */
+#define DMAR_IOTLB_REG (DMAR_IOTLB_REG_OFFSET + 0x8)
+#define DMAR_IOTLB_REG_HI (DMAR_IOTLB_REG + 4)
+
+/* FRCD */
+#define DMAR_FRCD_REG_OFFSET 0x220 /* Offset to the Fault Recording Registers */
+#define DMAR_FRCD_REG_NR 1 /* Num of Fault Recording Registers */
+
+#define DMAR_REG_SIZE   (DMAR_FRCD_REG_OFFSET + 128 * DMAR_FRCD_REG_NR)
+
+#define VTD_PCI_BUS_MAX 256
+#define VTD_PCI_SLOT_MAX 32
+#define VTD_PCI_FUNC_MAX 8
+#define VTD_PCI_SLOT(devfn)         (((devfn) >> 3) & 0x1f)
+#define VTD_PCI_FUNC(devfn)         ((devfn) & 0x07)
+
+typedef struct IntelIOMMUState IntelIOMMUState;
+typedef struct VTDAddressSpace VTDAddressSpace;
+
+struct VTDAddressSpace {
+    int bus_num;
+    int devfn;
+    AddressSpace as;
+    MemoryRegion iommu;
+    IntelIOMMUState *iommu_state;
+};
+
+/* The iommu (DMAR) device state struct */
+struct IntelIOMMUState {
+    SysBusDevice busdev;
+    MemoryRegion csrmem;
+    uint8_t csr[DMAR_REG_SIZE];     /* register values */
+    uint8_t wmask[DMAR_REG_SIZE];   /* R/W bytes */
+    uint8_t w1cmask[DMAR_REG_SIZE]; /* RW1C(Write 1 to Clear) bytes */
+    uint8_t womask[DMAR_REG_SIZE]; /* WO (write only - read returns 0) */
+    uint32_t version;
+
+    dma_addr_t root;  /* Current root table pointer */
+    bool extended;    /* Type of root table (extended or not) */
+    uint16_t iq_head; /* Current invalidation queue head */
+    uint16_t iq_tail; /* Current invalidation queue tail */
+    dma_addr_t iq;   /* Current invalidation queue (IQ) pointer */
+    size_t iq_sz;    /* IQ Size in number of entries */
+    bool iq_enable;  /* Set if the IQ is enabled */
+
+    MemoryRegionIOMMUOps iommu_ops;
+    VTDAddressSpace **address_spaces[VTD_PCI_BUS_MAX];
+};
+
+
+/* An invalidate descriptor */
+typedef struct intel_iommu_inv_desc {
+    uint64_t lower;
+    uint64_t upper;
+} intel_iommu_inv_desc;
+
+
+/* Invalidate descriptor types */
+#define CONTEXT_CACHE_INV_DESC  0x1
+#define PASID_CACHE_INV_DESC    0x7
+#define IOTLB_INV_DESC          0x2
+#define EXT_IOTLB_INV_DESC      0x6
+#define DEV_TLB_INV_DESC        0x3
+#define EXT_DEV_TLB_INV_DESC    0x8
+#define INT_ENTRY_INV_DESC      0x4
+#define INV_WAIT_DESC           0x5
+
+
+/* IOTLB_REG */
+#define VTD_TLB_GLOBAL_FLUSH (1ULL << 60) /* Global invalidation */
+#define VTD_TLB_DSI_FLUSH (2ULL << 60)  /* Domain-selective invalidation */
+#define VTD_TLB_PSI_FLUSH (3ULL << 60)  /* Page-selective invalidation */
+#define VTD_TLB_FLUSH_GRANU_MASK (3ULL << 60)
+#define VTD_TLB_GLOBAL_FLUSH_A (1ULL << 57)
+#define VTD_TLB_DSI_FLUSH_A (2ULL << 57)
+#define VTD_TLB_PSI_FLUSH_A (3ULL << 57)
+#define VTD_TLB_FLUSH_GRANU_MASK_A (3ULL << 57)
+#define VTD_TLB_READ_DRAIN (1ULL << 49)
+#define VTD_TLB_WRITE_DRAIN (1ULL << 48)
+#define VTD_TLB_DID(id) (((uint64_t)((id) & 0xffffULL)) << 32)
+#define VTD_TLB_IVT (1ULL << 63)
+#define VTD_TLB_IH_NONLEAF (1ULL << 6)
+#define VTD_TLB_MAX_SIZE (0x3f)
+
+/* INVALID_DESC */
+#define DMA_CCMD_INVL_GRANU_OFFSET  61
+#define DMA_ID_TLB_GLOBAL_FLUSH (((uint64_t)1) << 3)
+#define DMA_ID_TLB_DSI_FLUSH    (((uint64_t)2) << 3)
+#define DMA_ID_TLB_PSI_FLUSH    (((uint64_t)3) << 3)
+#define DMA_ID_TLB_READ_DRAIN   (((uint64_t)1) << 7)
+#define DMA_ID_TLB_WRITE_DRAIN  (((uint64_t)1) << 6)
+#define DMA_ID_TLB_DID(id)  (((uint64_t)((id & 0xffff) << 16)))
+#define DMA_ID_TLB_IH_NONLEAF   (((uint64_t)1) << 6)
+#define DMA_ID_TLB_ADDR(addr)   (addr)
+#define DMA_ID_TLB_ADDR_MASK(mask)  (mask)
+
+/* PMEN_REG */
+#define DMA_PMEN_EPM (((uint32_t)1)<<31)
+#define DMA_PMEN_PRS (((uint32_t)1)<<0)
+
+/* GCMD_REG */
+#define VTD_GCMD_TE (1UL << 31)
+#define VTD_GCMD_SRTP (1UL << 30)
+#define VTD_GCMD_SFL (1UL << 29)
+#define VTD_GCMD_EAFL (1UL << 28)
+#define VTD_GCMD_WBF (1UL << 27)
+#define VTD_GCMD_QIE (1UL << 26)
+#define VTD_GCMD_IRE (1UL << 25)
+#define VTD_GCMD_SIRTP (1UL << 24)
+#define VTD_GCMD_CFI (1UL << 23)
+
+/* GSTS_REG */
+#define VTD_GSTS_TES (1UL << 31)
+#define VTD_GSTS_RTPS (1UL << 30)
+#define VTD_GSTS_FLS (1UL << 29)
+#define VTD_GSTS_AFLS (1UL << 28)
+#define VTD_GSTS_WBFS (1UL << 27)
+#define VTD_GSTS_QIES (1UL << 26)
+#define VTD_GSTS_IRES (1UL << 25)
+#define VTD_GSTS_IRTPS (1UL << 24)
+#define VTD_GSTS_CFIS (1UL << 23)
+
+/* CCMD_REG */
+#define VTD_CCMD_ICC (1ULL << 63)
+#define VTD_CCMD_GLOBAL_INVL (1ULL << 61)
+#define VTD_CCMD_DOMAIN_INVL (2ULL << 61)
+#define VTD_CCMD_DEVICE_INVL (3ULL << 61)
+#define VTD_CCMD_CIRG_MASK (3ULL << 61)
+#define VTD_CCMD_GLOBAL_INVL_A (1ULL << 59)
+#define VTD_CCMD_DOMAIN_INVL_A (2ULL << 59)
+#define VTD_CCMD_DEVICE_INVL_A (3ULL << 59)
+#define VTD_CCMD_CAIG_MASK (3ULL << 59)
+#define VTD_CCMD_FM(m) (((uint64_t)((m) & 3ULL)) << 32)
+#define VTD_CCMD_MASK_NOBIT 0
+#define VTD_CCMD_MASK_1BIT 1
+#define VTD_CCMD_MASK_2BIT 2
+#define VTD_CCMD_MASK_3BIT 3
+#define VTD_CCMD_SID(s) (((uint64_t)((s) & 0xffffULL)) << 16)
+#define VTD_CCMD_DID(d) ((uint64_t)((d) & 0xffffULL))
+
+/* FECTL_REG */
+#define DMA_FECTL_IM (((uint32_t)1) << 31)
+
+/* FSTS_REG */
+#define DMA_FSTS_PPF ((uint32_t)2)
+#define DMA_FSTS_PFO ((uint32_t)1)
+#define DMA_FSTS_IQE (1 << 4)
+#define DMA_FSTS_ICE (1 << 5)
+#define DMA_FSTS_ITE (1 << 6)
+#define dma_fsts_fault_record_index(s) (((s) >> 8) & 0xff)
+
+/* RTADDR_REG */
+#define VTD_RTADDR_RTT (1ULL << 11)
+
+
+/* ECAP_REG */
+#define VTD_ECAP_IRO (DMAR_IOTLB_REG_OFFSET << 4)   /* (val >> 4) << 8 */
+
+/* CAP_REG */
+
+/* (val >> 4) << 24 */
+#define VTD_CAP_FRO     ((uint64_t)DMAR_FRCD_REG_OFFSET << 20)
+
+#define VTD_CAP_NFR     ((uint64_t)(DMAR_FRCD_REG_NR - 1) << 40)
+#define VTD_DOMAIN_ID_SHIFT     16  /* 16-bit domain id for 64K domains */
+#define VTD_CAP_ND  (((VTD_DOMAIN_ID_SHIFT - 4) / 2) & 7ULL)
+#define VTD_MGAW    39  /* Maximum Guest Address Width */
+#define VTD_CAP_MGAW    (((VTD_MGAW - 1) & 0x3fULL) << 16)
+/* Supported Adjusted Guest Address Widths */
+#define VTD_SAGAW_MASK  (0x1fULL << 8)
+#define VTD_SAGAW_39bit (0x2ULL << 8)   /* 39-bit AGAW, 3-level page-table */
+#define VTD_SAGAW_48bit (0x4ULL << 8)   /* 48-bit AGAW, 4-level page-table */
+
+
+/* Pagesize of VTD paging structures, including root and context tables */
+#define VTD_PAGE_SHIFT      (12)
+#define VTD_PAGE_SIZE       (1UL << VTD_PAGE_SHIFT)
+
+#define VTD_PAGE_SHIFT_4K   (12)
+#define VTD_PAGE_MASK_4K    (~((1ULL << VTD_PAGE_SHIFT_4K) - 1))
+#define VTD_PAGE_SHIFT_2M   (21)
+#define VTD_PAGE_MASK_2M    (~((1UL << VTD_PAGE_SHIFT_2M) - 1))
+#define VTD_PAGE_SHIFT_1G   (30)
+#define VTD_PAGE_MASK_1G    (~((1UL << VTD_PAGE_SHIFT_1G) - 1))
+
+/* Root-Entry
+ * 0: Present
+ * 1-11: Reserved
+ * 12-63: Context-table Pointer
+ * 64-127: Reserved
+ */
+struct vtd_root_entry {
+    uint64_t val;
+    uint64_t rsvd;
+};
+typedef struct vtd_root_entry vtd_root_entry;
+
+/* Masks for struct vtd_root_entry */
+#define ROOT_ENTRY_P (1ULL << 0)
+#define ROOT_ENTRY_CTP  (~0xfffULL)
+
+#define ROOT_ENTRY_NR   (VTD_PAGE_SIZE / sizeof(vtd_root_entry))
+
+
+/* Context-Entry */
+struct vtd_context_entry {
+    uint64_t lo;
+    uint64_t hi;
+};
+typedef struct vtd_context_entry vtd_context_entry;
+
+/* Masks for struct vtd_context_entry */
+/* lo */
+#define CONTEXT_ENTRY_P (1ULL << 0)
+#define CONTEXT_ENTRY_FPD   (1ULL << 1) /* Fault Processing Disable */
+#define CONTEXT_ENTRY_TT    (3ULL << 2) /* Translation Type */
+#define CONTEXT_TT_MULTI_LEVEL  (0)
+#define CONTEXT_TT_DEV_IOTLB    (1)
+#define CONTEXT_TT_PASS_THROUGH (2)
+/* Second Level Page Translation Pointer */
+#define CONTEXT_ENTRY_SLPTPTR   (~0xfffULL)
+
+/* hi */
+#define CONTEXT_ENTRY_AW    (7ULL) /* Adjusted guest-address-width */
+#define CONTEXT_ENTRY_DID   (0xffffULL << 8)    /* Domain Identifier */
+
+
+#define CONTEXT_ENTRY_NR    (VTD_PAGE_SIZE / sizeof(vtd_context_entry))
+
+
+/* Paging Structure common */
+#define SL_PT_PAGE_SIZE_MASK   (1ULL << 7)
+#define SL_LEVEL_BITS   9   /* Bits to decide the offset for each level */
+
+/* Second Level Paging Structure */
+#define SL_PML4_LEVEL 4
+#define SL_PDP_LEVEL 3
+#define SL_PD_LEVEL 2
+#define SL_PT_LEVEL 1
+
+#define SL_PT_ENTRY_NR  512
+#define SL_PT_BASE_ADDR_MASK  (~(uint64_t)(VTD_PAGE_SIZE - 1))
+
+
+#endif
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH 2/3] intel-iommu: add DMAR table to ACPI tables
  2014-07-22 15:47 [Qemu-devel] [PATCH 0/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation to q35 chipset Le Tan
  2014-07-22 15:47 ` [Qemu-devel] [PATCH 1/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation Le Tan
@ 2014-07-22 15:47 ` Le Tan
  2014-07-22 15:47 ` [Qemu-devel] [PATCH 3/3] intel-iommu: add Intel IOMMU emulation to q35 and add a machine option "vtd" as a switch Le Tan
  2 siblings, 0 replies; 12+ messages in thread
From: Le Tan @ 2014-07-22 15:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Knut Omang, Le Tan, Alex Williamson,
	Jan Kiszka, Anthony Liguori, Paolo Bonzini

Expose Intel IOMMU to the BIOS. If object of TYPE_INTEL_IOMMU_DEVICE exists,
add DMAR table to ACPI RSDT table. For now the DMAR table indicates that there
is only one hardware unit without INTR_REMAP capability on the platform.

Signed-off-by: Le Tan <tamlokveer@gmail.com>
---
 hw/i386/acpi-build.c | 41 ++++++++++++++++++++++++++++++
 hw/i386/acpi-defs.h  | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 111 insertions(+)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index ebc5f03..8241621 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -45,6 +45,7 @@
 #include "hw/i386/ich9.h"
 #include "hw/pci/pci_bus.h"
 #include "hw/pci-host/q35.h"
+#include "hw/i386/intel_iommu.h"
 
 #include "hw/i386/q35-acpi-dsdt.hex"
 #include "hw/i386/acpi-dsdt.hex"
@@ -1316,6 +1317,31 @@ build_mcfg_q35(GArray *table_data, GArray *linker, AcpiMcfgInfo *info)
 }
 
 static void
+build_dmar_q35(GArray *table_data, GArray *linker)
+{
+    int dmar_start = table_data->len;
+
+    AcpiTableDmar *dmar;
+    AcpiDmarHardwareUnit *drhd;
+
+    dmar = acpi_data_push(table_data, sizeof(*dmar));
+    dmar->host_address_width = 0x26;    /* 0x26 + 1 = 39 */
+    dmar->flags = 0;    /* No intr_remap for now */
+
+    /* DMAR Remapping Hardware Unit Definition structure */
+    drhd = acpi_data_push(table_data, sizeof(*drhd));
+    drhd->type = cpu_to_le16(ACPI_DMAR_TYPE_HARDWARE_UNIT);
+    drhd->length = cpu_to_le16(sizeof(*drhd));   /* No device scope now */
+    drhd->flags = ACPI_DMAR_INCLUDE_PCI_ALL;
+    drhd->pci_segment = cpu_to_le16(0);
+    drhd->address = cpu_to_le64(Q35_HOST_BRIDGE_IOMMU_ADDR);
+
+    build_header(linker, table_data, (void *)(table_data->data + dmar_start),
+                 "DMAR", table_data->len - dmar_start, 1);
+}
+
+
+static void
 build_dsdt(GArray *table_data, GArray *linker, AcpiMiscInfo *misc)
 {
     AcpiTableHeader *dsdt;
@@ -1436,6 +1462,17 @@ static bool acpi_get_mcfg(AcpiMcfgInfo *mcfg)
     return true;
 }
 
+static bool acpi_has_iommu(void)
+{
+    bool ambiguous;
+    Object *intel_iommu;
+
+    intel_iommu = object_resolve_path_type("", TYPE_INTEL_IOMMU_DEVICE,
+                                           &ambiguous);
+    return intel_iommu && !ambiguous;
+}
+
+
 static
 void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
 {
@@ -1497,6 +1534,10 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
         acpi_add_table(table_offsets, tables->table_data);
         build_mcfg_q35(tables->table_data, tables->linker, &mcfg);
     }
+    if (acpi_has_iommu()) {
+        acpi_add_table(table_offsets, tables->table_data);
+        build_dmar_q35(tables->table_data, tables->linker);
+    }
 
     /* Add tables supplied by user (if any) */
     for (u = acpi_table_first(); u; u = acpi_table_next(u)) {
diff --git a/hw/i386/acpi-defs.h b/hw/i386/acpi-defs.h
index e93babb..9674825 100644
--- a/hw/i386/acpi-defs.h
+++ b/hw/i386/acpi-defs.h
@@ -314,4 +314,74 @@ struct AcpiTableMcfg {
 } QEMU_PACKED;
 typedef struct AcpiTableMcfg AcpiTableMcfg;
 
+/* DMAR - DMA Remapping table r2.2 */
+struct AcpiTableDmar {
+    ACPI_TABLE_HEADER_DEF
+    uint8_t host_address_width; /* Maximum DMA physical addressability */
+    uint8_t flags;
+    uint8_t reserved[10];
+} QEMU_PACKED;
+typedef struct AcpiTableDmar AcpiTableDmar;
+
+/* Masks for Flags field above */
+#define ACPI_DMAR_INTR_REMAP    (1)
+#define ACPI_DMAR_X2APIC_OPT_OUT    (2)
+
+/*
+ * DMAR sub-structures (Follow DMA Remapping table)
+ */
+#define ACPI_DMAR_SUB_HEADER_DEF /* Common ACPI DMAR sub-structure header */\
+    uint16_t type;  \
+    uint16_t length;
+
+/* Values for sub-structure type for DMAR */
+enum {
+    ACPI_DMAR_TYPE_HARDWARE_UNIT = 0,   /* DRHD */
+    ACPI_DMAR_TYPE_RESERVED_MEMORY = 1, /* RMRR */
+    ACPI_DMAR_TYPE_ATSR = 2,    /* ATSR */
+    ACPI_DMAR_TYPE_HARDWARE_AFFINITY = 3,   /* RHSR */
+    ACPI_DMAR_TYPE_ANDD = 4,    /* ANDD */
+    ACPI_DMAR_TYPE_RESERVED = 5 /* Reserved for future use */
+};
+
+/*
+ * Sub-structures for DMAR, correspond to Type in ACPI_DMAR_SUB_HEADER_DEF
+ */
+
+/* DMAR Device Scope structures */
+struct AcpiDmarDeviceScope {
+    uint8_t type;
+    uint8_t length;
+    uint16_t reserved;
+    uint8_t enumeration_id;
+    uint8_t start_bus_number;
+    uint8_t path[0];
+} QEMU_PACKED;
+typedef struct AcpiDmarDeviceScope AcpiDmarDeviceScope;
+
+/* Values for type in struct AcpiDmarDeviceScope */
+enum {
+    ACPI_DMAR_SCOPE_TYPE_NOT_USED = 0,
+    ACPI_DMAR_SCOPE_TYPE_ENDPOINT = 1,
+    ACPI_DMAR_SCOPE_TYPE_BRIDGE = 2,
+    ACPI_DMAR_SCOPE_TYPE_IOAPIC = 3,
+    ACPI_DMAR_SCOPE_TYPE_HPET = 4,
+    ACPI_DMAR_SCOPE_TYPE_ACPI = 5,
+    ACPI_DMAR_SCOPE_TYPE_RESERVED = 6 /* Reserved for future use */
+};
+
+/* 0: Hardware Unit Definition */
+struct AcpiDmarHardwareUnit {
+    ACPI_DMAR_SUB_HEADER_DEF
+    uint8_t flags;
+    uint8_t reserved;
+    uint16_t pci_segment;   /* The PCI Segment associated with this unit */
+    uint64_t address;   /* Base address of remapping hardware register-set */
+} QEMU_PACKED;
+typedef struct AcpiDmarHardwareUnit AcpiDmarHardwareUnit;
+
+/* Masks for Flags field above */
+#define ACPI_DMAR_INCLUDE_PCI_ALL (1)
+
+
 #endif
-- 
1.9.1


* [Qemu-devel] [PATCH 3/3] intel-iommu: add Intel IOMMU emulation to q35 and add a machine option "vtd" as a switch
  2014-07-22 15:47 [Qemu-devel] [PATCH 0/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation to q35 chipset Le Tan
  2014-07-22 15:47 ` [Qemu-devel] [PATCH 1/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation Le Tan
  2014-07-22 15:47 ` [Qemu-devel] [PATCH 2/3] intel-iommu: add DMAR table to ACPI tables Le Tan
@ 2014-07-22 15:47 ` Le Tan
  2014-07-26  8:47   ` Jan Kiszka
  2 siblings, 1 reply; 12+ messages in thread
From: Le Tan @ 2014-07-22 15:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Knut Omang, Le Tan, Alex Williamson,
	Jan Kiszka, Anthony Liguori, Paolo Bonzini

Add Intel IOMMU emulation to q35 chipset and expose it to the guest.
1. Add a machine option. Users can use "-machine vtd=on|off" in the command
line to enable/disable Intel IOMMU. The default is off.
2. According to the machine option, q35 will initialize the Intel IOMMU and
use pci_setup_iommu() to setup q35_host_dma_iommu() as the IOMMU function for
the pci bus.
3. q35_host_dma_iommu() will return different address space according to the
bus_num and devfn of the device.

Signed-off-by: Le Tan <tamlokveer@gmail.com>
---
 hw/core/machine.c         | 27 ++++++++++++++++--
 hw/pci-host/q35.c         | 72 +++++++++++++++++++++++++++++++++++++++++++----
 include/hw/boards.h       |  1 +
 include/hw/pci-host/q35.h |  2 ++
 qemu-options.hx           |  5 +++-
 vl.c                      |  4 +++
 6 files changed, 102 insertions(+), 9 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index cbba679..1be9ef2 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -235,6 +235,20 @@ static void machine_set_firmware(Object *obj, const char *value, Error **errp)
     ms->firmware = g_strdup(value);
 }
 
+static bool machine_get_vtd(Object *obj, Error **errp)
+{
+    MachineState *ms = MACHINE(obj);
+
+    return ms->vtd;
+}
+
+static void machine_set_vtd(Object *obj, bool value, Error **errp)
+{
+    MachineState *ms = MACHINE(obj);
+
+    ms->vtd = value;
+}
+
 static void machine_initfn(Object *obj)
 {
     object_property_add_str(obj, "accel",
@@ -270,10 +284,17 @@ static void machine_initfn(Object *obj)
                              machine_set_dump_guest_core,
                              NULL);
     object_property_add_bool(obj, "mem-merge",
-                             machine_get_mem_merge, machine_set_mem_merge, NULL);
-    object_property_add_bool(obj, "usb", machine_get_usb, machine_set_usb, NULL);
+                             machine_get_mem_merge,
+                             machine_set_mem_merge, NULL);
+    object_property_add_bool(obj, "usb",
+                             machine_get_usb,
+                             machine_set_usb, NULL);
     object_property_add_str(obj, "firmware",
-                            machine_get_firmware, machine_set_firmware, NULL);
+                            machine_get_firmware,
+                            machine_set_firmware, NULL);
+    object_property_add_bool(obj, "vtd",
+                             machine_get_vtd,
+                             machine_set_vtd, NULL);
 }
 
 static void machine_finalize(Object *obj)
diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index a0a3068..933f76e 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -347,6 +347,61 @@ static void mch_reset(DeviceState *qdev)
     mch_update(mch);
 }
 
+static AddressSpace *q35_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
+{
+    IntelIOMMUState *s = opaque;
+    VTDAddressSpace **pvtd_as;
+    VTDAddressSpace *vtd_as;
+    int bus_num = pci_bus_num(bus);
+
+    assert(devfn >= 0);
+
+    pvtd_as = s->address_spaces[bus_num];
+    if (!pvtd_as) {
+        /* No corresponding free() */
+        pvtd_as = g_malloc0(sizeof(VTDAddressSpace *) *
+                            VTD_PCI_SLOT_MAX * VTD_PCI_FUNC_MAX);
+        s->address_spaces[bus_num] = pvtd_as;
+    }
+
+    vtd_as = *(pvtd_as + devfn);
+    if (!vtd_as) {
+        vtd_as = g_malloc0(sizeof(*vtd_as));
+        *(pvtd_as + devfn) = vtd_as;
+
+        vtd_as->bus_num = bus_num;
+        vtd_as->devfn = devfn;
+        vtd_as->iommu_state = s;
+        memory_region_init_iommu(&vtd_as->iommu, OBJECT(s), &s->iommu_ops,
+                                 "intel_iommu", UINT64_MAX);
+        address_space_init(&vtd_as->as, &vtd_as->iommu, "intel_iommu");
+    }
+
+    return &vtd_as->as;
+}
+
+static void mch_init_dmar(MCHPCIState *mch)
+{
+    Error *error = NULL;
+    PCIBus *pci_bus = PCI_BUS(qdev_get_parent_bus(DEVICE(mch)));
+
+    mch->iommu = INTEL_IOMMU_DEVICE(object_new(TYPE_INTEL_IOMMU_DEVICE));
+    qdev_set_parent_bus(DEVICE(mch->iommu), sysbus_get_default());
+    object_property_set_bool(OBJECT(mch->iommu), true, "realized", &error);
+
+    if (error) {
+        fprintf(stderr, "%s\n", error_get_pretty(error));
+        error_free(error);
+        return;
+    }
+
+    memory_region_add_subregion(mch->pci_address_space,
+                                Q35_HOST_BRIDGE_IOMMU_ADDR,
+                                &mch->iommu->csrmem);
+    pci_setup_iommu(pci_bus, q35_host_dma_iommu, mch->iommu);
+}
+
+
 static int mch_init(PCIDevice *d)
 {
     int i;
@@ -363,13 +418,20 @@ static int mch_init(PCIDevice *d)
     memory_region_add_subregion_overlap(mch->system_memory, 0xa0000,
                                         &mch->smram_region, 1);
     memory_region_set_enabled(&mch->smram_region, false);
-    init_pam(DEVICE(mch), mch->ram_memory, mch->system_memory, mch->pci_address_space,
-             &mch->pam_regions[0], PAM_BIOS_BASE, PAM_BIOS_SIZE);
+    init_pam(DEVICE(mch), mch->ram_memory, mch->system_memory,
+             mch->pci_address_space, &mch->pam_regions[0], PAM_BIOS_BASE,
+             PAM_BIOS_SIZE);
     for (i = 0; i < 12; ++i) {
-        init_pam(DEVICE(mch), mch->ram_memory, mch->system_memory, mch->pci_address_space,
-                 &mch->pam_regions[i+1], PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE,
-                 PAM_EXPAN_SIZE);
+        init_pam(DEVICE(mch), mch->ram_memory, mch->system_memory,
+                 mch->pci_address_space, &mch->pam_regions[i+1],
+                 PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE, PAM_EXPAN_SIZE);
+    }
+
+    /* Intel IOMMU (VT-d) */
+    if (qemu_opt_get_bool(qemu_get_machine_opts(), "vtd", false)) {
+        mch_init_dmar(mch);
     }
+
     return 0;
 }
 
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 605a970..1c03566 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -123,6 +123,7 @@ struct MachineState {
     bool mem_merge;
     bool usb;
     char *firmware;
+    bool vtd;
 
     ram_addr_t ram_size;
     ram_addr_t maxram_size;
diff --git a/include/hw/pci-host/q35.h b/include/hw/pci-host/q35.h
index d9ee978..025d6e6 100644
--- a/include/hw/pci-host/q35.h
+++ b/include/hw/pci-host/q35.h
@@ -33,6 +33,7 @@
 #include "hw/acpi/acpi.h"
 #include "hw/acpi/ich9.h"
 #include "hw/pci-host/pam.h"
+#include "hw/i386/intel_iommu.h"
 
 #define TYPE_Q35_HOST_DEVICE "q35-pcihost"
 #define Q35_HOST_DEVICE(obj) \
@@ -60,6 +61,7 @@ typedef struct MCHPCIState {
     uint64_t pci_hole64_size;
     PcGuestInfo *guest_info;
     uint32_t short_root_bus;
+    IntelIOMMUState *iommu;
 } MCHPCIState;
 
 typedef struct Q35PCIHost {
diff --git a/qemu-options.hx b/qemu-options.hx
index 9e54686..cd54d30 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -35,7 +35,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
     "                kernel_irqchip=on|off controls accelerated irqchip support\n"
     "                kvm_shadow_mem=size of KVM shadow MMU\n"
     "                dump-guest-core=on|off include guest memory in a core dump (default=on)\n"
-    "                mem-merge=on|off controls memory merge support (default: on)\n",
+    "                mem-merge=on|off controls memory merge support (default: on)\n"
+    "                vtd=on|off controls emulated Intel IOMMU (VT-d) support (default=off)\n",
     QEMU_ARCH_ALL)
 STEXI
 @item -machine [type=]@var{name}[,prop=@var{value}[,...]]
@@ -58,6 +59,8 @@ Include guest memory in a core dump. The default is on.
 Enables or disables memory merge support. This feature, when supported by
 the host, de-duplicates identical memory pages among VMs instances
 (enabled by default).
+@item vtd=on|off
+Enables or disables emulated Intel IOMMU (VT-d) support. The default is off.
 @end table
 ETEXI
 
diff --git a/vl.c b/vl.c
index 6abedcf..7031873 100644
--- a/vl.c
+++ b/vl.c
@@ -387,6 +387,10 @@ static QemuOptsList qemu_machine_opts = {
             .name = PC_MACHINE_MAX_RAM_BELOW_4G,
             .type = QEMU_OPT_SIZE,
             .help = "maximum ram below the 4G boundary (32bit boundary)",
+        },{
+            .name = "vtd",
+            .type = QEMU_OPT_BOOL,
+            .help = "Set on/off to enable/disable Intel IOMMU (VT-d)",
         },
         { /* End of list */ }
     },
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation
  2014-07-22 15:47 ` [Qemu-devel] [PATCH 1/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation Le Tan
@ 2014-07-22 20:05   ` Michael S. Tsirkin
  2014-07-23  1:25     ` Le Tan
  2014-07-23  7:58   ` Paolo Bonzini
  2014-07-23 20:29   ` Stefan Weil
  2 siblings, 1 reply; 12+ messages in thread
From: Michael S. Tsirkin @ 2014-07-22 20:05 UTC (permalink / raw)
  To: Le Tan
  Cc: Knut Omang, qemu-devel, Alex Williamson, Jan Kiszka,
	Anthony Liguori, Paolo Bonzini

On Tue, Jul 22, 2014 at 11:47:48PM +0800, Le Tan wrote:
> Add support for emulating Intel IOMMU according to the VT-d specification for
> the q35 chipset machine. Implement the logic for DMAR (DMA remapping) without
> PASID support. Use register-based invalidation for context-cache invalidation
> and IOTLB invalidation.
> Basic fault reporting and caching are not implemented yet.
> 
> Signed-off-by: Le Tan <tamlokveer@gmail.com>
> ---
>  hw/i386/Makefile.objs         |    1 +
>  hw/i386/intel_iommu.c         | 1139 +++++++++++++++++++++++++++++++++++++++++
>  include/hw/i386/intel_iommu.h |  350 +++++++++++++
>  3 files changed, 1490 insertions(+)
>  create mode 100644 hw/i386/intel_iommu.c
>  create mode 100644 include/hw/i386/intel_iommu.h
> 
> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
> index 48014ab..6936111 100644
> --- a/hw/i386/Makefile.objs
> +++ b/hw/i386/Makefile.objs
> @@ -2,6 +2,7 @@ obj-$(CONFIG_KVM) += kvm/
>  obj-y += multiboot.o smbios.o
>  obj-y += pc.o pc_piix.o pc_q35.o
>  obj-y += pc_sysfw.o
> +obj-y += intel_iommu.o
>  obj-$(CONFIG_XEN) += ../xenpv/ xen/
>  
>  obj-y += kvmvapic.o
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> new file mode 100644
> index 0000000..3ba0e1e
> --- /dev/null
> +++ b/hw/i386/intel_iommu.c
> @@ -0,0 +1,1139 @@
> +/*
> + * QEMU emulation of an Intel IOMMU (VT-d)
> + *   (DMA Remapping device)
> + *
> + * Copyright (c) 2013 Knut Omang, Oracle <knut.omang@oracle.com>
> + * Copyright (C) 2014 Le Tan, <tamlokveer@gmail.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> + */
> +
> +#include "hw/sysbus.h"
> +#include "exec/address-spaces.h"
> +#include "hw/i386/intel_iommu.h"
> +
> +/* #define DEBUG_INTEL_IOMMU */
> +#ifdef DEBUG_INTEL_IOMMU
> +#define D(fmt, ...) \
> +    do { fprintf(stderr, "(vtd)%s: " fmt "\n", __func__, \
> +                 ## __VA_ARGS__); } while (0)
> +#else
> +#define D(fmt, ...) \
> +    do { } while (0)
> +#endif
> +


Way too short for a macro name; it might conflict with some
header you include, and you are polluting the global namespace.
Best to prefix everything with INTEL_IOMMU_ and intel_iommu_.
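
Concretely, something like the following would do (a self-contained sketch; VTD_DPRINTF is an illustrative name, not an existing QEMU macro):

```c
#include <stdio.h>

/* Sketch of the suggestion above: give the debug macro an intel_iommu/VTD
 * prefix instead of the global one-letter D(). VTD_DPRINTF is an
 * illustrative name, not an existing QEMU macro. */
/* #define DEBUG_INTEL_IOMMU */
#ifdef DEBUG_INTEL_IOMMU
#define VTD_DPRINTF(fmt, ...) \
    do { fprintf(stderr, "(vtd)%s: " fmt "\n", __func__, \
                 ## __VA_ARGS__); } while (0)
#else
#define VTD_DPRINTF(fmt, ...) do { } while (0)
#endif

static int vtd_demo(void)
{
    VTD_DPRINTF("addr 0x%x", 0x10);  /* expands to a no-op here */
    return 42;
}
```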


> +
> +static inline void define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
> +                        uint64_t wmask, uint64_t w1cmask)
> +{
> +    *((uint64_t *)&s->csr[addr]) = val;
> +    *((uint64_t *)&s->wmask[addr]) = wmask;
> +    *((uint64_t *)&s->w1cmask[addr]) = w1cmask;
> +}
> +
> +static inline void define_quad_wo(IntelIOMMUState *s, hwaddr addr,
> +                                  uint64_t mask)
> +{
> +    *((uint64_t *)&s->womask[addr]) = mask;
> +}
> +
> +static inline void define_long(IntelIOMMUState *s, hwaddr addr, uint32_t val,
> +                        uint32_t wmask, uint32_t w1cmask)
> +{
> +    *((uint32_t *)&s->csr[addr]) = val;
> +    *((uint32_t *)&s->wmask[addr]) = wmask;
> +    *((uint32_t *)&s->w1cmask[addr]) = w1cmask;
> +}
> +
> +static inline void define_long_wo(IntelIOMMUState *s, hwaddr addr,
> +                                  uint32_t mask)
> +{
> +    *((uint32_t *)&s->womask[addr]) = mask;
> +}
> +
> +/* "External" get/set operations */
> +static inline void set_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val)
> +{
> +    uint64_t oldval = *((uint64_t *)&s->csr[addr]);
> +    uint64_t wmask = *((uint64_t *)&s->wmask[addr]);
> +    uint64_t w1cmask = *((uint64_t *)&s->w1cmask[addr]);
> +    *((uint64_t *)&s->csr[addr]) =
> +        ((oldval & ~wmask) | (val & wmask)) & ~(w1cmask & val);
> +}
> +
> +static inline void set_long(IntelIOMMUState *s, hwaddr addr, uint32_t val)
> +{
> +    uint32_t oldval = *((uint32_t *)&s->csr[addr]);
> +    uint32_t wmask = *((uint32_t *)&s->wmask[addr]);
> +    uint32_t w1cmask = *((uint32_t *)&s->w1cmask[addr]);
> +    *((uint32_t *)&s->csr[addr]) =
> +        ((oldval & ~wmask) | (val & wmask)) & ~(w1cmask & val);
> +}
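
For readers following along, the wmask/w1cmask rule above says: bits set in wmask take the software-written value, bits set in w1cmask clear when software writes 1 to them. Pulled out stand-alone (same expression as above), that is:

```c
#include <stdint.h>

/* Stand-alone copy of the CSR write rule used by set_long()/set_quad():
 * wmask bits are read-write, w1cmask bits are write-1-to-clear. */
static uint32_t csr_write32(uint32_t oldval, uint32_t val,
                            uint32_t wmask, uint32_t w1cmask)
{
    return ((oldval & ~wmask) | (val & wmask)) & ~(w1cmask & val);
}
```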
> +
> +static inline uint64_t get_quad(IntelIOMMUState *s, hwaddr addr)
> +{
> +    uint64_t val = *((uint64_t *)&s->csr[addr]);
> +    uint64_t womask = *((uint64_t *)&s->womask[addr]);
> +    return val & ~womask;
> +}
> +
> +
> +static inline uint32_t get_long(IntelIOMMUState *s, hwaddr addr)
> +{
> +    uint32_t val = *((uint32_t *)&s->csr[addr]);
> +    uint32_t womask = *((uint32_t *)&s->womask[addr]);
> +    return val & ~womask;
> +}
> +
> +
> +
> +/* "Internal" get/set operations */
> +static inline uint64_t __get_quad(IntelIOMMUState *s, hwaddr addr)
> +{
> +    return *((uint64_t *)&s->csr[addr]);
> +}
> +
> +static inline uint32_t __get_long(IntelIOMMUState *s, hwaddr addr)
> +{
> +    return *((uint32_t *)&s->csr[addr]);
> +}

We don't allow names starting with __ in QEMU.

> +
> +
> +/* val = (val & ~clear) | mask */
what does this comment mean? best to remove it imho

> +static inline uint32_t set_mask_long(IntelIOMMUState *s, hwaddr addr,
> +                                     uint32_t clear, uint32_t mask)
> +{
> +    uint32_t *ptr = (uint32_t *)&s->csr[addr];
> +    uint32_t val = (*ptr & ~clear) | mask;
> +    *ptr = val;
> +    return val;
> +}
> +
> +/* val = (val & ~clear) | mask */

what does this comment mean? best to remove it imho

> +static inline uint64_t set_mask_quad(IntelIOMMUState *s, hwaddr addr,
> +                                     uint64_t clear, uint64_t mask)
> +{
> +    uint64_t *ptr = (uint64_t *)&s->csr[addr];
> +    uint64_t val = (*ptr & ~clear) | mask;
> +    *ptr = val;
> +    return val;
> +}

The above looks suspicious endianness-wise.
Use the proper accessor APIs to fix this; that will also
remove the need for so many casts.

I generally don't believe these wrappers buy you much.
You could just open-code them and it would be clearer.
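
E.g. byte-wise little-endian accessors in the spirit of QEMU's ldq_le_p()/stq_le_p() would work on any host; a self-contained sketch of the idea (illustrative names, not the actual QEMU helpers):

```c
#include <stdint.h>

/* Byte-wise little-endian 64-bit accessors: endian-safe on any host and
 * free of the aliasing casts above. Illustrative names, not QEMU's own. */
static uint64_t le64_read(const uint8_t *csr, unsigned addr)
{
    uint64_t val = 0;
    int i;

    for (i = 0; i < 8; i++) {
        val |= (uint64_t)csr[addr + i] << (8 * i);
    }
    return val;
}

static void le64_write(uint8_t *csr, unsigned addr, uint64_t val)
{
    int i;

    for (i = 0; i < 8; i++) {
        csr[addr + i] = (uint8_t)(val >> (8 * i));
    }
}
```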


> +
> +
> +
> +
> +

Don't put more than 1 empty line between functions.

> +static inline bool root_entry_present(vtd_root_entry *root)
> +{
> +    return root->val & ROOT_ENTRY_P;
> +}
> +
> +
> +static bool get_root_entry(IntelIOMMUState *s, int index, vtd_root_entry *re)
> +{
> +    dma_addr_t addr;
> +
> +    assert(index >= 0 && index < ROOT_ENTRY_NR);
> +
> +    addr = s->root + index * sizeof(*re);
> +    if (dma_memory_read(&address_space_memory, addr, re, sizeof(*re))) {
> +        fprintf(stderr, "(vtd)error: failed to read root table\n");

This will flood the log that management tools read.
Can the guest trigger this?
When?
Maybe just assert?
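
If the guest can trigger it, an assert would be wrong; one pattern is to report such guest-triggerable failures at most once (a self-contained sketch with a made-up helper name -- in QEMU itself something like qemu_log_mask(LOG_GUEST_ERROR, ...) would be the usual choice):

```c
#include <stdio.h>

/* Guest-triggerable failures should not be able to flood the host log;
 * report at most once. Illustrative helper, not an existing QEMU API.
 * Returns 1 if it printed, 0 if it stayed silent. */
static int vtd_read_error_reported;

static int vtd_report_read_error(const char *what)
{
    if (vtd_read_error_reported) {
        return 0;                /* already complained once */
    }
    vtd_read_error_reported = 1;
    fprintf(stderr, "(vtd) error: failed to read %s\n", what);
    return 1;
}
```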

> +        return false;
> +    }
> +    re->val = le64_to_cpu(re->val);
> +    return true;
> +}
> +
> +
> +static inline bool context_entry_present(vtd_context_entry *context)
> +{
> +    return context->lo & CONTEXT_ENTRY_P;
> +}
> +
> +static bool get_context_entry_from_root(vtd_root_entry *root, int index,
> +                                        vtd_context_entry *ce)
> +{
> +    dma_addr_t addr;
> +
> +    if (!root_entry_present(root)) {
> +        ce->lo = 0;
> +        ce->hi = 0;
> +        return false;
> +    }
> +
> +    assert(index >= 0 && index < CONTEXT_ENTRY_NR);
> +
> +    addr = (root->val & ROOT_ENTRY_CTP) + index * sizeof(*ce);
> +    if (dma_memory_read(&address_space_memory, addr, ce, sizeof(*ce))) {
> +        fprintf(stderr, "(vtd)error: failed to read context table\n");
> +        return false;
> +    }
> +    ce->lo = le64_to_cpu(ce->lo);
> +    ce->hi = le64_to_cpu(ce->hi);
> +    return true;
> +}
> +
> +static inline dma_addr_t get_slpt_base_from_context(vtd_context_entry *ce)
> +{
> +    return ce->lo & CONTEXT_ENTRY_SLPTPTR;
> +}
> +
> +
> +/* The shift of an addr for a certain level of paging structure */
> +static inline int slpt_level_shift(int level)
> +{
> +    return VTD_PAGE_SHIFT_4K + (level - 1) * SL_LEVEL_BITS;
> +}
> +
> +static inline bool slpte_present(uint64_t slpte)
> +{
> +    return slpte & 3;
> +}
> +
> +/* Calculate the GPA given the base address, the index in the page table and
> + * the level of this page table.
> + */
> +static inline uint64_t get_slpt_gpa(uint64_t addr, int index, int level)
> +{
> +    return addr + (((uint64_t)index) << slpt_level_shift(level));
> +}
> +
> +static inline uint64_t get_slpte_addr(uint64_t slpte)
> +{
> +    return slpte & SL_PT_BASE_ADDR_MASK;
> +}
> +
> +/* Whether the pte points to a large page */
> +static inline bool is_large_pte(uint64_t pte)
> +{
> +    return pte & SL_PT_PAGE_SIZE_MASK;
> +}
> +
> +/* Whether the pte indicates the address of the page frame */
> +static inline bool is_last_slpte(uint64_t slpte, int level)
> +{
> +    if (level == SL_PT_LEVEL) {
> +        return true;
> +    }
> +    if (is_large_pte(slpte)) {
> +        return true;
> +    }
> +    return false;
> +}
> +
> +/* Get the content of an slpte located at @base_addr[@index] */
> +static inline uint64_t get_slpte(dma_addr_t base_addr, int index)
> +{
> +    uint64_t slpte;
> +
> +    assert(index >= 0 && index < SL_PT_ENTRY_NR);
> +
> +    if (dma_memory_read(&address_space_memory,
> +                        base_addr + index * sizeof(slpte), &slpte,
> +                        sizeof(slpte))) {
> +        fprintf(stderr, "(vtd)error: failed to read slpte\n");
> +        return (uint64_t)-1;
> +    }
> +    slpte = le64_to_cpu(slpte);
> +    return slpte;
> +}
> +
> +#if 0

drop dead code or make it easy to enable.
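
One way to "make it easy to enable" without #if 0 is the if (CONSTANT) pattern, which keeps the debug code compiling at all times; a minimal sketch (names illustrative):

```c
/* Alternative to #if 0 that keeps debug-only code compiling: make the
 * knob a constant expression so the compiler type-checks and then
 * discards the dead branch. Names are illustrative. */
#ifndef DEBUG_INTEL_IOMMU
#define DEBUG_INTEL_IOMMU 0
#endif

static int dump_calls;

static void vtd_print_slpt_stub(void)
{
    dump_calls++;                /* stands in for the real dump loop */
}

static int vtd_maybe_dump(void)
{
    if (DEBUG_INTEL_IOMMU) {     /* always compiled, only runs if enabled */
        vtd_print_slpt_stub();
    }
    return dump_calls;
}
```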

> +static inline void print_slpt(dma_addr_t base_addr)
> +{
> +    int i;
> +    uint64_t slpte;
> +
> +    D("slpt at addr 0x%"PRIx64 "===========", base_addr);
> +    for (i = 0; i < SL_PT_ENTRY_NR; i++) {
> +        slpte = get_slpte(base_addr, i);
> +        D("slpte #%d 0x%"PRIx64, i, slpte);
> +    }
> +}
> +
> +static void print_root_table(IntelIOMMUState *s)
> +{
> +    int i;
> +    vtd_root_entry re;
> +    D("root-table=====================");
> +    for (i = 0; i < ROOT_ENTRY_NR; ++i) {
> +        get_root_entry(s, i, &re);
> +        if (root_entry_present(&re)) {
> +            D("root-entry #%d hi 0x%"PRIx64 " low 0x%"PRIx64, i, re.rsvd,
> +              re.val);
> +        }
> +    }
> +}
> +
> +static void print_context_table(vtd_root_entry *re)
> +{
> +    int i;
> +    vtd_context_entry ce;
> +    D("context-table==================");
> +    for (i = 0; i < CONTEXT_ENTRY_NR; ++i) {
> +        get_context_entry_from_root(re, i, &ce);
> +        if (context_entry_present(&ce)) {
> +            D("context-entry #%d hi 0x%"PRIx64 " low 0x%"PRIx64, i, ce.hi,
> +              ce.lo);
> +        }
> +    }
> +}
> +#endif
> +
> +/* Given a GPA and the level of the paging structure, return the offset for
> + * the current level.
> + */
> +static inline int gpa_level_offset(uint64_t gpa, int level)
> +{
> +    return (gpa >> slpt_level_shift(level)) & ((1ULL << SL_LEVEL_BITS) - 1);
> +}
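
For what it's worth, the level arithmetic is easy to sanity-check in isolation. With VTD_PAGE_SHIFT_4K = 12 and SL_LEVEL_BITS = 9 (4K pages, 512-entry tables -- an assumption based on the constant names), it reduces to:

```c
#include <stdint.h>

/* Stand-alone copy of the shift/offset arithmetic above, with the
 * constants inlined: 4K pages (shift 12) and 9 index bits per level. */
static int slpt_level_shift(int level)
{
    return 12 + (level - 1) * 9;
}

static int gpa_level_offset(uint64_t gpa, int level)
{
    return (gpa >> slpt_level_shift(level)) & ((1ULL << 9) - 1);
}
```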
> +
> +/* Get the page-table level that hardware should use for the second-level
> + * page-table walk from the Address Width field of context-entry.
> + */
> +static inline int get_level_from_context_entry(vtd_context_entry *ce)
> +{
> +    return 2 + (ce->hi & CONTEXT_ENTRY_AW);
> +}
> +
> +/* Given the @gpa, return the relevant slpte. @slpte_level will be the last
> + * level of the translation; it can be used to decide the size of a large page.
> + */
> +static uint64_t gpa_to_slpte(vtd_context_entry *ce, uint64_t gpa,
> +                             int *slpte_level)
> +{
> +    dma_addr_t addr = get_slpt_base_from_context(ce);
> +    int level = get_level_from_context_entry(ce);
> +    int offset;
> +    uint64_t slpte;
> +
> +    /* D("slpt_base 0x%"PRIx64, addr); */

drop comments like this or uncomment.

> +    while (true) {
> +        offset = gpa_level_offset(gpa, level);
> +        slpte = get_slpte(addr, offset);
> +        /* D("level %d slpte 0x%"PRIx64, level, slpte); */
> +        if (!slpte_present(slpte)) {
> +            D("error: slpte 0x%"PRIx64 " is not present", slpte);
> +            slpte = (uint64_t)-1;
> +            *slpte_level = level;
> +            break;
> +        }
> +        if (is_last_slpte(slpte, level)) {
> +            *slpte_level = level;
> +            break;
> +        }
> +        addr = get_slpte_addr(slpte);
> +        level--;
> +    }
> +    return slpte;
> +}
> +
> +/* Do a paging-structures walk to do an IOMMU translation
> + * @bus_num: The bus number
> + * @devfn: The devfn, i.e. the combined device and function number
> + * @entry: IOMMUTLBEntry that contains the addr to be translated and the result
> + */
> +static void iommu_translate(IntelIOMMUState *s, int bus_num, int devfn,
> +                            hwaddr addr, IOMMUTLBEntry *entry)
> +{
> +    vtd_root_entry re;
> +    vtd_context_entry ce;
> +    uint64_t slpte;
> +    int level;
> +    uint64_t page_mask = VTD_PAGE_MASK_4K;
> +
> +
> +    if (!get_root_entry(s, bus_num, &re)) {
> +        /* FIXME */
> +        return;
> +    }
> +    if (!root_entry_present(&re)) {
> +        /* FIXME */
> +        D("error: root-entry #%d is not present", bus_num);
> +        return;
> +    }
> +    /* D("root-entry low 0x%"PRIx64, re.val); */
> +    if (!get_context_entry_from_root(&re, devfn, &ce)) {
> +        /* FIXME */
> +        return;
> +    }
> +    if (!context_entry_present(&ce)) {
> +        /* FIXME */
> +        D("error: context-entry #%d(bus #%d) is not present", devfn, bus_num);
> +        return;
> +    }
> +    /* D("context-entry hi 0x%"PRIx64 " low 0x%"PRIx64, ce.hi, ce.lo); */
> +    slpte = gpa_to_slpte(&ce, addr, &level);
> +    if (slpte == (uint64_t)-1) {
> +        /* FIXME */
> +        D("error: can't get slpte for gpa %"PRIx64, addr);
> +        return;
> +    }

What's the plan for these FIXMEs?

> +
> +    if (is_large_pte(slpte)) {
> +        if (level == SL_PDP_LEVEL) {
> +            /* 1-GB page */
> +            page_mask = VTD_PAGE_MASK_1G;
> +        } else {
> +            /* 2-MB page */
> +            page_mask = VTD_PAGE_MASK_2M;
> +        }
> +    }
> +
> +    entry->iova = addr & page_mask;
> +    entry->translated_addr = get_slpte_addr(slpte) & page_mask;
> +    entry->addr_mask = ~page_mask;
> +    entry->perm = IOMMU_RW;
> +}
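
The large-page handling above just widens the page mask at the PD/PDP levels; pulled out with the shifts inlined (assuming PT=1, PD=2, PDP=3 for the level numbering), that is:

```c
#include <stdint.h>

/* Page mask selection as in iommu_translate() above, with the level
 * constants inlined (PT=1, PD=2, PDP=3 -- an assumption here). */
static uint64_t vtd_page_mask(int level, int large_pte)
{
    if (large_pte && level == 3) {
        return ~((1ULL << 30) - 1);   /* 1-GB page */
    }
    if (large_pte && level == 2) {
        return ~((1ULL << 21) - 1);   /* 2-MB page */
    }
    return ~((1ULL << 12) - 1);       /* 4-KB page */
}
```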
> +
> +#if 0
> +/* Iterate a Second Level Page Table */
> +static void __walk_slpt(dma_addr_t table_addr, int level, uint64_t gpa)
> +{
> +    int index;
> +    uint64_t next_gpa;
> +    dma_addr_t next_table_addr;
> +    uint64_t slpte;
> +
> +    if (level < SL_PT_LEVEL) {
> +        return;
> +    }
> +
> +
> +    for (index = 0; index < SL_PT_ENTRY_NR; ++index) {
> +        slpte = get_slpte(table_addr, index);
> +        if (!slpte_present(slpte)) {
> +            continue;
> +        }
> +        D("level %d, index 0x%x, gpa 0x%"PRIx64, level, index, gpa);
> +        next_gpa = get_slpt_gpa(gpa, index, level);
> +        next_table_addr = get_slpte_addr(slpte);
> +
> +        if (is_last_slpte(slpte, level)) {
> +            D("slpte gpa 0x%"PRIx64 ", hpa 0x%"PRIx64, next_gpa,
> +              next_table_addr);
> +            if (next_gpa != next_table_addr) {
> +                D("Not 1:1 mapping, slpte 0x%"PRIx64, slpte);
> +            }
> +        } else {
> +            __walk_slpt(next_table_addr, level - 1, next_gpa);
> +        }
> +    }
> +}
> +
> +
> +static void print_paging_structure_from_context(vtd_context_entry *ce)
> +{
> +    dma_addr_t table_addr;
> +
> +    table_addr = get_slpt_base_from_context(ce);
> +    __walk_slpt(table_addr, SL_PDP_LEVEL, 0);
> +}
> +
> +
> +static void print_root_table_all(IntelIOMMUState *s)
> +{
> +    int i;
> +    int j;
> +    vtd_root_entry re;
> +    vtd_context_entry ce;
> +
> +    for (i = 0; i < ROOT_ENTRY_NR; ++i) {
> +        if (get_root_entry(s, i, &re) && root_entry_present(&re)) {
> +            D("root_entry 0x%x: hi 0x%"PRIx64 " low 0x%"PRIx64, i, re.rsvd,
> +              re.val);
> +
> +            for (j = 0; j < CONTEXT_ENTRY_NR; ++j) {
> +                if (get_context_entry_from_root(&re, j, &ce)
> +                    && context_entry_present(&ce)) {
> +                    D("context_entry 0x%x: hi 0x%"PRIx64 " low 0x%"PRIx64,
> +                      j, ce.hi, ce.lo);
> +                    D("--------------------------------");
> +                    print_paging_structure_from_context(&ce);
> +                }
> +            }
> +        }
> +    }
> +}
> +#endif
> +
> +
> +
> +
> +static void vtd_root_table_setup(IntelIOMMUState *s)
> +{
> +    s->root = *((uint64_t *)&s->csr[DMAR_RTADDR_REG]);
> +    s->extended = s->root & VTD_RTADDR_RTT;
> +    s->root &= ~0xfff;
> +    D("root_table addr 0x%"PRIx64 " %s", s->root,
> +      (s->extended ? "(Extended)" : ""));
> +}
> +
> +/* Context-cache invalidation
> + * Returns the Context Actual Invalidation Granularity.
> + * @val: the content of the CCMD_REG
> + */
> +static uint64_t vtd_context_cache_invalidate(IntelIOMMUState *s, uint64_t val)
> +{
> +    uint64_t caig;
> +    uint64_t type = val & VTD_CCMD_CIRG_MASK;
> +
> +    switch (type) {
> +    case VTD_CCMD_GLOBAL_INVL:
> +        D("Global invalidation request");
> +        caig = VTD_CCMD_GLOBAL_INVL_A;
> +        break;
> +
> +    case VTD_CCMD_DOMAIN_INVL:
> +        D("Domain-selective invalidation request");
> +        caig = VTD_CCMD_DOMAIN_INVL_A;
> +        break;
> +
> +    case VTD_CCMD_DEVICE_INVL:
> +        D("Device-selective invalidation request");
> +        caig = VTD_CCMD_DEVICE_INVL_A;
> +        break;
> +
> +    default:
> +        D("error: wrong context-cache invalidation granularity");
> +        caig = 0;
> +    }
> +
> +    return caig;
> +}
> +
> +
> +/* Flush IOTLB
> + * Returns the IOTLB Actual Invalidation Granularity.
> + * @val: the content of the IOTLB_REG
> + */
> +static uint64_t vtd_iotlb_flush(IntelIOMMUState *s, uint64_t val)
> +{
> +    uint64_t iaig;
> +    uint64_t type = val & VTD_TLB_FLUSH_GRANU_MASK;
> +
> +    switch (type) {
> +    case VTD_TLB_GLOBAL_FLUSH:
> +        D("Global IOTLB flush");
> +        iaig = VTD_TLB_GLOBAL_FLUSH_A;
> +        break;
> +
> +    case VTD_TLB_DSI_FLUSH:
> +        D("Domain-selective IOTLB flush");
> +        iaig = VTD_TLB_DSI_FLUSH_A;
> +        break;
> +
> +    case VTD_TLB_PSI_FLUSH:
> +        D("Page-selective-within-domain IOTLB flush");
> +        iaig = VTD_TLB_PSI_FLUSH_A;
> +        break;
> +
> +    default:
> +        D("error: wrong iotlb flush granularity");
> +        iaig = 0;
> +    }
> +
> +    return iaig;
> +}
> +
> +
> +#if 0
> +static void iommu_inv_queue_setup(IntelIOMMUState *s)
> +{
> +    uint64_t tail_val;
> +    s->iq = *((uint64_t *)&s->csr[DMAR_IQA_REG]);
> +    s->iq_sz = 0x100 << (s->iq & 0x7);  /* 256 entries per page */
> +    s->iq &= ~0x7;
> +    s->iq_enable = true;
> +
> +    /* Init head pointers */
> +    tail_val = *((uint64_t *)&s->csr[DMAR_IQT_REG]);
> +    *((uint64_t *)&s->csr[DMAR_IQH_REG]) = tail_val;
> +    s->iq_head = s->iq_tail = (tail_val >> 4) & 0x7fff;
> +    D(" -- address: 0x%lx size 0x%lx", s->iq, s->iq_sz);
> +}
> +
> +
> +static int handle_invalidate(IntelIOMMUState *s, uint16_t i)
> +{
> +    intel_iommu_inv_desc entry;
> +    uint8_t type;
> +    dma_memory_read(&address_space_memory, s->iq + sizeof(entry) * i, &entry,
> +                    sizeof(entry));
> +    type = entry.lower & 0xf;
> +    D("Processing invalidate request %d - desc: %016lx.%016lx", i,
> +      entry.upper, entry.lower);
> +    switch (type) {
> +    case CONTEXT_CACHE_INV_DESC:
> +        D("Context-cache Invalidate");
> +        break;
> +    case IOTLB_INV_DESC:
> +        D("IOTLB Invalidate");
> +        break;
> +    case INV_WAIT_DESC:
> +        D("Invalidate Wait");
> +        if (status_write(entry.lower)) {
> +            dma_memory_write(&address_space_memory, entry.upper,
> +                             (uint8_t *)&entry.lower + 4, 4);
> +        }
> +        break;
> +    default:
> +        D(" - not impl - ");
> +    }
> +    return 0;
> +}
> +
> +
> +static void handle_iqt_write(IntelIOMMUState *s, uint64_t val)
> +{
> +    s->iq_tail = (val >> 4) & 0x7fff;
> +    D("Write to IQT_REG new tail = %d", s->iq_tail);
> +
> +    if (!s->iq_enable) {
> +        return;
> +    }
> +
> +    /* Process the invalidation queue */
> +    while (s->iq_head != s->iq_tail) {
> +        handle_invalidate(s, s->iq_head++);
> +        if (s->iq_head == s->iq_sz) {
> +            s->iq_head = 0;
> +        }
> +    }
> +    *((uint64_t *)&s->csr[DMAR_IQH_REG]) = s->iq_head << 4;
> +
> +    set_quad(s, DMAR_IQT_REG, val);
> +}
> +#endif
> +
> +/* FIXME: Not implemented yet */
> +static void handle_gcmd_qie(IntelIOMMUState *s, bool en)
> +{
> +    D("Queued Invalidation Enable %s", (en ? "on" : "off"));
> +
> +    /*if (en) {
> +        iommu_inv_queue_setup(s);
> +    }*/
> +
> +    /* Ok - report back to driver */
> +    set_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_QIES);
> +}
> +
> +
> +/* Set Root Table Pointer */
> +static void handle_gcmd_srtp(IntelIOMMUState *s)
> +{
> +    D("set Root Table Pointer");
> +
> +    vtd_root_table_setup(s);
> +    /* Ok - report back to driver */
> +    set_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_RTPS);
> +}
> +
> +
> +/* Handle Translation Enable/Disable */
> +static void handle_gcmd_te(IntelIOMMUState *s, bool en)
> +{
> +    D("Translation Enable %s", (en ? "on" : "off"));
> +
> +    if (en) {
> +        /* Ok - report back to driver */
> +        set_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_TES);
> +    } else {
> +        /* Ok - report back to driver */
> +        set_mask_long(s, DMAR_GSTS_REG, VTD_GSTS_TES, 0);
> +    }
> +}
> +
> +/* Handle write to Global Command Register */
> +static void handle_gcmd_write(IntelIOMMUState *s)
> +{
> +    uint32_t status = __get_long(s, DMAR_GSTS_REG);
> +    uint32_t val = __get_long(s, DMAR_GCMD_REG);
> +    uint32_t changed = status ^ val;
> +
> +    D("value 0x%x status 0x%x", val, status);
> +    if (changed & VTD_GCMD_TE) {
> +        /* Translation enable/disable */
> +        handle_gcmd_te(s, val & VTD_GCMD_TE);
> +    } else if (val & VTD_GCMD_SRTP) {
> +        /* Set/update the root-table pointer */
> +        handle_gcmd_srtp(s);
> +    } else if (changed & VTD_GCMD_QIE) {
> +        /* Queued Invalidation Enable */
> +        handle_gcmd_qie(s, val & VTD_GCMD_QIE);
> +    } else {
> +        D("error: unhandled gcmd write");
> +    }
> +}
> +
> +/* Handle write to Context Command Register */
> +static void handle_ccmd_write(IntelIOMMUState *s)
> +{
> +    uint64_t ret;
> +    uint64_t val = __get_quad(s, DMAR_CCMD_REG);
> +
> +    /* Context-cache invalidation request */
> +    if (val & VTD_CCMD_ICC) {
> +        ret = vtd_context_cache_invalidate(s, val);
> +
> +        /* Invalidation completed: clear ICC and report the actual
> +         * invalidation granularity in CAIG */
> +        set_mask_quad(s, DMAR_CCMD_REG, VTD_CCMD_ICC, 0ULL);
> +        ret = set_mask_quad(s, DMAR_CCMD_REG, VTD_CCMD_CAIG_MASK, ret);
> +        D("CCMD_REG write-back val: 0x%"PRIx64, ret);
> +    }
> +}
> +
> +/* Handle write to IOTLB Invalidation Register */
> +static void handle_iotlb_write(IntelIOMMUState *s)
> +{
> +    uint64_t ret;
> +    uint64_t val = __get_quad(s, DMAR_IOTLB_REG);
> +
> +    /* IOTLB invalidation request */
> +    if (val & VTD_TLB_IVT) {
> +        ret = vtd_iotlb_flush(s, val);
> +
> +        /* Invalidation completed: clear IVT and report the actual
> +         * invalidation granularity in IAIG */
> +        set_mask_quad(s, DMAR_IOTLB_REG, VTD_TLB_IVT, 0ULL);
> +        ret = set_mask_quad(s, DMAR_IOTLB_REG, VTD_TLB_FLUSH_GRANU_MASK_A, ret);
> +        D("IOTLB_REG write-back val: 0x%"PRIx64, ret);
> +    }
> +}
> +
> +
> +static uint64_t vtd_mem_read(void *opaque, hwaddr addr, unsigned size)
> +{
> +    IntelIOMMUState *s = opaque;
> +    uint64_t val;
> +
> +    if (addr + size > DMAR_REG_SIZE) {
> +        D("error: addr outside region: max 0x%x, got 0x%"PRIx64 " %d",
> +          DMAR_REG_SIZE, addr, size);
> +        return (uint64_t)-1;
> +    }
> +
> +    assert(size == 4 || size == 8);
> +
> +    switch (addr) {
> +    /* Root Table Address Register, 64-bit */
> +    case DMAR_RTADDR_REG:
> +        if (size == 4) {
> +            val = (uint32_t)s->root;
> +        } else {
> +            val = s->root;
> +        }
> +        break;
> +
> +    case DMAR_RTADDR_REG_HI:
> +        assert(size == 4);
> +        val = s->root >> 32;
> +        break;
> +
> +    default:
> +        if (size == 4) {
> +            val = get_long(s, addr);
> +        } else {
> +            val = get_quad(s, addr);
> +        }
> +    }
> +
> +    D("addr 0x%"PRIx64 " size %d val 0x%"PRIx64, addr, size, val);
> +    return val;
> +}
> +
> +static void vtd_mem_write(void *opaque, hwaddr addr,
> +                          uint64_t val, unsigned size)
> +{
> +    IntelIOMMUState *s = opaque;
> +
> +    if (addr + size > DMAR_REG_SIZE) {
> +        D("error: addr outside region: max 0x%x, got 0x%"PRIx64 " %d",
> +          DMAR_REG_SIZE, addr, size);
> +        return;
> +    }
> +
> +    assert(size == 4 || size == 8);
> +
> +    /* Val should be written into csr within the handler */
> +    switch (addr) {
> +    /* Global Command Register, 32-bit */
> +    case DMAR_GCMD_REG:
> +        D("DMAR_GCMD_REG write addr 0x%"PRIx64 ", size %d, val 0x%"PRIx64,
> +          addr, size, val);
> +        set_long(s, addr, val);
> +        handle_gcmd_write(s);
> +        break;
> +
> +    /* Invalidation Queue Tail Register, 64-bit */
> +    /*case DMAR_IQT_REG:
> +        if (size == 4) {
> +
> +        }
> +        if (size == 4) {
> +            if (former_size == 0) {
> +                former_size = size;
> +                former_value = val;
> +                goto out;
> +            } else {
> +                val = (val << 32) + former_value;
> +                former_size = 0;
> +                former_value = 0;
> +            }
> +        }
> +        handle_iqt_write(s, val);
> +        break;*/
> +
> +    /* Context Command Register, 64-bit */
> +    case DMAR_CCMD_REG:
> +        D("DMAR_CCMD_REG write addr 0x%"PRIx64 ", size %d, val 0x%"PRIx64,
> +          addr, size, val);
> +        if (size == 4) {
> +            set_long(s, addr, val);
> +        } else {
> +            set_quad(s, addr, val);
> +            handle_ccmd_write(s);
> +        }
> +        break;
> +
> +    case DMAR_CCMD_REG_HI:
> +        D("DMAR_CCMD_REG_HI write addr 0x%"PRIx64 ", size %d, val 0x%"PRIx64,
> +          addr, size, val);
> +        assert(size == 4);
> +        set_long(s, addr, val);
> +        handle_ccmd_write(s);
> +        break;
> +
> +
> +    /* IOTLB Invalidation Register, 64-bit */
> +    case DMAR_IOTLB_REG:
> +        D("DMAR_IOTLB_REG write addr 0x%"PRIx64 ", size %d, val 0x%"PRIx64,
> +          addr, size, val);
> +        if (size == 4) {
> +            set_long(s, addr, val);
> +        } else {
> +            set_quad(s, addr, val);
> +            handle_iotlb_write(s);
> +        }
> +        break;
> +
> +    case DMAR_IOTLB_REG_HI:
> +        D("DMAR_IOTLB_REG_HI write addr 0x%"PRIx64 ", size %d, val 0x%"PRIx64,
> +          addr, size, val);
> +        assert(size == 4);
> +        set_long(s, addr, val);
> +        handle_iotlb_write(s);
> +        break;
> +
> +    /* Fault Status Register, 32-bit */
> +    case DMAR_FSTS_REG:
> +    /* Fault Event Data Register, 32-bit */
> +    case DMAR_FEDATA_REG:
> +    /* Fault Event Address Register, 32-bit */
> +    case DMAR_FEADDR_REG:
> +    /* Fault Event Upper Address Register, 32-bit */
> +    case DMAR_FEUADDR_REG:
> +    /* Fault Event Control Register, 32-bit */
> +    case DMAR_FECTL_REG:
> +    /* Protected Memory Enable Register, 32-bit */
> +    case DMAR_PMEN_REG:
> +        D("known reg write addr 0x%"PRIx64 ", size %d, val 0x%"PRIx64,
> +          addr, size, val);
> +        set_long(s, addr, val);
> +        break;
> +
> +
> +    /* Root Table Address Register, 64-bit */
> +    case DMAR_RTADDR_REG:
> +        D("DMAR_RTADDR_REG write addr 0x%"PRIx64 ", size %d, val 0x%"PRIx64,
> +          addr, size, val);
> +        if (size == 4) {
> +            set_long(s, addr, val);
> +        } else {
> +            set_quad(s, addr, val);
> +        }
> +        break;
> +
> +    case DMAR_RTADDR_REG_HI:
> +        D("DMAR_RTADDR_REG_HI write addr 0x%"PRIx64 ", size %d, val 0x%"PRIx64,
> +          addr, size, val);
> +        assert(size == 4);
> +        set_long(s, addr, val);
> +        break;
> +
> +    default:
> +        D("error: unhandled reg write addr 0x%"PRIx64
> +          ", size %d, val 0x%"PRIx64, addr, size, val);
> +        if (size == 4) {
> +            set_long(s, addr, val);
> +        } else {
> +            set_quad(s, addr, val);
> +        }
> +    }
> +
> +}
> +
> +
> +static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion *iommu, hwaddr addr)
> +{
> +    VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
> +    IntelIOMMUState *s = vtd_as->iommu_state;
> +    int bus_num = vtd_as->bus_num;
> +    int devfn = vtd_as->devfn;
> +    IOMMUTLBEntry ret = {
> +        .target_as = &address_space_memory,
> +        .iova = 0,
> +        .translated_addr = 0,
> +        .addr_mask = ~(hwaddr)0,
> +        .perm = IOMMU_NONE,
> +    };
> +
> +    if (!(__get_long(s, DMAR_GSTS_REG) & VTD_GSTS_TES)) {
> +        /* DMAR disabled, passthrough, use 4k-page*/
> +        ret.iova = addr & VTD_PAGE_MASK_4K;
> +        ret.translated_addr = addr & VTD_PAGE_MASK_4K;
> +        ret.addr_mask = ~VTD_PAGE_MASK_4K;
> +        ret.perm = IOMMU_RW;
> +        return ret;
> +    }
> +
> +    iommu_translate(s, bus_num, devfn, addr, &ret);
> +
> +    D("bus %d slot %d func %d devfn %d gpa %"PRIx64 " hpa %"PRIx64,
> +      bus_num, VTD_PCI_SLOT(devfn), VTD_PCI_FUNC(devfn), devfn, addr,
> +      ret.translated_addr);
> +    return ret;
> +}
> +
> +static const VMStateDescription vtd_vmstate = {
> +    .name = "iommu_intel",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT8_ARRAY(csr, IntelIOMMUState, DMAR_REG_SIZE),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +
> +static const MemoryRegionOps vtd_mem_ops = {
> +    .read = vtd_mem_read,
> +    .write = vtd_mem_write,
> +    .endianness = DEVICE_LITTLE_ENDIAN,
> +    .impl = {
> +        .min_access_size = 4,
> +        .max_access_size = 8,
> +    },
> +    .valid = {
> +        .min_access_size = 4,
> +        .max_access_size = 8,
> +    },
> +};
> +
> +
> +static Property iommu_properties[] = {
> +    DEFINE_PROP_UINT32("version", IntelIOMMUState, version, 0),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +/* Do the real initialization. It will also be called when reset, so pay
> + * attention when adding new initialization stuff.
> + */
> +static void do_vtd_init(IntelIOMMUState *s)
> +{
> +    memset(s->csr, 0, DMAR_REG_SIZE);
> +    memset(s->wmask, 0, DMAR_REG_SIZE);
> +    memset(s->w1cmask, 0, DMAR_REG_SIZE);
> +    memset(s->womask, 0, DMAR_REG_SIZE);
> +
> +    s->iommu_ops.translate = vtd_iommu_translate;
> +    s->root = 0;
> +    s->extended = false;
> +
> +    /* b.0:2 = 6: Number of domains supported: 64K using 16 bit ids
> +     * b.3   = 0: No advanced fault logging
> +     * b.4   = 0: No required write buffer flushing
> +     * b.5   = 0: Protected low memory region not supported
> +     * b.6   = 0: Protected high memory region not supported
> +     * b.8:12 = 2: SAGAW(Supported Adjusted Guest Address Widths), 39-bit,
> +     *             3-level page-table
> +     * b.16:21 = 38: MGAW(Maximum Guest Address Width) = 39
> +     * b.22 = 1: ZLR(Zero Length Read) supports zero length DMA read requests
> +     *           to write-only pages
> +     * b.24:33 = 34: FRO(Fault-recording Register offset)
> +     * b.54 = 0: DWD(Write Draining), draining of write requests not supported
> +     * b.55 = 0: DRD(Read Draining), draining of read requests not supported
> +     */
> +    const uint64_t dmar_cap_reg_value = 0x00400000ULL | VTD_CAP_FRO |
> +                                        VTD_CAP_NFR | VTD_CAP_ND |
> +                                        VTD_CAP_MGAW | VTD_SAGAW_39bit;
> +
> +    /* b.1 = 0: QI(Queued Invalidation support) not supported
> +     * b.2 = 0: DT(Device-TLB support)
> +     * b.3 = 0: IR(Interrupt Remapping support) not supported
> +     * b.4 = 0: EIM(Extended Interrupt Mode) not supported
> +     * b.8:17 = 15: IRO(IOTLB Register Offset)
> +     * b.20:23 = 15: MHMV(Maximum Handle Mask Value)
> +     */
> +    const uint64_t dmar_ecap_reg_value = 0xf00000ULL | VTD_ECAP_IRO;
> +
> +    /* Define registers with default values and bit semantics */
> +    define_long(s, DMAR_VER_REG, 0x10UL, 0, 0);  /* set MAX = 1, RO */
> +    define_quad(s, DMAR_CAP_REG, dmar_cap_reg_value, 0, 0);
> +    define_quad(s, DMAR_ECAP_REG, dmar_ecap_reg_value, 0, 0);
> +    define_long(s, DMAR_GCMD_REG, 0, 0xff800000UL, 0);
> +    define_long_wo(s, DMAR_GCMD_REG, 0xff800000UL);
> +    define_long(s, DMAR_GSTS_REG, 0, 0, 0); /* All bits RO, default 0 */
> +    define_quad(s, DMAR_RTADDR_REG, 0, 0xfffffffffffff000ULL, 0);
> +    define_quad(s, DMAR_CCMD_REG, 0, 0xe0000003ffffffffULL, 0);
> +    define_quad_wo(s, DMAR_CCMD_REG, 0x3ffff0000ULL);
> +    define_long(s, DMAR_FSTS_REG, 0, 0, 0xfdUL);
> +    define_long(s, DMAR_FECTL_REG, 0x80000000UL, 0x80000000UL, 0);
> +    define_long(s, DMAR_FEDATA_REG, 0, 0xffffffffUL, 0); /* All bits RW */
> +    define_long(s, DMAR_FEADDR_REG, 0, 0xfffffffcUL, 0); /* 31:2 RW */
> +    define_long(s, DMAR_FEUADDR_REG, 0, 0xffffffffUL, 0); /* 31:0 RW */
> +
> +    define_quad(s, DMAR_AFLOG_REG, 0, 0xffffffffffffff00ULL, 0);
> +
> +    /* Treated as RO for implementations that PLMR and PHMR fields reported
> +     * as Clear in the CAP_REG.
> +     * define_long(s, DMAR_PMEN_REG, 0, 0x80000000UL, 0);
> +     */
> +    define_long(s, DMAR_PMEN_REG, 0, 0, 0);
> +
> +    /* TBD: The definition of these are dynamic:
> +     * DMAR_PLMBASE_REG, DMAR_PLMLIMIT_REG, DMAR_PHMBASE_REG, DMAR_PHMLIMIT_REG
> +     */
> +
> +    /* Bits 18:4 (0x7fff0) is RO, rest is RsvdZ
> +     * IQH_REG is treated as RsvdZ when not supported in ECAP_REG
> +     * define_quad(s, DMAR_IQH_REG, 0, 0, 0);
> +     */
> +    define_quad(s, DMAR_IQH_REG, 0, 0, 0);
> +
> +    /* IQT_REG and IQA_REG is treated as RsvdZ when not supported in ECAP_REG
> +     * define_quad(s, DMAR_IQT_REG, 0, 0x7fff0ULL, 0);
> +     * define_quad(s, DMAR_IQA_REG, 0, 0xfffffffffffff007ULL, 0);
> +     */
> +    define_quad(s, DMAR_IQT_REG, 0, 0, 0);
> +    define_quad(s, DMAR_IQA_REG, 0, 0, 0);
> +
> +    /* Bit 0 is RW1CS - rest is RsvdZ */
> +    define_long(s, DMAR_ICS_REG, 0, 0, 0x1UL);
> +
> +    /* b.31 is RW, b.30 RO, rest: RsvdZ */
> +    define_long(s, DMAR_IECTL_REG, 0x80000000UL, 0x80000000UL, 0);
> +
> +    define_long(s, DMAR_IEDATA_REG, 0, 0xffffffffUL, 0);
> +    define_long(s, DMAR_IEADDR_REG, 0, 0xfffffffcUL, 0);
> +    define_long(s, DMAR_IEUADDR_REG, 0, 0xffffffffUL, 0);
> +    define_quad(s, DMAR_IRTA_REG, 0, 0xfffffffffffff80fULL, 0);
> +    define_quad(s, DMAR_PQH_REG, 0, 0x7fff0ULL, 0);
> +    define_quad(s, DMAR_PQT_REG, 0, 0x7fff0ULL, 0);
> +    define_quad(s, DMAR_PQA_REG, 0, 0xfffffffffffff007ULL, 0);
> +    define_long(s, DMAR_PRS_REG, 0, 0, 0x1UL);
> +    define_long(s, DMAR_PECTL_REG, 0x80000000UL, 0x80000000UL, 0);
> +    define_long(s, DMAR_PEDATA_REG, 0, 0xffffffffUL, 0);
> +    define_long(s, DMAR_PEADDR_REG, 0, 0xfffffffcUL, 0);
> +    define_long(s, DMAR_PEUADDR_REG, 0, 0xffffffffUL, 0);
> +
> +    /* When MTS not supported in ECAP_REG, these regs are RsvdZ */
> +    define_long(s, DMAR_MTRRCAP_REG, 0, 0, 0);
> +    define_long(s, DMAR_MTRRDEF_REG, 0, 0, 0);
> +
> +    /* IOTLB registers */
> +    define_quad(s, DMAR_IOTLB_REG, 0, 0Xb003ffff00000000ULL, 0);
> +    define_quad(s, DMAR_IVA_REG, 0, 0xfffffffffffff07fULL, 0);
> +    define_quad_wo(s, DMAR_IVA_REG, 0xfffffffffffff07fULL);
> +}
> +
> +#if 0
> +/* Iterate IntelIOMMUState->address_spaces[] and free any allocated memory */
> +static void clean_address_space(IntelIOMMUState *s)
> +{
> +    VTDAddressSpace **pvtd_as;
> +    VTDAddressSpace *vtd_as;
> +    int i;
> +    int j;
> +    const int MAX_DEVFN = VTD_PCI_SLOT_MAX * VTD_PCI_FUNC_MAX;
> +
> +    for (i = 0; i < VTD_PCI_BUS_MAX; ++i) {
> +        pvtd_as = s->address_spaces[i];
> +        if (!pvtd_as) {
> +            continue;
> +        }
> +        for (j = 0; j < MAX_DEVFN; ++j) {
> +            vtd_as = *(pvtd_as + j);
> +            if (!vtd_as) {
> +                continue;
> +            }
> +            g_free(vtd_as);
> +            *(pvtd_as + j) = 0;
> +        }
> +        g_free(pvtd_as);
> +        s->address_spaces[i] = 0;
> +    }
> +}
> +#endif
> +
> +/* Reset function of QOM
> + * Should not reset address_spaces when reset
> + */
> +static void vtd_reset(DeviceState *dev)
> +{
> +    IntelIOMMUState *s = INTEL_IOMMU_DEVICE(dev);
> +
> +    D("");
> +    do_vtd_init(s);
> +}
> +
> +/* Initializatoin function of QOM */

typo

> +static void vtd_realize(DeviceState *dev, Error **errp)
> +{
> +    IntelIOMMUState *s = INTEL_IOMMU_DEVICE(dev);
> +
> +    D("");
> +    memset(s->address_spaces, 0, sizeof(s->address_spaces));
> +    memory_region_init_io(&s->csrmem, OBJECT(s), &vtd_mem_ops, s,
> +                          "intel_iommu", DMAR_REG_SIZE);
> +
> +    do_vtd_init(s);
> +}
> +
> +static void vtd_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +    dc->reset = vtd_reset;
> +    dc->realize = vtd_realize;
> +    dc->vmsd = &vtd_vmstate;
> +    dc->props = iommu_properties;
> +}
> +
> +static const TypeInfo vtd_info = {
> +    .name          = TYPE_INTEL_IOMMU_DEVICE,
> +    .parent        = TYPE_SYS_BUS_DEVICE,
> +    .instance_size = sizeof(IntelIOMMUState),
> +    .class_init    = vtd_class_init,
> +};
> +
> +static void vtd_register_types(void)
> +{
> +    D("");
> +    type_register_static(&vtd_info);
> +}
> +
> +type_init(vtd_register_types)
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> new file mode 100644
> index 0000000..af5aff8
> --- /dev/null
> +++ b/include/hw/i386/intel_iommu.h


the only user seems to be hw/i386/intel_iommu.c,
so you can move this header to hw/i386/ - include/
is for external headers only.

> @@ -0,0 +1,350 @@
> +/*
> + * QEMU emulation of an Intel IOMMU (VT-d)
> + *   (DMA Remapping device)
> + *
> + * Copyright (c) 2013 Knut Omang, Oracle <knut.omang@oracle.com>
> + * Copyright (C) 2014 Le Tan, <tamlokveer@gmail.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> + *
> + * Lots of defines copied from kernel/include/linux/intel-iommu.h:
> + *   Copyright (C) 2006-2008 Intel Corporation
> + *   Author: Ashok Raj <ashok.raj@intel.com>
> + *   Author: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
> + *
> + */
> +
> +#ifndef _INTEL_IOMMU_H
> +#define _INTEL_IOMMU_H
> +#include "hw/qdev.h"
> +#include "sysemu/dma.h"
> +
> +#define TYPE_INTEL_IOMMU_DEVICE "intel-iommu"
> +#define INTEL_IOMMU_DEVICE(obj) \
> +     OBJECT_CHECK(IntelIOMMUState, (obj), TYPE_INTEL_IOMMU_DEVICE)
> +
> +/* DMAR Hardware Unit Definition address (IOMMU unit) set i bios */
> +#define Q35_HOST_BRIDGE_IOMMU_ADDR 0xfed90000ULL
> +
> +/*
> + * Intel IOMMU register specification per version 1.0 public spec.
> + */
> +
> +#define DMAR_VER_REG    0x0 /* Arch version supported by this IOMMU */
> +#define DMAR_CAP_REG    0x8 /* Hardware supported capabilities */
> +#define DMAR_CAP_REG_HI 0xc /* High 32-bit of DMAR_CAP_REG */
> +#define DMAR_ECAP_REG   0x10    /* Extended capabilities supported */
> +#define DMAR_ECAP_REG_HI    0X14
> +#define DMAR_GCMD_REG   0x18    /* Global command register */
> +#define DMAR_GSTS_REG   0x1c    /* Global status register */
> +#define DMAR_RTADDR_REG 0x20    /* Root entry table */
> +#define DMAR_RTADDR_REG_HI  0X24
> +#define DMAR_CCMD_REG   0x28  /* Context command reg */
> +#define DMAR_CCMD_REG_HI    0x2c
> +#define DMAR_FSTS_REG   0x34  /* Fault Status register */
> +#define DMAR_FECTL_REG  0x38 /* Fault control register */
> +#define DMAR_FEDATA_REG 0x3c    /* Fault event interrupt data register */
> +#define DMAR_FEADDR_REG 0x40    /* Fault event interrupt addr register */
> +#define DMAR_FEUADDR_REG    0x44   /* Upper address register */
> +#define DMAR_AFLOG_REG  0x58 /* Advanced Fault control */
> +#define DMAR_AFLOG_REG_HI   0X5c
> +#define DMAR_PMEN_REG   0x64  /* Enable Protected Memory Region */
> +#define DMAR_PLMBASE_REG    0x68    /* PMRR Low addr */
> +#define DMAR_PLMLIMIT_REG 0x6c  /* PMRR low limit */
> +#define DMAR_PHMBASE_REG 0x70   /* pmrr high base addr */
> +#define DMAR_PHMBASE_REG_HI 0X74
> +#define DMAR_PHMLIMIT_REG 0x78  /* pmrr high limit */
> +#define DMAR_PHMLIMIT_REG_HI 0x7c
> +#define DMAR_IQH_REG    0x80   /* Invalidation queue head register */
> +#define DMAR_IQH_REG_HI 0X84
> +#define DMAR_IQT_REG    0x88   /* Invalidation queue tail register */
> +#define DMAR_IQT_REG_HI 0X8c
> +#define DMAR_IQ_SHIFT   4 /* Invalidation queue head/tail shift */
> +#define DMAR_IQA_REG    0x90   /* Invalidation queue addr register */
> +#define DMAR_IQA_REG_HI 0x94
> +#define DMAR_ICS_REG    0x9c   /* Invalidation complete status register */
> +#define DMAR_IRTA_REG   0xb8    /* Interrupt remapping table addr register */
> +#define DMAR_IRTA_REG_HI    0xbc
> +
> +/* From Vt-d 2.2 spec */
> +#define DMAR_IECTL_REG  0xa0    /* Invalidation event control register */
> +#define DMAR_IEDATA_REG 0xa4    /* Invalidation event data register */
> +#define DMAR_IEADDR_REG 0xa8    /* Invalidation event address register */
> +#define DMAR_IEUADDR_REG 0xac    /* Invalidation event address register */
> +#define DMAR_PQH_REG    0xc0    /* Page request queue head register */
> +#define DMAR_PQH_REG_HI 0xc4
> +#define DMAR_PQT_REG    0xc8    /* Page request queue tail register*/
> +#define DMAR_PQT_REG_HI     0xcc
> +#define DMAR_PQA_REG    0xd0    /* Page request queue address register */
> +#define DMAR_PQA_REG_HI 0xd4
> +#define DMAR_PRS_REG    0xdc    /* Page request status register */
> +#define DMAR_PECTL_REG  0xe0    /* Page request event control register */
> +#define DMAR_PEDATA_REG 0xe4    /* Page request event data register */
> +#define DMAR_PEADDR_REG 0xe8    /* Page request event address register */
> +#define DMAR_PEUADDR_REG  0xec  /* Page event upper address register */
> +#define DMAR_MTRRCAP_REG 0x100  /* MTRR capability register */
> +#define DMAR_MTRRCAP_REG_HI 0x104
> +#define DMAR_MTRRDEF_REG 0x108  /* MTRR default type register */
> +#define DMAR_MTRRDEF_REG_HI 0x10c
> +
> +/* IOTLB */
> +#define DMAR_IOTLB_REG_OFFSET 0xf0  /* Offset to the IOTLB registers */
> +#define DMAR_IVA_REG DMAR_IOTLB_REG_OFFSET  /* Invalidate Address Register */
> +#define DMAR_IVA_REG_HI (DMAR_IVA_REG + 4)
> +/* IOTLB Invalidate Register */
> +#define DMAR_IOTLB_REG (DMAR_IOTLB_REG_OFFSET + 0x8)
> +#define DMAR_IOTLB_REG_HI (DMAR_IOTLB_REG + 4)
> +
> +/* FRCD */
> +#define DMAR_FRCD_REG_OFFSET 0x220 /* Offset to the Fault Recording Registers */
> +#define DMAR_FRCD_REG_NR 1 /* Num of Fault Recording Registers */
> +
> +#define DMAR_REG_SIZE   (DMAR_FRCD_REG_OFFSET + 128 * DMAR_FRCD_REG_NR)
> +
> +#define VTD_PCI_BUS_MAX 256
> +#define VTD_PCI_SLOT_MAX 32
> +#define VTD_PCI_FUNC_MAX 8
> +#define VTD_PCI_SLOT(devfn)         (((devfn) >> 3) & 0x1f)
> +#define VTD_PCI_FUNC(devfn)         ((devfn) & 0x07)
> +
> +typedef struct IntelIOMMUState IntelIOMMUState;
> +typedef struct VTDAddressSpace VTDAddressSpace;
> +
> +struct VTDAddressSpace {
> +    int bus_num;
> +    int devfn;
> +    AddressSpace as;
> +    MemoryRegion iommu;
> +    IntelIOMMUState *iommu_state;
> +};
> +
> +/* The iommu (DMAR) device state struct */
> +struct IntelIOMMUState {
> +    SysBusDevice busdev;
> +    MemoryRegion csrmem;
> +    uint8_t csr[DMAR_REG_SIZE];     /* register values */
> +    uint8_t wmask[DMAR_REG_SIZE];   /* R/W bytes */
> +    uint8_t w1cmask[DMAR_REG_SIZE]; /* RW1C(Write 1 to Clear) bytes */
> +    uint8_t womask[DMAR_REG_SIZE]; /* WO (write only - read returns 0) */
> +    uint32_t version;
> +
> +    dma_addr_t root;  /* Current root table pointer */
> +    bool extended;    /* Type of root table (extended or not) */
> +    uint16_t iq_head; /* Current invalidation queue head */
> +    uint16_t iq_tail; /* Current invalidation queue tail */
> +    dma_addr_t iq;   /* Current invalidation queue (IQ) pointer */
> +    size_t iq_sz;    /* IQ Size in number of entries */
> +    bool iq_enable;  /* Set if the IQ is enabled */
> +
> +    MemoryRegionIOMMUOps iommu_ops;
> +    VTDAddressSpace **address_spaces[VTD_PCI_BUS_MAX];
> +};
> +
> +
> +/* An invalidate descriptor */
> +typedef struct intel_iommu_inv_desc {
> +    uint64_t lower;
> +    uint64_t upper;
> +} intel_iommu_inv_desc;

violates QEMU coding style.

> +
> +
> +/* Invalidate descriptor types */
> +#define CONTEXT_CACHE_INV_DESC  0x1
> +#define PASID_CACHE_INV_DESC    0x7
> +#define IOTLB_INV_DESC          0x2
> +#define EXT_IOTLB_INV_DESC      0x6
> +#define DEV_TLB_INV_DESC        0x3
> +#define EXT_DEV_TLB_INV_DESC    0x8
> +#define INT_ENTRY_INV_DESC      0x4
> +#define INV_WAIT_DESC           0x5

Please use a consistent prefix for everything,
e.g. INTEL_IOMMU_ ...

> +
> +
> +/* IOTLB_REG */
> +#define VTD_TLB_GLOBAL_FLUSH (1ULL << 60) /* Global invalidation */
> +#define VTD_TLB_DSI_FLUSH (2ULL << 60)  /* Domain-selective invalidation */
> +#define VTD_TLB_PSI_FLUSH (3ULL << 60)  /* Page-selective invalidation */
> +#define VTD_TLB_FLUSH_GRANU_MASK (3ULL << 60)
> +#define VTD_TLB_GLOBAL_FLUSH_A (1ULL << 57)
> +#define VTD_TLB_DSI_FLUSH_A (2ULL << 57)
> +#define VTD_TLB_PSI_FLUSH_A (3ULL << 57)
> +#define VTD_TLB_FLUSH_GRANU_MASK_A (3ULL << 57)
> +#define VTD_TLB_READ_DRAIN (1ULL << 49)
> +#define VTD_TLB_WRITE_DRAIN (1ULL << 48)
> +#define VTD_TLB_DID(id) (((uint64_t)((id) & 0xffffULL)) << 32)
> +#define VTD_TLB_IVT (1ULL << 63)
> +#define VTD_TLB_IH_NONLEAF (1ULL << 6)
> +#define VTD_TLB_MAX_SIZE (0x3f)
> +
> +/* INVALID_DESC */
> +#define DMA_CCMD_INVL_GRANU_OFFSET  61
> +#define DMA_ID_TLB_GLOBAL_FLUSH (((uint64_t)1) << 3)
> +#define DMA_ID_TLB_DSI_FLUSH    (((uint64_t)2) << 3)
> +#define DMA_ID_TLB_PSI_FLUSH    (((uint64_t)3) << 3)
> +#define DMA_ID_TLB_READ_DRAIN   (((uint64_t)1) << 7)
> +#define DMA_ID_TLB_WRITE_DRAIN  (((uint64_t)1) << 6)
> +#define DMA_ID_TLB_DID(id)  (((uint64_t)((id & 0xffff) << 16)))
> +#define DMA_ID_TLB_IH_NONLEAF   (((uint64_t)1) << 6)
> +#define DMA_ID_TLB_ADDR(addr)   (addr)
> +#define DMA_ID_TLB_ADDR_MASK(mask)  (mask)
> +
> +/* PMEN_REG */
> +#define DMA_PMEN_EPM (((uint32_t)1)<<31)
> +#define DMA_PMEN_PRS (((uint32_t)1)<<0)
> +
> +/* GCMD_REG */
> +#define VTD_GCMD_TE (1UL << 31)
> +#define VTD_GCMD_SRTP (1UL << 30)
> +#define VTD_GCMD_SFL (1UL << 29)
> +#define VTD_GCMD_EAFL (1UL << 28)
> +#define VTD_GCMD_WBF (1UL << 27)
> +#define VTD_GCMD_QIE (1UL << 26)
> +#define VTD_GCMD_IRE (1UL << 25)
> +#define VTD_GCMD_SIRTP (1UL << 24)
> +#define VTD_GCMD_CFI (1UL << 23)
> +
> +/* GSTS_REG */
> +#define VTD_GSTS_TES (1UL << 31)
> +#define VTD_GSTS_RTPS (1UL << 30)
> +#define VTD_GSTS_FLS (1UL << 29)
> +#define VTD_GSTS_AFLS (1UL << 28)
> +#define VTD_GSTS_WBFS (1UL << 27)
> +#define VTD_GSTS_QIES (1UL << 26)
> +#define VTD_GSTS_IRES (1UL << 25)
> +#define VTD_GSTS_IRTPS (1UL << 24)
> +#define VTD_GSTS_CFIS (1UL << 23)
> +
> +/* CCMD_REG */
> +#define VTD_CCMD_ICC (1ULL << 63)
> +#define VTD_CCMD_GLOBAL_INVL (1ULL << 61)
> +#define VTD_CCMD_DOMAIN_INVL (2ULL << 61)
> +#define VTD_CCMD_DEVICE_INVL (3ULL << 61)
> +#define VTD_CCMD_CIRG_MASK (3ULL << 61)
> +#define VTD_CCMD_GLOBAL_INVL_A (1ULL << 59)
> +#define VTD_CCMD_DOMAIN_INVL_A (2ULL << 59)
> +#define VTD_CCMD_DEVICE_INVL_A (3ULL << 59)
> +#define VTD_CCMD_CAIG_MASK (3ULL << 59)
> +#define VTD_CCMD_FM(m) (((uint64_t)((m) & 3ULL)) << 32)
> +#define VTD_CCMD_MASK_NOBIT 0
> +#define VTD_CCMD_MASK_1BIT 1
> +#define VTD_CCMD_MASK_2BIT 2
> +#define VTD_CCMD_MASK_3BIT 3
> +#define VTD_CCMD_SID(s) (((uint64_t)((s) & 0xffffULL)) << 16)
> +#define VTD_CCMD_DID(d) ((uint64_t)((d) & 0xffffULL))
> +
> +/* FECTL_REG */
> +#define DMA_FECTL_IM (((uint32_t)1) << 31)
> +
> +/* FSTS_REG */
> +#define DMA_FSTS_PPF ((uint32_t)2)
> +#define DMA_FSTS_PFO ((uint32_t)1)
> +#define DMA_FSTS_IQE (1 << 4)
> +#define DMA_FSTS_ICE (1 << 5)
> +#define DMA_FSTS_ITE (1 << 6)
> +#define dma_fsts_fault_record_index(s) (((s) >> 8) & 0xff)
> +
> +/* RTADDR_REG */
> +#define VTD_RTADDR_RTT (1ULL << 11)
> +
> +
> +/* ECAP_REG */
> +#define VTD_ECAP_IRO (DMAR_IOTLB_REG_OFFSET << 4)   /* (val >> 4) << 8 */
> +
> +/* CAP_REG */
> +
> +/* (val >> 4) << 24 */
> +#define VTD_CAP_FRO     ((uint64_t)DMAR_FRCD_REG_OFFSET << 20)

cast not needed

> +
> +#define VTD_CAP_NFR     ((uint64_t)(DMAR_FRCD_REG_NR - 1) << 40)

Make DMAR_FRCD_REG_NR 1ULL and the cast won't be needed.

> +#define VTD_DOMAIN_ID_SHIFT     16  /* 16-bit domain id for 64K domains */
> +#define VTD_CAP_ND  (((VTD_DOMAIN_ID_SHIFT - 4) / 2) & 7ULL)
> +#define VTD_MGAW    39  /* Maximum Guest Address Width */
> +#define VTD_CAP_MGAW    (((VTD_MGAW - 1) & 0x3fULL) << 16)
> +/* Supported Adjusted Guest Address Widths */
> +#define VTD_SAGAW_MASK  (0x1fULL << 8)
> +#define VTD_SAGAW_39bit (0x2ULL << 8)   /* 39-bit AGAW, 3-level page-table */
> +#define VTD_SAGAW_48bit (0x4ULL << 8)   /* 48-bit AGAW, 4-level page-table */
> +
> +
> +/* Pagesize of VTD paging structures, including root and context tables */
> +#define VTD_PAGE_SHIFT      (12)
> +#define VTD_PAGE_SIZE       (1UL << VTD_PAGE_SHIFT)
> +
> +#define VTD_PAGE_SHIFT_4K   (12)
> +#define VTD_PAGE_MASK_4K    (~((1ULL << VTD_PAGE_SHIFT_4K) - 1))
> +#define VTD_PAGE_SHIFT_2M   (21)
> +#define VTD_PAGE_MASK_2M    (~((1UL << VTD_PAGE_SHIFT_2M) - 1))
> +#define VTD_PAGE_SHIFT_1G   (30)
> +#define VTD_PAGE_MASK_1G    (~((1UL << VTD_PAGE_SHIFT_1G) - 1))
> +
> +/* Root-Entry
> + * 0: Present
> + * 1-11: Reserved
> + * 12-63: Context-table Pointer
> + * 64-127: Reserved
> + */
> +struct vtd_root_entry {
> +    uint64_t val;
> +    uint64_t rsvd;
> +};
> +typedef struct vtd_root_entry vtd_root_entry;

violates QEMU coding style

> +
> +/* Masks for struct vtd_root_entry */
> +#define ROOT_ENTRY_P (1ULL << 0)
> +#define ROOT_ENTRY_CTP  (~0xfffULL)
> +
> +#define ROOT_ENTRY_NR   (VTD_PAGE_SIZE / sizeof(vtd_root_entry))
> +
> +
> +/* Context-Entry */
> +struct vtd_context_entry {
> +    uint64_t lo;
> +    uint64_t hi;
> +};
> +typedef struct vtd_context_entry vtd_context_entry;

same

> +
> +/* Masks for struct vtd_context_entry */
> +/* lo */
> +#define CONTEXT_ENTRY_P (1ULL << 0)
> +#define CONTEXT_ENTRY_FPD   (1ULL << 1) /* Fault Processing Disable */
> +#define CONTEXT_ENTRY_TT    (3ULL << 2) /* Translation Type */
> +#define CONTEXT_TT_MULTI_LEVEL  (0)
> +#define CONTEXT_TT_DEV_IOTLB    (1)
> +#define CONTEXT_TT_PASS_THROUGH (2)
> +/* Second Level Page Translation Pointer*/
> +#define CONTEXT_ENTRY_SLPTPTR   (~0xfffULL)
> +
> +/* hi */
> +#define CONTEXT_ENTRY_AW    (7ULL) /* Adjusted guest-address-width */
> +#define CONTEXT_ENTRY_DID   (0xffffULL << 8)    /* Domain Identifier */
> +
> +
> +#define CONTEXT_ENTRY_NR    (VTD_PAGE_SIZE / sizeof(vtd_context_entry))
> +
> +
> +/* Paging Structure common */
> +#define SL_PT_PAGE_SIZE_MASK   (1ULL << 7)
> +#define SL_LEVEL_BITS   9   /* Bits to decide the offset for each level */
> +
> +/* Second Level Paging Structure */
> +#define SL_PML4_LEVEL 4
> +#define SL_PDP_LEVEL 3
> +#define SL_PD_LEVEL 2
> +#define SL_PT_LEVEL 1
> +
> +#define SL_PT_ENTRY_NR  512
> +#define SL_PT_BASE_ADDR_MASK  (~(uint64_t)(VTD_PAGE_SIZE - 1))
> +
> +
> +#endif
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation
  2014-07-22 20:05   ` Michael S. Tsirkin
@ 2014-07-23  1:25     ` Le Tan
  0 siblings, 0 replies; 12+ messages in thread
From: Le Tan @ 2014-07-23  1:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Knut Omang, qemu-devel, Alex Williamson, Jan Kiszka,
	Anthony Liguori, Paolo Bonzini

Hi Michael,
Thanks very much for your careful review!

2014-07-23 4:05 GMT+08:00 Michael S. Tsirkin <mst@redhat.com>:
> On Tue, Jul 22, 2014 at 11:47:48PM +0800, Le Tan wrote:
>> Add support for emulating Intel IOMMU according to the VT-d specification for
>> the q35 chipset machine. Implement the logic for DMAR (DMA remapping) without
>> PASID support. Use register-based invalidation for context-cache invalidation
>> and IOTLB invalidation.
>> Basic fault reporting and caching are not implemented yet.
>>
>> Signed-off-by: Le Tan <tamlokveer@gmail.com>
>> ---
>>  hw/i386/Makefile.objs         |    1 +
>>  hw/i386/intel_iommu.c         | 1139 +++++++++++++++++++++++++++++++++++++++++
>>  include/hw/i386/intel_iommu.h |  350 +++++++++++++
>>  3 files changed, 1490 insertions(+)
>>  create mode 100644 hw/i386/intel_iommu.c
>>  create mode 100644 include/hw/i386/intel_iommu.h
>>
>> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
>> index 48014ab..6936111 100644
>> --- a/hw/i386/Makefile.objs
>> +++ b/hw/i386/Makefile.objs
>> @@ -2,6 +2,7 @@ obj-$(CONFIG_KVM) += kvm/
>>  obj-y += multiboot.o smbios.o
>>  obj-y += pc.o pc_piix.o pc_q35.o
>>  obj-y += pc_sysfw.o
>> +obj-y += intel_iommu.o
>>  obj-$(CONFIG_XEN) += ../xenpv/ xen/
>>
>>  obj-y += kvmvapic.o
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> new file mode 100644
>> index 0000000..3ba0e1e
>> --- /dev/null
>> +++ b/hw/i386/intel_iommu.c
>> @@ -0,0 +1,1139 @@
>> +/*
>> + * QEMU emulation of an Intel IOMMU (VT-d)
>> + *   (DMA Remapping device)
>> + *
>> + * Copyright (c) 2013 Knut Omang, Oracle <knut.omang@oracle.com>
>> + * Copyright (C) 2014 Le Tan, <tamlokveer@gmail.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> +
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> +
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
>> + */
>> +
>> +#include "hw/sysbus.h"
>> +#include "exec/address-spaces.h"
>> +#include "hw/i386/intel_iommu.h"
>> +
>> +/* #define DEBUG_INTEL_IOMMU */
>> +#ifdef DEBUG_INTEL_IOMMU
>> +#define D(fmt, ...) \
>> +    do { fprintf(stderr, "(vtd)%s: " fmt "\n", __func__, \
>> +                 ## __VA_ARGS__); } while (0)
>> +#else
>> +#define D(fmt, ...) \
>> +    do { } while (0)
>> +#endif
>> +
>
>
> Way too short for a macro name, might conflict with some
> header you include.
> You are polluting the global namespace.
> Best to prefix everything by INTEL_IOMMU_ and intel_iommu_.

Got it. Maybe I will use VTD as the prefix for this.

>
>> +
>> +static inline void define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
>> +                        uint64_t wmask, uint64_t w1cmask)
>> +{
>> +    *((uint64_t *)&s->csr[addr]) = val;
>> +    *((uint64_t *)&s->wmask[addr]) = wmask;
>> +    *((uint64_t *)&s->w1cmask[addr]) = w1cmask;
>> +}
>> +
>> +static inline void define_quad_wo(IntelIOMMUState *s, hwaddr addr,
>> +                                  uint64_t mask)
>> +{
>> +    *((uint64_t *)&s->womask[addr]) = mask;
>> +}
>> +
>> +static inline void define_long(IntelIOMMUState *s, hwaddr addr, uint32_t val,
>> +                        uint32_t wmask, uint32_t w1cmask)
>> +{
>> +    *((uint32_t *)&s->csr[addr]) = val;
>> +    *((uint32_t *)&s->wmask[addr]) = wmask;
>> +    *((uint32_t *)&s->w1cmask[addr]) = w1cmask;
>> +}
>> +
>> +static inline void define_long_wo(IntelIOMMUState *s, hwaddr addr,
>> +                                  uint32_t mask)
>> +{
>> +    *((uint32_t *)&s->womask[addr]) = mask;
>> +}
>> +
>> +/* "External" get/set operations */
>> +static inline void set_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val)
>> +{
>> +    uint64_t oldval = *((uint64_t *)&s->csr[addr]);
>> +    uint64_t wmask = *((uint64_t *)&s->wmask[addr]);
>> +    uint64_t w1cmask = *((uint64_t *)&s->w1cmask[addr]);
>> +    *((uint64_t *)&s->csr[addr]) =
>> +        ((oldval & ~wmask) | (val & wmask)) & ~(w1cmask & val);
>> +}
>> +
>> +static inline void set_long(IntelIOMMUState *s, hwaddr addr, uint32_t val)
>> +{
>> +    uint32_t oldval = *((uint32_t *)&s->csr[addr]);
>> +    uint32_t wmask = *((uint32_t *)&s->wmask[addr]);
>> +    uint32_t w1cmask = *((uint32_t *)&s->w1cmask[addr]);
>> +    *((uint32_t *)&s->csr[addr]) =
>> +        ((oldval & ~wmask) | (val & wmask)) & ~(w1cmask & val);
>> +}
>> +
>> +static inline uint64_t get_quad(IntelIOMMUState *s, hwaddr addr)
>> +{
>> +    uint64_t val = *((uint64_t *)&s->csr[addr]);
>> +    uint64_t womask = *((uint64_t *)&s->womask[addr]);
>> +    return val & ~womask;
>> +}
>> +
>> +
>> +static inline uint32_t get_long(IntelIOMMUState *s, hwaddr addr)
>> +{
>> +    uint32_t val = *((uint32_t *)&s->csr[addr]);
>> +    uint32_t womask = *((uint32_t *)&s->womask[addr]);
>> +    return val & ~womask;
>> +}
>> +
>> +
>> +
>> +/* "Internal" get/set operations */
>> +static inline uint64_t __get_quad(IntelIOMMUState *s, hwaddr addr)
>> +{
>> +    return *((uint64_t *)&s->csr[addr]);
>> +}
>> +
>> +static inline uint32_t __get_long(IntelIOMMUState *s, hwaddr addr)
>> +{
>> +    return *((uint32_t *)&s->csr[addr]);
>> +}
>
> we don't allow starting names with __ in qemu.

Got it.

>> +
>> +
>> +/* val = (val & ~clear) | mask */
> what does this comment mean? best to remove it imho
>
>> +static inline uint32_t set_mask_long(IntelIOMMUState *s, hwaddr addr,
>> +                                     uint32_t clear, uint32_t mask)
>> +{
>> +    uint32_t *ptr = (uint32_t *)&s->csr[addr];
>> +    uint32_t val = (*ptr & ~clear) | mask;
>> +    *ptr = val;
>> +    return val;
>> +}
>> +
>> +/* val = (val & ~clear) | mask */
>
> what does this comment mean? best to remove it imho

It means the function first clears the bits of val specified in clear,
and then sets the bits specified in mask. Maybe that is obvious from
the code and there is no need for the comment; I will remove it then.

>> +static inline uint64_t set_mask_quad(IntelIOMMUState *s, hwaddr addr,
>> +                                     uint64_t clear, uint64_t mask)
>> +{
>> +    uint64_t *ptr = (uint64_t *)&s->csr[addr];
>> +    uint64_t val = (*ptr & ~clear) | mask;
>> +    *ptr = val;
>> +    return val;
>> +}
>
> Above looks suspicious endian-ness wise.
> Use APIs to fix this, will remove need for so
> many casts at the same time.

Do you mean all of the functions above, or just set_mask_long() and
set_mask_quad()? These functions read/write the CSRs maintained in
s->csr[] and are called from the MMIO read/write handlers. I think they
do not need to consider endianness; only the ones that access guest
memory do. Is that right?
I searched some device emulation code in QEMU for reference, but I
can't find the relevant APIs. Could you give me an example?
Thanks!

> I generally don't believe these wrappers buy you much.
> You could just open-code and it would be clearer.

set_mask_quad() and set_mask_long() are used to change the Status
Register, for example to report status back to the driver. If you think
there is no need to wrap this, I will remove them later.

>
>> +
>> +
>> +
>> +
>> +
>
> Don't put more than 1 empty line between functions.

Ah, I didn't know this before.

>> +static inline bool root_entry_present(vtd_root_entry *root)
>> +{
>> +    return root->val & ROOT_ENTRY_P;
>> +}
>> +
>> +
>> +static bool get_root_entry(IntelIOMMUState *s, int index, vtd_root_entry *re)
>> +{
>> +    dma_addr_t addr;
>> +
>> +    assert(index >= 0 && index < ROOT_ENTRY_NR);
>> +
>> +    addr = s->root + index * sizeof(*re);
>> +    if (dma_memory_read(&address_space_memory, addr, re, sizeof(*re))) {
>> +        fprintf(stderr, "(vtd)error: fail to read root table\n");
>
> Will flood log for management.
> Can guest trigger this?
> When?
> Maybe just assert?

I don't think the guest can trigger this; it reads guest memory. I find
that most callers of dma_memory_read() do not check the return value.
Do you think it is necessary to check it here? If it returns an error,
that probably means something is wrong inside QEMU itself, and I can do
nothing but abort and exit.

>> +        return false;
>> +    }
>> +    re->val = le64_to_cpu(re->val);
>> +    return true;
>> +}
>> +
>> +
>> +static inline bool context_entry_present(vtd_context_entry *context)
>> +{
>> +    return context->lo & CONTEXT_ENTRY_P;
>> +}
>> +
>> +static bool get_context_entry_from_root(vtd_root_entry *root, int index,
>> +                                        vtd_context_entry *ce)
>> +{
>> +    dma_addr_t addr;
>> +
>> +    if (!root_entry_present(root)) {
>> +        ce->lo = 0;
>> +        ce->hi = 0;
>> +        return false;
>> +    }
>> +
>> +    assert(index >= 0 && index < CONTEXT_ENTRY_NR);
>> +
>> +    addr = (root->val & ROOT_ENTRY_CTP) + index * sizeof(*ce);
>> +    if (dma_memory_read(&address_space_memory, addr, ce, sizeof(*ce))) {
>> +        fprintf(stderr, "(vtd)error: fail to read context table\n");
>> +        return false;
>> +    }
>> +    ce->lo = le64_to_cpu(ce->lo);
>> +    ce->hi = le64_to_cpu(ce->hi);
>> +    return true;
>> +}
>> +
>> +static inline dma_addr_t get_slpt_base_from_context(vtd_context_entry *ce)
>> +{
>> +    return ce->lo & CONTEXT_ENTRY_SLPTPTR;
>> +}
>> +
>> +
>> +/* The shift of an addr for a certain level of paging structure */
>> +static inline int slpt_level_shift(int level)
>> +{
>> +    return VTD_PAGE_SHIFT_4K + (level - 1) * SL_LEVEL_BITS;
>> +}
>> +
>> +static inline bool slpte_present(uint64_t slpte)
>> +{
>> +    return slpte & 3;
>> +}
>> +
>> +/* Calculate the GPA given the base address, the index in the page table and
>> + * the level of this page table.
>> + */
>> +static inline uint64_t get_slpt_gpa(uint64_t addr, int index, int level)
>> +{
>> +    return addr + (((uint64_t)index) << slpt_level_shift(level));
>> +}
>> +
>> +static inline uint64_t get_slpte_addr(uint64_t slpte)
>> +{
>> +    return slpte & SL_PT_BASE_ADDR_MASK;
>> +}
>> +
>> +/* Whether the pte points to a large page */
>> +static inline bool is_large_pte(uint64_t pte)
>> +{
>> +    return pte & SL_PT_PAGE_SIZE_MASK;
>> +}
>> +
>> +/* Whether the pte indicates the address of the page frame */
>> +static inline bool is_last_slpte(uint64_t slpte, int level)
>> +{
>> +    if (level == SL_PT_LEVEL) {
>> +        return true;
>> +    }
>> +    if (is_large_pte(slpte)) {
>> +        return true;
>> +    }
>> +    return false;
>> +}
>> +
>> +/* Get the content of an slpte located at @base_addr[@index] */
>> +static inline uint64_t get_slpte(dma_addr_t base_addr, int index)
>> +{
>> +    uint64_t slpte;
>> +
>> +    assert(index >= 0 && index < SL_PT_ENTRY_NR);
>> +
>> +    if (dma_memory_read(&address_space_memory,
>> +                        base_addr + index * sizeof(slpte), &slpte,
>> +                        sizeof(slpte))) {
>> +        fprintf(stderr, "(vtd)error: fail to read slpte\n");
>> +        return (uint64_t)-1;
>> +    }
>> +    slpte = le64_to_cpu(slpte);
>> +    return slpte;
>> +}
>> +
>> +#if 0
>
> drop dead code or make it easy to enable.

Sorry for that. The project is still in progress and some of the code
is for debugging (not used even when DEBUG is on), so I just commented
it out. I will drop it in the public version.

>> +static inline void print_slpt(dma_addr_t base_addr)
>> +{
>> +    int i;
>> +    uint64_t slpte;
>> +
>> +    D("slpt at addr 0x%"PRIx64 "===========", base_addr);
>> +    for (i = 0; i < SL_PT_ENTRY_NR; i++) {
>> +        slpte = get_slpte(base_addr, i);
>> +        D("slpte #%d 0x%"PRIx64, i, slpte);
>> +    }
>> +}
>> +
>> +static void print_root_table(IntelIOMMUState *s)
>> +{
>> +    int i;
>> +    vtd_root_entry re;
>> +    D("root-table=====================");
>> +    for (i = 0; i < ROOT_ENTRY_NR; ++i) {
>> +        get_root_entry(s, i, &re);
>> +        if (root_entry_present(&re)) {
>> +            D("root-entry #%d hi 0x%"PRIx64 " low 0x%"PRIx64, i, re.rsvd,
>> +              re.val);
>> +        }
>> +    }
>> +}
>> +
>> +static void print_context_table(vtd_root_entry *re)
>> +{
>> +    int i;
>> +    vtd_context_entry ce;
>> +    D("context-table==================");
>> +    for (i = 0; i < CONTEXT_ENTRY_NR; ++i) {
>> +        get_context_entry_from_root(re, i, &ce);
>> +        if (context_entry_present(&ce)) {
>> +            D("context-entry #%d hi 0x%"PRIx64 " low 0x%"PRIx64, i, ce.hi,
>> +              ce.lo);
>> +        }
>> +    }
>> +}
>> +#endif
>> +
>> +/* Given a gpa and the level of paging structure, return the offset of current
>> + * level.
>> + */
>> +static inline int gpa_level_offset(uint64_t gpa, int level)
>> +{
>> +    return (gpa >> slpt_level_shift(level)) & ((1ULL << SL_LEVEL_BITS) - 1);
>> +}
>> +
>> +/* Get the page-table level that hardware should use for the second-level
>> + * page-table walk from the Address Width field of context-entry.
>> + */
>> +static inline int get_level_from_context_entry(vtd_context_entry *ce)
>> +{
>> +    return 2 + (ce->hi & CONTEXT_ENTRY_AW);
>> +}
>> +
>> +/* Given the @gpa, return relevant slpte. @slpte_level will be the last level
>> + * of the translation, can be used for deciding the size of large page.
>> + */
>> +static uint64_t gpa_to_slpte(vtd_context_entry *ce, uint64_t gpa,
>> +                             int *slpte_level)
>> +{
>> +    dma_addr_t addr = get_slpt_base_from_context(ce);
>> +    int level = get_level_from_context_entry(ce);
>> +    int offset;
>> +    uint64_t slpte;
>> +
>> +    /* D("slpt_base 0x%"PRIx64, addr); */
>
> drop comments like this or uncomment.

Got it.

>> +    while (true) {
>> +        offset = gpa_level_offset(gpa, level);
>> +        slpte = get_slpte(addr, offset);
>> +        /* D("level %d slpte 0x%"PRIx64, level, slpte); */
>> +        if (!slpte_present(slpte)) {
>> +            D("error: slpte 0x%"PRIx64 " is not present", slpte);
>> +            slpte = (uint64_t)-1;
>> +            *slpte_level = level;
>> +            break;
>> +        }
>> +        if (is_last_slpte(slpte, level)) {
>> +            *slpte_level = level;
>> +            break;
>> +        }
>> +        addr = get_slpte_addr(slpte);
>> +        level--;
>> +    }
>> +    return slpte;
>> +}
>> +
>> +/* Do a paging-structures walk to do an iommu translation
>> + * @bus_num: The bus number
>> + * @devfn: The devfn, which is the combination of the device and function numbers
>> + * @entry: IOMMUTLBEntry that contains the addr to be translated and the result
>> + */
>> +static void iommu_translate(IntelIOMMUState *s, int bus_num, int devfn,
>> +                            hwaddr addr, IOMMUTLBEntry *entry)
>> +{
>> +    vtd_root_entry re;
>> +    vtd_context_entry ce;
>> +    uint64_t slpte;
>> +    int level;
>> +    uint64_t page_mask = VTD_PAGE_MASK_4K;
>> +
>> +
>> +    if (!get_root_entry(s, bus_num, &re)) {
>> +        /* FIXME */
>> +        return;
>> +    }
>> +    if (!root_entry_present(&re)) {
>> +        /* FIXME */
>> +        D("error: root-entry #%d is not present", bus_num);
>> +        return;
>> +    }
>> +    /* D("root-entry low 0x%"PRIx64, re.val); */
>> +    if (!get_context_entry_from_root(&re, devfn, &ce)) {
>> +        /* FIXME */
>> +        return;
>> +    }
>> +    if (!context_entry_present(&ce)) {
>> +        /* FIXME */
>> +        D("error: context-entry #%d(bus #%d) is not present", devfn, bus_num);
>> +        return;
>> +    }
>> +    /* D("context-entry hi 0x%"PRIx64 " low 0x%"PRIx64, ce.hi, ce.lo); */
>> +    slpte = gpa_to_slpte(&ce, addr, &level);
>> +    if (slpte == (uint64_t)-1) {
>> +        /* FIXME */
>> +        D("error: can't get slpte for gpa %"PRIx64, addr);
>> +        return;
>> +    }
>
> what's the plan for these FIXME's?

This means something went wrong when mapping the device to a domain or
when walking the paging structures maintained by the guest. I think
handling this will be part of fault reporting, which is not implemented
yet. For now I don't know how to handle these errors, so I just print a
warning and return; the guest then continues with a wrong translation
entry, which is probably improper.

>> +
>> +    if (is_large_pte(slpte)) {
>> +        if (level == SL_PDP_LEVEL) {
>> +            /* 1-GB page */
>> +            page_mask = VTD_PAGE_MASK_1G;
>> +        } else {
>> +            /* 2-MB page */
>> +            page_mask = VTD_PAGE_MASK_2M;
>> +        }
>> +    }
>> +
>> +    entry->iova = addr & page_mask;
>> +    entry->translated_addr = get_slpte_addr(slpte) & page_mask;
>> +    entry->addr_mask = ~page_mask;
>> +    entry->perm = IOMMU_RW;
>> +}
>> +
>> +#if 0
>> +/* Iterate a Second Level Page Table */
>> +static void __walk_slpt(dma_addr_t table_addr, int level, uint64_t gpa)
>> +{
>> +    int index;
>> +    uint64_t next_gpa;
>> +    dma_addr_t next_table_addr;
>> +    uint64_t slpte;
>> +
>> +    if (level < SL_PT_LEVEL) {
>> +        return;
>> +    }
>> +
>> +
>> +    for (index = 0; index < SL_PT_ENTRY_NR; ++index) {
>> +        slpte = get_slpte(table_addr, index);
>> +        if (!slpte_present(slpte)) {
>> +            continue;
>> +        }
>> +        D("level %d, index 0x%x, gpa 0x%"PRIx64, level, index, gpa);
>> +        next_gpa = get_slpt_gpa(gpa, index, level);
>> +        next_table_addr = get_slpte_addr(slpte);
>> +
>> +        if (is_last_slpte(slpte, level)) {
>> +            D("slpte gpa 0x%"PRIx64 ", hpa 0x%"PRIx64, next_gpa,
>> +              next_table_addr);
>> +            if (next_gpa != next_table_addr) {
>> +                D("Not 1:1 mapping, slpte 0x%"PRIx64, slpte);
>> +            }
>> +        } else {
>> +            __walk_slpt(next_table_addr, level - 1, next_gpa);
>> +        }
>> +    }
>> +}
>> +
>> +
>> +static void print_paging_structure_from_context(vtd_context_entry *ce)
>> +{
>> +    dma_addr_t table_addr;
>> +
>> +    table_addr = get_slpt_base_from_context(ce);
>> +    __walk_slpt(table_addr, SL_PDP_LEVEL, 0);
>> +}
>> +
>> +
>> +static void print_root_table_all(IntelIOMMUState *s)
>> +{
>> +    int i;
>> +    int j;
>> +    vtd_root_entry re;
>> +    vtd_context_entry ce;
>> +
>> +    for (i = 0; i < ROOT_ENTRY_NR; ++i) {
>> +        if (get_root_entry(s, i, &re) && root_entry_present(&re)) {
>> +            D("root_entry 0x%x: hi 0x%"PRIx64 " low 0x%"PRIx64, i, re.rsvd,
>> +              re.val);
>> +
>> +            for (j = 0; j < CONTEXT_ENTRY_NR; ++j) {
>> +                if (get_context_entry_from_root(&re, j, &ce)
>> +                    && context_entry_present(&ce)) {
>> +                    D("context_entry 0x%x: hi 0x%"PRIx64 " low 0x%"PRIx64,
>> +                      j, ce.hi, ce.lo);
>> +                    D("--------------------------------");
>> +                    print_paging_structure_from_context(&ce);
>> +                }
>> +            }
>> +        }
>> +    }
>> +}
>> +#endif
>> +
>> +
>> +
>> +
>> +static void vtd_root_table_setup(IntelIOMMUState *s)
>> +{
>> +    s->root = *((uint64_t *)&s->csr[DMAR_RTADDR_REG]);
>> +    s->extended = s->root & VTD_RTADDR_RTT;
>> +    s->root &= ~0xfff;
>> +    D("root_table addr 0x%"PRIx64 " %s", s->root,
>> +      (s->extended ? "(Extended)" : ""));
>> +}
>> +
>> +/* Context-cache invalidation
>> + * Returns the Context Actual Invalidation Granularity.
>> + * @val: the content of the CCMD_REG
>> + */
>> +static uint64_t vtd_context_cache_invalidate(IntelIOMMUState *s, uint64_t val)
>> +{
>> +    uint64_t caig;
>> +    uint64_t type = val & VTD_CCMD_CIRG_MASK;
>> +
>> +    switch (type) {
>> +    case VTD_CCMD_GLOBAL_INVL:
>> +        D("Global invalidation request");
>> +        caig = VTD_CCMD_GLOBAL_INVL_A;
>> +        break;
>> +
>> +    case VTD_CCMD_DOMAIN_INVL:
>> +        D("Domain-selective invalidation request");
>> +        caig = VTD_CCMD_DOMAIN_INVL_A;
>> +        break;
>> +
>> +    case VTD_CCMD_DEVICE_INVL:
>> +        D("Device-selective invalidation request");
>> +        caig = VTD_CCMD_DEVICE_INVL_A;
>> +        break;
>> +
>> +    default:
>> +        D("error: wrong context-cache invalidation granularity");
>> +        caig = 0;
>> +    }
>> +
>> +    return caig;
>> +}
>> +
>> +
>> +/* Flush IOTLB
>> + * Returns the IOTLB Actual Invalidation Granularity.
>> + * @val: the content of the IOTLB_REG
>> + */
>> +static uint64_t vtd_iotlb_flush(IntelIOMMUState *s, uint64_t val)
>> +{
>> +    uint64_t iaig;
>> +    uint64_t type = val & VTD_TLB_FLUSH_GRANU_MASK;
>> +
>> +    switch (type) {
>> +    case VTD_TLB_GLOBAL_FLUSH:
>> +        D("Global IOTLB flush");
>> +        iaig = VTD_TLB_GLOBAL_FLUSH_A;
>> +        break;
>> +
>> +    case VTD_TLB_DSI_FLUSH:
>> +        D("Domain-selective IOTLB flush");
>> +        iaig = VTD_TLB_DSI_FLUSH_A;
>> +        break;
>> +
>> +    case VTD_TLB_PSI_FLUSH:
>> +        D("Page-selective-within-domain IOTLB flush");
>> +        iaig = VTD_TLB_PSI_FLUSH_A;
>> +        break;
>> +
>> +    default:
>> +        D("error: wrong iotlb flush granularity");
>> +        iaig = 0;
>> +    }
>> +
>> +    return iaig;
>> +}
>> +
>> +
>> +#if 0
>> +static void iommu_inv_queue_setup(IntelIOMMUState *s)
>> +{
>> +    uint64_t tail_val;
>> +    s->iq = *((uint64_t *)&s->csr[DMAR_IQA_REG]);
>> +    s->iq_sz = 0x100 << (s->iq & 0x7);  /* 256 entries per page */
>> +    s->iq &= ~0x7;
>> +    s->iq_enable = true;
>> +
>> +    /* Init head pointers */
>> +    tail_val = *((uint64_t *)&s->csr[DMAR_IQT_REG]);
>> +    *((uint64_t *)&s->csr[DMAR_IQH_REG]) = tail_val;
>> +    s->iq_head = s->iq_tail = (tail_val >> 4) & 0x7fff;
>> +    D(" -- address: 0x%lx size 0x%lx", s->iq, s->iq_sz);
>> +}
>> +
>> +
>> +static int handle_invalidate(IntelIOMMUState *s, uint16_t i)
>> +{
>> +    intel_iommu_inv_desc entry;
>> +    uint8_t type;
>> +    dma_memory_read(&address_space_memory, s->iq + sizeof(entry) * i, &entry,
>> +                    sizeof(entry));
>> +    type = entry.lower & 0xf;
>> +    D("Processing invalidate request %d - desc: %016lx.%016lx", i,
>> +      entry.upper, entry.lower);
>> +    switch (type) {
>> +    case CONTEXT_CACHE_INV_DESC:
>> +        D("Context-cache Invalidate");
>> +        break;
>> +    case IOTLB_INV_DESC:
>> +        D("IOTLB Invalidate");
>> +        break;
>> +    case INV_WAIT_DESC:
>> +        D("Invalidate Wait");
>> +        if (status_write(entry.lower)) {
>> +            dma_memory_write(&address_space_memory, entry.upper,
>> +                             (uint8_t *)&entry.lower + 4, 4);
>> +        }
>> +        break;
>> +    default:
>> +        D(" - not impl - ");
>> +    }
>> +    return 0;
>> +}
>> +
>> +
>> +static void handle_iqt_write(IntelIOMMUState *s, uint64_t val)
>> +{
>> +    s->iq_tail = (val >> 4) & 0x7fff;
>> +    D("Write to IQT_REG new tail = %d", s->iq_tail);
>> +
>> +    if (!s->iq_enable) {
>> +        return;
>> +    }
>> +
>> +    /* Process the invalidation queue */
>> +    while (s->iq_head != s->iq_tail) {
>> +        handle_invalidate(s, s->iq_head++);
>> +        if (s->iq_head == s->iq_sz) {
>> +            s->iq_head = 0;
>> +        }
>> +    }
>> +    *((uint64_t *)&s->csr[DMAR_IQH_REG]) = s->iq_head << 4;
>> +
>> +    set_quad(s, DMAR_IQT_REG, val);
>> +}
>> +#endif
>> +
>> +/* FIXME: Not implemented yet */
>> +static void handle_gcmd_qie(IntelIOMMUState *s, bool en)
>> +{
>> +    D("Queued Invalidation Enable %s", (en ? "on" : "off"));
>> +
>> +    /*if (en) {
>> +        iommu_inv_queue_setup(s);
>> +    }*/
>> +
>> +    /* Ok - report back to driver */
>> +    set_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_QIES);
>> +}
>> +
>> +
>> +/* Set Root Table Pointer */
>> +static void handle_gcmd_srtp(IntelIOMMUState *s)
>> +{
>> +    D("set Root Table Pointer");
>> +
>> +    vtd_root_table_setup(s);
>> +    /* Ok - report back to driver */
>> +    set_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_RTPS);
>> +}
>> +
>> +
>> +/* Handle Translation Enable/Disable */
>> +static void handle_gcmd_te(IntelIOMMUState *s, bool en)
>> +{
>> +    D("Translation Enable %s", (en ? "on" : "off"));
>> +
>> +    if (en) {
>> +        /* Ok - report back to driver */
>> +        set_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_TES);
>> +    } else {
>> +        /* Ok - report back to driver */
>> +        set_mask_long(s, DMAR_GSTS_REG, VTD_GSTS_TES, 0);
>> +    }
>> +}
>> +
>> +/* Handle write to Global Command Register */
>> +static void handle_gcmd_write(IntelIOMMUState *s)
>> +{
>> +    uint32_t status = __get_long(s, DMAR_GSTS_REG);
>> +    uint32_t val = __get_long(s, DMAR_GCMD_REG);
>> +    uint32_t changed = status ^ val;
>> +
>> +    D("value 0x%x status 0x%x", val, status);
>> +    if (changed & VTD_GCMD_TE) {
>> +        /* Translation enable/disable */
>> +        handle_gcmd_te(s, val & VTD_GCMD_TE);
>> +    } else if (val & VTD_GCMD_SRTP) {
>> +        /* Set/update the root-table pointer */
>> +        handle_gcmd_srtp(s);
>> +    } else if (changed & VTD_GCMD_QIE) {
>> +        /* Queued Invalidation Enable */
>> +        handle_gcmd_qie(s, val & VTD_GCMD_QIE);
>> +    } else {
>> +        D("error: unhandled gcmd write");
>> +    }
>> +}
>> +
>> +/* Handle write to Context Command Register */
>> +static void handle_ccmd_write(IntelIOMMUState *s)
>> +{
>> +    uint64_t ret;
>> +    uint64_t val = __get_quad(s, DMAR_CCMD_REG);
>> +
>> +    /* Context-cache invalidation request */
>> +    if (val & VTD_CCMD_ICC) {
>> +        ret = vtd_context_cache_invalidate(s, val);
>> +
>> +        /* Invalidation completed. Change something to show */
>> +        set_mask_quad(s, DMAR_CCMD_REG, VTD_CCMD_ICC, 0ULL);
>> +        ret = set_mask_quad(s, DMAR_CCMD_REG, VTD_CCMD_CAIG_MASK, ret);
>> +        D("CCMD_REG write-back val: 0x%"PRIx64, ret);
>> +    }
>> +}
>> +
>> +/* Handle write to IOTLB Invalidation Register */
>> +static void handle_iotlb_write(IntelIOMMUState *s)
>> +{
>> +    uint64_t ret;
>> +    uint64_t val = __get_quad(s, DMAR_IOTLB_REG);
>> +
>> +    /* IOTLB invalidation request */
>> +    if (val & VTD_TLB_IVT) {
>> +        ret = vtd_iotlb_flush(s, val);
>> +
>> +        /* Invalidation completed. Change something to show */
>> +        set_mask_quad(s, DMAR_IOTLB_REG, VTD_TLB_IVT, 0ULL);
>> +        ret = set_mask_quad(s, DMAR_IOTLB_REG, VTD_TLB_FLUSH_GRANU_MASK_A, ret);
>> +        D("IOTLB_REG write-back val: 0x%"PRIx64, ret);
>> +    }
>> +}
>> +
>> +
>> +static uint64_t vtd_mem_read(void *opaque, hwaddr addr, unsigned size)
>> +{
>> +    IntelIOMMUState *s = opaque;
>> +    uint64_t val;
>> +
>> +    if (addr + size > DMAR_REG_SIZE) {
>> +        D("error: addr outside region: max 0x%x, got 0x%"PRIx64 " %d",
>> +          DMAR_REG_SIZE, addr, size);
>> +        return (uint64_t)-1;
>> +    }
>> +
>> +    assert(size == 4 || size == 8);
>> +
>> +    switch (addr) {
>> +    /* Root Table Address Register, 64-bit */
>> +    case DMAR_RTADDR_REG:
>> +        if (size == 4) {
>> +            val = (uint32_t)s->root;
>> +        } else {
>> +            val = s->root;
>> +        }
>> +        break;
>> +
>> +    case DMAR_RTADDR_REG_HI:
>> +        assert(size == 4);
>> +        val = s->root >> 32;
>> +        break;
>> +
>> +    default:
>> +        if (size == 4) {
>> +            val = get_long(s, addr);
>> +        } else {
>> +            val = get_quad(s, addr);
>> +        }
>> +    }
>> +
>> +    D("addr 0x%"PRIx64 " size %d val 0x%"PRIx64, addr, size, val);
>> +    return val;
>> +}
>> +
>> +static void vtd_mem_write(void *opaque, hwaddr addr,
>> +                          uint64_t val, unsigned size)
>> +{
>> +    IntelIOMMUState *s = opaque;
>> +
>> +    if (addr + size > DMAR_REG_SIZE) {
>> +        D("error: addr outside region: max 0x%x, got 0x%"PRIx64 " %d",
>> +          DMAR_REG_SIZE, addr, size);
>> +        return;
>> +    }
>> +
>> +    assert(size == 4 || size == 8);
>> +
>> +    /* Val should be written into csr within the handler */
>> +    switch (addr) {
>> +    /* Global Command Register, 32-bit */
>> +    case DMAR_GCMD_REG:
>> +        D("DMAR_GCMD_REG write addr 0x%"PRIx64 ", size %d, val 0x%"PRIx64,
>> +          addr, size, val);
>> +        set_long(s, addr, val);
>> +        handle_gcmd_write(s);
>> +        break;
>> +
>> +    /* Invalidation Queue Tail Register, 64-bit */
>> +    /*case DMAR_IQT_REG:
>> +        if (size == 4) {
>> +
>> +        }
>> +        if (size == 4) {
>> +            if (former_size == 0) {
>> +                former_size = size;
>> +                former_value = val;
>> +                goto out;
>> +            } else {
>> +                val = (val << 32) + former_value;
>> +                former_size = 0;
>> +                former_value = 0;
>> +            }
>> +        }
>> +        handle_iqt_write(s, val);
>> +        break;*/
>> +
>> +    /* Context Command Register, 64-bit */
>> +    case DMAR_CCMD_REG:
>> +        D("DMAR_CCMD_REG write addr 0x%"PRIx64 ", size %d, val 0x%"PRIx64,
>> +          addr, size, val);
>> +        if (size == 4) {
>> +            set_long(s, addr, val);
>> +        } else {
>> +            set_quad(s, addr, val);
>> +            handle_ccmd_write(s);
>> +        }
>> +        break;
>> +
>> +    case DMAR_CCMD_REG_HI:
>> +        D("DMAR_CCMD_REG_HI write addr 0x%"PRIx64 ", size %d, val 0x%"PRIx64,
>> +          addr, size, val);
>> +        assert(size == 4);
>> +        set_long(s, addr, val);
>> +        handle_ccmd_write(s);
>> +        break;
>> +
>> +
>> +    /* IOTLB Invalidation Register, 64-bit */
>> +    case DMAR_IOTLB_REG:
>> +        D("DMAR_IOTLB_REG write addr 0x%"PRIx64 ", size %d, val 0x%"PRIx64,
>> +          addr, size, val);
>> +        if (size == 4) {
>> +            set_long(s, addr, val);
>> +        } else {
>> +            set_quad(s, addr, val);
>> +            handle_iotlb_write(s);
>> +        }
>> +        break;
>> +
>> +    case DMAR_IOTLB_REG_HI:
>> +        D("DMAR_IOTLB_REG_HI write addr 0x%"PRIx64 ", size %d, val 0x%"PRIx64,
>> +          addr, size, val);
>> +        assert(size == 4);
>> +        set_long(s, addr, val);
>> +        handle_iotlb_write(s);
>> +        break;
>> +
>> +    /* Fault Status Register, 32-bit */
>> +    case DMAR_FSTS_REG:
>> +    /* Fault Event Data Register, 32-bit */
>> +    case DMAR_FEDATA_REG:
>> +    /* Fault Event Address Register, 32-bit */
>> +    case DMAR_FEADDR_REG:
>> +    /* Fault Event Upper Address Register, 32-bit */
>> +    case DMAR_FEUADDR_REG:
>> +    /* Fault Event Control Register, 32-bit */
>> +    case DMAR_FECTL_REG:
>> +    /* Protected Memory Enable Register, 32-bit */
>> +    case DMAR_PMEN_REG:
>> +        D("known reg write addr 0x%"PRIx64 ", size %d, val 0x%"PRIx64,
>> +          addr, size, val);
>> +        set_long(s, addr, val);
>> +        break;
>> +
>> +
>> +    /* Root Table Address Register, 64-bit */
>> +    case DMAR_RTADDR_REG:
>> +        D("DMAR_RTADDR_REG write addr 0x%"PRIx64 ", size %d, val 0x%"PRIx64,
>> +          addr, size, val);
>> +        if (size == 4) {
>> +            set_long(s, addr, val);
>> +        } else {
>> +            set_quad(s, addr, val);
>> +        }
>> +        break;
>> +
>> +    case DMAR_RTADDR_REG_HI:
>> +        D("DMAR_RTADDR_REG_HI write addr 0x%"PRIx64 ", size %d, val 0x%"PRIx64,
>> +          addr, size, val);
>> +        assert(size == 4);
>> +        set_long(s, addr, val);
>> +        break;
>> +
>> +    default:
>> +        D("error: unhandled reg write addr 0x%"PRIx64
>> +          ", size %d, val 0x%"PRIx64, addr, size, val);
>> +        if (size == 4) {
>> +            set_long(s, addr, val);
>> +        } else {
>> +            set_quad(s, addr, val);
>> +        }
>> +    }
>> +
>> +}
>> +
>> +
>> +static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion *iommu, hwaddr addr)
>> +{
>> +    VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
>> +    IntelIOMMUState *s = vtd_as->iommu_state;
>> +    int bus_num = vtd_as->bus_num;
>> +    int devfn = vtd_as->devfn;
>> +    IOMMUTLBEntry ret = {
>> +        .target_as = &address_space_memory,
>> +        .iova = 0,
>> +        .translated_addr = 0,
>> +        .addr_mask = ~(hwaddr)0,
>> +        .perm = IOMMU_NONE,
>> +    };
>> +
>> +    if (!(__get_long(s, DMAR_GSTS_REG) & VTD_GSTS_TES)) {
>> +        /* DMAR disabled, passthrough, use 4k-page*/
>> +        ret.iova = addr & VTD_PAGE_MASK_4K;
>> +        ret.translated_addr = addr & VTD_PAGE_MASK_4K;
>> +        ret.addr_mask = ~VTD_PAGE_MASK_4K;
>> +        ret.perm = IOMMU_RW;
>> +        return ret;
>> +    }
>> +
>> +    iommu_translate(s, bus_num, devfn, addr, &ret);
>> +
>> +    D("bus %d slot %d func %d devfn %d gpa %"PRIx64 " hpa %"PRIx64,
>> +      bus_num, VTD_PCI_SLOT(devfn), VTD_PCI_FUNC(devfn), devfn, addr,
>> +      ret.translated_addr);
>> +    return ret;
>> +}
>> +
>> +static const VMStateDescription vtd_vmstate = {
>> +    .name = "iommu_intel",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .minimum_version_id_old = 1,
>> +    .fields = (VMStateField[]) {
>> +        VMSTATE_UINT8_ARRAY(csr, IntelIOMMUState, DMAR_REG_SIZE),
>> +        VMSTATE_END_OF_LIST()
>> +    }
>> +};
>> +
>> +
>> +static const MemoryRegionOps vtd_mem_ops = {
>> +    .read = vtd_mem_read,
>> +    .write = vtd_mem_write,
>> +    .endianness = DEVICE_LITTLE_ENDIAN,
>> +    .impl = {
>> +        .min_access_size = 4,
>> +        .max_access_size = 8,
>> +    },
>> +    .valid = {
>> +        .min_access_size = 4,
>> +        .max_access_size = 8,
>> +    },
>> +};
>> +
>> +
>> +static Property iommu_properties[] = {
>> +    DEFINE_PROP_UINT32("version", IntelIOMMUState, version, 0),
>> +    DEFINE_PROP_END_OF_LIST(),
>> +};
>> +
>> +/* Do the real initialization. It will also be called when reset, so pay
>> + * attention when adding new initialization stuff.
>> + */
>> +static void do_vtd_init(IntelIOMMUState *s)
>> +{
>> +    memset(s->csr, 0, DMAR_REG_SIZE);
>> +    memset(s->wmask, 0, DMAR_REG_SIZE);
>> +    memset(s->w1cmask, 0, DMAR_REG_SIZE);
>> +    memset(s->womask, 0, DMAR_REG_SIZE);
>> +
>> +    s->iommu_ops.translate = vtd_iommu_translate;
>> +    s->root = 0;
>> +    s->extended = false;
>> +
>> +    /* b.0:2 = 6: Number of domains supported: 64K using 16 bit ids
>> +     * b.3   = 0: No advanced fault logging
>> +     * b.4   = 0: No required write buffer flushing
>> +     * b.5   = 0: Protected low memory region not supported
>> +     * b.6   = 0: Protected high memory region not supported
>> +     * b.8:12 = 2: SAGAW(Supported Adjusted Guest Address Widths), 39-bit,
>> +     *             3-level page-table
>> +     * b.16:21 = 38: MGAW(Maximum Guest Address Width) = 39
>> +     * b.22 = 1: ZLR(Zero Length Read) supports zero length DMA read requests
>> +     *           to write-only pages
>> +     * b.24:33 = 34: FRO(Fault-recording Register offset)
>> +     * b.54 = 0: DWD(Write Draining), draining of write requests not supported
>> +     * b.55 = 0: DRD(Read Draining), draining of read requests not supported
>> +     */
>> +    const uint64_t dmar_cap_reg_value = 0x00400000ULL | VTD_CAP_FRO |
>> +                                        VTD_CAP_NFR | VTD_CAP_ND |
>> +                                        VTD_CAP_MGAW | VTD_SAGAW_39bit;
>> +
>> +    /* b.1 = 0: QI(Queued Invalidation support) not supported
>> +     * b.2 = 0: DT(Device-TLB support)
>> +     * b.3 = 0: IR(Interrupt Remapping support) not supported
>> +     * b.4 = 0: EIM(Extended Interrupt Mode) not supported
>> +     * b.8:17 = 15: IRO(IOTLB Register Offset)
>> +     * b.20:23 = 15: MHMV(Maximum Handle Mask Value)
>> +     */
>> +    const uint64_t dmar_ecap_reg_value = 0xf00000ULL | VTD_ECAP_IRO;
>> +
>> +    /* Define registers with default values and bit semantics */
>> +    define_long(s, DMAR_VER_REG, 0x10UL, 0, 0);  /* set MAX = 1, RO */
>> +    define_quad(s, DMAR_CAP_REG, dmar_cap_reg_value, 0, 0);
>> +    define_quad(s, DMAR_ECAP_REG, dmar_ecap_reg_value, 0, 0);
>> +    define_long(s, DMAR_GCMD_REG, 0, 0xff800000UL, 0);
>> +    define_long_wo(s, DMAR_GCMD_REG, 0xff800000UL);
>> +    define_long(s, DMAR_GSTS_REG, 0, 0, 0); /* All bits RO, default 0 */
>> +    define_quad(s, DMAR_RTADDR_REG, 0, 0xfffffffffffff000ULL, 0);
>> +    define_quad(s, DMAR_CCMD_REG, 0, 0xe0000003ffffffffULL, 0);
>> +    define_quad_wo(s, DMAR_CCMD_REG, 0x3ffff0000ULL);
>> +    define_long(s, DMAR_FSTS_REG, 0, 0, 0xfdUL);
>> +    define_long(s, DMAR_FECTL_REG, 0x80000000UL, 0x80000000UL, 0);
>> +    define_long(s, DMAR_FEDATA_REG, 0, 0xffffffffUL, 0); /* All bits RW */
>> +    define_long(s, DMAR_FEADDR_REG, 0, 0xfffffffcUL, 0); /* 31:2 RW */
>> +    define_long(s, DMAR_FEUADDR_REG, 0, 0xffffffffUL, 0); /* 31:0 RW */
>> +
>> +    define_quad(s, DMAR_AFLOG_REG, 0, 0xffffffffffffff00ULL, 0);
>> +
>> +    /* Treated as RO for implementations that PLMR and PHMR fields reported
>> +     * as Clear in the CAP_REG.
>> +     * define_long(s, DMAR_PMEN_REG, 0, 0x80000000UL, 0);
>> +     */
>> +    define_long(s, DMAR_PMEN_REG, 0, 0, 0);
>> +
>> +    /* TBD: The definition of these are dynamic:
>> +     * DMAR_PLMBASE_REG, DMAR_PLMLIMIT_REG, DMAR_PHMBASE_REG, DMAR_PHMLIMIT_REG
>> +     */
>> +
>> +    /* Bits 18:4 (0x7fff0) are RO, rest is RsvdZ
>> +     * IQH_REG is treated as RsvdZ when not supported in ECAP_REG
>> +     * define_quad(s, DMAR_IQH_REG, 0, 0, 0);
>> +     */
>> +    define_quad(s, DMAR_IQH_REG, 0, 0, 0);
>> +
>> +    /* IQT_REG and IQA_REG are treated as RsvdZ when not supported in ECAP_REG
>> +     * define_quad(s, DMAR_IQT_REG, 0, 0x7fff0ULL, 0);
>> +     * define_quad(s, DMAR_IQA_REG, 0, 0xfffffffffffff007ULL, 0);
>> +     */
>> +    define_quad(s, DMAR_IQT_REG, 0, 0, 0);
>> +    define_quad(s, DMAR_IQA_REG, 0, 0, 0);
>> +
>> +    /* Bit 0 is RW1CS - rest is RsvdZ */
>> +    define_long(s, DMAR_ICS_REG, 0, 0, 0x1UL);
>> +
>> +    /* b.31 is RW, b.30 RO, rest: RsvdZ */
>> +    define_long(s, DMAR_IECTL_REG, 0x80000000UL, 0x80000000UL, 0);
>> +
>> +    define_long(s, DMAR_IEDATA_REG, 0, 0xffffffffUL, 0);
>> +    define_long(s, DMAR_IEADDR_REG, 0, 0xfffffffcUL, 0);
>> +    define_long(s, DMAR_IEUADDR_REG, 0, 0xffffffffUL, 0);
>> +    define_quad(s, DMAR_IRTA_REG, 0, 0xfffffffffffff80fULL, 0);
>> +    define_quad(s, DMAR_PQH_REG, 0, 0x7fff0ULL, 0);
>> +    define_quad(s, DMAR_PQT_REG, 0, 0x7fff0ULL, 0);
>> +    define_quad(s, DMAR_PQA_REG, 0, 0xfffffffffffff007ULL, 0);
>> +    define_long(s, DMAR_PRS_REG, 0, 0, 0x1UL);
>> +    define_long(s, DMAR_PECTL_REG, 0x80000000UL, 0x80000000UL, 0);
>> +    define_long(s, DMAR_PEDATA_REG, 0, 0xffffffffUL, 0);
>> +    define_long(s, DMAR_PEADDR_REG, 0, 0xfffffffcUL, 0);
>> +    define_long(s, DMAR_PEUADDR_REG, 0, 0xffffffffUL, 0);
>> +
>> +    /* When MTS not supported in ECAP_REG, these regs are RsvdZ */
>> +    define_long(s, DMAR_MTRRCAP_REG, 0, 0, 0);
>> +    define_long(s, DMAR_MTRRDEF_REG, 0, 0, 0);
>> +
>> +    /* IOTLB registers */
>> +    define_quad(s, DMAR_IOTLB_REG, 0, 0xb003ffff00000000ULL, 0);
>> +    define_quad(s, DMAR_IVA_REG, 0, 0xfffffffffffff07fULL, 0);
>> +    define_quad_wo(s, DMAR_IVA_REG, 0xfffffffffffff07fULL);
>> +}
>> +
>> +#if 0
>> +/* Iterate IntelIOMMUState->address_spaces[] and free any allocated memory */
>> +static void clean_address_space(IntelIOMMUState *s)
>> +{
>> +    VTDAddressSpace **pvtd_as;
>> +    VTDAddressSpace *vtd_as;
>> +    int i;
>> +    int j;
>> +    const int MAX_DEVFN = VTD_PCI_SLOT_MAX * VTD_PCI_FUNC_MAX;
>> +
>> +    for (i = 0; i < VTD_PCI_BUS_MAX; ++i) {
>> +        pvtd_as = s->address_spaces[i];
>> +        if (!pvtd_as) {
>> +            continue;
>> +        }
>> +        for (j = 0; j < MAX_DEVFN; ++j) {
>> +            vtd_as = *(pvtd_as + j);
>> +            if (!vtd_as) {
>> +                continue;
>> +            }
>> +            g_free(vtd_as);
>> +            *(pvtd_as + j) = 0;
>> +        }
>> +        g_free(pvtd_as);
>> +        s->address_spaces[i] = 0;
>> +    }
>> +}
>> +#endif
>> +
>> +/* Reset function of QOM
>> + * Should not reset address_spaces on reset
>> + */
>> +static void vtd_reset(DeviceState *dev)
>> +{
>> +    IntelIOMMUState *s = INTEL_IOMMU_DEVICE(dev);
>> +
>> +    D("");
>> +    do_vtd_init(s);
>> +}
>> +
>> +/* Initializatoin function of QOM */
>
> typo

:)

>
>> +static void vtd_realize(DeviceState *dev, Error **errp)
>> +{
>> +    IntelIOMMUState *s = INTEL_IOMMU_DEVICE(dev);
>> +
>> +    D("");
>> +    memset(s->address_spaces, 0, sizeof(s->address_spaces));
>> +    memory_region_init_io(&s->csrmem, OBJECT(s), &vtd_mem_ops, s,
>> +                          "intel_iommu", DMAR_REG_SIZE);
>> +
>> +    do_vtd_init(s);
>> +}
>> +
>> +static void vtd_class_init(ObjectClass *klass, void *data)
>> +{
>> +    DeviceClass *dc = DEVICE_CLASS(klass);
>> +
>> +    dc->reset = vtd_reset;
>> +    dc->realize = vtd_realize;
>> +    dc->vmsd = &vtd_vmstate;
>> +    dc->props = iommu_properties;
>> +}
>> +
>> +static const TypeInfo vtd_info = {
>> +    .name          = TYPE_INTEL_IOMMU_DEVICE,
>> +    .parent        = TYPE_SYS_BUS_DEVICE,
>> +    .instance_size = sizeof(IntelIOMMUState),
>> +    .class_init    = vtd_class_init,
>> +};
>> +
>> +static void vtd_register_types(void)
>> +{
>> +    D("");
>> +    type_register_static(&vtd_info);
>> +}
>> +
>> +type_init(vtd_register_types)
>> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
>> new file mode 100644
>> index 0000000..af5aff8
>> --- /dev/null
>> +++ b/include/hw/i386/intel_iommu.h
>
>
> the only user seems to be hw/i386/intel_iommu.c
> so you can move this header to hw/i386/ - include is
> for external headers.

Get it.

>> @@ -0,0 +1,350 @@
>> +/*
>> + * QEMU emulation of an Intel IOMMU (VT-d)
>> + *   (DMA Remapping device)
>> + *
>> + * Copyright (c) 2013 Knut Omang, Oracle <knut.omang@oracle.com>
>> + * Copyright (C) 2014 Le Tan, <tamlokveer@gmail.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> +
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> +
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
>> + *
>> + * Lots of defines copied from kernel/include/linux/intel-iommu.h:
>> + *   Copyright (C) 2006-2008 Intel Corporation
>> + *   Author: Ashok Raj <ashok.raj@intel.com>
>> + *   Author: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
>> + *
>> + */
>> +
>> +#ifndef _INTEL_IOMMU_H
>> +#define _INTEL_IOMMU_H
>> +#include "hw/qdev.h"
>> +#include "sysemu/dma.h"
>> +
>> +#define TYPE_INTEL_IOMMU_DEVICE "intel-iommu"
>> +#define INTEL_IOMMU_DEVICE(obj) \
>> +     OBJECT_CHECK(IntelIOMMUState, (obj), TYPE_INTEL_IOMMU_DEVICE)
>> +
>> +/* DMAR Hardware Unit Definition address (IOMMU unit) set in BIOS */
>> +#define Q35_HOST_BRIDGE_IOMMU_ADDR 0xfed90000ULL
>> +
>> +/*
>> + * Intel IOMMU register specification per version 1.0 public spec.
>> + */
>> +
>> +#define DMAR_VER_REG    0x0 /* Arch version supported by this IOMMU */
>> +#define DMAR_CAP_REG    0x8 /* Hardware supported capabilities */
>> +#define DMAR_CAP_REG_HI 0xc /* High 32-bit of DMAR_CAP_REG */
>> +#define DMAR_ECAP_REG   0x10    /* Extended capabilities supported */
>> +#define DMAR_ECAP_REG_HI    0x14
>> +#define DMAR_GCMD_REG   0x18    /* Global command register */
>> +#define DMAR_GSTS_REG   0x1c    /* Global status register */
>> +#define DMAR_RTADDR_REG 0x20    /* Root entry table */
>> +#define DMAR_RTADDR_REG_HI  0x24
>> +#define DMAR_CCMD_REG   0x28  /* Context command reg */
>> +#define DMAR_CCMD_REG_HI    0x2c
>> +#define DMAR_FSTS_REG   0x34  /* Fault Status register */
>> +#define DMAR_FECTL_REG  0x38 /* Fault control register */
>> +#define DMAR_FEDATA_REG 0x3c    /* Fault event interrupt data register */
>> +#define DMAR_FEADDR_REG 0x40    /* Fault event interrupt addr register */
>> +#define DMAR_FEUADDR_REG    0x44   /* Upper address register */
>> +#define DMAR_AFLOG_REG  0x58 /* Advanced Fault control */
>> +#define DMAR_AFLOG_REG_HI   0x5c
>> +#define DMAR_PMEN_REG   0x64  /* Enable Protected Memory Region */
>> +#define DMAR_PLMBASE_REG    0x68    /* PMRR Low addr */
>> +#define DMAR_PLMLIMIT_REG 0x6c  /* PMRR low limit */
>> +#define DMAR_PHMBASE_REG 0x70   /* pmrr high base addr */
>> +#define DMAR_PHMBASE_REG_HI 0x74
>> +#define DMAR_PHMLIMIT_REG 0x78  /* pmrr high limit */
>> +#define DMAR_PHMLIMIT_REG_HI 0x7c
>> +#define DMAR_IQH_REG    0x80   /* Invalidation queue head register */
>> +#define DMAR_IQH_REG_HI 0x84
>> +#define DMAR_IQT_REG    0x88   /* Invalidation queue tail register */
>> +#define DMAR_IQT_REG_HI 0x8c
>> +#define DMAR_IQ_SHIFT   4 /* Invalidation queue head/tail shift */
>> +#define DMAR_IQA_REG    0x90   /* Invalidation queue addr register */
>> +#define DMAR_IQA_REG_HI 0x94
>> +#define DMAR_ICS_REG    0x9c   /* Invalidation complete status register */
>> +#define DMAR_IRTA_REG   0xb8    /* Interrupt remapping table addr register */
>> +#define DMAR_IRTA_REG_HI    0xbc
>> +
>> +/* From Vt-d 2.2 spec */
>> +#define DMAR_IECTL_REG  0xa0    /* Invalidation event control register */
>> +#define DMAR_IEDATA_REG 0xa4    /* Invalidation event data register */
>> +#define DMAR_IEADDR_REG 0xa8    /* Invalidation event address register */
>> +#define DMAR_IEUADDR_REG 0xac    /* Invalidation event address register */
>> +#define DMAR_PQH_REG    0xc0    /* Page request queue head register */
>> +#define DMAR_PQH_REG_HI 0xc4
>> +#define DMAR_PQT_REG    0xc8    /* Page request queue tail register */
>> +#define DMAR_PQT_REG_HI     0xcc
>> +#define DMAR_PQA_REG    0xd0    /* Page request queue address register */
>> +#define DMAR_PQA_REG_HI 0xd4
>> +#define DMAR_PRS_REG    0xdc    /* Page request status register */
>> +#define DMAR_PECTL_REG  0xe0    /* Page request event control register */
>> +#define DMAR_PEDATA_REG 0xe4    /* Page request event data register */
>> +#define DMAR_PEADDR_REG 0xe8    /* Page request event address register */
>> +#define DMAR_PEUADDR_REG  0xec  /* Page event upper address register */
>> +#define DMAR_MTRRCAP_REG 0x100  /* MTRR capability register */
>> +#define DMAR_MTRRCAP_REG_HI 0x104
>> +#define DMAR_MTRRDEF_REG 0x108  /* MTRR default type register */
>> +#define DMAR_MTRRDEF_REG_HI 0x10c
>> +
>> +/* IOTLB */
>> +#define DMAR_IOTLB_REG_OFFSET 0xf0  /* Offset to the IOTLB registers */
>> +#define DMAR_IVA_REG DMAR_IOTLB_REG_OFFSET  /* Invalidate Address Register */
>> +#define DMAR_IVA_REG_HI (DMAR_IVA_REG + 4)
>> +/* IOTLB Invalidate Register */
>> +#define DMAR_IOTLB_REG (DMAR_IOTLB_REG_OFFSET + 0x8)
>> +#define DMAR_IOTLB_REG_HI (DMAR_IOTLB_REG + 4)
>> +
>> +/* FRCD */
>> +#define DMAR_FRCD_REG_OFFSET 0x220 /* Offset to the Fault Recording Registers */
>> +#define DMAR_FRCD_REG_NR 1 /* Num of Fault Recording Registers */
>> +
>> +#define DMAR_REG_SIZE   (DMAR_FRCD_REG_OFFSET + 128 * DMAR_FRCD_REG_NR)
>> +
>> +#define VTD_PCI_BUS_MAX 256
>> +#define VTD_PCI_SLOT_MAX 32
>> +#define VTD_PCI_FUNC_MAX 8
>> +#define VTD_PCI_SLOT(devfn)         (((devfn) >> 3) & 0x1f)
>> +#define VTD_PCI_FUNC(devfn)         ((devfn) & 0x07)
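[The devfn split above follows the standard PCI encoding (bits 7:3 device/slot, bits 2:0 function). A quick standalone sanity check; VTD_PCI_DEVFN is my own inverse helper for round-tripping, not part of the patch:]

```c
#include <stdint.h>

/* Copies of the two macros above, plus a hypothetical inverse
 * (VTD_PCI_DEVFN is an illustration only; it is not in the patch). */
#define VTD_PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define VTD_PCI_FUNC(devfn)       ((devfn) & 0x07)
#define VTD_PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
```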
>> +
>> +typedef struct IntelIOMMUState IntelIOMMUState;
>> +typedef struct VTDAddressSpace VTDAddressSpace;
>> +
>> +struct VTDAddressSpace {
>> +    int bus_num;
>> +    int devfn;
>> +    AddressSpace as;
>> +    MemoryRegion iommu;
>> +    IntelIOMMUState *iommu_state;
>> +};
>> +
>> +/* The iommu (DMAR) device state struct */
>> +struct IntelIOMMUState {
>> +    SysBusDevice busdev;
>> +    MemoryRegion csrmem;
>> +    uint8_t csr[DMAR_REG_SIZE];     /* register values */
>> +    uint8_t wmask[DMAR_REG_SIZE];   /* R/W bytes */
>> +    uint8_t w1cmask[DMAR_REG_SIZE]; /* RW1C(Write 1 to Clear) bytes */
>> +    uint8_t womask[DMAR_REG_SIZE]; /* WO (write only - read returns 0) */
>> +    uint32_t version;
>> +
>> +    dma_addr_t root;  /* Current root table pointer */
>> +    bool extended;    /* Type of root table (extended or not) */
>> +    uint16_t iq_head; /* Current invalidation queue head */
>> +    uint16_t iq_tail; /* Current invalidation queue tail */
>> +    dma_addr_t iq;   /* Current invalidation queue (IQ) pointer */
>> +    size_t iq_sz;    /* IQ Size in number of entries */
>> +    bool iq_enable;  /* Set if the IQ is enabled */
>> +
>> +    MemoryRegionIOMMUOps iommu_ops;
>> +    VTDAddressSpace **address_spaces[VTD_PCI_BUS_MAX];
>> +};
>> +
>> +
>> +/* An invalidate descriptor */
>> +typedef struct intel_iommu_inv_desc {
>> +    uint64_t lower;
>> +    uint64_t upper;
>> +} intel_iommu_inv_desc;
>
> violates QEMU coding style.

Get it.

>> +
>> +
>> +/* Invalidate descriptor types */
>> +#define CONTEXT_CACHE_INV_DESC  0x1
>> +#define PASID_CACHE_INV_DESC    0x7
>> +#define IOTLB_INV_DESC          0x2
>> +#define EXT_IOTLB_INV_DESC      0x6
>> +#define DEV_TLB_INV_DESC        0x3
>> +#define EXT_DEV_TLB_INV_DESC    0x8
>> +#define INT_ENTRY_INV_DESC      0x4
>> +#define INV_WAIT_DESC           0x5
>
> Pls use consistent prefix for everything.
> E.g, INTEL_IOMMU_ ...

Get it. Because my work is based on an existing framework, code for
unimplemented functionality has not been touched or modified yet. Maybe
I can simply remove it for now. Some of the definitions do use
different prefixes, for example VTD_ for the CSR definitions but ROOT_
for root entries and CONTEXT_ for context entries. I will change all
of these to use VTD_.

>> +
>> +
>> +/* IOTLB_REG */
>> +#define VTD_TLB_GLOBAL_FLUSH (1ULL << 60) /* Global invalidation */
>> +#define VTD_TLB_DSI_FLUSH (2ULL << 60)  /* Domain-selective invalidation */
>> +#define VTD_TLB_PSI_FLUSH (3ULL << 60)  /* Page-selective invalidation */
>> +#define VTD_TLB_FLUSH_GRANU_MASK (3ULL << 60)
>> +#define VTD_TLB_GLOBAL_FLUSH_A (1ULL << 57)
>> +#define VTD_TLB_DSI_FLUSH_A (2ULL << 57)
>> +#define VTD_TLB_PSI_FLUSH_A (3ULL << 57)
>> +#define VTD_TLB_FLUSH_GRANU_MASK_A (3ULL << 57)
>> +#define VTD_TLB_READ_DRAIN (1ULL << 49)
>> +#define VTD_TLB_WRITE_DRAIN (1ULL << 48)
>> +#define VTD_TLB_DID(id) (((uint64_t)((id) & 0xffffULL)) << 32)
>> +#define VTD_TLB_IVT (1ULL << 63)
>> +#define VTD_TLB_IH_NONLEAF (1ULL << 6)
>> +#define VTD_TLB_MAX_SIZE (0x3f)
>> +
>> +/* INVALID_DESC */
>> +#define DMA_CCMD_INVL_GRANU_OFFSET  61
>> +#define DMA_ID_TLB_GLOBAL_FLUSH (((uint64_t)1) << 3)
>> +#define DMA_ID_TLB_DSI_FLUSH    (((uint64_t)2) << 3)
>> +#define DMA_ID_TLB_PSI_FLUSH    (((uint64_t)3) << 3)
>> +#define DMA_ID_TLB_READ_DRAIN   (((uint64_t)1) << 7)
>> +#define DMA_ID_TLB_WRITE_DRAIN  (((uint64_t)1) << 6)
>> +#define DMA_ID_TLB_DID(id)  (((uint64_t)((id & 0xffff) << 16)))
>> +#define DMA_ID_TLB_IH_NONLEAF   (((uint64_t)1) << 6)
>> +#define DMA_ID_TLB_ADDR(addr)   (addr)
>> +#define DMA_ID_TLB_ADDR_MASK(mask)  (mask)
>> +
>> +/* PMEN_REG */
>> +#define DMA_PMEN_EPM (((uint32_t)1)<<31)
>> +#define DMA_PMEN_PRS (((uint32_t)1)<<0)
>> +
>> +/* GCMD_REG */
>> +#define VTD_GCMD_TE (1UL << 31)
>> +#define VTD_GCMD_SRTP (1UL << 30)
>> +#define VTD_GCMD_SFL (1UL << 29)
>> +#define VTD_GCMD_EAFL (1UL << 28)
>> +#define VTD_GCMD_WBF (1UL << 27)
>> +#define VTD_GCMD_QIE (1UL << 26)
>> +#define VTD_GCMD_IRE (1UL << 25)
>> +#define VTD_GCMD_SIRTP (1UL << 24)
>> +#define VTD_GCMD_CFI (1UL << 23)
>> +
>> +/* GSTS_REG */
>> +#define VTD_GSTS_TES (1UL << 31)
>> +#define VTD_GSTS_RTPS (1UL << 30)
>> +#define VTD_GSTS_FLS (1UL << 29)
>> +#define VTD_GSTS_AFLS (1UL << 28)
>> +#define VTD_GSTS_WBFS (1UL << 27)
>> +#define VTD_GSTS_QIES (1UL << 26)
>> +#define VTD_GSTS_IRES (1UL << 25)
>> +#define VTD_GSTS_IRTPS (1UL << 24)
>> +#define VTD_GSTS_CFIS (1UL << 23)
>> +
>> +/* CCMD_REG */
>> +#define VTD_CCMD_ICC (1ULL << 63)
>> +#define VTD_CCMD_GLOBAL_INVL (1ULL << 61)
>> +#define VTD_CCMD_DOMAIN_INVL (2ULL << 61)
>> +#define VTD_CCMD_DEVICE_INVL (3ULL << 61)
>> +#define VTD_CCMD_CIRG_MASK (3ULL << 61)
>> +#define VTD_CCMD_GLOBAL_INVL_A (1ULL << 59)
>> +#define VTD_CCMD_DOMAIN_INVL_A (2ULL << 59)
>> +#define VTD_CCMD_DEVICE_INVL_A (3ULL << 59)
>> +#define VTD_CCMD_CAIG_MASK (3ULL << 59)
>> +#define VTD_CCMD_FM(m) (((uint64_t)((m) & 3ULL)) << 32)
>> +#define VTD_CCMD_MASK_NOBIT 0
>> +#define VTD_CCMD_MASK_1BIT 1
>> +#define VTD_CCMD_MASK_2BIT 2
>> +#define VTD_CCMD_MASK_3BIT 3
>> +#define VTD_CCMD_SID(s) (((uint64_t)((s) & 0xffffULL)) << 16)
>> +#define VTD_CCMD_DID(d) ((uint64_t)((d) & 0xffffULL))
>> +
>> +/* FECTL_REG */
>> +#define DMA_FECTL_IM (((uint32_t)1) << 31)
>> +
>> +/* FSTS_REG */
>> +#define DMA_FSTS_PPF ((uint32_t)2)
>> +#define DMA_FSTS_PFO ((uint32_t)1)
>> +#define DMA_FSTS_IQE (1 << 4)
>> +#define DMA_FSTS_ICE (1 << 5)
>> +#define DMA_FSTS_ITE (1 << 6)
>> +#define dma_fsts_fault_record_index(s) (((s) >> 8) & 0xff)
>> +
>> +/* RTADDR_REG */
>> +#define VTD_RTADDR_RTT (1ULL << 11)
>> +
>> +
>> +/* ECAP_REG */
>> +#define VTD_ECAP_IRO (DMAR_IOTLB_REG_OFFSET << 4)   /* (val >> 4) << 8 */
>> +
>> +/* CAP_REG */
>> +
>> +/* (val >> 4) << 24 */
>> +#define VTD_CAP_FRO     ((uint64_t)DMAR_FRCD_REG_OFFSET << 20)
>
> cast not needed

Get it.

>> +
>> +#define VTD_CAP_NFR     ((uint64_t)(DMAR_FRCD_REG_NR - 1) << 40)
>
> make DMAR_FRCD_REG_NR 1ULL and cast won't be needed.

Get it.

>> +#define VTD_DOMAIN_ID_SHIFT     16  /* 16-bit domain id for 64K domains */
>> +#define VTD_CAP_ND  (((VTD_DOMAIN_ID_SHIFT - 4) / 2) & 7ULL)
>> +#define VTD_MGAW    39  /* Maximum Guest Address Width */
>> +#define VTD_CAP_MGAW    (((VTD_MGAW - 1) & 0x3fULL) << 16)
>> +/* Supported Adjusted Guest Address Widths */
>> +#define VTD_SAGAW_MASK  (0x1fULL << 8)
>> +#define VTD_SAGAW_39bit (0x2ULL << 8)   /* 39-bit AGAW, 3-level page-table */
>> +#define VTD_SAGAW_48bit (0x4ULL << 8)   /* 48-bit AGAW, 4-level page-table */
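[To make the bit packing above concrete: per my reading of the VT-d spec, ND
sits in CAP_REG bits 2:0 (number of domains = 2^(4 + 2*ND)), SAGAW in bits
12:8 (one bit per supported AGAW), and MGAW in bits 21:16 (encoded as width
minus one). A standalone check of the arithmetic, with the macros copied from
the header:]

```c
#include <stdint.h>

/* Copied from the header under review; comments restate the field
 * encodings these values are meant to produce. */
#define VTD_DOMAIN_ID_SHIFT 16  /* 16-bit domain id for 64K domains */
#define VTD_CAP_ND          (((VTD_DOMAIN_ID_SHIFT - 4) / 2) & 7ULL)
#define VTD_MGAW            39  /* Maximum Guest Address Width */
#define VTD_CAP_MGAW        (((VTD_MGAW - 1) & 0x3fULL) << 16)
#define VTD_SAGAW_39bit     (0x2ULL << 8)  /* 39-bit AGAW, 3-level table */
```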
>> +
>> +
>> +/* Pagesize of VTD paging structures, including root and context tables */
>> +#define VTD_PAGE_SHIFT      (12)
>> +#define VTD_PAGE_SIZE       (1UL << VTD_PAGE_SHIFT)
>> +
>> +#define VTD_PAGE_SHIFT_4K   (12)
>> +#define VTD_PAGE_MASK_4K    (~((1ULL << VTD_PAGE_SHIFT_4K) - 1))
>> +#define VTD_PAGE_SHIFT_2M   (21)
>> +#define VTD_PAGE_MASK_2M    (~((1UL << VTD_PAGE_SHIFT_2M) - 1))
>> +#define VTD_PAGE_SHIFT_1G   (30)
>> +#define VTD_PAGE_MASK_1G    (~((1UL << VTD_PAGE_SHIFT_1G) - 1))
>> +
>> +/* Root-Entry
>> + * 0: Present
>> + * 1-11: Reserved
>> + * 12-63: Context-table Pointer
>> + * 64-127: Reserved
>> + */
>> +struct vtd_root_entry {
>> +    uint64_t val;
>> +    uint64_t rsvd;
>> +};
>> +typedef struct vtd_root_entry vtd_root_entry;
>
> violates QEMU coding style
>
>> +
>> +/* Masks for struct vtd_root_entry */
>> +#define ROOT_ENTRY_P (1ULL << 0)
>> +#define ROOT_ENTRY_CTP  (~0xfffULL)
>> +
>> +#define ROOT_ENTRY_NR   (VTD_PAGE_SIZE / sizeof(vtd_root_entry))
>> +
>> +
>> +/* Context-Entry */
>> +struct vtd_context_entry {
>> +    uint64_t lo;
>> +    uint64_t hi;
>> +};
>> +typedef struct vtd_context_entry vtd_context_entry;
>
> same

Get it.
Thanks very much! :)

>
>> +
>> +/* Masks for struct vtd_context_entry */
>> +/* lo */
>> +#define CONTEXT_ENTRY_P (1ULL << 0)
>> +#define CONTEXT_ENTRY_FPD   (1ULL << 1) /* Fault Processing Disable */
>> +#define CONTEXT_ENTRY_TT    (3ULL << 2) /* Translation Type */
>> +#define CONTEXT_TT_MULTI_LEVEL  (0)
>> +#define CONTEXT_TT_DEV_IOTLB    (1)
>> +#define CONTEXT_TT_PASS_THROUGH (2)
>> +/* Second Level Page Translation Pointer */
>> +#define CONTEXT_ENTRY_SLPTPTR   (~0xfffULL)
>> +
>> +/* hi */
>> +#define CONTEXT_ENTRY_AW    (7ULL) /* Adjusted guest-address-width */
>> +#define CONTEXT_ENTRY_DID   (0xffffULL << 8)    /* Domain Identifier */
>> +
>> +
>> +#define CONTEXT_ENTRY_NR    (VTD_PAGE_SIZE / sizeof(vtd_context_entry))
>> +
>> +
>> +/* Paging Structure common */
>> +#define SL_PT_PAGE_SIZE_MASK   (1ULL << 7)
>> +#define SL_LEVEL_BITS   9   /* Bits to decide the offset for each level */
>> +
>> +/* Second Level Paging Structure */
>> +#define SL_PML4_LEVEL 4
>> +#define SL_PDP_LEVEL 3
>> +#define SL_PD_LEVEL 2
>> +#define SL_PT_LEVEL 1
>> +
>> +#define SL_PT_ENTRY_NR  512
>> +#define SL_PT_BASE_ADDR_MASK  (~(uint64_t)(VTD_PAGE_SIZE - 1))
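[The paging constants above imply the usual radix-512 walk: each level
consumes SL_LEVEL_BITS = 9 bits of the guest address above the 4 KiB page
offset, and with a 39-bit MGAW the walk starts at SL_PDP_LEVEL. A sketch of
the per-level index computation such a walk would use; sl_level_index() is my
own illustration, not code from the patch:]

```c
#include <stdint.h>

#define VTD_PAGE_SHIFT 12
#define SL_LEVEL_BITS  9

/* Index into the second-level paging structure at 'level' (1 = SL_PT,
 * 2 = SL_PD, 3 = SL_PDP, 4 = SL_PML4) for guest physical address gpa:
 * shift out the page offset plus 9 bits per already-consumed level,
 * then keep the low 9 bits (512 entries per table). */
unsigned sl_level_index(uint64_t gpa, int level)
{
    return (gpa >> (VTD_PAGE_SHIFT + (level - 1) * SL_LEVEL_BITS)) & 0x1ff;
}
```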
>> +
>> +
>> +#endif
>> --
>> 1.9.1

Regards,
Le

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation
  2014-07-22 15:47 ` [Qemu-devel] [PATCH 1/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation Le Tan
  2014-07-22 20:05   ` Michael S. Tsirkin
@ 2014-07-23  7:58   ` Paolo Bonzini
  2014-07-23 23:21     ` Le Tan
  2014-07-23 20:29   ` Stefan Weil
  2 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2014-07-23  7:58 UTC (permalink / raw)
  To: Le Tan, qemu-devel
  Cc: Jan Kiszka, Alex Williamson, Knut Omang, Anthony Liguori,
	Michael S. Tsirkin

Il 22/07/2014 17:47, Le Tan ha scritto:
> +static inline void define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
> +                        uint64_t wmask, uint64_t w1cmask)
> +{
> +    *((uint64_t *)&s->csr[addr]) = val;

All these casts are not endian-safe.  Please use ldl_le_p, ldq_le_p,
stl_le_p, stq_le_p.
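[For illustration, a minimal self-contained sketch of what those accessors
do; the *_demo names are hypothetical stand-ins for QEMU's real helpers,
which byte-address a buffer in fixed little-endian order:]

```c
#include <stdint.h>

/* Stand-ins for QEMU's stq_le_p()/ldq_le_p(): store/load a 64-bit value
 * in little-endian byte order at an arbitrary byte address.  Byte-wise
 * access is immune to both host endianness and alignment, unlike the
 * (uint64_t *)&s->csr[addr] casts in the patch. */
void stq_le_p_demo(void *p, uint64_t v)
{
    uint8_t *b = p;
    int i;

    for (i = 0; i < 8; i++) {
        b[i] = (uint8_t)(v >> (8 * i));
    }
}

uint64_t ldq_le_p_demo(const void *p)
{
    const uint8_t *b = p;
    uint64_t v = 0;
    int i;

    for (i = 0; i < 8; i++) {
        v |= (uint64_t)b[i] << (8 * i);
    }
    return v;
}
```

[With these semantics, define_quad() would become stq_le_p(&s->csr[addr],
val) and so on, keeping the register file in guest (little-endian) byte
order on any host.]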

> +    *((uint64_t *)&s->wmask[addr]) = wmask;
> +    *((uint64_t *)&s->w1cmask[addr]) = w1cmask;
> +}
> +
> +static inline void define_quad_wo(IntelIOMMUState *s, hwaddr addr,
> +                                  uint64_t mask)
> +{
> +    *((uint64_t *)&s->womask[addr]) = mask;
> +}
> +
> +static inline void define_long(IntelIOMMUState *s, hwaddr addr, uint32_t val,
> +                        uint32_t wmask, uint32_t w1cmask)
> +{
> +    *((uint32_t *)&s->csr[addr]) = val;
> +    *((uint32_t *)&s->wmask[addr]) = wmask;
> +    *((uint32_t *)&s->w1cmask[addr]) = w1cmask;
> +}
> +
> +static inline void define_long_wo(IntelIOMMUState *s, hwaddr addr,
> +                                  uint32_t mask)
> +{
> +    *((uint32_t *)&s->womask[addr]) = mask;
> +}
> +
> +/* "External" get/set operations */
> +static inline void set_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val)
> +{
> +    uint64_t oldval = *((uint64_t *)&s->csr[addr]);
> +    uint64_t wmask = *((uint64_t *)&s->wmask[addr]);
> +    uint64_t w1cmask = *((uint64_t *)&s->w1cmask[addr]);
> +    *((uint64_t *)&s->csr[addr]) =
> +        ((oldval & ~wmask) | (val & wmask)) & ~(w1cmask & val);
> +}
> +
> +static inline void set_long(IntelIOMMUState *s, hwaddr addr, uint32_t val)
> +{
> +    uint32_t oldval = *((uint32_t *)&s->csr[addr]);
> +    uint32_t wmask = *((uint32_t *)&s->wmask[addr]);
> +    uint32_t w1cmask = *((uint32_t *)&s->w1cmask[addr]);
> +    *((uint32_t *)&s->csr[addr]) =
> +        ((oldval & ~wmask) | (val & wmask)) & ~(w1cmask & val);
> +}
> +
> +static inline uint64_t get_quad(IntelIOMMUState *s, hwaddr addr)
> +{
> +    uint64_t val = *((uint64_t *)&s->csr[addr]);
> +    uint64_t womask = *((uint64_t *)&s->womask[addr]);
> +    return val & ~womask;
> +}
> +
> +
> +static inline uint32_t get_long(IntelIOMMUState *s, hwaddr addr)
> +{
> +    uint32_t val = *((uint32_t *)&s->csr[addr]);
> +    uint32_t womask = *((uint32_t *)&s->womask[addr]);
> +    return val & ~womask;
> +}
> +
> +
> +
> +/* "Internal" get/set operations */
> +static inline uint64_t __get_quad(IntelIOMMUState *s, hwaddr addr)

get_quad_raw?

> +{
> +    return *((uint64_t *)&s->csr[addr]);
> +}
> +
> +static inline uint32_t __get_long(IntelIOMMUState *s, hwaddr addr)

get_long_raw?

> +{
> +    return *((uint32_t *)&s->csr[addr]);
> +}
> +
> +
> +/* val = (val & ~clear) | mask */
> +static inline uint32_t set_mask_long(IntelIOMMUState *s, hwaddr addr,

set_clear_long?

> +                                     uint32_t clear, uint32_t mask)
> +{
> +    uint32_t *ptr = (uint32_t *)&s->csr[addr];
> +    uint32_t val = (*ptr & ~clear) | mask;
> +    *ptr = val;
> +    return val;
> +}
> +
> +/* val = (val & ~clear) | mask */
> +static inline uint64_t set_mask_quad(IntelIOMMUState *s, hwaddr addr,

set_clear_quad?
> +                                     uint64_t clear, uint64_t mask)
> +{
> +    uint64_t *ptr = (uint64_t *)&s->csr[addr];
> +    uint64_t val = (*ptr & ~clear) | mask;
> +    *ptr = val;
> +    return val;
> +}
> +
> +


* Re: [Qemu-devel] [PATCH 1/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation
  2014-07-22 15:47 ` [Qemu-devel] [PATCH 1/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation Le Tan
  2014-07-22 20:05   ` Michael S. Tsirkin
  2014-07-23  7:58   ` Paolo Bonzini
@ 2014-07-23 20:29   ` Stefan Weil
  2014-07-23 23:24     ` Le Tan
  2 siblings, 1 reply; 12+ messages in thread
From: Stefan Weil @ 2014-07-23 20:29 UTC (permalink / raw)
  To: Le Tan, qemu-devel
  Cc: Michael S. Tsirkin, Knut Omang, Alex Williamson, Jan Kiszka,
	Anthony Liguori, Paolo Bonzini

Am 22.07.2014 17:47, schrieb Le Tan:
> Add support for emulating Intel IOMMU according to the VT-d specification for
> the q35 chipset machine. Implement the logic for DMAR (DMA remapping) without
> PASID support. Use register-based invalidation for context-cache invalidation
> and IOTLB invalidation.
> Basic fault reporting and caching are not implemented yet.
>
> Signed-off-by: Le Tan <tamlokveer@gmail.com>
> ---
>  hw/i386/Makefile.objs         |    1 +
>  hw/i386/intel_iommu.c         | 1139 +++++++++++++++++++++++++++++++++++++++++
>  include/hw/i386/intel_iommu.h |  350 +++++++++++++
>  3 files changed, 1490 insertions(+)
>  create mode 100644 hw/i386/intel_iommu.c
>  create mode 100644 include/hw/i386/intel_iommu.h
>
[...]
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> new file mode 100644
> index 0000000..3ba0e1e
> --- /dev/null
> +++ b/hw/i386/intel_iommu.c
> @@ -0,0 +1,1139 @@
> +/*
> + * QEMU emulation of an Intel IOMMU (VT-d)
> + *   (DMA Remapping device)
> + *
> + * Copyright (c) 2013 Knut Omang, Oracle <knut.omang@oracle.com>
> + * Copyright (C) 2014 Le Tan, <tamlokveer@gmail.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> + */
> +

I suggest replacing the FSF address here (and in other files) by the URL:

 * You should have received a copy of the GNU General Public License along
 * with this program; if not, see <http://www.gnu.org/licenses/>.

This is the standard used for most GPL text in QEMU source files.

Regards
Stefan W.


* Re: [Qemu-devel] [PATCH 1/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation
  2014-07-23  7:58   ` Paolo Bonzini
@ 2014-07-23 23:21     ` Le Tan
  0 siblings, 0 replies; 12+ messages in thread
From: Le Tan @ 2014-07-23 23:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Michael S. Tsirkin, Knut Omang, qemu-devel, Alex Williamson,
	Jan Kiszka, Anthony Liguori

Hi Paolo,

2014-07-23 15:58 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
> Il 22/07/2014 17:47, Le Tan ha scritto:
>> +static inline void define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
>> +                        uint64_t wmask, uint64_t w1cmask)
>> +{
>> +    *((uint64_t *)&s->csr[addr]) = val;
>
> All these casts are not endian-safe.  Please use ldl_le_p, ldq_le_p,
> stl_le_p, stq_le_p.

Thanks very much. Finally I got the idea here. :) Also thanks for your
renaming suggestions.

>> +    *((uint64_t *)&s->wmask[addr]) = wmask;
>> +    *((uint64_t *)&s->w1cmask[addr]) = w1cmask;
>> +}
>> +
>> +static inline void define_quad_wo(IntelIOMMUState *s, hwaddr addr,
>> +                                  uint64_t mask)
>> +{
>> +    *((uint64_t *)&s->womask[addr]) = mask;
>> +}
>> +
>> +static inline void define_long(IntelIOMMUState *s, hwaddr addr, uint32_t val,
>> +                        uint32_t wmask, uint32_t w1cmask)
>> +{
>> +    *((uint32_t *)&s->csr[addr]) = val;
>> +    *((uint32_t *)&s->wmask[addr]) = wmask;
>> +    *((uint32_t *)&s->w1cmask[addr]) = w1cmask;
>> +}
>> +
>> +static inline void define_long_wo(IntelIOMMUState *s, hwaddr addr,
>> +                                  uint32_t mask)
>> +{
>> +    *((uint32_t *)&s->womask[addr]) = mask;
>> +}
>> +
>> +/* "External" get/set operations */
>> +static inline void set_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val)
>> +{
>> +    uint64_t oldval = *((uint64_t *)&s->csr[addr]);
>> +    uint64_t wmask = *((uint64_t *)&s->wmask[addr]);
>> +    uint64_t w1cmask = *((uint64_t *)&s->w1cmask[addr]);
>> +    *((uint64_t *)&s->csr[addr]) =
>> +        ((oldval & ~wmask) | (val & wmask)) & ~(w1cmask & val);
>> +}
>> +
>> +static inline void set_long(IntelIOMMUState *s, hwaddr addr, uint32_t val)
>> +{
>> +    uint32_t oldval = *((uint32_t *)&s->csr[addr]);
>> +    uint32_t wmask = *((uint32_t *)&s->wmask[addr]);
>> +    uint32_t w1cmask = *((uint32_t *)&s->w1cmask[addr]);
>> +    *((uint32_t *)&s->csr[addr]) =
>> +        ((oldval & ~wmask) | (val & wmask)) & ~(w1cmask & val);
>> +}
>> +
>> +static inline uint64_t get_quad(IntelIOMMUState *s, hwaddr addr)
>> +{
>> +    uint64_t val = *((uint64_t *)&s->csr[addr]);
>> +    uint64_t womask = *((uint64_t *)&s->womask[addr]);
>> +    return val & ~womask;
>> +}
>> +
>> +
>> +static inline uint32_t get_long(IntelIOMMUState *s, hwaddr addr)
>> +{
>> +    uint32_t val = *((uint32_t *)&s->csr[addr]);
>> +    uint32_t womask = *((uint32_t *)&s->womask[addr]);
>> +    return val & ~womask;
>> +}
>> +
>> +
>> +
>> +/* "Internal" get/set operations */
>> +static inline uint64_t __get_quad(IntelIOMMUState *s, hwaddr addr)
>
> get_quad_raw?
>
>> +{
>> +    return *((uint64_t *)&s->csr[addr]);
>> +}
>> +
>> +static inline uint32_t __get_long(IntelIOMMUState *s, hwaddr addr)
>
> get_long_raw?
>
>> +{
>> +    return *((uint32_t *)&s->csr[addr]);
>> +}
>> +
>> +
>> +/* val = (val & ~clear) | mask */
>> +static inline uint32_t set_mask_long(IntelIOMMUState *s, hwaddr addr,
>
> set_clear_long?
>
>> +                                     uint32_t clear, uint32_t mask)
>> +{
>> +    uint32_t *ptr = (uint32_t *)&s->csr[addr];
>> +    uint32_t val = (*ptr & ~clear) | mask;
>> +    *ptr = val;
>> +    return val;
>> +}
>> +
>> +/* val = (val & ~clear) | mask */
>> +static inline uint64_t set_mask_quad(IntelIOMMUState *s, hwaddr addr,
>
> set_clear_quad?
>> +                                     uint64_t clear, uint64_t mask)
>> +{
>> +    uint64_t *ptr = (uint64_t *)&s->csr[addr];
>> +    uint64_t val = (*ptr & ~clear) | mask;
>> +    *ptr = val;
>> +    return val;
>> +}
>> +
>> +
>

Regards,
Le


* Re: [Qemu-devel] [PATCH 1/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation
  2014-07-23 20:29   ` Stefan Weil
@ 2014-07-23 23:24     ` Le Tan
  2014-08-03 15:16       ` Knut Omang
  0 siblings, 1 reply; 12+ messages in thread
From: Le Tan @ 2014-07-23 23:24 UTC (permalink / raw)
  To: Stefan Weil
  Cc: Michael S. Tsirkin, Knut Omang, qemu-devel, Alex Williamson,
	Jan Kiszka, Anthony Liguori, Paolo Bonzini

Hi Stefan,

2014-07-24 4:29 GMT+08:00 Stefan Weil <sw@weilnetz.de>:
> Am 22.07.2014 17:47, schrieb Le Tan:
>> Add support for emulating Intel IOMMU according to the VT-d specification for
>> the q35 chipset machine. Implement the logic for DMAR (DMA remapping) without
>> PASID support. Use register-based invalidation for context-cache invalidation
>> and IOTLB invalidation.
>> Basic fault reporting and caching are not implemented yet.
>>
>> Signed-off-by: Le Tan <tamlokveer@gmail.com>
>> ---
>>  hw/i386/Makefile.objs         |    1 +
>>  hw/i386/intel_iommu.c         | 1139 +++++++++++++++++++++++++++++++++++++++++
>>  include/hw/i386/intel_iommu.h |  350 +++++++++++++
>>  3 files changed, 1490 insertions(+)
>>  create mode 100644 hw/i386/intel_iommu.c
>>  create mode 100644 include/hw/i386/intel_iommu.h
>>
> [...]
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> new file mode 100644
>> index 0000000..3ba0e1e
>> --- /dev/null
>> +++ b/hw/i386/intel_iommu.c
>> @@ -0,0 +1,1139 @@
>> +/*
>> + * QEMU emulation of an Intel IOMMU (VT-d)
>> + *   (DMA Remapping device)
>> + *
>> + * Copyright (c) 2013 Knut Omang, Oracle <knut.omang@oracle.com>
>> + * Copyright (C) 2014 Le Tan, <tamlokveer@gmail.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> +
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> +
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
>> + */
>> +
>
> I suggest replacing the FSF address here (and in other files) by the URL:
>
>  * You should have received a copy of the GNU General Public License along
>  * with this program; if not, see <http://www.gnu.org/licenses/>.
>
> This is the standard used for most GPL text in QEMU source files.

Got it. I copied it from the Linux kernel tree.
Thanks very much!

> Regards
> Stefan W.
>

Regards,
Le


* Re: [Qemu-devel] [PATCH 3/3] intel-iommu: add Intel IOMMU emulation to q35 and add a machine option "vtd" as a switch
  2014-07-22 15:47 ` [Qemu-devel] [PATCH 3/3] intel-iommu: add Intel IOMMU emulation to q35 and add a machine option "vtd" as a switch Le Tan
@ 2014-07-26  8:47   ` Jan Kiszka
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Kiszka @ 2014-07-26  8:47 UTC (permalink / raw)
  To: Le Tan, qemu-devel
  Cc: Paolo Bonzini, Alex Williamson, Knut Omang, Anthony Liguori,
	Michael S. Tsirkin

On 2014-07-22 17:47, Le Tan wrote:
> Add Intel IOMMU emulation to q35 chipset and expose it to the guest.
> 1. Add a machine option. Users can use "-machine vtd=on|off" in the command
> line to enable/disable Intel IOMMU. The default is off.

Better to call it "iommu". We could then reuse the switch when someone adds
an AMD IOMMU emulation (and a corresponding chipset as well, I suppose).
And maybe other target archs will find it useful as well.
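For context, the rename Jan proposes would only change the property name on the command line; something along these lines (the "iommu" spelling is Jan's suggestion, not yet what the patch series implements):

```shell
# As posted in this series:
qemu-system-x86_64 -machine q35,vtd=on ...

# With the proposed generic name:
qemu-system-x86_64 -machine q35,iommu=on ...
```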

Jan



* Re: [Qemu-devel] [PATCH 1/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation
  2014-07-23 23:24     ` Le Tan
@ 2014-08-03 15:16       ` Knut Omang
  0 siblings, 0 replies; 12+ messages in thread
From: Knut Omang @ 2014-08-03 15:16 UTC (permalink / raw)
  To: Le Tan
  Cc: Michael S. Tsirkin, Stefan Weil, qemu-devel, Alex Williamson,
	Jan Kiszka, Anthony Liguori, Paolo Bonzini

On Thu, 2014-07-24 at 07:24 +0800, Le Tan wrote:
> Hi Stefan,
> 
> 2014-07-24 4:29 GMT+08:00 Stefan Weil <sw@weilnetz.de>:
> > Am 22.07.2014 17:47, schrieb Le Tan:
> >> Add support for emulating Intel IOMMU according to the VT-d specification for
> >> the q35 chipset machine. Implement the logic for DMAR (DMA remapping) without
> >> PASID support. Use register-based invalidation for context-cache invalidation
> >> and IOTLB invalidation.
> >> Basic fault reporting and caching are not implemented yet.
> >>
> >> Signed-off-by: Le Tan <tamlokveer@gmail.com>
> >> ---
> >>  hw/i386/Makefile.objs         |    1 +
> >>  hw/i386/intel_iommu.c         | 1139 +++++++++++++++++++++++++++++++++++++++++
> >>  include/hw/i386/intel_iommu.h |  350 +++++++++++++
> >>  3 files changed, 1490 insertions(+)
> >>  create mode 100644 hw/i386/intel_iommu.c
> >>  create mode 100644 include/hw/i386/intel_iommu.h
> >>
> > [...]
> >> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> >> new file mode 100644
> >> index 0000000..3ba0e1e
> >> --- /dev/null
> >> +++ b/hw/i386/intel_iommu.c
> >> @@ -0,0 +1,1139 @@
> >> +/*
> >> + * QEMU emulation of an Intel IOMMU (VT-d)
> >> + *   (DMA Remapping device)
> >> + *
> >> + * Copyright (c) 2013 Knut Omang, Oracle <knut.omang@oracle.com>
> >> + * Copyright (C) 2014 Le Tan, <tamlokveer@gmail.com>
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License as published by
> >> + * the Free Software Foundation; either version 2 of the License, or
> >> + * (at your option) any later version.
> >> +
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> +
> >> + * You should have received a copy of the GNU General Public License
> >> + * along with this program; if not, write to the Free Software
> >> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> >> + */
> >> +
> >
> > I suggest replacing the FSF address here (and in other files) by the URL:
> >
> >  * You should have received a copy of the GNU General Public License along
> >  * with this program; if not, see <http://www.gnu.org/licenses/>.
> >
> > This is the standard used for most GPL text in QEMU source files.
> 
> Got it. I copied it from the Linux kernel tree.
> Thanks very much!

I have no problems with switching to the URL version of the license.

Thanks for bringing this forward, Le!

Knut


end of thread, other threads:[~2014-08-03 15:17 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-22 15:47 [Qemu-devel] [PATCH 0/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation to q35 chipset Le Tan
2014-07-22 15:47 ` [Qemu-devel] [PATCH 1/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation Le Tan
2014-07-22 20:05   ` Michael S. Tsirkin
2014-07-23  1:25     ` Le Tan
2014-07-23  7:58   ` Paolo Bonzini
2014-07-23 23:21     ` Le Tan
2014-07-23 20:29   ` Stefan Weil
2014-07-23 23:24     ` Le Tan
2014-08-03 15:16       ` Knut Omang
2014-07-22 15:47 ` [Qemu-devel] [PATCH 2/3] intel-iommu: add DMAR table to ACPI tables Le Tan
2014-07-22 15:47 ` [Qemu-devel] [PATCH 3/3] intel-iommu: add Intel IOMMU emulation to q35 and add a machine option "vtd" as a switch Le Tan
2014-07-26  8:47   ` Jan Kiszka
