* [Qemu-devel] [RFC v5 0/8] ARM SMMUv3 Emulation Support
@ 2017-07-09 20:51 Eric Auger
  2017-07-09 20:51 ` [Qemu-devel] [RFC v5 1/8] hw/arm/smmu-common: smmu base class Eric Auger
                   ` (10 more replies)
  0 siblings, 11 replies; 22+ messages in thread
From: Eric Auger @ 2017-07-09 20:51 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, peter.maydell, qemu-arm, qemu-devel,
	alex.williamson, prem.mallappa
  Cc: drjones, christoffer.dall, Radha.Chintakuntla, Sunil.Goutham,
	mohun106, tcain, bharat.bhushan, tn, mst, will.deacon,
	jean-philippe.brucker, robin.murphy, peterx, edgar.iglesias

This series implements the emulation code for ARM SMMUv3.
This is the continuation of Prem's work [2].

This v5 mainly brings VFIO integration in DT mode. On the guest kernel
side, this requires a quirk [1] to force TLB invalidation on map.

Other noticeable changes:
- fix the SMMU_CMDQ_CONS offset
- add the dma-coherent DT property, which fixes the unhandled command
  opcode bug
- implement block PTEs

The SMMU is instantiated by passing the smmu option to machvirt:
"-M virt-2.10,smmu"

As I have not yet split the code into easily reviewable patches, I don't
expect deep reviews at this stage. The implementation may also be largely
sub-optimal.

Tested Use Cases:
- booted a guest in dt and acpi mode with an iommu_platform
  virtio-net-pci device (using dma ops). Tested with the following
  guest combinations: 4K page - 39 bit VA, 4K - 48b, 64K - 39b,
  64K - 48b.
- booted a guest (featuring [1]) with passthrough'ed PCIe devices:
  - AMD Overdrive and igbvf passthrough (using gsi direct mapping)
  - Cavium ThunderX and ixgbevf passthrough (using KVM MSI routing)
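For reference, the tested page size / VA size combinations start their walks at different lookup levels. The sketch below is a standalone copy of the initial_lookup_level() helper that patch 1/8 adds to hw/arm/smmu-internal.h; initial_level_for_va_bits() is a hypothetical wrapper added only for illustration. With a 2^granule_sz byte granule, each level resolves granule_sz - 3 IOVA bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Copy of initial_lookup_level() from hw/arm/smmu-internal.h.
 * tnsz is the TnSZ field, so the input range is 64 - tnsz bits.
 */
static int initial_lookup_level(int tnsz, int granule_sz)
{
    assert(granule_sz == 12 || granule_sz == 14 || granule_sz == 16);
    return 4 - (64 - tnsz - 4) / (granule_sz - 3);
}

/* Hypothetical wrapper: take the VA size (e.g. 39 or 48 bits) directly */
static int initial_level_for_va_bits(int va_bits, int granule_sz)
{
    return initial_lookup_level(64 - va_bits, granule_sz);
}
```

With 4K pages this gives level 1 for a 39-bit VA and level 0 for 48 bits; with 64K pages, level 2 and level 1 respectively.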

Unfortunately I have not yet been able to run DPDK testpmd on the guest
side. The problem is that the user space driver dma-maps a huge area, which
causes plenty of CMDQ_OP_TLBI_NH_VA commands to be sent (tlbi-on-map): one
per page, whereas the dma-map covers a huge page. I will work on this issue
for the next version.
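To put the tlbi-on-map cost in numbers, the hypothetical helper below counts the per-page CMDQ_OP_TLBI_NH_VA commands for a page-aligned dma-map. The sizes used with it are illustrative, not measurements taken with testpmd.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of per-page TLBI commands sent when
 * tlbi-on-map invalidates every page of a dma-map.
 * Assumes a power-of-two page size and a page-aligned mapping.
 */
static uint64_t tlbi_cmds_for_map(uint64_t map_size, uint64_t page_size)
{
    assert(page_size && (page_size & (page_size - 1)) == 0);
    assert(map_size % page_size == 0);
    return map_size / page_size;
}
```

For instance, a 1 GiB mapping with 4K pages gives 262144 commands, and a single 2 MiB huge page still gives 512, although the mapping itself covers one huge page.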

Known limitations:
- no VMSAv8-32 support
- no nested stage support (S1 + S2)
- no support for HYP mappings
- fine-grained register emulation, commands, interrupts and errors were
  not accurately tested. Handling is sufficient to run the use cases
  described above, though.

Best Regards

Eric

This series can be found at:
v5: https://github.com/eauger/qemu/tree/v2.9-SMMU-v5
v4: https://github.com/eauger/qemu/tree/v2.9-SMMU-v4

References:
[1] [RFC 0/2] arm-smmu-v3 tlbi-on-map option
[2] Prem's last iteration:
- https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg03531.html

History:
v4 -> v5:
- initial_level now part of SMMUTransCfg
- smmu_page_walk_64 takes into account the max input size
- implement sys->iommu_ops.replay and sys->iommu_ops.notify_flag_changed
- smmuv3_translate: bug fix: don't walk on bypass
- smmu_update_qreg: fix PROD index update
- I have not yet addressed Peter's comments, as the code is not mature
  enough to be split into sub-patches.
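A minimal sketch of clamping a walk to the maximum input size, assuming (as in the page walk loop) that the range end is exclusive and that tsz holds the TnSZ value, so the input range spans 2^(64 - tsz) bytes. clamp_walk_end() is a hypothetical name, not a function from the series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: clamp an exclusive range end to the
 * 2^(64 - tsz) byte input range of the translation table.
 * tsz must be non-zero so that the shift stays below 64.
 */
static uint64_t clamp_walk_end(uint64_t end, unsigned int tsz)
{
    uint64_t input_size;

    assert(tsz > 0 && tsz < 64);
    input_size = 1ULL << (64 - tsz);
    return end < input_size ? end : input_size;
}
```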

v3 -> v4 [Eric]:
- page table walk rewritten to allow scan of the page table within a
  range of IOVA. This prepares for VFIO integration and replay.
- configuration parsing partially reworked.
- do not advertise unsupported/untested features: S2, S1 + S2, HYP,
  PRI, ATS, ..
- added ACPI table generation
- migrated to dynamic traces
- mingw compilation fix

v2 -> v3 [Eric]:
- rebased on 2.9
- mostly code and patch reorganization to ease the review process
- optional patches removed. They may be handled separately. I am currently
  working on ACPI enablement.
- optional instantiation of the smmu in mach-virt
- removed [2/9] (fdt functions) since not mandated
- start splitting main patch into base and derived object
- no new function feature added

v1 -> v2 [Prem]:
- Adopted review comments from Eric Auger
        - Make SMMU_DPRINTF internally call qemu_log
            (translation requests are so frequent that we need control
             over the type of log we want)
        - SMMUTransCfg simplified
        - Change RegInfo to uint64 register array
        - Code cleanup
        - Test cleanups
- Reshuffled patches

v0 -> v1 [Prem]:
- As per SMMUv3 spec 16.0 (only is_ste_consistant() is noticeable)
- Reworked register access/update logic
- Factored out translation code for
        - single point bug fix
        - sharing/removal in future
- (optional) Unit tests added, with PCI test device
        - S1 with 4k/64k, S1+S2 with 4k/64k
        - (S1 or S2) only can be verified by Linux 4.7 driver
        - (optional) Preliminary ACPI support

v0 [Prem]:
- Implements SMMUv3 spec 11.0
- Supports PCIe devices
- Command Queue and Event Queue supported
- LPAE only, S1 is supported and Tested, S2 not tested
- BE mode Translation not supported
- IRQ support (legacy, no MSI)
- Tested with DPDK and e1000


Eric Auger (5):
  hw/arm/smmu-common: smmu base class
  hw/arm/virt: Add 2.10 machine type
  hw/arm/virt: Add tlbi-on-map property to the smmuv3 node
  target/arm/kvm: Translate the MSI doorbell in kvm_arch_fixup_msi_route
  hw/arm/smmuv3: VFIO integration

Prem Mallappa (3):
  hw/arm/smmuv3: smmuv3 emulation model
  hw/arm/virt: Add SMMUv3 to the virt board
  hw/arm/virt-acpi-build: Add smmuv3 node in IORT table

 default-configs/aarch64-softmmu.mak |    1 +
 hw/arm/Makefile.objs                |    1 +
 hw/arm/smmu-common.c                |  474 +++++++++++++
 hw/arm/smmu-internal.h              |   89 +++
 hw/arm/smmuv3-internal.h            |  651 ++++++++++++++++++
 hw/arm/smmuv3.c                     | 1256 +++++++++++++++++++++++++++++++++++
 hw/arm/trace-events                 |   54 ++
 hw/arm/virt-acpi-build.c            |   56 +-
 hw/arm/virt.c                       |  111 +++-
 include/hw/acpi/acpi-defs.h         |   15 +
 include/hw/arm/smmu-common.h        |  127 ++++
 include/hw/arm/smmuv3.h             |   87 +++
 include/hw/arm/virt.h               |    5 +
 target/arm/kvm.c                    |   28 +
 target/arm/trace-events             |    3 +
 15 files changed, 2949 insertions(+), 9 deletions(-)
 create mode 100644 hw/arm/smmu-common.c
 create mode 100644 hw/arm/smmu-internal.h
 create mode 100644 hw/arm/smmuv3-internal.h
 create mode 100644 hw/arm/smmuv3.c
 create mode 100644 include/hw/arm/smmu-common.h
 create mode 100644 include/hw/arm/smmuv3.h

-- 
2.5.5


* [Qemu-devel] [RFC v5 1/8] hw/arm/smmu-common: smmu base class
  2017-07-09 20:51 [Qemu-devel] [RFC v5 0/8] ARM SMMUv3 Emulation Support Eric Auger
@ 2017-07-09 20:51 ` Eric Auger
  2017-07-25 12:12   ` Tomasz Nowicki
  2017-07-09 20:51 ` [Qemu-devel] [RFC v5 2/8] hw/arm/smmuv3: smmuv3 emulation model Eric Auger
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: Eric Auger @ 2017-07-09 20:51 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, peter.maydell, qemu-arm, qemu-devel,
	alex.williamson, prem.mallappa
  Cc: drjones, christoffer.dall, Radha.Chintakuntla, Sunil.Goutham,
	mohun106, tcain, bharat.bhushan, tn, mst, will.deacon,
	jean-philippe.brucker, robin.murphy, peterx, edgar.iglesias

Introduces the base device and class for the ARM smmu.
Implements VMSAv8-64 table lookup and translation. VMSAv8-32
is not implemented.
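As an illustration of the VMSAv8-64 lookup: at each level the walker indexes the current table with a slice of the IOVA. The sketch below copies the level_shift() and iova_level_offset() helpers from hw/arm/smmu-internal.h; iova_indices() is a hypothetical helper that only collects the per-level indices and does not read any descriptor.

```c
#include <assert.h>
#include <stdint.h>

/* Copies of the helpers added to hw/arm/smmu-internal.h */
static int level_shift(int level, int granule_sz)
{
    return granule_sz + (3 - level) * (granule_sz - 3);
}

static uint64_t iova_level_offset(uint64_t iova, int level, int granule_sz)
{
    return (iova >> level_shift(level, granule_sz)) &
           ((1ULL << (granule_sz - 3)) - 1);
}

/*
 * Hypothetical helper: record in idx[level] the descriptor index used
 * at each level, from @first_level down to level 3.
 */
static void iova_indices(uint64_t iova, int first_level, int granule_sz,
                         uint64_t idx[4])
{
    assert(first_level >= 0 && first_level <= 3);
    for (int level = first_level; level <= 3; level++) {
        idx[level] = iova_level_offset(iova, level, granule_sz);
    }
}
```

For example, with a 4K granule and a walk starting at level 0, IOVA 0x8080604005 selects entries 1, 2, 3 and 4 in the level 0 to level 3 tables, and the low 12 bits (0x005) are the offset within the page.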

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Prem Mallappa <prem.mallappa@broadcom.com>

---
v4 -> v5:
- add initial level in translation config
- implement block pte
- rename must_translate into nofail
- introduce call_entry_hook
- small changes to dynamic traces
- smmu_page_walk code moved from smmuv3.c to this file
- remove smmu_translate*
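For reference, the block sizes covered by the new block PTE handling can be summarised with the hypothetical helper below. It restates the granule/level cases that get_block_pte_address() accepts and derives the size from the same shift as level_shift().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: size of the region mapped by a VMSAv8-64 block
 * descriptor for the granule/level combinations accepted by
 * get_block_pte_address(); 0 for the combinations it rejects.
 */
static uint64_t block_size(int granule_sz, int level)
{
    bool allowed;

    switch (granule_sz) {
    case 12:                        /* 4K granule: level 1 or 2 */
        allowed = (level == 1 || level == 2);
        break;
    case 14:                        /* 16K granule: level 2 only */
    case 16:                        /* 64K granule: level 2 only */
        allowed = (level == 2);
        break;
    default:
        allowed = false;
        break;
    }
    if (!allowed) {
        return 0;
    }
    /* a block covers 2^level_shift(level, granule_sz) bytes */
    return 1ULL << (granule_sz + (3 - level) * (granule_sz - 3));
}
```

This gives 1 GiB and 2 MiB blocks for the 4K granule, 32 MiB for 16K and 512 MiB for 64K.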

v3 -> v4:
- reworked page table walk to prepare for VFIO integration
  (capability to scan a range of IOVA). The same function is used
  to translate a single iova. This is largely inspired
  by intel_iommu.c
- as the translate function was not straightforward to me,
  I tried to stick more closely to the VMSA spec.
- remove support of nested stage (kernel driver does not
  support it anyway)
- introduce smmu-internal.h to put page table definitions
- added smmu_find_as_from_bus_num
- SMMU_PCI_BUS_MAX and SMMU_PCI_DEVFN_MAX in smmu-common header
- new fields in SMMUState:
  - iommu_ops, smmu_as_by_busptr, smmu_as_by_bus_num
- use error_report and trace events
- add aa64[] field in SMMUTransCfg
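The idea of reusing one range walk for single-IOVA translation can be illustrated with a toy model. Everything below is hypothetical: a flat array of 4K pages stands in for the multi-level table and Entry for IOMMUTLBEntry. Only the hook pattern, the [iova, iova + 1) range and the offset arithmetic mirror smmu_page_walk_64() and set_translated_address().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for IOMMUTLBEntry: one mapped area */
typedef struct {
    uint64_t iova;            /* first IOVA of the area */
    uint64_t translated_addr; /* output address of the area */
    uint64_t addr_mask;       /* area size - 1 */
} Entry;

typedef int (*walk_hook)(const Entry *e, void *opaque);

/*
 * Toy walk over a flat table of 4K pages (page i maps to pa[i]),
 * calling @hook for every page intersecting [start, end).
 * Returns 0 on success, the hook's non-zero value if it fails,
 * or 1 (a hypothetical translation error) for an unmapped page.
 */
static int toy_walk(const uint64_t *pa, size_t npages,
                    uint64_t start, uint64_t end,
                    walk_hook hook, void *opaque)
{
    for (uint64_t iova = start & ~0xfffULL; iova < end; iova += 0x1000) {
        uint64_t i = iova >> 12;
        Entry e;
        int ret;

        if (i >= npages) {
            return 1;
        }
        e.iova = iova;
        e.translated_addr = pa[i];
        e.addr_mask = 0xfff;
        ret = hook(&e, opaque);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* Single-IOVA translation state, loosely modelled on SMMUTransCfg */
typedef struct {
    uint64_t input;
    uint64_t output;
} Xlate;

/* Same arithmetic as set_translated_address(): area base + offset */
static int set_output(const Entry *e, void *opaque)
{
    Xlate *x = opaque;

    x->output = e->translated_addr + (x->input - e->iova);
    return 0;
}

/* Range use (e.g. a replay): count the areas reported by the walk */
static int count_entries(const Entry *e, void *opaque)
{
    (void)e;
    (*(int *)opaque)++;
    return 0;
}
```

Translating 0x1234 walks only [0x1234, 0x1235) and yields pa[1] + 0x234, while a range user passes a wider interval and a different hook to the same walk.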

v3:
- moved the base code in a separate patch to ease the review.
- clearer separation between base class and smmuv3 class
- translate_* only implemented as class methods
---
 default-configs/aarch64-softmmu.mak |   1 +
 hw/arm/Makefile.objs                |   1 +
 hw/arm/smmu-common.c                | 474 ++++++++++++++++++++++++++++++++++++
 hw/arm/smmu-internal.h              |  97 ++++++++
 hw/arm/trace-events                 |  14 ++
 include/hw/arm/smmu-common.h        | 127 ++++++++++
 6 files changed, 714 insertions(+)
 create mode 100644 hw/arm/smmu-common.c
 create mode 100644 hw/arm/smmu-internal.h
 create mode 100644 include/hw/arm/smmu-common.h

diff --git a/default-configs/aarch64-softmmu.mak b/default-configs/aarch64-softmmu.mak
index 2449483..83a2932 100644
--- a/default-configs/aarch64-softmmu.mak
+++ b/default-configs/aarch64-softmmu.mak
@@ -7,3 +7,4 @@ CONFIG_AUX=y
 CONFIG_DDC=y
 CONFIG_DPCD=y
 CONFIG_XLNX_ZYNQMP=y
+CONFIG_ARM_SMMUV3=y
diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
index 4c5c4ee..6c7d4af 100644
--- a/hw/arm/Makefile.objs
+++ b/hw/arm/Makefile.objs
@@ -18,3 +18,4 @@ obj-$(CONFIG_FSL_IMX25) += fsl-imx25.o imx25_pdk.o
 obj-$(CONFIG_FSL_IMX31) += fsl-imx31.o kzm.o
 obj-$(CONFIG_FSL_IMX6) += fsl-imx6.o sabrelite.o
 obj-$(CONFIG_ASPEED_SOC) += aspeed_soc.o aspeed.o
+obj-$(CONFIG_ARM_SMMUV3) += smmu-common.o
diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
new file mode 100644
index 0000000..9f56232
--- /dev/null
+++ b/hw/arm/smmu-common.c
@@ -0,0 +1,474 @@
+/*
+ * Copyright (C) 2014-2016 Broadcom Corporation
+ * Copyright (c) 2017 Red Hat, Inc.
+ * Written by Prem Mallappa, Eric Auger
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Author: Prem Mallappa <pmallapp@broadcom.com>
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/sysemu.h"
+#include "exec/address-spaces.h"
+#include "trace.h"
+#include "qemu/error-report.h"
+#include "hw/arm/smmu-common.h"
+#include "smmu-internal.h"
+
+inline MemTxResult smmu_read_sysmem(dma_addr_t addr, void *buf, dma_addr_t len,
+                                    bool secure)
+{
+    MemTxAttrs attrs = {.unspecified = 1, .secure = secure};
+
+    switch (len) {
+    case 4:
+        *(uint32_t *)buf = ldl_le_phys(&address_space_memory, addr);
+        break;
+    case 8:
+        *(uint64_t *)buf = ldq_le_phys(&address_space_memory, addr);
+        break;
+    default:
+        return address_space_rw(&address_space_memory, addr,
+                                attrs, buf, len, false);
+    }
+    return MEMTX_OK;
+}
+
+inline void
+smmu_write_sysmem(dma_addr_t addr, void *buf, dma_addr_t len, bool secure)
+{
+    MemTxAttrs attrs = {.unspecified = 1, .secure = secure};
+
+    switch (len) {
+    case 4:
+        stl_le_phys(&address_space_memory, addr, *(uint32_t *)buf);
+        break;
+    case 8:
+        stq_le_phys(&address_space_memory, addr, *(uint64_t *)buf);
+        break;
+    default:
+        address_space_rw(&address_space_memory, addr,
+                         attrs, buf, len, true);
+    }
+}
+
+/*************************/
+/* VMSAv8-64 Translation */
+/*************************/
+
+/**
+ * get_pte - Get the content of a page table entry located in
+ * @baseaddr[@index]
+ */
+static uint64_t get_pte(dma_addr_t baseaddr, uint32_t index)
+{
+    uint64_t pte;
+
+    if (smmu_read_sysmem(baseaddr + index * sizeof(pte),
+                         &pte, sizeof(pte), false)) {
+        error_report("can't read pte at address=0x%"PRIx64,
+                     baseaddr + index * sizeof(pte));
+        pte = (uint64_t)-1;
+        return pte;
+    }
+    trace_smmu_get_pte(baseaddr, index, baseaddr + index * sizeof(pte), pte);
+    /* TODO: handle endianness */
+    return pte;
+}
+
+/* VMSAv8-64 Translation Table Format Descriptor Decoding */
+
+#define PTE_ADDRESS(pte, shift) (extract64(pte, shift, 48 - (shift)) << (shift))
+
+/**
+ * get_page_pte_address - returns the L3 descriptor output address,
+ * ie. the page frame
+ * ARM ARM spec: Figure D4-17 VMSAv8-64 level 3 descriptor format
+ */
+static inline hwaddr get_page_pte_address(uint64_t pte, int granule_sz)
+{
+    return PTE_ADDRESS(pte, granule_sz);
+}
+
+/**
+ * get_table_pte_address - return table descriptor output address,
+ * ie. address of next level table
+ * ARM ARM Figure D4-16 VMSAv8-64 level0, level1, and level 2 descriptor formats
+ */
+static inline hwaddr get_table_pte_address(uint64_t pte, int granule_sz)
+{
+    return PTE_ADDRESS(pte, granule_sz);
+}
+
+/**
+ * get_block_pte_address - return block descriptor output address and block size
+ * ARM ARM Figure D4-16 VMSAv8-64 level0, level1, and level 2 descriptor formats
+ */
+static hwaddr get_block_pte_address(uint64_t pte, int level, int granule_sz,
+                                    uint64_t *bsz)
+{
+    int n;
+
+    switch (granule_sz) {
+    case 12:
+        if (level == 1) {
+            n = 30;
+        } else if (level == 2) {
+            n = 21;
+        } else {
+            goto error_out;
+        }
+        break;
+    case 14:
+        if (level == 2) {
+            n = 25;
+        } else {
+            goto error_out;
+        }
+        break;
+    case 16:
+        if (level == 2) {
+            n = 29;
+        } else {
+            goto error_out;
+        }
+        break;
+    default:
+        goto error_out;
+    }
+    *bsz = 1ULL << n;
+    return PTE_ADDRESS(pte, n);
+
+error_out:
+
+    error_report("unexpected granule_sz=%d/level=%d for block pte",
+                 granule_sz, level);
+    *bsz = 0;
+    return (hwaddr)-1;
+}
+
+static int call_entry_hook(uint64_t iova, uint64_t mask, uint64_t gpa,
+                           int perm, smmu_page_walk_hook hook_fn, void *private)
+{
+    IOMMUTLBEntry entry;
+    int ret;
+
+    entry.target_as = &address_space_memory;
+    entry.iova = iova & mask;
+    entry.translated_addr = gpa;
+    entry.addr_mask = ~mask;
+    entry.perm = perm;
+
+    ret = hook_fn(&entry, private);
+    if (ret) {
+        error_report("%s hook returned %d", __func__, ret);
+    }
+    return ret;
+}
+
+/**
+ * smmu_page_walk_level_64 - Walk an IOVA range from a specific level
+ * @baseaddr: table base address corresponding to @level
+ * @level: level
+ * @cfg: translation config
+ * @start: start of the IOVA range
+ * @end: end of the IOVA range
+ * @hook_fn: the hook to be called for each detected area
+ * @private: private data for the hook function
+ * @read: whether parent level has read permission
+ * @write: whether parent level has write permission
+ * @nofail: indicates whether each iova of the range
+ *  must be translated or whether failure is allowed
+ * @notify_unmap: whether we should notify invalid entries
+ *
+ * Return 0 on success, < 0 on errors not related to translation
+ * process, > 0 on errors related to translation process (only
+ * if nofail is set)
+ */
+static int
+smmu_page_walk_level_64(dma_addr_t baseaddr, int level,
+                        SMMUTransCfg *cfg, uint64_t start, uint64_t end,
+                        smmu_page_walk_hook hook_fn, void *private,
+                        bool read, bool write, bool nofail,
+                        bool notify_unmap)
+{
+    uint64_t subpage_size, subpage_mask, pte, iova = start;
+    bool read_cur, write_cur, entry_valid;
+    int granule_sz, stage;
+    int ret;
+
+    granule_sz = cfg->granule_sz;
+    stage = cfg->stage;
+    subpage_size = 1ULL << level_shift(level, granule_sz);
+    subpage_mask = level_page_mask(level, granule_sz);
+
+    trace_smmu_page_walk_level_in(level, baseaddr, granule_sz,
+                                  start, end, subpage_size);
+
+    while (iova < end) {
+        dma_addr_t next_table_baseaddr;
+        uint64_t iova_next, pte_addr;
+        uint32_t offset;
+
+        iova_next = (iova & subpage_mask) + subpage_size;
+        offset = iova_level_offset(iova, level, granule_sz);
+        pte_addr = baseaddr + offset * sizeof(pte);
+        pte = get_pte(baseaddr, offset);
+
+        trace_smmu_page_walk_level(level, iova, subpage_size,
+                                   baseaddr, offset, pte);
+
+        if (pte == (uint64_t)-1) {
+            if (nofail) {
+                return SMMU_TRANS_ERR_WALK_EXT_ABRT;
+            }
+            goto next;
+        }
+        if (is_invalid_pte(pte) || is_reserved_pte(pte, level)) {
+            trace_smmu_page_walk_level_res_invalid_pte(stage, level, baseaddr,
+                                                       pte_addr, offset, pte);
+            if (nofail) {
+                return SMMU_TRANS_ERR_WALK_EXT_ABRT;
+            }
+            goto next;
+        }
+
+        read_cur = read; /* TODO */
+        write_cur = write; /* TODO */
+        entry_valid = read_cur | write_cur; /* TODO */
+
+        if (is_page_pte(pte, level)) {
+            uint64_t gpa = get_page_pte_address(pte, granule_sz);
+            int perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
+
+            trace_smmu_page_walk_level_page_pte(stage, level, iova,
+                                                baseaddr, pte_addr, pte, gpa);
+            if (!entry_valid && !notify_unmap) {
+                /* invalid entry and no unmap notification requested:
+                 * nothing to report */
+                goto next;
+            }
+            ret = call_entry_hook(iova, subpage_mask, gpa, perm,
+                                  hook_fn, private);
+            if (ret) {
+                return ret;
+            }
+            goto next;
+        }
+        if (is_block_pte(pte, level)) {
+            uint64_t block_size;
+            hwaddr gpa = get_block_pte_address(pte, level, granule_sz,
+                                               &block_size);
+            int perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
+
+            if (gpa == -1) {
+                if (nofail) {
+                    return SMMU_TRANS_ERR_WALK_EXT_ABRT;
+                } else {
+                    goto next;
+                }
+            }
+            trace_smmu_page_walk_level_block_pte(stage, level, baseaddr,
+                                                 pte_addr, pte, iova, gpa,
+                                                 (int)(block_size >> 20));
+
+            ret = call_entry_hook(iova, subpage_mask, gpa, perm,
+                                  hook_fn, private);
+            if (ret) {
+                return ret;
+            }
+            goto next;
+        }
+        if (level == 3) {
+            goto next;
+        }
+        /* table pte */
+        next_table_baseaddr = get_table_pte_address(pte, granule_sz);
+        trace_smmu_page_walk_level_table_pte(stage, level, baseaddr, pte_addr,
+                                             pte, next_table_baseaddr);
+        ret = smmu_page_walk_level_64(next_table_baseaddr, level + 1, cfg,
+                                      iova, MIN(iova_next, end),
+                                      hook_fn, private, read_cur, write_cur,
+                                      nofail, notify_unmap);
+        if (ret) {
+            return ret;
+        }
+
+next:
+        iova = iova_next;
+    }
+
+    return SMMU_TRANS_ERR_NONE;
+}
+
+/**
+ * smmu_page_walk_64 - walk a specific IOVA range from the initial
+ * lookup level, and call the hook for each valid entry
+ *
+ * @cfg: translation config
+ * @start: start of the IOVA range
+ * @end: end of the IOVA range
+ * @nofail: indicates whether each iova of the range
+ *  must be translated or whether failure is allowed
+ * @hook_fn: the hook to be called for each detected area
+ * @private: private data for the hook function
+ */
+static int
+smmu_page_walk_64(SMMUTransCfg *cfg, uint64_t start, uint64_t end,
+                  bool nofail, smmu_page_walk_hook hook_fn,
+                  void *private)
+{
+    dma_addr_t ttbr;
+    int stage = cfg->stage;
+    uint64_t roof = MIN(end, 1ULL << (64 - cfg->tsz)); /* end is exclusive */
+
+    if (!hook_fn) {
+        return 0;
+    }
+
+    ttbr = extract64(cfg->ttbr, 0, 48);
+
+    trace_smmu_page_walk_64(stage, cfg->ttbr, cfg->initial_level, start, roof);
+
+    return smmu_page_walk_level_64(ttbr, cfg->initial_level, cfg, start, roof,
+                                   hook_fn, private,
+                                   true /* read */, true /* write */,
+                                   nofail, false /* notify_unmap */);
+}
+
+static int set_translated_address(IOMMUTLBEntry *entry, void *private)
+{
+    SMMUTransCfg *cfg = (SMMUTransCfg *)private;
+    hwaddr offset = cfg->input - entry->iova;
+
+    cfg->output = entry->translated_addr + offset;
+
+    trace_smmu_set_translated_address(cfg->input, cfg->output);
+    return 0;
+}
+
+/**
+ * smmu_page_walk - Walk the page table for a given
+ * config and a given entry
+ *
+ * tlbe->iova must have been populated
+ */
+int smmu_page_walk(SMMUState *sys, SMMUTransCfg *cfg,
+                   IOMMUTLBEntry *tlbe, bool is_write)
+{
+    uint32_t page_size = 0, perm = 0;
+    int ret = 0;
+
+    trace_smmu_walk_pgtable(tlbe->iova, is_write);
+
+    if (cfg->bypassed || cfg->disabled) {
+        return 0;
+    }
+
+    cfg->input = tlbe->iova;
+
+    if (cfg->aa64) {
+        ret = smmu_page_walk_64(cfg, cfg->input, cfg->input + 1,
+                                true /* nofail */,
+                                set_translated_address, cfg);
+        page_size = 1 << cfg->granule_sz;
+    } else {
+        error_report("VMSAv8-32 translation is not yet implemented");
+        abort();
+    }
+
+    if (ret) {
+        error_report("PTW failed for iova=0x%"PRIx64" is_write=%d (%d)",
+                     cfg->input, is_write, ret);
+        goto exit;
+    }
+    tlbe->translated_addr = cfg->output;
+    tlbe->addr_mask = page_size - 1;
+    tlbe->perm = perm;
+
+    trace_smmu_walk_pgtable_out(tlbe->translated_addr,
+                                tlbe->addr_mask, tlbe->perm);
+exit:
+    return ret;
+}
+
+/*************************/
+/* VMSAv8-32 Translation */
+/*************************/
+
+static int
+smmu_page_walk_32(SMMUTransCfg *cfg, uint64_t start, uint64_t end,
+                  bool nofail, smmu_page_walk_hook hook_fn,
+                  void *private)
+{
+    error_report("VMSAv8-32 translation is not yet implemented");
+    abort();
+}
+
+/******************/
+/* Infrastructure */
+/******************/
+
+SMMUPciBus *smmu_find_as_from_bus_num(SMMUState *s, uint8_t bus_num)
+{
+    SMMUPciBus *smmu_pci_bus = s->smmu_as_by_bus_num[bus_num];
+
+    if (!smmu_pci_bus) {
+        GHashTableIter iter;
+
+        g_hash_table_iter_init(&iter, s->smmu_as_by_busptr);
+        while (g_hash_table_iter_next(&iter, NULL, (void **)&smmu_pci_bus)) {
+            if (pci_bus_num(smmu_pci_bus->bus) == bus_num) {
+                s->smmu_as_by_bus_num[bus_num] = smmu_pci_bus;
+                return smmu_pci_bus;
+            }
+        }
+    }
+    return smmu_pci_bus;
+}
+
+static void smmu_base_instance_init(Object *obj)
+{
+     /* Nothing much to do here as of now */
+}
+
+static void smmu_base_class_init(ObjectClass *klass, void *data)
+{
+    SMMUBaseClass *sbc = SMMU_DEVICE_CLASS(klass);
+
+    sbc->page_walk_64 = smmu_page_walk_64;
+
+    sbc->page_walk_32 = smmu_page_walk_32;
+}
+
+static const TypeInfo smmu_base_info = {
+    .name          = TYPE_SMMU_DEV_BASE,
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(SMMUState),
+    .instance_init = smmu_base_instance_init,
+    .class_data    = NULL,
+    .class_size    = sizeof(SMMUBaseClass),
+    .class_init    = smmu_base_class_init,
+    .abstract      = true,
+};
+
+static void smmu_base_register_types(void)
+{
+    type_register_static(&smmu_base_info);
+}
+
+type_init(smmu_base_register_types)
+
diff --git a/hw/arm/smmu-internal.h b/hw/arm/smmu-internal.h
new file mode 100644
index 0000000..5e890bb
--- /dev/null
+++ b/hw/arm/smmu-internal.h
@@ -0,0 +1,97 @@
+/*
+ * ARM SMMU support - Internal API
+ *
+ * Copyright (c) 2017 Red Hat, Inc.
+ * Written by Eric Auger
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_ARM_SMMU_INTERNAL_H
+#define HW_ARM_SMMU_INTERNAL_H
+
+#define ARM_LPAE_MAX_ADDR_BITS          48
+#define ARM_LPAE_MAX_LEVELS             4
+
+/* Page table bits */
+
+#define ARM_LPAE_PTE_TYPE_SHIFT         0
+#define ARM_LPAE_PTE_TYPE_MASK          0x3
+
+#define ARM_LPAE_PTE_TYPE_BLOCK         1
+#define ARM_LPAE_PTE_TYPE_RESERVED      1
+#define ARM_LPAE_PTE_TYPE_TABLE         3
+#define ARM_LPAE_PTE_TYPE_PAGE          3
+
+#define ARM_LPAE_PTE_VALID              (1 << 0)
+
+static inline bool is_invalid_pte(uint64_t pte)
+{
+    return !(pte & ARM_LPAE_PTE_VALID);
+}
+
+static inline bool is_reserved_pte(uint64_t pte, int level)
+{
+    return ((level == 3) &&
+            ((pte & ARM_LPAE_PTE_TYPE_MASK) == ARM_LPAE_PTE_TYPE_RESERVED));
+}
+
+static inline bool is_block_pte(uint64_t pte, int level)
+{
+    return ((level < 3) &&
+            ((pte & ARM_LPAE_PTE_TYPE_MASK) == ARM_LPAE_PTE_TYPE_BLOCK));
+}
+
+static inline bool is_table_pte(uint64_t pte, int level)
+{
+    return ((level < 3) &&
+            ((pte & ARM_LPAE_PTE_TYPE_MASK) == ARM_LPAE_PTE_TYPE_TABLE));
+}
+
+static inline bool is_page_pte(uint64_t pte, int level)
+{
+    return ((level == 3) &&
+            ((pte & ARM_LPAE_PTE_TYPE_MASK) == ARM_LPAE_PTE_TYPE_PAGE));
+}
+
+static inline int level_shift(int level, int granule_sz)
+{
+    return granule_sz + (3 - level) * (granule_sz - 3);
+}
+
+static inline uint64_t level_page_mask(int level, int granule_sz)
+{
+    return ~((1ULL << level_shift(level, granule_sz)) - 1);
+}
+
+/**
+ * TODO: handle the case where the level resolves less than
+ * granule_sz - 3 IA bits.
+ */
+static inline
+uint64_t iova_level_offset(uint64_t iova, int level, int granule_sz)
+{
+    return (iova >> level_shift(level, granule_sz)) &
+            ((1ULL << (granule_sz - 3)) - 1);
+}
+
+/* TODO: check this for stage 2 and table concatenation */
+static inline int initial_lookup_level(int tnsz, int granule_sz)
+{
+    return 4 - (64 - tnsz - 4) / (granule_sz - 3);
+}
+
+
+
+#endif
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index d5f33a2..7a92f8c 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -2,3 +2,17 @@
 
 # hw/arm/virt-acpi-build.c
 virt_acpi_setup(void) "No fw cfg or ACPI disabled. Bailing out."
+
+# hw/arm/smmu-common.c
+
+smmu_page_walk_64(int stage, uint64_t baseaddr, int first_level, uint64_t start, uint64_t end) "stage=%d, baseaddr=0x%"PRIx64", first level=%d, start=0x%"PRIx64", end=0x%"PRIx64
+smmu_page_walk_level_in(int level, uint64_t baseaddr, int granule_sz, uint64_t start, uint64_t end, uint64_t subpage_size) "level=%d baseaddr=0x%"PRIx64" granule=%d, start=0x%"PRIx64" end=0x%"PRIx64", subpage_size=0x%"PRIx64
+smmu_page_walk_level(int level, uint64_t iova, uint64_t subpage_size, uint64_t baseaddr, uint32_t offset, uint64_t pte) "level=%d iova=0x%"PRIx64" subpage_sz=0x%"PRIx64" baseaddr=0x%"PRIx64" offset=%d => pte=0x%"PRIx64
+smmu_page_walk_level_res_invalid_pte(int stage, int level, uint64_t baseaddr, uint64_t pteaddr, uint32_t offset, uint64_t pte) "stage=%d level=%d base@=0x%"PRIx64" pte@=0x%"PRIx64" offset=%d pte=0x%"PRIx64
+smmu_page_walk_level_page_pte(int stage, int level,  uint64_t iova, uint64_t baseaddr, uint64_t pteaddr, uint64_t pte, uint64_t address) "stage=%d level=%d iova=0x%"PRIx64" base@=0x%"PRIx64" pte@=0x%"PRIx64" pte=0x%"PRIx64" page address = 0x%"PRIx64
+smmu_page_walk_level_block_pte(int stage, int level, uint64_t baseaddr, uint64_t pteaddr, uint64_t pte, uint64_t iova, uint64_t gpa, int bsize_mb) "stage=%d level=%d base@=0x%"PRIx64" pte@=0x%"PRIx64" pte=0x%"PRIx64" iova=0x%"PRIx64" block address = 0x%"PRIx64" block size = %d MiB"
+smmu_page_walk_level_table_pte(int stage, int level, uint64_t baseaddr, uint64_t pteaddr, uint64_t pte, uint64_t address) "stage=%d, level=%d base@=0x%"PRIx64" pte@=0x%"PRIx64" pte=0x%"PRIx64" next table address = 0x%"PRIx64
+smmu_get_pte(uint64_t baseaddr, int index, uint64_t pteaddr, uint64_t pte) "baseaddr=0x%"PRIx64" index=0x%x, pteaddr=0x%"PRIx64", pte=0x%"PRIx64
+smmu_set_translated_address(hwaddr iova, hwaddr pa) "iova = 0x%"PRIx64" -> pa = 0x%"PRIx64
+smmu_walk_pgtable(hwaddr iova, bool is_write) "Input addr: 0x%"PRIx64", is_write=%d"
+smmu_walk_pgtable_out(hwaddr addr, uint64_t mask, int perm) "DONE: o/p addr:0x%"PRIx64" mask:0x%"PRIx64" perm:%d"
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
new file mode 100644
index 0000000..8d681e8
--- /dev/null
+++ b/include/hw/arm/smmu-common.h
@@ -0,0 +1,127 @@
+/*
+ * ARM SMMU Support
+ *
+ * Copyright (C) 2015-2016 Broadcom Corporation
+ * Copyright (c) 2017 Red Hat, Inc.
+ * Written by Prem Mallappa, Eric Auger
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_ARM_SMMU_COMMON_H
+#define HW_ARM_SMMU_COMMON_H
+
+#include "hw/sysbus.h"
+#include "hw/pci/pci.h"
+
+#define SMMU_PCI_BUS_MAX      256
+#define SMMU_PCI_DEVFN_MAX    256
+
+typedef enum {
+    SMMU_TRANS_ERR_NONE          = 0x0,
+    SMMU_TRANS_ERR_WALK_EXT_ABRT = 0x1,  /* Translation walk external abort */
+    SMMU_TRANS_ERR_TRANS         = 0x10, /* Translation fault */
+    SMMU_TRANS_ERR_ADDR_SZ,              /* Address Size fault */
+    SMMU_TRANS_ERR_ACCESS,               /* Access fault */
+    SMMU_TRANS_ERR_PERM,                 /* Permission fault */
+    SMMU_TRANS_ERR_TLB_CONFLICT  = 0x20, /* TLB Conflict */
+} SMMUTransErr;
+
+/*
+ * Generic structure populated by derived SMMU devices
+ * after decoding the configuration information and used as
+ * input to the page table walk
+ */
+typedef struct SMMUTransCfg {
+    hwaddr   input;            /* input address */
+    hwaddr   output;           /* Output address */
+    int      stage;            /* translation stage */
+    uint32_t oas;              /* output address width */
+    uint32_t tsz;              /* input range, i.e. 2^(64 - tsz) */
+    uint64_t ttbr;             /* TTBR address */
+    uint32_t granule_sz;       /* granule page shift */
+    bool     aa64;             /* AArch64 or AArch32 translation table format */
+    int      initial_level;    /* initial lookup level */
+    bool     disabled;         /* smmu is disabled */
+    bool     bypassed;         /* stage is bypassed */
+} SMMUTransCfg;
+
+typedef struct SMMUDevice {
+    void         *smmu;
+    PCIBus       *bus;
+    int           devfn;
+    MemoryRegion  iommu;
+    AddressSpace  as;
+} SMMUDevice;
+
+typedef struct SMMUNotifierNode {
+    SMMUDevice *sdev;
+    QLIST_ENTRY(SMMUNotifierNode) next;
+} SMMUNotifierNode;
+
+typedef struct SMMUPciBus {
+    PCIBus       *bus;
+    SMMUDevice   *pbdev[0]; /* Parent array is sparse, so dynamically alloc */
+} SMMUPciBus;
+
+typedef struct SMMUState {
+    /* <private> */
+    SysBusDevice  dev;
+
+    MemoryRegion iomem;
+
+    MemoryRegionIOMMUOps iommu_ops;
+    GHashTable *smmu_as_by_busptr;
+    SMMUPciBus *smmu_as_by_bus_num[SMMU_PCI_BUS_MAX];
+    QLIST_HEAD(, SMMUNotifierNode) notifiers_list;
+
+} SMMUState;
+
+typedef int (*smmu_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
+
+typedef struct {
+    /* <private> */
+    SysBusDeviceClass parent_class;
+
+    /* public */
+    int (*page_walk_32)(SMMUTransCfg *cfg, uint64_t start, uint64_t end,
+                        bool nofail, smmu_page_walk_hook hook_fn,
+                        void *private);
+    int (*page_walk_64)(SMMUTransCfg *cfg, uint64_t start, uint64_t end,
+                        bool nofail, smmu_page_walk_hook hook_fn,
+                        void *private);
+} SMMUBaseClass;
+
+#define TYPE_SMMU_DEV_BASE "smmu-base"
+#define SMMU_SYS_DEV(obj) OBJECT_CHECK(SMMUState, (obj), TYPE_SMMU_DEV_BASE)
+#define SMMU_DEVICE_GET_CLASS(obj)                              \
+    OBJECT_GET_CLASS(SMMUBaseClass, (obj), TYPE_SMMU_DEV_BASE)
+#define SMMU_DEVICE_CLASS(klass)                                    \
+    OBJECT_CLASS_CHECK(SMMUBaseClass, (klass), TYPE_SMMU_DEV_BASE)
+
+MemTxResult smmu_read_sysmem(dma_addr_t addr, void *buf,
+                             dma_addr_t len, bool secure);
+void smmu_write_sysmem(dma_addr_t addr, void *buf, dma_addr_t len, bool secure);
+
+SMMUPciBus *smmu_find_as_from_bus_num(SMMUState *s, uint8_t bus_num);
+
+static inline uint16_t smmu_get_sid(SMMUDevice *sdev)
+{
+    return ((pci_bus_num(sdev->bus) & 0xff) << 8) | sdev->devfn;
+}
+
+int smmu_page_walk(SMMUState *s, SMMUTransCfg *cfg,
+                   IOMMUTLBEntry *tlbe, bool is_write);
+
+#endif  /* HW_ARM_SMMU_COMMON_H */
-- 
2.5.5
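The StreamID that the SMMU uses to index its Stream Table is built in smmu_get_sid() from the PCI bus number and devfn. A minimal standalone sketch of the same packing, assuming the conventional devfn layout (slot in bits [7:3], function in bits [2:0]); make_devfn() and sid_from_bdf() are hypothetical helpers, not QEMU functions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: conventional PCI devfn encoding (slot << 3 | fn). */
static uint8_t make_devfn(unsigned slot, unsigned fn)
{
    assert(slot < 32 && fn < 8);
    return (uint8_t)((slot << 3) | fn);
}

/*
 * Hypothetical helper with the same packing as smmu_get_sid():
 * bus number in StreamID bits [15:8], devfn in bits [7:0].
 */
static uint16_t sid_from_bdf(uint8_t bus, uint8_t devfn)
{
    return (uint16_t)((bus << 8) | devfn);
}
```

For example, device 01:02.1 has devfn 0x11 and therefore StreamID 0x0111.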


* [Qemu-devel] [RFC v5 2/8] hw/arm/smmuv3: smmuv3 emulation model
  2017-07-09 20:51 [Qemu-devel] [RFC v5 0/8] ARM SMMUv3 Emulation Support Eric Auger
  2017-07-09 20:51 ` [Qemu-devel] [RFC v5 1/8] hw/arm/smmu-common: smmu base class Eric Auger
@ 2017-07-09 20:51 ` Eric Auger
  2017-07-13 12:00   ` Tomasz Nowicki
  2017-07-13 12:57   ` Tomasz Nowicki
  2017-07-09 20:51 ` [Qemu-devel] [RFC v5 3/8] hw/arm/virt: Add SMMUv3 to the virt board Eric Auger
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 22+ messages in thread
From: Eric Auger @ 2017-07-09 20:51 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, peter.maydell, qemu-arm, qemu-devel,
	alex.williamson, prem.mallappa
  Cc: drjones, christoffer.dall, Radha.Chintakuntla, Sunil.Goutham,
	mohun106, tcain, bharat.bhushan, tn, mst, will.deacon,
	jean-philippe.brucker, robin.murphy, peterx, edgar.iglesias

From: Prem Mallappa <prem.mallappa@broadcom.com>

Introduces the SMMUv3 derived model. This is based on
System MMUv3 specification (v17).

Signed-off-by: Prem Mallappa <prem.mallappa@broadcom.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v4 -> v5:
- change smmuv3_translate proto (IOMMUAccessFlags flag)
- has_stagex replaced by is_ste_stagex
- smmu_cfg_populate removed
- added smmuv3_decode_config and reworked error management
- rework the naming of IOMMU MRs
- fix SMMU_CMDQ_CONS offset

v3 -> v4
- smmu_irq_update
- fix hash key allocation
- set smmu_iommu_ops
- set SMMU_REG_CR0,
- smmuv3_translate: ret.perm not set in bypass mode
- use trace events
- renamed STM2U64 into L1STD_L2PTR and STMSPAN into L1STD_SPAN
- rework smmu_find_ste
- fix tg2granule: in TG0, 0b10 corresponds to 16kB

v2 -> v3:
- move creation of include/hw/arm/smmuv3.h to this patch to fix a compilation issue
- compilation allowed
- fix sbus allocation in smmu_init_pci_iommu
- restructure code into headers
- misc cleanups
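The TG0 and TG1 fields encode the same granule sizes with different values, which is why tg2granule() takes a tg1 flag (the subject of the tg2granule fix in the v3 -> v4 notes). A standalone table-driven sketch of that decoding, returning the granule as a page shift (12 = 4KB, 14 = 16KB, 16 = 64KB); tg_to_shift() is a hypothetical name, and unlike tg2granule() it reports reserved encodings as -1 instead of falling back to 4KB:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decode a 2-bit TGx field into a page shift.
 * TG0: 0b00 = 4KB, 0b01 = 64KB, 0b10 = 16KB, 0b11 reserved.
 * TG1: 0b01 = 16KB, 0b10 = 4KB, 0b11 = 64KB, 0b00 reserved.
 * Returns -1 for reserved encodings.
 */
static int tg_to_shift(unsigned bits, bool tg1)
{
    static const int tg0_map[4] = { 12, 16, 14, -1 };
    static const int tg1_map[4] = { -1, 14, 12, 16 };

    assert(bits < 4);
    return tg1 ? tg1_map[bits] : tg0_map[bits];
}
```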
---
 hw/arm/Makefile.objs     |    2 +-
 hw/arm/smmu-internal.h   |    8 -
 hw/arm/smmuv3-internal.h |  651 ++++++++++++++++++++++++++
 hw/arm/smmuv3.c          | 1133 ++++++++++++++++++++++++++++++++++++++++++++++
 hw/arm/trace-events      |   34 ++
 include/hw/arm/smmuv3.h  |   87 ++++
 6 files changed, 1906 insertions(+), 9 deletions(-)
 create mode 100644 hw/arm/smmuv3-internal.h
 create mode 100644 hw/arm/smmuv3.c
 create mode 100644 include/hw/arm/smmuv3.h

diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
index 6c7d4af..02cd23f 100644
--- a/hw/arm/Makefile.objs
+++ b/hw/arm/Makefile.objs
@@ -18,4 +18,4 @@ obj-$(CONFIG_FSL_IMX25) += fsl-imx25.o imx25_pdk.o
 obj-$(CONFIG_FSL_IMX31) += fsl-imx31.o kzm.o
 obj-$(CONFIG_FSL_IMX6) += fsl-imx6.o sabrelite.o
 obj-$(CONFIG_ASPEED_SOC) += aspeed_soc.o aspeed.o
-obj-$(CONFIG_ARM_SMMUV3) += smmu-common.o
+obj-$(CONFIG_ARM_SMMUV3) += smmu-common.o smmuv3.o
diff --git a/hw/arm/smmu-internal.h b/hw/arm/smmu-internal.h
index 5e890bb..3b1e222 100644
--- a/hw/arm/smmu-internal.h
+++ b/hw/arm/smmu-internal.h
@@ -86,12 +86,4 @@ uint64_t iova_level_offset(uint64_t iova, int level, int granule_sz)
             ((1ULL << (granule_sz - 3)) - 1);
 }
 
-/* TODO: check this for stage 2 and table concatenation */
-static inline int initial_lookup_level(int tnsz, int granule_sz)
-{
-    return 4 - (64 - tnsz - 4) / (granule_sz - 3);
-}
-
-
-
 #endif
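The initial_lookup_level() helper removed above derived the first level of the page-table walk from TnSZ and the granule size. A standalone sketch of that arithmetic, written with an explicit ceiling, plus the per-level table index as iova_level_offset() computes it, assuming the usual level shift of granule_sz + (3 - level) * (granule_sz - 3); start_level() and level_index() are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Each level of a 2^granule_sz granule resolves (granule_sz - 3) address
 * bits, so an input range of (64 - tsz) bits needs
 * ceil((ia_bits - granule_sz) / bits_per_level) levels, ending at level 3.
 */
static int start_level(int tsz, int granule_sz)
{
    int ia_bits = 64 - tsz;
    int bits_per_level = granule_sz - 3;
    int levels;

    assert(granule_sz == 12 || granule_sz == 14 || granule_sz == 16);
    /* integer ceiling division */
    levels = (ia_bits - granule_sz + bits_per_level - 1) / bits_per_level;
    return 4 - levels;
}

/* Index of @iova in the table at @level (assumed level shift, see above). */
static uint64_t level_index(uint64_t iova, int level, int granule_sz)
{
    int shift = granule_sz + (3 - level) * (granule_sz - 3);

    return (iova >> shift) & ((1ULL << (granule_sz - 3)) - 1);
}
```

For the guest configurations tested in this series this gives: 4K/39-bit (tsz 25) starts at level 1, 4K/48-bit (tsz 16) at level 0, 64K/39-bit at level 2 and 64K/48-bit at level 1.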
diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
new file mode 100644
index 0000000..740327c
--- /dev/null
+++ b/hw/arm/smmuv3-internal.h
@@ -0,0 +1,651 @@
+/*
+ * ARM SMMUv3 support - Internal API
+ *
+ * Copyright (C) 2014-2016 Broadcom Corporation
+ * Copyright (c) 2017 Red Hat, Inc.
+ * Written by Prem Mallappa, Eric Auger
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_ARM_SMMU_V3_INTERNAL_H
+#define HW_ARM_SMMU_V3_INTERNAL_H
+
+#include "trace.h"
+#include "qemu/error-report.h"
+#include "hw/arm/smmu-common.h"
+
+/*****************************
+ * MMIO Register
+ *****************************/
+enum {
+    SMMU_REG_IDR0            = 0x0,
+
+/* IDR0 Field Values and supported features */
+
+#define SMMU_IDR0_S2P      1  /* stage 2 */
+#define SMMU_IDR0_S1P      1  /* stage 1 */
+#define SMMU_IDR0_TTF      2  /* Aarch64 only - not Aarch32 (LPAE) */
+#define SMMU_IDR0_COHACC   1  /* IO coherent access */
+#define SMMU_IDR0_HTTU     2  /* Access and Dirty flag update */
+#define SMMU_IDR0_HYP      0  /* Hypervisor Stage 1 contexts */
+#define SMMU_IDR0_ATS      0  /* PCIe RC ATS */
+#define SMMU_IDR0_ASID16   1  /* 16-bit ASID */
+#define SMMU_IDR0_PRI      0  /* Page Request Interface */
+#define SMMU_IDR0_VMID16   0  /* 16-bit VMID */
+#define SMMU_IDR0_CD2L     0  /* 2-level Context Descriptor table */
+#define SMMU_IDR0_STALL    1  /* Stalling fault model */
+#define SMMU_IDR0_TERM     1  /* Termination model behaviour */
+#define SMMU_IDR0_STLEVEL  1  /* Multi-level Stream Table */
+
+#define SMMU_IDR0_S2P_SHIFT      0
+#define SMMU_IDR0_S1P_SHIFT      1
+#define SMMU_IDR0_TTF_SHIFT      2
+#define SMMU_IDR0_COHACC_SHIFT   4
+#define SMMU_IDR0_HTTU_SHIFT     6
+#define SMMU_IDR0_HYP_SHIFT      9
+#define SMMU_IDR0_ATS_SHIFT      10
+#define SMMU_IDR0_ASID16_SHIFT   12
+#define SMMU_IDR0_PRI_SHIFT      16
+#define SMMU_IDR0_VMID16_SHIFT   18
+#define SMMU_IDR0_CD2L_SHIFT     19
+#define SMMU_IDR0_STALL_SHIFT    24
+#define SMMU_IDR0_TERM_SHIFT     26
+#define SMMU_IDR0_STLEVEL_SHIFT  27
+
+    SMMU_REG_IDR1            = 0x4,
+#define SMMU_IDR1_SIDSIZE 16
+    SMMU_REG_IDR2            = 0x8,
+    SMMU_REG_IDR3            = 0xc,
+    SMMU_REG_IDR4            = 0x10,
+    SMMU_REG_IDR5            = 0x14,
+#define SMMU_IDR5_GRAN_SHIFT 4
+#define SMMU_IDR5_GRAN       0b101 /* GRAN4K, GRAN64K */
+#define SMMU_IDR5_OAS        4     /* 44 bits */
+    SMMU_REG_IIDR            = 0x1c,
+    SMMU_REG_CR0             = 0x20,
+
+#define SMMU_CR0_SMMU_ENABLE (1 << 0)
+#define SMMU_CR0_PRIQ_ENABLE (1 << 1)
+#define SMMU_CR0_EVTQ_ENABLE (1 << 2)
+#define SMMU_CR0_CMDQ_ENABLE (1 << 3)
+#define SMMU_CR0_ATS_CHECK   (1 << 4)
+
+    SMMU_REG_CR0_ACK         = 0x24,
+    SMMU_REG_CR1             = 0x28,
+    SMMU_REG_CR2             = 0x2c,
+
+    SMMU_REG_STATUSR         = 0x40,
+
+    SMMU_REG_IRQ_CTRL        = 0x50,
+    SMMU_REG_IRQ_CTRL_ACK    = 0x54,
+
+#define SMMU_IRQ_CTRL_GERROR_EN (1 << 0)
+#define SMMU_IRQ_CTRL_EVENT_EN  (1 << 1)
+#define SMMU_IRQ_CTRL_PRI_EN    (1 << 2)
+
+    SMMU_REG_GERROR          = 0x60,
+
+#define SMMU_GERROR_CMDQ       (1 << 0)
+#define SMMU_GERROR_EVENTQ     (1 << 2)
+#define SMMU_GERROR_PRIQ       (1 << 3)
+#define SMMU_GERROR_MSI_CMDQ   (1 << 4)
+#define SMMU_GERROR_MSI_EVENTQ (1 << 5)
+#define SMMU_GERROR_MSI_PRIQ   (1 << 6)
+#define SMMU_GERROR_MSI_GERROR (1 << 7)
+#define SMMU_GERROR_SFM_ERR    (1 << 8)
+
+    SMMU_REG_GERRORN         = 0x64,
+    SMMU_REG_GERROR_IRQ_CFG0 = 0x68,
+    SMMU_REG_GERROR_IRQ_CFG1 = 0x70,
+    SMMU_REG_GERROR_IRQ_CFG2 = 0x74,
+
+    /* SMMU_BASE_RA Applies to STRTAB_BASE, CMDQ_BASE and EVTQ_BASE */
+#define SMMU_BASE_RA        (1ULL << 62)
+    SMMU_REG_STRTAB_BASE     = 0x80,
+    SMMU_REG_STRTAB_BASE_CFG = 0x88,
+
+    SMMU_REG_CMDQ_BASE       = 0x90,
+    SMMU_REG_CMDQ_PROD       = 0x98,
+    SMMU_REG_CMDQ_CONS       = 0x9c,
+    /* CMD Consumer (CONS) */
+#define SMMU_CMD_CONS_ERR_SHIFT        24
+#define SMMU_CMD_CONS_ERR_BITS         7
+
+    SMMU_REG_EVTQ_BASE       = 0xa0,
+    SMMU_REG_EVTQ_PROD       = 0xa8,
+    SMMU_REG_EVTQ_CONS       = 0xac,
+    SMMU_REG_EVTQ_IRQ_CFG0   = 0xb0,
+    SMMU_REG_EVTQ_IRQ_CFG1   = 0xb8,
+    SMMU_REG_EVTQ_IRQ_CFG2   = 0xbc,
+
+    SMMU_REG_PRIQ_BASE       = 0xc0,
+    SMMU_REG_PRIQ_PROD       = 0xc8,
+    SMMU_REG_PRIQ_CONS       = 0xcc,
+    SMMU_REG_PRIQ_IRQ_CFG0   = 0xd0,
+    SMMU_REG_PRIQ_IRQ_CFG1   = 0xd8,
+    SMMU_REG_PRIQ_IRQ_CFG2   = 0xdc,
+
+    SMMU_ID_REGS_OFFSET      = 0xfd0,
+
+    /* Secure registers are not used for now */
+    SMMU_SECURE_OFFSET       = 0x8000,
+};
+
+/**********************
+ * Data Structures
+ **********************/
+
+struct __smmu_data2 {
+    uint32_t word[2];
+};
+
+struct __smmu_data8 {
+    uint32_t word[8];
+};
+
+struct __smmu_data16 {
+    uint32_t word[16];
+};
+
+struct __smmu_data4 {
+    uint32_t word[4];
+};
+
+typedef struct __smmu_data2  STEDesc; /* STE Level 1 Descriptor */
+typedef struct __smmu_data16 Ste;     /* Stream Table Entry(STE) */
+typedef struct __smmu_data2  CDDesc;  /* CD Level 1 Descriptor */
+typedef struct __smmu_data16 Cd;      /* Context Descriptor(CD) */
+
+typedef struct __smmu_data4  Cmd; /* Command Entry */
+typedef struct __smmu_data8  Evt; /* Event Entry */
+typedef struct __smmu_data4  Pri; /* PRI entry */
+
+/*****************************
+ * STE fields
+ *****************************/
+
+#define STE_VALID(x)   extract32((x)->word[0], 0, 1) /* 0 */
+#define STE_CONFIG(x)  extract32((x)->word[0], 1, 3)
+enum {
+    STE_CONFIG_NONE      = 0,
+    STE_CONFIG_BYPASS    = 4,       /* S1 Bypass    , S2 Bypass */
+    STE_CONFIG_S1        = 5,       /* S1 Translate , S2 Bypass */
+    STE_CONFIG_S2        = 6,       /* S1 Bypass    , S2 Translate */
+    STE_CONFIG_NESTED    = 7,       /* S1 Translate , S2 Translate */
+};
+#define STE_S1FMT(x)   extract32((x)->word[0], 4, 2)
+#define STE_S1CDMAX(x) extract32((x)->word[1], 27, 5)
+#define STE_EATS(x)    extract32((x)->word[2], 28, 2)
+#define STE_STRW(x)    extract32((x)->word[2], 30, 2)
+#define STE_S2VMID(x)  extract32((x)->word[4], 0, 16)
+#define STE_S2T0SZ(x)  extract32((x)->word[5], 0, 6)
+#define STE_S2SL0(x)   extract32((x)->word[5], 6, 2)
+#define STE_S2TG(x)    extract32((x)->word[5], 14, 2)
+#define STE_S2PS(x)    extract32((x)->word[5], 16, 3)
+#define STE_S2AA64(x)  extract32((x)->word[5], 19, 1)
+#define STE_S2HD(x)    extract32((x)->word[5], 24, 1)
+#define STE_S2HA(x)    extract32((x)->word[5], 25, 1)
+#define STE_S2S(x)     extract32((x)->word[5], 26, 1)
+#define STE_CTXPTR(x)                                           \
+    ({                                                          \
+        uint64_t addr;                                          \
+        addr = (uint64_t)extract32((x)->word[1], 0, 16) << 32;  \
+        addr |= (uint64_t)((x)->word[0] & 0xffffffc0);          \
+        addr;                                                   \
+    })
+
+#define STE_S2TTB(x)                                            \
+    ({                                                          \
+        uint64_t addr;                                          \
+        addr = (uint64_t)extract32((x)->word[7], 0, 16) << 32;  \
+        addr |= (uint64_t)((x)->word[6] & 0xfffffff0);          \
+        addr;                                                   \
+    })
+
+static inline bool is_ste_bypass(Ste *ste)
+{
+    return STE_CONFIG(ste) == STE_CONFIG_BYPASS;
+}
+
+static inline bool is_ste_stage1(Ste *ste)
+{
+    return STE_CONFIG(ste) == STE_CONFIG_S1;
+}
+
+static inline bool is_ste_stage2(Ste *ste)
+{
+    return STE_CONFIG(ste) == STE_CONFIG_S2;
+}
+
+/**
+ * is_s2granule_valid - Check whether the stage 2 translation granule size
+ * advertised in the STE is one of the granules supported in IDR5
+ */
+static inline bool is_s2granule_valid(Ste *ste)
+{
+    int idr5_format = 0;
+
+    switch (STE_S2TG(ste)) {
+    case 0: /* 4kB */
+        idr5_format = 0x1;
+        break;
+    case 1: /* 64 kB */
+        idr5_format = 0x4;
+        break;
+    case 2: /* 16 kB */
+        idr5_format = 0x2;
+        break;
+    case 3: /* reserved */
+        break;
+    }
+    idr5_format &= SMMU_IDR5_GRAN;
+    return idr5_format;
+}
+
+static inline int oas2bits(int oas_field)
+{
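+    /* IDR5.OAS / STE.S2PS encodings 0b000..0b110 */
+    static const int oas_bits[] = { 32, 36, 40, 42, 44, 48, 52 };
+
+    if (oas_field < 0 || oas_field >= (int)ARRAY_SIZE(oas_bits)) {
+        /* reserved encoding: fall back to the advertised OAS */
+        return oas_bits[SMMU_IDR5_OAS];
+    }
+    return oas_bits[oas_field];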
+}
+
+static inline int pa_range(Ste *ste)
+{
+    int oas_field = MIN(STE_S2PS(ste), SMMU_IDR5_OAS);
+
+    if (!STE_S2AA64(ste)) {
+        return 40;
+    }
+
+    return oas2bits(oas_field);
+}
+
+#define MAX_PA(ste) ((1ULL << pa_range(ste)) - 1)
+
+/*****************************
+ * CD fields
+ *****************************/
+#define CD_VALID(x)   extract32((x)->word[0], 31, 1)
+#define CD_ASID(x)    extract32((x)->word[1], 16, 16)
+#define CD_TTB(x, sel)                                      \
+    ({                                                      \
+        uint64_t hi, lo;                                    \
+        hi = extract32((x)->word[(sel) * 2 + 3], 0, 16);    \
+        hi <<= 32;                                          \
+        lo = (x)->word[(sel) * 2 + 2] & ~0xf;               \
+        hi | lo;                                            \
+    })
+
+#define CD_TSZ(x, sel)   extract32((x)->word[0], (16 * (sel)) + 0, 6)
+#define CD_TG(x, sel)    extract32((x)->word[0], (16 * (sel)) + 6, 2)
+#define CD_EPD(x, sel)   extract32((x)->word[0], (16 * (sel)) + 14, 1)
+
+#define CD_T0SZ(x)    CD_TSZ((x), 0)
+#define CD_T1SZ(x)    CD_TSZ((x), 1)
+#define CD_TG0(x)     CD_TG((x), 0)
+#define CD_TG1(x)     CD_TG((x), 1)
+#define CD_EPD0(x)    CD_EPD((x), 0)
+#define CD_EPD1(x)    CD_EPD((x), 1)
+#define CD_IPS(x)     extract32((x)->word[1], 0, 3)
+#define CD_AARCH64(x) extract32((x)->word[1], 9, 1)
+#define CD_TTB0(x)    CD_TTB((x), 0)
+#define CD_TTB1(x)    CD_TTB((x), 1)
+
+#define CDM_VALID(x)    ((x)->word[0] & 0x1)
+
+static inline int is_cd_valid(SMMUV3State *s, Ste *ste, Cd *cd)
+{
+    return CD_VALID(cd);
+}
+
+/*****************************
+ * Commands
+ *****************************/
+enum {
+    SMMU_CMD_PREFETCH_CONFIG = 0x01,
+    SMMU_CMD_PREFETCH_ADDR,
+    SMMU_CMD_CFGI_STE,
+    SMMU_CMD_CFGI_STE_RANGE,
+    SMMU_CMD_CFGI_CD,
+    SMMU_CMD_CFGI_CD_ALL,
+    SMMU_CMD_CFGI_ALL,
+    SMMU_CMD_TLBI_NH_ALL     = 0x10,
+    SMMU_CMD_TLBI_NH_ASID,
+    SMMU_CMD_TLBI_NH_VA,
+    SMMU_CMD_TLBI_NH_VAA,
+    SMMU_CMD_TLBI_EL3_ALL    = 0x18,
+    SMMU_CMD_TLBI_EL3_VA     = 0x1a,
+    SMMU_CMD_TLBI_EL2_ALL    = 0x20,
+    SMMU_CMD_TLBI_EL2_ASID,
+    SMMU_CMD_TLBI_EL2_VA,
+    SMMU_CMD_TLBI_EL2_VAA,  /* 0x23 */
+    SMMU_CMD_TLBI_S12_VMALL  = 0x28,
+    SMMU_CMD_TLBI_S2_IPA     = 0x2a,
+    SMMU_CMD_TLBI_NSNH_ALL   = 0x30,
+    SMMU_CMD_ATC_INV         = 0x40,
+    SMMU_CMD_PRI_RESP,
+    SMMU_CMD_RESUME          = 0x44,
+    SMMU_CMD_STALL_TERM,
+    SMMU_CMD_SYNC,          /* 0x46 */
+};
+
+static const char *cmd_stringify[] = {
+    [SMMU_CMD_PREFETCH_CONFIG] = "SMMU_CMD_PREFETCH_CONFIG",
+    [SMMU_CMD_PREFETCH_ADDR]   = "SMMU_CMD_PREFETCH_ADDR",
+    [SMMU_CMD_CFGI_STE]        = "SMMU_CMD_CFGI_STE",
+    [SMMU_CMD_CFGI_STE_RANGE]  = "SMMU_CMD_CFGI_STE_RANGE",
+    [SMMU_CMD_CFGI_CD]         = "SMMU_CMD_CFGI_CD",
+    [SMMU_CMD_CFGI_CD_ALL]     = "SMMU_CMD_CFGI_CD_ALL",
+    [SMMU_CMD_CFGI_ALL]        = "SMMU_CMD_CFGI_ALL",
+    [SMMU_CMD_TLBI_NH_ALL]     = "SMMU_CMD_TLBI_NH_ALL",
+    [SMMU_CMD_TLBI_NH_ASID]    = "SMMU_CMD_TLBI_NH_ASID",
+    [SMMU_CMD_TLBI_NH_VA]      = "SMMU_CMD_TLBI_NH_VA",
+    [SMMU_CMD_TLBI_NH_VAA]     = "SMMU_CMD_TLBI_NH_VAA",
+    [SMMU_CMD_TLBI_EL3_ALL]    = "SMMU_CMD_TLBI_EL3_ALL",
+    [SMMU_CMD_TLBI_EL3_VA]     = "SMMU_CMD_TLBI_EL3_VA",
+    [SMMU_CMD_TLBI_EL2_ALL]    = "SMMU_CMD_TLBI_EL2_ALL",
+    [SMMU_CMD_TLBI_EL2_ASID]   = "SMMU_CMD_TLBI_EL2_ASID",
+    [SMMU_CMD_TLBI_EL2_VA]     = "SMMU_CMD_TLBI_EL2_VA",
+    [SMMU_CMD_TLBI_EL2_VAA]    = "SMMU_CMD_TLBI_EL2_VAA",
+    [SMMU_CMD_TLBI_S12_VMALL]  = "SMMU_CMD_TLBI_S12_VMALL",
+    [SMMU_CMD_TLBI_S2_IPA]     = "SMMU_CMD_TLBI_S2_IPA",
+    [SMMU_CMD_TLBI_NSNH_ALL]   = "SMMU_CMD_TLBI_NSNH_ALL",
+    [SMMU_CMD_ATC_INV]         = "SMMU_CMD_ATC_INV",
+    [SMMU_CMD_PRI_RESP]        = "SMMU_CMD_PRI_RESP",
+    [SMMU_CMD_RESUME]          = "SMMU_CMD_RESUME",
+    [SMMU_CMD_STALL_TERM]      = "SMMU_CMD_STALL_TERM",
+    [SMMU_CMD_SYNC]            = "SMMU_CMD_SYNC",
+};
+
+/*****************************
+ *  Register Access Primitives
+ *****************************/
+
+static inline void smmu_write64_reg(SMMUV3State *s, uint32_t addr, uint64_t val)
+{
+    addr >>= 2;
+    s->regs[addr] = val & 0xFFFFFFFFULL;
+    s->regs[addr + 1] = val >> 32;
+}
+
+static inline void smmu_write_reg(SMMUV3State *s, uint32_t addr, uint64_t val)
+{
+    s->regs[addr >> 2] = val;
+}
+
+static inline uint32_t smmu_read_reg(SMMUV3State *s, uint32_t addr)
+{
+    return s->regs[addr >> 2];
+}
+
+static inline uint64_t smmu_read64_reg(SMMUV3State *s, uint32_t addr)
+{
+    addr >>= 2;
+    return s->regs[addr] | ((uint64_t)s->regs[addr + 1] << 32);
+}
+
+#define smmu_read32_reg smmu_read_reg
+#define smmu_write32_reg smmu_write_reg
+
+/*****************************
+ * CMDQ fields
+ *****************************/
+
+enum { /* Command Errors */
+    SMMU_CMD_ERR_NONE = 0,
+    SMMU_CMD_ERR_ILLEGAL,
+    SMMU_CMD_ERR_ABORT
+};
+
+enum { /* Command completion notification */
+    CMD_SYNC_SIG_NONE,
+    CMD_SYNC_SIG_IRQ,
+    CMD_SYNC_SIG_SEV,
+};
+
+#define CMD_TYPE(x)  extract32((x)->word[0], 0, 8)
+#define CMD_SEC(x)   extract32((x)->word[0], 9, 1)
+#define CMD_SEV(x)   extract32((x)->word[0], 10, 1)
+#define CMD_AC(x)    extract32((x)->word[0], 12, 1)
+#define CMD_AB(x)    extract32((x)->word[0], 13, 1)
+#define CMD_CS(x)    extract32((x)->word[0], 12, 2)
+#define CMD_SSID(x)  extract32((x)->word[0], 16, 16)
+#define CMD_SID(x)   ((x)->word[1])
+#define CMD_VMID(x)  extract32((x)->word[1], 0, 16)
+#define CMD_ASID(x)  extract32((x)->word[1], 16, 16)
+#define CMD_STAG(x)  extract32((x)->word[2], 0, 16)
+#define CMD_RESP(x)  extract32((x)->word[2], 11, 2)
+#define CMD_GRPID(x) extract32((x)->word[3], 0, 8)
+#define CMD_SIZE(x)  extract32((x)->word[3], 0, 16)
+#define CMD_LEAF(x)  extract32((x)->word[3], 0, 1)
+#define CMD_SPAN(x)  extract32((x)->word[3], 0, 5)
+#define CMD_ADDR(x) ({                                               \
+            uint64_t addr = (uint64_t)(x)->word[3];                  \
+            addr <<= 32;                                             \
+            addr |= (uint64_t)extract32((x)->word[2], 12, 20) << 12; \
+            addr;                                                    \
+        })
+
+/***************************
+ * Queue Handling
+ ***************************/
+
+typedef enum {
+    CMD_Q_EMPTY,
+    CMD_Q_FULL,
+    CMD_Q_INUSE,
+} SMMUQStatus;
+
+#define Q_ENTRY(q, idx)  ((q)->base + (q)->ent_size * (idx))
+#define Q_WRAP(q, pc)    ((pc) >> (q)->shift)
+#define Q_IDX(q, pc)     ((pc) & ((1 << (q)->shift) - 1))
+
+static inline SMMUQStatus __smmu_queue_status(SMMUV3State *s, SMMUQueue *q)
+{
+    uint32_t prod = Q_IDX(q, q->prod);
+    uint32_t cons = Q_IDX(q, q->cons);
+
+    if ((prod == cons) && (q->wrap.prod != q->wrap.cons)) {
+        return CMD_Q_FULL;
+    } else if ((prod == cons) && (q->wrap.prod == q->wrap.cons)) {
+        return CMD_Q_EMPTY;
+    }
+    return CMD_Q_INUSE;
+}
+#define smmu_is_q_full(s, q) (__smmu_queue_status(s, q) == CMD_Q_FULL)
+#define smmu_is_q_empty(s, q) (__smmu_queue_status(s, q) == CMD_Q_EMPTY)
+
+static inline int __smmu_q_enabled(SMMUV3State *s, uint32_t q)
+{
+    return smmu_read32_reg(s, SMMU_REG_CR0) & q;
+}
+#define smmu_cmd_q_enabled(s) __smmu_q_enabled(s, SMMU_CR0_CMDQ_ENABLE)
+#define smmu_evt_q_enabled(s) __smmu_q_enabled(s, SMMU_CR0_EVTQ_ENABLE)
+
+#define SMMU_CMDQ_ERR(s) ((smmu_read32_reg(s, SMMU_REG_GERROR) ^    \
+                           smmu_read32_reg(s, SMMU_REG_GERRORN)) &  \
+                          SMMU_GERROR_CMDQ)
+
+static inline void smmuv3_init_queues(SMMUV3State *s)
+{
+    s->cmdq.prod = 0;
+    s->cmdq.cons = 0;
+    s->cmdq.wrap.prod = 0;
+    s->cmdq.wrap.cons = 0;
+
+    s->evtq.prod = 0;
+    s->evtq.cons = 0;
+    s->evtq.wrap.prod = 0;
+    s->evtq.wrap.cons = 0;
+
+    s->priq.prod = 0;
+    s->priq.cons = 0;
+    s->priq.wrap.prod = 0;
+    s->priq.wrap.cons = 0;
+}
+
+/*****************************
+ * EVTQ fields
+ *****************************/
+
+#define EVT_Q_OVERFLOW        (1 << 31)
+
+#define EVT_SET_TYPE(x, t)    ((x)->word[0] = deposit32((x)->word[0], 0, 8, t))
+#define EVT_SET_SID(x, s)     ((x)->word[1] =  s)
+#define EVT_SET_INPUT_ADDR(x, addr) ({                    \
+            (x)->word[5] = (uint32_t)(addr >> 32);        \
+            (x)->word[4] = (uint32_t)(addr & 0xffffffff); \
+            addr;                                         \
+        })
+
+/*****************************
+ * Events
+ *****************************/
+
+enum evt_err {
+    SMMU_EVT_F_UUT    = 0x1,
+    SMMU_EVT_C_BAD_SID,
+    SMMU_EVT_F_STE_FETCH,
+    SMMU_EVT_C_BAD_STE,
+    SMMU_EVT_F_BAD_ATS_REQ,
+    SMMU_EVT_F_STREAM_DISABLED,
+    SMMU_EVT_F_TRANS_FORBIDDEN,
+    SMMU_EVT_C_BAD_SSID,
+    SMMU_EVT_F_CD_FETCH,
+    SMMU_EVT_C_BAD_CD,
+    SMMU_EVT_F_WALK_EXT_ABRT,
+    SMMU_EVT_F_TRANS        = 0x10,
+    SMMU_EVT_F_ADDR_SZ,
+    SMMU_EVT_F_ACCESS,
+    SMMU_EVT_F_PERM,
+    SMMU_EVT_F_TLB_CONFLICT = 0x20,
+    SMMU_EVT_F_CFG_CONFLICT = 0x21,
+    SMMU_EVT_E_PAGE_REQ     = 0x24,
+};
+
+typedef enum evt_err SMMUEvtErr;
+
+/*****************************
+ * Interrupts
+ *****************************/
+
+static inline int __smmu_irq_enabled(SMMUV3State *s, uint32_t q)
+{
+    return smmu_read32_reg(s, SMMU_REG_IRQ_CTRL) & q;
+}
+#define smmu_evt_irq_enabled(s)                   \
+    __smmu_irq_enabled(s, SMMU_IRQ_CTRL_EVENT_EN)
+#define smmu_gerror_irq_enabled(s)                  \
+    __smmu_irq_enabled(s, SMMU_IRQ_CTRL_GERROR_EN)
+#define smmu_pri_irq_enabled(s)                 \
+    __smmu_irq_enabled(s, SMMU_IRQ_CTRL_PRI_EN)
+
+static inline bool
+smmu_is_irq_pending(SMMUV3State *s, int irq)
+{
+    return smmu_read32_reg(s, SMMU_REG_GERROR) ^
+        smmu_read32_reg(s, SMMU_REG_GERRORN);
+}
+
+/*****************************
+ * Hash Table
+ *****************************/
+
+static inline gboolean smmu_uint64_equal(gconstpointer v1, gconstpointer v2)
+{
+    return *((const uint64_t *)v1) == *((const uint64_t *)v2);
+}
+
+static inline guint smmu_uint64_hash(gconstpointer v)
+{
+    return (guint)*(const uint64_t *)v;
+}
+
+/*****************************
+ * Misc
+ *****************************/
+
+/**
+ * tg2granule - Decodes the CD translation granule size field according
+ * to the TT in use
+ * @bits: TG0/1 field
+ * @tg1: if set, @bits belong to TG1, otherwise belong to TG0
+ */
+static inline int tg2granule(int bits, bool tg1)
+{
+    switch (bits) {
+    case 1:
+        return tg1 ? 14 : 16;
+    case 2:
+        return tg1 ? 12 : 14;
+    case 3:
+        return tg1 ? 16 : 12;
+    default:
+        return 12;
+    }
+}
+
+#define L1STD_L2PTR(stm) ({                                 \
+            uint64_t hi, lo;                            \
+            hi = (stm)->word[1];                        \
+            lo = (stm)->word[0] & ~(uint64_t)0x1f;      \
+            hi << 32 | lo;                              \
+        })
+
+#define L1STD_SPAN(stm) (extract32((stm)->word[0], 0, 4))
+
+/*****************************
+ * Debug
+ *****************************/
+#define ARM_SMMU_DEBUG
+
+#ifdef ARM_SMMU_DEBUG
+static inline void dump_ste(Ste *ste)
+{
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(ste->word); i += 2) {
+        trace_smmuv3_dump_ste(i, ste->word[i], i + 1, ste->word[i + 1]);
+    }
+}
+
+static inline void dump_cd(Cd *cd)
+{
+    int i;
+    for (i = 0; i < ARRAY_SIZE(cd->word); i += 2) {
+        trace_smmuv3_dump_cd(i, cd->word[i], i + 1, cd->word[i + 1]);
+    }
+}
+
+static inline void dump_cmd(Cmd *cmd)
+{
+    int i;
+    for (i = 0; i < ARRAY_SIZE(cmd->word); i += 2) {
+        trace_smmuv3_dump_cmd(i, cmd->word[i], i + 1, cmd->word[i + 1]);
+    }
+}
+
+#else
+#define dump_ste(...) do {} while (0)
+#define dump_cd(...) do {} while (0)
+#define dump_cmd(...) do {} while (0)
+#endif /* ARM_SMMU_DEBUG */
+
+#endif
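__smmu_queue_status() tells a full queue from an empty one using the wrap flags: when the producer and consumer indexes are equal, matching wrap flags mean empty and differing ones mean full. A standalone sketch of that scheme, assuming the pointer layout of the PROD/CONS registers (index in the low shift bits, wrap flag in bit shift); the RingQ type and q_*() helpers are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical queue model: pointer = index | wrap << shift. */
typedef struct {
    uint32_t prod;
    uint32_t cons;
    unsigned shift;  /* log2(number of entries) */
} RingQ;

static uint32_t q_idx(const RingQ *q, uint32_t p)
{
    return p & ((1u << q->shift) - 1);
}

static uint32_t q_wrap(const RingQ *q, uint32_t p)
{
    return (p >> q->shift) & 1;
}

/* Equal indexes: empty if the wrap flags match, full if they differ. */
static bool q_empty(const RingQ *q)
{
    return q_idx(q, q->prod) == q_idx(q, q->cons) &&
           q_wrap(q, q->prod) == q_wrap(q, q->cons);
}

static bool q_full(const RingQ *q)
{
    return q_idx(q, q->prod) == q_idx(q, q->cons) &&
           q_wrap(q, q->prod) != q_wrap(q, q->cons);
}

/* Advance by one entry; the carry out of the index toggles the wrap flag. */
static uint32_t q_inc(const RingQ *q, uint32_t p)
{
    return (p + 1) & ((2u << q->shift) - 1);
}
```

Packing the wrap flag next to the index, as smmu_read_cmdq() does when it writes back (q->wrap.cons << q->shift) | q->cons, lets a single masked increment both advance the index and toggle the wrap flag.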
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
new file mode 100644
index 0000000..639f682
--- /dev/null
+++ b/hw/arm/smmuv3.c
@@ -0,0 +1,1133 @@
+/*
+ * Copyright (C) 2014-2016 Broadcom Corporation
+ * Copyright (c) 2017 Red Hat, Inc.
+ * Written by Prem Mallappa, Eric Auger
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/boards.h"
+#include "sysemu/sysemu.h"
+#include "hw/sysbus.h"
+#include "hw/pci/pci.h"
+#include "exec/address-spaces.h"
+#include "trace.h"
+#include "qemu/error-report.h"
+
+#include "hw/arm/smmuv3.h"
+#include "smmuv3-internal.h"
+
+static inline int smmu_enabled(SMMUV3State *s)
+{
+    return smmu_read32_reg(s, SMMU_REG_CR0) & SMMU_CR0_SMMU_ENABLE;
+}
+
+/**
+ * smmu_irq_update - update the GERROR register according to
+ * the IRQ and the enable state
+ *
+ * Returns a non-zero value when the IRQ should be raised
+ */
+static int smmu_irq_update(SMMUV3State *s, int irq, uint64_t data)
+{
+    uint32_t error = 0;
+
+    if (!smmu_gerror_irq_enabled(s)) {
+        return 0;
+    }
+
+    switch (irq) {
+    case SMMU_IRQ_EVTQ:
+        if (smmu_evt_irq_enabled(s)) {
+            error = SMMU_GERROR_EVENTQ;
+        }
+        break;
+    case SMMU_IRQ_CMD_SYNC:
+        if (smmu_gerror_irq_enabled(s)) {
+            uint32_t err_type = (uint32_t)data;
+
+            if (err_type) {
+                uint32_t regval = smmu_read32_reg(s, SMMU_REG_CMDQ_CONS);
+                smmu_write32_reg(s, SMMU_REG_CMDQ_CONS,
+                                 regval | err_type << SMMU_CMD_CONS_ERR_SHIFT);
+            }
+            error = SMMU_GERROR_CMDQ;
+        }
+        break;
+    case SMMU_IRQ_PRIQ:
+        if (smmu_pri_irq_enabled(s)) {
+            error = SMMU_GERROR_PRIQ;
+        }
+        break;
+    }
+
+    if (error) {
+        uint32_t gerror = smmu_read32_reg(s, SMMU_REG_GERROR);
+        uint32_t gerrorn = smmu_read32_reg(s, SMMU_REG_GERRORN);
+
+        trace_smmuv3_irq_update(error, gerror, gerrorn);
+
+        /* only toggle GERROR if the interrupt is not active */
+        if (!((gerror ^ gerrorn) & error)) {
+            smmu_write32_reg(s, SMMU_REG_GERROR, gerror ^ error);
+        }
+    }
+
+    return error;
+}
+
+static void smmu_irq_raise(SMMUV3State *s, int irq, uint64_t data)
+{
+    trace_smmuv3_irq_raise(irq);
+    if (smmu_irq_update(s, irq, data)) {
+        qemu_irq_raise(s->irq[irq]);
+    }
+}
+
+static MemTxResult smmu_q_read(SMMUQueue *q, void *data)
+{
+    uint64_t addr = Q_ENTRY(q, Q_IDX(q, q->cons));
+    MemTxResult ret;
+
+    ret = smmu_read_sysmem(addr, data, q->ent_size, false);
+    /* TODO if (ret != MEMTX_OK ) handle error */
+
+    q->cons++;
+    if (q->cons == q->entries) {
+        q->cons = 0;
+        q->wrap.cons++;     /* this will toggle */
+    }
+
+    return ret;
+}
+
+static MemTxResult smmu_q_write(SMMUQueue *q, void *data)
+{
+    uint64_t addr = Q_ENTRY(q, Q_IDX(q, q->prod));
+
+    /* Advance prod and toggle the wrap bit as soon as it reaches the end */
+    q->prod++;
+    if (q->prod == q->entries) {
+        q->prod = 0;
+        q->wrap.prod++;     /* this will toggle */
+    }
+
+    smmu_write_sysmem(addr, data, q->ent_size, false);
+
+    return MEMTX_OK;
+}
+
+static MemTxResult smmu_read_cmdq(SMMUV3State *s, Cmd *cmd)
+{
+    SMMUQueue *q = &s->cmdq;
+    MemTxResult ret = smmu_q_read(q, cmd);
+    uint32_t val = 0;
+
+    val |= (q->wrap.cons << q->shift) | q->cons;
+
+    /* Update consumer pointer */
+    smmu_write32_reg(s, SMMU_REG_CMDQ_CONS, val);
+
+    return ret;
+}
+
+static int smmu_cmdq_consume(SMMUV3State *s)
+{
+    uint32_t error = SMMU_CMD_ERR_NONE;
+
+    trace_smmuv3_cmdq_consume(SMMU_CMDQ_ERR(s), smmu_cmd_q_enabled(s),
+                              s->cmdq.prod, s->cmdq.cons,
+                              s->cmdq.wrap.prod, s->cmdq.wrap.cons);
+
+    if (!smmu_cmd_q_enabled(s)) {
+        return 0;
+    }
+
+    while (!SMMU_CMDQ_ERR(s) && !smmu_is_q_empty(s, &s->cmdq)) {
+        uint32_t type;
+        Cmd cmd;
+
+        if (smmu_read_cmdq(s, &cmd) != MEMTX_OK) {
+            error = SMMU_CMD_ERR_ABORT;
+            break;
+        }
+
+        type = CMD_TYPE(&cmd);
+        trace_smmuv3_cmdq_opcode(type < ARRAY_SIZE(cmd_stringify) &&
+                                 cmd_stringify[type] ?
+                                 cmd_stringify[type] : "UNKNOWN");
+        switch (type) {
+        case SMMU_CMD_SYNC:
+            if (CMD_CS(&cmd) & CMD_SYNC_SIG_IRQ) {
+                smmu_irq_raise(s, SMMU_IRQ_CMD_SYNC, SMMU_CMD_ERR_NONE);
+            } else if (CMD_CS(&cmd) & CMD_SYNC_SIG_SEV) {
+                trace_smmuv3_cmdq_consume_sev();
+            }
+            break;
+        case SMMU_CMD_PREFETCH_CONFIG:
+        case SMMU_CMD_PREFETCH_ADDR:
+        case SMMU_CMD_CFGI_STE:
+        {
+            uint32_t streamid = cmd.word[1];
+
+            trace_smmuv3_cmdq_cfgi_ste(streamid);
+            break;
+        }
+        case SMMU_CMD_CFGI_STE_RANGE: /* same as SMMU_CMD_CFGI_ALL */
+        {
+            uint32_t start = cmd.word[1], range, end;
+
+            range = extract32(cmd.word[2], 0, 5);
+            end = start + (1 << (range + 1)) - 1;
+            trace_smmuv3_cmdq_cfgi_ste_range(start, end);
+            break;
+        }
+        case SMMU_CMD_CFGI_CD:
+        case SMMU_CMD_CFGI_CD_ALL:
+            break;
+        case SMMU_CMD_TLBI_NH_ALL:
+        case SMMU_CMD_TLBI_NH_ASID:
+            printf("%s TLBI* replay\n", __func__);
+            break;
+        case SMMU_CMD_TLBI_NH_VA:
+        {
+            int asid = extract32(cmd.word[1], 16, 16);
+            int vmid = extract32(cmd.word[1], 0, 16);
+            uint64_t low = extract32(cmd.word[2], 12, 20);
+            uint64_t high = cmd.word[3];
+            uint64_t addr = high << 32 | (low << 12);
+
+            trace_smmuv3_cmdq_tlbi_nh_va(asid, vmid, addr);
+            break;
+        }
+        case SMMU_CMD_TLBI_NH_VAA:
+        case SMMU_CMD_TLBI_EL3_ALL:
+        case SMMU_CMD_TLBI_EL3_VA:
+        case SMMU_CMD_TLBI_EL2_ALL:
+        case SMMU_CMD_TLBI_EL2_ASID:
+        case SMMU_CMD_TLBI_EL2_VA:
+        case SMMU_CMD_TLBI_EL2_VAA:
+        case SMMU_CMD_TLBI_S12_VMALL:
+        case SMMU_CMD_TLBI_S2_IPA:
+        case SMMU_CMD_TLBI_NSNH_ALL:
+            break;
+        case SMMU_CMD_ATC_INV:
+        case SMMU_CMD_PRI_RESP:
+        case SMMU_CMD_RESUME:
+        case SMMU_CMD_STALL_TERM:
+            trace_smmuv3_unhandled_cmd(type);
+            break;
+        default:
+            error = SMMU_CMD_ERR_ILLEGAL;
+            error_report("Illegal command type: %d", CMD_TYPE(&cmd));
+            dump_cmd(&cmd);
+            break;
+        }
+
+        if (error != SMMU_CMD_ERR_NONE) {
+            error_report("CMDQ: command error 0x%x", error);
+            break;
+        }
+    }
+
+    if (error) {
+        smmu_irq_raise(s, SMMU_IRQ_GERROR, error);
+    }
+
+    trace_smmuv3_cmdq_consume_out(s->cmdq.wrap.prod, s->cmdq.prod,
+                                  s->cmdq.wrap.cons, s->cmdq.cons);
+
+    return 0;
+}
+
+/**
+ * GERROR is updated when raising an interrupt, GERRORN will be updated
+ * by SW and should match GERROR before normal operation resumes.
+ */
+static void smmu_irq_clear(SMMUV3State *s, uint64_t gerrorn)
+{
+    int irq = SMMU_IRQ_GERROR;
+    uint32_t toggled;
+
+    toggled = smmu_read32_reg(s, SMMU_REG_GERRORN) ^ gerrorn;
+
+    while (toggled) {
+        irq = ctz32(toggled);
+
+        qemu_irq_lower(s->irq[irq]);
+
+        toggled &= toggled - 1;
+    }
+}
+
+static int smmu_evtq_update(SMMUV3State *s)
+{
+    if (!smmu_enabled(s)) {
+        return 0;
+    }
+
+    if (!smmu_is_q_empty(s, &s->evtq)) {
+        if (smmu_evt_irq_enabled(s)) {
+            smmu_irq_raise(s, SMMU_IRQ_EVTQ, 0);
+        }
+    }
+
+    if (smmu_is_q_empty(s, &s->evtq)) {
+        smmu_irq_clear(s, SMMU_GERROR_EVENTQ);
+    }
+
+    return 1;
+}
+
+static void smmu_create_event(SMMUV3State *s, hwaddr iova,
+                              uint32_t sid, bool is_write, int error);
+
+static void smmu_update(SMMUV3State *s)
+{
+    int error = 0;
+
+    /* SMMU starts processing commands even when not enabled */
+    if (!smmu_enabled(s)) {
+        goto check_cmdq;
+    }
+
+    /* Event queue updates take priority */
+    if ((smmu_evt_q_enabled(s)) && !smmu_is_q_empty(s, &s->evtq)) {
+        trace_smmuv3_update(smmu_is_q_empty(s, &s->evtq), s->evtq.prod,
+                            s->evtq.cons, s->evtq.wrap.prod, s->evtq.wrap.cons);
+        error = smmu_evtq_update(s);
+    }
+
+    if (error) {
+        /* TODO: create a proper event queue entry in the future; */
+        /* as with other devices, an error condition is not recoverable */
+        error_report("An unfavourable condition");
+        smmu_create_event(s, 0, 0, 0, error);
+    }
+
+check_cmdq:
+    if (smmu_cmd_q_enabled(s) && !SMMU_CMDQ_ERR(s)) {
+        smmu_cmdq_consume(s);
+    } else {
+        trace_smmuv3_update_check_cmd(SMMU_CMDQ_ERR(s));
+    }
+
+}
+
+static void smmu_update_irq(SMMUV3State *s, uint64_t addr, uint64_t val)
+{
+    smmu_irq_clear(s, val);
+
+    smmu_write32_reg(s, SMMU_REG_GERRORN, val);
+
+    trace_smmuv3_update_irq(smmu_is_irq_pending(s, 0),
+                          smmu_read32_reg(s, SMMU_REG_GERROR),
+                          smmu_read32_reg(s, SMMU_REG_GERRORN));
+
+    /* Clear only when no more left */
+    if (!smmu_is_irq_pending(s, 0)) {
+        qemu_irq_lower(s->irq[0]);
+    }
+}
+
+#define SMMU_ID_REG_INIT(s, reg, d) do {        \
+    s->regs[reg >> 2] = d;                      \
+    } while (0)
+
+static void smmuv3_id_reg_init(SMMUV3State *s)
+{
+    uint32_t data =
+        SMMU_IDR0_STLEVEL << SMMU_IDR0_STLEVEL_SHIFT |
+        SMMU_IDR0_TERM    << SMMU_IDR0_TERM_SHIFT    |
+        SMMU_IDR0_STALL   << SMMU_IDR0_STALL_SHIFT   |
+        SMMU_IDR0_VMID16  << SMMU_IDR0_VMID16_SHIFT  |
+        SMMU_IDR0_PRI     << SMMU_IDR0_PRI_SHIFT     |
+        SMMU_IDR0_ASID16  << SMMU_IDR0_ASID16_SHIFT  |
+        SMMU_IDR0_ATS     << SMMU_IDR0_ATS_SHIFT     |
+        SMMU_IDR0_HYP     << SMMU_IDR0_HYP_SHIFT     |
+        SMMU_IDR0_HTTU    << SMMU_IDR0_HTTU_SHIFT    |
+        SMMU_IDR0_COHACC  << SMMU_IDR0_COHACC_SHIFT  |
+        SMMU_IDR0_TTF     << SMMU_IDR0_TTF_SHIFT     |
+        SMMU_IDR0_S1P     << SMMU_IDR0_S1P_SHIFT     |
+        SMMU_IDR0_S2P     << SMMU_IDR0_S2P_SHIFT;
+
+    SMMU_ID_REG_INIT(s, SMMU_REG_IDR0, data);
+
+#define SMMU_QUEUE_SIZE_LOG2  19
+    data =
+        1 << 27 |                    /* Attr Types override */
+        SMMU_QUEUE_SIZE_LOG2 << 21 | /* Cmd Q size */
+        SMMU_QUEUE_SIZE_LOG2 << 16 | /* Event Q size */
+        SMMU_QUEUE_SIZE_LOG2 << 11 | /* PRI Q size */
+        0  << 6 |                    /* SSID not supported */
+        SMMU_IDR1_SIDSIZE;
+
+    SMMU_ID_REG_INIT(s, SMMU_REG_IDR1, data);
+
+    data =
+        SMMU_IDR5_GRAN << SMMU_IDR5_GRAN_SHIFT | SMMU_IDR5_OAS;
+
+    SMMU_ID_REG_INIT(s, SMMU_REG_IDR5, data);
+
+}
+
+static void smmuv3_init(SMMUV3State *s)
+{
+    smmuv3_id_reg_init(s);      /* Only initialize the ID registers */
+
+    s->sid_size = SMMU_IDR1_SIDSIZE;
+
+    s->cmdq.entries = (smmu_read32_reg(s, SMMU_REG_IDR1) >> 21) & 0x1f;
+    s->cmdq.ent_size = sizeof(Cmd);
+    s->evtq.entries = (smmu_read32_reg(s, SMMU_REG_IDR1) >> 16) & 0x1f;
+    s->evtq.ent_size = sizeof(Evt);
+}
+
+/*
+ * All SMMU data structures (L1STE/STE/L1CD/CD and CMDQ/EVTQ/PRIQ queue
+ * entries) are little endian and aligned to 8 bytes.
+ */
+static inline int smmu_get_ste(SMMUV3State *s, hwaddr addr, Ste *buf)
+{
+    int ret;
+
+    trace_smmuv3_get_ste(addr);
+    ret = dma_memory_read(&address_space_memory, addr, buf, sizeof(*buf));
+    dump_ste(buf);
+    return ret;
+}
+
+/*
+ * For now only a single CD entry is supported; once multiple entries are
+ * supported, 'ssid' will be used to select the CD.
+ */
+static inline int smmu_get_cd(SMMUV3State *s, Ste *ste, uint32_t ssid, Cd *buf)
+{
+    hwaddr addr = STE_CTXPTR(ste);
+    int ret;
+
+    if (STE_S1CDMAX(ste) != 0) {
+        error_report("Multilevel Ctx Descriptor not supported yet");
+    }
+
+    ret = dma_memory_read(&address_space_memory, addr, buf, sizeof(*buf));
+
+    trace_smmuv3_get_cd(addr);
+    dump_cd(buf);
+
+    return ret;
+}
+
+/**
+ * is_ste_consistent - Check validity of STE
+ * according to section 6.2.1 "Validity of STE" of the SMMUv3 spec
+ * TODO: check the relevance of each check and compliance
+ * with this spec chapter
+ */
+static int is_ste_consistent(SMMUV3State *s, Ste *ste)
+{
+    uint32_t _config = STE_CONFIG(ste);
+    uint32_t ste_vmid, ste_eats, ste_s2s, ste_s1fmt, ste_s2aa64, ste_s1cdmax;
+    uint32_t ste_strw;
+    bool strw_unused, addr_out_of_range, granule_supported;
+    bool config[] = {_config & 0x1, _config & 0x2, _config & 0x4};
+
+    ste_vmid = STE_S2VMID(ste);
+    ste_eats = STE_EATS(ste); /* Enable PCIe ATS trans */
+    ste_s2s = STE_S2S(ste);
+    ste_s1fmt = STE_S1FMT(ste);
+    ste_s2aa64 = STE_S2AA64(ste);
+    ste_s1cdmax = STE_S1CDMAX(ste); /*CD bit # S1ContextPtr */
+    ste_strw = STE_STRW(ste); /* stream world control */
+
+    if (!STE_VALID(ste)) {
+        error_report("STE NOT valid");
+        return false;
+    }
+
+    granule_supported = is_s2granule_valid(ste);
+
+    /* Since all S1/S2 combinations are supported, do not check the
+     * corresponding STE config values */
+
+    if (!config[2]) {
+        /* Report abort to device, no event recorded */
+        error_report("STE config 0b000 not implemented");
+        return false;
+    }
+
+    if (!SMMU_IDR1_SIDSIZE && ste_s1cdmax && config[0] &&
+        !SMMU_IDR0_CD2L && (ste_s1fmt == 1 || ste_s1fmt == 2)) {
+        error_report("STE inconsistent, CD mismatch");
+        return false;
+    }
+    if (SMMU_IDR0_ATS && ((_config & 0x3) == 0) &&
+        ((ste_eats == 2 && (_config != 0x7 || ste_s2s)) ||
+        (ste_eats == 1 && !ste_s2s))) {
+        error_report("STE inconsistent, EATS/S2S mismatch");
+        return false;
+    }
+    if (config[0] && (SMMU_IDR1_SIDSIZE &&
+        (ste_s1cdmax > SMMU_IDR1_SIDSIZE))) {
+        error_report("STE inconsistent, SSID out of range");
+        return false;
+    }
+
+    strw_unused = (!SMMU_IDR0_S1P || !SMMU_IDR0_HYP || (_config == 4));
+
+    addr_out_of_range = STE_S2TTB(ste) > MAX_PA(ste);
+
+    if (is_ste_stage2(ste)) {
+        if ((ste_s2aa64 && !is_s2granule_valid(ste)) ||
+            (!ste_s2aa64 && !(SMMU_IDR0_TTF & 0x1)) ||
+            (ste_s2aa64 && !(SMMU_IDR0_TTF & 0x2))  ||
+            ((STE_S2HA(ste) || STE_S2HD(ste)) && !ste_s2aa64) ||
+            ((STE_S2HA(ste) || STE_S2HD(ste)) && !SMMU_IDR0_HTTU) ||
+            (STE_S2HD(ste) && (SMMU_IDR0_HTTU == 1)) || addr_out_of_range) {
+            error_report("STE inconsistent");
+            trace_smmuv3_is_ste_consistent(config[1], granule_supported,
+                                           addr_out_of_range, ste_s2aa64,
+                                           STE_S2HA(ste), STE_S2HD(ste),
+                                           STE_S2TTB(ste));
+            return false;
+        }
+    }
+    if (SMMU_IDR0_S2P && (config[0] == 0 && config[1]) &&
+        (strw_unused || !ste_strw) && !SMMU_IDR0_VMID16 && (ste_vmid >> 8)) {
+        error_report("STE inconsistent, VMID out of range");
+        return false;
+    }
+
+    return true;
+}
+
+/**
+ * smmu_find_ste - Return the stream table entry associated
+ * with the sid
+ *
+ * @s: smmuv3 handle
+ * @sid: stream ID
+ * @ste: returned stream table entry
+ * Supports linear and 2-level stream tables
+ */
+static int smmu_find_ste(SMMUV3State *s, uint16_t sid, Ste *ste)
+{
+    hwaddr addr;
+
+    trace_smmuv3_find_ste(sid, s->features, s->sid_split);
+    /* Check SID range */
+    if (sid >= (1 << s->sid_size)) {
+        return SMMU_EVT_C_BAD_SID;
+    }
+    if (s->features & SMMU_FEATURE_2LVL_STE) {
+        int l1_ste_offset, l2_ste_offset, max_l2_ste, span;
+        hwaddr l1ptr, l2ptr;
+        STEDesc l1std;
+
+        l1_ste_offset = sid >> s->sid_split;
+        l2_ste_offset = sid & ((1 << s->sid_split) - 1);
+        l1ptr = (hwaddr)(s->strtab_base + l1_ste_offset * sizeof(l1std));
+        smmu_read_sysmem(l1ptr, &l1std, sizeof(l1std), false);
+        span = L1STD_SPAN(&l1std);
+
+        if (!span) {
+            /* l2ptr is not valid */
+            error_report("invalid sid=%d (L1STD span=0)", sid);
+            return SMMU_EVT_C_BAD_SID;
+        }
+        max_l2_ste = (1 << span) - 1;
+        l2ptr = L1STD_L2PTR(&l1std);
+        trace_smmuv3_find_ste_2lvl(s->strtab_base, l1ptr, l1_ste_offset,
+                                   l2ptr, l2_ste_offset, max_l2_ste);
+        if (l2_ste_offset > max_l2_ste) {
+            error_report("l2_ste_offset=%d > max_l2_ste=%d",
+                         l2_ste_offset, max_l2_ste);
+            return SMMU_EVT_C_BAD_STE;
+        }
+        addr = l2ptr + l2_ste_offset * sizeof(*ste);
+    } else {
+        addr = s->strtab_base + sid * sizeof(*ste);
+    }
+
+    if (smmu_get_ste(s, addr, ste)) {
+        error_report("Unable to Fetch STE");
+        return SMMU_EVT_F_UUT;
+    }
+
+    return 0;
+}
+
+/**
+ * smmu_cfg_populate_s1 - Populate the stage 1 translation config
+ * from the context descriptor
+ */
+static int smmu_cfg_populate_s1(SMMUTransCfg *cfg, Cd *cd)
+{
+    bool s1a64 = CD_AARCH64(cd);
+    int epd0 = CD_EPD0(cd);
+    int tg;
+
+    cfg->stage   = 1;
+    tg           = epd0 ? CD_TG1(cd) : CD_TG0(cd);
+    cfg->tsz     = epd0 ? CD_T1SZ(cd) : CD_T0SZ(cd);
+    cfg->ttbr    = epd0 ? CD_TTB1(cd) : CD_TTB0(cd);
+    cfg->oas     = oas2bits(CD_IPS(cd));
+
+    if (s1a64) {
+        cfg->tsz = MIN(cfg->tsz, 39);
+        cfg->tsz = MAX(cfg->tsz, 16);
+    }
+    cfg->granule_sz = tg2granule(tg, epd0);
+
+    cfg->oas = MIN(oas2bits(SMMU_IDR5_OAS), cfg->oas);
+    /* Clear the TTBR bits above the output address size */
+    cfg->ttbr = extract64(cfg->ttbr, 0, cfg->oas);
+    cfg->aa64 = s1a64;
+    cfg->initial_level  = 4 - (64 - cfg->tsz - 4) / (cfg->granule_sz - 3);
+
+    trace_smmuv3_cfg_stage(cfg->stage, cfg->oas, cfg->tsz, cfg->ttbr,
+                           cfg->aa64, cfg->granule_sz, cfg->initial_level);
+
+    return 0;
+}
+
+/**
+ * smmu_cfg_populate_s2 - Populate the stage 2 translation config
+ * from the Stream Table Entry
+ */
+static int smmu_cfg_populate_s2(SMMUTransCfg *cfg, Ste *ste)
+{
+    bool s2a64 = STE_S2AA64(ste);
+    int default_initial_level;
+    int tg;
+
+    cfg->stage = 2;
+
+    tg           = STE_S2TG(ste);
+    cfg->tsz     = STE_S2T0SZ(ste);
+    cfg->ttbr    = STE_S2TTB(ste);
+    cfg->oas     = pa_range(ste);
+
+    cfg->aa64    = s2a64;
+
+    if (s2a64) {
+        cfg->tsz = MIN(cfg->tsz, 39);
+        cfg->tsz = MAX(cfg->tsz, 16);
+    }
+    cfg->granule_sz = tg2granule(tg, 0);
+
+    cfg->oas = MIN(oas2bits(SMMU_IDR5_OAS), cfg->oas);
+    /* Clear the TTBR bits above the output address size */
+    cfg->ttbr = extract64(cfg->ttbr, 0, cfg->oas);
+
+    default_initial_level = 4 - (64 - cfg->tsz - 4) / (cfg->granule_sz - 3);
+    cfg->initial_level = (cfg->granule_sz == 12 ? 2 : 3) - STE_S2SL0(ste);
+    if (cfg->initial_level != default_initial_level) {
+        error_report("%s concatenated translation tables at initial S2 lookup"
+                     " not supported", __func__);
+        return -1;
+    }
+
+    trace_smmuv3_cfg_stage(cfg->stage, cfg->oas, cfg->tsz, cfg->ttbr,
+                           cfg->aa64, cfg->granule_sz, cfg->initial_level);
+
+    return 0;
+}
+
+static MemTxResult smmu_write_evtq(SMMUV3State *s, Evt *evt)
+{
+    SMMUQueue *q = &s->evtq;
+    MemTxResult ret = smmu_q_write(q, evt);
+    uint32_t val = 0;
+
+    val |= (q->wrap.prod << q->shift) | q->prod;
+
+    smmu_write32_reg(s, SMMU_REG_EVTQ_PROD, val);
+
+    return ret;
+}
+
+/*
+ * Events created on the EventQ
+ */
+static void smmu_create_event(SMMUV3State *s, hwaddr iova,
+                              uint32_t sid, bool is_write, int error)
+{
+    SMMUQueue *q = &s->evtq;
+    uint64_t head;
+    Evt evt = {};
+
+    if (!smmu_evt_q_enabled(s)) {
+        return;
+    }
+
+    EVT_SET_TYPE(&evt, error);
+    EVT_SET_SID(&evt, sid);
+
+    switch (error) {
+    case SMMU_EVT_F_UUT:
+    case SMMU_EVT_C_BAD_STE:
+        break;
+    case SMMU_EVT_C_BAD_CD:
+    case SMMU_EVT_F_CD_FETCH:
+        break;
+    case SMMU_EVT_F_TRANS_FORBIDDEN:
+    case SMMU_EVT_F_WALK_EXT_ABRT:
+        EVT_SET_INPUT_ADDR(&evt, iova);
+    default:
+        break;
+    }
+
+    smmu_write_evtq(s, &evt);
+
+    head = Q_IDX(q, q->prod);
+
+    if (smmu_is_q_full(s, &s->evtq)) {
+        head = q->prod ^ (1 << 31);     /* Set overflow */
+    }
+
+    smmu_write32_reg(s, SMMU_REG_EVTQ_PROD, head);
+
+    smmu_irq_raise(s, SMMU_IRQ_EVTQ, 0);
+}
+
+/**
+ * smmuv3_decode_config - Prepare the translation configuration
+ * for the @mr iommu region
+ * @mr: iommu memory region the translation config must be prepared for
+ * @cfg: output translation configuration
+ *
+ * return 0 on success or error code on failure
+ */
+static int smmuv3_decode_config(MemoryRegion *mr, SMMUTransCfg *cfg)
+{
+    SMMUDevice *sdev = container_of(mr, SMMUDevice, iommu);
+    int sid = smmu_get_sid(sdev);
+    SMMUV3State *s = sdev->smmu;
+    Ste ste;
+    Cd cd;
+    int ret = 0;
+
+    if (!smmu_enabled(s)) {
+        cfg->disabled = true;
+        return 0;
+    }
+    ret = smmu_find_ste(s, sid, &ste);
+    if (ret) {
+        return ret;
+    }
+
+    if (!STE_VALID(&ste)) {
+        return SMMU_EVT_C_BAD_STE;
+    }
+
+    switch (STE_CONFIG(&ste)) {
+    case STE_CONFIG_BYPASS:
+        cfg->bypassed = true;
+        return 0;
+    case STE_CONFIG_S1:
+        break;
+    case STE_CONFIG_S2:
+        break;
+    default: /* reserved, abort, nested */
+        return -1;
+    }
+
+    /* S1 or S2 */
+
+    if (!is_ste_consistent(s, &ste)) {
+        return SMMU_EVT_C_BAD_STE;
+    }
+
+    if (is_ste_stage1(&ste)) {
+        ret = smmu_get_cd(s, &ste, 0, &cd); /* No SSID support yet */
+        if (ret) {
+            return ret;
+        }
+
+        if (!is_cd_valid(s, &ste, &cd)) {
+            return SMMU_EVT_C_BAD_CD;
+        }
+        return smmu_cfg_populate_s1(cfg, &cd);
+    }
+
+    return smmu_cfg_populate_s2(cfg, &ste);
+}
+
+static IOMMUTLBEntry smmuv3_translate(MemoryRegion *mr, hwaddr addr,
+                                      IOMMUAccessFlags flag)
+{
+    SMMUDevice *sdev = container_of(mr, SMMUDevice, iommu);
+    SMMUV3State *s = sdev->smmu;
+    SMMUState *sys = SMMU_SYS_DEV(s);
+    bool is_write = flag & IOMMU_WO;
+    uint16_t sid = smmu_get_sid(sdev);
+    SMMUEvtErr ret;
+    SMMUTransCfg cfg = {};
+    IOMMUTLBEntry entry = {
+        .target_as = &address_space_memory,
+        .iova = addr,
+        .translated_addr = addr,
+        .addr_mask = ~(hwaddr)0,
+        .perm = IOMMU_NONE,
+    };
+
+    ret = smmuv3_decode_config(mr, &cfg);
+    if (ret || cfg.disabled || cfg.bypassed) {
+        goto out;
+    }
+
+    ret = smmu_page_walk(sys, &cfg, &entry, is_write);
+    if (!ret) {
+        entry.perm = is_write ? IOMMU_RW : IOMMU_RO;
+        trace_smmuv3_translate_ok(mr->name, sid, addr,
+                                  entry.translated_addr, entry.perm);
+    }
+out:
+    if (ret) {
+        error_report("%s translation failed for iova=0x%"PRIx64,
+                     mr->name, addr);
+        smmu_create_event(s, entry.iova, sid, is_write, ret);
+    }
+    return entry;
+}
+
+
+static inline void smmu_update_base_reg(SMMUV3State *s, uint64_t *base,
+                                        uint64_t val)
+{
+    *base = val & ~(SMMU_BASE_RA | 0x3fULL);
+}
+
+static void smmu_update_qreg(SMMUV3State *s, SMMUQueue *q, hwaddr reg,
+                             uint32_t off, uint64_t val, unsigned size)
+{
+    if (size == 8 && off == 0) {
+        smmu_write64_reg(s, reg, val);
+    } else {
+        smmu_write_reg(s, reg, val);
+    }
+
+    switch (off) {
+    case 0:                             /* BASE register */
+        val = smmu_read64_reg(s, reg);
+        q->shift = val & 0x1f;
+        q->entries = 1 << (q->shift);
+        smmu_update_base_reg(s, &q->base, val);
+        break;
+
+    case 8:                             /* PROD */
+        q->prod = Q_IDX(q, val);
+        q->wrap.prod = val >> q->shift;
+        break;
+
+    case 12:                             /* CONS */
+        q->cons = Q_IDX(q, val);
+        q->wrap.cons = val >> q->shift;
+        trace_smmuv3_update_qreg(q->cons, val);
+        break;
+
+    }
+
+    switch (reg) {
+    case SMMU_REG_CMDQ_PROD:            /* should be only for CMDQ_PROD */
+    case SMMU_REG_CMDQ_CONS:            /* but we do it anyway */
+        smmu_update(s);
+        break;
+    }
+}
+
+static void smmu_write_mmio_fixup(SMMUV3State *s, hwaddr *addr)
+{
+    switch (*addr) {
+    case 0x100a8: case 0x100ac:         /* Aliasing => page0 registers */
+    case 0x100c8: case 0x100cc:
+        *addr ^= (hwaddr)0x10000;
+    }
+}
+
+static void smmu_write_mmio(void *opaque, hwaddr addr,
+                            uint64_t val, unsigned size)
+{
+    SMMUState *sys = opaque;
+    SMMUV3State *s = SMMU_V3_DEV(sys);
+    bool update = false;
+
+    smmu_write_mmio_fixup(s, &addr);
+
+    trace_smmuv3_write_mmio(addr, val);
+
+    switch (addr) {
+    case 0xFDC ... 0xFFC:
+    case SMMU_REG_IDR0 ... SMMU_REG_IDR5:
+        trace_smmuv3_write_mmio_idr(addr, val);
+        return;
+
+    case SMMU_REG_GERRORN:
+        smmu_update_irq(s, addr, val);
+        return;
+
+    case SMMU_REG_CR0:
+        smmu_write32_reg(s, SMMU_REG_CR0, val);
+        smmu_write32_reg(s, SMMU_REG_CR0_ACK, val);
+        update = true;
+        break;
+
+    case SMMU_REG_IRQ_CTRL:
+        smmu_write32_reg(s, SMMU_REG_IRQ_CTRL_ACK, val);
+        update = true;
+        break;
+
+    case SMMU_REG_STRTAB_BASE:
+        smmu_update_base_reg(s, &s->strtab_base, val);
+        return;
+
+    case SMMU_REG_STRTAB_BASE_CFG:
+        if (((val >> 16) & 0x3) == 0x1) {
+            s->sid_split = (val >> 6) & 0x1f;
+            s->features |= SMMU_FEATURE_2LVL_STE;
+        }
+        break;
+
+    case SMMU_REG_CMDQ_PROD:
+    case SMMU_REG_CMDQ_CONS:
+    case SMMU_REG_CMDQ_BASE:
+    case SMMU_REG_CMDQ_BASE + 4:
+        smmu_update_qreg(s, &s->cmdq, addr, addr - SMMU_REG_CMDQ_BASE,
+                         val, size);
+        return;
+
+    case SMMU_REG_EVTQ_CONS:            /* fallthrough */
+    {
+        SMMUQueue *evtq = &s->evtq;
+        evtq->cons = Q_IDX(evtq, val);
+        evtq->wrap.cons = Q_WRAP(evtq, val);
+
+        trace_smmuv3_write_mmio_evtq_cons_bef_clear(evtq->prod, evtq->cons,
+                                                    evtq->wrap.prod,
+                                                    evtq->wrap.cons);
+        if (smmu_is_q_empty(s, &s->evtq)) {
+            trace_smmuv3_write_mmio_evtq_cons_after_clear(evtq->prod,
+                                                          evtq->cons,
+                                                          evtq->wrap.prod,
+                                                          evtq->wrap.cons);
+            qemu_irq_lower(s->irq[SMMU_IRQ_EVTQ]);
+        }
+    }
+    case SMMU_REG_EVTQ_BASE:
+    case SMMU_REG_EVTQ_BASE + 4:
+    case SMMU_REG_EVTQ_PROD:
+        smmu_update_qreg(s, &s->evtq, addr, addr - SMMU_REG_EVTQ_BASE,
+                         val, size);
+        return;
+
+    case SMMU_REG_PRIQ_CONS:
+    case SMMU_REG_PRIQ_BASE:
+    case SMMU_REG_PRIQ_BASE + 4:
+    case SMMU_REG_PRIQ_PROD:
+        smmu_update_qreg(s, &s->priq, addr, addr - SMMU_REG_PRIQ_BASE,
+                         val, size);
+        return;
+    }
+
+    if (size == 8) {
+        smmu_write_reg(s, addr, val);
+    } else {
+        smmu_write32_reg(s, addr, (uint32_t)val);
+    }
+
+    if (update) {
+        smmu_update(s);
+    }
+}
+
+static uint64_t smmu_read_mmio(void *opaque, hwaddr addr, unsigned size)
+{
+    SMMUState *sys = opaque;
+    SMMUV3State *s = SMMU_V3_DEV(sys);
+    uint64_t val;
+
+    smmu_write_mmio_fixup(s, &addr);
+
+    /* Primecell/Corelink ID registers */
+    switch (addr) {
+    case 0xFF0 ... 0xFFC:
+    case 0xFDC ... 0xFE4:
+        val = 0;
+        error_report("addr:0x%"PRIx64" val:0x%"PRIx64, addr, val);
+        break;
+
+    default:
+        val = (uint64_t)smmu_read32_reg(s, addr);
+        break;
+
+    case SMMU_REG_STRTAB_BASE ... SMMU_REG_CMDQ_BASE:
+    case SMMU_REG_EVTQ_BASE:
+    case SMMU_REG_PRIQ_BASE ... SMMU_REG_PRIQ_IRQ_CFG1:
+        val = smmu_read64_reg(s, addr);
+        break;
+    }
+
+    trace_smmuv3_read_mmio(addr, val, s->cmdq.cons);
+    return val;
+}
+
+static const MemoryRegionOps smmu_mem_ops = {
+    .read = smmu_read_mmio,
+    .write = smmu_write_mmio,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .valid = {
+        .min_access_size = 4,
+        .max_access_size = 8,
+    },
+};
+
+static void smmu_init_irq(SMMUV3State *s, SysBusDevice *dev)
+{
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(s->irq); i++) {
+        sysbus_init_irq(dev, &s->irq[i]);
+    }
+}
+
+static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
+{
+    SMMUState *s = opaque;
+    uintptr_t key = (uintptr_t)bus;
+    SMMUPciBus *sbus = g_hash_table_lookup(s->smmu_as_by_busptr, &key);
+    SMMUDevice *sdev;
+
+    if (!sbus) {
+        uintptr_t *new_key = g_malloc(sizeof(*new_key));
+
+        *new_key = (uintptr_t)bus;
+        sbus = g_malloc0(sizeof(SMMUPciBus) +
+                         sizeof(SMMUDevice *) * SMMU_PCI_DEVFN_MAX);
+        sbus->bus = bus;
+        g_hash_table_insert(s->smmu_as_by_busptr, new_key, sbus);
+    }
+
+    sdev = sbus->pbdev[devfn];
+    if (!sdev) {
+        char *name = g_strdup_printf("%s-%d-%d", TYPE_SMMU_V3_DEV,
+                                      pci_bus_num(bus), devfn);
+        sdev = sbus->pbdev[devfn] = g_malloc0(sizeof(SMMUDevice));
+
+        sdev->smmu = s;
+        sdev->bus = bus;
+        sdev->devfn = devfn;
+
+        memory_region_init_iommu(&sdev->iommu, OBJECT(s),
+                                 &s->iommu_ops, name, 1ULL << 48);
+        address_space_init(&sdev->as, &sdev->iommu, TYPE_SMMU_V3_DEV);
+    }
+
+    return &sdev->as;
+
+}
+
+static void smmu_init_iommu_as(SMMUV3State *sys)
+{
+    SMMUState *s = SMMU_SYS_DEV(sys);
+    PCIBus *pcibus = pci_find_primary_bus();
+
+    if (pcibus) {
+        pci_setup_iommu(pcibus, smmu_find_add_as, s);
+    } else {
+        error_report("No PCI bus, SMMU is not registered");
+    }
+}
+
+static void smmu_reset(DeviceState *dev)
+{
+    SMMUV3State *s = SMMU_V3_DEV(dev);
+    smmuv3_init(s);
+}
+
+static int smmu_populate_internal_state(void *opaque, int version_id)
+{
+    SMMUV3State *s = opaque;
+
+    smmu_update(s);
+    return 0;
+}
+
+static void smmu_realize(DeviceState *d, Error **errp)
+{
+    SMMUState *sys = SMMU_SYS_DEV(d);
+    SMMUV3State *s = SMMU_V3_DEV(sys);
+    SysBusDevice *dev = SYS_BUS_DEVICE(d);
+
+    sys->iommu_ops.translate = smmuv3_translate;
+    /* Register Access */
+    memset(sys->smmu_as_by_bus_num, 0, sizeof(sys->smmu_as_by_bus_num));
+    memory_region_init_io(&sys->iomem, OBJECT(s),
+                          &smmu_mem_ops, sys, TYPE_SMMU_V3_DEV, 0x20000);
+
+    sys->smmu_as_by_busptr = g_hash_table_new_full(smmu_uint64_hash,
+                                                   smmu_uint64_equal,
+                                                   g_free, g_free);
+    sysbus_init_mmio(dev, &sys->iomem);
+
+    smmuv3_init_queues(s);
+
+    smmu_init_irq(s, dev);
+
+    smmu_init_iommu_as(s);
+}
+
+static const VMStateDescription vmstate_smmuv3 = {
+    .name = "smmuv3",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .post_load = smmu_populate_internal_state,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT64_ARRAY(regs, SMMUV3State, SMMU_NREGS),
+        VMSTATE_END_OF_LIST(),
+    },
+};
+
+static void smmuv3_instance_init(Object *obj)
+{
+    /* Nothing much to do here as of now */
+}
+
+static void smmuv3_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->reset   = smmu_reset;
+    dc->vmsd    = &vmstate_smmuv3;
+    dc->realize = smmu_realize;
+}
+
+static const TypeInfo smmuv3_type_info = {
+    .name          = TYPE_SMMU_V3_DEV,
+    .parent        = TYPE_SMMU_DEV_BASE,
+    .instance_size = sizeof(SMMUV3State),
+    .instance_init = smmuv3_instance_init,
+    .class_data    = NULL,
+    .class_size    = sizeof(SMMUV3Class),
+    .class_init    = smmuv3_class_init,
+};
+
+static void smmuv3_register_types(void)
+{
+    type_register(&smmuv3_type_info);
+}
+
+type_init(smmuv3_register_types)
+
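The queue code above (smmu_q_read(), smmu_q_write(), smmu_is_q_empty(), smmu_is_q_full()) tracks each queue pointer as an index plus a wrap bit. For review purposes, here is a minimal standalone sketch of that arithmetic, assuming the SMMUv3 PROD/CONS layout in which the wrap flag sits at bit log2(entries), just above the index. QueueModel, q_idx(), q_wrap(), q_empty(), q_full() and q_inc() are hypothetical names, not part of this patch or of QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of one SMMUv3 queue (not QEMU's SMMUQueue): each
 * pointer holds the entry index in bits [shift-1:0] and the wrap flag
 * in bit [shift], as in the PROD/CONS register layout.
 */
typedef struct {
    uint32_t shift;   /* log2(number of entries) */
    uint32_t prod;    /* producer pointer: wrap flag | index */
    uint32_t cons;    /* consumer pointer: wrap flag | index */
} QueueModel;

static uint32_t q_idx(const QueueModel *q, uint32_t ptr)
{
    return ptr & ((1u << q->shift) - 1);
}

static uint32_t q_wrap(const QueueModel *q, uint32_t ptr)
{
    return (ptr >> q->shift) & 1;
}

/* Same index and same wrap flag: the consumer has caught up */
static bool q_empty(const QueueModel *q)
{
    return q_idx(q, q->prod) == q_idx(q, q->cons) &&
           q_wrap(q, q->prod) == q_wrap(q, q->cons);
}

/* Same index but different wrap flags: the producer is a full lap ahead */
static bool q_full(const QueueModel *q)
{
    return q_idx(q, q->prod) == q_idx(q, q->cons) &&
           q_wrap(q, q->prod) != q_wrap(q, q->cons);
}

/*
 * Advance a pointer by one entry. The carry out of the index lands in
 * the wrap flag, so the flag toggles each time the index rolls over to 0.
 */
static uint32_t q_inc(const QueueModel *q, uint32_t ptr)
{
    assert(q->shift < 31);
    return (ptr + 1) & ((2u << q->shift) - 1);
}
```

With shift = 2 (four entries), four q_inc() calls on prod starting from 0 give prod = 0x4 (index 0, wrap 1), so the queue reads as full; one q_inc() on cons then leaves it neither full nor empty. Keeping the wrap flag next to the index means both sides can use the same increment-then-wrap step that smmu_q_read() uses.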
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index 7a92f8c..30a817b 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -16,3 +16,37 @@ smmu_get_pte(uint64_t baseaddr, int index, uint64_t pteaddr, uint64_t pte) "base
 smmu_set_translated_address(hwaddr iova, hwaddr pa) "iova = 0x%"PRIx64" -> pa = 0x%"PRIx64
 smmu_walk_pgtable(hwaddr iova, bool is_write) "Input addr: 0x%"PRIx64", is_write=%d"
 smmu_walk_pgtable_out(hwaddr addr, uint32_t mask, int perm) "DONE: o/p addr:0x%"PRIx64" mask:0x%x perm:%d"
+
+#hw/arm/smmuv3.c
+smmuv3_irq_update(uint32_t error, uint32_t gerror, uint32_t gerrorn) "<<<< error:0x%x gerror:0x%x gerrorn:0x%x"
+smmuv3_irq_raise(int irq) "irq:%d"
+smmuv3_unhandled_cmd(uint32_t type) "Unhandled command type=%d"
+smmuv3_cmdq_consume(int error, bool enabled, uint32_t prod, uint32_t cons, uint8_t wrap_prod, uint8_t wrap_cons) "error=%d, enabled=%d prod=%d cons=%d wrap.prod=%d wrap.cons=%d"
+smmuv3_cmdq_consume_details(hwaddr base, uint32_t cons, uint32_t prod, uint32_t word, uint8_t wrap_cons) "CMDQ base: 0x%"PRIx64" cons:%d prod:%d val:0x%x wrap:%d"
+smmuv3_cmdq_opcode(const char *opcode) "<--- %s"
+smmuv3_cmdq_cfgi_ste(int streamid) "     |_ streamid =%d"
+smmuv3_cmdq_cfgi_ste_range(uint32_t start, uint32_t end) "     |_ start=0x%x - end=0x%x"
+smmuv3_cmdq_tlbi_nh_va(int asid, int vmid, uint64_t addr) "     |_ asid =%d vmid =%d addr=0x%"PRIx64
+smmuv3_cmdq_consume_sev(void) "CMD_SYNC CS=SEV not supported, ignoring"
+smmuv3_cmdq_consume_out(uint8_t prod_wrap, uint32_t prod, uint8_t cons_wrap, uint32_t cons) "prod_wrap:%d, prod:0x%x cons_wrap:%d cons:0x%x"
+smmuv3_update(bool is_empty, uint32_t prod, uint32_t cons, uint8_t prod_wrap, uint8_t cons_wrap) "q empty:%d prod:%d cons:%d p.wrap:%d c.wrap:%d"
+smmuv3_update_check_cmd(int error) "cmdq not enabled or error: 0x%x"
+smmuv3_update_irq(bool is_pending, uint32_t gerror, uint32_t gerrorn) "irq pend: %d gerror:0x%x gerrorn:0x%x"
+smmuv3_is_ste_consistent(bool cfg, bool granule_supported, bool addr_oor, uint32_t aa64, int s2ha, int s2hd, uint64_t s2ttb ) "config[1]:%d gran:%d addr:%d aa64:%d s2ha:%d s2hd:%d s2ttb:0x%"PRIx64
+smmuv3_find_ste(uint16_t sid, uint32_t features, uint16_t sid_split) "SID:0x%x features:0x%x, sid_split:0x%x"
+smmuv3_find_ste_2lvl(uint64_t strtab_base, hwaddr l1ptr, int l1_ste_offset, hwaddr l2ptr, int l2_ste_offset, int max_l2_ste) "strtab_base:0x%"PRIx64" l1ptr:0x%"PRIx64" l1_off:0x%x, l2ptr:0x%"PRIx64" l2_off:0x%x max_l2_ste:%d"
+smmuv3_get_ste(hwaddr addr) "STE addr: 0x%"PRIx64
+smmuv3_translate_bypass(const char *n, uint16_t sid, hwaddr addr, bool is_write) "%s sid=%d bypass iova:0x%"PRIx64" is_write=%d"
+smmuv3_translate_in(uint16_t sid, int pci_bus_num, hwaddr strtab_base) "SID:0x%x bus:%d strtab_base:0x%"PRIx64
+smmuv3_get_cd(hwaddr addr) "CD addr: 0x%"PRIx64
+smmuv3_translate_ok(const char *n, uint16_t sid, hwaddr iova, hwaddr translated, int perm) "%s sid=%d iova=0x%"PRIx64" translated=0x%"PRIx64" perm=0x%x"
+smmuv3_update_qreg(uint32_t cons, uint64_t val) "cons written : %d val:0x%"PRIx64
+smmuv3_write_mmio(hwaddr addr, uint64_t val) "addr: 0x%"PRIx64" val:0x%"PRIx64
+smmuv3_write_mmio_idr(hwaddr addr, uint64_t val) "write to RO/Unimpl reg 0x%"PRIx64" val64:0x%"PRIx64
+smmuv3_write_mmio_evtq_cons_bef_clear(uint32_t prod, uint32_t cons, uint8_t prod_wrap, uint8_t cons_wrap) "Before clearing interrupt prod:0x%x cons:0x%x prod.w:%d cons.w:%d"
+smmuv3_write_mmio_evtq_cons_after_clear(uint32_t prod, uint32_t cons, uint8_t prod_wrap, uint8_t cons_wrap) "after clearing interrupt prod:0x%x cons:0x%x prod.w:%d cons.w:%d"
+smmuv3_read_mmio(hwaddr addr, uint64_t val, uint32_t cons) "addr: 0x%"PRIx64" val:0x%"PRIx64" cmdq cons:%d"
+smmuv3_dump_ste(int i, uint32_t word0, int j,  uint32_t word1) "STE[%2d]: %#010x\t STE[%2d]: %#010x"
+smmuv3_dump_cd(int i, uint32_t word0, int j,  uint32_t word1) "CD[%2d]: %#010x\t CD[%2d]: %#010x"
+smmuv3_dump_cmd(int i, uint32_t word0, int j,  uint32_t word1) "CMD[%2d]: %#010x\t CMD[%2d]: %#010x"
+smmuv3_cfg_stage(int s, uint32_t oas, uint32_t tsz, uint64_t ttbr, bool aa64, uint32_t granule_sz, int initial_level) "TransCFG stage:%d oas:%d tsz:%d ttbr:0x%"PRIx64"  aa64:%d granule_sz:%d, initial_level = %d"
diff --git a/include/hw/arm/smmuv3.h b/include/hw/arm/smmuv3.h
new file mode 100644
index 0000000..1f7d78e
--- /dev/null
+++ b/include/hw/arm/smmuv3.h
@@ -0,0 +1,87 @@
+/*
+ * Copyright (C) 2014-2016 Broadcom Corporation
+ * Copyright (c) 2017 Red Hat, Inc.
+ * Written by Prem Mallappa, Eric Auger
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_ARM_SMMUV3_H
+#define HW_ARM_SMMUV3_H
+
+#include "hw/arm/smmu-common.h"
+
+#define SMMU_NREGS            0x200
+
+typedef struct SMMUQueue {
+     hwaddr base;
+     uint32_t prod;
+     uint32_t cons;
+     union {
+          struct {
+               uint8_t prod:1;
+               uint8_t cons:1;
+          };
+          uint8_t unused;
+     } wrap;
+
+     uint16_t entries;           /* Number of entries */
+     uint8_t  ent_size;          /* Size of entry in bytes */
+     uint8_t  shift;             /* Size in log2 */
+} SMMUQueue;
+
+typedef struct SMMUV3State {
+    SMMUState     smmu_state;
+
+#define SMMU_FEATURE_2LVL_STE (1 << 0)
+    /* Local cache of most-frequently used register */
+    uint32_t     features;
+    uint16_t     sid_size;
+    uint16_t     sid_split;
+    uint64_t     strtab_base;
+
+    uint64_t    regs[SMMU_NREGS];
+
+    qemu_irq     irq[4];
+
+    SMMUQueue    cmdq, evtq, priq;
+
+    /* IOMMU Address space */
+    MemoryRegion iommu;
+    AddressSpace iommu_as;
+    /*
+     * Bus number is not populated in the beginning, hence we need
+     * a mechanism to retrieve the corresponding address space for each
+     * pci device.
+    */
+    GHashTable   *smmu_as_by_busptr;
+} SMMUV3State;
+
+typedef enum {
+    SMMU_IRQ_GERROR,
+    SMMU_IRQ_PRIQ,
+    SMMU_IRQ_EVTQ,
+    SMMU_IRQ_CMD_SYNC,
+} SMMUIrq;
+
+typedef struct {
+    SMMUBaseClass smmu_base_class;
+} SMMUV3Class;
+
+#define TYPE_SMMU_V3_DEV   "smmuv3"
+#define SMMU_V3_DEV(obj) OBJECT_CHECK(SMMUV3State, (obj), TYPE_SMMU_V3_DEV)
+#define SMMU_V3_DEVICE_GET_CLASS(obj)                              \
+    OBJECT_GET_CLASS(SMMUBaseClass, (obj), TYPE_SMMU_V3_DEV)
+
+#endif
-- 
2.5.5

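
The SMMUQueue structure in the smmuv3.h header above keeps one wrap bit next to each of the prod and cons indices. On real hardware the PROD/CONS registers pack that wrap bit immediately above the log2(size) index bits; SMMUQueue stores it separately, and so does the sketch below. It shows how such a pair is commonly used to tell a full queue from an empty one when both indices point at the same slot. This is an illustration under simplifying assumptions, not the emulation code: ToyQueue and the toy_queue_* helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified mirror of SMMUQueue (illustration only). */
typedef struct {
    uint32_t prod;       /* next slot the producer writes, in [0, entries) */
    uint32_t cons;       /* next slot the consumer reads, in [0, entries) */
    uint8_t prod_wrap;   /* toggled each time prod wraps back to slot 0 */
    uint8_t cons_wrap;   /* toggled each time cons wraps back to slot 0 */
    uint32_t entries;    /* number of slots, e.g. 1 << shift */
} ToyQueue;

/* Same slot, same lap: the consumer has caught up with the producer. */
static bool toy_queue_empty(const ToyQueue *q)
{
    return q->prod == q->cons && q->prod_wrap == q->cons_wrap;
}

/* Same slot, different lap: the producer is one full lap ahead. */
static bool toy_queue_full(const ToyQueue *q)
{
    return q->prod == q->cons && q->prod_wrap != q->cons_wrap;
}

/* Advance prod after writing one entry; toggles prod_wrap on wrap-around. */
static void toy_queue_produce(ToyQueue *q)
{
    assert(!toy_queue_full(q));
    if (++q->prod == q->entries) {
        q->prod = 0;
        q->prod_wrap ^= 1;
    }
}

/* Advance cons after reading one entry; toggles cons_wrap on wrap-around. */
static void toy_queue_consume(ToyQueue *q)
{
    assert(!toy_queue_empty(q));
    if (++q->cons == q->entries) {
        q->cons = 0;
        q->cons_wrap ^= 1;
    }
}
```

Without the wrap bits, prod == cons would be ambiguous; the usual alternative is to always leave one slot unused, which wastes an entry.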

* [Qemu-devel] [RFC v5 3/8] hw/arm/virt: Add SMMUv3 to the virt board
From: Eric Auger @ 2017-07-09 20:51 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, peter.maydell, qemu-arm, qemu-devel,
	alex.williamson, prem.mallappa
  Cc: drjones, christoffer.dall, Radha.Chintakuntla, Sunil.Goutham,
	mohun106, tcain, bharat.bhushan, tn, mst, will.deacon,
	jean-philippe.brucker, robin.murphy, peterx, edgar.iglesias

From: Prem Mallappa <prem.mallappa@broadcom.com>

Add code to instantiate an SMMUv3 in mach-virt. A new boolean flag,
smmu, is introduced in VirtMachineState to allow this instantiation.
It defaults to false and nothing sets it yet.

Signed-off-by: Prem Mallappa <prem.mallappa@broadcom.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v4 -> v5:
- add dma-coherent property

v2 -> v3:
- vbi was removed. Use vms instead
- migrate to new smmu binding format (iommu-map)
- don't use appendprop anymore
- add vms->smmu and guard instantiation with this latter
- interrupts type changed to edge
---
 hw/arm/virt.c         | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/arm/virt.h |  4 ++++
 2 files changed, 63 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 010f724..d3848ae 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -56,6 +56,7 @@
 #include "hw/smbios/smbios.h"
 #include "qapi/visitor.h"
 #include "standard-headers/linux/input.h"
+#include "hw/arm/smmuv3.h"
 
 #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
     static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
@@ -139,6 +140,7 @@ static const MemMapEntry a15memmap[] = {
     [VIRT_FW_CFG] =             { 0x09020000, 0x00000018 },
     [VIRT_GPIO] =               { 0x09030000, 0x00001000 },
     [VIRT_SECURE_UART] =        { 0x09040000, 0x00001000 },
+    [VIRT_SMMU] =               { 0x09050000, 0x00020000 }, /* 128K, needed */
     [VIRT_MMIO] =               { 0x0a000000, 0x00000200 },
     /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
     [VIRT_PLATFORM_BUS] =       { 0x0c000000, 0x02000000 },
@@ -159,6 +161,7 @@ static const int a15irqmap[] = {
     [VIRT_SECURE_UART] = 8,
     [VIRT_MMIO] = 16, /* ...to 16 + NUM_VIRTIO_TRANSPORTS - 1 */
     [VIRT_GIC_V2M] = 48, /* ...to 48 + NUM_GICV2M_SPIS - 1 */
+    [VIRT_SMMU] = 74,    /* ...to 74 + NUM_SMMU_IRQS - 1 */
     [VIRT_PLATFORM_BUS] = 112, /* ...to 112 + PLATFORM_BUS_NUM_IRQS -1 */
 };
 
@@ -991,6 +994,53 @@ static void create_pcie_irq_map(const VirtMachineState *vms,
                            0x7           /* PCI irq */);
 }
 
+static void alloc_smmu_phandle(VirtMachineState *vms)
+{
+    if (vms->smmu && !vms->smmu_phandle) {
+        vms->smmu_phandle = qemu_fdt_alloc_phandle(vms->fdt);
+    }
+}
+
+static void create_smmu(const VirtMachineState *vms, qemu_irq *pic)
+{
+    char *smmu;
+    const char compat[] = "arm,smmu-v3";
+    int irq =  vms->irqmap[VIRT_SMMU];
+    hwaddr base = vms->memmap[VIRT_SMMU].base;
+    hwaddr size = vms->memmap[VIRT_SMMU].size;
+    const char irq_names[] = "eventq\0priq\0cmdq-sync\0gerror";
+
+    if (!vms->smmu) {
+        return;
+    }
+
+    sysbus_create_varargs("smmuv3", base, pic[irq], pic[irq + 1],
+                          pic[irq + 2], pic[irq + 3], NULL);
+
+    smmu = g_strdup_printf("/smmuv3@%" PRIx64, base);
+    qemu_fdt_add_subnode(vms->fdt, smmu);
+    qemu_fdt_setprop(vms->fdt, smmu, "compatible", compat, sizeof(compat));
+    qemu_fdt_setprop_sized_cells(vms->fdt, smmu, "reg", 2, base, 2, size);
+
+    qemu_fdt_setprop_cells(vms->fdt, smmu, "interrupts",
+            GIC_FDT_IRQ_TYPE_SPI, irq    , GIC_FDT_IRQ_FLAGS_EDGE_LO_HI,
+            GIC_FDT_IRQ_TYPE_SPI, irq + 1, GIC_FDT_IRQ_FLAGS_EDGE_LO_HI,
+            GIC_FDT_IRQ_TYPE_SPI, irq + 2, GIC_FDT_IRQ_FLAGS_EDGE_LO_HI,
+            GIC_FDT_IRQ_TYPE_SPI, irq + 3, GIC_FDT_IRQ_FLAGS_EDGE_LO_HI);
+
+    qemu_fdt_setprop(vms->fdt, smmu, "interrupt-names", irq_names,
+                     sizeof(irq_names));
+
+    qemu_fdt_setprop_cell(vms->fdt, smmu, "clocks", vms->clock_phandle);
+    qemu_fdt_setprop_string(vms->fdt, smmu, "clock-names", "apb_pclk");
+    qemu_fdt_setprop(vms->fdt, smmu, "dma-coherent", NULL, 0);
+
+    qemu_fdt_setprop_cell(vms->fdt, smmu, "#iommu-cells", 1);
+
+    qemu_fdt_setprop_cell(vms->fdt, smmu, "phandle", vms->smmu_phandle);
+    g_free(smmu);
+}
+
 static void create_pcie(const VirtMachineState *vms, qemu_irq *pic)
 {
     hwaddr base_mmio = vms->memmap[VIRT_PCIE_MMIO].base;
@@ -1103,6 +1153,11 @@ static void create_pcie(const VirtMachineState *vms, qemu_irq *pic)
     qemu_fdt_setprop_cell(vms->fdt, nodename, "#interrupt-cells", 1);
     create_pcie_irq_map(vms, vms->gic_phandle, irq, nodename);
 
+    if (vms->smmu) {
+        qemu_fdt_setprop_cells(vms->fdt, nodename, "iommu-map",
+                               0x0, vms->smmu_phandle, 0x0, 0x10000);
+    }
+
     g_free(nodename);
 }
 
@@ -1448,8 +1503,12 @@ static void machvirt_init(MachineState *machine)
 
     create_rtc(vms, pic);
 
+    alloc_smmu_phandle(vms);
+
     create_pcie(vms, pic);
 
+    create_smmu(vms, pic);
+
     create_gpio(vms, pic);
 
     /* Create mmio transports, so the user can create virtio backends
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 33b0ff3..164a531 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -38,6 +38,7 @@
 
 #define NUM_GICV2M_SPIS       64
 #define NUM_VIRTIO_TRANSPORTS 32
+#define NUM_SMMU_IRQS          4
 
 #define ARCH_GICV3_MAINT_IRQ  9
 
@@ -59,6 +60,7 @@ enum {
     VIRT_GIC_V2M,
     VIRT_GIC_ITS,
     VIRT_GIC_REDIST,
+    VIRT_SMMU,
     VIRT_UART,
     VIRT_MMIO,
     VIRT_RTC,
@@ -95,6 +97,7 @@ typedef struct {
     bool highmem;
     bool its;
     bool virt;
+    bool smmu;
     int32_t gic_version;
     struct arm_boot_info bootinfo;
     const MemMapEntry *memmap;
@@ -105,6 +108,7 @@ typedef struct {
     uint32_t clock_phandle;
     uint32_t gic_phandle;
     uint32_t msi_phandle;
+    uint32_t smmu_phandle;
     int psci_conduit;
 } VirtMachineState;
 
-- 
2.5.5

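
create_pcie() above sets iommu-map to <0x0 smmu_phandle 0x0 0x10000> on the PCIe host bridge node. Under the generic PCI iommu-map device tree binding, each (rid-base, iommu, iommu-base, length) tuple associates any RID r in [rid-base, rid-base + length) with the given IOMMU and the specifier r - rid-base + iommu-base, which for the smmuv3 binding (#iommu-cells = 1) is the stream ID. So with these values every RID maps to the identical stream ID. A minimal lookup sketch follows; IommuMapEntry and rid_to_sid() are hypothetical names, and the phandle 0x8001 used in the checks is an arbitrary example value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One (rid-base, iommu, iommu-base, length) tuple of an iommu-map property. */
typedef struct {
    uint32_t rid_base;
    uint32_t iommu_phandle;
    uint32_t iommu_base;
    uint32_t length;        /* number of RIDs covered (a count, not count - 1) */
} IommuMapEntry;

/*
 * Look up the IOMMU phandle and stream ID for a PCI requester ID.
 * Returns false when no tuple covers the RID.
 */
static bool rid_to_sid(const IommuMapEntry *map, size_t n, uint32_t rid,
                       uint32_t *phandle, uint32_t *sid)
{
    for (size_t i = 0; i < n; i++) {
        if (rid >= map[i].rid_base && rid - map[i].rid_base < map[i].length) {
            *phandle = map[i].iommu_phandle;
            *sid = rid - map[i].rid_base + map[i].iommu_base;
            return true;
        }
    }
    return false;
}
```

For example, RID 0x0108 (bus 1, device 1, function 0) yields stream ID 0x0108. Note that length is an element count here, whereas the IORT ID mappings built for ACPI store the number of IDs minus one (hence 0xFFFF there).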

* [Qemu-devel] [RFC v5 4/8] hw/arm/virt: Add 2.10 machine type
From: Eric Auger @ 2017-07-09 20:51 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, peter.maydell, qemu-arm, qemu-devel,
	alex.williamson, prem.mallappa
  Cc: drjones, christoffer.dall, Radha.Chintakuntla, Sunil.Goutham,
	mohun106, tcain, bharat.bhushan, tn, mst, will.deacon,
	jean-philippe.brucker, robin.murphy, peterx, edgar.iglesias

The new machine type allows smmuv3 instantiation. A new option
is introduced to turn the feature on/off (off by default).

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

An alternative would be to use the -device option, as done on x86.
As the SMMU is a sysbus device, we would need to use the platform
bus framework. This would work fine for DT generation; however, its
feasibility for ACPI table generation still needs to be studied.

---
 hw/arm/virt.c         | 51 +++++++++++++++++++++++++++++++++++++++++++++++++--
 include/hw/arm/virt.h |  1 +
 2 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index d3848ae..3651e41 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1543,6 +1543,20 @@ static void machvirt_init(MachineState *machine)
     create_platform_bus(vms, pic);
 }
 
+static bool virt_get_smmu(Object *obj, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    return vms->smmu;
+}
+
+static void virt_set_smmu(Object *obj, bool value, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    vms->smmu = value;
+}
+
 static bool virt_get_secure(Object *obj, Error **errp)
 {
     VirtMachineState *vms = VIRT_MACHINE(obj);
@@ -1698,7 +1712,7 @@ static void machvirt_machine_init(void)
 }
 type_init(machvirt_machine_init);
 
-static void virt_2_9_instance_init(Object *obj)
+static void virt_2_10_instance_init(Object *obj)
 {
     VirtMachineState *vms = VIRT_MACHINE(obj);
     VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
@@ -1754,14 +1768,47 @@ static void virt_2_9_instance_init(Object *obj)
                                         NULL);
     }
 
+    if (vmc->no_smmu) {
+        vms->smmu = false;
+    } else {
+        /* Default disallows smmu instantiation */
+        vms->smmu = false;
+        object_property_add_bool(obj, "smmu", virt_get_smmu,
+                                 virt_set_smmu, NULL);
+        object_property_set_description(obj, "smmu",
+                                        "Set on/off to enable/disable "
+                                        "smmu instantiation (default off)",
+                                        NULL);
+    }
+
     vms->memmap = a15memmap;
     vms->irqmap = a15irqmap;
 }
 
+static void virt_machine_2_10_options(MachineClass *mc)
+{
+}
+DEFINE_VIRT_MACHINE_AS_LATEST(2, 10)
+
+#define VIRT_COMPAT_2_9 \
+    HW_COMPAT_2_9
+
+static void virt_2_9_instance_init(Object *obj)
+{
+    virt_2_10_instance_init(obj);
+}
+
 static void virt_machine_2_9_options(MachineClass *mc)
 {
+    VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
+
+    virt_machine_2_10_options(mc);
+    SET_MACHINE_COMPAT(mc, VIRT_COMPAT_2_9);
+
+    vmc->no_smmu = true;
 }
-DEFINE_VIRT_MACHINE_AS_LATEST(2, 9)
+DEFINE_VIRT_MACHINE(2, 9)
+
 
 #define VIRT_COMPAT_2_8 \
     HW_COMPAT_2_8
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 164a531..cd2c82e 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -86,6 +86,7 @@ typedef struct {
     bool disallow_affinity_adjustment;
     bool no_its;
     bool no_pmu;
+    bool no_smmu;
     bool claim_edge_triggered_timers;
 } VirtMachineClass;
 
-- 
2.5.5


* [Qemu-devel] [RFC v5 5/8] hw/arm/virt-acpi-build: Add smmuv3 node in IORT table
From: Eric Auger @ 2017-07-09 20:51 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, peter.maydell, qemu-arm, qemu-devel,
	alex.williamson, prem.mallappa
  Cc: drjones, christoffer.dall, Radha.Chintakuntla, Sunil.Goutham,
	mohun106, tcain, bharat.bhushan, tn, mst, will.deacon,
	jean-philippe.brucker, robin.murphy, peterx, edgar.iglesias

From: Prem Mallappa <prem.mallappa@broadcom.com>

This patch builds the smmuv3 node in the ACPI IORT table.

The RID space of the root complex, which spans 0x0-0xFFFF, maps to
the stream ID space 0x0-0xFFFF of the SMMUv3, which in turn maps to
the device ID space 0x0-0xFFFF of the ITS group.
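
The identity chain described above (root complex -> SMMUv3 -> ITS) can be sketched as two successive IORT ID mapping lookups. The IORT specification stores the number of IDs in a range minus one, which is why the patch writes id_count = 0xFFFF to cover 0x10000 IDs. IortIdRange, iort_map_id() and rid_to_its_devid() are hypothetical simplifications of AcpiIortIdMapping, with fields kept in host byte order rather than little-endian.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, host-endian view of one IORT ID mapping entry. */
typedef struct {
    uint32_t input_base;
    uint32_t id_count;     /* number of IDs in the range minus one */
    uint32_t output_base;
} IortIdRange;

/* Translate id through one mapping; false if id is outside its input range. */
static bool iort_map_id(const IortIdRange *m, uint32_t id, uint32_t *out)
{
    if (id < m->input_base || id - m->input_base > m->id_count) {
        return false;
    }
    *out = m->output_base + (id - m->input_base);
    return true;
}

/* RC -> SMMUv3 -> ITS, both hops using the identity mapping from the patch. */
static bool rid_to_its_devid(uint32_t rid, uint32_t *sid, uint32_t *devid)
{
    static const IortIdRange rc_to_smmu  = { 0, 0xFFFF, 0 };
    static const IortIdRange smmu_to_its = { 0, 0xFFFF, 0 };

    return iort_map_id(&rc_to_smmu, rid, sid) &&
           iort_map_id(&smmu_to_its, *sid, devid);
}
```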

The guest must feature the IOMMU probe deferral series
(https://lkml.org/lkml/2017/4/10/214), which fixes multiple stream ID
lookups. This bug is not related to the SMMU emulation.

Signed-off-by: Prem Mallappa <prem.mallappa@broadcom.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v2 -> v3:
- integrate into the existing IORT table made up of ITS, RC nodes
- take into account vms->smmu
- match linux actbl2.h acpi_iort_smmu_v3 field names
---
 hw/arm/virt-acpi-build.c    | 56 +++++++++++++++++++++++++++++++++++++++------
 include/hw/acpi/acpi-defs.h | 15 ++++++++++++
 2 files changed, 64 insertions(+), 7 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 3d78ff6..ac2cd3e 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -393,19 +393,26 @@ build_rsdp(GArray *rsdp_table, BIOSLinker *linker, unsigned xsdt_tbl_offset)
 }
 
 static void
-build_iort(GArray *table_data, BIOSLinker *linker)
+build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 {
-    int iort_start = table_data->len;
+    int nb_nodes, iort_start = table_data->len;
     AcpiIortIdMapping *idmap;
     AcpiIortItsGroup *its;
     AcpiIortTable *iort;
-    size_t node_size, iort_length;
+    AcpiIortSmmu3 *smmu;
+    size_t node_size, iort_length, smmu_offset = 0;
     AcpiIortRC *rc;
 
     iort = acpi_data_push(table_data, sizeof(*iort));
 
+    if (vms->smmu) {
+        nb_nodes = 3; /* RC, ITS, SMMUv3 */
+    } else {
+        nb_nodes = 2; /* RC, ITS */
+    }
+
     iort_length = sizeof(*iort);
-    iort->node_count = cpu_to_le32(2); /* RC and ITS nodes */
+    iort->node_count = cpu_to_le32(nb_nodes);
     iort->node_offset = cpu_to_le32(sizeof(*iort));
 
     /* ITS group node */
@@ -418,6 +425,35 @@ build_iort(GArray *table_data, BIOSLinker *linker)
     its->its_count = cpu_to_le32(1);
     its->identifiers[0] = 0; /* MADT translation_id */
 
+    if (vms->smmu) {
+        int irq = vms->irqmap[VIRT_SMMU] + ARM_SPI_BASE; /* GSIV = SPI + 32 */
+
+        /* SMMUv3 node */
+        smmu_offset = le32_to_cpu(iort->node_offset) + node_size;
+        node_size = sizeof(*smmu) + sizeof(*idmap);
+        iort_length += node_size;
+        smmu = acpi_data_push(table_data, node_size);
+
+
+        smmu->type = ACPI_IORT_NODE_SMMU_V3;
+        smmu->length = cpu_to_le16(node_size);
+        smmu->mapping_count = cpu_to_le32(1);
+        smmu->mapping_offset = cpu_to_le32(sizeof(*smmu));
+        smmu->base_address = cpu_to_le64(vms->memmap[VIRT_SMMU].base);
+        smmu->event_gsiv = cpu_to_le32(irq);
+        smmu->pri_gsiv = cpu_to_le32(irq + 1);
+        smmu->gerr_gsiv = cpu_to_le32(irq + 2);
+        smmu->sync_gsiv = cpu_to_le32(irq + 3);
+
+        /* Identity RID mapping covering the whole input RID range */
+        idmap = &smmu->id_mapping_array[0];
+        idmap->input_base = 0;
+        idmap->id_count = cpu_to_le32(0xFFFF);
+        idmap->output_base = 0;
+        /* output IORT node is the ITS group node (the first node) */
+        idmap->output_reference = cpu_to_le32(iort->node_offset);
+    }
+
     /* Root Complex Node */
     node_size = sizeof(*rc) + sizeof(*idmap);
     iort_length += node_size;
@@ -438,8 +474,14 @@ build_iort(GArray *table_data, BIOSLinker *linker)
     idmap->input_base = 0;
     idmap->id_count = cpu_to_le32(0xFFFF);
     idmap->output_base = 0;
-    /* output IORT node is the ITS group node (the first node) */
-    idmap->output_reference = cpu_to_le32(iort->node_offset);
+
+    if (vms->smmu) {
+        /* output IORT node is the smmuv3 node */
+        idmap->output_reference = cpu_to_le32(smmu_offset);
+    } else {
+        /* output IORT node is the ITS group node (the first node) */
+        idmap->output_reference = cpu_to_le32(iort->node_offset);
+    }
 
     iort->length = cpu_to_le32(iort_length);
 
@@ -782,7 +824,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
 
     if (its_class_name() && !vmc->no_its) {
         acpi_add_table(table_offsets, tables_blob);
-        build_iort(tables_blob, tables->linker);
+        build_iort(tables_blob, tables->linker, vms);
     }
 
     /* XSDT is pointed to by RSDP */
diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
index 72be675..69307b7 100644
--- a/include/hw/acpi/acpi-defs.h
+++ b/include/hw/acpi/acpi-defs.h
@@ -697,6 +697,21 @@ struct AcpiIortItsGroup {
 } QEMU_PACKED;
 typedef struct AcpiIortItsGroup AcpiIortItsGroup;
 
+struct AcpiIortSmmu3 {
+    ACPI_IORT_NODE_HEADER_DEF
+    uint64_t base_address;
+    uint32_t flags;
+    uint32_t reserved2;
+    uint64_t vatos_address;
+    uint32_t model;
+    uint32_t event_gsiv;
+    uint32_t pri_gsiv;
+    uint32_t gerr_gsiv;
+    uint32_t sync_gsiv;
+    AcpiIortIdMapping id_mapping_array[0];
+} QEMU_PACKED;
+typedef struct AcpiIortSmmu3 AcpiIortSmmu3;
+
 struct AcpiIortRC {
     ACPI_IORT_NODE_HEADER_DEF
     AcpiIortMemoryAccess memory_properties;
-- 
2.5.5


* [Qemu-devel] [RFC v5 6/8] hw/arm/virt: Add tlbi-on-map property to the smmuv3 node
From: Eric Auger @ 2017-07-09 20:51 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, peter.maydell, qemu-arm, qemu-devel,
	alex.williamson, prem.mallappa
  Cc: drjones, christoffer.dall, Radha.Chintakuntla, Sunil.Goutham,
	mohun106, tcain, bharat.bhushan, tn, mst, will.deacon,
	jean-philippe.brucker, robin.murphy, peterx, edgar.iglesias

For VFIO integration we need to update the physical IOMMU mappings
each time the guest updates the vIOMMU translation structures.
For that, we rely on a special smmuv3 option, "tlbi-on-map",
which forces TLB invalidations on map (this mode is similar to
the Intel VT-d Caching Mode). The guest smmuv3 driver then sends
SMMU_CMD_TLBI_NH_VA commands, upon which we update the physical
mappings.
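
Because the guest sends one SMMU_CMD_TLBI_NH_VA per page it maps, the number of commands the emulation must process, and hence the number of physical mapping updates, grows linearly with the mapped size. The sketch below illustrates that cost from the guest side; tlbi_on_map() and the tlbi_fn callback are hypothetical stand-ins for the guest driver's command-queue helpers, and range-based invalidation is deliberately not modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook standing in for "queue one CMD_TLBI_NH_VA". */
typedef void (*tlbi_fn)(uint16_t asid, uint64_t va, void *opaque);

/*
 * Invalidate every granule-sized page overlapping [iova, iova + size).
 * granule must be a power of two; fn may be NULL to only count commands.
 * Returns the number of invalidation commands issued.
 */
static size_t tlbi_on_map(uint16_t asid, uint64_t iova, uint64_t size,
                          uint64_t granule, tlbi_fn fn, void *opaque)
{
    uint64_t va = iova & ~(granule - 1);   /* align down to the page */
    uint64_t end = iova + size;            /* exclusive end */
    size_t n = 0;

    for (; va < end; va += granule) {
        if (fn) {
            fn(asid, va, opaque);
        }
        n++;
    }
    return n;
}
```

With a 4K granule, mapping 1 GiB in one go already yields 262144 invalidation commands, each of which the emulation turns into a page walk and a notification.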

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/arm/virt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 3651e41..9b72e8a 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1034,6 +1034,7 @@ static void create_smmu(const VirtMachineState *vms, qemu_irq *pic)
     qemu_fdt_setprop_cell(vms->fdt, smmu, "clocks", vms->clock_phandle);
     qemu_fdt_setprop_string(vms->fdt, smmu, "clock-names", "apb_pclk");
     qemu_fdt_setprop(vms->fdt, smmu, "dma-coherent", NULL, 0);
+    qemu_fdt_setprop(vms->fdt, smmu, "tlbi-on-map", NULL, 0);
 
     qemu_fdt_setprop_cell(vms->fdt, smmu, "#iommu-cells", 1);
 
-- 
2.5.5


* [Qemu-devel] [RFC v5 7/8] target/arm/kvm: Translate the MSI doorbell in kvm_arch_fixup_msi_route
From: Eric Auger @ 2017-07-09 20:51 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, peter.maydell, qemu-arm, qemu-devel,
	alex.williamson, prem.mallappa
  Cc: drjones, christoffer.dall, Radha.Chintakuntla, Sunil.Goutham,
	mohun106, tcain, bharat.bhushan, tn, mst, will.deacon,
	jean-philippe.brucker, robin.murphy, peterx, edgar.iglesias

When the MSI doorbell address is translated by an IOMMU, we need to
fix up the MSI route with the translated address.
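
KVM describes an MSI route with two 32-bit halves of the doorbell address, so the translated 64-bit address has to be split. In QEMU an IOMMUTLBEntry describes a whole page, with addr_mask covering the in-page offset bits; the usual way to form a full translated address is to keep the page part of translated_addr and the offset part of the input address. The sketch shows both steps. MsiRouteAddr and the helper names are hypothetical, and the patch does not say whether the SMMUv3 translate callback in this series already folds the in-page offset into translated_addr.

```c
#include <stdint.h>

/* Hypothetical stand-in for the address fields of KVM's MSI routing entry. */
typedef struct {
    uint32_t address_lo;
    uint32_t address_hi;
} MsiRouteAddr;

/* Combine a page-granular translation with the offset of the input IOVA. */
static uint64_t iotlb_full_addr(uint64_t translated_addr, uint64_t addr_mask,
                                uint64_t iova)
{
    return (translated_addr & ~addr_mask) | (iova & addr_mask);
}

/* Store a 64-bit doorbell GPA as the two 32-bit halves KVM expects. */
static void msi_route_set_doorbell(MsiRouteAddr *r, uint64_t gpa)
{
    r->address_lo = (uint32_t)gpa;
    r->address_hi = (uint32_t)(gpa >> 32);
}
```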

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

It is still unclear to me whether we need to register an IOMMUNotifier
to handle changes to the MSI doorbell mapping that would occur behind
the scenes and would not lead to any call to kvm_arch_fixup_msi_route().
---
 target/arm/kvm.c        | 28 ++++++++++++++++++++++++++++
 target/arm/trace-events |  3 +++
 2 files changed, 31 insertions(+)

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 4555468..630754f 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -20,8 +20,13 @@
 #include "sysemu/kvm.h"
 #include "kvm_arm.h"
 #include "cpu.h"
+#include "trace.h"
 #include "internals.h"
 #include "hw/arm/arm.h"
+#include "hw/pci/pci.h"
+#include "hw/pci/msi.h"
+#include "hw/arm/smmu-common.h"
+#include "hw/arm/smmuv3.h"
 #include "exec/memattrs.h"
 #include "exec/address-spaces.h"
 #include "hw/boards.h"
@@ -611,6 +616,29 @@ int kvm_arm_vgic_probe(void)
 int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
                              uint64_t address, uint32_t data, PCIDevice *dev)
 {
+    AddressSpace *as = pci_device_iommu_address_space(dev);
+    IOMMUTLBEntry entry;
+    SMMUDevice *sdev;
+    SMMUV3State *s3;
+    SMMUState *s;
+
+    if (as == &address_space_memory) {
+        return 0;
+    }
+
+    /* MSI doorbell address is translated by an IOMMU */
+    sdev = container_of(as, SMMUDevice, as);
+    s3 = sdev->smmu;
+    s = &s3->smmu_state;
+
+    entry = s->iommu_ops.translate(&sdev->iommu, address, IOMMU_WO);
+
+    route->u.msi.address_lo = entry.translated_addr;
+    route->u.msi.address_hi = entry.translated_addr >> 32;
+
+    trace_kvm_arm_fixup_msi_route(address, sdev->devfn, sdev->iommu.name,
+                                  entry.translated_addr);
+
     return 0;
 }
 
diff --git a/target/arm/trace-events b/target/arm/trace-events
index e21c84f..eff2822 100644
--- a/target/arm/trace-events
+++ b/target/arm/trace-events
@@ -8,3 +8,6 @@ arm_gt_tval_write(int timer, uint64_t value) "gt_tval_write: timer %d value %" P
 arm_gt_ctl_write(int timer, uint64_t value) "gt_ctl_write: timer %d value %" PRIx64
 arm_gt_imask_toggle(int timer, int irqstate) "gt_ctl_write: timer %d IMASK toggle, new irqstate %d"
 arm_gt_cntvoff_write(uint64_t value) "gt_cntvoff_write: value %" PRIx64
+
+# target/arm/kvm.c
+kvm_arm_fixup_msi_route(uint64_t iova, uint32_t devid, const char *name, uint64_t gpa) "MSI addr = 0x%"PRIx64" is translated for devfn=%d through %s into 0x%"PRIx64
-- 
2.5.5


* [Qemu-devel] [RFC v5 8/8] hw/arm/smmuv3: VFIO integration
From: Eric Auger @ 2017-07-09 20:51 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, peter.maydell, qemu-arm, qemu-devel,
	alex.williamson, prem.mallappa
  Cc: drjones, christoffer.dall, Radha.Chintakuntla, Sunil.Goutham,
	mohun106, tcain, bharat.bhushan, tn, mst, will.deacon,
	jean-philippe.brucker, robin.murphy, peterx, edgar.iglesias

This patch enables PCIe passthrough for a guest that is exposed to a
vSMMUv3. It implements the replay and notify_flag_changed IOMMU ops.
Also, on TLB and data structure invalidation commands, we replay the
mappings so that the physical IOMMU reflects the updated stage 1
settings (guest IOVA -> guest PA) combined with the stage 2 settings.

This works only if the guest smmuv3 driver implements the
"tlbi-on-map" option.
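
The replay hook in this patch notifies each valid mapping twice: first with IOMMU_NONE, which listeners such as VFIO treat as an unmap, then with the real permissions. Presumably the unmap comes first so that re-mapping an IOVA that is already mapped in the physical IOMMU does not fail. The sketch mimics that sequence with hypothetical types (ToyEntry, Recorder) in place of IOMMUTLBEntry and IOMMUNotifier; the enum values follow QEMU's IOMMUAccessFlags.

```c
#include <stdint.h>

/* Values follow QEMU's IOMMUAccessFlags (NONE = 0, RO = 1, WO = 2, RW = 3). */
typedef enum { PERM_NONE = 0, PERM_RO = 1, PERM_WO = 2, PERM_RW = 3 } Perm;

/* Hypothetical stand-in for IOMMUTLBEntry. */
typedef struct {
    uint64_t iova;
    uint64_t translated_addr;
    uint64_t addr_mask;
    Perm perm;
} ToyEntry;

/* Hypothetical listener: counts unmaps and maps, as VFIO would act on them. */
typedef struct {
    int unmaps;
    int maps;
    Perm order[2];        /* first two events seen, to check the ordering */
} Recorder;

static void toy_notify_one(Recorder *r, const ToyEntry *e)
{
    int idx = r->unmaps + r->maps;

    if (idx < 2) {
        r->order[idx] = e->perm;
    }
    if (e->perm == PERM_NONE) {
        r->unmaps++;
    } else {
        r->maps++;
    }
}

/* Same shape as smmuv3_replay_hook(): unmap, then map with the real perms. */
static int toy_replay_hook(ToyEntry *entry, void *opaque)
{
    Recorder *r = opaque;
    Perm perm = entry->perm;

    entry->perm = PERM_NONE;
    toy_notify_one(r, entry);
    entry->perm = perm;
    toy_notify_one(r, entry);
    return 0;
}
```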

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

At the moment, the "tlbi-on-map" option is only set in DT mode.
---
 hw/arm/smmuv3.c     | 123 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/arm/trace-events |   6 +++
 2 files changed, 129 insertions(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 639f682..1ff77f7 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -143,6 +143,32 @@ static MemTxResult smmu_read_cmdq(SMMUV3State *s, Cmd *cmd)
     return ret;
 }
 
+static void smmu_replay_all(SMMUState *s)
+{
+    SMMUNotifierNode *node;
+
+    QLIST_FOREACH(node, &s->notifiers_list, next) {
+        memory_region_iommu_replay_all(&node->sdev->iommu);
+    }
+}
+
+static void smmuv3_replay_single(MemoryRegion *mr, IOMMUNotifier *n,
+                                 uint64_t iova);
+
+static void smmu_notify_all(SMMUState *s, uint64_t iova)
+{
+    SMMUNotifierNode *node;
+
+    QLIST_FOREACH(node, &s->notifiers_list, next) {
+        MemoryRegion *mr = &node->sdev->iommu;
+        IOMMUNotifier *n;
+
+        IOMMU_NOTIFIER_FOREACH(n, mr) {
+            smmuv3_replay_single(mr, n, iova);
+        }
+    }
+}
+
 static int smmu_cmdq_consume(SMMUV3State *s)
 {
     uint32_t error = SMMU_CMD_ERR_NONE;
@@ -183,6 +209,7 @@ static int smmu_cmdq_consume(SMMUV3State *s)
              uint32_t streamid = cmd.word[1];
 
              trace_smmuv3_cmdq_cfgi_ste(streamid);
+             smmu_replay_all(&s->smmu_state);
             break;
         }
         case SMMU_CMD_CFGI_STE_RANGE: /* same as SMMU_CMD_CFGI_ALL */
@@ -192,14 +219,17 @@ static int smmu_cmdq_consume(SMMUV3State *s)
             range = extract32(cmd.word[2], 0, 5);
             end = start + (1 << (range + 1)) - 1;
             trace_smmuv3_cmdq_cfgi_ste_range(start, end);
+            smmu_replay_all(&s->smmu_state);
             break;
         }
         case SMMU_CMD_CFGI_CD:
         case SMMU_CMD_CFGI_CD_ALL:
+            smmu_replay_all(&s->smmu_state);
             break;
         case SMMU_CMD_TLBI_NH_ALL:
         case SMMU_CMD_TLBI_NH_ASID:
             printf("%s TLBI* replay\n", __func__);
+            smmu_replay_all(&s->smmu_state);
             break;
         case SMMU_CMD_TLBI_NH_VA:
         {
@@ -210,6 +240,7 @@ static int smmu_cmdq_consume(SMMUV3State *s)
             uint64_t addr = high << 32 | (low << 12);
 
             trace_smmuv3_cmdq_tlbi_nh_va(asid, vmid, addr);
+            smmu_notify_all(&s->smmu_state, addr);
             break;
         }
         case SMMU_CMD_TLBI_NH_VAA:
@@ -222,6 +253,7 @@ static int smmu_cmdq_consume(SMMUV3State *s)
         case SMMU_CMD_TLBI_S12_VMALL:
         case SMMU_CMD_TLBI_S2_IPA:
         case SMMU_CMD_TLBI_NSNH_ALL:
+            smmu_replay_all(&s->smmu_state);
             break;
         case SMMU_CMD_ATC_INV:
         case SMMU_CMD_PRI_RESP:
@@ -804,6 +836,95 @@ out:
     return entry;
 }
 
+static int smmuv3_replay_hook(IOMMUTLBEntry *entry, void *private)
+{
+    int perm = entry->perm;
+
+    trace_smmuv3_replay_hook(entry->iova, entry->translated_addr,
+                             entry->addr_mask, entry->perm);
+    entry->perm = IOMMU_NONE;
+    memory_region_notify_one((IOMMUNotifier *)private, entry);
+    entry->perm = perm;
+    memory_region_notify_one((IOMMUNotifier *)private, entry);
+    return 0;
+}
+
+static void smmuv3_replay(MemoryRegion *mr, IOMMUNotifier *n)
+{
+    SMMUDevice *sdev = container_of(mr, SMMUDevice, iommu);
+    SMMUV3State *s = sdev->smmu;
+    SMMUBaseClass *sbc = SMMU_DEVICE_GET_CLASS(s);
+    SMMUTransCfg cfg = {};
+    int ret;
+
+    ret = smmuv3_decode_config(mr, &cfg);
+    if (ret) {
+        error_report("%s error decoding the configuration for iommu mr=%s",
+                     __func__, mr->name);
+        return;
+    }
+    /* nothing to replay if translation is disabled or bypassed */
+    if (cfg.disabled || cfg.bypassed) {
+        return;
+    }
+    sbc->page_walk_64(&cfg, 0, (1ULL << (64 - cfg.tsz)) - 1, false,
+                      smmuv3_replay_hook, n);
+}
+
+static void smmuv3_replay_single(MemoryRegion *mr, IOMMUNotifier *n,
+                                 uint64_t iova)
+{
+    SMMUDevice *sdev = container_of(mr, SMMUDevice, iommu);
+    SMMUV3State *s = sdev->smmu;
+    SMMUBaseClass *sbc = SMMU_DEVICE_GET_CLASS(s);
+    SMMUTransCfg cfg = {};
+    int ret;
+
+    trace_smmuv3_replay_single(mr->name, iova, n);
+    ret = smmuv3_decode_config(mr, &cfg);
+    if (ret) {
+        error_report("%s error decoding the configuration for iommu mr=%s",
+                     __func__, mr->name);
+        return;
+    }
+    /* nothing to replay if translation is disabled or bypassed */
+    if (cfg.disabled || cfg.bypassed) {
+        return;
+    }
+    sbc->page_walk_64(&cfg, iova, iova + 1, false,
+                      smmuv3_replay_hook, n);
+}
+
+static void smmuv3_notify_flag_changed(MemoryRegion *iommu,
+                                       IOMMUNotifierFlag old,
+                                       IOMMUNotifierFlag new)
+{
+    SMMUDevice *sdev = container_of(iommu, SMMUDevice, iommu);
+    SMMUV3State *s3 = sdev->smmu;
+    SMMUState *s = &(s3->smmu_state);
+    SMMUNotifierNode *node = NULL;
+    SMMUNotifierNode *next_node = NULL;
+
+    if (old == IOMMU_NOTIFIER_NONE) {
+        trace_smmuv3_notify_flag_add(iommu->name);
+        node = g_malloc0(sizeof(*node));
+        node->sdev = sdev;
+        QLIST_INSERT_HEAD(&s->notifiers_list, node, next);
+        return;
+    }
+
+    /* update notifier node with new flags */
+    QLIST_FOREACH_SAFE(node, &s->notifiers_list, next, next_node) {
+        if (node->sdev == sdev) {
+            if (new == IOMMU_NOTIFIER_NONE) {
+                trace_smmuv3_notify_flag_del(iommu->name);
+                QLIST_REMOVE(node, next);
+                g_free(node);
+            }
+            return;
+        }
+    }
+}
 
 static inline void smmu_update_base_reg(SMMUV3State *s, uint64_t *base,
                                         uint64_t val)
@@ -1072,6 +1193,8 @@ static void smmu_realize(DeviceState *d, Error **errp)
     SysBusDevice *dev = SYS_BUS_DEVICE(d);
 
     sys->iommu_ops.translate = smmuv3_translate;
+    sys->iommu_ops.notify_flag_changed = smmuv3_notify_flag_changed;
+    sys->iommu_ops.replay = smmuv3_replay;
     /* Register Access */
     memset(sys->smmu_as_by_bus_num, 0, sizeof(sys->smmu_as_by_bus_num));
     memory_region_init_io(&sys->iomem, OBJECT(s),
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index 30a817b..6c143be 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -50,3 +50,9 @@ smmuv3_dump_ste(int i, uint32_t word0, int j,  uint32_t word1) "STE[%2d]: %#010x
 smmuv3_dump_cd(int i, uint32_t word0, int j,  uint32_t word1) "CD[%2d]: %#010x\t CD[%2d]: %#010x"
 smmuv3_dump_cmd(int i, uint32_t word0, int j,  uint32_t word1) "CMD[%2d]: %#010x\t CMD[%2d]: %#010x"
 smmuv3_cfg_stage(int s, uint32_t oas, uint32_t tsz, uint64_t ttbr, bool aa64, uint32_t granule_sz, int initial_level) "TransCFG stage:%d oas:%d tsz:%d ttbr:0x%"PRIx64"  aa64:%d granule_sz:%d, initial_level = %d"
+
+smmuv3_replay(uint16_t sid, bool enabled) "sid=%d, enabled=%d"
+smmuv3_replay_hook(hwaddr iova, hwaddr pa, hwaddr mask, int perm) "iova=0x%"PRIx64" pa=0x%" PRIx64" mask=0x%"PRIx64" perm=%d"
+smmuv3_notify_flag_add(const char *iommu) "ADD SMMUNotifier node for iommu mr=%s"
+smmuv3_notify_flag_del(const char *iommu) "DEL SMMUNotifier node for iommu mr=%s"
+smmuv3_replay_single(const char *name, uint64_t iova, void *n) "iommu mr=%s iova=0x%"PRIx64" n=%p"
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC v5 0/8] ARM SMMUv3 Emulation Support
       [not found] ` <CACJhume2HkAXVQ8kSCpGEfQV4NOP_=HrZCHXBNLnbm0B8dGQvw@mail.gmail.com>
@ 2017-07-12 17:24   ` Geetha Akula
  2017-07-25 14:33     ` Auger Eric
  0 siblings, 1 reply; 22+ messages in thread
From: Geetha Akula @ 2017-07-12 17:24 UTC (permalink / raw)
  To: Auger Eric, eric.auger.pro, Alex Williamson, peter.maydell,
	qemu-arm, qemu-devel, prem.mallappa
  Cc: mohun106, drjones, tcain, Radha.Chintakuntla, Sunil.Goutham, mst,
	jean-philippe.brucker, tn, Will Deacon, Robin Murphy, peterx,
	Edgar E. Iglesias, Bharat Bhushan, Christoffer Dall

Hi Eric


> This series implements the emulation code for ARM SMMUv3.
> This is the continuation of Prem's work [1].
>
> This v5 mainly brings VFIO integration in DT mode. On guest kernel
> side, this requires a quirk [1] to force TLB invalidation on map.
>
> The following changes also are noticeable:
> - fix SMMU_CMDQ_CONS offset
> - adds dma-coherent dt property which fixes the unhandled command
>   opcode bug.
> - implements block PTE
>
> The smmu is instantiated when passing the smmu option to machvirt:
> "-M virt-2.10,smmu"
>
> As I haven't split the code yet so that it can be easily reviewable
> I don't expect deep reviews at this stage. Also the implementation may
> be largely sub-optimal.
>
> Tested Use Cases:
> - booted a guest in dt and acpi mode with an iommu_platform
>   virtio-net-pci device (using dma ops). Tested with the following
>   guest combinations: 4K page - 39 bit VA, 4K - 48b, 64K - 39b,
>   64K - 48b.

I verified the patches with virtio-net-pci and did not observe the command
queue issues seen with the v4 series. There is a large difference in iperf
throughput on the guest with and without SMMUv3 emulation (1.5 Gbps with
the vIOMMU vs. 9.0 Gbps without it). I think this is expected behaviour.

Is there any plan to add device-iotlb and iotlb support to the SMMUv3 emulation?


Thank you,
Geetha.


> - booted a guest (featuring [1]) with PCIe passthrough'ed PCIe devices:
>   - AMD Overdrive and igbvf passthrough (using gsi direct mapping)
>   - Cavium ThunderX and ixgbevf passthrough (using KVM MSI routing)
>
> Unfortunately I have not been able to run DPDK testpmd yet on guest side.
> The problem I see is the user space driver dma-maps a huge area
> and this causes plenty of CMDQ_OP_TLBI_NH_VA commands to be sent
> (tlbi-on-map) which are sent for each page whereas the dma-map covers a
> huge page. I will work on this issue for next version.
>
> Known limitations:
> - no VMSAv8-32 support
> - no nested stage support (S1 + S2)
> - no support for HYP mappings
> - register fine emulation, commands, interrupts and errors were
>   not accurately tested. Handling is sufficient to run the use cases
>   described above though.
>
> Best Regards
>
> Eric
>
> This series can be found at:
> v5: https://github.com/eauger/qemu/tree/v2.9-SMMU-v5
> v4: https://github.com/eauger/qemu/tree/v2.9-SMMU-v4
>
> References:
> [1] [RFC 0/2] arm-smmu-v3 tlbi-on-map option
> [2] Prem's last iteration:
> - https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg03531.html
>
> History:
> v4 -> v5:
> - initial_level now part of SMMUTransCfg
> - smmu_page_walk_64 takes into account the max input size
> - implement sys->iommu_ops.replay and sys->iommu_ops.notify_flag_changed
> - smmuv3_translate: bug fix: don't walk on bypass
> - smmu_update_qreg: fix PROD index update
> - I did not yet address Peter's comments as the code is not mature enough
>   to be split into sub patches.
>
> v3 -> v4 [Eric]:
> - page table walk rewritten to allow scan of the page table within a
>   range of IOVA. This prepares for VFIO integration and replay.
> - configuration parsing partially reworked.
> - do not advertise unsupported/untested features: S2, S1 + S2, HYP,
>   PRI, ATS, ..
> - added ACPI table generation
> - migrated to dynamic traces
> - mingw compilation fix
>
> v2 -> v3 [Eric]:
> - rebased on 2.9
> - mostly code and patch reorganization to ease the review process
> - optional patches removed. They may be handled separately. I am currently
>   working on ACPI enablement.
> - optional instantiation of the smmu in mach-virt
> - removed [2/9] (fdt functions) since not mandated
> - start splitting main patch into base and derived object
> - no new function feature added
>
> v1 -> v2 [Prem]:
> - Adopted review comments from Eric Auger
>         - Make SMMU_DPRINTF to internally call qemu_log
>             (since translation requests are too many, we need control
>              on the type of log we want)
>         - SMMUTransCfg modified to suit simplicity
>         - Change RegInfo to uint64 register array
>         - Code cleanup
>         - Test cleanups
> - Reshuffled patches
>
> v0 -> v1 [Prem]:
> - As per SMMUv3 spec 16.0 (only is_ste_consistant() is noticeable)
> - Reworked register access/update logic
> - Factored out translation code for
>         - single point bug fix
>         - sharing/removal in future
> - (optional) Unit tests added, with PCI test device
>         - S1 with 4k/64k, S1+S2 with 4k/64k
>         - (S1 or S2) only can be verified by Linux 4.7 driver
>         - (optional) Preliminary ACPI support
>
> v0 [Prem]:
> - Implements SMMUv3 spec 11.0
> - Supported for PCIe devices,
> - Command Queue and Event Queue supported
> - LPAE only, S1 is supported and Tested, S2 not tested
> - BE mode Translation not supported
> - IRQ support (legacy, no MSI)
> - Tested with DPDK and e1000
>
>
> Eric Auger (5):
>   hw/arm/smmu-common: smmu base class
>   hw/arm/virt: Add 2.10 machine type
>   hw/arm/virt: Add tlbi-on-map property to the smmuv3 node
>   target/arm/kvm: Translate the MSI doorbell in kvm_arch_fixup_msi_route
>   hw/arm/smmuv3: VFIO integration
>
> Prem Mallappa (3):
>   hw/arm/smmuv3: smmuv3 emulation model
>   hw/arm/virt: Add SMMUv3 to the virt board
>   hw/arm/virt-acpi-build: Add smmuv3 node in IORT table
>
>  default-configs/aarch64-softmmu.mak |    1 +
>  hw/arm/Makefile.objs                |    1 +
>  hw/arm/smmu-common.c                |  474 +++++++++++++
>  hw/arm/smmu-internal.h              |   89 +++
>  hw/arm/smmuv3-internal.h            |  651 ++++++++++++++++++
>  hw/arm/smmuv3.c                     | 1256 +++++++++++++++++++++++++++++++++++
>  hw/arm/trace-events                 |   54 ++
>  hw/arm/virt-acpi-build.c            |   56 +-
>  hw/arm/virt.c                       |  111 +++-
>  include/hw/acpi/acpi-defs.h         |   15 +
>  include/hw/arm/smmu-common.h        |  127 ++++
>  include/hw/arm/smmuv3.h             |   87 +++
>  include/hw/arm/virt.h               |    5 +
>  target/arm/kvm.c                    |   28 +
>  target/arm/trace-events             |    3 +
>  15 files changed, 2949 insertions(+), 9 deletions(-)
>  create mode 100644 hw/arm/smmu-common.c
>  create mode 100644 hw/arm/smmu-internal.h
>  create mode 100644 hw/arm/smmuv3-internal.h
>  create mode 100644 hw/arm/smmuv3.c
>  create mode 100644 include/hw/arm/smmu-common.h
>  create mode 100644 include/hw/arm/smmuv3.h
>
> --
> 2.5.5


* Re: [Qemu-devel] [RFC v5 2/8] hw/arm/smmuv3: smmuv3 emulation model
  2017-07-09 20:51 ` [Qemu-devel] [RFC v5 2/8] hw/arm/smmuv3: smmuv3 emulation model Eric Auger
@ 2017-07-13 12:00   ` Tomasz Nowicki
  2017-07-27 20:26     ` Auger Eric
  2017-07-13 12:57   ` Tomasz Nowicki
  1 sibling, 1 reply; 22+ messages in thread
From: Tomasz Nowicki @ 2017-07-13 12:00 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, peter.maydell, qemu-arm, qemu-devel,
	alex.williamson, prem.mallappa
  Cc: drjones, christoffer.dall, Radha.Chintakuntla, Sunil.Goutham,
	mohun106, tcain, bharat.bhushan, mst, will.deacon,
	jean-philippe.brucker, robin.murphy, peterx, edgar.iglesias

Hi Eric,

On 09.07.2017 22:51, Eric Auger wrote:
> From: Prem Mallappa <prem.mallappa@broadcom.com>
> 
> Introduces the SMMUv3 derived model. This is based on
> System MMUv3 specification (v17).
> 
> Signed-off-by: Prem Mallappa <prem.mallappa@broadcom.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> v4 -> v5:
> - change smmuv3_translate proto (IOMMUAccessFlags flag)
> - has_stagex replaced by is_ste_stagex
> - smmu_cfg_populate removed
> - added smmuv3_decode_config and reworked error management
> - rework the naming of IOMMU mrs
> - fix SMMU_CMDQ_CONS offset
> 

[...]

> +
> +/*****************************
> + *  Register Access Primitives
> + *****************************/
> +
> +static inline void smmu_write64_reg(SMMUV3State *s, uint32_t addr, uint64_t val)
> +{
> +    addr >>= 2;
> +    s->regs[addr] = val & 0xFFFFFFFFULL;
> +    s->regs[addr + 1] = val & ~0xFFFFFFFFULL;
> +}
> +
> +static inline void smmu_write_reg(SMMUV3State *s, uint32_t addr, uint64_t val)
> +{
> +    s->regs[addr >> 2] = val;
> +}
> +
> +static inline uint32_t smmu_read_reg(SMMUV3State *s, uint32_t addr)
> +{
> +    return s->regs[addr >> 2];
> +}
> +
> +static inline uint64_t smmu_read64_reg(SMMUV3State *s, uint32_t addr)
> +{
> +    addr >>= 2;
> +    return s->regs[addr] | (s->regs[addr + 1] << 32);

To be consistent with smmu_write64_reg(), we should not shift the second
half of the register here; instead, simply:

return s->regs[addr] | s->regs[addr + 1];
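To make the consistency point concrete, here is a minimal sketch that assumes the register file is an array of uint32_t slots (the actual type of `regs` is not visible in this excerpt); the `sketch_` names are hypothetical. With 32-bit slots, the write must shift the high half down and the read must shift it back up, casting before the shift because shifting a 32-bit value by 32 is undefined behaviour in C. If `regs` is 64 bits wide, the unshifted convention suggested above is the self-consistent one; what matters is that both helpers agree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file made of 32-bit slots, indexed by addr >> 2. */
#define SKETCH_NUM_REGS 64

typedef struct {
    uint32_t regs[SKETCH_NUM_REGS];
} SketchRegs;

/* Store a 64-bit value at byte offset addr as two consecutive 32-bit slots. */
static void sketch_write64_reg(SketchRegs *s, uint32_t addr, uint64_t val)
{
    addr >>= 2;
    assert(addr + 1 < SKETCH_NUM_REGS);
    s->regs[addr] = (uint32_t)val;             /* low half */
    s->regs[addr + 1] = (uint32_t)(val >> 32); /* high half, shifted down */
}

/* Reassemble the value; cast first so the shift is done in 64 bits. */
static uint64_t sketch_read64_reg(const SketchRegs *s, uint32_t addr)
{
    addr >>= 2;
    assert(addr + 1 < SKETCH_NUM_REGS);
    return s->regs[addr] | ((uint64_t)s->regs[addr + 1] << 32);
}
```

A write followed by a read at the same offset round-trips the full 64-bit value.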

Thanks,
Tomasz


* Re: [Qemu-devel] [RFC v5 2/8] hw/arm/smmuv3: smmuv3 emulation model
  2017-07-09 20:51 ` [Qemu-devel] [RFC v5 2/8] hw/arm/smmuv3: smmuv3 emulation model Eric Auger
  2017-07-13 12:00   ` Tomasz Nowicki
@ 2017-07-13 12:57   ` Tomasz Nowicki
  2017-07-27 20:25     ` Auger Eric
  1 sibling, 1 reply; 22+ messages in thread
From: Tomasz Nowicki @ 2017-07-13 12:57 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, peter.maydell, qemu-arm, qemu-devel,
	alex.williamson, prem.mallappa
  Cc: drjones, christoffer.dall, Radha.Chintakuntla, Sunil.Goutham,
	mohun106, tcain, bharat.bhushan, mst, will.deacon,
	jean-philippe.brucker, robin.murphy, peterx, edgar.iglesias

Hi Eric,

On 09.07.2017 22:51, Eric Auger wrote:
> From: Prem Mallappa <prem.mallappa@broadcom.com>
> 
> Introduces the SMMUv3 derived model. This is based on
> System MMUv3 specification (v17).
> 
> Signed-off-by: Prem Mallappa <prem.mallappa@broadcom.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> v4 -> v5:
> - change smmuv3_translate proto (IOMMUAccessFlags flag)
> - has_stagex replaced by is_ste_stagex
> - smmu_cfg_populate removed
> - added smmuv3_decode_config and reworked error management
> - rework the naming of IOMMU mrs
> - fix SMMU_CMDQ_CONS offset
> 

[...]

> +
> +static void smmu_update_qreg(SMMUV3State *s, SMMUQueue *q, hwaddr reg,
> +                             uint32_t off, uint64_t val, unsigned size)
> +{
> +   if (size == 8 && off == 0) {
> +        smmu_write64_reg(s, reg, val);

Based on my observation, we never get here.

If I read the code correctly,
memory_region_dispatch_{write|read}()->memory_region_{write|read}_accessor()
will cut all 8-byte accesses into 4-byte slices, since .impl.max_access_size
defaults to 4 when it is not set. The following change makes my SMMUv3
register handling happy:

  static const MemoryRegionOps smmu_mem_ops = {
      .read = smmu_read_mmio,
      .write = smmu_write_mmio,
      .endianness = DEVICE_LITTLE_ENDIAN,
      .valid = {
          .min_access_size = 4,
          .max_access_size = 8,
      },
+    .impl = {
+        .min_access_size = 4,
+        .max_access_size = 8,
+    },
  };
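The effect can be illustrated with a simplified model of access-size adjustment. This is not QEMU's code: `sketch_dispatch_write()` and the access log are hypothetical, and the model only captures the rule that an access wider than the implementation maximum (taken as 4 when unset) is split into slices, lowest address first, as for a little-endian device. With a 4-byte maximum the callback sees two 4-byte writes at offsets 0 and 4, so a `size == 8 && off == 0` branch is never taken; with an 8-byte maximum it sees a single 8-byte write.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of the writes the device callback receives. */
typedef struct {
    unsigned count;
    uint64_t addr[8];
    uint64_t val[8];
    unsigned size[8];
} AccessLog;

static void record_write(AccessLog *log, uint64_t addr, uint64_t val,
                         unsigned size)
{
    if (log->count < 8) {
        log->addr[log->count] = addr;
        log->val[log->count] = val;
        log->size[log->count] = size;
        log->count++;
    }
}

/*
 * Simplified model: split a 'size'-byte write into slices no wider than
 * impl_max (0 means "unset" and is treated as 4), lowest address first.
 */
static void sketch_dispatch_write(AccessLog *log, uint64_t addr, uint64_t val,
                                  unsigned size, unsigned impl_max)
{
    unsigned slice = impl_max ? impl_max : 4;
    unsigned i;

    assert(size >= 1 && size <= 8);
    if (slice > size) {
        slice = size;
    }
    for (i = 0; i < size; i += slice) {
        uint64_t mask = slice == 8 ? UINT64_MAX : (1ULL << (slice * 8)) - 1;

        record_write(log, addr + i, (val >> (i * 8)) & mask, slice);
    }
}
```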

Thanks,
Tomasz


* Re: [Qemu-devel] [RFC v5 0/8] ARM SMMUv3 Emulation Support
  2017-07-09 20:51 [Qemu-devel] [RFC v5 0/8] ARM SMMUv3 Emulation Support Eric Auger
                   ` (8 preceding siblings ...)
       [not found] ` <CACJhume2HkAXVQ8kSCpGEfQV4NOP_=HrZCHXBNLnbm0B8dGQvw@mail.gmail.com>
@ 2017-07-14  7:19 ` Tomasz Nowicki
  2017-08-01 11:01 ` Tomasz Nowicki
  10 siblings, 0 replies; 22+ messages in thread
From: Tomasz Nowicki @ 2017-07-14  7:19 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, peter.maydell, qemu-arm, qemu-devel,
	alex.williamson, prem.mallappa
  Cc: drjones, christoffer.dall, Radha.Chintakuntla, Sunil.Goutham,
	mohun106, tcain, bharat.bhushan, mst, will.deacon,
	jean-philippe.brucker, robin.murphy, peterx, edgar.iglesias

Hi Eric,

With the fixes from the comments I made, I was able to run a VM with
virtio-blk-pci and virtio-net-pci devices.

I have tried vhost-net as well, but from the host's perspective the payload
of outgoing packets is corrupted: tcpdump on the host tun interface shows
zeroes in the packet payload, while tcpdump in the VM shows that the packets
are fine. Have you seen anything like that? Packets incoming to the VM are
fine, though. I will keep debugging on my side too.

Thanks,
Tomasz


* Re: [Qemu-devel] [RFC v5 1/8] hw/arm/smmu-common: smmu base class
  2017-07-09 20:51 ` [Qemu-devel] [RFC v5 1/8] hw/arm/smmu-common: smmu base class Eric Auger
@ 2017-07-25 12:12   ` Tomasz Nowicki
  2017-07-27 20:28     ` Auger Eric
  0 siblings, 1 reply; 22+ messages in thread
From: Tomasz Nowicki @ 2017-07-25 12:12 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, peter.maydell, qemu-arm, qemu-devel,
	alex.williamson, prem.mallappa
  Cc: drjones, christoffer.dall, Radha.Chintakuntla, Sunil.Goutham,
	mohun106, tcain, bharat.bhushan, mst, will.deacon,
	jean-philippe.brucker, robin.murphy, peterx, edgar.iglesias

Hi Eric,

I found out what is going on with the vhost-net outgoing packet payload
corruption. My packets were corrupted because of an inconsistent IOVA to
HVA translation in the IOTLB. Please see below.

On 09.07.2017 22:51, Eric Auger wrote:
> Introduces the base device and class for the ARM smmu.
> Implements VMSAv8-64 table lookup and translation. VMSAv8-32
> is not implemented.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Prem Mallappa <prem.mallappa@broadcom.com>
> 
> ---

[...]

> +
> +/**
> + * smmu_page_walk_level_64 - Walk an IOVA range from a specific level
> + * @baseaddr: table base address corresponding to @level
> + * @level: level
> + * @cfg: translation config
> + * @start: start of the IOVA range
> + * @end: end of the IOVA range
> + * @hook_fn: the hook to be called for each detected area
> + * @private: private data for the hook function
> + * @read: whether parent level has read permission
> + * @write: whether parent level has write permission
> + * @nofail: indicates whether each iova of the range
> + *  must be translated or whether failure is allowed
> + * @notify_unmap: whether we should notify invalid entries
> + *
> + * Return 0 on success, < 0 on errors not related to translation
> + * process, > 1 on errors related to translation process (only
> + * if nofail is set)
> + */
> +static int
> +smmu_page_walk_level_64(dma_addr_t baseaddr, int level,
> +                        SMMUTransCfg *cfg, uint64_t start, uint64_t end,
> +                        smmu_page_walk_hook hook_fn, void *private,
> +                        bool read, bool write, bool nofail,
> +                        bool notify_unmap)
> +{
> +    uint64_t subpage_size, subpage_mask, pte, iova = start;
> +    bool read_cur, write_cur, entry_valid;
> +    int ret, granule_sz, stage;
> +    IOMMUTLBEntry entry;
> +
> +    granule_sz = cfg->granule_sz;
> +    stage = cfg->stage;
> +    subpage_size = 1ULL << level_shift(level, granule_sz);
> +    subpage_mask = level_page_mask(level, granule_sz);
> +
> +    trace_smmu_page_walk_level_in(level, baseaddr, granule_sz,
> +                                  start, end, subpage_size);
> +
> +    while (iova < end) {
> +        dma_addr_t next_table_baseaddr;
> +        uint64_t iova_next, pte_addr;
> +        uint32_t offset;
> +
> +        iova_next = (iova & subpage_mask) + subpage_size;
> +        offset = iova_level_offset(iova, level, granule_sz);
> +        pte_addr = baseaddr + offset * sizeof(pte);
> +        pte = get_pte(baseaddr, offset);
> +
> +        trace_smmu_page_walk_level(level, iova, subpage_size,
> +                                   baseaddr, offset, pte);
> +
> +        if (pte == (uint64_t)-1) {
> +            if (nofail) {
> +                return SMMU_TRANS_ERR_WALK_EXT_ABRT;
> +            }
> +            goto next;
> +        }
> +        if (is_invalid_pte(pte) || is_reserved_pte(pte, level)) {
> +            trace_smmu_page_walk_level_res_invalid_pte(stage, level, baseaddr,
> +                                                       pte_addr, offset, pte);
> +            if (nofail) {
> +                return SMMU_TRANS_ERR_WALK_EXT_ABRT;
> +            }
> +            goto next;
> +        }


vhost maintains its own IOTLB cache and, when it has no IOVA->HVA
translation, it asks QEMU for help. However, IOTLB invalidations are
initiated by the guest, so for any DMA unmap on the guest side we trap
into the SMMU emulation code and call:
smmu_notify_all -> smmuv3_replay_single -> smmu_page_walk_64 ->
smmu_page_walk_level_64 -> smmuv3_replay_hook -> vhost_iommu_unmap_notify

The problem is that smmuv3_replay_hook() is never called, because the guest
zeroes the PTE before we trap to QEMU, so smmu_page_walk_level_64() bails
out on the is_invalid_pte(pte) check above (^^^). This way we keep the old
IOTLB entry in vhost, and subsequent translations may be broken.
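The failure mode can be reproduced with a toy model. Everything in it is hypothetical: a flat page table, a per-page cache standing in for the vhost IOTLB, and two invalidation handlers. It only shows that an unmap notification which depends on re-walking the guest page table is lost once the guest has already cleared the PTE, while an unmap notified directly for the invalidated IOVA, without consulting the PTE, does evict the stale entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGES 16

/* Toy guest page table: 0 is an invalid PTE, otherwise the mapped address. */
typedef struct {
    uint64_t pte[TOY_PAGES];
} ToyPageTable;

/* Toy device-side IOTLB, standing in for the vhost cache. */
typedef struct {
    bool valid[TOY_PAGES];
    uint64_t addr[TOY_PAGES];
} ToyIotlb;

static void toy_unmap_notify(ToyIotlb *tlb, unsigned page)
{
    tlb->valid[page] = false;
}

/* Walk-based invalidation: only notifies for PTEs that are still valid. */
static void toy_invalidate_by_walk(const ToyPageTable *pt, ToyIotlb *tlb,
                                   unsigned page)
{
    assert(page < TOY_PAGES);
    if (pt->pte[page] == 0) {
        return; /* invalid PTE: the walk skips it, so no unmap is sent */
    }
    toy_unmap_notify(tlb, page);
}

/* Direct invalidation: the invalidated page is unmapped whatever the PTE. */
static void toy_invalidate_direct(ToyIotlb *tlb, unsigned page)
{
    assert(page < TOY_PAGES);
    toy_unmap_notify(tlb, page);
}
```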

> +
> +        read_cur = read; /* TODO */
> +        write_cur = write; /* TODO */
> +        entry_valid = read_cur | write_cur; /* TODO */
> +
> +        if (is_page_pte(pte, level)) {
> +            uint64_t gpa = get_page_pte_address(pte, granule_sz);
> +            int perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
> +
> +            trace_smmu_page_walk_level_page_pte(stage, level, entry.iova,
> +                                                baseaddr, pte_addr, pte, gpa);
> +            if (!entry_valid && !notify_unmap) {
> +                printf("%s entry_valid=%d notify_unmap=%d\n", __func__,
> +                       entry_valid, notify_unmap);
> +                goto next;
> +            }
> +            ret = call_entry_hook(iova, subpage_mask, gpa, perm,
> +                                  hook_fn, private);
> +            if (ret) {
> +                return ret;
> +            }
> +            goto next;
> +        }
> +        if (is_block_pte(pte, level)) {
> +            uint64_t block_size;
> +            hwaddr gpa = get_block_pte_address(pte, level, granule_sz,
> +                                               &block_size);
> +            int perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
> +
> +            if (gpa == -1) {
> +                if (nofail) {
> +                    return SMMU_TRANS_ERR_WALK_EXT_ABRT;
> +                } else {
> +                    goto next;
> +                }
> +            }
> +            trace_smmu_page_walk_level_block_pte(stage, level, baseaddr,
> +                                                 pte_addr, pte, iova, gpa,
> +                                                 (int)(block_size >> 20));
> +
> +            ret = call_entry_hook(iova, subpage_mask, gpa, perm,
> +                                  hook_fn, private);
> +            if (ret) {
> +                return ret;
> +            }
> +            goto next;
> +        }
> +        if (level  == 3) {
> +            goto next;
> +        }
> +        /* table pte */
> +        next_table_baseaddr = get_table_pte_address(pte, granule_sz);
> +        trace_smmu_page_walk_level_table_pte(stage, level, baseaddr, pte_addr,
> +                                             pte, next_table_baseaddr);
> +        ret = smmu_page_walk_level_64(next_table_baseaddr, level + 1, cfg,
> +                                      iova, MIN(iova_next, end),
> +                                      hook_fn, private, read_cur, write_cur,
> +                                      nofail, notify_unmap);
> +        if (ret) {
> +            return ret;
> +        }
> +
> +next:
> +        iova = iova_next;
> +    }
> +
> +    return SMMU_TRANS_ERR_NONE;
> +}
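`level_shift()`, `level_page_mask()` and `iova_level_offset()` are not shown in this excerpt. Below is a minimal sketch of what they are assumed to compute, following the VMSAv8-64 translation table format: granule_sz is log2 of the granule size (12 for 4K, 16 for 64K), each table level resolves granule_sz - 3 address bits, so the shift of a level is granule_sz + (granule_sz - 3) * (3 - level). The `sketch_` names are hypothetical. With a 4K granule, a level 3 entry covers 4 KiB, a level 2 entry 2 MiB and a level 1 entry 1 GiB, which is the step the loop above uses to compute iova_next.

```c
#include <assert.h>
#include <stdint.h>

/* Number of IOVA bits below the index field of 'level' (levels 0..3). */
static unsigned sketch_level_shift(int level, unsigned granule_sz)
{
    assert(level >= 0 && level <= 3);
    return granule_sz + (granule_sz - 3) * (3 - level);
}

/* Mask keeping the IOVA bits above the region covered by one entry. */
static uint64_t sketch_level_page_mask(int level, unsigned granule_sz)
{
    return ~((1ULL << sketch_level_shift(level, granule_sz)) - 1);
}

/* Index of the entry for 'iova' within a table at 'level'. */
static uint32_t sketch_iova_level_offset(uint64_t iova, int level,
                                         unsigned granule_sz)
{
    uint64_t entries = 1ULL << (granule_sz - 3);

    return (uint32_t)((iova >> sketch_level_shift(level, granule_sz)) &
                      (entries - 1));
}
```

For example, with a 4K granule, IOVA 0x40201000 uses index 1 at each of levels 1, 2 and 3, and the next level 2 boundary after it is 0x40400000.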

Thanks,
Tomasz


* Re: [Qemu-devel] [RFC v5 0/8] ARM SMMUv3 Emulation Support
  2017-07-12 17:24   ` [Qemu-devel] [RFC v5 0/8] ARM SMMUv3 Emulation Support Geetha Akula
@ 2017-07-25 14:33     ` Auger Eric
  0 siblings, 0 replies; 22+ messages in thread
From: Auger Eric @ 2017-07-25 14:33 UTC (permalink / raw)
  To: Geetha Akula, eric.auger.pro, Alex Williamson, peter.maydell,
	qemu-arm, qemu-devel, prem.mallappa
  Cc: mohun106, drjones, tcain, Radha.Chintakuntla, Sunil.Goutham, mst,
	jean-philippe.brucker, tn, Will Deacon, Robin Murphy, peterx,
	Edgar E. Iglesias, Bharat Bhushan, Christoffer Dall

Hi Geetha, Tomasz,

On 12/07/2017 19:24, Geetha Akula wrote:
> Hi Eric
> 
> 
>> This series implements the emulation code for ARM SMMUv3.
>> This is the continuation of Prem's work [1].
>>
>> This v5 mainly brings VFIO integration in DT mode. On guest kernel
>> side, this requires a quirk [1] to force TLB invalidation on map.
>>
>> The following changes also are noticeable:
>> - fix SMMU_CMDQ_CONS offset
>> - adds dma-coherent dt property which fixes the unhandled command
>>   opcode bug.
>> - implements block PTE
>>
>> The smmu is instantiated when passing the smmu option to machvirt:
>> "-M virt-2.10,smmu"
>>
>> As I haven't split the code yet so that it can be easily reviewable
>> I don't expect deep reviews at this stage. Also the implementation may
>> be largely sub-optimal.
>>
>> Tested Use Cases:
>> - booted a guest in dt and acpi mode with an iommu_platform
>>   virtio-net-pci device (using dma ops). Tested with the following
>>   guest combinations: 4K page - 39 bit VA, 4K - 48b, 64K - 39b,
>>   64K - 48b.
> 
> Verified patches with virtio-net-pci. Not observed any command queue
> issues like in V4 patch series.
Thank you for your comments and sorry for the delay, I was off.

Good to hear it fixes the command queue issues.

> There is a huge difference in iperf numbers
> on guest with and without SMMUv3 emulation i.e (1.5 Gbps with viommu
> and 9.0 Gbps without viommu). I think this is expected behaviour.
Performance can definitely be improved by enabling the tlbi-on-map mode
for the guest SMMUv3 only when needed, as is done on x86 with the explicit
QEMU intel-iommu,caching-mode=on option.

> 
> Is there any plan to add device-iotlb and iotlb support in SMMUv3 emulation ?

What do both comprise exactly? Do you mean PCI ATS support, or the vhost
device IOTLB (i.e. the use case exercised by Tomasz)? Could you please
elaborate?

Thanks

Eric
> 
> 
> Thank you,
> Geetha.
> 
> 
>> - booted a guest (featuring [1]) with PCIe passthrough'ed PCIe devices:
>>   - AMD Overdrive and igbvf passthrough (using gsi direct mapping)
>>   - Cavium ThunderX and ixgbevf passthrough (using KVM MSI routing)
>>
>> Unfortunately I have not been able to run DPDK testpmd yet on guest side.
>> The problem I see is the user space driver dma-maps a huge area
>> and this causes plenty of CMDQ_OP_TLBI_NH_VA commands to be sent
>> (tlbi-on-map) which are sent for each page whereas the dma-map covers a
>> huge page. I will work on this issue for next version.
>>
>> Known limitations:
>> - no VMSAv8-32 suport
>> - no nested stage support (S1 + S2)
>> - no support for HYP mappings
>> - fine-grained register emulation, commands, interrupts and errors were
>>   not thoroughly tested. Handling is sufficient to run the use cases
>>   described above though.
>>
>> Best Regards
>>
>> Eric
>>
>> This series can be found at:
>> v5: https://github.com/eauger/qemu/tree/v2.9-SMMU-v5
>> v4: https://github.com/eauger/qemu/tree/v2.9-SMMU-v4
>>
>> References:
>> [1] [RFC 0/2] arm-smmu-v3 tlbi-on-map option
>> [2] Prem's last iteration:
>> - https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg03531.html
>>
>> History:
>> v4 -> v5:
>> - initial_level now part of SMMUTransCfg
>> - smmu_page_walk_64 takes into account the max input size
>> - implement sys->iommu_ops.replay and sys->iommu_ops.notify_flag_changed
>> - smmuv3_translate: bug fix: don't walk on bypass
>> - smmu_update_qreg: fix PROD index update
>> - I did not yet address Peter's comments as the code is not mature enough
>>   to be split into sub patches.
>>
>> v3 -> v4 [Eric]:
>> - page table walk rewritten to allow scan of the page table within a
>>   range of IOVA. This prepares for VFIO integration and replay.
>> - configuration parsing partially reworked.
>> - do not advertise unsupported/untested features: S2, S1 + S2, HYP,
>>   PRI, ATS, ..
>> - added ACPI table generation
>> - migrated to dynamic traces
>> - mingw compilation fix
>>
>> v2 -> v3 [Eric]:
>> - rebased on 2.9
>> - mostly code and patch reorganization to ease the review process
>> - optional patches removed. They may be handled separately. I am currently
>>   working on ACPI enablement.
>> - optional instantiation of the smmu in mach-virt
>> - removed [2/9] (fdt functions) since not mandated
>> - start splitting main patch into base and derived object
>> - no new function feature added
>>
>> v1 -> v2 [Prem]:
>> - Adopted review comments from Eric Auger
>>         - Make SMMU_DPRINTF to internally call qemu_log
>>             (since translation requests are too many, we need control
>>              on the type of log we want)
>>         - SMMUTransCfg modified to suit simplicity
>>         - Change RegInfo to uint64 register array
>>         - Code cleanup
>>         - Test cleanups
>> - Reshuffled patches
>>
>> v0 -> v1 [Prem]:
>> - As per SMMUv3 spec 16.0 (only is_ste_consistant() is noticeable)
>> - Reworked register access/update logic
>> - Factored out translation code for
>>         - single point bug fix
>>         - sharing/removal in future
>> - (optional) Unit tests added, with PCI test device
>>         - S1 with 4k/64k, S1+S2 with 4k/64k
>>         - (S1 or S2) only can be verified by Linux 4.7 driver
>>         - (optional) Preliminary ACPI support
>>
>> v0 [Prem]:
>> - Implements SMMUv3 spec 11.0
>> - Support for PCIe devices
>> - Command Queue and Event Queue supported
>> - LPAE only, S1 is supported and tested, S2 not tested
>> - BE mode Translation not supported
>> - IRQ support (legacy, no MSI)
>> - Tested with DPDK and e1000
>>
>>
>> Eric Auger (5):
>>   hw/arm/smmu-common: smmu base class
>>   hw/arm/virt: Add 2.10 machine type
>>   hw/arm/virt: Add tlbi-on-map property to the smmuv3 node
>>   target/arm/kvm: Translate the MSI doorbell in kvm_arch_fixup_msi_route
>>   hw/arm/smmuv3: VFIO integration
>>
>> Prem Mallappa (3):
>>   hw/arm/smmuv3: smmuv3 emulation model
>>   hw/arm/virt: Add SMMUv3 to the virt board
>>   hw/arm/virt-acpi-build: Add smmuv3 node in IORT table
>>
>>  default-configs/aarch64-softmmu.mak |    1 +
>>  hw/arm/Makefile.objs                |    1 +
>>  hw/arm/smmu-common.c                |  474 +++++++++++++
>>  hw/arm/smmu-internal.h              |   89 +++
>>  hw/arm/smmuv3-internal.h            |  651 ++++++++++++++++++
>>  hw/arm/smmuv3.c                     | 1256 +++++++++++++++++++++++++++++++++++
>>  hw/arm/trace-events                 |   54 ++
>>  hw/arm/virt-acpi-build.c            |   56 +-
>>  hw/arm/virt.c                       |  111 +++-
>>  include/hw/acpi/acpi-defs.h         |   15 +
>>  include/hw/arm/smmu-common.h        |  127 ++++
>>  include/hw/arm/smmuv3.h             |   87 +++
>>  include/hw/arm/virt.h               |    5 +
>>  target/arm/kvm.c                    |   28 +
>>  target/arm/trace-events             |    3 +
>>  15 files changed, 2949 insertions(+), 9 deletions(-)
>>  create mode 100644 hw/arm/smmu-common.c
>>  create mode 100644 hw/arm/smmu-internal.h
>>  create mode 100644 hw/arm/smmuv3-internal.h
>>  create mode 100644 hw/arm/smmuv3.c
>>  create mode 100644 include/hw/arm/smmu-common.h
>>  create mode 100644 include/hw/arm/smmuv3.h
>>
>> --
>> 2.5.5

* Re: [Qemu-devel] [RFC v5 2/8] hw/arm/smmuv3: smmuv3 emulation model
  2017-07-13 12:57   ` Tomasz Nowicki
@ 2017-07-27 20:25     ` Auger Eric
  0 siblings, 0 replies; 22+ messages in thread
From: Auger Eric @ 2017-07-27 20:25 UTC (permalink / raw)
  To: Tomasz Nowicki, eric.auger.pro, peter.maydell, qemu-arm,
	qemu-devel, alex.williamson, prem.mallappa
  Cc: drjones, christoffer.dall, Radha.Chintakuntla, Sunil.Goutham,
	mohun106, tcain, bharat.bhushan, mst, will.deacon,
	jean-philippe.brucker, robin.murphy, peterx, edgar.iglesias

Hi Tomasz,

On 13/07/2017 14:57, Tomasz Nowicki wrote:
> Hi Eric,
> 
> On 09.07.2017 22:51, Eric Auger wrote:
>> From: Prem Mallappa <prem.mallappa@broadcom.com>
>>
>> Introduces the SMMUv3 derived model. This is based on
>> System MMUv3 specification (v17).
>>
>> Signed-off-by: Prem Mallappa <prem.mallappa@broadcom.com>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>> v4 -> v5:
>> - change smmuv3_translate proto (IOMMUAccessFlags flag)
>> - has_stagex replaced by is_ste_stagex
>> - smmu_cfg_populate removed
>> - added smmuv3_decode_config and reworked error management
>> - rework the naming of IOMMU MRs
>> - fix SMMU_CMDQ_CONS offset
>>
> 
> [...]
> 
>> +
>> +static void smmu_update_qreg(SMMUV3State *s, SMMUQueue *q, hwaddr reg,
>> +                             uint32_t off, uint64_t val, unsigned size)
>> +{
>> +   if (size == 8 && off == 0) {
>> +        smmu_write64_reg(s, reg, val);
> 
> Based on my observation, we never get here.
> 
> If I read the code correctly,
> memory_region_dispatch_{write|read}()->memory_region_{write|read}_accessor()
> will cut all 8-byte accesses into 4-byte slices. However, the following
> change makes my SMMUv3 register handling happy:
> 
>  static const MemoryRegionOps smmu_mem_ops = {
>      .read = smmu_read_mmio,
>      .write = smmu_write_mmio,
>      .endianness = DEVICE_LITTLE_ENDIAN,
>      .valid = {
>          .min_access_size = 4,
>          .max_access_size = 8,
>      },
> +    .impl = {
> +        .min_access_size = 4,
> +        .max_access_size = 8,
> +    },
Yes indeed: without the .impl setting, access_with_adjusted_size() sets
access_size_max to 4, so only 4-byte accesses are performed.

Thanks a lot.

Best regards

Eric
>  };
> 
> Thanks,
> Tomasz
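For anyone following along, here is a simplified model of the splitting described above, assuming a little-endian device. As far as I can tell, QEMU's access_with_adjusted_size() treats an unset (0) .impl.min_access_size as 1 and an unset .impl.max_access_size as 4, so an 8-byte guest write reaches the device callback as two 4-byte slices. The names dispatch_write_sketch and SliceLog are hypothetical; this only reproduces the size arithmetic, not QEMU's actual dispatch code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical recorder standing in for a device MMIO write handler. */
typedef struct {
    unsigned sizes[8];
    uint64_t vals[8];
    unsigned n;
} SliceLog;

static void log_slice(SliceLog *log, uint64_t addr, uint64_t val, unsigned size)
{
    (void)addr;
    if (log->n < 8) {
        log->sizes[log->n] = size;
        log->vals[log->n] = val;
    }
    log->n++;
}

/*
 * Sketch of the size adjustment: impl_min/impl_max of 0 mean "unset" and
 * default to 1 and 4 bytes. Slice i carries bits [8*i, 8*i + 8*access_size)
 * of the value (little-endian device assumed).
 * Returns the number of handler invocations.
 */
static unsigned dispatch_write_sketch(uint64_t addr, uint64_t val, unsigned size,
                                      unsigned impl_min, unsigned impl_max,
                                      SliceLog *log)
{
    unsigned access_min = impl_min ? impl_min : 1;
    unsigned access_max = impl_max ? impl_max : 4;
    unsigned access_size = size < access_max ? size : access_max;
    uint64_t mask;
    unsigned calls = 0;

    if (access_size < access_min) {
        access_size = access_min;
    }
    mask = access_size == 8 ? ~0ULL : (1ULL << (access_size * 8)) - 1;
    for (unsigned i = 0; i < size; i += access_size) {
        log_slice(log, addr + i, (val >> (i * 8)) & mask, access_size);
        calls++;
    }
    return calls;
}
```

With impl_max left at 0, an 8-byte write of 0x123456789abcdef0 produces two calls (0x9abcdef0, then 0x12345678); with impl_max = 8, as in the hunk above, it produces a single 8-byte call, which is what makes the size == 8 branch of smmu_update_qreg() reachable.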

* Re: [Qemu-devel] [RFC v5 2/8] hw/arm/smmuv3: smmuv3 emulation model
  2017-07-13 12:00   ` Tomasz Nowicki
@ 2017-07-27 20:26     ` Auger Eric
  0 siblings, 0 replies; 22+ messages in thread
From: Auger Eric @ 2017-07-27 20:26 UTC (permalink / raw)
  To: Tomasz Nowicki, eric.auger.pro, peter.maydell, qemu-arm,
	qemu-devel, alex.williamson, prem.mallappa
  Cc: drjones, christoffer.dall, Radha.Chintakuntla, Sunil.Goutham,
	mohun106, tcain, bharat.bhushan, mst, will.deacon,
	jean-philippe.brucker, robin.murphy, peterx, edgar.iglesias

Hi Tomasz,

On 13/07/2017 14:00, Tomasz Nowicki wrote:
> Hi Eric,
> 
> On 09.07.2017 22:51, Eric Auger wrote:
>> From: Prem Mallappa <prem.mallappa@broadcom.com>
>>
>> Introduces the SMMUv3 derived model. This is based on
>> System MMUv3 specification (v17).
>>
>> Signed-off-by: Prem Mallappa <prem.mallappa@broadcom.com>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>> v4 -> v5:
>> - change smmuv3_translate proto (IOMMUAccessFlags flag)
>> - has_stagex replaced by is_ste_stagex
>> - smmu_cfg_populate removed
>> - added smmuv3_decode_config and reworked error management
>> - rework the naming of IOMMU MRs
>> - fix SMMU_CMDQ_CONS offset
>>
> 
> [...]
> 
>> +
>> +/*****************************
>> + *  Register Access Primitives
>> + *****************************/
>> +
>> +static inline void smmu_write64_reg(SMMUV3State *s, uint32_t addr,
>> +                                    uint64_t val)
>> +{
>> +    addr >>= 2;
>> +    s->regs[addr] = val & 0xFFFFFFFFULL;
>> +    s->regs[addr + 1] = val & ~0xFFFFFFFFULL;
>> +}
>> +
>> +static inline void smmu_write_reg(SMMUV3State *s, uint32_t addr,
>> +                                  uint64_t val)
>> +{
>> +    s->regs[addr >> 2] = val;
>> +}
>> +
>> +static inline uint32_t smmu_read_reg(SMMUV3State *s, uint32_t addr)
>> +{
>> +    return s->regs[addr >> 2];
>> +}
>> +
>> +static inline uint64_t smmu_read64_reg(SMMUV3State *s, uint32_t addr)
>> +{
>> +    addr >>= 2;
>> +    return s->regs[addr] | (s->regs[addr + 1] << 32);
> 
> To be consistent with smmu_write64_reg(), we should not shift the second
> half of the register here; instead simply:
> 
> return s->regs[addr] | s->regs[addr + 1];

Thanks for spotting this. I think regs should be uint32_t instead; then
extract64() could be used in smmu_write64_reg() while the shift would stay
in smmu_read64_reg().
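A minimal sketch of that suggestion, assuming regs becomes a uint32_t array: extract64() splits the value on write, and the shift stays on read, with a widening cast since shifting a 32-bit unsigned int by 32 would be undefined. The extract64() below is a local stand-in mirroring the semantics of QEMU's helper in include/qemu/bitops.h, and SMMUV3StateSketch is a hypothetical cut-down state, not the real SMMUV3State.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in mirroring QEMU's extract64() (include/qemu/bitops.h). */
static inline uint64_t extract64(uint64_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 64 - start);
    return (value >> start) & (~0ULL >> (64 - length));
}

/* Hypothetical cut-down device state: only a 32-bit register file. */
typedef struct {
    uint32_t regs[64];
} SMMUV3StateSketch;

static inline void smmu_write64_reg(SMMUV3StateSketch *s, uint32_t addr,
                                    uint64_t val)
{
    addr >>= 2;                                 /* byte offset -> 32-bit slot */
    s->regs[addr] = extract64(val, 0, 32);      /* low word */
    s->regs[addr + 1] = extract64(val, 32, 32); /* high word, moved down */
}

static inline uint64_t smmu_read64_reg(SMMUV3StateSketch *s, uint32_t addr)
{
    addr >>= 2;
    /* Widen before shifting: shifting a 32-bit unsigned int by 32 is undefined. */
    return s->regs[addr] | ((uint64_t)s->regs[addr + 1] << 32);
}
```

Writing 0x123456789abcdef0 at offset 0x90 stores 0x9abcdef0 and 0x12345678 in two consecutive 32-bit slots, and smmu_read64_reg() returns the original value, so write and read are symmetric.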

Regards

Eric
> 
> Thanks,
> Tomasz

* Re: [Qemu-devel] [RFC v5 1/8] hw/arm/smmu-common: smmu base class
  2017-07-25 12:12   ` Tomasz Nowicki
@ 2017-07-27 20:28     ` Auger Eric
  0 siblings, 0 replies; 22+ messages in thread
From: Auger Eric @ 2017-07-27 20:28 UTC (permalink / raw)
  To: Tomasz Nowicki, eric.auger.pro, peter.maydell, qemu-arm,
	qemu-devel, alex.williamson, prem.mallappa
  Cc: drjones, christoffer.dall, Radha.Chintakuntla, Sunil.Goutham,
	mohun106, tcain, bharat.bhushan, mst, will.deacon,
	jean-philippe.brucker, robin.murphy, peterx, edgar.iglesias

Hi Tomasz,

On 25/07/2017 14:12, Tomasz Nowicki wrote:
> Hi Eric,
> 
> I found out what is going on regarding the vhost-net outgoing packets'
> payload corruption. My packets were corrupted because of an inconsistent
> IOVA to HVA translation in the IOTLB. Please see below.
> 
> On 09.07.2017 22:51, Eric Auger wrote:
>> Introduces the base device and class for the ARM smmu.
>> Implements VMSAv8-64 table lookup and translation. VMSAv8-32
>> is not implemented.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> Signed-off-by: Prem Mallappa <prem.mallappa@broadcom.com>
>>
>> ---
> 
> [...]
> 
>> +
>> +/**
>> + * smmu_page_walk_level_64 - Walk an IOVA range from a specific level
>> + * @baseaddr: table base address corresponding to @level
>> + * @level: level
>> + * @cfg: translation config
>> + * @start: start of the IOVA range
>> + * @end: end of the IOVA range
>> + * @hook_fn: the hook to be called for each detected area
>> + * @private: private data for the hook function
>> + * @read: whether parent level has read permission
>> + * @write: whether parent level has write permission
>> + * @nofail: indicates whether each iova of the range
>> + *  must be translated or whether failure is allowed
>> + * @notify_unmap: whether we should notify invalid entries
>> + *
>> + * Return 0 on success, < 0 on errors not related to translation
>> + * process, > 1 on errors related to translation process (only
>> + * if nofail is set)
>> + */
>> +static int
>> +smmu_page_walk_level_64(dma_addr_t baseaddr, int level,
>> +                        SMMUTransCfg *cfg, uint64_t start, uint64_t end,
>> +                        smmu_page_walk_hook hook_fn, void *private,
>> +                        bool read, bool write, bool nofail,
>> +                        bool notify_unmap)
>> +{
>> +    uint64_t subpage_size, subpage_mask, pte, iova = start;
>> +    bool read_cur, write_cur, entry_valid;
>> +    int ret, granule_sz, stage;
>> +    IOMMUTLBEntry entry;
>> +
>> +    granule_sz = cfg->granule_sz;
>> +    stage = cfg->stage;
>> +    subpage_size = 1ULL << level_shift(level, granule_sz);
>> +    subpage_mask = level_page_mask(level, granule_sz);
>> +
>> +    trace_smmu_page_walk_level_in(level, baseaddr, granule_sz,
>> +                                  start, end, subpage_size);
>> +
>> +    while (iova < end) {
>> +        dma_addr_t next_table_baseaddr;
>> +        uint64_t iova_next, pte_addr;
>> +        uint32_t offset;
>> +
>> +        iova_next = (iova & subpage_mask) + subpage_size;
>> +        offset = iova_level_offset(iova, level, granule_sz);
>> +        pte_addr = baseaddr + offset * sizeof(pte);
>> +        pte = get_pte(baseaddr, offset);
>> +
>> +        trace_smmu_page_walk_level(level, iova, subpage_size,
>> +                                   baseaddr, offset, pte);
>> +
>> +        if (pte == (uint64_t)-1) {
>> +            if (nofail) {
>> +                return SMMU_TRANS_ERR_WALK_EXT_ABRT;
>> +            }
>> +            goto next;
>> +        }
>> +        if (is_invalid_pte(pte) || is_reserved_pte(pte, level)) {
>> +            trace_smmu_page_walk_level_res_invalid_pte(stage, level, baseaddr,
>> +                                                       pte_addr, offset, pte);
>> +            if (nofail) {
>> +                return SMMU_TRANS_ERR_WALK_EXT_ABRT;
>> +            }
>> +            goto next;
>> +        }
> 
> 
> vhost maintains its own IOTLB cache and, when there is no IOVA->HVA
> translation, it asks QEMU for help. However, IOTLB entry invalidations
> are initiated by the guest, so for any DMA unmap on the guest side we trap
> into the SMMU emulation code and call:
> smmu_notify_all -> smmuv3_replay_single -> smmu_page_walk_64 ->
> smmu_page_walk_level_64 -> smmuv3_replay_hook -> vhost_iommu_unmap_notify
> 
> The thing is that smmuv3_replay_hook() is never called, because the guest
> zeroes the PTE before we trap to QEMU, so smmu_page_walk_level_64() bails
> out on the ^^^ is_invalid_pte(pte) check. This way the old IOTLB entry is
> kept in vhost and subsequent translations may be broken.

Thank you for the time you spent on this. I will work on this vhost use
case asap and will let you know.
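To make the failure mode concrete, here is a toy model (all names hypothetical, not QEMU or vhost code). A walk-based invalidation only notifies pages whose PTE is still valid, so a page the guest has already zeroed keeps its stale device-side IOTLB entry. Notifying an unmap for the whole invalidated IOVA range drops the entry regardless of the current PTE contents. Range-based unmap notification is only one possible direction, assumed here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NPAGES 16

/* Hypothetical guest page table: one PTE per page, 0 means invalid. */
typedef struct {
    uint64_t pte[NPAGES];
} GuestPT;

/* Hypothetical device-side (vhost-like) IOTLB: cached translation per page. */
typedef struct {
    bool valid[NPAGES];
    uint64_t out[NPAGES];
} DevIOTLB;

/* Walk-based replay: only pages whose PTE is still valid get a notification. */
static void invalidate_by_walk(const GuestPT *pt, DevIOTLB *c,
                               unsigned first, unsigned n)
{
    for (unsigned p = first; p < first + n && p < NPAGES; p++) {
        if (pt->pte[p] == 0) {
            continue;           /* PTE already zeroed: nothing is notified */
        }
        c->valid[p] = true;     /* map notification refreshes the entry */
        c->out[p] = pt->pte[p];
    }
}

/* Range-based: notify an unmap for every page of the invalidated range. */
static void invalidate_range(DevIOTLB *c, unsigned first, unsigned n)
{
    for (unsigned p = first; p < first + n && p < NPAGES; p++) {
        c->valid[p] = false;    /* device must re-translate on next access */
    }
}
```

If the guest zeroes pte[3] and then invalidates page 3, invalidate_by_walk() leaves the cached entry valid (the stale case), while invalidate_range() clears it.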

Thanks

Eric
> 
>> +
>> +        read_cur = read; /* TODO */
>> +        write_cur = write; /* TODO */
>> +        entry_valid = read_cur | write_cur; /* TODO */
>> +
>> +        if (is_page_pte(pte, level)) {
>> +            uint64_t gpa = get_page_pte_address(pte, granule_sz);
>> +            int perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
>> +
>> +            trace_smmu_page_walk_level_page_pte(stage, level, entry.iova,
>> +                                                baseaddr, pte_addr, pte, gpa);
>> +            if (!entry_valid && !notify_unmap) {
>> +                printf("%s entry_valid=%d notify_unmap=%d\n", __func__,
>> +                       entry_valid, notify_unmap);
>> +                goto next;
>> +            }
>> +            ret = call_entry_hook(iova, subpage_mask, gpa, perm,
>> +                                  hook_fn, private);
>> +            if (ret) {
>> +                return ret;
>> +            }
>> +            goto next;
>> +        }
>> +        if (is_block_pte(pte, level)) {
>> +            uint64_t block_size;
>> +            hwaddr gpa = get_block_pte_address(pte, level, granule_sz,
>> +                                               &block_size);
>> +            int perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
>> +
>> +            if (gpa == -1) {
>> +                if (nofail) {
>> +                    return SMMU_TRANS_ERR_WALK_EXT_ABRT;
>> +                } else {
>> +                    goto next;
>> +                }
>> +            }
>> +            trace_smmu_page_walk_level_block_pte(stage, level, baseaddr,
>> +                                                 pte_addr, pte, iova, gpa,
>> +                                                 (int)(block_size >> 20));
>> +
>> +            ret = call_entry_hook(iova, subpage_mask, gpa, perm,
>> +                                  hook_fn, private);
>> +            if (ret) {
>> +                return ret;
>> +            }
>> +            goto next;
>> +        }
>> +        if (level  == 3) {
>> +            goto next;
>> +        }
>> +        /* table pte */
>> +        next_table_baseaddr = get_table_pte_address(pte, granule_sz);
>> +        trace_smmu_page_walk_level_table_pte(stage, level, baseaddr, pte_addr,
>> +                                             pte, next_table_baseaddr);
>> +        ret = smmu_page_walk_level_64(next_table_baseaddr, level + 1, cfg,
>> +                                      iova, MIN(iova_next, end),
>> +                                      hook_fn, private, read_cur, write_cur,
>> +                                      nofail, notify_unmap);
>> +        if (!ret) {
>> +            return ret;
>> +        }
>> +
>> +next:
>> +        iova = iova_next;
>> +    }
>> +
>> +    return SMMU_TRANS_ERR_NONE;
>> +}
> 
> Thanks,
> Tomasz
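For readers following the walk above, the per-level index arithmetic it relies on can be sketched as follows, assuming granule_sz is log2 of the translation granule (12 for 4K, 16 for 64K) and 8-byte descriptors, so that each VMSAv8-64 level resolves granule_sz - 3 bits of the IOVA. The sketch_* helpers are hypothetical and are not taken from the patch's level_shift() or iova_level_offset().

```c
#include <assert.h>
#include <stdint.h>

/* Bits resolved per level: a table holds 2^(granule_sz - 3) 8-byte descriptors. */
static inline int sketch_bits_per_level(int granule_sz)
{
    return granule_sz - 3;
}

/* Shift of the lowest IOVA bit translated at @level (level 3 maps pages). */
static inline int sketch_level_shift(int level, int granule_sz)
{
    return granule_sz + (3 - level) * sketch_bits_per_level(granule_sz);
}

/* Size covered by one descriptor at @level (page or block size). */
static inline uint64_t sketch_level_size(int level, int granule_sz)
{
    return 1ULL << sketch_level_shift(level, granule_sz);
}

/* Index of the descriptor for @iova within the table at @level. */
static inline uint32_t sketch_iova_level_offset(uint64_t iova, int level,
                                                int granule_sz)
{
    return (iova >> sketch_level_shift(level, granule_sz)) &
           ((1ULL << sketch_bits_per_level(granule_sz)) - 1);
}
```

With a 4K granule, this gives shifts of 39/30/21/12 for levels 0 to 3 (so a level 2 block is 2 MiB). With a 64K granule, level 2 uses a shift of 29 (512 MiB blocks).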

* Re: [Qemu-devel] [RFC v5 0/8] ARM SMMUv3 Emulation Support
  2017-07-09 20:51 [Qemu-devel] [RFC v5 0/8] ARM SMMUv3 Emulation Support Eric Auger
                   ` (9 preceding siblings ...)
  2017-07-14  7:19 ` Tomasz Nowicki
@ 2017-08-01 11:01 ` Tomasz Nowicki
  2017-08-01 13:07   ` Auger Eric
  10 siblings, 1 reply; 22+ messages in thread
From: Tomasz Nowicki @ 2017-08-01 11:01 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, peter.maydell, qemu-arm, qemu-devel,
	alex.williamson, prem.mallappa
  Cc: drjones, christoffer.dall, Radha.Chintakuntla, Sunil.Goutham,
	mohun106, tcain, bharat.bhushan, mst, will.deacon,
	jean-philippe.brucker, robin.murphy, peterx, edgar.iglesias,
	Nair, Jayachandran

Hi Eric,

Just letting you know that I am facing another issue with the following 
setup:
1. host (4.12 kernel & 64K page) and VM (4.12 kernel & 64K page)
2. QEMU + -netdev type=tap,ifname=tap,id=net0 -device virtio-net-pci,netdev=net0,iommu_platform,disable-modern=off,disable-legacy=on
3. On VM, I allocate some huge pages and run DPDK testpmd app:
# echo 4 > /sys/kernel/mm/hugepages/hugepages-524288kB/nr_hugepages
# ./dpdk/usertools/dpdk-devbind.py -b vfio-pci  0000:00:02.0
# ./dpdk/build/app/testpmd -l 0-13 -n 4 -w 0000:00:02.0 -- --disable-hw-vlan-filter --disable-rss -i
EAL: Detected 14 lcore(s)
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:00:02.0 on NUMA socket -1
EAL:   probe driver: 1af4:1041 net_virtio
EAL:   using IOMMU type 1 (Type 1)
EAL: iommu_map_dma vaddr ffff20000000 size 80000000 iova 120000000
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (0)
EAL: Can't write to PCI bar (0) : offset (4)
EAL: Can't write to PCI bar (0) : offset (14)
EAL: Can't write to PCI bar (0) : offset (e)
EAL: Can't read from PCI bar (0) : offset (c)
EAL: Requested device 0000:00:02.0 cannot be used
EAL: No probed ethernet devices
Interactive-mode selected
USER1: create a new mbuf pool <mbuf_pool_socket_0>: n=251456, size=2176, socket=0

When the VM uses *4K pages*, the same setup works fine. I will work on this,
but please let me know in case you already know what is going on.

Thanks,
Tomasz


* Re: [Qemu-devel] [RFC v5 0/8] ARM SMMUv3 Emulation Support
  2017-08-01 11:01 ` Tomasz Nowicki
@ 2017-08-01 13:07   ` Auger Eric
  2017-08-03 10:11     ` Tomasz Nowicki
  0 siblings, 1 reply; 22+ messages in thread
From: Auger Eric @ 2017-08-01 13:07 UTC (permalink / raw)
  To: Tomasz Nowicki, eric.auger.pro, peter.maydell, qemu-arm,
	qemu-devel, alex.williamson, prem.mallappa
  Cc: drjones, christoffer.dall, Radha.Chintakuntla, Sunil.Goutham,
	mohun106, tcain, bharat.bhushan, mst, will.deacon,
	jean-philippe.brucker, robin.murphy, peterx, edgar.iglesias,
	Nair, Jayachandran

Hi Tomasz,
On 01/08/2017 13:01, Tomasz Nowicki wrote:
> Hi Eric,
> 
> Just letting you know that I am facing another issue with the following
> setup:
> 1. host (4.12 kernel & 64K page) and VM (4.12 kernel & 64K page)
> 2. QEMU + -netdev type=tap,ifname=tap,id=net0 -device virtio-net-pci,netdev=net0,iommu_platform,disable-modern=off,disable-legacy=on
> 3. On VM, I allocate some huge pages and run DPDK testpmd app:
> # echo 4 > /sys/kernel/mm/hugepages/hugepages-524288kB/nr_hugepages
> # ./dpdk/usertools/dpdk-devbind.py -b vfio-pci  0000:00:02.0
> # ./dpdk/build/app/testpmd -l 0-13 -n 4 -w 0000:00:02.0 -- --disable-hw-vlan-filter --disable-rss -i
> EAL: Detected 14 lcore(s)
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL: PCI device 0000:00:02.0 on NUMA socket -1
> EAL:   probe driver: 1af4:1041 net_virtio
> EAL:   using IOMMU type 1 (Type 1)
> EAL: iommu_map_dma vaddr ffff20000000 size 80000000 iova 120000000
> EAL: Can't write to PCI bar (0) : offset (12)
> EAL: Can't read from PCI bar (0) : offset (12)
> EAL: Can't read from PCI bar (0) : offset (12)
> EAL: Can't write to PCI bar (0) : offset (12)
> EAL: Can't read from PCI bar (0) : offset (12)
> EAL: Can't write to PCI bar (0) : offset (12)
> EAL: Can't read from PCI bar (0) : offset (0)
> EAL: Can't write to PCI bar (0) : offset (4)
> EAL: Can't write to PCI bar (0) : offset (14)
> EAL: Can't write to PCI bar (0) : offset (e)
> EAL: Can't read from PCI bar (0) : offset (c)
> EAL: Requested device 0000:00:02.0 cannot be used
> EAL: No probed ethernet devices
> Interactive-mode selected
> USER1: create a new mbuf pool <mbuf_pool_socket_0>: n=251456, size=2176, socket=0
> 
> When the VM uses *4K pages*, the same setup works fine. I will work on this,
> but please let me know in case you already know what is going on.

No, I did not hit that one. I was able to launch testpmd without such
early messages. However, I assigned an igbvf device to the guest and then
to DPDK; I have never tested your configuration.

However, as stated in my cover letter, DPDK is not working for me at the
moment because of storms of tlbi-on-map invalidations. I intend to work on
this as soon as I get some bandwidth, sorry.
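To put a rough number on those storms, assume the guest driver issues one CMDQ_OP_TLBI_NH_VA per granule-sized page, as the cover letter describes. Then the command count is the number of pages spanned by the mapping. For the mapping in the log quoted above (iova 0x120000000, size 0x80000000, i.e. 2 GiB), the hypothetical helper below gives 524288 commands with a 4K granule, 32768 with 64K, and only 4 if one invalidation per 512 MiB huge page were enough.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of @page_size pages touched by [iova, iova + size), i.e. the number
 * of per-page invalidations needed to cover that range.
 * Assumes page_size is a power of two and size > 0.
 */
static uint64_t pages_spanned(uint64_t iova, uint64_t size, uint64_t page_size)
{
    uint64_t first = iova & ~(page_size - 1);
    uint64_t last = (iova + size - 1) & ~(page_size - 1);

    return (last - first) / page_size + 1;
}
```

Rounding both ends to page boundaries means an unaligned range is also counted correctly. For example, 0x1000 bytes starting at 0x1800 touch two 4K pages.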

Thanks

Eric
> 
> Thanks,
> Tomasz
> 
> 
> On 09.07.2017 22:51, Eric Auger wrote:
>> This series implements the emulation code for ARM SMMUv3.
>> This is the continuation of Prem's work [2].
>>
>> This v5 mainly brings VFIO integration in DT mode. On guest kernel
>> side, this requires a quirk [1] to force TLB invalidation on map.
>>
>> The following changes also are noticeable:
>> - fix SMMU_CMDQ_CONS offset
>> - adds dma-coherent dt property which fixes the unhandled command
>>    opcode bug.
>> - implements block PTE
>>
>> The smmu is instantiated when passing the smmu option to machvirt:
>> "-M virt-2.10,smmu"
>>
>> As I haven't split the code yet so that it can be easily reviewable
>> I don't expect deep reviews at this stage. Also the implementation may
>> be largely sub-optimal.
>>
>> Tested Use Cases:
>> - booted a guest in dt and acpi mode with an iommu_platform
>>    virtio-net-pci device (using dma ops). Tested with the following
>>    guest combinations: 4K page - 39 bit VA, 4K - 48b, 64K - 39b,
>>    64K - 48b.
>> - booted a guest (featuring [1]) with PCIe passthrough'ed PCIe devices:
>>    - AMD Overdrive and igbvf passthrough (using gsi direct mapping)
>>    - Cavium ThunderX and ixgbevf passthrough (using KVM MSI routing)
>>
>> Unfortunately I have not been able to run DPDK testpmd yet on the guest side.
>> The problem I see is that the user space driver dma-maps a huge area,
>> which causes plenty of CMDQ_OP_TLBI_NH_VA commands to be sent
>> (tlbi-on-map): one is sent for each page even though the dma-map covers
>> a huge page. I will work on this issue for the next version.
>>
>> Known limitations:
>> - no VMSAv8-32 support
>> - no nested stage support (S1 + S2)
>> - no support for HYP mappings
>> - fine-grained register emulation, commands, interrupts and errors were
>>    not thoroughly tested. Handling is sufficient to run the use cases
>>    described above though.
>>
>> Best Regards
>>
>> Eric
>>
>> This series can be found at:
>> v5: https://github.com/eauger/qemu/tree/v2.9-SMMU-v5
>> v4: https://github.com/eauger/qemu/tree/v2.9-SMMU-v4
>>
>> References:
>> [1] [RFC 0/2] arm-smmu-v3 tlbi-on-map option
>> [2] Prem's last iteration:
>> - https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg03531.html
>>
>> History:
>> v4 -> v5:
>> - initial_level now part of SMMUTransCfg
>> - smmu_page_walk_64 takes into account the max input size
>> - implement sys->iommu_ops.replay and sys->iommu_ops.notify_flag_changed
>> - smmuv3_translate: bug fix: don't walk on bypass
>> - smmu_update_qreg: fix PROD index update
>> - I did not yet address Peter's comments as the code is not mature enough
>>    to be split into sub patches.
>>
>> v3 -> v4 [Eric]:
>> - page table walk rewritten to allow scan of the page table within a
>>    range of IOVA. This prepares for VFIO integration and replay.
>> - configuration parsing partially reworked.
>> - do not advertise unsupported/untested features: S2, S1 + S2, HYP,
>>    PRI, ATS, ..
>> - added ACPI table generation
>> - migrated to dynamic traces
>> - mingw compilation fix
>>
>> v2 -> v3 [Eric]:
>> - rebased on 2.9
>> - mostly code and patch reorganization to ease the review process
>> - optional patches removed. They may be handled separately. I am currently
>>    working on ACPI enablement.
>> - optional instantiation of the smmu in mach-virt
>> - removed [2/9] (fdt functions) since not mandated
>> - start splitting main patch into base and derived object
>> - no new function feature added
>>
>> v1 -> v2 [Prem]:
>> - Adopted review comments from Eric Auger
>>          - Make SMMU_DPRINTF internally call qemu_log
>>              (translation requests are so frequent that we need control
>>               over the type of log we want)
>>          - SMMUTransCfg simplified
>>          - Change RegInfo to uint64 register array
>>          - Code cleanup
>>          - Test cleanups
>> - Reshuffled patches
>>
>> v0 -> v1 [Prem]:
>> - As per SMMUv3 spec 16.0 (only is_ste_consistant() is noticeable)
>> - Reworked register access/update logic
>> - Factored out translation code for
>>          - single point bug fix
>>          - sharing/removal in future
>> - (optional) Unit tests added, with PCI test device
>>          - S1 with 4k/64k, S1+S2 with 4k/64k
>>          - only S1-only or S2-only setups can be verified with the Linux 4.7 driver
>>          - (optional) Preliminary ACPI support
>>
>> v0 [Prem]:
>> - Implements SMMUv3 spec 11.0
>> - Supports PCIe devices
>> - Command Queue and Event Queue supported
>> - LPAE only, S1 is supported and tested, S2 not tested
>> - BE mode Translation not supported
>> - IRQ support (legacy, no MSI)
>> - Tested with DPDK and e1000
>>
>>
>> Eric Auger (5):
>>    hw/arm/smmu-common: smmu base class
>>    hw/arm/virt: Add 2.10 machine type
>>    hw/arm/virt: Add tlbi-on-map property to the smmuv3 node
>>    target/arm/kvm: Translate the MSI doorbell in kvm_arch_fixup_msi_route
>>    hw/arm/smmuv3: VFIO integration
>>
>> Prem Mallappa (3):
>>    hw/arm/smmuv3: smmuv3 emulation model
>>    hw/arm/virt: Add SMMUv3 to the virt board
>>    hw/arm/virt-acpi-build: Add smmuv3 node in IORT table
>>
>>   default-configs/aarch64-softmmu.mak |    1 +
>>   hw/arm/Makefile.objs                |    1 +
>>   hw/arm/smmu-common.c                |  474 +++++++++++++
>>   hw/arm/smmu-internal.h              |   89 +++
>>   hw/arm/smmuv3-internal.h            |  651 ++++++++++++++++++
>>   hw/arm/smmuv3.c                     | 1256 +++++++++++++++++++++++++++++++++++
>>   hw/arm/trace-events                 |   54 ++
>>   hw/arm/virt-acpi-build.c            |   56 +-
>>   hw/arm/virt.c                       |  111 +++-
>>   include/hw/acpi/acpi-defs.h         |   15 +
>>   include/hw/arm/smmu-common.h        |  127 ++++
>>   include/hw/arm/smmuv3.h             |   87 +++
>>   include/hw/arm/virt.h               |    5 +
>>   target/arm/kvm.c                    |   28 +
>>   target/arm/trace-events             |    3 +
>>   15 files changed, 2949 insertions(+), 9 deletions(-)
>>   create mode 100644 hw/arm/smmu-common.c
>>   create mode 100644 hw/arm/smmu-internal.h
>>   create mode 100644 hw/arm/smmuv3-internal.h
>>   create mode 100644 hw/arm/smmuv3.c
>>   create mode 100644 include/hw/arm/smmu-common.h
>>   create mode 100644 include/hw/arm/smmuv3.h
>>


* Re: [Qemu-devel] [RFC v5 0/8] ARM SMMUv3 Emulation Support
  2017-08-01 13:07   ` Auger Eric
@ 2017-08-03 10:11     ` Tomasz Nowicki
  2017-08-03 11:15       ` Auger Eric
  0 siblings, 1 reply; 22+ messages in thread
From: Tomasz Nowicki @ 2017-08-03 10:11 UTC (permalink / raw)
  To: Auger Eric, eric.auger.pro, peter.maydell, qemu-arm, qemu-devel,
	alex.williamson, prem.mallappa
  Cc: drjones, christoffer.dall, Radha.Chintakuntla, Sunil.Goutham,
	mohun106, tcain, bharat.bhushan, mst, will.deacon,
	jean-philippe.brucker, robin.murphy, peterx, edgar.iglesias,
	Nair, Jayachandran

Hi Eric,

On 01.08.2017 15:07, Auger Eric wrote:
> Hi Tomasz,
> On 01/08/2017 13:01, Tomasz Nowicki wrote:
>> Hi Eric,
>>
>> Just letting you know that I am facing another issue with the following
>> setup:
>> 1. host (4.12 kernel & 64K page) and VM (4.12 kernel & 64K page)
>> 2. QEMU + -netdev type=tap,ifname=tap,id=net0 -device
>> virtio-net-pci,netdev=net0,iommu_platform,disable-modern=off,disable-legacy=on
>>
>> 3. On the VM, I allocate some huge pages and run the DPDK testpmd app:
>> # echo 4 > /sys/kernel/mm/hugepages/hugepages-524288kB/nr_hugepages
>> # ./dpdk/usertools/dpdk-devbind.py -b vfio-pci  0000:00:02.0
>> # ./dpdk/build/app/testpmd -l 0-13 -n 4 -w 0000:00:02.0 -- --disable-hw-vlan-filter --disable-rss -i
>> EAL: Detected 14 lcore(s)
>> EAL: Probing VFIO support...
>> EAL: VFIO support initialized
>> EAL: PCI device 0000:00:02.0 on NUMA socket -1
>> EAL:   probe driver: 1af4:1041 net_virtio
>> EAL:   using IOMMU type 1 (Type 1)
>> EAL: iommu_map_dma vaddr ffff20000000 size 80000000 iova 120000000
>> EAL: Can't write to PCI bar (0) : offset (12)
>> EAL: Can't read from PCI bar (0) : offset (12)
>> EAL: Can't read from PCI bar (0) : offset (12)
>> EAL: Can't write to PCI bar (0) : offset (12)
>> EAL: Can't read from PCI bar (0) : offset (12)
>> EAL: Can't write to PCI bar (0) : offset (12)
>> EAL: Can't read from PCI bar (0) : offset (0)
>> EAL: Can't write to PCI bar (0) : offset (4)
>> EAL: Can't write to PCI bar (0) : offset (14)
>> EAL: Can't write to PCI bar (0) : offset (e)
>> EAL: Can't read from PCI bar (0) : offset (c)
>> EAL: Requested device 0000:00:02.0 cannot be used
>> EAL: No probed ethernet devices
>> Interactive-mode selected
>> USER1: create a new mbuf pool <mbuf_pool_socket_0>: n=251456, size=2176, socket=0
>>
>> When VM uses *4K pages* the same setup works fine. I will work on this
>> but please let me know in case you already know what is going on.
> 
> No, I did not face that one. I was able to launch testpmd without such
> early messages. However, I assigned an igbvf device to the guest and then
> to DPDK; I have never tested your config.
> 
> As stated in my cover letter, DPDK is currently not working for me
> because of tlbi-on-map storms. I intend to work on this as soon as I get
> some bandwidth, sorry.
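To put the tlbi-on-map storms into numbers: the testpmd run above maps size 0x80000000 (2 GiB, which matches the 4 x 524288 kB hugepages reserved beforehand). Assuming one invalidation command is issued per page of the mapping, the command count is simply the mapping size divided by the page size. tlbi_cmds_for_map() below is a hypothetical helper for that arithmetic only; it does not model the SMMU command queue.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative arithmetic only: number of per-page invalidation commands
 * needed to cover a mapping of `size` bytes with pages of `page_size` bytes.
 */
static uint64_t tlbi_cmds_for_map(uint64_t size, uint64_t page_size)
{
    /* Page sizes are powers of two. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    /* Round up so a partial trailing page still costs one command. */
    return (size + page_size - 1) / page_size;
}
```

For the 2^31-byte mapping this gives 524288 commands with 4K pages and 32768 with 64K pages, each of which the emulated SMMU has to process.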

I found the reason for the failure.

QEMU creates the BARs for a VIRTIO PCI device, and their size depends on
what the VIRTIO protocol needs. In my case the BAR is 16K, which is too
small to be mmap-able by a kernel using 64K pages: in
vfio_pci_enable() -> vfio_pci_probe_mmaps(), the guest kernel checks
whether the BAR size is smaller than the current PAGE_SIZE and, if so,
clears the VFIO_REGION_INFO_FLAG_MMAP flag, which prevents the BAR from
being mmapped later on. I added -device virtio-net-pci,...,page-per-vq=on
to enlarge the BAR to 8M, and now testpmd works fine. I wonder how the
same setup works with e.g. Intel or AMD IOMMU.

Thanks,
Tomasz
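Tomasz's diagnosis reduces to a size comparison between the BAR and the guest kernel's PAGE_SIZE. The sketch below models only that comparison, assuming the rule is exactly "BAR size >= PAGE_SIZE"; bar_mmap_allowed() is a hypothetical name, not the kernel function, and the real vfio-pci code may handle additional cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified, hypothetical model: a BAR smaller than the kernel's PAGE_SIZE
 * is not reported as mmap-able (VFIO_REGION_INFO_FLAG_MMAP stays clear).
 */
static bool bar_mmap_allowed(uint64_t bar_size, uint64_t page_size)
{
    assert(page_size != 0);
    return bar_size != 0 && bar_size >= page_size;
}
```

With a 16K BAR this returns false for 64K pages but true for 4K pages, consistent with the 4K guest working; enlarging the BAR to 8M with page-per-vq=on makes it true for 64K pages as well.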


* Re: [Qemu-devel] [RFC v5 0/8] ARM SMMUv3 Emulation Support
  2017-08-03 10:11     ` Tomasz Nowicki
@ 2017-08-03 11:15       ` Auger Eric
  0 siblings, 0 replies; 22+ messages in thread
From: Auger Eric @ 2017-08-03 11:15 UTC (permalink / raw)
  To: Tomasz Nowicki, eric.auger.pro, peter.maydell, qemu-arm,
	qemu-devel, alex.williamson, prem.mallappa
  Cc: mohun106, drjones, tcain, Radha.Chintakuntla, Sunil.Goutham, mst,
	jean-philippe.brucker, robin.murphy, will.deacon, Nair,
	Jayachandran, peterx, edgar.iglesias, bharat.bhushan,
	christoffer.dall

Hi Tomasz,

On 03/08/2017 12:11, Tomasz Nowicki wrote:
> Hi Eric,
> 
> I found the reason for the failure.
> 
> QEMU creates the BARs for a VIRTIO PCI device, and their size depends on
> what the VIRTIO protocol needs. In my case the BAR is 16K, which is too
> small to be mmap-able by a kernel using 64K pages: in
> vfio_pci_enable() -> vfio_pci_probe_mmaps(), the guest kernel checks
> whether the BAR size is smaller than the current PAGE_SIZE and, if so,
> clears the VFIO_REGION_INFO_FLAG_MMAP flag, which prevents the BAR from
> being mmapped later on. I added -device virtio-net-pci,...,page-per-vq=on
> to enlarge the BAR to 8M, and now testpmd works fine. I wonder how the
> same setup works with e.g. Intel or AMD IOMMU.
Hmm, OK. Yet another thing to investigate! Thank you for your efforts, and
excellent news overall. Preparing a rebase...

Thanks

Eric
> 
> Thanks,
> Tomasz
> 


end of thread, other threads:[~2017-08-03 11:15 UTC | newest]

Thread overview: 22+ messages
2017-07-09 20:51 [Qemu-devel] [RFC v5 0/8] ARM SMMUv3 Emulation Support Eric Auger
2017-07-09 20:51 ` [Qemu-devel] [RFC v5 1/8] hw/arm/smmu-common: smmu base class Eric Auger
2017-07-25 12:12   ` Tomasz Nowicki
2017-07-27 20:28     ` Auger Eric
2017-07-09 20:51 ` [Qemu-devel] [RFC v5 2/8] hw/arm/smmuv3: smmuv3 emulation model Eric Auger
2017-07-13 12:00   ` Tomasz Nowicki
2017-07-27 20:26     ` Auger Eric
2017-07-13 12:57   ` Tomasz Nowicki
2017-07-27 20:25     ` Auger Eric
2017-07-09 20:51 ` [Qemu-devel] [RFC v5 3/8] hw/arm/virt: Add SMMUv3 to the virt board Eric Auger
2017-07-09 20:51 ` [Qemu-devel] [RFC v5 4/8] hw/arm/virt: Add 2.10 machine type Eric Auger
2017-07-09 20:51 ` [Qemu-devel] [RFC v5 5/8] hw/arm/virt-acpi-build: Add smmuv3 node in IORT table Eric Auger
2017-07-09 20:51 ` [Qemu-devel] [RFC v5 6/8] hw/arm/virt: Add tlbi-on-map property to the smmuv3 node Eric Auger
2017-07-09 20:51 ` [Qemu-devel] [RFC v5 7/8] target/arm/kvm: Translate the MSI doorbell in kvm_arch_fixup_msi_route Eric Auger
2017-07-09 20:51 ` [Qemu-devel] [RFC v5 8/8] hw/arm/smmuv3: VFIO integration Eric Auger
     [not found] ` <CACJhume2HkAXVQ8kSCpGEfQV4NOP_=HrZCHXBNLnbm0B8dGQvw@mail.gmail.com>
2017-07-12 17:24   ` [Qemu-devel] [RFC v5 0/8] ARM SMMUv3 Emulation Support Geetha Akula
2017-07-25 14:33     ` Auger Eric
2017-07-14  7:19 ` Tomasz Nowicki
2017-08-01 11:01 ` Tomasz Nowicki
2017-08-01 13:07   ` Auger Eric
2017-08-03 10:11     ` Tomasz Nowicki
2017-08-03 11:15       ` Auger Eric
