* [PATCH v3 00/13] IOMMU support for ARM
@ 2014-03-11 15:49 Julien Grall
  2014-03-11 15:49 ` [PATCH v3 01/13] xen/common: grant-table: only call IOMMU if paging mode translate is disabled Julien Grall
                   ` (12 more replies)
  0 siblings, 13 replies; 63+ messages in thread
From: Julien Grall @ 2014-03-11 15:49 UTC (permalink / raw)
  To: xen-devel; +Cc: stefano.stabellini, Julien Grall, tim, ian.campbell

Hello,

This is the third version of the patch series to add support for the IOMMU on
ARM. It also adds an ARM SMMU driver, which is used, for instance, on Midway.

The IOMMU architecture for ARM relies on the page table being shared between
the processor and each IOMMU.
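
To give a rough idea of what sharing the page table implies for an IOMMU
driver, here is a minimal sketch (illustrative only, not code from this
series; the callback name and the assumption that map_page can rely on the
p2m are mine):

    /*
     * With a shared page table, mappings installed by the p2m code are
     * already visible to the IOMMU, so a per-IOMMU map callback may have
     * nothing left to do beyond any required TLB maintenance.
     */
    static int arm_iommu_map_page(struct domain *d, unsigned long gfn,
                                  unsigned long mfn, unsigned int flags)
    {
        /* The p2m already holds the gfn -> mfn translation. */
        return 0;
    }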

The patch series is divided as follows:
    - #1: Fix grant-table handling with an IOMMU; this will be necessary
      for ARM later
    - #2: Remove domain_id in hvm_iommu
    - #3-#4: Adding new device tree functions
    - #5-#7: Prepare IOMMU code to add support for ARM
    - #8: Adding basic device tree assignment support
    - #9-#12: Add IOMMU architecture for ARM
    - #13: Add SMMU drivers

For now the 1:1 workaround is not removed, because a single platform can have
some DMA-capable devices behind an IOMMU and others not.

This series also depends on:
    - early printk series :
    http://lists.xen.org/archives/html/xen-devel/2014-01/msg00288.html
    - interrupt series:
    http://lists.xen.org/archives/html/xen-devel/2014-01/msg02139.html
    - a few bug fixes on the previous series

A working tree can be found here:
    git://xenbits.xen.org/people/julieng/xen-unstable.git branch smmu-v3

Any comments and questions are welcome.

Sincerely yours,

Julien Grall (13):
  xen/common: grant-table: only call IOMMU if paging mode translate is
    disabled
  xen/passthrough: amd: Remove domain_id from hvm_iommu
  xen/dts: Add dt_property_read_bool
  xen/dts: Add dt_parse_phandle_with_args and dt_parse_phandle
  xen/passthrough: rework dom0_pvh_reqs to use it also on ARM
  xen/passthrough: iommu: Split generic IOMMU code
  xen/passthrough: iommu: Introduce arch specific code
  xen/passthrough: iommu: Basic support of device tree assignment
  xen/passthrough: Introduce IOMMU ARM architecture
  MAINTAINERS: Add drivers/passthrough/arm
  xen/arm: Don't give IOMMU devices to dom0 when iommu is disabled
  xen/arm: Add the property "protected-devices" in the hypervisor node
  drivers/passthrough: arm: Add support for SMMU drivers

 MAINTAINERS                                 |    1 +
 xen/arch/arm/Rules.mk                       |    1 +
 xen/arch/arm/device.c                       |   15 +
 xen/arch/arm/domain.c                       |    7 +
 xen/arch/arm/domain_build.c                 |   78 +-
 xen/arch/arm/kernel.h                       |    3 +
 xen/arch/arm/p2m.c                          |    4 +
 xen/arch/arm/setup.c                        |    2 +
 xen/arch/x86/domctl.c                       |    6 +-
 xen/arch/x86/hvm/io.c                       |    2 +-
 xen/arch/x86/tboot.c                        |    3 +-
 xen/common/device_tree.c                    |  161 ++-
 xen/common/grant_table.c                    |    7 +-
 xen/drivers/passthrough/Makefile            |    6 +-
 xen/drivers/passthrough/amd/iommu_cmd.c     |    3 +-
 xen/drivers/passthrough/amd/iommu_guest.c   |    8 +-
 xen/drivers/passthrough/amd/iommu_map.c     |   56 +-
 xen/drivers/passthrough/amd/pci_amd_iommu.c |   53 +-
 xen/drivers/passthrough/arm/Makefile        |    2 +
 xen/drivers/passthrough/arm/iommu.c         |   70 ++
 xen/drivers/passthrough/arm/smmu.c          | 1736 +++++++++++++++++++++++++++
 xen/drivers/passthrough/device_tree.c       |  106 ++
 xen/drivers/passthrough/iommu.c             |  524 +-------
 xen/drivers/passthrough/pci.c               |  452 +++++++
 xen/drivers/passthrough/vtd/iommu.c         |   80 +-
 xen/drivers/passthrough/x86/Makefile        |    1 +
 xen/drivers/passthrough/x86/iommu.c         |   91 ++
 xen/include/asm-arm/device.h                |   13 +-
 xen/include/asm-arm/domain.h                |    2 +
 xen/include/asm-arm/hvm/iommu.h             |   10 +
 xen/include/asm-arm/iommu.h                 |   36 +
 xen/include/asm-x86/hvm/iommu.h             |   28 +
 xen/include/asm-x86/iommu.h                 |   44 +
 xen/include/xen/device_tree.h               |   89 ++
 xen/include/xen/hvm/iommu.h                 |   33 +-
 xen/include/xen/iommu.h                     |   70 +-
 36 files changed, 3132 insertions(+), 671 deletions(-)
 create mode 100644 xen/drivers/passthrough/arm/Makefile
 create mode 100644 xen/drivers/passthrough/arm/iommu.c
 create mode 100644 xen/drivers/passthrough/arm/smmu.c
 create mode 100644 xen/drivers/passthrough/device_tree.c
 create mode 100644 xen/drivers/passthrough/x86/iommu.c
 create mode 100644 xen/include/asm-arm/hvm/iommu.h
 create mode 100644 xen/include/asm-arm/iommu.h
 create mode 100644 xen/include/asm-x86/iommu.h

-- 
1.7.10.4


* [PATCH v3 01/13] xen/common: grant-table: only call IOMMU if paging mode translate is disabled
  2014-03-11 15:49 [PATCH v3 00/13] IOMMU support for ARM Julien Grall
@ 2014-03-11 15:49 ` Julien Grall
  2014-03-11 15:49 ` [PATCH v3 02/13] xen/passthrough: amd: Remove domain_id from hvm_iommu Julien Grall
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 63+ messages in thread
From: Julien Grall @ 2014-03-11 15:49 UTC (permalink / raw)
  To: xen-devel
  Cc: stefano.stabellini, Keir Fraser, Julien Grall, tim, ian.campbell

From Xen's point of view, ARM guests are PV guests with paging auto-translate
enabled.

Once IOMMU support is added for ARM, mapping a grant ref will always crash
Xen due to the BUG_ON in __gnttab_map_grant_ref.

On x86:
    - PV guests always have paging mode translate disabled
    - PVH and HVM guests always have paging mode translate enabled

This means that we can safely replace the check that the domain is a PV guest
with a check that the guest has paging mode translate disabled.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Keir Fraser <keir@xen.org>
---
 xen/common/grant_table.c |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 107b000..778bdb7 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -721,12 +721,10 @@ __gnttab_map_grant_ref(
 
     double_gt_lock(lgt, rgt);
 
-    if ( is_pv_domain(ld) && need_iommu(ld) )
+    if ( !paging_mode_translate(ld) && need_iommu(ld) )
     {
         unsigned int wrc, rdc;
         int err = 0;
-        /* Shouldn't happen, because you can't use iommu in a HVM domain. */
-        BUG_ON(paging_mode_translate(ld));
         /* We're not translated, so we know that gmfns and mfns are
            the same things, so the IOMMU entry is always 1-to-1. */
         mapcount(lgt, rd, frame, &wrc, &rdc);
@@ -931,11 +929,10 @@ __gnttab_unmap_common(
             act->pin -= GNTPIN_hstw_inc;
     }
 
-    if ( is_pv_domain(ld) && need_iommu(ld) )
+    if ( !paging_mode_translate(ld) && need_iommu(ld) )
     {
         unsigned int wrc, rdc;
         int err = 0;
-        BUG_ON(paging_mode_translate(ld));
         mapcount(lgt, rd, op->frame, &wrc, &rdc);
         if ( (wrc + rdc) == 0 )
             err = iommu_unmap_page(ld, op->frame);
-- 
1.7.10.4


* [PATCH v3 02/13] xen/passthrough: amd: Remove domain_id from hvm_iommu
  2014-03-11 15:49 [PATCH v3 00/13] IOMMU support for ARM Julien Grall
  2014-03-11 15:49 ` [PATCH v3 01/13] xen/common: grant-table: only call IOMMU if paging mode translate is disabled Julien Grall
@ 2014-03-11 15:49 ` Julien Grall
  2014-03-18 16:19   ` Ian Campbell
  2014-03-11 15:49 ` [PATCH v3 03/13] xen/dts: Add dt_property_read_bool Julien Grall
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 63+ messages in thread
From: Julien Grall @ 2014-03-11 15:49 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, Julien Grall, tim, stefano.stabellini, Jan Beulich,
	Suravee Suthikulpanit

The structure hvm_iommu contains a shadow value of domain->domain_id. There
is no reason not to use domain->domain_id directly.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Jan Beulich <jbeulich@suse.com>

---
    Changes in v3:
        - Patch added
---
 xen/drivers/passthrough/amd/iommu_cmd.c     |    3 +--
 xen/drivers/passthrough/amd/iommu_map.c     |    2 +-
 xen/drivers/passthrough/amd/pci_amd_iommu.c |    8 +++-----
 xen/include/xen/hvm/iommu.h                 |    1 -
 4 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu_cmd.c b/xen/drivers/passthrough/amd/iommu_cmd.c
index d27bd3c..4faa01b 100644
--- a/xen/drivers/passthrough/amd/iommu_cmd.c
+++ b/xen/drivers/passthrough/amd/iommu_cmd.c
@@ -354,8 +354,7 @@ static void _amd_iommu_flush_pages(struct domain *d,
 {
     unsigned long flags;
     struct amd_iommu *iommu;
-    struct hvm_iommu *hd = domain_hvm_iommu(d);
-    unsigned int dom_id = hd->domain_id;
+    unsigned int dom_id = d->domain_id;
 
     /* send INVALIDATE_IOMMU_PAGES command */
     for_each_amd_iommu ( iommu )
diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index 1294561..b79e470 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -614,7 +614,7 @@ static int update_paging_mode(struct domain *d, unsigned long gfn)
                 /* valid = 0 only works for dom0 passthrough mode */
                 amd_iommu_set_root_page_table((u32 *)device_entry,
                                               page_to_maddr(hd->root_table),
-                                              hd->domain_id,
+                                              d->domain_id,
                                               hd->paging_mode, 1);
 
                 amd_iommu_flush_device(iommu, req_id);
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index c26aabc..79f4a77 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -138,7 +138,7 @@ static void amd_iommu_setup_domain_device(
     {
         /* bind DTE to domain page-tables */
         amd_iommu_set_root_page_table(
-            (u32 *)dte, page_to_maddr(hd->root_table), hd->domain_id,
+            (u32 *)dte, page_to_maddr(hd->root_table), domain->domain_id,
             hd->paging_mode, valid);
 
         if ( pci_ats_device(iommu->seg, bus, pdev->devfn) &&
@@ -152,7 +152,7 @@ static void amd_iommu_setup_domain_device(
                         "domain = %d, paging mode = %d\n",
                         req_id, pdev->type,
                         page_to_maddr(hd->root_table),
-                        hd->domain_id, hd->paging_mode);
+                        domain->domain_id, hd->paging_mode);
     }
 
     spin_unlock_irqrestore(&iommu->lock, flags);
@@ -273,8 +273,6 @@ static int amd_iommu_domain_init(struct domain *d)
                       IOMMU_PAGING_MODE_LEVEL_2 :
                       get_paging_mode(max_page);
 
-    hd->domain_id = d->domain_id;
-
     guest_iommu_init(d);
 
     return 0;
@@ -333,7 +331,7 @@ void amd_iommu_disable_domain_device(struct domain *domain,
 
         AMD_IOMMU_DEBUG("Disable: device id = %#x, "
                         "domain = %d, paging mode = %d\n",
-                        req_id,  domain_hvm_iommu(domain)->domain_id,
+                        req_id,  domain->domain_id,
                         domain_hvm_iommu(domain)->paging_mode);
     }
     spin_unlock_irqrestore(&iommu->lock, flags);
diff --git a/xen/include/xen/hvm/iommu.h b/xen/include/xen/hvm/iommu.h
index 26539e0..6ab684e 100644
--- a/xen/include/xen/hvm/iommu.h
+++ b/xen/include/xen/hvm/iommu.h
@@ -44,7 +44,6 @@ struct hvm_iommu {
     struct list_head mapped_rmrrs;
 
     /* amd iommu support */
-    int domain_id;
     int paging_mode;
     struct page_info *root_table;
     struct guest_iommu *g_iommu;
-- 
1.7.10.4


* [PATCH v3 03/13] xen/dts: Add dt_property_read_bool
  2014-03-11 15:49 [PATCH v3 00/13] IOMMU support for ARM Julien Grall
  2014-03-11 15:49 ` [PATCH v3 01/13] xen/common: grant-table: only call IOMMU if paging mode translate is disabled Julien Grall
  2014-03-11 15:49 ` [PATCH v3 02/13] xen/passthrough: amd: Remove domain_id from hvm_iommu Julien Grall
@ 2014-03-11 15:49 ` Julien Grall
  2014-03-11 15:49 ` [PATCH v3 04/13] xen/dts: Add dt_parse_phandle_with_args and dt_parse_phandle Julien Grall
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 63+ messages in thread
From: Julien Grall @ 2014-03-11 15:49 UTC (permalink / raw)
  To: xen-devel; +Cc: stefano.stabellini, Julien Grall, tim, ian.campbell

The function checks whether a property exists in a specific node.
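
A minimal usage sketch (illustrative only; "dma-coherent" is just an
example property name and `np` is assumed to be an already-parsed device
tree node):

    /* True when the node carries the boolean property, false otherwise. */
    bool_t coherent = dt_property_read_bool(np, "dma-coherent");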

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

---
    Changes in v2:
        - Fix typo in commit message
---
 xen/common/device_tree.c      |    6 ++----
 xen/include/xen/device_tree.h |   21 +++++++++++++++++++++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index c66d1d5..ccdb7ff 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -512,10 +512,8 @@ static void __init *unflatten_dt_alloc(unsigned long *mem, unsigned long size,
 }
 
 /* Find a property with a given name for a given node and return it. */
-static const struct dt_property *
-dt_find_property(const struct dt_device_node *np,
-                 const char *name,
-                 u32 *lenp)
+const struct dt_property *dt_find_property(const struct dt_device_node *np,
+                                           const char *name, u32 *lenp)
 {
     const struct dt_property *pp;
 
diff --git a/xen/include/xen/device_tree.h b/xen/include/xen/device_tree.h
index 9a8c3de..7c075d9 100644
--- a/xen/include/xen/device_tree.h
+++ b/xen/include/xen/device_tree.h
@@ -15,6 +15,7 @@
 #include <xen/init.h>
 #include <xen/string.h>
 #include <xen/types.h>
+#include <xen/stdbool.h>
 
 #define DEVICE_TREE_MAX_DEPTH 16
 
@@ -347,6 +348,10 @@ struct dt_device_node *dt_find_compatible_node(struct dt_device_node *from,
 const void *dt_get_property(const struct dt_device_node *np,
                             const char *name, u32 *lenp);
 
+const struct dt_property *dt_find_property(const struct dt_device_node *np,
+                                           const char *name, u32 *lenp);
+
+
 /**
  * dt_property_read_u32 - Helper to read a u32 property.
  * @np: node to get the value
@@ -369,6 +374,22 @@ bool_t dt_property_read_u64(const struct dt_device_node *np,
                             const char *name, u64 *out_value);
 
 /**
+ * dt_property_read_bool - Check if a property exists
+ * @np: node to get the value
+ * @name: name of the property
+ *
+ * Search for a property in a device node.
+ * Return true if the property exists false otherwise.
+ */
+static inline bool_t dt_property_read_bool(const struct dt_device_node *np,
+                                           const char *name)
+{
+    const struct dt_property *prop = dt_find_property(np, name, NULL);
+
+    return prop ? true : false;
+}
+
+/**
  * dt_property_read_string - Find and read a string from a property
  * @np:         Device node from which the property value is to be read
  * @propname:   Name of the property to be searched
-- 
1.7.10.4


* [PATCH v3 04/13] xen/dts: Add dt_parse_phandle_with_args and dt_parse_phandle
  2014-03-11 15:49 [PATCH v3 00/13] IOMMU support for ARM Julien Grall
                   ` (2 preceding siblings ...)
  2014-03-11 15:49 ` [PATCH v3 03/13] xen/dts: Add dt_property_read_bool Julien Grall
@ 2014-03-11 15:49 ` Julien Grall
  2014-03-18 16:20   ` Ian Campbell
  2014-03-11 15:49 ` [PATCH v3 05/13] xen/passthrough: rework dom0_pvh_reqs to use it also on ARM Julien Grall
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 63+ messages in thread
From: Julien Grall @ 2014-03-11 15:49 UTC (permalink / raw)
  To: xen-devel; +Cc: stefano.stabellini, Julien Grall, tim, ian.campbell

Code adapted from Linux drivers/of/base.c (commit ef42c58).
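
A minimal usage sketch of the list variant (illustrative only; the
"iommus"/"#iommu-cells" property names and the consume_args() helper are
hypothetical, and `np` is assumed to be a parsed device tree node):

    struct dt_phandle_args args;

    /* Resolve entry 0 of the phandle list and its argument cells. */
    if ( !dt_parse_phandle_with_args(np, "iommus", "#iommu-cells",
                                     0, &args) )
        /* args.np is the referenced node; args.args[0 .. args.args_count-1]
         * hold the specifier cells. */
        consume_args(args.np, args.args, args.args_count);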

Signed-off-by: Julien Grall <julien.grall@linaro.org>

---
    Changes in v2:
        - Remove hard tabs in dt_parse_phandle
---
 xen/common/device_tree.c      |  151 ++++++++++++++++++++++++++++++++++++++++-
 xen/include/xen/device_tree.h |   54 +++++++++++++++
 2 files changed, 203 insertions(+), 2 deletions(-)

diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index ccdb7ff..564f2bb 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -1090,9 +1090,9 @@ int dt_device_get_address(const struct dt_device_node *dev, int index,
  *
  * Returns a node pointer.
  */
-static const struct dt_device_node *dt_find_node_by_phandle(dt_phandle handle)
+static struct dt_device_node *dt_find_node_by_phandle(dt_phandle handle)
 {
-    const struct dt_device_node *np;
+    struct dt_device_node *np;
 
     dt_for_each_device_node(dt_host, np)
         if ( np->phandle == handle )
@@ -1477,6 +1477,153 @@ bool_t dt_device_is_available(const struct dt_device_node *device)
     return 0;
 }
 
+static int __dt_parse_phandle_with_args(const struct dt_device_node *np,
+                                        const char *list_name,
+                                        const char *cells_name,
+                                        int cell_count, int index,
+                                        struct dt_phandle_args *out_args)
+{
+    const __be32 *list, *list_end;
+    int rc = 0, cur_index = 0;
+    u32 size, count = 0;
+    struct dt_device_node *node = NULL;
+    dt_phandle phandle;
+
+    /* Retrieve the phandle list property */
+    list = dt_get_property(np, list_name, &size);
+    if ( !list )
+        return -ENOENT;
+    list_end = list + size / sizeof(*list);
+
+    /* Loop over the phandles until all the requested entry is found */
+    while ( list < list_end )
+    {
+        rc = -EINVAL;
+        count = 0;
+
+        /*
+         * If phandle is 0, then it is an empty entry with no
+         * arguments.  Skip forward to the next entry.
+         * */
+        phandle = be32_to_cpup(list++);
+        if ( phandle )
+        {
+            /*
+             * Find the provider node and parse the #*-cells
+             * property to determine the argument length.
+             *
+             * This is not needed if the cell count is hard-coded
+             * (i.e. cells_name not set, but cell_count is set),
+             * except when we're going to return the found node
+             * below.
+             */
+            if ( cells_name || cur_index == index )
+            {
+                node = dt_find_node_by_phandle(phandle);
+                if ( !node )
+                {
+                    dt_printk(XENLOG_ERR "%s: could not find phandle\n",
+                              np->full_name);
+                    goto err;
+                }
+            }
+
+            if ( cells_name )
+            {
+                if ( !dt_property_read_u32(node, cells_name, &count) )
+                {
+                    dt_printk("%s: could not get %s for %s\n",
+                              np->full_name, cells_name, node->full_name);
+                    goto err;
+                }
+            }
+            else
+                count = cell_count;
+
+            /*
+             * Make sure that the arguments actually fit in the
+             * remaining property data length
+             */
+            if ( list + count > list_end )
+            {
+                dt_printk(XENLOG_ERR "%s: arguments longer than property\n",
+                          np->full_name);
+                goto err;
+            }
+        }
+
+        /*
+         * All of the error cases above bail out of the loop, so at
+         * this point, the parsing is successful. If the requested
+         * index matches, then fill the out_args structure and return,
+         * or return -ENOENT for an empty entry.
+         */
+        rc = -ENOENT;
+        if ( cur_index == index )
+        {
+            if (!phandle)
+                goto err;
+
+            if ( out_args )
+            {
+                int i;
+
+                WARN_ON(count > MAX_PHANDLE_ARGS);
+                if (count > MAX_PHANDLE_ARGS)
+                    count = MAX_PHANDLE_ARGS;
+                out_args->np = node;
+                out_args->args_count = count;
+                for ( i = 0; i < count; i++ )
+                    out_args->args[i] = be32_to_cpup(list++);
+            }
+
+            /* Found it! return success */
+            return 0;
+        }
+
+        node = NULL;
+        list += count;
+        cur_index++;
+    }
+
+    /*
+     * Returning result will be one of:
+     * -ENOENT : index is for empty phandle
+     * -EINVAL : parsing error on data
+     * [1..n]  : Number of phandle (count mode; when index = -1)
+     */
+    rc = index < 0 ? cur_index : -ENOENT;
+err:
+    return rc;
+}
+
+struct dt_device_node *dt_parse_phandle(const struct dt_device_node *np,
+                                        const char *phandle_name, int index)
+{
+    struct dt_phandle_args args;
+
+    if (index < 0)
+        return NULL;
+
+    if (__dt_parse_phandle_with_args(np, phandle_name, NULL, 0,
+                                     index, &args))
+        return NULL;
+
+    return args.np;
+}
+
+
+int dt_parse_phandle_with_args(const struct dt_device_node *np,
+                               const char *list_name,
+                               const char *cells_name, int index,
+                               struct dt_phandle_args *out_args)
+{
+    if ( index < 0 )
+        return -EINVAL;
+    return __dt_parse_phandle_with_args(np, list_name, cells_name, 0,
+                                        index, out_args);
+}
+
 /**
  * unflatten_dt_node - Alloc and populate a device_node from the flat tree
  * @fdt: The parent device tree blob
diff --git a/xen/include/xen/device_tree.h b/xen/include/xen/device_tree.h
index 7c075d9..d429e60 100644
--- a/xen/include/xen/device_tree.h
+++ b/xen/include/xen/device_tree.h
@@ -112,6 +112,13 @@ struct dt_device_node {
 
 };
 
+#define MAX_PHANDLE_ARGS 16
+struct dt_phandle_args {
+    struct dt_device_node *np;
+    int args_count;
+    uint32_t args[MAX_PHANDLE_ARGS];
+};
+
 /**
  * IRQ line type.
  *
@@ -621,6 +628,53 @@ void dt_set_range(__be32 **cellp, const struct dt_device_node *np,
 void dt_get_range(const __be32 **cellp, const struct dt_device_node *np,
                   u64 *address, u64 *size);
 
+/**
+ * dt_parse_phandle - Resolve a phandle property to a device_node pointer
+ * @np: Pointer to device node holding phandle property
+ * @phandle_name: Name of property holding a phandle value
+ * @index: For properties holding a table of phandles, this is the index into
+ *         the table
+ *
+ * Returns the device_node pointer.
+ */
+struct dt_device_node *dt_parse_phandle(const struct dt_device_node *np,
+				                        const char *phandle_name,
+                                        int index);
+
+/**
+ * dt_parse_phandle_with_args() - Find a node pointed by phandle in a list
+ * @np:	pointer to a device tree node containing a list
+ * @list_name: property name that contains a list
+ * @cells_name: property name that specifies phandles' arguments count
+ * @index: index of a phandle to parse out
+ * @out_args: optional pointer to output arguments structure (will be filled)
+ *
+ * This function is useful to parse lists of phandles and their arguments.
+ * Returns 0 on success and fills out_args, on error returns appropriate
+ * errno value.
+ *
+ * Example:
+ *
+ * phandle1: node1 {
+ * 	#list-cells = <2>;
+ * }
+ *
+ * phandle2: node2 {
+ * 	#list-cells = <1>;
+ * }
+ *
+ * node3 {
+ * 	list = <&phandle1 1 2 &phandle2 3>;
+ * }
+ *
+ * To get a device_node of the `node2' node you may call this:
+ * dt_parse_phandle_with_args(node3, "list", "#list-cells", 1, &args);
+ */
+int dt_parse_phandle_with_args(const struct dt_device_node *np,
+                               const char *list_name,
+                               const char *cells_name, int index,
+                               struct dt_phandle_args *out_args);
+
 #endif /* __XEN_DEVICE_TREE_H */
 
 /*
-- 
1.7.10.4


* [PATCH v3 05/13] xen/passthrough: rework dom0_pvh_reqs to use it also on ARM
  2014-03-11 15:49 [PATCH v3 00/13] IOMMU support for ARM Julien Grall
                   ` (3 preceding siblings ...)
  2014-03-11 15:49 ` [PATCH v3 04/13] xen/dts: Add dt_parse_phandle_with_args and dt_parse_phandle Julien Grall
@ 2014-03-11 15:49 ` Julien Grall
  2014-03-18 16:22   ` Ian Campbell
  2014-03-11 15:49 ` [PATCH v3 06/13] xen/passthrough: iommu: Split generic IOMMU code Julien Grall
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 63+ messages in thread
From: Julien Grall @ 2014-03-11 15:49 UTC (permalink / raw)
  To: xen-devel
  Cc: stefano.stabellini, Julien Grall, tim, ian.campbell, Xiantao Zhang

DOM0 on ARM will have the same requirements as DOM0 PVH when the IOMMU is
enabled. Both PVH and ARM guests have paging mode translate enabled, so Xen
can use that to know whether it needs to check the requirements.

Rename the function and remove the word "pvh" from the panic message.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Cc: Xiantao Zhang <xiantao.zhang@intel.com>

---
    Changes in v2:
        - The IOMMU can be disabled on ARM if the platform doesn't have
          an IOMMU.
---
 xen/drivers/passthrough/iommu.c |   13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index c70165a..3c63f87 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -130,13 +130,17 @@ int iommu_domain_init(struct domain *d)
     return hd->platform_ops->init(d);
 }
 
-static __init void check_dom0_pvh_reqs(struct domain *d)
+static __init void check_dom0_reqs(struct domain *d)
 {
-    if ( !iommu_enabled )
+    if ( !paging_mode_translate(d) )
+        return;
+
+    if ( is_pvh_domain(d) && !iommu_enabled )
         panic("Presently, iommu must be enabled for pvh dom0\n");
 
     if ( iommu_passthrough )
-        panic("For pvh dom0, dom0-passthrough must not be enabled\n");
+        panic("Dom0 uses translate paging mode, dom0-passthrough must not be "
+              "enabled\n");
 
     iommu_dom0_strict = 1;
 }
@@ -145,8 +149,7 @@ void __init iommu_dom0_init(struct domain *d)
 {
     struct hvm_iommu *hd = domain_hvm_iommu(d);
 
-    if ( is_pvh_domain(d) )
-        check_dom0_pvh_reqs(d);
+    check_dom0_reqs(d);
 
     if ( !iommu_enabled )
         return;
-- 
1.7.10.4


* [PATCH v3 06/13] xen/passthrough: iommu: Split generic IOMMU code
  2014-03-11 15:49 [PATCH v3 00/13] IOMMU support for ARM Julien Grall
                   ` (4 preceding siblings ...)
  2014-03-11 15:49 ` [PATCH v3 05/13] xen/passthrough: rework dom0_pvh_reqs to use it also on ARM Julien Grall
@ 2014-03-11 15:49 ` Julien Grall
  2014-03-11 16:50   ` Jan Beulich
  2014-03-18 16:24   ` Ian Campbell
  2014-03-11 15:49 ` [PATCH v3 07/13] xen/passthrough: iommu: Introduce arch specific code Julien Grall
                   ` (6 subsequent siblings)
  12 siblings, 2 replies; 63+ messages in thread
From: Julien Grall @ 2014-03-11 15:49 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, Julien Grall, tim, stefano.stabellini, Jan Beulich,
	Xiantao Zhang

The generic IOMMU framework code (xen/drivers/passthrough/iommu.c) contains
functions specific to x86 and PCI.

Split the framework into 3 distinct files:
    - iommu.c: contains generic functions shared between x86 and ARM
               (once the latter is supported)
    - pci.c: contains functions specific to PCI passthrough
    - x86/iommu.c: contains functions specific to x86

io.c contains x86 HVM-specific code, so it is now only compiled for x86.

This patch is mostly code movement into new files.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Cc: Xiantao Zhang <xiantao.zhang@intel.com>
Cc: Jan Beulich <jbeulich@suse.com>

---
    Changes in v3:
        - share_p2m_table should stay in common code
        - update_ire_from_msi and read_msi_from_ire should go in pci code
        - remove switch case in iommu_do_domctl

    Changes in v2:
        - Update commit message
        - Removing spurious change in drivers/passthrough/vtd/iommu.c
        - Move iommu_x86.c in x86/iommu.c
        - Merge iommu_pci.c in pci.c
        - Introduce iommu_do_pci_domctl
---
 xen/drivers/passthrough/Makefile     |    4 +-
 xen/drivers/passthrough/iommu.c      |  483 +---------------------------------
 xen/drivers/passthrough/pci.c        |  452 +++++++++++++++++++++++++++++++
 xen/drivers/passthrough/x86/Makefile |    1 +
 xen/drivers/passthrough/x86/iommu.c  |   50 ++++
 xen/include/asm-x86/iommu.h          |   42 +++
 xen/include/xen/hvm/iommu.h          |    1 +
 xen/include/xen/iommu.h              |   47 ++--
 8 files changed, 588 insertions(+), 492 deletions(-)
 create mode 100644 xen/drivers/passthrough/x86/iommu.c
 create mode 100644 xen/include/asm-x86/iommu.h

diff --git a/xen/drivers/passthrough/Makefile b/xen/drivers/passthrough/Makefile
index 7c40fa5..6e08f89 100644
--- a/xen/drivers/passthrough/Makefile
+++ b/xen/drivers/passthrough/Makefile
@@ -3,5 +3,5 @@ subdir-$(x86) += amd
 subdir-$(x86_64) += x86
 
 obj-y += iommu.o
-obj-y += io.o
-obj-y += pci.o
+obj-$(x86) += io.o
+obj-$(HAS_PCI) += pci.o
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 3c63f87..8a2fdea 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -24,7 +24,6 @@
 #include <xsm/xsm.h>
 
 static void parse_iommu_param(char *s);
-static int iommu_populate_page_table(struct domain *d);
 static void iommu_dump_p2m_table(unsigned char key);
 
 /*
@@ -179,86 +178,7 @@ void __init iommu_dom0_init(struct domain *d)
     return hd->platform_ops->dom0_init(d);
 }
 
-int iommu_add_device(struct pci_dev *pdev)
-{
-    struct hvm_iommu *hd;
-    int rc;
-    u8 devfn;
-
-    if ( !pdev->domain )
-        return -EINVAL;
-
-    ASSERT(spin_is_locked(&pcidevs_lock));
-
-    hd = domain_hvm_iommu(pdev->domain);
-    if ( !iommu_enabled || !hd->platform_ops )
-        return 0;
-
-    rc = hd->platform_ops->add_device(pdev->devfn, pdev);
-    if ( rc || !pdev->phantom_stride )
-        return rc;
-
-    for ( devfn = pdev->devfn ; ; )
-    {
-        devfn += pdev->phantom_stride;
-        if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
-            return 0;
-        rc = hd->platform_ops->add_device(devfn, pdev);
-        if ( rc )
-            printk(XENLOG_WARNING "IOMMU: add %04x:%02x:%02x.%u failed (%d)\n",
-                   pdev->seg, pdev->bus, PCI_SLOT(devfn), PCI_FUNC(devfn), rc);
-    }
-}
-
-int iommu_enable_device(struct pci_dev *pdev)
-{
-    struct hvm_iommu *hd;
-
-    if ( !pdev->domain )
-        return -EINVAL;
-
-    ASSERT(spin_is_locked(&pcidevs_lock));
-
-    hd = domain_hvm_iommu(pdev->domain);
-    if ( !iommu_enabled || !hd->platform_ops ||
-         !hd->platform_ops->enable_device )
-        return 0;
-
-    return hd->platform_ops->enable_device(pdev);
-}
-
-int iommu_remove_device(struct pci_dev *pdev)
-{
-    struct hvm_iommu *hd;
-    u8 devfn;
-
-    if ( !pdev->domain )
-        return -EINVAL;
-
-    hd = domain_hvm_iommu(pdev->domain);
-    if ( !iommu_enabled || !hd->platform_ops )
-        return 0;
-
-    for ( devfn = pdev->devfn ; pdev->phantom_stride; )
-    {
-        int rc;
-
-        devfn += pdev->phantom_stride;
-        if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
-            break;
-        rc = hd->platform_ops->remove_device(devfn, pdev);
-        if ( !rc )
-            continue;
-
-        printk(XENLOG_ERR "IOMMU: remove %04x:%02x:%02x.%u failed (%d)\n",
-               pdev->seg, pdev->bus, PCI_SLOT(devfn), PCI_FUNC(devfn), rc);
-        return rc;
-    }
-
-    return hd->platform_ops->remove_device(pdev->devfn, pdev);
-}
-
-static void iommu_teardown(struct domain *d)
+void iommu_teardown(struct domain *d)
 {
     const struct hvm_iommu *hd = domain_hvm_iommu(d);
 
@@ -267,151 +187,6 @@ static void iommu_teardown(struct domain *d)
     tasklet_schedule(&iommu_pt_cleanup_tasklet);
 }
 
-/*
- * If the device isn't owned by dom0, it means it already
- * has been assigned to other domain, or it doesn't exist.
- */
-static int device_assigned(u16 seg, u8 bus, u8 devfn)
-{
-    struct pci_dev *pdev;
-
-    spin_lock(&pcidevs_lock);
-    pdev = pci_get_pdev_by_domain(dom0, seg, bus, devfn);
-    spin_unlock(&pcidevs_lock);
-
-    return pdev ? 0 : -EBUSY;
-}
-
-static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
-{
-    struct hvm_iommu *hd = domain_hvm_iommu(d);
-    struct pci_dev *pdev;
-    int rc = 0;
-
-    if ( !iommu_enabled || !hd->platform_ops )
-        return 0;
-
-    /* Prevent device assign if mem paging or mem sharing have been 
-     * enabled for this domain */
-    if ( unlikely(!need_iommu(d) &&
-            (d->arch.hvm_domain.mem_sharing_enabled ||
-             d->mem_event->paging.ring_page)) )
-        return -EXDEV;
-
-    if ( !spin_trylock(&pcidevs_lock) )
-        return -ERESTART;
-
-    if ( need_iommu(d) <= 0 )
-    {
-        if ( !iommu_use_hap_pt(d) )
-        {
-            rc = iommu_populate_page_table(d);
-            if ( rc )
-            {
-                spin_unlock(&pcidevs_lock);
-                return rc;
-            }
-        }
-        d->need_iommu = 1;
-    }
-
-    pdev = pci_get_pdev_by_domain(dom0, seg, bus, devfn);
-    if ( !pdev )
-    {
-        rc = pci_get_pdev(seg, bus, devfn) ? -EBUSY : -ENODEV;
-        goto done;
-    }
-
-    pdev->fault.count = 0;
-
-    if ( (rc = hd->platform_ops->assign_device(d, devfn, pdev)) )
-        goto done;
-
-    for ( ; pdev->phantom_stride; rc = 0 )
-    {
-        devfn += pdev->phantom_stride;
-        if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
-            break;
-        rc = hd->platform_ops->assign_device(d, devfn, pdev);
-        if ( rc )
-            printk(XENLOG_G_WARNING "d%d: assign %04x:%02x:%02x.%u failed (%d)\n",
-                   d->domain_id, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
-                   rc);
-    }
-
- done:
-    if ( !has_arch_pdevs(d) && need_iommu(d) )
-        iommu_teardown(d);
-    spin_unlock(&pcidevs_lock);
-
-    return rc;
-}
-
-static int iommu_populate_page_table(struct domain *d)
-{
-    struct hvm_iommu *hd = domain_hvm_iommu(d);
-    struct page_info *page;
-    int rc = 0, n = 0;
-
-    d->need_iommu = -1;
-
-    this_cpu(iommu_dont_flush_iotlb) = 1;
-    spin_lock(&d->page_alloc_lock);
-
-    if ( unlikely(d->is_dying) )
-        rc = -ESRCH;
-
-    while ( !rc && (page = page_list_remove_head(&d->page_list)) )
-    {
-        if ( is_hvm_domain(d) ||
-            (page->u.inuse.type_info & PGT_type_mask) == PGT_writable_page )
-        {
-            BUG_ON(SHARED_M2P(mfn_to_gmfn(d, page_to_mfn(page))));
-            rc = hd->platform_ops->map_page(
-                d, mfn_to_gmfn(d, page_to_mfn(page)), page_to_mfn(page),
-                IOMMUF_readable|IOMMUF_writable);
-            if ( rc )
-            {
-                page_list_add(page, &d->page_list);
-                break;
-            }
-        }
-        page_list_add_tail(page, &d->arch.relmem_list);
-        if ( !(++n & 0xff) && !page_list_empty(&d->page_list) &&
-             hypercall_preempt_check() )
-            rc = -ERESTART;
-    }
-
-    if ( !rc )
-    {
-        /*
-         * The expectation here is that generally there are many normal pages
-         * on relmem_list (the ones we put there) and only few being in an
-         * offline/broken state. The latter ones are always at the head of the
-         * list. Hence we first move the whole list, and then move back the
-         * first few entries.
-         */
-        page_list_move(&d->page_list, &d->arch.relmem_list);
-        while ( (page = page_list_first(&d->page_list)) != NULL &&
-                (page->count_info & (PGC_state|PGC_broken)) )
-        {
-            page_list_del(page, &d->page_list);
-            page_list_add_tail(page, &d->arch.relmem_list);
-        }
-    }
-
-    spin_unlock(&d->page_alloc_lock);
-    this_cpu(iommu_dont_flush_iotlb) = 0;
-
-    if ( !rc )
-        iommu_iotlb_flush_all(d);
-    else if ( rc != -ERESTART )
-        iommu_teardown(d);
-
-    return rc;
-}
-
-
 void iommu_domain_destroy(struct domain *d)
 {
     struct hvm_iommu *hd  = domain_hvm_iommu(d);
@@ -498,53 +273,6 @@ void iommu_iotlb_flush_all(struct domain *d)
     hd->platform_ops->iotlb_flush_all(d);
 }
 
-/* caller should hold the pcidevs_lock */
-int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
-{
-    struct hvm_iommu *hd = domain_hvm_iommu(d);
-    struct pci_dev *pdev = NULL;
-    int ret = 0;
-
-    if ( !iommu_enabled || !hd->platform_ops )
-        return -EINVAL;
-
-    ASSERT(spin_is_locked(&pcidevs_lock));
-    pdev = pci_get_pdev_by_domain(d, seg, bus, devfn);
-    if ( !pdev )
-        return -ENODEV;
-
-    while ( pdev->phantom_stride )
-    {
-        devfn += pdev->phantom_stride;
-        if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
-            break;
-        ret = hd->platform_ops->reassign_device(d, dom0, devfn, pdev);
-        if ( !ret )
-            continue;
-
-        printk(XENLOG_G_ERR "d%d: deassign %04x:%02x:%02x.%u failed (%d)\n",
-               d->domain_id, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), ret);
-        return ret;
-    }
-
-    devfn = pdev->devfn;
-    ret = hd->platform_ops->reassign_device(d, dom0, devfn, pdev);
-    if ( ret )
-    {
-        dprintk(XENLOG_G_ERR,
-                "d%d: deassign device (%04x:%02x:%02x.%u) failed\n",
-                d->domain_id, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
-        return ret;
-    }
-
-    pdev->fault.count = 0;
-
-    if ( !has_arch_pdevs(d) && need_iommu(d) )
-        iommu_teardown(d);
-
-    return ret;
-}
-
 int __init iommu_setup(void)
 {
     int rc = -ENODEV;
@@ -585,91 +313,27 @@ int __init iommu_setup(void)
     return rc;
 }
 
-static int iommu_get_device_group(
-    struct domain *d, u16 seg, u8 bus, u8 devfn,
-    XEN_GUEST_HANDLE_64(uint32) buf, int max_sdevs)
-{
-    struct hvm_iommu *hd = domain_hvm_iommu(d);
-    struct pci_dev *pdev;
-    int group_id, sdev_id;
-    u32 bdf;
-    int i = 0;
-    const struct iommu_ops *ops = hd->platform_ops;
-
-    if ( !iommu_enabled || !ops || !ops->get_device_group_id )
-        return 0;
-
-    group_id = ops->get_device_group_id(seg, bus, devfn);
-
-    spin_lock(&pcidevs_lock);
-    for_each_pdev( d, pdev )
-    {
-        if ( (pdev->seg != seg) ||
-             ((pdev->bus == bus) && (pdev->devfn == devfn)) )
-            continue;
-
-        if ( xsm_get_device_group(XSM_HOOK, (seg << 16) | (pdev->bus << 8) | pdev->devfn) )
-            continue;
-
-        sdev_id = ops->get_device_group_id(seg, pdev->bus, pdev->devfn);
-        if ( (sdev_id == group_id) && (i < max_sdevs) )
-        {
-            bdf = 0;
-            bdf |= (pdev->bus & 0xff) << 16;
-            bdf |= (pdev->devfn & 0xff) << 8;
-
-            if ( unlikely(copy_to_guest_offset(buf, i, &bdf, 1)) )
-            {
-                spin_unlock(&pcidevs_lock);
-                return -1;
-            }
-            i++;
-        }
-    }
-    spin_unlock(&pcidevs_lock);
-
-    return i;
-}
-
-void iommu_update_ire_from_apic(
-    unsigned int apic, unsigned int reg, unsigned int value)
-{
-    const struct iommu_ops *ops = iommu_get_ops();
-    ops->update_ire_from_apic(apic, reg, value);
-}
-
-int iommu_update_ire_from_msi(
-    struct msi_desc *msi_desc, struct msi_msg *msg)
+void iommu_resume()
 {
     const struct iommu_ops *ops = iommu_get_ops();
-    return iommu_intremap ? ops->update_ire_from_msi(msi_desc, msg) : 0;
+    if ( iommu_enabled )
+        ops->resume();
 }
 
-void iommu_read_msi_from_ire(
-    struct msi_desc *msi_desc, struct msi_msg *msg)
+int iommu_do_domctl(
+    struct xen_domctl *domctl, struct domain *d,
+    XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
 {
-    const struct iommu_ops *ops = iommu_get_ops();
-    if ( iommu_intremap )
-        ops->read_msi_from_ire(msi_desc, msg);
-}
+    int ret = -ENOSYS;
 
-unsigned int iommu_read_apic_from_ire(unsigned int apic, unsigned int reg)
-{
-    const struct iommu_ops *ops = iommu_get_ops();
-    return ops->read_apic_from_ire(apic, reg);
-}
+    if ( !iommu_enabled )
+        return -ENOSYS;
 
-int __init iommu_setup_hpet_msi(struct msi_desc *msi)
-{
-    const struct iommu_ops *ops = iommu_get_ops();
-    return ops->setup_hpet_msi ? ops->setup_hpet_msi(msi) : -ENODEV;
-}
+#ifdef HAS_PCI
+    ret = iommu_do_pci_domctl(domctl, d, u_domctl);
+#endif
 
-void iommu_resume()
-{
-    const struct iommu_ops *ops = iommu_get_ops();
-    if ( iommu_enabled )
-        ops->resume();
+    return ret;
 }
 
 void iommu_suspend()
@@ -695,125 +359,6 @@ void iommu_crash_shutdown(void)
     iommu_enabled = iommu_intremap = 0;
 }
 
-int iommu_do_domctl(
-    struct xen_domctl *domctl, struct domain *d,
-    XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
-{
-    u16 seg;
-    u8 bus, devfn;
-    int ret = 0;
-
-    if ( !iommu_enabled )
-        return -ENOSYS;
-
-    switch ( domctl->cmd )
-    {
-    case XEN_DOMCTL_get_device_group:
-    {
-        u32 max_sdevs;
-        XEN_GUEST_HANDLE_64(uint32) sdevs;
-
-        ret = xsm_get_device_group(XSM_HOOK, domctl->u.get_device_group.machine_sbdf);
-        if ( ret )
-            break;
-
-        seg = domctl->u.get_device_group.machine_sbdf >> 16;
-        bus = (domctl->u.get_device_group.machine_sbdf >> 8) & 0xff;
-        devfn = domctl->u.get_device_group.machine_sbdf & 0xff;
-        max_sdevs = domctl->u.get_device_group.max_sdevs;
-        sdevs = domctl->u.get_device_group.sdev_array;
-
-        ret = iommu_get_device_group(d, seg, bus, devfn, sdevs, max_sdevs);
-        if ( ret < 0 )
-        {
-            dprintk(XENLOG_ERR, "iommu_get_device_group() failed!\n");
-            ret = -EFAULT;
-            domctl->u.get_device_group.num_sdevs = 0;
-        }
-        else
-        {
-            domctl->u.get_device_group.num_sdevs = ret;
-            ret = 0;
-        }
-        if ( __copy_field_to_guest(u_domctl, domctl, u.get_device_group) )
-            ret = -EFAULT;
-    }
-    break;
-
-    case XEN_DOMCTL_test_assign_device:
-        ret = xsm_test_assign_device(XSM_HOOK, domctl->u.assign_device.machine_sbdf);
-        if ( ret )
-            break;
-
-        seg = domctl->u.assign_device.machine_sbdf >> 16;
-        bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff;
-        devfn = domctl->u.assign_device.machine_sbdf & 0xff;
-
-        if ( device_assigned(seg, bus, devfn) )
-        {
-            printk(XENLOG_G_INFO
-                   "%04x:%02x:%02x.%u already assigned, or non-existent\n",
-                   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
-            ret = -EINVAL;
-        }
-        break;
-
-    case XEN_DOMCTL_assign_device:
-        if ( unlikely(d->is_dying) )
-        {
-            ret = -EINVAL;
-            break;
-        }
-
-        ret = xsm_assign_device(XSM_HOOK, d, domctl->u.assign_device.machine_sbdf);
-        if ( ret )
-            break;
-
-        seg = domctl->u.assign_device.machine_sbdf >> 16;
-        bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff;
-        devfn = domctl->u.assign_device.machine_sbdf & 0xff;
-
-        ret = device_assigned(seg, bus, devfn) ?:
-              assign_device(d, seg, bus, devfn);
-        if ( ret == -ERESTART )
-            ret = hypercall_create_continuation(__HYPERVISOR_domctl,
-                                                "h", u_domctl);
-        else if ( ret )
-            printk(XENLOG_G_ERR "XEN_DOMCTL_assign_device: "
-                   "assign %04x:%02x:%02x.%u to dom%d failed (%d)\n",
-                   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
-                   d->domain_id, ret);
-
-        break;
-
-    case XEN_DOMCTL_deassign_device:
-        ret = xsm_deassign_device(XSM_HOOK, d, domctl->u.assign_device.machine_sbdf);
-        if ( ret )
-            break;
-
-        seg = domctl->u.assign_device.machine_sbdf >> 16;
-        bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff;
-        devfn = domctl->u.assign_device.machine_sbdf & 0xff;
-
-        spin_lock(&pcidevs_lock);
-        ret = deassign_device(d, seg, bus, devfn);
-        spin_unlock(&pcidevs_lock);
-        if ( ret )
-            printk(XENLOG_G_ERR
-                   "deassign %04x:%02x:%02x.%u from dom%d failed (%d)\n",
-                   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
-                   d->domain_id, ret);
-
-        break;
-
-    default:
-        ret = -ENOSYS;
-        break;
-    }
-
-    return ret;
-}
-
 static void iommu_dump_p2m_table(unsigned char key)
 {
     struct domain *d;
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index ff78142..ea7e169 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -26,6 +26,9 @@
 #include <asm/hvm/irq.h>
 #include <xen/delay.h>
 #include <xen/keyhandler.h>
+#include <xen/event.h>
+#include <xen/guest_access.h>
+#include <xen/paging.h>
 #include <xen/radix-tree.h>
 #include <xen/softirq.h>
 #include <xen/tasklet.h>
@@ -995,6 +998,455 @@ static int __init setup_dump_pcidevs(void)
 }
 __initcall(setup_dump_pcidevs);
 
+int iommu_update_ire_from_msi(
+    struct msi_desc *msi_desc, struct msi_msg *msg)
+{
+    const struct iommu_ops *ops = iommu_get_ops();
+    return iommu_intremap ? ops->update_ire_from_msi(msi_desc, msg) : 0;
+}
+
+void iommu_read_msi_from_ire(
+    struct msi_desc *msi_desc, struct msi_msg *msg)
+{
+    const struct iommu_ops *ops = iommu_get_ops();
+    if ( iommu_intremap )
+        ops->read_msi_from_ire(msi_desc, msg);
+}
+
+static int iommu_populate_page_table(struct domain *d)
+{
+    struct hvm_iommu *hd = domain_hvm_iommu(d);
+    struct page_info *page;
+    int rc = 0, n = 0;
+
+    d->need_iommu = -1;
+
+    this_cpu(iommu_dont_flush_iotlb) = 1;
+    spin_lock(&d->page_alloc_lock);
+
+    if ( unlikely(d->is_dying) )
+        rc = -ESRCH;
+
+
+    while ( !rc && (page = page_list_remove_head(&d->page_list)) )
+    {
+        if ( is_hvm_domain(d) ||
+            (page->u.inuse.type_info & PGT_type_mask) == PGT_writable_page )
+        {
+            BUG_ON(SHARED_M2P(mfn_to_gmfn(d, page_to_mfn(page))));
+            rc = hd->platform_ops->map_page(
+                d, mfn_to_gmfn(d, page_to_mfn(page)), page_to_mfn(page),
+                IOMMUF_readable|IOMMUF_writable);
+            if ( rc )
+            {
+                page_list_add(page, &d->page_list);
+                break;
+            }
+        }
+        page_list_add_tail(page, &d->arch.relmem_list);
+        if ( !(++n & 0xff) && !page_list_empty(&d->page_list) &&
+             hypercall_preempt_check() )
+            rc = -ERESTART;
+    }
+
+    if ( !rc )
+    {
+        /*
+         * The expectation here is that generally there are many normal pages
+         * on relmem_list (the ones we put there) and only few being in an
+         * offline/broken state. The latter ones are always at the head of the
+         * list. Hence we first move the whole list, and then move back the
+         * first few entries.
+         */
+        page_list_move(&d->page_list, &d->arch.relmem_list);
+        while ( (page = page_list_first(&d->page_list)) != NULL &&
+                (page->count_info & (PGC_state|PGC_broken)) )
+        {
+            page_list_del(page, &d->page_list);
+            page_list_add_tail(page, &d->arch.relmem_list);
+        }
+    }
+
+    spin_unlock(&d->page_alloc_lock);
+    this_cpu(iommu_dont_flush_iotlb) = 0;
+
+    if ( !rc )
+        iommu_iotlb_flush_all(d);
+    else if ( rc != -ERESTART )
+        iommu_teardown(d);
+
+    return rc;
+}
+
+int iommu_add_device(struct pci_dev *pdev)
+{
+    struct hvm_iommu *hd;
+    int rc;
+    u8 devfn;
+
+    if ( !pdev->domain )
+        return -EINVAL;
+
+    ASSERT(spin_is_locked(&pcidevs_lock));
+
+    hd = domain_hvm_iommu(pdev->domain);
+    if ( !iommu_enabled || !hd->platform_ops )
+        return 0;
+
+    rc = hd->platform_ops->add_device(pdev->devfn, pdev);
+    if ( rc || !pdev->phantom_stride )
+        return rc;
+
+    for ( devfn = pdev->devfn ; ; )
+    {
+        devfn += pdev->phantom_stride;
+        if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
+            return 0;
+        rc = hd->platform_ops->add_device(devfn, pdev);
+        if ( rc )
+            printk(XENLOG_WARNING "IOMMU: add %04x:%02x:%02x.%u failed (%d)\n",
+                   pdev->seg, pdev->bus, PCI_SLOT(devfn), PCI_FUNC(devfn), rc);
+    }
+}
+
+int iommu_enable_device(struct pci_dev *pdev)
+{
+    struct hvm_iommu *hd;
+
+    if ( !pdev->domain )
+        return -EINVAL;
+
+    ASSERT(spin_is_locked(&pcidevs_lock));
+
+    hd = domain_hvm_iommu(pdev->domain);
+    if ( !iommu_enabled || !hd->platform_ops ||
+         !hd->platform_ops->enable_device )
+        return 0;
+
+    return hd->platform_ops->enable_device(pdev);
+}
+
+int iommu_remove_device(struct pci_dev *pdev)
+{
+    struct hvm_iommu *hd;
+    u8 devfn;
+
+    if ( !pdev->domain )
+        return -EINVAL;
+
+    hd = domain_hvm_iommu(pdev->domain);
+    if ( !iommu_enabled || !hd->platform_ops )
+        return 0;
+
+    for ( devfn = pdev->devfn ; pdev->phantom_stride; )
+    {
+        int rc;
+
+        devfn += pdev->phantom_stride;
+        if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
+            break;
+        rc = hd->platform_ops->remove_device(devfn, pdev);
+        if ( !rc )
+            continue;
+
+        printk(XENLOG_ERR "IOMMU: remove %04x:%02x:%02x.%u failed (%d)\n",
+               pdev->seg, pdev->bus, PCI_SLOT(devfn), PCI_FUNC(devfn), rc);
+        return rc;
+    }
+
+    return hd->platform_ops->remove_device(pdev->devfn, pdev);
+}
+
+/*
+ * If the device isn't owned by dom0, it means it already
+ * has been assigned to other domain, or it doesn't exist.
+ */
+static int device_assigned(u16 seg, u8 bus, u8 devfn)
+{
+    struct pci_dev *pdev = NULL;
+
+    spin_lock(&pcidevs_lock);
+    pdev = pci_get_pdev_by_domain(dom0, seg, bus, devfn);
+    spin_unlock(&pcidevs_lock);
+
+    return pdev ? 0 : -EBUSY;
+}
+
+static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
+{
+    struct hvm_iommu *hd = domain_hvm_iommu(d);
+    struct pci_dev *pdev;
+    int rc = 0;
+
+    if ( !iommu_enabled || !hd->platform_ops )
+        return 0;
+
+    /* Prevent device assign if mem paging or mem sharing have been 
+     * enabled for this domain */
+    if ( unlikely(!need_iommu(d) &&
+            (d->arch.hvm_domain.mem_sharing_enabled ||
+             d->mem_event->paging.ring_page)) )
+        return -EXDEV;
+
+    if ( !spin_trylock(&pcidevs_lock) )
+        return -ERESTART;
+
+    if ( need_iommu(d) <= 0 )
+    {
+        if ( !iommu_use_hap_pt(d) )
+        {
+            rc = iommu_populate_page_table(d);
+            if ( rc )
+            {
+                spin_unlock(&pcidevs_lock);
+                return rc;
+            }
+        }
+        d->need_iommu = 1;
+    }
+
+    pdev = pci_get_pdev_by_domain(dom0, seg, bus, devfn);
+    if ( !pdev )
+    {
+        rc = pci_get_pdev(seg, bus, devfn) ? -EBUSY : -ENODEV;
+        goto done;
+    }
+
+    pdev->fault.count = 0;
+
+    if ( (rc = hd->platform_ops->assign_device(d, devfn, pdev)) )
+        goto done;
+
+    for ( ; pdev->phantom_stride; rc = 0 )
+    {
+        devfn += pdev->phantom_stride;
+        if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
+            break;
+        rc = hd->platform_ops->assign_device(d, devfn, pdev);
+        if ( rc )
+            printk(XENLOG_G_WARNING "d%d: assign %04x:%02x:%02x.%u failed (%d)\n",
+                   d->domain_id, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+                   rc);
+    }
+
+ done:
+    if ( !has_arch_pdevs(d) && need_iommu(d) )
+        iommu_teardown(d);
+    spin_unlock(&pcidevs_lock);
+
+    return rc;
+}
+
+/* caller should hold the pcidevs_lock */
+int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
+{
+    struct hvm_iommu *hd = domain_hvm_iommu(d);
+    struct pci_dev *pdev = NULL;
+    int ret = 0;
+
+    if ( !iommu_enabled || !hd->platform_ops )
+        return -EINVAL;
+
+    ASSERT(spin_is_locked(&pcidevs_lock));
+    pdev = pci_get_pdev_by_domain(d, seg, bus, devfn);
+    if ( !pdev )
+        return -ENODEV;
+
+    while ( pdev->phantom_stride )
+    {
+        devfn += pdev->phantom_stride;
+        if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
+            break;
+        ret = hd->platform_ops->reassign_device(d, dom0, devfn, pdev);
+        if ( !ret )
+            continue;
+
+        printk(XENLOG_G_ERR "d%d: deassign %04x:%02x:%02x.%u failed (%d)\n",
+               d->domain_id, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), ret);
+        return ret;
+    }
+
+    devfn = pdev->devfn;
+    ret = hd->platform_ops->reassign_device(d, dom0, devfn, pdev);
+    if ( ret )
+    {
+        dprintk(XENLOG_G_ERR,
+                "d%d: deassign device (%04x:%02x:%02x.%u) failed\n",
+                d->domain_id, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
+        return ret;
+    }
+
+    pdev->fault.count = 0;
+
+    if ( !has_arch_pdevs(d) && need_iommu(d) )
+        iommu_teardown(d);
+
+    return ret;
+}
+
+static int iommu_get_device_group(
+    struct domain *d, u16 seg, u8 bus, u8 devfn,
+    XEN_GUEST_HANDLE_64(uint32) buf, int max_sdevs)
+{
+    struct hvm_iommu *hd = domain_hvm_iommu(d);
+    struct pci_dev *pdev;
+    int group_id, sdev_id;
+    u32 bdf;
+    int i = 0;
+    const struct iommu_ops *ops = hd->platform_ops;
+
+    if ( !iommu_enabled || !ops || !ops->get_device_group_id )
+        return 0;
+
+    group_id = ops->get_device_group_id(seg, bus, devfn);
+
+    spin_lock(&pcidevs_lock);
+    for_each_pdev( d, pdev )
+    {
+        if ( (pdev->seg != seg) ||
+             ((pdev->bus == bus) && (pdev->devfn == devfn)) )
+            continue;
+
+        if ( xsm_get_device_group(XSM_HOOK, (seg << 16) | (pdev->bus << 8) | pdev->devfn) )
+            continue;
+
+        sdev_id = ops->get_device_group_id(seg, pdev->bus, pdev->devfn);
+        if ( (sdev_id == group_id) && (i < max_sdevs) )
+        {
+            bdf = 0;
+            bdf |= (pdev->bus & 0xff) << 16;
+            bdf |= (pdev->devfn & 0xff) << 8;
+
+            if ( unlikely(copy_to_guest_offset(buf, i, &bdf, 1)) )
+            {
+                spin_unlock(&pcidevs_lock);
+                return -1;
+            }
+            i++;
+        }
+    }
+
+    spin_unlock(&pcidevs_lock);
+
+    return i;
+}
+
+int iommu_do_pci_domctl(
+    struct xen_domctl *domctl, struct domain *d,
+    XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
+{
+    u16 seg;
+    u8 bus, devfn;
+    int ret = 0;
+
+    switch ( domctl->cmd )
+    {
+    case XEN_DOMCTL_get_device_group:
+    {
+        u32 max_sdevs;
+        XEN_GUEST_HANDLE_64(uint32) sdevs;
+
+        ret = xsm_get_device_group(XSM_HOOK, domctl->u.get_device_group.machine_sbdf);
+        if ( ret )
+            break;
+
+        seg = domctl->u.get_device_group.machine_sbdf >> 16;
+        bus = (domctl->u.get_device_group.machine_sbdf >> 8) & 0xff;
+        devfn = domctl->u.get_device_group.machine_sbdf & 0xff;
+        max_sdevs = domctl->u.get_device_group.max_sdevs;
+        sdevs = domctl->u.get_device_group.sdev_array;
+
+        ret = iommu_get_device_group(d, seg, bus, devfn, sdevs, max_sdevs);
+        if ( ret < 0 )
+        {
+            dprintk(XENLOG_ERR, "iommu_get_device_group() failed!\n");
+            ret = -EFAULT;
+            domctl->u.get_device_group.num_sdevs = 0;
+        }
+        else
+        {
+            domctl->u.get_device_group.num_sdevs = ret;
+            ret = 0;
+        }
+        if ( __copy_field_to_guest(u_domctl, domctl, u.get_device_group) )
+            ret = -EFAULT;
+    }
+    break;
+
+    case XEN_DOMCTL_test_assign_device:
+        ret = xsm_test_assign_device(XSM_HOOK, domctl->u.assign_device.machine_sbdf);
+        if ( ret )
+            break;
+
+        seg = domctl->u.assign_device.machine_sbdf >> 16;
+        bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff;
+        devfn = domctl->u.assign_device.machine_sbdf & 0xff;
+
+        if ( device_assigned(seg, bus, devfn) )
+        {
+            printk(XENLOG_G_INFO
+                   "%04x:%02x:%02x.%u already assigned, or non-existent\n",
+                   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
+            ret = -EINVAL;
+        }
+        break;
+
+    case XEN_DOMCTL_assign_device:
+        if ( unlikely(d->is_dying) )
+        {
+            ret = -EINVAL;
+            break;
+        }
+
+        ret = xsm_assign_device(XSM_HOOK, d, domctl->u.assign_device.machine_sbdf);
+        if ( ret )
+            break;
+
+        seg = domctl->u.assign_device.machine_sbdf >> 16;
+        bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff;
+        devfn = domctl->u.assign_device.machine_sbdf & 0xff;
+
+        ret = device_assigned(seg, bus, devfn) ?:
+              assign_device(d, seg, bus, devfn);
+        if ( ret == -ERESTART )
+            ret = hypercall_create_continuation(__HYPERVISOR_domctl,
+                                                "h", u_domctl);
+        else if ( ret )
+            printk(XENLOG_G_ERR "XEN_DOMCTL_assign_device: "
+                   "assign %04x:%02x:%02x.%u to dom%d failed (%d)\n",
+                   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+                   d->domain_id, ret);
+
+        break;
+
+    case XEN_DOMCTL_deassign_device:
+        ret = xsm_deassign_device(XSM_HOOK, d, domctl->u.assign_device.machine_sbdf);
+        if ( ret )
+            break;
+
+        seg = domctl->u.assign_device.machine_sbdf >> 16;
+        bus = (domctl->u.assign_device.machine_sbdf >> 8) & 0xff;
+        devfn = domctl->u.assign_device.machine_sbdf & 0xff;
+
+        spin_lock(&pcidevs_lock);
+        ret = deassign_device(d, seg, bus, devfn);
+        spin_unlock(&pcidevs_lock);
+        if ( ret )
+            printk(XENLOG_G_ERR
+                   "deassign %04x:%02x:%02x.%u from dom%d failed (%d)\n",
+                   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+                   d->domain_id, ret);
+
+        break;
+
+    default:
+        ret = -ENOSYS;
+        break;
+    }
+
+    return ret;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/drivers/passthrough/x86/Makefile b/xen/drivers/passthrough/x86/Makefile
index c124a51..a70cf94 100644
--- a/xen/drivers/passthrough/x86/Makefile
+++ b/xen/drivers/passthrough/x86/Makefile
@@ -1 +1,2 @@
 obj-y += ats.o
+obj-y += iommu.o
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
new file mode 100644
index 0000000..c857ba8
--- /dev/null
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -0,0 +1,50 @@
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#include <xen/sched.h>
+#include <xen/iommu.h>
+#include <xen/paging.h>
+#include <xen/guest_access.h>
+#include <xen/event.h>
+#include <xen/softirq.h>
+#include <xsm/xsm.h>
+
+void iommu_update_ire_from_apic(
+    unsigned int apic, unsigned int reg, unsigned int value)
+{
+    const struct iommu_ops *ops = iommu_get_ops();
+    ops->update_ire_from_apic(apic, reg, value);
+}
+
+unsigned int iommu_read_apic_from_ire(unsigned int apic, unsigned int reg)
+{
+    const struct iommu_ops *ops = iommu_get_ops();
+    return ops->read_apic_from_ire(apic, reg);
+}
+
+int __init iommu_setup_hpet_msi(struct msi_desc *msi)
+{
+    const struct iommu_ops *ops = iommu_get_ops();
+    return ops->setup_hpet_msi ? ops->setup_hpet_msi(msi) : -ENODEV;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
new file mode 100644
index 0000000..946291c
--- /dev/null
+++ b/xen/include/asm-x86/iommu.h
@@ -0,0 +1,42 @@
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+*/
+#ifndef __ARCH_X86_IOMMU_H__
+#define __ARCH_X86_IOMMU_H__
+
+#define MAX_IOMMUS 32
+
+#include <asm/msi.h>
+
+void iommu_update_ire_from_apic(unsigned int apic, unsigned int reg, unsigned int value);
+unsigned int iommu_read_apic_from_ire(unsigned int apic, unsigned int reg);
+int iommu_setup_hpet_msi(struct msi_desc *);
+
+/* While VT-d specific, this must get declared in a generic header. */
+int adjust_vtd_irq_affinities(void);
+void iommu_pte_flush(struct domain *d, u64 gfn, u64 *pte, int order, int present);
+int iommu_supports_eim(void);
+int iommu_enable_x2apic_IR(void);
+void iommu_disable_x2apic_IR(void);
+void iommu_set_dom0_mapping(struct domain *d);
+
+#endif /* !__ARCH_X86_IOMMU_H__ */
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/hvm/iommu.h b/xen/include/xen/hvm/iommu.h
index 6ab684e..c9c10c1 100644
--- a/xen/include/xen/hvm/iommu.h
+++ b/xen/include/xen/hvm/iommu.h
@@ -21,6 +21,7 @@
 #define __XEN_HVM_IOMMU_H__
 
 #include <xen/iommu.h>
+#include <asm/hvm/iommu.h>
 
 struct g2m_ioport {
     struct list_head list;
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 4f534ed..cf61d163 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -25,6 +25,7 @@
 #include <xen/pci.h>
 #include <public/hvm/ioreq.h>
 #include <public/domctl.h>
+#include <asm/iommu.h>
 
 extern bool_t iommu_enable, iommu_enabled;
 extern bool_t force_iommu, iommu_verbose;
@@ -39,17 +40,12 @@ extern bool_t amd_iommu_perdev_intremap;
 
 #define domain_hvm_iommu(d)     (&d->arch.hvm_domain.hvm_iommu)
 
-#define MAX_IOMMUS 32
-
 #define PAGE_SHIFT_4K       (12)
 #define PAGE_SIZE_4K        (1UL << PAGE_SHIFT_4K)
 #define PAGE_MASK_4K        (((u64)-1) << PAGE_SHIFT_4K)
 #define PAGE_ALIGN_4K(addr) (((addr) + PAGE_SIZE_4K - 1) & PAGE_MASK_4K)
 
 int iommu_setup(void);
-int iommu_supports_eim(void);
-int iommu_enable_x2apic_IR(void);
-void iommu_disable_x2apic_IR(void);
 
 int iommu_add_device(struct pci_dev *pdev);
 int iommu_enable_device(struct pci_dev *pdev);
@@ -59,6 +55,9 @@ void iommu_dom0_init(struct domain *d);
 void iommu_domain_destroy(struct domain *d);
 int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn);
 
+/* Function used internally, use iommu_domain_destroy */
+void iommu_teardown(struct domain *d);
+
 /* iommu_map_page() takes flags to direct the mapping operation. */
 #define _IOMMUF_readable 0
 #define IOMMUF_readable  (1u<<_IOMMUF_readable)
@@ -67,8 +66,8 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn);
 int iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
                    unsigned int flags);
 int iommu_unmap_page(struct domain *d, unsigned long gfn);
-void iommu_pte_flush(struct domain *d, u64 gfn, u64 *pte, int order, int present);
 
+#ifdef HAS_PCI
 void pt_pci_init(void);
 
 struct pirq;
@@ -82,32 +81,43 @@ struct hvm_irq_dpci *domain_get_irq_dpci(const struct domain *);
 void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci);
 bool_t pt_irq_need_timer(uint32_t flags);
 
+int iommu_update_ire_from_msi(struct msi_desc *msi_desc, struct msi_msg *msg);
+void iommu_read_msi_from_ire(struct msi_desc *msi_desc, struct msi_msg *msg);
+
 #define PT_IRQ_TIME_OUT MILLISECS(8)
+#endif /* HAS_PCI */
 
+#ifdef HAS_PCI
 struct msi_desc;
 struct msi_msg;
+#endif /* HAS_PCI */
+
 struct page_info;
 
 struct iommu_ops {
     int (*init)(struct domain *d);
     void (*dom0_init)(struct domain *d);
+#ifdef HAS_PCI
     int (*add_device)(u8 devfn, struct pci_dev *);
     int (*enable_device)(struct pci_dev *pdev);
     int (*remove_device)(u8 devfn, struct pci_dev *);
     int (*assign_device)(struct domain *, u8 devfn, struct pci_dev *);
+    int (*reassign_device)(struct domain *s, struct domain *t,
+			   u8 devfn, struct pci_dev *);
+    int (*get_device_group_id)(u16 seg, u8 bus, u8 devfn);
+    int (*update_ire_from_msi)(struct msi_desc *msi_desc, struct msi_msg *msg);
+    void (*read_msi_from_ire)(struct msi_desc *msi_desc, struct msi_msg *msg);
+#endif /* HAS_PCI */
     void (*teardown)(struct domain *d);
     int (*map_page)(struct domain *d, unsigned long gfn, unsigned long mfn,
                     unsigned int flags);
     int (*unmap_page)(struct domain *d, unsigned long gfn);
     void (*free_page_table)(struct page_info *);
-    int (*reassign_device)(struct domain *s, struct domain *t,
-			   u8 devfn, struct pci_dev *);
-    int (*get_device_group_id)(u16 seg, u8 bus, u8 devfn);
+#ifdef CONFIG_X86
     void (*update_ire_from_apic)(unsigned int apic, unsigned int reg, unsigned int value);
-    int (*update_ire_from_msi)(struct msi_desc *msi_desc, struct msi_msg *msg);
-    void (*read_msi_from_ire)(struct msi_desc *msi_desc, struct msi_msg *msg);
     unsigned int (*read_apic_from_ire)(unsigned int apic, unsigned int reg);
     int (*setup_hpet_msi)(struct msi_desc *);
+#endif /* CONFIG_X86 */
     void (*suspend)(void);
     void (*resume)(void);
     void (*share_p2m)(struct domain *d);
@@ -117,28 +127,23 @@ struct iommu_ops {
     void (*dump_p2m_table)(struct domain *d);
 };
 
-void iommu_update_ire_from_apic(unsigned int apic, unsigned int reg, unsigned int value);
-int iommu_update_ire_from_msi(struct msi_desc *msi_desc, struct msi_msg *msg);
-void iommu_read_msi_from_ire(struct msi_desc *msi_desc, struct msi_msg *msg);
-unsigned int iommu_read_apic_from_ire(unsigned int apic, unsigned int reg);
-int iommu_setup_hpet_msi(struct msi_desc *);
-
 void iommu_suspend(void);
 void iommu_resume(void);
 void iommu_crash_shutdown(void);
 
-void iommu_set_dom0_mapping(struct domain *d);
 void iommu_share_p2m_table(struct domain *d);
 
+#ifdef HAS_PCI
+int iommu_do_pci_domctl(struct xen_domctl *, struct domain *d,
+                        XEN_GUEST_HANDLE_PARAM(xen_domctl_t));
+#endif
+
 int iommu_do_domctl(struct xen_domctl *, struct domain *d,
                     XEN_GUEST_HANDLE_PARAM(xen_domctl_t));
 
 void iommu_iotlb_flush(struct domain *d, unsigned long gfn, unsigned int page_count);
 void iommu_iotlb_flush_all(struct domain *d);
 
-/* While VT-d specific, this must get declared in a generic header. */
-int adjust_vtd_irq_affinities(void);
-
 /*
  * The purpose of the iommu_dont_flush_iotlb optional cpu flag is to
  * avoid unecessary iotlb_flush in the low level IOMMU code.
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v3 07/13] xen/passthrough: iommu: Introduce arch specific code
  2014-03-11 15:49 [PATCH v3 00/13] IOMMU support for ARM Julien Grall
                   ` (5 preceding siblings ...)
  2014-03-11 15:49 ` [PATCH v3 06/13] xen/passthrough: iommu: Split generic IOMMU code Julien Grall
@ 2014-03-11 15:49 ` Julien Grall
  2014-03-11 16:15   ` Julien Grall
                     ` (2 more replies)
  2014-03-11 15:49 ` [PATCH v3 08/13] xen/passthrough: iommu: Basic support of device tree assignment Julien Grall
                   ` (5 subsequent siblings)
  12 siblings, 3 replies; 63+ messages in thread
From: Julien Grall @ 2014-03-11 15:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Joseph Cihula, Keir Fraser, ian.campbell, Shane Wang,
	Julien Grall, tim, stefano.stabellini, Jan Beulich,
	Suravee Suthikulpanit, Gang Wei, Xiantao Zhang

Currently the structure hvm_iommu (xen/include/xen/hvm/iommu.h) contains
x86-specific fields.

This patch creates:
    - an arch_hvm_iommu structure which will contain architecture-dependent
    fields
    - arch_iommu_domain_{init,destroy} functions to execute arch-specific
    code during domain creation/destruction

Also move iommu_use_hap_pt and domain_hvm_iommu to asm-x86/iommu.h.
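
As a rough illustration (not part of this patch), a port to another
architecture now only has to supply its own arch_hvm_iommu and the two
hooks. In the sketch below the p2m_root field is purely hypothetical:

struct arch_hvm_iommu
{
    /* Hypothetical field for an architecture sharing the P2M. */
    struct page_info *p2m_root;
};

int arch_iommu_domain_init(struct domain *d)
{
    /* Initialise the arch-specific part of d's hvm_iommu here. */
    return 0;
}

void arch_iommu_domain_destroy(struct domain *d)
{
    /* Undo whatever arch_iommu_domain_init() set up. */
}

The generic iommu_domain_init() calls arch_iommu_domain_init() before
anything else, as the change to xen/drivers/passthrough/iommu.c below shows.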

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Joseph Cihula <joseph.cihula@intel.com>
Cc: Gang Wei <gang.wei@intel.com>
Cc: Shane Wang <shane.wang@intel.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Xiantao Zhang <xiantao.zhang@intel.com>
---
 xen/arch/x86/domctl.c                       |    6 +-
 xen/arch/x86/hvm/io.c                       |    2 +-
 xen/arch/x86/tboot.c                        |    3 +-
 xen/drivers/passthrough/amd/iommu_guest.c   |    8 +--
 xen/drivers/passthrough/amd/iommu_map.c     |   54 +++++++++---------
 xen/drivers/passthrough/amd/pci_amd_iommu.c |   49 ++++++++--------
 xen/drivers/passthrough/iommu.c             |   28 +++-------
 xen/drivers/passthrough/vtd/iommu.c         |   80 +++++++++++++--------------
 xen/drivers/passthrough/x86/iommu.c         |   41 ++++++++++++++
 xen/include/asm-x86/hvm/iommu.h             |   28 ++++++++++
 xen/include/asm-x86/iommu.h                 |    4 +-
 xen/include/xen/hvm/iommu.h                 |   25 +--------
 xen/include/xen/iommu.h                     |   16 +++---
 13 files changed, 190 insertions(+), 154 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 26635ff..e55d9d5 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -745,7 +745,7 @@ long arch_do_domctl(
                    "ioport_map:add: dom%d gport=%x mport=%x nr=%x\n",
                    d->domain_id, fgp, fmp, np);
 
-            list_for_each_entry(g2m_ioport, &hd->g2m_ioport_list, list)
+            list_for_each_entry(g2m_ioport, &hd->arch.g2m_ioport_list, list)
                 if (g2m_ioport->mport == fmp )
                 {
                     g2m_ioport->gport = fgp;
@@ -764,7 +764,7 @@ long arch_do_domctl(
                 g2m_ioport->gport = fgp;
                 g2m_ioport->mport = fmp;
                 g2m_ioport->np = np;
-                list_add_tail(&g2m_ioport->list, &hd->g2m_ioport_list);
+                list_add_tail(&g2m_ioport->list, &hd->arch.g2m_ioport_list);
             }
             if ( !ret )
                 ret = ioports_permit_access(d, fmp, fmp + np - 1);
@@ -779,7 +779,7 @@ long arch_do_domctl(
             printk(XENLOG_G_INFO
                    "ioport_map:remove: dom%d gport=%x mport=%x nr=%x\n",
                    d->domain_id, fgp, fmp, np);
-            list_for_each_entry(g2m_ioport, &hd->g2m_ioport_list, list)
+            list_for_each_entry(g2m_ioport, &hd->arch.g2m_ioport_list, list)
                 if ( g2m_ioport->mport == fmp )
                 {
                     list_del(&g2m_ioport->list);
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index bf6309d..ddb03f8 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -451,7 +451,7 @@ int dpci_ioport_intercept(ioreq_t *p)
     unsigned int s = 0, e = 0;
     int rc;
 
-    list_for_each_entry( g2m_ioport, &hd->g2m_ioport_list, list )
+    list_for_each_entry( g2m_ioport, &hd->arch.g2m_ioport_list, list )
     {
         s = g2m_ioport->gport;
         e = s + g2m_ioport->np;
diff --git a/xen/arch/x86/tboot.c b/xen/arch/x86/tboot.c
index ccde4a0..c40fe12 100644
--- a/xen/arch/x86/tboot.c
+++ b/xen/arch/x86/tboot.c
@@ -230,7 +230,8 @@ static void tboot_gen_domain_integrity(const uint8_t key[TB_KEY_SIZE],
         if ( !is_idle_domain(d) )
         {
             struct hvm_iommu *hd = domain_hvm_iommu(d);
-            update_iommu_mac(&ctx, hd->pgd_maddr, agaw_to_level(hd->agaw));
+            update_iommu_mac(&ctx, hd->arch.pgd_maddr,
+                             agaw_to_level(hd->arch.agaw));
         }
     }
 
diff --git a/xen/drivers/passthrough/amd/iommu_guest.c b/xen/drivers/passthrough/amd/iommu_guest.c
index 477de20..bd31bb5 100644
--- a/xen/drivers/passthrough/amd/iommu_guest.c
+++ b/xen/drivers/passthrough/amd/iommu_guest.c
@@ -60,12 +60,12 @@ static uint16_t guest_bdf(struct domain *d, uint16_t machine_bdf)
 
 static inline struct guest_iommu *domain_iommu(struct domain *d)
 {
-    return domain_hvm_iommu(d)->g_iommu;
+    return domain_hvm_iommu(d)->arch.g_iommu;
 }
 
 static inline struct guest_iommu *vcpu_iommu(struct vcpu *v)
 {
-    return domain_hvm_iommu(v->domain)->g_iommu;
+    return domain_hvm_iommu(v->domain)->arch.g_iommu;
 }
 
 static void guest_iommu_enable(struct guest_iommu *iommu)
@@ -886,7 +886,7 @@ int guest_iommu_init(struct domain* d)
 
     guest_iommu_reg_init(iommu);
     iommu->domain = d;
-    hd->g_iommu = iommu;
+    hd->arch.g_iommu = iommu;
 
     tasklet_init(&iommu->cmd_buffer_tasklet,
                  guest_iommu_process_command, (unsigned long)d);
@@ -907,7 +907,7 @@ void guest_iommu_destroy(struct domain *d)
     tasklet_kill(&iommu->cmd_buffer_tasklet);
     xfree(iommu);
 
-    domain_hvm_iommu(d)->g_iommu = NULL;
+    domain_hvm_iommu(d)->arch.g_iommu = NULL;
 }
 
 static int guest_iommu_mmio_range(struct vcpu *v, unsigned long addr)
diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index b79e470..ceb1c28 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -344,7 +344,7 @@ static int iommu_update_pde_count(struct domain *d, unsigned long pt_mfn,
     struct hvm_iommu *hd = domain_hvm_iommu(d);
     bool_t ok = 0;
 
-    ASSERT( spin_is_locked(&hd->mapping_lock) && pt_mfn );
+    ASSERT( spin_is_locked(&hd->arch.mapping_lock) && pt_mfn );
 
     next_level = merge_level - 1;
 
@@ -398,7 +398,7 @@ static int iommu_merge_pages(struct domain *d, unsigned long pt_mfn,
     unsigned long first_mfn;
     struct hvm_iommu *hd = domain_hvm_iommu(d);
 
-    ASSERT( spin_is_locked(&hd->mapping_lock) && pt_mfn );
+    ASSERT( spin_is_locked(&hd->arch.mapping_lock) && pt_mfn );
 
     table = map_domain_page(pt_mfn);
     pde = table + pfn_to_pde_idx(gfn, merge_level);
@@ -448,8 +448,8 @@ static int iommu_pde_from_gfn(struct domain *d, unsigned long pfn,
     struct page_info *table;
     struct hvm_iommu *hd = domain_hvm_iommu(d);
 
-    table = hd->root_table;
-    level = hd->paging_mode;
+    table = hd->arch.root_table;
+    level = hd->arch.paging_mode;
 
     BUG_ON( table == NULL || level < IOMMU_PAGING_MODE_LEVEL_1 || 
             level > IOMMU_PAGING_MODE_LEVEL_6 );
@@ -557,11 +557,11 @@ static int update_paging_mode(struct domain *d, unsigned long gfn)
     unsigned long old_root_mfn;
     struct hvm_iommu *hd = domain_hvm_iommu(d);
 
-    level = hd->paging_mode;
-    old_root = hd->root_table;
+    level = hd->arch.paging_mode;
+    old_root = hd->arch.root_table;
     offset = gfn >> (PTE_PER_TABLE_SHIFT * (level - 1));
 
-    ASSERT(spin_is_locked(&hd->mapping_lock) && is_hvm_domain(d));
+    ASSERT(spin_is_locked(&hd->arch.mapping_lock) && is_hvm_domain(d));
 
     while ( offset >= PTE_PER_TABLE_SIZE )
     {
@@ -587,8 +587,8 @@ static int update_paging_mode(struct domain *d, unsigned long gfn)
 
     if ( new_root != NULL )
     {
-        hd->paging_mode = level;
-        hd->root_table = new_root;
+        hd->arch.paging_mode = level;
+        hd->arch.root_table = new_root;
 
         if ( !spin_is_locked(&pcidevs_lock) )
             AMD_IOMMU_DEBUG("%s Try to access pdev_list "
@@ -613,9 +613,9 @@ static int update_paging_mode(struct domain *d, unsigned long gfn)
 
                 /* valid = 0 only works for dom0 passthrough mode */
                 amd_iommu_set_root_page_table((u32 *)device_entry,
-                                              page_to_maddr(hd->root_table),
+                                              page_to_maddr(hd->arch.root_table),
                                               d->domain_id,
-                                              hd->paging_mode, 1);
+                                              hd->arch.paging_mode, 1);
 
                 amd_iommu_flush_device(iommu, req_id);
                 bdf += pdev->phantom_stride;
@@ -638,14 +638,14 @@ int amd_iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
     unsigned long pt_mfn[7];
     unsigned int merge_level;
 
-    BUG_ON( !hd->root_table );
+    BUG_ON( !hd->arch.root_table );
 
     if ( iommu_use_hap_pt(d) )
         return 0;
 
     memset(pt_mfn, 0, sizeof(pt_mfn));
 
-    spin_lock(&hd->mapping_lock);
+    spin_lock(&hd->arch.mapping_lock);
 
     /* Since HVM domain is initialized with 2 level IO page table,
      * we might need a deeper page table for lager gfn now */
@@ -653,7 +653,7 @@ int amd_iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
     {
         if ( update_paging_mode(d, gfn) )
         {
-            spin_unlock(&hd->mapping_lock);
+            spin_unlock(&hd->arch.mapping_lock);
             AMD_IOMMU_DEBUG("Update page mode failed gfn = %lx\n", gfn);
             domain_crash(d);
             return -EFAULT;
@@ -662,7 +662,7 @@ int amd_iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
 
     if ( iommu_pde_from_gfn(d, gfn, pt_mfn) || (pt_mfn[1] == 0) )
     {
-        spin_unlock(&hd->mapping_lock);
+        spin_unlock(&hd->arch.mapping_lock);
         AMD_IOMMU_DEBUG("Invalid IO pagetable entry gfn = %lx\n", gfn);
         domain_crash(d);
         return -EFAULT;
@@ -684,7 +684,7 @@ int amd_iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
         amd_iommu_flush_pages(d, gfn, 0);
 
     for ( merge_level = IOMMU_PAGING_MODE_LEVEL_2;
-          merge_level <= hd->paging_mode; merge_level++ )
+          merge_level <= hd->arch.paging_mode; merge_level++ )
     {
         if ( pt_mfn[merge_level] == 0 )
             break;
@@ -697,7 +697,7 @@ int amd_iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
         if ( iommu_merge_pages(d, pt_mfn[merge_level], gfn, 
                                flags, merge_level) )
         {
-            spin_unlock(&hd->mapping_lock);
+            spin_unlock(&hd->arch.mapping_lock);
             AMD_IOMMU_DEBUG("Merge iommu page failed at level %d, "
                             "gfn = %lx mfn = %lx\n", merge_level, gfn, mfn);
             domain_crash(d);
@@ -706,7 +706,7 @@ int amd_iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
     }
 
 out:
-    spin_unlock(&hd->mapping_lock);
+    spin_unlock(&hd->arch.mapping_lock);
     return 0;
 }
 
@@ -715,14 +715,14 @@ int amd_iommu_unmap_page(struct domain *d, unsigned long gfn)
     unsigned long pt_mfn[7];
     struct hvm_iommu *hd = domain_hvm_iommu(d);
 
-    BUG_ON( !hd->root_table );
+    BUG_ON( !hd->arch.root_table );
 
     if ( iommu_use_hap_pt(d) )
         return 0;
 
     memset(pt_mfn, 0, sizeof(pt_mfn));
 
-    spin_lock(&hd->mapping_lock);
+    spin_lock(&hd->arch.mapping_lock);
 
     /* Since HVM domain is initialized with 2 level IO page table,
      * we might need a deeper page table for lager gfn now */
@@ -730,7 +730,7 @@ int amd_iommu_unmap_page(struct domain *d, unsigned long gfn)
     {
         if ( update_paging_mode(d, gfn) )
         {
-            spin_unlock(&hd->mapping_lock);
+            spin_unlock(&hd->arch.mapping_lock);
             AMD_IOMMU_DEBUG("Update page mode failed gfn = %lx\n", gfn);
             domain_crash(d);
             return -EFAULT;
@@ -739,7 +739,7 @@ int amd_iommu_unmap_page(struct domain *d, unsigned long gfn)
 
     if ( iommu_pde_from_gfn(d, gfn, pt_mfn) || (pt_mfn[1] == 0) )
     {
-        spin_unlock(&hd->mapping_lock);
+        spin_unlock(&hd->arch.mapping_lock);
         AMD_IOMMU_DEBUG("Invalid IO pagetable entry gfn = %lx\n", gfn);
         domain_crash(d);
         return -EFAULT;
@@ -747,7 +747,7 @@ int amd_iommu_unmap_page(struct domain *d, unsigned long gfn)
 
     /* mark PTE as 'page not present' */
     clear_iommu_pte_present(pt_mfn[1], gfn);
-    spin_unlock(&hd->mapping_lock);
+    spin_unlock(&hd->arch.mapping_lock);
 
     amd_iommu_flush_pages(d, gfn, 0);
 
@@ -792,13 +792,13 @@ void amd_iommu_share_p2m(struct domain *d)
     pgd_mfn = pagetable_get_mfn(p2m_get_pagetable(p2m_get_hostp2m(d)));
     p2m_table = mfn_to_page(mfn_x(pgd_mfn));
 
-    if ( hd->root_table != p2m_table )
+    if ( hd->arch.root_table != p2m_table )
     {
-        free_amd_iommu_pgtable(hd->root_table);
-        hd->root_table = p2m_table;
+        free_amd_iommu_pgtable(hd->arch.root_table);
+        hd->arch.root_table = p2m_table;
 
         /* When sharing p2m with iommu, paging mode = 4 */
-        hd->paging_mode = IOMMU_PAGING_MODE_LEVEL_4;
+        hd->arch.paging_mode = IOMMU_PAGING_MODE_LEVEL_4;
         AMD_IOMMU_DEBUG("Share p2m table with iommu: p2m table = %#lx\n",
                         mfn_x(pgd_mfn));
     }
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index 79f4a77..aeefabb 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -120,7 +120,8 @@ static void amd_iommu_setup_domain_device(
 
     struct hvm_iommu *hd = domain_hvm_iommu(domain);
 
-    BUG_ON( !hd->root_table || !hd->paging_mode || !iommu->dev_table.buffer );
+    BUG_ON( !hd->arch.root_table || !hd->arch.paging_mode ||
+            !iommu->dev_table.buffer );
 
     if ( iommu_passthrough && (domain->domain_id == 0) )
         valid = 0;
@@ -138,8 +139,8 @@ static void amd_iommu_setup_domain_device(
     {
         /* bind DTE to domain page-tables */
         amd_iommu_set_root_page_table(
-            (u32 *)dte, page_to_maddr(hd->root_table), domain->domain_id,
-            hd->paging_mode, valid);
+            (u32 *)dte, page_to_maddr(hd->arch.root_table), domain->domain_id,
+            hd->arch.paging_mode, valid);
 
         if ( pci_ats_device(iommu->seg, bus, pdev->devfn) &&
              iommu_has_cap(iommu, PCI_CAP_IOTLB_SHIFT) )
@@ -151,8 +152,8 @@ static void amd_iommu_setup_domain_device(
                         "root table = %#"PRIx64", "
                         "domain = %d, paging mode = %d\n",
                         req_id, pdev->type,
-                        page_to_maddr(hd->root_table),
-                        domain->domain_id, hd->paging_mode);
+                        page_to_maddr(hd->arch.root_table),
+                        domain->domain_id, hd->arch.paging_mode);
     }
 
     spin_unlock_irqrestore(&iommu->lock, flags);
@@ -225,17 +226,17 @@ int __init amd_iov_detect(void)
 static int allocate_domain_resources(struct hvm_iommu *hd)
 {
     /* allocate root table */
-    spin_lock(&hd->mapping_lock);
-    if ( !hd->root_table )
+    spin_lock(&hd->arch.mapping_lock);
+    if ( !hd->arch.root_table )
     {
-        hd->root_table = alloc_amd_iommu_pgtable();
-        if ( !hd->root_table )
+        hd->arch.root_table = alloc_amd_iommu_pgtable();
+        if ( !hd->arch.root_table )
         {
-            spin_unlock(&hd->mapping_lock);
+            spin_unlock(&hd->arch.mapping_lock);
             return -ENOMEM;
         }
     }
-    spin_unlock(&hd->mapping_lock);
+    spin_unlock(&hd->arch.mapping_lock);
     return 0;
 }
 
@@ -262,14 +263,14 @@ static int amd_iommu_domain_init(struct domain *d)
     /* allocate page directroy */
     if ( allocate_domain_resources(hd) != 0 )
     {
-        if ( hd->root_table )
-            free_domheap_page(hd->root_table);
+        if ( hd->arch.root_table )
+            free_domheap_page(hd->arch.root_table);
         return -ENOMEM;
     }
 
     /* For pv and dom0, stick with get_paging_mode(max_page)
      * For HVM dom0, use 2 level page table at first */
-    hd->paging_mode = is_hvm_domain(d) ?
+    hd->arch.paging_mode = is_hvm_domain(d) ?
                       IOMMU_PAGING_MODE_LEVEL_2 :
                       get_paging_mode(max_page);
 
@@ -332,7 +333,7 @@ void amd_iommu_disable_domain_device(struct domain *domain,
         AMD_IOMMU_DEBUG("Disable: device id = %#x, "
                         "domain = %d, paging mode = %d\n",
                         req_id,  domain->domain_id,
-                        domain_hvm_iommu(domain)->paging_mode);
+                        domain_hvm_iommu(domain)->arch.paging_mode);
     }
     spin_unlock_irqrestore(&iommu->lock, flags);
 
@@ -372,7 +373,7 @@ static int reassign_device(struct domain *source, struct domain *target,
 
     /* IO page tables might be destroyed after pci-detach the last device
      * In this case, we have to re-allocate root table for next pci-attach.*/
-    if ( t->root_table == NULL )
+    if ( t->arch.root_table == NULL )
         allocate_domain_resources(t);
 
     amd_iommu_setup_domain_device(target, iommu, devfn, pdev);
@@ -454,13 +455,13 @@ static void deallocate_iommu_page_tables(struct domain *d)
     if ( iommu_use_hap_pt(d) )
         return;
 
-    spin_lock(&hd->mapping_lock);
-    if ( hd->root_table )
+    spin_lock(&hd->arch.mapping_lock);
+    if ( hd->arch.root_table )
     {
-        deallocate_next_page_table(hd->root_table, hd->paging_mode);
-        hd->root_table = NULL;
+        deallocate_next_page_table(hd->arch.root_table, hd->arch.paging_mode);
+        hd->arch.root_table = NULL;
     }
-    spin_unlock(&hd->mapping_lock);
+    spin_unlock(&hd->arch.mapping_lock);
 }
 
 
@@ -591,11 +592,11 @@ static void amd_dump_p2m_table(struct domain *d)
 {
     struct hvm_iommu *hd  = domain_hvm_iommu(d);
 
-    if ( !hd->root_table ) 
+    if ( !hd->arch.root_table ) 
         return;
 
-    printk("p2m table has %d levels\n", hd->paging_mode);
-    amd_dump_p2m_table_level(hd->root_table, hd->paging_mode, 0, 0);
+    printk("p2m table has %d levels\n", hd->arch.paging_mode);
+    amd_dump_p2m_table_level(hd->arch.root_table, hd->arch.paging_mode, 0, 0);
 }
 
 const struct iommu_ops amd_iommu_ops = {
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 8a2fdea..9cd996a 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -117,10 +117,11 @@ static void __init parse_iommu_param(char *s)
 int iommu_domain_init(struct domain *d)
 {
     struct hvm_iommu *hd = domain_hvm_iommu(d);
+    int ret = 0;
 
-    spin_lock_init(&hd->mapping_lock);
-    INIT_LIST_HEAD(&hd->g2m_ioport_list);
-    INIT_LIST_HEAD(&hd->mapped_rmrrs);
+    ret = arch_iommu_domain_init(d);
+    if ( ret )
+        return ret;
 
     if ( !iommu_enabled )
         return 0;
@@ -189,10 +190,7 @@ void iommu_teardown(struct domain *d)
 
 void iommu_domain_destroy(struct domain *d)
 {
-    struct hvm_iommu *hd  = domain_hvm_iommu(d);
-    struct list_head *ioport_list, *rmrr_list, *tmp;
-    struct g2m_ioport *ioport;
-    struct mapped_rmrr *mrmrr;
+    struct hvm_iommu *hd = domain_hvm_iommu(d);
 
     if ( !iommu_enabled || !hd->platform_ops )
         return;
@@ -200,20 +198,8 @@ void iommu_domain_destroy(struct domain *d)
     if ( need_iommu(d) )
         iommu_teardown(d);
 
-    list_for_each_safe ( ioport_list, tmp, &hd->g2m_ioport_list )
-    {
-        ioport = list_entry(ioport_list, struct g2m_ioport, list);
-        list_del(&ioport->list);
-        xfree(ioport);
-    }
-
-    list_for_each_safe ( rmrr_list, tmp, &hd->mapped_rmrrs )
-    {
-        mrmrr = list_entry(rmrr_list, struct mapped_rmrr, list);
-        list_del(&mrmrr->list);
-        xfree(mrmrr);
-    }
-}
+    arch_iommu_domain_destroy(d);
+}
 
 int iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
                    unsigned int flags)
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index d4be75c..8efe6f9 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -248,16 +248,16 @@ static u64 addr_to_dma_page_maddr(struct domain *domain, u64 addr, int alloc)
     struct acpi_drhd_unit *drhd;
     struct pci_dev *pdev;
     struct hvm_iommu *hd = domain_hvm_iommu(domain);
-    int addr_width = agaw_to_width(hd->agaw);
+    int addr_width = agaw_to_width(hd->arch.agaw);
     struct dma_pte *parent, *pte = NULL;
-    int level = agaw_to_level(hd->agaw);
+    int level = agaw_to_level(hd->arch.agaw);
     int offset;
     u64 pte_maddr = 0, maddr;
     u64 *vaddr = NULL;
 
     addr &= (((u64)1) << addr_width) - 1;
-    ASSERT(spin_is_locked(&hd->mapping_lock));
-    if ( hd->pgd_maddr == 0 )
+    ASSERT(spin_is_locked(&hd->arch.mapping_lock));
+    if ( hd->arch.pgd_maddr == 0 )
     {
         /*
          * just get any passthrough device in the domainr - assume user
@@ -265,11 +265,11 @@ static u64 addr_to_dma_page_maddr(struct domain *domain, u64 addr, int alloc)
          */
         pdev = pci_get_pdev_by_domain(domain, -1, -1, -1);
         drhd = acpi_find_matched_drhd_unit(pdev);
-        if ( !alloc || ((hd->pgd_maddr = alloc_pgtable_maddr(drhd, 1)) == 0) )
+        if ( !alloc || ((hd->arch.pgd_maddr = alloc_pgtable_maddr(drhd, 1)) == 0) )
             goto out;
     }
 
-    parent = (struct dma_pte *)map_vtd_domain_page(hd->pgd_maddr);
+    parent = (struct dma_pte *)map_vtd_domain_page(hd->arch.pgd_maddr);
     while ( level > 1 )
     {
         offset = address_level_offset(addr, level);
@@ -579,7 +579,7 @@ static void __intel_iommu_iotlb_flush(struct domain *d, unsigned long gfn,
     {
         iommu = drhd->iommu;
 
-        if ( !test_bit(iommu->index, &hd->iommu_bitmap) )
+        if ( !test_bit(iommu->index, &hd->arch.iommu_bitmap) )
             continue;
 
         flush_dev_iotlb = find_ats_dev_drhd(iommu) ? 1 : 0;
@@ -621,12 +621,12 @@ static void dma_pte_clear_one(struct domain *domain, u64 addr)
     u64 pg_maddr;
     struct mapped_rmrr *mrmrr;
 
-    spin_lock(&hd->mapping_lock);
+    spin_lock(&hd->arch.mapping_lock);
     /* get last level pte */
     pg_maddr = addr_to_dma_page_maddr(domain, addr, 0);
     if ( pg_maddr == 0 )
     {
-        spin_unlock(&hd->mapping_lock);
+        spin_unlock(&hd->arch.mapping_lock);
         return;
     }
 
@@ -635,13 +635,13 @@ static void dma_pte_clear_one(struct domain *domain, u64 addr)
 
     if ( !dma_pte_present(*pte) )
     {
-        spin_unlock(&hd->mapping_lock);
+        spin_unlock(&hd->arch.mapping_lock);
         unmap_vtd_domain_page(page);
         return;
     }
 
     dma_clear_pte(*pte);
-    spin_unlock(&hd->mapping_lock);
+    spin_unlock(&hd->arch.mapping_lock);
     iommu_flush_cache_entry(pte, sizeof(struct dma_pte));
 
     if ( !this_cpu(iommu_dont_flush_iotlb) )
@@ -652,8 +652,8 @@ static void dma_pte_clear_one(struct domain *domain, u64 addr)
     /* if the cleared address is between mapped RMRR region,
      * remove the mapped RMRR
      */
-    spin_lock(&hd->mapping_lock);
-    list_for_each_entry ( mrmrr, &hd->mapped_rmrrs, list )
+    spin_lock(&hd->arch.mapping_lock);
+    list_for_each_entry ( mrmrr, &hd->arch.mapped_rmrrs, list )
     {
         if ( addr >= mrmrr->base && addr <= mrmrr->end )
         {
@@ -662,7 +662,7 @@ static void dma_pte_clear_one(struct domain *domain, u64 addr)
             break;
         }
     }
-    spin_unlock(&hd->mapping_lock);
+    spin_unlock(&hd->arch.mapping_lock);
 }
 
 static void iommu_free_pagetable(u64 pt_maddr, int level)
@@ -1247,7 +1247,7 @@ static int intel_iommu_domain_init(struct domain *d)
 {
     struct hvm_iommu *hd = domain_hvm_iommu(d);
 
-    hd->agaw = width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH);
+    hd->arch.agaw = width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH);
 
     return 0;
 }
@@ -1344,16 +1344,16 @@ int domain_context_mapping_one(
     }
     else
     {
-        spin_lock(&hd->mapping_lock);
+        spin_lock(&hd->arch.mapping_lock);
 
         /* Ensure we have pagetables allocated down to leaf PTE. */
-        if ( hd->pgd_maddr == 0 )
+        if ( hd->arch.pgd_maddr == 0 )
         {
             addr_to_dma_page_maddr(domain, 0, 1);
-            if ( hd->pgd_maddr == 0 )
+            if ( hd->arch.pgd_maddr == 0 )
             {
             nomem:
-                spin_unlock(&hd->mapping_lock);
+                spin_unlock(&hd->arch.mapping_lock);
                 spin_unlock(&iommu->lock);
                 unmap_vtd_domain_page(context_entries);
                 return -ENOMEM;
@@ -1361,7 +1361,7 @@ int domain_context_mapping_one(
         }
 
         /* Skip top levels of page tables for 2- and 3-level DRHDs. */
-        pgd_maddr = hd->pgd_maddr;
+        pgd_maddr = hd->arch.pgd_maddr;
         for ( agaw = level_to_agaw(4);
               agaw != level_to_agaw(iommu->nr_pt_levels);
               agaw-- )
@@ -1379,7 +1379,7 @@ int domain_context_mapping_one(
         else
             context_set_translation_type(*context, CONTEXT_TT_MULTI_LEVEL);
 
-        spin_unlock(&hd->mapping_lock);
+        spin_unlock(&hd->arch.mapping_lock);
     }
 
     if ( context_set_domain_id(context, domain, iommu) )
@@ -1405,7 +1405,7 @@ int domain_context_mapping_one(
         iommu_flush_iotlb_dsi(iommu, 0, 1, flush_dev_iotlb);
     }
 
-    set_bit(iommu->index, &hd->iommu_bitmap);
+    set_bit(iommu->index, &hd->arch.iommu_bitmap);
 
     unmap_vtd_domain_page(context_entries);
 
@@ -1648,7 +1648,7 @@ static int domain_context_unmap(
         struct hvm_iommu *hd = domain_hvm_iommu(domain);
         int iommu_domid;
 
-        clear_bit(iommu->index, &hd->iommu_bitmap);
+        clear_bit(iommu->index, &hd->arch.iommu_bitmap);
 
         iommu_domid = domain_iommu_domid(domain, iommu);
         if ( iommu_domid == -1 )
@@ -1707,10 +1707,10 @@ static void iommu_domain_teardown(struct domain *d)
     if ( iommu_use_hap_pt(d) )
         return;
 
-    spin_lock(&hd->mapping_lock);
-    iommu_free_pagetable(hd->pgd_maddr, agaw_to_level(hd->agaw));
-    hd->pgd_maddr = 0;
-    spin_unlock(&hd->mapping_lock);
+    spin_lock(&hd->arch.mapping_lock);
+    iommu_free_pagetable(hd->arch.pgd_maddr, agaw_to_level(hd->arch.agaw));
+    hd->arch.pgd_maddr = 0;
+    spin_unlock(&hd->arch.mapping_lock);
 }
 
 static int intel_iommu_map_page(
@@ -1729,12 +1729,12 @@ static int intel_iommu_map_page(
     if ( iommu_passthrough && (d->domain_id == 0) )
         return 0;
 
-    spin_lock(&hd->mapping_lock);
+    spin_lock(&hd->arch.mapping_lock);
 
     pg_maddr = addr_to_dma_page_maddr(d, (paddr_t)gfn << PAGE_SHIFT_4K, 1);
     if ( pg_maddr == 0 )
     {
-        spin_unlock(&hd->mapping_lock);
+        spin_unlock(&hd->arch.mapping_lock);
         return -ENOMEM;
     }
     page = (struct dma_pte *)map_vtd_domain_page(pg_maddr);
@@ -1751,14 +1751,14 @@ static int intel_iommu_map_page(
 
     if ( old.val == new.val )
     {
-        spin_unlock(&hd->mapping_lock);
+        spin_unlock(&hd->arch.mapping_lock);
         unmap_vtd_domain_page(page);
         return 0;
     }
     *pte = new;
 
     iommu_flush_cache_entry(pte, sizeof(struct dma_pte));
-    spin_unlock(&hd->mapping_lock);
+    spin_unlock(&hd->arch.mapping_lock);
     unmap_vtd_domain_page(page);
 
     if ( !this_cpu(iommu_dont_flush_iotlb) )
@@ -1792,7 +1792,7 @@ void iommu_pte_flush(struct domain *d, u64 gfn, u64 *pte,
     for_each_drhd_unit ( drhd )
     {
         iommu = drhd->iommu;
-        if ( !test_bit(iommu->index, &hd->iommu_bitmap) )
+        if ( !test_bit(iommu->index, &hd->arch.iommu_bitmap) )
             continue;
 
         flush_dev_iotlb = find_ats_dev_drhd(iommu) ? 1 : 0;
@@ -1833,7 +1833,7 @@ static void iommu_set_pgd(struct domain *d)
         return;
 
     pgd_mfn = pagetable_get_mfn(p2m_get_pagetable(p2m_get_hostp2m(d)));
-    hd->pgd_maddr = pagetable_get_paddr(pagetable_from_mfn(pgd_mfn));
+    hd->arch.pgd_maddr = pagetable_get_paddr(pagetable_from_mfn(pgd_mfn));
 }
 
 static int rmrr_identity_mapping(struct domain *d,
@@ -1848,10 +1848,10 @@ static int rmrr_identity_mapping(struct domain *d,
     ASSERT(rmrr->base_address < rmrr->end_address);
 
     /*
-     * No need to acquire hd->mapping_lock, as the only theoretical race is
+     * No need to acquire hd->arch.mapping_lock, as the only theoretical race is
      * with the insertion below (impossible due to holding pcidevs_lock).
      */
-    list_for_each_entry( mrmrr, &hd->mapped_rmrrs, list )
+    list_for_each_entry( mrmrr, &hd->arch.mapped_rmrrs, list )
     {
         if ( mrmrr->base == rmrr->base_address &&
              mrmrr->end == rmrr->end_address )
@@ -1876,9 +1876,9 @@ static int rmrr_identity_mapping(struct domain *d,
         return -ENOMEM;
     mrmrr->base = rmrr->base_address;
     mrmrr->end = rmrr->end_address;
-    spin_lock(&hd->mapping_lock);
-    list_add_tail(&mrmrr->list, &hd->mapped_rmrrs);
-    spin_unlock(&hd->mapping_lock);
+    spin_lock(&hd->arch.mapping_lock);
+    list_add_tail(&mrmrr->list, &hd->arch.mapped_rmrrs);
+    spin_unlock(&hd->arch.mapping_lock);
 
     return 0;
 }
@@ -2423,8 +2423,8 @@ static void vtd_dump_p2m_table(struct domain *d)
         return;
 
     hd = domain_hvm_iommu(d);
-    printk("p2m table has %d levels\n", agaw_to_level(hd->agaw));
-    vtd_dump_p2m_table_level(hd->pgd_maddr, agaw_to_level(hd->agaw), 0, 0);
+    printk("p2m table has %d levels\n", agaw_to_level(hd->arch.agaw));
+    vtd_dump_p2m_table_level(hd->arch.pgd_maddr, agaw_to_level(hd->arch.agaw), 0, 0);
 }
 
 const struct iommu_ops intel_iommu_ops = {
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index c857ba8..68e308c 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -40,6 +40,47 @@ int __init iommu_setup_hpet_msi(struct msi_desc *msi)
     return ops->setup_hpet_msi ? ops->setup_hpet_msi(msi) : -ENODEV;
 }
 
+void iommu_share_p2m_table(struct domain* d)
+{
+    const struct iommu_ops *ops = iommu_get_ops();
+
+    if ( iommu_enabled && is_hvm_domain(d) )
+        ops->share_p2m(d);
+}
+
+int arch_iommu_domain_init(struct domain *d)
+{
+    struct hvm_iommu *hd = domain_hvm_iommu(d);
+
+    spin_lock_init(&hd->arch.mapping_lock);
+    INIT_LIST_HEAD(&hd->arch.g2m_ioport_list);
+    INIT_LIST_HEAD(&hd->arch.mapped_rmrrs);
+
+    return 0;
+}
+
+void arch_iommu_domain_destroy(struct domain *d)
+{
+    struct hvm_iommu *hd = domain_hvm_iommu(d);
+    struct list_head *ioport_list, *rmrr_list, *tmp;
+    struct g2m_ioport *ioport;
+    struct mapped_rmrr *mrmrr;
+
+    list_for_each_safe ( ioport_list, tmp, &hd->arch.g2m_ioport_list )
+    {
+        ioport = list_entry(ioport_list, struct g2m_ioport, list);
+        list_del(&ioport->list);
+        xfree(ioport);
+    }
+
+    list_for_each_safe ( rmrr_list, tmp, &hd->arch.mapped_rmrrs )
+    {
+        mrmrr = list_entry(rmrr_list, struct mapped_rmrr, list);
+        list_del(&mrmrr->list);
+        xfree(mrmrr);
+    }
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/hvm/iommu.h b/xen/include/asm-x86/hvm/iommu.h
index d488edf..927a02d 100644
--- a/xen/include/asm-x86/hvm/iommu.h
+++ b/xen/include/asm-x86/hvm/iommu.h
@@ -39,4 +39,32 @@ static inline int iommu_hardware_setup(void)
     return 0;
 }
 
+struct g2m_ioport {
+    struct list_head list;
+    unsigned int gport;
+    unsigned int mport;
+    unsigned int np;
+};
+
+struct mapped_rmrr {
+    struct list_head list;
+    u64 base;
+    u64 end;
+};
+
+struct arch_hvm_iommu
+{
+    u64 pgd_maddr;                 /* io page directory machine address */
+    int agaw;     /* adjusted guest address width, 0 is level 2 30-bit */
+    u64 iommu_bitmap;              /* bitmap of iommu(s) that the domain uses */
+    /* amd iommu support */
+    int paging_mode;
+    struct page_info *root_table;
+    struct guest_iommu *g_iommu;
+
+    struct list_head g2m_ioport_list;   /* guest to machine ioport mapping */
+    struct list_head mapped_rmrrs;
+    spinlock_t mapping_lock;            /* io page table lock */
+};
+
 #endif /* __ASM_X86_HVM_IOMMU_H__ */
diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
index 946291c..dc06ceb 100644
--- a/xen/include/asm-x86/iommu.h
+++ b/xen/include/asm-x86/iommu.h
@@ -17,7 +17,9 @@
 
 #define MAX_IOMMUS 32
 
-#include <asm/msi.h>
+/* Does this domain have a P2M table we can use as its IOMMU pagetable? */
+#define iommu_use_hap_pt(d) (hap_enabled(d) && iommu_hap_pt_share)
+#define domain_hvm_iommu(d)     (&d->arch.hvm_domain.hvm_iommu)
 
 void iommu_update_ire_from_apic(unsigned int apic, unsigned int reg, unsigned int value);
 unsigned int iommu_read_apic_from_ire(unsigned int apic, unsigned int reg);
diff --git a/xen/include/xen/hvm/iommu.h b/xen/include/xen/hvm/iommu.h
index c9c10c1..f8f8a93 100644
--- a/xen/include/xen/hvm/iommu.h
+++ b/xen/include/xen/hvm/iommu.h
@@ -23,31 +23,8 @@
 #include <xen/iommu.h>
 #include <asm/hvm/iommu.h>
 
-struct g2m_ioport {
-    struct list_head list;
-    unsigned int gport;
-    unsigned int mport;
-    unsigned int np;
-};
-
-struct mapped_rmrr {
-    struct list_head list;
-    u64 base;
-    u64 end;
-};
-
 struct hvm_iommu {
-    u64 pgd_maddr;                 /* io page directory machine address */
-    spinlock_t mapping_lock;       /* io page table lock */
-    int agaw;     /* adjusted guest address width, 0 is level 2 30-bit */
-    struct list_head g2m_ioport_list;  /* guest to machine ioport mapping */
-    u64 iommu_bitmap;              /* bitmap of iommu(s) that the domain uses */
-    struct list_head mapped_rmrrs;
-
-    /* amd iommu support */
-    int paging_mode;
-    struct page_info *root_table;
-    struct guest_iommu *g_iommu;
+    struct arch_hvm_iommu arch;
 
     /* iommu_ops */
     const struct iommu_ops *platform_ops;
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index cf61d163..f556a7e 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -35,11 +35,6 @@ extern bool_t iommu_hap_pt_share;
 extern bool_t iommu_debug;
 extern bool_t amd_iommu_perdev_intremap;
 
-/* Does this domain have a P2M table we can use as its IOMMU pagetable? */
-#define iommu_use_hap_pt(d) (hap_enabled(d) && iommu_hap_pt_share)
-
-#define domain_hvm_iommu(d)     (&d->arch.hvm_domain.hvm_iommu)
-
 #define PAGE_SHIFT_4K       (12)
 #define PAGE_SIZE_4K        (1UL << PAGE_SHIFT_4K)
 #define PAGE_MASK_4K        (((u64)-1) << PAGE_SHIFT_4K)
@@ -55,6 +50,9 @@ void iommu_dom0_init(struct domain *d);
 void iommu_domain_destroy(struct domain *d);
 int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn);
 
+void arch_iommu_domain_destroy(struct domain *d);
+int arch_iommu_domain_init(struct domain *d);
+
 /* Function used internally, use iommu_domain_destroy */
 void iommu_teardown(struct domain *d);
 
@@ -81,9 +79,6 @@ struct hvm_irq_dpci *domain_get_irq_dpci(const struct domain *);
 void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci);
 bool_t pt_irq_need_timer(uint32_t flags);
 
-int iommu_update_ire_from_msi(struct msi_desc *msi_desc, struct msi_msg *msg);
-void iommu_read_msi_from_ire(struct msi_desc *msi_desc, struct msi_msg *msg);
-
 #define PT_IRQ_TIME_OUT MILLISECS(8)
 #endif /* HAS_PCI */
 
@@ -127,6 +122,11 @@ struct iommu_ops {
     void (*dump_p2m_table)(struct domain *d);
 };
 
+#ifdef HAS_PCI
+int iommu_update_ire_from_msi(struct msi_desc *msi_desc, struct msi_msg *msg);
+void iommu_read_msi_from_ire(struct msi_desc *msi_desc, struct msi_msg *msg);
+#endif
+
 void iommu_suspend(void);
 void iommu_resume(void);
 void iommu_crash_shutdown(void);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v3 08/13] xen/passthrough: iommu: Basic support of device tree assignment
  2014-03-11 15:49 [PATCH v3 00/13] IOMMU support for ARM Julien Grall
                   ` (6 preceding siblings ...)
  2014-03-11 15:49 ` [PATCH v3 07/13] xen/passthrough: iommu: Introduce arch specific code Julien Grall
@ 2014-03-11 15:49 ` Julien Grall
  2014-03-11 16:55   ` Jan Beulich
  2014-03-18 16:33   ` Ian Campbell
  2014-03-11 15:49 ` [PATCH v3 09/13] xen/passthrough: Introduce IOMMU ARM architecture Julien Grall
                   ` (4 subsequent siblings)
  12 siblings, 2 replies; 63+ messages in thread
From: Julien Grall @ 2014-03-11 15:49 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, Julien Grall, tim, stefano.stabellini, Jan Beulich,
	Xiantao Zhang

Add IOMMU helpers to support device tree assignment/deassignment. This patch
introduces 2 new fields in dt_device_node (a usage sketch follows the list):
    - is_protected: Whether the device is protected by an IOMMU
    - next_assigned: Pointer to the next device assigned to the same
    domain
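
As an illustration only, a hypothetical caller would drive these helpers
roughly as follows (the device path and messages are made up):

struct dt_device_node *dev = dt_find_node_by_path("/soc/mydev");
int rc;

rc = iommu_assign_dt_device(d, dev); /* -EINVAL if not protected by an IOMMU */
if ( rc )
    printk("Cannot assign %s to d%d (%d)\n",
           dt_node_full_name(dev), d->domain_id, rc);

/* On domain destruction, every device still assigned goes back to dom0: */
iommu_dt_domain_destroy(d);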

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Cc: Xiantao Zhang <xiantao.zhang@intel.com>
Cc: Jan Beulich <jbeulich@suse.com>

---
    Changes in v3:
        - Remove iommu_dt_domain_{init,destroy} calls in common code. Let
        architecture code call them
        - Fix indentation in xen/include/xen/hvm/iommu.h
    Changes in v2:
        - Patch added
---
 xen/common/device_tree.c              |    4 ++
 xen/drivers/passthrough/Makefile      |    1 +
 xen/drivers/passthrough/device_tree.c |  106 +++++++++++++++++++++++++++++++++
 xen/include/xen/device_tree.h         |   14 +++++
 xen/include/xen/hvm/iommu.h           |    6 ++
 xen/include/xen/iommu.h               |   16 +++++
 6 files changed, 147 insertions(+)
 create mode 100644 xen/drivers/passthrough/device_tree.c

diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index 564f2bb..7c6b683 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -1695,6 +1695,10 @@ static unsigned long __init unflatten_dt_node(const void *fdt,
         np->full_name = ((char *)np) + sizeof(struct dt_device_node);
         /* By default dom0 owns the device */
         np->used_by = 0;
+        /* By default the device is not protected */
+        np->is_protected = false;
+        INIT_LIST_HEAD(&np->next_assigned);
+
         if ( new_format )
         {
             char *fn = np->full_name;
diff --git a/xen/drivers/passthrough/Makefile b/xen/drivers/passthrough/Makefile
index 6e08f89..5a0a35e 100644
--- a/xen/drivers/passthrough/Makefile
+++ b/xen/drivers/passthrough/Makefile
@@ -5,3 +5,4 @@ subdir-$(x86_64) += x86
 obj-y += iommu.o
 obj-$(x86) += io.o
 obj-$(HAS_PCI) += pci.o
+obj-$(HAS_DEVICE_TREE) += device_tree.o
diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c
new file mode 100644
index 0000000..7384e73
--- /dev/null
+++ b/xen/drivers/passthrough/device_tree.c
@@ -0,0 +1,106 @@
+/*
+ * xen/drivers/passthrough/device_tree.c
+ *
+ * Code to pass through a device tree node to a guest
+ *
+ * Julien Grall <julien.grall@linaro.org>
+ * Copyright (c) 2014 Linaro Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <xen/lib.h>
+#include <xen/sched.h>
+#include <xen/iommu.h>
+#include <xen/device_tree.h>
+
+static DEFINE_SPINLOCK(dtdevs_lock);
+
+int iommu_assign_dt_device(struct domain *d, struct dt_device_node *dev)
+{
+    int rc = -EBUSY;
+    struct hvm_iommu *hd = domain_hvm_iommu(d);
+
+    if ( !iommu_enabled || !hd->platform_ops )
+        return -EINVAL;
+
+    if ( !dt_device_is_protected(dev) )
+        return -EINVAL;
+
+    spin_lock(&dtdevs_lock);
+
+    if ( !list_empty(&dev->next_assigned) )
+        goto out;
+
+    rc = hd->platform_ops->assign_dt_device(d, dev);
+
+    if ( rc )
+        goto out;
+
+    list_add(&dev->next_assigned, &hd->dt_devices);
+    dt_device_set_used_by(dev, d->domain_id);
+
+out:
+    spin_unlock(&dtdevs_lock);
+
+    return rc;
+}
+
+int iommu_deassign_dt_device(struct domain *d, struct dt_device_node *dev)
+{
+    struct hvm_iommu *hd = domain_hvm_iommu(d);
+    int rc;
+
+    if ( !iommu_enabled || !hd->platform_ops )
+        return -EINVAL;
+
+    if ( !dt_device_is_protected(dev) )
+        return -EINVAL;
+
+    spin_lock(&dtdevs_lock);
+
+    rc = hd->platform_ops->reassign_dt_device(d, dom0, dev);
+    if ( rc )
+        goto out;
+
+    dt_device_set_used_by(dev, dom0->domain_id);
+
+    list_del(&dev->next_assigned);
+
+out:
+    spin_unlock(&dtdevs_lock);
+
+    return rc;
+}
+
+int iommu_dt_domain_init(struct domain *d)
+{
+    struct hvm_iommu *hd = domain_hvm_iommu(d);
+
+    INIT_LIST_HEAD(&hd->dt_devices);
+
+    return 0;
+}
+
+void iommu_dt_domain_destroy(struct domain *d)
+{
+    struct hvm_iommu *hd = domain_hvm_iommu(d);
+    struct dt_device_node *dev, *_dev;
+    int rc;
+
+    list_for_each_entry_safe(dev, _dev, &hd->dt_devices, next_assigned)
+    {
+        rc = iommu_deassign_dt_device(d, dev);
+        if ( rc )
+            dprintk(XENLOG_ERR, "Failed to deassign %s in domain %u\n",
+                    dt_node_full_name(dev), d->domain_id);
+    }
+}
diff --git a/xen/include/xen/device_tree.h b/xen/include/xen/device_tree.h
index d429e60..2aae047 100644
--- a/xen/include/xen/device_tree.h
+++ b/xen/include/xen/device_tree.h
@@ -16,6 +16,7 @@
 #include <xen/string.h>
 #include <xen/types.h>
 #include <xen/stdbool.h>
+#include <xen/list.h>
 
 #define DEVICE_TREE_MAX_DEPTH 16
 
@@ -110,6 +111,9 @@ struct dt_device_node {
     struct dt_device_node *next; /* TODO: Remove it. Only use to know the last children */
     struct dt_device_node *allnext;
 
+    /* IOMMU specific fields */
+    bool is_protected; /* Tell if the device is protected by an IOMMU */
+    struct list_head next_assigned;
 };
 
 #define MAX_PHANDLE_ARGS 16
@@ -325,6 +329,16 @@ static inline domid_t dt_device_used_by(const struct dt_device_node *device)
     return device->used_by;
 }
 
+static inline void dt_device_set_protected(struct dt_device_node *device)
+{
+    device->is_protected = true;
+}
+
+static inline bool dt_device_is_protected(const struct dt_device_node *device)
+{
+    return device->is_protected;
+}
+
 static inline bool_t dt_property_name_is_equal(const struct dt_property *pp,
                                                const char *name)
 {
diff --git a/xen/include/xen/hvm/iommu.h b/xen/include/xen/hvm/iommu.h
index f8f8a93..1259e16 100644
--- a/xen/include/xen/hvm/iommu.h
+++ b/xen/include/xen/hvm/iommu.h
@@ -21,6 +21,7 @@
 #define __XEN_HVM_IOMMU_H__
 
 #include <xen/iommu.h>
+#include <xen/list.h>
 #include <asm/hvm/iommu.h>
 
 struct hvm_iommu {
@@ -28,6 +29,11 @@ struct hvm_iommu {
 
     /* iommu_ops */
     const struct iommu_ops *platform_ops;
+
+#ifdef HAS_DEVICE_TREE
+    /* List of DT devices assigned to this domain */
+    struct list_head dt_devices;
+#endif
 };
 
 #endif /* __XEN_HVM_IOMMU_H__ */
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index f556a7e..56f6c5c 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -82,6 +82,16 @@ bool_t pt_irq_need_timer(uint32_t flags);
 #define PT_IRQ_TIME_OUT MILLISECS(8)
 #endif /* HAS_PCI */
 
+#ifdef HAS_DEVICE_TREE
+#include <xen/device_tree.h>
+
+int iommu_assign_dt_device(struct domain *d, struct dt_device_node *dev);
+int iommu_deassign_dt_device(struct domain *d, struct dt_device_node *dev);
+int iommu_dt_domain_init(struct domain *d);
+void iommu_dt_domain_destroy(struct domain *d);
+
+#endif /* HAS_DEVICE_TREE */
+
 #ifdef HAS_PCI
 struct msi_desc;
 struct msi_msg;
@@ -103,6 +113,12 @@ struct iommu_ops {
     int (*update_ire_from_msi)(struct msi_desc *msi_desc, struct msi_msg *msg);
     void (*read_msi_from_ire)(struct msi_desc *msi_desc, struct msi_msg *msg);
 #endif /* HAS_PCI */
+#ifdef HAS_DEVICE_TREE
+    int (*assign_dt_device)(struct domain *d, const struct dt_device_node *dev);
+    int (*reassign_dt_device)(struct domain *s, struct domain *t,
+                              const struct dt_device_node *dev);
+#endif
+
     void (*teardown)(struct domain *d);
     int (*map_page)(struct domain *d, unsigned long gfn, unsigned long mfn,
                     unsigned int flags);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v3 09/13] xen/passthrough: Introduce IOMMU ARM architecture
  2014-03-11 15:49 [PATCH v3 00/13] IOMMU support for ARM Julien Grall
                   ` (7 preceding siblings ...)
  2014-03-11 15:49 ` [PATCH v3 08/13] xen/passthrough: iommu: Basic support of device tree assignment Julien Grall
@ 2014-03-11 15:49 ` Julien Grall
  2014-03-18 16:40   ` Ian Campbell
  2014-03-11 15:49 ` [PATCH v3 10/13] MAINTAINERS: Add drivers/passthrough/arm Julien Grall
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 63+ messages in thread
From: Julien Grall @ 2014-03-11 15:49 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, Julien Grall, tim, stefano.stabellini, Jan Beulich,
	Xiantao Zhang

This patch contains the architecture code to use IOMMUs on ARM. There are no
IOMMU drivers in this patch.

In this implementation, the IOMMU page table is shared with the P2M.

The code walks the device tree and initializes every IOMMU it finds.
It's possible to have multiple IOMMUs on the same platform, but they must
all be handled by the same driver. For now, there is no support for using
multiple IOMMU drivers at runtime.

Each new IOMMU driver should contain:

static const char * const myiommu_dt_compat[] __initconst =
{
    /* List of devices compatible with this driver. Matched against the
     * "compatible" property in the device tree.
     */
    NULL,
};

DT_DEVICE_START(myiommu, "MY IOMMU", DEVICE_IOMMU)
        .compatible = myiommu_dt_compat,
        .init = myiommu_init,
DT_DEVICE_END
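
The init callback is where the driver is expected to probe the hardware and
register its callbacks with the generic IOMMU code via iommu_set_ops(). A
minimal sketch (myiommu_ops and the probe step are placeholders, not part of
this series):

static const struct iommu_ops myiommu_ops; /* filled with the driver callbacks */

static int __init myiommu_init(struct dt_device_node *dev, const void *data)
{
    /* Probe and configure the IOMMU described by "dev" here. */

    /* Make the driver callbacks visible to the generic IOMMU code. */
    iommu_set_ops(&myiommu_ops);

    return 0;
}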

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Cc: Xiantao Zhang <xiantao.zhang@intel.com>
Cc: Jan Beulich <jbeulich@suse.com>

---
    Changes in v3:
        - Call iommu_dt_domain_{init,destroy} function in arch code
    Changes in v2:
        - Fix typos in commit message
        - Remove useless comment in arch/arm/setup.c
        - Update copyright date to 2014
        - Move iommu_dom0_init earlier
        - Call iommu_assign_dt_device in map_device when the device is
        protected by an IOMMU
---
 xen/arch/arm/Rules.mk                |    1 +
 xen/arch/arm/domain.c                |    7 ++++
 xen/arch/arm/domain_build.c          |   19 +++++++--
 xen/arch/arm/p2m.c                   |    4 ++
 xen/arch/arm/setup.c                 |    2 +
 xen/drivers/passthrough/Makefile     |    1 +
 xen/drivers/passthrough/arm/Makefile |    1 +
 xen/drivers/passthrough/arm/iommu.c  |   70 ++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/device.h         |    3 +-
 xen/include/asm-arm/domain.h         |    2 +
 xen/include/asm-arm/hvm/iommu.h      |   10 +++++
 xen/include/asm-arm/iommu.h          |   36 +++++++++++++++++
 12 files changed, 152 insertions(+), 4 deletions(-)
 create mode 100644 xen/drivers/passthrough/arm/Makefile
 create mode 100644 xen/drivers/passthrough/arm/iommu.c
 create mode 100644 xen/include/asm-arm/hvm/iommu.h
 create mode 100644 xen/include/asm-arm/iommu.h

diff --git a/xen/arch/arm/Rules.mk b/xen/arch/arm/Rules.mk
index 57f2eb1..1703551 100644
--- a/xen/arch/arm/Rules.mk
+++ b/xen/arch/arm/Rules.mk
@@ -9,6 +9,7 @@
 HAS_DEVICE_TREE := y
 HAS_VIDEO := y
 HAS_ARM_HDLCD := y
+HAS_PASSTHROUGH := y
 
 CFLAGS += -I$(BASEDIR)/include
 
diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 8f20fdf..c42a1c6 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -550,6 +550,9 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
     if ( (d->domain_id == 0) && (rc = domain_vuart_init(d)) )
         goto fail;
 
+    if ( (rc = iommu_domain_init(d)) != 0 )
+        goto fail;
+
     return 0;
 
 fail:
@@ -561,6 +564,10 @@ fail:
 
 void arch_domain_destroy(struct domain *d)
 {
+    /* The IOMMU page table is shared with the P2M; always call
+     * iommu_domain_destroy() before p2m_teardown().
+     */
+    iommu_domain_destroy(d);
     p2m_teardown(d);
     domain_vgic_free(d);
     domain_vuart_free(d);
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 32861aa..229954b 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -669,7 +669,7 @@ static int make_timer_node(const struct domain *d, void *fdt,
 }
 
 /* Map the device in the domain */
-static int map_device(struct domain *d, const struct dt_device_node *dev)
+static int map_device(struct domain *d, struct dt_device_node *dev)
 {
     unsigned int nirq;
     unsigned int naddr;
@@ -684,6 +684,18 @@ static int map_device(struct domain *d, const struct dt_device_node *dev)
 
     DPRINT("%s nirq = %d naddr = %u\n", dt_node_full_name(dev), nirq, naddr);
 
+    if ( dt_device_is_protected(dev) )
+    {
+        DPRINT("%s setup iommu\n", dt_node_full_name(dev));
+        res = iommu_assign_dt_device(d, dev);
+        if ( res )
+        {
+            printk(XENLOG_ERR "Failed to setup the IOMMU for %s\n",
+                   dt_node_full_name(dev));
+            return res;
+        }
+    }
+
     /* Map IRQs */
     for ( i = 0; i < nirq; i++ )
     {
@@ -754,7 +766,7 @@ static int map_device(struct domain *d, const struct dt_device_node *dev)
 }
 
 static int handle_node(struct domain *d, struct kernel_info *kinfo,
-                       const struct dt_device_node *node)
+                       struct dt_device_node *node)
 {
     static const struct dt_device_match skip_matches[] __initconst =
     {
@@ -775,7 +787,7 @@ static int handle_node(struct domain *d, struct kernel_info *kinfo,
         DT_MATCH_TIMER,
         { /* sentinel */ },
     };
-    const struct dt_device_node *child;
+    struct dt_device_node *child;
     int res;
     const char *name;
     const char *path;
@@ -1008,6 +1020,7 @@ int construct_dom0(struct domain *d)
     kinfo.unassigned_mem = dom0_mem;
 
     allocate_memory(d, &kinfo);
+    iommu_dom0_init(d);
 
     rc = kernel_prepare(&kinfo);
     if ( rc < 0 )
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index d00c882..d8ed0de 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -412,12 +412,16 @@ static int apply_p2m_changes(struct domain *d,
 
     if ( flush )
     {
+        unsigned long sgfn = paddr_to_pfn(start_gpaddr);
+        unsigned long egfn = paddr_to_pfn(end_gpaddr);
+
         /* At the beginning of the function, Xen is updating VTTBR
          * with the domain where the mappings are created. In this
          * case it's only necessary to flush TLBs on every CPUs with
          * the current VMID (our domain).
          */
         flush_tlb();
+        iommu_iotlb_flush(d, sgfn, egfn - sgfn);
     }
 
     if ( op == ALLOCATE || op == INSERT )
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 1f6d713..a771c30 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -725,6 +725,8 @@ void __init start_xen(unsigned long boot_phys_offset,
     local_irq_enable();
     local_abort_enable();
 
+    iommu_setup();
+
     smp_prepare_cpus(cpus);
 
     initialize_keytable();
diff --git a/xen/drivers/passthrough/Makefile b/xen/drivers/passthrough/Makefile
index 5a0a35e..16e9027 100644
--- a/xen/drivers/passthrough/Makefile
+++ b/xen/drivers/passthrough/Makefile
@@ -1,6 +1,7 @@
 subdir-$(x86) += vtd
 subdir-$(x86) += amd
 subdir-$(x86_64) += x86
+subdir-$(arm) += arm
 
 obj-y += iommu.o
 obj-$(x86) += io.o
diff --git a/xen/drivers/passthrough/arm/Makefile b/xen/drivers/passthrough/arm/Makefile
new file mode 100644
index 0000000..0484b79
--- /dev/null
+++ b/xen/drivers/passthrough/arm/Makefile
@@ -0,0 +1 @@
+obj-y += iommu.o
diff --git a/xen/drivers/passthrough/arm/iommu.c b/xen/drivers/passthrough/arm/iommu.c
new file mode 100644
index 0000000..b0bd71d
--- /dev/null
+++ b/xen/drivers/passthrough/arm/iommu.c
@@ -0,0 +1,70 @@
+/*
+ * xen/drivers/passthrough/arm/iommu.c
+ *
+ * Generic IOMMU framework via the device tree
+ *
+ * Julien Grall <julien.grall@linaro.org>
+ * Copyright (c) 2014 Linaro Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <xen/lib.h>
+#include <xen/iommu.h>
+#include <xen/device_tree.h>
+#include <asm/device.h>
+
+static const struct iommu_ops *iommu_ops;
+
+const struct iommu_ops *iommu_get_ops(void)
+{
+    return iommu_ops;
+}
+
+void __init iommu_set_ops(const struct iommu_ops *ops)
+{
+    BUG_ON(ops == NULL);
+
+    if ( iommu_ops && iommu_ops != ops )
+        printk("WARNING: IOMMU ops already set to a different value\n");
+
+    iommu_ops = ops;
+}
+
+int __init iommu_hardware_setup(void)
+{
+    struct dt_device_node *np;
+    int rc;
+    unsigned int num_iommus = 0;
+
+    dt_for_each_device_node(dt_host, np)
+    {
+        rc = device_init(np, DEVICE_IOMMU, NULL);
+        if ( !rc )
+            num_iommus++;
+    }
+
+    return ( num_iommus > 0 ) ? 0 : -ENODEV;
+}
+
+int arch_iommu_domain_init(struct domain *d)
+{
+    int ret;
+
+    ret = iommu_dt_domain_init(d);
+
+    return ret;
+}
+
+void arch_iommu_domain_destroy(struct domain *d)
+{
+    iommu_dt_domain_destroy(d);
+}
diff --git a/xen/include/asm-arm/device.h b/xen/include/asm-arm/device.h
index 9e47ca6..ed04344 100644
--- a/xen/include/asm-arm/device.h
+++ b/xen/include/asm-arm/device.h
@@ -6,7 +6,8 @@
 
 enum device_type
 {
-    DEVICE_SERIAL
+    DEVICE_SERIAL,
+    DEVICE_IOMMU,
 };
 
 struct device_desc {
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index bc20a15..ad6587a 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -9,6 +9,7 @@
 #include <asm/vfp.h>
 #include <public/hvm/params.h>
 #include <xen/serial.h>
+#include <xen/hvm/iommu.h>
 
 /* Represents state corresponding to a block of 32 interrupts */
 struct vgic_irq_rank {
@@ -72,6 +73,7 @@ struct pending_irq
 struct hvm_domain
 {
     uint64_t              params[HVM_NR_PARAMS];
+    struct hvm_iommu      hvm_iommu;
 }  __cacheline_aligned;
 
 #ifdef CONFIG_ARM_64
diff --git a/xen/include/asm-arm/hvm/iommu.h b/xen/include/asm-arm/hvm/iommu.h
new file mode 100644
index 0000000..461c8cf
--- /dev/null
+++ b/xen/include/asm-arm/hvm/iommu.h
@@ -0,0 +1,10 @@
+#ifndef __ASM_ARM_HVM_IOMMU_H_
+#define __ASM_ARM_HVM_IOMMU_H_
+
+struct arch_hvm_iommu
+{
+    /* Private information for the IOMMU drivers */
+    void *priv;
+};
+
+#endif /* __ASM_ARM_HVM_IOMMU_H_ */
diff --git a/xen/include/asm-arm/iommu.h b/xen/include/asm-arm/iommu.h
new file mode 100644
index 0000000..81eec83
--- /dev/null
+++ b/xen/include/asm-arm/iommu.h
@@ -0,0 +1,36 @@
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+*/
+#ifndef __ARCH_ARM_IOMMU_H__
+#define __ARCH_ARM_IOMMU_H__
+
+/* Always share P2M Table between the CPU and the IOMMU */
+#define iommu_use_hap_pt(d) (1)
+#define domain_hvm_iommu(d) (&d->arch.hvm_domain.hvm_iommu)
+
+const struct iommu_ops *iommu_get_ops(void);
+void __init iommu_set_ops(const struct iommu_ops *ops);
+
+int __init iommu_hardware_setup(void);
+
+#endif /* __ARCH_ARM_IOMMU_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v3 10/13] MAINTAINERS: Add drivers/passthrough/arm
  2014-03-11 15:49 [PATCH v3 00/13] IOMMU support for ARM Julien Grall
                   ` (8 preceding siblings ...)
  2014-03-11 15:49 ` [PATCH v3 09/13] xen/passthrough: Introduce IOMMU ARM architecture Julien Grall
@ 2014-03-11 15:49 ` Julien Grall
  2014-03-11 15:49 ` [PATCH v3 11/13] xen/arm: Don't give IOMMU devices to dom0 when iommu is disabled Julien Grall
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 63+ messages in thread
From: Julien Grall @ 2014-03-11 15:49 UTC (permalink / raw)
  To: xen-devel
  Cc: stefano.stabellini, Keir Fraser, Julien Grall, tim, ian.campbell

Add the ARM IOMMU directory to the "ARM ARCHITECTURE" section.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Keir Fraser <keir@xen.org>
---
 MAINTAINERS |    1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7757cdd..ad6c8a9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -130,6 +130,7 @@ S:	Supported
 L:	xen-devel@lists.xen.org
 F:	xen/arch/arm/
 F:	xen/include/asm-arm/
+F:	xen/drivers/passthrough/arm
 
 CPU POOLS
 M:	Juergen Gross <juergen.gross@ts.fujitsu.com>
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v3 11/13] xen/arm: Don't give IOMMU devices to dom0 when iommu is disabled
  2014-03-11 15:49 [PATCH v3 00/13] IOMMU support for ARM Julien Grall
                   ` (9 preceding siblings ...)
  2014-03-11 15:49 ` [PATCH v3 10/13] MAINTAINERS: Add drivers/passthrough/arm Julien Grall
@ 2014-03-11 15:49 ` Julien Grall
  2014-03-18 16:41   ` Ian Campbell
  2014-03-11 15:49 ` [PATCH v3 12/13] xen/arm: Add the property "protected-devices" in the hypervisor node Julien Grall
  2014-03-11 15:49 ` [PATCH v3 13/13] drivers/passthrough: arm: Add support for SMMU drivers Julien Grall
  12 siblings, 1 reply; 63+ messages in thread
From: Julien Grall @ 2014-03-11 15:49 UTC (permalink / raw)
  To: xen-devel; +Cc: stefano.stabellini, Julien Grall, tim, ian.campbell

When iommu={disable,off,no,false} is given on the Xen command line, the IOMMU
framework won't mark IOMMU devices as devices that must not be passed through
to DOM0. Check for them explicitly so they are never handed to DOM0.

Signed-off-by: Julien Grall <julien.grall@linaro.org>

---
    Changes in v2:
        - Patch added
---
 xen/arch/arm/device.c        |   15 +++++++++++++++
 xen/arch/arm/domain_build.c  |   10 ++++++++++
 xen/include/asm-arm/device.h |   10 ++++++++++
 3 files changed, 35 insertions(+)

diff --git a/xen/arch/arm/device.c b/xen/arch/arm/device.c
index f86b2e3..59e94c0 100644
--- a/xen/arch/arm/device.c
+++ b/xen/arch/arm/device.c
@@ -67,6 +67,21 @@ int __init device_init(struct dt_device_node *dev, enum device_type type,
     return -EBADF;
 }
 
+enum device_type device_get_type(const struct dt_device_node *dev)
+{
+    const struct device_desc *desc;
+
+    ASSERT(dev != NULL);
+
+    for ( desc = _sdevice; desc != _edevice; desc++ )
+    {
+        if ( device_is_compatible(desc, dev) )
+            return desc->type;
+    }
+
+    return DEVICE_UNKNOWN;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 229954b..2438aa0 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -11,6 +11,7 @@
 #include <xen/device_tree.h>
 #include <xen/libfdt/libfdt.h>
 #include <xen/guest_access.h>
+#include <asm/device.h>
 #include <asm/setup.h>
 #include <asm/platform.h>
 #include <asm/psci.h>
@@ -822,6 +823,15 @@ static int handle_node(struct domain *d, struct kernel_info *kinfo,
         return 0;
     }
 
+    /* Even if the IOMMU device is not used by Xen, it should not be
+     * passed through to DOM0.
+     */
+    if ( device_get_type(node) == DEVICE_IOMMU )
+    {
+        DPRINT(" IOMMU, skip it\n");
+        return 0;
+    }
+
     /*
      * Some device doesn't need to be mapped in Xen:
      *  - Memory: the guest will see a different view of memory. It will
diff --git a/xen/include/asm-arm/device.h b/xen/include/asm-arm/device.h
index ed04344..60109cc 100644
--- a/xen/include/asm-arm/device.h
+++ b/xen/include/asm-arm/device.h
@@ -8,6 +8,8 @@ enum device_type
 {
     DEVICE_SERIAL,
     DEVICE_IOMMU,
+    /* Used to report an unknown device */
+    DEVICE_UNKNOWN,
 };
 
 struct device_desc {
@@ -32,6 +34,14 @@ struct device_desc {
 int __init device_init(struct dt_device_node *dev, enum device_type type,
                        const void *data);
 
+/**
+ * device_get_type - Get the type of the device
+ * @dev: device to match
+ *
+ * Return the device type on success or DEVICE_UNKNOWN on failure
+ */
+enum device_type device_get_type(const struct dt_device_node *dev);
+
 #define DT_DEVICE_START(_name, _namestr, _type)                     \
 static const struct device_desc __dev_desc_##_name __used           \
 __attribute__((__section__(".dev.info"))) = {                       \
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v3 12/13] xen/arm: Add the property "protected-devices" in the hypervisor node
  2014-03-11 15:49 [PATCH v3 00/13] IOMMU support for ARM Julien Grall
                   ` (10 preceding siblings ...)
  2014-03-11 15:49 ` [PATCH v3 11/13] xen/arm: Don't give IOMMU devices to dom0 when iommu is disabled Julien Grall
@ 2014-03-11 15:49 ` Julien Grall
  2014-03-18 16:48   ` Ian Campbell
  2014-03-11 15:49 ` [PATCH v3 13/13] drivers/passthrough: arm: Add support for SMMU drivers Julien Grall
  12 siblings, 1 reply; 63+ messages in thread
From: Julien Grall @ 2014-03-11 15:49 UTC (permalink / raw)
  To: xen-devel; +Cc: stefano.stabellini, Julien Grall, tim, ian.campbell

DOM0 uses the swiotlb to bounce DMA buffers. With IOMMU support in Xen,
protected devices no longer need it.

Only Xen knows whether a device is protected by an IOMMU. The new property
"protected-devices" is a list of phandles of the devices protected by an IOMMU.

Signed-off-by: Julien Grall <julien.grall@linaro.org>

---
    This patch *MUST NOT* be applied until we have agreed on a device tree
    binding with the device tree folks. DOM0 can safely run with the swiotlb
    on protected devices as long as LVM is not used for guest disks.

    Changes in v2:
        - Patch added
---
 xen/arch/arm/domain_build.c |   51 ++++++++++++++++++++++++++++++++++++++-----
 xen/arch/arm/kernel.h       |    3 +++
 2 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 2438aa0..565784a 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -324,19 +324,22 @@ static int make_memory_node(const struct domain *d,
     return res;
 }
 
-static int make_hypervisor_node(struct domain *d,
-                                void *fdt, const struct dt_device_node *parent)
+static int make_hypervisor_node(struct domain *d, struct kernel_info *kinfo,
+                                const struct dt_device_node *parent)
 {
     const char compat[] =
         "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
         "xen,xen";
     __be32 reg[4];
     gic_interrupt_t intr;
-    __be32 *cells;
+    __be32 *cells, *_cells;
     int res;
     int addrcells = dt_n_addr_cells(parent);
     int sizecells = dt_n_size_cells(parent);
     paddr_t gnttab_start, gnttab_size;
+    const struct dt_device_node *dev;
+    struct hvm_iommu *hd = domain_hvm_iommu(d);
+    void *fdt = kinfo->fdt;
 
     DPRINT("Create hypervisor node\n");
 
@@ -384,6 +387,39 @@ static int make_hypervisor_node(struct domain *d,
     if ( res )
         return res;
 
+    if ( kinfo->num_dev_protected )
+    {
+        /* Don't need to take dtdevs_lock here */
+        cells = xmalloc_array(__be32, kinfo->num_dev_protected *
+                              dt_size_to_cells(sizeof(dt_phandle)));
+        if ( !cells )
+            return -FDT_ERR_XEN(ENOMEM);
+
+        _cells = cells;
+
+        DPRINT("  List of protected devices\n");
+        list_for_each_entry( dev, &hd->dt_devices, next_assigned )
+        {
+            DPRINT("    - %s\n", dt_node_full_name(dev));
+            if ( !dev->phandle )
+            {
+                printk(XENLOG_ERR "Unable to handle protected device (%s) "
+                       "with no phandle\n", dt_node_full_name(dev));
+                xfree(cells);
+                return -FDT_ERR_XEN(EINVAL);
+            }
+            dt_set_cell(&_cells, dt_size_to_cells(sizeof(dt_phandle)),
+                        dev->phandle);
+        }
+
+        res = fdt_property(fdt, "protected-devices", cells,
+                           sizeof (dt_phandle) * kinfo->num_dev_protected);
+
+        xfree(cells);
+        if ( res )
+            return res;
+    }
+
     res = fdt_end_node(fdt);
 
     return res;
@@ -670,7 +706,8 @@ static int make_timer_node(const struct domain *d, void *fdt,
 }
 
 /* Map the device in the domain */
-static int map_device(struct domain *d, struct dt_device_node *dev)
+static int map_device(struct domain *d, struct kernel_info *kinfo,
+                      struct dt_device_node *dev)
 {
     unsigned int nirq;
     unsigned int naddr;
@@ -695,6 +732,7 @@ static int map_device(struct domain *d, struct dt_device_node *dev)
                    dt_node_full_name(dev));
             return res;
         }
+        kinfo->num_dev_protected++;
     }
 
     /* Map IRQs */
@@ -843,7 +881,7 @@ static int handle_node(struct domain *d, struct kernel_info *kinfo,
     if ( !dt_device_type_is_equal(node, "memory") &&
          dt_device_is_available(node) )
     {
-        res = map_device(d, node);
+        res = map_device(d, kinfo, node);
 
         if ( res )
             return res;
@@ -874,7 +912,7 @@ static int handle_node(struct domain *d, struct kernel_info *kinfo,
 
     if ( node == dt_host )
     {
-        res = make_hypervisor_node(d, kinfo->fdt, node);
+        res = make_hypervisor_node(d, kinfo, node);
         if ( res )
             return res;
 
@@ -1027,6 +1065,7 @@ int construct_dom0(struct domain *d)
 
     d->max_pages = ~0U;
 
+    kinfo.num_dev_protected = 0;
     kinfo.unassigned_mem = dom0_mem;
 
     allocate_memory(d, &kinfo);
diff --git a/xen/arch/arm/kernel.h b/xen/arch/arm/kernel.h
index b48c2c9..3af5c50 100644
--- a/xen/arch/arm/kernel.h
+++ b/xen/arch/arm/kernel.h
@@ -18,6 +18,9 @@ struct kernel_info {
     paddr_t unassigned_mem; /* RAM not (yet) assigned to a bank */
     struct dt_mem_info mem;
 
+    /* Number of devices protected by an IOMMU */
+    unsigned int num_dev_protected;
+
     paddr_t dtb_paddr;
     paddr_t entry;
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v3 13/13] drivers/passthrough: arm: Add support for SMMU drivers
  2014-03-11 15:49 [PATCH v3 00/13] IOMMU support for ARM Julien Grall
                   ` (11 preceding siblings ...)
  2014-03-11 15:49 ` [PATCH v3 12/13] xen/arm: Add the property "protected-devices" in the hypervisor node Julien Grall
@ 2014-03-11 15:49 ` Julien Grall
  2014-03-18 16:54   ` Ian Campbell
  12 siblings, 1 reply; 63+ messages in thread
From: Julien Grall @ 2014-03-11 15:49 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, Julien Grall, tim, stefano.stabellini, Jan Beulich,
	Xiantao Zhang

This patch adds support for the ARM architected SMMU driver. It is based on
the Linux driver (drivers/iommu/arm-smmu.c) at commit 89ac23cd.

The major differences with the Linux driver are:
    - Fault by default if the SMMU is asked to translate an address
    (Linux bypasses the SMMU instead)
    - Use the P2M page table instead of creating a new one
    - Dropped stage-1 support
    - Dropped chained-SMMU support for now
    - Reworked device assignment and the related structures

Xen programs each SMMU by:
    - Using stage-2 translation
    - Sharing the page table with the processor
    - Injecting a fault if a device makes an invalid translation
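
The masters and their stream IDs are taken from the SMMU device tree node,
following the binding of the Linux driver this is based on. A made-up sketch
(node names, addresses and stream IDs are illustrative only):

smmu@2b400000 {
    compatible = "arm,smmu-v1";
    reg = <0x2b400000 0x10000>;
    #global-interrupts = <1>;
    interrupts = <0 229 4>, <0 231 4>;
    mmu-masters = <&dma0 0x100>,
                  <&eth0 0x200 0x201>;
};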

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Cc: Xiantao Zhang <xiantao.zhang@intel.com>
Cc: Jan Beulich <jbeulich@suse.com>

---
    Changes in v3:
        - Add missing static qualifiers
    Changes in v2:
        - Update commit message
        - Update some comments in the code
        - Add new callbacks to assign/reassign DT device
        - Rework init_dom0 and domain_teardown. The
        assignment/deassignement is now made in the generic code
        - Set protected field in dt_device_node when the device is under
        an IOMMU
        - Replace SZ_64K and SZ_4K with the global PAGE_SIZE_{64,4}K in
        xen/iommu.h. The former was not defined.
---
 xen/drivers/passthrough/arm/Makefile |    1 +
 xen/drivers/passthrough/arm/smmu.c   | 1736 ++++++++++++++++++++++++++++++++++
 xen/include/xen/iommu.h              |    3 +
 3 files changed, 1740 insertions(+)
 create mode 100644 xen/drivers/passthrough/arm/smmu.c

diff --git a/xen/drivers/passthrough/arm/Makefile b/xen/drivers/passthrough/arm/Makefile
index 0484b79..f4cd26e 100644
--- a/xen/drivers/passthrough/arm/Makefile
+++ b/xen/drivers/passthrough/arm/Makefile
@@ -1 +1,2 @@
 obj-y += iommu.o
+obj-y += smmu.o
diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
new file mode 100644
index 0000000..78b26fd
--- /dev/null
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -0,0 +1,1736 @@
+/*
+ * IOMMU API for ARM architected SMMU implementations.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Based on Linux drivers/iommu/arm-smmu.c (commit 89a23cd)
+ * Copyright (C) 2013 ARM Limited
+ *
+ * Author: Will Deacon <will.deacon@arm.com>
+ *
+ * Xen modification:
+ * Julien Grall <julien.grall@linaro.org>
+ * Copyright (C) 2014 Linaro Limited.
+ *
+ * This driver currently supports:
+ *  - SMMUv1 and v2 implementations (v2 not yet tested)
+ *  - Stream-matching and stream-indexing
+ *  - v7/v8 long-descriptor format
+ *  - Non-secure access to the SMMU
+ *  - 4k pages, p2m shared with the processor
+ *  - Up to 40-bit addressing
+ *  - Context fault reporting
+ */
+
+#include <xen/config.h>
+#include <xen/delay.h>
+#include <xen/errno.h>
+#include <xen/irq.h>
+#include <xen/lib.h>
+#include <xen/list.h>
+#include <xen/mm.h>
+#include <xen/vmap.h>
+#include <xen/rbtree.h>
+#include <xen/sched.h>
+#include <asm/atomic.h>
+#include <asm/device.h>
+#include <asm/io.h>
+#include <asm/platform.h>
+
+/* Driver options */
+#define SMMU_OPT_SECURE_CONFIG_ACCESS   (1 << 0)
+
+/* Maximum number of stream IDs assigned to a single device */
+#define MAX_MASTER_STREAMIDS    MAX_PHANDLE_ARGS
+
+/* Maximum stream ID */
+#define SMMU_MAX_STREAMIDS      (PAGE_SIZE_64K - 1)
+
+/* Maximum number of context banks per SMMU */
+#define SMMU_MAX_CBS        128
+
+/* Maximum number of mapping groups per SMMU */
+#define SMMU_MAX_SMRS       128
+
+/* SMMU global address space */
+#define SMMU_GR0(smmu)      ((smmu)->base)
+#define SMMU_GR1(smmu)      ((smmu)->base + (smmu)->pagesize)
+
+/*
+ * SMMU global address space with conditional offset to access secure aliases of
+ * non-secure registers (e.g. nsCR0: 0x400, nsGFSR: 0x448, nsGFSYNR0: 0x450)
+ */
+#define SMMU_GR0_NS(smmu)                                   \
+    ((smmu)->base +                                         \
+     ((smmu->options & SMMU_OPT_SECURE_CONFIG_ACCESS)    \
+        ? 0x400 : 0))
+
+/* Page table bits */
+#define SMMU_PTE_PAGE           (((pteval_t)3) << 0)
+#define SMMU_PTE_CONT           (((pteval_t)1) << 52)
+#define SMMU_PTE_AF             (((pteval_t)1) << 10)
+#define SMMU_PTE_SH_NS          (((pteval_t)0) << 8)
+#define SMMU_PTE_SH_OS          (((pteval_t)2) << 8)
+#define SMMU_PTE_SH_IS          (((pteval_t)3) << 8)
+
+#if PAGE_SIZE == PAGE_SIZE_4K
+#define SMMU_PTE_CONT_ENTRIES   16
+#elif PAGE_SIZE == PAGE_SIZE_64K
+#define SMMU_PTE_CONT_ENTRIES   32
+#else
+#define SMMU_PTE_CONT_ENTRIES   1
+#endif
+
+#define SMMU_PTE_CONT_SIZE      (PAGE_SIZE * SMMU_PTE_CONT_ENTRIES)
+#define SMMU_PTE_CONT_MASK      (~(SMMU_PTE_CONT_SIZE - 1))
+#define SMMU_PTE_HWTABLE_SIZE   (PTRS_PER_PTE * sizeof(pte_t))
+
+/* Stage-1 PTE */
+#define SMMU_PTE_AP_UNPRIV      (((pteval_t)1) << 6)
+#define SMMU_PTE_AP_RDONLY      (((pteval_t)2) << 6)
+#define SMMU_PTE_ATTRINDX_SHIFT 2
+#define SMMU_PTE_nG             (((pteval_t)1) << 11)
+
+/* Stage-2 PTE */
+#define SMMU_PTE_HAP_FAULT      (((pteval_t)0) << 6)
+#define SMMU_PTE_HAP_READ       (((pteval_t)1) << 6)
+#define SMMU_PTE_HAP_WRITE      (((pteval_t)2) << 6)
+#define SMMU_PTE_MEMATTR_OIWB   (((pteval_t)0xf) << 2)
+#define SMMU_PTE_MEMATTR_NC     (((pteval_t)0x5) << 2)
+#define SMMU_PTE_MEMATTR_DEV    (((pteval_t)0x1) << 2)
+
+/* Configuration registers */
+#define SMMU_GR0_sCR0           0x0
+#define SMMU_sCR0_CLIENTPD      (1 << 0)
+#define SMMU_sCR0_GFRE          (1 << 1)
+#define SMMU_sCR0_GFIE          (1 << 2)
+#define SMMU_sCR0_GCFGFRE       (1 << 4)
+#define SMMU_sCR0_GCFGFIE       (1 << 5)
+#define SMMU_sCR0_USFCFG        (1 << 10)
+#define SMMU_sCR0_VMIDPNE       (1 << 11)
+#define SMMU_sCR0_PTM           (1 << 12)
+#define SMMU_sCR0_FB            (1 << 13)
+#define SMMU_sCR0_BSU_SHIFT     14
+#define SMMU_sCR0_BSU_MASK      0x3
+
+/* Identification registers */
+#define SMMU_GR0_ID0            0x20
+#define SMMU_GR0_ID1            0x24
+#define SMMU_GR0_ID2            0x28
+#define SMMU_GR0_ID3            0x2c
+#define SMMU_GR0_ID4            0x30
+#define SMMU_GR0_ID5            0x34
+#define SMMU_GR0_ID6            0x38
+#define SMMU_GR0_ID7            0x3c
+#define SMMU_GR0_sGFSR          0x48
+#define SMMU_GR0_sGFSYNR0       0x50
+#define SMMU_GR0_sGFSYNR1       0x54
+#define SMMU_GR0_sGFSYNR2       0x58
+#define SMMU_GR0_PIDR0          0xfe0
+#define SMMU_GR0_PIDR1          0xfe4
+#define SMMU_GR0_PIDR2          0xfe8
+
+#define SMMU_ID0_S1TS           (1 << 30)
+#define SMMU_ID0_S2TS           (1 << 29)
+#define SMMU_ID0_NTS            (1 << 28)
+#define SMMU_ID0_SMS            (1 << 27)
+#define SMMU_ID0_PTFS_SHIFT     24
+#define SMMU_ID0_PTFS_MASK      0x2
+#define SMMU_ID0_PTFS_V8_ONLY   0x2
+#define SMMU_ID0_CTTW           (1 << 14)
+#define SMMU_ID0_NUMIRPT_SHIFT  16
+#define SMMU_ID0_NUMIRPT_MASK   0xff
+#define SMMU_ID0_NUMSMRG_SHIFT  0
+#define SMMU_ID0_NUMSMRG_MASK   0xff
+
+#define SMMU_ID1_PAGESIZE            (1 << 31)
+#define SMMU_ID1_NUMPAGENDXB_SHIFT   28
+#define SMMU_ID1_NUMPAGENDXB_MASK    7
+#define SMMU_ID1_NUMS2CB_SHIFT       16
+#define SMMU_ID1_NUMS2CB_MASK        0xff
+#define SMMU_ID1_NUMCB_SHIFT         0
+#define SMMU_ID1_NUMCB_MASK          0xff
+
+#define SMMU_ID2_OAS_SHIFT           4
+#define SMMU_ID2_OAS_MASK            0xf
+#define SMMU_ID2_IAS_SHIFT           0
+#define SMMU_ID2_IAS_MASK            0xf
+#define SMMU_ID2_UBS_SHIFT           8
+#define SMMU_ID2_UBS_MASK            0xf
+#define SMMU_ID2_PTFS_4K             (1 << 12)
+#define SMMU_ID2_PTFS_16K            (1 << 13)
+#define SMMU_ID2_PTFS_64K            (1 << 14)
+
+#define SMMU_PIDR2_ARCH_SHIFT        4
+#define SMMU_PIDR2_ARCH_MASK         0xf
+
+/* Global TLB invalidation */
+#define SMMU_GR0_STLBIALL           0x60
+#define SMMU_GR0_TLBIVMID           0x64
+#define SMMU_GR0_TLBIALLNSNH        0x68
+#define SMMU_GR0_TLBIALLH           0x6c
+#define SMMU_GR0_sTLBGSYNC          0x70
+#define SMMU_GR0_sTLBGSTATUS        0x74
+#define SMMU_sTLBGSTATUS_GSACTIVE   (1 << 0)
+#define SMMU_TLB_LOOP_TIMEOUT       1000000 /* 1s! */
+
+/* Stream mapping registers */
+#define SMMU_GR0_SMR(n)             (0x800 + ((n) << 2))
+#define SMMU_SMR_VALID              (1 << 31)
+#define SMMU_SMR_MASK_SHIFT         16
+#define SMMU_SMR_MASK_MASK          0x7fff
+#define SMMU_SMR_ID_SHIFT           0
+#define SMMU_SMR_ID_MASK            0x7fff
+
+#define SMMU_GR0_S2CR(n)        (0xc00 + ((n) << 2))
+#define SMMU_S2CR_CBNDX_SHIFT   0
+#define SMMU_S2CR_CBNDX_MASK    0xff
+#define SMMU_S2CR_TYPE_SHIFT    16
+#define SMMU_S2CR_TYPE_MASK     0x3
+#define SMMU_S2CR_TYPE_TRANS    (0 << SMMU_S2CR_TYPE_SHIFT)
+#define SMMU_S2CR_TYPE_BYPASS   (1 << SMMU_S2CR_TYPE_SHIFT)
+#define SMMU_S2CR_TYPE_FAULT    (2 << SMMU_S2CR_TYPE_SHIFT)
+
+/* Context bank attribute registers */
+#define SMMU_GR1_CBAR(n)                    (0x0 + ((n) << 2))
+#define SMMU_CBAR_VMID_SHIFT                0
+#define SMMU_CBAR_VMID_MASK                 0xff
+#define SMMU_CBAR_S1_MEMATTR_SHIFT          12
+#define SMMU_CBAR_S1_MEMATTR_MASK           0xf
+#define SMMU_CBAR_S1_MEMATTR_WB             0xf
+#define SMMU_CBAR_TYPE_SHIFT                16
+#define SMMU_CBAR_TYPE_MASK                 0x3
+#define SMMU_CBAR_TYPE_S2_TRANS             (0 << SMMU_CBAR_TYPE_SHIFT)
+#define SMMU_CBAR_TYPE_S1_TRANS_S2_BYPASS   (1 << SMMU_CBAR_TYPE_SHIFT)
+#define SMMU_CBAR_TYPE_S1_TRANS_S2_FAULT    (2 << SMMU_CBAR_TYPE_SHIFT)
+#define SMMU_CBAR_TYPE_S1_TRANS_S2_TRANS    (3 << SMMU_CBAR_TYPE_SHIFT)
+#define SMMU_CBAR_IRPTNDX_SHIFT             24
+#define SMMU_CBAR_IRPTNDX_MASK              0xff
+
+#define SMMU_GR1_CBA2R(n)                   (0x800 + ((n) << 2))
+#define SMMU_CBA2R_RW64_32BIT               (0 << 0)
+#define SMMU_CBA2R_RW64_64BIT               (1 << 0)
+
+/* Translation context bank */
+#define SMMU_CB_BASE(smmu)                  ((smmu)->base + ((smmu)->size >> 1))
+#define SMMU_CB(smmu, n)                    ((n) * (smmu)->pagesize)
+
+#define SMMU_CB_SCTLR                       0x0
+#define SMMU_CB_RESUME                      0x8
+#define SMMU_CB_TCR2                        0x10
+#define SMMU_CB_TTBR0_LO                    0x20
+#define SMMU_CB_TTBR0_HI                    0x24
+#define SMMU_CB_TCR                         0x30
+#define SMMU_CB_S1_MAIR0                    0x38
+#define SMMU_CB_FSR                         0x58
+#define SMMU_CB_FAR_LO                      0x60
+#define SMMU_CB_FAR_HI                      0x64
+#define SMMU_CB_FSYNR0                      0x68
+#define SMMU_CB_S1_TLBIASID                 0x610
+
+#define SMMU_SCTLR_S1_ASIDPNE               (1 << 12)
+#define SMMU_SCTLR_CFCFG                    (1 << 7)
+#define SMMU_SCTLR_CFIE                     (1 << 6)
+#define SMMU_SCTLR_CFRE                     (1 << 5)
+#define SMMU_SCTLR_E                        (1 << 4)
+#define SMMU_SCTLR_AFE                      (1 << 2)
+#define SMMU_SCTLR_TRE                      (1 << 1)
+#define SMMU_SCTLR_M                        (1 << 0)
+#define SMMU_SCTLR_EAE_SBOP                 (SMMU_SCTLR_AFE | SMMU_SCTLR_TRE)
+
+#define SMMU_RESUME_RETRY                   (0 << 0)
+#define SMMU_RESUME_TERMINATE               (1 << 0)
+
+#define SMMU_TCR_EAE                        (1 << 31)
+
+#define SMMU_TCR_PASIZE_SHIFT               16
+#define SMMU_TCR_PASIZE_MASK                0x7
+
+#define SMMU_TCR_TG0_4K                     (0 << 14)
+#define SMMU_TCR_TG0_64K                    (1 << 14)
+
+#define SMMU_TCR_SH0_SHIFT                  12
+#define SMMU_TCR_SH0_MASK                   0x3
+#define SMMU_TCR_SH_NS                      0
+#define SMMU_TCR_SH_OS                      2
+#define SMMU_TCR_SH_IS                      3
+
+#define SMMU_TCR_ORGN0_SHIFT                10
+#define SMMU_TCR_IRGN0_SHIFT                8
+#define SMMU_TCR_RGN_MASK                   0x3
+#define SMMU_TCR_RGN_NC                     0
+#define SMMU_TCR_RGN_WBWA                   1
+#define SMMU_TCR_RGN_WT                     2
+#define SMMU_TCR_RGN_WB                     3
+
+#define SMMU_TCR_SL0_SHIFT                  6
+#define SMMU_TCR_SL0_MASK                   0x3
+#define SMMU_TCR_SL0_LVL_2                  0
+#define SMMU_TCR_SL0_LVL_1                  1
+
+#define SMMU_TCR_T1SZ_SHIFT                 16
+#define SMMU_TCR_T0SZ_SHIFT                 0
+#define SMMU_TCR_SZ_MASK                    0xf
+
+#define SMMU_TCR2_SEP_SHIFT                 15
+#define SMMU_TCR2_SEP_MASK                  0x7
+
+#define SMMU_TCR2_PASIZE_SHIFT              0
+#define SMMU_TCR2_PASIZE_MASK               0x7
+
+/* Common definitions for PASize and SEP fields */
+#define SMMU_TCR2_ADDR_32                   0
+#define SMMU_TCR2_ADDR_36                   1
+#define SMMU_TCR2_ADDR_40                   2
+#define SMMU_TCR2_ADDR_42                   3
+#define SMMU_TCR2_ADDR_44                   4
+#define SMMU_TCR2_ADDR_48                   5
+
+#define SMMU_TTBRn_HI_ASID_SHIFT            16
+
+#define SMMU_MAIR_ATTR_SHIFT(n)             ((n) << 3)
+#define SMMU_MAIR_ATTR_MASK                 0xff
+#define SMMU_MAIR_ATTR_DEVICE               0x04
+#define SMMU_MAIR_ATTR_NC                   0x44
+#define SMMU_MAIR_ATTR_WBRWA                0xff
+#define SMMU_MAIR_ATTR_IDX_NC               0
+#define SMMU_MAIR_ATTR_IDX_CACHE            1
+#define SMMU_MAIR_ATTR_IDX_DEV              2
+
+#define SMMU_FSR_MULTI                      (1 << 31)
+#define SMMU_FSR_SS                         (1 << 30)
+#define SMMU_FSR_UUT                        (1 << 8)
+#define SMMU_FSR_ASF                        (1 << 7)
+#define SMMU_FSR_TLBLKF                     (1 << 6)
+#define SMMU_FSR_TLBMCF                     (1 << 5)
+#define SMMU_FSR_EF                         (1 << 4)
+#define SMMU_FSR_PF                         (1 << 3)
+#define SMMU_FSR_AFF                        (1 << 2)
+#define SMMU_FSR_TF                         (1 << 1)
+
+#define SMMU_FSR_IGN                        (SMMU_FSR_AFF | SMMU_FSR_ASF |    \
+                                             SMMU_FSR_TLBMCF | SMMU_FSR_TLBLKF)
+#define SMMU_FSR_FAULT                      (SMMU_FSR_MULTI | SMMU_FSR_SS |   \
+                                             SMMU_FSR_UUT | SMMU_FSR_EF |     \
+                                             SMMU_FSR_PF | SMMU_FSR_TF |      \
+                                             SMMU_FSR_IGN)
+
+#define SMMU_FSYNR0_WNR                     (1 << 4)
+
+#define smmu_print(dev, lvl, fmt, ...)                                        \
+    printk(lvl "smmu: %s: " fmt, dt_node_full_name(dev->node), ## __VA_ARGS__)
+
+#define smmu_err(dev, fmt, ...) smmu_print(dev, XENLOG_ERR, fmt, ## __VA_ARGS__)
+
+#define smmu_dbg(dev, fmt, ...)                                             \
+    smmu_print(dev, XENLOG_DEBUG, fmt, ## __VA_ARGS__)
+
+#define smmu_info(dev, fmt, ...)                                            \
+    smmu_print(dev, XENLOG_INFO, fmt, ## __VA_ARGS__)
+
+#define smmu_warn(dev, fmt, ...)                                            \
+    smmu_print(dev, XENLOG_WARNING, fmt, ## __VA_ARGS__)
+
+struct arm_smmu_device {
+    const struct dt_device_node *node;
+
+    void __iomem                *base;
+    unsigned long               size;
+    unsigned long               pagesize;
+
+#define SMMU_FEAT_COHERENT_WALK (1 << 0)
+#define SMMU_FEAT_STREAM_MATCH  (1 << 1)
+#define SMMU_FEAT_TRANS_S1      (1 << 2)
+#define SMMU_FEAT_TRANS_S2      (1 << 3)
+#define SMMU_FEAT_TRANS_NESTED  (1 << 4)
+    u32                         features;
+    u32                         options;
+    int                         version;
+
+    u32                         num_context_banks;
+    u32                         num_s2_context_banks;
+    DECLARE_BITMAP(context_map, SMMU_MAX_CBS);
+    atomic_t                    irptndx;
+
+    u32                         num_mapping_groups;
+    DECLARE_BITMAP(smr_map, SMMU_MAX_SMRS);
+
+    unsigned long               input_size;
+    unsigned long               s1_output_size;
+    unsigned long               s2_output_size;
+
+    u32                         num_global_irqs;
+    u32                         num_context_irqs;
+    struct dt_irq               *irqs;
+
+    u32                         smr_mask_mask;
+    u32                         smr_id_mask;
+
+    unsigned long               *sids;
+
+    struct list_head            list;
+    struct rb_root              masters;
+};
+
+struct arm_smmu_smr {
+    u8                          idx;
+    u16                         mask;
+    u16                         id;
+};
+
+#define INVALID_IRPTNDX         0xff
+
+#define SMMU_CB_ASID(cfg)       ((cfg)->cbndx)
+#define SMMU_CB_VMID(cfg)       ((cfg)->cbndx + 1)
+
+struct arm_smmu_domain_cfg {
+    struct arm_smmu_device  *smmu;
+    u8                      cbndx;
+    u8                      irptndx;
+    u32                     cbar;
+    /* Domain associated to this device */
+    struct domain           *domain;
+    /* List of master which use this structure */
+    struct list_head        masters;
+
+    /* Used to link domain context for a same domain */
+    struct list_head        list;
+};
+
+struct arm_smmu_master {
+    const struct dt_device_node *dt_node;
+
+    /*
+     * The following is specific to the master's position in the
+     * SMMU chain.
+     */
+    struct rb_node              node;
+    u32                         num_streamids;
+    u16                         streamids[MAX_MASTER_STREAMIDS];
+    int                         num_s2crs;
+
+    struct arm_smmu_smr         *smrs;
+    struct arm_smmu_domain_cfg  *cfg;
+
+    /* Used to link masters in a same domain context */
+    struct list_head            list;
+};
+
+static LIST_HEAD(arm_smmu_devices);
+
+struct arm_smmu_domain {
+    spinlock_t lock;
+    struct list_head contexts;
+};
+
+struct arm_smmu_option_prop {
+    u32         opt;
+    const char  *prop;
+};
+
+static const struct arm_smmu_option_prop arm_smmu_options[] __initconst =
+{
+    { SMMU_OPT_SECURE_CONFIG_ACCESS, "calxeda,smmu-secure-config-access" },
+    { 0, NULL},
+};
+
+static void __init check_driver_options(struct arm_smmu_device *smmu)
+{
+    int i = 0;
+
+    do {
+        if ( dt_property_read_bool(smmu->node, arm_smmu_options[i].prop) )
+        {
+            smmu->options |= arm_smmu_options[i].opt;
+            smmu_dbg(smmu, "option %s\n", arm_smmu_options[i].prop);
+        }
+    } while ( arm_smmu_options[++i].opt );
+}
+
+static void arm_smmu_context_fault(int irq, void *data,
+                                   struct cpu_user_regs *regs)
+{
+    u32 fsr, far, fsynr;
+    unsigned long iova;
+    struct arm_smmu_domain_cfg *cfg = data;
+    struct arm_smmu_device *smmu = cfg->smmu;
+    void __iomem *cb_base;
+
+    cb_base = SMMU_CB_BASE(smmu) + SMMU_CB(smmu, cfg->cbndx);
+    fsr = readl_relaxed(cb_base + SMMU_CB_FSR);
+
+    if ( !(fsr & SMMU_FSR_FAULT) )
+        return;
+
+    if ( fsr & SMMU_FSR_IGN )
+        smmu_err(smmu, "Unexpected context fault (fsr 0x%u)\n", fsr);
+
+    fsynr = readl_relaxed(cb_base + SMMU_CB_FSYNR0);
+    far = readl_relaxed(cb_base + SMMU_CB_FAR_LO);
+    iova = far;
+#ifdef CONFIG_ARM_64
+    far = readl_relaxed(cb_base + SMMU_CB_FAR_HI);
+    iova |= ((unsigned long)far << 32);
+#endif
+
+    smmu_err(smmu,
+             "Unhandled context fault: iova=0x%08lx, fsynr=0x%x, cb=%d\n",
+             iova, fsynr, cfg->cbndx);
+
+    /* Clear the faulting FSR */
+    writel(fsr, cb_base + SMMU_CB_FSR);
+
+    /* Terminate any stalled transactions */
+    if ( fsr & SMMU_FSR_SS )
+        writel_relaxed(SMMU_RESUME_TERMINATE, cb_base + SMMU_CB_RESUME);
+}
+
+static void arm_smmu_global_fault(int irq, void *data,
+                                  struct cpu_user_regs *regs)
+{
+    u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+    struct arm_smmu_device *smmu = data;
+    void __iomem *gr0_base = SMMU_GR0_NS(smmu);
+
+    gfsr = readl_relaxed(gr0_base + SMMU_GR0_sGFSR);
+    gfsynr0 = readl_relaxed(gr0_base + SMMU_GR0_sGFSYNR0);
+    gfsynr1 = readl_relaxed(gr0_base + SMMU_GR0_sGFSYNR1);
+    gfsynr2 = readl_relaxed(gr0_base + SMMU_GR0_sGFSYNR2);
+
+    if ( !gfsr )
+        return;
+
+    smmu_err(smmu, "Unexpected global fault, this could be serious\n");
+    smmu_err(smmu,
+             "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
+             gfsr, gfsynr0, gfsynr1, gfsynr2);
+    writel(gfsr, gr0_base + SMMU_GR0_sGFSR);
+}
+
+static struct arm_smmu_master *
+find_smmu_master(struct arm_smmu_device *smmu,
+                 const struct dt_device_node *dev_node)
+{
+    struct rb_node *node = smmu->masters.rb_node;
+
+    while ( node )
+    {
+        struct arm_smmu_master *master;
+
+        master = container_of(node, struct arm_smmu_master, node);
+
+        if ( dev_node < master->dt_node )
+            node = node->rb_left;
+        else if ( dev_node > master->dt_node )
+            node = node->rb_right;
+        else
+            return master;
+    }
+
+    return NULL;
+}
+
+static __init int insert_smmu_master(struct arm_smmu_device *smmu,
+                                     struct arm_smmu_master *master)
+{
+    struct rb_node **new, *parent;
+
+    new = &smmu->masters.rb_node;
+    parent = NULL;
+    while ( *new )
+    {
+        struct arm_smmu_master *this;
+
+        this = container_of(*new, struct arm_smmu_master, node);
+
+        parent = *new;
+        if ( master->dt_node < this->dt_node )
+            new = &((*new)->rb_left);
+        else if ( master->dt_node > this->dt_node )
+            new = &((*new)->rb_right);
+        else
+            return -EEXIST;
+    }
+
+    rb_link_node(&master->node, parent, new);
+    rb_insert_color(&master->node, &smmu->masters);
+    return 0;
+}
+
+static __init int register_smmu_master(struct arm_smmu_device *smmu,
+                                       struct dt_phandle_args *masterspec)
+{
+    int i, sid;
+    struct arm_smmu_master *master;
+    int rc = 0;
+
+    smmu_dbg(smmu, "Try to add master %s\n", masterspec->np->name);
+
+    master = find_smmu_master(smmu, masterspec->np);
+    if ( master )
+    {
+        smmu_err(smmu,
+                 "rejecting multiple registrations for master device %s\n",
+                 masterspec->np->name);
+        return -EBUSY;
+    }
+
+    if ( masterspec->args_count > MAX_MASTER_STREAMIDS )
+    {
+        smmu_err(smmu,
+            "reached maximum number (%d) of stream IDs for master device %s\n",
+            MAX_MASTER_STREAMIDS, masterspec->np->name);
+        return -ENOSPC;
+    }
+
+    master = xzalloc(struct arm_smmu_master);
+    if ( !master )
+        return -ENOMEM;
+
+    INIT_LIST_HEAD(&master->list);
+    master->dt_node = masterspec->np;
+    master->num_streamids = masterspec->args_count;
+
+    dt_device_set_protected(masterspec->np);
+
+    for ( i = 0; i < master->num_streamids; ++i )
+    {
+        sid = masterspec->args[i];
+        if ( test_and_set_bit(sid, smmu->sids) )
+        {
+            smmu_err(smmu, "duplicate stream ID (%d)\n", sid);
+            xfree(master);
+            return -EEXIST;
+        }
+        master->streamids[i] = masterspec->args[i];
+    }
+
+    rc = insert_smmu_master(smmu, master);
+    /* Insertion should never fail */
+    ASSERT(rc == 0);
+
+    return 0;
+}
+
+static int __arm_smmu_alloc_bitmap(unsigned long *map, int start, int end)
+{
+    int idx;
+
+    do
+    {
+        idx = find_next_zero_bit(map, end, start);
+        if ( idx == end )
+            return -ENOSPC;
+    } while ( test_and_set_bit(idx, map) );
+
+    return idx;
+}
+
+static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
+{
+    clear_bit(idx, map);
+}
+
+static void arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
+{
+    int count = 0;
+    void __iomem *gr0_base = SMMU_GR0(smmu);
+
+    writel_relaxed(0, gr0_base + SMMU_GR0_sTLBGSYNC);
+    while ( readl_relaxed(gr0_base + SMMU_GR0_sTLBGSTATUS) &
+            SMMU_sTLBGSTATUS_GSACTIVE )
+    {
+        cpu_relax();
+        if ( ++count == SMMU_TLB_LOOP_TIMEOUT )
+        {
+            smmu_err(smmu, "TLB sync timed out -- SMMU may be deadlocked\n");
+            return;
+        }
+        udelay(1);
+    }
+}
+
+static void arm_smmu_tlb_inv_context(struct arm_smmu_domain_cfg *cfg)
+{
+    struct arm_smmu_device *smmu = cfg->smmu;
+    void __iomem *base = SMMU_GR0(smmu);
+
+    writel_relaxed(SMMU_CB_VMID(cfg),
+                   base + SMMU_GR0_TLBIVMID);
+
+    arm_smmu_tlb_sync(smmu);
+}
+
+static void arm_smmu_iotlb_flush_all(struct domain *d)
+{
+    struct arm_smmu_domain *smmu_domain = domain_hvm_iommu(d)->arch.priv;
+    struct arm_smmu_domain_cfg *cfg;
+
+    spin_lock(&smmu_domain->lock);
+    list_for_each_entry(cfg, &smmu_domain->contexts, list)
+        arm_smmu_tlb_inv_context(cfg);
+    spin_unlock(&smmu_domain->lock);
+}
+
+static void arm_smmu_iotlb_flush(struct domain *d, unsigned long gfn,
+                                 unsigned int page_count)
+{
+    /* ARM SMMU v1 doesn't have flush by VMA and VMID */
+    arm_smmu_iotlb_flush_all(d);
+}
+
+static int determine_smr_mask(struct arm_smmu_device *smmu,
+                              struct arm_smmu_master *master,
+                              struct arm_smmu_smr *smr, int start, int order)
+{
+    u16 i, zero_bits_mask, one_bits_mask, const_mask;
+    int nr;
+
+    nr = 1 << order;
+
+    if ( nr == 1 )
+    {
+        /* no mask, use streamid to match and be done with it */
+        smr->mask = 0;
+        smr->id = master->streamids[start];
+        return 0;
+    }
+
+    zero_bits_mask = 0;
+    one_bits_mask = 0xffff;
+    for ( i = start; i < start + nr; i++ )
+    {
+        zero_bits_mask |= master->streamids[i]; /* const 0 bits */
+        one_bits_mask &= master->streamids[i];  /* const 1 bits */
+    }
+    zero_bits_mask = ~zero_bits_mask;
+
+    /* bits having constant values (either 0 or 1) */
+    const_mask = zero_bits_mask | one_bits_mask;
+
+    i = hweight16(~const_mask);
+    if ( (1 << i) == nr )
+    {
+        smr->mask = ~const_mask;
+        smr->id = one_bits_mask;
+    }
+    else
+        /* no usable mask for this set of streamids */
+        return 1;
+
+    if ( ((smr->mask & smmu->smr_mask_mask) != smr->mask) ||
+         ((smr->id & smmu->smr_id_mask) != smr->id) )
+        /* insufficient number of mask/id bits */
+        return 1;
+
+    return 0;
+}
+
+static int determine_smr_mapping(struct arm_smmu_device *smmu,
+                                 struct arm_smmu_master *master,
+                                 struct arm_smmu_smr *smrs, int max_smrs)
+{
+    int nr_sid, nr, i, bit, start;
+
+    /*
+     * This function is called only once -- when a master is added
+     * to a domain. If master->num_s2crs != 0 then this master
+     * was already added to a domain.
+     */
+    BUG_ON(master->num_s2crs);
+
+    start = nr = 0;
+    nr_sid = master->num_streamids;
+    do
+    {
+        /*
+         * largest power-of-2 number of streamids for which to
+         * determine a usable mask/id pair for stream matching
+         */
+        bit = fls(nr_sid);
+        if ( !bit )
+            return 0;
+
+        /*
+         * iterate over power-of-2 numbers to determine
+         * largest possible mask/id pair for stream matching
+         * of next 2**i streamids
+         */
+        for ( i = bit - 1; i >= 0; i-- )
+        {
+            if ( !determine_smr_mask(smmu, master,
+                                     &smrs[master->num_s2crs],
+                                     start, i) )
+                break;
+        }
+
+        if ( i < 0 )
+            goto out;
+
+        nr = 1 << i;
+        nr_sid -= nr;
+        start += nr;
+        master->num_s2crs++;
+    } while ( master->num_s2crs <= max_smrs );
+
+out:
+    if ( nr_sid )
+    {
+        /* not enough mapping groups available */
+        master->num_s2crs = 0;
+        return -ENOSPC;
+    }
+
+    return 0;
+}
+
+static int arm_smmu_master_configure_smrs(struct arm_smmu_device *smmu,
+                                          struct arm_smmu_master *master)
+{
+    int i, max_smrs, ret;
+    struct arm_smmu_smr *smrs;
+    void __iomem *gr0_base = SMMU_GR0(smmu);
+
+    if ( !(smmu->features & SMMU_FEAT_STREAM_MATCH) )
+        return 0;
+
+    if ( master->smrs )
+        return -EEXIST;
+
+    max_smrs = min(smmu->num_mapping_groups, master->num_streamids);
+    smrs = xmalloc_array(struct arm_smmu_smr, max_smrs);
+    if ( !smrs )
+    {
+        smmu_err(smmu, "failed to allocated %d SMRs for master %s\n",
+                 max_smrs, dt_node_name(master->dt_node));
+        return -ENOMEM;
+    }
+
+    ret = determine_smr_mapping(smmu, master, smrs, max_smrs);
+    if ( ret )
+        goto err_free_smrs;
+
+    /* Allocate the SMRs on the root SMMU */
+    for ( i = 0; i < master->num_s2crs; ++i )
+    {
+        int idx = __arm_smmu_alloc_bitmap(smmu->smr_map, 0,
+                                          smmu->num_mapping_groups);
+        if ( idx < 0 )
+        {
+            smmu_err(smmu, "failed to allocate free SMR\n");
+            goto err_free_bitmap;
+        }
+        smrs[i].idx = idx;
+    }
+
+    /* It worked! Now, poke the actual hardware */
+    for ( i = 0; i < master->num_s2crs; ++i )
+    {
+        u32 reg = SMMU_SMR_VALID | smrs[i].id << SMMU_SMR_ID_SHIFT |
+            smrs[i].mask << SMMU_SMR_MASK_SHIFT;
+        smmu_dbg(smmu, "SMR%d: 0x%x\n", smrs[i].idx, reg);
+        writel_relaxed(reg, gr0_base + SMMU_GR0_SMR(smrs[i].idx));
+    }
+
+    master->smrs = smrs;
+    return 0;
+
+err_free_bitmap:
+    while ( --i >= 0 )
+        __arm_smmu_free_bitmap(smmu->smr_map, smrs[i].idx);
+    master->num_s2crs = 0;
+err_free_smrs:
+    xfree(smrs);
+    return -ENOSPC;
+}
+
+/* Forward declaration */
+static void arm_smmu_destroy_domain_context(struct arm_smmu_domain_cfg *cfg);
+
+static int arm_smmu_domain_add_master(struct domain *d,
+                                      struct arm_smmu_domain_cfg *cfg,
+                                      struct arm_smmu_master *master)
+{
+    int i, ret;
+    struct arm_smmu_device *smmu = cfg->smmu;
+    void __iomem *gr0_base = SMMU_GR0(smmu);
+    struct arm_smmu_smr *smrs = master->smrs;
+
+    if ( master->cfg )
+        return -EBUSY;
+
+    ret = arm_smmu_master_configure_smrs(smmu, master);
+    if ( ret )
+        return ret;
+
+    /* Now we're at the root, time to point at our context bank */
+    if ( !master->num_s2crs )
+        master->num_s2crs = master->num_streamids;
+
+    for ( i = 0; i < master->num_s2crs; ++i )
+    {
+        u32 idx, s2cr;
+
+        idx = smrs ? smrs[i].idx : master->streamids[i];
+        s2cr = (SMMU_S2CR_TYPE_TRANS << SMMU_S2CR_TYPE_SHIFT) |
+            (cfg->cbndx << SMMU_S2CR_CBNDX_SHIFT);
+        smmu_dbg(smmu, "S2CR%d: 0x%x\n", idx, s2cr);
+        writel_relaxed(s2cr, gr0_base + SMMU_GR0_S2CR(idx));
+    }
+
+    master->cfg = cfg;
+    list_add(&master->list, &cfg->masters);
+
+    return 0;
+}
+
+static void arm_smmu_domain_remove_master(struct arm_smmu_master *master)
+{
+    int i;
+    struct arm_smmu_domain_cfg *cfg = master->cfg;
+    struct arm_smmu_device *smmu = cfg->smmu;
+    void __iomem *gr0_base = SMMU_GR0(smmu);
+    struct arm_smmu_smr *smrs = master->smrs;
+
+    /*
+     * We *must* clear the S2CR first, because freeing the SMR means
+     * that it can be reallocated immediately
+     */
+    for ( i = 0; i < master->num_streamids; ++i )
+    {
+        u16 sid = master->streamids[i];
+        writel_relaxed(SMMU_S2CR_TYPE_FAULT,
+                       gr0_base + SMMU_GR0_S2CR(sid));
+    }
+
+    /* Invalidate the SMRs before freeing back to the allocator */
+    for ( i = 0; i < master->num_streamids; ++i ) {
+        u8 idx = smrs[i].idx;
+        writel_relaxed(~SMMU_SMR_VALID, gr0_base + SMMU_GR0_SMR(idx));
+        __arm_smmu_free_bitmap(smmu->smr_map, idx);
+    }
+
+    master->smrs = NULL;
+    xfree(smrs);
+
+    master->cfg = NULL;
+    list_del(&master->list);
+    INIT_LIST_HEAD(&master->list);
+}
+
+static void arm_smmu_init_context_bank(struct arm_smmu_domain_cfg *cfg)
+{
+    u32 reg;
+    struct arm_smmu_device *smmu = cfg->smmu;
+    void __iomem *cb_base, *gr0_base, *gr1_base;
+    paddr_t p2maddr;
+
+    ASSERT(cfg->domain != NULL);
+    p2maddr = page_to_maddr(cfg->domain->arch.p2m.first_level);
+
+    gr0_base = SMMU_GR0(smmu);
+    gr1_base = SMMU_GR1(smmu);
+    cb_base = SMMU_CB_BASE(smmu) + SMMU_CB(smmu, cfg->cbndx);
+
+    /* CBAR */
+    reg = cfg->cbar;
+    if ( smmu->version == 1 )
+        reg |= cfg->irptndx << SMMU_CBAR_IRPTNDX_SHIFT;
+
+    reg |= SMMU_CB_VMID(cfg) << SMMU_CBAR_VMID_SHIFT;
+    writel_relaxed(reg, gr1_base + SMMU_GR1_CBAR(cfg->cbndx));
+
+    if ( smmu->version > 1 )
+    {
+        /* CBA2R */
+#ifdef CONFIG_ARM_64
+        reg = SMMU_CBA2R_RW64_64BIT;
+#else
+        reg = SMMU_CBA2R_RW64_32BIT;
+#endif
+        writel_relaxed(reg, gr1_base + SMMU_GR1_CBA2R(cfg->cbndx));
+    }
+
+    /* TTBR0 */
+    reg = (p2maddr & ((1ULL << 32) - 1));
+    writel_relaxed(reg, cb_base + SMMU_CB_TTBR0_LO);
+    reg = (p2maddr >> 32);
+    writel_relaxed(reg, cb_base + SMMU_CB_TTBR0_HI);
+
+    /*
+     * TCR
+     * We use long descriptor, with inner-shareable WBWA tables in TTBR0.
+     */
+    if ( smmu->version > 1 )
+    {
+        /* Select the translation granule matching Xen's page size */
+        if ( PAGE_SIZE == PAGE_SIZE_4K )
+            reg = SMMU_TCR_TG0_4K;
+        else
+            reg = SMMU_TCR_TG0_64K;
+
+        switch ( smmu->s2_output_size )
+        {
+        case 32:
+            reg |= (SMMU_TCR2_ADDR_32 << SMMU_TCR_PASIZE_SHIFT);
+            break;
+        case 36:
+            reg |= (SMMU_TCR2_ADDR_36 << SMMU_TCR_PASIZE_SHIFT);
+            break;
+        case 40:
+            reg |= (SMMU_TCR2_ADDR_40 << SMMU_TCR_PASIZE_SHIFT);
+            break;
+        case 42:
+            reg |= (SMMU_TCR2_ADDR_42 << SMMU_TCR_PASIZE_SHIFT);
+            break;
+        case 44:
+            reg |= (SMMU_TCR2_ADDR_44 << SMMU_TCR_PASIZE_SHIFT);
+            break;
+        case 48:
+            reg |= (SMMU_TCR2_ADDR_48 << SMMU_TCR_PASIZE_SHIFT);
+            break;
+        }
+    }
+    else
+        reg = 0;
+
+    /* The attributes for page table walks should match those in VTCR_EL2 */
+    reg |= SMMU_TCR_EAE |
+        (SMMU_TCR_SH_NS << SMMU_TCR_SH0_SHIFT) |
+        (SMMU_TCR_RGN_WBWA << SMMU_TCR_ORGN0_SHIFT) |
+        (SMMU_TCR_RGN_WBWA << SMMU_TCR_IRGN0_SHIFT) |
+        (SMMU_TCR_SL0_LVL_1 << SMMU_TCR_SL0_SHIFT);
+    writel_relaxed(reg, cb_base + SMMU_CB_TCR);
+
+    /* SCTLR */
+    reg = SMMU_SCTLR_CFCFG |
+        SMMU_SCTLR_CFIE |
+        SMMU_SCTLR_CFRE |
+        SMMU_SCTLR_M |
+        SMMU_SCTLR_EAE_SBOP;
+
+    writel_relaxed(reg, cb_base + SMMU_CB_SCTLR);
+}
+
+static struct arm_smmu_domain_cfg *
+arm_smmu_alloc_domain_context(struct domain *d,
+                              struct arm_smmu_device *smmu)
+{
+    const struct dt_irq *irq;
+    int ret, start;
+    struct arm_smmu_domain_cfg *cfg;
+    struct arm_smmu_domain *smmu_domain = domain_hvm_iommu(d)->arch.priv;
+
+    ASSERT(spin_is_locked(&smmu_domain->lock));
+
+    cfg = xzalloc(struct arm_smmu_domain_cfg);
+    if ( !cfg )
+        return NULL;
+
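+
+    /* Xen only uses the SMMU for stage-2 translation */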
+    cfg->cbar = SMMU_CBAR_TYPE_S2_TRANS;
+    start = 0;
+
+    ret = __arm_smmu_alloc_bitmap(smmu->context_map, start,
+                                  smmu->num_context_banks);
+    if ( ret < 0 )
+        goto out_free_mem;
+
+    cfg->cbndx = ret;
+    if ( smmu->version == 1 )
+    {
+        cfg->irptndx = atomic_inc_return(&smmu->irptndx);
+        cfg->irptndx %= smmu->num_context_irqs;
+    }
+    else
+        cfg->irptndx = cfg->cbndx;
+
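+    /* Wire up the fault interrupt for this context bank */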
+    irq = &smmu->irqs[smmu->num_global_irqs + cfg->irptndx];
+    ret = request_dt_irq(irq, arm_smmu_context_fault,
+                         "arm-smmu-context-fault", cfg);
+    if ( ret )
+    {
+        smmu_err(smmu, "failed to request context IRQ %d (%u)\n",
+                 cfg->irptndx, irq->irq);
+        cfg->irptndx = INVALID_IRPTNDX;
+        goto out_free_context;
+    }
+
+    cfg->domain = d;
+    cfg->smmu = smmu;
+
+    arm_smmu_init_context_bank(cfg);
+    list_add(&cfg->list, &smmu_domain->contexts);
+    INIT_LIST_HEAD(&cfg->masters);
+
+    return cfg;
+
+out_free_context:
+    __arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx);
+out_free_mem:
+    xfree(cfg);
+
+    return NULL;
+}
+
+static void arm_smmu_destroy_domain_context(struct arm_smmu_domain_cfg *cfg)
+{
+    struct domain *d = cfg->domain;
+    struct arm_smmu_domain *smmu_domain = domain_hvm_iommu(d)->arch.priv;
+    struct arm_smmu_device *smmu = cfg->smmu;
+    void __iomem *cb_base;
+    const struct dt_irq *irq;
+
+    ASSERT(spin_is_locked(&smmu_domain->lock));
+    BUG_ON(!list_empty(&cfg->masters));
+
+    /* Disable the context bank and nuke the TLB before freeing it */
+    cb_base = SMMU_CB_BASE(smmu) + SMMU_CB(smmu, cfg->cbndx);
+    writel_relaxed(0, cb_base + SMMU_CB_SCTLR);
+    arm_smmu_tlb_inv_context(cfg);
+
+    if ( cfg->irptndx != INVALID_IRPTNDX )
+    {
+        irq = &smmu->irqs[smmu->num_global_irqs + cfg->irptndx];
+        release_dt_irq(irq, cfg);
+    }
+
+    __arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx);
+    list_del(&cfg->list);
+    xfree(cfg);
+}
+
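+/* Return the SMMU that the given device is a master of, or NULL if none */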
+static struct arm_smmu_device *
+arm_smmu_find_smmu_by_dev(const struct dt_device_node *dev)
+{
+    struct arm_smmu_device *smmu;
+    struct arm_smmu_master *master = NULL;
+
+    list_for_each_entry( smmu, &arm_smmu_devices, list )
+    {
+        master = find_smmu_master(smmu, dev);
+        if ( master )
+            break;
+    }
+
+    if ( !master )
+        return NULL;
+
+    return smmu;
+}
+
+static int arm_smmu_attach_dev(struct domain *d,
+                               const struct dt_device_node *dev)
+{
+    struct arm_smmu_device *smmu = arm_smmu_find_smmu_by_dev(dev);
+    struct arm_smmu_master *master;
+    struct arm_smmu_domain *smmu_domain = domain_hvm_iommu(d)->arch.priv;
+    struct arm_smmu_domain_cfg *cfg = NULL;
+    struct arm_smmu_domain_cfg *curr;
+    int ret;
+
+    printk(XENLOG_DEBUG "arm-smmu: attach %s to domain %d\n",
+           dt_node_full_name(dev), d->domain_id);
+
+    if ( !smmu )
+    {
+        printk(XENLOG_ERR "%s: cannot attach to SMMU, is it on the same bus?\n",
+               dt_node_full_name(dev));
+        return -ENODEV;
+    }
+
+    master = find_smmu_master(smmu, dev);
+    BUG_ON(master == NULL);
+
+    /* Check if the device is already assigned to someone */
+    if ( master->cfg )
+        return -EBUSY;
+
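+    /*
+     * Reuse the context bank already allocated for this domain on this
+     * SMMU, if any; otherwise allocate a new one.
+     */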
+    spin_lock(&smmu_domain->lock);
+    list_for_each_entry( curr, &smmu_domain->contexts, list )
+    {
+        if ( curr->smmu == smmu )
+        {
+            cfg = curr;
+            break;
+        }
+    }
+
+    if ( !cfg )
+    {
+        cfg = arm_smmu_alloc_domain_context(d, smmu);
+        if ( !cfg )
+        {
+            smmu_err(smmu, "unable to allocate context for domain %u\n",
+                     d->domain_id);
+            spin_unlock(&smmu_domain->lock);
+            return -ENOMEM;
+        }
+    }
+    spin_unlock(&smmu_domain->lock);
+
+    ret = arm_smmu_domain_add_master(d, cfg, master);
+    if ( ret )
+    {
+        spin_lock(&smmu_domain->lock);
+        if ( list_empty(&cfg->masters) )
+            arm_smmu_destroy_domain_context(cfg);
+        spin_unlock(&smmu_domain->lock);
+    }
+
+    return ret;
+}
+
+static int arm_smmu_detach_dev(struct domain *d,
+                               const struct dt_device_node *dev)
+{
+    struct arm_smmu_domain *smmu_domain = domain_hvm_iommu(d)->arch.priv;
+    struct arm_smmu_master *master;
+    struct arm_smmu_device *smmu = arm_smmu_find_smmu_by_dev(dev);
+    struct arm_smmu_domain_cfg *cfg;
+
+    printk(XENLOG_DEBUG "arm-smmu: detach %s to domain %d\n",
+           dt_node_full_name(dev), d->domain_id);
+
+    if ( !smmu )
+    {
+        printk(XENLOG_ERR "%s: cannot find the SMMU, is it on the same bus?\n",
+               dt_node_full_name(dev));
+        return -ENODEV;
+    }
+
+    master = find_smmu_master(smmu, dev);
+    BUG_ON(master == NULL);
+
+    cfg = master->cfg;
+
+    /*
+     * Sanity check to avoid removing a device that doesn't belong to
+     * the domain.
+     */
+    if ( !cfg || cfg->domain != d )
+    {
+        printk(XENLOG_ERR "%s: was not attach to domain %d\n",
+               dt_node_full_name(dev), d->domain_id);
+        return -ESRCH;
+    }
+
+    arm_smmu_domain_remove_master(master);
+
+    spin_lock(&smmu_domain->lock);
+    if ( list_empty(&cfg->masters) )
+        arm_smmu_destroy_domain_context(cfg);
+    spin_unlock(&smmu_domain->lock);
+
+    return 0;
+}
+
+static int arm_smmu_reassign_dt_dev(struct domain *s, struct domain *t,
+                                    const struct dt_device_node *dev)
+{
+    int ret = 0;
+
+    /* Only allow the device to be reassigned to dom0 */
+    if ( t != dom0 )
+        return -EPERM;
+
+    if ( t == s )
+        return 0;
+
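+    /* Move the device: detach from the source, then attach to the target */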
+    ret = arm_smmu_detach_dev(s, dev);
+    if ( ret )
+        return ret;
+
+    ret = arm_smmu_attach_dev(t, dev);
+
+    return ret;
+}
+
+static __init int arm_smmu_id_size_to_bits(int size)
+{
+    switch ( size )
+    {
+    case 0:
+        return 32;
+    case 1:
+        return 36;
+    case 2:
+        return 40;
+    case 3:
+        return 42;
+    case 4:
+        return 44;
+    case 5:
+    default:
+        return 48;
+    }
+}
+
+static __init int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
+{
+    unsigned long size;
+    void __iomem *gr0_base = SMMU_GR0(smmu);
+    u32 id;
+
+    smmu_info(smmu, "probing hardware configuration...\n");
+
+    /*
+     * Primecell ID
+     */
+    id = readl_relaxed(gr0_base + SMMU_GR0_PIDR2);
+    smmu->version = ((id >> SMMU_PIDR2_ARCH_SHIFT) & SMMU_PIDR2_ARCH_MASK) + 1;
+    smmu_info(smmu, "SMMUv%d with:\n", smmu->version);
+
+    /* ID0 */
+    id = readl_relaxed(gr0_base + SMMU_GR0_ID0);
+#ifndef CONFIG_ARM_64
+    if ( ((id >> SMMU_ID0_PTFS_SHIFT) & SMMU_ID0_PTFS_MASK) ==
+            SMMU_ID0_PTFS_V8_ONLY )
+    {
+        smmu_err(smmu, "\tno v7 descriptor support!\n");
+        return -ENODEV;
+    }
+#endif
+    if ( id & SMMU_ID0_S1TS )
+    {
+        smmu->features |= SMMU_FEAT_TRANS_S1;
+        smmu_info(smmu, "\tstage 1 translation\n");
+    }
+
+    if ( id & SMMU_ID0_S2TS )
+    {
+        smmu->features |= SMMU_FEAT_TRANS_S2;
+        smmu_info(smmu, "\tstage 2 translation\n");
+    }
+
+    if ( id & SMMU_ID0_NTS )
+    {
+        smmu->features |= SMMU_FEAT_TRANS_NESTED;
+        smmu_info(smmu, "\tnested translation\n");
+    }
+
+    if ( !(smmu->features &
+           (SMMU_FEAT_TRANS_S1 | SMMU_FEAT_TRANS_S2 |
+            SMMU_FEAT_TRANS_NESTED)) )
+    {
+        smmu_err(smmu, "\tno translation support!\n");
+        return -ENODEV;
+    }
+
+    /* We need at least support for Stage 2 */
+    if ( !(smmu->features & SMMU_FEAT_TRANS_S2) )
+    {
+        smmu_err(smmu, "\tno stage 2 translation!\n");
+        return -ENODEV;
+    }
+
+    if ( id & SMMU_ID0_CTTW )
+    {
+        smmu->features |= SMMU_FEAT_COHERENT_WALK;
+        smmu_info(smmu, "\tcoherent table walk\n");
+    }
+
+    if ( id & SMMU_ID0_SMS )
+    {
+        u32 smr, sid, mask;
+
+        smmu->features |= SMMU_FEAT_STREAM_MATCH;
+        smmu->num_mapping_groups = (id >> SMMU_ID0_NUMSMRG_SHIFT) &
+            SMMU_ID0_NUMSMRG_MASK;
+        if ( smmu->num_mapping_groups == 0 )
+        {
+            smmu_err(smmu,
+                     "stream-matching supported, but no SMRs present!\n");
+            return -ENODEV;
+        }
+
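+        /*
+         * Write an all-ones SMR and read it back to discover how many
+         * mask and ID bits are actually implemented.
+         */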
+        smr = SMMU_SMR_MASK_MASK << SMMU_SMR_MASK_SHIFT;
+        smr |= (SMMU_SMR_ID_MASK << SMMU_SMR_ID_SHIFT);
+        writel_relaxed(smr, gr0_base + SMMU_GR0_SMR(0));
+        smr = readl_relaxed(gr0_base + SMMU_GR0_SMR(0));
+
+        mask = (smr >> SMMU_SMR_MASK_SHIFT) & SMMU_SMR_MASK_MASK;
+        sid = (smr >> SMMU_SMR_ID_SHIFT) & SMMU_SMR_ID_MASK;
+        if ( (mask & sid) != sid )
+        {
+            smmu_err(smmu,
+                     "SMR mask bits (0x%x) insufficient for ID field (0x%x)\n",
+                     mask, sid);
+            return -ENODEV;
+        }
+        smmu->smr_mask_mask = mask;
+        smmu->smr_id_mask = sid;
+
+        smmu_info(smmu,
+                  "\tstream matching with %u register groups, mask 0x%x\n",
+                  smmu->num_mapping_groups, mask);
+    }
+
+    /* ID1 */
+    id = readl_relaxed(gr0_base + SMMU_GR0_ID1);
+    smmu->pagesize = (id & SMMU_ID1_PAGESIZE) ? PAGE_SIZE_64K : PAGE_SIZE_4K;
+
+    /* Check for size mismatch of SMMU address space from mapped region */
+    size = 1 << (((id >> SMMU_ID1_NUMPAGENDXB_SHIFT) &
+                  SMMU_ID1_NUMPAGENDXB_MASK) + 1);
+    size *= (smmu->pagesize << 1);
+    if ( smmu->size != size )
+        smmu_warn(smmu, "SMMU address space size (0x%lx) differs "
+                  "from mapped region size (0x%lx)!\n", size, smmu->size);
+
+    smmu->num_s2_context_banks = (id >> SMMU_ID1_NUMS2CB_SHIFT) &
+        SMMU_ID1_NUMS2CB_MASK;
+    smmu->num_context_banks = (id >> SMMU_ID1_NUMCB_SHIFT) &
+        SMMU_ID1_NUMCB_MASK;
+    if ( smmu->num_s2_context_banks > smmu->num_context_banks )
+    {
+        smmu_err(smmu, "impossible number of S2 context banks!\n");
+        return -ENODEV;
+    }
+    smmu_info(smmu, "\t%u context banks (%u stage-2 only)\n",
+              smmu->num_context_banks, smmu->num_s2_context_banks);
+
+    /* ID2 */
+    id = readl_relaxed(gr0_base + SMMU_GR0_ID2);
+    size = arm_smmu_id_size_to_bits((id >> SMMU_ID2_IAS_SHIFT) &
+                                    SMMU_ID2_IAS_MASK);
+
+    /*
+     * Stage-1 output limited by stage-2 input size due to VTCR_EL2
+     * setup (see setup_virt_paging)
+     */
+    /* Current maximum output size of 40 bits */
+    smmu->s1_output_size = min(40UL, size);
+
+    /* The stage-2 output mask is also applied for bypass */
+    size = arm_smmu_id_size_to_bits((id >> SMMU_ID2_OAS_SHIFT) &
+                                    SMMU_ID2_OAS_MASK);
+    smmu->s2_output_size = min((unsigned long)PADDR_BITS, size);
+
+    if ( smmu->version == 1 )
+        smmu->input_size = 32;
+    else
+    {
+#ifdef CONFIG_ARM_64
+        size = (id >> SMMU_ID2_UBS_SHIFT) & SMMU_ID2_UBS_MASK;
+        size = min(39, arm_smmu_id_size_to_bits(size));
+#else
+        size = 32;
+#endif
+        smmu->input_size = size;
+
+        if ( (PAGE_SIZE == PAGE_SIZE_4K && !(id & SMMU_ID2_PTFS_4K)) ||
+             (PAGE_SIZE == PAGE_SIZE_64K && !(id & SMMU_ID2_PTFS_64K)) ||
+             (PAGE_SIZE != PAGE_SIZE_4K && PAGE_SIZE != PAGE_SIZE_64K) )
+        {
+            smmu_err(smmu, "CPU page size 0x%lx unsupported\n",
+                     PAGE_SIZE);
+            return -ENODEV;
+        }
+    }
+
+    smmu_info(smmu, "\t%lu-bit VA, %lu-bit IPA, %lu-bit PA\n",
+              smmu->input_size, smmu->s1_output_size, smmu->s2_output_size);
+    return 0;
+}
+
+static __init void arm_smmu_device_reset(struct arm_smmu_device *smmu)
+{
+    void __iomem *gr0_base = SMMU_GR0(smmu);
+    void __iomem *cb_base;
+    int i = 0;
+    u32 reg;
+
+    smmu_dbg(smmu, "device reset\n");
+
+    /* Clear Global FSR */
+    reg = readl_relaxed(SMMU_GR0_NS(smmu) + SMMU_GR0_sGFSR);
+    writel(reg, SMMU_GR0_NS(smmu) + SMMU_GR0_sGFSR);
+
+    /* Mark all SMRn as invalid and all S2CRn as fault */
+    for ( i = 0; i < smmu->num_mapping_groups; ++i )
+    {
+        writel_relaxed(~SMMU_SMR_VALID, gr0_base + SMMU_GR0_SMR(i));
+        writel_relaxed(SMMU_S2CR_TYPE_FAULT, gr0_base + SMMU_GR0_S2CR(i));
+    }
+
+    /* Make sure all context banks are disabled and clear CB_FSR */
+    for ( i = 0; i < smmu->num_context_banks; ++i )
+    {
+        cb_base = SMMU_CB_BASE(smmu) + SMMU_CB(smmu, i);
+        writel_relaxed(0, cb_base + SMMU_CB_SCTLR);
+        writel_relaxed(SMMU_FSR_FAULT, cb_base + SMMU_CB_FSR);
+    }
+
+    /* Invalidate the TLB, just in case */
+    writel_relaxed(0, gr0_base + SMMU_GR0_STLBIALL);
+    writel_relaxed(0, gr0_base + SMMU_GR0_TLBIALLH);
+    writel_relaxed(0, gr0_base + SMMU_GR0_TLBIALLNSNH);
+
+    reg = readl_relaxed(SMMU_GR0_NS(smmu) + SMMU_GR0_sCR0);
+
+    /* Enable fault reporting */
+    reg |= (SMMU_sCR0_GFRE | SMMU_sCR0_GFIE |
+            SMMU_sCR0_GCFGFRE | SMMU_sCR0_GCFGFIE);
+
+    /* Disable TLB broadcasting. */
+    reg |= (SMMU_sCR0_VMIDPNE | SMMU_sCR0_PTM);
+
+    /* Enable client access, generate a fault if no mapping is found */
+    reg &= ~(SMMU_sCR0_CLIENTPD);
+    reg |= SMMU_sCR0_USFCFG;
+
+    /* Disable forced broadcasting */
+    reg &= ~SMMU_sCR0_FB;
+
+    /* Don't upgrade barriers */
+    reg &= ~(SMMU_sCR0_BSU_MASK << SMMU_sCR0_BSU_SHIFT);
+
+    /* Push the button */
+    arm_smmu_tlb_sync(smmu);
+    writel_relaxed(reg, SMMU_GR0_NS(smmu) + SMMU_GR0_sCR0);
+}
+
+static int arm_smmu_iommu_domain_init(struct domain *d)
+{
+    struct arm_smmu_domain *smmu_domain;
+
+    smmu_domain = xzalloc(struct arm_smmu_domain);
+    if ( !smmu_domain )
+        return -ENOMEM;
+
+    spin_lock_init(&smmu_domain->lock);
+    INIT_LIST_HEAD(&smmu_domain->contexts);
+
+    domain_hvm_iommu(d)->arch.priv = smmu_domain;
+
+    return 0;
+}
+
+static void arm_smmu_iommu_dom0_init(struct domain *d)
+{
+}
+
+static void arm_smmu_iommu_domain_teardown(struct domain *d)
+{
+    struct arm_smmu_domain *smmu_domain = domain_hvm_iommu(d)->arch.priv;
+
+    ASSERT(list_empty(&smmu_domain->contexts));
+    xfree(smmu_domain);
+}
+
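+/*
+ * Note there are no map_page/unmap_page callbacks: the p2m is shared
+ * with the processor, so IOMMU mappings follow the normal p2m updates
+ * and only TLB maintenance is needed here.
+ */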
+static const struct iommu_ops arm_smmu_iommu_ops = {
+    .init = arm_smmu_iommu_domain_init,
+    .dom0_init = arm_smmu_iommu_dom0_init,
+    .teardown = arm_smmu_iommu_domain_teardown,
+    .iotlb_flush = arm_smmu_iotlb_flush,
+    .iotlb_flush_all = arm_smmu_iotlb_flush_all,
+    .assign_dt_device = arm_smmu_attach_dev,
+    .reassign_dt_device = arm_smmu_reassign_dt_dev,
+};
+
+static int __init smmu_init(struct dt_device_node *dev,
+                            const void *data)
+{
+    struct arm_smmu_device *smmu;
+    int res;
+    u64 addr, size;
+    unsigned int num_irqs, i;
+    struct dt_phandle_args masterspec;
+    struct rb_node *node;
+
+    /*
+     * Even if the device can't be initialized, we don't want to give
+     * the SMMU device to dom0.
+     */
+    dt_device_set_used_by(dev, DOMID_XEN);
+
+    smmu = xzalloc(struct arm_smmu_device);
+    if ( !smmu )
+    {
+        printk(XENLOG_ERR "%s: failed to allocate arm_smmu_device\n",
+               dt_node_full_name(dev));
+        return -ENOMEM;
+    }
+
+    smmu->node = dev;
+    check_driver_options(smmu);
+
+    res = dt_device_get_address(smmu->node, 0, &addr, &size);
+    if ( res )
+    {
+        smmu_err(smmu, "unable to retrieve the base address of the SMMU\n");
+        goto out_err;
+    }
+
+    smmu->base = ioremap_nocache(addr, size);
+    if ( !smmu->base )
+    {
+        smmu_err(smmu, "unable to map the SMMU memory\n");
+        goto out_err;
+    }
+
+    smmu->size = size;
+
+    if ( !dt_property_read_u32(smmu->node, "#global-interrupts",
+                               &smmu->num_global_irqs) )
+    {
+        smmu_err(smmu, "missing #global-interrupts\n");
+        goto out_unmap;
+    }
+
+    num_irqs = dt_number_of_irq(smmu->node);
+    if ( num_irqs > smmu->num_global_irqs )
+        smmu->num_context_irqs = num_irqs - smmu->num_global_irqs;
+
+    if ( !smmu->num_context_irqs )
+    {
+        smmu_err(smmu, "found %d interrupts but expected at least %d\n",
+                 num_irqs, smmu->num_global_irqs + 1);
+        goto out_unmap;
+    }
+
+    smmu->irqs = xzalloc_array(struct dt_irq, num_irqs);
+    if ( !smmu->irqs )
+    {
+        smmu_err(smmu, "failed to allocated %d irqs\n", num_irqs);
+        goto out_unmap;
+    }
+
+    for ( i = 0; i < num_irqs; i++ )
+    {
+        res = dt_device_get_irq(smmu->node, i, &smmu->irqs[i]);
+        if ( res )
+        {
+            smmu_err(smmu, "failed to get irq index %d\n", i);
+            goto out_free_irqs;
+        }
+    }
+
+    smmu->sids = xzalloc_array(unsigned long,
+                               BITS_TO_LONGS(SMMU_MAX_STREAMIDS));
+    if ( !smmu->sids )
+    {
+        smmu_err(smmu, "failed to allocated bitmap for stream ID tracking\n");
+        goto out_free_masters;
+    }
+
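+    /*
+     * Register every master device listed in the "mmu-masters" property
+     * of the SMMU node.
+     */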
+    i = 0;
+    smmu->masters = RB_ROOT;
+    while ( !dt_parse_phandle_with_args(smmu->node, "mmu-masters",
+                                        "#stream-id-cells", i, &masterspec) )
+    {
+        res = register_smmu_master(smmu, &masterspec);
+        if ( res )
+        {
+            smmu_err(smmu, "failed to add master %s\n",
+                     masterspec.np->name);
+            goto out_free_masters;
+        }
+        i++;
+    }
+
+    smmu_info(smmu, "registered %d master devices\n", i);
+
+    res = arm_smmu_device_cfg_probe(smmu);
+    if ( res )
+    {
+        smmu_err(smmu, "failed to probe the SMMU\n");
+        goto out_free_masters;
+    }
+
+    if ( smmu->version > 1 &&
+         smmu->num_context_banks != smmu->num_context_irqs )
+    {
+        smmu_err(smmu,
+                 "found only %d context interrupt(s) but %d required\n",
+                 smmu->num_context_irqs, smmu->num_context_banks);
+        goto out_free_masters;
+    }
+
+    smmu_dbg(smmu, "register global IRQs handler\n");
+
+    for ( i = 0; i < smmu->num_global_irqs; ++i )
+    {
+        smmu_dbg(smmu, "\t- global IRQ %u\n", smmu->irqs[i].irq);
+        res = request_dt_irq(&smmu->irqs[i], arm_smmu_global_fault,
+                             "arm-smmu global fault", smmu);
+        if ( res )
+        {
+            smmu_err(smmu, "failed to request global IRQ %d (%u)\n",
+                     i, smmu->irqs[i].irq);
+            goto out_release_irqs;
+        }
+    }
+
+    INIT_LIST_HEAD(&smmu->list);
+    list_add(&smmu->list, &arm_smmu_devices);
+
+    arm_smmu_device_reset(smmu);
+
+    iommu_set_ops(&arm_smmu_iommu_ops);
+
+    /* The sids bitmap is only needed during initialization; free it now */
+    xfree(smmu->sids);
+    smmu->sids = NULL;
+
+    return 0;
+
+out_release_irqs:
+    while ( i-- )
+        release_dt_irq(&smmu->irqs[i], smmu);
+
+out_free_masters:
+    for ( node = rb_first(&smmu->masters); node; node = rb_next(node) )
+    {
+        struct arm_smmu_master *master;
+
+        master = container_of(node, struct arm_smmu_master, node);
+        xfree(master);
+    }
+
+    xfree(smmu->sids);
+
+out_free_irqs:
+    xfree(smmu->irqs);
+
+out_unmap:
+    iounmap(smmu->base);
+
+out_err:
+    xfree(smmu);
+
+    return -ENODEV;
+}
+
+static const char * const smmu_dt_compat[] __initconst =
+{
+    "arm,mmu-400",
+    NULL
+};
+
+DT_DEVICE_START(smmu, "ARM SMMU", DEVICE_IOMMU)
+    .compatible = smmu_dt_compat,
+    .init = smmu_init,
+DT_DEVICE_END
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 56f6c5c..230b5cc 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -40,6 +40,9 @@ extern bool_t amd_iommu_perdev_intremap;
 #define PAGE_MASK_4K        (((u64)-1) << PAGE_SHIFT_4K)
 #define PAGE_ALIGN_4K(addr) (((addr) + PAGE_SIZE_4K - 1) & PAGE_MASK_4K)
 
+#define PAGE_SHIFT_64K      (16)
+#define PAGE_SIZE_64K       (1UL << PAGE_SHIFT_64K)
+
 int iommu_setup(void);
 
 int iommu_add_device(struct pci_dev *pdev);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 07/13] xen/passthrough: iommu: Introduce arch specific code
  2014-03-11 15:49 ` [PATCH v3 07/13] xen/passthrough: iommu: Introduce arch specific code Julien Grall
@ 2014-03-11 16:15   ` Julien Grall
  2014-03-11 16:53   ` Jan Beulich
  2014-03-18 16:27   ` Ian Campbell
  2 siblings, 0 replies; 63+ messages in thread
From: Julien Grall @ 2014-03-11 16:15 UTC (permalink / raw)
  To: Julien Grall
  Cc: Keir Fraser, ian.campbell, Shane Wang, Suravee Suthikulpanit,
	Joseph Cihula, tim, stefano.stabellini, Jan Beulich, xen-devel,
	Gang Wei, Xiantao Zhang

On 03/11/2014 03:49 PM, Julien Grall wrote:
> +void iommu_share_p2m_table(struct domain* d)
> +{
> +    const struct iommu_ops *ops = iommu_get_ops();
> +
> +    if ( iommu_enabled && is_hvm_domain(d) )
> +        ops->share_p2m(d);
> +}

Hmmm ... I should have removed this function from
passthrough/x86/iommu.c. By mistake it duplicates the one in
passthrough/iommu.c.

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 06/13] xen/passthrough: iommu: Split generic IOMMU code
  2014-03-11 15:49 ` [PATCH v3 06/13] xen/passthrough: iommu: Split generic IOMMU code Julien Grall
@ 2014-03-11 16:50   ` Jan Beulich
  2014-03-11 17:09     ` Julien Grall
  2014-03-18 16:24   ` Ian Campbell
  1 sibling, 1 reply; 63+ messages in thread
From: Jan Beulich @ 2014-03-11 16:50 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, stefano.stabellini, ian.campbell, Xiantao Zhang, tim

>>> On 11.03.14 at 16:49, Julien Grall <julien.grall@linaro.org> wrote:
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
>...
> +static int iommu_populate_page_table(struct domain *d)
> +{

I continue to be of the opinion that this is misplaced here. There's
nothing PCI-related in this function, and I doubt you can get away
on ARM without similar code (if you can, this should go into
.../x86/iommu.c imo).

> --- /dev/null
> +++ b/xen/include/asm-x86/iommu.h
>...
> +void iommu_set_dom0_mapping(struct domain *d);

How is this x86-specific?

Jan

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 07/13] xen/passthrough: iommu: Introduce arch specific code
  2014-03-11 15:49 ` [PATCH v3 07/13] xen/passthrough: iommu: Introduce arch specific code Julien Grall
  2014-03-11 16:15   ` Julien Grall
@ 2014-03-11 16:53   ` Jan Beulich
  2014-03-18 16:27   ` Ian Campbell
  2 siblings, 0 replies; 63+ messages in thread
From: Jan Beulich @ 2014-03-11 16:53 UTC (permalink / raw)
  To: Julien Grall
  Cc: Keir Fraser, ian.campbell, Shane Wang, Joseph Cihula, tim,
	stefano.stabellini, Suravee Suthikulpanit, xen-devel, Gang Wei,
	Xiantao Zhang

>>> On 11.03.14 at 16:49, Julien Grall <julien.grall@linaro.org> wrote:
> Currently the structure hvm_iommu (xen/include/xen/hvm/iommu.h) contains
> x86 specific fields.
> 
> This patch creates:
>     - arch_hvm_iommu structure which will contain architecture depend
>     fields
>     - arch_iommu_domain_{init,destroy} function to execute arch
>     specific during domain creation/destruction
> 
> Also move iommu_use_hap_pt and domain_hvm_iommu in asm-x86/iommu.h.
> 
> Signed-off-by: Julien Grall <julien.grall@linaro.org>
> Cc: Keir Fraser <keir@xen.org>

Acked-by: Jan Beulich <jbeulich@suse.com>

> Cc: Joseph Cihula <joseph.cihula@intel.com>
> Cc: Gang Wei <gang.wei@intel.com>
> Cc: Shane Wang <shane.wang@intel.com>
> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> Cc: Xiantao Zhang <xiantao.zhang@intel.com>
> ---
>  xen/arch/x86/domctl.c                       |    6 +-
>  xen/arch/x86/hvm/io.c                       |    2 +-
>  xen/arch/x86/tboot.c                        |    3 +-
>  xen/drivers/passthrough/amd/iommu_guest.c   |    8 +--
>  xen/drivers/passthrough/amd/iommu_map.c     |   54 +++++++++---------
>  xen/drivers/passthrough/amd/pci_amd_iommu.c |   49 ++++++++--------
>  xen/drivers/passthrough/iommu.c             |   28 +++-------
>  xen/drivers/passthrough/vtd/iommu.c         |   80 +++++++++++++--------------
>  xen/drivers/passthrough/x86/iommu.c         |   41 ++++++++++++++
>  xen/include/asm-x86/hvm/iommu.h             |   28 ++++++++++
>  xen/include/asm-x86/iommu.h                 |    4 +-
>  xen/include/xen/hvm/iommu.h                 |   25 +--------
>  xen/include/xen/iommu.h                     |   16 +++---
>  13 files changed, 190 insertions(+), 154 deletions(-)
> 
> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
> index 26635ff..e55d9d5 100644
> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -745,7 +745,7 @@ long arch_do_domctl(
>                     "ioport_map:add: dom%d gport=%x mport=%x nr=%x\n",
>                     d->domain_id, fgp, fmp, np);
>  
> -            list_for_each_entry(g2m_ioport, &hd->g2m_ioport_list, list)
> +            list_for_each_entry(g2m_ioport, &hd->arch.g2m_ioport_list, list)
>                  if (g2m_ioport->mport == fmp )
>                  {
>                      g2m_ioport->gport = fgp;
> @@ -764,7 +764,7 @@ long arch_do_domctl(
>                  g2m_ioport->gport = fgp;
>                  g2m_ioport->mport = fmp;
>                  g2m_ioport->np = np;
> -                list_add_tail(&g2m_ioport->list, &hd->g2m_ioport_list);
> +                list_add_tail(&g2m_ioport->list, &hd->arch.g2m_ioport_list);
>              }
>              if ( !ret )
>                  ret = ioports_permit_access(d, fmp, fmp + np - 1);
> @@ -779,7 +779,7 @@ long arch_do_domctl(
>              printk(XENLOG_G_INFO
>                     "ioport_map:remove: dom%d gport=%x mport=%x nr=%x\n",
>                     d->domain_id, fgp, fmp, np);
> -            list_for_each_entry(g2m_ioport, &hd->g2m_ioport_list, list)
> +            list_for_each_entry(g2m_ioport, &hd->arch.g2m_ioport_list, list)
>                  if ( g2m_ioport->mport == fmp )
>                  {
>                      list_del(&g2m_ioport->list);
> diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
> index bf6309d..ddb03f8 100644
> --- a/xen/arch/x86/hvm/io.c
> +++ b/xen/arch/x86/hvm/io.c
> @@ -451,7 +451,7 @@ int dpci_ioport_intercept(ioreq_t *p)
>      unsigned int s = 0, e = 0;
>      int rc;
>  
> -    list_for_each_entry( g2m_ioport, &hd->g2m_ioport_list, list )
> +    list_for_each_entry( g2m_ioport, &hd->arch.g2m_ioport_list, list )
>      {
>          s = g2m_ioport->gport;
>          e = s + g2m_ioport->np;
> diff --git a/xen/arch/x86/tboot.c b/xen/arch/x86/tboot.c
> index ccde4a0..c40fe12 100644
> --- a/xen/arch/x86/tboot.c
> +++ b/xen/arch/x86/tboot.c
> @@ -230,7 +230,8 @@ static void tboot_gen_domain_integrity(const uint8_t key[TB_KEY_SIZE],
>          if ( !is_idle_domain(d) )
>          {
>              struct hvm_iommu *hd = domain_hvm_iommu(d);
> -            update_iommu_mac(&ctx, hd->pgd_maddr, agaw_to_level(hd->agaw));
> +            update_iommu_mac(&ctx, hd->arch.pgd_maddr,
> +                             agaw_to_level(hd->arch.agaw));
>          }
>      }
>  
> diff --git a/xen/drivers/passthrough/amd/iommu_guest.c b/xen/drivers/passthrough/amd/iommu_guest.c
> index 477de20..bd31bb5 100644
> --- a/xen/drivers/passthrough/amd/iommu_guest.c
> +++ b/xen/drivers/passthrough/amd/iommu_guest.c
> @@ -60,12 +60,12 @@ static uint16_t guest_bdf(struct domain *d, uint16_t machine_bdf)
>  
>  static inline struct guest_iommu *domain_iommu(struct domain *d)
>  {
> -    return domain_hvm_iommu(d)->g_iommu;
> +    return domain_hvm_iommu(d)->arch.g_iommu;
>  }
>  
>  static inline struct guest_iommu *vcpu_iommu(struct vcpu *v)
>  {
> -    return domain_hvm_iommu(v->domain)->g_iommu;
> +    return domain_hvm_iommu(v->domain)->arch.g_iommu;
>  }
>  
>  static void guest_iommu_enable(struct guest_iommu *iommu)
> @@ -886,7 +886,7 @@ int guest_iommu_init(struct domain* d)
>  
>      guest_iommu_reg_init(iommu);
>      iommu->domain = d;
> -    hd->g_iommu = iommu;
> +    hd->arch.g_iommu = iommu;
>  
>      tasklet_init(&iommu->cmd_buffer_tasklet,
>                   guest_iommu_process_command, (unsigned long)d);
> @@ -907,7 +907,7 @@ void guest_iommu_destroy(struct domain *d)
>      tasklet_kill(&iommu->cmd_buffer_tasklet);
>      xfree(iommu);
>  
> -    domain_hvm_iommu(d)->g_iommu = NULL;
> +    domain_hvm_iommu(d)->arch.g_iommu = NULL;
>  }
>  
>  static int guest_iommu_mmio_range(struct vcpu *v, unsigned long addr)
> diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
> index b79e470..ceb1c28 100644
> --- a/xen/drivers/passthrough/amd/iommu_map.c
> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> @@ -344,7 +344,7 @@ static int iommu_update_pde_count(struct domain *d, unsigned long pt_mfn,
>      struct hvm_iommu *hd = domain_hvm_iommu(d);
>      bool_t ok = 0;
>  
> -    ASSERT( spin_is_locked(&hd->mapping_lock) && pt_mfn );
> +    ASSERT( spin_is_locked(&hd->arch.mapping_lock) && pt_mfn );
>  
>      next_level = merge_level - 1;
>  
> @@ -398,7 +398,7 @@ static int iommu_merge_pages(struct domain *d, unsigned long pt_mfn,
>      unsigned long first_mfn;
>      struct hvm_iommu *hd = domain_hvm_iommu(d);
>  
> -    ASSERT( spin_is_locked(&hd->mapping_lock) && pt_mfn );
> +    ASSERT( spin_is_locked(&hd->arch.mapping_lock) && pt_mfn );
>  
>      table = map_domain_page(pt_mfn);
>      pde = table + pfn_to_pde_idx(gfn, merge_level);
> @@ -448,8 +448,8 @@ static int iommu_pde_from_gfn(struct domain *d, unsigned long pfn,
>      struct page_info *table;
>      struct hvm_iommu *hd = domain_hvm_iommu(d);
>  
> -    table = hd->root_table;
> -    level = hd->paging_mode;
> +    table = hd->arch.root_table;
> +    level = hd->arch.paging_mode;
>  
>      BUG_ON( table == NULL || level < IOMMU_PAGING_MODE_LEVEL_1 || 
>              level > IOMMU_PAGING_MODE_LEVEL_6 );
> @@ -557,11 +557,11 @@ static int update_paging_mode(struct domain *d, unsigned long gfn)
>      unsigned long old_root_mfn;
>      struct hvm_iommu *hd = domain_hvm_iommu(d);
>  
> -    level = hd->paging_mode;
> -    old_root = hd->root_table;
> +    level = hd->arch.paging_mode;
> +    old_root = hd->arch.root_table;
>      offset = gfn >> (PTE_PER_TABLE_SHIFT * (level - 1));
>  
> -    ASSERT(spin_is_locked(&hd->mapping_lock) && is_hvm_domain(d));
> +    ASSERT(spin_is_locked(&hd->arch.mapping_lock) && is_hvm_domain(d));
>  
>      while ( offset >= PTE_PER_TABLE_SIZE )
>      {
> @@ -587,8 +587,8 @@ static int update_paging_mode(struct domain *d, unsigned long gfn)
>  
>      if ( new_root != NULL )
>      {
> -        hd->paging_mode = level;
> -        hd->root_table = new_root;
> +        hd->arch.paging_mode = level;
> +        hd->arch.root_table = new_root;
>  
>          if ( !spin_is_locked(&pcidevs_lock) )
>              AMD_IOMMU_DEBUG("%s Try to access pdev_list "
> @@ -613,9 +613,9 @@ static int update_paging_mode(struct domain *d, unsigned long gfn)
>  
>                  /* valid = 0 only works for dom0 passthrough mode */
>                  amd_iommu_set_root_page_table((u32 *)device_entry,
> -                                              page_to_maddr(hd->root_table),
> +                                              page_to_maddr(hd->arch.root_table),
>                                                d->domain_id,
> -                                              hd->paging_mode, 1);
> +                                              hd->arch.paging_mode, 1);
>  
>                  amd_iommu_flush_device(iommu, req_id);
>                  bdf += pdev->phantom_stride;
> @@ -638,14 +638,14 @@ int amd_iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
>      unsigned long pt_mfn[7];
>      unsigned int merge_level;
>  
> -    BUG_ON( !hd->root_table );
> +    BUG_ON( !hd->arch.root_table );
>  
>      if ( iommu_use_hap_pt(d) )
>          return 0;
>  
>      memset(pt_mfn, 0, sizeof(pt_mfn));
>  
> -    spin_lock(&hd->mapping_lock);
> +    spin_lock(&hd->arch.mapping_lock);
>  
>      /* Since HVM domain is initialized with 2 level IO page table,
>       * we might need a deeper page table for lager gfn now */
> @@ -653,7 +653,7 @@ int amd_iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
>      {
>          if ( update_paging_mode(d, gfn) )
>          {
> -            spin_unlock(&hd->mapping_lock);
> +            spin_unlock(&hd->arch.mapping_lock);
>              AMD_IOMMU_DEBUG("Update page mode failed gfn = %lx\n", gfn);
>              domain_crash(d);
>              return -EFAULT;
> @@ -662,7 +662,7 @@ int amd_iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
>  
>      if ( iommu_pde_from_gfn(d, gfn, pt_mfn) || (pt_mfn[1] == 0) )
>      {
> -        spin_unlock(&hd->mapping_lock);
> +        spin_unlock(&hd->arch.mapping_lock);
>          AMD_IOMMU_DEBUG("Invalid IO pagetable entry gfn = %lx\n", gfn);
>          domain_crash(d);
>          return -EFAULT;
> @@ -684,7 +684,7 @@ int amd_iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
>          amd_iommu_flush_pages(d, gfn, 0);
>  
>      for ( merge_level = IOMMU_PAGING_MODE_LEVEL_2;
> -          merge_level <= hd->paging_mode; merge_level++ )
> +          merge_level <= hd->arch.paging_mode; merge_level++ )
>      {
>          if ( pt_mfn[merge_level] == 0 )
>              break;
> @@ -697,7 +697,7 @@ int amd_iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
>          if ( iommu_merge_pages(d, pt_mfn[merge_level], gfn, 
>                                 flags, merge_level) )
>          {
> -            spin_unlock(&hd->mapping_lock);
> +            spin_unlock(&hd->arch.mapping_lock);
>              AMD_IOMMU_DEBUG("Merge iommu page failed at level %d, "
>                              "gfn = %lx mfn = %lx\n", merge_level, gfn, 
> mfn);
>              domain_crash(d);
> @@ -706,7 +706,7 @@ int amd_iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
>      }
>  
>  out:
> -    spin_unlock(&hd->mapping_lock);
> +    spin_unlock(&hd->arch.mapping_lock);
>      return 0;
>  }
>  
> @@ -715,14 +715,14 @@ int amd_iommu_unmap_page(struct domain *d, unsigned long gfn)
>      unsigned long pt_mfn[7];
>      struct hvm_iommu *hd = domain_hvm_iommu(d);
>  
> -    BUG_ON( !hd->root_table );
> +    BUG_ON( !hd->arch.root_table );
>  
>      if ( iommu_use_hap_pt(d) )
>          return 0;
>  
>      memset(pt_mfn, 0, sizeof(pt_mfn));
>  
> -    spin_lock(&hd->mapping_lock);
> +    spin_lock(&hd->arch.mapping_lock);
>  
>      /* Since HVM domain is initialized with 2 level IO page table,
>       * we might need a deeper page table for lager gfn now */
> @@ -730,7 +730,7 @@ int amd_iommu_unmap_page(struct domain *d, unsigned long gfn)
>      {
>          if ( update_paging_mode(d, gfn) )
>          {
> -            spin_unlock(&hd->mapping_lock);
> +            spin_unlock(&hd->arch.mapping_lock);
>              AMD_IOMMU_DEBUG("Update page mode failed gfn = %lx\n", gfn);
>              domain_crash(d);
>              return -EFAULT;
> @@ -739,7 +739,7 @@ int amd_iommu_unmap_page(struct domain *d, unsigned long gfn)
>  
>      if ( iommu_pde_from_gfn(d, gfn, pt_mfn) || (pt_mfn[1] == 0) )
>      {
> -        spin_unlock(&hd->mapping_lock);
> +        spin_unlock(&hd->arch.mapping_lock);
>          AMD_IOMMU_DEBUG("Invalid IO pagetable entry gfn = %lx\n", gfn);
>          domain_crash(d);
>          return -EFAULT;
> @@ -747,7 +747,7 @@ int amd_iommu_unmap_page(struct domain *d, unsigned long gfn)
>  
>      /* mark PTE as 'page not present' */
>      clear_iommu_pte_present(pt_mfn[1], gfn);
> -    spin_unlock(&hd->mapping_lock);
> +    spin_unlock(&hd->arch.mapping_lock);
>  
>      amd_iommu_flush_pages(d, gfn, 0);
>  
> @@ -792,13 +792,13 @@ void amd_iommu_share_p2m(struct domain *d)
>      pgd_mfn = pagetable_get_mfn(p2m_get_pagetable(p2m_get_hostp2m(d)));
>      p2m_table = mfn_to_page(mfn_x(pgd_mfn));
>  
> -    if ( hd->root_table != p2m_table )
> +    if ( hd->arch.root_table != p2m_table )
>      {
> -        free_amd_iommu_pgtable(hd->root_table);
> -        hd->root_table = p2m_table;
> +        free_amd_iommu_pgtable(hd->arch.root_table);
> +        hd->arch.root_table = p2m_table;
>  
>          /* When sharing p2m with iommu, paging mode = 4 */
> -        hd->paging_mode = IOMMU_PAGING_MODE_LEVEL_4;
> +        hd->arch.paging_mode = IOMMU_PAGING_MODE_LEVEL_4;
>          AMD_IOMMU_DEBUG("Share p2m table with iommu: p2m table = %#lx\n",
>                          mfn_x(pgd_mfn));
>      }
> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> index 79f4a77..aeefabb 100644
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -120,7 +120,8 @@ static void amd_iommu_setup_domain_device(
>  
>      struct hvm_iommu *hd = domain_hvm_iommu(domain);
>  
> -    BUG_ON( !hd->root_table || !hd->paging_mode || !iommu->dev_table.buffer );
> +    BUG_ON( !hd->arch.root_table || !hd->arch.paging_mode ||
> +            !iommu->dev_table.buffer );
>  
>      if ( iommu_passthrough && (domain->domain_id == 0) )
>          valid = 0;
> @@ -138,8 +139,8 @@ static void amd_iommu_setup_domain_device(
>      {
>          /* bind DTE to domain page-tables */
>          amd_iommu_set_root_page_table(
> -            (u32 *)dte, page_to_maddr(hd->root_table), domain->domain_id,
> -            hd->paging_mode, valid);
> +            (u32 *)dte, page_to_maddr(hd->arch.root_table), domain->domain_id,
> +            hd->arch.paging_mode, valid);
>  
>          if ( pci_ats_device(iommu->seg, bus, pdev->devfn) &&
>               iommu_has_cap(iommu, PCI_CAP_IOTLB_SHIFT) )
> @@ -151,8 +152,8 @@ static void amd_iommu_setup_domain_device(
>                          "root table = %#"PRIx64", "
>                          "domain = %d, paging mode = %d\n",
>                          req_id, pdev->type,
> -                        page_to_maddr(hd->root_table),
> -                        domain->domain_id, hd->paging_mode);
> +                        page_to_maddr(hd->arch.root_table),
> +                        domain->domain_id, hd->arch.paging_mode);
>      }
>  
>      spin_unlock_irqrestore(&iommu->lock, flags);
> @@ -225,17 +226,17 @@ int __init amd_iov_detect(void)
>  static int allocate_domain_resources(struct hvm_iommu *hd)
>  {
>      /* allocate root table */
> -    spin_lock(&hd->mapping_lock);
> -    if ( !hd->root_table )
> +    spin_lock(&hd->arch.mapping_lock);
> +    if ( !hd->arch.root_table )
>      {
> -        hd->root_table = alloc_amd_iommu_pgtable();
> -        if ( !hd->root_table )
> +        hd->arch.root_table = alloc_amd_iommu_pgtable();
> +        if ( !hd->arch.root_table )
>          {
> -            spin_unlock(&hd->mapping_lock);
> +            spin_unlock(&hd->arch.mapping_lock);
>              return -ENOMEM;
>          }
>      }
> -    spin_unlock(&hd->mapping_lock);
> +    spin_unlock(&hd->arch.mapping_lock);
>      return 0;
>  }
>  
> @@ -262,14 +263,14 @@ static int amd_iommu_domain_init(struct domain *d)
>      /* allocate page directroy */
>      if ( allocate_domain_resources(hd) != 0 )
>      {
> -        if ( hd->root_table )
> -            free_domheap_page(hd->root_table);
> +        if ( hd->arch.root_table )
> +            free_domheap_page(hd->arch.root_table);
>          return -ENOMEM;
>      }
>  
>      /* For pv and dom0, stick with get_paging_mode(max_page)
>       * For HVM dom0, use 2 level page table at first */
> -    hd->paging_mode = is_hvm_domain(d) ?
> +    hd->arch.paging_mode = is_hvm_domain(d) ?
>                        IOMMU_PAGING_MODE_LEVEL_2 :
>                        get_paging_mode(max_page);
>  
> @@ -332,7 +333,7 @@ void amd_iommu_disable_domain_device(struct domain *domain,
>          AMD_IOMMU_DEBUG("Disable: device id = %#x, "
>                          "domain = %d, paging mode = %d\n",
>                          req_id,  domain->domain_id,
> -                        domain_hvm_iommu(domain)->paging_mode);
> +                        domain_hvm_iommu(domain)->arch.paging_mode);
>      }
>      spin_unlock_irqrestore(&iommu->lock, flags);
>  
> @@ -372,7 +373,7 @@ static int reassign_device(struct domain *source, struct domain *target,
>  
>      /* IO page tables might be destroyed after pci-detach the last device
>       * In this case, we have to re-allocate root table for next pci-attach.*/
> -    if ( t->root_table == NULL )
> +    if ( t->arch.root_table == NULL )
>          allocate_domain_resources(t);
>  
>      amd_iommu_setup_domain_device(target, iommu, devfn, pdev);
> @@ -454,13 +455,13 @@ static void deallocate_iommu_page_tables(struct domain *d)
>      if ( iommu_use_hap_pt(d) )
>          return;
>  
> -    spin_lock(&hd->mapping_lock);
> -    if ( hd->root_table )
> +    spin_lock(&hd->arch.mapping_lock);
> +    if ( hd->arch.root_table )
>      {
> -        deallocate_next_page_table(hd->root_table, hd->paging_mode);
> -        hd->root_table = NULL;
> +        deallocate_next_page_table(hd->arch.root_table, hd->arch.paging_mode);
> +        hd->arch.root_table = NULL;
>      }
> -    spin_unlock(&hd->mapping_lock);
> +    spin_unlock(&hd->arch.mapping_lock);
>  }
>  
>  
> @@ -591,11 +592,11 @@ static void amd_dump_p2m_table(struct domain *d)
>  {
>      struct hvm_iommu *hd  = domain_hvm_iommu(d);
>  
> -    if ( !hd->root_table ) 
> +    if ( !hd->arch.root_table ) 
>          return;
>  
> -    printk("p2m table has %d levels\n", hd->paging_mode);
> -    amd_dump_p2m_table_level(hd->root_table, hd->paging_mode, 0, 0);
> +    printk("p2m table has %d levels\n", hd->arch.paging_mode);
> +    amd_dump_p2m_table_level(hd->arch.root_table, hd->arch.paging_mode, 0, 0);
>  }
>  
>  const struct iommu_ops amd_iommu_ops = {
> diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
> index 8a2fdea..9cd996a 100644
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -117,10 +117,11 @@ static void __init parse_iommu_param(char *s)
>  int iommu_domain_init(struct domain *d)
>  {
>      struct hvm_iommu *hd = domain_hvm_iommu(d);
> +    int ret = 0;
>  
> -    spin_lock_init(&hd->mapping_lock);
> -    INIT_LIST_HEAD(&hd->g2m_ioport_list);
> -    INIT_LIST_HEAD(&hd->mapped_rmrrs);
> +    ret = arch_iommu_domain_init(d);
> +    if ( ret )
> +        return ret;
>  
>      if ( !iommu_enabled )
>          return 0;
> @@ -189,10 +190,7 @@ void iommu_teardown(struct domain *d)
>  
>  void iommu_domain_destroy(struct domain *d)
>  {
> -    struct hvm_iommu *hd  = domain_hvm_iommu(d);
> -    struct list_head *ioport_list, *rmrr_list, *tmp;
> -    struct g2m_ioport *ioport;
> -    struct mapped_rmrr *mrmrr;
> +    struct hvm_iommu *hd = domain_hvm_iommu(d);
>  
>      if ( !iommu_enabled || !hd->platform_ops )
>          return;
> @@ -200,20 +198,8 @@ void iommu_domain_destroy(struct domain *d)
>      if ( need_iommu(d) )
>          iommu_teardown(d);
>  
> -    list_for_each_safe ( ioport_list, tmp, &hd->g2m_ioport_list )
> -    {
> -        ioport = list_entry(ioport_list, struct g2m_ioport, list);
> -        list_del(&ioport->list);
> -        xfree(ioport);
> -    }
> -
> -    list_for_each_safe ( rmrr_list, tmp, &hd->mapped_rmrrs )
> -    {
> -        mrmrr = list_entry(rmrr_list, struct mapped_rmrr, list);
> -        list_del(&mrmrr->list);
> -        xfree(mrmrr);
> -    }
> -}
> +    arch_iommu_domain_destroy(d);
> + }
>  
>  int iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
>                     unsigned int flags)
> diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
> index d4be75c..8efe6f9 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -248,16 +248,16 @@ static u64 addr_to_dma_page_maddr(struct domain *domain, u64 addr, int alloc)
>      struct acpi_drhd_unit *drhd;
>      struct pci_dev *pdev;
>      struct hvm_iommu *hd = domain_hvm_iommu(domain);
> -    int addr_width = agaw_to_width(hd->agaw);
> +    int addr_width = agaw_to_width(hd->arch.agaw);
>      struct dma_pte *parent, *pte = NULL;
> -    int level = agaw_to_level(hd->agaw);
> +    int level = agaw_to_level(hd->arch.agaw);
>      int offset;
>      u64 pte_maddr = 0, maddr;
>      u64 *vaddr = NULL;
>  
>      addr &= (((u64)1) << addr_width) - 1;
> -    ASSERT(spin_is_locked(&hd->mapping_lock));
> -    if ( hd->pgd_maddr == 0 )
> +    ASSERT(spin_is_locked(&hd->arch.mapping_lock));
> +    if ( hd->arch.pgd_maddr == 0 )
>      {
>          /*
>           * just get any passthrough device in the domainr - assume user
> @@ -265,11 +265,11 @@ static u64 addr_to_dma_page_maddr(struct domain *domain, u64 addr, int alloc)
>           */
>          pdev = pci_get_pdev_by_domain(domain, -1, -1, -1);
>          drhd = acpi_find_matched_drhd_unit(pdev);
> -        if ( !alloc || ((hd->pgd_maddr = alloc_pgtable_maddr(drhd, 1)) == 0) )
> +        if ( !alloc || ((hd->arch.pgd_maddr = alloc_pgtable_maddr(drhd, 1)) == 0) )
>              goto out;
>      }
>  
> -    parent = (struct dma_pte *)map_vtd_domain_page(hd->pgd_maddr);
> +    parent = (struct dma_pte *)map_vtd_domain_page(hd->arch.pgd_maddr);
>      while ( level > 1 )
>      {
>          offset = address_level_offset(addr, level);
> @@ -579,7 +579,7 @@ static void __intel_iommu_iotlb_flush(struct domain *d, unsigned long gfn,
>      {
>          iommu = drhd->iommu;
>  
> -        if ( !test_bit(iommu->index, &hd->iommu_bitmap) )
> +        if ( !test_bit(iommu->index, &hd->arch.iommu_bitmap) )
>              continue;
>  
>          flush_dev_iotlb = find_ats_dev_drhd(iommu) ? 1 : 0;
> @@ -621,12 +621,12 @@ static void dma_pte_clear_one(struct domain *domain, u64 addr)
>      u64 pg_maddr;
>      struct mapped_rmrr *mrmrr;
>  
> -    spin_lock(&hd->mapping_lock);
> +    spin_lock(&hd->arch.mapping_lock);
>      /* get last level pte */
>      pg_maddr = addr_to_dma_page_maddr(domain, addr, 0);
>      if ( pg_maddr == 0 )
>      {
> -        spin_unlock(&hd->mapping_lock);
> +        spin_unlock(&hd->arch.mapping_lock);
>          return;
>      }
>  
> @@ -635,13 +635,13 @@ static void dma_pte_clear_one(struct domain *domain, u64 addr)
>  
>      if ( !dma_pte_present(*pte) )
>      {
> -        spin_unlock(&hd->mapping_lock);
> +        spin_unlock(&hd->arch.mapping_lock);
>          unmap_vtd_domain_page(page);
>          return;
>      }
>  
>      dma_clear_pte(*pte);
> -    spin_unlock(&hd->mapping_lock);
> +    spin_unlock(&hd->arch.mapping_lock);
>      iommu_flush_cache_entry(pte, sizeof(struct dma_pte));
>  
>      if ( !this_cpu(iommu_dont_flush_iotlb) )
> @@ -652,8 +652,8 @@ static void dma_pte_clear_one(struct domain *domain, u64 addr)
>      /* if the cleared address is between mapped RMRR region,
>       * remove the mapped RMRR
>       */
> -    spin_lock(&hd->mapping_lock);
> -    list_for_each_entry ( mrmrr, &hd->mapped_rmrrs, list )
> +    spin_lock(&hd->arch.mapping_lock);
> +    list_for_each_entry ( mrmrr, &hd->arch.mapped_rmrrs, list )
>      {
>          if ( addr >= mrmrr->base && addr <= mrmrr->end )
>          {
> @@ -662,7 +662,7 @@ static void dma_pte_clear_one(struct domain *domain, u64 addr)
>              break;
>          }
>      }
> -    spin_unlock(&hd->mapping_lock);
> +    spin_unlock(&hd->arch.mapping_lock);
>  }
>  
>  static void iommu_free_pagetable(u64 pt_maddr, int level)
> @@ -1247,7 +1247,7 @@ static int intel_iommu_domain_init(struct domain *d)
>  {
>      struct hvm_iommu *hd = domain_hvm_iommu(d);
>  
> -    hd->agaw = width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH);
> +    hd->arch.agaw = width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH);
>  
>      return 0;
>  }
> @@ -1344,16 +1344,16 @@ int domain_context_mapping_one(
>      }
>      else
>      {
> -        spin_lock(&hd->mapping_lock);
> +        spin_lock(&hd->arch.mapping_lock);
>  
>          /* Ensure we have pagetables allocated down to leaf PTE. */
> -        if ( hd->pgd_maddr == 0 )
> +        if ( hd->arch.pgd_maddr == 0 )
>          {
>              addr_to_dma_page_maddr(domain, 0, 1);
> -            if ( hd->pgd_maddr == 0 )
> +            if ( hd->arch.pgd_maddr == 0 )
>              {
>              nomem:
> -                spin_unlock(&hd->mapping_lock);
> +                spin_unlock(&hd->arch.mapping_lock);
>                  spin_unlock(&iommu->lock);
>                  unmap_vtd_domain_page(context_entries);
>                  return -ENOMEM;
> @@ -1361,7 +1361,7 @@ int domain_context_mapping_one(
>          }
>  
>          /* Skip top levels of page tables for 2- and 3-level DRHDs. */
> -        pgd_maddr = hd->pgd_maddr;
> +        pgd_maddr = hd->arch.pgd_maddr;
>          for ( agaw = level_to_agaw(4);
>                agaw != level_to_agaw(iommu->nr_pt_levels);
>                agaw-- )
> @@ -1379,7 +1379,7 @@ int domain_context_mapping_one(
>          else
>              context_set_translation_type(*context, CONTEXT_TT_MULTI_LEVEL);
>  
> -        spin_unlock(&hd->mapping_lock);
> +        spin_unlock(&hd->arch.mapping_lock);
>      }
>  
>      if ( context_set_domain_id(context, domain, iommu) )
> @@ -1405,7 +1405,7 @@ int domain_context_mapping_one(
>          iommu_flush_iotlb_dsi(iommu, 0, 1, flush_dev_iotlb);
>      }
>  
> -    set_bit(iommu->index, &hd->iommu_bitmap);
> +    set_bit(iommu->index, &hd->arch.iommu_bitmap);
>  
>      unmap_vtd_domain_page(context_entries);
>  
> @@ -1648,7 +1648,7 @@ static int domain_context_unmap(
>          struct hvm_iommu *hd = domain_hvm_iommu(domain);
>          int iommu_domid;
>  
> -        clear_bit(iommu->index, &hd->iommu_bitmap);
> +        clear_bit(iommu->index, &hd->arch.iommu_bitmap);
>  
>          iommu_domid = domain_iommu_domid(domain, iommu);
>          if ( iommu_domid == -1 )
> @@ -1707,10 +1707,10 @@ static void iommu_domain_teardown(struct domain *d)
>      if ( iommu_use_hap_pt(d) )
>          return;
>  
> -    spin_lock(&hd->mapping_lock);
> -    iommu_free_pagetable(hd->pgd_maddr, agaw_to_level(hd->agaw));
> -    hd->pgd_maddr = 0;
> -    spin_unlock(&hd->mapping_lock);
> +    spin_lock(&hd->arch.mapping_lock);
> +    iommu_free_pagetable(hd->arch.pgd_maddr, agaw_to_level(hd->arch.agaw));
> +    hd->arch.pgd_maddr = 0;
> +    spin_unlock(&hd->arch.mapping_lock);
>  }
>  
>  static int intel_iommu_map_page(
> @@ -1729,12 +1729,12 @@ static int intel_iommu_map_page(
>      if ( iommu_passthrough && (d->domain_id == 0) )
>          return 0;
>  
> -    spin_lock(&hd->mapping_lock);
> +    spin_lock(&hd->arch.mapping_lock);
>  
>      pg_maddr = addr_to_dma_page_maddr(d, (paddr_t)gfn << PAGE_SHIFT_4K, 1);
>      if ( pg_maddr == 0 )
>      {
> -        spin_unlock(&hd->mapping_lock);
> +        spin_unlock(&hd->arch.mapping_lock);
>          return -ENOMEM;
>      }
>      page = (struct dma_pte *)map_vtd_domain_page(pg_maddr);
> @@ -1751,14 +1751,14 @@ static int intel_iommu_map_page(
>  
>      if ( old.val == new.val )
>      {
> -        spin_unlock(&hd->mapping_lock);
> +        spin_unlock(&hd->arch.mapping_lock);
>          unmap_vtd_domain_page(page);
>          return 0;
>      }
>      *pte = new;
>  
>      iommu_flush_cache_entry(pte, sizeof(struct dma_pte));
> -    spin_unlock(&hd->mapping_lock);
> +    spin_unlock(&hd->arch.mapping_lock);
>      unmap_vtd_domain_page(page);
>  
>      if ( !this_cpu(iommu_dont_flush_iotlb) )
> @@ -1792,7 +1792,7 @@ void iommu_pte_flush(struct domain *d, u64 gfn, u64 *pte,
>      for_each_drhd_unit ( drhd )
>      {
>          iommu = drhd->iommu;
> -        if ( !test_bit(iommu->index, &hd->iommu_bitmap) )
> +        if ( !test_bit(iommu->index, &hd->arch.iommu_bitmap) )
>              continue;
>  
>          flush_dev_iotlb = find_ats_dev_drhd(iommu) ? 1 : 0;
> @@ -1833,7 +1833,7 @@ static void iommu_set_pgd(struct domain *d)
>          return;
>  
>      pgd_mfn = pagetable_get_mfn(p2m_get_pagetable(p2m_get_hostp2m(d)));
> -    hd->pgd_maddr = pagetable_get_paddr(pagetable_from_mfn(pgd_mfn));
> +    hd->arch.pgd_maddr = pagetable_get_paddr(pagetable_from_mfn(pgd_mfn));
>  }
>  
>  static int rmrr_identity_mapping(struct domain *d,
> @@ -1848,10 +1848,10 @@ static int rmrr_identity_mapping(struct domain *d,
>      ASSERT(rmrr->base_address < rmrr->end_address);
>  
>      /*
> -     * No need to acquire hd->mapping_lock, as the only theoretical race is
> +     * No need to acquire hd->arch.mapping_lock, as the only theoretical race is
>       * with the insertion below (impossible due to holding pcidevs_lock).
>       */
> -    list_for_each_entry( mrmrr, &hd->mapped_rmrrs, list )
> +    list_for_each_entry( mrmrr, &hd->arch.mapped_rmrrs, list )
>      {
>          if ( mrmrr->base == rmrr->base_address &&
>               mrmrr->end == rmrr->end_address )
> @@ -1876,9 +1876,9 @@ static int rmrr_identity_mapping(struct domain *d,
>          return -ENOMEM;
>      mrmrr->base = rmrr->base_address;
>      mrmrr->end = rmrr->end_address;
> -    spin_lock(&hd->mapping_lock);
> -    list_add_tail(&mrmrr->list, &hd->mapped_rmrrs);
> -    spin_unlock(&hd->mapping_lock);
> +    spin_lock(&hd->arch.mapping_lock);
> +    list_add_tail(&mrmrr->list, &hd->arch.mapped_rmrrs);
> +    spin_unlock(&hd->arch.mapping_lock);
>  
>      return 0;
>  }
> @@ -2423,8 +2423,8 @@ static void vtd_dump_p2m_table(struct domain *d)
>          return;
>  
>      hd = domain_hvm_iommu(d);
> -    printk("p2m table has %d levels\n", agaw_to_level(hd->agaw));
> -    vtd_dump_p2m_table_level(hd->pgd_maddr, agaw_to_level(hd->agaw), 0, 0);
> +    printk("p2m table has %d levels\n", agaw_to_level(hd->arch.agaw));
> +    vtd_dump_p2m_table_level(hd->arch.pgd_maddr, agaw_to_level(hd->arch.agaw), 0, 0);
>  }
>  
>  const struct iommu_ops intel_iommu_ops = {
> diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
> index c857ba8..68e308c 100644
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -40,6 +40,47 @@ int __init iommu_setup_hpet_msi(struct msi_desc *msi)
>      return ops->setup_hpet_msi ? ops->setup_hpet_msi(msi) : -ENODEV;
>  }
>  
> +void iommu_share_p2m_table(struct domain* d)
> +{
> +    const struct iommu_ops *ops = iommu_get_ops();
> +
> +    if ( iommu_enabled && is_hvm_domain(d) )
> +        ops->share_p2m(d);
> +}
> +
> +int arch_iommu_domain_init(struct domain *d)
> +{
> +    struct hvm_iommu *hd = domain_hvm_iommu(d);
> +
> +    spin_lock_init(&hd->arch.mapping_lock);
> +    INIT_LIST_HEAD(&hd->arch.g2m_ioport_list);
> +    INIT_LIST_HEAD(&hd->arch.mapped_rmrrs);
> +
> +    return 0;
> +}
> +
> +void arch_iommu_domain_destroy(struct domain *d)
> +{
> +   struct hvm_iommu *hd  = domain_hvm_iommu(d);
> +   struct list_head *ioport_list, *rmrr_list, *tmp;
> +   struct g2m_ioport *ioport;
> +   struct mapped_rmrr *mrmrr;
> +
> +   list_for_each_safe ( ioport_list, tmp, &hd->arch.g2m_ioport_list )
> +   {
> +       ioport = list_entry(ioport_list, struct g2m_ioport, list);
> +       list_del(&ioport->list);
> +       xfree(ioport);
> +   }
> +
> +    list_for_each_safe ( rmrr_list, tmp, &hd->arch.mapped_rmrrs )
> +    {
> +        mrmrr = list_entry(rmrr_list, struct mapped_rmrr, list);
> +        list_del(&mrmrr->list);
> +        xfree(mrmrr);
> +    }
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/include/asm-x86/hvm/iommu.h b/xen/include/asm-x86/hvm/iommu.h
> index d488edf..927a02d 100644
> --- a/xen/include/asm-x86/hvm/iommu.h
> +++ b/xen/include/asm-x86/hvm/iommu.h
> @@ -39,4 +39,32 @@ static inline int iommu_hardware_setup(void)
>      return 0;
>  }
>  
> +struct g2m_ioport {
> +    struct list_head list;
> +    unsigned int gport;
> +    unsigned int mport;
> +    unsigned int np;
> +};
> +
> +struct mapped_rmrr {
> +    struct list_head list;
> +    u64 base;
> +    u64 end;
> +};
> +
> +struct arch_hvm_iommu
> +{
> +    u64 pgd_maddr;                 /* io page directory machine address */
> +    int agaw;     /* adjusted guest address width, 0 is level 2 30-bit */
> +    u64 iommu_bitmap;              /* bitmap of iommu(s) that the domain uses */
> +    /* amd iommu support */
> +    int paging_mode;
> +    struct page_info *root_table;
> +    struct guest_iommu *g_iommu;
> +
> +    struct list_head g2m_ioport_list;   /* guest to machine ioport mapping */
> +    struct list_head mapped_rmrrs;
> +    spinlock_t mapping_lock;            /* io page table lock */
> +};
> +
>  #endif /* __ASM_X86_HVM_IOMMU_H__ */
> diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
> index 946291c..dc06ceb 100644
> --- a/xen/include/asm-x86/iommu.h
> +++ b/xen/include/asm-x86/iommu.h
> @@ -17,7 +17,9 @@
>  
>  #define MAX_IOMMUS 32
>  
> -#include <asm/msi.h>
> +/* Does this domain have a P2M table we can use as its IOMMU pagetable? */
> +#define iommu_use_hap_pt(d) (hap_enabled(d) && iommu_hap_pt_share)
> +#define domain_hvm_iommu(d)     (&d->arch.hvm_domain.hvm_iommu)
>  
>  void iommu_update_ire_from_apic(unsigned int apic, unsigned int reg, unsigned int value);
>  unsigned int iommu_read_apic_from_ire(unsigned int apic, unsigned int reg);
> diff --git a/xen/include/xen/hvm/iommu.h b/xen/include/xen/hvm/iommu.h
> index c9c10c1..f8f8a93 100644
> --- a/xen/include/xen/hvm/iommu.h
> +++ b/xen/include/xen/hvm/iommu.h
> @@ -23,31 +23,8 @@
>  #include <xen/iommu.h>
>  #include <asm/hvm/iommu.h>
>  
> -struct g2m_ioport {
> -    struct list_head list;
> -    unsigned int gport;
> -    unsigned int mport;
> -    unsigned int np;
> -};
> -
> -struct mapped_rmrr {
> -    struct list_head list;
> -    u64 base;
> -    u64 end;
> -};
> -
>  struct hvm_iommu {
> -    u64 pgd_maddr;                 /* io page directory machine address */
> -    spinlock_t mapping_lock;       /* io page table lock */
> -    int agaw;     /* adjusted guest address width, 0 is level 2 30-bit */
> -    struct list_head g2m_ioport_list;  /* guest to machine ioport mapping */
> -    u64 iommu_bitmap;              /* bitmap of iommu(s) that the domain uses */
> -    struct list_head mapped_rmrrs;
> -
> -    /* amd iommu support */
> -    int paging_mode;
> -    struct page_info *root_table;
> -    struct guest_iommu *g_iommu;
> +    struct arch_hvm_iommu arch;
>  
>      /* iommu_ops */
>      const struct iommu_ops *platform_ops;
> diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
> index cf61d163..f556a7e 100644
> --- a/xen/include/xen/iommu.h
> +++ b/xen/include/xen/iommu.h
> @@ -35,11 +35,6 @@ extern bool_t iommu_hap_pt_share;
>  extern bool_t iommu_debug;
>  extern bool_t amd_iommu_perdev_intremap;
>  
> -/* Does this domain have a P2M table we can use as its IOMMU pagetable? */
> -#define iommu_use_hap_pt(d) (hap_enabled(d) && iommu_hap_pt_share)
> -
> -#define domain_hvm_iommu(d)     (&d->arch.hvm_domain.hvm_iommu)
> -
>  #define PAGE_SHIFT_4K       (12)
>  #define PAGE_SIZE_4K        (1UL << PAGE_SHIFT_4K)
>  #define PAGE_MASK_4K        (((u64)-1) << PAGE_SHIFT_4K)
> @@ -55,6 +50,9 @@ void iommu_dom0_init(struct domain *d);
>  void iommu_domain_destroy(struct domain *d);
>  int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn);
>  
> +void arch_iommu_domain_destroy(struct domain *d);
> +int arch_iommu_domain_init(struct domain *d);
> +
>  /* Function used internally, use iommu_domain_destroy */
>  void iommu_teardown(struct domain *d);
>  
> @@ -81,9 +79,6 @@ struct hvm_irq_dpci *domain_get_irq_dpci(const struct domain *);
>  void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci);
>  bool_t pt_irq_need_timer(uint32_t flags);
>  
> -int iommu_update_ire_from_msi(struct msi_desc *msi_desc, struct msi_msg *msg);
> -void iommu_read_msi_from_ire(struct msi_desc *msi_desc, struct msi_msg *msg);
> -
>  #define PT_IRQ_TIME_OUT MILLISECS(8)
>  #endif /* HAS_PCI */
>  
> @@ -127,6 +122,11 @@ struct iommu_ops {
>      void (*dump_p2m_table)(struct domain *d);
>  };
>  
> +#ifdef HAS_PCI
> +int iommu_update_ire_from_msi(struct msi_desc *msi_desc, struct msi_msg *msg);
> +void iommu_read_msi_from_ire(struct msi_desc *msi_desc, struct msi_msg *msg);
> +#endif
> +
>  void iommu_suspend(void);
>  void iommu_resume(void);
>  void iommu_crash_shutdown(void);
> -- 
> 1.7.10.4

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 08/13] xen/passthrough: iommu: Basic support of device tree assignment
  2014-03-11 15:49 ` [PATCH v3 08/13] xen/passthrough: iommu: Basic support of device tree assignment Julien Grall
@ 2014-03-11 16:55   ` Jan Beulich
  2014-03-18 16:33   ` Ian Campbell
  1 sibling, 0 replies; 63+ messages in thread
From: Jan Beulich @ 2014-03-11 16:55 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, stefano.stabellini, ian.campbell, Xiantao Zhang, tim

>>> On 11.03.14 at 16:49, Julien Grall <julien.grall@linaro.org> wrote:
> Add IOMMU helpers to support device tree assignment/deassignment. This patch
> introduces 2 new fields in the dt_device_node:
>     - is_protected: Whether the device is protected by an IOMMU
>     - next_assigned: Pointer to the next device assigned to the same
>     domain
> 
> Signed-off-by: Julien Grall <julien.grall@linaro.org>
> Cc: Xiantao Zhang <xiantao.zhang@intel.com>

For the modifications to existing files:
Acked-by: Jan Beulich <jbeulich@suse.com>

> 
> ---
>     Changes in v3:
>         - Remove iommu_dt_domain_{init,destroy} call in common code. Let
>         architecture code call them
>         - Fix indentation in xen/include/xen/hvm/iommu.h
>     Changes in v2:
>         - Patch added
> ---
>  xen/common/device_tree.c              |    4 ++
>  xen/drivers/passthrough/Makefile      |    1 +
>  xen/drivers/passthrough/device_tree.c |  106 +++++++++++++++++++++++++++++++++
>  xen/include/xen/device_tree.h         |   14 +++++
>  xen/include/xen/hvm/iommu.h           |    6 ++
>  xen/include/xen/iommu.h               |   16 +++++
>  6 files changed, 147 insertions(+)
>  create mode 100644 xen/drivers/passthrough/device_tree.c
> 
> diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
> index 564f2bb..7c6b683 100644
> --- a/xen/common/device_tree.c
> +++ b/xen/common/device_tree.c
> @@ -1695,6 +1695,10 @@ static unsigned long __init unflatten_dt_node(const void *fdt,
>          np->full_name = ((char *)np) + sizeof(struct dt_device_node);
>          /* By default dom0 owns the device */
>          np->used_by = 0;
> +        /* By default the device is not protected */
> +        np->is_protected = false;
> +        INIT_LIST_HEAD(&np->next_assigned);
> +
>          if ( new_format )
>          {
>              char *fn = np->full_name;
> diff --git a/xen/drivers/passthrough/Makefile b/xen/drivers/passthrough/Makefile
> index 6e08f89..5a0a35e 100644
> --- a/xen/drivers/passthrough/Makefile
> +++ b/xen/drivers/passthrough/Makefile
> @@ -5,3 +5,4 @@ subdir-$(x86_64) += x86
>  obj-y += iommu.o
>  obj-$(x86) += io.o
>  obj-$(HAS_PCI) += pci.o
> +obj-$(HAS_DEVICE_TREE) += device_tree.o
> diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c
> new file mode 100644
> index 0000000..7384e73
> --- /dev/null
> +++ b/xen/drivers/passthrough/device_tree.c
> @@ -0,0 +1,106 @@
> +/*
> + * xen/drivers/passthrough/arm/device_tree.c
> + *
> + * Code to passthrough device tree node to a guest
> + *
> + * Julien Grall <julien.grall@linaro.org>
> + * Copyright (c) 2014 Linaro Limited.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +
> +#include <xen/lib.h>
> +#include <xen/sched.h>
> +#include <xen/iommu.h>
> +#include <xen/device_tree.h>
> +
> +static spinlock_t dtdevs_lock = SPIN_LOCK_UNLOCKED;
> +
> +int iommu_assign_dt_device(struct domain *d, struct dt_device_node *dev)
> +{
> +    int rc = -EBUSY;
> +    struct hvm_iommu *hd = domain_hvm_iommu(d);
> +
> +    if ( !iommu_enabled || !hd->platform_ops )
> +        return -EINVAL;
> +
> +    if ( !dt_device_is_protected(dev) )
> +        return -EINVAL;
> +
> +    spin_lock(&dtdevs_lock);
> +
> +    if ( !list_empty(&dev->next_assigned) )
> +        goto fail;
> +
> +    rc = hd->platform_ops->assign_dt_device(d, dev);
> +
> +    if ( rc )
> +        goto fail;
> +
> +    list_add(&dev->next_assigned, &hd->dt_devices);
> +    dt_device_set_used_by(dev, d->domain_id);
> +
> +fail:
> +    spin_unlock(&dtdevs_lock);
> +
> +    return rc;
> +}
> +
> +int iommu_deassign_dt_device(struct domain *d, struct dt_device_node *dev)
> +{
> +    struct hvm_iommu *hd = domain_hvm_iommu(d);
> +    int rc;
> +
> +    if ( !iommu_enabled || !hd->platform_ops )
> +        return -EINVAL;
> +
> +    if ( !dt_device_is_protected(dev) )
> +        return -EINVAL;
> +
> +    spin_lock(&dtdevs_lock);
> +
> +    rc = hd->platform_ops->reassign_dt_device(d, dom0, dev);
> +    if ( rc )
> +        goto fail;
> +
> +    dt_device_set_used_by(dev, dom0->domain_id);
> +
> +    list_del(&dev->next_assigned);
> +
> +fail:
> +    spin_unlock(&dtdevs_lock);
> +
> +    return rc;
> +}
> +
> +int iommu_dt_domain_init(struct domain *d)
> +{
> +    struct hvm_iommu *hd = domain_hvm_iommu(d);
> +
> +    INIT_LIST_HEAD(&hd->dt_devices);
> +
> +    return 0;
> +}
> +
> +void iommu_dt_domain_destroy(struct domain *d)
> +{
> +    struct hvm_iommu *hd = domain_hvm_iommu(d);
> +    struct dt_device_node *dev, *_dev;
> +    int rc;
> +
> +    list_for_each_entry_safe(dev, _dev, &hd->dt_devices, next_assigned)
> +    {
> +        rc = iommu_deassign_dt_device(d, dev);
> +        if ( rc )
> +            dprintk(XENLOG_ERR, "Failed to deassign %s in domain %u\n",
> +                    dt_node_full_name(dev), d->domain_id);
> +    }
> +}
> diff --git a/xen/include/xen/device_tree.h b/xen/include/xen/device_tree.h
> index d429e60..2aae047 100644
> --- a/xen/include/xen/device_tree.h
> +++ b/xen/include/xen/device_tree.h
> @@ -16,6 +16,7 @@
>  #include <xen/string.h>
>  #include <xen/types.h>
>  #include <xen/stdbool.h>
> +#include <xen/list.h>
>  
>  #define DEVICE_TREE_MAX_DEPTH 16
>  
> @@ -110,6 +111,9 @@ struct dt_device_node {
>      struct dt_device_node *next; /* TODO: Remove it. Only use to know the last children */
>      struct dt_device_node *allnext;
>  
> +    /* IOMMU specific fields */
> +    bool is_protected; /* Tell if the device is protected by an IOMMU */
> +    struct list_head next_assigned;
>  };
>  
>  #define MAX_PHANDLE_ARGS 16
> @@ -325,6 +329,16 @@ static inline domid_t dt_device_used_by(const struct dt_device_node *device)
>      return device->used_by;
>  }
>  
> +static inline void dt_device_set_protected(struct dt_device_node *device)
> +{
> +    device->is_protected = true;
> +}
> +
> +static inline bool dt_device_is_protected(const struct dt_device_node *device)
> +{
> +    return device->is_protected;
> +}
> +
>  static inline bool_t dt_property_name_is_equal(const struct dt_property *pp,
>                                                 const char *name)
>  {
> diff --git a/xen/include/xen/hvm/iommu.h b/xen/include/xen/hvm/iommu.h
> index f8f8a93..1259e16 100644
> --- a/xen/include/xen/hvm/iommu.h
> +++ b/xen/include/xen/hvm/iommu.h
> @@ -21,6 +21,7 @@
>  #define __XEN_HVM_IOMMU_H__
>  
>  #include <xen/iommu.h>
> +#include <xen/list.h>
>  #include <asm/hvm/iommu.h>
>  
>  struct hvm_iommu {
> @@ -28,6 +29,11 @@ struct hvm_iommu {
>  
>      /* iommu_ops */
>      const struct iommu_ops *platform_ops;
> +
> +#ifdef HAS_DEVICE_TREE
> +    /* List of DT devices assigned to this domain */
> +    struct list_head dt_devices;
> +#endif
>  };
>  
>  #endif /* __XEN_HVM_IOMMU_H__ */
> diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
> index f556a7e..56f6c5c 100644
> --- a/xen/include/xen/iommu.h
> +++ b/xen/include/xen/iommu.h
> @@ -82,6 +82,16 @@ bool_t pt_irq_need_timer(uint32_t flags);
>  #define PT_IRQ_TIME_OUT MILLISECS(8)
>  #endif /* HAS_PCI */
>  
> +#ifdef HAS_DEVICE_TREE
> +#include <xen/device_tree.h>
> +
> +int iommu_assign_dt_device(struct domain *d, struct dt_device_node *dev);
> +int iommu_deassign_dt_device(struct domain *d, struct dt_device_node *dev);
> +int iommu_dt_domain_init(struct domain *d);
> +void iommu_dt_domain_destroy(struct domain *d);
> +
> +#endif /* HAS_DEVICE_TREE */
> +
>  #ifdef HAS_PCI
>  struct msi_desc;
>  struct msi_msg;
> @@ -103,6 +113,12 @@ struct iommu_ops {
>      int (*update_ire_from_msi)(struct msi_desc *msi_desc, struct msi_msg *msg);
>      void (*read_msi_from_ire)(struct msi_desc *msi_desc, struct msi_msg *msg);
>  #endif /* HAS_PCI */
> +#ifdef HAS_DEVICE_TREE
> +    int (*assign_dt_device)(struct domain *d, const struct dt_device_node *dev);
> +    int (*reassign_dt_device)(struct domain *s, struct domain *t,
> +                              const struct dt_device_node *dev);
> +#endif
> +
>      void (*teardown)(struct domain *d);
>      int (*map_page)(struct domain *d, unsigned long gfn, unsigned long mfn,
>                      unsigned int flags);
> -- 
> 1.7.10.4

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 06/13] xen/passthrough: iommu: Split generic IOMMU code
  2014-03-11 16:50   ` Jan Beulich
@ 2014-03-11 17:09     ` Julien Grall
  2014-03-12  7:15       ` Jan Beulich
  0 siblings, 1 reply; 63+ messages in thread
From: Julien Grall @ 2014-03-11 17:09 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, stefano.stabellini, ian.campbell, Xiantao Zhang, tim

Hello Jan,

On 03/11/2014 04:50 PM, Jan Beulich wrote:
>>>> On 11.03.14 at 16:49, Julien Grall <julien.grall@linaro.org> wrote:
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> ...
>> +static int iommu_populate_page_table(struct domain *d)
>> +{
> 
> I continue to be of the opinion that this is misplaced here. There's
> nothing PCI-related in this function, and I doubt you can get away
> on ARM without similar code (if you can, this should go into
> .../x86/iommu.c imo).

On ARM, the page table is shared with the processor, so we don't need to
populate the page table. Furthermore, this function is using
"arch.relmem_list", which is not implemented on ARM.

I can move it into x86/iommu.c and implement the function as a no-op on ARM.
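Something along these lines for the ARM side (a minimal sketch, untested):

    /* xen/drivers/passthrough/arm/iommu.c -- hypothetical no-op */
    int iommu_populate_page_table(struct domain *d)
    {
        /* The IOMMU shares the p2m with the processor on ARM, so the
         * page table is already populated; nothing to do here. */
        return 0;
    }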

> 
>> --- /dev/null
>> +++ b/xen/include/asm-x86/iommu.h
>> ...
>> +void iommu_set_dom0_mapping(struct domain *d);
> 
> How is this x86-specific?

iommu_set_dom0_mapping is implemented in vtd/x86/vtd.c and only used by
vtd/iommu.c.

I think this function should be renamed to iommu_vtd_set_dom0_mapping.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 06/13] xen/passthrough: iommu: Split generic IOMMU code
  2014-03-11 17:09     ` Julien Grall
@ 2014-03-12  7:15       ` Jan Beulich
  0 siblings, 0 replies; 63+ messages in thread
From: Jan Beulich @ 2014-03-12  7:15 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, stefano.stabellini, ian.campbell, Xiantao Zhang, tim

>>> On 11.03.14 at 18:09, Julien Grall <julien.grall@linaro.org> wrote:
> On 03/11/2014 04:50 PM, Jan Beulich wrote:
>>>>> On 11.03.14 at 16:49, Julien Grall <julien.grall@linaro.org> wrote:
>>> --- a/xen/drivers/passthrough/pci.c
>>> +++ b/xen/drivers/passthrough/pci.c
>>> ...
>>> +static int iommu_populate_page_table(struct domain *d)
>>> +{
>> 
>> I continue to be of the opinion that this is misplaced here. There's
>> nothing PCI-related in this function, and I doubt you can get away
>> on ARM without similar code (if you can, this should go into
>> .../x86/iommu.c imo).
> 
> On ARM, the page table is shared with the processor, so we don't need to
> populate the page table. Furthermore, this function is using
> "arch.relmem_list", which is not implemented on ARM.
> 
> I can move it into x86/iommu.c and implement the function as a no-op on ARM.

Yes, please.

>>> --- /dev/null
>>> +++ b/xen/include/asm-x86/iommu.h
>>> ...
>>> +void iommu_set_dom0_mapping(struct domain *d);
>> 
>> How is this x86-specific?
> 
> iommu_set_dom0_mapping is implemented in vtd/x86/vtd.c and only used by
> vtd/iommu.c.
> 
> I think this function should be renamed to iommu_vtd_set_dom0_mapping.

Ah, indeed. But apart from naming the function properly, it being
VT-d only its declaration should be moved into
xen/drivers/passthrough/vtd/extern.h instead.

Jan

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 02/13] xen/passthrough: amd: Remove domain_id from hvm_iommu
  2014-03-11 15:49 ` [PATCH v3 02/13] xen/passthrough: amd: Remove domain_id from hvm_iommu Julien Grall
@ 2014-03-18 16:19   ` Ian Campbell
  2014-03-18 16:32     ` Jan Beulich
  0 siblings, 1 reply; 63+ messages in thread
From: Ian Campbell @ 2014-03-18 16:19 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Jan Beulich, tim, Suravee Suthikulpanit, stefano.stabellini

On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
> The structure hvm_iommu contains a shadow value of domain->domain_id. There
> is no reason to not directly use domain->domain_id.
> 
> Signed-off-by: Julien Grall <julien.grall@linaro.org>
> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> Cc: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Ian Campbell <ian.campbell@citrix.com>

(needs ack from one of the above though)

Ian.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 04/13] xen/dts: Add dt_parse_phandle_with_args and dt_parse_phandle
  2014-03-11 15:49 ` [PATCH v3 04/13] xen/dts: Add dt_parse_phandle_with_args and dt_parse_phandle Julien Grall
@ 2014-03-18 16:20   ` Ian Campbell
  0 siblings, 0 replies; 63+ messages in thread
From: Ian Campbell @ 2014-03-18 16:20 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, tim, stefano.stabellini

On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
> Code adapted from linux drivers/of/base.c (commit ef42c58).
> 
> Signed-off-by: Julien Grall <julien.grall@linaro.org>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 05/13] xen/passthrough: rework dom0_pvh_reqs to use it also on ARM
  2014-03-11 15:49 ` [PATCH v3 05/13] xen/passthrough: rework dom0_pvh_reqs to use it also on ARM Julien Grall
@ 2014-03-18 16:22   ` Ian Campbell
  2014-03-18 17:28     ` Julien Grall
  0 siblings, 1 reply; 63+ messages in thread
From: Ian Campbell @ 2014-03-18 16:22 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, tim, Xiantao Zhang, stefano.stabellini

On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
> DOM0 on ARM will have the same requirements as DOM0 PVH when iommu is enabled.
> Both PVH and ARM guests have paging mode translate enabled, so Xen can use it
> to know if it needs to check the requirements.
> 
> Rename the function and remove "pvh" word in the panic message.
> 
> Signed-off-by: Julien Grall <julien.grall@linaro.org>
> Acked-by: Jan Beulich <jbeulich@suse.com>
> Cc: Xiantao Zhang <xiantao.zhang@intel.com>
> 
> ---
>     Changes in v2:
>         - IOMMU can be disabled on ARM if the platform doesn't have
>         IOMMU.
> ---
>  xen/drivers/passthrough/iommu.c |   13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
> index c70165a..3c63f87 100644
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -130,13 +130,17 @@ int iommu_domain_init(struct domain *d)
>      return hd->platform_ops->init(d);
>  }
>  
> -static __init void check_dom0_pvh_reqs(struct domain *d)
> +static __init void check_dom0_reqs(struct domain *d)
>  {
> -    if ( !iommu_enabled )
> +    if ( !paging_mode_translate(d) )
> +        return;
> +
> +    if ( is_pvh_domain(d) && !iommu_enabled )

Is is_pvh_domain going to be exposed to common code on ARM?

Or would an arch_check_dom0_reqs be useful here?

>          panic("Presently, iommu must be enabled for pvh dom0\n");
>  
>      if ( iommu_passthrough )
> -        panic("For pvh dom0, dom0-passthrough must not be enabled\n");
> +        panic("Dom0 uses translate paging mode, dom0-passthrough must not be "

"paging translated mode" reads more natural to me.

> +              "enabled\n");
>  
>      iommu_dom0_strict = 1;
>  }
> @@ -145,8 +149,7 @@ void __init iommu_dom0_init(struct domain *d)
>  {
>      struct hvm_iommu *hd = domain_hvm_iommu(d);
>  
> -    if ( is_pvh_domain(d) )
> -        check_dom0_pvh_reqs(d);
> +    check_dom0_reqs(d);
>  
>      if ( !iommu_enabled )
>          return;

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 06/13] xen/passthrough: iommu: Split generic IOMMU code
  2014-03-11 15:49 ` [PATCH v3 06/13] xen/passthrough: iommu: Split generic IOMMU code Julien Grall
  2014-03-11 16:50   ` Jan Beulich
@ 2014-03-18 16:24   ` Ian Campbell
  2014-03-18 17:36     ` Julien Grall
  1 sibling, 1 reply; 63+ messages in thread
From: Ian Campbell @ 2014-03-18 16:24 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Jan Beulich, tim, Xiantao Zhang, stefano.stabellini

On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
> The generic IOMMU framework code (xen/drivers/passthrough/iommu.c) contains
> functions specific to x86 and PCI.
> 
> Split the framework into 3 distinct files:
>     - iommu.c: contains generic functions shared between x86 and ARM
>                (when it will be supported)
>     - pci.c: contains specific functions for PCI passthrough
>     - x86/iommu.c: contains specific functions for x86
> 
> io.c contains x86 HVM-specific code. It is only compiled for x86.

Move it to x86/io.c then ?

(no other comments above what Jan has already said)

Ian.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 07/13] xen/passthrough: iommu: Introduce arch specific code
  2014-03-11 15:49 ` [PATCH v3 07/13] xen/passthrough: iommu: Introduce arch specific code Julien Grall
  2014-03-11 16:15   ` Julien Grall
  2014-03-11 16:53   ` Jan Beulich
@ 2014-03-18 16:27   ` Ian Campbell
  2014-03-18 19:40     ` Julien Grall
  2 siblings, 1 reply; 63+ messages in thread
From: Ian Campbell @ 2014-03-18 16:27 UTC (permalink / raw)
  To: Julien Grall
  Cc: Keir Fraser, Suravee Suthikulpanit, Shane Wang, Joseph Cihula,
	tim, stefano.stabellini, Jan Beulich, xen-devel, Gang Wei,
	Xiantao Zhang

On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
> +struct arch_hvm_iommu
> +{
> +    u64 pgd_maddr;                 /* io page directory machine address */
> +    int agaw;     /* adjusted guest address width, 0 is level 2 30-bit */
> +    u64 iommu_bitmap;              /* bitmap of iommu(s) that the domain uses */

Blank line here for clarity?

> +    /* amd iommu support */
> +    int paging_mode;
> +    struct page_info *root_table;
> +    struct guest_iommu *g_iommu;
> +

I don't think the following are amd specific, in their original home
they were up with pgd_maddr and co any way. If they are to stay here
perhaps a new /* heading */ comment would help?

> +    struct list_head g2m_ioport_list;   /* guest to machine ioport mapping */
> +    struct list_head mapped_rmrrs;
> +    spinlock_t mapping_lock;            /* io page table lock */
> +};
> +

Ian.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 02/13] xen/passthrough: amd: Remove domain_id from hvm_iommu
  2014-03-18 16:19   ` Ian Campbell
@ 2014-03-18 16:32     ` Jan Beulich
  0 siblings, 0 replies; 63+ messages in thread
From: Jan Beulich @ 2014-03-18 16:32 UTC (permalink / raw)
  To: Suravee Suthikulpanit, Ian Campbell, Julien Grall
  Cc: xen-devel, stefano.stabellini, tim

>>> On 18.03.14 at 17:19, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
>> The structure hvm_iommu contains a shadow value of domain->domain_id. There
>> is no reason to not directly use domain->domain_id.
>> 
>> Signed-off-by: Julien Grall <julien.grall@linaro.org>
>> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>> Cc: Jan Beulich <jbeulich@suse.com>
> 
> Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
> 
> (needs ack from one of the above though)

Right, I was expecting Suravee to take care of this, because I think
it was him (or someone else at AMD) who introduced that field (so
we don't overlook something non-obvious).

Jan

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 08/13] xen/passthrough: iommu: Basic support of device tree assignment
  2014-03-11 15:49 ` [PATCH v3 08/13] xen/passthrough: iommu: Basic support of device tree assignment Julien Grall
  2014-03-11 16:55   ` Jan Beulich
@ 2014-03-18 16:33   ` Ian Campbell
  2014-03-18 19:46     ` Julien Grall
  1 sibling, 1 reply; 63+ messages in thread
From: Ian Campbell @ 2014-03-18 16:33 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Jan Beulich, tim, Xiantao Zhang, stefano.stabellini

On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
> Add IOMMU helpers to support device tree assignment/deassignment. This patch
> introduces 2 new fields in the dt_device_node:
>     - is_protected: Whether the device is protected by an IOMMU
>     - next_assigned: Pointer to the next device assigned to the same
>     domain

Am I correct that this list is not maintained for dom0? The behaviour of
dt_assign_device and dt_deassign_device seems to rely on it?

> Signed-off-by: Julien Grall <julien.grall@linaro.org>
> Cc: Xiantao Zhang <xiantao.zhang@intel.com>
> Cc: Jan Beulich <jbeulich@suse.com>

Ian.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 09/13] xen/passthrough: Introduce IOMMU ARM architecture
  2014-03-11 15:49 ` [PATCH v3 09/13] xen/passthrough: Introduce IOMMU ARM architecture Julien Grall
@ 2014-03-18 16:40   ` Ian Campbell
  2014-03-18 19:58     ` Julien Grall
  0 siblings, 1 reply; 63+ messages in thread
From: Ian Campbell @ 2014-03-18 16:40 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Jan Beulich, tim, Xiantao Zhang, stefano.stabellini

On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
> @@ -754,7 +766,7 @@ static int map_device(struct domain *d, const struct dt_device_node *dev)
>  }
>  
>  static int handle_node(struct domain *d, struct kernel_info *kinfo,
> -                       const struct dt_device_node *node)
> +                       struct dt_device_node *node)
>  {
>      static const struct dt_device_match skip_matches[] __initconst =
>      {
> @@ -775,7 +787,7 @@ static int handle_node(struct domain *d, struct kernel_info *kinfo,
>          DT_MATCH_TIMER,
>          { /* sentinel */ },
>      };
> -    const struct dt_device_node *child;
> +    struct dt_device_node *child;

Why do these consts become unwanted?

> diff --git a/xen/drivers/passthrough/arm/iommu.c b/xen/drivers/passthrough/arm/iommu.c
> new file mode 100644
> index 0000000..b0bd71d
> --- /dev/null
> +++ b/xen/drivers/passthrough/arm/iommu.c
[...]
> +int __init iommu_hardware_setup(void)
> +{
> +    struct dt_device_node *np;
> +    int rc;
> +    unsigned int num_iommus = 0;
> +
> +    dt_for_each_device_node(dt_host, np)

I can't find dt_host in this or any of the previous patches.

> +    {
> +        rc = device_init(np, DEVICE_IOMMU, NULL);
> +        if ( !rc )
> +            num_iommus++;
> +    }
> +
> +    return ( num_iommus > 0 ) ? 0 : -ENODEV;
> +}
> +
> +int arch_iommu_domain_init(struct domain *d)
> +{
> +    int ret;
> +
> +    ret = iommu_dt_domain_init(d);
> +
> +    return ret;

return iommu_dt_domain_init(d);
?

> diff --git a/xen/include/asm-arm/iommu.h b/xen/include/asm-arm/iommu.h
> new file mode 100644
> index 0000000..81eec83
> --- /dev/null
> +++ b/xen/include/asm-arm/iommu.h
> [...]
> +#define domain_hvm_iommu(d) (&d->arch.hvm_domain.hvm_iommu)

Does this macro give us the freedom to avoid the term "hvm" a bit and
use d->arch.iommu?

Ian.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 11/13] xen/arm: Don't give IOMMU devices to dom0 when iommu is disabled
  2014-03-11 15:49 ` [PATCH v3 11/13] xen/arm: Don't give IOMMU devices to dom0 when iommu is disabled Julien Grall
@ 2014-03-18 16:41   ` Ian Campbell
  0 siblings, 0 replies; 63+ messages in thread
From: Ian Campbell @ 2014-03-18 16:41 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, tim, stefano.stabellini

On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
> When iommu={disable,off,no,false} is given to Xen command line, the IOMMU
> framework won't specify that the device shouldn't be passed through to DOM0.
> 
> Signed-off-by: Julien Grall <julien.grall@linaro.org>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 12/13] xen/arm: Add the property "protected-devices" in the hypervisor node
  2014-03-11 15:49 ` [PATCH v3 12/13] xen/arm: Add the property "protected-devices" in the hypervisor node Julien Grall
@ 2014-03-18 16:48   ` Ian Campbell
  2014-03-18 20:09     ` Julien Grall
  0 siblings, 1 reply; 63+ messages in thread
From: Ian Campbell @ 2014-03-18 16:48 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, tim, stefano.stabellini

On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
> DOM0 is using the swiotlb to bounce DMA. With the IOMMU support in Xen,
> protected devices should not use it.
> 
> Only Xen is able to know if an IOMMU protects the device. The new property
> "protected-devices" is a list of device phandles protected by an IOMMU.
> 
> Signed-off-by: Julien Grall <julien.grall@linaro.org>
> 
> ---
>     This patch *MUST NOT* be applied until we agree on a device binding
>     with the device tree folks. DOM0 can run safely with swiotlb on protected
>     devices while LVM is not used for guest disk.

LVM works these days I think.

> 
>     Changes in v2:
>         - Patch added
> ---
>  xen/arch/arm/domain_build.c |   51 ++++++++++++++++++++++++++++++++++++++-----
>  xen/arch/arm/kernel.h       |    3 +++
>  2 files changed, 48 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index 2438aa0..565784a 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -324,19 +324,22 @@ static int make_memory_node(const struct domain *d,
>      return res;
>  }
>  
> -static int make_hypervisor_node(struct domain *d,
> -                                void *fdt, const struct dt_device_node *parent)
> +static int make_hypervisor_node(struct domain *d, struct kernel_info *kinfo,
> +                                const struct dt_device_node *parent)
>  {
>      const char compat[] =
>          "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
>          "xen,xen";
>      __be32 reg[4];
>      gic_interrupt_t intr;
> -    __be32 *cells;
> +    __be32 *cells, *_cells;
>      int res;
>      int addrcells = dt_n_addr_cells(parent);
>      int sizecells = dt_n_size_cells(parent);
>      paddr_t gnttab_start, gnttab_size;
> +    const struct dt_device_node *dev;
> +    struct hvm_iommu *hd = domain_hvm_iommu(d);
> +    void *fdt = kinfo->fdt;
>  
>      DPRINT("Create hypervisor node\n");
>  
> @@ -384,6 +387,39 @@ static int make_hypervisor_node(struct domain *d,
>      if ( res )
>          return res;
>  
> +    if ( kinfo->num_dev_protected )
> +    {
> +        /* Don't need to take dtdevs_lock here */

Why not? Please explain in the comment.

> +        cells = xmalloc_array(__be32, kinfo->num_dev_protected *
> +                              dt_size_to_cells(sizeof(dt_phandle)));
> +        if ( !cells )
> +            return -FDT_ERR_XEN(ENOMEM);
> +
> +        _cells = cells;

Odd numbers of leading _ are reserved for the compiler IIRC. Even
numbers are reserved for the libc/environment, which is why we can get
away with such names in hypervisor context.

But let's just skirt the whole issue and pick a name which doesn't use a
leading _. cells_iter or c or something.

Is there no interface to say "make an fdt_property(name, size)"
returning the data to be filled in?

Ian.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 13/13] drivers/passthrough: arm: Add support for SMMU drivers
  2014-03-11 15:49 ` [PATCH v3 13/13] drivers/passthrough: arm: Add support for SMMU drivers Julien Grall
@ 2014-03-18 16:54   ` Ian Campbell
  2014-03-18 20:25     ` Julien Grall
  0 siblings, 1 reply; 63+ messages in thread
From: Ian Campbell @ 2014-03-18 16:54 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Jan Beulich, tim, Xiantao Zhang, stefano.stabellini

On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
> This patch adds support for the ARM architected SMMU driver. It's based on the
> Linux driver (drivers/iommu/arm-smmu) commit 89ac23cd.
> 
> The major differences with the Linux driver are:
>     - Fault by default if the SMMU is enabled to translate an
>     address (Linux is bypassing the SMMU)
>     - Using P2M page table instead of creating new one
>     - Dropped stage-1 support
>     - Dropped chained SMMUs support for now
>     - Reworking device assignment and the different structures
> 
> Xen is programming each IOMMU by:
>     - Using stage-2 mode translation
>     - Sharing the page table with the processor
>     - Injecting a fault if the device has made a wrong translation
> 
> Signed-off-by: Julien Grall<julien.grall@linaro.org>
> Cc: Xiantao Zhang <xiantao.zhang@intel.com>
> Cc: Jan Beulich <jbeulich@suse.com>

I don't think it is sensible to try and review this code in great detail
given you no doubt did so as you imported it. I looked through the bits
which seemed like they were new Xen code rather than imported Linux
code. It mostly looks good: one question and one grammar nit.

> +static __init void arm_smmu_device_reset(struct arm_smmu_device *smmu)
> +{
> +[...]
> +    /* Don't upgrade barriers */
> +    reg &= ~(SMMU_sCR0_BSU_MASK << SMMU_sCR0_BSU_SHIFT);

No? Is that safe when a vcpu migrates around pCPUs?

> +
> +static int __init smmu_init(struct dt_device_node *dev,
> +                            const void *data)
> +{
> +    struct arm_smmu_device *smmu;
> +    int res;
> +    u64 addr, size;
> +    unsigned int num_irqs, i;
> +    struct dt_phandle_args masterspec;
> +    struct rb_node *node;
> +
> +    /* Even if the device can't be initialized, we don't want to give to
> +     * dom0 the smmu device

"we don't want to give the smmu device to dom0"

Ian.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 05/13] xen/passthrough: rework dom0_pvh_reqs to use it also on ARM
  2014-03-18 16:22   ` Ian Campbell
@ 2014-03-18 17:28     ` Julien Grall
  2014-03-18 17:50       ` Ian Campbell
  0 siblings, 1 reply; 63+ messages in thread
From: Julien Grall @ 2014-03-18 17:28 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, tim, Xiantao Zhang, stefano.stabellini

Hi Ian,

On 03/18/2014 04:22 PM, Ian Campbell wrote:
> On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
>> DOM0 on ARM will have the same requirements as DOM0 PVH when iommu is enabled.
>> Both PVH and ARM guests have paging mode translate enabled, so Xen can use it
>> to know if it needs to check the requirements.
>>
>> Rename the function and remove "pvh" word in the panic message.
>>
>> Signed-off-by: Julien Grall <julien.grall@linaro.org>
>> Acked-by: Jan Beulich <jbeulich@suse.com>
>> Cc: Xiantao Zhang <xiantao.zhang@intel.com>
>>
>> ---
>>     Changes in v2:
>>         - IOMMU can be disabled on ARM if the platform doesn't have
>>         IOMMU.
>> ---
>>  xen/drivers/passthrough/iommu.c |   13 ++++++++-----
>>  1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
>> index c70165a..3c63f87 100644
>> --- a/xen/drivers/passthrough/iommu.c
>> +++ b/xen/drivers/passthrough/iommu.c
>> @@ -130,13 +130,17 @@ int iommu_domain_init(struct domain *d)
>>      return hd->platform_ops->init(d);
>>  }
>>  
>> -static __init void check_dom0_pvh_reqs(struct domain *d)
>> +static __init void check_dom0_reqs(struct domain *d)
>>  {
>> -    if ( !iommu_enabled )
>> +    if ( !paging_mode_translate(d) )
>> +        return;
>> +
>> +    if ( is_pvh_domain(d) && !iommu_enabled )
> 
> Is is_pvh_domain going to be exposed to common code on ARM?

It will be exposed to common code. For now is_pvh_domain is part of
xen/sched.h; do you plan to move it into asm-x86?

> Or would an arch_check_dom0_reqs be useful here?

I will update patch #7 to create this function.
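Roughly like this (a sketch only, name and placement not final):

    /* xen/drivers/passthrough/x86/iommu.c */
    void arch_check_dom0_reqs(struct domain *d)
    {
        if ( is_pvh_domain(d) && !iommu_enabled )
            panic("Presently, iommu must be enabled for pvh dom0\n");
    }

with an empty stub on the ARM side, so the common code keeps only the
paging_mode_translate() and iommu_passthrough checks.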

>>          panic("Presently, iommu must be enabled for pvh dom0\n");
>>  
>>      if ( iommu_passthrough )
>> -        panic("For pvh dom0, dom0-passthrough must not be enabled\n");
>> +        panic("Dom0 uses translate paging mode, dom0-passthrough must not be "
> 
> "paging translated mode" reads more natural to me.

I will change it in the next version.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 06/13] xen/passthrough: iommu: Split generic IOMMU code
  2014-03-18 16:24   ` Ian Campbell
@ 2014-03-18 17:36     ` Julien Grall
  2014-03-18 17:50       ` Ian Campbell
  0 siblings, 1 reply; 63+ messages in thread
From: Julien Grall @ 2014-03-18 17:36 UTC (permalink / raw)
  To: Ian Campbell
  Cc: xen-devel, Jan Beulich, tim, Xiantao Zhang, stefano.stabellini

Hi Ian,

On 03/18/2014 04:24 PM, Ian Campbell wrote:
> On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
>> The generic IOMMU framework code (xen/drivers/passthrough/iommu.c) contains
>> functions specific to x86 and PCI.
>>
>> Split the framework into 3 distinct files:
>>     - iommu.c: contains generic functions shared between x86 and ARM
>>                (when it will be supported)
>>     - pci.c: contains specific functions for PCI passthrough
>>     - x86/iommu.c: contains specific functions for x86
>>
>> io.c contains x86 HVM-specific code. It is only compiled for x86.
> 
> Move it to x86/io.c then ?

For now it contains mostly x86 code. The file contains functions to
handle pt_irq. I'm not 100% sure if we will need some part of it for ARM.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 05/13] xen/passthrough: rework dom0_pvh_reqs to use it also on ARM
  2014-03-18 17:28     ` Julien Grall
@ 2014-03-18 17:50       ` Ian Campbell
  2014-03-18 18:19         ` Julien Grall
  0 siblings, 1 reply; 63+ messages in thread
From: Ian Campbell @ 2014-03-18 17:50 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, tim, Xiantao Zhang, stefano.stabellini

On Tue, 2014-03-18 at 17:28 +0000, Julien Grall wrote:
> Hi Ian,
> 
> On 03/18/2014 04:22 PM, Ian Campbell wrote:
> > On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
> >> DOM0 on ARM will have the same requirements as DOM0 PVH when iommu is enabled.
> >> Both PVH and ARM guests have paging mode translate enabled, so Xen can use it
> >> to know if it needs to check the requirements.
> >>
> >> Rename the function and remove "pvh" word in the panic message.
> >>
> >> Signed-off-by: Julien Grall <julien.grall@linaro.org>
> >> Acked-by: Jan Beulich <jbeulich@suse.com>
> >> Cc: Xiantao Zhang <xiantao.zhang@intel.com>
> >>
> >> ---
> >>     Changes in v2:
> >>         - IOMMU can be disabled on ARM if the platform doesn't have
> >>         IOMMU.
> >> ---
> >>  xen/drivers/passthrough/iommu.c |   13 ++++++++-----
> >>  1 file changed, 8 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
> >> index c70165a..3c63f87 100644
> >> --- a/xen/drivers/passthrough/iommu.c
> >> +++ b/xen/drivers/passthrough/iommu.c
> >> @@ -130,13 +130,17 @@ int iommu_domain_init(struct domain *d)
> >>      return hd->platform_ops->init(d);
> >>  }
> >>  
> >> -static __init void check_dom0_pvh_reqs(struct domain *d)
> >> +static __init void check_dom0_reqs(struct domain *d)
> >>  {
> >> -    if ( !iommu_enabled )
> >> +    if ( !paging_mode_translate(d) )
> >> +        return;
> >> +
> >> +    if ( is_pvh_domain(d) && !iommu_enabled )
> > 
> > Is is_pvh_domain going to be exposed to common code on ARM?
> 
> It will be exposed to common code. For now is_pvh_domain is part of
> xen/sched.h; do you plan to move it into asm-x86?

Oh, I hadn't realised it was already common. I don't plan to move it
myself.

> > Or would an arch_check_dom0_reqs be useful here?
> 
> I will update patch #7 to create this function.

I suppose given that is_pvh_domain is already common this isn't really
necessary, although maybe someone would thank you in the future for
reducing the common code use of is_pvh_domain a little...

> 
> >>          panic("Presently, iommu must be enabled for pvh dom0\n");
> >>  
> >>      if ( iommu_passthrough )
> >> -        panic("For pvh dom0, dom0-passthrough must not be enabled\n");
> >> +        panic("Dom0 uses translate paging mode, dom0-passthrough must not be "
> > 
> > "paging translated mode" reads more natural to me.
> 
> I will change in the next version.

Thanks.
Ian.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 06/13] xen/passthrough: iommu: Split generic IOMMU code
  2014-03-18 17:36     ` Julien Grall
@ 2014-03-18 17:50       ` Ian Campbell
  2014-03-18 18:21         ` Julien Grall
  0 siblings, 1 reply; 63+ messages in thread
From: Ian Campbell @ 2014-03-18 17:50 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Jan Beulich, tim, Xiantao Zhang, stefano.stabellini

On Tue, 2014-03-18 at 17:36 +0000, Julien Grall wrote:
> Hi Ian,
> 
> On 03/18/2014 04:24 PM, Ian Campbell wrote:
> > On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
> >> The generic IOMMU framework code (xen/drivers/passthrough/iommu.c) contains
> >> functions specific to x86 and PCI.
> >>
> >> Split the framework into 3 distinct files:
> >>     - iommu.c: contains generic functions shared between x86 and ARM
> >>                (when it will be supported)
> >>     - pci.c: contains specific functions for PCI passthrough
> >>     - x86/iommu.c: contains specific functions for x86
> >>
> >> io.c contains x86 HVM-specific code. It is only compiled for x86.
> > 
> > Move it to x86/io.c then ?
> 
> For now it contains mostly x86 code. The file contains functions to
> handle pt_irq. I'm not 100% sure if we will need some part of it for ARM.

Split the certainly-x86 code out into x86/io.c then and leave the
plausibly-common stuff in io.c?

Ian

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 05/13] xen/passthrough: rework dom0_pvh_reqs to use it also on ARM
  2014-03-18 17:50       ` Ian Campbell
@ 2014-03-18 18:19         ` Julien Grall
  2014-03-19 10:01           ` Ian Campbell
  0 siblings, 1 reply; 63+ messages in thread
From: Julien Grall @ 2014-03-18 18:19 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, tim, Xiantao Zhang, stefano.stabellini

On 03/18/2014 05:50 PM, Ian Campbell wrote:
> On Tue, 2014-03-18 at 17:28 +0000, Julien Grall wrote:
>> It will be exposed to common code. For now is_pvh_domain is part of
>> xen/sched.h; do you plan to move it into asm-x86?
> 
> Oh, I hadn't realised it was already common. I don't plan to move it
> myself.

It might be a good thing to prevent people from using is_{hvm,pv,pvh}_domain
in common code.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 06/13] xen/passthrough: iommu: Split generic IOMMU code
  2014-03-18 17:50       ` Ian Campbell
@ 2014-03-18 18:21         ` Julien Grall
  2014-03-19 10:02           ` Ian Campbell
  0 siblings, 1 reply; 63+ messages in thread
From: Julien Grall @ 2014-03-18 18:21 UTC (permalink / raw)
  To: Ian Campbell
  Cc: xen-devel, Jan Beulich, tim, Xiantao Zhang, stefano.stabellini

On 03/18/2014 05:50 PM, Ian Campbell wrote:
>> For now it contains mostly x86 code. The file contains functions to
>> handle pt_irq. I'm not 100% sure if we will need some part of it for ARM.
> 
> Split the certainly-x86 code out into x86/io.c then and leave the
> plausibly-common stuff in io.c?

Do you mind if I leave this part as-is until someone works on device
passthrough?

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 07/13] xen/passthrough: iommu: Introduce arch specific code
  2014-03-18 16:27   ` Ian Campbell
@ 2014-03-18 19:40     ` Julien Grall
  0 siblings, 0 replies; 63+ messages in thread
From: Julien Grall @ 2014-03-18 19:40 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Keir Fraser, Suravee Suthikulpanit, Shane Wang, Joseph Cihula,
	tim, stefano.stabellini, Jan Beulich, xen-devel, Gang Wei,
	Xiantao Zhang

Hi Ian,

On 03/18/2014 04:27 PM, Ian Campbell wrote:
> On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
>> +struct arch_hvm_iommu
>> +{
>> +    u64 pgd_maddr;                 /* io page directory machine address */
>> +    int agaw;     /* adjusted guest address width, 0 is level 2 30-bit */
>> +    u64 iommu_bitmap;              /* bitmap of iommu(s) that the domain uses */
> 
> Blank line here for clarity?

Sure.

>> +    /* amd iommu support */
>> +    int paging_mode;
>> +    struct page_info *root_table;
>> +    struct guest_iommu *g_iommu;
>> +
> 
> I don't think the following are amd specific, in their original home
> they were up with pgd_maddr and co any way. If they are to stay here
> perhaps a new /* heading */ comment would help?

I don't remember why I changed the order. I will go back to the original
order in the next version.

Regards,


-- 
Julien Grall

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 08/13] xen/passthrough: iommu: Basic support of device tree assignment
  2014-03-18 16:33   ` Ian Campbell
@ 2014-03-18 19:46     ` Julien Grall
  2014-03-19 10:12       ` Ian Campbell
  0 siblings, 1 reply; 63+ messages in thread
From: Julien Grall @ 2014-03-18 19:46 UTC (permalink / raw)
  To: Ian Campbell
  Cc: xen-devel, Jan Beulich, tim, Xiantao Zhang, stefano.stabellini

Hi Ian,

On 03/18/2014 04:33 PM, Ian Campbell wrote:
> On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
>> Add IOMMU helpers to support device tree assignment/deassignment. This patch
>> introduces 2 new fields in the dt_device_node:
>>     - is_protected: Whether the device is protected by an IOMMU
>>     - next_assigned: Pointer to the next device assigned to the same
>>     domain
> 
> Am I correct that this list is not maintained for dom0? The behaviour of
> dt_assign_device and dt_deassign_device seems to rely on it?

DOM0 will call dt_assign_device for every device protected by the IOMMU
(see patch #9), so the list is maintained for dom0 as well.
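The relevant part of map_device in patch #9 is roughly (paraphrasing):

    if ( dt_device_is_protected(dev) )
    {
        rc = iommu_assign_dt_device(d, dev);
        if ( rc )
            return rc;
    }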

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 09/13] xen/passthrough: Introduce IOMMU ARM architecture
  2014-03-18 16:40   ` Ian Campbell
@ 2014-03-18 19:58     ` Julien Grall
  2014-03-19 10:29       ` Ian Campbell
  0 siblings, 1 reply; 63+ messages in thread
From: Julien Grall @ 2014-03-18 19:58 UTC (permalink / raw)
  To: Ian Campbell
  Cc: xen-devel, Jan Beulich, tim, Xiantao Zhang, stefano.stabellini

Hi Ian,

On 03/18/2014 04:40 PM, Ian Campbell wrote:
> On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
>> @@ -754,7 +766,7 @@ static int map_device(struct domain *d, const struct dt_device_node *dev)
>>  }
>>  
>>  static int handle_node(struct domain *d, struct kernel_info *kinfo,
>> -                       const struct dt_device_node *node)
>> +                       struct dt_device_node *node)
>>  {
>>      static const struct dt_device_match skip_matches[] __initconst =
>>      {
>> @@ -775,7 +787,7 @@ static int handle_node(struct domain *d, struct kernel_info *kinfo,
>>          DT_MATCH_TIMER,
>>          { /* sentinel */ },
>>      };
>> -    const struct dt_device_node *child;
>> +    struct dt_device_node *child;
> 
> Why do these consts become unwanted?

Because map_device now calls iommu_assign_dt_device, which updates
next_assigned in the dt_device_node structure.

>> diff --git a/xen/drivers/passthrough/arm/iommu.c b/xen/drivers/passthrough/arm/iommu.c
>> new file mode 100644
>> index 0000000..b0bd71d
>> --- /dev/null
>> +++ b/xen/drivers/passthrough/arm/iommu.c
> [...]
>> +int __init iommu_hardware_setup(void)
>> +{
>> +    struct dt_device_node *np;
>> +    int rc;
>> +    unsigned int num_iommus = 0;
>> +
>> +    dt_for_each_device_node(dt_host, np)
> 
> I can't find dt_host in this or any of the previous patches.

dt_host was defined a while ago by the device tree code (see
xen/include/xen/device_tree.h).
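For reference, it is the global root of the unflattened tree:

    /* In xen/include/xen/device_tree.h */
    extern struct dt_device_node *dt_host;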

>> +    {
>> +        rc = device_init(np, DEVICE_IOMMU, NULL);
>> +        if ( !rc )
>> +            num_iommus++;
>> +    }
>> +
>> +    return ( num_iommus > 0 ) ? 0 : -ENODEV;
>> +}
>> +
>> +int arch_iommu_domain_init(struct domain *d)
>> +{
>> +    int ret;
>> +
>> +    ret = iommu_dt_domain_init(d);
>> +
>> +    return ret;
> 
> return iommu_dt_domain_init(d);
> ?

I will do the change in the next version.

>> diff --git a/xen/include/asm-arm/iommu.h b/xen/include/asm-arm/iommu.h
>> new file mode 100644
>> index 0000000..81eec83
>> --- /dev/null
>> +++ b/xen/include/asm-arm/iommu.h
>> [...]
>> +#define domain_hvm_iommu(d) (&d->arch.hvm_domain.hvm_iommu)
> 
> Does this macro give us the freedom to avoid the term "hvm" a bit and
> use d->arch.iommu?

It's possible; I just blindly copied it from x86.
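If we rename the field at the same time, the macro could simply become
(a sketch, assuming a matching "iommu" field is added to struct
arch_domain):

    #define domain_iommu(d) (&(d)->arch.iommu)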

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 12/13] xen/arm: Add the property "protected-devices" in the hypervisor node
  2014-03-18 16:48   ` Ian Campbell
@ 2014-03-18 20:09     ` Julien Grall
  2014-03-19 10:33       ` Ian Campbell
  0 siblings, 1 reply; 63+ messages in thread
From: Julien Grall @ 2014-03-18 20:09 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, tim, stefano.stabellini

Hi Ian,

On 03/18/2014 04:48 PM, Ian Campbell wrote:
> On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
>> DOM0 is using the swiotlb to bounce DMA. With the IOMMU support in Xen,
>> protected devices should not use it.
>>
>> Only Xen is able to know if an IOMMU protects the device. The new property
>> "protected-devices" is a list of device phandles protected by an IOMMU.
>>
>> Signed-off-by: Julien Grall <julien.grall@linaro.org>
>>
>> ---
>>     This patch *MUST NOT* be applied until we agree on a device binding
>>     with the device tree folks. DOM0 can run safely with swiotlb on protected
>>     devices while LVM is not used for guest disk.
> 
> LVM works these days I think.

With this patch series applied, LVM will be broken if the hard drive is
protected by an IOMMU. That is the case on Midway, where the platform
crashes just after the guest begins to boot.

>> @@ -384,6 +387,39 @@ static int make_hypervisor_node(struct domain *d,
>>      if ( res )
>>          return res;
>>  
>> +    if ( kinfo->num_dev_protected )
>> +    {
>> +        /* Don't need to take dtdevs_lock here */
> 
> Why not? Please explain in the comment.

Because building dom0 is done while only 1 CPU is online (i.e. CPU0). I
thought it was obvious; I will update the comment.
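Something like this (sketch):

    /*
     * dtdevs_lock is not taken here: dom0 is built while only the
     * boot CPU is online, so nothing can race with this code.
     */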

>> +        cells = xmalloc_array(__be32, kinfo->num_dev_protected *
>> +                              dt_size_to_cells(sizeof(dt_phandle)));
>> +        if ( !cells )
>> +            return -FDT_ERR_XEN(ENOMEM);
>> +
>> +        _cells = cells;
> 
> Odd numbers of leading _ are reserved for the compiler IIRC. Even
> numbers are reserved for the libc/environment, which is why we can get
> away with such names in hypervisor context.
> 
> But let's just skirt the whole issue and pick a name which doesn't use a
> leading _. cells_iter or c or something.
> 
> Is there no interface to say "make an fdt_property(name, size)"
> returning the data to be filled in?

No, every helper requires the property data to be passed in as a parameter.

I will rename _cells into cells_iter.
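If libfdt ever grows an interface along these lines (hypothetical, it
doesn't exist today), the temporary array could go away entirely:

    /* Add a property of the given size and return a pointer to its
     * (uninitialised) value for the caller to fill in afterwards. */
    int fdt_property_placeholder(void *fdt, const char *name,
                                 int len, void **valp);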

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 13/13] drivers/passthrough: arm: Add support for SMMU drivers
  2014-03-18 16:54   ` Ian Campbell
@ 2014-03-18 20:25     ` Julien Grall
  2014-03-19 10:35       ` Ian Campbell
  0 siblings, 1 reply; 63+ messages in thread
From: Julien Grall @ 2014-03-18 20:25 UTC (permalink / raw)
  To: Ian Campbell
  Cc: xen-devel, Jan Beulich, tim, Xiantao Zhang, stefano.stabellini

Hi Ian,

On 03/18/2014 04:54 PM, Ian Campbell wrote:
> On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
>> This patch adds support for the ARM architected SMMU driver. It's based on the
>> Linux driver (drivers/iommu/arm-smmu) commit 89ac23cd.
>>
>> The major differences with the Linux driver are:
>>     - Fault by default if the SMMU is enabled to translate an
>>     address (Linux is bypassing the SMMU)
>>     - Using P2M page table instead of creating new one
>>     - Dropped stage-1 support
>>     - Dropped chained SMMUs support for now
>>     - Reworking device assignment and the different structures
>>
>> Xen is programming each IOMMU by:
>>     - Using stage-2 mode translation
>>     - Sharing the page table with the processor
>>     - Injecting a fault if the device has made a wrong translation
>>
>> Signed-off-by: Julien Grall<julien.grall@linaro.org>
>> Cc: Xiantao Zhang <xiantao.zhang@intel.com>
>> Cc: Jan Beulich <jbeulich@suse.com>
> 
> I don't think it is sensible to try and review this code in great detail
> given you no doubt did so as you imported it. I looked through the bits
> which seemed like they were new Xen code rather than imported Linux
> code. It mostly looks good: one question and one grammar nit.
> 
>> +static __init void arm_smmu_device_reset(struct arm_smmu_device *smmu)
>> +{
>> +[...]
>> +    /* Don't upgrade barriers */
>> +    reg &= ~(SMMU_sCR0_BSU_MASK << SMMU_sCR0_BSU_SHIFT);
> 
> No? Is that safe when a vcpu migrates around pCPUs?

From the SMMU doc 9.6.3, this field is only used when client devices are
not mapped to a translation context bank.

By default, the policy in Xen is to deny every transaction that doesn't
have a valid mapping. So we are safe.

I can update the comment if you want, or, even better, remove this code.
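
For instance (a sketch of the reworded comment, not final wording):

    /*
     * Don't upgrade barriers. Per the SMMU doc (9.6.3), the BSU field
     * only affects client devices that are not mapped to a translation
     * context bank. Xen's default policy is to fault any transaction
     * without a valid mapping, so the field is never consulted; clear
     * it to a known value anyway.
     */
    reg &= ~(SMMU_sCR0_BSU_MASK << SMMU_sCR0_BSU_SHIFT);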

>> +
>> +static int __init smmu_init(struct dt_device_node *dev,
>> +                            const void *data)
>> +{
>> +    struct arm_smmu_device *smmu;
>> +    int res;
>> +    u64 addr, size;
>> +    unsigned int num_irqs, i;
>> +    struct dt_phandle_args masterspec;
>> +    struct rb_node *node;
>> +
>> +    /* Even if the device can't be initialized, we don't want to give to
>> +     * dom0 the smmu device
> 
> "we don't want to give the smmu device to dom0"

Will do the modification in the next version.

Regards,

-- 
Julien Grall


* Re: [PATCH v3 05/13] xen/passthrough: rework dom0_pvh_reqs to use it also on ARM
  2014-03-18 18:19         ` Julien Grall
@ 2014-03-19 10:01           ` Ian Campbell
  0 siblings, 0 replies; 63+ messages in thread
From: Ian Campbell @ 2014-03-19 10:01 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, tim, Xiantao Zhang, stefano.stabellini

On Tue, 2014-03-18 at 18:19 +0000, Julien Grall wrote:
> On 03/18/2014 05:50 PM, Ian Campbell wrote:
> > On Tue, 2014-03-18 at 17:28 +0000, Julien Grall wrote:
> >> It will be exposed to common code. For now is_pvh_domain is part of
> >> xen/sched.h; do you plan to move it to asm-x86?
> > 
> > Oh, I hadn't realised it was already common. I don't plan to move it
> > myself.
> 
> It might be a good thing to prevent people from using
> is_{hvm,pv,pvh}_domain in common code.

Certainly it's a worthy cause.

Ian.


* Re: [PATCH v3 06/13] xen/passthrough: iommu: Split generic IOMMU code
  2014-03-18 18:21         ` Julien Grall
@ 2014-03-19 10:02           ` Ian Campbell
  0 siblings, 0 replies; 63+ messages in thread
From: Ian Campbell @ 2014-03-19 10:02 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Jan Beulich, tim, Xiantao Zhang, stefano.stabellini

On Tue, 2014-03-18 at 18:21 +0000, Julien Grall wrote:
> On 03/18/2014 05:50 PM, Ian Campbell wrote:
>> For now it contains mostly x86 code. The file contains functions to
>> handle pt_irq. I'm not 100% sure if we will need some parts for ARM.
> > 
> > Split the certainly-x86 code out into x86/io.c then and leave the
> > plausibly-common stuff in io.c?
> 
> Do you mind if I leave this part as-is until someone works on device
> passthrough?

OK.


* Re: [PATCH v3 08/13] xen/passthrough: iommu: Basic support of device tree assignment
  2014-03-18 19:46     ` Julien Grall
@ 2014-03-19 10:12       ` Ian Campbell
  2014-03-19 10:42         ` Julien Grall
  0 siblings, 1 reply; 63+ messages in thread
From: Ian Campbell @ 2014-03-19 10:12 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Jan Beulich, tim, Xiantao Zhang, stefano.stabellini

On Tue, 2014-03-18 at 19:46 +0000, Julien Grall wrote:
> Hi Ian,
> 
> On 03/18/2014 04:33 PM, Ian Campbell wrote:
> > On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
> >> Add IOMMU helpers to support device tree assignment/deassignment. This patch
> >> introduces 2 new fields in the dt_device_node:
> >>     - is_protected: Whether the device is protected by an IOMMU
> >>     - next_assigned: Pointer to the next device assigned to the same
> >>     domain
> > 
> > Am I correct that this list is not maintained for dom0? The behaviour of
> > dt_assign_device and dt_deassign_device seems to rely on it?
> 
> DOM0 will call dt_assign_device for every device protected by the IOMMU
> (see patch #9). So dom0 will itself have a maintained list.

Does this not mean that iommu_assign_dt_device will refuse to assign the
device to another domain (since the next_assigned list is not empty)?

BTW, next_assigned is not a very good name for this list, since the
"nextness" is abstracted away. Consider just device_list.

Ian.


* Re: [PATCH v3 09/13] xen/passthrough: Introduce IOMMU ARM architecture
  2014-03-18 19:58     ` Julien Grall
@ 2014-03-19 10:29       ` Ian Campbell
  0 siblings, 0 replies; 63+ messages in thread
From: Ian Campbell @ 2014-03-19 10:29 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Jan Beulich, tim, Xiantao Zhang, stefano.stabellini

On Tue, 2014-03-18 at 19:58 +0000, Julien Grall wrote:
> Hi Ian,
> 
> On 03/18/2014 04:40 PM, Ian Campbell wrote:
> > On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
> >> @@ -754,7 +766,7 @@ static int map_device(struct domain *d, const struct dt_device_node *dev)
> >>  }
> >>  
> >>  static int handle_node(struct domain *d, struct kernel_info *kinfo,
> >> -                       const struct dt_device_node *node)
> >> +                       struct dt_device_node *node)
> >>  {
> >>      static const struct dt_device_match skip_matches[] __initconst =
> >>      {
> >> @@ -775,7 +787,7 @@ static int handle_node(struct domain *d, struct kernel_info *kinfo,
> >>          DT_MATCH_TIMER,
> >>          { /* sentinel */ },
> >>      };
> >> -    const struct dt_device_node *child;
> >> +    struct dt_device_node *child;
> > 
> > Why do these consts become unwanted?
> 
> Because map_device now calls iommu_assign_dt_device which will update
> next_assigned in the structure dt_device_node.

OK, makes sense.

> >> diff --git a/xen/drivers/passthrough/arm/iommu.c b/xen/drivers/passthrough/arm/iommu.c
> >> new file mode 100644
> >> index 0000000..b0bd71d
> >> --- /dev/null
> >> +++ b/xen/drivers/passthrough/arm/iommu.c
> > [...]
> >> +int __init iommu_hardware_setup(void)
> >> +{
> >> +    struct dt_device_node *np;
> >> +    int rc;
> >> +    unsigned int num_iommus = 0;
> >> +
> >> +    dt_for_each_device_node(dt_host, np)
> > 
> > I can't find dt_host in this or any of the previous patches.
> 
> dt_host was defined a while ago by the device tree code (see
> xen/include/xen/device_tree.h).

Doh, I didn't think to look at the existing code ;-)

> >> diff --git a/xen/include/asm-arm/iommu.h b/xen/include/asm-arm/iommu.h
> >> new file mode 100644
> >> index 0000000..81eec83
> >> --- /dev/null
> >> +++ b/xen/include/asm-arm/iommu.h
> >> [...]
> >> +#define domain_hvm_iommu(d) (&d->arch.hvm_domain.hvm_iommu)
> > 
> > Does this macro give us the freedom to avoid the term "hvm" a bit and
> > use d->arch.iommu?
> 
> It's possible, I just blindly copied from x86.
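
Something like this, perhaps (a sketch; it assumes the per-domain state
would move to d->arch.iommu, which is not what this series does today):

    /* Hypothetical ARM variant, avoiding the "hvm" term: */
    #define domain_iommu(d)  (&(d)->arch.iommu)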

Ian.


* Re: [PATCH v3 12/13] xen/arm: Add the property "protected-devices" in the hypervisor node
  2014-03-18 20:09     ` Julien Grall
@ 2014-03-19 10:33       ` Ian Campbell
  2014-04-03 21:51         ` Julien Grall
  0 siblings, 1 reply; 63+ messages in thread
From: Ian Campbell @ 2014-03-19 10:33 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, tim, stefano.stabellini

On Tue, 2014-03-18 at 20:09 +0000, Julien Grall wrote:
> Hi Ian,
> 
> On 03/18/2014 04:48 PM, Ian Campbell wrote:
> > On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
> >> DOM0 is using the swiotlb to bounce DMA. With the IOMMU support in Xen,
> >> protected devices should not use it.
> >>
> >> Only Xen is able to know if an IOMMU protects the device. The new property
> >> "protected-devices" is a list of device phandles protected by an IOMMU.
> >>
> >> Signed-off-by: Julien Grall <julien.grall@linaro.org>
> >>
> >> ---
> >>     This patch *MUST NOT* be applied until we have agreed on a device
> >>     binding with the device tree folks. DOM0 can run safely with the
> >>     swiotlb on protected devices as long as LVM is not used for guest
> >>     disks.
> > 
> > LVM works these days I think.
> 
> With this patch series applied, LVM will be broken if the hard drive is
> protected by an IOMMU.

How/why?

> That's the case on Midway: the platform will crash just after the guest
> begins to boot.

This configuration works today (osstest tests it) so this would be a
regression. Can you sort this please?

> >> @@ -384,6 +387,39 @@ static int make_hypervisor_node(struct domain *d,
> >>      if ( res )
> >>          return res;
> >>  
> >> +    if ( kinfo->num_dev_protected )
> >> +    {
> >> +        /* Don't need to take dtdevs_lock here */
> > 
> > Why not? Please explain in the comment.
> 
> Because building dom0 is done with only one CPU online (i.e. CPU0). I
> thought it was obvious; I will update the comment.

Locking is never obvious IMHO.

> >> +        cells = xmalloc_array(__be32, kinfo->num_dev_protected *
> >> +                              dt_size_to_cells(sizeof(dt_phandle)));
> >> +        if ( !cells )
> >> +            return -FDT_ERR_XEN(ENOMEM);
> >> +
> >> +        _cells = cells;
> > 
> > Odd numbers of leading _ are reserved for the compiler IIRC. Even
> > numbers are reserved for the libc/environment which is why we can get
> > away with such names in hypervisor context.
> > 
> > But lets just skirt the whole issue and pick a name which doesn't use a
> > leading _. cells_iter or c or something.
> > 
> > Is there no interface to say "make an fdt_property(name, size)"
> > returning the data to be filled in?
> 
> No, every helper requires the input data to be passed in as a parameter.

fdt_setprop_inplace isn't helpful because you need to be able to create
the empty prop, which doesn't exist. Oh well.


> I will rename _cells to cells_iter.

Thanks.

Ian.


* Re: [PATCH v3 13/13] drivers/passthrough: arm: Add support for SMMU drivers
  2014-03-18 20:25     ` Julien Grall
@ 2014-03-19 10:35       ` Ian Campbell
  2014-03-19 10:44         ` Julien Grall
  0 siblings, 1 reply; 63+ messages in thread
From: Ian Campbell @ 2014-03-19 10:35 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Jan Beulich, tim, Xiantao Zhang, stefano.stabellini

On Tue, 2014-03-18 at 20:25 +0000, Julien Grall wrote:

> >> +static __init void arm_smmu_device_reset(struct arm_smmu_device *smmu)
> >> +{
> >> +[...]
> >> +    /* Don't upgrade barriers */
> >> +    reg &= ~(SMMU_sCR0_BSU_MASK << SMMU_sCR0_BSU_SHIFT);
> > 
> > No? Is that safe when a vcpu migrates around pCPUs?
> 
> From the SMMU doc 9.6.3, this field is only used when client devices are
> not mapped to a translation context bank.
> 
> By default, the policy in Xen is to deny every transaction that doesn't
> have a valid mapping. So we are safe.
> 
> I can update the comment if you want, or, even better, remove this code.

I think adding the explanation you just gave would do the job. Even if
you were to remove the code, an explanation of why it doesn't mess with
the BSU mask would still be valuable (though I think setting it to a
known value, even if it is unused, would be best).

Ian.


* Re: [PATCH v3 08/13] xen/passthrough: iommu: Basic support of device tree assignment
  2014-03-19 10:12       ` Ian Campbell
@ 2014-03-19 10:42         ` Julien Grall
  2014-03-19 10:54           ` Ian Campbell
  0 siblings, 1 reply; 63+ messages in thread
From: Julien Grall @ 2014-03-19 10:42 UTC (permalink / raw)
  To: Ian Campbell
  Cc: xen-devel, Jan Beulich, tim, Xiantao Zhang, stefano.stabellini

Hi Ian,

On 19/03/14 10:12, Ian Campbell wrote:
>> On 03/18/2014 04:33 PM, Ian Campbell wrote:
>>> On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
>>>> Add IOMMU helpers to support device tree assignment/deassignment. This patch
>>>> introduces 2 new fields in the dt_device_node:
>>>>      - is_protected: Whether the device is protected by an IOMMU
>>>>      - next_assigned: Pointer to the next device assigned to the same
>>>>      domain
>>>
>>> Am I correct that this list is not maintained for dom0? The behaviour of
>>> dt_assign_device and dt_deassign_device seems to rely on it?
>>
>> DOM0 will call dt_assign_device for every device protected by the IOMMU
>> (see patch #9). So dom0 will itself have a maintained list.
>
> Does this not mean that iommu_assign_dt_device will refuse to assign the
> device to another domain (since the next_assigned list is not empty)?

Yes. After thinking about it, I'm not sure we want to maintain a list for
dom0. At least the iommu_deassign_dt_device function is wrong, because the
device is not reassigned to dom0 (from the list point of view).
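
Conceptually the deassign path would need something like this (a rough
sketch only; the list_head-style plumbing and the per-domain dt_devices
head are invented for illustration, not the code in this series):

    static int iommu_deassign_dt_device(struct domain *d,
                                        struct dt_device_node *dev)
    {
        /* Drop the device from d's list of assigned devices... */
        list_del(&dev->next_assigned);
        /* ...and give it back to dom0's list -- the step the current
         * code misses, so the list no longer matches reality. */
        list_add(&dev->next_assigned, &domain_hvm_iommu(dom0)->dt_devices);
        return 0;
    }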


> BTW, next_assigned is not a very good name for this list, since the
> "nextness" is abstracted away. Consider just device_list.

I found device_list too generic. We can't tell whether it's the list of
all the devices or just the ones belonging to a specific domain.

What about domain_device_list?

Regards,

-- 
Julien Grall


* Re: [PATCH v3 13/13] drivers/passthrough: arm: Add support for SMMU drivers
  2014-03-19 10:35       ` Ian Campbell
@ 2014-03-19 10:44         ` Julien Grall
  0 siblings, 0 replies; 63+ messages in thread
From: Julien Grall @ 2014-03-19 10:44 UTC (permalink / raw)
  To: Ian Campbell
  Cc: xen-devel, Jan Beulich, tim, Xiantao Zhang, stefano.stabellini

Hi Ian,

On 19/03/14 10:35, Ian Campbell wrote:
> On Tue, 2014-03-18 at 20:25 +0000, Julien Grall wrote:
>
>>>> +static __init void arm_smmu_device_reset(struct arm_smmu_device *smmu)
>>>> +{
>>>> +[...]
>>>> +    /* Don't upgrade barriers */
>>>> +    reg &= ~(SMMU_sCR0_BSU_MASK << SMMU_sCR0_BSU_SHIFT);
>>>
>>> No? Is that safe when a vcpu migrates around pCPUs?
>>
>>  From the SMMU doc 9.6.3, this field is only used when client devices are
>> not mapped to a translation context bank.
>>
>> By default, the policy in Xen is to deny every transaction that doesn't
>> have a valid mapping. So we are safe.
>>
>> I can update the comment if you want, or, even better, remove this code.
>
> I think adding the explanation you just gave would do the job. Even if
> you were to remove the code, an explanation of why it doesn't mess with
> the BSU mask would still be valuable (though I think setting it to a
> known value, even if it is unused, would be best).

Ok. I will update the comment.

Regards,

-- 
Julien Grall


* Re: [PATCH v3 08/13] xen/passthrough: iommu: Basic support of device tree assignment
  2014-03-19 10:42         ` Julien Grall
@ 2014-03-19 10:54           ` Ian Campbell
  0 siblings, 0 replies; 63+ messages in thread
From: Ian Campbell @ 2014-03-19 10:54 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Jan Beulich, tim, Xiantao Zhang, stefano.stabellini

On Wed, 2014-03-19 at 10:42 +0000, Julien Grall wrote:
> > BTW, next_assigned is not a very good name for this list, since the
> > "nextness" is abstracted away. Consider just device_list.
> 
> I found device_list too generic. We don't know if it's the list of all 
> the devices or just the ones of a specific domain.
> 
> What about domain_device_list?

Just domain_list would do, I think; the device is already before the "->"
at the use sites.
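
i.e. the uses would read like this (a sketch; the per-domain list head
name is invented):

    struct hvm_iommu *hd = domain_hvm_iommu(d);

    /* "device" already sits before the "->": */
    list_add(&dev->domain_list, &hd->dt_devices);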

Ian.


* Re: [PATCH v3 12/13] xen/arm: Add the property "protected-devices" in the hypervisor node
  2014-03-19 10:33       ` Ian Campbell
@ 2014-04-03 21:51         ` Julien Grall
  2014-04-04  9:40           ` Ian Campbell
  0 siblings, 1 reply; 63+ messages in thread
From: Julien Grall @ 2014-04-03 21:51 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, tim, stefano.stabellini

Hi Ian,

Sorry, I forgot to answer this email...

On 19/03/14 10:33, Ian Campbell wrote:
> On Tue, 2014-03-18 at 20:09 +0000, Julien Grall wrote:
>> Hi Ian,
>>
>> On 03/18/2014 04:48 PM, Ian Campbell wrote:
>>> On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
>>>> DOM0 is using the swiotlb to bounce DMA. With the IOMMU support in Xen,
>>>> protected devices should not use it.
>>>>
> >>>> Only Xen is able to know if an IOMMU protects the device. The new property
>>>> "protected-devices" is a list of device phandles protected by an IOMMU.
>>>>
>>>> Signed-off-by: Julien Grall <julien.grall@linaro.org>
>>>>
>>>> ---
> >>>>      This patch *MUST NOT* be applied until we have agreed on a device
> >>>>      binding with the device tree folks. DOM0 can run safely with the
> >>>>      swiotlb on protected devices as long as LVM is not used for guest
> >>>>      disks.
>>>
>>> LVM works these days I think.
>>
>> With this patch series applied, LVM will be broken if the hard drive is
>> protected by an IOMMU.
>
> How/why?

If the guest is using LVM for its block device, bouncing via the swiotlb
with the IOMMU enabled will result in a wrong mapping. If I remember
correctly, that's because the swiotlb DMA address is a physical address
and not an IPA, so the IOMMU will reject the request.

So we have to bypass the swiotlb in this case.
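
Roughly what happens (an illustrative sketch -- the function names are
invented, the addresses are the point):

    void *buf;            /* guest data being bounced */
    struct device *dev;   /* the protected device */
    dma_addr_t dma_addr;

    dma_addr = swiotlb_bounce_map(buf);  /* returns a host *physical* address (PA) */
    program_device_dma(dev, dma_addr);   /* device now issues DMA to that PA */

    /*
     * The SMMU shares the P2M, so it translates the device's address
     * as an IPA through the stage-2 tables. A raw PA has no (or the
     * wrong) stage-2 mapping, hence the fault.
     */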

>
>> It's the case on midway, the platform will crash just after the guest
>> begins to boot.
>
> This configuration works today (osstest tests it) so this would be a
> regression. Can you sort this please?

As said above, with the IOMMU enabled I won't be able to sort it until my
patch series for Linux is upstreamed.

Regards,

-- 
Julien Grall


* Re: [PATCH v3 12/13] xen/arm: Add the property "protected-devices" in the hypervisor node
  2014-04-03 21:51         ` Julien Grall
@ 2014-04-04  9:40           ` Ian Campbell
  2014-04-04 10:25             ` Julien Grall
  0 siblings, 1 reply; 63+ messages in thread
From: Ian Campbell @ 2014-04-04  9:40 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, tim, stefano.stabellini

On Thu, 2014-04-03 at 22:51 +0100, Julien Grall wrote:
> Hi Ian,
> 
> Sorry, I forgot to answer to this email...
> 
> On 19/03/14 10:33, Ian Campbell wrote:
> > On Tue, 2014-03-18 at 20:09 +0000, Julien Grall wrote:
> >> Hi Ian,
> >>
> >> On 03/18/2014 04:48 PM, Ian Campbell wrote:
> >>> On Tue, 2014-03-11 at 15:49 +0000, Julien Grall wrote:
> >>>> DOM0 is using the swiotlb to bounce DMA. With the IOMMU support in Xen,
> >>>> protected devices should not use it.
> >>>>
> >>>> Only Xen is able to know if an IOMMU protects the device. The new property
> >>>> "protected-devices" is a list of device phandles protected by an IOMMU.
> >>>>
> >>>> Signed-off-by: Julien Grall <julien.grall@linaro.org>
> >>>>
> >>>> ---
> >>>>      This patch *MUST NOT* be applied until we have agreed on a device
> >>>>      binding with the device tree folks. DOM0 can run safely with the
> >>>>      swiotlb on protected devices as long as LVM is not used for guest
> >>>>      disks.
> >>>
> >>> LVM works these days I think.
> >>
> >> With this patch series applied, LVM will be broken if the hard drive is
> >> protected by an IOMMU.
> >
> > How/why?
> 
> If the guest is using LVM for its block device, bouncing via the swiotlb
> with the IOMMU enabled will result in a wrong mapping. If I remember
> correctly, that's because the swiotlb DMA address is a physical address
> and not an IPA, so the IOMMU will reject the request.

Can this not be made to work? The swiotlb shouldn't be breaking things
like this: regardless of whether there is an IOMMU which it doesn't know
about, it should be returning sane results, which might imply that Xen
needs to be giving it sensible results.

At what point is the IOMMU rejecting the request? I didn't expect it to
be involved in the swiotlb path -- that's a copy into dom0-owned RAM
with a known 1:1 mapping (provided either by the literal use of a 1:1
mapping or by using the IOMMU to give the guest that impression).

> So we have to bypass the swiotlb in this case.
> 
> >
>> That's the case on Midway: the platform will crash just after the guest
>> begins to boot.
> >
> > This configuration works today (osstest tests it) so this would be a
> > regression. Can you sort this please?
> 
> As said above, with the IOMMU enabled I won't be able to sort it until my
> patch series for Linux is upstreamed.

We really need to be able to manage this transition in a compatible way,
that means new kernels working on old hypervisors as well as old kernels
working on new hypervisors (it's obviously fine for this case to bounce
when it doesn't need to).

Ian.


* Re: [PATCH v3 12/13] xen/arm: Add the property "protected-devices" in the hypervisor node
  2014-04-04  9:40           ` Ian Campbell
@ 2014-04-04 10:25             ` Julien Grall
  2014-04-04 10:28               ` Ian Campbell
  0 siblings, 1 reply; 63+ messages in thread
From: Julien Grall @ 2014-04-04 10:25 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, tim, stefano.stabellini

On 04/04/2014 10:40 AM, Ian Campbell wrote:

> We really need to be able to manage this transition in a compatible way,
> that means new kernels working on old hypervisors as well as old kernels
> working on new hypervisors (it's obviously fine for this case to bounce
> when it doesn't need to).

It's not possible, because the same platform can have both protected and
non-protected devices. Linux has to know in some way whether the DMA has
to be programmed with an IPA or a PA.

The only way is to disable the IOMMU, or not use a partitioned disk for
the guest, until dom0 is able to disable the swiotlb.

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v3 12/13] xen/arm: Add the property "protected-devices" in the hypervisor node
  2014-04-04 10:25             ` Julien Grall
@ 2014-04-04 10:28               ` Ian Campbell
  2014-04-04 10:39                 ` Julien Grall
  0 siblings, 1 reply; 63+ messages in thread
From: Ian Campbell @ 2014-04-04 10:28 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, tim, stefano.stabellini

On Fri, 2014-04-04 at 11:25 +0100, Julien Grall wrote:
> On 04/04/2014 10:40 AM, Ian Campbell wrote:
> 
> > We really need to be able to manage this transition in a compatible way,
> > that means new kernels working on old hypervisors as well as old kernels
> > working on new hypervisors (it's obviously fine for this case to bounce
> > when it doesn't need to).
> 
> It's not possible, because the same platform can have both protected and
> non-protected devices. Linux has to know in some way whether the DMA has
> to be programmed with an IPA or a PA.

Then there must be a negotiation between Xen and the Linux kernel so Xen
can know which case to apply.

e.g. if the kernel does not advertise support for protected devices then
Xen must act as if no IOMMU was present. 

> The only way is to disable the IOMMU, or not use a partitioned disk for
> the guest, until dom0 is able to disable the swiotlb.
> 


* Re: [PATCH v3 12/13] xen/arm: Add the property "protected-devices" in the hypervisor node
  2014-04-04 10:28               ` Ian Campbell
@ 2014-04-04 10:39                 ` Julien Grall
  2014-04-04 10:48                   ` Ian Campbell
  0 siblings, 1 reply; 63+ messages in thread
From: Julien Grall @ 2014-04-04 10:39 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, tim, stefano.stabellini

On 04/04/2014 11:28 AM, Ian Campbell wrote:
> On Fri, 2014-04-04 at 11:25 +0100, Julien Grall wrote:
>> On 04/04/2014 10:40 AM, Ian Campbell wrote:
>>
>>> We really need to be able to manage this transition in a compatible way,
>>> that means new kernels working on old hypervisors as well as old kernels
>>> working on new hypervisors (it's obviously fine for this case to bounce
>>> when it doesn't need to).
>>
>> It's not possible, because the same platform can have both protected and
>> non-protected devices. Linux has to know in some way whether the DMA has
>> to be programmed with an IPA or a PA.
> 
> Then there must be a negotiation between Xen and the Linux kernel so Xen
> can know which case to apply.
> 
> e.g. if the kernel does not advertise support for protected devices then
> Xen must act as if no IOMMU was present.

How can the kernel say "I support the IOMMU"? A new hypercall?

Xen has to program the IOMMU quite early (i.e. before Linux boots and
uses the protected device).

Backporting my patch series to support protected devices is not a big
deal. What about disabling the IOMMU by default on ARM until proper
support lands in Linux?

-- 
Julien Grall


* Re: [PATCH v3 12/13] xen/arm: Add the property "protected-devices" in the hypervisor node
  2014-04-04 10:39                 ` Julien Grall
@ 2014-04-04 10:48                   ` Ian Campbell
  2014-04-04 11:01                     ` Julien Grall
  0 siblings, 1 reply; 63+ messages in thread
From: Ian Campbell @ 2014-04-04 10:48 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, tim, stefano.stabellini

On Fri, 2014-04-04 at 11:39 +0100, Julien Grall wrote:
> On 04/04/2014 11:28 AM, Ian Campbell wrote:
> > On Fri, 2014-04-04 at 11:25 +0100, Julien Grall wrote:
> >> On 04/04/2014 10:40 AM, Ian Campbell wrote:
> >>
> >>> We really need to be able to manage this transition in a compatible way,
> >>> that means new kernels working on old hypervisors as well as old kernels
> >>> working on new hypervisors (it's obviously fine for this case to bounce
> >>> when it doesn't need to).
> >>
> >> It's not possible, because the same platform can have both protected and
> >> non-protected devices. Linux has to know in some way whether the DMA has
> >> to be programmed with an IPA or a PA.
> > 
> > Then there must be a negotiation between Xen and the Linux kernel so Xen
> > can know which case to apply.
> > 
> > e.g. if the kernel does not advertise support for protected devices then
> > Xen must act as if no IOMMU was present.
> 
> How can the kernel say "I support the IOMMU"? A new hypercall?

On x86 we use the ELF notes to communicate it at build time. We don't
currently have a similar mechanism under ARM but perhaps we need to
invent one now.
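
For reference, the x86 kernel carries notes along these lines (roughly as
in arch/x86/xen/xen-head.S):

    ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,
            .ascii "!writable_page_tables|pae_pgdir_above_4gb")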

There is also __HYPERVISOR_vm_assist, which is/was used on PV x86 to
signal these sorts of things, if it's not too late.

> Xen has to program the IOMMU quite early (i.e. before Linux boots and
> uses the protected device).

In that case an ELF note type solution might be the only option.

However, since this stuff only comes to matter when the guest comes to
do grant mapping it might be that we can defer until runtime and require
that a modern kernel calls vm_assist before making any grant calls. If
it doesn't then it is assumed to be unable to cope with the iommu.
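
i.e. something like this very early in boot (a sketch; the hypercall and
VMASST_CMD_enable exist today on x86, but the VMASST_TYPE_* value below is
invented, as is the fallback flag):

    /* Advertise that this kernel copes with protected devices. */
    rc = HYPERVISOR_vm_assist(VMASST_CMD_enable,
                              VMASST_TYPE_protected_devices /* invented */);
    if (rc)
        dma_must_bounce = true;  /* old Xen: keep bouncing via the swiotlb */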

> Backporting my patch series to support protected devices is not a big
> deal. What about disabling the IOMMU by default on ARM until proper
> support lands in Linux?

I'd rather avoid this if at all possible, upgrading Xen is not supposed
to require new dom0 kernel features and it is hard to describe "support
for protected devices" as a bug fix.

Ian.


* Re: [PATCH v3 12/13] xen/arm: Add the property "protected-devices" in the hypervisor node
  2014-04-04 10:48                   ` Ian Campbell
@ 2014-04-04 11:01                     ` Julien Grall
  2014-04-04 11:13                       ` Ian Campbell
  0 siblings, 1 reply; 63+ messages in thread
From: Julien Grall @ 2014-04-04 11:01 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, tim, stefano.stabellini

On 04/04/2014 11:48 AM, Ian Campbell wrote:
> On Fri, 2014-04-04 at 11:39 +0100, Julien Grall wrote:
>> On 04/04/2014 11:28 AM, Ian Campbell wrote:
>>> On Fri, 2014-04-04 at 11:25 +0100, Julien Grall wrote:
>>>> On 04/04/2014 10:40 AM, Ian Campbell wrote:
>>>>
>>>>> We really need to be able to manage this transition in a compatible way,
>>>>> that means new kernels working on old hypervisors as well as old kernels
>>>>> working on new hypervisors (it's obviously fine for this case to bounce
>>>>> when it doesn't need to).
>>>>
> >>>> It's not possible, because the same platform can have both protected and
> >>>> non-protected devices. Linux has to know in some way whether the DMA has
> >>>> to be programmed with an IPA or a PA.
>>>
>>> Then there must be a negotiation between Xen and the Linux kernel so Xen
>>> can know which case to apply.
>>>
>>> e.g. if the kernel does not advertise support for protected devices then
>>> Xen must act as if no IOMMU was present.
>>
> >> How can the kernel say "I support the IOMMU"? A new hypercall?
> 
> On x86 we use the ELF notes to communicate it at build time. We don't
> currently have a similar mechanism under ARM but perhaps we need to
> invent one now.
> 
> There is also __HYPERVISOR_vm_assist, which is/was used on PV x86 to
> signal these sorts of things, if it's not too late.
> 
>> Xen has to program the IOMMU quite early (i.e. before Linux boots and
>> uses the protected device).
> 
> In that case an ELF note type solution might be the only option.
> 
> However, since this stuff only comes to matter when the guest comes to
> do grant mapping it might be that we can defer until runtime and require
> that a modern kernel calls vm_assist before making any grant calls. If
> it doesn't then it is assumed to be unable to cope with the iommu.

Using vm_assist means we could no longer deny access to invalid
transactions by default. It sounds like we would want to completely
disable the IOMMU, because in this case passthrough should not be enabled.

Furthermore, I can't predict what would happen if the device is already in
use when the kernel decides to call vm_assist (i.e. protect devices). I
suppose we could break the device at that point.

It's not possible in Xen to know if the decide is used or not.

>> Backporting my patch series to support protected devices is not a big
>> deal. What about disabling the IOMMU by default on ARM until proper
>> support lands in Linux?
> 
> I'd rather avoid this if at all possible, upgrading Xen is not supposed
> to require new dom0 kernel features and it is hard to describe "support
> for protected devices" as a bug fix.

But we can choose to disable the IOMMU by default on ARM, and the user
will have to decide whether or not it's safe to use the IOMMU.

Regards,

-- 
Julien Grall


* Re: [PATCH v3 12/13] xen/arm: Add the property "protected-devices" in the hypervisor node
  2014-04-04 11:01                     ` Julien Grall
@ 2014-04-04 11:13                       ` Ian Campbell
  2014-04-04 11:23                         ` Julien Grall
  0 siblings, 1 reply; 63+ messages in thread
From: Ian Campbell @ 2014-04-04 11:13 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, tim, stefano.stabellini

On Fri, 2014-04-04 at 12:01 +0100, Julien Grall wrote:
> On 04/04/2014 11:48 AM, Ian Campbell wrote:
> > On Fri, 2014-04-04 at 11:39 +0100, Julien Grall wrote:
> >> On 04/04/2014 11:28 AM, Ian Campbell wrote:
> >>> On Fri, 2014-04-04 at 11:25 +0100, Julien Grall wrote:
> >>>> On 04/04/2014 10:40 AM, Ian Campbell wrote:
> >>>>
> >>>>> We really need to be able to manage this transition in a compatible way,
> >>>>> that means new kernels working on old hypervisors as well as old kernels
> >>>>> working on new hypervisors (it's obviously fine for this case to bounce
> >>>>> when it doesn't need to).
> >>>>
> >>>> It's not possible, because the same platform can have both protected and
> >>>> non-protected devices. Linux has to know in some way whether the DMA has
> >>>> to be programmed with an IPA or a PA.
> >>>
> >>> Then there must be a negotiation between Xen and the Linux kernel so Xen
> >>> can know which case to apply.
> >>>
> >>> e.g. if the kernel does not advertise support for protected devices then
> >>> Xen must act as if no IOMMU was present.
> >>
> >> How can the kernel say "I support the IOMMU"? A new hypercall?
> > 
> > On x86 we use the ELF notes to communicate it at build time. We don't
> > currently have a similar mechanism under ARM but perhaps we need to
> > invent one now.
> > 
> > There is also __HYPERVISOR_vm_assist, which is/was used on PV x86 to
> > signal these sorts of things, if it's not too late.
> > 
> >> Xen has to program the IOMMU quite early (i.e. before Linux boots and
> >> uses the protected device).
> > 
> > In that case an ELF note type solution might be the only option.
> > 
> > However, since this stuff only comes to matter when the guest comes to
> > do grant mapping it might be that we can defer until runtime and require
> > that a modern kernel calls vm_assist before making any grant calls. If
> > it doesn't then it is assumed to be unable to cope with the iommu.
> 
> Using vm_assist means we could no longer deny access to invalid
> transactions by default. It sounds like we would want to completely
> disable the IOMMU, because in this case passthrough should not be enabled.

We could enable both of those things only at the point where vm_assist
was called.

And yes, if the dom0 kernel isn't capable of doing stuff with the IOMMU
enabled, we should turn off passthrough too.

> Furthermore, I can't predict what would happen if the device is already
> in use when the kernel decides to call vm_assist (i.e. protect devices).
> I suppose we could break the device at that point.

I think we can reasonably require that vm_assist be called super early,
i.e. before any DMA operations occur; in Linux terms I think
early_initcall(xen_guest_init) is likely early enough, or we could move
xen_guest_init even sooner.

We also have the option of a build time feature flag in the image itself
like on x86. It is likely that going forward we will have other cases
where we wished we had such a thing so getting it in place now would be
useful.

> It's not possible in Xen to know if the decide is used or not.

"the decide"?

> 
> >> Backporting my patch series to support protected devices is not a big
> >> deal. What about disabling the IOMMU by default on ARM until proper
> >> support lands in Linux?
> > 
> > I'd rather avoid this if at all possible, upgrading Xen is not supposed
> > to require new dom0 kernel features and it is hard to describe "support
> > for protected devices" as a bug fix.
> 
> But we can choose to disable the IOMMU by default on ARM, and the user
> will have to decide whether or not it's safe to use the IOMMU.

That's a cop-out and a rubbish user experience going forward.

Ian.


* Re: [PATCH v3 12/13] xen/arm: Add the property "protected-devices" in the hypervisor node
  2014-04-04 11:13                       ` Ian Campbell
@ 2014-04-04 11:23                         ` Julien Grall
  2014-04-04 12:45                           ` Ian Campbell
  0 siblings, 1 reply; 63+ messages in thread
From: Julien Grall @ 2014-04-04 11:23 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, tim, stefano.stabellini

On 04/04/2014 12:13 PM, Ian Campbell wrote:
> We could enable both of those things only at the point where vm_assist
> was called.

[..]

> > I think we can reasonably require that vm_assist be called super early,
> > i.e. before any DMA operations occur; in Linux terms I think
> > early_initcall(xen_guest_init) is likely early enough, or we could move
> > xen_guest_init even sooner.

IMHO, this solution is far too complicated, and we would have to fork from
the current IOMMU framework.

> We also have the option of a build time feature flag in the image itself
> like on x86. It is likely that going forward we will have other cases
> where we wished we had such a thing so getting it in place now would be
> useful.

I definitely prefer this solution, which seems cleaner than the others.

Is it possible to embed notes in a zImage? How does x86 handle formats
other than ELF?

> 
>> It's not possible in Xen to know if the decide is used or not.
> 
> "the decide"?

The device, I mixed multiple sentences together, sorry.

-- 
Julien Grall


* Re: [PATCH v3 12/13] xen/arm: Add the property "protected-devices" in the hypervisor node
  2014-04-04 11:23                         ` Julien Grall
@ 2014-04-04 12:45                           ` Ian Campbell
  2014-04-04 13:10                             ` Julien Grall
  0 siblings, 1 reply; 63+ messages in thread
From: Ian Campbell @ 2014-04-04 12:45 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, tim, stefano.stabellini

On Fri, 2014-04-04 at 12:23 +0100, Julien Grall wrote:
> On 04/04/2014 12:13 PM, Ian Campbell wrote:
> > We could enable both of those things only at the point where vm_assist
> > was called.
> 
> [..]
> 
>>> I think we can reasonably require that vm_assist be called super early,
>>> i.e. before any DMA operations occur; in Linux terms I think
>>> early_initcall(xen_guest_init) is likely early enough, or we could move
>>> xen_guest_init even sooner.
> 
>> IMHO, this solution is far too complicated, and we would have to fork from
>> the current IOMMU framework.

Eh? On the Linux side? You just call this hypercall super early so that
you are sure no drivers have initialised yet and you are done; there is
no need to fork anything AFAICT. And in any case if any generic
framework needs extending to better work with Xen then we should do
that, not fork it or make a worse design to sidestep the need.

Obviously there will be complexity on the Xen side, but I think far too
complicated will be an overstatement.

> > We also have the option of a build time feature flag in the image itself
> > like on x86. It is likely that going forward we will have other cases
> > where we wished we had such a thing so getting it in place now would be
> > useful.
> 
>> I definitely prefer this solution, which seems cleaner than the others.
> 
>> Is it possible to embed notes in a zImage? How does x86 handle formats
>> other than ELF?

The x86 bzImage format embeds an ELF file, which is something that was
done long ago to improve things for Xen PV x86, which wants to boot
an ELF.

ARM zImage doesn't currently have anything like that so you would be
looking at an image format extension.

> >> It's not possible in Xen to know if the decide is used or not.
> > 
> > "the decide"?
> 
> The device, I mixed multiple sentences together, sorry.
> 


* Re: [PATCH v3 12/13] xen/arm: Add the property "protected-devices" in the hypervisor node
  2014-04-04 12:45                           ` Ian Campbell
@ 2014-04-04 13:10                             ` Julien Grall
  2014-04-04 13:18                               ` Ian Campbell
  0 siblings, 1 reply; 63+ messages in thread
From: Julien Grall @ 2014-04-04 13:10 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, tim, stefano.stabellini

On 04/04/2014 01:45 PM, Ian Campbell wrote:
> On Fri, 2014-04-04 at 12:23 +0100, Julien Grall wrote:
>> On 04/04/2014 12:13 PM, Ian Campbell wrote:
>>> We could enable both of those things only at the point where vm_assist
>>> was called.
>>
>> [..]
>>
> >>> I think we can reasonably require that vm_assist be called super early,
> >>> i.e. before any DMA operations occur; in Linux terms I think
> >>> early_initcall(xen_guest_init) is likely early enough, or we could move
> >>> xen_guest_init even sooner.
>>
> >> IMHO, this solution is far too complicated, and we would have to fork from
> >> the current IOMMU framework.
> 
> Eh? On the Linux side? You just call this hypercall super early so that
> you are sure no drivers have initialised yet and you are done; there is
> no need to fork anything AFAICT. And in any case if any generic
> framework needs extending to better work with Xen then we should do
> that, not fork it or make a worse design to sidestep the need.
> 
> Obviously there will be complexity on the Xen side, but I think far too
> complicated will be an overstatement.

I was talking about the Xen side. The current IOMMU framework would need
some rework to support it.

For me this solution is far too complicated compared to the other
solutions (e.g. the ELF note).

> 
>>> We also have the option of a build time feature flag in the image itself
>>> like on x86. It is likely that going forward we will have other cases
>>> where we wished we had such a thing so getting it in place now would be
>>> useful.
>>
>> I definitely prefer this solution, which seems cleaner than the others.
>>
>> Is it possible to embed notes in a zImage? How does x86 handle formats
>> other than ELF?
> 
> The x86 bzImage format embeds an ELF file, which is something that was
> done long ago to improve things for Xen PV x86, which wants to boot
> an ELF.
> 
> ARM zImage doesn't currently have anything like that so you would be
> looking at an image format extension.

OK. So it will be easy to add support for ELF images. However, for zImage,
do you think the Linux ARM maintainers will accept a change in the format?

-- 
Julien Grall


* Re: [PATCH v3 12/13] xen/arm: Add the property "protected-devices" in the hypervisor node
  2014-04-04 13:10                             ` Julien Grall
@ 2014-04-04 13:18                               ` Ian Campbell
  0 siblings, 0 replies; 63+ messages in thread
From: Ian Campbell @ 2014-04-04 13:18 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, tim, stefano.stabellini

On Fri, 2014-04-04 at 14:10 +0100, Julien Grall wrote:
> On 04/04/2014 01:45 PM, Ian Campbell wrote:
> > On Fri, 2014-04-04 at 12:23 +0100, Julien Grall wrote:
> >> On 04/04/2014 12:13 PM, Ian Campbell wrote:
> >>> We could enable both of those things only at the point where vm_assist
> >>> was called.
> >>
> >> [..]
> >>
> >>> I think we can reasonably require that vm_assist be called super early,
> >>> i.e. before any DMA operations occur; in Linux terms I think
> >>> early_initcall(xen_guest_init) is likely early enough, or we could move
> >>> xen_guest_init even sooner.
> >>
> >> IMHO, this solution is far too complicated, and we would have to fork from
> >> the current IOMMU framework.
> > 
> > Eh? On the Linux side? You just call this hypercall super early so that
> > you are sure no drivers have initialised yet and you are done; there is
> > no need to fork anything AFAICT. And in any case if any generic
> > framework needs extending to better work with Xen then we should do
> > that, not fork it or make a worse design to sidestep the need.
> > 
> > Obviously there will be complexity on the Xen side, but I think far too
> > complicated will be an overstatement.
> 
> I was talking about the Xen side. The current IOMMU framework would need
> some rework to support it.
> 
> For me this solution is far too complicated compared to the other
> solutions (e.g. the ELF note).
> 
> > 
> >>> We also have the option of a build time feature flag in the image itself
> >>> like on x86. It is likely that going forward we will have other cases
> >>> where we wished we had such a thing so getting it in place now would be
> >>> useful.
> >>
> >> I definitely prefer this solution, which seems cleaner than the others.
> >>
> >> Is it possible to embed notes in a zImage? How does x86 handle formats
> >> other than ELF?
> > 
> > The x86 bzImage format embeds an ELF file, which is something that was
> > done long ago to improve things for Xen PV x86, which wants to boot
> > an ELF.
> > 
> > ARM zImage doesn't currently have anything like that so you would be
> > looking at an image format extension.
> 
> OK. So it will be easy to add support for ELF images. However, for zImage,
> do you think the Linux ARM maintainers will accept a change in the format?

I've no idea. I'm 100% positive that they will require it to be done in
a backwards compatible way. I don't know how amenable the zImage format
is to such extensions.
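
Purely to illustrate the shape of it, modelled on the appended-DTB trick
-- a tagged trailer that old loaders simply ignore (the magic value and
layout below are invented):

    struct zimage_xen_trailer {
        uint32_t magic;     /* invented marker, scanned back from image end */
        uint32_t version;
        uint32_t features;  /* e.g. bit 0: kernel copes with protected devices */
    };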

On x86 IIRC the kexec people were interested in being able to get an ELF
out of a bzImage; you might find that the same is true on ARM, which
might help with motivating the change.

Ian.


end of thread, last updated ~2014-04-04 13:19 UTC

Thread overview: 63+ messages
2014-03-11 15:49 [PATCH v3 00/13] IOMMU support for ARM Julien Grall
2014-03-11 15:49 ` [PATCH v3 01/13] xen/common: grant-table: only call IOMMU if paging mode translate is disabled Julien Grall
2014-03-11 15:49 ` [PATCH v3 02/13] xen/passthrough: amd: Remove domain_id from hvm_iommu Julien Grall
2014-03-18 16:19   ` Ian Campbell
2014-03-18 16:32     ` Jan Beulich
2014-03-11 15:49 ` [PATCH v3 03/13] xen/dts: Add dt_property_read_bool Julien Grall
2014-03-11 15:49 ` [PATCH v3 04/13] xen/dts: Add dt_parse_phandle_with_args and dt_parse_phandle Julien Grall
2014-03-18 16:20   ` Ian Campbell
2014-03-11 15:49 ` [PATCH v3 05/13] xen/passthrough: rework dom0_pvh_reqs to use it also on ARM Julien Grall
2014-03-18 16:22   ` Ian Campbell
2014-03-18 17:28     ` Julien Grall
2014-03-18 17:50       ` Ian Campbell
2014-03-18 18:19         ` Julien Grall
2014-03-19 10:01           ` Ian Campbell
2014-03-11 15:49 ` [PATCH v3 06/13] xen/passthrough: iommu: Split generic IOMMU code Julien Grall
2014-03-11 16:50   ` Jan Beulich
2014-03-11 17:09     ` Julien Grall
2014-03-12  7:15       ` Jan Beulich
2014-03-18 16:24   ` Ian Campbell
2014-03-18 17:36     ` Julien Grall
2014-03-18 17:50       ` Ian Campbell
2014-03-18 18:21         ` Julien Grall
2014-03-19 10:02           ` Ian Campbell
2014-03-11 15:49 ` [PATCH v3 07/13] xen/passthrough: iommu: Introduce arch specific code Julien Grall
2014-03-11 16:15   ` Julien Grall
2014-03-11 16:53   ` Jan Beulich
2014-03-18 16:27   ` Ian Campbell
2014-03-18 19:40     ` Julien Grall
2014-03-11 15:49 ` [PATCH v3 08/13] xen/passthrough: iommu: Basic support of device tree assignment Julien Grall
2014-03-11 16:55   ` Jan Beulich
2014-03-18 16:33   ` Ian Campbell
2014-03-18 19:46     ` Julien Grall
2014-03-19 10:12       ` Ian Campbell
2014-03-19 10:42         ` Julien Grall
2014-03-19 10:54           ` Ian Campbell
2014-03-11 15:49 ` [PATCH v3 09/13] xen/passthrough: Introduce IOMMU ARM architecture Julien Grall
2014-03-18 16:40   ` Ian Campbell
2014-03-18 19:58     ` Julien Grall
2014-03-19 10:29       ` Ian Campbell
2014-03-11 15:49 ` [PATCH v3 10/13] MAINTAINERS: Add drivers/passthrough/arm Julien Grall
2014-03-11 15:49 ` [PATCH v3 11/13] xen/arm: Don't give IOMMU devices to dom0 when iommu is disabled Julien Grall
2014-03-18 16:41   ` Ian Campbell
2014-03-11 15:49 ` [PATCH v3 12/13] xen/arm: Add the property "protected-devices" in the hypervisor node Julien Grall
2014-03-18 16:48   ` Ian Campbell
2014-03-18 20:09     ` Julien Grall
2014-03-19 10:33       ` Ian Campbell
2014-04-03 21:51         ` Julien Grall
2014-04-04  9:40           ` Ian Campbell
2014-04-04 10:25             ` Julien Grall
2014-04-04 10:28               ` Ian Campbell
2014-04-04 10:39                 ` Julien Grall
2014-04-04 10:48                   ` Ian Campbell
2014-04-04 11:01                     ` Julien Grall
2014-04-04 11:13                       ` Ian Campbell
2014-04-04 11:23                         ` Julien Grall
2014-04-04 12:45                           ` Ian Campbell
2014-04-04 13:10                             ` Julien Grall
2014-04-04 13:18                               ` Ian Campbell
2014-03-11 15:49 ` [PATCH v3 13/13] drivers/passthrough: arm: Add support for SMMU drivers Julien Grall
2014-03-18 16:54   ` Ian Campbell
2014-03-18 20:25     ` Julien Grall
2014-03-19 10:35       ` Ian Campbell
2014-03-19 10:44         ` Julien Grall
