* [RFC PATCH v1 0/7] IPMMU-VMSA support on ARM
@ 2017-07-26 15:09 Oleksandr Tyshchenko
  2017-07-26 15:09 ` [RFC PATCH v1 1/7] iommu/arm: ipmmu-vmsa: Add IPMMU-VMSA support Oleksandr Tyshchenko
                   ` (7 more replies)
  0 siblings, 8 replies; 20+ messages in thread
From: Oleksandr Tyshchenko @ 2017-07-26 15:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Oleksandr Tyshchenko

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Hi, all.

The purpose of this patch series is to add IPMMU-VMSA support to Xen on ARM.
The IPMMU-VMSA is a VMSA-compatible IOMMU integrated in the newest Renesas R-Car Gen3 SoCs (ARM).
This IOMMU can't share page tables with the CPU, since it doesn't use the same page-table format
as the CPU on ARM; for that reason I call it a "Non-shared" IOMMU.
This means that the current patch series must be based on the "Non-shared" IOMMU support [1]
for the IPMMU-VMSA to be functional inside Xen.
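
To make the "Non-shared" point concrete, here is a tiny, self-contained
model (illustration only, not code from this series; all names are made up).
With a "shared" IOMMU one update of the CPU P2M is enough because the IOMMU
walks the very same table, while a "non-shared" IOMMU such as the IPMMU-VMSA
keeps its own table in a different format, so every P2M change has to be
mirrored into it explicitly:

#include <stdio.h>

#define MAX_GFNS 16
#define INVALID  (~0UL)

/* CPU stage-2 (P2M) table and a separate IOMMU table for one toy domain. */
static unsigned long p2m_table[MAX_GFNS];
static unsigned long iommu_table[MAX_GFNS];

/* Shared model: the IOMMU walks the same table as the CPU. */
static void shared_map(unsigned long gfn, unsigned long mfn)
{
	p2m_table[gfn] = mfn;
}

/* Non-shared model (IPMMU-VMSA case): the IOMMU uses its own page-table
 * format, so the mapping must also be written into the IOMMU's table. */
static void nonshared_map(unsigned long gfn, unsigned long mfn)
{
	p2m_table[gfn] = mfn;
	iommu_table[gfn] = mfn;		/* the extra, explicit IOMMU update */
}

int main(void)
{
	unsigned int i;

	for (i = 0; i < MAX_GFNS; i++)
		p2m_table[i] = iommu_table[i] = INVALID;

	shared_map(1, 0x1000);		/* IOMMU sees it via the shared P2M */
	nonshared_map(2, 0x2000);	/* IOMMU table updated by hand */

	printf("gfn 1: p2m %#lx (IOMMU reads the P2M directly)\n", p2m_table[1]);
	printf("gfn 2: p2m %#lx, iommu %#lx (both tables updated)\n",
	       p2m_table[2], iommu_table[2]);
	return 0;
}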

The IPMMU-VMSA driver as well as the ARM LPAE allocator were ported directly from the Linux BSP the vendor provides:
git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas-bsp.git rcar-3.5.3

The patch series was rebased on the Xen 4.9.0 release and tested on Renesas R-Car Gen3 H3 ES2.0/M3 based boards
with devices assigned to different domains.

You can find the patch series here:
repo: https://github.com/otyshchenko1/xen.git branch: ipmmu_v2

P.S. There is one more patch which needs to be brought back to life [2].
Are there any reasons why this patch hasn't been upstreamed yet?

Thank you.

[1] [Xen-devel] [PATCH v2 00/13] "Non-shared" IOMMU support on ARM
https://www.mail-archive.com/xen-devel@lists.xen.org/msg115901.html

[2] [Xen-devel] [PATCH v8 02/28] xen: Add log2 functionality
https://lists.xenproject.org/archives/html/xen-devel/2016-02/msg00031.html

Oleksandr Tyshchenko (7):
  iommu/arm: ipmmu-vmsa: Add IPMMU-VMSA support
  iommu/arm: ipmmu-vmsa: Add Xen changes for main driver
  iommu/arm: ipmmu-vmsa: Add io-pgtables support
  iommu/arm: ipmmu-vmsa: Add Xen changes for io-pgtables
  iommu/arm: Build IPMMU-VMSA related stuff
  iommu/arm: ipmmu-vmsa: Deallocate page table asynchronously
  iommu/arm: ipmmu-vmsa: Enable VMSAv8-64 mode if IPMMU HW supports it

 xen/drivers/passthrough/arm/Makefile         |    3 +
 xen/drivers/passthrough/arm/io-pgtable-arm.c | 1331 +++++++++++++
 xen/drivers/passthrough/arm/io-pgtable.c     |   91 +
 xen/drivers/passthrough/arm/io-pgtable.h     |  220 +++
 xen/drivers/passthrough/arm/ipmmu-vmsa.c     | 2611 ++++++++++++++++++++++++++
 5 files changed, 4256 insertions(+)
 create mode 100644 xen/drivers/passthrough/arm/io-pgtable-arm.c
 create mode 100644 xen/drivers/passthrough/arm/io-pgtable.c
 create mode 100644 xen/drivers/passthrough/arm/io-pgtable.h
 create mode 100644 xen/drivers/passthrough/arm/ipmmu-vmsa.c

-- 
2.7.4




* [RFC PATCH v1 1/7] iommu/arm: ipmmu-vmsa: Add IPMMU-VMSA support
  2017-07-26 15:09 [RFC PATCH v1 0/7] IPMMU-VMSA support on ARM Oleksandr Tyshchenko
@ 2017-07-26 15:09 ` Oleksandr Tyshchenko
  2017-07-26 15:09 ` [RFC PATCH v1 2/7] iommu/arm: ipmmu-vmsa: Add Xen changes for main driver Oleksandr Tyshchenko
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Oleksandr Tyshchenko @ 2017-07-26 15:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Oleksandr Tyshchenko, Julien Grall, Stefano Stabellini

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

The IPMMU-VMSA is a VMSA-compatible IOMMU integrated in the
newest Renesas SoCs (ARM). Copy the Linux IPMMU driver as is for now.
The next patches will show what changes are required for Xen.

The Linux driver was taken from:
git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas-bsp.git rcar-3.5.3

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
---
 xen/drivers/passthrough/arm/ipmmu-vmsa.c | 1647 ++++++++++++++++++++++++++++++
 1 file changed, 1647 insertions(+)
 create mode 100644 xen/drivers/passthrough/arm/ipmmu-vmsa.c

diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
new file mode 100644
index 0000000..2b380ff
--- /dev/null
+++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
@@ -0,0 +1,1647 @@
+/*
+ * IPMMU VMSA
+ *
+ * Copyright (C) 2014 Renesas Electronics Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; version 2 of the License.
+ */
+
+#include <linux/bitmap.h>
+#include <linux/delay.h>
+#include <linux/dma-iommu.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/export.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/of_iommu.h>
+#include <linux/platform_device.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+
+#if defined(CONFIG_ARM) && !defined(CONFIG_IOMMU_DMA)
+#include <asm/dma-iommu.h>
+#include <asm/pgalloc.h>
+#endif
+
+#include "io-pgtable.h"
+
+#define IPMMU_CTX_MAX 8
+
+struct ipmmu_features {
+	bool use_ns_alias_offset;
+	bool has_cache_leaf_nodes;
+	bool has_eight_ctx;
+	bool setup_imbuscr;
+	bool twobit_imttbcr_sl0;
+};
+
+#ifdef CONFIG_RCAR_DDR_BACKUP
+struct hw_register {
+	char *reg_name;
+	unsigned int reg_offset;
+	unsigned int reg_data;
+};
+#endif
+
+struct ipmmu_vmsa_device {
+	struct device *dev;
+	void __iomem *base;
+	struct list_head list;
+	const struct ipmmu_features *features;
+	bool is_leaf;
+	unsigned int num_utlbs;
+	unsigned int num_ctx;
+	spinlock_t lock;			/* Protects ctx and domains[] */
+	DECLARE_BITMAP(ctx, IPMMU_CTX_MAX);
+	struct ipmmu_vmsa_domain *domains[IPMMU_CTX_MAX];
+#ifdef CONFIG_RCAR_DDR_BACKUP
+	struct hw_register *reg_backup[IPMMU_CTX_MAX];
+#endif
+
+	struct dma_iommu_mapping *mapping;
+};
+
+struct ipmmu_vmsa_domain {
+	struct ipmmu_vmsa_device *mmu;
+	struct ipmmu_vmsa_device *root;
+	struct iommu_domain io_domain;
+
+	struct io_pgtable_cfg cfg;
+	struct io_pgtable_ops *iop;
+
+	unsigned int context_id;
+	spinlock_t lock;			/* Protects mappings */
+};
+
+struct ipmmu_vmsa_archdata {
+	struct ipmmu_vmsa_device *mmu;
+	unsigned int *utlbs;
+	unsigned int num_utlbs;
+	struct device *dev;
+	struct list_head list;
+#ifdef CONFIG_RCAR_DDR_BACKUP
+	unsigned int *utlbs_val;
+	unsigned int *asids_val;
+#endif
+};
+
+static DEFINE_SPINLOCK(ipmmu_devices_lock);
+static LIST_HEAD(ipmmu_devices);
+
+static DEFINE_SPINLOCK(ipmmu_slave_devices_lock);
+static LIST_HEAD(ipmmu_slave_devices);
+
+static struct ipmmu_vmsa_domain *to_vmsa_domain(struct iommu_domain *dom)
+{
+	return container_of(dom, struct ipmmu_vmsa_domain, io_domain);
+}
+
+#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
+static struct ipmmu_vmsa_archdata *to_archdata(struct device *dev)
+{
+	return dev->archdata.iommu;
+}
+static void set_archdata(struct device *dev, struct ipmmu_vmsa_archdata *p)
+{
+	dev->archdata.iommu = p;
+}
+#else
+static struct ipmmu_vmsa_archdata *to_archdata(struct device *dev)
+{
+	return NULL;
+}
+static void set_archdata(struct device *dev, struct ipmmu_vmsa_archdata *p)
+{
+}
+#endif
+
+#define TLB_LOOP_TIMEOUT		100	/* 100us */
+
+/* -----------------------------------------------------------------------------
+ * Registers Definition
+ */
+
+#define IM_NS_ALIAS_OFFSET		0x800
+
+#define IM_CTX_SIZE			0x40
+
+#define IMCTR				0x0000
+#define IMCTR_TRE			(1 << 17)
+#define IMCTR_AFE			(1 << 16)
+#define IMCTR_RTSEL_MASK		(3 << 4)
+#define IMCTR_RTSEL_SHIFT		4
+#define IMCTR_TREN			(1 << 3)
+#define IMCTR_INTEN			(1 << 2)
+#define IMCTR_FLUSH			(1 << 1)
+#define IMCTR_MMUEN			(1 << 0)
+
+#define IMCAAR				0x0004
+
+#define IMTTBCR				0x0008
+#define IMTTBCR_EAE			(1 << 31)
+#define IMTTBCR_PMB			(1 << 30)
+#define IMTTBCR_SH1_NON_SHAREABLE	(0 << 28)
+#define IMTTBCR_SH1_OUTER_SHAREABLE	(2 << 28)
+#define IMTTBCR_SH1_INNER_SHAREABLE	(3 << 28)
+#define IMTTBCR_SH1_MASK		(3 << 28)
+#define IMTTBCR_ORGN1_NC		(0 << 26)
+#define IMTTBCR_ORGN1_WB_WA		(1 << 26)
+#define IMTTBCR_ORGN1_WT		(2 << 26)
+#define IMTTBCR_ORGN1_WB		(3 << 26)
+#define IMTTBCR_ORGN1_MASK		(3 << 26)
+#define IMTTBCR_IRGN1_NC		(0 << 24)
+#define IMTTBCR_IRGN1_WB_WA		(1 << 24)
+#define IMTTBCR_IRGN1_WT		(2 << 24)
+#define IMTTBCR_IRGN1_WB		(3 << 24)
+#define IMTTBCR_IRGN1_MASK		(3 << 24)
+#define IMTTBCR_TSZ1_MASK		(7 << 16)
+#define IMTTBCR_TSZ1_SHIFT		16
+#define IMTTBCR_SH0_NON_SHAREABLE	(0 << 12)
+#define IMTTBCR_SH0_OUTER_SHAREABLE	(2 << 12)
+#define IMTTBCR_SH0_INNER_SHAREABLE	(3 << 12)
+#define IMTTBCR_SH0_MASK		(3 << 12)
+#define IMTTBCR_ORGN0_NC		(0 << 10)
+#define IMTTBCR_ORGN0_WB_WA		(1 << 10)
+#define IMTTBCR_ORGN0_WT		(2 << 10)
+#define IMTTBCR_ORGN0_WB		(3 << 10)
+#define IMTTBCR_ORGN0_MASK		(3 << 10)
+#define IMTTBCR_IRGN0_NC		(0 << 8)
+#define IMTTBCR_IRGN0_WB_WA		(1 << 8)
+#define IMTTBCR_IRGN0_WT		(2 << 8)
+#define IMTTBCR_IRGN0_WB		(3 << 8)
+#define IMTTBCR_IRGN0_MASK		(3 << 8)
+#define IMTTBCR_SL0_LVL_2		(0 << 4)
+#define IMTTBCR_SL0_LVL_1		(1 << 4)
+#define IMTTBCR_TSZ0_MASK		(7 << 0)
+#define IMTTBCR_TSZ0_SHIFT		0
+
+#define IMTTBCR_SL0_TWOBIT_LVL_3	(0 << 6)
+#define IMTTBCR_SL0_TWOBIT_LVL_2	(1 << 6)
+#define IMTTBCR_SL0_TWOBIT_LVL_1	(2 << 6)
+
+#define IMBUSCR				0x000c
+#define IMBUSCR_DVM			(1 << 2)
+#define IMBUSCR_BUSSEL_SYS		(0 << 0)
+#define IMBUSCR_BUSSEL_CCI		(1 << 0)
+#define IMBUSCR_BUSSEL_IMCAAR		(2 << 0)
+#define IMBUSCR_BUSSEL_CCI_IMCAAR	(3 << 0)
+#define IMBUSCR_BUSSEL_MASK		(3 << 0)
+
+#define IMTTLBR0			0x0010
+#define IMTTUBR0			0x0014
+#define IMTTLBR1			0x0018
+#define IMTTUBR1			0x001c
+
+#define IMSTR				0x0020
+#define IMSTR_ERRLVL_MASK		(3 << 12)
+#define IMSTR_ERRLVL_SHIFT		12
+#define IMSTR_ERRCODE_TLB_FORMAT	(1 << 8)
+#define IMSTR_ERRCODE_ACCESS_PERM	(4 << 8)
+#define IMSTR_ERRCODE_SECURE_ACCESS	(5 << 8)
+#define IMSTR_ERRCODE_MASK		(7 << 8)
+#define IMSTR_MHIT			(1 << 4)
+#define IMSTR_ABORT			(1 << 2)
+#define IMSTR_PF			(1 << 1)
+#define IMSTR_TF			(1 << 0)
+
+#define IMMAIR0				0x0028
+#define IMMAIR1				0x002c
+#define IMMAIR_ATTR_MASK		0xff
+#define IMMAIR_ATTR_DEVICE		0x04
+#define IMMAIR_ATTR_NC			0x44
+#define IMMAIR_ATTR_WBRWA		0xff
+#define IMMAIR_ATTR_SHIFT(n)		((n) << 3)
+#define IMMAIR_ATTR_IDX_NC		0
+#define IMMAIR_ATTR_IDX_WBRWA		1
+#define IMMAIR_ATTR_IDX_DEV		2
+
+#define IMEAR				0x0030
+
+#define IMPCTR				0x0200
+#define IMPSTR				0x0208
+#define IMPEAR				0x020c
+#define IMPMBA(n)			(0x0280 + ((n) * 4))
+#define IMPMBD(n)			(0x02c0 + ((n) * 4))
+
+#define IMUCTR(n)			(0x0300 + ((n) * 16))
+#define IMUCTR2(n)			(0x0600 + ((n) * 16))
+#define IMUCTR_FIXADDEN			(1 << 31)
+#define IMUCTR_FIXADD_MASK		(0xff << 16)
+#define IMUCTR_FIXADD_SHIFT		16
+#define IMUCTR_TTSEL_MMU(n)		((n) << 4)
+#define IMUCTR_TTSEL_PMB		(8 << 4)
+#define IMUCTR_TTSEL_MASK		(15 << 4)
+#define IMUCTR_FLUSH			(1 << 1)
+#define IMUCTR_MMUEN			(1 << 0)
+
+#define IMUASID(n)			(0x0308 + ((n) * 16))
+#define IMUASID2(n)			(0x0608 + ((n) * 16))
+#define IMUASID_ASID8_MASK		(0xff << 8)
+#define IMUASID_ASID8_SHIFT		8
+#define IMUASID_ASID0_MASK		(0xff << 0)
+#define IMUASID_ASID0_SHIFT		0
+
+#ifdef CONFIG_RCAR_DDR_BACKUP
+#define HW_REGISTER_BACKUP_SIZE		ARRAY_SIZE(root_pgtable0_reg)
+static struct hw_register root_pgtable0_reg[] = {
+	{"IMTTLBR0",	IMTTLBR0,	0},
+	{"IMTTUBR0",	IMTTUBR0,	0},
+	{"IMTTBCR",	IMTTBCR,	0},
+	{"IMTTLBR1",	IMTTLBR1,	0},
+	{"IMTTUBR1",	IMTTUBR1,	0},
+	{"IMMAIR0",	IMMAIR0,	0},
+	{"IMMAIR1",	IMMAIR1,	0},
+	{"IMCTR",	IMCTR,		0},
+};
+
+static struct hw_register root_pgtable1_reg[] = {
+	{"IMTTLBR0",	IMTTLBR0,	0},
+	{"IMTTUBR0",	IMTTUBR0,	0},
+	{"IMTTBCR",	IMTTBCR,	0},
+	{"IMTTLBR1",	IMTTLBR1,	0},
+	{"IMTTUBR1",	IMTTUBR1,	0},
+	{"IMMAIR0",	IMMAIR0,	0},
+	{"IMMAIR1",	IMMAIR1,	0},
+	{"IMCTR",	IMCTR,		0},
+};
+
+static struct hw_register root_pgtable2_reg[] = {
+	{"IMTTLBR0",	IMTTLBR0,	0},
+	{"IMTTUBR0",	IMTTUBR0,	0},
+	{"IMTTBCR",	IMTTBCR,	0},
+	{"IMTTLBR1",	IMTTLBR1,	0},
+	{"IMTTUBR1",	IMTTUBR1,	0},
+	{"IMMAIR0",	IMMAIR0,	0},
+	{"IMMAIR1",	IMMAIR1,	0},
+	{"IMCTR",	IMCTR,		0},
+};
+
+static struct hw_register root_pgtable3_reg[] = {
+	{"IMTTLBR0",	IMTTLBR0,	0},
+	{"IMTTUBR0",	IMTTUBR0,	0},
+	{"IMTTBCR",	IMTTBCR,	0},
+	{"IMTTLBR1",	IMTTLBR1,	0},
+	{"IMTTUBR1",	IMTTUBR1,	0},
+	{"IMMAIR0",	IMMAIR0,	0},
+	{"IMMAIR1",	IMMAIR1,	0},
+	{"IMCTR",	IMCTR,		0},
+};
+
+static struct hw_register root_pgtable4_reg[] = {
+	{"IMTTLBR0",	IMTTLBR0,	0},
+	{"IMTTUBR0",	IMTTUBR0,	0},
+	{"IMTTBCR",	IMTTBCR,	0},
+	{"IMTTLBR1",	IMTTLBR1,	0},
+	{"IMTTUBR1",	IMTTUBR1,	0},
+	{"IMMAIR0",	IMMAIR0,	0},
+	{"IMMAIR1",	IMMAIR1,	0},
+	{"IMCTR",	IMCTR,		0},
+};
+
+static struct hw_register root_pgtable5_reg[] = {
+	{"IMTTLBR0",	IMTTLBR0,	0},
+	{"IMTTUBR0",	IMTTUBR0,	0},
+	{"IMTTBCR",	IMTTBCR,	0},
+	{"IMTTLBR1",	IMTTLBR1,	0},
+	{"IMTTUBR1",	IMTTUBR1,	0},
+	{"IMMAIR0",	IMMAIR0,	0},
+	{"IMMAIR1",	IMMAIR1,	0},
+	{"IMCTR",	IMCTR,		0},
+};
+
+static struct hw_register root_pgtable6_reg[] = {
+	{"IMTTLBR0",	IMTTLBR0,	0},
+	{"IMTTUBR0",	IMTTUBR0,	0},
+	{"IMTTBCR",	IMTTBCR,	0},
+	{"IMTTLBR1",	IMTTLBR1,	0},
+	{"IMTTUBR1",	IMTTUBR1,	0},
+	{"IMMAIR0",	IMMAIR0,	0},
+	{"IMMAIR1",	IMMAIR1,	0},
+	{"IMCTR",	IMCTR,		0},
+};
+
+static struct hw_register root_pgtable7_reg[] = {
+	{"IMTTLBR0",	IMTTLBR0,	0},
+	{"IMTTUBR0",	IMTTUBR0,	0},
+	{"IMTTBCR",	IMTTBCR,	0},
+	{"IMTTLBR1",	IMTTLBR1,	0},
+	{"IMTTUBR1",	IMTTUBR1,	0},
+	{"IMMAIR0",	IMMAIR0,	0},
+	{"IMMAIR1",	IMMAIR1,	0},
+	{"IMCTR",	IMCTR,		0},
+};
+
+static struct hw_register *root_pgtable[IPMMU_CTX_MAX] = {
+	root_pgtable0_reg,
+	root_pgtable1_reg,
+	root_pgtable2_reg,
+	root_pgtable3_reg,
+	root_pgtable4_reg,
+	root_pgtable5_reg,
+	root_pgtable6_reg,
+	root_pgtable7_reg,
+};
+
+#endif
+/* -----------------------------------------------------------------------------
+ * Root device handling
+ */
+
+static bool ipmmu_is_root(struct ipmmu_vmsa_device *mmu)
+{
+	if (mmu->features->has_cache_leaf_nodes)
+		return mmu->is_leaf ? false : true;
+	else
+		return true; /* older IPMMU hardware treated as single root */
+}
+
+static struct ipmmu_vmsa_device *ipmmu_find_root(struct ipmmu_vmsa_device *leaf)
+{
+	struct ipmmu_vmsa_device *mmu = NULL;
+
+	if (ipmmu_is_root(leaf))
+		return leaf;
+
+	spin_lock(&ipmmu_devices_lock);
+
+	list_for_each_entry(mmu, &ipmmu_devices, list) {
+		if (ipmmu_is_root(mmu))
+			break;
+	}
+
+	spin_unlock(&ipmmu_devices_lock);
+	return mmu;
+}
+
+/* -----------------------------------------------------------------------------
+ * Read/Write Access
+ */
+
+static u32 ipmmu_read(struct ipmmu_vmsa_device *mmu, unsigned int offset)
+{
+	return ioread32(mmu->base + offset);
+}
+
+static void ipmmu_write(struct ipmmu_vmsa_device *mmu, unsigned int offset,
+			u32 data)
+{
+	iowrite32(data, mmu->base + offset);
+}
+
+static u32 ipmmu_ctx_read(struct ipmmu_vmsa_domain *domain, unsigned int reg)
+{
+	return ipmmu_read(domain->root, domain->context_id * IM_CTX_SIZE + reg);
+}
+
+static void ipmmu_ctx_write(struct ipmmu_vmsa_domain *domain, unsigned int reg,
+			    u32 data)
+{
+	ipmmu_write(domain->root, domain->context_id * IM_CTX_SIZE + reg, data);
+}
+
+static void ipmmu_ctx_write2(struct ipmmu_vmsa_domain *domain, unsigned int reg,
+			     u32 data)
+{
+	if (domain->mmu != domain->root)
+		ipmmu_write(domain->mmu,
+			    domain->context_id * IM_CTX_SIZE + reg, data);
+
+	ipmmu_write(domain->root, domain->context_id * IM_CTX_SIZE + reg, data);
+}
+
+/* -----------------------------------------------------------------------------
+ * TLB and microTLB Management
+ */
+
+/* Wait for any pending TLB invalidations to complete */
+static void ipmmu_tlb_sync(struct ipmmu_vmsa_domain *domain)
+{
+	unsigned int count = 0;
+
+	while (ipmmu_ctx_read(domain, IMCTR) & IMCTR_FLUSH) {
+		cpu_relax();
+		if (++count == TLB_LOOP_TIMEOUT) {
+			dev_err_ratelimited(domain->mmu->dev,
+			"TLB sync timed out -- MMU may be deadlocked\n");
+			return;
+		}
+		udelay(1);
+	}
+}
+
+static void ipmmu_tlb_invalidate(struct ipmmu_vmsa_domain *domain)
+{
+	u32 reg;
+
+	reg = ipmmu_ctx_read(domain, IMCTR);
+	reg |= IMCTR_FLUSH;
+	ipmmu_ctx_write2(domain, IMCTR, reg);
+
+	ipmmu_tlb_sync(domain);
+}
+
+/*
+ * Enable MMU translation for the microTLB.
+ */
+static void ipmmu_utlb_enable(struct ipmmu_vmsa_domain *domain,
+			      unsigned int utlb)
+{
+	struct ipmmu_vmsa_device *mmu = domain->mmu;
+	unsigned int offset;
+
+	/*
+	 * TODO: Reference-count the microTLB as several bus masters can be
+	 * connected to the same microTLB.
+	 */
+
+	/* TODO: What should we set the ASID to ? */
+	offset = (utlb < 32) ? IMUASID(utlb) : IMUASID2(utlb - 32);
+	ipmmu_write(mmu, offset, 0);
+
+	/* TODO: Do we need to flush the microTLB ? */
+	offset = (utlb < 32) ? IMUCTR(utlb) : IMUCTR2(utlb - 32);
+	ipmmu_write(mmu, offset,
+		    IMUCTR_TTSEL_MMU(domain->context_id) | IMUCTR_FLUSH |
+		    IMUCTR_MMUEN);
+}
+
+/*
+ * Disable MMU translation for the microTLB.
+ */
+static void ipmmu_utlb_disable(struct ipmmu_vmsa_domain *domain,
+			       unsigned int utlb)
+{
+	struct ipmmu_vmsa_device *mmu = domain->mmu;
+	unsigned int offset;
+
+	offset = (utlb < 32) ? IMUCTR(utlb) : IMUCTR2(utlb - 32);
+	ipmmu_write(mmu, offset, 0);
+}
+
+static void ipmmu_tlb_flush_all(void *cookie)
+{
+	struct ipmmu_vmsa_domain *domain = cookie;
+
+	ipmmu_tlb_invalidate(domain);
+}
+
+static void ipmmu_tlb_add_flush(unsigned long iova, size_t size,
+				size_t granule, bool leaf, void *cookie)
+{
+	/* The hardware doesn't support selective TLB flush. */
+}
+
+static struct iommu_gather_ops ipmmu_gather_ops = {
+	.tlb_flush_all = ipmmu_tlb_flush_all,
+	.tlb_add_flush = ipmmu_tlb_add_flush,
+	.tlb_sync = ipmmu_tlb_flush_all,
+};
+
+/* -----------------------------------------------------------------------------
+ * Domain/Context Management
+ */
+
+static int ipmmu_domain_allocate_context(struct ipmmu_vmsa_device *mmu,
+					 struct ipmmu_vmsa_domain *domain)
+{
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&mmu->lock, flags);
+
+	ret = find_first_zero_bit(mmu->ctx, mmu->num_ctx);
+	if (ret != mmu->num_ctx) {
+		mmu->domains[ret] = domain;
+		set_bit(ret, mmu->ctx);
+	} else
+		ret = -EBUSY;
+
+	spin_unlock_irqrestore(&mmu->lock, flags);
+
+	return ret;
+}
+
+static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
+{
+	u64 ttbr;
+	u32 tmp;
+	int ret;
+
+	/*
+	 * Allocate the page table operations.
+	 *
+	 * VMSA states in section B3.6.3 "Control of Secure or Non-secure memory
+	 * access, Long-descriptor format" that the NStable bit being set in a
+	 * table descriptor will result in the NStable and NS bits of all child
+	 * entries being ignored and considered as being set. The IPMMU seems
+	 * not to comply with this, as it generates a secure access page fault
+	 * if any of the NStable and NS bits isn't set when running in
+	 * non-secure mode.
+	 */
+	domain->cfg.quirks = IO_PGTABLE_QUIRK_ARM_NS;
+	domain->cfg.pgsize_bitmap = SZ_1G | SZ_2M | SZ_4K,
+	domain->cfg.ias = 32;
+	domain->cfg.oas = 40;
+	domain->cfg.tlb = &ipmmu_gather_ops;
+	domain->io_domain.geometry.aperture_end = DMA_BIT_MASK(32);
+	domain->io_domain.geometry.force_aperture = true;
+	/*
+	 * TODO: Add support for coherent walk through CCI with DVM and remove
+	 * cache handling. For now, delegate it to the io-pgtable code.
+	 */
+	domain->cfg.iommu_dev = domain->root->dev;
+
+	domain->iop = alloc_io_pgtable_ops(ARM_32_LPAE_S1, &domain->cfg,
+					   domain);
+	if (!domain->iop)
+		return -EINVAL;
+
+	/*
+	 * Find an unused context.
+	 */
+	ret = ipmmu_domain_allocate_context(domain->root, domain);
+	if (ret < 0) {
+		free_io_pgtable_ops(domain->iop);
+		return ret;
+	}
+
+	domain->context_id = ret;
+#ifdef CONFIG_RCAR_DDR_BACKUP
+	domain->root->reg_backup[ret] = root_pgtable[ret];
+#endif
+
+	/* TTBR0 */
+	ttbr = domain->cfg.arm_lpae_s1_cfg.ttbr[0];
+	ipmmu_ctx_write(domain, IMTTLBR0, ttbr);
+	ipmmu_ctx_write(domain, IMTTUBR0, ttbr >> 32);
+
+	/*
+	 * TTBCR
+	 * We use long descriptors with inner-shareable WBWA tables and allocate
+	 * the whole 32-bit VA space to TTBR0.
+	 */
+
+	if (domain->root->features->twobit_imttbcr_sl0)
+		tmp = IMTTBCR_SL0_TWOBIT_LVL_1;
+	else
+		tmp = IMTTBCR_SL0_LVL_1;
+
+	ipmmu_ctx_write(domain, IMTTBCR, IMTTBCR_EAE |
+			IMTTBCR_SH0_INNER_SHAREABLE | IMTTBCR_ORGN0_WB_WA |
+			IMTTBCR_IRGN0_WB_WA | tmp);
+
+	/* MAIR0 */
+	ipmmu_ctx_write(domain, IMMAIR0, domain->cfg.arm_lpae_s1_cfg.mair[0]);
+
+	/* IMBUSCR */
+	if (domain->root->features->setup_imbuscr)
+		ipmmu_ctx_write(domain, IMBUSCR,
+				ipmmu_ctx_read(domain, IMBUSCR) &
+				~(IMBUSCR_DVM | IMBUSCR_BUSSEL_MASK));
+	/*
+	 * IMSTR
+	 * Clear all interrupt flags.
+	 */
+	ipmmu_ctx_write(domain, IMSTR, ipmmu_ctx_read(domain, IMSTR));
+
+	/*
+	 * IMCTR
+	 * Enable the MMU and interrupt generation. The long-descriptor
+	 * translation table format doesn't use TEX remapping. Don't enable AF
+	 * software management as we have no use for it. Flush the TLB as
+	 * required when modifying the context registers.
+	 */
+	ipmmu_ctx_write2(domain, IMCTR,
+			 IMCTR_INTEN | IMCTR_FLUSH | IMCTR_MMUEN);
+
+	return 0;
+}
+
+static void ipmmu_domain_free_context(struct ipmmu_vmsa_device *mmu,
+				      unsigned int context_id)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&mmu->lock, flags);
+
+	clear_bit(context_id, mmu->ctx);
+	mmu->domains[context_id] = NULL;
+
+	spin_unlock_irqrestore(&mmu->lock, flags);
+}
+
+static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain *domain)
+{
+	/*
+	 * Disable the context. Flush the TLB as required when modifying the
+	 * context registers.
+	 *
+	 * TODO: Is TLB flush really needed ?
+	 */
+	ipmmu_ctx_write2(domain, IMCTR, IMCTR_FLUSH);
+	ipmmu_tlb_sync(domain);
+
+#ifdef CONFIG_RCAR_DDR_BACKUP
+	domain->root->reg_backup[domain->context_id] = NULL;
+#endif
+
+	ipmmu_domain_free_context(domain->root, domain->context_id);
+}
+
+/* -----------------------------------------------------------------------------
+ * Fault Handling
+ */
+
+static irqreturn_t ipmmu_domain_irq(struct ipmmu_vmsa_domain *domain)
+{
+	const u32 err_mask = IMSTR_MHIT | IMSTR_ABORT | IMSTR_PF | IMSTR_TF;
+	struct ipmmu_vmsa_device *mmu = domain->mmu;
+	u32 status;
+	u32 iova;
+
+	status = ipmmu_ctx_read(domain, IMSTR);
+	if (!(status & err_mask))
+		return IRQ_NONE;
+
+	iova = ipmmu_ctx_read(domain, IMEAR);
+
+	/*
+	 * Clear the error status flags. Unlike traditional interrupt flag
+	 * registers that must be cleared by writing 1, this status register
+	 * seems to require 0. The error address register must be read before,
+	 * otherwise its value will be 0.
+	 */
+	ipmmu_ctx_write(domain, IMSTR, 0);
+
+	/* Log fatal errors. */
+	if (status & IMSTR_MHIT)
+		dev_err_ratelimited(mmu->dev, "Multiple TLB hits @0x%08x\n",
+				    iova);
+	if (status & IMSTR_ABORT)
+		dev_err_ratelimited(mmu->dev, "Page Table Walk Abort @0x%08x\n",
+				    iova);
+
+	if (!(status & (IMSTR_PF | IMSTR_TF)))
+		return IRQ_NONE;
+
+	/*
+	 * Try to handle page faults and translation faults.
+	 *
+	 * TODO: We need to look up the faulty device based on the I/O VA. Use
+	 * the IOMMU device for now.
+	 */
+	if (!report_iommu_fault(&domain->io_domain, mmu->dev, iova, 0))
+		return IRQ_HANDLED;
+
+	dev_err_ratelimited(mmu->dev,
+			    "Unhandled fault: status 0x%08x iova 0x%08x\n",
+			    status, iova);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t ipmmu_irq(int irq, void *dev)
+{
+	struct ipmmu_vmsa_device *mmu = dev;
+	irqreturn_t status = IRQ_NONE;
+	unsigned int i;
+	unsigned long flags;
+
+	spin_lock_irqsave(&mmu->lock, flags);
+
+	/*
+	 * Check interrupts for all active contexts.
+	 */
+	for (i = 0; i < mmu->num_ctx; i++) {
+		if (!mmu->domains[i])
+			continue;
+		if (ipmmu_domain_irq(mmu->domains[i]) == IRQ_HANDLED)
+			status = IRQ_HANDLED;
+	}
+
+	spin_unlock_irqrestore(&mmu->lock, flags);
+
+	return status;
+}
+
+/* -----------------------------------------------------------------------------
+ * IOMMU Operations
+ */
+
+static struct iommu_domain *__ipmmu_domain_alloc(unsigned type)
+{
+	struct ipmmu_vmsa_domain *domain;
+
+	domain = kzalloc(sizeof(*domain), GFP_KERNEL);
+	if (!domain)
+		return NULL;
+
+	spin_lock_init(&domain->lock);
+
+	return &domain->io_domain;
+}
+
+static void ipmmu_domain_free(struct iommu_domain *io_domain)
+{
+	struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
+
+	/*
+	 * Free the domain resources. We assume that all devices have already
+	 * been detached.
+	 */
+	ipmmu_domain_destroy_context(domain);
+	free_io_pgtable_ops(domain->iop);
+	kfree(domain);
+}
+
+static int ipmmu_attach_device(struct iommu_domain *io_domain,
+			       struct device *dev)
+{
+	struct ipmmu_vmsa_archdata *archdata = to_archdata(dev);
+	struct ipmmu_vmsa_device *root, *mmu = archdata->mmu;
+	struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
+	unsigned long flags;
+	unsigned int i;
+	int ret = 0;
+
+	if (!mmu) {
+		dev_err(dev, "Cannot attach to IPMMU\n");
+		return -ENXIO;
+	}
+
+	root = ipmmu_find_root(archdata->mmu);
+	if (!root) {
+		dev_err(dev, "Unable to locate root IPMMU\n");
+		return -EAGAIN;
+	}
+
+	spin_lock_irqsave(&domain->lock, flags);
+
+	if (!domain->mmu) {
+		/* The domain hasn't been used yet, initialize it. */
+		domain->mmu = mmu;
+		domain->root = root;
+		ret = ipmmu_domain_init_context(domain);
+		if (ret < 0) {
+			dev_err(dev, "Unable to initialize IPMMU context\n");
+			domain->mmu = NULL;
+		} else {
+			dev_info(dev, "Using IPMMU context %u\n",
+				 domain->context_id);
+		}
+	} else if (domain->mmu != mmu) {
+		/*
+		 * Something is wrong, we can't attach two devices using
+		 * different IOMMUs to the same domain.
+		 */
+		dev_err(dev, "Can't attach IPMMU %s to domain on IPMMU %s\n",
+			dev_name(mmu->dev), dev_name(domain->mmu->dev));
+		ret = -EINVAL;
+	} else {
+			dev_info(dev, "Reusing IPMMU context %u\n",
+				 domain->context_id);
+	}
+
+	spin_unlock_irqrestore(&domain->lock, flags);
+
+	if (ret < 0)
+		return ret;
+
+	for (i = 0; i < archdata->num_utlbs; ++i)
+		ipmmu_utlb_enable(domain, archdata->utlbs[i]);
+
+	return 0;
+}
+
+static void ipmmu_detach_device(struct iommu_domain *io_domain,
+				struct device *dev)
+{
+	struct ipmmu_vmsa_archdata *archdata = to_archdata(dev);
+	struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
+	unsigned int i;
+
+	for (i = 0; i < archdata->num_utlbs; ++i)
+		ipmmu_utlb_disable(domain, archdata->utlbs[i]);
+
+	/*
+	 * TODO: Optimize by disabling the context when no device is attached.
+	 */
+}
+
+static int ipmmu_map(struct iommu_domain *io_domain, unsigned long iova,
+		     phys_addr_t paddr, size_t size, int prot)
+{
+	struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
+
+	if (!domain)
+		return -ENODEV;
+
+	return domain->iop->map(domain->iop, iova, paddr, size, prot);
+}
+
+static size_t ipmmu_unmap(struct iommu_domain *io_domain, unsigned long iova,
+			  size_t size)
+{
+	struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
+
+	return domain->iop->unmap(domain->iop, iova, size);
+}
+
+static phys_addr_t ipmmu_iova_to_phys(struct iommu_domain *io_domain,
+				      dma_addr_t iova)
+{
+	struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
+
+	/* TODO: Is locking needed ? */
+
+	return domain->iop->iova_to_phys(domain->iop, iova);
+}
+
+static struct device *ipmmu_find_sibling_device(struct device *dev)
+{
+	struct ipmmu_vmsa_archdata *archdata = dev->archdata.iommu;
+	struct ipmmu_vmsa_archdata *sibling_archdata = NULL;
+	bool found = false;
+
+	spin_lock(&ipmmu_slave_devices_lock);
+
+	list_for_each_entry(sibling_archdata, &ipmmu_slave_devices, list) {
+		if (archdata == sibling_archdata)
+			continue;
+		if (sibling_archdata->mmu == archdata->mmu) {
+			found = true;
+			break;
+		}
+	}
+
+	spin_unlock(&ipmmu_slave_devices_lock);
+
+	return found ? sibling_archdata->dev : NULL;
+}
+
+static struct iommu_group *ipmmu_find_group(struct device *dev)
+{
+	struct iommu_group *group;
+	struct device *sibling;
+
+	sibling = ipmmu_find_sibling_device(dev);
+	if (sibling)
+		group = iommu_group_get(sibling);
+	if (!sibling || IS_ERR(group))
+		group = generic_device_group(dev);
+
+	return group;
+}
+
+static int ipmmu_find_utlbs(struct ipmmu_vmsa_device *mmu, struct device *dev,
+			    unsigned int *utlbs, unsigned int num_utlbs)
+{
+	unsigned int i;
+
+	for (i = 0; i < num_utlbs; ++i) {
+		struct of_phandle_args args;
+		int ret;
+
+		ret = of_parse_phandle_with_args(dev->of_node, "iommus",
+						 "#iommu-cells", i, &args);
+		if (ret < 0)
+			return ret;
+
+		of_node_put(args.np);
+
+		if (args.np != mmu->dev->of_node || args.args_count != 1)
+			return -EINVAL;
+
+		utlbs[i] = args.args[0];
+	}
+
+	return 0;
+}
+
+static int ipmmu_init_platform_device(struct device *dev)
+{
+	struct ipmmu_vmsa_archdata *archdata;
+	struct ipmmu_vmsa_device *mmu;
+	unsigned int *utlbs;
+#ifdef CONFIG_RCAR_DDR_BACKUP
+	unsigned int *utlbs_val, *asids_val;
+#endif
+	unsigned int i;
+	int num_utlbs;
+	int ret = -ENODEV;
+
+	/* Find the master corresponding to the device. */
+
+	num_utlbs = of_count_phandle_with_args(dev->of_node, "iommus",
+					       "#iommu-cells");
+	if (num_utlbs < 0)
+		return -ENODEV;
+
+	utlbs = kcalloc(num_utlbs, sizeof(*utlbs), GFP_KERNEL);
+	if (!utlbs)
+		return -ENOMEM;
+
+#ifdef CONFIG_RCAR_DDR_BACKUP
+	utlbs_val = kcalloc(num_utlbs, sizeof(*utlbs_val), GFP_KERNEL);
+	if (!utlbs_val)
+		return -ENOMEM;
+	asids_val = kcalloc(num_utlbs, sizeof(*asids_val), GFP_KERNEL);
+	if (!asids_val)
+		return -ENOMEM;
+#endif
+
+	spin_lock(&ipmmu_devices_lock);
+
+	list_for_each_entry(mmu, &ipmmu_devices, list) {
+		ret = ipmmu_find_utlbs(mmu, dev, utlbs, num_utlbs);
+		if (!ret) {
+			/*
+			 * TODO Take a reference to the MMU to protect
+			 * against device removal.
+			 */
+			break;
+		}
+	}
+
+	spin_unlock(&ipmmu_devices_lock);
+
+	if (ret < 0)
+		goto error;
+
+	for (i = 0; i < num_utlbs; ++i) {
+		if (utlbs[i] >= mmu->num_utlbs) {
+			ret = -EINVAL;
+			goto error;
+		}
+	}
+
+	archdata = kzalloc(sizeof(*archdata), GFP_KERNEL);
+	if (!archdata) {
+		ret = -ENOMEM;
+		goto error;
+	}
+
+	archdata->mmu = mmu;
+	archdata->utlbs = utlbs;
+#ifdef CONFIG_RCAR_DDR_BACKUP
+	archdata->utlbs_val = utlbs_val;
+	archdata->asids_val = asids_val;
+#endif
+	archdata->num_utlbs = num_utlbs;
+	archdata->dev = dev;
+	set_archdata(dev, archdata);
+	return 0;
+
+error:
+	kfree(utlbs);
+	return ret;
+}
+
+#if defined(CONFIG_ARM) && !defined(CONFIG_IOMMU_DMA)
+
+static int ipmmu_add_device(struct device *dev)
+{
+	struct ipmmu_vmsa_device *mmu = NULL;
+	struct iommu_group *group;
+	int ret;
+
+	if (to_archdata(dev)) {
+		dev_warn(dev, "IOMMU driver already assigned to device %s\n",
+			 dev_name(dev));
+		return -EINVAL;
+	}
+
+	/* Create a device group and add the device to it. */
+	group = iommu_group_alloc();
+	if (IS_ERR(group)) {
+		dev_err(dev, "Failed to allocate IOMMU group\n");
+		ret = PTR_ERR(group);
+		goto error;
+	}
+
+	ret = iommu_group_add_device(group, dev);
+	iommu_group_put(group);
+
+	if (ret < 0) {
+		dev_err(dev, "Failed to add device to IPMMU group\n");
+		group = NULL;
+		goto error;
+	}
+
+	ret = ipmmu_init_platform_device(dev);
+	if (ret < 0)
+		goto error;
+
+	/*
+	 * Create the ARM mapping, used by the ARM DMA mapping core to allocate
+	 * VAs. This will allocate a corresponding IOMMU domain.
+	 *
+	 * TODO:
+	 * - Create one mapping per context (TLB).
+	 * - Make the mapping size configurable ? We currently use a 2GB mapping
+	 *   at a 1GB offset to ensure that NULL VAs will fault.
+	 */
+	mmu = to_archdata(dev)->mmu;
+	if (!mmu->mapping) {
+		struct dma_iommu_mapping *mapping;
+
+		mapping = arm_iommu_create_mapping(&platform_bus_type,
+						   SZ_1G, SZ_2G);
+		if (IS_ERR(mapping)) {
+			dev_err(mmu->dev, "failed to create ARM IOMMU mapping\n");
+			ret = PTR_ERR(mapping);
+			goto error;
+		}
+
+		mmu->mapping = mapping;
+	}
+
+	/* Attach the ARM VA mapping to the device. */
+	ret = arm_iommu_attach_device(dev, mmu->mapping);
+	if (ret < 0) {
+		dev_err(dev, "Failed to attach device to VA mapping\n");
+		goto error;
+	}
+
+	return 0;
+
+error:
+	if (mmu)
+		arm_iommu_release_mapping(mmu->mapping);
+
+	set_archdata(dev, NULL);
+
+	if (!IS_ERR_OR_NULL(group))
+		iommu_group_remove_device(dev);
+
+	return ret;
+}
+
+static void ipmmu_remove_device(struct device *dev)
+{
+	struct ipmmu_vmsa_archdata *archdata = to_archdata(dev);
+
+	arm_iommu_detach_device(dev);
+	iommu_group_remove_device(dev);
+
+	kfree(archdata->utlbs);
+#ifdef CONFIG_RCAR_DDR_BACKUP
+	kfree(archdata->utlbs_val);
+	kfree(archdata->asids_val);
+#endif
+	kfree(archdata);
+
+	set_archdata(dev, NULL);
+}
+
+static struct iommu_domain *ipmmu_domain_alloc(unsigned type)
+{
+	if (type != IOMMU_DOMAIN_UNMANAGED)
+		return NULL;
+
+	return __ipmmu_domain_alloc(type);
+}
+
+static const struct iommu_ops ipmmu_ops = {
+	.domain_alloc = ipmmu_domain_alloc,
+	.domain_free = ipmmu_domain_free,
+	.attach_dev = ipmmu_attach_device,
+	.detach_dev = ipmmu_detach_device,
+	.map = ipmmu_map,
+	.unmap = ipmmu_unmap,
+	.map_sg = default_iommu_map_sg,
+	.iova_to_phys = ipmmu_iova_to_phys,
+	.add_device = ipmmu_add_device,
+	.remove_device = ipmmu_remove_device,
+	.pgsize_bitmap = SZ_1G | SZ_2M | SZ_4K,
+};
+
+#endif /* CONFIG_ARM && !CONFIG_IOMMU_DMA */
+
+#ifdef CONFIG_IOMMU_DMA
+
+static struct iommu_domain *ipmmu_domain_alloc_dma(unsigned type)
+{
+	struct iommu_domain *io_domain = NULL;
+
+	switch (type) {
+	case IOMMU_DOMAIN_UNMANAGED:
+		io_domain = __ipmmu_domain_alloc(type);
+		break;
+
+	case IOMMU_DOMAIN_DMA:
+		io_domain = __ipmmu_domain_alloc(type);
+		if (io_domain)
+			iommu_get_dma_cookie(io_domain);
+		break;
+	}
+
+	return io_domain;
+}
+
+static void ipmmu_domain_free_dma(struct iommu_domain *io_domain)
+{
+	switch (io_domain->type) {
+	case IOMMU_DOMAIN_DMA:
+		iommu_put_dma_cookie(io_domain);
+		/* fall-through */
+	default:
+		ipmmu_domain_free(io_domain);
+		break;
+	}
+}
+
+static int ipmmu_add_device_dma(struct device *dev)
+{
+	struct ipmmu_vmsa_archdata *archdata = dev->archdata.iommu;
+	struct iommu_group *group;
+
+	/* only accept devices with iommus property */
+	if (of_count_phandle_with_args(dev->of_node, "iommus",
+				       "#iommu-cells") < 0)
+		return -ENODEV;
+
+	group = iommu_group_get_for_dev(dev);
+	if (IS_ERR(group))
+		return PTR_ERR(group);
+
+	archdata = dev->archdata.iommu;
+	spin_lock(&ipmmu_slave_devices_lock);
+	list_add(&archdata->list, &ipmmu_slave_devices);
+	spin_unlock(&ipmmu_slave_devices_lock);
+	return 0;
+}
+
+static void ipmmu_remove_device_dma(struct device *dev)
+{
+	struct ipmmu_vmsa_archdata *archdata = dev->archdata.iommu;
+
+	spin_lock(&ipmmu_slave_devices_lock);
+	list_del(&archdata->list);
+	spin_unlock(&ipmmu_slave_devices_lock);
+
+	iommu_group_remove_device(dev);
+}
+
+static struct iommu_group *ipmmu_device_group_dma(struct device *dev)
+{
+	struct iommu_group *group;
+	int ret;
+
+	ret = ipmmu_init_platform_device(dev);
+	if (!ret)
+		group = ipmmu_find_group(dev);
+	else
+		group = ERR_PTR(ret);
+
+	return group;
+}
+
+static int ipmmu_of_xlate_dma(struct device *dev,
+			      struct of_phandle_args *spec)
+{
+	/* If the IPMMU device is disabled in DT then return error
+	 * to make sure the of_iommu code does not install ops
+	 * even though the iommu device is disabled
+	 */
+	if (!of_device_is_available(spec->np))
+		return -ENODEV;
+
+	return 0;
+}
+
+static const struct iommu_ops ipmmu_ops = {
+	.domain_alloc = ipmmu_domain_alloc_dma,
+	.domain_free = ipmmu_domain_free_dma,
+	.attach_dev = ipmmu_attach_device,
+	.detach_dev = ipmmu_detach_device,
+	.map = ipmmu_map,
+	.unmap = ipmmu_unmap,
+	.map_sg = default_iommu_map_sg,
+	.iova_to_phys = ipmmu_iova_to_phys,
+	.add_device = ipmmu_add_device_dma,
+	.remove_device = ipmmu_remove_device_dma,
+	.device_group = ipmmu_device_group_dma,
+	.pgsize_bitmap = SZ_1G | SZ_2M | SZ_4K,
+	.of_xlate = ipmmu_of_xlate_dma,
+};
+
+#endif /* CONFIG_IOMMU_DMA */
+
+/* -----------------------------------------------------------------------------
+ * Probe/remove and init
+ */
+
+static void ipmmu_device_reset(struct ipmmu_vmsa_device *mmu)
+{
+	unsigned int i;
+
+	/* Disable all contexts. */
+	for (i = 0; i < mmu->num_ctx; ++i)
+		ipmmu_write(mmu, i * IM_CTX_SIZE + IMCTR, 0);
+}
+
+static const struct ipmmu_features ipmmu_features_default = {
+	.use_ns_alias_offset = true,
+	.has_cache_leaf_nodes = false,
+	.has_eight_ctx = false,
+	.setup_imbuscr = true,
+	.twobit_imttbcr_sl0 = false,
+};
+
+static const struct ipmmu_features ipmmu_features_rcar_gen3 = {
+	.use_ns_alias_offset = false,
+	.has_cache_leaf_nodes = true,
+	.has_eight_ctx = true,
+	.setup_imbuscr = false,
+	.twobit_imttbcr_sl0 = true,
+};
+
+static const struct of_device_id ipmmu_of_ids[] = {
+	{
+		.compatible = "renesas,ipmmu-vmsa",
+		.data = &ipmmu_features_default,
+	}, {
+		.compatible = "renesas,ipmmu-r8a7795",
+		.data = &ipmmu_features_rcar_gen3,
+	}, {
+		.compatible = "renesas,ipmmu-r8a7796",
+		.data = &ipmmu_features_rcar_gen3,
+	}, {
+		/* Terminator */
+	},
+};
+
+MODULE_DEVICE_TABLE(of, ipmmu_of_ids);
+
+static int ipmmu_probe(struct platform_device *pdev)
+{
+	struct ipmmu_vmsa_device *mmu;
+	const struct of_device_id *match;
+	struct resource *res;
+	int irq;
+	int ret;
+
+	match = of_match_node(ipmmu_of_ids, pdev->dev.of_node);
+	if (!match)
+		return -EINVAL;
+
+	mmu = devm_kzalloc(&pdev->dev, sizeof(*mmu), GFP_KERNEL);
+	if (!mmu) {
+		dev_err(&pdev->dev, "cannot allocate device data\n");
+		return -ENOMEM;
+	}
+
+	mmu->dev = &pdev->dev;
+	mmu->num_utlbs = 48;
+	spin_lock_init(&mmu->lock);
+	bitmap_zero(mmu->ctx, IPMMU_CTX_MAX);
+	mmu->features = match->data;
+	dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
+
+	/* Map I/O memory and request IRQ. */
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	mmu->base = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(mmu->base))
+		return PTR_ERR(mmu->base);
+
+	/*
+	 * The IPMMU has two register banks, for secure and non-secure modes.
+	 * The bank mapped at the beginning of the IPMMU address space
+	 * corresponds to the running mode of the CPU. When running in secure
+	 * mode the non-secure register bank is also available at an offset.
+	 *
+	 * Secure mode operation isn't clearly documented and is thus currently
+	 * not implemented in the driver. Furthermore, preliminary tests of
+	 * non-secure operation with the main register bank were not successful.
+	 * Offset the registers base unconditionally to point to the non-secure
+	 * alias space for now.
+	 */
+	if (mmu->features->use_ns_alias_offset)
+		mmu->base += IM_NS_ALIAS_OFFSET;
+
+	/*
+	 * The number of contexts varies with generation and instance.
+	 * Newer SoCs get a total of 8 contexts enabled, older ones just one.
+	 */
+	if (mmu->features->has_eight_ctx)
+		mmu->num_ctx = 8;
+	else
+		mmu->num_ctx = 1;
+
+	WARN_ON(mmu->num_ctx > IPMMU_CTX_MAX);
+
+	irq = platform_get_irq(pdev, 0);
+
+	/*
+	 * Determine if this IPMMU instance is a leaf device by checking
+	 * if the renesas,ipmmu-main property exists or not.
+	 */
+	if (mmu->features->has_cache_leaf_nodes &&
+	    of_find_property(pdev->dev.of_node, "renesas,ipmmu-main", NULL))
+		mmu->is_leaf = true;
+
+	/* Root devices have mandatory IRQs */
+	if (ipmmu_is_root(mmu)) {
+		if (irq < 0) {
+			dev_err(&pdev->dev, "no IRQ found\n");
+			return irq;
+		}
+
+		ret = devm_request_irq(&pdev->dev, irq, ipmmu_irq, 0,
+				       dev_name(&pdev->dev), mmu);
+		if (ret < 0) {
+			dev_err(&pdev->dev, "failed to request IRQ %d\n", irq);
+			return ret;
+		}
+
+		ipmmu_device_reset(mmu);
+	}
+
+	/*
+	 * We can't create the ARM mapping here as it requires the bus to have
+	 * an IOMMU, which only happens when bus_set_iommu() is called in
+	 * ipmmu_init() after the probe function returns.
+	 */
+
+	spin_lock(&ipmmu_devices_lock);
+	list_add(&mmu->list, &ipmmu_devices);
+	spin_unlock(&ipmmu_devices_lock);
+
+	platform_set_drvdata(pdev, mmu);
+
+	return 0;
+}
+
+static int ipmmu_remove(struct platform_device *pdev)
+{
+	struct ipmmu_vmsa_device *mmu = platform_get_drvdata(pdev);
+
+	spin_lock(&ipmmu_devices_lock);
+	list_del(&mmu->list);
+	spin_unlock(&ipmmu_devices_lock);
+
+#if defined(CONFIG_ARM) && !defined(CONFIG_IOMMU_DMA)
+	arm_iommu_release_mapping(mmu->mapping);
+#endif
+
+	ipmmu_device_reset(mmu);
+
+	return 0;
+}
+
+#ifdef CONFIG_PM_SLEEP
+#ifdef CONFIG_RCAR_DDR_BACKUP
+static int ipmmu_utlbs_backup(struct ipmmu_vmsa_device *mmu)
+{
+	unsigned int i;
+	struct ipmmu_vmsa_device *slave_mmu = NULL;
+	struct ipmmu_vmsa_archdata *slave_dev = NULL;
+
+	pr_debug("%s: Handle UTLB backup\n", dev_name(mmu->dev));
+
+	spin_lock(&ipmmu_slave_devices_lock);
+
+	list_for_each_entry(slave_dev, &ipmmu_slave_devices, list) {
+		slave_mmu = slave_dev->mmu;
+
+		if (slave_mmu != mmu)
+			continue;
+
+		for (i = 0; i < slave_dev->num_utlbs; ++i) {
+			slave_dev->utlbs_val[i] =
+				ipmmu_read(slave_mmu,
+					IMUCTR(slave_dev->utlbs[i]));
+			slave_dev->asids_val[i] =
+				ipmmu_read(slave_mmu,
+					IMUASID(slave_dev->utlbs[i]));
+			pr_debug("%d: Backup UTLB[%d]: 0x%x, ASID[%d]: %d\n",
+				i, slave_dev->utlbs[i], slave_dev->utlbs_val[i],
+				slave_dev->utlbs[i],
+				slave_dev->asids_val[i]);
+		}
+	}
+
+	spin_unlock(&ipmmu_slave_devices_lock);
+
+	return 0;
+}
+
+static int ipmmu_utlbs_restore(struct ipmmu_vmsa_device *mmu)
+{
+	unsigned int i;
+	struct ipmmu_vmsa_device *slave_mmu = NULL;
+	struct ipmmu_vmsa_archdata *slave_dev = NULL;
+
+	pr_debug("%s: Handle UTLB restore\n", dev_name(mmu->dev));
+
+	spin_lock(&ipmmu_slave_devices_lock);
+
+	list_for_each_entry(slave_dev, &ipmmu_slave_devices, list) {
+		slave_mmu = slave_dev->mmu;
+
+		if (slave_mmu != mmu)
+			continue;
+
+		for (i = 0; i < slave_dev->num_utlbs; ++i) {
+			ipmmu_write(slave_mmu, IMUASID(slave_dev->utlbs[i]),
+					slave_dev->asids_val[i]);
+			ipmmu_write(slave_mmu,
+				IMUCTR(slave_dev->utlbs[i]),
+				(slave_dev->utlbs_val[i] | IMUCTR_FLUSH));
+			pr_debug("%d: Restore UTLB[%d]: 0x%x, ASID[%d]: %d\n",
+				i, slave_dev->utlbs[i],
+				ipmmu_read(slave_mmu,
+					IMUCTR(slave_dev->utlbs[i])),
+				slave_dev->utlbs[i],
+				ipmmu_read(slave_mmu,
+				IMUASID(slave_dev->utlbs[i])));
+		}
+	}
+
+	spin_unlock(&ipmmu_slave_devices_lock);
+
+	return 0;
+}
+
+static int ipmmu_domain_backup_context(struct ipmmu_vmsa_domain *domain)
+{
+	struct ipmmu_vmsa_device *mmu = domain->root;
+	struct hw_register *reg = mmu->reg_backup[domain->context_id];
+	unsigned int i;
+
+	pr_info("%s: Handle domain context backup\n", dev_name(mmu->dev));
+
+	for (i = 0; i < HW_REGISTER_BACKUP_SIZE; i++) {
+		reg[i].reg_data = ipmmu_ctx_read(domain, reg[i].reg_offset);
+
+		pr_info("%s: reg_data 0x%x, reg_offset 0x%x\n",
+				reg[i].reg_name,
+				reg[i].reg_data,
+				reg[i].reg_offset);
+	}
+
+	return 0;
+}
+
+static int ipmmu_domain_restore_context(struct ipmmu_vmsa_domain *domain)
+{
+	struct ipmmu_vmsa_device *mmu = domain->root;
+	struct hw_register *reg = mmu->reg_backup[domain->context_id];
+	unsigned int i;
+
+	pr_info("%s: Handle domain context restore\n", dev_name(mmu->dev));
+
+	for (i = 0; i < HW_REGISTER_BACKUP_SIZE; i++) {
+		if (reg[i].reg_offset != IMCTR) {
+			ipmmu_ctx_write(domain,
+				reg[i].reg_offset,
+				reg[i].reg_data);
+
+			pr_info("%s: reg_data 0x%x, reg_offset 0x%x\n",
+				reg[i].reg_name,
+				ipmmu_ctx_read(domain, reg[i].reg_offset),
+				reg[i].reg_offset);
+
+		} else {
+			ipmmu_ctx_write2(domain,
+				reg[i].reg_offset,
+				reg[i].reg_data | IMCTR_FLUSH);
+
+			pr_info("%s: reg_data 0x%x, reg_offset 0x%x\n",
+				reg[i].reg_name,
+				ipmmu_ctx_read(domain,
+					reg[i].reg_offset),
+				reg[i].reg_offset);
+		}
+	}
+
+	return 0;
+}
+#endif
+
+static int ipmmu_suspend(struct device *dev)
+{
+#ifdef CONFIG_RCAR_DDR_BACKUP
+	int ctx;
+	unsigned int i;
+	struct ipmmu_vmsa_device *mmu = dev_get_drvdata(dev);
+
+	pr_debug("%s: %s\n", __func__, dev_name(dev));
+
+	/* Only backup UTLB in IPMMU cache devices*/
+	if (!ipmmu_is_root(mmu))
+		ipmmu_utlbs_backup(mmu);
+
+	ctx = find_first_zero_bit(mmu->ctx, mmu->num_ctx);
+
+	for (i = 0; i < ctx; i++) {
+		pr_info("Handle ctx %d\n", i);
+		ipmmu_domain_backup_context(mmu->domains[i]);
+	}
+#endif
+
+	return 0;
+}
+
+static int ipmmu_resume(struct device *dev)
+{
+#ifdef CONFIG_RCAR_DDR_BACKUP
+	int ctx;
+	unsigned int i;
+	struct ipmmu_vmsa_device *mmu = dev_get_drvdata(dev);
+
+	pr_debug("%s: %s\n", __func__, dev_name(dev));
+
+	ctx = find_first_zero_bit(mmu->ctx, mmu->num_ctx);
+
+	for (i = 0; i < ctx; i++) {
+		pr_info("Handle ctx %d\n", i);
+		ipmmu_domain_restore_context(mmu->domains[i]);
+	}
+
+	/* Only backup UTLB in IPMMU cache devices*/
+	if (!ipmmu_is_root(mmu))
+		ipmmu_utlbs_restore(mmu);
+#endif
+
+	return 0;
+}
+
+static SIMPLE_DEV_PM_OPS(ipmmu_pm_ops,
+			ipmmu_suspend, ipmmu_resume);
+#define DEV_PM_OPS (&ipmmu_pm_ops)
+#else
+#define DEV_PM_OPS NULL
+#endif /* CONFIG_PM_SLEEP */
+
+static struct platform_driver ipmmu_driver = {
+	.driver = {
+		.name = "ipmmu-vmsa",
+		.pm	= DEV_PM_OPS,
+		.of_match_table = of_match_ptr(ipmmu_of_ids),
+	},
+	.probe = ipmmu_probe,
+	.remove	= ipmmu_remove,
+};
+
+static int __init ipmmu_init(void)
+{
+	static bool setup_done;
+	int ret;
+
+	if (setup_done)
+		return 0;
+
+	ret = platform_driver_register(&ipmmu_driver);
+	if (ret < 0)
+		return ret;
+
+#if defined(CONFIG_ARM) && !defined(CONFIG_IOMMU_DMA)
+	if (!iommu_present(&platform_bus_type))
+		bus_set_iommu(&platform_bus_type, &ipmmu_ops);
+#endif
+
+	setup_done = true;
+	return 0;
+}
+
+static void __exit ipmmu_exit(void)
+{
+	return platform_driver_unregister(&ipmmu_driver);
+}
+
+subsys_initcall(ipmmu_init);
+module_exit(ipmmu_exit);
+
+#ifdef CONFIG_IOMMU_DMA
+static int __init ipmmu_vmsa_iommu_of_setup(struct device_node *np)
+{
+	static const struct iommu_ops *ops = &ipmmu_ops;
+
+	ipmmu_init();
+
+	of_iommu_set_ops(np, (struct iommu_ops *)ops);
+	if (!iommu_present(&platform_bus_type))
+		bus_set_iommu(&platform_bus_type, ops);
+
+	return 0;
+}
+
+IOMMU_OF_DECLARE(ipmmu_vmsa_iommu_of, "renesas,ipmmu-vmsa",
+		 ipmmu_vmsa_iommu_of_setup);
+IOMMU_OF_DECLARE(ipmmu_r8a7795_iommu_of, "renesas,ipmmu-r8a7795",
+		 ipmmu_vmsa_iommu_of_setup);
+IOMMU_OF_DECLARE(ipmmu_r8a7796_iommu_of, "renesas,ipmmu-r8a7796",
+		 ipmmu_vmsa_iommu_of_setup);
+#endif
+
+MODULE_DESCRIPTION("IOMMU API for Renesas VMSA-compatible IPMMU");
+MODULE_AUTHOR("Laurent Pinchart <laurent.pinchart@ideasonboard.com>");
+MODULE_LICENSE("GPL v2");
-- 
2.7.4




* [RFC PATCH v1 2/7] iommu/arm: ipmmu-vmsa: Add Xen changes for main driver
  2017-07-26 15:09 [RFC PATCH v1 0/7] IPMMU-VMSA support on ARM Oleksandr Tyshchenko
  2017-07-26 15:09 ` [RFC PATCH v1 1/7] iommu/arm: ipmmu-vmsa: Add IPMMU-VMSA support Oleksandr Tyshchenko
@ 2017-07-26 15:09 ` Oleksandr Tyshchenko
  2017-08-08 11:34   ` Julien Grall
  2017-07-26 15:10 ` [RFC PATCH v1 3/7] iommu/arm: ipmmu-vmsa: Add io-pgtables support Oleksandr Tyshchenko
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Oleksandr Tyshchenko @ 2017-07-26 15:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Oleksandr Tyshchenko, Julien Grall, Stefano Stabellini

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Modify the Linux IPMMU driver to be functional inside Xen.
All devices within a single Xen domain must use the same
IOMMU context no matter which IOMMU domains they are attached to.
This is the main difference between the Linux and Xen drivers.
Having 8 separate contexts allows us to pass devices through
to up to 8 guest domains at the same time.
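
As a rough illustration of that model (a standalone sketch with made-up
names, not the code added by this patch): every physical IPMMU provides
IPMMU_CTX_MAX (8) hardware contexts, and all devices owned by one Xen
domain are attached to the single context reserved for that domain on a
given IPMMU, which is why at most 8 guest domains can be served at once:

#include <stdio.h>

#define IPMMU_CTX_MAX 8

/* One hardware IPMMU instance: ctx_owner[i] records which Xen domain
 * (by id) owns hardware context i, or -1 if the context is free. */
struct toy_ipmmu {
	int ctx_owner[IPMMU_CTX_MAX];
};

/* Attach a device of domain 'domid': reuse the domain's existing context
 * on this IPMMU if it already has one, otherwise allocate a free context. */
static int toy_attach(struct toy_ipmmu *mmu, int domid)
{
	int i, free_ctx = -1;

	for (i = 0; i < IPMMU_CTX_MAX; i++) {
		if (mmu->ctx_owner[i] == domid)
			return i;		/* all of this domain's devices share it */
		if (mmu->ctx_owner[i] < 0 && free_ctx < 0)
			free_ctx = i;
	}
	if (free_ctx < 0)
		return -1;			/* a 9th domain would get no context */

	mmu->ctx_owner[free_ctx] = domid;
	return free_ctx;
}

int main(void)
{
	struct toy_ipmmu mmu;
	int i;

	for (i = 0; i < IPMMU_CTX_MAX; i++)
		mmu.ctx_owner[i] = -1;

	/* Two devices of domain 1 end up in the same context... */
	printf("dom1 dev A -> ctx %d\n", toy_attach(&mmu, 1));
	printf("dom1 dev B -> ctx %d\n", toy_attach(&mmu, 1));
	/* ...while a device of domain 2 gets a context of its own. */
	printf("dom2 dev C -> ctx %d\n", toy_attach(&mmu, 2));
	return 0;
}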

Also wrap the following code in #if 0:
- All DMA related stuff
- Linux PM callbacks
- Driver remove callback
- iommu_group management

It might be more correct to move the various Linux-to-Xen wrappers,
defines and helpers shared between the IPMMU-VMSA and SMMU drivers
into a common file before introducing the IPMMU-VMSA patch series.
Such a common file could then be reused by possible future IOMMUs on ARM.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
---
 xen/drivers/passthrough/arm/ipmmu-vmsa.c | 984 +++++++++++++++++++++++++++++--
 1 file changed, 948 insertions(+), 36 deletions(-)

diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
index 2b380ff..e54b507 100644
--- a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
+++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
@@ -6,31 +6,212 @@
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; version 2 of the License.
+ *
+ * Based on Linux drivers/iommu/ipmmu-vmsa.c
+ * => commit f4747eba89c9b5d90fdf0a5458866283c47395d8
+ * (iommu/ipmmu-vmsa: Restrict IOMMU Domain Geometry to 32-bit address space)
+ *
+ * Xen modification:
+ * Oleksandr Tyshchenko <Oleksandr_Tyshchenko@epam.com>
+ * Copyright (C) 2016-2017 EPAM Systems Inc.
  */
 
-#include <linux/bitmap.h>
-#include <linux/delay.h>
-#include <linux/dma-iommu.h>
-#include <linux/dma-mapping.h>
-#include <linux/err.h>
-#include <linux/export.h>
-#include <linux/interrupt.h>
-#include <linux/io.h>
-#include <linux/iommu.h>
-#include <linux/module.h>
-#include <linux/of.h>
-#include <linux/of_iommu.h>
-#include <linux/platform_device.h>
-#include <linux/sizes.h>
-#include <linux/slab.h>
-
-#if defined(CONFIG_ARM) && !defined(CONFIG_IOMMU_DMA)
-#include <asm/dma-iommu.h>
-#include <asm/pgalloc.h>
-#endif
+#include <xen/config.h>
+#include <xen/delay.h>
+#include <xen/errno.h>
+#include <xen/err.h>
+#include <xen/irq.h>
+#include <xen/lib.h>
+#include <xen/list.h>
+#include <xen/mm.h>
+#include <xen/vmap.h>
+#include <xen/rbtree.h>
+#include <xen/sched.h>
+#include <xen/sizes.h>
+#include <asm/atomic.h>
+#include <asm/device.h>
+#include <asm/io.h>
+#include <asm/platform.h>
 
 #include "io-pgtable.h"
 
+/* TODO:
+ * 1. Optimize xen_domain->lock usage.
+ * 2. Show domain_id in every printk which is per Xen domain.
+ *
+ */
+
+/***** Start of Xen specific code *****/
+
+#define IOMMU_READ	(1 << 0)
+#define IOMMU_WRITE	(1 << 1)
+#define IOMMU_CACHE	(1 << 2) /* DMA cache coherency */
+#define IOMMU_NOEXEC	(1 << 3)
+#define IOMMU_MMIO	(1 << 4) /* e.g. things like MSI doorbells */
+
+#define __fls(x) (fls(x) - 1)
+#define __ffs(x) (ffs(x) - 1)
+
+#define IO_PGTABLE_QUIRK_ARM_NS		BIT(0)
+
+#define ioread32 readl
+#define iowrite32 writel
+
+#define dev_info dev_notice
+
+#define devm_request_irq(unused, irq, func, flags, name, dev) \
+	request_irq(irq, flags, func, name, dev)
+
+/* Alias to Xen device tree helpers */
+#define device_node dt_device_node
+#define of_phandle_args dt_phandle_args
+#define of_device_id dt_device_match
+#define of_match_node dt_match_node
+#define of_parse_phandle_with_args dt_parse_phandle_with_args
+#define of_find_property dt_find_property
+#define of_count_phandle_with_args dt_count_phandle_with_args
+
+/* Xen: Helpers to get device MMIO and IRQs */
+struct resource
+{
+	u64 addr;
+	u64 size;
+	unsigned int type;
+};
+
+#define resource_size(res) (res)->size;
+
+#define platform_device dt_device_node
+
+#define IORESOURCE_MEM 0
+#define IORESOURCE_IRQ 1
+
+static struct resource *platform_get_resource(struct platform_device *pdev,
+					      unsigned int type,
+					      unsigned int num)
+{
+	/*
+	 * The resource is only used between 2 calls of platform_get_resource.
+	 * It's quite ugly but it avoids adding too much code to the part
+	 * imported from Linux.
+	 */
+	static struct resource res;
+	int ret = 0;
+
+	res.type = type;
+
+	switch (type) {
+	case IORESOURCE_MEM:
+		ret = dt_device_get_address(pdev, num, &res.addr, &res.size);
+
+		return ((ret) ? NULL : &res);
+
+	case IORESOURCE_IRQ:
+		ret = platform_get_irq(pdev, num);
+		if (ret < 0)
+			return NULL;
+
+		res.addr = ret;
+		res.size = 1;
+
+		return &res;
+
+	default:
+		return NULL;
+	}
+}
+
+enum irqreturn {
+	IRQ_NONE	= (0 << 0),
+	IRQ_HANDLED	= (1 << 0),
+};
+
+typedef enum irqreturn irqreturn_t;
+
+/* Device logger functions */
+#define dev_print(dev, lvl, fmt, ...)						\
+	 printk(lvl "ipmmu: %s: " fmt, dt_node_full_name(dev_to_dt(dev)), ## __VA_ARGS__)
+
+#define dev_dbg(dev, fmt, ...) dev_print(dev, XENLOG_DEBUG, fmt, ## __VA_ARGS__)
+#define dev_notice(dev, fmt, ...) dev_print(dev, XENLOG_INFO, fmt, ## __VA_ARGS__)
+#define dev_warn(dev, fmt, ...) dev_print(dev, XENLOG_WARNING, fmt, ## __VA_ARGS__)
+#define dev_err(dev, fmt, ...) dev_print(dev, XENLOG_ERR, fmt, ## __VA_ARGS__)
+
+#define dev_err_ratelimited(dev, fmt, ...)					\
+	 dev_print(dev, XENLOG_ERR, fmt, ## __VA_ARGS__)
+
+#define dev_name(dev) dt_node_full_name(dev_to_dt(dev))
+
+/* Alias to Xen allocation helpers */
+#define kfree xfree
+#define kmalloc(size, flags)		_xmalloc(size, sizeof(void *))
+#define kzalloc(size, flags)		_xzalloc(size, sizeof(void *))
+#define devm_kzalloc(dev, size, flags)	_xzalloc(size, sizeof(void *))
+#define kmalloc_array(size, n, flags)	_xmalloc_array(size, sizeof(void *), n)
+#define kcalloc(size, n, flags)		_xzalloc_array(size, sizeof(void *), n)
+
+static void __iomem *devm_ioremap_resource(struct device *dev,
+					   struct resource *res)
+{
+	void __iomem *ptr;
+
+	if (!res || res->type != IORESOURCE_MEM) {
+		dev_err(dev, "Invalid resource\n");
+		return ERR_PTR(-EINVAL);
+	}
+
+	ptr = ioremap_nocache(res->addr, res->size);
+	if (!ptr) {
+		dev_err(dev,
+			"ioremap failed (addr 0x%"PRIx64" size 0x%"PRIx64")\n",
+			res->addr, res->size);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return ptr;
+}
+
+/* Xen doesn't handle IOMMU fault */
+#define report_iommu_fault(...)	1
+
+#define MODULE_DEVICE_TABLE(type, name)
+#define module_param_named(name, value, type, perm)
+#define MODULE_PARM_DESC(_parm, desc)
+
+/* Xen: Dummy iommu_domain */
+struct iommu_domain
+{
+	atomic_t ref;
+	/* Used to link iommu_domain contexts for the same domain.
+	 * There is at least one per IPMMU used by the domain.
+	 */
+	struct list_head		list;
+};
+
+/* Xen: Describes the information required for a Xen domain */
+struct ipmmu_vmsa_xen_domain {
+	spinlock_t			lock;
+	/* List of context (i.e iommu_domain) associated to this domain */
+	struct list_head		contexts;
+	struct iommu_domain		*base_context;
+};
+
+/*
+ * Xen: Information about each device stored in dev->archdata.iommu
+ *
+ * On Linux, dev->archdata.iommu only stores the arch-specific information,
+ * but on Xen we also have to store the iommu domain.
+ */
+struct ipmmu_vmsa_xen_device {
+	struct iommu_domain *domain;
+	struct ipmmu_vmsa_archdata *archdata;
+};
+
+#define dev_iommu(dev) ((struct ipmmu_vmsa_xen_device *)dev->archdata.iommu)
+#define dev_iommu_domain(dev) (dev_iommu(dev)->domain)
+
+/***** Start of Linux IPMMU code *****/
+
 #define IPMMU_CTX_MAX 8
 
 struct ipmmu_features {
@@ -64,7 +245,9 @@ struct ipmmu_vmsa_device {
 	struct hw_register *reg_backup[IPMMU_CTX_MAX];
 #endif
 
+#if 0 /* Xen: Not needed */
 	struct dma_iommu_mapping *mapping;
+#endif
 };
 
 struct ipmmu_vmsa_domain {
@@ -77,6 +260,9 @@ struct ipmmu_vmsa_domain {
 
 	unsigned int context_id;
 	spinlock_t lock;			/* Protects mappings */
+
+	/* Xen: Domain associated to this configuration */
+	struct domain *d;
 };
 
 struct ipmmu_vmsa_archdata {
@@ -94,14 +280,20 @@ struct ipmmu_vmsa_archdata {
 static DEFINE_SPINLOCK(ipmmu_devices_lock);
 static LIST_HEAD(ipmmu_devices);
 
+#if 0 /* Xen: Not needed */
 static DEFINE_SPINLOCK(ipmmu_slave_devices_lock);
 static LIST_HEAD(ipmmu_slave_devices);
+#endif
 
 static struct ipmmu_vmsa_domain *to_vmsa_domain(struct iommu_domain *dom)
 {
 	return container_of(dom, struct ipmmu_vmsa_domain, io_domain);
 }
 
+/*
+ * Xen: Rewrite the Linux helpers to manipulate archdata on Xen.
+ */
+#if 0
 #if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
 static struct ipmmu_vmsa_archdata *to_archdata(struct device *dev)
 {
@@ -120,6 +312,16 @@ static void set_archdata(struct device *dev, struct ipmmu_vmsa_archdata *p)
 {
 }
 #endif
+#else
+static struct ipmmu_vmsa_archdata *to_archdata(struct device *dev)
+{
+	return dev_iommu(dev)->archdata;
+}
+static void set_archdata(struct device *dev, struct ipmmu_vmsa_archdata *p)
+{
+	dev_iommu(dev)->archdata = p;
+}
+#endif
 
 #define TLB_LOOP_TIMEOUT		100	/* 100us */
 
@@ -355,6 +557,10 @@ static struct hw_register *root_pgtable[IPMMU_CTX_MAX] = {
 
 static bool ipmmu_is_root(struct ipmmu_vmsa_device *mmu)
 {
+	/* Xen: Fix - guard against a NULL mmu */
+	if (!mmu)
+		return false;
+
 	if (mmu->features->has_cache_leaf_nodes)
 		return mmu->is_leaf ? false : true;
 	else
@@ -405,14 +611,28 @@ static void ipmmu_ctx_write(struct ipmmu_vmsa_domain *domain, unsigned int reg,
 	ipmmu_write(domain->root, domain->context_id * IM_CTX_SIZE + reg, data);
 }
 
-static void ipmmu_ctx_write2(struct ipmmu_vmsa_domain *domain, unsigned int reg,
+/* Xen: Write the context for cache IPMMU only. */
+static void ipmmu_ctx_write1(struct ipmmu_vmsa_domain *domain, unsigned int reg,
 			     u32 data)
 {
 	if (domain->mmu != domain->root)
-		ipmmu_write(domain->mmu,
-			    domain->context_id * IM_CTX_SIZE + reg, data);
+		ipmmu_write(domain->mmu, domain->context_id * IM_CTX_SIZE + reg, data);
+}
 
-	ipmmu_write(domain->root, domain->context_id * IM_CTX_SIZE + reg, data);
+/*
+ * Xen: Write the context for both the root IPMMU and all cache IPMMUs
+ * assigned to this Xen domain.
+ */
+static void ipmmu_ctx_write2(struct ipmmu_vmsa_domain *domain, unsigned int reg,
+			     u32 data)
+{
+	struct ipmmu_vmsa_xen_domain *xen_domain = dom_iommu(domain->d)->arch.priv;
+	struct iommu_domain *io_domain;
+
+	list_for_each_entry(io_domain, &xen_domain->contexts, list)
+		ipmmu_ctx_write1(to_vmsa_domain(io_domain), reg, data);
+
+	ipmmu_ctx_write(domain, reg, data);
 }
 
 /* -----------------------------------------------------------------------------
@@ -488,6 +708,10 @@ static void ipmmu_tlb_flush_all(void *cookie)
 {
 	struct ipmmu_vmsa_domain *domain = cookie;
 
+	/* Xen: Just return if context_id has non-existent value */
+	if (domain->context_id >= domain->root->num_ctx)
+		return;
+
 	ipmmu_tlb_invalidate(domain);
 }
 
@@ -549,8 +773,10 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
 	domain->cfg.ias = 32;
 	domain->cfg.oas = 40;
 	domain->cfg.tlb = &ipmmu_gather_ops;
+#if 0 /* Xen: Not needed */
 	domain->io_domain.geometry.aperture_end = DMA_BIT_MASK(32);
 	domain->io_domain.geometry.force_aperture = true;
+#endif
 	/*
 	 * TODO: Add support for coherent walk through CCI with DVM and remove
 	 * cache handling. For now, delegate it to the io-pgtable code.
@@ -562,6 +788,9 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
 	if (!domain->iop)
 		return -EINVAL;
 
+	/* Xen: Initialize context_id with non-existent value */
+	domain->context_id = domain->root->num_ctx;
+
 	/*
 	 * Find an unused context.
 	 */
@@ -578,6 +807,11 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
 
 	/* TTBR0 */
 	ttbr = domain->cfg.arm_lpae_s1_cfg.ttbr[0];
+
+	/* Xen: */
+	dev_notice(domain->root->dev, "d%d: Set IPMMU context %u (pgd 0x%"PRIx64")\n",
+			domain->d->domain_id, domain->context_id, ttbr);
+
 	ipmmu_ctx_write(domain, IMTTLBR0, ttbr);
 	ipmmu_ctx_write(domain, IMTTUBR0, ttbr >> 32);
 
@@ -616,8 +850,9 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
 	 * translation table format doesn't use TEX remapping. Don't enable AF
 	 * software management as we have no use for it. Flush the TLB as
 	 * required when modifying the context registers.
+	 * Xen: Enable the context for the root IPMMU only.
 	 */
-	ipmmu_ctx_write2(domain, IMCTR,
+	ipmmu_ctx_write(domain, IMCTR,
 			 IMCTR_INTEN | IMCTR_FLUSH | IMCTR_MMUEN);
 
 	return 0;
@@ -638,13 +873,18 @@ static void ipmmu_domain_free_context(struct ipmmu_vmsa_device *mmu,
 
 static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain *domain)
 {
+	/* Xen: Just return if context_id has non-existent value */
+	if (domain->context_id >= domain->root->num_ctx)
+		return;
+
 	/*
 	 * Disable the context. Flush the TLB as required when modifying the
 	 * context registers.
 	 *
 	 * TODO: Is TLB flush really needed ?
+	 * Xen: Disable the context for the root IPMMU only.
 	 */
-	ipmmu_ctx_write2(domain, IMCTR, IMCTR_FLUSH);
+	ipmmu_ctx_write(domain, IMCTR, IMCTR_FLUSH);
 	ipmmu_tlb_sync(domain);
 
 #ifdef CONFIG_RCAR_DDR_BACKUP
@@ -652,12 +892,16 @@ static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain *domain)
 #endif
 
 	ipmmu_domain_free_context(domain->root, domain->context_id);
+
+	/* Xen: Initialize context_id with non-existent value */
+	domain->context_id = domain->root->num_ctx;
 }
 
 /* -----------------------------------------------------------------------------
  * Fault Handling
  */
 
+/* Xen: Show domain_id in every printk */
 static irqreturn_t ipmmu_domain_irq(struct ipmmu_vmsa_domain *domain)
 {
 	const u32 err_mask = IMSTR_MHIT | IMSTR_ABORT | IMSTR_PF | IMSTR_TF;
@@ -681,11 +925,11 @@ static irqreturn_t ipmmu_domain_irq(struct ipmmu_vmsa_domain *domain)
 
 	/* Log fatal errors. */
 	if (status & IMSTR_MHIT)
-		dev_err_ratelimited(mmu->dev, "Multiple TLB hits @0x%08x\n",
-				    iova);
+		dev_err_ratelimited(mmu->dev, "d%d: Multiple TLB hits @0x%08x\n",
+				domain->d->domain_id, iova);
 	if (status & IMSTR_ABORT)
-		dev_err_ratelimited(mmu->dev, "Page Table Walk Abort @0x%08x\n",
-				    iova);
+		dev_err_ratelimited(mmu->dev, "d%d: Page Table Walk Abort @0x%08x\n",
+				domain->d->domain_id, iova);
 
 	if (!(status & (IMSTR_PF | IMSTR_TF)))
 		return IRQ_NONE;
@@ -700,8 +944,8 @@ static irqreturn_t ipmmu_domain_irq(struct ipmmu_vmsa_domain *domain)
 		return IRQ_HANDLED;
 
 	dev_err_ratelimited(mmu->dev,
-			    "Unhandled fault: status 0x%08x iova 0x%08x\n",
-			    status, iova);
+			"d%d: Unhandled fault: status 0x%08x iova 0x%08x\n",
+			domain->d->domain_id, status, iova);
 
 	return IRQ_HANDLED;
 }
@@ -730,6 +974,16 @@ static irqreturn_t ipmmu_irq(int irq, void *dev)
 	return status;
 }
 
+/* Xen: Interrupt handler wrapper */
+static void ipmmu_irq_xen(int irq, void *dev,
+				      struct cpu_user_regs *regs)
+{
+	ipmmu_irq(irq, dev);
+}
+
+#define ipmmu_irq ipmmu_irq_xen
+
+#if 0 /* Xen: Not needed */
 /* -----------------------------------------------------------------------------
  * IOMMU Operations
  */
@@ -759,6 +1013,7 @@ static void ipmmu_domain_free(struct iommu_domain *io_domain)
 	free_io_pgtable_ops(domain->iop);
 	kfree(domain);
 }
+#endif
 
 static int ipmmu_attach_device(struct iommu_domain *io_domain,
 			       struct device *dev)
@@ -787,7 +1042,20 @@ static int ipmmu_attach_device(struct iommu_domain *io_domain,
 		/* The domain hasn't been used yet, initialize it. */
 		domain->mmu = mmu;
 		domain->root = root;
+
+/*
+ * Xen: We have already initialized and enabled the context for the root IPMMU
+ * for this Xen domain. Enable the context for the given cache IPMMU only.
+ * Flush the TLB as required when modifying the context registers.
+ */
+#if 0
 		ret = ipmmu_domain_init_context(domain);
+#endif
+		ipmmu_ctx_write1(domain, IMCTR,
+				ipmmu_ctx_read(domain, IMCTR) | IMCTR_FLUSH);
+
+		dev_info(dev, "Using IPMMU context %u\n", domain->context_id);
+#if 0 /* Xen: Not needed */
 		if (ret < 0) {
 			dev_err(dev, "Unable to initialize IPMMU context\n");
 			domain->mmu = NULL;
@@ -795,6 +1063,7 @@ static int ipmmu_attach_device(struct iommu_domain *io_domain,
 			dev_info(dev, "Using IPMMU context %u\n",
 				 domain->context_id);
 		}
+#endif
 	} else if (domain->mmu != mmu) {
 		/*
 		 * Something is wrong, we can't attach two devices using
@@ -834,6 +1103,14 @@ static void ipmmu_detach_device(struct iommu_domain *io_domain,
 	 */
 }
 
+/*
+ * Xen: The current implementation of these callbacks is insufficient for us
+ * since they are intended to be called from the Linux IOMMU core, which has
+ * already done all the required work such as performing various checks,
+ * splitting requests into memory blocks the hardware supports, and so on.
+ * So, override them with more complete functions.
+ */
+#if 0
 static int ipmmu_map(struct iommu_domain *io_domain, unsigned long iova,
 		     phys_addr_t paddr, size_t size, int prot)
 {
@@ -862,7 +1139,177 @@ static phys_addr_t ipmmu_iova_to_phys(struct iommu_domain *io_domain,
 
 	return domain->iop->iova_to_phys(domain->iop, iova);
 }
+#endif
+
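+/*
+ * Xen: Pick the largest page size supported by the hardware that fits both
+ * the requested size and the alignment of the address(es) being mapped.
+ */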
+static size_t ipmmu_pgsize(struct iommu_domain *io_domain,
+		unsigned long addr_merge, size_t size)
+{
+	struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
+	unsigned int pgsize_idx;
+	size_t pgsize;
+
+	/* Max page size that still fits into 'size' */
+	pgsize_idx = __fls(size);
+
+	/* need to consider alignment requirements ? */
+	if (likely(addr_merge)) {
+		/* Max page size allowed by address */
+		unsigned int align_pgsize_idx = __ffs(addr_merge);
+		pgsize_idx = min(pgsize_idx, align_pgsize_idx);
+	}
+
+	/* build a mask of acceptable page sizes */
+	pgsize = (1UL << (pgsize_idx + 1)) - 1;
+
+	/* throw away page sizes not supported by the hardware */
+	pgsize &= domain->cfg.pgsize_bitmap;
+
+	/* make sure we're still sane */
+	BUG_ON(!pgsize);
+
+	/* pick the biggest page */
+	pgsize_idx = __fls(pgsize);
+	pgsize = 1UL << pgsize_idx;
+
+	return pgsize;
+}
+
+phys_addr_t ipmmu_iova_to_phys(struct iommu_domain *io_domain, dma_addr_t iova)
+{
+	struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
+
+	if (unlikely(domain->iop->iova_to_phys == NULL))
+		return 0;
+
+	return domain->iop->iova_to_phys(domain->iop, iova);
+}
+
+size_t ipmmu_unmap(struct iommu_domain *io_domain, unsigned long iova, size_t size)
+{
+	struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
+	size_t unmapped_page, unmapped = 0;
+	dma_addr_t max_iova;
+	unsigned int min_pagesz;
+
+	if (unlikely(domain->iop->unmap == NULL ||
+			domain->cfg.pgsize_bitmap == 0UL))
+		return -ENODEV;
+
+	/* find out the minimum page size supported */
+	min_pagesz = 1 << __ffs(domain->cfg.pgsize_bitmap);
+
+	/*
+	 * The virtual address, as well as the size of the mapping, must be
+	 * aligned (at least) to the size of the smallest page supported
+	 * by the hardware
+	 */
+	if (!IS_ALIGNED(iova | size, min_pagesz)) {
+		printk("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
+		       iova, size, min_pagesz);
+		return -EINVAL;
+	}
 
+	/*
+	 * the sum of virtual address and size must be inside the IOVA space
+	 * that hardware supports
+	 */
+	max_iova = (1UL << domain->cfg.ias) - 1;
+	if ((dma_addr_t)iova + size > max_iova) {
+		printk("out-of-bound: iova 0x%lx + size 0x%zx > max_iova 0x%"PRIx64"\n",
+			   iova, size, max_iova);
+		/* TODO Return -EINVAL instead */
+		return 0;
+	}
+
+	/*
+	 * Keep iterating until we either unmap 'size' bytes (or more)
+	 * or we hit an area that isn't mapped.
+	 */
+	while (unmapped < size) {
+		size_t pgsize = ipmmu_pgsize(io_domain, iova, size - unmapped);
+
+		unmapped_page = domain->iop->unmap(domain->iop, iova, pgsize);
+		if (!unmapped_page)
+			break;
+
+		iova += unmapped_page;
+		unmapped += unmapped_page;
+	}
+
+	return unmapped;
+}
+
+int ipmmu_map(struct iommu_domain *io_domain, unsigned long iova,
+		phys_addr_t paddr, size_t size, int prot)
+{
+	struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
+	unsigned long orig_iova = iova;
+	dma_addr_t max_iova;
+	unsigned int min_pagesz;
+	size_t orig_size = size;
+	int ret = 0;
+
+	if (unlikely(domain->iop->map == NULL ||
+			domain->cfg.pgsize_bitmap == 0UL))
+		return -ENODEV;
+
+	/* find out the minimum page size supported */
+	min_pagesz = 1 << __ffs(domain->cfg.pgsize_bitmap);
+
+	/*
+	 * both the virtual address and the physical one, as well as
+	 * the size of the mapping, must be aligned (at least) to the
+	 * size of the smallest page supported by the hardware
+	 */
+	if (!IS_ALIGNED(iova | paddr | size, min_pagesz)) {
+		printk("unaligned: iova 0x%lx pa 0x%"PRIx64" size 0x%zx min_pagesz 0x%x\n",
+		       iova, paddr, size, min_pagesz);
+		return -EINVAL;
+	}
+
+	/*
+	 * the sum of virtual address and size must be inside the IOVA space
+	 * that hardware supports
+	 */
+	max_iova = (1UL << domain->cfg.ias) - 1;
+	if ((dma_addr_t)iova + size > max_iova) {
+		printk("out-of-bound: iova 0x%lx + size 0x%zx > max_iova 0x%"PRIx64"\n",
+		       iova, size, max_iova);
+		/* TODO Return -EINVAL instead */
+		return 0;
+	}
+
+	while (size) {
+		size_t pgsize = ipmmu_pgsize(io_domain, iova | paddr, size);
+
+		ret = domain->iop->map(domain->iop, iova, paddr, pgsize, prot);
+		if (ret == -EEXIST) {
+			phys_addr_t exist_paddr = ipmmu_iova_to_phys(io_domain, iova);
+			if (exist_paddr == paddr)
+				ret = 0;
+			else if (exist_paddr) {
+				printk("remap: iova 0x%lx pa 0x%"PRIx64" pgsize 0x%zx\n",
+						iova, paddr, pgsize);
+				ipmmu_unmap(io_domain, iova, pgsize);
+				ret = domain->iop->map(domain->iop, iova, paddr, pgsize, prot);
+			}
+		}
+		if (ret)
+			break;
+
+		iova += pgsize;
+		paddr += pgsize;
+		size -= pgsize;
+	}
+
+	/* unroll mapping in case something went wrong */
+	if (ret && orig_size != size)
+		ipmmu_unmap(io_domain, orig_iova, orig_size - size);
+
+	return ret;
+}
+
+#if 0 /* Xen: Not needed */
 static struct device *ipmmu_find_sibling_device(struct device *dev)
 {
 	struct ipmmu_vmsa_archdata *archdata = dev->archdata.iommu;
@@ -898,6 +1345,7 @@ static struct iommu_group *ipmmu_find_group(struct device *dev)
 
 	return group;
 }
+#endif
 
 static int ipmmu_find_utlbs(struct ipmmu_vmsa_device *mmu, struct device *dev,
 			    unsigned int *utlbs, unsigned int num_utlbs)
@@ -913,7 +1361,9 @@ static int ipmmu_find_utlbs(struct ipmmu_vmsa_device *mmu, struct device *dev,
 		if (ret < 0)
 			return ret;
 
+#if 0 /* Xen: Not needed */
 		of_node_put(args.np);
+#endif
 
 		if (args.np != mmu->dev->of_node || args.args_count != 1)
 			return -EINVAL;
@@ -924,6 +1374,19 @@ static int ipmmu_find_utlbs(struct ipmmu_vmsa_device *mmu, struct device *dev,
 	return 0;
 }
 
+/* Xen: To roll back actions that took place in init */
+static __maybe_unused void ipmmu_destroy_platform_device(struct device *dev)
+{
+	struct ipmmu_vmsa_archdata *archdata = to_archdata(dev);
+
+	if (!archdata)
+		return;
+
+	kfree(archdata->utlbs);
+	kfree(archdata);
+	set_archdata(dev, NULL);
+}
+
 static int ipmmu_init_platform_device(struct device *dev)
 {
 	struct ipmmu_vmsa_archdata *archdata;
@@ -996,6 +1459,11 @@ static int ipmmu_init_platform_device(struct device *dev)
 	archdata->num_utlbs = num_utlbs;
 	archdata->dev = dev;
 	set_archdata(dev, archdata);
+
+	/* Xen: */
+	dev_notice(dev, "initialized master device (IPMMU %s micro-TLBs %u)\n",
+			dev_name(mmu->dev), num_utlbs);
+
 	return 0;
 
 error:
@@ -1003,6 +1471,7 @@ error:
 	return ret;
 }
 
+#if 0 /* Xen: Not needed */
 #if defined(CONFIG_ARM) && !defined(CONFIG_IOMMU_DMA)
 
 static int ipmmu_add_device(struct device *dev)
@@ -1233,6 +1702,7 @@ static const struct iommu_ops ipmmu_ops = {
 };
 
 #endif /* CONFIG_IOMMU_DMA */
+#endif
 
 /* -----------------------------------------------------------------------------
  * Probe/remove and init
@@ -1274,12 +1744,20 @@ static const struct of_device_id ipmmu_of_ids[] = {
 		.compatible = "renesas,ipmmu-r8a7796",
 		.data = &ipmmu_features_rcar_gen3,
 	}, {
+		/* Xen: It is not clear how to deal with it */
+		.compatible = "renesas,ipmmu-pmb-r8a7795",
+		.data = NULL,
+	}, {
 		/* Terminator */
 	},
 };
 
 MODULE_DEVICE_TABLE(of, ipmmu_of_ids);
 
+/*
+ * Xen: We don't have a refcount for allocated memory, so manually free the
+ * memory when an error occurs.
+ */
 static int ipmmu_probe(struct platform_device *pdev)
 {
 	struct ipmmu_vmsa_device *mmu;
@@ -1303,13 +1781,17 @@ static int ipmmu_probe(struct platform_device *pdev)
 	spin_lock_init(&mmu->lock);
 	bitmap_zero(mmu->ctx, IPMMU_CTX_MAX);
 	mmu->features = match->data;
+#if 0 /* Xen: Not needed */
 	dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
+#endif
 
 	/* Map I/O memory and request IRQ. */
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	mmu->base = devm_ioremap_resource(&pdev->dev, res);
-	if (IS_ERR(mmu->base))
-		return PTR_ERR(mmu->base);
+	if (IS_ERR(mmu->base)) {
+		ret = PTR_ERR(mmu->base);
+		goto out;
+	}
 
 	/*
 	 * The IPMMU has two register banks, for secure and non-secure modes.
@@ -1351,14 +1833,15 @@ static int ipmmu_probe(struct platform_device *pdev)
 	if (ipmmu_is_root(mmu)) {
 		if (irq < 0) {
 			dev_err(&pdev->dev, "no IRQ found\n");
-			return irq;
+			ret = irq;
+			goto out;
 		}
 
 		ret = devm_request_irq(&pdev->dev, irq, ipmmu_irq, 0,
 				       dev_name(&pdev->dev), mmu);
 		if (ret < 0) {
 			dev_err(&pdev->dev, "failed to request IRQ %d\n", irq);
-			return ret;
+			goto out;
 		}
 
 		ipmmu_device_reset(mmu);
@@ -1374,11 +1857,25 @@ static int ipmmu_probe(struct platform_device *pdev)
 	list_add(&mmu->list, &ipmmu_devices);
 	spin_unlock(&ipmmu_devices_lock);
 
+#if 0 /* Xen: Not needed */
 	platform_set_drvdata(pdev, mmu);
+#endif
+
+	/* Xen: */
+	dev_notice(&pdev->dev, "registered %s IPMMU\n",
+		ipmmu_is_root(mmu) ? "root" : "cache");
 
 	return 0;
+
+out:
+	if (!IS_ERR(mmu->base))
+		iounmap(mmu->base);
+	kfree(mmu);
+
+	return ret;
 }
 
+#if 0 /* Xen: Not needed */
 static int ipmmu_remove(struct platform_device *pdev)
 {
 	struct ipmmu_vmsa_device *mmu = platform_get_drvdata(pdev);
@@ -1645,3 +2142,418 @@ IOMMU_OF_DECLARE(ipmmu_r8a7796_iommu_of, "renesas,ipmmu-r8a7796",
 MODULE_DESCRIPTION("IOMMU API for Renesas VMSA-compatible IPMMU");
 MODULE_AUTHOR("Laurent Pinchart <laurent.pinchart@ideasonboard.com>");
 MODULE_LICENSE("GPL v2");
+#endif
+
+/***** Start of Xen specific code *****/
+
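+/*
+ * Xen: Nothing to do here; the io-pgtable code already performs the required
+ * TLB maintenance as part of the unmap path.
+ */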
+static int __must_check ipmmu_vmsa_iotlb_flush(struct domain *d,
+		unsigned long gfn, unsigned int page_count)
+{
+	return 0;
+}
+
+static struct iommu_domain *ipmmu_vmsa_get_domain(struct domain *d,
+						struct device *dev)
+{
+	struct ipmmu_vmsa_xen_domain *xen_domain = dom_iommu(d)->arch.priv;
+	struct iommu_domain *io_domain;
+	struct ipmmu_vmsa_device *mmu;
+
+	mmu = to_archdata(dev)->mmu;
+	if (!mmu)
+		return NULL;
+
+	/*
+	 * Loop through &xen_domain->contexts to locate a context
+	 * assigned to this IPMMU.
+	 */
+	list_for_each_entry(io_domain, &xen_domain->contexts, list) {
+		if (to_vmsa_domain(io_domain)->mmu == mmu)
+			return io_domain;
+	}
+
+	return NULL;
+}
+
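+/*
+ * Xen: Unlink an iommu_domain from the Xen domain and release it. A cache
+ * IPMMU context is simply disabled, while the root context also has its
+ * hardware context and page table freed.
+ */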
+static void ipmmu_vmsa_destroy_domain(struct iommu_domain *io_domain)
+{
+	struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
+
+	list_del(&io_domain->list);
+
+	if (domain->mmu != domain->root) {
+		/*
+		 * Disable the context for cache IPMMU only. Flush the TLB as required
+		 * when modifying the context registers.
+		 */
+		ipmmu_ctx_write1(domain, IMCTR, IMCTR_FLUSH);
+	} else {
+		/*
+		 * Free main domain resources. We assume that all devices have already
+		 * been detached.
+		 */
+		ipmmu_domain_destroy_context(domain);
+		free_io_pgtable_ops(domain->iop);
+	}
+
+	kfree(domain);
+}
+
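+/*
+ * Xen: Assign a device to a Xen domain: allocate the per-device IOMMU data
+ * on first use, find (or create) the cache IPMMU context for this domain and
+ * attach the device to it.
+ */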
+static int ipmmu_vmsa_assign_dev(struct domain *d, u8 devfn,
+			       struct device *dev, u32 flag)
+{
+	struct ipmmu_vmsa_xen_domain *xen_domain = dom_iommu(d)->arch.priv;
+	struct iommu_domain *io_domain;
+	struct ipmmu_vmsa_domain *domain;
+	int ret = 0;
+
+	if (!xen_domain || !xen_domain->base_context)
+		return -EINVAL;
+
+	if (!dev->archdata.iommu) {
+		dev->archdata.iommu = xzalloc(struct ipmmu_vmsa_xen_device);
+		if (!dev->archdata.iommu)
+			return -ENOMEM;
+	}
+
+	if (!to_archdata(dev)) {
+		ret = ipmmu_init_platform_device(dev);
+		if (ret)
+			return ret;
+	}
+
+	spin_lock(&xen_domain->lock);
+
+	if (dev_iommu_domain(dev)) {
+		dev_err(dev, "already attached to IPMMU domain\n");
+		ret = -EEXIST;
+		goto out;
+	}
+
+	/*
+	 * Check to see if a context bank (iommu_domain) already exists for
+	 * this Xen domain under the same IPMMU
+	 */
+	io_domain = ipmmu_vmsa_get_domain(d, dev);
+	if (!io_domain) {
+		domain = xzalloc(struct ipmmu_vmsa_domain);
+		if (!domain) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		spin_lock_init(&domain->lock);
+
+		domain->d = d;
+		domain->context_id = to_vmsa_domain(xen_domain->base_context)->context_id;
+
+		io_domain = &domain->io_domain;
+
+		/* Chain the new context to the Xen domain */
+		list_add(&io_domain->list, &xen_domain->contexts);
+	}
+
+	ret = ipmmu_attach_device(io_domain, dev);
+	if (ret) {
+		if (io_domain->ref.counter == 0)
+			ipmmu_vmsa_destroy_domain(io_domain);
+	} else {
+		atomic_inc(&io_domain->ref);
+		dev_iommu_domain(dev) = io_domain;
+	}
+
+out:
+	spin_unlock(&xen_domain->lock);
+
+	return ret;
+}
+
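+/*
+ * Xen: Deassign a device from its Xen domain and destroy the corresponding
+ * cache IPMMU context once the last device has been detached from it.
+ */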
+static int ipmmu_vmsa_deassign_dev(struct domain *d, struct device *dev)
+{
+	struct ipmmu_vmsa_xen_domain *xen_domain = dom_iommu(d)->arch.priv;
+	struct iommu_domain *io_domain = dev_iommu_domain(dev);
+
+	if (!io_domain || to_vmsa_domain(io_domain)->d != d) {
+		dev_err(dev, "not attached to domain %d\n", d->domain_id);
+		return -ESRCH;
+	}
+
+	spin_lock(&xen_domain->lock);
+
+	ipmmu_detach_device(io_domain, dev);
+	dev_iommu_domain(dev) = NULL;
+	atomic_dec(&io_domain->ref);
+
+	if (io_domain->ref.counter == 0)
+		ipmmu_vmsa_destroy_domain(io_domain);
+
+	spin_unlock(&xen_domain->lock);
+
+	return 0;
+}
+
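+/*
+ * Xen: Move a device from domain s to domain t by deassigning and then
+ * re-assigning it. Only reassignment to the hardware domain is allowed.
+ */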
+static int ipmmu_vmsa_reassign_dev(struct domain *s, struct domain *t,
+				 u8 devfn,  struct device *dev)
+{
+	int ret = 0;
+
+	/* Don't allow remapping to a domain other than the hwdom */
+	if (t && t != hardware_domain)
+		return -EPERM;
+
+	if (t == s)
+		return 0;
+
+	ret = ipmmu_vmsa_deassign_dev(s, dev);
+	if (ret)
+		return ret;
+
+	if (t) {
+		/* No flags are defined for ARM. */
+		ret = ipmmu_vmsa_assign_dev(t, devfn, dev, 0);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
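+/*
+ * Xen: Allocate the base page table for this Xen domain by initializing a
+ * context on the root IPMMU. Cache IPMMU contexts created later reuse its
+ * context_id and therefore share this page table.
+ */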
+static int ipmmu_vmsa_alloc_page_table(struct domain *d)
+{
+	struct ipmmu_vmsa_xen_domain *xen_domain = dom_iommu(d)->arch.priv;
+	struct ipmmu_vmsa_domain *domain;
+	struct ipmmu_vmsa_device *root;
+	int ret;
+
+	if (xen_domain->base_context)
+		return 0;
+
+	root = ipmmu_find_root(NULL);
+	if (!root) {
+		printk("d%d: Unable to locate root IPMMU\n", d->domain_id);
+		return -EAGAIN;
+	}
+
+	domain = xzalloc(struct ipmmu_vmsa_domain);
+	if (!domain)
+		return -ENOMEM;
+
+	spin_lock_init(&domain->lock);
+	INIT_LIST_HEAD(&domain->io_domain.list);
+	domain->d = d;
+	domain->root = root;
+
+	spin_lock(&xen_domain->lock);
+	ret = ipmmu_domain_init_context(domain);
+	if (ret < 0) {
+		dev_err(root->dev, "d%d: Unable to initialize IPMMU context\n",
+				d->domain_id);
+		spin_unlock(&xen_domain->lock);
+		xfree(domain);
+		return ret;
+	}
+	xen_domain->base_context = &domain->io_domain;
+	spin_unlock(&xen_domain->lock);
+
+	return 0;
+}
+
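+/*
+ * Xen: Set up the per-domain IOMMU state. The base page table is allocated
+ * right away if the domain is going to use the IOMMU from the start.
+ */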
+static int ipmmu_vmsa_domain_init(struct domain *d, bool use_iommu)
+{
+	struct ipmmu_vmsa_xen_domain *xen_domain;
+
+	xen_domain = xzalloc(struct ipmmu_vmsa_xen_domain);
+	if (!xen_domain)
+		return -ENOMEM;
+
+	spin_lock_init(&xen_domain->lock);
+	INIT_LIST_HEAD(&xen_domain->contexts);
+
+	dom_iommu(d)->arch.priv = xen_domain;
+
+	if (use_iommu) {
+		int ret = ipmmu_vmsa_alloc_page_table(d);
+
+		if (ret) {
+			xfree(xen_domain);
+			dom_iommu(d)->arch.priv = NULL;
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
+static void __hwdom_init ipmmu_vmsa_hwdom_init(struct domain *d)
+{
+}
+
+static void ipmmu_vmsa_domain_teardown(struct domain *d)
+{
+	struct ipmmu_vmsa_xen_domain *xen_domain = dom_iommu(d)->arch.priv;
+
+	if (!xen_domain)
+		return;
+
+	spin_lock(&xen_domain->lock);
+	if (xen_domain->base_context) {
+		ipmmu_vmsa_destroy_domain(xen_domain->base_context);
+		xen_domain->base_context = NULL;
+	}
+	spin_unlock(&xen_domain->lock);
+
+	ASSERT(list_empty(&xen_domain->contexts));
+	xfree(xen_domain);
+	dom_iommu(d)->arch.priv = NULL;
+}
+
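+/*
+ * Xen: Map a block of 2^order pages (gfn -> mfn) into the domain's base
+ * IPMMU page table with the requested access permissions.
+ */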
+static int __must_check ipmmu_vmsa_map_pages(struct domain *d,
+		unsigned long gfn, unsigned long mfn, unsigned int order,
+		unsigned int flags)
+{
+	struct ipmmu_vmsa_xen_domain *xen_domain = dom_iommu(d)->arch.priv;
+	size_t size = PAGE_SIZE * (1UL << order);
+	int ret, prot = 0;
+
+	if (!xen_domain || !xen_domain->base_context)
+		return -EINVAL;
+
+	if (flags & IOMMUF_writable)
+		prot |= IOMMU_WRITE;
+	if (flags & IOMMUF_readable)
+		prot |= IOMMU_READ;
+
+	spin_lock(&xen_domain->lock);
+	ret = ipmmu_map(xen_domain->base_context, pfn_to_paddr(gfn),
+			pfn_to_paddr(mfn), size, prot);
+	spin_unlock(&xen_domain->lock);
+
+	return ret;
+}
+
+static int __must_check ipmmu_vmsa_unmap_pages(struct domain *d,
+		unsigned long gfn, unsigned int order)
+{
+	struct ipmmu_vmsa_xen_domain *xen_domain = dom_iommu(d)->arch.priv;
+	size_t ret, size = PAGE_SIZE * (1UL << order);
+
+	if (!xen_domain || !xen_domain->base_context)
+		return -EINVAL;
+
+	spin_lock(&xen_domain->lock);
+	ret = ipmmu_unmap(xen_domain->base_context, pfn_to_paddr(gfn), size);
+	spin_unlock(&xen_domain->lock);
+
+	/*
+	 * We don't check how many bytes were actually unmapped. Otherwise we
+	 * would have to raise an error every time we hit an area that isn't
+	 * mapped, and the p2m code may legitimately try to unmap the same page
+	 * twice, which could lead to a crash or panic. We think it is better to
+	 * let the page table allocator warn about such cases than to bring down
+	 * the whole system.
+	 */
+	return IS_ERR_VALUE(ret) ? ret : 0;
+}
+
+static void ipmmu_vmsa_dump_p2m_table(struct domain *d)
+{
+	/* TODO: This platform callback should be implemented. */
+}
+
+static const struct iommu_ops ipmmu_vmsa_iommu_ops = {
+	.init = ipmmu_vmsa_domain_init,
+	.hwdom_init = ipmmu_vmsa_hwdom_init,
+	.teardown = ipmmu_vmsa_domain_teardown,
+	.iotlb_flush = ipmmu_vmsa_iotlb_flush,
+	.assign_device = ipmmu_vmsa_assign_dev,
+	.reassign_device = ipmmu_vmsa_reassign_dev,
+	.map_pages = ipmmu_vmsa_map_pages,
+	.unmap_pages = ipmmu_vmsa_unmap_pages,
+	.dump_p2m_table = ipmmu_vmsa_dump_p2m_table,
+};
+
+static __init const struct ipmmu_vmsa_device *find_ipmmu(const struct device *dev)
+{
+	struct ipmmu_vmsa_device *mmu;
+	bool found = false;
+
+	spin_lock(&ipmmu_devices_lock);
+	list_for_each_entry(mmu, &ipmmu_devices, list) {
+		if (mmu->dev == dev) {
+			found = true;
+			break;
+		}
+	}
+	spin_unlock(&ipmmu_devices_lock);
+
+	return (found) ? mmu : NULL;
+}
+
+static __init void populate_ipmmu_masters(const struct ipmmu_vmsa_device *mmu)
+{
+	struct dt_device_node *np;
+
+	dt_for_each_device_node(dt_host, np) {
+		if (mmu->dev->of_node != dt_parse_phandle(np, "iommus", 0))
+			continue;
+
+		/* Let Xen know that the device is protected by an IPMMU */
+		dt_device_set_protected(np);
+
+		dev_notice(mmu->dev, "found master device %s\n", dt_node_full_name(np));
+	}
+}
+
+/* TODO: What to do if we failed to init cache/root IPMMU? */
+static __init int ipmmu_vmsa_init(struct dt_device_node *dev,
+				   const void *data)
+{
+	int rc;
+	const struct ipmmu_vmsa_device *mmu;
+	static bool set_ops_done = false;
+
+	/*
+	 * Even if the device can't be initialized, we don't want to
+	 * give the IPMMU device to dom0.
+	 */
+	dt_device_set_used_by(dev, DOMID_XEN);
+
+	rc = ipmmu_probe(dev);
+	if (rc) {
+		dev_err(&dev->dev, "failed to init IPMMU\n");
+		return rc;
+	}
+
+	/*
+	 * Since the IPMMU is composed of two parts (a number of cache IPMMUs
+	 * and the root IPMMU), this function will be called more than once.
+	 * Use the flag below to avoid setting the IOMMU ops if they are already set.
+	 */
+	if (!set_ops_done) {
+		iommu_set_ops(&ipmmu_vmsa_iommu_ops);
+		set_ops_done = true;
+	}
+
+	/* Find the last IPMMU added. */
+	mmu = find_ipmmu(dt_to_dev(dev));
+	BUG_ON(mmu == NULL);
+
+	/* Mark all masters that connected to the last IPMMU as protected. */
+	populate_ipmmu_masters(mmu);
+
+	/*
+	 * The IPMMU can't utilize the P2M table since it doesn't use the same
+	 * page-table format as the CPU.
+	 */
+	if (iommu_hap_pt_share) {
+		iommu_hap_pt_share = false;
+		dev_notice(&dev->dev,
+			"disable sharing P2M table between the CPU and IPMMU\n");
+	}
+
+	return 0;
+}
+
+DT_DEVICE_START(ipmmu, "Renesas IPMMU-VMSA", DEVICE_IOMMU)
+	.dt_match = ipmmu_of_ids,
+	.init = ipmmu_vmsa_init,
+DT_DEVICE_END
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH v1 3/7] iommu/arm: ipmmu-vmsa: Add io-pgtables support
  2017-07-26 15:09 [RFC PATCH v1 0/7] IPMMU-VMSA support on ARM Oleksandr Tyshchenko
  2017-07-26 15:09 ` [RFC PATCH v1 1/7] iommu/arm: ipmmu-vmsa: Add IPMMU-VMSA support Oleksandr Tyshchenko
  2017-07-26 15:09 ` [RFC PATCH v1 2/7] iommu/arm: ipmmu-vmsa: Add Xen changes for main driver Oleksandr Tyshchenko
@ 2017-07-26 15:10 ` Oleksandr Tyshchenko
  2017-07-26 15:10 ` [RFC PATCH v1 4/7] iommu/arm: ipmmu-vmsa: Add Xen changes for io-pgtables Oleksandr Tyshchenko
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Oleksandr Tyshchenko @ 2017-07-26 15:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Oleksandr Tyshchenko, Julien Grall, Stefano Stabellini

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

The Linux IPMMU driver which is being ported to Xen relies on this
Linux framework. Moreover, as the IPMMU is a non-shared IOMMU,
we must have a way of manipulating its page table.

So, copy it as is for now to simplify things, but we will have to find
common ground on what this stuff should look like.
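
Roughly, the driver is expected to drive this allocator as follows (an
illustrative sketch only: the pgsize_bitmap value and the cookie argument
are examples, and error handling is omitted):

	struct io_pgtable_cfg cfg = {
		.pgsize_bitmap	= SZ_1G | SZ_2M | SZ_4K,
		.ias		= 32,
		.oas		= 40,
		.tlb		= &ipmmu_gather_ops,	/* driver's TLB callbacks */
	};
	struct io_pgtable_ops *ops;

	/* Allocate an ARM 32-bit LPAE stage-1 page table and get map/unmap ops. */
	ops = alloc_io_pgtable_ops(ARM_32_LPAE_S1, &cfg, cookie);

	/* Program cfg.arm_lpae_s1_cfg.ttbr[0]/tcr/mair[] into the IPMMU, then: */
	ops->map(ops, iova, paddr, SZ_4K, IOMMU_READ | IOMMU_WRITE);
	ops->unmap(ops, iova, SZ_4K);

	free_io_pgtable_ops(ops);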

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
---
 xen/drivers/passthrough/arm/io-pgtable-arm.c | 1084 ++++++++++++++++++++++++++
 xen/drivers/passthrough/arm/io-pgtable.c     |   79 ++
 xen/drivers/passthrough/arm/io-pgtable.h     |  208 +++++
 3 files changed, 1371 insertions(+)
 create mode 100644 xen/drivers/passthrough/arm/io-pgtable-arm.c
 create mode 100644 xen/drivers/passthrough/arm/io-pgtable.c
 create mode 100644 xen/drivers/passthrough/arm/io-pgtable.h

diff --git a/xen/drivers/passthrough/arm/io-pgtable-arm.c b/xen/drivers/passthrough/arm/io-pgtable-arm.c
new file mode 100644
index 0000000..f5c90e1
--- /dev/null
+++ b/xen/drivers/passthrough/arm/io-pgtable-arm.c
@@ -0,0 +1,1084 @@
+/*
+ * CPU-agnostic ARM page table allocator.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (C) 2014 ARM Limited
+ *
+ * Author: Will Deacon <will.deacon@arm.com>
+ */
+
+#define pr_fmt(fmt)	"arm-lpae io-pgtable: " fmt
+
+#include <linux/iommu.h>
+#include <linux/kernel.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/dma-mapping.h>
+
+#include <asm/barrier.h>
+
+#include "io-pgtable.h"
+
+#define ARM_LPAE_MAX_ADDR_BITS		48
+#define ARM_LPAE_S2_MAX_CONCAT_PAGES	16
+#define ARM_LPAE_MAX_LEVELS		4
+
+/* Struct accessors */
+#define io_pgtable_to_data(x)						\
+	container_of((x), struct arm_lpae_io_pgtable, iop)
+
+#define io_pgtable_ops_to_data(x)					\
+	io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
+
+/*
+ * For consistency with the architecture, we always consider
+ * ARM_LPAE_MAX_LEVELS levels, with the walk starting at level n >=0
+ */
+#define ARM_LPAE_START_LVL(d)		(ARM_LPAE_MAX_LEVELS - (d)->levels)
+
+/*
+ * Calculate the right shift amount to get to the portion describing level l
+ * in a virtual address mapped by the pagetable in d.
+ */
+#define ARM_LPAE_LVL_SHIFT(l,d)						\
+	((((d)->levels - ((l) - ARM_LPAE_START_LVL(d) + 1))		\
+	  * (d)->bits_per_level) + (d)->pg_shift)
+
+#define ARM_LPAE_GRANULE(d)		(1UL << (d)->pg_shift)
+
+#define ARM_LPAE_PAGES_PER_PGD(d)					\
+	DIV_ROUND_UP((d)->pgd_size, ARM_LPAE_GRANULE(d))
+
+/*
+ * Calculate the index at level l used to map virtual address a using the
+ * pagetable in d.
+ */
+#define ARM_LPAE_PGD_IDX(l,d)						\
+	((l) == ARM_LPAE_START_LVL(d) ? ilog2(ARM_LPAE_PAGES_PER_PGD(d)) : 0)
+
+#define ARM_LPAE_LVL_IDX(a,l,d)						\
+	(((u64)(a) >> ARM_LPAE_LVL_SHIFT(l,d)) &			\
+	 ((1 << ((d)->bits_per_level + ARM_LPAE_PGD_IDX(l,d))) - 1))
+
+/* Calculate the block/page mapping size at level l for pagetable in d. */
+#define ARM_LPAE_BLOCK_SIZE(l,d)					\
+	(1 << (ilog2(sizeof(arm_lpae_iopte)) +				\
+		((ARM_LPAE_MAX_LEVELS - (l)) * (d)->bits_per_level)))
+
+/* Page table bits */
+#define ARM_LPAE_PTE_TYPE_SHIFT		0
+#define ARM_LPAE_PTE_TYPE_MASK		0x3
+
+#define ARM_LPAE_PTE_TYPE_BLOCK		1
+#define ARM_LPAE_PTE_TYPE_TABLE		3
+#define ARM_LPAE_PTE_TYPE_PAGE		3
+
+#define ARM_LPAE_PTE_NSTABLE		(((arm_lpae_iopte)1) << 63)
+#define ARM_LPAE_PTE_XN			(((arm_lpae_iopte)3) << 53)
+#define ARM_LPAE_PTE_AF			(((arm_lpae_iopte)1) << 10)
+#define ARM_LPAE_PTE_SH_NS		(((arm_lpae_iopte)0) << 8)
+#define ARM_LPAE_PTE_SH_OS		(((arm_lpae_iopte)2) << 8)
+#define ARM_LPAE_PTE_SH_IS		(((arm_lpae_iopte)3) << 8)
+#define ARM_LPAE_PTE_NS			(((arm_lpae_iopte)1) << 5)
+#define ARM_LPAE_PTE_VALID		(((arm_lpae_iopte)1) << 0)
+
+#define ARM_LPAE_PTE_ATTR_LO_MASK	(((arm_lpae_iopte)0x3ff) << 2)
+/* Ignore the contiguous bit for block splitting */
+#define ARM_LPAE_PTE_ATTR_HI_MASK	(((arm_lpae_iopte)6) << 52)
+#define ARM_LPAE_PTE_ATTR_MASK		(ARM_LPAE_PTE_ATTR_LO_MASK |	\
+					 ARM_LPAE_PTE_ATTR_HI_MASK)
+
+/* Stage-1 PTE */
+#define ARM_LPAE_PTE_AP_UNPRIV		(((arm_lpae_iopte)1) << 6)
+#define ARM_LPAE_PTE_AP_RDONLY		(((arm_lpae_iopte)2) << 6)
+#define ARM_LPAE_PTE_ATTRINDX_SHIFT	2
+#define ARM_LPAE_PTE_nG			(((arm_lpae_iopte)1) << 11)
+
+/* Stage-2 PTE */
+#define ARM_LPAE_PTE_HAP_FAULT		(((arm_lpae_iopte)0) << 6)
+#define ARM_LPAE_PTE_HAP_READ		(((arm_lpae_iopte)1) << 6)
+#define ARM_LPAE_PTE_HAP_WRITE		(((arm_lpae_iopte)2) << 6)
+#define ARM_LPAE_PTE_MEMATTR_OIWB	(((arm_lpae_iopte)0xf) << 2)
+#define ARM_LPAE_PTE_MEMATTR_NC		(((arm_lpae_iopte)0x5) << 2)
+#define ARM_LPAE_PTE_MEMATTR_DEV	(((arm_lpae_iopte)0x1) << 2)
+
+/* Register bits */
+#define ARM_32_LPAE_TCR_EAE		(1 << 31)
+#define ARM_64_LPAE_S2_TCR_RES1		(1 << 31)
+
+#define ARM_LPAE_TCR_EPD1		(1 << 23)
+
+#define ARM_LPAE_TCR_TG0_4K		(0 << 14)
+#define ARM_LPAE_TCR_TG0_64K		(1 << 14)
+#define ARM_LPAE_TCR_TG0_16K		(2 << 14)
+
+#define ARM_LPAE_TCR_SH0_SHIFT		12
+#define ARM_LPAE_TCR_SH0_MASK		0x3
+#define ARM_LPAE_TCR_SH_NS		0
+#define ARM_LPAE_TCR_SH_OS		2
+#define ARM_LPAE_TCR_SH_IS		3
+
+#define ARM_LPAE_TCR_ORGN0_SHIFT	10
+#define ARM_LPAE_TCR_IRGN0_SHIFT	8
+#define ARM_LPAE_TCR_RGN_MASK		0x3
+#define ARM_LPAE_TCR_RGN_NC		0
+#define ARM_LPAE_TCR_RGN_WBWA		1
+#define ARM_LPAE_TCR_RGN_WT		2
+#define ARM_LPAE_TCR_RGN_WB		3
+
+#define ARM_LPAE_TCR_SL0_SHIFT		6
+#define ARM_LPAE_TCR_SL0_MASK		0x3
+
+#define ARM_LPAE_TCR_T0SZ_SHIFT		0
+#define ARM_LPAE_TCR_SZ_MASK		0xf
+
+#define ARM_LPAE_TCR_PS_SHIFT		16
+#define ARM_LPAE_TCR_PS_MASK		0x7
+
+#define ARM_LPAE_TCR_IPS_SHIFT		32
+#define ARM_LPAE_TCR_IPS_MASK		0x7
+
+#define ARM_LPAE_TCR_PS_32_BIT		0x0ULL
+#define ARM_LPAE_TCR_PS_36_BIT		0x1ULL
+#define ARM_LPAE_TCR_PS_40_BIT		0x2ULL
+#define ARM_LPAE_TCR_PS_42_BIT		0x3ULL
+#define ARM_LPAE_TCR_PS_44_BIT		0x4ULL
+#define ARM_LPAE_TCR_PS_48_BIT		0x5ULL
+
+#define ARM_LPAE_MAIR_ATTR_SHIFT(n)	((n) << 3)
+#define ARM_LPAE_MAIR_ATTR_MASK		0xff
+#define ARM_LPAE_MAIR_ATTR_DEVICE	0x04
+#define ARM_LPAE_MAIR_ATTR_NC		0x44
+#define ARM_LPAE_MAIR_ATTR_WBRWA	0xff
+#define ARM_LPAE_MAIR_ATTR_IDX_NC	0
+#define ARM_LPAE_MAIR_ATTR_IDX_CACHE	1
+#define ARM_LPAE_MAIR_ATTR_IDX_DEV	2
+
+/* IOPTE accessors */
+#define iopte_deref(pte,d)					\
+	(__va((pte) & ((1ULL << ARM_LPAE_MAX_ADDR_BITS) - 1)	\
+	& ~(ARM_LPAE_GRANULE(d) - 1ULL)))
+
+#define iopte_type(pte,l)					\
+	(((pte) >> ARM_LPAE_PTE_TYPE_SHIFT) & ARM_LPAE_PTE_TYPE_MASK)
+
+#define iopte_prot(pte)	((pte) & ARM_LPAE_PTE_ATTR_MASK)
+
+#define iopte_leaf(pte,l)					\
+	(l == (ARM_LPAE_MAX_LEVELS - 1) ?			\
+		(iopte_type(pte,l) == ARM_LPAE_PTE_TYPE_PAGE) :	\
+		(iopte_type(pte,l) == ARM_LPAE_PTE_TYPE_BLOCK))
+
+#define iopte_to_pfn(pte,d)					\
+	(((pte) & ((1ULL << ARM_LPAE_MAX_ADDR_BITS) - 1)) >> (d)->pg_shift)
+
+#define pfn_to_iopte(pfn,d)					\
+	(((pfn) << (d)->pg_shift) & ((1ULL << ARM_LPAE_MAX_ADDR_BITS) - 1))
+
+struct arm_lpae_io_pgtable {
+	struct io_pgtable	iop;
+
+	int			levels;
+	size_t			pgd_size;
+	unsigned long		pg_shift;
+	unsigned long		bits_per_level;
+
+	void			*pgd;
+};
+
+typedef u64 arm_lpae_iopte;
+
+static bool selftest_running = false;
+
+static dma_addr_t __arm_lpae_dma_addr(void *pages)
+{
+	return (dma_addr_t)virt_to_phys(pages);
+}
+
+static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
+				    struct io_pgtable_cfg *cfg)
+{
+	struct device *dev = cfg->iommu_dev;
+	dma_addr_t dma;
+	void *pages = alloc_pages_exact(size, gfp | __GFP_ZERO);
+
+	if (!pages)
+		return NULL;
+
+	if (!selftest_running) {
+		dma = dma_map_single(dev, pages, size, DMA_TO_DEVICE);
+		if (dma_mapping_error(dev, dma))
+			goto out_free;
+		/*
+		 * We depend on the IOMMU being able to work with any physical
+		 * address directly, so if the DMA layer suggests otherwise by
+		 * translating or truncating them, that bodes very badly...
+		 */
+		if (dma != virt_to_phys(pages))
+			goto out_unmap;
+	}
+
+	return pages;
+
+out_unmap:
+	dev_err(dev, "Cannot accommodate DMA translation for IOMMU page tables\n");
+	dma_unmap_single(dev, dma, size, DMA_TO_DEVICE);
+out_free:
+	free_pages_exact(pages, size);
+	return NULL;
+}
+
+static void __arm_lpae_free_pages(void *pages, size_t size,
+				  struct io_pgtable_cfg *cfg)
+{
+	if (!selftest_running)
+		dma_unmap_single(cfg->iommu_dev, __arm_lpae_dma_addr(pages),
+				 size, DMA_TO_DEVICE);
+	free_pages_exact(pages, size);
+}
+
+static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
+			       struct io_pgtable_cfg *cfg)
+{
+	*ptep = pte;
+
+	if (!selftest_running)
+		dma_sync_single_for_device(cfg->iommu_dev,
+					   __arm_lpae_dma_addr(ptep),
+					   sizeof(pte), DMA_TO_DEVICE);
+}
+
+static int __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
+			    unsigned long iova, size_t size, int lvl,
+			    arm_lpae_iopte *ptep);
+
+static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
+			     unsigned long iova, phys_addr_t paddr,
+			     arm_lpae_iopte prot, int lvl,
+			     arm_lpae_iopte *ptep)
+{
+	arm_lpae_iopte pte = prot;
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+
+	if (iopte_leaf(*ptep, lvl)) {
+		/* We require an unmap first */
+		WARN_ON(!selftest_running);
+		return -EEXIST;
+	} else if (iopte_type(*ptep, lvl) == ARM_LPAE_PTE_TYPE_TABLE) {
+		/*
+		 * We need to unmap and free the old table before
+		 * overwriting it with a block entry.
+		 */
+		arm_lpae_iopte *tblp;
+		size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
+
+		tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
+		if (WARN_ON(__arm_lpae_unmap(data, iova, sz, lvl, tblp) != sz))
+			return -EINVAL;
+	}
+
+	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS)
+		pte |= ARM_LPAE_PTE_NS;
+
+	if (lvl == ARM_LPAE_MAX_LEVELS - 1)
+		pte |= ARM_LPAE_PTE_TYPE_PAGE;
+	else
+		pte |= ARM_LPAE_PTE_TYPE_BLOCK;
+
+	pte |= ARM_LPAE_PTE_AF | ARM_LPAE_PTE_SH_IS;
+	pte |= pfn_to_iopte(paddr >> data->pg_shift, data);
+
+	__arm_lpae_set_pte(ptep, pte, cfg);
+	return 0;
+}
+
+static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
+			  phys_addr_t paddr, size_t size, arm_lpae_iopte prot,
+			  int lvl, arm_lpae_iopte *ptep)
+{
+	arm_lpae_iopte *cptep, pte;
+	size_t block_size = ARM_LPAE_BLOCK_SIZE(lvl, data);
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+
+	/* Find our entry at the current level */
+	ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+
+	/* If we can install a leaf entry at this level, then do so */
+	if (size == block_size && (size & cfg->pgsize_bitmap))
+		return arm_lpae_init_pte(data, iova, paddr, prot, lvl, ptep);
+
+	/* We can't allocate tables at the final level */
+	if (WARN_ON(lvl >= ARM_LPAE_MAX_LEVELS - 1))
+		return -EINVAL;
+
+	/* Grab a pointer to the next level */
+	pte = *ptep;
+	if (!pte) {
+		cptep = __arm_lpae_alloc_pages(ARM_LPAE_GRANULE(data),
+					       GFP_ATOMIC, cfg);
+		if (!cptep)
+			return -ENOMEM;
+
+		pte = __pa(cptep) | ARM_LPAE_PTE_TYPE_TABLE;
+		if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS)
+			pte |= ARM_LPAE_PTE_NSTABLE;
+		__arm_lpae_set_pte(ptep, pte, cfg);
+	} else {
+		cptep = iopte_deref(pte, data);
+	}
+
+	/* Rinse, repeat */
+	return __arm_lpae_map(data, iova, paddr, size, prot, lvl + 1, cptep);
+}
+
+static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
+					   int prot)
+{
+	arm_lpae_iopte pte;
+
+	if (data->iop.fmt == ARM_64_LPAE_S1 ||
+	    data->iop.fmt == ARM_32_LPAE_S1) {
+		pte = ARM_LPAE_PTE_AP_UNPRIV | ARM_LPAE_PTE_nG;
+
+		if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
+			pte |= ARM_LPAE_PTE_AP_RDONLY;
+
+		if (prot & IOMMU_MMIO)
+			pte |= (ARM_LPAE_MAIR_ATTR_IDX_DEV
+				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
+		else if (prot & IOMMU_CACHE)
+			pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
+				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
+	} else {
+		pte = ARM_LPAE_PTE_HAP_FAULT;
+		if (prot & IOMMU_READ)
+			pte |= ARM_LPAE_PTE_HAP_READ;
+		if (prot & IOMMU_WRITE)
+			pte |= ARM_LPAE_PTE_HAP_WRITE;
+		if (prot & IOMMU_MMIO)
+			pte |= ARM_LPAE_PTE_MEMATTR_DEV;
+		else if (prot & IOMMU_CACHE)
+			pte |= ARM_LPAE_PTE_MEMATTR_OIWB;
+		else
+			pte |= ARM_LPAE_PTE_MEMATTR_NC;
+	}
+
+	if (prot & IOMMU_NOEXEC)
+		pte |= ARM_LPAE_PTE_XN;
+
+	return pte;
+}
+
+static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
+			phys_addr_t paddr, size_t size, int iommu_prot)
+{
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	arm_lpae_iopte *ptep = data->pgd;
+	int ret, lvl = ARM_LPAE_START_LVL(data);
+	arm_lpae_iopte prot;
+
+	/* If no access, then nothing to do */
+	if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
+		return 0;
+
+	prot = arm_lpae_prot_to_pte(data, iommu_prot);
+	ret = __arm_lpae_map(data, iova, paddr, size, prot, lvl, ptep);
+	/*
+	 * Synchronise all PTE updates for the new mapping before there's
+	 * a chance for anything to kick off a table walk for the new iova.
+	 */
+	wmb();
+
+	return ret;
+}
+
+static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
+				    arm_lpae_iopte *ptep)
+{
+	arm_lpae_iopte *start, *end;
+	unsigned long table_size;
+
+	if (lvl == ARM_LPAE_START_LVL(data))
+		table_size = data->pgd_size;
+	else
+		table_size = ARM_LPAE_GRANULE(data);
+
+	start = ptep;
+
+	/* Only leaf entries at the last level */
+	if (lvl == ARM_LPAE_MAX_LEVELS - 1)
+		end = ptep;
+	else
+		end = (void *)ptep + table_size;
+
+	while (ptep != end) {
+		arm_lpae_iopte pte = *ptep++;
+
+		if (!pte || iopte_leaf(pte, lvl))
+			continue;
+
+		__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));
+	}
+
+	__arm_lpae_free_pages(start, table_size, &data->iop.cfg);
+}
+
+static void arm_lpae_free_pgtable(struct io_pgtable *iop)
+{
+	struct arm_lpae_io_pgtable *data = io_pgtable_to_data(iop);
+
+	__arm_lpae_free_pgtable(data, ARM_LPAE_START_LVL(data), data->pgd);
+	kfree(data);
+}
+
+static int arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
+				    unsigned long iova, size_t size,
+				    arm_lpae_iopte prot, int lvl,
+				    arm_lpae_iopte *ptep, size_t blk_size)
+{
+	unsigned long blk_start, blk_end;
+	phys_addr_t blk_paddr;
+	arm_lpae_iopte table = 0;
+
+	blk_start = iova & ~(blk_size - 1);
+	blk_end = blk_start + blk_size;
+	blk_paddr = iopte_to_pfn(*ptep, data) << data->pg_shift;
+
+	for (; blk_start < blk_end; blk_start += size, blk_paddr += size) {
+		arm_lpae_iopte *tablep;
+
+		/* Unmap! */
+		if (blk_start == iova)
+			continue;
+
+		/* __arm_lpae_map expects a pointer to the start of the table */
+		tablep = &table - ARM_LPAE_LVL_IDX(blk_start, lvl, data);
+		if (__arm_lpae_map(data, blk_start, blk_paddr, size, prot, lvl,
+				   tablep) < 0) {
+			if (table) {
+				/* Free the table we allocated */
+				tablep = iopte_deref(table, data);
+				__arm_lpae_free_pgtable(data, lvl + 1, tablep);
+			}
+			return 0; /* Bytes unmapped */
+		}
+	}
+
+	__arm_lpae_set_pte(ptep, table, &data->iop.cfg);
+	iova &= ~(blk_size - 1);
+	io_pgtable_tlb_add_flush(&data->iop, iova, blk_size, blk_size, true);
+	return size;
+}
+
+static int __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
+			    unsigned long iova, size_t size, int lvl,
+			    arm_lpae_iopte *ptep)
+{
+	arm_lpae_iopte pte;
+	struct io_pgtable *iop = &data->iop;
+	size_t blk_size = ARM_LPAE_BLOCK_SIZE(lvl, data);
+
+	/* Something went horribly wrong and we ran out of page table */
+	if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+		return 0;
+
+	ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+	pte = *ptep;
+	if (WARN_ON(!pte))
+		return 0;
+
+	/* If the size matches this level, we're in the right place */
+	if (size == blk_size) {
+		__arm_lpae_set_pte(ptep, 0, &iop->cfg);
+
+		if (!iopte_leaf(pte, lvl)) {
+			/* Also flush any partial walks */
+			io_pgtable_tlb_add_flush(iop, iova, size,
+						ARM_LPAE_GRANULE(data), false);
+			io_pgtable_tlb_sync(iop);
+			ptep = iopte_deref(pte, data);
+			__arm_lpae_free_pgtable(data, lvl + 1, ptep);
+		} else {
+			io_pgtable_tlb_add_flush(iop, iova, size, size, true);
+		}
+
+		return size;
+	} else if (iopte_leaf(pte, lvl)) {
+		/*
+		 * Insert a table at the next level to map the old region,
+		 * minus the part we want to unmap
+		 */
+		return arm_lpae_split_blk_unmap(data, iova, size,
+						iopte_prot(pte), lvl, ptep,
+						blk_size);
+	}
+
+	/* Keep on walkin' */
+	ptep = iopte_deref(pte, data);
+	return __arm_lpae_unmap(data, iova, size, lvl + 1, ptep);
+}
+
+static int arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
+			  size_t size)
+{
+	size_t unmapped;
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	arm_lpae_iopte *ptep = data->pgd;
+	int lvl = ARM_LPAE_START_LVL(data);
+
+	unmapped = __arm_lpae_unmap(data, iova, size, lvl, ptep);
+	if (unmapped)
+		io_pgtable_tlb_sync(&data->iop);
+
+	return unmapped;
+}
+
+static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
+					 unsigned long iova)
+{
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	arm_lpae_iopte pte, *ptep = data->pgd;
+	int lvl = ARM_LPAE_START_LVL(data);
+
+	do {
+		/* Valid IOPTE pointer? */
+		if (!ptep)
+			return 0;
+
+		/* Grab the IOPTE we're interested in */
+		pte = *(ptep + ARM_LPAE_LVL_IDX(iova, lvl, data));
+
+		/* Valid entry? */
+		if (!pte)
+			return 0;
+
+		/* Leaf entry? */
+		if (iopte_leaf(pte,lvl))
+			goto found_translation;
+
+		/* Take it to the next level */
+		ptep = iopte_deref(pte, data);
+	} while (++lvl < ARM_LPAE_MAX_LEVELS);
+
+	/* Ran out of page tables to walk */
+	return 0;
+
+found_translation:
+	iova &= (ARM_LPAE_BLOCK_SIZE(lvl, data) - 1);
+	return ((phys_addr_t)iopte_to_pfn(pte,data) << data->pg_shift) | iova;
+}
+
+static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
+{
+	unsigned long granule;
+
+	/*
+	 * We need to restrict the supported page sizes to match the
+	 * translation regime for a particular granule. Aim to match
+	 * the CPU page size if possible, otherwise prefer smaller sizes.
+	 * While we're at it, restrict the block sizes to match the
+	 * chosen granule.
+	 */
+	if (cfg->pgsize_bitmap & PAGE_SIZE)
+		granule = PAGE_SIZE;
+	else if (cfg->pgsize_bitmap & ~PAGE_MASK)
+		granule = 1UL << __fls(cfg->pgsize_bitmap & ~PAGE_MASK);
+	else if (cfg->pgsize_bitmap & PAGE_MASK)
+		granule = 1UL << __ffs(cfg->pgsize_bitmap & PAGE_MASK);
+	else
+		granule = 0;
+
+	switch (granule) {
+	case SZ_4K:
+		cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
+		break;
+	case SZ_16K:
+		cfg->pgsize_bitmap &= (SZ_16K | SZ_32M);
+		break;
+	case SZ_64K:
+		cfg->pgsize_bitmap &= (SZ_64K | SZ_512M);
+		break;
+	default:
+		cfg->pgsize_bitmap = 0;
+	}
+}
+
+static struct arm_lpae_io_pgtable *
+arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
+{
+	unsigned long va_bits, pgd_bits;
+	struct arm_lpae_io_pgtable *data;
+
+	arm_lpae_restrict_pgsizes(cfg);
+
+	if (!(cfg->pgsize_bitmap & (SZ_4K | SZ_16K | SZ_64K)))
+		return NULL;
+
+	if (cfg->ias > ARM_LPAE_MAX_ADDR_BITS)
+		return NULL;
+
+	if (cfg->oas > ARM_LPAE_MAX_ADDR_BITS)
+		return NULL;
+
+	if (!selftest_running && cfg->iommu_dev->dma_pfn_offset) {
+		dev_err(cfg->iommu_dev, "Cannot accommodate DMA offset for IOMMU page tables\n");
+		return NULL;
+	}
+
+	data = kmalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return NULL;
+
+	data->pg_shift = __ffs(cfg->pgsize_bitmap);
+	data->bits_per_level = data->pg_shift - ilog2(sizeof(arm_lpae_iopte));
+
+	va_bits = cfg->ias - data->pg_shift;
+	data->levels = DIV_ROUND_UP(va_bits, data->bits_per_level);
+
+	/* Calculate the actual size of our pgd (without concatenation) */
+	pgd_bits = va_bits - (data->bits_per_level * (data->levels - 1));
+	data->pgd_size = 1UL << (pgd_bits + ilog2(sizeof(arm_lpae_iopte)));
+
+	data->iop.ops = (struct io_pgtable_ops) {
+		.map		= arm_lpae_map,
+		.unmap		= arm_lpae_unmap,
+		.iova_to_phys	= arm_lpae_iova_to_phys,
+	};
+
+	return data;
+}
+
+static struct io_pgtable *
+arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
+{
+	u64 reg;
+	struct arm_lpae_io_pgtable *data;
+
+	if (cfg->quirks & ~IO_PGTABLE_QUIRK_ARM_NS)
+		return NULL;
+
+	data = arm_lpae_alloc_pgtable(cfg);
+	if (!data)
+		return NULL;
+
+	/* TCR */
+	reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
+	      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
+	      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
+
+	switch (ARM_LPAE_GRANULE(data)) {
+	case SZ_4K:
+		reg |= ARM_LPAE_TCR_TG0_4K;
+		break;
+	case SZ_16K:
+		reg |= ARM_LPAE_TCR_TG0_16K;
+		break;
+	case SZ_64K:
+		reg |= ARM_LPAE_TCR_TG0_64K;
+		break;
+	}
+
+	switch (cfg->oas) {
+	case 32:
+		reg |= (ARM_LPAE_TCR_PS_32_BIT << ARM_LPAE_TCR_IPS_SHIFT);
+		break;
+	case 36:
+		reg |= (ARM_LPAE_TCR_PS_36_BIT << ARM_LPAE_TCR_IPS_SHIFT);
+		break;
+	case 40:
+		reg |= (ARM_LPAE_TCR_PS_40_BIT << ARM_LPAE_TCR_IPS_SHIFT);
+		break;
+	case 42:
+		reg |= (ARM_LPAE_TCR_PS_42_BIT << ARM_LPAE_TCR_IPS_SHIFT);
+		break;
+	case 44:
+		reg |= (ARM_LPAE_TCR_PS_44_BIT << ARM_LPAE_TCR_IPS_SHIFT);
+		break;
+	case 48:
+		reg |= (ARM_LPAE_TCR_PS_48_BIT << ARM_LPAE_TCR_IPS_SHIFT);
+		break;
+	default:
+		goto out_free_data;
+	}
+
+	reg |= (64ULL - cfg->ias) << ARM_LPAE_TCR_T0SZ_SHIFT;
+
+	/* Disable speculative walks through TTBR1 */
+	reg |= ARM_LPAE_TCR_EPD1;
+	cfg->arm_lpae_s1_cfg.tcr = reg;
+
+	/* MAIRs */
+	reg = (ARM_LPAE_MAIR_ATTR_NC
+	       << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_NC)) |
+	      (ARM_LPAE_MAIR_ATTR_WBRWA
+	       << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_CACHE)) |
+	      (ARM_LPAE_MAIR_ATTR_DEVICE
+	       << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV));
+
+	cfg->arm_lpae_s1_cfg.mair[0] = reg;
+	cfg->arm_lpae_s1_cfg.mair[1] = 0;
+
+	/* Looking good; allocate a pgd */
+	data->pgd = __arm_lpae_alloc_pages(data->pgd_size, GFP_KERNEL, cfg);
+	if (!data->pgd)
+		goto out_free_data;
+
+	/* Ensure the empty pgd is visible before any actual TTBR write */
+	wmb();
+
+	/* TTBRs */
+	cfg->arm_lpae_s1_cfg.ttbr[0] = virt_to_phys(data->pgd);
+	cfg->arm_lpae_s1_cfg.ttbr[1] = 0;
+	return &data->iop;
+
+out_free_data:
+	kfree(data);
+	return NULL;
+}
+
+static struct io_pgtable *
+arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
+{
+	u64 reg, sl;
+	struct arm_lpae_io_pgtable *data;
+
+	/* The NS quirk doesn't apply at stage 2 */
+	if (cfg->quirks)
+		return NULL;
+
+	data = arm_lpae_alloc_pgtable(cfg);
+	if (!data)
+		return NULL;
+
+	/*
+	 * Concatenate PGDs at level 1 if possible in order to reduce
+	 * the depth of the stage-2 walk.
+	 */
+	if (data->levels == ARM_LPAE_MAX_LEVELS) {
+		unsigned long pgd_pages;
+
+		pgd_pages = data->pgd_size >> ilog2(sizeof(arm_lpae_iopte));
+		if (pgd_pages <= ARM_LPAE_S2_MAX_CONCAT_PAGES) {
+			data->pgd_size = pgd_pages << data->pg_shift;
+			data->levels--;
+		}
+	}
+
+	/* VTCR */
+	reg = ARM_64_LPAE_S2_TCR_RES1 |
+	     (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
+	     (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
+	     (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
+
+	sl = ARM_LPAE_START_LVL(data);
+
+	switch (ARM_LPAE_GRANULE(data)) {
+	case SZ_4K:
+		reg |= ARM_LPAE_TCR_TG0_4K;
+		sl++; /* SL0 format is different for 4K granule size */
+		break;
+	case SZ_16K:
+		reg |= ARM_LPAE_TCR_TG0_16K;
+		break;
+	case SZ_64K:
+		reg |= ARM_LPAE_TCR_TG0_64K;
+		break;
+	}
+
+	switch (cfg->oas) {
+	case 32:
+		reg |= (ARM_LPAE_TCR_PS_32_BIT << ARM_LPAE_TCR_PS_SHIFT);
+		break;
+	case 36:
+		reg |= (ARM_LPAE_TCR_PS_36_BIT << ARM_LPAE_TCR_PS_SHIFT);
+		break;
+	case 40:
+		reg |= (ARM_LPAE_TCR_PS_40_BIT << ARM_LPAE_TCR_PS_SHIFT);
+		break;
+	case 42:
+		reg |= (ARM_LPAE_TCR_PS_42_BIT << ARM_LPAE_TCR_PS_SHIFT);
+		break;
+	case 44:
+		reg |= (ARM_LPAE_TCR_PS_44_BIT << ARM_LPAE_TCR_PS_SHIFT);
+		break;
+	case 48:
+		reg |= (ARM_LPAE_TCR_PS_48_BIT << ARM_LPAE_TCR_PS_SHIFT);
+		break;
+	default:
+		goto out_free_data;
+	}
+
+	reg |= (64ULL - cfg->ias) << ARM_LPAE_TCR_T0SZ_SHIFT;
+	reg |= (~sl & ARM_LPAE_TCR_SL0_MASK) << ARM_LPAE_TCR_SL0_SHIFT;
+	cfg->arm_lpae_s2_cfg.vtcr = reg;
+
+	/* Allocate pgd pages */
+	data->pgd = __arm_lpae_alloc_pages(data->pgd_size, GFP_KERNEL, cfg);
+	if (!data->pgd)
+		goto out_free_data;
+
+	/* Ensure the empty pgd is visible before any actual TTBR write */
+	wmb();
+
+	/* VTTBR */
+	cfg->arm_lpae_s2_cfg.vttbr = virt_to_phys(data->pgd);
+	return &data->iop;
+
+out_free_data:
+	kfree(data);
+	return NULL;
+}
+
+static struct io_pgtable *
+arm_32_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
+{
+	struct io_pgtable *iop;
+
+	if (cfg->ias > 32 || cfg->oas > 40)
+		return NULL;
+
+	cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
+	iop = arm_64_lpae_alloc_pgtable_s1(cfg, cookie);
+	if (iop) {
+		cfg->arm_lpae_s1_cfg.tcr |= ARM_32_LPAE_TCR_EAE;
+		cfg->arm_lpae_s1_cfg.tcr &= 0xffffffff;
+	}
+
+	return iop;
+}
+
+static struct io_pgtable *
+arm_32_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
+{
+	struct io_pgtable *iop;
+
+	if (cfg->ias > 40 || cfg->oas > 40)
+		return NULL;
+
+	cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
+	iop = arm_64_lpae_alloc_pgtable_s2(cfg, cookie);
+	if (iop)
+		cfg->arm_lpae_s2_cfg.vtcr &= 0xffffffff;
+
+	return iop;
+}
+
+struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns = {
+	.alloc	= arm_64_lpae_alloc_pgtable_s1,
+	.free	= arm_lpae_free_pgtable,
+};
+
+struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns = {
+	.alloc	= arm_64_lpae_alloc_pgtable_s2,
+	.free	= arm_lpae_free_pgtable,
+};
+
+struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns = {
+	.alloc	= arm_32_lpae_alloc_pgtable_s1,
+	.free	= arm_lpae_free_pgtable,
+};
+
+struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns = {
+	.alloc	= arm_32_lpae_alloc_pgtable_s2,
+	.free	= arm_lpae_free_pgtable,
+};
+
+#ifdef CONFIG_IOMMU_IO_PGTABLE_LPAE_SELFTEST
+
+static struct io_pgtable_cfg *cfg_cookie;
+
+static void dummy_tlb_flush_all(void *cookie)
+{
+	WARN_ON(cookie != cfg_cookie);
+}
+
+static void dummy_tlb_add_flush(unsigned long iova, size_t size,
+				size_t granule, bool leaf, void *cookie)
+{
+	WARN_ON(cookie != cfg_cookie);
+	WARN_ON(!(size & cfg_cookie->pgsize_bitmap));
+}
+
+static void dummy_tlb_sync(void *cookie)
+{
+	WARN_ON(cookie != cfg_cookie);
+}
+
+static struct iommu_gather_ops dummy_tlb_ops __initdata = {
+	.tlb_flush_all	= dummy_tlb_flush_all,
+	.tlb_add_flush	= dummy_tlb_add_flush,
+	.tlb_sync	= dummy_tlb_sync,
+};
+
+static void __init arm_lpae_dump_ops(struct io_pgtable_ops *ops)
+{
+	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+
+	pr_err("cfg: pgsize_bitmap 0x%lx, ias %u-bit\n",
+		cfg->pgsize_bitmap, cfg->ias);
+	pr_err("data: %d levels, 0x%zx pgd_size, %lu pg_shift, %lu bits_per_level, pgd @ %p\n",
+		data->levels, data->pgd_size, data->pg_shift,
+		data->bits_per_level, data->pgd);
+}
+
+#define __FAIL(ops, i)	({						\
+		WARN(1, "selftest: test failed for fmt idx %d\n", (i));	\
+		arm_lpae_dump_ops(ops);					\
+		selftest_running = false;				\
+		-EFAULT;						\
+})
+
+static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
+{
+	static const enum io_pgtable_fmt fmts[] = {
+		ARM_64_LPAE_S1,
+		ARM_64_LPAE_S2,
+	};
+
+	int i, j;
+	unsigned long iova;
+	size_t size;
+	struct io_pgtable_ops *ops;
+
+	selftest_running = true;
+
+	for (i = 0; i < ARRAY_SIZE(fmts); ++i) {
+		cfg_cookie = cfg;
+		ops = alloc_io_pgtable_ops(fmts[i], cfg, cfg);
+		if (!ops) {
+			pr_err("selftest: failed to allocate io pgtable ops\n");
+			return -ENOMEM;
+		}
+
+		/*
+		 * Initial sanity checks.
+		 * Empty page tables shouldn't provide any translations.
+		 */
+		if (ops->iova_to_phys(ops, 42))
+			return __FAIL(ops, i);
+
+		if (ops->iova_to_phys(ops, SZ_1G + 42))
+			return __FAIL(ops, i);
+
+		if (ops->iova_to_phys(ops, SZ_2G + 42))
+			return __FAIL(ops, i);
+
+		/*
+		 * Distinct mappings of different granule sizes.
+		 */
+		iova = 0;
+		j = find_first_bit(&cfg->pgsize_bitmap, BITS_PER_LONG);
+		while (j != BITS_PER_LONG) {
+			size = 1UL << j;
+
+			if (ops->map(ops, iova, iova, size, IOMMU_READ |
+							    IOMMU_WRITE |
+							    IOMMU_NOEXEC |
+							    IOMMU_CACHE))
+				return __FAIL(ops, i);
+
+			/* Overlapping mappings */
+			if (!ops->map(ops, iova, iova + size, size,
+				      IOMMU_READ | IOMMU_NOEXEC))
+				return __FAIL(ops, i);
+
+			if (ops->iova_to_phys(ops, iova + 42) != (iova + 42))
+				return __FAIL(ops, i);
+
+			iova += SZ_1G;
+			j++;
+			j = find_next_bit(&cfg->pgsize_bitmap, BITS_PER_LONG, j);
+		}
+
+		/* Partial unmap */
+		size = 1UL << __ffs(cfg->pgsize_bitmap);
+		if (ops->unmap(ops, SZ_1G + size, size) != size)
+			return __FAIL(ops, i);
+
+		/* Remap of partial unmap */
+		if (ops->map(ops, SZ_1G + size, size, size, IOMMU_READ))
+			return __FAIL(ops, i);
+
+		if (ops->iova_to_phys(ops, SZ_1G + size + 42) != (size + 42))
+			return __FAIL(ops, i);
+
+		/* Full unmap */
+		iova = 0;
+		j = find_first_bit(&cfg->pgsize_bitmap, BITS_PER_LONG);
+		while (j != BITS_PER_LONG) {
+			size = 1UL << j;
+
+			if (ops->unmap(ops, iova, size) != size)
+				return __FAIL(ops, i);
+
+			if (ops->iova_to_phys(ops, iova + 42))
+				return __FAIL(ops, i);
+
+			/* Remap full block */
+			if (ops->map(ops, iova, iova, size, IOMMU_WRITE))
+				return __FAIL(ops, i);
+
+			if (ops->iova_to_phys(ops, iova + 42) != (iova + 42))
+				return __FAIL(ops, i);
+
+			iova += SZ_1G;
+			j++;
+			j = find_next_bit(&cfg->pgsize_bitmap, BITS_PER_LONG, j);
+		}
+
+		free_io_pgtable_ops(ops);
+	}
+
+	selftest_running = false;
+	return 0;
+}
+
+static int __init arm_lpae_do_selftests(void)
+{
+	static const unsigned long pgsize[] = {
+		SZ_4K | SZ_2M | SZ_1G,
+		SZ_16K | SZ_32M,
+		SZ_64K | SZ_512M,
+	};
+
+	static const unsigned int ias[] = {
+		32, 36, 40, 42, 44, 48,
+	};
+
+	int i, j, pass = 0, fail = 0;
+	struct io_pgtable_cfg cfg = {
+		.tlb = &dummy_tlb_ops,
+		.oas = 48,
+	};
+
+	for (i = 0; i < ARRAY_SIZE(pgsize); ++i) {
+		for (j = 0; j < ARRAY_SIZE(ias); ++j) {
+			cfg.pgsize_bitmap = pgsize[i];
+			cfg.ias = ias[j];
+			pr_info("selftest: pgsize_bitmap 0x%08lx, IAS %u\n",
+				pgsize[i], ias[j]);
+			if (arm_lpae_run_tests(&cfg))
+				fail++;
+			else
+				pass++;
+		}
+	}
+
+	pr_info("selftest: completed with %d PASS %d FAIL\n", pass, fail);
+	return fail ? -EFAULT : 0;
+}
+subsys_initcall(arm_lpae_do_selftests);
+#endif
diff --git a/xen/drivers/passthrough/arm/io-pgtable.c b/xen/drivers/passthrough/arm/io-pgtable.c
new file mode 100644
index 0000000..127558d
--- /dev/null
+++ b/xen/drivers/passthrough/arm/io-pgtable.c
@@ -0,0 +1,79 @@
+/*
+ * Generic page table allocator for IOMMUs.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (C) 2014 ARM Limited
+ *
+ * Author: Will Deacon <will.deacon@arm.com>
+ */
+
+#include <linux/bug.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+
+#include "io-pgtable.h"
+
+static const struct io_pgtable_init_fns *
+io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] = {
+#ifdef CONFIG_IOMMU_IO_PGTABLE_LPAE
+	[ARM_32_LPAE_S1] = &io_pgtable_arm_32_lpae_s1_init_fns,
+	[ARM_32_LPAE_S2] = &io_pgtable_arm_32_lpae_s2_init_fns,
+	[ARM_64_LPAE_S1] = &io_pgtable_arm_64_lpae_s1_init_fns,
+	[ARM_64_LPAE_S2] = &io_pgtable_arm_64_lpae_s2_init_fns,
+#endif
+#ifdef CONFIG_IOMMU_IO_PGTABLE_ARMV7S
+	[ARM_V7S] = &io_pgtable_arm_v7s_init_fns,
+#endif
+};
+
+struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
+					    struct io_pgtable_cfg *cfg,
+					    void *cookie)
+{
+	struct io_pgtable *iop;
+	const struct io_pgtable_init_fns *fns;
+
+	if (fmt >= IO_PGTABLE_NUM_FMTS)
+		return NULL;
+
+	fns = io_pgtable_init_table[fmt];
+	if (!fns)
+		return NULL;
+
+	iop = fns->alloc(cfg, cookie);
+	if (!iop)
+		return NULL;
+
+	iop->fmt	= fmt;
+	iop->cookie	= cookie;
+	iop->cfg	= *cfg;
+
+	return &iop->ops;
+}
+
+/*
+ * It is the IOMMU driver's responsibility to ensure that the page table
+ * is no longer accessible to the walker by this point.
+ */
+void free_io_pgtable_ops(struct io_pgtable_ops *ops)
+{
+	struct io_pgtable *iop;
+
+	if (!ops)
+		return;
+
+	iop = container_of(ops, struct io_pgtable, ops);
+	io_pgtable_tlb_flush_all(iop);
+	io_pgtable_init_table[iop->fmt]->free(iop);
+}
diff --git a/xen/drivers/passthrough/arm/io-pgtable.h b/xen/drivers/passthrough/arm/io-pgtable.h
new file mode 100644
index 0000000..969d82c
--- /dev/null
+++ b/xen/drivers/passthrough/arm/io-pgtable.h
@@ -0,0 +1,208 @@
+#ifndef __IO_PGTABLE_H
+#define __IO_PGTABLE_H
+#include <linux/bitops.h>
+
+/*
+ * Public API for use by IOMMU drivers
+ */
+enum io_pgtable_fmt {
+	ARM_32_LPAE_S1,
+	ARM_32_LPAE_S2,
+	ARM_64_LPAE_S1,
+	ARM_64_LPAE_S2,
+	ARM_V7S,
+	IO_PGTABLE_NUM_FMTS,
+};
+
+/**
+ * struct iommu_gather_ops - IOMMU callbacks for TLB and page table management.
+ *
+ * @tlb_flush_all: Synchronously invalidate the entire TLB context.
+ * @tlb_add_flush: Queue up a TLB invalidation for a virtual address range.
+ * @tlb_sync:      Ensure any queued TLB invalidation has taken effect, and
+ *                 any corresponding page table updates are visible to the
+ *                 IOMMU.
+ *
+ * Note that these can all be called in atomic context and must therefore
+ * not block.
+ */
+struct iommu_gather_ops {
+	void (*tlb_flush_all)(void *cookie);
+	void (*tlb_add_flush)(unsigned long iova, size_t size, size_t granule,
+			      bool leaf, void *cookie);
+	void (*tlb_sync)(void *cookie);
+};
+
+/**
+ * struct io_pgtable_cfg - Configuration data for a set of page tables.
+ *
+ * @quirks:        A bitmap of hardware quirks that require some special
+ *                 action by the low-level page table allocator.
+ * @pgsize_bitmap: A bitmap of page sizes supported by this set of page
+ *                 tables.
+ * @ias:           Input address (iova) size, in bits.
+ * @oas:           Output address (paddr) size, in bits.
+ * @tlb:           TLB management callbacks for this set of tables.
+ * @iommu_dev:     The device representing the DMA configuration for the
+ *                 page table walker.
+ */
+struct io_pgtable_cfg {
+	/*
+	 * IO_PGTABLE_QUIRK_ARM_NS: (ARM formats) Set NS and NSTABLE bits in
+	 *	stage 1 PTEs, for hardware which insists on validating them
+	 *	even in	non-secure state where they should normally be ignored.
+	 *
+	 * IO_PGTABLE_QUIRK_NO_PERMS: Ignore the IOMMU_READ, IOMMU_WRITE and
+	 *	IOMMU_NOEXEC flags and map everything with full access, for
+	 *	hardware which does not implement the permissions of a given
+	 *	format, and/or requires some format-specific default value.
+	 *
+	 * IO_PGTABLE_QUIRK_TLBI_ON_MAP: If the format forbids caching invalid
+	 *	(unmapped) entries but the hardware might do so anyway, perform
+	 *	TLB maintenance when mapping as well as when unmapping.
+	 *
+	 * IO_PGTABLE_QUIRK_ARM_MTK_4GB: (ARM v7s format) Set bit 9 in all
+	 *	PTEs, for Mediatek IOMMUs which treat it as a 33rd address bit
+	 *	when the SoC is in "4GB mode" and they can only access the high
+	 *	remap of DRAM (0x1_00000000 to 0x1_ffffffff).
+	 */
+	#define IO_PGTABLE_QUIRK_ARM_NS		BIT(0)
+	#define IO_PGTABLE_QUIRK_NO_PERMS	BIT(1)
+	#define IO_PGTABLE_QUIRK_TLBI_ON_MAP	BIT(2)
+	#define IO_PGTABLE_QUIRK_ARM_MTK_4GB	BIT(3)
+	unsigned long			quirks;
+	unsigned long			pgsize_bitmap;
+	unsigned int			ias;
+	unsigned int			oas;
+	const struct iommu_gather_ops	*tlb;
+	struct device			*iommu_dev;
+
+	/* Low-level data specific to the table format */
+	union {
+		struct {
+			u64	ttbr[2];
+			u64	tcr;
+			u64	mair[2];
+		} arm_lpae_s1_cfg;
+
+		struct {
+			u64	vttbr;
+			u64	vtcr;
+		} arm_lpae_s2_cfg;
+
+		struct {
+			u32	ttbr[2];
+			u32	tcr;
+			u32	nmrr;
+			u32	prrr;
+		} arm_v7s_cfg;
+	};
+};
+
+/**
+ * struct io_pgtable_ops - Page table manipulation API for IOMMU drivers.
+ *
+ * @map:          Map a physically contiguous memory region.
+ * @unmap:        Unmap a physically contiguous memory region.
+ * @iova_to_phys: Translate iova to physical address.
+ *
+ * These functions map directly onto the iommu_ops member functions with
+ * the same names.
+ */
+struct io_pgtable_ops {
+	int (*map)(struct io_pgtable_ops *ops, unsigned long iova,
+		   phys_addr_t paddr, size_t size, int prot);
+	int (*unmap)(struct io_pgtable_ops *ops, unsigned long iova,
+		     size_t size);
+	phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,
+				    unsigned long iova);
+};
+
+/**
+ * alloc_io_pgtable_ops() - Allocate a page table allocator for use by an IOMMU.
+ *
+ * @fmt:    The page table format.
+ * @cfg:    The page table configuration. This will be modified to represent
+ *          the configuration actually provided by the allocator (e.g. the
+ *          pgsize_bitmap may be restricted).
+ * @cookie: An opaque token provided by the IOMMU driver and passed back to
+ *          the callback routines in cfg->tlb.
+ */
+struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
+					    struct io_pgtable_cfg *cfg,
+					    void *cookie);
+
+/**
+ * free_io_pgtable_ops() - Free an io_pgtable_ops structure. The caller
+ *                         *must* ensure that the page table is no longer
+ *                         live, but the TLB can be dirty.
+ *
+ * @ops: The ops returned from alloc_io_pgtable_ops.
+ */
+void free_io_pgtable_ops(struct io_pgtable_ops *ops);
+
+
+/*
+ * Internal structures for page table allocator implementations.
+ */
+
+/**
+ * struct io_pgtable - Internal structure describing a set of page tables.
+ *
+ * @fmt:    The page table format.
+ * @cookie: An opaque token provided by the IOMMU driver and passed back to
+ *          any callback routines.
+ * @tlb_sync_pending: Private flag for optimising out redundant syncs.
+ * @cfg:    A copy of the page table configuration.
+ * @ops:    The page table operations in use for this set of page tables.
+ */
+struct io_pgtable {
+	enum io_pgtable_fmt	fmt;
+	void			*cookie;
+	bool			tlb_sync_pending;
+	struct io_pgtable_cfg	cfg;
+	struct io_pgtable_ops	ops;
+};
+
+#define io_pgtable_ops_to_pgtable(x) container_of((x), struct io_pgtable, ops)
+
+static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop)
+{
+	iop->cfg.tlb->tlb_flush_all(iop->cookie);
+	iop->tlb_sync_pending = true;
+}
+
+static inline void io_pgtable_tlb_add_flush(struct io_pgtable *iop,
+		unsigned long iova, size_t size, size_t granule, bool leaf)
+{
+	iop->cfg.tlb->tlb_add_flush(iova, size, granule, leaf, iop->cookie);
+	iop->tlb_sync_pending = true;
+}
+
+static inline void io_pgtable_tlb_sync(struct io_pgtable *iop)
+{
+	if (iop->tlb_sync_pending) {
+		iop->cfg.tlb->tlb_sync(iop->cookie);
+		iop->tlb_sync_pending = false;
+	}
+}
+
+/**
+ * struct io_pgtable_init_fns - Alloc/free a set of page tables for a
+ *                              particular format.
+ *
+ * @alloc: Allocate a set of page tables described by cfg.
+ * @free:  Free the page tables associated with iop.
+ */
+struct io_pgtable_init_fns {
+	struct io_pgtable *(*alloc)(struct io_pgtable_cfg *cfg, void *cookie);
+	void (*free)(struct io_pgtable *iop);
+};
+
+extern struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns;
+extern struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns;
+extern struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns;
+extern struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns;
+extern struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns;
+
+#endif /* __IO_PGTABLE_H */
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH v1 4/7] iommu/arm: ipmmu-vmsa: Add Xen changes for io-pgtables
  2017-07-26 15:09 [RFC PATCH v1 0/7] IPMMU-VMSA support on ARM Oleksandr Tyshchenko
                   ` (2 preceding siblings ...)
  2017-07-26 15:10 ` [RFC PATCH v1 3/7] iommu/arm: ipmmu-vmsa: Add io-pgtables support Oleksandr Tyshchenko
@ 2017-07-26 15:10 ` Oleksandr Tyshchenko
  2017-07-26 15:10 ` [RFC PATCH v1 5/7] iommu/arm: Build IPMMU-VMSA related stuff Oleksandr Tyshchenko
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Oleksandr Tyshchenko @ 2017-07-26 15:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Oleksandr Tyshchenko, Julien Grall, Stefano Stabellini

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Modify the Linux framework to be functional inside Xen.
The changes mostly stem from the differences in memory handling
between Xen and Linux.
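
As a minimal sketch of the pattern the port follows (using the types from
the patch below; the accessor name is hypothetical): where Linux
dereferences a table pointer directly via __va(), Xen keeps a
struct page_info pointer and must map the domain page around each access.

	/* Hedged sketch, not code from this patch: read one PTE the Xen way. */
	static arm_lpae_iopte read_pte(struct page_info *table, unsigned int idx)
	{
		arm_lpae_iopte *ptep = __map_domain_page(table); /* map just before use */
		arm_lpae_iopte pte = ptep[idx];

		unmap_domain_page(ptep); /* unmap right after we are done */
		return pte;
	}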

Also wrap the following code in #if 0:
- All DMA related stuff
- Stage-2 related things
- Self test

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
---
 xen/drivers/passthrough/arm/io-pgtable-arm.c | 235 +++++++++++++++++++++++----
 xen/drivers/passthrough/arm/io-pgtable.c     |  19 ++-
 xen/drivers/passthrough/arm/io-pgtable.h     |  14 +-
 3 files changed, 231 insertions(+), 37 deletions(-)

diff --git a/xen/drivers/passthrough/arm/io-pgtable-arm.c b/xen/drivers/passthrough/arm/io-pgtable-arm.c
index f5c90e1..c98caa3 100644
--- a/xen/drivers/passthrough/arm/io-pgtable-arm.c
+++ b/xen/drivers/passthrough/arm/io-pgtable-arm.c
@@ -16,20 +16,76 @@
  * Copyright (C) 2014 ARM Limited
  *
  * Author: Will Deacon <will.deacon@arm.com>
+ *
+ * Based on Linux drivers/iommu/io-pgtable-arm.c
+ * => commit 7c6d90e2bb1a98b86d73b9e8ab4d97ed5507e37c
+ * (iommu/io-pgtable-arm: Fix iova_to_phys for block entries)
+ *
+ * Xen modification:
+ * Oleksandr Tyshchenko <Oleksandr_Tyshchenko@epam.com>
+ * Copyright (C) 2016-2017 EPAM Systems Inc.
  */
 
-#define pr_fmt(fmt)	"arm-lpae io-pgtable: " fmt
+#include <xen/config.h>
+#include <xen/delay.h>
+#include <xen/errno.h>
+#include <xen/err.h>
+#include <xen/irq.h>
+#include <xen/lib.h>
+#include <xen/list.h>
+#include <xen/mm.h>
+#include <xen/vmap.h>
+#include <xen/rbtree.h>
+#include <xen/sched.h>
+#include <xen/sizes.h>
+#include <xen/log2.h>
+#include <xen/domain_page.h>
+#include <asm/atomic.h>
+#include <asm/device.h>
+#include <asm/io.h>
+#include <asm/platform.h>
 
-#include <linux/iommu.h>
-#include <linux/kernel.h>
-#include <linux/sizes.h>
-#include <linux/slab.h>
-#include <linux/types.h>
-#include <linux/dma-mapping.h>
+#include "io-pgtable.h"
 
-#include <asm/barrier.h>
+/***** Start of Xen specific code *****/
 
-#include "io-pgtable.h"
+#define IOMMU_READ	(1 << 0)
+#define IOMMU_WRITE	(1 << 1)
+#define IOMMU_CACHE	(1 << 2) /* DMA cache coherency */
+#define IOMMU_NOEXEC	(1 << 3)
+#define IOMMU_MMIO	(1 << 4) /* e.g. things like MSI doorbells */
+
+#define kfree xfree
+#define kmalloc(size, flags)		_xmalloc(size, sizeof(void *))
+#define kzalloc(size, flags)		_xzalloc(size, sizeof(void *))
+#define devm_kzalloc(dev, size, flags)	_xzalloc(size, sizeof(void *))
+#define kmalloc_array(size, n, flags)	_xmalloc_array(size, sizeof(void *), n)
+
+typedef enum {
+	GFP_KERNEL,
+	GFP_ATOMIC,
+	__GFP_HIGHMEM,
+	__GFP_HIGH
+} gfp_t;
+
+#define __fls(x) (fls(x) - 1)
+#define __ffs(x) (ffs(x) - 1)
+
+/*
+ * Re-define WARN_ON with implementation that "returns" result that
+ * allow us to use following construction:
+ * if (WARN_ON(condition))
+ *     return error;
+ */
+#undef WARN_ON
+#define WARN_ON(condition) ({                                           \
+        int __ret_warn_on = !!(condition);                              \
+        if (unlikely(__ret_warn_on))                                    \
+               WARN();                                                  \
+        unlikely(__ret_warn_on);                                        \
+})
+
+/***** Start of Linux allocator code *****/
 
 #define ARM_LPAE_MAX_ADDR_BITS		48
 #define ARM_LPAE_S2_MAX_CONCAT_PAGES	16
@@ -166,9 +222,10 @@
 #define ARM_LPAE_MAIR_ATTR_IDX_CACHE	1
 #define ARM_LPAE_MAIR_ATTR_IDX_DEV	2
 
+/* Xen: __va is not suitable here use maddr_to_page instead. */
 /* IOPTE accessors */
 #define iopte_deref(pte,d)					\
-	(__va((pte) & ((1ULL << ARM_LPAE_MAX_ADDR_BITS) - 1)	\
+	(maddr_to_page((pte) & ((1ULL << ARM_LPAE_MAX_ADDR_BITS) - 1)	\
 	& ~(ARM_LPAE_GRANULE(d) - 1ULL)))
 
 #define iopte_type(pte,l)					\
@@ -195,11 +252,21 @@ struct arm_lpae_io_pgtable {
 	unsigned long		pg_shift;
 	unsigned long		bits_per_level;
 
-	void			*pgd;
+	/* Xen: We deal with domain pages. */
+	struct page_info	*pgd;
 };
 
 typedef u64 arm_lpae_iopte;
 
+/*
+ * Xen: Override the Linux functions in charge of memory
+ * allocation/deallocation with Xen ones. The main reason is that we want to
+ * operate on domain pages, and as a result we have to use Xen's API for this.
+ * Since Xen's API deals with struct page_info *page, all dependent code is
+ * modified accordingly. Also keep in mind that domain pages must be mapped
+ * just before use and unmapped right after we are done.
+ */
+#if 0
 static bool selftest_running = false;
 
 static dma_addr_t __arm_lpae_dma_addr(void *pages)
@@ -259,6 +326,41 @@ static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
 					   __arm_lpae_dma_addr(ptep),
 					   sizeof(pte), DMA_TO_DEVICE);
 }
+#endif
+
+static struct page_info *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
+				    struct io_pgtable_cfg *cfg)
+{
+	struct page_info *pages;
+	unsigned int order = get_order_from_bytes(size);
+	int i;
+
+	pages = alloc_domheap_pages(NULL, order, 0);
+	if (pages == NULL)
+		return NULL;
+
+	for (i = 0; i < (1 << order); i ++)
+		clear_and_clean_page(pages + i);
+
+	return pages;
+}
+
+static void __arm_lpae_free_pages(struct page_info *pages, size_t size,
+				  struct io_pgtable_cfg *cfg)
+{
+	unsigned int order = get_order_from_bytes(size);
+
+	free_domheap_pages(pages, order);
+}
+
+static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
+			       struct io_pgtable_cfg *cfg)
+{
+	smp_mb();
+	*ptep = pte;
+	smp_mb();
+	clean_dcache(*ptep);
+}
 
 static int __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 			    unsigned long iova, size_t size, int lvl,
@@ -274,7 +376,9 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
 
 	if (iopte_leaf(*ptep, lvl)) {
 		/* We require an unmap first */
+#if 0 /* Xen: Not needed */
 		WARN_ON(!selftest_running);
+#endif
 		return -EEXIST;
 	} else if (iopte_type(*ptep, lvl) == ARM_LPAE_PTE_TYPE_TABLE) {
 		/*
@@ -304,6 +408,7 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
 	return 0;
 }
 
+/* Xen: We deal with domain pages. */
 static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
 			  phys_addr_t paddr, size_t size, arm_lpae_iopte prot,
 			  int lvl, arm_lpae_iopte *ptep)
@@ -311,6 +416,8 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
 	arm_lpae_iopte *cptep, pte;
 	size_t block_size = ARM_LPAE_BLOCK_SIZE(lvl, data);
 	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+	struct page_info *page;
+	int ret;
 
 	/* Find our entry at the current level */
 	ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
@@ -326,21 +433,32 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
 	/* Grab a pointer to the next level */
 	pte = *ptep;
 	if (!pte) {
-		cptep = __arm_lpae_alloc_pages(ARM_LPAE_GRANULE(data),
+		page = __arm_lpae_alloc_pages(ARM_LPAE_GRANULE(data),
 					       GFP_ATOMIC, cfg);
-		if (!cptep)
+		if (!page)
 			return -ENOMEM;
 
-		pte = __pa(cptep) | ARM_LPAE_PTE_TYPE_TABLE;
+		/* Xen: __pa is not suitable here use page_to_maddr instead. */
+		pte = page_to_maddr(page) | ARM_LPAE_PTE_TYPE_TABLE;
 		if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS)
 			pte |= ARM_LPAE_PTE_NSTABLE;
 		__arm_lpae_set_pte(ptep, pte, cfg);
+	/* Xen: Sync with my fix for Linux */
+	} else if (!iopte_leaf(pte, lvl)) {
+		page = iopte_deref(pte, data);
 	} else {
-		cptep = iopte_deref(pte, data);
+		/* We require an unmap first */
+#if 0 /* Xen: Not needed */
+		WARN_ON(!selftest_running);
+#endif
+		return -EEXIST;
 	}
 
 	/* Rinse, repeat */
-	return __arm_lpae_map(data, iova, paddr, size, prot, lvl + 1, cptep);
+	cptep = __map_domain_page(page);
+	ret = __arm_lpae_map(data, iova, paddr, size, prot, lvl + 1, cptep);
+	unmap_domain_page(cptep);
+	return ret;
 }
 
 static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
@@ -381,11 +499,12 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
 	return pte;
 }
 
+/* Xen: We deal with domain pages. */
 static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
 			phys_addr_t paddr, size_t size, int iommu_prot)
 {
 	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
-	arm_lpae_iopte *ptep = data->pgd;
+	arm_lpae_iopte *ptep;
 	int ret, lvl = ARM_LPAE_START_LVL(data);
 	arm_lpae_iopte prot;
 
@@ -394,21 +513,26 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
 		return 0;
 
 	prot = arm_lpae_prot_to_pte(data, iommu_prot);
+	ptep = __map_domain_page(data->pgd);
 	ret = __arm_lpae_map(data, iova, paddr, size, prot, lvl, ptep);
+	unmap_domain_page(ptep);
+
 	/*
 	 * Synchronise all PTE updates for the new mapping before there's
 	 * a chance for anything to kick off a table walk for the new iova.
 	 */
-	wmb();
+	smp_wmb();
 
 	return ret;
 }
 
+/* Xen: We deal with domain pages. */
 static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
-				    arm_lpae_iopte *ptep)
+				    struct page_info *page)
 {
 	arm_lpae_iopte *start, *end;
 	unsigned long table_size;
+	arm_lpae_iopte *ptep = __map_domain_page(page);
 
 	if (lvl == ARM_LPAE_START_LVL(data))
 		table_size = data->pgd_size;
@@ -432,7 +556,8 @@ static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
 		__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));
 	}
 
-	__arm_lpae_free_pages(start, table_size, &data->iop.cfg);
+	unmap_domain_page(start);
+	__arm_lpae_free_pages(page, table_size, &data->iop.cfg);
 }
 
 static void arm_lpae_free_pgtable(struct io_pgtable *iop)
@@ -443,6 +568,7 @@ static void arm_lpae_free_pgtable(struct io_pgtable *iop)
 	kfree(data);
 }
 
+/* Xen: We deal with domain pages. */
 static int arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
 				    unsigned long iova, size_t size,
 				    arm_lpae_iopte prot, int lvl,
@@ -469,8 +595,12 @@ static int arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
 				   tablep) < 0) {
 			if (table) {
 				/* Free the table we allocated */
-				tablep = iopte_deref(table, data);
-				__arm_lpae_free_pgtable(data, lvl + 1, tablep);
+				/*
+				 * Xen: iopte_deref returns struct page_info *,
+				 * it is exactly what we need. Pass it directly to function
+				 * instead of adding new variable.
+				 */
+				__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(table, data));
 			}
 			return 0; /* Bytes unmapped */
 		}
@@ -482,6 +612,7 @@ static int arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
 	return size;
 }
 
+/* Xen: We deal with domain pages. */
 static int __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 			    unsigned long iova, size_t size, int lvl,
 			    arm_lpae_iopte *ptep)
@@ -489,6 +620,7 @@ static int __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 	arm_lpae_iopte pte;
 	struct io_pgtable *iop = &data->iop;
 	size_t blk_size = ARM_LPAE_BLOCK_SIZE(lvl, data);
+	int ret;
 
 	/* Something went horribly wrong and we ran out of page table */
 	if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
@@ -496,6 +628,10 @@ static int __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 
 	ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
 	pte = *ptep;
+	/*
+	 * Xen: TODO: Sometimes we catch this since p2m tries to unmap
+	 * the same page twice.
+	 */
 	if (WARN_ON(!pte))
 		return 0;
 
@@ -508,8 +644,12 @@ static int __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 			io_pgtable_tlb_add_flush(iop, iova, size,
 						ARM_LPAE_GRANULE(data), false);
 			io_pgtable_tlb_sync(iop);
-			ptep = iopte_deref(pte, data);
-			__arm_lpae_free_pgtable(data, lvl + 1, ptep);
+			/*
+			 * Xen: iopte_deref returns struct page_info *,
+			 * it is exactly what we need. Pass it directly to function
+			 * instead of adding new variable.
+			 */
+			__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));
 		} else {
 			io_pgtable_tlb_add_flush(iop, iova, size, size, true);
 		}
@@ -526,39 +666,48 @@ static int __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 	}
 
 	/* Keep on walkin' */
-	ptep = iopte_deref(pte, data);
-	return __arm_lpae_unmap(data, iova, size, lvl + 1, ptep);
+	ptep = __map_domain_page(iopte_deref(pte, data));
+	ret = __arm_lpae_unmap(data, iova, size, lvl + 1, ptep);
+	unmap_domain_page(ptep);
+	return ret;
 }
 
+/* Xen: We deal with domain pages. */
 static int arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
 			  size_t size)
 {
 	size_t unmapped;
 	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
-	arm_lpae_iopte *ptep = data->pgd;
+	arm_lpae_iopte *ptep = __map_domain_page(data->pgd);
 	int lvl = ARM_LPAE_START_LVL(data);
 
 	unmapped = __arm_lpae_unmap(data, iova, size, lvl, ptep);
 	if (unmapped)
 		io_pgtable_tlb_sync(&data->iop);
+	unmap_domain_page(ptep);
+
+	/* Xen: Add barrier here to synchronise all PTE updates. */
+	smp_wmb();
 
 	return unmapped;
 }
 
+/* Xen: We deal with domain pages. */
 static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
 					 unsigned long iova)
 {
 	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
-	arm_lpae_iopte pte, *ptep = data->pgd;
+	arm_lpae_iopte pte, *ptep = __map_domain_page(data->pgd);
 	int lvl = ARM_LPAE_START_LVL(data);
 
 	do {
 		/* Valid IOPTE pointer? */
 		if (!ptep)
-			return 0;
+			break;
 
 		/* Grab the IOPTE we're interested in */
 		pte = *(ptep + ARM_LPAE_LVL_IDX(iova, lvl, data));
+		unmap_domain_page(ptep);
 
 		/* Valid entry? */
 		if (!pte)
@@ -569,9 +718,10 @@ static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
 			goto found_translation;
 
 		/* Take it to the next level */
-		ptep = iopte_deref(pte, data);
+		ptep = __map_domain_page(iopte_deref(pte, data));
 	} while (++lvl < ARM_LPAE_MAX_LEVELS);
 
+	unmap_domain_page(ptep);
 	/* Ran out of page tables to walk */
 	return 0;
 
@@ -626,16 +776,25 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
 	if (!(cfg->pgsize_bitmap & (SZ_4K | SZ_16K | SZ_64K)))
 		return NULL;
 
+	/*
+	 * Xen: Just to be sure that the minimum page size supported by the IOMMU
+	 * is not bigger than PAGE_SIZE.
+	 */
+	if (PAGE_SIZE & ((1 << __ffs(cfg->pgsize_bitmap)) - 1))
+		return NULL;
+
 	if (cfg->ias > ARM_LPAE_MAX_ADDR_BITS)
 		return NULL;
 
 	if (cfg->oas > ARM_LPAE_MAX_ADDR_BITS)
 		return NULL;
 
+#if 0 /* Xen: Not needed */
 	if (!selftest_running && cfg->iommu_dev->dma_pfn_offset) {
 		dev_err(cfg->iommu_dev, "Cannot accommodate DMA offset for IOMMU page tables\n");
 		return NULL;
 	}
+#endif
 
 	data = kmalloc(sizeof(*data), GFP_KERNEL);
 	if (!data)
@@ -736,10 +895,11 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 		goto out_free_data;
 
 	/* Ensure the empty pgd is visible before any actual TTBR write */
-	wmb();
+	smp_wmb();
 
 	/* TTBRs */
-	cfg->arm_lpae_s1_cfg.ttbr[0] = virt_to_phys(data->pgd);
+	/* Xen: virt_to_phys is not suitable here use page_to_maddr instead */
+	cfg->arm_lpae_s1_cfg.ttbr[0] = page_to_maddr(data->pgd);
 	cfg->arm_lpae_s1_cfg.ttbr[1] = 0;
 	return &data->iop;
 
@@ -748,6 +908,7 @@ out_free_data:
 	return NULL;
 }
 
+#if 0 /* Xen: Not needed */
 static struct io_pgtable *
 arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 {
@@ -840,6 +1001,7 @@ out_free_data:
 	kfree(data);
 	return NULL;
 }
+#endif
 
 static struct io_pgtable *
 arm_32_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
@@ -859,6 +1021,7 @@ arm_32_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 	return iop;
 }
 
+#if 0 /* Xen: Not needed */
 static struct io_pgtable *
 arm_32_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 {
@@ -874,26 +1037,34 @@ arm_32_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 
 	return iop;
 }
+#endif
 
 struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns = {
 	.alloc	= arm_64_lpae_alloc_pgtable_s1,
 	.free	= arm_lpae_free_pgtable,
 };
 
+#if 0 /* Xen: Not needed */
 struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns = {
 	.alloc	= arm_64_lpae_alloc_pgtable_s2,
 	.free	= arm_lpae_free_pgtable,
 };
+#endif
 
 struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns = {
 	.alloc	= arm_32_lpae_alloc_pgtable_s1,
 	.free	= arm_lpae_free_pgtable,
 };
 
+#if 0 /* Xen: Not needed */
 struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns = {
 	.alloc	= arm_32_lpae_alloc_pgtable_s2,
 	.free	= arm_lpae_free_pgtable,
 };
+#endif
+
+/* Xen: */
+#undef CONFIG_IOMMU_IO_PGTABLE_LPAE_SELFTEST
 
 #ifdef CONFIG_IOMMU_IO_PGTABLE_LPAE_SELFTEST
 
diff --git a/xen/drivers/passthrough/arm/io-pgtable.c b/xen/drivers/passthrough/arm/io-pgtable.c
index 127558d..bfc7020 100644
--- a/xen/drivers/passthrough/arm/io-pgtable.c
+++ b/xen/drivers/passthrough/arm/io-pgtable.c
@@ -16,22 +16,33 @@
  * Copyright (C) 2014 ARM Limited
  *
  * Author: Will Deacon <will.deacon@arm.com>
+ *
+ * Based on Linux drivers/iommu/io-pgtable.c
+ * => commit 54c6d242fa32cba8313936e3a35f27dc2c7c3e04
+ * (iommu/io-pgtable: Fix a brace coding style issue)
+ *
+ * Xen modification:
+ * Oleksandr Tyshchenko <Oleksandr_Tyshchenko@epam.com>
+ * Copyright (C) 2016-2017 EPAM Systems Inc.
  */
 
-#include <linux/bug.h>
-#include <linux/kernel.h>
-#include <linux/types.h>
-
 #include "io-pgtable.h"
 
+/* Xen: Just compile exactly what we want. */
+#define CONFIG_IOMMU_IO_PGTABLE_LPAE
+
 static const struct io_pgtable_init_fns *
 io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] = {
 #ifdef CONFIG_IOMMU_IO_PGTABLE_LPAE
 	[ARM_32_LPAE_S1] = &io_pgtable_arm_32_lpae_s1_init_fns,
+#if 0 /* Xen: Not needed */
 	[ARM_32_LPAE_S2] = &io_pgtable_arm_32_lpae_s2_init_fns,
+#endif
 	[ARM_64_LPAE_S1] = &io_pgtable_arm_64_lpae_s1_init_fns,
+#if 0 /* Xen: Not needed */
 	[ARM_64_LPAE_S2] = &io_pgtable_arm_64_lpae_s2_init_fns,
 #endif
+#endif
 #ifdef CONFIG_IOMMU_IO_PGTABLE_ARMV7S
 	[ARM_V7S] = &io_pgtable_arm_v7s_init_fns,
 #endif
diff --git a/xen/drivers/passthrough/arm/io-pgtable.h b/xen/drivers/passthrough/arm/io-pgtable.h
index 969d82c..fb81fcf 100644
--- a/xen/drivers/passthrough/arm/io-pgtable.h
+++ b/xen/drivers/passthrough/arm/io-pgtable.h
@@ -1,6 +1,11 @@
 #ifndef __IO_PGTABLE_H
 #define __IO_PGTABLE_H
-#include <linux/bitops.h>
+#include <asm/device.h>
+#include <xen/sched.h>
+
+/* Xen */
+typedef paddr_t phys_addr_t;
+typedef paddr_t dma_addr_t;
 
 /*
  * Public API for use by IOMMU drivers
@@ -200,9 +205,16 @@ struct io_pgtable_init_fns {
 };
 
 extern struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns;
+#if 0 /* Xen: Not needed */
 extern struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns;
+#endif
 extern struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns;
+#if 0 /* Xen: Not needed */
 extern struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns;
+#endif
+/* Xen: Fix */
+#ifdef CONFIG_IOMMU_IO_PGTABLE_ARMV7S
 extern struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns;
+#endif
 
 #endif /* __IO_PGTABLE_H */
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH v1 5/7] iommu/arm: Build IPMMU-VMSA related stuff
  2017-07-26 15:09 [RFC PATCH v1 0/7] IPMMU-VMSA support on ARM Oleksandr Tyshchenko
                   ` (3 preceding siblings ...)
  2017-07-26 15:10 ` [RFC PATCH v1 4/7] iommu/arm: ipmmu-vmsa: Add Xen changes for io-pgtables Oleksandr Tyshchenko
@ 2017-07-26 15:10 ` Oleksandr Tyshchenko
  2017-07-26 15:10 ` [RFC PATCH v1 6/7] iommu/arm: ipmmu-vmsa: Deallocate page table asynchronously Oleksandr Tyshchenko
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Oleksandr Tyshchenko @ 2017-07-26 15:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Oleksandr Tyshchenko, Julien Grall, Stefano Stabellini

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
---
 xen/drivers/passthrough/arm/Makefile | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/xen/drivers/passthrough/arm/Makefile b/xen/drivers/passthrough/arm/Makefile
index f4cd26e..b4eec6f 100644
--- a/xen/drivers/passthrough/arm/Makefile
+++ b/xen/drivers/passthrough/arm/Makefile
@@ -1,2 +1,5 @@
 obj-y += iommu.o
 obj-y += smmu.o
+obj-y += ipmmu-vmsa.o
+obj-y += io-pgtable.o
+obj-y += io-pgtable-arm.o
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH v1 6/7] iommu/arm: ipmmu-vmsa: Deallocate page table asynchronously
  2017-07-26 15:09 [RFC PATCH v1 0/7] IPMMU-VMSA support on ARM Oleksandr Tyshchenko
                   ` (4 preceding siblings ...)
  2017-07-26 15:10 ` [RFC PATCH v1 5/7] iommu/arm: Build IPMMU-VMSA related stuff Oleksandr Tyshchenko
@ 2017-07-26 15:10 ` Oleksandr Tyshchenko
  2017-08-08 11:36   ` Julien Grall
  2017-07-26 15:10 ` [RFC PATCH v1 7/7] iommu/arm: ipmmu-vmsa: Enable VMSAv8-64 mode if IPMMU HW supports it Oleksandr Tyshchenko
  2017-08-01 12:27 ` [RFC PATCH v1 0/7] IPMMU-VMSA support on ARM Julien Grall
  7 siblings, 1 reply; 20+ messages in thread
From: Oleksandr Tyshchenko @ 2017-07-26 15:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Oleksandr Tyshchenko, Julien Grall, Stefano Stabellini

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This is a PoC of how to optimize the page table deallocation sequence
by splitting it into separate chunks.
Use iommu_pt_cleanup_list to queue pages that need to be handled and
freed later, and use the free_page_table platform callback to dequeue
them.
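
Condensed from the patch below, the queuing step boils down to stashing
the resume level and the owning page-table ops in the page_info itself,
then appending the page to the common cleanup list (iommu_pt_cleanup_list
and iommu_pt_cleanup_lock come from the "Non-shared" IOMMU series this
one depends on):

	/* Hedged sketch of the deferred path in __arm_lpae_free_next_pgtable(). */
	PFN_ORDER(page) = lvl;            /* level to resume deallocation from */
	page->pad = (u64)&data->iop.ops;  /* ops needed by free_page_table later */

	spin_lock(&iommu_pt_cleanup_lock);
	page_list_add_tail(page, &iommu_pt_cleanup_list);
	spin_unlock(&iommu_pt_cleanup_lock);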

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
---
 xen/drivers/passthrough/arm/io-pgtable-arm.c | 94 +++++++++++++++++++++++++---
 xen/drivers/passthrough/arm/io-pgtable.c     |  5 +-
 xen/drivers/passthrough/arm/io-pgtable.h     |  4 +-
 xen/drivers/passthrough/arm/ipmmu-vmsa.c     | 33 ++++++++--
 4 files changed, 119 insertions(+), 17 deletions(-)

diff --git a/xen/drivers/passthrough/arm/io-pgtable-arm.c b/xen/drivers/passthrough/arm/io-pgtable-arm.c
index c98caa3..7673fda 100644
--- a/xen/drivers/passthrough/arm/io-pgtable-arm.c
+++ b/xen/drivers/passthrough/arm/io-pgtable-arm.c
@@ -254,6 +254,10 @@ struct arm_lpae_io_pgtable {
 
 	/* Xen: We deal with domain pages. */
 	struct page_info	*pgd;
+	/* Xen: To indicate that deallocation sequence is in progress. */
+	bool_t				cleanup;
+	/* Xen: To count allocated domain pages. */
+	unsigned int		page_count;
 };
 
 typedef u64 arm_lpae_iopte;
@@ -329,7 +333,7 @@ static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
 #endif
 
 static struct page_info *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
-				    struct io_pgtable_cfg *cfg)
+				    struct arm_lpae_io_pgtable *data)
 {
 	struct page_info *pages;
 	unsigned int order = get_order_from_bytes(size);
@@ -342,15 +346,21 @@ static struct page_info *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
 	for (i = 0; i < (1 << order); i ++)
 		clear_and_clean_page(pages + i);
 
+	data->page_count += (1<<order);
+
 	return pages;
 }
 
 static void __arm_lpae_free_pages(struct page_info *pages, size_t size,
-				  struct io_pgtable_cfg *cfg)
+				  struct arm_lpae_io_pgtable *data)
 {
 	unsigned int order = get_order_from_bytes(size);
 
+	BUG_ON((int)data->page_count <= 0);
+
 	free_domheap_pages(pages, order);
+
+	data->page_count -= (1<<order);
 }
 
 static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
@@ -434,7 +444,7 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
 	pte = *ptep;
 	if (!pte) {
 		page = __arm_lpae_alloc_pages(ARM_LPAE_GRANULE(data),
-					       GFP_ATOMIC, cfg);
+					       GFP_ATOMIC, data);
 		if (!page)
 			return -ENOMEM;
 
@@ -526,6 +536,46 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
 	return ret;
 }
 
+static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
+				    struct page_info *page);
+
+/*
+ * TODO: We reuse the currently unused "page->pad" field to store the "data"
+ * pointer we need during the deallocation sequence, because the
+ * free_page_table platform callback carries only a single "page" argument.
+ * To perform the required calculations with the current (generic) allocator
+ * implementation we are mostly interested in the following fields:
+ * - data->levels
+ * - data->pg_shift
+ * - data->pgd_size
+ * However, this could be avoided by integrating the allocator code with the
+ * IPMMU-VMSA driver, in which case these variables would turn into the
+ * corresponding #define-s.
+ */
+static void __arm_lpae_free_next_pgtable(struct arm_lpae_io_pgtable *data,
+				    int lvl, struct page_info *page)
+{
+	if (!data->cleanup) {
+		/*
+		 * We are here during normal page table maintenance. Just call
+		 * __arm_lpae_free_pgtable(), what we actually had to call.
+		 */
+		__arm_lpae_free_pgtable(data, lvl, page);
+	} else {
+		 * The page table deallocation sequence is in progress. Use some fields
+		 * in struct page_info to pass the arguments we will need when this
+		 * page is handled later. Queue the page to the list.
+		 * this page back. Queue page to list.
+		 */
+		PFN_ORDER(page) = lvl;
+		page->pad = (u64)&data->iop.ops;
+
+		spin_lock(&iommu_pt_cleanup_lock);
+		page_list_add_tail(page, &iommu_pt_cleanup_list);
+		spin_unlock(&iommu_pt_cleanup_lock);
+	}
+}
+
 /* Xen: We deal with domain pages. */
 static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
 				    struct page_info *page)
@@ -553,19 +603,41 @@ static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
 		if (!pte || iopte_leaf(pte, lvl))
 			continue;
 
-		__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));
+		__arm_lpae_free_next_pgtable(data, lvl + 1, iopte_deref(pte, data));
 	}
 
 	unmap_domain_page(start);
-	__arm_lpae_free_pages(page, table_size, &data->iop.cfg);
+	__arm_lpae_free_pages(page, table_size, data);
 }
 
-static void arm_lpae_free_pgtable(struct io_pgtable *iop)
+/*
+ * We added extra "page" argument since we want to know what page is processed
+ * at the moment and should be freed.
+ * */
+static void arm_lpae_free_pgtable(struct io_pgtable *iop, struct page_info *page)
 {
 	struct arm_lpae_io_pgtable *data = io_pgtable_to_data(iop);
+	int lvl;
 
-	__arm_lpae_free_pgtable(data, ARM_LPAE_START_LVL(data), data->pgd);
-	kfree(data);
+	if (!data->cleanup) {
+		/* Start page table deallocation sequence from the first level. */
+		data->cleanup = true;
+		lvl = ARM_LPAE_START_LVL(data);
+	} else {
+		/* Retrieve the level to continue deallocation sequence from. */
+		lvl = PFN_ORDER(page);
+		PFN_ORDER(page) = 0;
+		page->pad = 0;
+	}
+
+	__arm_lpae_free_pgtable(data, lvl, page);
+
+	 * It seems we have already deallocated all pages, so it is time
+	 * to release the remaining resources.
+	 * to release unfreed resource.
+	 */
+	if (!data->page_count)
+		kfree(data);
 }
 
 /* Xen: We deal with domain pages. */
@@ -889,8 +961,12 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 	cfg->arm_lpae_s1_cfg.mair[0] = reg;
 	cfg->arm_lpae_s1_cfg.mair[1] = 0;
 
+	/* Just to be sure */
+	data->cleanup = false;
+	data->page_count = 0;
+
 	/* Looking good; allocate a pgd */
-	data->pgd = __arm_lpae_alloc_pages(data->pgd_size, GFP_KERNEL, cfg);
+	data->pgd = __arm_lpae_alloc_pages(data->pgd_size, GFP_KERNEL, data);
 	if (!data->pgd)
 		goto out_free_data;
 
diff --git a/xen/drivers/passthrough/arm/io-pgtable.c b/xen/drivers/passthrough/arm/io-pgtable.c
index bfc7020..e25d731 100644
--- a/xen/drivers/passthrough/arm/io-pgtable.c
+++ b/xen/drivers/passthrough/arm/io-pgtable.c
@@ -77,7 +77,7 @@ struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
  * It is the IOMMU driver's responsibility to ensure that the page table
  * is no longer accessible to the walker by this point.
  */
-void free_io_pgtable_ops(struct io_pgtable_ops *ops)
+void free_io_pgtable_ops(struct io_pgtable_ops *ops, struct page_info *page)
 {
 	struct io_pgtable *iop;
 
@@ -86,5 +86,6 @@ void free_io_pgtable_ops(struct io_pgtable_ops *ops)
 
 	iop = container_of(ops, struct io_pgtable, ops);
 	io_pgtable_tlb_flush_all(iop);
-	io_pgtable_init_table[iop->fmt]->free(iop);
+	iop->cookie = NULL;
+	io_pgtable_init_table[iop->fmt]->free(iop, page);
 }
diff --git a/xen/drivers/passthrough/arm/io-pgtable.h b/xen/drivers/passthrough/arm/io-pgtable.h
index fb81fcf..df0e21b 100644
--- a/xen/drivers/passthrough/arm/io-pgtable.h
+++ b/xen/drivers/passthrough/arm/io-pgtable.h
@@ -144,7 +144,7 @@ struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
  *
  * @ops: The ops returned from alloc_io_pgtable_ops.
  */
-void free_io_pgtable_ops(struct io_pgtable_ops *ops);
+void free_io_pgtable_ops(struct io_pgtable_ops *ops, struct page_info *page);
 
 
 /*
@@ -201,7 +201,7 @@ static inline void io_pgtable_tlb_sync(struct io_pgtable *iop)
  */
 struct io_pgtable_init_fns {
 	struct io_pgtable *(*alloc)(struct io_pgtable_cfg *cfg, void *cookie);
-	void (*free)(struct io_pgtable *iop);
+	void (*free)(struct io_pgtable *iop, struct page_info *page);
 };
 
 extern struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns;
diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
index e54b507..2a04800 100644
--- a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
+++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
@@ -708,8 +708,8 @@ static void ipmmu_tlb_flush_all(void *cookie)
 {
 	struct ipmmu_vmsa_domain *domain = cookie;
 
-	/* Xen: Just return if context_id has non-existent value */
-	if (domain->context_id >= domain->root->num_ctx)
+	/* Xen: Just return if context is absent or context_id has non-existent value */
+	if (!domain || domain->context_id >= domain->root->num_ctx)
 		return;
 
 	ipmmu_tlb_invalidate(domain);
@@ -796,7 +796,9 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
 	 */
 	ret = ipmmu_domain_allocate_context(domain->root, domain);
 	if (ret < 0) {
-		free_io_pgtable_ops(domain->iop);
+		/* Pass root page table for this domain as an argument. */
+		free_io_pgtable_ops(domain->iop,
+				maddr_to_page(domain->cfg.arm_lpae_s1_cfg.ttbr[0]));
 		return ret;
 	}
 
@@ -2193,7 +2195,12 @@ static void ipmmu_vmsa_destroy_domain(struct iommu_domain *io_domain)
 		 * been detached.
 		 */
 		ipmmu_domain_destroy_context(domain);
-		free_io_pgtable_ops(domain->iop);
+		/*
+		 * Pass the root page table for this domain as an argument.
+		 * This call will start the deallocation sequence.
+		 */
+		free_io_pgtable_ops(domain->iop,
+				maddr_to_page(domain->cfg.arm_lpae_s1_cfg.ttbr[0]));
 	}
 
 	kfree(domain);
@@ -2383,6 +2390,17 @@ static int ipmmu_vmsa_domain_init(struct domain *d, bool use_iommu)
 	return 0;
 }
 
+/*
+ * It seems there is one more page we need to process, so retrieve
+ * the pointer and continue the deallocation sequence.
+ */
+static void ipmmu_vmsa_free_page_table(struct page_info *page)
+{
+	struct io_pgtable_ops *ops = (struct io_pgtable_ops *)page->pad;
+
+	free_io_pgtable_ops(ops, page);
+}
+
 static void __hwdom_init ipmmu_vmsa_hwdom_init(struct domain *d)
 {
 }
@@ -2404,6 +2422,12 @@ static void ipmmu_vmsa_domain_teardown(struct domain *d)
 	ASSERT(list_empty(&xen_domain->contexts));
 	xfree(xen_domain);
 	dom_iommu(d)->arch.priv = NULL;
+	/*
+	 * After this point all domain resources have been deallocated, except
+	 * the page table, which we will deallocate asynchronously. The IOMMU
+	 * code provides us with iommu_pt_cleanup_list and the free_page_table
+	 * platform callback, which is exactly what we are going to use.
+	 */
 }
 
 static int __must_check ipmmu_vmsa_map_pages(struct domain *d,
@@ -2462,6 +2486,7 @@ static void ipmmu_vmsa_dump_p2m_table(struct domain *d)
 static const struct iommu_ops ipmmu_vmsa_iommu_ops = {
 	.init = ipmmu_vmsa_domain_init,
 	.hwdom_init = ipmmu_vmsa_hwdom_init,
+	.free_page_table = ipmmu_vmsa_free_page_table,
 	.teardown = ipmmu_vmsa_domain_teardown,
 	.iotlb_flush = ipmmu_vmsa_iotlb_flush,
 	.assign_device = ipmmu_vmsa_assign_dev,
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH v1 7/7] iommu/arm: ipmmu-vmsa: Enable VMSAv8-64 mode if IPMMU HW supports it
  2017-07-26 15:09 [RFC PATCH v1 0/7] IPMMU-VMSA support on ARM Oleksandr Tyshchenko
                   ` (5 preceding siblings ...)
  2017-07-26 15:10 ` [RFC PATCH v1 6/7] iommu/arm: ipmmu-vmsa: Deallocate page table asynchronously Oleksandr Tyshchenko
@ 2017-07-26 15:10 ` Oleksandr Tyshchenko
  2017-08-01 12:27 ` [RFC PATCH v1 0/7] IPMMU-VMSA support on ARM Julien Grall
  7 siblings, 0 replies; 20+ messages in thread
From: Oleksandr Tyshchenko @ 2017-07-26 15:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Oleksandr Tyshchenko, Julien Grall, Stefano Stabellini

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

The patch was ported from an RFC patch for Linux and
slightly modified in order to handle an IOVA space wider than 32 bits.

iommu/ipmmu-vmsa: Initial R-Car Gen3 VA64 mode support
https://patchwork.kernel.org/patch/9532335/

Modifications to the original patch are:
- Increase IOVA space from 32-bit to 39-bit
- Print full IOVA in case of page fault
- Setup TTBR1 as well as TTBR0
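
The TTBCR size fields are derived from the input address size via
TSZ = 64 - IAS, so a 39-bit IOVA space yields TSZ0/TSZ1 = 25 for the
TTBR0/TTBR1 regions. A condensed sketch of the arithmetic from the patch
below (variable names shortened for illustration):

	/* Sketch only: ias is 39 when imctr_va64 is set, 32 otherwise. */
	ias = imctr_va64 ? 39 : 32;
	tmp |= (64ULL - ias) << IMTTBCR_TSZ0_SHIFT;	/* TSZ0 = 64 - 39 = 25 */
	tmp |= (64ULL - ias) << IMTTBCR_TSZ1_SHIFT;	/* TSZ1 = 64 - 39 = 25 */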

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
---
 xen/drivers/passthrough/arm/ipmmu-vmsa.c | 55 ++++++++++++++++++++++++--------
 1 file changed, 41 insertions(+), 14 deletions(-)

diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
index 2a04800..211ce39 100644
--- a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
+++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
@@ -220,6 +220,7 @@ struct ipmmu_features {
 	bool has_eight_ctx;
 	bool setup_imbuscr;
 	bool twobit_imttbcr_sl0;
+	bool imctr_va64;
 };
 
 #ifdef CONFIG_RCAR_DDR_BACKUP
@@ -334,6 +335,7 @@ static void set_archdata(struct device *dev, struct ipmmu_vmsa_archdata *p)
 #define IM_CTX_SIZE			0x40
 
 #define IMCTR				0x0000
+#define IMCTR_VA64			(1 << 29)
 #define IMCTR_TRE			(1 << 17)
 #define IMCTR_AFE			(1 << 16)
 #define IMCTR_RTSEL_MASK		(3 << 4)
@@ -381,7 +383,7 @@ static void set_archdata(struct device *dev, struct ipmmu_vmsa_archdata *p)
 #define IMTTBCR_SL0_LVL_2		(0 << 4)
 #define IMTTBCR_SL0_LVL_1		(1 << 4)
 #define IMTTBCR_TSZ0_MASK		(7 << 0)
-#define IMTTBCR_TSZ0_SHIFT		O
+#define IMTTBCR_TSZ0_SHIFT		0
 
 #define IMTTBCR_SL0_TWOBIT_LVL_3	(0 << 6)
 #define IMTTBCR_SL0_TWOBIT_LVL_2	(1 << 6)
@@ -424,6 +426,7 @@ static void set_archdata(struct device *dev, struct ipmmu_vmsa_archdata *p)
 #define IMMAIR_ATTR_IDX_DEV		2
 
 #define IMEAR				0x0030
+#define IMEUAR				0x0034
 
 #define IMPCTR				0x0200
 #define IMPSTR				0x0208
@@ -770,7 +773,7 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
 	 */
 	domain->cfg.quirks = IO_PGTABLE_QUIRK_ARM_NS;
 	domain->cfg.pgsize_bitmap = SZ_1G | SZ_2M | SZ_4K,
-	domain->cfg.ias = 32;
+	domain->cfg.ias = domain->root->features->imctr_va64 ? 39 : 32;
 	domain->cfg.oas = 40;
 	domain->cfg.tlb = &ipmmu_gather_ops;
 #if 0 /* Xen: Not needed */
@@ -783,8 +786,9 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
 	 */
 	domain->cfg.iommu_dev = domain->root->dev;
 
-	domain->iop = alloc_io_pgtable_ops(ARM_32_LPAE_S1, &domain->cfg,
-					   domain);
+	domain->iop = alloc_io_pgtable_ops(domain->root->features->imctr_va64 ?
+					   ARM_64_LPAE_S1 : ARM_32_LPAE_S1,
+					   &domain->cfg, domain);
 	if (!domain->iop)
 		return -EINVAL;
 
@@ -818,6 +822,14 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
 	ipmmu_ctx_write(domain, IMTTUBR0, ttbr >> 32);
 
 	/*
+	 * When enabling IMCTR_VA64 we need to set up TTBR1 as well.
+	 */
+	if (domain->root->features->imctr_va64) {
+		ipmmu_ctx_write(domain, IMTTLBR1, ttbr);
+		ipmmu_ctx_write(domain, IMTTUBR1, ttbr >> 32);
+	}
+
+	/*
 	 * TTBCR
 	 * We use long descriptors with inner-shareable WBWA tables and allocate
 	 * the whole 32-bit VA space to TTBR0.
@@ -828,6 +840,19 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
 	else
 		tmp = IMTTBCR_SL0_LVL_1;
 
+	/*
+	 * As we are going to use TTBR1, we need to set up attributes for the memory
+	 * associated with translation table walks using TTBR1.
+	 * Also, for IMCTR_VA64 mode we need to calculate and set up the
+	 * TTBR0/TTBR1 addressed regions.
+	 */
+	if (domain->root->features->imctr_va64) {
+		tmp |= IMTTBCR_SH1_INNER_SHAREABLE | IMTTBCR_ORGN1_WB_WA |
+				IMTTBCR_IRGN1_WB_WA;
+		tmp |= (64ULL - domain->cfg.ias) << IMTTBCR_TSZ0_SHIFT;
+		tmp |= (64ULL - domain->cfg.ias) << IMTTBCR_TSZ1_SHIFT;
+	}
+
 	ipmmu_ctx_write(domain, IMTTBCR, IMTTBCR_EAE |
 			IMTTBCR_SH0_INNER_SHAREABLE | IMTTBCR_ORGN0_WB_WA |
 			IMTTBCR_IRGN0_WB_WA | tmp);
@@ -855,7 +880,8 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
 	 * Xen: Enable the context for the root IPMMU only.
 	 */
 	ipmmu_ctx_write(domain, IMCTR,
-			 IMCTR_INTEN | IMCTR_FLUSH | IMCTR_MMUEN);
+			 (domain->root->features->imctr_va64 ? IMCTR_VA64 : 0)
+			 | IMCTR_INTEN | IMCTR_FLUSH | IMCTR_MMUEN);
 
 	return 0;
 }
@@ -909,13 +935,14 @@ static irqreturn_t ipmmu_domain_irq(struct ipmmu_vmsa_domain *domain)
 	const u32 err_mask = IMSTR_MHIT | IMSTR_ABORT | IMSTR_PF | IMSTR_TF;
 	struct ipmmu_vmsa_device *mmu = domain->mmu;
 	u32 status;
-	u32 iova;
+	u64 iova;
 
 	status = ipmmu_ctx_read(domain, IMSTR);
 	if (!(status & err_mask))
 		return IRQ_NONE;
 
-	iova = ipmmu_ctx_read(domain, IMEAR);
+	iova = ipmmu_ctx_read(domain, IMEAR) |
+			((u64)ipmmu_ctx_read(domain, IMEUAR) << 32);
 
 	/*
 	 * Clear the error status flags. Unlike traditional interrupt flag
@@ -927,10 +954,10 @@ static irqreturn_t ipmmu_domain_irq(struct ipmmu_vmsa_domain *domain)
 
 	/* Log fatal errors. */
 	if (status & IMSTR_MHIT)
-		dev_err_ratelimited(mmu->dev, "d%d: Multiple TLB hits @0x%08x\n",
+		dev_err_ratelimited(mmu->dev, "d%d: Multiple TLB hits @0x%"PRIx64"\n",
 				domain->d->domain_id, iova);
 	if (status & IMSTR_ABORT)
-		dev_err_ratelimited(mmu->dev, "d%d: Page Table Walk Abort @0x%08x\n",
+		dev_err_ratelimited(mmu->dev, "d%d: Page Table Walk Abort @0x%"PRIx64"\n",
 				domain->d->domain_id, iova);
 
 	if (!(status & (IMSTR_PF | IMSTR_TF)))
@@ -946,7 +973,7 @@ static irqreturn_t ipmmu_domain_irq(struct ipmmu_vmsa_domain *domain)
 		return IRQ_HANDLED;
 
 	dev_err_ratelimited(mmu->dev,
-			"d%d: Unhandled fault: status 0x%08x iova 0x%08x\n",
+			"d%d: Unhandled fault: status 0x%08x iova 0x%"PRIx64"\n",
 			domain->d->domain_id, status, iova);
 
 	return IRQ_HANDLED;
@@ -1219,8 +1246,7 @@ size_t ipmmu_unmap(struct iommu_domain *io_domain, unsigned long iova, size_t si
 	if ((dma_addr_t)iova + size > max_iova) {
 		printk("out-of-bound: iova 0x%lx + size 0x%zx > max_iova 0x%"PRIx64"\n",
 			   iova, size, max_iova);
-		/* TODO Return -EINVAL instead */
-		return 0;
+		return -EINVAL;
 	}
 
 	/*
@@ -1277,8 +1303,7 @@ int ipmmu_map(struct iommu_domain *io_domain, unsigned long iova,
 	if ((dma_addr_t)iova + size > max_iova) {
 		printk("out-of-bound: iova 0x%lx + size 0x%zx > max_iova 0x%"PRIx64"\n",
 		       iova, size, max_iova);
-		/* TODO Return -EINVAL instead */
-		return 0;
+		return -EINVAL;
 	}
 
 	while (size) {
@@ -1725,6 +1750,7 @@ static const struct ipmmu_features ipmmu_features_default = {
 	.has_eight_ctx = false,
 	.setup_imbuscr = true,
 	.twobit_imttbcr_sl0 = false,
+	.imctr_va64 = false,
 };
 
 static const struct ipmmu_features ipmmu_features_rcar_gen3 = {
@@ -1733,6 +1759,7 @@ static const struct ipmmu_features ipmmu_features_rcar_gen3 = {
 	.has_eight_ctx = true,
 	.setup_imbuscr = false,
 	.twobit_imttbcr_sl0 = true,
+	.imctr_va64 = true,
 };
 
 static const struct of_device_id ipmmu_of_ids[] = {
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v1 0/7] IPMMU-VMSA support on ARM
  2017-07-26 15:09 [RFC PATCH v1 0/7] IPMMU-VMSA support on ARM Oleksandr Tyshchenko
                   ` (6 preceding siblings ...)
  2017-07-26 15:10 ` [RFC PATCH v1 7/7] iommu/arm: ipmmu-vmsa: Enable VMSAv8-64 mode if IPMMU HW supports it Oleksandr Tyshchenko
@ 2017-08-01 12:27 ` Julien Grall
  2017-08-01 17:13   ` Oleksandr Tyshchenko
  7 siblings, 1 reply; 20+ messages in thread
From: Julien Grall @ 2017-08-01 12:27 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, xen-devel
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Goel, Sameer

On 26/07/17 16:09, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> Hi, all.

Hi,

Please CC maintainers and any relevant person on the cover letter. This 
is quite useful to have in the inbox.

> The purpose of this patch series is to add IPMMU-VMSA support to Xen on ARM.
> It is VMSA-compatible IOMMU that integrated in the newest Renesas R-Car Gen3 SoCs (ARM).
> And this IOMMU can't share the page table with the CPU since it doesn't use the same page-table format
> as the CPU on ARM therefore I name it "Non-shared" IOMMU.
> This all means that current patch series must be based on "Non-shared" IOMMU support [1]
> for the IPMMU-VMSA to be functional inside Xen.
>
> The IPMMU-VMSA driver as well as the ARM LPAE allocator were directly ported from BSP for Linux the vendor provides:
> git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas-bsp.git rcar-3.5.3

I think this is probably a good starting point to discuss IOMMU
support in Xen. I skimmed through the patches and saw the words "rfc"
and "ported from BSP".

At the moment, for IOMMU drivers we rely on the Linux community to do the
review, but this is not the case here as it is an RFC. I can definitely
help to check whether it complies with Xen's requirements, but I don't
have the competence to tell whether it is valid for the hardware.

We may want to find a compromise to get it merged in Xen, but surely we
don't want to build it by default, at least until we have had feedback
from the community about the validity of the code here.

As I mentioned above, we are currently borrowing drivers from Linux and
adapting them for Xen. Today we support SMMUv{1,2} (we need to resync it)
and there are plans to add IPMMU-VMSA (this series) and SMMUv3.

I am aware that the Linux IOMMU subsystem has grown quite a lot, making
it more tricky to get support into Xen. I wanted to get feedback from you
and Sameer on how complex it was and whether we should consider writing
our own.

> Patch series was rebased on Xen 4.9.0 release and tested on Renesas R-Car Gen3 H3 ES2.0/M3 based boards
> with devices assigned to different domains.
>
> You can find patch series here:
> repo: https://github.com/otyshchenko1/xen.git branch: ipmmu_v2
>
> P.S. There is one more patch which needs to be brought back to life [2]
> Any reasons why this patch hasn't been upstremed yet?

The series didn't make it upstream. Feel free to resend it separately.

>
> Thank you.
>
> [1] [Xen-devel] [PATCH v2 00/13] "Non-shared" IOMMU support on ARM
> https://www.mail-archive.com/xen-devel@lists.xen.org/msg115901.html
>
> [2] [Xen-devel] [PATCH v8 02/28] xen: Add log2 functionality
> https://lists.xenproject.org/archives/html/xen-devel/2016-02/msg00031.html
>
> Oleksandr Tyshchenko (7):
>   iommu/arm: ipmmu-vmsa: Add IPMMU-VMSA support
>   iommu/arm: ipmmu-vmsa: Add Xen changes for main driver
>   iommu/arm: ipmmu-vmsa: Add io-pgtables support
>   iommu/arm: ipmmu-vmsa: Add Xen changes for io-pgtables
>   iommu/arm: Build IPMMU-VMSA related stuff
>   iommu/arm: ipmmu-vmsa: Deallocate page table asynchronously
>   iommu/arm: ipmmu-vmsa: Enable VMSAv8-64 mode if IPMMU HW supports it
>
>  xen/drivers/passthrough/arm/Makefile         |    3 +
>  xen/drivers/passthrough/arm/io-pgtable-arm.c | 1331 +++++++++++++
>  xen/drivers/passthrough/arm/io-pgtable.c     |   91 +
>  xen/drivers/passthrough/arm/io-pgtable.h     |  220 +++
>  xen/drivers/passthrough/arm/ipmmu-vmsa.c     | 2611 ++++++++++++++++++++++++++
>  5 files changed, 4256 insertions(+)
>  create mode 100644 xen/drivers/passthrough/arm/io-pgtable-arm.c
>  create mode 100644 xen/drivers/passthrough/arm/io-pgtable.c
>  create mode 100644 xen/drivers/passthrough/arm/io-pgtable.h
>  create mode 100644 xen/drivers/passthrough/arm/ipmmu-vmsa.c
>

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v1 0/7] IPMMU-VMSA support on ARM
  2017-08-01 12:27 ` [RFC PATCH v1 0/7] IPMMU-VMSA support on ARM Julien Grall
@ 2017-08-01 17:13   ` Oleksandr Tyshchenko
  2017-08-08 11:21     ` Julien Grall
  0 siblings, 1 reply; 20+ messages in thread
From: Oleksandr Tyshchenko @ 2017-08-01 17:13 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, geert+renesas, Will Deacon, joro,
	damm+renesas, Oleksandr Tyshchenko, laurent.pinchart+renesas,
	Goel, Sameer, xen-devel, Robin Murphy, Artem Mygaiev

Hi, Julien

On Tue, Aug 1, 2017 at 3:27 PM, Julien Grall <julien.grall@arm.com> wrote:
> On 26/07/17 16:09, Oleksandr Tyshchenko wrote:
>>
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> Hi, all.
>
>
> Hi,
>
> Please CC maintainers and any relevant person on the cover letter. This is
> quite useful to have in the inbox.
Yes. I CCed the people who, I think, are/were involved in IPMMU-VMSA
development on the Linux side, plus the IOMMU maintainers (mostly ARM).
Sorry if I missed someone or added someone by mistake.

>
>> The purpose of this patch series is to add IPMMU-VMSA support to Xen on
>> ARM.
>> It is VMSA-compatible IOMMU that integrated in the newest Renesas R-Car
>> Gen3 SoCs (ARM).
>> And this IOMMU can't share the page table with the CPU since it doesn't
>> use the same page-table format
>> as the CPU on ARM therefore I name it "Non-shared" IOMMU.
>> This all means that current patch series must be based on "Non-shared"
>> IOMMU support [1]
>> for the IPMMU-VMSA to be functional inside Xen.
>>
>> The IPMMU-VMSA driver as well as the ARM LPAE allocator were directly
>> ported from BSP for Linux the vendor provides:
>> git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas-bsp.git
>> rcar-3.5.3
>
>
> I think this is probably a good starting point to discuss about IOMMU
> support in Xen. I skimmed through the patches and saw the words "rfc" and
> "ported from BSP".
Well, at the time of porting the IPMMU-VMSA driver, the BSP [1] had more
complete support than mainline [2], and still seems to have it at the
moment.
For example, the mainline driver still has a single IPMMU context while
the BSP driver can have up to 8 contexts, and the maximum number of
uTLBs the mainline driver can handle is 32, whereas for the BSP driver
this value was increased to 48, etc.
But I see attempts to get all the required support in [3]. So, when this
support reaches upstream, I hope it won't be a big problem to rebase on
the mainline driver if we decide to align with it.

>
> At the moment for IOMMU we rely on the Linux community to do the review, but
> this is not the case here as it is an RFC.  I can definitely help to check
> if it comply for Xen,
yes, please.

> but I don't have the competence to tell whether it is
> valid for the hardware.
>
> We may want to find a compromise to get it merged in Xen, but surely we
> don't want to build it by default at least until we had feedback from the
> community about the validity of the code here.
agree.

>
> As I mentioned above, we are currently borrowing drivers from Linux and
> adapting for Xen. Today we support SMMUv{1,2} (we need to resync it) and
> there are plan to add IPMMU-VMSA (this series) and SMMUv3.
It would be really nice to have IPMMU-VMSA support in Xen. Without
this support, the SCF [4] we are developing right now, and even the
passthrough feature, won't be fully functional on R-Car Gen3 based
boards powered by the Xen hypervisor.

>
> I am aware that Linux IOMMU subsystem has growing quite a lot making more
> tricky to get support in Xen. I wanted to get feedback how complex from you
> and Sameer how complex it was and whether we should consider doing our own.

Yes, the IPMMU-VMSA Linux driver relies on some Linux functionality
(IOMMU/DMA/io-pgtable frameworks) that Xen doesn't have (which is
expected). So, it took *some time* to make the Linux driver happy
inside Xen. Moreover, this all resulted in a driver that looks a bit
complicated: a lot of different wrappers, #if 0 blocks, code style
differences, etc.
On the other hand, I think I will be able to fairly quickly align
with a new BSP, etc.

But I really don't know whether we should continue to follow this
direction or not; perhaps it will depend on how complex the thing we
are porting is and how many things we must pull in along with it to
make it work.

>
>> Patch series was rebased on Xen 4.9.0 release and tested on Renesas R-Car
>> Gen3 H3 ES2.0/M3 based boards
>> with devices assigned to different domains.
>>
>> You can find patch series here:
>> repo: https://github.com/otyshchenko1/xen.git branch: ipmmu_v2
>>
>> P.S. There is one more patch which needs to be brought back to life [2]
>> Any reasons why this patch hasn't been upstremed yet?
>
>
> The series didn't make it upstream. Feel free to resend it separately.
ok.

>
>>
>> Thank you.
>>
>> [1] [Xen-devel] [PATCH v2 00/13] "Non-shared" IOMMU support on ARM
>> https://www.mail-archive.com/xen-devel@lists.xen.org/msg115901.html
>>
>> [2] [Xen-devel] [PATCH v8 02/28] xen: Add log2 functionality
>> https://lists.xenproject.org/archives/html/xen-devel/2016-02/msg00031.html
>>
>> Oleksandr Tyshchenko (7):
>>   iommu/arm: ipmmu-vmsa: Add IPMMU-VMSA support
>>   iommu/arm: ipmmu-vmsa: Add Xen changes for main driver
>>   iommu/arm: ipmmu-vmsa: Add io-pgtables support
>>   iommu/arm: ipmmu-vmsa: Add Xen changes for io-pgtables
>>   iommu/arm: Build IPMMU-VMSA related stuff
>>   iommu/arm: ipmmu-vmsa: Deallocate page table asynchronously
>>   iommu/arm: ipmmu-vmsa: Enable VMSAv8-64 mode if IPMMU HW supports it
>>
>>  xen/drivers/passthrough/arm/Makefile         |    3 +
>>  xen/drivers/passthrough/arm/io-pgtable-arm.c | 1331 +++++++++++++
>>  xen/drivers/passthrough/arm/io-pgtable.c     |   91 +
>>  xen/drivers/passthrough/arm/io-pgtable.h     |  220 +++
>>  xen/drivers/passthrough/arm/ipmmu-vmsa.c     | 2611
>> ++++++++++++++++++++++++++
>>  5 files changed, 4256 insertions(+)
>>  create mode 100644 xen/drivers/passthrough/arm/io-pgtable-arm.c
>>  create mode 100644 xen/drivers/passthrough/arm/io-pgtable.c
>>  create mode 100644 xen/drivers/passthrough/arm/io-pgtable.h
>>  create mode 100644 xen/drivers/passthrough/arm/ipmmu-vmsa.c
>>
>
> Cheers,
>
> --
> Julien Grall

[1] https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas-bsp.git/tree/drivers/iommu/ipmmu-vmsa.c?h=v4.9/rcar-3.5.3
[2] http://elixir.free-electrons.com/linux/latest/source/drivers/iommu/ipmmu-vmsa.c
[3] https://lwn.net/Articles/725769/
[4] https://lists.xenproject.org/archives/html/xen-devel/2017-07/msg02124.html

-- 
Regards,

Oleksandr Tyshchenko

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v1 0/7] IPMMU-VMSA support on ARM
  2017-08-01 17:13   ` Oleksandr Tyshchenko
@ 2017-08-08 11:21     ` Julien Grall
  2017-08-08 16:52       ` Stefano Stabellini
  0 siblings, 1 reply; 20+ messages in thread
From: Julien Grall @ 2017-08-08 11:21 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Stefano Stabellini, geert+renesas, Will Deacon, joro,
	damm+renesas, Oleksandr Tyshchenko, laurent.pinchart+renesas,
	Goel, Sameer, xen-devel, Robin Murphy, Artem Mygaiev



On 01/08/17 18:13, Oleksandr Tyshchenko wrote:
> Hi, Julien
>
> On Tue, Aug 1, 2017 at 3:27 PM, Julien Grall <julien.grall@arm.com> wrote:
>> On 26/07/17 16:09, Oleksandr Tyshchenko wrote:
>>>
>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>
>>> Hi, all.
>>
>>
>> Hi,
>>
>> Please CC maintainers and any relevant person on the cover letter. This is
>> quite useful to have in the inbox.
> Yes. I CCed guys who, I think, are/were involved in IPMMU-VMSA
> development from Linux side +
> IOMMU maintainers (mostly ARM). Sorry, if I missed someone or mistakenly added.
>
>>
>>> The purpose of this patch series is to add IPMMU-VMSA support to Xen on
>>> ARM.
>>> It is VMSA-compatible IOMMU that integrated in the newest Renesas R-Car
>>> Gen3 SoCs (ARM).
>>> And this IOMMU can't share the page table with the CPU since it doesn't
>>> use the same page-table format
>>> as the CPU on ARM therefore I name it "Non-shared" IOMMU.
>>> This all means that current patch series must be based on "Non-shared"
>>> IOMMU support [1]
>>> for the IPMMU-VMSA to be functional inside Xen.
>>>
>>> The IPMMU-VMSA driver as well as the ARM LPAE allocator were directly
>>> ported from BSP for Linux the vendor provides:
>>> git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas-bsp.git
>>> rcar-3.5.3
>>
>>
>> I think this is probably a good starting point to discuss about IOMMU
>> support in Xen. I skimmed through the patches and saw the words "rfc" and
>> "ported from BSP".
> Well, at the time of porting IPMMU-VMSA driver, BSP [1] had more
> complete support than mainline [2]
> and seems to have at the moment.
> For example, mainline driver still has single IPMMU context while BSP
> driver can have up to 8 contexts,
> the maximum uTLBs mainline driver can handle is 32, but for BSP driver
> this value was increased to 48, etc.
> But, I see attempts to get all required support in [3]. So, when this
> support reaches upsteam, I hope,
> it won't be a big problem to rebase on mainline driver if we decide to
> align with it.

My main concern here is that this driver hasn't had a thorough review by
the Linux community.

When we ported the SMMUv{1,2} driver we knew the Linux community was
happy with it and hence adapting it for Xen was not a big deal. There
are only a few limited changes to the code from Linux.

Looking at patches #2, #6 and #7, the changes to the code from Linux
don't seem limited, plus it is a driver from a BSP. The code really
needs to stay very close to Linux to make the port worth it.

Stefano, do you have any opinion?

>
>>
>> At the moment for IOMMU we rely on the Linux community to do the review, but
>> this is not the case here as it is an RFC.  I can definitely help to check
>> if it comply for Xen,
> yes, please.
>
>> but I don't have the competence to tell whether it is
>> valid for the hardware.
>>
>> We may want to find a compromise to get it merged in Xen, but surely we
>> don't want to build it by default at least until we had feedback from the
>> community about the validity of the code here.
> agree.
>
>>
>> As I mentioned above, we are currently borrowing drivers from Linux and
>> adapting for Xen. Today we support SMMUv{1,2} (we need to resync it) and
>> there are plan to add IPMMU-VMSA (this series) and SMMUv3.
> It would be really nice to have IPMMU-VMSA support in Xen. Without
> this support the SCF [4] we are developing right now
> and even the Passthrough feature won't be fully functional on R-Car
> Gen3 based boards powered by Xen hypervisor.

As I said in the previous e-mail, this would be a nice enhancement for
Xen, but we need to decide in which form it will be upstreamed.

>
>>
>> I am aware that Linux IOMMU subsystem has growing quite a lot making more
>> tricky to get support in Xen. I wanted to get feedback how complex from you
>> and Sameer how complex it was and whether we should consider doing our own.
>
> Yes, the IPMMU-VMSA Linux driver relies on some Linux functional
> (IOMMU/DMA/io-pgtable frameworks) the Xen doesn't have (it is
> expected). So, it took *some time*
> to make Linux driver happy inside Xen). Moreover, this all resulted in
> the fact that the driver looks complicated a bit).
> A lot of different wrappers, #if 0, code style, etc.
> On the other hand, I think, I will be able to fairly quickly align
> with new BSP, etc.
>
> But, I really don't know should we continue to follow this direction
> or not, perhaps it will depend on
> how complex the entity is and how much things we must pull together
> with it to make it happy.
>

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v1 2/7] iommu/arm: ipmmu-vmsa: Add Xen changes for main driver
  2017-07-26 15:09 ` [RFC PATCH v1 2/7] iommu/arm: ipmmu-vmsa: Add Xen changes for main driver Oleksandr Tyshchenko
@ 2017-08-08 11:34   ` Julien Grall
  2017-08-10 14:27     ` Oleksandr Tyshchenko
  0 siblings, 1 reply; 20+ messages in thread
From: Julien Grall @ 2017-08-08 11:34 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, xen-devel; +Cc: Oleksandr Tyshchenko, Stefano Stabellini

Hi,

On 26/07/17 16:09, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> Modify the Linux IPMMU driver to be functional inside Xen.
> All devices within a single Xen domain must use the same
> IOMMU context no matter what IOMMU domains they are attached to.
> This is the main difference between drivers in Linux
> and Xen. Having 8 separate contexts allow us to passthrough
> devices to 8 guest domain at the same time.
>
> Also wrap following code in #if 0:
> - All DMA related stuff
> - Linux PM callbacks
> - Driver remove callback
> - iommu_group management
>
> Maybe, it would be more correct to move different Linux2Xen wrappers,
> define-s, helpers from IPMMU-VMSA and SMMU to some common file
> before introducing IPMMU-VMSA patch series. And this common file
> might be reused by possible future IOMMUs on ARM.

Yes please if we go forward with the Linux way.

>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> CC: Julien Grall <julien.grall@arm.com>
> CC: Stefano Stabellini <sstabellini@kernel.org>
> ---
>  xen/drivers/passthrough/arm/ipmmu-vmsa.c | 984 +++++++++++++++++++++++++++++--
>  1 file changed, 948 insertions(+), 36 deletions(-)
>
> diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> index 2b380ff..e54b507 100644
> --- a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> +++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> @@ -6,31 +6,212 @@
>   * This program is free software; you can redistribute it and/or modify
>   * it under the terms of the GNU General Public License as published by
>   * the Free Software Foundation; version 2 of the License.
> + *
> + * Based on Linux drivers/iommu/ipmmu-vmsa.c
> + * => commit f4747eba89c9b5d90fdf0a5458866283c47395d8
> + * (iommu/ipmmu-vmsa: Restrict IOMMU Domain Geometry to 32-bit address space)
> + *
> + * Xen modification:
> + * Oleksandr Tyshchenko <Oleksandr_Tyshchenko@epam.com>
> + * Copyright (C) 2016-2017 EPAM Systems Inc.
>   */
>
> -#include <linux/bitmap.h>
> -#include <linux/delay.h>
> -#include <linux/dma-iommu.h>
> -#include <linux/dma-mapping.h>
> -#include <linux/err.h>
> -#include <linux/export.h>
> -#include <linux/interrupt.h>
> -#include <linux/io.h>
> -#include <linux/iommu.h>
> -#include <linux/module.h>
> -#include <linux/of.h>
> -#include <linux/of_iommu.h>
> -#include <linux/platform_device.h>
> -#include <linux/sizes.h>
> -#include <linux/slab.h>
> -
> -#if defined(CONFIG_ARM) && !defined(CONFIG_IOMMU_DMA)
> -#include <asm/dma-iommu.h>
> -#include <asm/pgalloc.h>
> -#endif
> +#include <xen/config.h>
> +#include <xen/delay.h>
> +#include <xen/errno.h>
> +#include <xen/err.h>
> +#include <xen/irq.h>
> +#include <xen/lib.h>
> +#include <xen/list.h>
> +#include <xen/mm.h>
> +#include <xen/vmap.h>
> +#include <xen/rbtree.h>
> +#include <xen/sched.h>
> +#include <xen/sizes.h>
> +#include <asm/atomic.h>
> +#include <asm/device.h>
> +#include <asm/io.h>
> +#include <asm/platform.h>
>
>  #include "io-pgtable.h"
>
> +/* TODO:
> + * 1. Optimize xen_domain->lock usage.
> + * 2. Show domain_id in every printk which is per Xen domain.
> + *
> + */
> +
> +/***** Start of Xen specific code *****/
> +
> +#define IOMMU_READ	(1 << 0)
> +#define IOMMU_WRITE	(1 << 1)
> +#define IOMMU_CACHE	(1 << 2) /* DMA cache coherency */
> +#define IOMMU_NOEXEC	(1 << 3)
> +#define IOMMU_MMIO	(1 << 4) /* e.g. things like MSI doorbells */
> +
> +#define __fls(x) (fls(x) - 1)
> +#define __ffs(x) (ffs(x) - 1)
> +
> +#define IO_PGTABLE_QUIRK_ARM_NS		BIT(0)
> +
> +#define ioread32 readl
> +#define iowrite32 writel
> +
> +#define dev_info dev_notice
> +
> +#define devm_request_irq(unused, irq, func, flags, name, dev) \
> +	request_irq(irq, flags, func, name, dev)
> +
> +/* Alias to Xen device tree helpers */
> +#define device_node dt_device_node
> +#define of_phandle_args dt_phandle_args
> +#define of_device_id dt_device_match
> +#define of_match_node dt_match_node
> +#define of_parse_phandle_with_args dt_parse_phandle_with_args
> +#define of_find_property dt_find_property
> +#define of_count_phandle_with_args dt_count_phandle_with_args
> +
> +/* Xen: Helpers to get device MMIO and IRQs */
> +struct resource
> +{
> +	u64 addr;
> +	u64 size;
> +	unsigned int type;
> +};
> +
> +#define resource_size(res) (res)->size;
> +
> +#define platform_device dt_device_node
> +
> +#define IORESOURCE_MEM 0
> +#define IORESOURCE_IRQ 1
> +
> +static struct resource *platform_get_resource(struct platform_device *pdev,
> +					      unsigned int type,
> +					      unsigned int num)
> +{
> +	/*
> +	 * The resource is only used between 2 calls of platform_get_resource.
> +	 * It's quite ugly but it's avoid to add too much code in the part
> +	 * imported from Linux
> +	 */
> +	static struct resource res;
> +	int ret = 0;
> +
> +	res.type = type;
> +
> +	switch (type) {
> +	case IORESOURCE_MEM:
> +		ret = dt_device_get_address(pdev, num, &res.addr, &res.size);
> +
> +		return ((ret) ? NULL : &res);
> +
> +	case IORESOURCE_IRQ:
> +		ret = platform_get_irq(pdev, num);
> +		if (ret < 0)
> +			return NULL;
> +
> +		res.addr = ret;
> +		res.size = 1;
> +
> +		return &res;
> +
> +	default:
> +		return NULL;
> +	}
> +}
> +
> +enum irqreturn {
> +	IRQ_NONE	= (0 << 0),
> +	IRQ_HANDLED	= (1 << 0),
> +};
> +
> +typedef enum irqreturn irqreturn_t;
> +
> +/* Device logger functions */
> +#define dev_print(dev, lvl, fmt, ...)						\
> +	 printk(lvl "ipmmu: %s: " fmt, dt_node_full_name(dev_to_dt(dev)), ## __VA_ARGS__)
> +
> +#define dev_dbg(dev, fmt, ...) dev_print(dev, XENLOG_DEBUG, fmt, ## __VA_ARGS__)
> +#define dev_notice(dev, fmt, ...) dev_print(dev, XENLOG_INFO, fmt, ## __VA_ARGS__)
> +#define dev_warn(dev, fmt, ...) dev_print(dev, XENLOG_WARNING, fmt, ## __VA_ARGS__)
> +#define dev_err(dev, fmt, ...) dev_print(dev, XENLOG_ERR, fmt, ## __VA_ARGS__)
> +
> +#define dev_err_ratelimited(dev, fmt, ...)					\
> +	 dev_print(dev, XENLOG_ERR, fmt, ## __VA_ARGS__)
> +
> +#define dev_name(dev) dt_node_full_name(dev_to_dt(dev))
> +
> +/* Alias to Xen allocation helpers */
> +#define kfree xfree
> +#define kmalloc(size, flags)		_xmalloc(size, sizeof(void *))
> +#define kzalloc(size, flags)		_xzalloc(size, sizeof(void *))
> +#define devm_kzalloc(dev, size, flags)	_xzalloc(size, sizeof(void *))
> +#define kmalloc_array(size, n, flags)	_xmalloc_array(size, sizeof(void *), n)
> +#define kcalloc(size, n, flags)		_xzalloc_array(size, sizeof(void *), n)
> +
> +static void __iomem *devm_ioremap_resource(struct device *dev,
> +					   struct resource *res)
> +{
> +	void __iomem *ptr;
> +
> +	if (!res || res->type != IORESOURCE_MEM) {
> +		dev_err(dev, "Invalid resource\n");
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	ptr = ioremap_nocache(res->addr, res->size);
> +	if (!ptr) {
> +		dev_err(dev,
> +			"ioremap failed (addr 0x%"PRIx64" size 0x%"PRIx64")\n",
> +			res->addr, res->size);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	return ptr;
> +}
> +
> +/* Xen doesn't handle IOMMU fault */
> +#define report_iommu_fault(...)	1
> +
> +#define MODULE_DEVICE_TABLE(type, name)
> +#define module_param_named(name, value, type, perm)
> +#define MODULE_PARM_DESC(_parm, desc)
> +
> +/* Xen: Dummy iommu_domain */
> +struct iommu_domain
> +{
> +	atomic_t ref;
> +	/* Used to link iommu_domain contexts for a same domain.
> +	 * There is at least one per-IPMMU to used by the domain.
> +	 * */
> +	struct list_head		list;
> +};
> +
> +/* Xen: Describes informations required for a Xen domain */
> +struct ipmmu_vmsa_xen_domain {
> +	spinlock_t			lock;
> +	/* List of context (i.e iommu_domain) associated to this domain */
> +	struct list_head		contexts;
> +	struct iommu_domain		*base_context;
> +};
> +
> +/*
> + * Xen: Information about each device stored in dev->archdata.iommu
> + *
> + * On Linux the dev->archdata.iommu only stores the arch specific information,
> + * but, on Xen, we also have to store the iommu domain.
> + */
> +struct ipmmu_vmsa_xen_device {
> +	struct iommu_domain *domain;
> +	struct ipmmu_vmsa_archdata *archdata;
> +};
> +
> +#define dev_iommu(dev) ((struct ipmmu_vmsa_xen_device *)dev->archdata.iommu)
> +#define dev_iommu_domain(dev) (dev_iommu(dev)->domain)
> +
> +/***** Start of Linux IPMMU code *****/
> +
>  #define IPMMU_CTX_MAX 8
>
>  struct ipmmu_features {
> @@ -64,7 +245,9 @@ struct ipmmu_vmsa_device {
>  	struct hw_register *reg_backup[IPMMU_CTX_MAX];
>  #endif
>
> +#if 0 /* Xen: Not needed */
>  	struct dma_iommu_mapping *mapping;
> +#endif
>  };
>
>  struct ipmmu_vmsa_domain {
> @@ -77,6 +260,9 @@ struct ipmmu_vmsa_domain {
>
>  	unsigned int context_id;
>  	spinlock_t lock;			/* Protects mappings */
> +
> +	/* Xen: Domain associated to this configuration */
> +	struct domain *d;
>  };
>
>  struct ipmmu_vmsa_archdata {
> @@ -94,14 +280,20 @@ struct ipmmu_vmsa_archdata {
>  static DEFINE_SPINLOCK(ipmmu_devices_lock);
>  static LIST_HEAD(ipmmu_devices);
>
> +#if 0 /* Xen: Not needed */
>  static DEFINE_SPINLOCK(ipmmu_slave_devices_lock);
>  static LIST_HEAD(ipmmu_slave_devices);
> +#endif
>
>  static struct ipmmu_vmsa_domain *to_vmsa_domain(struct iommu_domain *dom)
>  {
>  	return container_of(dom, struct ipmmu_vmsa_domain, io_domain);
>  }
>
> +/*
> + * Xen: Rewrite Linux helpers to manipulate with archdata on Xen.
> + */
> +#if 0
>  #if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
>  static struct ipmmu_vmsa_archdata *to_archdata(struct device *dev)
>  {
> @@ -120,6 +312,16 @@ static void set_archdata(struct device *dev, struct ipmmu_vmsa_archdata *p)
>  {
>  }
>  #endif
> +#else
> +static struct ipmmu_vmsa_archdata *to_archdata(struct device *dev)
> +{
> +	return dev_iommu(dev)->archdata;
> +}
> +static void set_archdata(struct device *dev, struct ipmmu_vmsa_archdata *p)
> +{
> +	dev_iommu(dev)->archdata = p;
> +}
> +#endif
>
>  #define TLB_LOOP_TIMEOUT		100	/* 100us */
>
> @@ -355,6 +557,10 @@ static struct hw_register *root_pgtable[IPMMU_CTX_MAX] = {
>
>  static bool ipmmu_is_root(struct ipmmu_vmsa_device *mmu)
>  {
> +	/* Xen: Fix */

Hmmm. Can we get a bit more details?

> +	if (!mmu)
> +		return false;
> +
>  	if (mmu->features->has_cache_leaf_nodes)
>  		return mmu->is_leaf ? false : true;
>  	else
> @@ -405,14 +611,28 @@ static void ipmmu_ctx_write(struct ipmmu_vmsa_domain *domain, unsigned int reg,
>  	ipmmu_write(domain->root, domain->context_id * IM_CTX_SIZE + reg, data);
>  }
>
> -static void ipmmu_ctx_write2(struct ipmmu_vmsa_domain *domain, unsigned int reg,
> +/* Xen: Write the context for cache IPMMU only. */

Same here. Why does it need to be different with Xen?

> +static void ipmmu_ctx_write1(struct ipmmu_vmsa_domain *domain, unsigned int reg,
>  			     u32 data)
>  {
>  	if (domain->mmu != domain->root)
> -		ipmmu_write(domain->mmu,
> -			    domain->context_id * IM_CTX_SIZE + reg, data);
> +		ipmmu_write(domain->mmu, domain->context_id * IM_CTX_SIZE + reg, data);
> +}
>
> -	ipmmu_write(domain->root, domain->context_id * IM_CTX_SIZE + reg, data);
> +/*
> + * Xen: Write the context for both root IPMMU and all cache IPMMUs
> + * that assigned to this Xen domain.
> + */
> +static void ipmmu_ctx_write2(struct ipmmu_vmsa_domain *domain, unsigned int reg,
> +			     u32 data)
> +{
> +	struct ipmmu_vmsa_xen_domain *xen_domain = dom_iommu(domain->d)->arch.priv;
> +	struct iommu_domain *io_domain;
> +
> +	list_for_each_entry(io_domain, &xen_domain->contexts, list)
> +		ipmmu_ctx_write1(to_vmsa_domain(io_domain), reg, data);
> +
> +	ipmmu_ctx_write(domain, reg, data);
>  }
>
>  /* -----------------------------------------------------------------------------
> @@ -488,6 +708,10 @@ static void ipmmu_tlb_flush_all(void *cookie)
>  {
>  	struct ipmmu_vmsa_domain *domain = cookie;
>
> +	/* Xen: Just return if context_id has non-existent value */

Same here.

> +	if (domain->context_id >= domain->root->num_ctx)
> +		return;
> +
>  	ipmmu_tlb_invalidate(domain);
>  }
>
> @@ -549,8 +773,10 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
>  	domain->cfg.ias = 32;
>  	domain->cfg.oas = 40;
>  	domain->cfg.tlb = &ipmmu_gather_ops;
> +#if 0 /* Xen: Not needed */
>  	domain->io_domain.geometry.aperture_end = DMA_BIT_MASK(32);
>  	domain->io_domain.geometry.force_aperture = true;
> +#endif
>  	/*
>  	 * TODO: Add support for coherent walk through CCI with DVM and remove
>  	 * cache handling. For now, delegate it to the io-pgtable code.
> @@ -562,6 +788,9 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
>  	if (!domain->iop)
>  		return -EINVAL;
>
> +	/* Xen: Initialize context_id with non-existent value */
> +	domain->context_id = domain->root->num_ctx;

Why do you need to do that for Xen? Overall I think you need a bit more
explanation of why you need those changes for Xen compared to the Linux
driver.

> +
>  	/*
>  	 * Find an unused context.
>  	 */
> @@ -578,6 +807,11 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
>
>  	/* TTBR0 */
>  	ttbr = domain->cfg.arm_lpae_s1_cfg.ttbr[0];
> +
> +	/* Xen: */
> +	dev_notice(domain->root->dev, "d%d: Set IPMMU context %u (pgd 0x%"PRIx64")\n",
> +			domain->d->domain_id, domain->context_id, ttbr);

If you want to keep the driver close to Linux, then you need to avoid
unnecessary changes.

> +
>  	ipmmu_ctx_write(domain, IMTTLBR0, ttbr);
>  	ipmmu_ctx_write(domain, IMTTUBR0, ttbr >> 32);
>
> @@ -616,8 +850,9 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
>  	 * translation table format doesn't use TEX remapping. Don't enable AF
>  	 * software management as we have no use for it. Flush the TLB as
>  	 * required when modifying the context registers.
> +	 * Xen: Enable the context for the root IPMMU only.
>  	 */
> -	ipmmu_ctx_write2(domain, IMCTR,
> +	ipmmu_ctx_write(domain, IMCTR,
>  			 IMCTR_INTEN | IMCTR_FLUSH | IMCTR_MMUEN);
>
>  	return 0;
> @@ -638,13 +873,18 @@ static void ipmmu_domain_free_context(struct ipmmu_vmsa_device *mmu,
>
>  static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain *domain)
>  {
> +	/* Xen: Just return if context_id has non-existent value */
> +	if (domain->context_id >= domain->root->num_ctx)
> +		return;
> +
>  	/*
>  	 * Disable the context. Flush the TLB as required when modifying the
>  	 * context registers.
>  	 *
>  	 * TODO: Is TLB flush really needed ?
> +	 * Xen: Disable the context for the root IPMMU only.
>  	 */
> -	ipmmu_ctx_write2(domain, IMCTR, IMCTR_FLUSH);
> +	ipmmu_ctx_write(domain, IMCTR, IMCTR_FLUSH);
>  	ipmmu_tlb_sync(domain);
>
>  #ifdef CONFIG_RCAR_DDR_BACKUP
> @@ -652,12 +892,16 @@ static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain *domain)
>  #endif
>
>  	ipmmu_domain_free_context(domain->root, domain->context_id);
> +
> +	/* Xen: Initialize context_id with non-existent value */
> +	domain->context_id = domain->root->num_ctx;
>  }
>
>  /* -----------------------------------------------------------------------------
>   * Fault Handling
>   */
>
> +/* Xen: Show domain_id in every printk */
>  static irqreturn_t ipmmu_domain_irq(struct ipmmu_vmsa_domain *domain)
>  {
>  	const u32 err_mask = IMSTR_MHIT | IMSTR_ABORT | IMSTR_PF | IMSTR_TF;
> @@ -681,11 +925,11 @@ static irqreturn_t ipmmu_domain_irq(struct ipmmu_vmsa_domain *domain)
>
>  	/* Log fatal errors. */
>  	if (status & IMSTR_MHIT)
> -		dev_err_ratelimited(mmu->dev, "Multiple TLB hits @0x%08x\n",
> -				    iova);
> +		dev_err_ratelimited(mmu->dev, "d%d: Multiple TLB hits @0x%08x\n",
> +				domain->d->domain_id, iova);
>  	if (status & IMSTR_ABORT)
> -		dev_err_ratelimited(mmu->dev, "Page Table Walk Abort @0x%08x\n",
> -				    iova);
> +		dev_err_ratelimited(mmu->dev, "d%d: Page Table Walk Abort @0x%08x\n",
> +				domain->d->domain_id, iova);
>
>  	if (!(status & (IMSTR_PF | IMSTR_TF)))
>  		return IRQ_NONE;
> @@ -700,8 +944,8 @@ static irqreturn_t ipmmu_domain_irq(struct ipmmu_vmsa_domain *domain)
>  		return IRQ_HANDLED;
>
>  	dev_err_ratelimited(mmu->dev,
> -			    "Unhandled fault: status 0x%08x iova 0x%08x\n",
> -			    status, iova);
> +			"d%d: Unhandled fault: status 0x%08x iova 0x%08x\n",
> +			domain->d->domain_id, status, iova);
>
>  	return IRQ_HANDLED;
>  }
> @@ -730,6 +974,16 @@ static irqreturn_t ipmmu_irq(int irq, void *dev)
>  	return status;
>  }
>
> +/* Xen: Interrupt handlers wrapper */
> +static void ipmmu_irq_xen(int irq, void *dev,
> +				      struct cpu_user_regs *regs)
> +{
> +	ipmmu_irq(irq, dev);
> +}
> +
> +#define ipmmu_irq ipmmu_irq_xen
> +
> +#if 0 /* Xen: Not needed */
>  /* -----------------------------------------------------------------------------
>   * IOMMU Operations
>   */
> @@ -759,6 +1013,7 @@ static void ipmmu_domain_free(struct iommu_domain *io_domain)
>  	free_io_pgtable_ops(domain->iop);
>  	kfree(domain);
>  }
> +#endif
>
>  static int ipmmu_attach_device(struct iommu_domain *io_domain,
>  			       struct device *dev)
> @@ -787,7 +1042,20 @@ static int ipmmu_attach_device(struct iommu_domain *io_domain,
>  		/* The domain hasn't been used yet, initialize it. */
>  		domain->mmu = mmu;
>  		domain->root = root;
> +
> +/*
> + * Xen: We have already initialized and enabled context for root IPMMU
> + * for this Xen domain. Enable context for given cache IPMMU only.
> + * Flush the TLB as required when modifying the context registers.

Why?

> + */
> +#if 0
>  		ret = ipmmu_domain_init_context(domain);
> +#endif
> +		ipmmu_ctx_write1(domain, IMCTR,
> +				ipmmu_ctx_read(domain, IMCTR) | IMCTR_FLUSH);
> +
> +		dev_info(dev, "Using IPMMU context %u\n", domain->context_id);
> +#if 0 /* Xen: Not needed */
>  		if (ret < 0) {
>  			dev_err(dev, "Unable to initialize IPMMU context\n");
>  			domain->mmu = NULL;
> @@ -795,6 +1063,7 @@ static int ipmmu_attach_device(struct iommu_domain *io_domain,
>  			dev_info(dev, "Using IPMMU context %u\n",
>  				 domain->context_id);
>  		}
> +#endif
>  	} else if (domain->mmu != mmu) {
>  		/*
>  		 * Something is wrong, we can't attach two devices using
> @@ -834,6 +1103,14 @@ static void ipmmu_detach_device(struct iommu_domain *io_domain,
>  	 */
>  }
>
> +/*
> + * Xen: The current implementation of these callbacks is insufficient for us
> + * since they are intended to be called from Linux IOMMU core that
> + * has already done all required actions such as doing various checks,
> + * splitting into memory block the hardware supports and so on.

Can you expand on it here? Why can't our IOMMU framework do that?

IMHO, if we want to get drivers from Linux, we need to keep an interface
very close to theirs. Otherwise it is not worth it, because you would
have to implement it for each IOMMU.

My overall feeling at the moment is that Xen is not ready to welcome
this driver directly from Linux. This is also a BSP driver, so no
thorough review has been done by the community.

I have been told the BSP driver was in a pretty bad state, so I think we
really need to weigh the pros and cons of using it.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v1 6/7] iommu/arm: ipmmu-vmsa: Deallocate page table asynchronously
  2017-07-26 15:10 ` [RFC PATCH v1 6/7] iommu/arm: ipmmu-vmsa: Deallocate page table asynchronously Oleksandr Tyshchenko
@ 2017-08-08 11:36   ` Julien Grall
  0 siblings, 0 replies; 20+ messages in thread
From: Julien Grall @ 2017-08-08 11:36 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, xen-devel; +Cc: Oleksandr Tyshchenko, Stefano Stabellini



On 26/07/17 16:10, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> This is the PoC how to optimize page table deallocation sequence
> by splitting it into separate chunks.
> Use iommu_pt_cleanup_list to queue pages that need to be handled and
> freed next time. Use free_page_table platform callback to dequeue
> pages.

The page allocation/deallocation definitely needs to be split into
chunks and allow voluntary preemption. Otherwise you may end up hitting
the RCU sched stall in the toolstack domain.
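
For illustration, here is a minimal standalone sketch (plain userspace
C, not Xen code; the names and the BATCH size are made up) of the
"bounded work per pass" pattern being asked for: free at most a fixed
batch of queued pages per call and report whether more work remains, so
the caller can yield and continue rather than tearing down a large page
table in one go.

/*
 * Minimal standalone sketch (hypothetical names, not Xen code) of a
 * preemptible teardown: free at most BATCH queued pages per pass and
 * tell the caller whether it has to come back later.
 */
#include <stdbool.h>
#include <stdlib.h>

#define BATCH 64 /* pages freed per pass; keeps each pass short */

struct pt_page {
    struct pt_page *next;
    void *mem;
};

/* Pages queued for deferred freeing (the role iommu_pt_cleanup_list plays). */
static struct pt_page *cleanup_list;

static struct pt_page *dequeue(void)
{
    struct pt_page *p = cleanup_list;

    if (p)
        cleanup_list = p->next;

    return p;
}

/*
 * Free up to BATCH queued pages.  Returns true if more pages remain, in
 * which case the caller should yield/continue (in Xen this is where a
 * hypercall_preempt_check()/-ERESTART style continuation would kick in).
 */
static bool pt_cleanup_one_pass(void)
{
    unsigned int i;

    for (i = 0; i < BATCH; i++) {
        struct pt_page *p = dequeue();

        if (!p)
            return false;   /* nothing left to do */

        free(p->mem);
        free(p);
    }

    return cleanup_list != NULL;
}

int main(void)
{
    while (pt_cleanup_one_pass())
        ;   /* a real caller would voluntarily preempt between passes */

    return 0;
}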

>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> CC: Julien Grall <julien.grall@arm.com>
> CC: Stefano Stabellini <sstabellini@kernel.org>
> ---
>  xen/drivers/passthrough/arm/io-pgtable-arm.c | 94 +++++++++++++++++++++++++---
>  xen/drivers/passthrough/arm/io-pgtable.c     |  5 +-
>  xen/drivers/passthrough/arm/io-pgtable.h     |  4 +-
>  xen/drivers/passthrough/arm/ipmmu-vmsa.c     | 33 ++++++++--
>  4 files changed, 119 insertions(+), 17 deletions(-)
>
> diff --git a/xen/drivers/passthrough/arm/io-pgtable-arm.c b/xen/drivers/passthrough/arm/io-pgtable-arm.c
> index c98caa3..7673fda 100644
> --- a/xen/drivers/passthrough/arm/io-pgtable-arm.c
> +++ b/xen/drivers/passthrough/arm/io-pgtable-arm.c
> @@ -254,6 +254,10 @@ struct arm_lpae_io_pgtable {
>
>  	/* Xen: We deal with domain pages. */
>  	struct page_info	*pgd;
> +	/* Xen: To indicate that deallocation sequence is in progress. */
> +	bool_t				cleanup;
> +	/* Xen: To count allocated domain pages. */
> +	unsigned int		page_count;
>  };
>
>  typedef u64 arm_lpae_iopte;
> @@ -329,7 +333,7 @@ static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
>  #endif
>
>  static struct page_info *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
> -				    struct io_pgtable_cfg *cfg)
> +				    struct arm_lpae_io_pgtable *data)
>  {
>  	struct page_info *pages;
>  	unsigned int order = get_order_from_bytes(size);
> @@ -342,15 +346,21 @@ static struct page_info *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
>  	for (i = 0; i < (1 << order); i ++)
>  		clear_and_clean_page(pages + i);
>
> +	data->page_count += (1<<order);
> +
>  	return pages;
>  }
>
>  static void __arm_lpae_free_pages(struct page_info *pages, size_t size,
> -				  struct io_pgtable_cfg *cfg)
> +				  struct arm_lpae_io_pgtable *data)
>  {
>  	unsigned int order = get_order_from_bytes(size);
>
> +	BUG_ON((int)data->page_count <= 0);
> +
>  	free_domheap_pages(pages, order);
> +
> +	data->page_count -= (1<<order);
>  }
>
>  static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
> @@ -434,7 +444,7 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
>  	pte = *ptep;
>  	if (!pte) {
>  		page = __arm_lpae_alloc_pages(ARM_LPAE_GRANULE(data),
> -					       GFP_ATOMIC, cfg);
> +					       GFP_ATOMIC, data);
>  		if (!page)
>  			return -ENOMEM;
>
> @@ -526,6 +536,46 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
>  	return ret;
>  }
>
> +static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
> +				    struct page_info *page);
> +
> +/*
> + * TODO: We have reused unused at the moment "page->pad" variable for
> + * storing "data" pointer we need during deallocation sequence. The current
> + * free_page_table platform callback carries the only one "page" argument.
> + * To perform required calculations with the current (generic) allocator
> + * implementation we are highly interested in the following fields:
> + * - data->levels
> + * - data->pg_shift
> + * - data->pgd_size
> + * But, this necessity might be avoided if we integrate allocator code with
> + * IPMMU-VMSA driver. And these variables will turn into the
> + * corresponding #define-s.
> + */
> +static void __arm_lpae_free_next_pgtable(struct arm_lpae_io_pgtable *data,
> +				    int lvl, struct page_info *page)
> +{
> +	if (!data->cleanup) {
> +		/*
> +		 * We are here during normal page table maintenance. Just call
> +		 * __arm_lpae_free_pgtable(), what we actually had to call.
> +		 */
> +		__arm_lpae_free_pgtable(data, lvl, page);
> +	} else {
> +		/*
> +		 * The page table deallocation sequence is in progress. Use some fields
> +		 * in struct page_info to pass arguments we will need during handling
> +		 * this page back. Queue page to list.
> +		 */
> +		PFN_ORDER(page) = lvl;
> +		page->pad = (u64)&data->iop.ops;
> +
> +		spin_lock(&iommu_pt_cleanup_lock);
> +		page_list_add_tail(page, &iommu_pt_cleanup_list);
> +		spin_unlock(&iommu_pt_cleanup_lock);
> +	}
> +}
> +
>  /* Xen: We deal with domain pages. */
>  static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
>  				    struct page_info *page)
> @@ -553,19 +603,41 @@ static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
>  		if (!pte || iopte_leaf(pte, lvl))
>  			continue;
>
> -		__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));
> +		__arm_lpae_free_next_pgtable(data, lvl + 1, iopte_deref(pte, data));
>  	}
>
>  	unmap_domain_page(start);
> -	__arm_lpae_free_pages(page, table_size, &data->iop.cfg);
> +	__arm_lpae_free_pages(page, table_size, data);
>  }
>
> -static void arm_lpae_free_pgtable(struct io_pgtable *iop)
> +/*
> + * We added extra "page" argument since we want to know what page is processed
> + * at the moment and should be freed.
> + * */
> +static void arm_lpae_free_pgtable(struct io_pgtable *iop, struct page_info *page)
>  {
>  	struct arm_lpae_io_pgtable *data = io_pgtable_to_data(iop);
> +	int lvl;
>
> -	__arm_lpae_free_pgtable(data, ARM_LPAE_START_LVL(data), data->pgd);
> -	kfree(data);
> +	if (!data->cleanup) {
> +		/* Start page table deallocation sequence from the first level. */
> +		data->cleanup = true;
> +		lvl = ARM_LPAE_START_LVL(data);
> +	} else {
> +		/* Retrieve the level to continue deallocation sequence from. */
> +		lvl = PFN_ORDER(page);
> +		PFN_ORDER(page) = 0;
> +		page->pad = 0;
> +	}
> +
> +	__arm_lpae_free_pgtable(data, lvl, page);
> +
> +	/*
> +	 * Seems, we have already deallocated all pages, so it is time
> +	 * to release unfreed resource.
> +	 */
> +	if (!data->page_count)
> +		kfree(data);
>  }
>
>  /* Xen: We deal with domain pages. */
> @@ -889,8 +961,12 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
>  	cfg->arm_lpae_s1_cfg.mair[0] = reg;
>  	cfg->arm_lpae_s1_cfg.mair[1] = 0;
>
> +	/* Just to be sure */
> +	data->cleanup = false;
> +	data->page_count = 0;
> +
>  	/* Looking good; allocate a pgd */
> -	data->pgd = __arm_lpae_alloc_pages(data->pgd_size, GFP_KERNEL, cfg);
> +	data->pgd = __arm_lpae_alloc_pages(data->pgd_size, GFP_KERNEL, data);
>  	if (!data->pgd)
>  		goto out_free_data;
>
> diff --git a/xen/drivers/passthrough/arm/io-pgtable.c b/xen/drivers/passthrough/arm/io-pgtable.c
> index bfc7020..e25d731 100644
> --- a/xen/drivers/passthrough/arm/io-pgtable.c
> +++ b/xen/drivers/passthrough/arm/io-pgtable.c
> @@ -77,7 +77,7 @@ struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
>   * It is the IOMMU driver's responsibility to ensure that the page table
>   * is no longer accessible to the walker by this point.
>   */
> -void free_io_pgtable_ops(struct io_pgtable_ops *ops)
> +void free_io_pgtable_ops(struct io_pgtable_ops *ops, struct page_info *page)
>  {
>  	struct io_pgtable *iop;
>
> @@ -86,5 +86,6 @@ void free_io_pgtable_ops(struct io_pgtable_ops *ops)
>
>  	iop = container_of(ops, struct io_pgtable, ops);
>  	io_pgtable_tlb_flush_all(iop);
> -	io_pgtable_init_table[iop->fmt]->free(iop);
> +	iop->cookie = NULL;
> +	io_pgtable_init_table[iop->fmt]->free(iop, page);
>  }
> diff --git a/xen/drivers/passthrough/arm/io-pgtable.h b/xen/drivers/passthrough/arm/io-pgtable.h
> index fb81fcf..df0e21b 100644
> --- a/xen/drivers/passthrough/arm/io-pgtable.h
> +++ b/xen/drivers/passthrough/arm/io-pgtable.h
> @@ -144,7 +144,7 @@ struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
>   *
>   * @ops: The ops returned from alloc_io_pgtable_ops.
>   */
> -void free_io_pgtable_ops(struct io_pgtable_ops *ops);
> +void free_io_pgtable_ops(struct io_pgtable_ops *ops, struct page_info *page);
>
>
>  /*
> @@ -201,7 +201,7 @@ static inline void io_pgtable_tlb_sync(struct io_pgtable *iop)
>   */
>  struct io_pgtable_init_fns {
>  	struct io_pgtable *(*alloc)(struct io_pgtable_cfg *cfg, void *cookie);
> -	void (*free)(struct io_pgtable *iop);
> +	void (*free)(struct io_pgtable *iop, struct page_info *page);
>  };
>
>  extern struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns;
> diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> index e54b507..2a04800 100644
> --- a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> +++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> @@ -708,8 +708,8 @@ static void ipmmu_tlb_flush_all(void *cookie)
>  {
>  	struct ipmmu_vmsa_domain *domain = cookie;
>
> -	/* Xen: Just return if context_id has non-existent value */
> -	if (domain->context_id >= domain->root->num_ctx)
> +	/* Xen: Just return if context is absent or context_id has non-existent value */
> +	if (!domain || domain->context_id >= domain->root->num_ctx)
>  		return;
>
>  	ipmmu_tlb_invalidate(domain);
> @@ -796,7 +796,9 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
>  	 */
>  	ret = ipmmu_domain_allocate_context(domain->root, domain);
>  	if (ret < 0) {
> -		free_io_pgtable_ops(domain->iop);
> +		/* Pass root page table for this domain as an argument. */
> +		free_io_pgtable_ops(domain->iop,
> +				maddr_to_page(domain->cfg.arm_lpae_s1_cfg.ttbr[0]));
>  		return ret;
>  	}
>
> @@ -2193,7 +2195,12 @@ static void ipmmu_vmsa_destroy_domain(struct iommu_domain *io_domain)
>  		 * been detached.
>  		 */
>  		ipmmu_domain_destroy_context(domain);
> -		free_io_pgtable_ops(domain->iop);
> +		/*
> +		 * Pass root page table for this domain as an argument.
> +		 * This call will lead to start deallocation sequence.
> +		 */
> +		free_io_pgtable_ops(domain->iop,
> +				maddr_to_page(domain->cfg.arm_lpae_s1_cfg.ttbr[0]));
>  	}
>
>  	kfree(domain);
> @@ -2383,6 +2390,17 @@ static int ipmmu_vmsa_domain_init(struct domain *d, bool use_iommu)
>  	return 0;
>  }
>
> +/*
> + * Seems, there is one more page we need to process. So, retrieve
> + * the pointer and go on deallocation sequence.
> + */
> +static void ipmmu_vmsa_free_page_table(struct page_info *page)
> +{
> +	struct io_pgtable_ops *ops = (struct io_pgtable_ops *)page->pad;
> +
> +	free_io_pgtable_ops(ops, page);
> +}
> +
>  static void __hwdom_init ipmmu_vmsa_hwdom_init(struct domain *d)
>  {
>  }
> @@ -2404,6 +2422,12 @@ static void ipmmu_vmsa_domain_teardown(struct domain *d)
>  	ASSERT(list_empty(&xen_domain->contexts));
>  	xfree(xen_domain);
>  	dom_iommu(d)->arch.priv = NULL;
> +	/*
> +	 * After this point we have all domain resources deallocated, except
> +	 * page table which we will deallocate asynchronously. The IOMMU code
> +	 * provides us with iommu_pt_cleanup_list and free_page_table platform
> +	 * callback what we actually going to use.
> +	 */
>  }
>
>  static int __must_check ipmmu_vmsa_map_pages(struct domain *d,
> @@ -2462,6 +2486,7 @@ static void ipmmu_vmsa_dump_p2m_table(struct domain *d)
>  static const struct iommu_ops ipmmu_vmsa_iommu_ops = {
>  	.init = ipmmu_vmsa_domain_init,
>  	.hwdom_init = ipmmu_vmsa_hwdom_init,
> +	.free_page_table = ipmmu_vmsa_free_page_table,
>  	.teardown = ipmmu_vmsa_domain_teardown,
>  	.iotlb_flush = ipmmu_vmsa_iotlb_flush,
>  	.assign_device = ipmmu_vmsa_assign_dev,
>

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v1 0/7] IPMMU-VMSA support on ARM
  2017-08-08 11:21     ` Julien Grall
@ 2017-08-08 16:52       ` Stefano Stabellini
  0 siblings, 0 replies; 20+ messages in thread
From: Stefano Stabellini @ 2017-08-08 16:52 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, geert+renesas, Will Deacon, joro,
	damm+renesas, Oleksandr Tyshchenko, Oleksandr Tyshchenko,
	laurent.pinchart+renesas, Goel, Sameer, xen-devel, Robin Murphy,
	Artem Mygaiev

On Tue, 8 Aug 2017, Julien Grall wrote:
> On 01/08/17 18:13, Oleksandr Tyshchenko wrote:
> > Hi, Julien
> > 
> > On Tue, Aug 1, 2017 at 3:27 PM, Julien Grall <julien.grall@arm.com> wrote:
> > > On 26/07/17 16:09, Oleksandr Tyshchenko wrote:
> > > > 
> > > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > > 
> > > > Hi, all.
> > > 
> > > 
> > > Hi,
> > > 
> > > Please CC maintainers and any relevant person on the cover letter. This is
> > > quite useful to have in the inbox.
> > Yes. I CCed guys who, I think, are/were involved in IPMMU-VMSA
> > development from Linux side +
> > IOMMU maintainers (mostly ARM). Sorry, if I missed someone or mistakenly
> > added.
> > 
> > > 
> > > > The purpose of this patch series is to add IPMMU-VMSA support to Xen on
> > > > ARM.
> > > > It is VMSA-compatible IOMMU that integrated in the newest Renesas R-Car
> > > > Gen3 SoCs (ARM).
> > > > And this IOMMU can't share the page table with the CPU since it doesn't
> > > > use the same page-table format
> > > > as the CPU on ARM therefore I name it "Non-shared" IOMMU.
> > > > This all means that current patch series must be based on "Non-shared"
> > > > IOMMU support [1]
> > > > for the IPMMU-VMSA to be functional inside Xen.
> > > > 
> > > > The IPMMU-VMSA driver as well as the ARM LPAE allocator were directly
> > > > ported from BSP for Linux the vendor provides:
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas-bsp.git
> > > > rcar-3.5.3
> > > 
> > > 
> > > I think this is probably a good starting point to discuss about IOMMU
> > > support in Xen. I skimmed through the patches and saw the words "rfc" and
> > > "ported from BSP".
> > Well, at the time of porting IPMMU-VMSA driver, BSP [1] had more
> > complete support than mainline [2]
> > and seems to have at the moment.
> > For example, mainline driver still has single IPMMU context while BSP
> > driver can have up to 8 contexts,
> > the maximum uTLBs mainline driver can handle is 32, but for BSP driver
> > this value was increased to 48, etc.
> > But, I see attempts to get all required support in [3]. So, when this
> > support reaches upsteam, I hope,
> > it won't be a big problem to rebase on mainline driver if we decide to
> > align with it.
> 
> My main concern here is this driver haven't had a thorough review by the Linux
> community.
> 
> When we ported the SMMUv{1,2} driver we knew the Linux community was happy
> with it and hence adapting for Xen was not a big deal. There are only few
> limited changes in the code from Linux.
> 
> Looking at patch #2, #6, #7, the changes don't seem limited in the code from
> Linux + it is a driver from a BSP. The code really needs to be very close to
> make the port from Linux really worth it.
> 
> Stefano do you have any opinion?

Yes, the fact that a driver is already in Linux gives us a guarantee
that it has been reviewed before. This driver doesn't come with that
guarantee, which means that one of us would have to review it. Otherwise,
we could take the contribution and mark it as "unsupported/experimental"
like we do for many new features until we think it is stable enough.

Regardless, I think you are right that there is no gain in trying to
make this patch series "compatible" with the original Linux driver,
because the driver isn't in Linux. We might as well take a clean
contribution to Xen without any #if 0.
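
To make the trade-off concrete, a tiny hypothetical sketch in plain
userspace C (none of this is code from the series): the same trivial
helper written the "wrapper" way, which keeps the Linux spellings and
translates them with macros, and the "clean" way, which calls what the
target environment actually provides.

/* Illustration only; hypothetical names, userspace C. */
#include <stdio.h>
#include <stdlib.h>

/* Wrapper style: compatibility macros so the Linux code can stay as-is. */
#define kzalloc(size, flags)   calloc(1, size)
#define dev_err(dev, fmt, ...) fprintf(stderr, "%s: " fmt, dev, ##__VA_ARGS__)

static void *alloc_ctx_wrapped(const char *dev, size_t size)
{
    void *p = kzalloc(size, 0);

    if (!p)
        dev_err(dev, "allocation failed\n");

    return p;
}

/* Clean style: no translation layer, just the native calls. */
static void *alloc_ctx_clean(const char *dev, size_t size)
{
    void *p = calloc(1, size);

    if (!p)
        fprintf(stderr, "%s: allocation failed\n", dev);

    return p;
}

int main(void)
{
    free(alloc_ctx_wrapped("ipmmu", 64));
    free(alloc_ctx_clean("ipmmu", 64));

    return 0;
}

The wrapper style makes re-syncing with Linux cheap; the clean style is
easier to review as a stand-alone Xen driver, which is the trade-off
being weighed in this thread.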

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v1 2/7] iommu/arm: ipmmu-vmsa: Add Xen changes for main driver
  2017-08-08 11:34   ` Julien Grall
@ 2017-08-10 14:27     ` Oleksandr Tyshchenko
  2017-08-10 15:13       ` Julien Grall
  0 siblings, 1 reply; 20+ messages in thread
From: Oleksandr Tyshchenko @ 2017-08-10 14:27 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, Stefano Stabellini, Oleksandr Tyshchenko

Hi, Julien

On Tue, Aug 8, 2017 at 2:34 PM, Julien Grall <julien.grall@arm.com> wrote:
> Hi,
>
> On 26/07/17 16:09, Oleksandr Tyshchenko wrote:
>>
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> Modify the Linux IPMMU driver to be functional inside Xen.
>> All devices within a single Xen domain must use the same
>> IOMMU context no matter what IOMMU domains they are attached to.
>> This is the main difference between drivers in Linux
>> and Xen. Having 8 separate contexts allow us to passthrough
>> devices to 8 guest domain at the same time.
>>
>> Also wrap following code in #if 0:
>> - All DMA related stuff
>> - Linux PM callbacks
>> - Driver remove callback
>> - iommu_group management
>>
>> Maybe, it would be more correct to move different Linux2Xen wrappers,
>> define-s, helpers from IPMMU-VMSA and SMMU to some common file
>> before introducing IPMMU-VMSA patch series. And this common file
>> might be reused by possible future IOMMUs on ARM.
>
>
> Yes please if we go forward with the Linux way.
OK. I will keep it in mind.

>
>
>>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> CC: Julien Grall <julien.grall@arm.com>
>> CC: Stefano Stabellini <sstabellini@kernel.org>
>> ---
>>  xen/drivers/passthrough/arm/ipmmu-vmsa.c | 984
>> +++++++++++++++++++++++++++++--
>>  1 file changed, 948 insertions(+), 36 deletions(-)
>>
>> diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
>> b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
>> index 2b380ff..e54b507 100644
>> --- a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
>> +++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
>> @@ -6,31 +6,212 @@
>>   * This program is free software; you can redistribute it and/or modify
>>   * it under the terms of the GNU General Public License as published by
>>   * the Free Software Foundation; version 2 of the License.
>> + *
>> + * Based on Linux drivers/iommu/ipmmu-vmsa.c
>> + * => commit f4747eba89c9b5d90fdf0a5458866283c47395d8
>> + * (iommu/ipmmu-vmsa: Restrict IOMMU Domain Geometry to 32-bit address
>> space)
>> + *
>> + * Xen modification:
>> + * Oleksandr Tyshchenko <Oleksandr_Tyshchenko@epam.com>
>> + * Copyright (C) 2016-2017 EPAM Systems Inc.
>>   */
>>
>> -#include <linux/bitmap.h>
>> -#include <linux/delay.h>
>> -#include <linux/dma-iommu.h>
>> -#include <linux/dma-mapping.h>
>> -#include <linux/err.h>
>> -#include <linux/export.h>
>> -#include <linux/interrupt.h>
>> -#include <linux/io.h>
>> -#include <linux/iommu.h>
>> -#include <linux/module.h>
>> -#include <linux/of.h>
>> -#include <linux/of_iommu.h>
>> -#include <linux/platform_device.h>
>> -#include <linux/sizes.h>
>> -#include <linux/slab.h>
>> -
>> -#if defined(CONFIG_ARM) && !defined(CONFIG_IOMMU_DMA)
>> -#include <asm/dma-iommu.h>
>> -#include <asm/pgalloc.h>
>> -#endif
>> +#include <xen/config.h>
>> +#include <xen/delay.h>
>> +#include <xen/errno.h>
>> +#include <xen/err.h>
>> +#include <xen/irq.h>
>> +#include <xen/lib.h>
>> +#include <xen/list.h>
>> +#include <xen/mm.h>
>> +#include <xen/vmap.h>
>> +#include <xen/rbtree.h>
>> +#include <xen/sched.h>
>> +#include <xen/sizes.h>
>> +#include <asm/atomic.h>
>> +#include <asm/device.h>
>> +#include <asm/io.h>
>> +#include <asm/platform.h>
>>
>>  #include "io-pgtable.h"
>>
>> +/* TODO:
>> + * 1. Optimize xen_domain->lock usage.
>> + * 2. Show domain_id in every printk which is per Xen domain.
>> + *
>> + */
>> +
>> +/***** Start of Xen specific code *****/
>> +
>> +#define IOMMU_READ     (1 << 0)
>> +#define IOMMU_WRITE    (1 << 1)
>> +#define IOMMU_CACHE    (1 << 2) /* DMA cache coherency */
>> +#define IOMMU_NOEXEC   (1 << 3)
>> +#define IOMMU_MMIO     (1 << 4) /* e.g. things like MSI doorbells */
>> +
>> +#define __fls(x) (fls(x) - 1)
>> +#define __ffs(x) (ffs(x) - 1)
>> +
>> +#define IO_PGTABLE_QUIRK_ARM_NS                BIT(0)
>> +
>> +#define ioread32 readl
>> +#define iowrite32 writel
>> +
>> +#define dev_info dev_notice
>> +
>> +#define devm_request_irq(unused, irq, func, flags, name, dev) \
>> +       request_irq(irq, flags, func, name, dev)
>> +
>> +/* Alias to Xen device tree helpers */
>> +#define device_node dt_device_node
>> +#define of_phandle_args dt_phandle_args
>> +#define of_device_id dt_device_match
>> +#define of_match_node dt_match_node
>> +#define of_parse_phandle_with_args dt_parse_phandle_with_args
>> +#define of_find_property dt_find_property
>> +#define of_count_phandle_with_args dt_count_phandle_with_args
>> +
>> +/* Xen: Helpers to get device MMIO and IRQs */
>> +struct resource
>> +{
>> +       u64 addr;
>> +       u64 size;
>> +       unsigned int type;
>> +};
>> +
>> +#define resource_size(res) (res)->size;
>> +
>> +#define platform_device dt_device_node
>> +
>> +#define IORESOURCE_MEM 0
>> +#define IORESOURCE_IRQ 1
>> +
>> +static struct resource *platform_get_resource(struct platform_device
>> *pdev,
>> +                                             unsigned int type,
>> +                                             unsigned int num)
>> +{
>> +       /*
>> +        * The resource is only used between 2 calls of
>> platform_get_resource.
>> +        * It's quite ugly but it's avoid to add too much code in the part
>> +        * imported from Linux
>> +        */
>> +       static struct resource res;
>> +       int ret = 0;
>> +
>> +       res.type = type;
>> +
>> +       switch (type) {
>> +       case IORESOURCE_MEM:
>> +               ret = dt_device_get_address(pdev, num, &res.addr,
>> &res.size);
>> +
>> +               return ((ret) ? NULL : &res);
>> +
>> +       case IORESOURCE_IRQ:
>> +               ret = platform_get_irq(pdev, num);
>> +               if (ret < 0)
>> +                       return NULL;
>> +
>> +               res.addr = ret;
>> +               res.size = 1;
>> +
>> +               return &res;
>> +
>> +       default:
>> +               return NULL;
>> +       }
>> +}
>> +
>> +enum irqreturn {
>> +       IRQ_NONE        = (0 << 0),
>> +       IRQ_HANDLED     = (1 << 0),
>> +};
>> +
>> +typedef enum irqreturn irqreturn_t;
>> +
>> +/* Device logger functions */
>> +#define dev_print(dev, lvl, fmt, ...)
>> \
>> +        printk(lvl "ipmmu: %s: " fmt, dt_node_full_name(dev_to_dt(dev)),
>> ## __VA_ARGS__)
>> +
>> +#define dev_dbg(dev, fmt, ...) dev_print(dev, XENLOG_DEBUG, fmt, ##
>> __VA_ARGS__)
>> +#define dev_notice(dev, fmt, ...) dev_print(dev, XENLOG_INFO, fmt, ##
>> __VA_ARGS__)
>> +#define dev_warn(dev, fmt, ...) dev_print(dev, XENLOG_WARNING, fmt, ##
>> __VA_ARGS__)
>> +#define dev_err(dev, fmt, ...) dev_print(dev, XENLOG_ERR, fmt, ##
>> __VA_ARGS__)
>> +
>> +#define dev_err_ratelimited(dev, fmt, ...)
>> \
>> +        dev_print(dev, XENLOG_ERR, fmt, ## __VA_ARGS__)
>> +
>> +#define dev_name(dev) dt_node_full_name(dev_to_dt(dev))
>> +
>> +/* Alias to Xen allocation helpers */
>> +#define kfree xfree
>> +#define kmalloc(size, flags)           _xmalloc(size, sizeof(void *))
>> +#define kzalloc(size, flags)           _xzalloc(size, sizeof(void *))
>> +#define devm_kzalloc(dev, size, flags) _xzalloc(size, sizeof(void *))
>> +#define kmalloc_array(size, n, flags)  _xmalloc_array(size, sizeof(void
>> *), n)
>> +#define kcalloc(size, n, flags)                _xzalloc_array(size,
>> sizeof(void *), n)
>> +
>> +static void __iomem *devm_ioremap_resource(struct device *dev,
>> +                                          struct resource *res)
>> +{
>> +       void __iomem *ptr;
>> +
>> +       if (!res || res->type != IORESOURCE_MEM) {
>> +               dev_err(dev, "Invalid resource\n");
>> +               return ERR_PTR(-EINVAL);
>> +       }
>> +
>> +       ptr = ioremap_nocache(res->addr, res->size);
>> +       if (!ptr) {
>> +               dev_err(dev,
>> +                       "ioremap failed (addr 0x%"PRIx64" size
>> 0x%"PRIx64")\n",
>> +                       res->addr, res->size);
>> +               return ERR_PTR(-ENOMEM);
>> +       }
>> +
>> +       return ptr;
>> +}
>> +
>> +/* Xen doesn't handle IOMMU fault */
>> +#define report_iommu_fault(...)        1
>> +
>> +#define MODULE_DEVICE_TABLE(type, name)
>> +#define module_param_named(name, value, type, perm)
>> +#define MODULE_PARM_DESC(_parm, desc)
>> +
>> +/* Xen: Dummy iommu_domain */
>> +struct iommu_domain
>> +{
>> +       atomic_t ref;
>> +       /* Used to link iommu_domain contexts for the same domain.
>> +        * There is at least one per IPMMU used by the domain.
>> +        */
>> +       struct list_head                list;
>> +};
>> +
>> +/* Xen: Describes information required for a Xen domain */
>> +struct ipmmu_vmsa_xen_domain {
>> +       spinlock_t                      lock;
>> +       /* List of context (i.e iommu_domain) associated to this domain */
>> +       struct list_head                contexts;
>> +       struct iommu_domain             *base_context;
>> +};
>> +
>> +/*
>> + * Xen: Information about each device stored in dev->archdata.iommu
>> + *
>> + * On Linux the dev->archdata.iommu only stores the arch specific information,
>> + * but, on Xen, we also have to store the iommu domain.
>> + */
>> +struct ipmmu_vmsa_xen_device {
>> +       struct iommu_domain *domain;
>> +       struct ipmmu_vmsa_archdata *archdata;
>> +};
>> +
>> +#define dev_iommu(dev) ((struct ipmmu_vmsa_xen_device *)dev->archdata.iommu)
>> +#define dev_iommu_domain(dev) (dev_iommu(dev)->domain)
>> +
>> +/***** Start of Linux IPMMU code *****/
>> +
>>  #define IPMMU_CTX_MAX 8
>>
>>  struct ipmmu_features {
>> @@ -64,7 +245,9 @@ struct ipmmu_vmsa_device {
>>         struct hw_register *reg_backup[IPMMU_CTX_MAX];
>>  #endif
>>
>> +#if 0 /* Xen: Not needed */
>>         struct dma_iommu_mapping *mapping;
>> +#endif
>>  };
>>
>>  struct ipmmu_vmsa_domain {
>> @@ -77,6 +260,9 @@ struct ipmmu_vmsa_domain {
>>
>>         unsigned int context_id;
>>         spinlock_t lock;                        /* Protects mappings */
>> +
>> +       /* Xen: Domain associated to this configuration */
>> +       struct domain *d;
>>  };
>>
>>  struct ipmmu_vmsa_archdata {
>> @@ -94,14 +280,20 @@ struct ipmmu_vmsa_archdata {
>>  static DEFINE_SPINLOCK(ipmmu_devices_lock);
>>  static LIST_HEAD(ipmmu_devices);
>>
>> +#if 0 /* Xen: Not needed */
>>  static DEFINE_SPINLOCK(ipmmu_slave_devices_lock);
>>  static LIST_HEAD(ipmmu_slave_devices);
>> +#endif
>>
>>  static struct ipmmu_vmsa_domain *to_vmsa_domain(struct iommu_domain *dom)
>>  {
>>         return container_of(dom, struct ipmmu_vmsa_domain, io_domain);
>>  }
>>
>> +/*
>> + * Xen: Rewrite Linux helpers to manipulate with archdata on Xen.
>> + */
>> +#if 0
>>  #if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
>>  static struct ipmmu_vmsa_archdata *to_archdata(struct device *dev)
>>  {
>> @@ -120,6 +312,16 @@ static void set_archdata(struct device *dev, struct
>> ipmmu_vmsa_archdata *p)
>>  {
>>  }
>>  #endif
>> +#else
>> +static struct ipmmu_vmsa_archdata *to_archdata(struct device *dev)
>> +{
>> +       return dev_iommu(dev)->archdata;
>> +}
>> +static void set_archdata(struct device *dev, struct ipmmu_vmsa_archdata
>> *p)
>> +{
>> +       dev_iommu(dev)->archdata = p;
>> +}
>> +#endif
>>
>>  #define TLB_LOOP_TIMEOUT               100     /* 100us */
>>
>> @@ -355,6 +557,10 @@ static struct hw_register
>> *root_pgtable[IPMMU_CTX_MAX] = {
>>
>>  static bool ipmmu_is_root(struct ipmmu_vmsa_device *mmu)
>>  {
>> +       /* Xen: Fix */
>
>
> Hmmm. Can we get a bit more details?

There is a case when ipmmu_is_root is called with "mmu" being NULL.
https://github.com/otyshchenko1/xen/blob/fc231a0f2edb3d01d178fb5c27dd6c1065807c81/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L2330

In ipmmu_vmsa_alloc_page_table() we need to find the "root mmu", but we
don't have an argument to pass.
So, I had two options:

1. Add code searching for it.
...
spin_lock(&ipmmu_devices_lock);
list_for_each_entry(mmu, &ipmmu_devices, list) {
   if (ipmmu_is_root(mmu))
      break;
}
spin_unlock(&ipmmu_devices_lock);
...

2. Use the existing ipmmu_find_root(), adding this check for a valid value.
So, if we call ipmmu_find_root() with a NULL argument, it will actually
search the list.

I decided to go with option 2.
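
With that check in place the helper behaves roughly like this (just a
sketch, simplified, not a verbatim copy of the driver code):

static struct ipmmu_vmsa_device *ipmmu_find_root(struct ipmmu_vmsa_device *leaf)
{
        struct ipmmu_vmsa_device *mmu = NULL;

        /* With the NULL check in ipmmu_is_root() this is false for
         * leaf == NULL, so we fall through to searching the list. */
        if (ipmmu_is_root(leaf))
                return leaf;

        spin_lock(&ipmmu_devices_lock);

        list_for_each_entry(mmu, &ipmmu_devices, list) {
                if (ipmmu_is_root(mmu))
                        break;
        }

        spin_unlock(&ipmmu_devices_lock);

        return mmu;
}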

>
>> +       if (!mmu)
>> +               return false;
>> +
>>         if (mmu->features->has_cache_leaf_nodes)
>>                 return mmu->is_leaf ? false : true;
>>         else
>> @@ -405,14 +611,28 @@ static void ipmmu_ctx_write(struct ipmmu_vmsa_domain
>> *domain, unsigned int reg,
>>         ipmmu_write(domain->root, domain->context_id * IM_CTX_SIZE + reg,
>> data);
>>  }
>>
>> -static void ipmmu_ctx_write2(struct ipmmu_vmsa_domain *domain, unsigned
>> int reg,
>> +/* Xen: Write the context for cache IPMMU only. */
>
>
> Same here. Why does it need to be different with Xen?

Well, let me elaborate a bit more about this.

I feel that I need to explain the IPMMU itself in a few words.
Generally speaking, the IPMMU hardware (R-Car Gen3) has 8 context banks
and consists of the following parts:
- a root IPMMU
- a number of cache IPMMUs

Each cache IPMMU is connected to the root IPMMU and has uTLB ports the
master devices can be tied to.
Something like this:

master device1 ---> cache IPMMU1 [8 ctx] ---> root IPMMU [8 ctx] ---> memory
                          ^                         ^
master device2 -----------+                         |
                                                    |
master device3 ---> cache IPMMU2 [8 ctx] -----------+

Each context bank has registers.
Some registers exist for both root IPMMU and cache IPMMUs -> IMCTR
Some registers exist only for root IPMMU -> IMTTLBRx/IMTTUBRx, IMMAIR0, etc

So, the original driver has two helpers:
1. ipmmu_ctx_write() - writes a register in context bank N* for the root
IPMMU only.
2. ipmmu_ctx_write2() - writes a register in context bank N for both the
root IPMMU and the cache IPMMU.
* where N = 0-7

AFAIU, the original Linux driver provides each IOMMU domain with a
separate IPMMU context:
master device1 + master device2 are in IOMMU domain1 and use IPMMU context 0
master device3 is in IOMMU domain2 and uses IPMMU context 1

So, when attaching a device to a new IOMMU domain in Linux, we have to
initialize the context for the root IPMMU and enable the context (IMCTR
register) for both the root and cache IPMMUs.
https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas-bsp.git/tree/drivers/iommu/ipmmu-vmsa.c?h=v4.9/rcar-3.5.3#n620

In Xen we need an additional helper, "ipmmu_ctx_write1", for writing a
register in context bank N for the cache IPMMU only.
The reason is that we need a way to control a cache IPMMU separately,
since our model is a little different.

All IOMMU domains within a single Xen domain (dom_iommu(d)->arch.priv)
use the same IPMMU context N, which was initialized and enabled at
domain creation time. This means that all master devices assigned to
the guest domain "d" use only this IPMMU context N, which actually
contains the P2M mapping for domain "d":
master device1 + master device2 are in IOMMU domain1 and use IPMMU context 0
master device3 is in IOMMU domain2 and also uses IPMMU context 0

So, when attaching a device to a new IOMMU domain in Xen, we don't have
to initialize and enable the context, because that has already been done
at domain initialization time:
https://github.com/otyshchenko1/xen/blob/ipmmu_v2/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L2380
We just have to enable the context for the corresponding cache IPMMU only:
https://github.com/otyshchenko1/xen/blob/ipmmu_v2/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L1083

This is the main difference between the Linux and Xen drivers.

So, as you can see, there is a need to manipulate context registers for
a cache IPMMU without touching the root IPMMU; that's why I added this
helper.
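
To make it concrete, the two writes end up looking like this (just a
sketch; both lines are taken from the patch below):

/* At domain creation time (ipmmu_domain_init_context()), root IPMMU only: */
ipmmu_ctx_write(domain, IMCTR, IMCTR_INTEN | IMCTR_FLUSH | IMCTR_MMUEN);

/* At device attach time, only the cache IPMMU the master is wired to: */
ipmmu_ctx_write1(domain, IMCTR, ipmmu_ctx_read(domain, IMCTR) | IMCTR_FLUSH);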

Does this make sense?

>
>
>> +static void ipmmu_ctx_write1(struct ipmmu_vmsa_domain *domain, unsigned
>> int reg,
>>                              u32 data)
>>  {
>>         if (domain->mmu != domain->root)
>> -               ipmmu_write(domain->mmu,
>> -                           domain->context_id * IM_CTX_SIZE + reg, data);
>> +               ipmmu_write(domain->mmu, domain->context_id * IM_CTX_SIZE
>> + reg, data);
>> +}
>>
>> -       ipmmu_write(domain->root, domain->context_id * IM_CTX_SIZE + reg,
>> data);
>> +/*
>> + * Xen: Write the context for both root IPMMU and all cache IPMMUs
>> + * that assigned to this Xen domain.
>> + */
>> +static void ipmmu_ctx_write2(struct ipmmu_vmsa_domain *domain, unsigned
>> int reg,
>> +                            u32 data)
>> +{
>> +       struct ipmmu_vmsa_xen_domain *xen_domain =
>> dom_iommu(domain->d)->arch.priv;
>> +       struct iommu_domain *io_domain;
>> +
>> +       list_for_each_entry(io_domain, &xen_domain->contexts, list)
>> +               ipmmu_ctx_write1(to_vmsa_domain(io_domain), reg, data);
>> +
>> +       ipmmu_ctx_write(domain, reg, data);
>>  }
>>
>>  /*
>> -----------------------------------------------------------------------------
>> @@ -488,6 +708,10 @@ static void ipmmu_tlb_flush_all(void *cookie)
>>  {
>>         struct ipmmu_vmsa_domain *domain = cookie;
>>
>> +       /* Xen: Just return if context_id has non-existent value */
>
>
> Same here.

I think there is a possible race.
In ipmmu_domain_init_context() we try to allocate a context, and if the
allocation fails we call free_io_pgtable_ops(), but "domain->context_id"
hasn't been initialized yet (it is likely 0).
https://github.com/otyshchenko1/xen/blob/fc231a0f2edb3d01d178fb5c27dd6c1065807c81/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L799

And with the following call stack:
free_io_pgtable_ops() -> io_pgtable_tlb_flush_all() ->
ipmmu_tlb_flush_all() -> ipmmu_tlb_invalidate()
we will get a spurious TLB flush for a context pointed to by the
uninitialized "domain->context_id".

That's why I initialize context_id with a non-existent value before
allocating the context
https://github.com/otyshchenko1/xen/blob/fc231a0f2edb3d01d178fb5c27dd6c1065807c81/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L792
and check it for a valid value here
https://github.com/otyshchenko1/xen/blob/fc231a0f2edb3d01d178fb5c27dd6c1065807c81/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L712
and everywhere else it needs to be checked.
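
In other words, the problematic error path looks roughly like this (just
a sketch of ipmmu_domain_init_context(), simplified and with approximate
names, not the exact code):

domain->iop = alloc_io_pgtable_ops(ARM_32_LPAE_S1, &domain->cfg, domain);
if (!domain->iop)
        return -EINVAL;

/* Xen: mark the context as invalid before trying to allocate one */
domain->context_id = domain->root->num_ctx;

ret = ipmmu_domain_allocate_context(domain->root, domain);
if (ret < 0) {
        /*
         * free_io_pgtable_ops() ends up in ipmmu_tlb_flush_all(); without
         * the context_id check it would flush whatever context the
         * uninitialized context_id happens to point to.
         */
        free_io_pgtable_ops(domain->iop);
        return ret;
}

domain->context_id = ret;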

>
>> +       if (domain->context_id >= domain->root->num_ctx)
>> +               return;
>> +
>>         ipmmu_tlb_invalidate(domain);
>>  }
>>
>> @@ -549,8 +773,10 @@ static int ipmmu_domain_init_context(struct
>> ipmmu_vmsa_domain *domain)
>>         domain->cfg.ias = 32;
>>         domain->cfg.oas = 40;
>>         domain->cfg.tlb = &ipmmu_gather_ops;
>> +#if 0 /* Xen: Not needed */
>>         domain->io_domain.geometry.aperture_end = DMA_BIT_MASK(32);
>>         domain->io_domain.geometry.force_aperture = true;
>> +#endif
>>         /*
>>          * TODO: Add support for coherent walk through CCI with DVM and
>> remove
>>          * cache handling. For now, delegate it to the io-pgtable code.
>> @@ -562,6 +788,9 @@ static int ipmmu_domain_init_context(struct
>> ipmmu_vmsa_domain *domain)
>>         if (!domain->iop)
>>                 return -EINVAL;
>>
>> +       /* Xen: Initialize context_id with non-existent value */
>> +       domain->context_id = domain->root->num_ctx;
>
>
> Why do you need to do that for Xen? Overall I think you need a bit more
> explanation of why you need those changes for Xen compare to the Linux
> driver.
I have just explained above why this change is needed: to avoid a possible race.
If the explanation I gave above sounds reasonable, I can put a comment in the code.

>
>> +
>>         /*
>>          * Find an unused context.
>>          */
>> @@ -578,6 +807,11 @@ static int ipmmu_domain_init_context(struct
>> ipmmu_vmsa_domain *domain)
>>
>>         /* TTBR0 */
>>         ttbr = domain->cfg.arm_lpae_s1_cfg.ttbr[0];
>> +
>> +       /* Xen: */
>> +       dev_notice(domain->root->dev, "d%d: Set IPMMU context %u (pgd
>> 0x%"PRIx64")\n",
>> +                       domain->d->domain_id, domain->context_id, ttbr);
>
>
> If you want to keep driver close to Linux, then you need to avoid unecessary
> change.
Shall I drop it?

>
>
>> +
>>         ipmmu_ctx_write(domain, IMTTLBR0, ttbr);
>>         ipmmu_ctx_write(domain, IMTTUBR0, ttbr >> 32);
>>
>> @@ -616,8 +850,9 @@ static int ipmmu_domain_init_context(struct
>> ipmmu_vmsa_domain *domain)
>>          * translation table format doesn't use TEX remapping. Don't
>> enable AF
>>          * software management as we have no use for it. Flush the TLB as
>>          * required when modifying the context registers.
>> +        * Xen: Enable the context for the root IPMMU only.
>>          */
>> -       ipmmu_ctx_write2(domain, IMCTR,
>> +       ipmmu_ctx_write(domain, IMCTR,
>>                          IMCTR_INTEN | IMCTR_FLUSH | IMCTR_MMUEN);
>>
>>         return 0;
>> @@ -638,13 +873,18 @@ static void ipmmu_domain_free_context(struct
>> ipmmu_vmsa_device *mmu,
>>
>>  static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain
>> *domain)
>>  {
>> +       /* Xen: Just return if context_id has non-existent value */
>> +       if (domain->context_id >= domain->root->num_ctx)
>> +               return;
>> +
>>         /*
>>          * Disable the context. Flush the TLB as required when modifying
>> the
>>          * context registers.
>>          *
>>          * TODO: Is TLB flush really needed ?
>> +        * Xen: Disable the context for the root IPMMU only.
>>          */
>> -       ipmmu_ctx_write2(domain, IMCTR, IMCTR_FLUSH);
>> +       ipmmu_ctx_write(domain, IMCTR, IMCTR_FLUSH);
>>         ipmmu_tlb_sync(domain);
>>
>>  #ifdef CONFIG_RCAR_DDR_BACKUP
>> @@ -652,12 +892,16 @@ static void ipmmu_domain_destroy_context(struct
>> ipmmu_vmsa_domain *domain)
>>  #endif
>>
>>         ipmmu_domain_free_context(domain->root, domain->context_id);
>> +
>> +       /* Xen: Initialize context_id with non-existent value */
>> +       domain->context_id = domain->root->num_ctx;
>>  }
>>
>>  /*
>> -----------------------------------------------------------------------------
>>   * Fault Handling
>>   */
>>
>> +/* Xen: Show domain_id in every printk */
>>  static irqreturn_t ipmmu_domain_irq(struct ipmmu_vmsa_domain *domain)
>>  {
>>         const u32 err_mask = IMSTR_MHIT | IMSTR_ABORT | IMSTR_PF |
>> IMSTR_TF;
>> @@ -681,11 +925,11 @@ static irqreturn_t ipmmu_domain_irq(struct
>> ipmmu_vmsa_domain *domain)
>>
>>         /* Log fatal errors. */
>>         if (status & IMSTR_MHIT)
>> -               dev_err_ratelimited(mmu->dev, "Multiple TLB hits
>> @0x%08x\n",
>> -                                   iova);
>> +               dev_err_ratelimited(mmu->dev, "d%d: Multiple TLB hits
>> @0x%08x\n",
>> +                               domain->d->domain_id, iova);
>>         if (status & IMSTR_ABORT)
>> -               dev_err_ratelimited(mmu->dev, "Page Table Walk Abort
>> @0x%08x\n",
>> -                                   iova);
>> +               dev_err_ratelimited(mmu->dev, "d%d: Page Table Walk Abort
>> @0x%08x\n",
>> +                               domain->d->domain_id, iova);
>>
>>         if (!(status & (IMSTR_PF | IMSTR_TF)))
>>                 return IRQ_NONE;
>> @@ -700,8 +944,8 @@ static irqreturn_t ipmmu_domain_irq(struct
>> ipmmu_vmsa_domain *domain)
>>                 return IRQ_HANDLED;
>>
>>         dev_err_ratelimited(mmu->dev,
>> -                           "Unhandled fault: status 0x%08x iova
>> 0x%08x\n",
>> -                           status, iova);
>> +                       "d%d: Unhandled fault: status 0x%08x iova
>> 0x%08x\n",
>> +                       domain->d->domain_id, status, iova);
>>
>>         return IRQ_HANDLED;
>>  }
>> @@ -730,6 +974,16 @@ static irqreturn_t ipmmu_irq(int irq, void *dev)
>>         return status;
>>  }
>>
>> +/* Xen: Interrupt handlers wrapper */
>> +static void ipmmu_irq_xen(int irq, void *dev,
>> +                                     struct cpu_user_regs *regs)
>> +{
>> +       ipmmu_irq(irq, dev);
>> +}
>> +
>> +#define ipmmu_irq ipmmu_irq_xen
>> +
>> +#if 0 /* Xen: Not needed */
>>  /*
>> -----------------------------------------------------------------------------
>>   * IOMMU Operations
>>   */
>> @@ -759,6 +1013,7 @@ static void ipmmu_domain_free(struct iommu_domain
>> *io_domain)
>>         free_io_pgtable_ops(domain->iop);
>>         kfree(domain);
>>  }
>> +#endif
>>
>>  static int ipmmu_attach_device(struct iommu_domain *io_domain,
>>                                struct device *dev)
>> @@ -787,7 +1042,20 @@ static int ipmmu_attach_device(struct iommu_domain
>> *io_domain,
>>                 /* The domain hasn't been used yet, initialize it. */
>>                 domain->mmu = mmu;
>>                 domain->root = root;
>> +
>> +/*
>> + * Xen: We have already initialized and enabled context for root IPMMU
>> + * for this Xen domain. Enable context for given cache IPMMU only.
>> + * Flush the TLB as required when modifying the context registers.
>
>
> Why?

The original Linux driver provides each IOMMU domain with a separate IPMMU context.
So, when attaching a device to an IOMMU domain which hasn't been
initialized yet, we have to call ipmmu_domain_init_context() to
initialize (root only) and enable (root + cache *) the context for this
IOMMU domain.

* You can see at the end of the "original" ipmmu_domain_init_context()
implementation, that context is enabled for both cache and root IPMMUs
because of "ipmmu_ctx_write2".
https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas-bsp.git/tree/drivers/iommu/ipmmu-vmsa.c?h=v4.9/rcar-3.5.3#n620

From my point of view, we don't have to do the same when attaching a
device in Xen, as we keep only one IPMMU context (holding the P2M
mappings) per Xen domain, used by all devices assigned to that guest.
What is more, the number of context banks is limited (8), and if we
followed the Linux way here we would quickly run out of available
contexts.
But having one IPMMU context per Xen domain allows us to pass devices
through to up to 8 guest domains.

Taking into account what is described above, we initialize (root only)
and enable (root only **) the context at domain creation time if the
IOMMU is expected to be used for this guest.
https://github.com/otyshchenko1/xen/blob/ipmmu_v2/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L2380

** You can see at the end of the "modified" ipmmu_domain_init_context()
implementation that the context is enabled for the root IPMMU only
because of "ipmmu_ctx_write".
https://github.com/otyshchenko1/xen/blob/ipmmu_v2/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L882

That's why here, in ipmmu_attach_device(), we don't have to call
ipmmu_domain_init_context() anymore, because the context has already
been initialized and enabled. All we need to do here is enable this
context for the cache IPMMU the device is physically connected to.

Does this make sense?

>
>
>> + */
>> +#if 0
>>                 ret = ipmmu_domain_init_context(domain);
>> +#endif
>> +               ipmmu_ctx_write1(domain, IMCTR,
>> +                               ipmmu_ctx_read(domain, IMCTR) |
>> IMCTR_FLUSH);
>> +
>> +               dev_info(dev, "Using IPMMU context %u\n",
>> domain->context_id);
>> +#if 0 /* Xen: Not needed */
>>                 if (ret < 0) {
>>                         dev_err(dev, "Unable to initialize IPMMU
>> context\n");
>>                         domain->mmu = NULL;
>> @@ -795,6 +1063,7 @@ static int ipmmu_attach_device(struct iommu_domain
>> *io_domain,
>>                         dev_info(dev, "Using IPMMU context %u\n",
>>                                  domain->context_id);
>>                 }
>> +#endif
>>         } else if (domain->mmu != mmu) {
>>                 /*
>>                  * Something is wrong, we can't attach two devices using
>> @@ -834,6 +1103,14 @@ static void ipmmu_detach_device(struct iommu_domain
>> *io_domain,
>>          */
>>  }
>>
>> +/*
>> + * Xen: The current implementation of these callbacks is insufficient for
>> us
>> + * since they are intended to be called from Linux IOMMU core that
>> + * has already done all required actions such as doing various checks,
>> + * splitting into memory block the hardware supports and so on.
>
>
> Can you expand it here? Why can't our IOMMU framework could do that?

If we add all the required support to the IOMMU framework and modify all
existing IOMMU drivers to follow it, then yes, it will keep IOMMU drivers
such as IPMMU-VMSA from having this stuff in them.

To be honest, I was trying to touch the IOMMU common code and other
IOMMU drivers as little as possible, but I had to introduce a few
changes ("non-shared IOMMU").
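
To give an idea of the kind of work the Linux IOMMU core normally does
for the driver, here is a rough sketch (not the actual code from the
series; get_supported_pgsize() is a made-up placeholder for the block
size selection):

/* Split the requested range into blocks the page-table code accepts and
 * call the Linux-style ->map() for each of them. */
static int xen_map_range(struct io_pgtable_ops *ops, unsigned long iova,
                         paddr_t paddr, size_t size, int prot)
{
        int ret = 0;

        while (size) {
                /* Largest block allowed by alignment and remaining size
                 * (made-up helper). */
                size_t pgsize = get_supported_pgsize(iova, paddr, size);

                ret = ops->map(ops, iova, paddr, pgsize, prot);
                if (ret)
                        break;

                iova += pgsize;
                paddr += pgsize;
                size -= pgsize;
        }

        return ret;
}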

>
> IHMO, if we want to get driver from Linux, we need to get an interface very
> close to it. Otherwise it is not worth it because you would have to
> implement for each IOMMU.
You are right.

>
> My overall feeling at the moment is Xen is not ready to welcome this driver
> directly from Linux. This is also a BSP driver, so no thorough review done
> by the community.

As I said in the cover letter, the BSP driver has more complete support
than the mainline one.
I would like to clarify what needs to be done from my side.
Should I wait for the missing things to reach upstream and then rebase
on the mainline driver?
Or should I rewrite this driver without following Linux?

>
> I have been told the BSP driver was in pretty bad state, so I think we
> really need to weight pros and cons of using it.

I am afraid I didn't get the first part of that sentence.

>
> Cheers,
>
> --
> Julien Grall

-- 
Regards,

Oleksandr Tyshchenko

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v1 2/7] iommu/arm: ipmmu-vmsa: Add Xen changes for main driver
  2017-08-10 14:27     ` Oleksandr Tyshchenko
@ 2017-08-10 15:13       ` Julien Grall
  2017-08-21 15:53         ` Oleksandr Tyshchenko
  0 siblings, 1 reply; 20+ messages in thread
From: Julien Grall @ 2017-08-10 15:13 UTC (permalink / raw)
  To: Oleksandr Tyshchenko; +Cc: xen-devel, Stefano Stabellini, Oleksandr Tyshchenko

Hi,

On 10/08/17 15:27, Oleksandr Tyshchenko wrote:
> On Tue, Aug 8, 2017 at 2:34 PM, Julien Grall <julien.grall@arm.com> wrote:
>> On 26/07/17 16:09, Oleksandr Tyshchenko wrote:
>>> @@ -355,6 +557,10 @@ static struct hw_register
>>> *root_pgtable[IPMMU_CTX_MAX] = {
>>>
>>>  static bool ipmmu_is_root(struct ipmmu_vmsa_device *mmu)
>>>  {
>>> +       /* Xen: Fix */
>>
>>
>> Hmmm. Can we get a bit more details?
>
> These is a case when ipmmu_is_root is called with "mmu" being NULL.
> https://github.com/otyshchenko1/xen/blob/fc231a0f2edb3d01d178fb5c27dd6c1065807c81/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L2330
>
> In ipmmu_vmsa_alloc_page_table() we need to find "root mmu", but we
> doesn't have argument to pass.
> So, I had two options:
>
> 1. Add code searching for it.
> ...
> spin_lock(&ipmmu_devices_lock);
> list_for_each_entry(mmu, &ipmmu_devices, list) {
>    if (ipmmu_is_root(mmu))
>       break;
> }
> spin_unlock(&ipmmu_devices_lock);
> ...
>
> 2. Use existing ipmmu_find_root() with adding this check for a valid value.
> So, if we call ipmmu_find_root() with argument being NULL we will
> actually get searching the list.
>
> I decided to use 2 option.

Can you please expand the comment then?

>
>>
>>> +       if (!mmu)
>>> +               return false;
>>> +
>>>         if (mmu->features->has_cache_leaf_nodes)
>>>                 return mmu->is_leaf ? false : true;
>>>         else
>>> @@ -405,14 +611,28 @@ static void ipmmu_ctx_write(struct ipmmu_vmsa_domain
>>> *domain, unsigned int reg,
>>>         ipmmu_write(domain->root, domain->context_id * IM_CTX_SIZE + reg,
>>> data);
>>>  }
>>>
>>> -static void ipmmu_ctx_write2(struct ipmmu_vmsa_domain *domain, unsigned
>>> int reg,
>>> +/* Xen: Write the context for cache IPMMU only. */
>>
>>
>> Same here. Why does it need to be different with Xen?
>
> Well, let me elaborate a bit more about this.
>
> I feel that I need to explain in a few words about IPMMU itself:
> Generally speaking,
> The IPMMU hardware (R-Car Gen3) has 8 context banks and consists of next parts:
> - root IPMMU
> - a number of cache IPMMUs
>
> Each cache IPMMU is connected to root IPMMU and has uTLB ports the
> master devices can be tied to.
> Something, like this:
>
> master device1 ---> cache IPMMU1 [8 ctx] ---> root IPMMU [8 ctx] -> memory
>                            |                                          |
> master device2 --                                          |
>                                                                       |
> master device3 ---> cache IPMMU2 [8 ctx] --
>
> Each context bank has registers.
> Some registers exist for both root IPMMU and cache IPMMUs -> IMCTR
> Some registers exist only for root IPMMU -> IMTTLBRx/IMTTUBRx, IMMAIR0, etc
>
> So, original driver has two helpers:
> 1. ipmmu_ctx_write() - is intended to write a register in context bank
> N* for root IPMMU only.
> 2. ipmmu_ctx_write2() - is intended to write a register in context
> bank N for both root IPMMU and cache IPMMU.
> *where N=0-7
>
> AFAIU, original Linux driver provides each IOMMU domain with a
> separate IPMMU context:
> master device1 + master device2 are in IOMMU domain1 and use IPMMU context 0
> master device3 is in IOMMU domain2 and uses IPMMU context 1
>
> So, when attaching device to new IOMMU domain in Linux we have to
> initialize context for root IPMMU and enable context (IMCTR register)
> for both root and cache IPMMUs.
> https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas-bsp.git/tree/drivers/iommu/ipmmu-vmsa.c?h=v4.9/rcar-3.5.3#n620
>
> In Xen we need additional helper "ipmmu_ctx_write1" for writing a
> register in context bank N for cache IPMMU only.
> The reason is that we need a way to control cache IPMMU separately
> since we have a little bit another model.
>
> All IOMMU domains within a single Xen domain (dom_iommu(d)->arch.priv)
> use the same IPMMU context N
> which was initialized and enabled at the domain creation time. This
> means that all master devices
> that are assigned to the guest domain "d" use only this IPMMU context
> N which actually contains P2M mapping for domain "d":
> master device1 + master device2 are in IOMMU domain1 and use IPMMU context 0
> master device3 is in IOMMU domain2 and also uses IPMMU context 0
>
> So, when attaching device to new IOMMU domain in Xen we don't have to
> initialize and enable context,
> because it has been already done at domain initialization time:
> https://github.com/otyshchenko1/xen/blob/ipmmu_v2/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L2380
> we just have to enable context for corresponding cache IPMMU only:
> https://github.com/otyshchenko1/xen/blob/ipmmu_v2/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L1083
>
> This is the main difference between drivers in Linux and Xen.
>
> So, as you can see there is a need to manipulate context registers for
> cache IPMMU without touching root IPMMU,
> that's why I added this helper.
>
> Does this make sense?

I think it does.

>
>>
>>
>>> +static void ipmmu_ctx_write1(struct ipmmu_vmsa_domain *domain, unsigned
>>> int reg,
>>>                              u32 data)
>>>  {
>>>         if (domain->mmu != domain->root)
>>> -               ipmmu_write(domain->mmu,
>>> -                           domain->context_id * IM_CTX_SIZE + reg, data);
>>> +               ipmmu_write(domain->mmu, domain->context_id * IM_CTX_SIZE
>>> + reg, data);
>>> +}
>>>
>>> -       ipmmu_write(domain->root, domain->context_id * IM_CTX_SIZE + reg,
>>> data);
>>> +/*
>>> + * Xen: Write the context for both root IPMMU and all cache IPMMUs
>>> + * that assigned to this Xen domain.
>>> + */
>>> +static void ipmmu_ctx_write2(struct ipmmu_vmsa_domain *domain, unsigned
>>> int reg,
>>> +                            u32 data)
>>> +{
>>> +       struct ipmmu_vmsa_xen_domain *xen_domain =
>>> dom_iommu(domain->d)->arch.priv;
>>> +       struct iommu_domain *io_domain;
>>> +
>>> +       list_for_each_entry(io_domain, &xen_domain->contexts, list)
>>> +               ipmmu_ctx_write1(to_vmsa_domain(io_domain), reg, data);
>>> +
>>> +       ipmmu_ctx_write(domain, reg, data);
>>>  }
>>>
>>>  /*
>>> -----------------------------------------------------------------------------
>>> @@ -488,6 +708,10 @@ static void ipmmu_tlb_flush_all(void *cookie)
>>>  {
>>>         struct ipmmu_vmsa_domain *domain = cookie;
>>>
>>> +       /* Xen: Just return if context_id has non-existent value */
>>
>>
>> Same here.
>
> I think that there is a possible race.
> In ipmmu_domain_init_context() we are trying to allocate context and
> if allocation fails we will call free_io_pgtable_ops(),
> but "domain->context_id" hasn't been initialized yet (likely it is 0).
> https://github.com/otyshchenko1/xen/blob/fc231a0f2edb3d01d178fb5c27dd6c1065807c81/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L799
>
> And having following call stack:
> free_io_pgtable_ops() -> io_pgtable_tlb_flush_all() ->
> ipmmu_tlb_flush_all() -> ipmmu_tlb_invalidate()
> we will get a mistaken cache flush for a context pointed by
> uninitialized "domain->context_id".
>
> That's why I initialized context_id with non-existent value before
> allocating context
> https://github.com/otyshchenko1/xen/blob/fc231a0f2edb3d01d178fb5c27dd6c1065807c81/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L792
> and checked it for a valid value here
> https://github.com/otyshchenko1/xen/blob/fc231a0f2edb3d01d178fb5c27dd6c1065807c81/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L712
> and everywhere it is need to checked.

Is the race in the code you added or in the one from Linux? If the latter, then 
you should have an action to fix it there. If the former, then I'd like 
to understand how we introduced a race compared to Linux.

[...]

>>
>>> +
>>>         /*
>>>          * Find an unused context.
>>>          */
>>> @@ -578,6 +807,11 @@ static int ipmmu_domain_init_context(struct
>>> ipmmu_vmsa_domain *domain)
>>>
>>>         /* TTBR0 */
>>>         ttbr = domain->cfg.arm_lpae_s1_cfg.ttbr[0];
>>> +
>>> +       /* Xen: */
>>> +       dev_notice(domain->root->dev, "d%d: Set IPMMU context %u (pgd
>>> 0x%"PRIx64")\n",
>>> +                       domain->d->domain_id, domain->context_id, ttbr);
>>
>>
>> If you want to keep driver close to Linux, then you need to avoid unecessary
>> change.
> Shall I drop it?

Depends. How useful is it? If it is, then maybe you want to upstream it?

[...]

>>>  static int ipmmu_attach_device(struct iommu_domain *io_domain,
>>>                                struct device *dev)
>>> @@ -787,7 +1042,20 @@ static int ipmmu_attach_device(struct iommu_domain
>>> *io_domain,
>>>                 /* The domain hasn't been used yet, initialize it. */
>>>                 domain->mmu = mmu;
>>>                 domain->root = root;
>>> +
>>> +/*
>>> + * Xen: We have already initialized and enabled context for root IPMMU
>>> + * for this Xen domain. Enable context for given cache IPMMU only.
>>> + * Flush the TLB as required when modifying the context registers.
>>
>>
>> Why?
>
> Original Linux driver provides each IOMMU domain with a separate IPMMU context.
> So, when attaching device to IOMMU domain which hasn't been
> initialized yet we have to
> call ipmmu_domain_init_context() for initializing (root only) and
> enabling (root + cache * ) context for this IOMMU domain.
>
> * You can see at the end of the "original" ipmmu_domain_init_context()
> implementation, that context is enabled for both cache and root IPMMUs
> because of "ipmmu_ctx_write2".
> https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas-bsp.git/tree/drivers/iommu/ipmmu-vmsa.c?h=v4.9/rcar-3.5.3#n620
>
> From my point of view, we don't have to do the same when we are
> attaching device in Xen, as we keep only one IPMMU context (P2M
> mappings) per Xen domain
> for using by all assigned to this guest devices.
> What is more a number of context banks is limited (8), and if we
> followed Linux way here, we would be quickly run out of available
> contexts.
> But having one IPMMU context per Xen domain allow us to passthrough
> devices to 8 guest domain.

The way you describe it gives the impression that the driver is 
fundamentally different in Xen compared to Linux. Am I right?

>
> Taking into the account described above, we initialize (root only) and
> enable (root only ** ) context at the domain creation time
> if IOMMU is expected to be used for this guest.
> https://github.com/otyshchenko1/xen/blob/ipmmu_v2/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L2380
>
> ** You can see at the end of the "modified"
> ipmmu_domain_init_context() implementation, that context is enabled
> for root IPMMU only
> because of "ipmmu_ctx_write".
> https://github.com/otyshchenko1/xen/blob/ipmmu_v2/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L882
>
> That's why, here, in ipmmu_attach_device() we don't have to call
> ipmmu_domain_init_context() anymore, because
> the context has been already initialized and enabled. All what we need
> here is to enable this context for cache IPMMU the device
> is physically connected to.
>
> Does this make sense?
>
>>
>>
>>> + */
>>> +#if 0
>>>                 ret = ipmmu_domain_init_context(domain);
>>> +#endif
>>> +               ipmmu_ctx_write1(domain, IMCTR,
>>> +                               ipmmu_ctx_read(domain, IMCTR) |
>>> IMCTR_FLUSH);
>>> +
>>> +               dev_info(dev, "Using IPMMU context %u\n",
>>> domain->context_id);
>>> +#if 0 /* Xen: Not needed */
>>>                 if (ret < 0) {
>>>                         dev_err(dev, "Unable to initialize IPMMU
>>> context\n");
>>>                         domain->mmu = NULL;
>>> @@ -795,6 +1063,7 @@ static int ipmmu_attach_device(struct iommu_domain
>>> *io_domain,
>>>                         dev_info(dev, "Using IPMMU context %u\n",
>>>                                  domain->context_id);
>>>                 }
>>> +#endif
>>>         } else if (domain->mmu != mmu) {
>>>                 /*
>>>                  * Something is wrong, we can't attach two devices using
>>> @@ -834,6 +1103,14 @@ static void ipmmu_detach_device(struct iommu_domain
>>> *io_domain,
>>>          */
>>>  }
>>>
>>> +/*
>>> + * Xen: The current implementation of these callbacks is insufficient for
>>> us
>>> + * since they are intended to be called from Linux IOMMU core that
>>> + * has already done all required actions such as doing various checks,
>>> + * splitting into memory block the hardware supports and so on.
>>
>>
>> Can you expand it here? Why can't our IOMMU framework could do that?
>
> If add all required support to IOMMU framework and modify all existing
> IOMMU drivers
> to follow this support, then yes, it will avoid IOMMU drivers such as
> IPMMU-VMSA from having these stuff in.
>
> To be honest, I was trying to touch IOMMU common code and other IOMMU
> drivers as little as possible,
> but I had to introduce a few changes ("non-shared IOMMU").

What I am looking for is something we can easily maintain in the future. If 
it requires changes in the common code then we should do them. If it 
happens to be too complex, then maybe we should not take it from Linux.

>
>>
>> IHMO, if we want to get driver from Linux, we need to get an interface very
>> close to it. Otherwise it is not worth it because you would have to
>> implement for each IOMMU.
> You are right.
>
>>
>> My overall feeling at the moment is Xen is not ready to welcome this driver
>> directly from Linux. This is also a BSP driver, so no thorough review done
>> by the community.
>
> As I said in a cover letter the BSP driver had more complete support
> than the mainline one.

I know. But this means we are going to bring code into Xen that was not 
fully reviewed and whose quality we don't know.

> I would like to clarify what need to be done from my side.
> Should I wait for the missing things reach upsteam and then rebase on
> the mainline driver?
> Or should I rewrite this driver without following Linux?

I don't have a clear answer here. As I said, we need to weigh the pros and 
cons of using the Linux driver over our own.

At the moment, you are using a BSP driver which has more features but is 
modified quite a lot. We don't even know when this is going to be merged 
in Linux.

Keeping the code close to Linux requires some hacks that are acceptable if 
you can benefit from the community (bug fixes, reviews...). As the driver 
is taken from the BSP, we don't know whether the code will stay in its 
current form or be able to get bug fixes.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v1 2/7] iommu/arm: ipmmu-vmsa: Add Xen changes for main driver
  2017-08-10 15:13       ` Julien Grall
@ 2017-08-21 15:53         ` Oleksandr Tyshchenko
  2017-08-23 11:41           ` Julien Grall
  0 siblings, 1 reply; 20+ messages in thread
From: Oleksandr Tyshchenko @ 2017-08-21 15:53 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, Stefano Stabellini, Oleksandr Tyshchenko

Hi, Julien.

Sorry for the late response.

On Thu, Aug 10, 2017 at 6:13 PM, Julien Grall <julien.grall@arm.com> wrote:
> Hi,
>
> On 10/08/17 15:27, Oleksandr Tyshchenko wrote:
>>
>> On Tue, Aug 8, 2017 at 2:34 PM, Julien Grall <julien.grall@arm.com> wrote:
>>>
>>> On 26/07/17 16:09, Oleksandr Tyshchenko wrote:
>>>>
>>>> @@ -355,6 +557,10 @@ static struct hw_register
>>>> *root_pgtable[IPMMU_CTX_MAX] = {
>>>>
>>>>  static bool ipmmu_is_root(struct ipmmu_vmsa_device *mmu)
>>>>  {
>>>> +       /* Xen: Fix */
>>>
>>>
>>>
>>> Hmmm. Can we get a bit more details?
>>
>>
>> These is a case when ipmmu_is_root is called with "mmu" being NULL.
>>
>> https://github.com/otyshchenko1/xen/blob/fc231a0f2edb3d01d178fb5c27dd6c1065807c81/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L2330
>>
>> In ipmmu_vmsa_alloc_page_table() we need to find "root mmu", but we
>> doesn't have argument to pass.
>> So, I had two options:
>>
>> 1. Add code searching for it.
>> ...
>> spin_lock(&ipmmu_devices_lock);
>> list_for_each_entry(mmu, &ipmmu_devices, list) {
>>    if (ipmmu_is_root(mmu))
>>       break;
>> }
>> spin_unlock(&ipmmu_devices_lock);
>> ...
>>
>> 2. Use existing ipmmu_find_root() with adding this check for a valid
>> value.
>> So, if we call ipmmu_find_root() with argument being NULL we will
>> actually get searching the list.
>>
>> I decided to use 2 option.
>
>
> Can you please expand the comment then?
Will do.

>
>
>>
>>>
>>>> +       if (!mmu)
>>>> +               return false;
>>>> +
>>>>         if (mmu->features->has_cache_leaf_nodes)
>>>>                 return mmu->is_leaf ? false : true;
>>>>         else
>>>> @@ -405,14 +611,28 @@ static void ipmmu_ctx_write(struct
>>>> ipmmu_vmsa_domain
>>>> *domain, unsigned int reg,
>>>>         ipmmu_write(domain->root, domain->context_id * IM_CTX_SIZE +
>>>> reg,
>>>> data);
>>>>  }
>>>>
>>>> -static void ipmmu_ctx_write2(struct ipmmu_vmsa_domain *domain, unsigned
>>>> int reg,
>>>> +/* Xen: Write the context for cache IPMMU only. */
>>>
>>>
>>>
>>> Same here. Why does it need to be different with Xen?
>>
>>
>> Well, let me elaborate a bit more about this.
>>
>> I feel that I need to explain in a few words about IPMMU itself:
>> Generally speaking,
>> The IPMMU hardware (R-Car Gen3) has 8 context banks and consists of next
>> parts:
>> - root IPMMU
>> - a number of cache IPMMUs
>>
>> Each cache IPMMU is connected to root IPMMU and has uTLB ports the
>> master devices can be tied to.
>> Something, like this:
>>
>> master device1 ---> cache IPMMU1 [8 ctx] ---> root IPMMU [8 ctx] -> memory
>>                            |                                          |
>> master device2 --                                          |
>>                                                                       |
>> master device3 ---> cache IPMMU2 [8 ctx] --
>>
>> Each context bank has registers.
>> Some registers exist for both root IPMMU and cache IPMMUs -> IMCTR
>> Some registers exist only for root IPMMU -> IMTTLBRx/IMTTUBRx, IMMAIR0,
>> etc
>>
>> So, original driver has two helpers:
>> 1. ipmmu_ctx_write() - is intended to write a register in context bank
>> N* for root IPMMU only.
>> 2. ipmmu_ctx_write2() - is intended to write a register in context
>> bank N for both root IPMMU and cache IPMMU.
>> *where N=0-7
>>
>> AFAIU, original Linux driver provides each IOMMU domain with a
>> separate IPMMU context:
>> master device1 + master device2 are in IOMMU domain1 and use IPMMU context
>> 0
>> master device3 is in IOMMU domain2 and uses IPMMU context 1
>>
>> So, when attaching device to new IOMMU domain in Linux we have to
>> initialize context for root IPMMU and enable context (IMCTR register)
>> for both root and cache IPMMUs.
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas-bsp.git/tree/drivers/iommu/ipmmu-vmsa.c?h=v4.9/rcar-3.5.3#n620
>>
>> In Xen we need additional helper "ipmmu_ctx_write1" for writing a
>> register in context bank N for cache IPMMU only.
>> The reason is that we need a way to control cache IPMMU separately
>> since we have a little bit another model.
>>
>> All IOMMU domains within a single Xen domain (dom_iommu(d)->arch.priv)
>> use the same IPMMU context N
>> which was initialized and enabled at the domain creation time. This
>> means that all master devices
>> that are assigned to the guest domain "d" use only this IPMMU context
>> N which actually contains P2M mapping for domain "d":
>> master device1 + master device2 are in IOMMU domain1 and use IPMMU context
>> 0
>> master device3 is in IOMMU domain2 and also uses IPMMU context 0
>>
>> So, when attaching device to new IOMMU domain in Xen we don't have to
>> initialize and enable context,
>> because it has been already done at domain initialization time:
>>
>> https://github.com/otyshchenko1/xen/blob/ipmmu_v2/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L2380
>> we just have to enable context for corresponding cache IPMMU only:
>>
>> https://github.com/otyshchenko1/xen/blob/ipmmu_v2/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L1083
>>
>> This is the main difference between drivers in Linux and Xen.
>>
>> So, as you can see there is a need to manipulate context registers for
>> cache IPMMU without touching root IPMMU,
>> that's why I added this helper.
>>
>> Does this make sense?
>
>
> I think it does.
good.

>
>
>>
>>>
>>>
>>>> +static void ipmmu_ctx_write1(struct ipmmu_vmsa_domain *domain, unsigned
>>>> int reg,
>>>>                              u32 data)
>>>>  {
>>>>         if (domain->mmu != domain->root)
>>>> -               ipmmu_write(domain->mmu,
>>>> -                           domain->context_id * IM_CTX_SIZE + reg,
>>>> data);
>>>> +               ipmmu_write(domain->mmu, domain->context_id *
>>>> IM_CTX_SIZE
>>>> + reg, data);
>>>> +}
>>>>
>>>> -       ipmmu_write(domain->root, domain->context_id * IM_CTX_SIZE +
>>>> reg,
>>>> data);
>>>> +/*
>>>> + * Xen: Write the context for both root IPMMU and all cache IPMMUs
>>>> + * that assigned to this Xen domain.
>>>> + */
>>>> +static void ipmmu_ctx_write2(struct ipmmu_vmsa_domain *domain, unsigned
>>>> int reg,
>>>> +                            u32 data)
>>>> +{
>>>> +       struct ipmmu_vmsa_xen_domain *xen_domain =
>>>> dom_iommu(domain->d)->arch.priv;
>>>> +       struct iommu_domain *io_domain;
>>>> +
>>>> +       list_for_each_entry(io_domain, &xen_domain->contexts, list)
>>>> +               ipmmu_ctx_write1(to_vmsa_domain(io_domain), reg, data);
>>>> +
>>>> +       ipmmu_ctx_write(domain, reg, data);
>>>>  }
>>>>
>>>>  /*
>>>>
>>>> -----------------------------------------------------------------------------
>>>> @@ -488,6 +708,10 @@ static void ipmmu_tlb_flush_all(void *cookie)
>>>>  {
>>>>         struct ipmmu_vmsa_domain *domain = cookie;
>>>>
>>>> +       /* Xen: Just return if context_id has non-existent value */
>>>
>>>
>>>
>>> Same here.
>>
>>
>> I think that there is a possible race.
>> In ipmmu_domain_init_context() we are trying to allocate context and
>> if allocation fails we will call free_io_pgtable_ops(),
>> but "domain->context_id" hasn't been initialized yet (likely it is 0).
>>
>> https://github.com/otyshchenko1/xen/blob/fc231a0f2edb3d01d178fb5c27dd6c1065807c81/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L799
>>
>> And having following call stack:
>> free_io_pgtable_ops() -> io_pgtable_tlb_flush_all() ->
>> ipmmu_tlb_flush_all() -> ipmmu_tlb_invalidate()
>> we will get a mistaken cache flush for a context pointed by
>> uninitialized "domain->context_id".
>>
>> That's why I initialized context_id with non-existent value before
>> allocating context
>>
>> https://github.com/otyshchenko1/xen/blob/fc231a0f2edb3d01d178fb5c27dd6c1065807c81/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L792
>> and checked it for a valid value here
>>
>> https://github.com/otyshchenko1/xen/blob/fc231a0f2edb3d01d178fb5c27dd6c1065807c81/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L712
>> and everywhere it is need to checked.
>
>
> The race is in the code added or the one from Linux? If the latter, then you
> should have an action to fix it there. If the former, the I'd like to
> understand how come we introduced a race compare to Linux.
I think the latter. I have just sent a patch for it:
https://lists.linuxfoundation.org/pipermail/iommu/2017-August/023857.html

>
> [...]
>
>>>
>>>> +
>>>>         /*
>>>>          * Find an unused context.
>>>>          */
>>>> @@ -578,6 +807,11 @@ static int ipmmu_domain_init_context(struct
>>>> ipmmu_vmsa_domain *domain)
>>>>
>>>>         /* TTBR0 */
>>>>         ttbr = domain->cfg.arm_lpae_s1_cfg.ttbr[0];
>>>> +
>>>> +       /* Xen: */
>>>> +       dev_notice(domain->root->dev, "d%d: Set IPMMU context %u (pgd
>>>> 0x%"PRIx64")\n",
>>>> +                       domain->d->domain_id, domain->context_id, ttbr);
>>>
>>>
>>>
>>> If you want to keep driver close to Linux, then you need to avoid
>>> unecessary
>>> change.
>>
>> Shall I drop it?
>
>
> Depends. How useful is it? If it is, then may you want to upstream that?
Not so useful, but it is better to keep it while the driver is in progress.
However, I can move this print out of ipmmu_domain_init_context().
"Our" ipmmu_vmsa_alloc_page_table() is a good candidate to host it.

>
> [...]
>
>
>>>>  static int ipmmu_attach_device(struct iommu_domain *io_domain,
>>>>                                struct device *dev)
>>>> @@ -787,7 +1042,20 @@ static int ipmmu_attach_device(struct iommu_domain
>>>> *io_domain,
>>>>                 /* The domain hasn't been used yet, initialize it. */
>>>>                 domain->mmu = mmu;
>>>>                 domain->root = root;
>>>> +
>>>> +/*
>>>> + * Xen: We have already initialized and enabled context for root IPMMU
>>>> + * for this Xen domain. Enable context for given cache IPMMU only.
>>>> + * Flush the TLB as required when modifying the context registers.
>>>
>>>
>>>
>>> Why?
>>
>>
>> Original Linux driver provides each IOMMU domain with a separate IPMMU
>> context.
>> So, when attaching device to IOMMU domain which hasn't been
>> initialized yet we have to
>> call ipmmu_domain_init_context() for initializing (root only) and
>> enabling (root + cache * ) context for this IOMMU domain.
>>
>> * You can see at the end of the "original" ipmmu_domain_init_context()
>> implementation, that context is enabled for both cache and root IPMMUs
>> because of "ipmmu_ctx_write2".
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas-bsp.git/tree/drivers/iommu/ipmmu-vmsa.c?h=v4.9/rcar-3.5.3#n620
>>
>> From my point of view, we don't have to do the same when we are
>> attaching device in Xen, as we keep only one IPMMU context (P2M
>> mappings) per Xen domain
>> for using by all assigned to this guest devices.
>> What is more a number of context banks is limited (8), and if we
>> followed Linux way here, we would be quickly run out of available
>> contexts.
>> But having one IPMMU context per Xen domain allow us to passthrough
>> devices to 8 guest domain.
>
>
> The way you describe it give an impression that the driver is fundamentally
> different in Xen compare to Linux. Am I right?
It is hard to say whether "fundamentally different" is the right way to put it.

The drivers differ mostly in context assignment.
Also, the Xen driver has "VMSAv8-64 mode" enabled and an asynchronous
page table deallocation sequence.

So, probably, yes.

>
>
>>
>> Taking into the account described above, we initialize (root only) and
>> enable (root only ** ) context at the domain creation time
>> if IOMMU is expected to be used for this guest.
>>
>> https://github.com/otyshchenko1/xen/blob/ipmmu_v2/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L2380
>>
>> ** You can see at the end of the "modified"
>> ipmmu_domain_init_context() implementation, that context is enabled
>> for root IPMMU only
>> because of "ipmmu_ctx_write".
>>
>> https://github.com/otyshchenko1/xen/blob/ipmmu_v2/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L882
>>
>> That's why, here, in ipmmu_attach_device() we don't have to call
>> ipmmu_domain_init_context() anymore, because
>> the context has been already initialized and enabled. All what we need
>> here is to enable this context for cache IPMMU the device
>> is physically connected to.
>>
>> Does this make sense?
>>
>>>
>>>
>>>> + */
>>>> +#if 0
>>>>                 ret = ipmmu_domain_init_context(domain);
>>>> +#endif
>>>> +               ipmmu_ctx_write1(domain, IMCTR,
>>>> +                               ipmmu_ctx_read(domain, IMCTR) |
>>>> IMCTR_FLUSH);
>>>> +
>>>> +               dev_info(dev, "Using IPMMU context %u\n",
>>>> domain->context_id);
>>>> +#if 0 /* Xen: Not needed */
>>>>                 if (ret < 0) {
>>>>                         dev_err(dev, "Unable to initialize IPMMU
>>>> context\n");
>>>>                         domain->mmu = NULL;
>>>> @@ -795,6 +1063,7 @@ static int ipmmu_attach_device(struct iommu_domain
>>>> *io_domain,
>>>>                         dev_info(dev, "Using IPMMU context %u\n",
>>>>                                  domain->context_id);
>>>>                 }
>>>> +#endif
>>>>         } else if (domain->mmu != mmu) {
>>>>                 /*
>>>>                  * Something is wrong, we can't attach two devices using
>>>> @@ -834,6 +1103,14 @@ static void ipmmu_detach_device(struct
>>>> iommu_domain
>>>> *io_domain,
>>>>          */
>>>>  }
>>>>
>>>> +/*
>>>> + * Xen: The current implementation of these callbacks is insufficient
>>>> for
>>>> us
>>>> + * since they are intended to be called from Linux IOMMU core that
>>>> + * has already done all required actions such as doing various checks,
>>>> + * splitting into memory block the hardware supports and so on.
>>>
>>>
>>>
>>> Can you expand it here? Why can't our IOMMU framework could do that?
>>
>>
>> If add all required support to IOMMU framework and modify all existing
>> IOMMU drivers
>> to follow this support, then yes, it will avoid IOMMU drivers such as
>> IPMMU-VMSA from having these stuff in.
>>
>> To be honest, I was trying to touch IOMMU common code and other IOMMU
>> drivers as little as possible,
>> but I had to introduce a few changes ("non-shared IOMMU").
>
>
> What I am looking is something we can easily maintain in the future. If it
> requires change in the common code then we should do it. If it happens to be
> too complex, then maybe we should not take it from Linux.
I understand your point.

>
>>
>>>
>>> IHMO, if we want to get driver from Linux, we need to get an interface
>>> very
>>> close to it. Otherwise it is not worth it because you would have to
>>> implement for each IOMMU.
>>
>> You are right.
>>
>>>
>>> My overall feeling at the moment is Xen is not ready to welcome this
>>> driver
>>> directly from Linux. This is also a BSP driver, so no thorough review
>>> done
>>> by the community.
>>
>>
>> As I said in a cover letter the BSP driver had more complete support
>> than the mainline one.
>
>
> I know. But this means we are going to bring code in Xen that was not fully
> reviewed and don't know the quality of the code.
>
>> I would like to clarify what need to be done from my side.
>> Should I wait for the missing things reach upsteam and then rebase on
>> the mainline driver?
>> Or should I rewrite this driver without following Linux?
>
>
> I don't have a clear answer here. As I said, we need to weight pros and cons
> to use Linux driver over our own.
>
> At the moment, you are using a BSP driver which has more features but
> modified quite a lot. We don't even know when this is going to be merged in
> Linux.
>
> Keeping code close to Linux requires some hacks that are acceptable if you
> can benefits from the community (bug fix, review...). As the driver is taken
> from the BSP, we don't know if the code will stay in the current form nor be
> able to get bug fix.

I got it. I completely agree with you.
But we need to choose which direction we should follow. We have 3
options at the moment, and I am OK with each of them:
1. a direct port from the BSP (current implementation);
2. a direct port from mainline Linux (once it has the required support);
3. a new driver based on the BSP/Linux one that contains only the things
relevant to Xen.

I am starting to think that option 2 or 3 (+) would be more suitable.
What do you think?

>
> Cheers,
>
> --
> Julien Grall

-- 
Regards,

Oleksandr Tyshchenko

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v1 2/7] iommu/arm: ipmmu-vmsa: Add Xen changes for main driver
  2017-08-21 15:53         ` Oleksandr Tyshchenko
@ 2017-08-23 11:41           ` Julien Grall
  2017-08-25 20:06             ` Stefano Stabellini
  0 siblings, 1 reply; 20+ messages in thread
From: Julien Grall @ 2017-08-23 11:41 UTC (permalink / raw)
  To: Oleksandr Tyshchenko; +Cc: xen-devel, Stefano Stabellini, Oleksandr Tyshchenko

Hi Oleksandr,

On 21/08/17 16:53, Oleksandr Tyshchenko wrote:
> On Thu, Aug 10, 2017 at 6:13 PM, Julien Grall <julien.grall@arm.com> wrote:
>> On 10/08/17 15:27, Oleksandr Tyshchenko wrote:
>>> I would like to clarify what need to be done from my side.
>>> Should I wait for the missing things reach upsteam and then rebase on
>>> the mainline driver?
>>> Or should I rewrite this driver without following Linux?
>>
>>
>> I don't have a clear answer here. As I said, we need to weight pros and cons
>> to use Linux driver over our own.
>>
>> At the moment, you are using a BSP driver which has more features but
>> modified quite a lot. We don't even know when this is going to be merged in
>> Linux.
>>
>> Keeping code close to Linux requires some hacks that are acceptable if you
>> can benefits from the community (bug fix, review...). As the driver is taken
>> from the BSP, we don't know if the code will stay in the current form nor be
>> able to get bug fix.
>
> I got it. Completely agree with you.
> But, we need to choose which direction we should follow. We have 3
> options at the moment
> and I am OK with each of them:
> 1. direct port from BSP (current implementation).
> 2. direct port from mainline Linux (when it has required support).
> 3. new driver based on BSP/Linux and contains only relevant to Xen things.
>
> I am starting to think that options 2 or 3 (+) would be more suitable.
> What do you think?

Option 2 relies on the changes being merged in Linux. If I understand 
correctly, we don't have any timeline for this.

So I would lean towards option 3 to get support in Xen.

Stefano, do you have any opinion?

Cheers,
-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v1 2/7] iommu/arm: ipmmu-vmsa: Add Xen changes for main driver
  2017-08-23 11:41           ` Julien Grall
@ 2017-08-25 20:06             ` Stefano Stabellini
  2017-08-28 17:29               ` Oleksandr Tyshchenko
  0 siblings, 1 reply; 20+ messages in thread
From: Stefano Stabellini @ 2017-08-25 20:06 UTC (permalink / raw)
  To: Julien Grall
  Cc: Oleksandr Tyshchenko, xen-devel, Stefano Stabellini,
	Oleksandr Tyshchenko

On Wed, 23 Aug 2017, Julien Grall wrote:
> Hi Oleksandr,
> 
> On 21/08/17 16:53, Oleksandr Tyshchenko wrote:
> > On Thu, Aug 10, 2017 at 6:13 PM, Julien Grall <julien.grall@arm.com> wrote:
> > > On 10/08/17 15:27, Oleksandr Tyshchenko wrote:
> > > > I would like to clarify what need to be done from my side.
> > > > Should I wait for the missing things reach upsteam and then rebase on
> > > > the mainline driver?
> > > > Or should I rewrite this driver without following Linux?
> > > 
> > > 
> > > I don't have a clear answer here. As I said, we need to weight pros and
> > > cons
> > > to use Linux driver over our own.
> > > 
> > > At the moment, you are using a BSP driver which has more features but
> > > modified quite a lot. We don't even know when this is going to be merged
> > > in
> > > Linux.
> > > 
> > > Keeping code close to Linux requires some hacks that are acceptable if you
> > > can benefits from the community (bug fix, review...). As the driver is
> > > taken
> > > from the BSP, we don't know if the code will stay in the current form nor
> > > be
> > > able to get bug fix.
> > 
> > I got it. Completely agree with you.
> > But, we need to choose which direction we should follow. We have 3
> > options at the moment
> > and I am OK with each of them:
> > 1. direct port from BSP (current implementation).
> > 2. direct port from mainline Linux (when it has required support).
> > 3. new driver based on BSP/Linux and contains only relevant to Xen things.
> > 
> > I am starting to think that options 2 or 3 (+) would be more suitable.
> > What do you think?
> 
> The option 2 rely on the changes to be merged in Linux. If I understand
> correctly, we don't have any timeline for this.
> 
> So I would lean towards option 3 to get a support in Xen.
> 
> Stefano, do you have any opinion?

I agree with Julien. Option 3 is the way to go. There is only a benefit
in staying close to Linux if their driver is in a good state, fully
featured, and well-maintained. And we certainly don't want to block your
work waiting for somebody else who might or might not merge his
changes in Linux. In this case, option 3 is best. I warn you, you might
have to maintain this driver in Xen going forward though :-)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v1 2/7] iommu/arm: ipmmu-vmsa: Add Xen changes for main driver
  2017-08-25 20:06             ` Stefano Stabellini
@ 2017-08-28 17:29               ` Oleksandr Tyshchenko
  0 siblings, 0 replies; 20+ messages in thread
From: Oleksandr Tyshchenko @ 2017-08-28 17:29 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Julien Grall, Oleksandr Tyshchenko

Hi, Stefano, Julien.

On Fri, Aug 25, 2017 at 11:06 PM, Stefano Stabellini
<sstabellini@kernel.org> wrote:
> On Wed, 23 Aug 2017, Julien Grall wrote:
>> Hi Oleksandr,
>>
>> On 21/08/17 16:53, Oleksandr Tyshchenko wrote:
>> > On Thu, Aug 10, 2017 at 6:13 PM, Julien Grall <julien.grall@arm.com> wrote:
>> > > On 10/08/17 15:27, Oleksandr Tyshchenko wrote:
>> > > > I would like to clarify what need to be done from my side.
>> > > > Should I wait for the missing things reach upsteam and then rebase on
>> > > > the mainline driver?
>> > > > Or should I rewrite this driver without following Linux?
>> > >
>> > >
>> > > I don't have a clear answer here. As I said, we need to weight pros and
>> > > cons
>> > > to use Linux driver over our own.
>> > >
>> > > At the moment, you are using a BSP driver which has more features but
>> > > modified quite a lot. We don't even know when this is going to be merged
>> > > in
>> > > Linux.
>> > >
>> > > Keeping code close to Linux requires some hacks that are acceptable if you
>> > > can benefits from the community (bug fix, review...). As the driver is
>> > > taken
>> > > from the BSP, we don't know if the code will stay in the current form nor
>> > > be
>> > > able to get bug fix.
>> >
>> > I got it. Completely agree with you.
>> > But, we need to choose which direction we should follow. We have 3
>> > options at the moment
>> > and I am OK with each of them:
>> > 1. direct port from BSP (current implementation).
>> > 2. direct port from mainline Linux (when it has required support).
>> > 3. new driver based on BSP/Linux and contains only relevant to Xen things.
>> >
>> > I am starting to think that options 2 or 3 (+) would be more suitable.
>> > What do you think?
>>
>> The option 2 rely on the changes to be merged in Linux. If I understand
>> correctly, we don't have any timeline for this.
>>
>> So I would lean towards option 3 to get a support in Xen.
>>
>> Stefano, do you have any opinion?
>
> I agree with Julien. Option 3 is the way to go. There is only a benefit
> in staying close to Linux if their driver is in good state, fully
> featured, and well-maintained. And we certainly don't want to block your
> work on waiting for somebody else who might or might nor merge his
> changes in Linux. In this case, option 3 is best.
Thank you for your suggestions.

> I warn you, you might
> have to maintain this driver in Xen going forward though :-)
Why not :-)

-- 
Regards,

Oleksandr Tyshchenko

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2017-08-28 17:29 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-26 15:09 [RFC PATCH v1 0/7] IPMMU-VMSA support on ARM Oleksandr Tyshchenko
2017-07-26 15:09 ` [RFC PATCH v1 1/7] iommu/arm: ipmmu-vmsa: Add IPMMU-VMSA support Oleksandr Tyshchenko
2017-07-26 15:09 ` [RFC PATCH v1 2/7] iommu/arm: ipmmu-vmsa: Add Xen changes for main driver Oleksandr Tyshchenko
2017-08-08 11:34   ` Julien Grall
2017-08-10 14:27     ` Oleksandr Tyshchenko
2017-08-10 15:13       ` Julien Grall
2017-08-21 15:53         ` Oleksandr Tyshchenko
2017-08-23 11:41           ` Julien Grall
2017-08-25 20:06             ` Stefano Stabellini
2017-08-28 17:29               ` Oleksandr Tyshchenko
2017-07-26 15:10 ` [RFC PATCH v1 3/7] iommu/arm: ipmmu-vmsa: Add io-pgtables support Oleksandr Tyshchenko
2017-07-26 15:10 ` [RFC PATCH v1 4/7] iommu/arm: ipmmu-vmsa: Add Xen changes for io-pgtables Oleksandr Tyshchenko
2017-07-26 15:10 ` [RFC PATCH v1 5/7] iommu/arm: Build IPMMU-VMSA related stuff Oleksandr Tyshchenko
2017-07-26 15:10 ` [RFC PATCH v1 6/7] iommu/arm: ipmmu-vmsa: Deallocate page table asynchronously Oleksandr Tyshchenko
2017-08-08 11:36   ` Julien Grall
2017-07-26 15:10 ` [RFC PATCH v1 7/7] iommu/arm: ipmmu-vmsa: Enable VMSAv8-64 mode if IPMMU HW supports it Oleksandr Tyshchenko
2017-08-01 12:27 ` [RFC PATCH v1 0/7] IPMMU-VMSA support on ARM Julien Grall
2017-08-01 17:13   ` Oleksandr Tyshchenko
2017-08-08 11:21     ` Julien Grall
2017-08-08 16:52       ` Stefano Stabellini
