iommu.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v8 0/3] Nvidia Arm SMMUv2 Implementation
@ 2020-06-30  0:10 Krishna Reddy
  2020-06-30  0:10 ` [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage Krishna Reddy
                   ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Krishna Reddy @ 2020-06-30  0:10 UTC (permalink / raw)
  Cc: snikam, nicoleotsuka, mperttunen, bhuntsman, will, linux-kernel,
	praithatha, talho, iommu, nicolinc, linux-tegra, yhsu, treding,
	robin.murphy, linux-arm-kernel, bbiswas

Changes in v8:
Fixed incorrect CB_FSR read issue during context bank fault.
Rebased and validated patches on top of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next

v7 - https://lkml.org/lkml/2020/6/28/347
v6 - https://lkml.org/lkml/2020/6/4/1018
v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (3):
  iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks

 .../devicetree/bindings/iommu/arm,smmu.yaml   |   5 +
 MAINTAINERS                                   |   2 +
 drivers/iommu/Makefile                        |   2 +-
 drivers/iommu/arm-smmu-impl.c                 |   3 +
 drivers/iommu/arm-smmu-nvidia.c               | 294 ++++++++++++++++++
 drivers/iommu/arm-smmu.c                      |  17 +-
 drivers/iommu/arm-smmu.h                      |   4 +
 7 files changed, 324 insertions(+), 3 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c


base-commit: 48f0bcfb7aad2c6eb4c1e66476b58475aa14393e
-- 
2.26.2

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  2020-06-30  0:10 [PATCH v8 0/3] Nvidia Arm SMMUv2 Implementation Krishna Reddy
@ 2020-06-30  0:10 ` Krishna Reddy
  2020-06-30  0:16   ` Nicolin Chen
                     ` (3 more replies)
  2020-06-30  0:10 ` [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU Krishna Reddy
  2020-06-30  0:10 ` [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks Krishna Reddy
  2 siblings, 4 replies; 38+ messages in thread
From: Krishna Reddy @ 2020-06-30  0:10 UTC (permalink / raw)
  Cc: snikam, nicoleotsuka, mperttunen, bhuntsman, will, linux-kernel,
	praithatha, talho, iommu, nicolinc, linux-tegra, yhsu, treding,
	robin.murphy, linux-arm-kernel, bbiswas

NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
IOVA accesses across them.
Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
string for Tegra194 SoC SMMU topology.

Signed-off-by: Krishna Reddy <vdumpa@nvidia.com>
---
 MAINTAINERS                     |   2 +
 drivers/iommu/Makefile          |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 196 ++++++++++++++++++++++++++++++++
 drivers/iommu/arm-smmu.h        |   1 +
 5 files changed, 203 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7b5ffd646c6b9..64c37dbdd4426 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16808,8 +16808,10 @@ F:	drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M:	Thierry Reding <thierry.reding@gmail.com>
+R:	Krishna Reddy <vdumpa@nvidia.com>
 L:	linux-tegra@vger.kernel.org
 S:	Supported
+F:	drivers/iommu/arm-smmu-nvidia.c
 F:	drivers/iommu/tegra*
 
 TEGRA KBC DRIVER
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb0..2b8203db73ec3 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b702..70f7318017617 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
 	if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
 		smmu->impl = &calxeda_impl;
 
+	if (of_device_is_compatible(smmu->dev->of_node, "nvidia,tegra194-smmu"))
+		return nvidia_smmu_impl_init(smmu);
+
 	if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
 	    of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
 		return qcom_smmu_impl_init(smmu);
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 0000000000000..1124f0ac1823a
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,196 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// NVIDIA ARM SMMU v2 implementation quirks
+// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
+
+#include <linux/bitfield.h>
+#include <linux/delay.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+
+#include "arm-smmu.h"
+
+/*
+ * Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for interleaved IOVA accesses and
+ * used by non-isochronous HW devices for SMMU translations.
+ * Third one is used for SMMU translations from isochronous HW devices.
+ * It is possible to use this implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant TLB
+ * invalidations as all three never need to be TLB invalidated for a HW device.
+ *
+ * When Linux kernel supports multiple SMMU devices, the SMMU device used for
+ * isochornous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient TLB invalidates.
+ */
+#define MAX_SMMU_INSTANCES 3
+
+#define TLB_LOOP_TIMEOUT_IN_US		1000000	/* 1s! */
+#define TLB_SPIN_COUNT			10
+
+struct nvidia_smmu {
+	struct arm_smmu_device	smmu;
+	unsigned int		num_inst;
+	void __iomem		*bases[MAX_SMMU_INSTANCES];
+};
+
+static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu)
+{
+	return container_of(smmu, struct nvidia_smmu, smmu);
+}
+
+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
+			       unsigned int inst, int page)
+{
+	struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+	if (!nvidia_smmu->bases[0])
+		nvidia_smmu->bases[0] = smmu->base;
+
+	return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
+}
+
+static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu,
+			      int page, int offset)
+{
+	void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+	return readl_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
+			    int page, int offset, u32 val)
+{
+	unsigned int i;
+	struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+	for (i = 0; i < nvidia_smmu->num_inst; i++) {
+		void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset;
+
+		writel_relaxed(val, reg);
+	}
+}
+
+static u64 nvidia_smmu_read_reg64(struct arm_smmu_device *smmu,
+				int page, int offset)
+{
+	void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+	return readq_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg64(struct arm_smmu_device *smmu,
+				  int page, int offset, u64 val)
+{
+	unsigned int i;
+	struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+	for (i = 0; i < nvidia_smmu->num_inst; i++) {
+		void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset;
+
+		writeq_relaxed(val, reg);
+	}
+}
+
+static void nvidia_smmu_tlb_sync(struct arm_smmu_device *smmu, int page,
+			   int sync, int status)
+{
+	unsigned int delay;
+
+	arm_smmu_writel(smmu, page, sync, 0);
+
+	for (delay = 1; delay < TLB_LOOP_TIMEOUT_IN_US; delay *= 2) {
+		unsigned int spin_cnt;
+
+		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
+			u32 val = 0;
+			unsigned int i;
+			struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+			for (i = 0; i < nvidia_smmu->num_inst; i++) {
+				void __iomem *reg =
+					nvidia_smmu_page(smmu, i, page) + status;
+
+				val |= readl_relaxed(reg);
+			}
+
+			if (!(val & ARM_SMMU_sTLBGSTATUS_GSACTIVE))
+				return;
+
+			cpu_relax();
+		}
+
+		udelay(delay);
+	}
+
+	dev_err_ratelimited(smmu->dev,
+			    "TLB sync timed out -- SMMU may be deadlocked\n");
+}
+
+static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
+{
+	unsigned int i;
+
+	for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++) {
+		u32 val;
+		void __iomem *reg = nvidia_smmu_page(smmu, i, ARM_SMMU_GR0) +
+				    ARM_SMMU_GR0_sGFSR;
+
+		/* clear global FSR */
+		val = readl_relaxed(reg);
+		writel_relaxed(val, reg);
+	}
+
+	return 0;
+}
+
+static const struct arm_smmu_impl nvidia_smmu_impl = {
+	.read_reg = nvidia_smmu_read_reg,
+	.write_reg = nvidia_smmu_write_reg,
+	.read_reg64 = nvidia_smmu_read_reg64,
+	.write_reg64 = nvidia_smmu_write_reg64,
+	.reset = nvidia_smmu_reset,
+	.tlb_sync = nvidia_smmu_tlb_sync,
+};
+
+struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
+{
+	unsigned int i;
+	struct nvidia_smmu *nvidia_smmu;
+	struct platform_device *pdev = to_platform_device(smmu->dev);
+
+	nvidia_smmu = devm_kzalloc(smmu->dev, sizeof(*nvidia_smmu), GFP_KERNEL);
+	if (!nvidia_smmu)
+		return ERR_PTR(-ENOMEM);
+
+	nvidia_smmu->smmu = *smmu;
+	/* Instance 0 is ioremapped by arm-smmu.c after this function returns */
+	nvidia_smmu->num_inst = 1;
+
+	for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
+		struct resource *res;
+
+		res = platform_get_resource(pdev, IORESOURCE_MEM, i);
+		if (!res)
+			break;
+
+		nvidia_smmu->bases[i] = devm_ioremap_resource(smmu->dev, res);
+		if (IS_ERR(nvidia_smmu->bases[i]))
+			return ERR_CAST(nvidia_smmu->bases[i]);
+
+		nvidia_smmu->num_inst++;
+	}
+
+	nvidia_smmu->smmu.impl = &nvidia_smmu_impl;
+	/*
+	 * Free the arm_smmu_device struct allocated in arm-smmu.c.
+	 * Once this function returns, arm-smmu.c would use arm_smmu_device
+	 * allocated as part of nvidia_smmu struct.
+	 */
+	devm_kfree(smmu->dev, smmu);
+
+	return &nvidia_smmu->smmu;
+}
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index d172c024be618..8cf1511ed9874 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -450,6 +450,7 @@ static inline void arm_smmu_writeq(struct arm_smmu_device *smmu, int page,
 	arm_smmu_writeq((s), ARM_SMMU_CB((s), (n)), (o), (v))
 
 struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu);
+struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu);
 struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu);
 
 int arm_mmu500_reset(struct arm_smmu_device *smmu);
-- 
2.26.2

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  2020-06-30  0:10 [PATCH v8 0/3] Nvidia Arm SMMUv2 Implementation Krishna Reddy
  2020-06-30  0:10 ` [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage Krishna Reddy
@ 2020-06-30  0:10 ` Krishna Reddy
  2020-06-30  6:01   ` Pritesh Raithatha
                     ` (2 more replies)
  2020-06-30  0:10 ` [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks Krishna Reddy
  2 siblings, 3 replies; 38+ messages in thread
From: Krishna Reddy @ 2020-06-30  0:10 UTC (permalink / raw)
  Cc: snikam, nicoleotsuka, mperttunen, bhuntsman, will, linux-kernel,
	praithatha, talho, iommu, nicolinc, linux-tegra, yhsu, treding,
	robin.murphy, linux-arm-kernel, bbiswas

Add binding for NVIDIA's Tegra194 SoC SMMU topology that is based
on ARM MMU-500.

Signed-off-by: Krishna Reddy <vdumpa@nvidia.com>
---
 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423b..5b2586ac715ed 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,11 @@ properties:
               - qcom,sc7180-smmu-500
               - qcom,sdm845-smmu-500
           - const: arm,mmu-500
+      - description: NVIDIA SoCs that use more than one "arm,mmu-500"
+        items:
+          - enum:
+              - nvdia,tegra194-smmu
+          - const: arm,mmu-500
       - items:
           - const: arm,mmu-500
           - const: arm,smmu-v2
-- 
2.26.2

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks
  2020-06-30  0:10 [PATCH v8 0/3] Nvidia Arm SMMUv2 Implementation Krishna Reddy
  2020-06-30  0:10 ` [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage Krishna Reddy
  2020-06-30  0:10 ` [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU Krishna Reddy
@ 2020-06-30  0:10 ` Krishna Reddy
  2020-06-30  0:19   ` Nicolin Chen
                     ` (2 more replies)
  2 siblings, 3 replies; 38+ messages in thread
From: Krishna Reddy @ 2020-06-30  0:10 UTC (permalink / raw)
  Cc: snikam, nicoleotsuka, mperttunen, bhuntsman, will, linux-kernel,
	praithatha, talho, iommu, nicolinc, linux-tegra, yhsu, treding,
	robin.murphy, linux-arm-kernel, bbiswas

Add global/context fault hooks to allow NVIDIA SMMU implementation
handle faults across multiple SMMUs.

Signed-off-by: Krishna Reddy <vdumpa@nvidia.com>
---
 drivers/iommu/arm-smmu-nvidia.c | 98 +++++++++++++++++++++++++++++++++
 drivers/iommu/arm-smmu.c        | 17 +++++-
 drivers/iommu/arm-smmu.h        |  3 +
 3 files changed, 116 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index 1124f0ac1823a..c9423b4199c65 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -147,6 +147,102 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
 	return 0;
 }
 
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+	return container_of(dom, struct arm_smmu_domain, domain);
+}
+
+static irqreturn_t nvidia_smmu_global_fault_inst(int irq,
+					       struct arm_smmu_device *smmu,
+					       int inst)
+{
+	u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+	void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0);
+
+	gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
+	if (!gfsr)
+		return IRQ_NONE;
+
+	gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
+	gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
+	gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
+
+	dev_err_ratelimited(smmu->dev,
+		"Unexpected global fault, this could be serious\n");
+	dev_err_ratelimited(smmu->dev,
+		"\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
+		gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+	writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev)
+{
+	int inst;
+	irqreturn_t irq_ret = IRQ_NONE;
+	struct arm_smmu_device *smmu = dev;
+	struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+	for (inst = 0; inst < nvidia_smmu->num_inst; inst++) {
+		irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
+		if (irq_ret == IRQ_HANDLED)
+			return irq_ret;
+	}
+
+	return irq_ret;
+}
+
+static irqreturn_t nvidia_smmu_context_fault_bank(int irq,
+					    struct arm_smmu_device *smmu,
+					    int idx, int inst)
+{
+	u32 fsr, fsynr, cbfrsynra;
+	unsigned long iova;
+	void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1);
+	void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + idx);
+
+	fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
+	if (!(fsr & ARM_SMMU_FSR_FAULT))
+		return IRQ_NONE;
+
+	fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
+	iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
+	cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+	dev_err_ratelimited(smmu->dev,
+	"Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
+			    fsr, iova, fsynr, cbfrsynra, idx);
+
+	writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev)
+{
+	int inst, idx;
+	irqreturn_t irq_ret = IRQ_NONE;
+	struct iommu_domain *domain = dev;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+	for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+		/*
+		 * Interrupt line shared between all context faults.
+		 * Check for faults across all contexts.
+		 */
+		for (idx = 0; idx < smmu->num_context_banks; idx++) {
+			irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
+								 idx, inst);
+
+			if (irq_ret == IRQ_HANDLED)
+				return irq_ret;
+		}
+	}
+
+	return irq_ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
 	.read_reg = nvidia_smmu_read_reg,
 	.write_reg = nvidia_smmu_write_reg,
@@ -154,6 +250,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
 	.write_reg64 = nvidia_smmu_write_reg64,
 	.reset = nvidia_smmu_reset,
 	.tlb_sync = nvidia_smmu_tlb_sync,
+	.global_fault = nvidia_smmu_global_fault,
+	.context_fault = nvidia_smmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705b..3bb0aba15a356 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -673,6 +673,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 	enum io_pgtable_fmt fmt;
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+	irqreturn_t (*context_fault)(int irq, void *dev);
 
 	mutex_lock(&smmu_domain->init_mutex);
 	if (smmu_domain->smmu)
@@ -835,7 +836,13 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 	 * handler seeing a half-initialised domain state.
 	 */
 	irq = smmu->irqs[smmu->num_global_irqs + cfg->irptndx];
-	ret = devm_request_irq(smmu->dev, irq, arm_smmu_context_fault,
+
+	if (smmu->impl && smmu->impl->context_fault)
+		context_fault = smmu->impl->context_fault;
+	else
+		context_fault = arm_smmu_context_fault;
+
+	ret = devm_request_irq(smmu->dev, irq, context_fault,
 			       IRQF_SHARED, "arm-smmu-context-fault", domain);
 	if (ret < 0) {
 		dev_err(smmu->dev, "failed to request context IRQ %d (%u)\n",
@@ -2107,6 +2114,7 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 	struct arm_smmu_device *smmu;
 	struct device *dev = &pdev->dev;
 	int num_irqs, i, err;
+	irqreturn_t (*global_fault)(int irq, void *dev);
 
 	smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL);
 	if (!smmu) {
@@ -2193,9 +2201,14 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 		smmu->num_context_irqs = smmu->num_context_banks;
 	}
 
+	if (smmu->impl && smmu->impl->global_fault)
+		global_fault = smmu->impl->global_fault;
+	else
+		global_fault = arm_smmu_global_fault;
+
 	for (i = 0; i < smmu->num_global_irqs; ++i) {
 		err = devm_request_irq(smmu->dev, smmu->irqs[i],
-				       arm_smmu_global_fault,
+				       global_fault,
 				       IRQF_SHARED,
 				       "arm-smmu global fault",
 				       smmu);
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index 8cf1511ed9874..8b330076ff2af 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -18,6 +18,7 @@
 #include <linux/io-64-nonatomic-hi-lo.h>
 #include <linux/io-pgtable.h>
 #include <linux/iommu.h>
+#include <linux/irqreturn.h>
 #include <linux/mutex.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>
@@ -387,6 +388,8 @@ struct arm_smmu_impl {
 	void (*tlb_sync)(struct arm_smmu_device *smmu, int page, int sync,
 			 int status);
 	int (*def_domain_type)(struct device *dev);
+	irqreturn_t (*global_fault)(int irq, void *dev);
+	irqreturn_t (*context_fault)(int irq, void *dev);
 };
 
 static inline void __iomem *arm_smmu_page(struct arm_smmu_device *smmu, int n)
-- 
2.26.2

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  2020-06-30  0:10 ` [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage Krishna Reddy
@ 2020-06-30  0:16   ` Nicolin Chen
  2020-06-30  5:54   ` Pritesh Raithatha
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 38+ messages in thread
From: Nicolin Chen @ 2020-06-30  0:16 UTC (permalink / raw)
  To: Krishna Reddy
  Cc: snikam, mperttunen, bhuntsman, will, linux-kernel, praithatha,
	talho, iommu, nicolinc, linux-tegra, yhsu, treding, robin.murphy,
	linux-arm-kernel, bbiswas

On Mon, Jun 29, 2020 at 05:10:49PM -0700, Krishna Reddy wrote:
> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
> IOVA accesses across them.
> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
> string for Tegra194 SoC SMMU topology.
> 
> Signed-off-by: Krishna Reddy <vdumpa@nvidia.com>

Reviewed-by: Nicolin Chen <nicoleotsuka@gmail.com>
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks
  2020-06-30  0:10 ` [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks Krishna Reddy
@ 2020-06-30  0:19   ` Nicolin Chen
  2020-06-30  5:58   ` Pritesh Raithatha
  2020-06-30  8:37   ` Jon Hunter
  2 siblings, 0 replies; 38+ messages in thread
From: Nicolin Chen @ 2020-06-30  0:19 UTC (permalink / raw)
  To: Krishna Reddy
  Cc: snikam, mperttunen, bhuntsman, will, linux-kernel, praithatha,
	talho, iommu, nicolinc, linux-tegra, yhsu, treding, robin.murphy,
	linux-arm-kernel, bbiswas

On Mon, Jun 29, 2020 at 05:10:51PM -0700, Krishna Reddy wrote:
> Add global/context fault hooks to allow NVIDIA SMMU implementation
> handle faults across multiple SMMUs.
> 
> Signed-off-by: Krishna Reddy <vdumpa@nvidia.com>

Reviewed-by: Nicolin Chen <nicoleotsuka@gmail.com>
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  2020-06-30  0:10 ` [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage Krishna Reddy
  2020-06-30  0:16   ` Nicolin Chen
@ 2020-06-30  5:54   ` Pritesh Raithatha
  2020-06-30  8:19   ` Jon Hunter
  2020-06-30 10:17   ` Jon Hunter
  3 siblings, 0 replies; 38+ messages in thread
From: Pritesh Raithatha @ 2020-06-30  5:54 UTC (permalink / raw)
  To: Krishna Reddy
  Cc: Sachin Nikam, nicoleotsuka, Bryan Huntsman, will, linux-kernel,
	Mikko Perttunen, Timo Alho, iommu, Nicolin Chen, linux-tegra,
	Yu-Huan Hsu, Thierry Reding, robin.murphy, linux-arm-kernel,
	Bitan Biswas

> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave IOVA
> accesses across them.
> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
> string for Tegra194 SoC SMMU topology.
> 
> Signed-off-by: Krishna Reddy <vdumpa@nvidia.com>

Reviewed-by: Pritesh Raithatha <praithatha@nvidia.com>
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks
  2020-06-30  0:10 ` [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks Krishna Reddy
  2020-06-30  0:19   ` Nicolin Chen
@ 2020-06-30  5:58   ` Pritesh Raithatha
  2020-06-30  8:37   ` Jon Hunter
  2 siblings, 0 replies; 38+ messages in thread
From: Pritesh Raithatha @ 2020-06-30  5:58 UTC (permalink / raw)
  To: Krishna Reddy
  Cc: Sachin Nikam, nicoleotsuka, Bryan Huntsman, will, linux-kernel,
	Mikko Perttunen, Timo Alho, iommu, Nicolin Chen, linux-tegra,
	Yu-Huan Hsu, Thierry Reding, robin.murphy, linux-arm-kernel,
	Bitan Biswas

> Add global/context fault hooks to allow NVIDIA SMMU implementation handle
> faults across multiple SMMUs.
> 
> Signed-off-by: Krishna Reddy <vdumpa@nvidia.com>

Reviewed-by: Pritesh Raithatha <praithatha@nvidia.com>
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  2020-06-30  0:10 ` [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU Krishna Reddy
@ 2020-06-30  6:01   ` Pritesh Raithatha
  2020-06-30  8:21   ` Jon Hunter
  2020-06-30 12:27   ` Robin Murphy
  2 siblings, 0 replies; 38+ messages in thread
From: Pritesh Raithatha @ 2020-06-30  6:01 UTC (permalink / raw)
  To: Krishna Reddy
  Cc: Sachin Nikam, nicoleotsuka, Bryan Huntsman, will, linux-kernel,
	Mikko Perttunen, Timo Alho, iommu, Nicolin Chen, linux-tegra,
	Yu-Huan Hsu, Thierry Reding, robin.murphy, linux-arm-kernel,
	Bitan Biswas

> Add binding for NVIDIA's Tegra194 SoC SMMU topology that is based on ARM
> MMU-500.
> 
> Signed-off-by: Krishna Reddy <vdumpa@nvidia.com>

Reviewed-by: Pritesh Raithatha <praithatha@nvidia.com>
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  2020-06-30  0:10 ` [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage Krishna Reddy
  2020-06-30  0:16   ` Nicolin Chen
  2020-06-30  5:54   ` Pritesh Raithatha
@ 2020-06-30  8:19   ` Jon Hunter
  2020-06-30 14:53     ` Robin Murphy
  2020-06-30 17:04     ` Krishna Reddy
  2020-06-30 10:17   ` Jon Hunter
  3 siblings, 2 replies; 38+ messages in thread
From: Jon Hunter @ 2020-06-30  8:19 UTC (permalink / raw)
  To: Krishna Reddy
  Cc: snikam, nicoleotsuka, mperttunen, bhuntsman, will, linux-kernel,
	praithatha, talho, iommu, nicolinc, linux-tegra, yhsu, treding,
	robin.murphy, linux-arm-kernel, bbiswas


On 30/06/2020 01:10, Krishna Reddy wrote:
> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
> IOVA accesses across them.
> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
> string for Tegra194 SoC SMMU topology.

There is no description here of the 3rd SMMU that you mention below.
I think that we should describe the full picture here.
 
> Signed-off-by: Krishna Reddy <vdumpa@nvidia.com>
> ---
>  MAINTAINERS                     |   2 +
>  drivers/iommu/Makefile          |   2 +-
>  drivers/iommu/arm-smmu-impl.c   |   3 +
>  drivers/iommu/arm-smmu-nvidia.c | 196 ++++++++++++++++++++++++++++++++
>  drivers/iommu/arm-smmu.h        |   1 +
>  5 files changed, 203 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/iommu/arm-smmu-nvidia.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7b5ffd646c6b9..64c37dbdd4426 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -16808,8 +16808,10 @@ F:	drivers/i2c/busses/i2c-tegra.c
>  
>  TEGRA IOMMU DRIVERS
>  M:	Thierry Reding <thierry.reding@gmail.com>
> +R:	Krishna Reddy <vdumpa@nvidia.com>
>  L:	linux-tegra@vger.kernel.org
>  S:	Supported
> +F:	drivers/iommu/arm-smmu-nvidia.c
>  F:	drivers/iommu/tegra*
>  
>  TEGRA KBC DRIVER
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 342190196dfb0..2b8203db73ec3 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
>  obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
>  obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
>  obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
> -arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
> +arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
>  obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
>  obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
>  obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
> diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
> index c75b9d957b702..70f7318017617 100644
> --- a/drivers/iommu/arm-smmu-impl.c
> +++ b/drivers/iommu/arm-smmu-impl.c
> @@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
>  	if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
>  		smmu->impl = &calxeda_impl;
>  
> +	if (of_device_is_compatible(smmu->dev->of_node, "nvidia,tegra194-smmu"))
> +		return nvidia_smmu_impl_init(smmu);
> +
>  	if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
>  	    of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
>  		return qcom_smmu_impl_init(smmu);
> diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
> new file mode 100644
> index 0000000000000..1124f0ac1823a
> --- /dev/null
> +++ b/drivers/iommu/arm-smmu-nvidia.c
> @@ -0,0 +1,196 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +// NVIDIA ARM SMMU v2 implementation quirks
> +// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
> +
> +#include <linux/bitfield.h>
> +#include <linux/delay.h>
> +#include <linux/of.h>
> +#include <linux/platform_device.h>
> +#include <linux/slab.h>
> +
> +#include "arm-smmu.h"
> +
> +/*
> + * Tegra194 has three ARM MMU-500 Instances.
> + * Two of them are used together for interleaved IOVA accesses and
> + * used by non-isochronous HW devices for SMMU translations.
> + * Third one is used for SMMU translations from isochronous HW devices.
> + * It is possible to use this implementation to program either
> + * all three or two of the instances identically as desired through
> + * DT node.
> + *
> + * Programming all the three instances identically comes with redundant TLB
> + * invalidations as all three never need to be TLB invalidated for a HW device.
> + *
> + * When Linux kernel supports multiple SMMU devices, the SMMU device used for
> + * isochornous HW devices should be added as a separate ARM MMU-500 device
> + * in DT and be programmed independently for efficient TLB invalidates.
> + */
> +#define MAX_SMMU_INSTANCES 3
> +
> +#define TLB_LOOP_TIMEOUT_IN_US		1000000	/* 1s! */
> +#define TLB_SPIN_COUNT			10
> +
> +struct nvidia_smmu {
> +	struct arm_smmu_device	smmu;
> +	unsigned int		num_inst;
> +	void __iomem		*bases[MAX_SMMU_INSTANCES];
> +};
> +
> +static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu)
> +{
> +	return container_of(smmu, struct nvidia_smmu, smmu);
> +}
> +
> +static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
> +			       unsigned int inst, int page)

If you run checkpatch --strict on these you will get a lot of ...

CHECK: Alignment should match open parenthesis
#116: FILE: drivers/iommu/arm-smmu-nvidia.c:46:
+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
+			       unsigned int inst, int page)

We should fix these.

> +{
> +	struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
> +
> +	if (!nvidia_smmu->bases[0])
> +		nvidia_smmu->bases[0] = smmu->base;
> +
> +	return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
> +}
> +
> +static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu,
> +			      int page, int offset)
> +{
> +	void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
> +
> +	return readl_relaxed(reg);
> +}
> +
> +static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
> +			    int page, int offset, u32 val)
> +{
> +	unsigned int i;
> +	struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
> +
> +	for (i = 0; i < nvidia_smmu->num_inst; i++) {
> +		void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset;

Personally, I would declare 'reg' outside of the loop as I feel it will make
the code cleaner and easier to read.

> +
> +		writel_relaxed(val, reg);
> +	}
> +}
> +
> +static u64 nvidia_smmu_read_reg64(struct arm_smmu_device *smmu,
> +				int page, int offset)
> +{
> +	void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
> +
> +	return readq_relaxed(reg);
> +}
> +
> +static void nvidia_smmu_write_reg64(struct arm_smmu_device *smmu,
> +				  int page, int offset, u64 val)
> +{
> +	unsigned int i;
> +	struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
> +
> +	for (i = 0; i < nvidia_smmu->num_inst; i++) {
> +		void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset;
> +
> +		writeq_relaxed(val, reg);
> +	}
> +}
> +
> +static void nvidia_smmu_tlb_sync(struct arm_smmu_device *smmu, int page,
> +			   int sync, int status)
> +{
> +	unsigned int delay;
> +
> +	arm_smmu_writel(smmu, page, sync, 0);
> +
> +	for (delay = 1; delay < TLB_LOOP_TIMEOUT_IN_US; delay *= 2) {

So we are doubling the delay every time? Is this better than just using
the same on each loop? 

> +		unsigned int spin_cnt;
> +
> +		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
> +			u32 val = 0;
> +			unsigned int i;
> +			struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);

Why not do this once at the beginning of the function? 

> +
> +			for (i = 0; i < nvidia_smmu->num_inst; i++) {
> +				void __iomem *reg =
> +					nvidia_smmu_page(smmu, i, page) + status;
> +
> +				val |= readl_relaxed(reg);
> +			}
> +
> +			if (!(val & ARM_SMMU_sTLBGSTATUS_GSACTIVE))
> +				return;
> +
> +			cpu_relax();
> +		}
> +
> +		udelay(delay);
> +	}
> +
> +	dev_err_ratelimited(smmu->dev,
> +			    "TLB sync timed out -- SMMU may be deadlocked\n");
> +}
> +
> +static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++) {
> +		u32 val;
> +		void __iomem *reg = nvidia_smmu_page(smmu, i, ARM_SMMU_GR0) +
> +				    ARM_SMMU_GR0_sGFSR;

I feel that declaring variables here clutters the code.
 
> +
> +		/* clear global FSR */
> +		val = readl_relaxed(reg);
> +		writel_relaxed(val, reg);
> +	}
> +
> +	return 0;
> +}
> +
> +static const struct arm_smmu_impl nvidia_smmu_impl = {
> +	.read_reg = nvidia_smmu_read_reg,
> +	.write_reg = nvidia_smmu_write_reg,
> +	.read_reg64 = nvidia_smmu_read_reg64,
> +	.write_reg64 = nvidia_smmu_write_reg64,
> +	.reset = nvidia_smmu_reset,
> +	.tlb_sync = nvidia_smmu_tlb_sync,
> +};
> +
> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
> +{
> +	unsigned int i;
> +	struct nvidia_smmu *nvidia_smmu;
> +	struct platform_device *pdev = to_platform_device(smmu->dev);
> +
> +	nvidia_smmu = devm_kzalloc(smmu->dev, sizeof(*nvidia_smmu), GFP_KERNEL);
> +	if (!nvidia_smmu)
> +		return ERR_PTR(-ENOMEM);
> +
> +	nvidia_smmu->smmu = *smmu;
> +	/* Instance 0 is ioremapped by arm-smmu.c after this function returns */
> +	nvidia_smmu->num_inst = 1;
> +
> +	for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
> +		struct resource *res;
> +
> +		res = platform_get_resource(pdev, IORESOURCE_MEM, i);
> +		if (!res)
> +			break;
> +
> +		nvidia_smmu->bases[i] = devm_ioremap_resource(smmu->dev, res);
> +		if (IS_ERR(nvidia_smmu->bases[i]))
> +			return ERR_CAST(nvidia_smmu->bases[i]);
> +
> +		nvidia_smmu->num_inst++;
> +	}
> +
> +	nvidia_smmu->smmu.impl = &nvidia_smmu_impl;
> +	/*
> +	 * Free the arm_smmu_device struct allocated in arm-smmu.c.
> +	 * Once this function returns, arm-smmu.c would use arm_smmu_device
> +	 * allocated as part of nvidia_smmu struct.
> +	 */
> +	devm_kfree(smmu->dev, smmu);

Why don't we just store the pointer of the smmu struct passed to this function
in the nvidia_smmu struct and then we do not need to free this here. In other
words make ...

 struct nvidia_smmu {
	struct arm_smmu_device	*smmu;
	unsigned int		num_inst;
	void __iomem		*bases[MAX_SMMU_INSTANCES];
 };  

This seems more appropriate, than copying the struct and freeing memory
allocated else-where.
 
> +
> +	return &nvidia_smmu->smmu;
> +}
> diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
> index d172c024be618..8cf1511ed9874 100644
> --- a/drivers/iommu/arm-smmu.h
> +++ b/drivers/iommu/arm-smmu.h
> @@ -450,6 +450,7 @@ static inline void arm_smmu_writeq(struct arm_smmu_device *smmu, int page,
>  	arm_smmu_writeq((s), ARM_SMMU_CB((s), (n)), (o), (v))
>  
>  struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu);
> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu);
>  struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu);
>  
>  int arm_mmu500_reset(struct arm_smmu_device *smmu);
> 

Cheers
Jon

-- 
nvpublic
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  2020-06-30  0:10 ` [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU Krishna Reddy
  2020-06-30  6:01   ` Pritesh Raithatha
@ 2020-06-30  8:21   ` Jon Hunter
  2020-06-30 12:27   ` Robin Murphy
  2 siblings, 0 replies; 38+ messages in thread
From: Jon Hunter @ 2020-06-30  8:21 UTC (permalink / raw)
  To: Krishna Reddy
  Cc: snikam, nicoleotsuka, mperttunen, bhuntsman, will, linux-kernel,
	praithatha, talho, iommu, nicolinc, linux-tegra, yhsu, treding,
	robin.murphy, linux-arm-kernel, bbiswas


On 30/06/2020 01:10, Krishna Reddy wrote:
> Add binding for NVIDIA's Tegra194 SoC SMMU topology that is based
> on ARM MMU-500.
> 
> Signed-off-by: Krishna Reddy <vdumpa@nvidia.com>
> ---
>  Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> index d7ceb4c34423b..5b2586ac715ed 100644
> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> @@ -38,6 +38,11 @@ properties:
>                - qcom,sc7180-smmu-500
>                - qcom,sdm845-smmu-500
>            - const: arm,mmu-500
> +      - description: NVIDIA SoCs that use more than one "arm,mmu-500"
> +        items:
> +          - enum:
> +              - nvdia,tegra194-smmu

s/nvdia/nvidia

Jon

-- 
nvpublic
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks
  2020-06-30  0:10 ` [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks Krishna Reddy
  2020-06-30  0:19   ` Nicolin Chen
  2020-06-30  5:58   ` Pritesh Raithatha
@ 2020-06-30  8:37   ` Jon Hunter
  2020-06-30 12:13     ` Robin Murphy
  2 siblings, 1 reply; 38+ messages in thread
From: Jon Hunter @ 2020-06-30  8:37 UTC (permalink / raw)
  To: Krishna Reddy
  Cc: snikam, nicoleotsuka, mperttunen, bhuntsman, will, linux-kernel,
	praithatha, talho, iommu, nicolinc, linux-tegra, yhsu, treding,
	robin.murphy, linux-arm-kernel, bbiswas


On 30/06/2020 01:10, Krishna Reddy wrote:
> Add global/context fault hooks to allow NVIDIA SMMU implementation
> handle faults across multiple SMMUs.

Nit ... this is not just for NVIDIA, but this allows anyone to add
custom global/context and fault hooks. So I think that the changelog
should be clear that this change permits custom fault hooks and that
custom fault hooks are needed for the Tegra194 SMMU. You may also want
to say why.

> 
> Signed-off-by: Krishna Reddy <vdumpa@nvidia.com>
> ---
>  drivers/iommu/arm-smmu-nvidia.c | 98 +++++++++++++++++++++++++++++++++
>  drivers/iommu/arm-smmu.c        | 17 +++++-
>  drivers/iommu/arm-smmu.h        |  3 +
>  3 files changed, 116 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
> index 1124f0ac1823a..c9423b4199c65 100644
> --- a/drivers/iommu/arm-smmu-nvidia.c
> +++ b/drivers/iommu/arm-smmu-nvidia.c
> @@ -147,6 +147,102 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
>  	return 0;
>  }
>  
> +static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
> +{
> +	return container_of(dom, struct arm_smmu_domain, domain);
> +}
> +
> +static irqreturn_t nvidia_smmu_global_fault_inst(int irq,
> +					       struct arm_smmu_device *smmu,
> +					       int inst)
> +{
> +	u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
> +	void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0);
> +
> +	gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
> +	if (!gfsr)
> +		return IRQ_NONE;
> +
> +	gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
> +	gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
> +	gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
> +
> +	dev_err_ratelimited(smmu->dev,
> +		"Unexpected global fault, this could be serious\n");
> +	dev_err_ratelimited(smmu->dev,
> +		"\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
> +		gfsr, gfsynr0, gfsynr1, gfsynr2);
> +
> +	writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
> +	return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev)
> +{
> +	int inst;

Should be unsigned

> +	irqreturn_t irq_ret = IRQ_NONE;
> +	struct arm_smmu_device *smmu = dev;
> +	struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
> +
> +	for (inst = 0; inst < nvidia_smmu->num_inst; inst++) {
> +		irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
> +		if (irq_ret == IRQ_HANDLED)
> +			return irq_ret;

Any chance there could be more than one SMMU faulting by the time we
service the interrupt?

> +	}
> +
> +	return irq_ret;
> +}
> +
> +static irqreturn_t nvidia_smmu_context_fault_bank(int irq,
> +					    struct arm_smmu_device *smmu,
> +					    int idx, int inst)
> +{
> +	u32 fsr, fsynr, cbfrsynra;
> +	unsigned long iova;
> +	void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1);
> +	void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + idx);
> +
> +	fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
> +	if (!(fsr & ARM_SMMU_FSR_FAULT))
> +		return IRQ_NONE;
> +
> +	fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
> +	iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
> +	cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx));
> +
> +	dev_err_ratelimited(smmu->dev,
> +	"Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
> +			    fsr, iova, fsynr, cbfrsynra, idx);
> +
> +	writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);
> +	return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev)
> +{
> +	int inst, idx;

Unsigned

> +	irqreturn_t irq_ret = IRQ_NONE;
> +	struct iommu_domain *domain = dev;
> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +
> +	for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
> +		/*
> +		 * Interrupt line shared between all context faults.
> +		 * Check for faults across all contexts.
> +		 */
> +		for (idx = 0; idx < smmu->num_context_banks; idx++) {
> +			irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
> +								 idx, inst);
> +
> +			if (irq_ret == IRQ_HANDLED)
> +				return irq_ret;

Any reason why we don't check all banks?

> +		}
> +	}
> +
> +	return irq_ret;
> +}
> +
>  static const struct arm_smmu_impl nvidia_smmu_impl = {
>  	.read_reg = nvidia_smmu_read_reg,
>  	.write_reg = nvidia_smmu_write_reg,
> @@ -154,6 +250,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
>  	.write_reg64 = nvidia_smmu_write_reg64,
>  	.reset = nvidia_smmu_reset,
>  	.tlb_sync = nvidia_smmu_tlb_sync,
> +	.global_fault = nvidia_smmu_global_fault,
> +	.context_fault = nvidia_smmu_context_fault,
>  };
>  
>  struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 243bc4cb2705b..3bb0aba15a356 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -673,6 +673,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
>  	enum io_pgtable_fmt fmt;
>  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>  	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> +	irqreturn_t (*context_fault)(int irq, void *dev);
>  
>  	mutex_lock(&smmu_domain->init_mutex);
>  	if (smmu_domain->smmu)
> @@ -835,7 +836,13 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
>  	 * handler seeing a half-initialised domain state.
>  	 */
>  	irq = smmu->irqs[smmu->num_global_irqs + cfg->irptndx];
> -	ret = devm_request_irq(smmu->dev, irq, arm_smmu_context_fault,
> +
> +	if (smmu->impl && smmu->impl->context_fault)
> +		context_fault = smmu->impl->context_fault;
> +	else
> +		context_fault = arm_smmu_context_fault;

Why not see the default smmu->impl->context_fault to
arm_smmu_context_fault in arm_smmu_impl_init() and then allow the
various implementations to override as necessary? Then you can get rid
of this context_fault variable here and just use
smmu->impl->context_fault below.

> +
> +	ret = devm_request_irq(smmu->dev, irq, context_fault,
>  			       IRQF_SHARED, "arm-smmu-context-fault", domain);
>  	if (ret < 0) {
>  		dev_err(smmu->dev, "failed to request context IRQ %d (%u)\n",
> @@ -2107,6 +2114,7 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>  	struct arm_smmu_device *smmu;
>  	struct device *dev = &pdev->dev;
>  	int num_irqs, i, err;
> +	irqreturn_t (*global_fault)(int irq, void *dev);
>  
>  	smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL);
>  	if (!smmu) {
> @@ -2193,9 +2201,14 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>  		smmu->num_context_irqs = smmu->num_context_banks;
>  	}
>  
> +	if (smmu->impl && smmu->impl->global_fault)
> +		global_fault = smmu->impl->global_fault;
> +	else
> +		global_fault = arm_smmu_global_fault;
> +

Same here.

Jon

-- 
nvpublic
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  2020-06-30  0:10 ` [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage Krishna Reddy
                     ` (2 preceding siblings ...)
  2020-06-30  8:19   ` Jon Hunter
@ 2020-06-30 10:17   ` Jon Hunter
  2020-06-30 16:23     ` Krishna Reddy
  3 siblings, 1 reply; 38+ messages in thread
From: Jon Hunter @ 2020-06-30 10:17 UTC (permalink / raw)
  To: Krishna Reddy
  Cc: snikam, nicoleotsuka, mperttunen, bhuntsman, will, linux-kernel,
	praithatha, talho, iommu, nicolinc, linux-tegra, yhsu, treding,
	robin.murphy, linux-arm-kernel, bbiswas


On 30/06/2020 01:10, Krishna Reddy wrote:
> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
> IOVA accesses across them.
> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
> string for Tegra194 SoC SMMU topology.
> 
> Signed-off-by: Krishna Reddy <vdumpa@nvidia.com>
> ---
>  MAINTAINERS                     |   2 +
>  drivers/iommu/Makefile          |   2 +-
>  drivers/iommu/arm-smmu-impl.c   |   3 +
>  drivers/iommu/arm-smmu-nvidia.c | 196 ++++++++++++++++++++++++++++++++
>  drivers/iommu/arm-smmu.h        |   1 +
>  5 files changed, 203 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/iommu/arm-smmu-nvidia.c

...

> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
> +{
> +	unsigned int i;
> +	struct nvidia_smmu *nvidia_smmu;
> +	struct platform_device *pdev = to_platform_device(smmu->dev);
> +
> +	nvidia_smmu = devm_kzalloc(smmu->dev, sizeof(*nvidia_smmu), GFP_KERNEL);
> +	if (!nvidia_smmu)
> +		return ERR_PTR(-ENOMEM);
> +
> +	nvidia_smmu->smmu = *smmu;
> +	/* Instance 0 is ioremapped by arm-smmu.c after this function returns */
> +	nvidia_smmu->num_inst = 1;
> +
> +	for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
> +		struct resource *res;
> +
> +		res = platform_get_resource(pdev, IORESOURCE_MEM, i);
> +		if (!res)
> +			break;

Currently this driver is only supported for Tegra194 which I understand
has 3 SMMUs. Therefore, I don't feel that we should fail silently here,
I think it is better to return an error if all 3 cannot be initialised.
In the future if there is an SoC that has less (hopefully not more) than
Tegra194 then we should handle this via the DT compatible string. In
other words, we should always know how many SMMUs there are for a given
SoC and how many we should initialise.

> +
> +		nvidia_smmu->bases[i] = devm_ioremap_resource(smmu->dev, res);
> +		if (IS_ERR(nvidia_smmu->bases[i]))
> +			return ERR_CAST(nvidia_smmu->bases[i]);

You want to use PTR_ERR() here.

Jon

-- 
nvpublic
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks
  2020-06-30  8:37   ` Jon Hunter
@ 2020-06-30 12:13     ` Robin Murphy
  2020-06-30 12:42       ` Jon Hunter
  2020-07-01 18:48       ` Krishna Reddy
  0 siblings, 2 replies; 38+ messages in thread
From: Robin Murphy @ 2020-06-30 12:13 UTC (permalink / raw)
  To: Jon Hunter, Krishna Reddy
  Cc: talho, treding, bhuntsman, linux-kernel, iommu, mperttunen,
	nicoleotsuka, snikam, nicolinc, linux-tegra, yhsu, praithatha,
	will, linux-arm-kernel, bbiswas

On 2020-06-30 09:37, Jon Hunter wrote:
> 
> On 30/06/2020 01:10, Krishna Reddy wrote:
>> Add global/context fault hooks to allow NVIDIA SMMU implementation
>> handle faults across multiple SMMUs.
> 
> Nit ... this is not just for NVIDIA, but this allows anyone to add
> custom global/context and fault hooks. So I think that the changelog
> should be clear that this change permits custom fault hooks and that
> custom fault hooks are needed for the Tegra194 SMMU. You may also want
> to say why.
> 
>>
>> Signed-off-by: Krishna Reddy <vdumpa@nvidia.com>
>> ---
>>   drivers/iommu/arm-smmu-nvidia.c | 98 +++++++++++++++++++++++++++++++++
>>   drivers/iommu/arm-smmu.c        | 17 +++++-
>>   drivers/iommu/arm-smmu.h        |  3 +
>>   3 files changed, 116 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
>> index 1124f0ac1823a..c9423b4199c65 100644
>> --- a/drivers/iommu/arm-smmu-nvidia.c
>> +++ b/drivers/iommu/arm-smmu-nvidia.c
>> @@ -147,6 +147,102 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
>>   	return 0;
>>   }
>>   
>> +static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
>> +{
>> +	return container_of(dom, struct arm_smmu_domain, domain);
>> +}
>> +
>> +static irqreturn_t nvidia_smmu_global_fault_inst(int irq,
>> +					       struct arm_smmu_device *smmu,
>> +					       int inst)
>> +{
>> +	u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
>> +	void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0);
>> +
>> +	gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
>> +	if (!gfsr)
>> +		return IRQ_NONE;
>> +
>> +	gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
>> +	gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
>> +	gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
>> +
>> +	dev_err_ratelimited(smmu->dev,
>> +		"Unexpected global fault, this could be serious\n");
>> +	dev_err_ratelimited(smmu->dev,
>> +		"\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
>> +		gfsr, gfsynr0, gfsynr1, gfsynr2);
>> +
>> +	writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
>> +	return IRQ_HANDLED;
>> +}
>> +
>> +static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev)
>> +{
>> +	int inst;
> 
> Should be unsigned
> 
>> +	irqreturn_t irq_ret = IRQ_NONE;
>> +	struct arm_smmu_device *smmu = dev;
>> +	struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
>> +
>> +	for (inst = 0; inst < nvidia_smmu->num_inst; inst++) {
>> +		irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
>> +		if (irq_ret == IRQ_HANDLED)
>> +			return irq_ret;
> 
> Any chance there could be more than one SMMU faulting by the time we
> service the interrupt?

It certainly seems plausible if the interconnect is automatically 
load-balancing requests across the SMMU instances - say a driver bug 
caused a buffer to be unmapped too early, there could be many in-flight 
accesses to parts of that buffer that aren't all taking the same path 
and thus could now fault in parallel.

[ And anyone inclined to nitpick global vs. context faults, s/unmap a 
buffer/tear down a domain/ ;) ]

Either way I think it would be easier to reason about if we just handled 
these like a typical shared interrupt and always checked all the instances.

>> +	}
>> +
>> +	return irq_ret;
>> +}
>> +
>> +static irqreturn_t nvidia_smmu_context_fault_bank(int irq,
>> +					    struct arm_smmu_device *smmu,
>> +					    int idx, int inst)
>> +{
>> +	u32 fsr, fsynr, cbfrsynra;
>> +	unsigned long iova;
>> +	void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1);
>> +	void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + idx);
>> +
>> +	fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
>> +	if (!(fsr & ARM_SMMU_FSR_FAULT))
>> +		return IRQ_NONE;
>> +
>> +	fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
>> +	iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
>> +	cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx));
>> +
>> +	dev_err_ratelimited(smmu->dev,
>> +	"Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
>> +			    fsr, iova, fsynr, cbfrsynra, idx);
>> +
>> +	writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);
>> +	return IRQ_HANDLED;
>> +}
>> +
>> +static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev)
>> +{
>> +	int inst, idx;
> 
> Unsigned
> 
>> +	irqreturn_t irq_ret = IRQ_NONE;
>> +	struct iommu_domain *domain = dev;
>> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
>> +
>> +	for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
>> +		/*
>> +		 * Interrupt line shared between all context faults.
>> +		 * Check for faults across all contexts.
>> +		 */
>> +		for (idx = 0; idx < smmu->num_context_banks; idx++) {
>> +			irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
>> +								 idx, inst);
>> +
>> +			if (irq_ret == IRQ_HANDLED)
>> +				return irq_ret;
> 
> Any reason why we don't check all banks?

As above, we certainly shouldn't bail out without checking the bank for 
the offending domain across all of its instances, and I guess the way 
this works means that we would have to iterate all the banks to achieve 
that.

>> +		}
>> +	}
>> +
>> +	return irq_ret;
>> +}
>> +
>>   static const struct arm_smmu_impl nvidia_smmu_impl = {
>>   	.read_reg = nvidia_smmu_read_reg,
>>   	.write_reg = nvidia_smmu_write_reg,
>> @@ -154,6 +250,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
>>   	.write_reg64 = nvidia_smmu_write_reg64,
>>   	.reset = nvidia_smmu_reset,
>>   	.tlb_sync = nvidia_smmu_tlb_sync,
>> +	.global_fault = nvidia_smmu_global_fault,
>> +	.context_fault = nvidia_smmu_context_fault,
>>   };
>>   
>>   struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index 243bc4cb2705b..3bb0aba15a356 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -673,6 +673,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
>>   	enum io_pgtable_fmt fmt;
>>   	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>   	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
>> +	irqreturn_t (*context_fault)(int irq, void *dev);
>>   
>>   	mutex_lock(&smmu_domain->init_mutex);
>>   	if (smmu_domain->smmu)
>> @@ -835,7 +836,13 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
>>   	 * handler seeing a half-initialised domain state.
>>   	 */
>>   	irq = smmu->irqs[smmu->num_global_irqs + cfg->irptndx];
>> -	ret = devm_request_irq(smmu->dev, irq, arm_smmu_context_fault,
>> +
>> +	if (smmu->impl && smmu->impl->context_fault)
>> +		context_fault = smmu->impl->context_fault;
>> +	else
>> +		context_fault = arm_smmu_context_fault;
> 
> Why not see the default smmu->impl->context_fault to
> arm_smmu_context_fault in arm_smmu_impl_init() and then allow the
> various implementations to override as necessary? Then you can get rid
> of this context_fault variable here and just use
> smmu->impl->context_fault below.

Because the default smmu->impl is NULL. And as I've said before, NAK to 
forcing the common case to allocate a set of "quirks" purely to override 
the default IRQ handler with the default IRQ handler ;)

Robin.

>> +
>> +	ret = devm_request_irq(smmu->dev, irq, context_fault,
>>   			       IRQF_SHARED, "arm-smmu-context-fault", domain);
>>   	if (ret < 0) {
>>   		dev_err(smmu->dev, "failed to request context IRQ %d (%u)\n",
>> @@ -2107,6 +2114,7 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>>   	struct arm_smmu_device *smmu;
>>   	struct device *dev = &pdev->dev;
>>   	int num_irqs, i, err;
>> +	irqreturn_t (*global_fault)(int irq, void *dev);
>>   
>>   	smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL);
>>   	if (!smmu) {
>> @@ -2193,9 +2201,14 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>>   		smmu->num_context_irqs = smmu->num_context_banks;
>>   	}
>>   
>> +	if (smmu->impl && smmu->impl->global_fault)
>> +		global_fault = smmu->impl->global_fault;
>> +	else
>> +		global_fault = arm_smmu_global_fault;
>> +
> 
> Same here.
> 
> Jon
> 
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  2020-06-30  0:10 ` [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU Krishna Reddy
  2020-06-30  6:01   ` Pritesh Raithatha
  2020-06-30  8:21   ` Jon Hunter
@ 2020-06-30 12:27   ` Robin Murphy
  2020-07-01 18:28     ` Krishna Reddy
  2 siblings, 1 reply; 38+ messages in thread
From: Robin Murphy @ 2020-06-30 12:27 UTC (permalink / raw)
  To: Krishna Reddy
  Cc: snikam, nicoleotsuka, mperttunen, bhuntsman, linux-kernel,
	praithatha, talho, iommu, nicolinc, linux-tegra, yhsu, treding,
	will, linux-arm-kernel, bbiswas

On 2020-06-30 01:10, Krishna Reddy wrote:
> Add binding for NVIDIA's Tegra194 SoC SMMU topology that is based
> on ARM MMU-500.
> 
> Signed-off-by: Krishna Reddy <vdumpa@nvidia.com>
> ---
>   Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> index d7ceb4c34423b..5b2586ac715ed 100644
> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> @@ -38,6 +38,11 @@ properties:
>                 - qcom,sc7180-smmu-500
>                 - qcom,sdm845-smmu-500
>             - const: arm,mmu-500
> +      - description: NVIDIA SoCs that use more than one "arm,mmu-500"

Hmm, there must be a better way to word that to express that it only 
applies to the sets of SMMUs that must be programmed identically, and 
not any other independent MMU-500s that might also happen to be in the 
same SoC.

> +        items:
> +          - enum:
> +              - nvdia,tegra194-smmu
> +          - const: arm,mmu-500

Is the fallback compatible appropriate here? If software treats this as 
a standard MMU-500 it will only program the first instance (because the 
second isn't presented as a separate MMU-500) - is there any way that 
isn't going to blow up?

Robin.

>         - items:
>             - const: arm,mmu-500
>             - const: arm,smmu-v2
> 
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks
  2020-06-30 12:13     ` Robin Murphy
@ 2020-06-30 12:42       ` Jon Hunter
  2020-07-01 18:48       ` Krishna Reddy
  1 sibling, 0 replies; 38+ messages in thread
From: Jon Hunter @ 2020-06-30 12:42 UTC (permalink / raw)
  To: Robin Murphy, Krishna Reddy
  Cc: talho, treding, bhuntsman, linux-kernel, iommu, mperttunen,
	nicoleotsuka, snikam, nicolinc, linux-tegra, yhsu, praithatha,
	will, linux-arm-kernel, bbiswas


On 30/06/2020 13:13, Robin Murphy wrote:
> On 2020-06-30 09:37, Jon Hunter wrote:
>>
>> On 30/06/2020 01:10, Krishna Reddy wrote:
>>> Add global/context fault hooks to allow NVIDIA SMMU implementation
>>> handle faults across multiple SMMUs.
>>
>> Nit ... this is not just for NVIDIA, but this allows anyone to add
>> custom global/context and fault hooks. So I think that the changelog
>> should be clear that this change permits custom fault hooks and that
>> custom fault hooks are needed for the Tegra194 SMMU. You may also want
>> to say why.
>>
>>>
>>> Signed-off-by: Krishna Reddy <vdumpa@nvidia.com>
>>> ---
>>>   drivers/iommu/arm-smmu-nvidia.c | 98 +++++++++++++++++++++++++++++++++
>>>   drivers/iommu/arm-smmu.c        | 17 +++++-
>>>   drivers/iommu/arm-smmu.h        |  3 +
>>>   3 files changed, 116 insertions(+), 2 deletions(-)

...

>>> @@ -835,7 +836,13 @@ static int arm_smmu_init_domain_context(struct
>>> iommu_domain *domain,
>>>        * handler seeing a half-initialised domain state.
>>>        */
>>>       irq = smmu->irqs[smmu->num_global_irqs + cfg->irptndx];
>>> -    ret = devm_request_irq(smmu->dev, irq, arm_smmu_context_fault,
>>> +
>>> +    if (smmu->impl && smmu->impl->context_fault)
>>> +        context_fault = smmu->impl->context_fault;
>>> +    else
>>> +        context_fault = arm_smmu_context_fault;
>>
>> Why not see the default smmu->impl->context_fault to
>> arm_smmu_context_fault in arm_smmu_impl_init() and then allow the
>> various implementations to override as necessary? Then you can get rid
>> of this context_fault variable here and just use
>> smmu->impl->context_fault below.
> 
> Because the default smmu->impl is NULL. And as I've said before, NAK to
> forcing the common case to allocate a set of "quirks" purely to override
> the default IRQ handler with the default IRQ handler ;)


Ah OK, makes sense. Sorry I am a bit late to the review :-)

Jon

-- 
nvpublic
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  2020-06-30  8:19   ` Jon Hunter
@ 2020-06-30 14:53     ` Robin Murphy
  2020-06-30 15:17       ` Jon Hunter
  2020-07-01 18:18       ` Krishna Reddy
  2020-06-30 17:04     ` Krishna Reddy
  1 sibling, 2 replies; 38+ messages in thread
From: Robin Murphy @ 2020-06-30 14:53 UTC (permalink / raw)
  To: Jon Hunter, Krishna Reddy
  Cc: talho, treding, bhuntsman, linux-kernel, iommu, mperttunen,
	nicoleotsuka, snikam, nicolinc, linux-tegra, yhsu, praithatha,
	will, linux-arm-kernel, bbiswas

On 2020-06-30 09:19, Jon Hunter wrote:
> 
> On 30/06/2020 01:10, Krishna Reddy wrote:
>> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
>> IOVA accesses across them.
>> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
>> string for Tegra194 SoC SMMU topology.
> 
> There is no description here of the 3rd SMMU that you mention below.
> I think that we should describe the full picture here.
>   
>> Signed-off-by: Krishna Reddy <vdumpa@nvidia.com>
>> ---
>>   MAINTAINERS                     |   2 +
>>   drivers/iommu/Makefile          |   2 +-
>>   drivers/iommu/arm-smmu-impl.c   |   3 +
>>   drivers/iommu/arm-smmu-nvidia.c | 196 ++++++++++++++++++++++++++++++++
>>   drivers/iommu/arm-smmu.h        |   1 +
>>   5 files changed, 203 insertions(+), 1 deletion(-)
>>   create mode 100644 drivers/iommu/arm-smmu-nvidia.c
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 7b5ffd646c6b9..64c37dbdd4426 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -16808,8 +16808,10 @@ F:	drivers/i2c/busses/i2c-tegra.c
>>   
>>   TEGRA IOMMU DRIVERS
>>   M:	Thierry Reding <thierry.reding@gmail.com>
>> +R:	Krishna Reddy <vdumpa@nvidia.com>
>>   L:	linux-tegra@vger.kernel.org
>>   S:	Supported
>> +F:	drivers/iommu/arm-smmu-nvidia.c
>>   F:	drivers/iommu/tegra*
>>   
>>   TEGRA KBC DRIVER
>> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
>> index 342190196dfb0..2b8203db73ec3 100644
>> --- a/drivers/iommu/Makefile
>> +++ b/drivers/iommu/Makefile
>> @@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
>>   obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
>>   obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
>>   obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
>> -arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
>> +arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
>>   obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
>>   obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
>>   obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
>> diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
>> index c75b9d957b702..70f7318017617 100644
>> --- a/drivers/iommu/arm-smmu-impl.c
>> +++ b/drivers/iommu/arm-smmu-impl.c
>> @@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
>>   	if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
>>   		smmu->impl = &calxeda_impl;
>>   
>> +	if (of_device_is_compatible(smmu->dev->of_node, "nvidia,tegra194-smmu"))

Nit: please use "np" like all the surrounding code does.

>> +		return nvidia_smmu_impl_init(smmu);
>> +
>>   	if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
>>   	    of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
>>   		return qcom_smmu_impl_init(smmu);
>> diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
>> new file mode 100644
>> index 0000000000000..1124f0ac1823a
>> --- /dev/null
>> +++ b/drivers/iommu/arm-smmu-nvidia.c
>> @@ -0,0 +1,196 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +// NVIDIA ARM SMMU v2 implementation quirks
>> +// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
>> +
>> +#include <linux/bitfield.h>
>> +#include <linux/delay.h>
>> +#include <linux/of.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/slab.h>
>> +
>> +#include "arm-smmu.h"
>> +
>> +/*
>> + * Tegra194 has three ARM MMU-500 Instances.
>> + * Two of them are used together for interleaved IOVA accesses and
>> + * used by non-isochronous HW devices for SMMU translations.
>> + * Third one is used for SMMU translations from isochronous HW devices.
>> + * It is possible to use this implementation to program either
>> + * all three or two of the instances identically as desired through
>> + * DT node.
>> + *
>> + * Programming all the three instances identically comes with redundant TLB
>> + * invalidations as all three never need to be TLB invalidated for a HW device.
>> + *
>> + * When Linux kernel supports multiple SMMU devices, the SMMU device used for
>> + * isochornous HW devices should be added as a separate ARM MMU-500 device
>> + * in DT and be programmed independently for efficient TLB invalidates.

I don't understand the "When" there - the driver has always supported 
multiple independent SMMUs, and it's not something that could be 
configured out or otherwise disabled. Plus I really don't see why you 
would ever want to force unrelated SMMUs to be programmed together - 
beyond the TLB thing mentioned it would also waste precious context bank 
resources and might lead to weird device grouping via false stream ID 
aliasing, with no obvious upside at all.

>> + */
>> +#define MAX_SMMU_INSTANCES 3
>> +
>> +#define TLB_LOOP_TIMEOUT_IN_US		1000000	/* 1s! */
>> +#define TLB_SPIN_COUNT			10
>> +
>> +struct nvidia_smmu {
>> +	struct arm_smmu_device	smmu;
>> +	unsigned int		num_inst;
>> +	void __iomem		*bases[MAX_SMMU_INSTANCES];
>> +};
>> +
>> +static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu)
>> +{
>> +	return container_of(smmu, struct nvidia_smmu, smmu);
>> +}
>> +
>> +static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
>> +			       unsigned int inst, int page)
> 
> If you run checkpatch --strict on these you will get a lot of ...
> 
> CHECK: Alignment should match open parenthesis
> #116: FILE: drivers/iommu/arm-smmu-nvidia.c:46:
> +static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
> +			       unsigned int inst, int page)
> 
> We should fix these.
> 
>> +{
>> +	struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
>> +
>> +	if (!nvidia_smmu->bases[0])
>> +		nvidia_smmu->bases[0] = smmu->base;
>> +
>> +	return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
>> +}
>> +
>> +static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu,
>> +			      int page, int offset)
>> +{
>> +	void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
>> +
>> +	return readl_relaxed(reg);
>> +}
>> +
>> +static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
>> +			    int page, int offset, u32 val)
>> +{
>> +	unsigned int i;
>> +	struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
>> +
>> +	for (i = 0; i < nvidia_smmu->num_inst; i++) {
>> +		void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset;
> 
> Personally, I would declare 'reg' outside of the loop as I feel it will make
> the code cleaner and easier to read.
> 
>> +
>> +		writel_relaxed(val, reg);
>> +	}
>> +}
>> +
>> +static u64 nvidia_smmu_read_reg64(struct arm_smmu_device *smmu,
>> +				int page, int offset)
>> +{
>> +	void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
>> +
>> +	return readq_relaxed(reg);
>> +}
>> +
>> +static void nvidia_smmu_write_reg64(struct arm_smmu_device *smmu,
>> +				  int page, int offset, u64 val)
>> +{
>> +	unsigned int i;
>> +	struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
>> +
>> +	for (i = 0; i < nvidia_smmu->num_inst; i++) {
>> +		void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset;
>> +
>> +		writeq_relaxed(val, reg);
>> +	}
>> +}
>> +
>> +static void nvidia_smmu_tlb_sync(struct arm_smmu_device *smmu, int page,
>> +			   int sync, int status)
>> +{
>> +	unsigned int delay;
>> +
>> +	arm_smmu_writel(smmu, page, sync, 0);
>> +
>> +	for (delay = 1; delay < TLB_LOOP_TIMEOUT_IN_US; delay *= 2) {
> 
> So we are doubling the delay every time? Is this better than just using
> the same on each loop?

This is the same logic as the main driver (see 8513c8930069) - the sync 
is expected to complete relatively quickly, hence why we have the inner 
spin loop to avoid the delay entirely in the typical case, and the 
longer it's taking, the more likely it is that something's wrong and it 
will never complete anyway. Realistically, a heavily loaded SMMU at a 
modest clock rate might take us through a couple of iterations of the 
outer loop, but beyond that we're pretty much just killing time until we 
declare it wedged and give up, and by then there's not much point in 
burning power frantically hamering on the interconnect.

> 
>> +		unsigned int spin_cnt;
>> +
>> +		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
>> +			u32 val = 0;
>> +			unsigned int i;
>> +			struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
> 
> Why not do this once at the beginning of the function?
> 
>> +
>> +			for (i = 0; i < nvidia_smmu->num_inst; i++) {
>> +				void __iomem *reg =
>> +					nvidia_smmu_page(smmu, i, page) + status;
>> +
>> +				val |= readl_relaxed(reg);
>> +			}
>> +
>> +			if (!(val & ARM_SMMU_sTLBGSTATUS_GSACTIVE))
>> +				return;
>> +
>> +			cpu_relax();
>> +		}
>> +
>> +		udelay(delay);
>> +	}
>> +
>> +	dev_err_ratelimited(smmu->dev,
>> +			    "TLB sync timed out -- SMMU may be deadlocked\n");
>> +}
>> +
>> +static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
>> +{
>> +	unsigned int i;
>> +
>> +	for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++) {
>> +		u32 val;
>> +		void __iomem *reg = nvidia_smmu_page(smmu, i, ARM_SMMU_GR0) +
>> +				    ARM_SMMU_GR0_sGFSR;
> 
> I feel that declaring variables here clutters the code.
>   
>> +
>> +		/* clear global FSR */
>> +		val = readl_relaxed(reg);
>> +		writel_relaxed(val, reg);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static const struct arm_smmu_impl nvidia_smmu_impl = {
>> +	.read_reg = nvidia_smmu_read_reg,
>> +	.write_reg = nvidia_smmu_write_reg,
>> +	.read_reg64 = nvidia_smmu_read_reg64,
>> +	.write_reg64 = nvidia_smmu_write_reg64,
>> +	.reset = nvidia_smmu_reset,
>> +	.tlb_sync = nvidia_smmu_tlb_sync,
>> +};
>> +
>> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
>> +{
>> +	unsigned int i;
>> +	struct nvidia_smmu *nvidia_smmu;
>> +	struct platform_device *pdev = to_platform_device(smmu->dev);
>> +
>> +	nvidia_smmu = devm_kzalloc(smmu->dev, sizeof(*nvidia_smmu), GFP_KERNEL);
>> +	if (!nvidia_smmu)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	nvidia_smmu->smmu = *smmu;
>> +	/* Instance 0 is ioremapped by arm-smmu.c after this function returns */
>> +	nvidia_smmu->num_inst = 1;
>> +
>> +	for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
>> +		struct resource *res;
>> +
>> +		res = platform_get_resource(pdev, IORESOURCE_MEM, i);
>> +		if (!res)
>> +			break;
>> +
>> +		nvidia_smmu->bases[i] = devm_ioremap_resource(smmu->dev, res);
>> +		if (IS_ERR(nvidia_smmu->bases[i]))
>> +			return ERR_CAST(nvidia_smmu->bases[i]);
>> +
>> +		nvidia_smmu->num_inst++;
>> +	}
>> +
>> +	nvidia_smmu->smmu.impl = &nvidia_smmu_impl;
>> +	/*
>> +	 * Free the arm_smmu_device struct allocated in arm-smmu.c.
>> +	 * Once this function returns, arm-smmu.c would use arm_smmu_device
>> +	 * allocated as part of nvidia_smmu struct.
>> +	 */
>> +	devm_kfree(smmu->dev, smmu);
> 
> Why don't we just store the pointer of the smmu struct passed to this function
> in the nvidia_smmu struct and then we do not need to free this here. In other
> words make ...
> 
>   struct nvidia_smmu {
> 	struct arm_smmu_device	*smmu;
> 	unsigned int		num_inst;
> 	void __iomem		*bases[MAX_SMMU_INSTANCES];
>   };
> 
> This seems more appropriate, than copying the struct and freeing memory
> allocated else-where.

But then how do you get back to struct nvidia_smmu given just a pointer 
to struct arm_smmu_device?

I'll admit my quickly-hacked-up design for post-hoc subclassing isn't 
the prettiest, but in fairness it was only ever intended for a couple of 
exceptional cases. I guess it would be equally possible to wrap struct 
arm_smmu_impl, and container_of() back to the private data from 
smmu->impl itself, which avoids the minor weirdness of reassigning the 
smmu pointer in the middle of probing, but then you'd have to store the 
function pointers in writeable memory, which in principle might make the 
integrity folks pull a sad face.

The aim was to avoid the non-architectural stuff cluttering up the main 
driver as far as possible, which is why I preferred a subclassing 
approach over cramming more random pointers into struct arm_smmu_device. 
Not to say it can't be changed if you've got a really solid idea for an 
alternative...

Robin.

>   
>> +
>> +	return &nvidia_smmu->smmu;
>> +}
>> diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
>> index d172c024be618..8cf1511ed9874 100644
>> --- a/drivers/iommu/arm-smmu.h
>> +++ b/drivers/iommu/arm-smmu.h
>> @@ -450,6 +450,7 @@ static inline void arm_smmu_writeq(struct arm_smmu_device *smmu, int page,
>>   	arm_smmu_writeq((s), ARM_SMMU_CB((s), (n)), (o), (v))
>>   
>>   struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu);
>> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu);
>>   struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu);
>>   
>>   int arm_mmu500_reset(struct arm_smmu_device *smmu);
>>
> 
> Cheers
> Jon
> 
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  2020-06-30 14:53     ` Robin Murphy
@ 2020-06-30 15:17       ` Jon Hunter
  2020-07-01 18:18       ` Krishna Reddy
  1 sibling, 0 replies; 38+ messages in thread
From: Jon Hunter @ 2020-06-30 15:17 UTC (permalink / raw)
  To: Robin Murphy, Krishna Reddy
  Cc: talho, treding, bhuntsman, linux-kernel, iommu, mperttunen,
	nicoleotsuka, snikam, nicolinc, linux-tegra, yhsu, praithatha,
	will, linux-arm-kernel, bbiswas


On 30/06/2020 15:53, Robin Murphy wrote:
> On 2020-06-30 09:19, Jon Hunter wrote:
>>
>> On 30/06/2020 01:10, Krishna Reddy wrote:
>>> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
>>> IOVA accesses across them.
>>> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
>>> string for Tegra194 SoC SMMU topology.
>>
>> There is no description here of the 3rd SMMU that you mention below.
>> I think that we should describe the full picture here.
>>  
>>> Signed-off-by: Krishna Reddy <vdumpa@nvidia.com>

...

>>> +static void nvidia_smmu_tlb_sync(struct arm_smmu_device *smmu, int
>>> page,
>>> +               int sync, int status)
>>> +{
>>> +    unsigned int delay;
>>> +
>>> +    arm_smmu_writel(smmu, page, sync, 0);
>>> +
>>> +    for (delay = 1; delay < TLB_LOOP_TIMEOUT_IN_US; delay *= 2) {
>>
>> So we are doubling the delay every time? Is this better than just using
>> the same on each loop?
> 
> This is the same logic as the main driver (see 8513c8930069) - the sync
> is expected to complete relatively quickly, hence why we have the inner
> spin loop to avoid the delay entirely in the typical case, and the
> longer it's taking, the more likely it is that something's wrong and it
> will never complete anyway. Realistically, a heavily loaded SMMU at a
> modest clock rate might take us through a couple of iterations of the
> outer loop, but beyond that we're pretty much just killing time until we
> declare it wedged and give up, and by then there's not much point in
> burning power frantically hamering on the interconnect.

Ah OK. Then maybe we should move the definitions for TLB_LOOP_TIMEOUT
and TLB_SPIN_COUNT into the arm-smmu.h so that we can use them directly
in this file instead of redefining them. Then it maybe clear that these
are part of the main driver.

 >>> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device
>>> *smmu)
>>> +{
>>> +    unsigned int i;
>>> +    struct nvidia_smmu *nvidia_smmu;
>>> +    struct platform_device *pdev = to_platform_device(smmu->dev);
>>> +
>>> +    nvidia_smmu = devm_kzalloc(smmu->dev, sizeof(*nvidia_smmu),
>>> GFP_KERNEL);
>>> +    if (!nvidia_smmu)
>>> +        return ERR_PTR(-ENOMEM);
>>> +
>>> +    nvidia_smmu->smmu = *smmu;
>>> +    /* Instance 0 is ioremapped by arm-smmu.c after this function
>>> returns */
>>> +    nvidia_smmu->num_inst = 1;
>>> +
>>> +    for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
>>> +        struct resource *res;
>>> +
>>> +        res = platform_get_resource(pdev, IORESOURCE_MEM, i);
>>> +        if (!res)
>>> +            break;
>>> +
>>> +        nvidia_smmu->bases[i] = devm_ioremap_resource(smmu->dev, res);
>>> +        if (IS_ERR(nvidia_smmu->bases[i]))
>>> +            return ERR_CAST(nvidia_smmu->bases[i]);
>>> +
>>> +        nvidia_smmu->num_inst++;
>>> +    }
>>> +
>>> +    nvidia_smmu->smmu.impl = &nvidia_smmu_impl;
>>> +    /*
>>> +     * Free the arm_smmu_device struct allocated in arm-smmu.c.
>>> +     * Once this function returns, arm-smmu.c would use arm_smmu_device
>>> +     * allocated as part of nvidia_smmu struct.
>>> +     */
>>> +    devm_kfree(smmu->dev, smmu);
>>
>> Why don't we just store the pointer of the smmu struct passed to this
>> function
>> in the nvidia_smmu struct and then we do not need to free this here.
>> In other
>> words make ...
>>
>>   struct nvidia_smmu {
>>     struct arm_smmu_device    *smmu;
>>     unsigned int        num_inst;
>>     void __iomem        *bases[MAX_SMMU_INSTANCES];
>>   };
>>
>> This seems more appropriate, than copying the struct and freeing memory
>> allocated else-where.
> 
> But then how do you get back to struct nvidia_smmu given just a pointer
> to struct arm_smmu_device?

Ah yes of course that is what I was missing. I wondered what was going
on here. So I think we should add a nice comment in the above function
of why we are copying this and cannot simply store the pointer.

Cheers
Jon

-- 
nvpublic
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  2020-06-30 10:17   ` Jon Hunter
@ 2020-06-30 16:23     ` Krishna Reddy
  2020-06-30 16:32       ` Jon Hunter
  0 siblings, 1 reply; 38+ messages in thread
From: Krishna Reddy @ 2020-06-30 16:23 UTC (permalink / raw)
  To: Jonathan Hunter
  Cc: Sachin Nikam, nicoleotsuka, Mikko Perttunen, Bryan Huntsman,
	will, linux-kernel, Pritesh Raithatha, Timo Alho, iommu,
	Nicolin Chen, linux-tegra, Yu-Huan Hsu, Thierry Reding,
	robin.murphy, linux-arm-kernel, Bitan Biswas

>> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device 
>> +*smmu) {
>> +	unsigned int i;
....
>> +	for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
>> +		struct resource *res;
>> +
>> +		res = platform_get_resource(pdev, IORESOURCE_MEM, i);
>> +		if (!res)
>> +			break;

>Currently this driver is only supported for Tegra194 which I understand has 3 SMMUs. Therefore, I don't feel that we should fail silently here, I think it is better to return an error if all 3 cannot be initialised.

Initialization of all the three SMMU instances is not necessary here.
The driver can work with all the possible number of instances 1, 2 and 3 based on the DT config though it doesn't make much sense to use it with 1 instance.
There is no silent failure here from driver point of view. If there is misconfig in DT, SMMU faults would catch issues.

>> +		nvidia_smmu->bases[i] = devm_ioremap_resource(smmu->dev, res);
>> +		if (IS_ERR(nvidia_smmu->bases[i]))
>> +			return ERR_CAST(nvidia_smmu->bases[i]);

>You want to use PTR_ERR() here.

PTR_ERR() returns long integer. 
This function returns a pointer. ERR_CAST is the right one to use here. 


--
nvpublic
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  2020-06-30 16:23     ` Krishna Reddy
@ 2020-06-30 16:32       ` Jon Hunter
  2020-06-30 16:44         ` Jon Hunter
  0 siblings, 1 reply; 38+ messages in thread
From: Jon Hunter @ 2020-06-30 16:32 UTC (permalink / raw)
  To: Krishna Reddy
  Cc: Sachin Nikam, nicoleotsuka, Mikko Perttunen, Bryan Huntsman,
	will, linux-kernel, Pritesh Raithatha, Timo Alho, iommu,
	Nicolin Chen, linux-tegra, Yu-Huan Hsu, Thierry Reding,
	robin.murphy, linux-arm-kernel, Bitan Biswas



On 30/06/2020 17:23, Krishna Reddy wrote:
>>> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device 
>>> +*smmu) {
>>> +	unsigned int i;
> ....
>>> +	for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
>>> +		struct resource *res;
>>> +
>>> +		res = platform_get_resource(pdev, IORESOURCE_MEM, i);
>>> +		if (!res)
>>> +			break;
> 
>> Currently this driver is only supported for Tegra194 which I understand has 3 SMMUs. Therefore, I don't feel that we should fail silently here, I think it is better to return an error if all 3 cannot be initialised.
> 
> Initialization of all the three SMMU instances is not necessary here.

That is not what I am saying.

> The driver can work with all the possible number of instances 1, 2 and 3 based on the DT config though it doesn't make much sense to use it with 1 instance.
> There is no silent failure here from driver point of view. If there is misconfig in DT, SMMU faults would catch issues.

I disagree and you should return a proper error here.

>>> +		nvidia_smmu->bases[i] = devm_ioremap_resource(smmu->dev, res);
>>> +		if (IS_ERR(nvidia_smmu->bases[i]))
>>> +			return ERR_CAST(nvidia_smmu->bases[i]);
> 
>> You want to use PTR_ERR() here.
> 
> PTR_ERR() returns long integer. 
> This function returns a pointer. ERR_CAST is the right one to use here. 

Ah yes, indeed. OK that's fine.

Jon

-- 
nvpublic
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  2020-06-30 16:32       ` Jon Hunter
@ 2020-06-30 16:44         ` Jon Hunter
  2020-06-30 17:16           ` Krishna Reddy
  0 siblings, 1 reply; 38+ messages in thread
From: Jon Hunter @ 2020-06-30 16:44 UTC (permalink / raw)
  To: Krishna Reddy
  Cc: Sachin Nikam, nicoleotsuka, Mikko Perttunen, Bryan Huntsman,
	will, linux-kernel, Pritesh Raithatha, Timo Alho, iommu,
	Nicolin Chen, linux-tegra, Yu-Huan Hsu, Thierry Reding,
	robin.murphy, linux-arm-kernel, Bitan Biswas


On 30/06/2020 17:32, Jon Hunter wrote:
> On 30/06/2020 17:23, Krishna Reddy wrote:
>>>> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device 
>>>> +*smmu) {
>>>> +	unsigned int i;
>> ....
>>>> +	for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
>>>> +		struct resource *res;
>>>> +
>>>> +		res = platform_get_resource(pdev, IORESOURCE_MEM, i);
>>>> +		if (!res)
>>>> +			break;
>>
>>> Currently this driver is only supported for Tegra194 which I understand has 3 SMMUs. Therefore, I don't feel that we should fail silently here, I think it is better to return an error if all 3 cannot be initialised.
>>
>> Initialization of all the three SMMU instances is not necessary here.
> 
> That is not what I am saying.
> 
>> The driver can work with all the possible number of instances 1, 2 and 3 based on the DT config though it doesn't make much sense to use it with 1 instance.
>> There is no silent failure here from driver point of view. If there is misconfig in DT, SMMU faults would catch issues.
> 
> I disagree and you should return a proper error here.

OK, well I see what you are saying, but if we intended to support all 3
for Tegra194, then we should ensure all 3 are initialised correctly.

My concern here is testing, because when things break in upstream I am
usually the one that tracks it down. Not having clear warning/error
messages when something is not initialised as expected makes it harder.

It would be better to query the number of SMMUs populated in device-tree
and then ensure that all are initialised correctly.

Jon

-- 
nvpublic
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  2020-06-30  8:19   ` Jon Hunter
  2020-06-30 14:53     ` Robin Murphy
@ 2020-06-30 17:04     ` Krishna Reddy
  1 sibling, 0 replies; 38+ messages in thread
From: Krishna Reddy @ 2020-06-30 17:04 UTC (permalink / raw)
  To: Jonathan Hunter
  Cc: Sachin Nikam, nicoleotsuka, Mikko Perttunen, Bryan Huntsman,
	will, linux-kernel, Pritesh Raithatha, Timo Alho, iommu,
	Nicolin Chen, linux-tegra, Yu-Huan Hsu, Thierry Reding,
	robin.murphy, linux-arm-kernel, Bitan Biswas

>> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave 
>> IOVA accesses across them.
>> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible 
>> string for Tegra194 SoC SMMU topology.

>There is no description here of the 3rd SMMU that you mention below.
>I think that we should describe the full picture here.

This driver is primarily for dual SMMU config.  So, It is avoided in the commit message.
However, Implementation supports option to configure 3 instances identically with one SMMU DT node
and is documented in the implementation.

>> +
>> +static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
>> +			       unsigned int inst, int page)

>If you run checkpatch --strict on these you will get a lot of ...

>CHECK: Alignment should match open parenthesis
>#116: FILE: drivers/iommu/arm-smmu-nvidia.c:46:
>+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
>+			       unsigned int inst, int page)

>We should fix these.

I will fix these if I need to push a new patch set.

>> +static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
>> +			    int page, int offset, u32 val) {
>> +	unsigned int i;
>> +	struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
>> +
>> +	for (i = 0; i < nvidia_smmu->num_inst; i++) {
>> +		void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset;

>Personally, I would declare 'reg' outside of the loop as I feel it will make the code cleaner and easier to read.

It was like that before and is updated to its current form to limit the scope of variables as per Thierry's comments in v6. 
We can just leave it as it is as there is no technical issue here.

-KR

--
nvpublic
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  2020-06-30 16:44         ` Jon Hunter
@ 2020-06-30 17:16           ` Krishna Reddy
  2020-06-30 19:03             ` Jon Hunter
  0 siblings, 1 reply; 38+ messages in thread
From: Krishna Reddy @ 2020-06-30 17:16 UTC (permalink / raw)
  To: Jonathan Hunter
  Cc: Sachin Nikam, nicoleotsuka, Mikko Perttunen, Bryan Huntsman,
	will, linux-kernel, Pritesh Raithatha, Timo Alho, iommu,
	Nicolin Chen, linux-tegra, Yu-Huan Hsu, Thierry Reding,
	robin.murphy, linux-arm-kernel, Bitan Biswas

>OK, well I see what you are saying, but if we intended to support all 3 for Tegra194, then we should ensure all 3 are initialised correctly.

The driver intend to support up to 3 instances. It doesn't really mandate that all three instances be present in same DT node.
Each mmio aperture in "reg" property is an instance here. reg = <inst0_base, size>, <inst1_base, size>, <inst2_base, size>;
The reg can have all three or less and driver just configures based on reg and it works fine.

>It would be better to query the number of SMMUs populated in device-tree and then ensure that all are initialised correctly.

Getting the IORESOURCE_MEM is the way to count the instances driver need to support.  
In a way, It is already querying through IORESOURCE_MEM here. 


-KR

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  2020-06-30 17:16           ` Krishna Reddy
@ 2020-06-30 19:03             ` Jon Hunter
  2020-06-30 20:21               ` Krishna Reddy
  0 siblings, 1 reply; 38+ messages in thread
From: Jon Hunter @ 2020-06-30 19:03 UTC (permalink / raw)
  To: Krishna Reddy
  Cc: Sachin Nikam, nicoleotsuka, Mikko Perttunen, Bryan Huntsman,
	will, linux-kernel, Pritesh Raithatha, Timo Alho, iommu,
	Nicolin Chen, linux-tegra, Yu-Huan Hsu, Thierry Reding,
	robin.murphy, linux-arm-kernel, Bitan Biswas


On 30/06/2020 18:16, Krishna Reddy wrote:
>> OK, well I see what you are saying, but if we intended to support all 3 for Tegra194, then we should ensure all 3 are initialised correctly.
> 
> The driver intend to support up to 3 instances. It doesn't really mandate that all three instances be present in same DT node.
> Each mmio aperture in "reg" property is an instance here. reg = <inst0_base, size>, <inst1_base, size>, <inst2_base, size>;
> The reg can have all three or less and driver just configures based on reg and it works fine.

So it sounds like we need at least 2 SMMUs (for non-iso and iso) but we
have up to 3 (for Tegra194). So the question is do we have a use-case
where we only use 2 and not 3? If not, then it still seems that we
should require that all 3 are present.

The other problem I see here is that currently the arm-smmu binding
defines the 'reg' with a 'maxItems' of 1, whereas we have 3. I believe
that this will get caught by the 'dt_binding_check' when we try to
populate the binding.

>> It would be better to query the number of SMMUs populated in device-tree and then ensure that all are initialised correctly.
> 
> Getting the IORESOURCE_MEM is the way to count the instances driver need to support.  
> In a way, It is already querying through IORESOURCE_MEM here. 

Yes I was wondering that. I think we just need to decide if the 3rd SMMU
is optional or not. The DT binding should detail and min and max supported.	

Jon

-- 
nvpublic
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  2020-06-30 19:03             ` Jon Hunter
@ 2020-06-30 20:21               ` Krishna Reddy
  0 siblings, 0 replies; 38+ messages in thread
From: Krishna Reddy @ 2020-06-30 20:21 UTC (permalink / raw)
  To: Jonathan Hunter
  Cc: Sachin Nikam, nicoleotsuka, Mikko Perttunen, Bryan Huntsman,
	will, linux-kernel, Pritesh Raithatha, Timo Alho, iommu,
	Nicolin Chen, linux-tegra, Yu-Huan Hsu, Thierry Reding,
	robin.murphy, linux-arm-kernel, Bitan Biswas

>> The driver intend to support up to 3 instances. It doesn't really mandate that all three instances be present in same DT node.
>> Each mmio aperture in "reg" property is an instance here. reg = 
>> <inst0_base, size>, <inst1_base, size>, <inst2_base, size>; The reg can have all three or less and driver just configures based on reg and it works fine.

>So it sounds like we need at least 2 SMMUs (for non-iso and iso) but we have up to 3 (for Tegra194). So the question is do we have a use-case where we only use 2 and not 3? If not, then it still seems that we should require that all 3 are present.

It can be either 2 SMMUs (for non-iso) or 3 SMMUs (for non-iso and iso).  Let me fail the one instance case as it can use regular arm smmu implementation and don't  need nvidia implementation explicitly.
 
>The other problem I see here is that currently the arm-smmu binding defines the 'reg' with a 'maxItems' of 1, whereas we have 3. I believe that this will get caught by the 'dt_binding_check' when we try to populate the binding.

Thanks for pointing it out! Will update the binding doc.

-KR

--
nvpublic
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  2020-06-30 14:53     ` Robin Murphy
  2020-06-30 15:17       ` Jon Hunter
@ 2020-07-01 18:18       ` Krishna Reddy
  2020-07-01 18:56         ` Robin Murphy
  1 sibling, 1 reply; 38+ messages in thread
From: Krishna Reddy @ 2020-07-01 18:18 UTC (permalink / raw)
  To: Robin Murphy, Jonathan Hunter
  Cc: Timo Alho, Thierry Reding, Bryan Huntsman, linux-kernel, iommu,
	Mikko Perttunen, nicoleotsuka, Sachin Nikam, Nicolin Chen,
	linux-tegra, Yu-Huan Hsu, Pritesh Raithatha, will,
	linux-arm-kernel, Bitan Biswas

>> + * When Linux kernel supports multiple SMMU devices, the SMMU device 
>> +used for
>> + * isochornous HW devices should be added as a separate ARM MMU-500 
>> +device
>> + * in DT and be programmed independently for efficient TLB invalidates.

>I don't understand the "When" there - the driver has always supported multiple independent SMMUs, and it's not something that could be configured out or otherwise disabled. Plus I really don't see why you would ever want to force unrelated SMMUs to be >programmed together - beyond the TLB thing mentioned it would also waste precious context bank resources and might lead to weird device grouping via false stream ID aliasing, with no obvious upside at all.

Sorry, I missed this comment.
During the initial patches, when the iommu_ops were different between, support multiple SMMU drivers at the same is not possible as one of them(that gets probed last) overwrites the platform bus ops. 
On revisiting the original issue, This problem is no longer relevant. At this point, It makes more sense to just get rid of 3rd instance programming in arm-smmu-nvidia.c and just limit it to the SMMU instances that need identical programming.

-KR



_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  2020-06-30 12:27   ` Robin Murphy
@ 2020-07-01 18:28     ` Krishna Reddy
  2020-07-01 18:47       ` Jon Hunter
  0 siblings, 1 reply; 38+ messages in thread
From: Krishna Reddy @ 2020-07-01 18:28 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Sachin Nikam, nicoleotsuka, Mikko Perttunen, Bryan Huntsman,
	linux-kernel, Pritesh Raithatha, Timo Alho, iommu, Nicolin Chen,
	linux-tegra, Yu-Huan Hsu, Thierry Reding, will, linux-arm-kernel,
	Bitan Biswas

>> +      - description: NVIDIA SoCs that use more than one "arm,mmu-500"
> Hmm, there must be a better way to word that to express that it only applies to the sets of SMMUs that must be programmed identically, and not any other independent MMU-500s that might also happen to be in the same SoC.

Let me reword it to "NVIDIA SoCs that must program multiple MMU-500s identically".

>> +        items:
>> +          - enum:
>> +              - nvdia,tegra194-smmu
>> +          - const: arm,mmu-500

>Is the fallback compatible appropriate here? If software treats this as a standard MMU-500 it will only program the first instance (because the second isn't presented as a separate MMU-500) - is there any way that isn't going to blow up?

When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, implementation override ensure that both instances are programmed. Isn't it? I am not sure I follow your comment fully.

-KR

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  2020-07-01 18:28     ` Krishna Reddy
@ 2020-07-01 18:47       ` Jon Hunter
  2020-07-01 19:00         ` Krishna Reddy
  2020-07-01 19:03         ` Robin Murphy
  0 siblings, 2 replies; 38+ messages in thread
From: Jon Hunter @ 2020-07-01 18:47 UTC (permalink / raw)
  To: Krishna Reddy, Robin Murphy
  Cc: Sachin Nikam, nicoleotsuka, Mikko Perttunen, Bryan Huntsman,
	linux-kernel, Pritesh Raithatha, Timo Alho, iommu, Nicolin Chen,
	linux-tegra, Yu-Huan Hsu, Thierry Reding, will, linux-arm-kernel,
	Bitan Biswas


On 01/07/2020 19:28, Krishna Reddy wrote:
>>> +      - description: NVIDIA SoCs that use more than one "arm,mmu-500"
>> Hmm, there must be a better way to word that to express that it only applies to the sets of SMMUs that must be programmed identically, and not any other independent MMU-500s that might also happen to be in the same SoC.
> 
> Let me reword it to "NVIDIA SoCs that must program multiple MMU-500s identically".
> 
>>> +        items:
>>> +          - enum:
>>> +              - nvdia,tegra194-smmu
>>> +          - const: arm,mmu-500
> 
>> Is the fallback compatible appropriate here? If software treats this as a standard MMU-500 it will only program the first instance (because the second isn't presented as a separate MMU-500) - is there any way that isn't going to blow up?
> 
> When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, implementation override ensure that both instances are programmed. Isn't it? I am not sure I follow your comment fully.

The problem is, if for some reason someone had a Tegra194, but only set
the compatible string to 'arm,mmu-500' it would assume that it was a
normal arm,mmu-500 and only one instance would be programmed. We always
want at least 2 of the 3 instances programmed and so we should only
match 'nvidia,tegra194-smmu'. In fact, I think that we also need to
update the arm_smmu_of_match table to add 'nvidia,tegra194-smmu' with
the data set to &arm_mmu500.

Jon

-- 
nvpublic
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks
  2020-06-30 12:13     ` Robin Murphy
  2020-06-30 12:42       ` Jon Hunter
@ 2020-07-01 18:48       ` Krishna Reddy
  2020-07-01 19:14         ` Robin Murphy
  1 sibling, 1 reply; 38+ messages in thread
From: Krishna Reddy @ 2020-07-01 18:48 UTC (permalink / raw)
  To: Robin Murphy, Jonathan Hunter
  Cc: Timo Alho, Thierry Reding, Bryan Huntsman, linux-kernel, iommu,
	Mikko Perttunen, nicoleotsuka, Sachin Nikam, Nicolin Chen,
	linux-tegra, Yu-Huan Hsu, Pritesh Raithatha, will,
	linux-arm-kernel, Bitan Biswas

>>> +    for (inst = 0; inst < nvidia_smmu->num_inst; inst++) {
>>> +            irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
>>> +            if (irq_ret == IRQ_HANDLED)
>>> +                    return irq_ret;
>>
>> Any chance there could be more than one SMMU faulting by the time we 
>> service the interrupt?

>It certainly seems plausible if the interconnect is automatically load-balancing requests across the SMMU instances - say a driver bug caused a buffer to be unmapped too early, there could be many in-flight accesses to parts of that buffer that aren't all taking the same path and thus could now fault in parallel.
>[ And anyone inclined to nitpick global vs. context faults, s/unmap a buffer/tear down a domain/ ;) ]
>Either way I think it would be easier to reason about if we just handled these like a typical shared interrupt and always checked all the instances.

It would be optimal to check at the same time across all instances. 

>>> +            for (idx = 0; idx < smmu->num_context_banks; idx++) {
>>> +                    irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
>>> +                                                             idx, 
>>> + inst);
>>> +
>>> +                    if (irq_ret == IRQ_HANDLED)
>>> +                            return irq_ret;
>>
>> Any reason why we don't check all banks?

>As above, we certainly shouldn't bail out without checking the bank for the offending domain across all of its instances, and I guess the way this works means that we would have to iterate all the banks to achieve that.

With shared irq line, the context fault identification is not optimal already.  Reading all the context banks all the time can be additional mmio read overhead. But, it may not hurt the real use cases as these happen only when there are bugs. 

-KR
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  2020-07-01 18:18       ` Krishna Reddy
@ 2020-07-01 18:56         ` Robin Murphy
  2020-07-01 19:12           ` Krishna Reddy
  0 siblings, 1 reply; 38+ messages in thread
From: Robin Murphy @ 2020-07-01 18:56 UTC (permalink / raw)
  To: Krishna Reddy, Jonathan Hunter
  Cc: Timo Alho, Thierry Reding, Bryan Huntsman, linux-kernel, iommu,
	Mikko Perttunen, nicoleotsuka, Sachin Nikam, Nicolin Chen,
	linux-tegra, Yu-Huan Hsu, Pritesh Raithatha, will,
	linux-arm-kernel, Bitan Biswas

On 2020-07-01 19:18, Krishna Reddy wrote:
>>> + * When Linux kernel supports multiple SMMU devices, the SMMU
>>> device +used for + * isochornous HW devices should be added as a
>>> separate ARM MMU-500 +device + * in DT and be programmed
>>> independently for efficient TLB invalidates.
> 
>> I don't understand the "When" there - the driver has always
>> supported multiple independent SMMUs, and it's not something that
>> could be configured out or otherwise disabled. Plus I really don't
>> see why you would ever want to force unrelated SMMUs to be
>> >programmed together - beyond the TLB thing mentioned it would also
>> waste precious context bank resources and might lead to weird
>> device grouping via false stream ID aliasing, with no obvious
>> upside at all.
> 
> Sorry, I missed this comment. During the initial patches, when the
> iommu_ops were different between, support multiple SMMU drivers at
> the same is not possible as one of them(that gets probed last)
> overwrites the platform bus ops. On revisiting the original issue,
> This problem is no longer relevant. At this point, It makes more
> sense to just get rid of 3rd instance programming in
> arm-smmu-nvidia.c and just limit it to the SMMU instances that need
> identical programming.

Yeah, I realised later last night that this probably originated from 
forking the whole driver downstream. But even then you could have 
treated the other one as a separate nsmmu with a single instance ;)

Since it does add a bit of confusion to the code and comments, let's 
just keep things simple. I do like Jon's suggestion of actually 
enforcing that the number of "reg" regions exactly matches the number 
expected for the given compatible - I guess for now that means just 
hard-coding 2 and hoping the hardware folks don't cook up any more of 
these...

Robin.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  2020-07-01 18:47       ` Jon Hunter
@ 2020-07-01 19:00         ` Krishna Reddy
  2020-07-01 19:31           ` Jon Hunter
  2020-07-01 19:03         ` Robin Murphy
  1 sibling, 1 reply; 38+ messages in thread
From: Krishna Reddy @ 2020-07-01 19:00 UTC (permalink / raw)
  To: Jonathan Hunter, Robin Murphy
  Cc: Sachin Nikam, nicoleotsuka, Mikko Perttunen, Bryan Huntsman,
	linux-kernel, Pritesh Raithatha, Timo Alho, iommu, Nicolin Chen,
	linux-tegra, Yu-Huan Hsu, Thierry Reding, will, linux-arm-kernel,
	Bitan Biswas

>>>> +        items:
>>>> +          - enum:
>>>> +              - nvdia,tegra194-smmu
>>>> +          - const: arm,mmu-500
> >
>>> Is the fallback compatible appropriate here? If software treats this as a standard MMU-500 it will only program the first instance (because the second isn't presented as a separate MMU-500) - is there any way that isn't going to blow up?
> >
>> When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, implementation override ensure that both instances are programmed. Isn't it? I am not sure I follow your comment fully.

>The problem is, if for some reason someone had a Tegra194, but only set the compatible string to 'arm,mmu-500' it would assume that it was a normal arm,mmu-500 and only one instance would be programmed. We always want at least 2 of the 3 instances >programmed and so we should only match 'nvidia,tegra194-smmu'. In fact, I think that we also need to update the arm_smmu_of_match table to add 'nvidia,tegra194-smmu' with the data set to &arm_mmu500.

In that case, new binding "nvidia,smmu-v2" can be added with data set to &arm_mmu500 and enumeration would have nvidia,tegra194-smmu and another variant for next generation SoC in future. 

-KR
--
nvpublic
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  2020-07-01 18:47       ` Jon Hunter
  2020-07-01 19:00         ` Krishna Reddy
@ 2020-07-01 19:03         ` Robin Murphy
  1 sibling, 0 replies; 38+ messages in thread
From: Robin Murphy @ 2020-07-01 19:03 UTC (permalink / raw)
  To: Jon Hunter, Krishna Reddy
  Cc: Sachin Nikam, nicoleotsuka, Mikko Perttunen, Bryan Huntsman,
	linux-kernel, Pritesh Raithatha, Timo Alho, iommu, Nicolin Chen,
	linux-tegra, Yu-Huan Hsu, Thierry Reding, will, linux-arm-kernel,
	Bitan Biswas

On 2020-07-01 19:47, Jon Hunter wrote:
> 
> On 01/07/2020 19:28, Krishna Reddy wrote:
>>>> +      - description: NVIDIA SoCs that use more than one "arm,mmu-500"
>>> Hmm, there must be a better way to word that to express that it only applies to the sets of SMMUs that must be programmed identically, and not any other independent MMU-500s that might also happen to be in the same SoC.
>>
>> Let me reword it to "NVIDIA SoCs that must program multiple MMU-500s identically".
>>
>>>> +        items:
>>>> +          - enum:
>>>> +              - nvdia,tegra194-smmu
>>>> +          - const: arm,mmu-500
>>
>>> Is the fallback compatible appropriate here? If software treats this as a standard MMU-500 it will only program the first instance (because the second isn't presented as a separate MMU-500) - is there any way that isn't going to blow up?
>>
>> When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, implementation override ensure that both instances are programmed. Isn't it? I am not sure I follow your comment fully.
> 
> The problem is, if for some reason someone had a Tegra194, but only set
> the compatible string to 'arm,mmu-500' it would assume that it was a
> normal arm,mmu-500 and only one instance would be programmed. We always
> want at least 2 of the 3 instances programmed and so we should only
> match 'nvidia,tegra194-smmu'. In fact, I think that we also need to
> update the arm_smmu_of_match table to add 'nvidia,tegra194-smmu' with
> the data set to &arm_mmu500.

Right, the main concern is if the shiny new DT gets passed to an older 
kernel (or other OS entirely) which doesn't know the 
"nvdia,tegra194-smmu" compatible but would match on "arm,mmu-500" and 
bind a standard driver thinking it's going to work OK. The user would 
probably prefer that no driver matched and both instances were left 
turned off, than face the fallout of only one of them being set up.

Robin.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  2020-07-01 18:56         ` Robin Murphy
@ 2020-07-01 19:12           ` Krishna Reddy
  0 siblings, 0 replies; 38+ messages in thread
From: Krishna Reddy @ 2020-07-01 19:12 UTC (permalink / raw)
  To: Robin Murphy, Jonathan Hunter
  Cc: Timo Alho, Thierry Reding, Bryan Huntsman, linux-kernel, iommu,
	Mikko Perttunen, nicoleotsuka, Sachin Nikam, Nicolin Chen,
	linux-tegra, Yu-Huan Hsu, Pritesh Raithatha, will,
	linux-arm-kernel, Bitan Biswas

>Yeah, I realised later last night that this probably originated from forking the whole driver downstream. But even then you could have treated the other one as a separate nsmmu with a single instance ;)

True, But the initial nvidia implementation had limitation that it can only handle one instance of usage. With your implementation hooks design, it should be able to handle multiple instances of usage now.

>Since it does add a bit of confusion to the code and comments, let's just keep things simple. I do like Jon's suggestion of actually enforcing that the number of "reg" regions exactly matches the number expected for the given compatible - I guess for now that means just hard-coding 2 and hoping the hardware folks don't cook up any more of these...

For T194, reg can just be forced to 2. No future plan to use more than two MMU-500s together as of now. 

-KR 
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks
  2020-07-01 18:48       ` Krishna Reddy
@ 2020-07-01 19:14         ` Robin Murphy
  2020-07-01 19:22           ` Krishna Reddy
  0 siblings, 1 reply; 38+ messages in thread
From: Robin Murphy @ 2020-07-01 19:14 UTC (permalink / raw)
  To: Krishna Reddy, Jonathan Hunter
  Cc: Timo Alho, Thierry Reding, Bryan Huntsman, linux-kernel, iommu,
	Mikko Perttunen, nicoleotsuka, Sachin Nikam, Nicolin Chen,
	linux-tegra, Yu-Huan Hsu, Pritesh Raithatha, will,
	linux-arm-kernel, Bitan Biswas

On 2020-07-01 19:48, Krishna Reddy wrote:
>>>> +    for (inst = 0; inst < nvidia_smmu->num_inst; inst++) {
>>>> +            irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
>>>> +            if (irq_ret == IRQ_HANDLED)
>>>> +                    return irq_ret;
>>>
>>> Any chance there could be more than one SMMU faulting by the time we
>>> service the interrupt?
> 
>> It certainly seems plausible if the interconnect is automatically load-balancing requests across the SMMU instances - say a driver bug caused a buffer to be unmapped too early, there could be many in-flight accesses to parts of that buffer that aren't all taking the same path and thus could now fault in parallel.
>> [ And anyone inclined to nitpick global vs. context faults, s/unmap a buffer/tear down a domain/ ;) ]
>> Either way I think it would be easier to reason about if we just handled these like a typical shared interrupt and always checked all the instances.
> 
> It would be optimal to check at the same time across all instances.
> 
>>>> +            for (idx = 0; idx < smmu->num_context_banks; idx++) {
>>>> +                    irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
>>>> +                                                             idx,
>>>> + inst);
>>>> +
>>>> +                    if (irq_ret == IRQ_HANDLED)
>>>> +                            return irq_ret;
>>>
>>> Any reason why we don't check all banks?
> 
>> As above, we certainly shouldn't bail out without checking the bank for the offending domain across all of its instances, and I guess the way this works means that we would have to iterate all the banks to achieve that.
> 
> With shared irq line, the context fault identification is not optimal already.  Reading all the context banks all the time can be additional mmio read overhead. But, it may not hurt the real use cases as these happen only when there are bugs.

Right, I did ponder the idea of a whole programmatic 
"request_context_irq" hook that would allow registering the handler for 
both interrupts with the appropriate context bank and instance data, but 
since all interrupts are currently unexpected it seems somewhat hard to 
justify the extra complexity. Obviously we can revisit this in future if 
you want to start actually doing something with faults like the qcom GPU 
folks do.

Robin.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks
  2020-07-01 19:14         ` Robin Murphy
@ 2020-07-01 19:22           ` Krishna Reddy
  0 siblings, 0 replies; 38+ messages in thread
From: Krishna Reddy @ 2020-07-01 19:22 UTC (permalink / raw)
  To: Robin Murphy, Jonathan Hunter
  Cc: Timo Alho, Thierry Reding, Bryan Huntsman, linux-kernel, iommu,
	Mikko Perttunen, nicoleotsuka, Sachin Nikam, Nicolin Chen,
	linux-tegra, Yu-Huan Hsu, Pritesh Raithatha, will,
	linux-arm-kernel, Bitan Biswas

>> With shared irq line, the context fault identification is not optimal already.  Reading all the context banks all the time can be additional mmio read overhead. But, it may not hurt the real use cases as these happen only when there are bugs.

>Right, I did ponder the idea of a whole programmatic "request_context_irq" hook that would allow registering the handler for both interrupts with the appropriate context bank and instance data, but since all interrupts are currently unexpected it seems somewhat hard to justify the extra complexity. Obviously we can revisit this in future if you want to start actually doing something with faults like the qcom GPU folks do.

Thanks, I would just avoid making changes to interrupt handlers till it is really necessary in future. The current code would just be simple and functional with more interrupts when there are multiple faults.

-KR
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  2020-07-01 19:00         ` Krishna Reddy
@ 2020-07-01 19:31           ` Jon Hunter
  2020-07-01 19:39             ` Krishna Reddy
  0 siblings, 1 reply; 38+ messages in thread
From: Jon Hunter @ 2020-07-01 19:31 UTC (permalink / raw)
  To: Krishna Reddy, Robin Murphy
  Cc: Sachin Nikam, nicoleotsuka, Mikko Perttunen, Bryan Huntsman,
	linux-kernel, Pritesh Raithatha, Timo Alho, iommu, Nicolin Chen,
	linux-tegra, Yu-Huan Hsu, Thierry Reding, will, linux-arm-kernel,
	Bitan Biswas


On 01/07/2020 20:00, Krishna Reddy wrote:
>>>>> +        items:
>>>>> +          - enum:
>>>>> +              - nvdia,tegra194-smmu
>>>>> +          - const: arm,mmu-500
>>>
>>>> Is the fallback compatible appropriate here? If software treats this as a standard MMU-500 it will only program the first instance (because the second isn't presented as a separate MMU-500) - is there any way that isn't going to blow up?
>>>
>>> When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, implementation override ensure that both instances are programmed. Isn't it? I am not sure I follow your comment fully.
> 
>> The problem is, if for some reason someone had a Tegra194, but only set the compatible string to 'arm,mmu-500' it would assume that it was a normal arm,mmu-500 and only one instance would be programmed. We always want at least 2 of the 3 instances >programmed and so we should only match 'nvidia,tegra194-smmu'. In fact, I think that we also need to update the arm_smmu_of_match table to add 'nvidia,tegra194-smmu' with the data set to &arm_mmu500.
> 
> In that case, new binding "nvidia,smmu-v2" can be added with data set to &arm_mmu500 and enumeration would have nvidia,tegra194-smmu and another variant for next generation SoC in future. 

I think you would be better off with nvidia,smmu-500 as smmu-v2 appears
to be something different. I see others have a smmu-v2 but I am not sure
if that is legacy. We have an smmu-500 and so that would seem more
appropriate.

Jon

-- 
nvpublic
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  2020-07-01 19:31           ` Jon Hunter
@ 2020-07-01 19:39             ` Krishna Reddy
  2020-07-02 16:05               ` Robin Murphy
  0 siblings, 1 reply; 38+ messages in thread
From: Krishna Reddy @ 2020-07-01 19:39 UTC (permalink / raw)
  To: Jonathan Hunter, Robin Murphy
  Cc: Sachin Nikam, nicoleotsuka, Mikko Perttunen, Bryan Huntsman,
	linux-kernel, Pritesh Raithatha, Timo Alho, iommu, Nicolin Chen,
	linux-tegra, Yu-Huan Hsu, Thierry Reding, will, linux-arm-kernel,
	Bitan Biswas

On 01/07/2020 20:00, Krishna Reddy wrote:
>>>>>> +        items:
>>>>>> +          - enum:
>>>>>> +              - nvdia,tegra194-smmu
>>>>>> +          - const: arm,mmu-500
>>>>
>>>>> Is the fallback compatible appropriate here? If software treats this as a standard MMU-500 it will only program the first instance (because the second isn't presented as a separate MMU-500) - is there any way that isn't going to blow up?
>>>>
>>>> When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, implementation override ensure that both instances are programmed. Isn't it? I am not sure I follow your comment fully.
>> 
>>> The problem is, if for some reason someone had a Tegra194, but only set the compatible string to 'arm,mmu-500' it would assume that it was a normal arm,mmu-500 and only one instance would be programmed. We always want at least 2 of the 3 instances >>programmed and so we should only match 'nvidia,tegra194-smmu'. In fact, I think that we also need to update the arm_smmu_of_match table to add 'nvidia,tegra194-smmu' with the data set to &arm_mmu500.
>> 
>> In that case, new binding "nvidia,smmu-v2" can be added with data set to &arm_mmu500 and enumeration would have nvidia,tegra194-smmu and another variant for next generation SoC in future. 

>I think you would be better off with nvidia,smmu-500 as smmu-v2 appears to be something different. I see others have a smmu-v2 but I am not sure if that is legacy. We have an smmu-500 and so that would seem more appropriate.

I tried to use the binding synonymous to other vendors. 
V2 is the architecture version.  MMU-500 is the actual implementation from ARM based on V2 arch.  As we just use the MMU-500 IP as it is, It can be named as nvidia,smmu-500 or similar as well.
Others probably having their own implementation based on V2 arch.

KR
--
nvpublic
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  2020-07-01 19:39             ` Krishna Reddy
@ 2020-07-02 16:05               ` Robin Murphy
  0 siblings, 0 replies; 38+ messages in thread
From: Robin Murphy @ 2020-07-02 16:05 UTC (permalink / raw)
  To: Krishna Reddy, Jonathan Hunter
  Cc: Sachin Nikam, nicoleotsuka, Mikko Perttunen, Bryan Huntsman,
	linux-kernel, Pritesh Raithatha, Timo Alho, iommu, Nicolin Chen,
	linux-tegra, Yu-Huan Hsu, Thierry Reding, will, linux-arm-kernel,
	Bitan Biswas

On 2020-07-01 20:39, Krishna Reddy wrote:
> On 01/07/2020 20:00, Krishna Reddy wrote:
>>>>>>> +        items:
>>>>>>> +          - enum:
>>>>>>> +              - nvdia,tegra194-smmu
>>>>>>> +          - const: arm,mmu-500
>>>>>
>>>>>> Is the fallback compatible appropriate here? If software treats this as a standard MMU-500 it will only program the first instance (because the second isn't presented as a separate MMU-500) - is there any way that isn't going to blow up?
>>>>>
>>>>> When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, implementation override ensure that both instances are programmed. Isn't it? I am not sure I follow your comment fully.
>>>
>>>> The problem is, if for some reason someone had a Tegra194, but only set the compatible string to 'arm,mmu-500' it would assume that it was a normal arm,mmu-500 and only one instance would be programmed. We always want at least 2 of the 3 instances >>programmed and so we should only match 'nvidia,tegra194-smmu'. In fact, I think that we also need to update the arm_smmu_of_match table to add 'nvidia,tegra194-smmu' with the data set to &arm_mmu500.
>>>
>>> In that case, new binding "nvidia,smmu-v2" can be added with data set to &arm_mmu500 and enumeration would have nvidia,tegra194-smmu and another variant for next generation SoC in future.
> 
>> I think you would be better off with nvidia,smmu-500 as smmu-v2 appears to be something different. I see others have a smmu-v2 but I am not sure if that is legacy. We have an smmu-500 and so that would seem more appropriate.
> 
> I tried to use the binding synonymous to other vendors.
> V2 is the architecture version.  MMU-500 is the actual implementation from ARM based on V2 arch.  As we just use the MMU-500 IP as it is, It can be named as nvidia,smmu-500 or similar as well.

Yup, that sounds OK to me if you want a broader compatible to 
potentially match other future SoCs as well.

> Others probably having their own implementation based on V2 arch.

Exactly - "cavium,smmu-v2" and "qcom,smmu-v2" are their own in-house 
microarchitectures, not one of Arm's designs, so they don't really have 
a suitable 'product name' we could have used for the bindings.

Robin.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2020-07-02 16:05 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-30  0:10 [PATCH v8 0/3] Nvidia Arm SMMUv2 Implementation Krishna Reddy
2020-06-30  0:10 ` [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage Krishna Reddy
2020-06-30  0:16   ` Nicolin Chen
2020-06-30  5:54   ` Pritesh Raithatha
2020-06-30  8:19   ` Jon Hunter
2020-06-30 14:53     ` Robin Murphy
2020-06-30 15:17       ` Jon Hunter
2020-07-01 18:18       ` Krishna Reddy
2020-07-01 18:56         ` Robin Murphy
2020-07-01 19:12           ` Krishna Reddy
2020-06-30 17:04     ` Krishna Reddy
2020-06-30 10:17   ` Jon Hunter
2020-06-30 16:23     ` Krishna Reddy
2020-06-30 16:32       ` Jon Hunter
2020-06-30 16:44         ` Jon Hunter
2020-06-30 17:16           ` Krishna Reddy
2020-06-30 19:03             ` Jon Hunter
2020-06-30 20:21               ` Krishna Reddy
2020-06-30  0:10 ` [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU Krishna Reddy
2020-06-30  6:01   ` Pritesh Raithatha
2020-06-30  8:21   ` Jon Hunter
2020-06-30 12:27   ` Robin Murphy
2020-07-01 18:28     ` Krishna Reddy
2020-07-01 18:47       ` Jon Hunter
2020-07-01 19:00         ` Krishna Reddy
2020-07-01 19:31           ` Jon Hunter
2020-07-01 19:39             ` Krishna Reddy
2020-07-02 16:05               ` Robin Murphy
2020-07-01 19:03         ` Robin Murphy
2020-06-30  0:10 ` [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks Krishna Reddy
2020-06-30  0:19   ` Nicolin Chen
2020-06-30  5:58   ` Pritesh Raithatha
2020-06-30  8:37   ` Jon Hunter
2020-06-30 12:13     ` Robin Murphy
2020-06-30 12:42       ` Jon Hunter
2020-07-01 18:48       ` Krishna Reddy
2020-07-01 19:14         ` Robin Murphy
2020-07-01 19:22           ` Krishna Reddy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).