All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] iommu/arm: Add support for non-coherent page tables
@ 2019-01-17  9:27 ` Vivek Gautam
  0 siblings, 0 replies; 23+ messages in thread
From: Vivek Gautam @ 2019-01-17  9:27 UTC (permalink / raw)
  To: will.deacon, robin.murphy, joro, iommu
  Cc: linux-arm-kernel, linux-kernel, linux-arm-msm, tfiga, Vivek Gautam

As discussed in the Qcom system cache support thread [1], it is
imperative that we enable the support for non-cacheable page tables
for SMMU implementations for which removing snoop latency on walks
by making mappings as non-cacheable, outweighs the cost of cache
maintenance on PTE updates.

This series adds a new SMMU device tree option to let the particular
SMMU configuration setup cacheable or non-cacheable mappings for
page-tables out of box. We set a new quirk for i/o page tables -
IO_PGTABLE_QUIRK_NON_COHERENT, that lets us set different TCR
configurations.

This quirk enables the non-cacheable page tables for all masters
sitting on SMMU. Should this control be available per smmu_domain
as each master may have a different perf requirement?
Enabling this for the entire SMMU may not be desirable for all
masters.

[1] https://lore.kernel.org/patchwork/patch/1020906/

Vivek Gautam (2):
  iommu/io-pgtable-arm: Add support for non-coherent page tables
  iommu/arm-smmu: Add support for non-coherent page table mappings

 drivers/iommu/arm-smmu.c       |  7 +++++++
 drivers/iommu/io-pgtable-arm.c | 17 ++++++++++++-----
 drivers/iommu/io-pgtable.h     |  6 ++++++
 3 files changed, 25 insertions(+), 5 deletions(-)

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 0/2] iommu/arm: Add support for non-coherent page tables
@ 2019-01-17  9:27 ` Vivek Gautam
  0 siblings, 0 replies; 23+ messages in thread
From: Vivek Gautam @ 2019-01-17  9:27 UTC (permalink / raw)
  To: will.deacon, robin.murphy, joro, iommu
  Cc: Vivek Gautam, linux-arm-msm, linux-kernel, linux-arm-kernel, tfiga

As discussed in the Qcom system cache support thread [1], it is
imperative that we enable the support for non-cacheable page tables
for SMMU implementations for which removing snoop latency on walks
by making mappings as non-cacheable, outweighs the cost of cache
maintenance on PTE updates.

This series adds a new SMMU device tree option to let the particular
SMMU configuration setup cacheable or non-cacheable mappings for
page-tables out of box. We set a new quirk for i/o page tables -
IO_PGTABLE_QUIRK_NON_COHERENT, that lets us set different TCR
configurations.

This quirk enables the non-cacheable page tables for all masters
sitting on SMMU. Should this control be available per smmu_domain
as each master may have a different perf requirement?
Enabling this for the entire SMMU may not be desirable for all
masters.

[1] https://lore.kernel.org/patchwork/patch/1020906/

Vivek Gautam (2):
  iommu/io-pgtable-arm: Add support for non-coherent page tables
  iommu/arm-smmu: Add support for non-coherent page table mappings

 drivers/iommu/arm-smmu.c       |  7 +++++++
 drivers/iommu/io-pgtable-arm.c | 17 ++++++++++++-----
 drivers/iommu/io-pgtable.h     |  6 ++++++
 3 files changed, 25 insertions(+), 5 deletions(-)

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/2] iommu/io-pgtable-arm: Add support for non-coherent page tables
  2019-01-17  9:27 ` Vivek Gautam
  (?)
@ 2019-01-17  9:27     ` Vivek Gautam
  -1 siblings, 0 replies; 23+ messages in thread
From: Vivek Gautam @ 2019-01-17  9:27 UTC (permalink / raw)
  To: will.deacon-5wv7dgnIgG8, robin.murphy-5wv7dgnIgG8,
	joro-zLv9SwRftAIdnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

>From Robin's comment [1] about touching TCR configurations -

"TBH if we're going to touch the TCR attributes at all then we should
probably correct that sloppiness first - there's an occasional argument
for using non-cacheable pagetables even on a coherent SMMU if reducing
snoop traffic/latency on walks outweighs the cost of cache maintenance
on PTE updates, but anyone thinking they can get that by overriding
dma-coherent silently gets the worst of both worlds thanks to this
current TCR value."

We have IO_PGTABLE_QUIRK_NO_DMA quirk present, but we don't force
anybody _not_ using dma-coherent smmu to have non-cacheable page table
mappings.
Having another quirk flag can help in having non-cacheable memory for
page tables once and for all.

[1] https://lore.kernel.org/patchwork/patch/1020906/

Signed-off-by: Vivek Gautam <vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
---
 drivers/iommu/io-pgtable-arm.c | 17 ++++++++++++-----
 drivers/iommu/io-pgtable.h     |  6 ++++++
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 237cacd4a62b..c76919c30f1a 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -780,7 +780,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 	struct arm_lpae_io_pgtable *data;
 
 	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA |
-			    IO_PGTABLE_QUIRK_NON_STRICT))
+			    IO_PGTABLE_QUIRK_NON_STRICT |
+			    IO_PGTABLE_QUIRK_NON_COHERENT))
 		return NULL;
 
 	data = arm_lpae_alloc_pgtable(cfg);
@@ -788,9 +789,14 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 		return NULL;
 
 	/* TCR */
-	reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
-	      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
-	      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
+	reg = ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT;
+
+	if (cfg->quirks & IO_PGTABLE_QUIRK_NON_COHERENT)
+		reg |= ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_IRGN0_SHIFT |
+		       ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_ORGN0_SHIFT;
+	else
+		reg |= ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT |
+		       ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT;
 
 	switch (ARM_LPAE_GRANULE(data)) {
 	case SZ_4K:
@@ -873,7 +879,8 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 
 	/* The NS quirk doesn't apply at stage 2 */
 	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_NO_DMA |
-			    IO_PGTABLE_QUIRK_NON_STRICT))
+			    IO_PGTABLE_QUIRK_NON_STRICT |
+			    IO_PGTABLE_QUIRK_NON_COHERENT))
 		return NULL;
 
 	data = arm_lpae_alloc_pgtable(cfg);
diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
index 47d5ae559329..46604cf7b017 100644
--- a/drivers/iommu/io-pgtable.h
+++ b/drivers/iommu/io-pgtable.h
@@ -75,6 +75,11 @@ struct io_pgtable_cfg {
 	 * IO_PGTABLE_QUIRK_NON_STRICT: Skip issuing synchronous leaf TLBIs
 	 *	on unmap, for DMA domains using the flush queue mechanism for
 	 *	delayed invalidation.
+	 *
+	 * IO_PGTABLE_QUIRK_NON_COHERENT: Enforce non-cacheable mappings for
+	 *      pagetables even on a coherent SMMU for cases where reducing
+	 *      snoop traffic/latency on walks outweighs the cost of cache
+	 *      maintenance on PTE updates.
 	 */
 	#define IO_PGTABLE_QUIRK_ARM_NS		BIT(0)
 	#define IO_PGTABLE_QUIRK_NO_PERMS	BIT(1)
@@ -82,6 +87,7 @@ struct io_pgtable_cfg {
 	#define IO_PGTABLE_QUIRK_ARM_MTK_4GB	BIT(3)
 	#define IO_PGTABLE_QUIRK_NO_DMA		BIT(4)
 	#define IO_PGTABLE_QUIRK_NON_STRICT	BIT(5)
+	#define IO_PGTABLE_QUIRK_NON_COHERENT	BIT(6)
 	unsigned long			quirks;
 	unsigned long			pgsize_bitmap;
 	unsigned int			ias;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 1/2] iommu/io-pgtable-arm: Add support for non-coherent page tables
@ 2019-01-17  9:27     ` Vivek Gautam
  0 siblings, 0 replies; 23+ messages in thread
From: Vivek Gautam @ 2019-01-17  9:27 UTC (permalink / raw)
  To: will.deacon, robin.murphy, joro, iommu
  Cc: linux-arm-kernel, linux-kernel, linux-arm-msm, tfiga, Vivek Gautam

From Robin's comment [1] about touching TCR configurations -

"TBH if we're going to touch the TCR attributes at all then we should
probably correct that sloppiness first - there's an occasional argument
for using non-cacheable pagetables even on a coherent SMMU if reducing
snoop traffic/latency on walks outweighs the cost of cache maintenance
on PTE updates, but anyone thinking they can get that by overriding
dma-coherent silently gets the worst of both worlds thanks to this
current TCR value."

We have IO_PGTABLE_QUIRK_NO_DMA quirk present, but we don't force
anybody _not_ using dma-coherent smmu to have non-cacheable page table
mappings.
Having another quirk flag can help in having non-cacheable memory for
page tables once and for all.

[1] https://lore.kernel.org/patchwork/patch/1020906/

Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
---
 drivers/iommu/io-pgtable-arm.c | 17 ++++++++++++-----
 drivers/iommu/io-pgtable.h     |  6 ++++++
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 237cacd4a62b..c76919c30f1a 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -780,7 +780,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 	struct arm_lpae_io_pgtable *data;
 
 	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA |
-			    IO_PGTABLE_QUIRK_NON_STRICT))
+			    IO_PGTABLE_QUIRK_NON_STRICT |
+			    IO_PGTABLE_QUIRK_NON_COHERENT))
 		return NULL;
 
 	data = arm_lpae_alloc_pgtable(cfg);
@@ -788,9 +789,14 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 		return NULL;
 
 	/* TCR */
-	reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
-	      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
-	      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
+	reg = ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT;
+
+	if (cfg->quirks & IO_PGTABLE_QUIRK_NON_COHERENT)
+		reg |= ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_IRGN0_SHIFT |
+		       ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_ORGN0_SHIFT;
+	else
+		reg |= ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT |
+		       ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT;
 
 	switch (ARM_LPAE_GRANULE(data)) {
 	case SZ_4K:
@@ -873,7 +879,8 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 
 	/* The NS quirk doesn't apply at stage 2 */
 	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_NO_DMA |
-			    IO_PGTABLE_QUIRK_NON_STRICT))
+			    IO_PGTABLE_QUIRK_NON_STRICT |
+			    IO_PGTABLE_QUIRK_NON_COHERENT))
 		return NULL;
 
 	data = arm_lpae_alloc_pgtable(cfg);
diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
index 47d5ae559329..46604cf7b017 100644
--- a/drivers/iommu/io-pgtable.h
+++ b/drivers/iommu/io-pgtable.h
@@ -75,6 +75,11 @@ struct io_pgtable_cfg {
 	 * IO_PGTABLE_QUIRK_NON_STRICT: Skip issuing synchronous leaf TLBIs
 	 *	on unmap, for DMA domains using the flush queue mechanism for
 	 *	delayed invalidation.
+	 *
+	 * IO_PGTABLE_QUIRK_NON_COHERENT: Enforce non-cacheable mappings for
+	 *      pagetables even on a coherent SMMU for cases where reducing
+	 *      snoop traffic/latency on walks outweighs the cost of cache
+	 *      maintenance on PTE updates.
 	 */
 	#define IO_PGTABLE_QUIRK_ARM_NS		BIT(0)
 	#define IO_PGTABLE_QUIRK_NO_PERMS	BIT(1)
@@ -82,6 +87,7 @@ struct io_pgtable_cfg {
 	#define IO_PGTABLE_QUIRK_ARM_MTK_4GB	BIT(3)
 	#define IO_PGTABLE_QUIRK_NO_DMA		BIT(4)
 	#define IO_PGTABLE_QUIRK_NON_STRICT	BIT(5)
+	#define IO_PGTABLE_QUIRK_NON_COHERENT	BIT(6)
 	unsigned long			quirks;
 	unsigned long			pgsize_bitmap;
 	unsigned int			ias;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 1/2] iommu/io-pgtable-arm: Add support for non-coherent page tables
@ 2019-01-17  9:27     ` Vivek Gautam
  0 siblings, 0 replies; 23+ messages in thread
From: Vivek Gautam @ 2019-01-17  9:27 UTC (permalink / raw)
  To: will.deacon, robin.murphy, joro, iommu
  Cc: Vivek Gautam, linux-arm-msm, linux-kernel, linux-arm-kernel, tfiga

From Robin's comment [1] about touching TCR configurations -

"TBH if we're going to touch the TCR attributes at all then we should
probably correct that sloppiness first - there's an occasional argument
for using non-cacheable pagetables even on a coherent SMMU if reducing
snoop traffic/latency on walks outweighs the cost of cache maintenance
on PTE updates, but anyone thinking they can get that by overriding
dma-coherent silently gets the worst of both worlds thanks to this
current TCR value."

We have IO_PGTABLE_QUIRK_NO_DMA quirk present, but we don't force
anybody _not_ using dma-coherent smmu to have non-cacheable page table
mappings.
Having another quirk flag can help in having non-cacheable memory for
page tables once and for all.

[1] https://lore.kernel.org/patchwork/patch/1020906/

Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
---
 drivers/iommu/io-pgtable-arm.c | 17 ++++++++++++-----
 drivers/iommu/io-pgtable.h     |  6 ++++++
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 237cacd4a62b..c76919c30f1a 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -780,7 +780,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 	struct arm_lpae_io_pgtable *data;
 
 	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA |
-			    IO_PGTABLE_QUIRK_NON_STRICT))
+			    IO_PGTABLE_QUIRK_NON_STRICT |
+			    IO_PGTABLE_QUIRK_NON_COHERENT))
 		return NULL;
 
 	data = arm_lpae_alloc_pgtable(cfg);
@@ -788,9 +789,14 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 		return NULL;
 
 	/* TCR */
-	reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
-	      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
-	      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
+	reg = ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT;
+
+	if (cfg->quirks & IO_PGTABLE_QUIRK_NON_COHERENT)
+		reg |= ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_IRGN0_SHIFT |
+		       ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_ORGN0_SHIFT;
+	else
+		reg |= ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT |
+		       ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT;
 
 	switch (ARM_LPAE_GRANULE(data)) {
 	case SZ_4K:
@@ -873,7 +879,8 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 
 	/* The NS quirk doesn't apply at stage 2 */
 	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_NO_DMA |
-			    IO_PGTABLE_QUIRK_NON_STRICT))
+			    IO_PGTABLE_QUIRK_NON_STRICT |
+			    IO_PGTABLE_QUIRK_NON_COHERENT))
 		return NULL;
 
 	data = arm_lpae_alloc_pgtable(cfg);
diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
index 47d5ae559329..46604cf7b017 100644
--- a/drivers/iommu/io-pgtable.h
+++ b/drivers/iommu/io-pgtable.h
@@ -75,6 +75,11 @@ struct io_pgtable_cfg {
 	 * IO_PGTABLE_QUIRK_NON_STRICT: Skip issuing synchronous leaf TLBIs
 	 *	on unmap, for DMA domains using the flush queue mechanism for
 	 *	delayed invalidation.
+	 *
+	 * IO_PGTABLE_QUIRK_NON_COHERENT: Enforce non-cacheable mappings for
+	 *      pagetables even on a coherent SMMU for cases where reducing
+	 *      snoop traffic/latency on walks outweighs the cost of cache
+	 *      maintenance on PTE updates.
 	 */
 	#define IO_PGTABLE_QUIRK_ARM_NS		BIT(0)
 	#define IO_PGTABLE_QUIRK_NO_PERMS	BIT(1)
@@ -82,6 +87,7 @@ struct io_pgtable_cfg {
 	#define IO_PGTABLE_QUIRK_ARM_MTK_4GB	BIT(3)
 	#define IO_PGTABLE_QUIRK_NO_DMA		BIT(4)
 	#define IO_PGTABLE_QUIRK_NON_STRICT	BIT(5)
+	#define IO_PGTABLE_QUIRK_NON_COHERENT	BIT(6)
 	unsigned long			quirks;
 	unsigned long			pgsize_bitmap;
 	unsigned int			ias;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/2] iommu/arm-smmu: Add support for non-coherent page table mappings
  2019-01-17  9:27 ` Vivek Gautam
  (?)
@ 2019-01-17  9:27     ` Vivek Gautam
  -1 siblings, 0 replies; 23+ messages in thread
From: Vivek Gautam @ 2019-01-17  9:27 UTC (permalink / raw)
  To: will.deacon-5wv7dgnIgG8, robin.murphy-5wv7dgnIgG8,
	joro-zLv9SwRftAIdnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Adding a device tree option for arm smmu to enable non-cacheable
memory for page tables.
We already enable a smmu feature for coherent walk based on
whether the smmu device is dma-coherent or not. Have an option
to enable non-cacheable page table memory to force set it for
particular smmu devices.

Signed-off-by: Vivek Gautam <vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
---
 drivers/iommu/arm-smmu.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index af18a7e7f917..7ebbcf1b2eb3 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -188,6 +188,7 @@ struct arm_smmu_device {
 	u32				features;
 
 #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0)
+#define ARM_SMMU_OPT_PGTBL_NON_COHERENT (1 << 1)
 	u32				options;
 	enum arm_smmu_arch_version	version;
 	enum arm_smmu_implementation	model;
@@ -273,6 +274,7 @@ static bool using_legacy_binding, using_generic_binding;
 
 static struct arm_smmu_option_prop arm_smmu_options[] = {
 	{ ARM_SMMU_OPT_SECURE_CFG_ACCESS, "calxeda,smmu-secure-config-access" },
+	{ ARM_SMMU_OPT_PGTBL_NON_COHERENT, "arm,smmu-pgtable-non-coherent" },
 	{ 0, NULL},
 };
 
@@ -902,6 +904,11 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 	if (smmu_domain->non_strict)
 		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
 
+	/* Non coherent page table mappings only for Stage-1 */
+	if (smmu->options & ARM_SMMU_OPT_PGTBL_NON_COHERENT &&
+	    smmu_domain->stage == ARM_SMMU_DOMAIN_S1)
+		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_COHERENT;
+
 	smmu_domain->smmu = smmu;
 	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
 	if (!pgtbl_ops) {
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/2] iommu/arm-smmu: Add support for non-coherent page table mappings
@ 2019-01-17  9:27     ` Vivek Gautam
  0 siblings, 0 replies; 23+ messages in thread
From: Vivek Gautam @ 2019-01-17  9:27 UTC (permalink / raw)
  To: will.deacon, robin.murphy, joro, iommu
  Cc: linux-arm-kernel, linux-kernel, linux-arm-msm, tfiga, Vivek Gautam

Adding a device tree option for arm smmu to enable non-cacheable
memory for page tables.
We already enable a smmu feature for coherent walk based on
whether the smmu device is dma-coherent or not. Have an option
to enable non-cacheable page table memory to force set it for
particular smmu devices.

Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
---
 drivers/iommu/arm-smmu.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index af18a7e7f917..7ebbcf1b2eb3 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -188,6 +188,7 @@ struct arm_smmu_device {
 	u32				features;
 
 #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0)
+#define ARM_SMMU_OPT_PGTBL_NON_COHERENT (1 << 1)
 	u32				options;
 	enum arm_smmu_arch_version	version;
 	enum arm_smmu_implementation	model;
@@ -273,6 +274,7 @@ static bool using_legacy_binding, using_generic_binding;
 
 static struct arm_smmu_option_prop arm_smmu_options[] = {
 	{ ARM_SMMU_OPT_SECURE_CFG_ACCESS, "calxeda,smmu-secure-config-access" },
+	{ ARM_SMMU_OPT_PGTBL_NON_COHERENT, "arm,smmu-pgtable-non-coherent" },
 	{ 0, NULL},
 };
 
@@ -902,6 +904,11 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 	if (smmu_domain->non_strict)
 		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
 
+	/* Non coherent page table mappings only for Stage-1 */
+	if (smmu->options & ARM_SMMU_OPT_PGTBL_NON_COHERENT &&
+	    smmu_domain->stage == ARM_SMMU_DOMAIN_S1)
+		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_COHERENT;
+
 	smmu_domain->smmu = smmu;
 	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
 	if (!pgtbl_ops) {
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/2] iommu/arm-smmu: Add support for non-coherent page table mappings
@ 2019-01-17  9:27     ` Vivek Gautam
  0 siblings, 0 replies; 23+ messages in thread
From: Vivek Gautam @ 2019-01-17  9:27 UTC (permalink / raw)
  To: will.deacon, robin.murphy, joro, iommu
  Cc: Vivek Gautam, linux-arm-msm, linux-kernel, linux-arm-kernel, tfiga

Adding a device tree option for arm smmu to enable non-cacheable
memory for page tables.
We already enable a smmu feature for coherent walk based on
whether the smmu device is dma-coherent or not. Have an option
to enable non-cacheable page table memory to force set it for
particular smmu devices.

Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
---
 drivers/iommu/arm-smmu.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index af18a7e7f917..7ebbcf1b2eb3 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -188,6 +188,7 @@ struct arm_smmu_device {
 	u32				features;
 
 #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0)
+#define ARM_SMMU_OPT_PGTBL_NON_COHERENT (1 << 1)
 	u32				options;
 	enum arm_smmu_arch_version	version;
 	enum arm_smmu_implementation	model;
@@ -273,6 +274,7 @@ static bool using_legacy_binding, using_generic_binding;
 
 static struct arm_smmu_option_prop arm_smmu_options[] = {
 	{ ARM_SMMU_OPT_SECURE_CFG_ACCESS, "calxeda,smmu-secure-config-access" },
+	{ ARM_SMMU_OPT_PGTBL_NON_COHERENT, "arm,smmu-pgtable-non-coherent" },
 	{ 0, NULL},
 };
 
@@ -902,6 +904,11 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 	if (smmu_domain->non_strict)
 		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
 
+	/* Non coherent page table mappings only for Stage-1 */
+	if (smmu->options & ARM_SMMU_OPT_PGTBL_NON_COHERENT &&
+	    smmu_domain->stage == ARM_SMMU_DOMAIN_S1)
+		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_COHERENT;
+
 	smmu_domain->smmu = smmu;
 	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
 	if (!pgtbl_ops) {
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] iommu/arm-smmu: Add support for non-coherent page table mappings
  2019-01-17  9:27     ` Vivek Gautam
  (?)
@ 2019-01-20  0:01         ` Will Deacon
  -1 siblings, 0 replies; 23+ messages in thread
From: Will Deacon @ 2019-01-20  0:01 UTC (permalink / raw)
  To: Vivek Gautam
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	robin.murphy-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Thu, Jan 17, 2019 at 02:57:18PM +0530, Vivek Gautam wrote:
> Adding a device tree option for arm smmu to enable non-cacheable
> memory for page tables.
> We already enable a smmu feature for coherent walk based on
> whether the smmu device is dma-coherent or not. Have an option
> to enable non-cacheable page table memory to force set it for
> particular smmu devices.

Hmm, I must be missing something here. What is the difference between this
new property, and simply omitting dma-coherent on the SMMU?

Will

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] iommu/arm-smmu: Add support for non-coherent page table mappings
@ 2019-01-20  0:01         ` Will Deacon
  0 siblings, 0 replies; 23+ messages in thread
From: Will Deacon @ 2019-01-20  0:01 UTC (permalink / raw)
  To: Vivek Gautam
  Cc: robin.murphy, joro, iommu, linux-arm-kernel, linux-kernel,
	linux-arm-msm, tfiga

On Thu, Jan 17, 2019 at 02:57:18PM +0530, Vivek Gautam wrote:
> Adding a device tree option for arm smmu to enable non-cacheable
> memory for page tables.
> We already enable a smmu feature for coherent walk based on
> whether the smmu device is dma-coherent or not. Have an option
> to enable non-cacheable page table memory to force set it for
> particular smmu devices.

Hmm, I must be missing something here. What is the difference between this
new property, and simply omitting dma-coherent on the SMMU?

Will

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] iommu/arm-smmu: Add support for non-coherent page table mappings
@ 2019-01-20  0:01         ` Will Deacon
  0 siblings, 0 replies; 23+ messages in thread
From: Will Deacon @ 2019-01-20  0:01 UTC (permalink / raw)
  To: Vivek Gautam
  Cc: linux-arm-msm, joro, linux-kernel, tfiga, iommu, robin.murphy,
	linux-arm-kernel

On Thu, Jan 17, 2019 at 02:57:18PM +0530, Vivek Gautam wrote:
> Adding a device tree option for arm smmu to enable non-cacheable
> memory for page tables.
> We already enable a smmu feature for coherent walk based on
> whether the smmu device is dma-coherent or not. Have an option
> to enable non-cacheable page table memory to force set it for
> particular smmu devices.

Hmm, I must be missing something here. What is the difference between this
new property, and simply omitting dma-coherent on the SMMU?

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] iommu/arm-smmu: Add support for non-coherent page table mappings
  2019-01-20  0:01         ` Will Deacon
  (?)
@ 2019-01-21  6:05           ` Vivek Gautam
  -1 siblings, 0 replies; 23+ messages in thread
From: Vivek Gautam @ 2019-01-21  6:05 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-msm,
	list-Y9sIeH5OGRo@public.gmane.org:IOMMU DRIVERS
	<iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>,
	Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>,
	, Robin Murphy, open list, Linux ARM

Hi Will,


On Sun, Jan 20, 2019 at 5:31 AM Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> wrote:
>
> On Thu, Jan 17, 2019 at 02:57:18PM +0530, Vivek Gautam wrote:
> > Adding a device tree option for arm smmu to enable non-cacheable
> > memory for page tables.
> > We already enable a smmu feature for coherent walk based on
> > whether the smmu device is dma-coherent or not. Have an option
> > to enable non-cacheable page table memory to force set it for
> > particular smmu devices.
>
> Hmm, I must be missing something here. What is the difference between this
> new property, and simply omitting dma-coherent on the SMMU?

So, this is what I understood from the email thread for Last level
cache support -
Robin pointed to the fact that we may need to add support for setting
non-cacheable
mappings in the TCR.
Currently, we don't do that for SMMUs that omit dma-coherent.
We rely on the interconnect to handle the configuration set in TCR,
and let interconnect
ignore the cacheability if it can't support.

Moreover, Robin suggested that we should take care of SMMUs, for which
removing snoop latency on walks by making mappings as non-cacheable
outweighs the cost of cache maintenance on PTE updates.

So, this change adds another property to do this non-cacheable mappings
explicitly. As I pointed, omitting 'dma-coherent', and corresponding
IO_PGTABLE_QUIRK_NO_DMA' does takes care of few things.

Should we handle the TCR settings too with this quirk?

Regards
Vivek
>
> Will
> _______________________________________________
> iommu mailing list
> iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] iommu/arm-smmu: Add support for non-coherent page table mappings
@ 2019-01-21  6:05           ` Vivek Gautam
  0 siblings, 0 replies; 23+ messages in thread
From: Vivek Gautam @ 2019-01-21  6:05 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-msm, open list,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	Robin Murphy, Linux ARM

Hi Will,


On Sun, Jan 20, 2019 at 5:31 AM Will Deacon <will.deacon@arm.com> wrote:
>
> On Thu, Jan 17, 2019 at 02:57:18PM +0530, Vivek Gautam wrote:
> > Adding a device tree option for arm smmu to enable non-cacheable
> > memory for page tables.
> > We already enable a smmu feature for coherent walk based on
> > whether the smmu device is dma-coherent or not. Have an option
> > to enable non-cacheable page table memory to force set it for
> > particular smmu devices.
>
> Hmm, I must be missing something here. What is the difference between this
> new property, and simply omitting dma-coherent on the SMMU?

So, this is what I understood from the email thread for Last level
cache support -
Robin pointed to the fact that we may need to add support for setting
non-cacheable
mappings in the TCR.
Currently, we don't do that for SMMUs that omit dma-coherent.
We rely on the interconnect to handle the configuration set in TCR,
and let interconnect
ignore the cacheability if it can't support.

Moreover, Robin suggested that we should take care of SMMUs, for which
removing snoop latency on walks by making mappings as non-cacheable
outweighs the cost of cache maintenance on PTE updates.

So, this change adds another property to do this non-cacheable mappings
explicitly. As I pointed, omitting 'dma-coherent', and corresponding
IO_PGTABLE_QUIRK_NO_DMA' does takes care of few things.

Should we handle the TCR settings too with this quirk?

Regards
Vivek
>
> Will
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] iommu/arm-smmu: Add support for non-coherent page table mappings
@ 2019-01-21  6:05           ` Vivek Gautam
  0 siblings, 0 replies; 23+ messages in thread
From: Vivek Gautam @ 2019-01-21  6:05 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-msm,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Robin Murphy, open list, Linux ARM

Hi Will,


On Sun, Jan 20, 2019 at 5:31 AM Will Deacon <will.deacon@arm.com> wrote:
>
> On Thu, Jan 17, 2019 at 02:57:18PM +0530, Vivek Gautam wrote:
> > Adding a device tree option for arm smmu to enable non-cacheable
> > memory for page tables.
> > We already enable a smmu feature for coherent walk based on
> > whether the smmu device is dma-coherent or not. Have an option
> > to enable non-cacheable page table memory to force set it for
> > particular smmu devices.
>
> Hmm, I must be missing something here. What is the difference between this
> new property, and simply omitting dma-coherent on the SMMU?

So, this is what I understood from the email thread for Last level
cache support -
Robin pointed to the fact that we may need to add support for setting
non-cacheable
mappings in the TCR.
Currently, we don't do that for SMMUs that omit dma-coherent.
We rely on the interconnect to handle the configuration set in TCR,
and let interconnect
ignore the cacheability if it can't support.

Moreover, Robin suggested that we should take care of SMMUs, for which
removing snoop latency on walks by making mappings as non-cacheable
outweighs the cost of cache maintenance on PTE updates.

So, this change adds another property to do this non-cacheable mappings
explicitly. As I pointed, omitting 'dma-coherent', and corresponding
IO_PGTABLE_QUIRK_NO_DMA' does takes care of few things.

Should we handle the TCR settings too with this quirk?

Regards
Vivek
>
> Will
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/2] iommu/io-pgtable-arm: Add support for non-coherent page tables
  2019-01-17  9:27     ` Vivek Gautam
@ 2019-01-21 13:12       ` Robin Murphy
  -1 siblings, 0 replies; 23+ messages in thread
From: Robin Murphy @ 2019-01-21 13:12 UTC (permalink / raw)
  To: Vivek Gautam, will.deacon, joro, iommu
  Cc: linux-arm-msm, linux-kernel, linux-arm-kernel

On 17/01/2019 09:27, Vivek Gautam wrote:
>  From Robin's comment [1] about touching TCR configurations -
> 
> "TBH if we're going to touch the TCR attributes at all then we should
> probably correct that sloppiness first - there's an occasional argument
> for using non-cacheable pagetables even on a coherent SMMU if reducing
> snoop traffic/latency on walks outweighs the cost of cache maintenance
> on PTE updates, but anyone thinking they can get that by overriding
> dma-coherent silently gets the worst of both worlds thanks to this
> current TCR value."
> 
> We have IO_PGTABLE_QUIRK_NO_DMA quirk present, but we don't force
> anybody _not_ using dma-coherent smmu to have non-cacheable page table
> mappings.
> Having another quirk flag can help in having non-cacheable memory for
> page tables once and for all.
> 
> [1] https://lore.kernel.org/patchwork/patch/1020906/
> 
> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> ---
>   drivers/iommu/io-pgtable-arm.c | 17 ++++++++++++-----
>   drivers/iommu/io-pgtable.h     |  6 ++++++
>   2 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 237cacd4a62b..c76919c30f1a 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -780,7 +780,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
>   	struct arm_lpae_io_pgtable *data;
>   
>   	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA |
> -			    IO_PGTABLE_QUIRK_NON_STRICT))
> +			    IO_PGTABLE_QUIRK_NON_STRICT |
> +			    IO_PGTABLE_QUIRK_NON_COHERENT))
>   		return NULL;
>   
>   	data = arm_lpae_alloc_pgtable(cfg);
> @@ -788,9 +789,14 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
>   		return NULL;
>   
>   	/* TCR */
> -	reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
> -	      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
> -	      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
> +	reg = ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT;
> +
> +	if (cfg->quirks & IO_PGTABLE_QUIRK_NON_COHERENT)
> +		reg |= ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_IRGN0_SHIFT |
> +		       ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_ORGN0_SHIFT;
> +	else
> +		reg |= ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT |
> +		       ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT;
>   
>   	switch (ARM_LPAE_GRANULE(data)) {
>   	case SZ_4K:
> @@ -873,7 +879,8 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
>   
>   	/* The NS quirk doesn't apply at stage 2 */
>   	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_NO_DMA |
> -			    IO_PGTABLE_QUIRK_NON_STRICT))
> +			    IO_PGTABLE_QUIRK_NON_STRICT |
> +			    IO_PGTABLE_QUIRK_NON_COHERENT))
>   		return NULL;
>   
>   	data = arm_lpae_alloc_pgtable(cfg);
> diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
> index 47d5ae559329..46604cf7b017 100644
> --- a/drivers/iommu/io-pgtable.h
> +++ b/drivers/iommu/io-pgtable.h
> @@ -75,6 +75,11 @@ struct io_pgtable_cfg {
>   	 * IO_PGTABLE_QUIRK_NON_STRICT: Skip issuing synchronous leaf TLBIs
>   	 *	on unmap, for DMA domains using the flush queue mechanism for
>   	 *	delayed invalidation.
> +	 *
> +	 * IO_PGTABLE_QUIRK_NON_COHERENT: Enforce non-cacheable mappings for
> +	 *      pagetables even on a coherent SMMU for cases where reducing
> +	 *      snoop traffic/latency on walks outweighs the cost of cache
> +	 *      maintenance on PTE updates.

Hmm, we can't actually "enforce" anything with this as-is - all we're 
doing is setting the attributes that the IOMMU will use for pagetable 
walks, and that has no impact on how the CPU actually writes PTEs to 
memory. In particular, in the case of a hardware-coherent IOMMU which is 
described as such, even if we make the dma_map/sync calls they still 
won't do anything since they 'know' that the IOMMU is coherent. Thus if 
we then set up a non-cacheable TCR we would have no proper means of 
making pagetables correctly visible to the walker.

Aw crap, this is turning out to be a microcosm of the PCIe no-snoop 
mess... :(

To start with, at least, what we want is to set a non-cacheable TCR if 
the IOMMU is *not* coherent (as far as Linux is concerned - that 
includes the firmware-lying-about-the-hardware situation I was alluding 
to before), but even that isn't necessarily as straightforward as it 
seems. AFAICS, if QUIRK_NO_DMA is set then we definitely have to use a 
cacheable TCR; we can't strictly rely on the inverse being true, but in 
practice we *might* get away with it since we already disallow most 
cases in which the DMA API calls would actually do anything for a 
known-coherent IOMMU device.

Robin.

>   	 */
>   	#define IO_PGTABLE_QUIRK_ARM_NS		BIT(0)
>   	#define IO_PGTABLE_QUIRK_NO_PERMS	BIT(1)
> @@ -82,6 +87,7 @@ struct io_pgtable_cfg {
>   	#define IO_PGTABLE_QUIRK_ARM_MTK_4GB	BIT(3)
>   	#define IO_PGTABLE_QUIRK_NO_DMA		BIT(4)
>   	#define IO_PGTABLE_QUIRK_NON_STRICT	BIT(5)
> +	#define IO_PGTABLE_QUIRK_NON_COHERENT	BIT(6)
>   	unsigned long			quirks;
>   	unsigned long			pgsize_bitmap;
>   	unsigned int			ias;
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/2] iommu/io-pgtable-arm: Add support for non-coherent page tables
@ 2019-01-21 13:12       ` Robin Murphy
  0 siblings, 0 replies; 23+ messages in thread
From: Robin Murphy @ 2019-01-21 13:12 UTC (permalink / raw)
  To: Vivek Gautam, will.deacon, joro, iommu
  Cc: linux-arm-msm, linux-kernel, linux-arm-kernel

On 17/01/2019 09:27, Vivek Gautam wrote:
>  From Robin's comment [1] about touching TCR configurations -
> 
> "TBH if we're going to touch the TCR attributes at all then we should
> probably correct that sloppiness first - there's an occasional argument
> for using non-cacheable pagetables even on a coherent SMMU if reducing
> snoop traffic/latency on walks outweighs the cost of cache maintenance
> on PTE updates, but anyone thinking they can get that by overriding
> dma-coherent silently gets the worst of both worlds thanks to this
> current TCR value."
> 
> We have IO_PGTABLE_QUIRK_NO_DMA quirk present, but we don't force
> anybody _not_ using dma-coherent smmu to have non-cacheable page table
> mappings.
> Having another quirk flag can help in having non-cacheable memory for
> page tables once and for all.
> 
> [1] https://lore.kernel.org/patchwork/patch/1020906/
> 
> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> ---
>   drivers/iommu/io-pgtable-arm.c | 17 ++++++++++++-----
>   drivers/iommu/io-pgtable.h     |  6 ++++++
>   2 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 237cacd4a62b..c76919c30f1a 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -780,7 +780,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
>   	struct arm_lpae_io_pgtable *data;
>   
>   	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA |
> -			    IO_PGTABLE_QUIRK_NON_STRICT))
> +			    IO_PGTABLE_QUIRK_NON_STRICT |
> +			    IO_PGTABLE_QUIRK_NON_COHERENT))
>   		return NULL;
>   
>   	data = arm_lpae_alloc_pgtable(cfg);
> @@ -788,9 +789,14 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
>   		return NULL;
>   
>   	/* TCR */
> -	reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
> -	      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
> -	      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
> +	reg = ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT;
> +
> +	if (cfg->quirks & IO_PGTABLE_QUIRK_NON_COHERENT)
> +		reg |= ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_IRGN0_SHIFT |
> +		       ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_ORGN0_SHIFT;
> +	else
> +		reg |= ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT |
> +		       ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT;
>   
>   	switch (ARM_LPAE_GRANULE(data)) {
>   	case SZ_4K:
> @@ -873,7 +879,8 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
>   
>   	/* The NS quirk doesn't apply at stage 2 */
>   	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_NO_DMA |
> -			    IO_PGTABLE_QUIRK_NON_STRICT))
> +			    IO_PGTABLE_QUIRK_NON_STRICT |
> +			    IO_PGTABLE_QUIRK_NON_COHERENT))
>   		return NULL;
>   
>   	data = arm_lpae_alloc_pgtable(cfg);
> diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
> index 47d5ae559329..46604cf7b017 100644
> --- a/drivers/iommu/io-pgtable.h
> +++ b/drivers/iommu/io-pgtable.h
> @@ -75,6 +75,11 @@ struct io_pgtable_cfg {
>   	 * IO_PGTABLE_QUIRK_NON_STRICT: Skip issuing synchronous leaf TLBIs
>   	 *	on unmap, for DMA domains using the flush queue mechanism for
>   	 *	delayed invalidation.
> +	 *
> +	 * IO_PGTABLE_QUIRK_NON_COHERENT: Enforce non-cacheable mappings for
> +	 *      pagetables even on a coherent SMMU for cases where reducing
> +	 *      snoop traffic/latency on walks outweighs the cost of cache
> +	 *      maintenance on PTE updates.

Hmm, we can't actually "enforce" anything with this as-is - all we're 
doing is setting the attributes that the IOMMU will use for pagetable 
walks, and that has no impact on how the CPU actually writes PTEs to 
memory. In particular, in the case of a hardware-coherent IOMMU which is 
described as such, even if we make the dma_map/sync calls they still 
won't do anything since they 'know' that the IOMMU is coherent. Thus if 
we then set up a non-cacheable TCR we would have no proper means of 
making pagetables correctly visible to the walker.

Aw crap, this is turning out to be a microcosm of the PCIe no-snoop 
mess... :(

To start with, at least, what we want is to set a non-cacheable TCR if 
the IOMMU is *not* coherent (as far as Linux is concerned - that 
includes the firmware-lying-about-the-hardware situation I was alluding 
to before), but even that isn't necessarily as straightforward as it 
seems. AFAICS, if QUIRK_NO_DMA is set then we definitely have to use a 
cacheable TCR; we can't strictly rely on the inverse being true, but in 
practice we *might* get away with it since we already disallow most 
cases in which the DMA API calls would actually do anything for a 
known-coherent IOMMU device.

Robin.

>   	 */
>   	#define IO_PGTABLE_QUIRK_ARM_NS		BIT(0)
>   	#define IO_PGTABLE_QUIRK_NO_PERMS	BIT(1)
> @@ -82,6 +87,7 @@ struct io_pgtable_cfg {
>   	#define IO_PGTABLE_QUIRK_ARM_MTK_4GB	BIT(3)
>   	#define IO_PGTABLE_QUIRK_NO_DMA		BIT(4)
>   	#define IO_PGTABLE_QUIRK_NON_STRICT	BIT(5)
> +	#define IO_PGTABLE_QUIRK_NON_COHERENT	BIT(6)
>   	unsigned long			quirks;
>   	unsigned long			pgsize_bitmap;
>   	unsigned int			ias;
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] iommu/arm-smmu: Add support for non-coherent page table mappings
  2019-01-21  6:05           ` Vivek Gautam
@ 2019-01-22  5:43             ` Will Deacon
  -1 siblings, 0 replies; 23+ messages in thread
From: Will Deacon @ 2019-01-22  5:43 UTC (permalink / raw)
  To: Vivek Gautam
  Cc: linux-arm-msm, open list,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	Robin Murphy, Linux ARM

On Mon, Jan 21, 2019 at 11:35:30AM +0530, Vivek Gautam wrote:
> On Sun, Jan 20, 2019 at 5:31 AM Will Deacon <will.deacon@arm.com> wrote:
> > On Thu, Jan 17, 2019 at 02:57:18PM +0530, Vivek Gautam wrote:
> > > Adding a device tree option for arm smmu to enable non-cacheable
> > > memory for page tables.
> > > We already enable a smmu feature for coherent walk based on
> > > whether the smmu device is dma-coherent or not. Have an option
> > > to enable non-cacheable page table memory to force set it for
> > > particular smmu devices.
> >
> > Hmm, I must be missing something here. What is the difference between this
> > new property, and simply omitting dma-coherent on the SMMU?
> 
> So, this is what I understood from the email thread for Last level
> cache support -
> Robin pointed to the fact that we may need to add support for setting
> non-cacheable
> mappings in the TCR.
> Currently, we don't do that for SMMUs that omit dma-coherent.
> We rely on the interconnect to handle the configuration set in TCR,
> and let interconnect
> ignore the cacheability if it can't support.

I think that's a bug. With that fixed, can you get what you want by omitting
"dma-coherent"?

Will

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] iommu/arm-smmu: Add support for non-coherent page table mappings
@ 2019-01-22  5:43             ` Will Deacon
  0 siblings, 0 replies; 23+ messages in thread
From: Will Deacon @ 2019-01-22  5:43 UTC (permalink / raw)
  To: Vivek Gautam
  Cc: linux-arm-msm,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Robin Murphy, open list, Linux ARM

On Mon, Jan 21, 2019 at 11:35:30AM +0530, Vivek Gautam wrote:
> On Sun, Jan 20, 2019 at 5:31 AM Will Deacon <will.deacon@arm.com> wrote:
> > On Thu, Jan 17, 2019 at 02:57:18PM +0530, Vivek Gautam wrote:
> > > Adding a device tree option for arm smmu to enable non-cacheable
> > > memory for page tables.
> > > We already enable a smmu feature for coherent walk based on
> > > whether the smmu device is dma-coherent or not. Have an option
> > > to enable non-cacheable page table memory to force set it for
> > > particular smmu devices.
> >
> > Hmm, I must be missing something here. What is the difference between this
> > new property, and simply omitting dma-coherent on the SMMU?
> 
> So, this is what I understood from the email thread for Last level
> cache support -
> Robin pointed to the fact that we may need to add support for setting
> non-cacheable
> mappings in the TCR.
> Currently, we don't do that for SMMUs that omit dma-coherent.
> We rely on the interconnect to handle the configuration set in TCR,
> and let interconnect
> ignore the cacheability if it can't support.

I think that's a bug. With that fixed, can you get what you want by omitting
"dma-coherent"?

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/2] iommu/io-pgtable-arm: Add support for non-coherent page tables
  2019-01-21 13:12       ` Robin Murphy
  (?)
@ 2019-01-28 12:20         ` Vivek Gautam
  -1 siblings, 0 replies; 23+ messages in thread
From: Vivek Gautam @ 2019-01-28 12:20 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Will Deacon,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,

On Mon, Jan 21, 2019 at 6:43 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 17/01/2019 09:27, Vivek Gautam wrote:
> >  From Robin's comment [1] about touching TCR configurations -
> >
> > "TBH if we're going to touch the TCR attributes at all then we should
> > probably correct that sloppiness first - there's an occasional argument
> > for using non-cacheable pagetables even on a coherent SMMU if reducing
> > snoop traffic/latency on walks outweighs the cost of cache maintenance
> > on PTE updates, but anyone thinking they can get that by overriding
> > dma-coherent silently gets the worst of both worlds thanks to this
> > current TCR value."
> >
> > We have IO_PGTABLE_QUIRK_NO_DMA quirk present, but we don't force
> > anybody _not_ using dma-coherent smmu to have non-cacheable page table
> > mappings.
> > Having another quirk flag can help in having non-cacheable memory for
> > page tables once and for all.
> >
> > [1] https://lore.kernel.org/patchwork/patch/1020906/
> >
> > Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> > ---
> >   drivers/iommu/io-pgtable-arm.c | 17 ++++++++++++-----
> >   drivers/iommu/io-pgtable.h     |  6 ++++++
> >   2 files changed, 18 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> > index 237cacd4a62b..c76919c30f1a 100644
> > --- a/drivers/iommu/io-pgtable-arm.c
> > +++ b/drivers/iommu/io-pgtable-arm.c
> > @@ -780,7 +780,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
> >       struct arm_lpae_io_pgtable *data;
> >
> >       if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA |
> > -                         IO_PGTABLE_QUIRK_NON_STRICT))
> > +                         IO_PGTABLE_QUIRK_NON_STRICT |
> > +                         IO_PGTABLE_QUIRK_NON_COHERENT))
> >               return NULL;
> >
> >       data = arm_lpae_alloc_pgtable(cfg);
> > @@ -788,9 +789,14 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
> >               return NULL;
> >
> >       /* TCR */
> > -     reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
> > -           (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
> > -           (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
> > +     reg = ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT;
> > +
> > +     if (cfg->quirks & IO_PGTABLE_QUIRK_NON_COHERENT)
> > +             reg |= ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_IRGN0_SHIFT |
> > +                    ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_ORGN0_SHIFT;
> > +     else
> > +             reg |= ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT |
> > +                    ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT;
> >
> >       switch (ARM_LPAE_GRANULE(data)) {
> >       case SZ_4K:
> > @@ -873,7 +879,8 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
> >
> >       /* The NS quirk doesn't apply at stage 2 */
> >       if (cfg->quirks & ~(IO_PGTABLE_QUIRK_NO_DMA |
> > -                         IO_PGTABLE_QUIRK_NON_STRICT))
> > +                         IO_PGTABLE_QUIRK_NON_STRICT |
> > +                         IO_PGTABLE_QUIRK_NON_COHERENT))
> >               return NULL;
> >
> >       data = arm_lpae_alloc_pgtable(cfg);
> > diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
> > index 47d5ae559329..46604cf7b017 100644
> > --- a/drivers/iommu/io-pgtable.h
> > +++ b/drivers/iommu/io-pgtable.h
> > @@ -75,6 +75,11 @@ struct io_pgtable_cfg {
> >        * IO_PGTABLE_QUIRK_NON_STRICT: Skip issuing synchronous leaf TLBIs
> >        *      on unmap, for DMA domains using the flush queue mechanism for
> >        *      delayed invalidation.
> > +      *
> > +      * IO_PGTABLE_QUIRK_NON_COHERENT: Enforce non-cacheable mappings for
> > +      *      pagetables even on a coherent SMMU for cases where reducing
> > +      *      snoop traffic/latency on walks outweighs the cost of cache
> > +      *      maintenance on PTE updates.
>
> Hmm, we can't actually "enforce" anything with this as-is - all we're
> doing is setting the attributes that the IOMMU will use for pagetable
> walks, and that has no impact on how the CPU actually writes PTEs to
> memory. In particular, in the case of a hardware-coherent IOMMU which is
> described as such, even if we make the dma_map/sync calls they still
> won't do anything since they 'know' that the IOMMU is coherent. Thus if
> we then set up a non-cacheable TCR we would have no proper means of
> making pagetables correctly visible to the walker.

Right, I get this point. With non-cacheable TCR, the PTW will always go to
the main memory rather then snooping in CPU caches for the latest page
tables.

>
> Aw crap, this is turning out to be a microcosm of the PCIe no-snoop
> mess... :(
>
> To start with, at least, what we want is to set a non-cacheable TCR if
> the IOMMU is *not* coherent (as far as Linux is concerned - that
> includes the firmware-lying-about-the-hardware situation I was alluding
> to before), but even that isn't necessarily as straightforward as it
> seems. AFAICS, if QUIRK_NO_DMA is set then we definitely have to use a
> cacheable TCR;

Okay, so for QUIRK_NO_DMA we set IRGN0 and ORGN0 to WBWA in TCR,
But, for SMMUs that omit 'dma-coherent' and thus QUIRK_NO_DMA is not set
do we allow them to have a Non-Cacheable set to IRGN0 and ORGN0, as the
PTW will anyways have to read from memory after the CPU flushes the
PTEs to the memory (which we are already doing).

Regards
Vivek


> we can't strictly rely on the inverse being true, but in
> practice we *might* get away with it since we already disallow most
> cases in which the DMA API calls would actually do anything for a
> known-coherent IOMMU device.
>
> Robin.
>
> >        */
> >       #define IO_PGTABLE_QUIRK_ARM_NS         BIT(0)
> >       #define IO_PGTABLE_QUIRK_NO_PERMS       BIT(1)
> > @@ -82,6 +87,7 @@ struct io_pgtable_cfg {
> >       #define IO_PGTABLE_QUIRK_ARM_MTK_4GB    BIT(3)
> >       #define IO_PGTABLE_QUIRK_NO_DMA         BIT(4)
> >       #define IO_PGTABLE_QUIRK_NON_STRICT     BIT(5)
> > +     #define IO_PGTABLE_QUIRK_NON_COHERENT   BIT(6)
> >       unsigned long                   quirks;
> >       unsigned long                   pgsize_bitmap;
> >       unsigned int                    ias;
> >



-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/2] iommu/io-pgtable-arm: Add support for non-coherent page tables
@ 2019-01-28 12:20         ` Vivek Gautam
  0 siblings, 0 replies; 23+ messages in thread
From: Vivek Gautam @ 2019-01-28 12:20 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Will Deacon,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	linux-arm-msm, open list, Linux ARM

On Mon, Jan 21, 2019 at 6:43 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 17/01/2019 09:27, Vivek Gautam wrote:
> >  From Robin's comment [1] about touching TCR configurations -
> >
> > "TBH if we're going to touch the TCR attributes at all then we should
> > probably correct that sloppiness first - there's an occasional argument
> > for using non-cacheable pagetables even on a coherent SMMU if reducing
> > snoop traffic/latency on walks outweighs the cost of cache maintenance
> > on PTE updates, but anyone thinking they can get that by overriding
> > dma-coherent silently gets the worst of both worlds thanks to this
> > current TCR value."
> >
> > We have IO_PGTABLE_QUIRK_NO_DMA quirk present, but we don't force
> > anybody _not_ using dma-coherent smmu to have non-cacheable page table
> > mappings.
> > Having another quirk flag can help in having non-cacheable memory for
> > page tables once and for all.
> >
> > [1] https://lore.kernel.org/patchwork/patch/1020906/
> >
> > Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> > ---
> >   drivers/iommu/io-pgtable-arm.c | 17 ++++++++++++-----
> >   drivers/iommu/io-pgtable.h     |  6 ++++++
> >   2 files changed, 18 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> > index 237cacd4a62b..c76919c30f1a 100644
> > --- a/drivers/iommu/io-pgtable-arm.c
> > +++ b/drivers/iommu/io-pgtable-arm.c
> > @@ -780,7 +780,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
> >       struct arm_lpae_io_pgtable *data;
> >
> >       if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA |
> > -                         IO_PGTABLE_QUIRK_NON_STRICT))
> > +                         IO_PGTABLE_QUIRK_NON_STRICT |
> > +                         IO_PGTABLE_QUIRK_NON_COHERENT))
> >               return NULL;
> >
> >       data = arm_lpae_alloc_pgtable(cfg);
> > @@ -788,9 +789,14 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
> >               return NULL;
> >
> >       /* TCR */
> > -     reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
> > -           (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
> > -           (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
> > +     reg = ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT;
> > +
> > +     if (cfg->quirks & IO_PGTABLE_QUIRK_NON_COHERENT)
> > +             reg |= ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_IRGN0_SHIFT |
> > +                    ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_ORGN0_SHIFT;
> > +     else
> > +             reg |= ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT |
> > +                    ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT;
> >
> >       switch (ARM_LPAE_GRANULE(data)) {
> >       case SZ_4K:
> > @@ -873,7 +879,8 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
> >
> >       /* The NS quirk doesn't apply at stage 2 */
> >       if (cfg->quirks & ~(IO_PGTABLE_QUIRK_NO_DMA |
> > -                         IO_PGTABLE_QUIRK_NON_STRICT))
> > +                         IO_PGTABLE_QUIRK_NON_STRICT |
> > +                         IO_PGTABLE_QUIRK_NON_COHERENT))
> >               return NULL;
> >
> >       data = arm_lpae_alloc_pgtable(cfg);
> > diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
> > index 47d5ae559329..46604cf7b017 100644
> > --- a/drivers/iommu/io-pgtable.h
> > +++ b/drivers/iommu/io-pgtable.h
> > @@ -75,6 +75,11 @@ struct io_pgtable_cfg {
> >        * IO_PGTABLE_QUIRK_NON_STRICT: Skip issuing synchronous leaf TLBIs
> >        *      on unmap, for DMA domains using the flush queue mechanism for
> >        *      delayed invalidation.
> > +      *
> > +      * IO_PGTABLE_QUIRK_NON_COHERENT: Enforce non-cacheable mappings for
> > +      *      pagetables even on a coherent SMMU for cases where reducing
> > +      *      snoop traffic/latency on walks outweighs the cost of cache
> > +      *      maintenance on PTE updates.
>
> Hmm, we can't actually "enforce" anything with this as-is - all we're
> doing is setting the attributes that the IOMMU will use for pagetable
> walks, and that has no impact on how the CPU actually writes PTEs to
> memory. In particular, in the case of a hardware-coherent IOMMU which is
> described as such, even if we make the dma_map/sync calls they still
> won't do anything since they 'know' that the IOMMU is coherent. Thus if
> we then set up a non-cacheable TCR we would have no proper means of
> making pagetables correctly visible to the walker.

Right, I get this point. With non-cacheable TCR, the PTW will always go to
the main memory rather then snooping in CPU caches for the latest page
tables.

>
> Aw crap, this is turning out to be a microcosm of the PCIe no-snoop
> mess... :(
>
> To start with, at least, what we want is to set a non-cacheable TCR if
> the IOMMU is *not* coherent (as far as Linux is concerned - that
> includes the firmware-lying-about-the-hardware situation I was alluding
> to before), but even that isn't necessarily as straightforward as it
> seems. AFAICS, if QUIRK_NO_DMA is set then we definitely have to use a
> cacheable TCR;

Okay, so for QUIRK_NO_DMA we set IRGN0 and ORGN0 to WBWA in TCR,
But, for SMMUs that omit 'dma-coherent' and thus QUIRK_NO_DMA is not set
do we allow them to have a Non-Cacheable set to IRGN0 and ORGN0, as the
PTW will anyways have to read from memory after the CPU flushes the
PTEs to the memory (which we are already doing).

Regards
Vivek


> we can't strictly rely on the inverse being true, but in
> practice we *might* get away with it since we already disallow most
> cases in which the DMA API calls would actually do anything for a
> known-coherent IOMMU device.
>
> Robin.
>
> >        */
> >       #define IO_PGTABLE_QUIRK_ARM_NS         BIT(0)
> >       #define IO_PGTABLE_QUIRK_NO_PERMS       BIT(1)
> > @@ -82,6 +87,7 @@ struct io_pgtable_cfg {
> >       #define IO_PGTABLE_QUIRK_ARM_MTK_4GB    BIT(3)
> >       #define IO_PGTABLE_QUIRK_NO_DMA         BIT(4)
> >       #define IO_PGTABLE_QUIRK_NON_STRICT     BIT(5)
> > +     #define IO_PGTABLE_QUIRK_NON_COHERENT   BIT(6)
> >       unsigned long                   quirks;
> >       unsigned long                   pgsize_bitmap;
> >       unsigned int                    ias;
> >



-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/2] iommu/io-pgtable-arm: Add support for non-coherent page tables
@ 2019-01-28 12:20         ` Vivek Gautam
  0 siblings, 0 replies; 23+ messages in thread
From: Vivek Gautam @ 2019-01-28 12:20 UTC (permalink / raw)
  To: Robin Murphy
  Cc: linux-arm-msm,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Will Deacon, open list,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Linux ARM

On Mon, Jan 21, 2019 at 6:43 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 17/01/2019 09:27, Vivek Gautam wrote:
> >  From Robin's comment [1] about touching TCR configurations -
> >
> > "TBH if we're going to touch the TCR attributes at all then we should
> > probably correct that sloppiness first - there's an occasional argument
> > for using non-cacheable pagetables even on a coherent SMMU if reducing
> > snoop traffic/latency on walks outweighs the cost of cache maintenance
> > on PTE updates, but anyone thinking they can get that by overriding
> > dma-coherent silently gets the worst of both worlds thanks to this
> > current TCR value."
> >
> > We have IO_PGTABLE_QUIRK_NO_DMA quirk present, but we don't force
> > anybody _not_ using dma-coherent smmu to have non-cacheable page table
> > mappings.
> > Having another quirk flag can help in having non-cacheable memory for
> > page tables once and for all.
> >
> > [1] https://lore.kernel.org/patchwork/patch/1020906/
> >
> > Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> > ---
> >   drivers/iommu/io-pgtable-arm.c | 17 ++++++++++++-----
> >   drivers/iommu/io-pgtable.h     |  6 ++++++
> >   2 files changed, 18 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> > index 237cacd4a62b..c76919c30f1a 100644
> > --- a/drivers/iommu/io-pgtable-arm.c
> > +++ b/drivers/iommu/io-pgtable-arm.c
> > @@ -780,7 +780,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
> >       struct arm_lpae_io_pgtable *data;
> >
> >       if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA |
> > -                         IO_PGTABLE_QUIRK_NON_STRICT))
> > +                         IO_PGTABLE_QUIRK_NON_STRICT |
> > +                         IO_PGTABLE_QUIRK_NON_COHERENT))
> >               return NULL;
> >
> >       data = arm_lpae_alloc_pgtable(cfg);
> > @@ -788,9 +789,14 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
> >               return NULL;
> >
> >       /* TCR */
> > -     reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
> > -           (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
> > -           (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
> > +     reg = ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT;
> > +
> > +     if (cfg->quirks & IO_PGTABLE_QUIRK_NON_COHERENT)
> > +             reg |= ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_IRGN0_SHIFT |
> > +                    ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_ORGN0_SHIFT;
> > +     else
> > +             reg |= ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT |
> > +                    ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT;
> >
> >       switch (ARM_LPAE_GRANULE(data)) {
> >       case SZ_4K:
> > @@ -873,7 +879,8 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
> >
> >       /* The NS quirk doesn't apply at stage 2 */
> >       if (cfg->quirks & ~(IO_PGTABLE_QUIRK_NO_DMA |
> > -                         IO_PGTABLE_QUIRK_NON_STRICT))
> > +                         IO_PGTABLE_QUIRK_NON_STRICT |
> > +                         IO_PGTABLE_QUIRK_NON_COHERENT))
> >               return NULL;
> >
> >       data = arm_lpae_alloc_pgtable(cfg);
> > diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
> > index 47d5ae559329..46604cf7b017 100644
> > --- a/drivers/iommu/io-pgtable.h
> > +++ b/drivers/iommu/io-pgtable.h
> > @@ -75,6 +75,11 @@ struct io_pgtable_cfg {
> >        * IO_PGTABLE_QUIRK_NON_STRICT: Skip issuing synchronous leaf TLBIs
> >        *      on unmap, for DMA domains using the flush queue mechanism for
> >        *      delayed invalidation.
> > +      *
> > +      * IO_PGTABLE_QUIRK_NON_COHERENT: Enforce non-cacheable mappings for
> > +      *      pagetables even on a coherent SMMU for cases where reducing
> > +      *      snoop traffic/latency on walks outweighs the cost of cache
> > +      *      maintenance on PTE updates.
>
> Hmm, we can't actually "enforce" anything with this as-is - all we're
> doing is setting the attributes that the IOMMU will use for pagetable
> walks, and that has no impact on how the CPU actually writes PTEs to
> memory. In particular, in the case of a hardware-coherent IOMMU which is
> described as such, even if we make the dma_map/sync calls they still
> won't do anything since they 'know' that the IOMMU is coherent. Thus if
> we then set up a non-cacheable TCR we would have no proper means of
> making pagetables correctly visible to the walker.

Right, I get this point. With non-cacheable TCR, the PTW will always go to
the main memory rather then snooping in CPU caches for the latest page
tables.

>
> Aw crap, this is turning out to be a microcosm of the PCIe no-snoop
> mess... :(
>
> To start with, at least, what we want is to set a non-cacheable TCR if
> the IOMMU is *not* coherent (as far as Linux is concerned - that
> includes the firmware-lying-about-the-hardware situation I was alluding
> to before), but even that isn't necessarily as straightforward as it
> seems. AFAICS, if QUIRK_NO_DMA is set then we definitely have to use a
> cacheable TCR;

Okay, so for QUIRK_NO_DMA we set IRGN0 and ORGN0 to WBWA in TCR,
But, for SMMUs that omit 'dma-coherent' and thus QUIRK_NO_DMA is not set
do we allow them to have a Non-Cacheable set to IRGN0 and ORGN0, as the
PTW will anyways have to read from memory after the CPU flushes the
PTEs to the memory (which we are already doing).

Regards
Vivek


> we can't strictly rely on the inverse being true, but in
> practice we *might* get away with it since we already disallow most
> cases in which the DMA API calls would actually do anything for a
> known-coherent IOMMU device.
>
> Robin.
>
> >        */
> >       #define IO_PGTABLE_QUIRK_ARM_NS         BIT(0)
> >       #define IO_PGTABLE_QUIRK_NO_PERMS       BIT(1)
> > @@ -82,6 +87,7 @@ struct io_pgtable_cfg {
> >       #define IO_PGTABLE_QUIRK_ARM_MTK_4GB    BIT(3)
> >       #define IO_PGTABLE_QUIRK_NO_DMA         BIT(4)
> >       #define IO_PGTABLE_QUIRK_NON_STRICT     BIT(5)
> > +     #define IO_PGTABLE_QUIRK_NON_COHERENT   BIT(6)
> >       unsigned long                   quirks;
> >       unsigned long                   pgsize_bitmap;
> >       unsigned int                    ias;
> >



-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] iommu/arm-smmu: Add support for non-coherent page table mappings
  2019-01-22  5:43             ` Will Deacon
@ 2019-01-29 10:43               ` Vivek Gautam
  -1 siblings, 0 replies; 23+ messages in thread
From: Vivek Gautam @ 2019-01-29 10:43 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-msm, open list,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	Robin Murphy, Linux ARM

Hi Will,

On Tue, Jan 22, 2019 at 11:14 AM Will Deacon <will.deacon@arm.com> wrote:
>
> On Mon, Jan 21, 2019 at 11:35:30AM +0530, Vivek Gautam wrote:
> > On Sun, Jan 20, 2019 at 5:31 AM Will Deacon <will.deacon@arm.com> wrote:
> > > On Thu, Jan 17, 2019 at 02:57:18PM +0530, Vivek Gautam wrote:
> > > > Adding a device tree option for arm smmu to enable non-cacheable
> > > > memory for page tables.
> > > > We already enable a smmu feature for coherent walk based on
> > > > whether the smmu device is dma-coherent or not. Have an option
> > > > to enable non-cacheable page table memory to force set it for
> > > > particular smmu devices.
> > >
> > > Hmm, I must be missing something here. What is the difference between this
> > > new property, and simply omitting dma-coherent on the SMMU?
> >
> > So, this is what I understood from the email thread for Last level
> > cache support -
> > Robin pointed to the fact that we may need to add support for setting
> > non-cacheable
> > mappings in the TCR.
> > Currently, we don't do that for SMMUs that omit dma-coherent.
> > We rely on the interconnect to handle the configuration set in TCR,
> > and let interconnect
> > ignore the cacheability if it can't support.
>
> I think that's a bug. With that fixed, can you get what you want by omitting
> "dma-coherent"?

Based on the discussion on the first patch in this series [1], I can
update the series.
First thing can be -
if QUIRK_NO_DMA is set (i.e. the IOMMU _is_ coherent) then we use a
cacheable TCR;
So, we may need an additional check for this when setting the TCR.

For the second case -
IOMMUs that are *not* coherent, i.e ones that are omitting
'dma-coherent' property,
anyways have to access the page table directly from memory. We take
care of the CPU
side of this by allocating non-coherent memory, and making sure that we sync the
PTEs from map call.
Shouldn't we mark TCR for these IOMMUs as non-cacheable for inner and outer
cacheability attribute?


[1] https://lore.kernel.org/patchwork/patch/1032939/

Regards
Vivek

>
> Will



-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] iommu/arm-smmu: Add support for non-coherent page table mappings
@ 2019-01-29 10:43               ` Vivek Gautam
  0 siblings, 0 replies; 23+ messages in thread
From: Vivek Gautam @ 2019-01-29 10:43 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-msm,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Robin Murphy, open list, Linux ARM

Hi Will,

On Tue, Jan 22, 2019 at 11:14 AM Will Deacon <will.deacon@arm.com> wrote:
>
> On Mon, Jan 21, 2019 at 11:35:30AM +0530, Vivek Gautam wrote:
> > On Sun, Jan 20, 2019 at 5:31 AM Will Deacon <will.deacon@arm.com> wrote:
> > > On Thu, Jan 17, 2019 at 02:57:18PM +0530, Vivek Gautam wrote:
> > > > Adding a device tree option for arm smmu to enable non-cacheable
> > > > memory for page tables.
> > > > We already enable a smmu feature for coherent walk based on
> > > > whether the smmu device is dma-coherent or not. Have an option
> > > > to enable non-cacheable page table memory to force set it for
> > > > particular smmu devices.
> > >
> > > Hmm, I must be missing something here. What is the difference between this
> > > new property, and simply omitting dma-coherent on the SMMU?
> >
> > So, this is what I understood from the email thread for Last level
> > cache support -
> > Robin pointed to the fact that we may need to add support for setting
> > non-cacheable
> > mappings in the TCR.
> > Currently, we don't do that for SMMUs that omit dma-coherent.
> > We rely on the interconnect to handle the configuration set in TCR,
> > and let interconnect
> > ignore the cacheability if it can't support.
>
> I think that's a bug. With that fixed, can you get what you want by omitting
> "dma-coherent"?

Based on the discussion on the first patch in this series [1], I can
update the series.
First thing can be -
if QUIRK_NO_DMA is set (i.e. the IOMMU _is_ coherent) then we use a
cacheable TCR;
So, we may need an additional check for this when setting the TCR.

For the second case -
IOMMUs that are *not* coherent, i.e ones that are omitting
'dma-coherent' property,
anyways have to access the page table directly from memory. We take
care of the CPU
side of this by allocating non-coherent memory, and making sure that we sync the
PTEs from map call.
Shouldn't we mark TCR for these IOMMUs as non-cacheable for inner and outer
cacheability attribute?


[1] https://lore.kernel.org/patchwork/patch/1032939/

Regards
Vivek

>
> Will



-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2019-01-29 10:43 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-17  9:27 [PATCH 0/2] iommu/arm: Add support for non-coherent page tables Vivek Gautam
2019-01-17  9:27 ` Vivek Gautam
     [not found] ` <20190117092718.1396-1-vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2019-01-17  9:27   ` [PATCH 1/2] iommu/io-pgtable-arm: " Vivek Gautam
2019-01-17  9:27     ` Vivek Gautam
2019-01-17  9:27     ` Vivek Gautam
2019-01-21 13:12     ` Robin Murphy
2019-01-21 13:12       ` Robin Murphy
2019-01-28 12:20       ` Vivek Gautam
2019-01-28 12:20         ` Vivek Gautam
2019-01-28 12:20         ` Vivek Gautam
2019-01-17  9:27   ` [PATCH 2/2] iommu/arm-smmu: Add support for non-coherent page table mappings Vivek Gautam
2019-01-17  9:27     ` Vivek Gautam
2019-01-17  9:27     ` Vivek Gautam
     [not found]     ` <20190117092718.1396-3-vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2019-01-20  0:01       ` Will Deacon
2019-01-20  0:01         ` Will Deacon
2019-01-20  0:01         ` Will Deacon
2019-01-21  6:05         ` Vivek Gautam
2019-01-21  6:05           ` Vivek Gautam
2019-01-21  6:05           ` Vivek Gautam
2019-01-22  5:43           ` Will Deacon
2019-01-22  5:43             ` Will Deacon
2019-01-29 10:43             ` Vivek Gautam
2019-01-29 10:43               ` Vivek Gautam

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.