All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-15 23:51 ` Robin Murphy
  0 siblings, 0 replies; 88+ messages in thread
From: Robin Murphy @ 2020-09-15 23:51 UTC (permalink / raw)
  To: will, robh, tomeu.vizoso, steven.price, alyssa.rosenzweig,
	khilman, narmstrong, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

Hi all,

I polished up my original proof-of-concept a little while back, but now
that I've got my hands on my Juno again I've been able to actually test
it to my satisfaction, so here are proper patches!

It probably makes sense for patches #1 and #2 to stay together and both
go via drm-misc, provided Will's OK with that.

Robin.


Robin Murphy (3):
  iommu/io-pgtable-arm: Support coherency for Mali LPAE
  drm/panfrost: Support cache-coherent integrations
  arm64: dts: meson: Describe G12b GPU as coherent

 arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
 drivers/gpu/drm/panfrost/panfrost_device.h  | 1 +
 drivers/gpu/drm/panfrost/panfrost_drv.c     | 2 ++
 drivers/gpu/drm/panfrost/panfrost_gem.c     | 2 ++
 drivers/gpu/drm/panfrost/panfrost_mmu.c     | 1 +
 drivers/iommu/io-pgtable-arm.c              | 5 ++++-
 6 files changed, 14 insertions(+), 1 deletion(-)

-- 
2.28.0.dirty

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-15 23:51 ` Robin Murphy
  0 siblings, 0 replies; 88+ messages in thread
From: Robin Murphy @ 2020-09-15 23:51 UTC (permalink / raw)
  To: will, robh, tomeu.vizoso, steven.price, alyssa.rosenzweig,
	khilman, narmstrong, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

Hi all,

I polished up my original proof-of-concept a little while back, but now
that I've got my hands on my Juno again I've been able to actually test
it to my satisfaction, so here are proper patches!

It probably makes sense for patches #1 and #2 to stay together and both
go via drm-misc, provided Will's OK with that.

Robin.


Robin Murphy (3):
  iommu/io-pgtable-arm: Support coherency for Mali LPAE
  drm/panfrost: Support cache-coherent integrations
  arm64: dts: meson: Describe G12b GPU as coherent

 arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
 drivers/gpu/drm/panfrost/panfrost_device.h  | 1 +
 drivers/gpu/drm/panfrost/panfrost_drv.c     | 2 ++
 drivers/gpu/drm/panfrost/panfrost_gem.c     | 2 ++
 drivers/gpu/drm/panfrost/panfrost_mmu.c     | 1 +
 drivers/iommu/io-pgtable-arm.c              | 5 ++++-
 6 files changed, 14 insertions(+), 1 deletion(-)

-- 
2.28.0.dirty


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-15 23:51 ` Robin Murphy
  0 siblings, 0 replies; 88+ messages in thread
From: Robin Murphy @ 2020-09-15 23:51 UTC (permalink / raw)
  To: will, robh, tomeu.vizoso, steven.price, alyssa.rosenzweig,
	khilman, narmstrong, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

Hi all,

I polished up my original proof-of-concept a little while back, but now
that I've got my hands on my Juno again I've been able to actually test
it to my satisfaction, so here are proper patches!

It probably makes sense for patches #1 and #2 to stay together and both
go via drm-misc, provided Will's OK with that.

Robin.


Robin Murphy (3):
  iommu/io-pgtable-arm: Support coherency for Mali LPAE
  drm/panfrost: Support cache-coherent integrations
  arm64: dts: meson: Describe G12b GPU as coherent

 arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
 drivers/gpu/drm/panfrost/panfrost_device.h  | 1 +
 drivers/gpu/drm/panfrost/panfrost_drv.c     | 2 ++
 drivers/gpu/drm/panfrost/panfrost_gem.c     | 2 ++
 drivers/gpu/drm/panfrost/panfrost_mmu.c     | 1 +
 drivers/iommu/io-pgtable-arm.c              | 5 ++++-
 6 files changed, 14 insertions(+), 1 deletion(-)

-- 
2.28.0.dirty

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-15 23:51 ` Robin Murphy
  0 siblings, 0 replies; 88+ messages in thread
From: Robin Murphy @ 2020-09-15 23:51 UTC (permalink / raw)
  To: will, robh, tomeu.vizoso, steven.price, alyssa.rosenzweig,
	khilman, narmstrong, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

Hi all,

I polished up my original proof-of-concept a little while back, but now
that I've got my hands on my Juno again I've been able to actually test
it to my satisfaction, so here are proper patches!

It probably makes sense for patches #1 and #2 to stay together and both
go via drm-misc, provided Will's OK with that.

Robin.


Robin Murphy (3):
  iommu/io-pgtable-arm: Support coherency for Mali LPAE
  drm/panfrost: Support cache-coherent integrations
  arm64: dts: meson: Describe G12b GPU as coherent

 arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
 drivers/gpu/drm/panfrost/panfrost_device.h  | 1 +
 drivers/gpu/drm/panfrost/panfrost_drv.c     | 2 ++
 drivers/gpu/drm/panfrost/panfrost_gem.c     | 2 ++
 drivers/gpu/drm/panfrost/panfrost_mmu.c     | 1 +
 drivers/iommu/io-pgtable-arm.c              | 5 ++++-
 6 files changed, 14 insertions(+), 1 deletion(-)

-- 
2.28.0.dirty


_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH 1/3] iommu/io-pgtable-arm: Support coherency for Mali LPAE
  2020-09-15 23:51 ` Robin Murphy
  (?)
  (?)
@ 2020-09-15 23:51   ` Robin Murphy
  -1 siblings, 0 replies; 88+ messages in thread
From: Robin Murphy @ 2020-09-15 23:51 UTC (permalink / raw)
  To: will, robh, tomeu.vizoso, steven.price, alyssa.rosenzweig,
	khilman, narmstrong, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

Midgard GPUs have ACE-Lite master interfaces which allows systems to
integrate them in an I/O-coherent manner. It seems that from the GPU's
viewpoint, the rest of the system is its outer shareable domain, and so
even when snoop signals are wired up, they are only emitted for outer
shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
indeed get coherent pagetable walks working nicely for the coherent
T620 in the Arm Juno SoC.

Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/io-pgtable-arm.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index dc7bcf858b6d..e47012006dcc 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -440,7 +440,7 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
 				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
 	}
 
-	if (prot & IOMMU_CACHE)
+	if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
 		pte |= ARM_LPAE_PTE_SH_IS;
 	else
 		pte |= ARM_LPAE_PTE_SH_OS;
@@ -1049,6 +1049,9 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
 	cfg->arm_mali_lpae_cfg.transtab = virt_to_phys(data->pgd) |
 					  ARM_MALI_LPAE_TTBR_READ_INNER |
 					  ARM_MALI_LPAE_TTBR_ADRMODE_TABLE;
+	if (cfg->coherent_walk)
+		cfg->arm_mali_lpae_cfg.transtab |= ARM_MALI_LPAE_TTBR_SHARE_OUTER;
+
 	return &data->iop;
 
 out_free_data:
-- 
2.28.0.dirty

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH 1/3] iommu/io-pgtable-arm: Support coherency for Mali LPAE
@ 2020-09-15 23:51   ` Robin Murphy
  0 siblings, 0 replies; 88+ messages in thread
From: Robin Murphy @ 2020-09-15 23:51 UTC (permalink / raw)
  To: will, robh, tomeu.vizoso, steven.price, alyssa.rosenzweig,
	khilman, narmstrong, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

Midgard GPUs have ACE-Lite master interfaces which allows systems to
integrate them in an I/O-coherent manner. It seems that from the GPU's
viewpoint, the rest of the system is its outer shareable domain, and so
even when snoop signals are wired up, they are only emitted for outer
shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
indeed get coherent pagetable walks working nicely for the coherent
T620 in the Arm Juno SoC.

Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/io-pgtable-arm.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index dc7bcf858b6d..e47012006dcc 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -440,7 +440,7 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
 				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
 	}
 
-	if (prot & IOMMU_CACHE)
+	if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
 		pte |= ARM_LPAE_PTE_SH_IS;
 	else
 		pte |= ARM_LPAE_PTE_SH_OS;
@@ -1049,6 +1049,9 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
 	cfg->arm_mali_lpae_cfg.transtab = virt_to_phys(data->pgd) |
 					  ARM_MALI_LPAE_TTBR_READ_INNER |
 					  ARM_MALI_LPAE_TTBR_ADRMODE_TABLE;
+	if (cfg->coherent_walk)
+		cfg->arm_mali_lpae_cfg.transtab |= ARM_MALI_LPAE_TTBR_SHARE_OUTER;
+
 	return &data->iop;
 
 out_free_data:
-- 
2.28.0.dirty


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH 1/3] iommu/io-pgtable-arm: Support coherency for Mali LPAE
@ 2020-09-15 23:51   ` Robin Murphy
  0 siblings, 0 replies; 88+ messages in thread
From: Robin Murphy @ 2020-09-15 23:51 UTC (permalink / raw)
  To: will, robh, tomeu.vizoso, steven.price, alyssa.rosenzweig,
	khilman, narmstrong, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

Midgard GPUs have ACE-Lite master interfaces which allows systems to
integrate them in an I/O-coherent manner. It seems that from the GPU's
viewpoint, the rest of the system is its outer shareable domain, and so
even when snoop signals are wired up, they are only emitted for outer
shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
indeed get coherent pagetable walks working nicely for the coherent
T620 in the Arm Juno SoC.

Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/io-pgtable-arm.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index dc7bcf858b6d..e47012006dcc 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -440,7 +440,7 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
 				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
 	}
 
-	if (prot & IOMMU_CACHE)
+	if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
 		pte |= ARM_LPAE_PTE_SH_IS;
 	else
 		pte |= ARM_LPAE_PTE_SH_OS;
@@ -1049,6 +1049,9 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
 	cfg->arm_mali_lpae_cfg.transtab = virt_to_phys(data->pgd) |
 					  ARM_MALI_LPAE_TTBR_READ_INNER |
 					  ARM_MALI_LPAE_TTBR_ADRMODE_TABLE;
+	if (cfg->coherent_walk)
+		cfg->arm_mali_lpae_cfg.transtab |= ARM_MALI_LPAE_TTBR_SHARE_OUTER;
+
 	return &data->iop;
 
 out_free_data:
-- 
2.28.0.dirty

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH 1/3] iommu/io-pgtable-arm: Support coherency for Mali LPAE
@ 2020-09-15 23:51   ` Robin Murphy
  0 siblings, 0 replies; 88+ messages in thread
From: Robin Murphy @ 2020-09-15 23:51 UTC (permalink / raw)
  To: will, robh, tomeu.vizoso, steven.price, alyssa.rosenzweig,
	khilman, narmstrong, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

Midgard GPUs have ACE-Lite master interfaces which allows systems to
integrate them in an I/O-coherent manner. It seems that from the GPU's
viewpoint, the rest of the system is its outer shareable domain, and so
even when snoop signals are wired up, they are only emitted for outer
shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
indeed get coherent pagetable walks working nicely for the coherent
T620 in the Arm Juno SoC.

Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/io-pgtable-arm.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index dc7bcf858b6d..e47012006dcc 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -440,7 +440,7 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
 				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
 	}
 
-	if (prot & IOMMU_CACHE)
+	if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
 		pte |= ARM_LPAE_PTE_SH_IS;
 	else
 		pte |= ARM_LPAE_PTE_SH_OS;
@@ -1049,6 +1049,9 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
 	cfg->arm_mali_lpae_cfg.transtab = virt_to_phys(data->pgd) |
 					  ARM_MALI_LPAE_TTBR_READ_INNER |
 					  ARM_MALI_LPAE_TTBR_ADRMODE_TABLE;
+	if (cfg->coherent_walk)
+		cfg->arm_mali_lpae_cfg.transtab |= ARM_MALI_LPAE_TTBR_SHARE_OUTER;
+
 	return &data->iop;
 
 out_free_data:
-- 
2.28.0.dirty


_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH 2/3] drm/panfrost: Support cache-coherent integrations
  2020-09-15 23:51 ` Robin Murphy
  (?)
  (?)
@ 2020-09-15 23:51   ` Robin Murphy
  -1 siblings, 0 replies; 88+ messages in thread
From: Robin Murphy @ 2020-09-15 23:51 UTC (permalink / raw)
  To: will, robh, tomeu.vizoso, steven.price, alyssa.rosenzweig,
	khilman, narmstrong, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

When the GPU's ACE-Lite interface is fully wired up and capable of
snooping CPU caches, it may be described as "dma-coherent" in
devicetree, which will already inform the DMA layer not to perform
unnecessary cache maintenance. However, we still need to ensure that
the GPU uses the appropriate cacheable outer-shareable attributes in
order to generate the requisite snoop signals, and that CPU mappings
don't create a mismatch by using a non-cacheable type either.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/gpu/drm/panfrost/panfrost_device.h | 1 +
 drivers/gpu/drm/panfrost/panfrost_drv.c    | 2 ++
 drivers/gpu/drm/panfrost/panfrost_gem.c    | 2 ++
 drivers/gpu/drm/panfrost/panfrost_mmu.c    | 1 +
 4 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
index c30c719a8059..b31f45315e96 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -84,6 +84,7 @@ struct panfrost_device {
 	/* pm_domains for devices with more than one. */
 	struct device *pm_domain_devs[MAX_PM_DOMAINS];
 	struct device_link *pm_domain_links[MAX_PM_DOMAINS];
+	bool coherent;
 
 	struct panfrost_features features;
 	const struct panfrost_compatible *comp;
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index ada51df9a7a3..2a6f2f716b2f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -588,6 +588,8 @@ static int panfrost_probe(struct platform_device *pdev)
 	if (!pfdev->comp)
 		return -ENODEV;
 
+	pfdev->coherent = device_get_dma_attr(&pdev->dev) == DEV_DMA_COHERENT;
+
 	/* Allocate and initialze the DRM device. */
 	ddev = drm_dev_alloc(&panfrost_drm_driver, &pdev->dev);
 	if (IS_ERR(ddev))
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
index 33355dd302f1..cdf1a8754eba 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -220,6 +220,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = {
  */
 struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t size)
 {
+	struct panfrost_device *pfdev = dev->dev_private;
 	struct panfrost_gem_object *obj;
 
 	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
@@ -229,6 +230,7 @@ struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t
 	INIT_LIST_HEAD(&obj->mappings.list);
 	mutex_init(&obj->mappings.lock);
 	obj->base.base.funcs = &panfrost_gem_funcs;
+	obj->base.map_cached = pfdev->coherent;
 
 	return &obj->base.base;
 }
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index e8f7b11352d2..8852fd378f7a 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -371,6 +371,7 @@ int panfrost_mmu_pgtable_alloc(struct panfrost_file_priv *priv)
 		.pgsize_bitmap	= SZ_4K | SZ_2M,
 		.ias		= FIELD_GET(0xff, pfdev->features.mmu_features),
 		.oas		= FIELD_GET(0xff00, pfdev->features.mmu_features),
+		.coherent_walk	= pfdev->coherent,
 		.tlb		= &mmu_tlb_ops,
 		.iommu_dev	= pfdev->dev,
 	};
-- 
2.28.0.dirty

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH 2/3] drm/panfrost: Support cache-coherent integrations
@ 2020-09-15 23:51   ` Robin Murphy
  0 siblings, 0 replies; 88+ messages in thread
From: Robin Murphy @ 2020-09-15 23:51 UTC (permalink / raw)
  To: will, robh, tomeu.vizoso, steven.price, alyssa.rosenzweig,
	khilman, narmstrong, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

When the GPU's ACE-Lite interface is fully wired up and capable of
snooping CPU caches, it may be described as "dma-coherent" in
devicetree, which will already inform the DMA layer not to perform
unnecessary cache maintenance. However, we still need to ensure that
the GPU uses the appropriate cacheable outer-shareable attributes in
order to generate the requisite snoop signals, and that CPU mappings
don't create a mismatch by using a non-cacheable type either.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/gpu/drm/panfrost/panfrost_device.h | 1 +
 drivers/gpu/drm/panfrost/panfrost_drv.c    | 2 ++
 drivers/gpu/drm/panfrost/panfrost_gem.c    | 2 ++
 drivers/gpu/drm/panfrost/panfrost_mmu.c    | 1 +
 4 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
index c30c719a8059..b31f45315e96 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -84,6 +84,7 @@ struct panfrost_device {
 	/* pm_domains for devices with more than one. */
 	struct device *pm_domain_devs[MAX_PM_DOMAINS];
 	struct device_link *pm_domain_links[MAX_PM_DOMAINS];
+	bool coherent;
 
 	struct panfrost_features features;
 	const struct panfrost_compatible *comp;
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index ada51df9a7a3..2a6f2f716b2f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -588,6 +588,8 @@ static int panfrost_probe(struct platform_device *pdev)
 	if (!pfdev->comp)
 		return -ENODEV;
 
+	pfdev->coherent = device_get_dma_attr(&pdev->dev) == DEV_DMA_COHERENT;
+
 	/* Allocate and initialze the DRM device. */
 	ddev = drm_dev_alloc(&panfrost_drm_driver, &pdev->dev);
 	if (IS_ERR(ddev))
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
index 33355dd302f1..cdf1a8754eba 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -220,6 +220,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = {
  */
 struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t size)
 {
+	struct panfrost_device *pfdev = dev->dev_private;
 	struct panfrost_gem_object *obj;
 
 	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
@@ -229,6 +230,7 @@ struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t
 	INIT_LIST_HEAD(&obj->mappings.list);
 	mutex_init(&obj->mappings.lock);
 	obj->base.base.funcs = &panfrost_gem_funcs;
+	obj->base.map_cached = pfdev->coherent;
 
 	return &obj->base.base;
 }
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index e8f7b11352d2..8852fd378f7a 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -371,6 +371,7 @@ int panfrost_mmu_pgtable_alloc(struct panfrost_file_priv *priv)
 		.pgsize_bitmap	= SZ_4K | SZ_2M,
 		.ias		= FIELD_GET(0xff, pfdev->features.mmu_features),
 		.oas		= FIELD_GET(0xff00, pfdev->features.mmu_features),
+		.coherent_walk	= pfdev->coherent,
 		.tlb		= &mmu_tlb_ops,
 		.iommu_dev	= pfdev->dev,
 	};
-- 
2.28.0.dirty


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH 2/3] drm/panfrost: Support cache-coherent integrations
@ 2020-09-15 23:51   ` Robin Murphy
  0 siblings, 0 replies; 88+ messages in thread
From: Robin Murphy @ 2020-09-15 23:51 UTC (permalink / raw)
  To: will, robh, tomeu.vizoso, steven.price, alyssa.rosenzweig,
	khilman, narmstrong, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

When the GPU's ACE-Lite interface is fully wired up and capable of
snooping CPU caches, it may be described as "dma-coherent" in
devicetree, which will already inform the DMA layer not to perform
unnecessary cache maintenance. However, we still need to ensure that
the GPU uses the appropriate cacheable outer-shareable attributes in
order to generate the requisite snoop signals, and that CPU mappings
don't create a mismatch by using a non-cacheable type either.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/gpu/drm/panfrost/panfrost_device.h | 1 +
 drivers/gpu/drm/panfrost/panfrost_drv.c    | 2 ++
 drivers/gpu/drm/panfrost/panfrost_gem.c    | 2 ++
 drivers/gpu/drm/panfrost/panfrost_mmu.c    | 1 +
 4 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
index c30c719a8059..b31f45315e96 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -84,6 +84,7 @@ struct panfrost_device {
 	/* pm_domains for devices with more than one. */
 	struct device *pm_domain_devs[MAX_PM_DOMAINS];
 	struct device_link *pm_domain_links[MAX_PM_DOMAINS];
+	bool coherent;
 
 	struct panfrost_features features;
 	const struct panfrost_compatible *comp;
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index ada51df9a7a3..2a6f2f716b2f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -588,6 +588,8 @@ static int panfrost_probe(struct platform_device *pdev)
 	if (!pfdev->comp)
 		return -ENODEV;
 
+	pfdev->coherent = device_get_dma_attr(&pdev->dev) == DEV_DMA_COHERENT;
+
 	/* Allocate and initialze the DRM device. */
 	ddev = drm_dev_alloc(&panfrost_drm_driver, &pdev->dev);
 	if (IS_ERR(ddev))
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
index 33355dd302f1..cdf1a8754eba 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -220,6 +220,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = {
  */
 struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t size)
 {
+	struct panfrost_device *pfdev = dev->dev_private;
 	struct panfrost_gem_object *obj;
 
 	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
@@ -229,6 +230,7 @@ struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t
 	INIT_LIST_HEAD(&obj->mappings.list);
 	mutex_init(&obj->mappings.lock);
 	obj->base.base.funcs = &panfrost_gem_funcs;
+	obj->base.map_cached = pfdev->coherent;
 
 	return &obj->base.base;
 }
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index e8f7b11352d2..8852fd378f7a 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -371,6 +371,7 @@ int panfrost_mmu_pgtable_alloc(struct panfrost_file_priv *priv)
 		.pgsize_bitmap	= SZ_4K | SZ_2M,
 		.ias		= FIELD_GET(0xff, pfdev->features.mmu_features),
 		.oas		= FIELD_GET(0xff00, pfdev->features.mmu_features),
+		.coherent_walk	= pfdev->coherent,
 		.tlb		= &mmu_tlb_ops,
 		.iommu_dev	= pfdev->dev,
 	};
-- 
2.28.0.dirty

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH 2/3] drm/panfrost: Support cache-coherent integrations
@ 2020-09-15 23:51   ` Robin Murphy
  0 siblings, 0 replies; 88+ messages in thread
From: Robin Murphy @ 2020-09-15 23:51 UTC (permalink / raw)
  To: will, robh, tomeu.vizoso, steven.price, alyssa.rosenzweig,
	khilman, narmstrong, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

When the GPU's ACE-Lite interface is fully wired up and capable of
snooping CPU caches, it may be described as "dma-coherent" in
devicetree, which will already inform the DMA layer not to perform
unnecessary cache maintenance. However, we still need to ensure that
the GPU uses the appropriate cacheable outer-shareable attributes in
order to generate the requisite snoop signals, and that CPU mappings
don't create a mismatch by using a non-cacheable type either.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/gpu/drm/panfrost/panfrost_device.h | 1 +
 drivers/gpu/drm/panfrost/panfrost_drv.c    | 2 ++
 drivers/gpu/drm/panfrost/panfrost_gem.c    | 2 ++
 drivers/gpu/drm/panfrost/panfrost_mmu.c    | 1 +
 4 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
index c30c719a8059..b31f45315e96 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -84,6 +84,7 @@ struct panfrost_device {
 	/* pm_domains for devices with more than one. */
 	struct device *pm_domain_devs[MAX_PM_DOMAINS];
 	struct device_link *pm_domain_links[MAX_PM_DOMAINS];
+	bool coherent;
 
 	struct panfrost_features features;
 	const struct panfrost_compatible *comp;
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index ada51df9a7a3..2a6f2f716b2f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -588,6 +588,8 @@ static int panfrost_probe(struct platform_device *pdev)
 	if (!pfdev->comp)
 		return -ENODEV;
 
+	pfdev->coherent = device_get_dma_attr(&pdev->dev) == DEV_DMA_COHERENT;
+
 	/* Allocate and initialze the DRM device. */
 	ddev = drm_dev_alloc(&panfrost_drm_driver, &pdev->dev);
 	if (IS_ERR(ddev))
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
index 33355dd302f1..cdf1a8754eba 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -220,6 +220,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = {
  */
 struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t size)
 {
+	struct panfrost_device *pfdev = dev->dev_private;
 	struct panfrost_gem_object *obj;
 
 	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
@@ -229,6 +230,7 @@ struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t
 	INIT_LIST_HEAD(&obj->mappings.list);
 	mutex_init(&obj->mappings.lock);
 	obj->base.base.funcs = &panfrost_gem_funcs;
+	obj->base.map_cached = pfdev->coherent;
 
 	return &obj->base.base;
 }
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index e8f7b11352d2..8852fd378f7a 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -371,6 +371,7 @@ int panfrost_mmu_pgtable_alloc(struct panfrost_file_priv *priv)
 		.pgsize_bitmap	= SZ_4K | SZ_2M,
 		.ias		= FIELD_GET(0xff, pfdev->features.mmu_features),
 		.oas		= FIELD_GET(0xff00, pfdev->features.mmu_features),
+		.coherent_walk	= pfdev->coherent,
 		.tlb		= &mmu_tlb_ops,
 		.iommu_dev	= pfdev->dev,
 	};
-- 
2.28.0.dirty


_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
  2020-09-15 23:51 ` Robin Murphy
  (?)
  (?)
@ 2020-09-15 23:51   ` Robin Murphy
  -1 siblings, 0 replies; 88+ messages in thread
From: Robin Murphy @ 2020-09-15 23:51 UTC (permalink / raw)
  To: will, robh, tomeu.vizoso, steven.price, alyssa.rosenzweig,
	khilman, narmstrong, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

According to a downstream commit I found in the Khadas vendor kernel,
the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
how to handle this properly) we should describe it as such. Otherwise
the mismatch leads to all manner of fun with mismatched attributes and
inadvertently snooping stale data from caches, which would account for
at least some of the brokenness observed on this platform.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
index 9b8548e5f6e5..ee8fcae9f9f0 100644
--- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
+++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
@@ -135,3 +135,7 @@ map1 {
 		};
 	};
 };
+
+&mali {
+	dma-coherent;
+};
-- 
2.28.0.dirty

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
@ 2020-09-15 23:51   ` Robin Murphy
  0 siblings, 0 replies; 88+ messages in thread
From: Robin Murphy @ 2020-09-15 23:51 UTC (permalink / raw)
  To: will, robh, tomeu.vizoso, steven.price, alyssa.rosenzweig,
	khilman, narmstrong, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

According to a downstream commit I found in the Khadas vendor kernel,
the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
how to handle this properly) we should describe it as such. Otherwise
the mismatch leads to all manner of fun with mismatched attributes and
inadvertently snooping stale data from caches, which would account for
at least some of the brokenness observed on this platform.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
index 9b8548e5f6e5..ee8fcae9f9f0 100644
--- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
+++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
@@ -135,3 +135,7 @@ map1 {
 		};
 	};
 };
+
+&mali {
+	dma-coherent;
+};
-- 
2.28.0.dirty


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
@ 2020-09-15 23:51   ` Robin Murphy
  0 siblings, 0 replies; 88+ messages in thread
From: Robin Murphy @ 2020-09-15 23:51 UTC (permalink / raw)
  To: will, robh, tomeu.vizoso, steven.price, alyssa.rosenzweig,
	khilman, narmstrong, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

According to a downstream commit I found in the Khadas vendor kernel,
the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
how to handle this properly) we should describe it as such. Otherwise
the mismatch leads to all manner of fun with mismatched attributes and
inadvertently snooping stale data from caches, which would account for
at least some of the brokenness observed on this platform.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
index 9b8548e5f6e5..ee8fcae9f9f0 100644
--- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
+++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
@@ -135,3 +135,7 @@ map1 {
 		};
 	};
 };
+
+&mali {
+	dma-coherent;
+};
-- 
2.28.0.dirty

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
@ 2020-09-15 23:51   ` Robin Murphy
  0 siblings, 0 replies; 88+ messages in thread
From: Robin Murphy @ 2020-09-15 23:51 UTC (permalink / raw)
  To: will, robh, tomeu.vizoso, steven.price, alyssa.rosenzweig,
	khilman, narmstrong, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

According to a downstream commit I found in the Khadas vendor kernel,
the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
how to handle this properly) we should describe it as such. Otherwise
the mismatch leads to all manner of fun with mismatched attributes and
inadvertently snooping stale data from caches, which would account for
at least some of the brokenness observed on this platform.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
index 9b8548e5f6e5..ee8fcae9f9f0 100644
--- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
+++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
@@ -135,3 +135,7 @@ map1 {
 		};
 	};
 };
+
+&mali {
+	dma-coherent;
+};
-- 
2.28.0.dirty


_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
  2020-09-15 23:51   ` Robin Murphy
  (?)
  (?)
@ 2020-09-16  8:26     ` Neil Armstrong
  -1 siblings, 0 replies; 88+ messages in thread
From: Neil Armstrong @ 2020-09-16  8:26 UTC (permalink / raw)
  To: Robin Murphy, will, robh, tomeu.vizoso, steven.price,
	alyssa.rosenzweig, khilman, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

Hi Robin,

On 16/09/2020 01:51, Robin Murphy wrote:
> According to a downstream commit I found in the Khadas vendor kernel,
> the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
> how to handle this properly) we should describe it as such. Otherwise
> the mismatch leads to all manner of fun with mismatched attributes and
> inadvertently snooping stale data from caches, which would account for
> at least some of the brokenness observed on this platform.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> index 9b8548e5f6e5..ee8fcae9f9f0 100644
> --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> @@ -135,3 +135,7 @@ map1 {
>  		};
>  	};
>  };
> +
> +&mali {
> +	dma-coherent;
> +};
> 

Thanks a lot for digging, I'll run a test to confirm it fixes the issue !

Neil
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
@ 2020-09-16  8:26     ` Neil Armstrong
  0 siblings, 0 replies; 88+ messages in thread
From: Neil Armstrong @ 2020-09-16  8:26 UTC (permalink / raw)
  To: Robin Murphy, will, robh, tomeu.vizoso, steven.price,
	alyssa.rosenzweig, khilman, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

Hi Robin,

On 16/09/2020 01:51, Robin Murphy wrote:
> According to a downstream commit I found in the Khadas vendor kernel,
> the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
> how to handle this properly) we should describe it as such. Otherwise
> the mismatch leads to all manner of fun with mismatched attributes and
> inadvertently snooping stale data from caches, which would account for
> at least some of the brokenness observed on this platform.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> index 9b8548e5f6e5..ee8fcae9f9f0 100644
> --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> @@ -135,3 +135,7 @@ map1 {
>  		};
>  	};
>  };
> +
> +&mali {
> +	dma-coherent;
> +};
> 

Thanks a lot for digging, I'll run a test to confirm it fixes the issue !

Neil

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
@ 2020-09-16  8:26     ` Neil Armstrong
  0 siblings, 0 replies; 88+ messages in thread
From: Neil Armstrong @ 2020-09-16  8:26 UTC (permalink / raw)
  To: Robin Murphy, will, robh, tomeu.vizoso, steven.price,
	alyssa.rosenzweig, khilman, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

Hi Robin,

On 16/09/2020 01:51, Robin Murphy wrote:
> According to a downstream commit I found in the Khadas vendor kernel,
> the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
> how to handle this properly) we should describe it as such. Otherwise
> the mismatch leads to all manner of fun with mismatched attributes and
> inadvertently snooping stale data from caches, which would account for
> at least some of the brokenness observed on this platform.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> index 9b8548e5f6e5..ee8fcae9f9f0 100644
> --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> @@ -135,3 +135,7 @@ map1 {
>  		};
>  	};
>  };
> +
> +&mali {
> +	dma-coherent;
> +};
> 

Thanks a lot for digging, I'll run a test to confirm it fixes the issue !

Neil
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
@ 2020-09-16  8:26     ` Neil Armstrong
  0 siblings, 0 replies; 88+ messages in thread
From: Neil Armstrong @ 2020-09-16  8:26 UTC (permalink / raw)
  To: Robin Murphy, will, robh, tomeu.vizoso, steven.price,
	alyssa.rosenzweig, khilman, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

Hi Robin,

On 16/09/2020 01:51, Robin Murphy wrote:
> According to a downstream commit I found in the Khadas vendor kernel,
> the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
> how to handle this properly) we should describe it as such. Otherwise
> the mismatch leads to all manner of fun with mismatched attributes and
> inadvertently snooping stale data from caches, which would account for
> at least some of the brokenness observed on this platform.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> index 9b8548e5f6e5..ee8fcae9f9f0 100644
> --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> @@ -135,3 +135,7 @@ map1 {
>  		};
>  	};
>  };
> +
> +&mali {
> +	dma-coherent;
> +};
> 

Thanks a lot for digging, I'll run a test to confirm it fixes the issue !

Neil

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
  2020-09-15 23:51 ` Robin Murphy
  (?)
  (?)
@ 2020-09-16 14:54   ` Neil Armstrong
  -1 siblings, 0 replies; 88+ messages in thread
From: Neil Armstrong @ 2020-09-16 14:54 UTC (permalink / raw)
  To: Robin Murphy, will, robh, tomeu.vizoso, steven.price,
	alyssa.rosenzweig, khilman, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

Hi Robin,

On 16/09/2020 01:51, Robin Murphy wrote:
> Hi all,
> 
> I polished up my original proof-of-concept a little while back, but now
> that I've got my hands on my Juno again I've been able to actually test
> it to my satisfaction, so here are proper patches!

I tested on the Kkadas VIM3, and yes it fixes the random FAULTS I have *without*:
[  152.417127] panfrost ffe40000.gpu: gpu sched timeout, js=0, config=0x7300, status=0x58, head=0x3091400, tail=0x3091400, sched_job=000000004d83c2d7
[  152.530928] panfrost ffe40000.gpu: js fault, js=1, status=INSTR_INVALID_ENC, head=0x30913c0, tail=0x30913c0
[  152.539797] panfrost ffe40000.gpu: gpu sched timeout, js=1, config=0x7300, status=0x51, head=0x30913c0, tail=0x30913c0, sched_job=0000000038cecaf6
[  156.943505] panfrost ffe40000.gpu: js fault, js=0, status=TILE_RANGE_FAULT, head=0x3091400, tail=0x3091400

but, with this patchset, I get the following fps with kmscube:
Rendered 97 frames in 2.016291 sec (48.108145 fps)
Rendered 206 frames in 4.016723 sec (51.285584 fps)
Rendered 316 frames in 6.017208 sec (52.516052 fps)
Rendered 430 frames in 8.017456 sec (53.632975 fps)

but when I resurrect my BROKEN_NS patchset (simply disabling shareability), I get:
Rendered 120 frames in 2.000143 sec (59.995724 fps)
Rendered 241 frames in 4.016760 sec (59.998605 fps)
Rendered 362 frames in 6.033443 sec (59.998911 fps)
Rendered 482 frames in 8.033531 sec (59.998522 fps)

So I get a performance regression with the dma-coherent approach, even if it's
clearly the cleaner.

So:
Tested-by: Neil Armstrong <narmstrong@baylibre.com>

Neil

> 
> It probably makes sense for patches #1 and #2 to stay together and both
> go via drm-misc, provided Will's OK with that.
> 
> Robin.
> 
> 
> Robin Murphy (3):
>   iommu/io-pgtable-arm: Support coherency for Mali LPAE
>   drm/panfrost: Support cache-coherent integrations
>   arm64: dts: meson: Describe G12b GPU as coherent
> 
>  arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
>  drivers/gpu/drm/panfrost/panfrost_device.h  | 1 +
>  drivers/gpu/drm/panfrost/panfrost_drv.c     | 2 ++
>  drivers/gpu/drm/panfrost/panfrost_gem.c     | 2 ++
>  drivers/gpu/drm/panfrost/panfrost_mmu.c     | 1 +
>  drivers/iommu/io-pgtable-arm.c              | 5 ++++-
>  6 files changed, 14 insertions(+), 1 deletion(-)
> 

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-16 14:54   ` Neil Armstrong
  0 siblings, 0 replies; 88+ messages in thread
From: Neil Armstrong @ 2020-09-16 14:54 UTC (permalink / raw)
  To: Robin Murphy, will, robh, tomeu.vizoso, steven.price,
	alyssa.rosenzweig, khilman, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

Hi Robin,

On 16/09/2020 01:51, Robin Murphy wrote:
> Hi all,
> 
> I polished up my original proof-of-concept a little while back, but now
> that I've got my hands on my Juno again I've been able to actually test
> it to my satisfaction, so here are proper patches!

I tested on the Kkadas VIM3, and yes it fixes the random FAULTS I have *without*:
[  152.417127] panfrost ffe40000.gpu: gpu sched timeout, js=0, config=0x7300, status=0x58, head=0x3091400, tail=0x3091400, sched_job=000000004d83c2d7
[  152.530928] panfrost ffe40000.gpu: js fault, js=1, status=INSTR_INVALID_ENC, head=0x30913c0, tail=0x30913c0
[  152.539797] panfrost ffe40000.gpu: gpu sched timeout, js=1, config=0x7300, status=0x51, head=0x30913c0, tail=0x30913c0, sched_job=0000000038cecaf6
[  156.943505] panfrost ffe40000.gpu: js fault, js=0, status=TILE_RANGE_FAULT, head=0x3091400, tail=0x3091400

but, with this patchset, I get the following fps with kmscube:
Rendered 97 frames in 2.016291 sec (48.108145 fps)
Rendered 206 frames in 4.016723 sec (51.285584 fps)
Rendered 316 frames in 6.017208 sec (52.516052 fps)
Rendered 430 frames in 8.017456 sec (53.632975 fps)

but when I resurrect my BROKEN_NS patchset (simply disabling shareability), I get:
Rendered 120 frames in 2.000143 sec (59.995724 fps)
Rendered 241 frames in 4.016760 sec (59.998605 fps)
Rendered 362 frames in 6.033443 sec (59.998911 fps)
Rendered 482 frames in 8.033531 sec (59.998522 fps)

So I get a performance regression with the dma-coherent approach, even if it's
clearly the cleaner.

So:
Tested-by: Neil Armstrong <narmstrong@baylibre.com>

Neil

> 
> It probably makes sense for patches #1 and #2 to stay together and both
> go via drm-misc, provided Will's OK with that.
> 
> Robin.
> 
> 
> Robin Murphy (3):
>   iommu/io-pgtable-arm: Support coherency for Mali LPAE
>   drm/panfrost: Support cache-coherent integrations
>   arm64: dts: meson: Describe G12b GPU as coherent
> 
>  arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
>  drivers/gpu/drm/panfrost/panfrost_device.h  | 1 +
>  drivers/gpu/drm/panfrost/panfrost_drv.c     | 2 ++
>  drivers/gpu/drm/panfrost/panfrost_gem.c     | 2 ++
>  drivers/gpu/drm/panfrost/panfrost_mmu.c     | 1 +
>  drivers/iommu/io-pgtable-arm.c              | 5 ++++-
>  6 files changed, 14 insertions(+), 1 deletion(-)
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-16 14:54   ` Neil Armstrong
  0 siblings, 0 replies; 88+ messages in thread
From: Neil Armstrong @ 2020-09-16 14:54 UTC (permalink / raw)
  To: Robin Murphy, will, robh, tomeu.vizoso, steven.price,
	alyssa.rosenzweig, khilman, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

Hi Robin,

On 16/09/2020 01:51, Robin Murphy wrote:
> Hi all,
> 
> I polished up my original proof-of-concept a little while back, but now
> that I've got my hands on my Juno again I've been able to actually test
> it to my satisfaction, so here are proper patches!

I tested on the Kkadas VIM3, and yes it fixes the random FAULTS I have *without*:
[  152.417127] panfrost ffe40000.gpu: gpu sched timeout, js=0, config=0x7300, status=0x58, head=0x3091400, tail=0x3091400, sched_job=000000004d83c2d7
[  152.530928] panfrost ffe40000.gpu: js fault, js=1, status=INSTR_INVALID_ENC, head=0x30913c0, tail=0x30913c0
[  152.539797] panfrost ffe40000.gpu: gpu sched timeout, js=1, config=0x7300, status=0x51, head=0x30913c0, tail=0x30913c0, sched_job=0000000038cecaf6
[  156.943505] panfrost ffe40000.gpu: js fault, js=0, status=TILE_RANGE_FAULT, head=0x3091400, tail=0x3091400

but, with this patchset, I get the following fps with kmscube:
Rendered 97 frames in 2.016291 sec (48.108145 fps)
Rendered 206 frames in 4.016723 sec (51.285584 fps)
Rendered 316 frames in 6.017208 sec (52.516052 fps)
Rendered 430 frames in 8.017456 sec (53.632975 fps)

but when I resurrect my BROKEN_NS patchset (simply disabling shareability), I get:
Rendered 120 frames in 2.000143 sec (59.995724 fps)
Rendered 241 frames in 4.016760 sec (59.998605 fps)
Rendered 362 frames in 6.033443 sec (59.998911 fps)
Rendered 482 frames in 8.033531 sec (59.998522 fps)

So I get a performance regression with the dma-coherent approach, even if it's
clearly the cleaner.

So:
Tested-by: Neil Armstrong <narmstrong@baylibre.com>

Neil

> 
> It probably makes sense for patches #1 and #2 to stay together and both
> go via drm-misc, provided Will's OK with that.
> 
> Robin.
> 
> 
> Robin Murphy (3):
>   iommu/io-pgtable-arm: Support coherency for Mali LPAE
>   drm/panfrost: Support cache-coherent integrations
>   arm64: dts: meson: Describe G12b GPU as coherent
> 
>  arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
>  drivers/gpu/drm/panfrost/panfrost_device.h  | 1 +
>  drivers/gpu/drm/panfrost/panfrost_drv.c     | 2 ++
>  drivers/gpu/drm/panfrost/panfrost_gem.c     | 2 ++
>  drivers/gpu/drm/panfrost/panfrost_mmu.c     | 1 +
>  drivers/iommu/io-pgtable-arm.c              | 5 ++++-
>  6 files changed, 14 insertions(+), 1 deletion(-)
> 

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-16 14:54   ` Neil Armstrong
  0 siblings, 0 replies; 88+ messages in thread
From: Neil Armstrong @ 2020-09-16 14:54 UTC (permalink / raw)
  To: Robin Murphy, will, robh, tomeu.vizoso, steven.price,
	alyssa.rosenzweig, khilman, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

Hi Robin,

On 16/09/2020 01:51, Robin Murphy wrote:
> Hi all,
> 
> I polished up my original proof-of-concept a little while back, but now
> that I've got my hands on my Juno again I've been able to actually test
> it to my satisfaction, so here are proper patches!

I tested on the Kkadas VIM3, and yes it fixes the random FAULTS I have *without*:
[  152.417127] panfrost ffe40000.gpu: gpu sched timeout, js=0, config=0x7300, status=0x58, head=0x3091400, tail=0x3091400, sched_job=000000004d83c2d7
[  152.530928] panfrost ffe40000.gpu: js fault, js=1, status=INSTR_INVALID_ENC, head=0x30913c0, tail=0x30913c0
[  152.539797] panfrost ffe40000.gpu: gpu sched timeout, js=1, config=0x7300, status=0x51, head=0x30913c0, tail=0x30913c0, sched_job=0000000038cecaf6
[  156.943505] panfrost ffe40000.gpu: js fault, js=0, status=TILE_RANGE_FAULT, head=0x3091400, tail=0x3091400

but, with this patchset, I get the following fps with kmscube:
Rendered 97 frames in 2.016291 sec (48.108145 fps)
Rendered 206 frames in 4.016723 sec (51.285584 fps)
Rendered 316 frames in 6.017208 sec (52.516052 fps)
Rendered 430 frames in 8.017456 sec (53.632975 fps)

but when I resurrect my BROKEN_NS patchset (simply disabling shareability), I get:
Rendered 120 frames in 2.000143 sec (59.995724 fps)
Rendered 241 frames in 4.016760 sec (59.998605 fps)
Rendered 362 frames in 6.033443 sec (59.998911 fps)
Rendered 482 frames in 8.033531 sec (59.998522 fps)

So I get a performance regression with the dma-coherent approach, even if it's
clearly the cleaner.

So:
Tested-by: Neil Armstrong <narmstrong@baylibre.com>

Neil

> 
> It probably makes sense for patches #1 and #2 to stay together and both
> go via drm-misc, provided Will's OK with that.
> 
> Robin.
> 
> 
> Robin Murphy (3):
>   iommu/io-pgtable-arm: Support coherency for Mali LPAE
>   drm/panfrost: Support cache-coherent integrations
>   arm64: dts: meson: Describe G12b GPU as coherent
> 
>  arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
>  drivers/gpu/drm/panfrost/panfrost_device.h  | 1 +
>  drivers/gpu/drm/panfrost/panfrost_drv.c     | 2 ++
>  drivers/gpu/drm/panfrost/panfrost_gem.c     | 2 ++
>  drivers/gpu/drm/panfrost/panfrost_mmu.c     | 1 +
>  drivers/iommu/io-pgtable-arm.c              | 5 ++++-
>  6 files changed, 14 insertions(+), 1 deletion(-)
> 


_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
  2020-09-15 23:51   ` Robin Murphy
  (?)
  (?)
@ 2020-09-16 14:54     ` Neil Armstrong
  -1 siblings, 0 replies; 88+ messages in thread
From: Neil Armstrong @ 2020-09-16 14:54 UTC (permalink / raw)
  To: Robin Murphy, will, robh, tomeu.vizoso, steven.price,
	alyssa.rosenzweig, khilman, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

On 16/09/2020 01:51, Robin Murphy wrote:
> According to a downstream commit I found in the Khadas vendor kernel,
> the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
> how to handle this properly) we should describe it as such. Otherwise
> the mismatch leads to all manner of fun with mismatched attributes and
> inadvertently snooping stale data from caches, which would account for
> at least some of the brokenness observed on this platform.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> index 9b8548e5f6e5..ee8fcae9f9f0 100644
> --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> @@ -135,3 +135,7 @@ map1 {
>  		};
>  	};
>  };
> +
> +&mali {
> +	dma-coherent;
> +};
> 

Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
@ 2020-09-16 14:54     ` Neil Armstrong
  0 siblings, 0 replies; 88+ messages in thread
From: Neil Armstrong @ 2020-09-16 14:54 UTC (permalink / raw)
  To: Robin Murphy, will, robh, tomeu.vizoso, steven.price,
	alyssa.rosenzweig, khilman, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

On 16/09/2020 01:51, Robin Murphy wrote:
> According to a downstream commit I found in the Khadas vendor kernel,
> the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
> how to handle this properly) we should describe it as such. Otherwise
> the mismatch leads to all manner of fun with mismatched attributes and
> inadvertently snooping stale data from caches, which would account for
> at least some of the brokenness observed on this platform.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> index 9b8548e5f6e5..ee8fcae9f9f0 100644
> --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> @@ -135,3 +135,7 @@ map1 {
>  		};
>  	};
>  };
> +
> +&mali {
> +	dma-coherent;
> +};
> 

Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
@ 2020-09-16 14:54     ` Neil Armstrong
  0 siblings, 0 replies; 88+ messages in thread
From: Neil Armstrong @ 2020-09-16 14:54 UTC (permalink / raw)
  To: Robin Murphy, will, robh, tomeu.vizoso, steven.price,
	alyssa.rosenzweig, khilman, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

On 16/09/2020 01:51, Robin Murphy wrote:
> According to a downstream commit I found in the Khadas vendor kernel,
> the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
> how to handle this properly) we should describe it as such. Otherwise
> the mismatch leads to all manner of fun with mismatched attributes and
> inadvertently snooping stale data from caches, which would account for
> at least some of the brokenness observed on this platform.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> index 9b8548e5f6e5..ee8fcae9f9f0 100644
> --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> @@ -135,3 +135,7 @@ map1 {
>  		};
>  	};
>  };
> +
> +&mali {
> +	dma-coherent;
> +};
> 

Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
@ 2020-09-16 14:54     ` Neil Armstrong
  0 siblings, 0 replies; 88+ messages in thread
From: Neil Armstrong @ 2020-09-16 14:54 UTC (permalink / raw)
  To: Robin Murphy, will, robh, tomeu.vizoso, steven.price,
	alyssa.rosenzweig, khilman, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

On 16/09/2020 01:51, Robin Murphy wrote:
> According to a downstream commit I found in the Khadas vendor kernel,
> the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
> how to handle this properly) we should describe it as such. Otherwise
> the mismatch leads to all manner of fun with mismatched attributes and
> inadvertently snooping stale data from caches, which would account for
> at least some of the brokenness observed on this platform.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> index 9b8548e5f6e5..ee8fcae9f9f0 100644
> --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> @@ -135,3 +135,7 @@ map1 {
>  		};
>  	};
>  };
> +
> +&mali {
> +	dma-coherent;
> +};
> 

Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
  2020-09-16 14:54   ` Neil Armstrong
  (?)
  (?)
@ 2020-09-16 17:04     ` Alyssa Rosenzweig
  -1 siblings, 0 replies; 88+ messages in thread
From: Alyssa Rosenzweig @ 2020-09-16 17:04 UTC (permalink / raw)
  To: Neil Armstrong
  Cc: robh, tomeu.vizoso, Robin Murphy, dri-devel, steven.price, iommu,
	khilman, linux-amlogic, will, linux-arm-kernel, jbrunet


[-- Attachment #1.1: Type: text/plain, Size: 168 bytes --]

> So I get a performance regression with the dma-coherent approach, even if it's
> clearly the cleaner.

That's bizarre -- this should really be the faster of the two.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 156 bytes --]

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-16 17:04     ` Alyssa Rosenzweig
  0 siblings, 0 replies; 88+ messages in thread
From: Alyssa Rosenzweig @ 2020-09-16 17:04 UTC (permalink / raw)
  To: Neil Armstrong
  Cc: robh, tomeu.vizoso, Robin Murphy, dri-devel, steven.price, iommu,
	khilman, linux-amlogic, will, linux-arm-kernel, jbrunet


[-- Attachment #1.1: Type: text/plain, Size: 168 bytes --]

> So I get a performance regression with the dma-coherent approach, even if it's
> clearly the cleaner.

That's bizarre -- this should really be the faster of the two.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-16 17:04     ` Alyssa Rosenzweig
  0 siblings, 0 replies; 88+ messages in thread
From: Alyssa Rosenzweig @ 2020-09-16 17:04 UTC (permalink / raw)
  To: Neil Armstrong
  Cc: tomeu.vizoso, Robin Murphy, dri-devel, steven.price, iommu,
	khilman, linux-amlogic, will, linux-arm-kernel, jbrunet


[-- Attachment #1.1: Type: text/plain, Size: 168 bytes --]

> So I get a performance regression with the dma-coherent approach, even if it's
> clearly the cleaner.

That's bizarre -- this should really be the faster of the two.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-16 17:04     ` Alyssa Rosenzweig
  0 siblings, 0 replies; 88+ messages in thread
From: Alyssa Rosenzweig @ 2020-09-16 17:04 UTC (permalink / raw)
  To: Neil Armstrong
  Cc: robh, tomeu.vizoso, Robin Murphy, dri-devel, steven.price, iommu,
	khilman, linux-amlogic, will, linux-arm-kernel, jbrunet


[-- Attachment #1.1: Type: text/plain, Size: 168 bytes --]

> So I get a performance regression with the dma-coherent approach, even if it's
> clearly the cleaner.

That's bizarre -- this should really be the faster of the two.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 167 bytes --]

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
  2020-09-16 17:04     ` Alyssa Rosenzweig
  (?)
  (?)
@ 2020-09-16 17:46       ` Rob Herring
  -1 siblings, 0 replies; 88+ messages in thread
From: Rob Herring @ 2020-09-16 17:46 UTC (permalink / raw)
  To: Alyssa Rosenzweig
  Cc: Tomeu Vizoso, Neil Armstrong, Robin Murphy, dri-devel,
	Steven Price, Linux IOMMU, Kevin Hilman,
	open list:ARM/Amlogic Meson...,
	Will Deacon, linux-arm-kernel, Jerome Brunet

On Wed, Sep 16, 2020 at 11:04 AM Alyssa Rosenzweig
<alyssa.rosenzweig@collabora.com> wrote:
>
> > So I get a performance regression with the dma-coherent approach, even if it's
> > clearly the cleaner.
>
> That's bizarre -- this should really be the faster of the two.

Coherency may not be free. CortexA9 had something like 4x slower
memcpy if SMP was enabled as an example. I don't know if there's
anything going on like that specifically here. If there's never any
CPU accesses mixed in with kmscube, then there would be no benefit to
coherency.

Rob
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-16 17:46       ` Rob Herring
  0 siblings, 0 replies; 88+ messages in thread
From: Rob Herring @ 2020-09-16 17:46 UTC (permalink / raw)
  To: Alyssa Rosenzweig
  Cc: Tomeu Vizoso, Neil Armstrong, Robin Murphy, dri-devel,
	Steven Price, Linux IOMMU, Kevin Hilman,
	open list:ARM/Amlogic Meson...,
	Will Deacon, linux-arm-kernel, Jerome Brunet

On Wed, Sep 16, 2020 at 11:04 AM Alyssa Rosenzweig
<alyssa.rosenzweig@collabora.com> wrote:
>
> > So I get a performance regression with the dma-coherent approach, even if it's
> > clearly the cleaner.
>
> That's bizarre -- this should really be the faster of the two.

Coherency may not be free. CortexA9 had something like 4x slower
memcpy if SMP was enabled as an example. I don't know if there's
anything going on like that specifically here. If there's never any
CPU accesses mixed in with kmscube, then there would be no benefit to
coherency.

Rob

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-16 17:46       ` Rob Herring
  0 siblings, 0 replies; 88+ messages in thread
From: Rob Herring @ 2020-09-16 17:46 UTC (permalink / raw)
  To: Alyssa Rosenzweig
  Cc: Tomeu Vizoso, Neil Armstrong, Robin Murphy, dri-devel,
	Steven Price, Linux IOMMU, Kevin Hilman,
	open list:ARM/Amlogic Meson...,
	Will Deacon, linux-arm-kernel, Jerome Brunet

On Wed, Sep 16, 2020 at 11:04 AM Alyssa Rosenzweig
<alyssa.rosenzweig@collabora.com> wrote:
>
> > So I get a performance regression with the dma-coherent approach, even if it's
> > clearly the cleaner.
>
> That's bizarre -- this should really be the faster of the two.

Coherency may not be free. CortexA9 had something like 4x slower
memcpy if SMP was enabled as an example. I don't know if there's
anything going on like that specifically here. If there's never any
CPU accesses mixed in with kmscube, then there would be no benefit to
coherency.

Rob
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-16 17:46       ` Rob Herring
  0 siblings, 0 replies; 88+ messages in thread
From: Rob Herring @ 2020-09-16 17:46 UTC (permalink / raw)
  To: Alyssa Rosenzweig
  Cc: Tomeu Vizoso, Neil Armstrong, Robin Murphy, dri-devel,
	Steven Price, Linux IOMMU, Kevin Hilman,
	open list:ARM/Amlogic Meson...,
	Will Deacon, linux-arm-kernel, Jerome Brunet

On Wed, Sep 16, 2020 at 11:04 AM Alyssa Rosenzweig
<alyssa.rosenzweig@collabora.com> wrote:
>
> > So I get a performance regression with the dma-coherent approach, even if it's
> > clearly the cleaner.
>
> That's bizarre -- this should really be the faster of the two.

Coherency may not be free. CortexA9 had something like 4x slower
memcpy if SMP was enabled as an example. I don't know if there's
anything going on like that specifically here. If there's never any
CPU accesses mixed in with kmscube, then there would be no benefit to
coherency.

Rob

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 2/3] drm/panfrost: Support cache-coherent integrations
  2020-09-15 23:51   ` Robin Murphy
  (?)
  (?)
@ 2020-09-17 10:37     ` Steven Price
  -1 siblings, 0 replies; 88+ messages in thread
From: Steven Price @ 2020-09-17 10:37 UTC (permalink / raw)
  To: Robin Murphy, will, robh, tomeu.vizoso, alyssa.rosenzweig,
	khilman, narmstrong, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

On 16/09/2020 00:51, Robin Murphy wrote:
> When the GPU's ACE-Lite interface is fully wired up and capable of
> snooping CPU caches, it may be described as "dma-coherent" in
> devicetree, which will already inform the DMA layer not to perform
> unnecessary cache maintenance. However, we still need to ensure that
> the GPU uses the appropriate cacheable outer-shareable attributes in
> order to generate the requisite snoop signals, and that CPU mappings
> don't create a mismatch by using a non-cacheable type either.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>

LGTM

Reviewed-by: Steven Price <steven.price@arm.com>

> ---
>   drivers/gpu/drm/panfrost/panfrost_device.h | 1 +
>   drivers/gpu/drm/panfrost/panfrost_drv.c    | 2 ++
>   drivers/gpu/drm/panfrost/panfrost_gem.c    | 2 ++
>   drivers/gpu/drm/panfrost/panfrost_mmu.c    | 1 +
>   4 files changed, 6 insertions(+)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
> index c30c719a8059..b31f45315e96 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_device.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_device.h
> @@ -84,6 +84,7 @@ struct panfrost_device {
>   	/* pm_domains for devices with more than one. */
>   	struct device *pm_domain_devs[MAX_PM_DOMAINS];
>   	struct device_link *pm_domain_links[MAX_PM_DOMAINS];
> +	bool coherent;
>   
>   	struct panfrost_features features;
>   	const struct panfrost_compatible *comp;
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index ada51df9a7a3..2a6f2f716b2f 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -588,6 +588,8 @@ static int panfrost_probe(struct platform_device *pdev)
>   	if (!pfdev->comp)
>   		return -ENODEV;
>   
> +	pfdev->coherent = device_get_dma_attr(&pdev->dev) == DEV_DMA_COHERENT;
> +
>   	/* Allocate and initialze the DRM device. */
>   	ddev = drm_dev_alloc(&panfrost_drm_driver, &pdev->dev);
>   	if (IS_ERR(ddev))
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
> index 33355dd302f1..cdf1a8754eba 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gem.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
> @@ -220,6 +220,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = {
>    */
>   struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t size)
>   {
> +	struct panfrost_device *pfdev = dev->dev_private;
>   	struct panfrost_gem_object *obj;
>   
>   	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
> @@ -229,6 +230,7 @@ struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t
>   	INIT_LIST_HEAD(&obj->mappings.list);
>   	mutex_init(&obj->mappings.lock);
>   	obj->base.base.funcs = &panfrost_gem_funcs;
> +	obj->base.map_cached = pfdev->coherent;
>   
>   	return &obj->base.base;
>   }
> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> index e8f7b11352d2..8852fd378f7a 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> @@ -371,6 +371,7 @@ int panfrost_mmu_pgtable_alloc(struct panfrost_file_priv *priv)
>   		.pgsize_bitmap	= SZ_4K | SZ_2M,
>   		.ias		= FIELD_GET(0xff, pfdev->features.mmu_features),
>   		.oas		= FIELD_GET(0xff00, pfdev->features.mmu_features),
> +		.coherent_walk	= pfdev->coherent,
>   		.tlb		= &mmu_tlb_ops,
>   		.iommu_dev	= pfdev->dev,
>   	};
> 

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 2/3] drm/panfrost: Support cache-coherent integrations
@ 2020-09-17 10:37     ` Steven Price
  0 siblings, 0 replies; 88+ messages in thread
From: Steven Price @ 2020-09-17 10:37 UTC (permalink / raw)
  To: Robin Murphy, will, robh, tomeu.vizoso, alyssa.rosenzweig,
	khilman, narmstrong, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

On 16/09/2020 00:51, Robin Murphy wrote:
> When the GPU's ACE-Lite interface is fully wired up and capable of
> snooping CPU caches, it may be described as "dma-coherent" in
> devicetree, which will already inform the DMA layer not to perform
> unnecessary cache maintenance. However, we still need to ensure that
> the GPU uses the appropriate cacheable outer-shareable attributes in
> order to generate the requisite snoop signals, and that CPU mappings
> don't create a mismatch by using a non-cacheable type either.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>

LGTM

Reviewed-by: Steven Price <steven.price@arm.com>

> ---
>   drivers/gpu/drm/panfrost/panfrost_device.h | 1 +
>   drivers/gpu/drm/panfrost/panfrost_drv.c    | 2 ++
>   drivers/gpu/drm/panfrost/panfrost_gem.c    | 2 ++
>   drivers/gpu/drm/panfrost/panfrost_mmu.c    | 1 +
>   4 files changed, 6 insertions(+)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
> index c30c719a8059..b31f45315e96 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_device.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_device.h
> @@ -84,6 +84,7 @@ struct panfrost_device {
>   	/* pm_domains for devices with more than one. */
>   	struct device *pm_domain_devs[MAX_PM_DOMAINS];
>   	struct device_link *pm_domain_links[MAX_PM_DOMAINS];
> +	bool coherent;
>   
>   	struct panfrost_features features;
>   	const struct panfrost_compatible *comp;
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index ada51df9a7a3..2a6f2f716b2f 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -588,6 +588,8 @@ static int panfrost_probe(struct platform_device *pdev)
>   	if (!pfdev->comp)
>   		return -ENODEV;
>   
> +	pfdev->coherent = device_get_dma_attr(&pdev->dev) == DEV_DMA_COHERENT;
> +
>   	/* Allocate and initialze the DRM device. */
>   	ddev = drm_dev_alloc(&panfrost_drm_driver, &pdev->dev);
>   	if (IS_ERR(ddev))
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
> index 33355dd302f1..cdf1a8754eba 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gem.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
> @@ -220,6 +220,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = {
>    */
>   struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t size)
>   {
> +	struct panfrost_device *pfdev = dev->dev_private;
>   	struct panfrost_gem_object *obj;
>   
>   	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
> @@ -229,6 +230,7 @@ struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t
>   	INIT_LIST_HEAD(&obj->mappings.list);
>   	mutex_init(&obj->mappings.lock);
>   	obj->base.base.funcs = &panfrost_gem_funcs;
> +	obj->base.map_cached = pfdev->coherent;
>   
>   	return &obj->base.base;
>   }
> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> index e8f7b11352d2..8852fd378f7a 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> @@ -371,6 +371,7 @@ int panfrost_mmu_pgtable_alloc(struct panfrost_file_priv *priv)
>   		.pgsize_bitmap	= SZ_4K | SZ_2M,
>   		.ias		= FIELD_GET(0xff, pfdev->features.mmu_features),
>   		.oas		= FIELD_GET(0xff00, pfdev->features.mmu_features),
> +		.coherent_walk	= pfdev->coherent,
>   		.tlb		= &mmu_tlb_ops,
>   		.iommu_dev	= pfdev->dev,
>   	};
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 2/3] drm/panfrost: Support cache-coherent integrations
@ 2020-09-17 10:37     ` Steven Price
  0 siblings, 0 replies; 88+ messages in thread
From: Steven Price @ 2020-09-17 10:37 UTC (permalink / raw)
  To: Robin Murphy, will, robh, tomeu.vizoso, alyssa.rosenzweig,
	khilman, narmstrong, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

On 16/09/2020 00:51, Robin Murphy wrote:
> When the GPU's ACE-Lite interface is fully wired up and capable of
> snooping CPU caches, it may be described as "dma-coherent" in
> devicetree, which will already inform the DMA layer not to perform
> unnecessary cache maintenance. However, we still need to ensure that
> the GPU uses the appropriate cacheable outer-shareable attributes in
> order to generate the requisite snoop signals, and that CPU mappings
> don't create a mismatch by using a non-cacheable type either.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>

LGTM

Reviewed-by: Steven Price <steven.price@arm.com>

> ---
>   drivers/gpu/drm/panfrost/panfrost_device.h | 1 +
>   drivers/gpu/drm/panfrost/panfrost_drv.c    | 2 ++
>   drivers/gpu/drm/panfrost/panfrost_gem.c    | 2 ++
>   drivers/gpu/drm/panfrost/panfrost_mmu.c    | 1 +
>   4 files changed, 6 insertions(+)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
> index c30c719a8059..b31f45315e96 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_device.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_device.h
> @@ -84,6 +84,7 @@ struct panfrost_device {
>   	/* pm_domains for devices with more than one. */
>   	struct device *pm_domain_devs[MAX_PM_DOMAINS];
>   	struct device_link *pm_domain_links[MAX_PM_DOMAINS];
> +	bool coherent;
>   
>   	struct panfrost_features features;
>   	const struct panfrost_compatible *comp;
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index ada51df9a7a3..2a6f2f716b2f 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -588,6 +588,8 @@ static int panfrost_probe(struct platform_device *pdev)
>   	if (!pfdev->comp)
>   		return -ENODEV;
>   
> +	pfdev->coherent = device_get_dma_attr(&pdev->dev) == DEV_DMA_COHERENT;
> +
>   	/* Allocate and initialze the DRM device. */
>   	ddev = drm_dev_alloc(&panfrost_drm_driver, &pdev->dev);
>   	if (IS_ERR(ddev))
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
> index 33355dd302f1..cdf1a8754eba 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gem.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
> @@ -220,6 +220,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = {
>    */
>   struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t size)
>   {
> +	struct panfrost_device *pfdev = dev->dev_private;
>   	struct panfrost_gem_object *obj;
>   
>   	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
> @@ -229,6 +230,7 @@ struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t
>   	INIT_LIST_HEAD(&obj->mappings.list);
>   	mutex_init(&obj->mappings.lock);
>   	obj->base.base.funcs = &panfrost_gem_funcs;
> +	obj->base.map_cached = pfdev->coherent;
>   
>   	return &obj->base.base;
>   }
> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> index e8f7b11352d2..8852fd378f7a 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> @@ -371,6 +371,7 @@ int panfrost_mmu_pgtable_alloc(struct panfrost_file_priv *priv)
>   		.pgsize_bitmap	= SZ_4K | SZ_2M,
>   		.ias		= FIELD_GET(0xff, pfdev->features.mmu_features),
>   		.oas		= FIELD_GET(0xff00, pfdev->features.mmu_features),
> +		.coherent_walk	= pfdev->coherent,
>   		.tlb		= &mmu_tlb_ops,
>   		.iommu_dev	= pfdev->dev,
>   	};
> 

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 2/3] drm/panfrost: Support cache-coherent integrations
@ 2020-09-17 10:37     ` Steven Price
  0 siblings, 0 replies; 88+ messages in thread
From: Steven Price @ 2020-09-17 10:37 UTC (permalink / raw)
  To: Robin Murphy, will, robh, tomeu.vizoso, alyssa.rosenzweig,
	khilman, narmstrong, jbrunet
  Cc: linux-amlogic, iommu, dri-devel, linux-arm-kernel

On 16/09/2020 00:51, Robin Murphy wrote:
> When the GPU's ACE-Lite interface is fully wired up and capable of
> snooping CPU caches, it may be described as "dma-coherent" in
> devicetree, which will already inform the DMA layer not to perform
> unnecessary cache maintenance. However, we still need to ensure that
> the GPU uses the appropriate cacheable outer-shareable attributes in
> order to generate the requisite snoop signals, and that CPU mappings
> don't create a mismatch by using a non-cacheable type either.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>

LGTM

Reviewed-by: Steven Price <steven.price@arm.com>

> ---
>   drivers/gpu/drm/panfrost/panfrost_device.h | 1 +
>   drivers/gpu/drm/panfrost/panfrost_drv.c    | 2 ++
>   drivers/gpu/drm/panfrost/panfrost_gem.c    | 2 ++
>   drivers/gpu/drm/panfrost/panfrost_mmu.c    | 1 +
>   4 files changed, 6 insertions(+)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
> index c30c719a8059..b31f45315e96 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_device.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_device.h
> @@ -84,6 +84,7 @@ struct panfrost_device {
>   	/* pm_domains for devices with more than one. */
>   	struct device *pm_domain_devs[MAX_PM_DOMAINS];
>   	struct device_link *pm_domain_links[MAX_PM_DOMAINS];
> +	bool coherent;
>   
>   	struct panfrost_features features;
>   	const struct panfrost_compatible *comp;
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index ada51df9a7a3..2a6f2f716b2f 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -588,6 +588,8 @@ static int panfrost_probe(struct platform_device *pdev)
>   	if (!pfdev->comp)
>   		return -ENODEV;
>   
> +	pfdev->coherent = device_get_dma_attr(&pdev->dev) == DEV_DMA_COHERENT;
> +
>   	/* Allocate and initialze the DRM device. */
>   	ddev = drm_dev_alloc(&panfrost_drm_driver, &pdev->dev);
>   	if (IS_ERR(ddev))
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
> index 33355dd302f1..cdf1a8754eba 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gem.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
> @@ -220,6 +220,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = {
>    */
>   struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t size)
>   {
> +	struct panfrost_device *pfdev = dev->dev_private;
>   	struct panfrost_gem_object *obj;
>   
>   	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
> @@ -229,6 +230,7 @@ struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t
>   	INIT_LIST_HEAD(&obj->mappings.list);
>   	mutex_init(&obj->mappings.lock);
>   	obj->base.base.funcs = &panfrost_gem_funcs;
> +	obj->base.map_cached = pfdev->coherent;
>   
>   	return &obj->base.base;
>   }
> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> index e8f7b11352d2..8852fd378f7a 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> @@ -371,6 +371,7 @@ int panfrost_mmu_pgtable_alloc(struct panfrost_file_priv *priv)
>   		.pgsize_bitmap	= SZ_4K | SZ_2M,
>   		.ias		= FIELD_GET(0xff, pfdev->features.mmu_features),
>   		.oas		= FIELD_GET(0xff00, pfdev->features.mmu_features),
> +		.coherent_walk	= pfdev->coherent,
>   		.tlb		= &mmu_tlb_ops,
>   		.iommu_dev	= pfdev->dev,
>   	};
> 


_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
  2020-09-16 17:46       ` Rob Herring
  (?)
  (?)
@ 2020-09-17 10:38         ` Steven Price
  -1 siblings, 0 replies; 88+ messages in thread
From: Steven Price @ 2020-09-17 10:38 UTC (permalink / raw)
  To: Rob Herring, Alyssa Rosenzweig
  Cc: Tomeu Vizoso, Neil Armstrong, Robin Murphy, dri-devel,
	Linux IOMMU, Kevin Hilman, open list:ARM/Amlogic Meson...,
	Will Deacon, linux-arm-kernel, Jerome Brunet

On 16/09/2020 18:46, Rob Herring wrote:
> On Wed, Sep 16, 2020 at 11:04 AM Alyssa Rosenzweig
> <alyssa.rosenzweig@collabora.com> wrote:
>>
>>> So I get a performance regression with the dma-coherent approach, even if it's
>>> clearly the cleaner.
>>
>> That's bizarre -- this should really be the faster of the two.
> 
> Coherency may not be free. CortexA9 had something like 4x slower
> memcpy if SMP was enabled as an example. I don't know if there's
> anything going on like that specifically here. If there's never any
> CPU accesses mixed in with kmscube, then there would be no benefit to
> coherency.

The DDK blob has the ability to mark only certain areas of memory as 
coherent for performance reasons. For simple things like kmscube I would 
expect that it's basically write-only from the CPU and almost all memory 
the GPU touches isn't touched by the CPU. I.e. coherency isn't helping 
and the coherency traffic is probably expensive. Whether the complexity 
is worth it for "real" content I don't know - it may just be silly 
benchmarks that benefit.

Steve
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-17 10:38         ` Steven Price
  0 siblings, 0 replies; 88+ messages in thread
From: Steven Price @ 2020-09-17 10:38 UTC (permalink / raw)
  To: Rob Herring, Alyssa Rosenzweig
  Cc: Tomeu Vizoso, Neil Armstrong, Robin Murphy, dri-devel,
	Linux IOMMU, Kevin Hilman, open list:ARM/Amlogic Meson...,
	Will Deacon, linux-arm-kernel, Jerome Brunet

On 16/09/2020 18:46, Rob Herring wrote:
> On Wed, Sep 16, 2020 at 11:04 AM Alyssa Rosenzweig
> <alyssa.rosenzweig@collabora.com> wrote:
>>
>>> So I get a performance regression with the dma-coherent approach, even if it's
>>> clearly the cleaner.
>>
>> That's bizarre -- this should really be the faster of the two.
> 
> Coherency may not be free. CortexA9 had something like 4x slower
> memcpy if SMP was enabled as an example. I don't know if there's
> anything going on like that specifically here. If there's never any
> CPU accesses mixed in with kmscube, then there would be no benefit to
> coherency.

The DDK blob has the ability to mark only certain areas of memory as 
coherent for performance reasons. For simple things like kmscube I would 
expect that it's basically write-only from the CPU and almost all memory 
the GPU touches isn't touched by the CPU. I.e. coherency isn't helping 
and the coherency traffic is probably expensive. Whether the complexity 
is worth it for "real" content I don't know - it may just be silly 
benchmarks that benefit.

Steve

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-17 10:38         ` Steven Price
  0 siblings, 0 replies; 88+ messages in thread
From: Steven Price @ 2020-09-17 10:38 UTC (permalink / raw)
  To: Rob Herring, Alyssa Rosenzweig
  Cc: Tomeu Vizoso, Neil Armstrong, Robin Murphy, dri-devel,
	Linux IOMMU, Kevin Hilman, open list:ARM/Amlogic Meson...,
	Will Deacon, linux-arm-kernel, Jerome Brunet

On 16/09/2020 18:46, Rob Herring wrote:
> On Wed, Sep 16, 2020 at 11:04 AM Alyssa Rosenzweig
> <alyssa.rosenzweig@collabora.com> wrote:
>>
>>> So I get a performance regression with the dma-coherent approach, even if it's
>>> clearly the cleaner.
>>
>> That's bizarre -- this should really be the faster of the two.
> 
> Coherency may not be free. CortexA9 had something like 4x slower
> memcpy if SMP was enabled as an example. I don't know if there's
> anything going on like that specifically here. If there's never any
> CPU accesses mixed in with kmscube, then there would be no benefit to
> coherency.

The DDK blob has the ability to mark only certain areas of memory as 
coherent for performance reasons. For simple things like kmscube I would 
expect that it's basically write-only from the CPU and almost all memory 
the GPU touches isn't touched by the CPU. I.e. coherency isn't helping 
and the coherency traffic is probably expensive. Whether the complexity 
is worth it for "real" content I don't know - it may just be silly 
benchmarks that benefit.

Steve
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-17 10:38         ` Steven Price
  0 siblings, 0 replies; 88+ messages in thread
From: Steven Price @ 2020-09-17 10:38 UTC (permalink / raw)
  To: Rob Herring, Alyssa Rosenzweig
  Cc: Tomeu Vizoso, Neil Armstrong, Robin Murphy, dri-devel,
	Linux IOMMU, Kevin Hilman, open list:ARM/Amlogic Meson...,
	Will Deacon, linux-arm-kernel, Jerome Brunet

On 16/09/2020 18:46, Rob Herring wrote:
> On Wed, Sep 16, 2020 at 11:04 AM Alyssa Rosenzweig
> <alyssa.rosenzweig@collabora.com> wrote:
>>
>>> So I get a performance regression with the dma-coherent approach, even if it's
>>> clearly the cleaner.
>>
>> That's bizarre -- this should really be the faster of the two.
> 
> Coherency may not be free. CortexA9 had something like 4x slower
> memcpy if SMP was enabled as an example. I don't know if there's
> anything going on like that specifically here. If there's never any
> CPU accesses mixed in with kmscube, then there would be no benefit to
> coherency.

The DDK blob has the ability to mark only certain areas of memory as 
coherent for performance reasons. For simple things like kmscube I would 
expect that it's basically write-only from the CPU and almost all memory 
the GPU touches isn't touched by the CPU. I.e. coherency isn't helping 
and the coherency traffic is probably expensive. Whether the complexity 
is worth it for "real" content I don't know - it may just be silly 
benchmarks that benefit.

Steve

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
  2020-09-17 10:38         ` Steven Price
  (?)
  (?)
@ 2020-09-17 10:51           ` Tomeu Vizoso
  -1 siblings, 0 replies; 88+ messages in thread
From: Tomeu Vizoso @ 2020-09-17 10:51 UTC (permalink / raw)
  To: Steven Price, Rob Herring, Alyssa Rosenzweig
  Cc: Neil Armstrong, Robin Murphy, dri-devel, Linux IOMMU,
	Kevin Hilman, open list:ARM/Amlogic Meson...,
	Will Deacon, linux-arm-kernel, Jerome Brunet

On 9/17/20 12:38 PM, Steven Price wrote:
> On 16/09/2020 18:46, Rob Herring wrote:
>> On Wed, Sep 16, 2020 at 11:04 AM Alyssa Rosenzweig
>> <alyssa.rosenzweig@collabora.com> wrote:
>>>
>>>> So I get a performance regression with the dma-coherent approach, 
>>>> even if it's
>>>> clearly the cleaner.
>>>
>>> That's bizarre -- this should really be the faster of the two.
>>
>> Coherency may not be free. CortexA9 had something like 4x slower
>> memcpy if SMP was enabled as an example. I don't know if there's
>> anything going on like that specifically here. If there's never any
>> CPU accesses mixed in with kmscube, then there would be no benefit to
>> coherency.
> 
> The DDK blob has the ability to mark only certain areas of memory as 
> coherent for performance reasons. For simple things like kmscube I would 
> expect that it's basically write-only from the CPU and almost all memory 
> the GPU touches isn't touched by the CPU. I.e. coherency isn't helping 
> and the coherency traffic is probably expensive. Whether the complexity 
> is worth it for "real" content I don't know - it may just be silly 
> benchmarks that benefit.

Or maybe it's only a problem for applications that do silly things? I 
don't think kmscube was ever optimized for performance.

Regards,

Tomeu
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-17 10:51           ` Tomeu Vizoso
  0 siblings, 0 replies; 88+ messages in thread
From: Tomeu Vizoso @ 2020-09-17 10:51 UTC (permalink / raw)
  To: Steven Price, Rob Herring, Alyssa Rosenzweig
  Cc: Neil Armstrong, Robin Murphy, dri-devel, Linux IOMMU,
	Kevin Hilman, open list:ARM/Amlogic Meson...,
	Will Deacon, linux-arm-kernel, Jerome Brunet

On 9/17/20 12:38 PM, Steven Price wrote:
> On 16/09/2020 18:46, Rob Herring wrote:
>> On Wed, Sep 16, 2020 at 11:04 AM Alyssa Rosenzweig
>> <alyssa.rosenzweig@collabora.com> wrote:
>>>
>>>> So I get a performance regression with the dma-coherent approach, 
>>>> even if it's
>>>> clearly the cleaner.
>>>
>>> That's bizarre -- this should really be the faster of the two.
>>
>> Coherency may not be free. CortexA9 had something like 4x slower
>> memcpy if SMP was enabled as an example. I don't know if there's
>> anything going on like that specifically here. If there's never any
>> CPU accesses mixed in with kmscube, then there would be no benefit to
>> coherency.
> 
> The DDK blob has the ability to mark only certain areas of memory as 
> coherent for performance reasons. For simple things like kmscube I would 
> expect that it's basically write-only from the CPU and almost all memory 
> the GPU touches isn't touched by the CPU. I.e. coherency isn't helping 
> and the coherency traffic is probably expensive. Whether the complexity 
> is worth it for "real" content I don't know - it may just be silly 
> benchmarks that benefit.

Or maybe it's only a problem for applications that do silly things? I 
don't think kmscube was ever optimized for performance.

Regards,

Tomeu

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-17 10:51           ` Tomeu Vizoso
  0 siblings, 0 replies; 88+ messages in thread
From: Tomeu Vizoso @ 2020-09-17 10:51 UTC (permalink / raw)
  To: Steven Price, Rob Herring, Alyssa Rosenzweig
  Cc: Neil Armstrong, Robin Murphy, dri-devel, Linux IOMMU,
	Kevin Hilman, open list:ARM/Amlogic Meson...,
	Will Deacon, linux-arm-kernel, Jerome Brunet

On 9/17/20 12:38 PM, Steven Price wrote:
> On 16/09/2020 18:46, Rob Herring wrote:
>> On Wed, Sep 16, 2020 at 11:04 AM Alyssa Rosenzweig
>> <alyssa.rosenzweig@collabora.com> wrote:
>>>
>>>> So I get a performance regression with the dma-coherent approach, 
>>>> even if it's
>>>> clearly the cleaner.
>>>
>>> That's bizarre -- this should really be the faster of the two.
>>
>> Coherency may not be free. CortexA9 had something like 4x slower
>> memcpy if SMP was enabled as an example. I don't know if there's
>> anything going on like that specifically here. If there's never any
>> CPU accesses mixed in with kmscube, then there would be no benefit to
>> coherency.
> 
> The DDK blob has the ability to mark only certain areas of memory as 
> coherent for performance reasons. For simple things like kmscube I would 
> expect that it's basically write-only from the CPU and almost all memory 
> the GPU touches isn't touched by the CPU. I.e. coherency isn't helping 
> and the coherency traffic is probably expensive. Whether the complexity 
> is worth it for "real" content I don't know - it may just be silly 
> benchmarks that benefit.

Or maybe it's only a problem for applications that do silly things? I 
don't think kmscube was ever optimized for performance.

Regards,

Tomeu
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-17 10:51           ` Tomeu Vizoso
  0 siblings, 0 replies; 88+ messages in thread
From: Tomeu Vizoso @ 2020-09-17 10:51 UTC (permalink / raw)
  To: Steven Price, Rob Herring, Alyssa Rosenzweig
  Cc: Neil Armstrong, Robin Murphy, dri-devel, Linux IOMMU,
	Kevin Hilman, open list:ARM/Amlogic Meson...,
	Will Deacon, linux-arm-kernel, Jerome Brunet

On 9/17/20 12:38 PM, Steven Price wrote:
> On 16/09/2020 18:46, Rob Herring wrote:
>> On Wed, Sep 16, 2020 at 11:04 AM Alyssa Rosenzweig
>> <alyssa.rosenzweig@collabora.com> wrote:
>>>
>>>> So I get a performance regression with the dma-coherent approach, 
>>>> even if it's
>>>> clearly the cleaner.
>>>
>>> That's bizarre -- this should really be the faster of the two.
>>
>> Coherency may not be free. CortexA9 had something like 4x slower
>> memcpy if SMP was enabled as an example. I don't know if there's
>> anything going on like that specifically here. If there's never any
>> CPU accesses mixed in with kmscube, then there would be no benefit to
>> coherency.
> 
> The DDK blob has the ability to mark only certain areas of memory as 
> coherent for performance reasons. For simple things like kmscube I would 
> expect that it's basically write-only from the CPU and almost all memory 
> the GPU touches isn't touched by the CPU. I.e. coherency isn't helping 
> and the coherency traffic is probably expensive. Whether the complexity 
> is worth it for "real" content I don't know - it may just be silly 
> benchmarks that benefit.

Or maybe it's only a problem for applications that do silly things? I 
don't think kmscube was ever optimized for performance.

Regards,

Tomeu

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
  2020-09-17 10:51           ` Tomeu Vizoso
  (?)
  (?)
@ 2020-09-17 11:00             ` Steven Price
  -1 siblings, 0 replies; 88+ messages in thread
From: Steven Price @ 2020-09-17 11:00 UTC (permalink / raw)
  To: Tomeu Vizoso, Rob Herring, Alyssa Rosenzweig
  Cc: Neil Armstrong, Robin Murphy, dri-devel, Linux IOMMU,
	Kevin Hilman, open list:ARM/Amlogic Meson...,
	Will Deacon, linux-arm-kernel, Jerome Brunet

On 17/09/2020 11:51, Tomeu Vizoso wrote:
> On 9/17/20 12:38 PM, Steven Price wrote:
>> On 16/09/2020 18:46, Rob Herring wrote:
>>> On Wed, Sep 16, 2020 at 11:04 AM Alyssa Rosenzweig
>>> <alyssa.rosenzweig@collabora.com> wrote:
>>>>
>>>>> So I get a performance regression with the dma-coherent approach, 
>>>>> even if it's
>>>>> clearly the cleaner.
>>>>
>>>> That's bizarre -- this should really be the faster of the two.
>>>
>>> Coherency may not be free. CortexA9 had something like 4x slower
>>> memcpy if SMP was enabled as an example. I don't know if there's
>>> anything going on like that specifically here. If there's never any
>>> CPU accesses mixed in with kmscube, then there would be no benefit to
>>> coherency.
>>
>> The DDK blob has the ability to mark only certain areas of memory as 
>> coherent for performance reasons. For simple things like kmscube I 
>> would expect that it's basically write-only from the CPU and almost 
>> all memory the GPU touches isn't touched by the CPU. I.e. coherency 
>> isn't helping and the coherency traffic is probably expensive. Whether 
>> the complexity is worth it for "real" content I don't know - it may 
>> just be silly benchmarks that benefit.
> 
> Or maybe it's only a problem for applications that do silly things? I 
> don't think kmscube was ever optimized for performance.

Well doing silly things is almost the definition of a benchmark ;) A lot 
of the mobile graphics benchmarks suffer from not being very intelligent 
in how they render (e.g. many have meshes that are far too detailed so 
the triangles are smaller than the pixels).

Of course there are also applications that get things wrong too.

Steve
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-17 11:00             ` Steven Price
  0 siblings, 0 replies; 88+ messages in thread
From: Steven Price @ 2020-09-17 11:00 UTC (permalink / raw)
  To: Tomeu Vizoso, Rob Herring, Alyssa Rosenzweig
  Cc: Neil Armstrong, Robin Murphy, dri-devel, Linux IOMMU,
	Kevin Hilman, open list:ARM/Amlogic Meson...,
	Will Deacon, linux-arm-kernel, Jerome Brunet

On 17/09/2020 11:51, Tomeu Vizoso wrote:
> On 9/17/20 12:38 PM, Steven Price wrote:
>> On 16/09/2020 18:46, Rob Herring wrote:
>>> On Wed, Sep 16, 2020 at 11:04 AM Alyssa Rosenzweig
>>> <alyssa.rosenzweig@collabora.com> wrote:
>>>>
>>>>> So I get a performance regression with the dma-coherent approach, 
>>>>> even if it's
>>>>> clearly the cleaner.
>>>>
>>>> That's bizarre -- this should really be the faster of the two.
>>>
>>> Coherency may not be free. CortexA9 had something like 4x slower
>>> memcpy if SMP was enabled as an example. I don't know if there's
>>> anything going on like that specifically here. If there's never any
>>> CPU accesses mixed in with kmscube, then there would be no benefit to
>>> coherency.
>>
>> The DDK blob has the ability to mark only certain areas of memory as 
>> coherent for performance reasons. For simple things like kmscube I 
>> would expect that it's basically write-only from the CPU and almost 
>> all memory the GPU touches isn't touched by the CPU. I.e. coherency 
>> isn't helping and the coherency traffic is probably expensive. Whether 
>> the complexity is worth it for "real" content I don't know - it may 
>> just be silly benchmarks that benefit.
> 
> Or maybe it's only a problem for applications that do silly things? I 
> don't think kmscube was ever optimized for performance.

Well doing silly things is almost the definition of a benchmark ;) A lot 
of the mobile graphics benchmarks suffer from not being very intelligent 
in how they render (e.g. many have meshes that are far too detailed so 
the triangles are smaller than the pixels).

Of course there are also applications that get things wrong too.

Steve

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-17 11:00             ` Steven Price
  0 siblings, 0 replies; 88+ messages in thread
From: Steven Price @ 2020-09-17 11:00 UTC (permalink / raw)
  To: Tomeu Vizoso, Rob Herring, Alyssa Rosenzweig
  Cc: Neil Armstrong, Robin Murphy, dri-devel, Linux IOMMU,
	Kevin Hilman, open list:ARM/Amlogic Meson...,
	Will Deacon, linux-arm-kernel, Jerome Brunet

On 17/09/2020 11:51, Tomeu Vizoso wrote:
> On 9/17/20 12:38 PM, Steven Price wrote:
>> On 16/09/2020 18:46, Rob Herring wrote:
>>> On Wed, Sep 16, 2020 at 11:04 AM Alyssa Rosenzweig
>>> <alyssa.rosenzweig@collabora.com> wrote:
>>>>
>>>>> So I get a performance regression with the dma-coherent approach, 
>>>>> even if it's
>>>>> clearly the cleaner.
>>>>
>>>> That's bizarre -- this should really be the faster of the two.
>>>
>>> Coherency may not be free. CortexA9 had something like 4x slower
>>> memcpy if SMP was enabled as an example. I don't know if there's
>>> anything going on like that specifically here. If there's never any
>>> CPU accesses mixed in with kmscube, then there would be no benefit to
>>> coherency.
>>
>> The DDK blob has the ability to mark only certain areas of memory as 
>> coherent for performance reasons. For simple things like kmscube I 
>> would expect that it's basically write-only from the CPU and almost 
>> all memory the GPU touches isn't touched by the CPU. I.e. coherency 
>> isn't helping and the coherency traffic is probably expensive. Whether 
>> the complexity is worth it for "real" content I don't know - it may 
>> just be silly benchmarks that benefit.
> 
> Or maybe it's only a problem for applications that do silly things? I 
> don't think kmscube was ever optimized for performance.

Well doing silly things is almost the definition of a benchmark ;) A lot 
of the mobile graphics benchmarks suffer from not being very intelligent 
in how they render (e.g. many have meshes that are far too detailed so 
the triangles are smaller than the pixels).

Of course there are also applications that get things wrong too.

Steve
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-17 11:00             ` Steven Price
  0 siblings, 0 replies; 88+ messages in thread
From: Steven Price @ 2020-09-17 11:00 UTC (permalink / raw)
  To: Tomeu Vizoso, Rob Herring, Alyssa Rosenzweig
  Cc: Neil Armstrong, Robin Murphy, dri-devel, Linux IOMMU,
	Kevin Hilman, open list:ARM/Amlogic Meson...,
	Will Deacon, linux-arm-kernel, Jerome Brunet

On 17/09/2020 11:51, Tomeu Vizoso wrote:
> On 9/17/20 12:38 PM, Steven Price wrote:
>> On 16/09/2020 18:46, Rob Herring wrote:
>>> On Wed, Sep 16, 2020 at 11:04 AM Alyssa Rosenzweig
>>> <alyssa.rosenzweig@collabora.com> wrote:
>>>>
>>>>> So I get a performance regression with the dma-coherent approach, 
>>>>> even if it's
>>>>> clearly the cleaner.
>>>>
>>>> That's bizarre -- this should really be the faster of the two.
>>>
>>> Coherency may not be free. CortexA9 had something like 4x slower
>>> memcpy if SMP was enabled as an example. I don't know if there's
>>> anything going on like that specifically here. If there's never any
>>> CPU accesses mixed in with kmscube, then there would be no benefit to
>>> coherency.
>>
>> The DDK blob has the ability to mark only certain areas of memory as 
>> coherent for performance reasons. For simple things like kmscube I 
>> would expect that it's basically write-only from the CPU and almost 
>> all memory the GPU touches isn't touched by the CPU. I.e. coherency 
>> isn't helping and the coherency traffic is probably expensive. Whether 
>> the complexity is worth it for "real" content I don't know - it may 
>> just be silly benchmarks that benefit.
> 
> Or maybe it's only a problem for applications that do silly things? I 
> don't think kmscube was ever optimized for performance.

Well doing silly things is almost the definition of a benchmark ;) A lot 
of the mobile graphics benchmarks suffer from not being very intelligent 
in how they render (e.g. many have meshes that are far too detailed so 
the triangles are smaller than the pixels).

Of course there are also applications that get things wrong too.

Steve

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
  2020-09-17 10:38         ` Steven Price
  (?)
  (?)
@ 2020-09-17 12:03           ` Alyssa Rosenzweig
  -1 siblings, 0 replies; 88+ messages in thread
From: Alyssa Rosenzweig @ 2020-09-17 12:03 UTC (permalink / raw)
  To: Steven Price
  Cc: Rob Herring, Tomeu Vizoso, Neil Armstrong, Kevin Hilman,
	Linux IOMMU, dri-devel, Will Deacon,
	open list:ARM/Amlogic Meson...,
	Robin Murphy, linux-arm-kernel, Jerome Brunet


[-- Attachment #1.1: Type: text/plain, Size: 846 bytes --]

> The DDK blob has the ability to mark only certain areas of memory as
> coherent for performance reasons. For simple things like kmscube I would
> expect that it's basically write-only from the CPU and almost all memory the
> GPU touches isn't touched by the CPU. I.e. coherency isn't helping and the
> coherency traffic is probably expensive. Whether the complexity is worth it
> for "real" content I don't know - it may just be silly benchmarks that
> benefit.

Right, Panfrost userspace specifically assumes GPU reads to be expensive
and treats GPU memory as write-only *except* for a few special cases
(compute-like workloads, glReadPixels, some blits, etc).

The vast majority of the GPU memory - everything used in kmscube - will be
write-only to the CPU and fed directly into the display zero-copy (or
read by the GPU later as a dmabuf).

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 156 bytes --]

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-17 12:03           ` Alyssa Rosenzweig
  0 siblings, 0 replies; 88+ messages in thread
From: Alyssa Rosenzweig @ 2020-09-17 12:03 UTC (permalink / raw)
  To: Steven Price
  Cc: Rob Herring, Tomeu Vizoso, Neil Armstrong, Kevin Hilman,
	Linux IOMMU, dri-devel, Will Deacon,
	open list:ARM/Amlogic Meson...,
	Robin Murphy, linux-arm-kernel, Jerome Brunet


[-- Attachment #1.1: Type: text/plain, Size: 846 bytes --]

> The DDK blob has the ability to mark only certain areas of memory as
> coherent for performance reasons. For simple things like kmscube I would
> expect that it's basically write-only from the CPU and almost all memory the
> GPU touches isn't touched by the CPU. I.e. coherency isn't helping and the
> coherency traffic is probably expensive. Whether the complexity is worth it
> for "real" content I don't know - it may just be silly benchmarks that
> benefit.

Right, Panfrost userspace specifically assumes GPU reads to be expensive
and treats GPU memory as write-only *except* for a few special cases
(compute-like workloads, glReadPixels, some blits, etc).

The vast majority of the GPU memory - everything used in kmscube - will be
write-only to the CPU and fed directly into the display zero-copy (or
read by the GPU later as a dmabuf).

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-17 12:03           ` Alyssa Rosenzweig
  0 siblings, 0 replies; 88+ messages in thread
From: Alyssa Rosenzweig @ 2020-09-17 12:03 UTC (permalink / raw)
  To: Steven Price
  Cc: Tomeu Vizoso, Neil Armstrong, Kevin Hilman, Linux IOMMU,
	dri-devel, Will Deacon, open list:ARM/Amlogic Meson...,
	Robin Murphy, linux-arm-kernel, Jerome Brunet


[-- Attachment #1.1: Type: text/plain, Size: 846 bytes --]

> The DDK blob has the ability to mark only certain areas of memory as
> coherent for performance reasons. For simple things like kmscube I would
> expect that it's basically write-only from the CPU and almost all memory the
> GPU touches isn't touched by the CPU. I.e. coherency isn't helping and the
> coherency traffic is probably expensive. Whether the complexity is worth it
> for "real" content I don't know - it may just be silly benchmarks that
> benefit.

Right, Panfrost userspace specifically assumes GPU reads to be expensive
and treats GPU memory as write-only *except* for a few special cases
(compute-like workloads, glReadPixels, some blits, etc).

The vast majority of the GPU memory - everything used in kmscube - will be
write-only to the CPU and fed directly into the display zero-copy (or
read by the GPU later as a dmabuf).

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-17 12:03           ` Alyssa Rosenzweig
  0 siblings, 0 replies; 88+ messages in thread
From: Alyssa Rosenzweig @ 2020-09-17 12:03 UTC (permalink / raw)
  To: Steven Price
  Cc: Rob Herring, Tomeu Vizoso, Neil Armstrong, Kevin Hilman,
	Linux IOMMU, dri-devel, Will Deacon,
	open list:ARM/Amlogic Meson...,
	Robin Murphy, linux-arm-kernel, Jerome Brunet


[-- Attachment #1.1: Type: text/plain, Size: 846 bytes --]

> The DDK blob has the ability to mark only certain areas of memory as
> coherent for performance reasons. For simple things like kmscube I would
> expect that it's basically write-only from the CPU and almost all memory the
> GPU touches isn't touched by the CPU. I.e. coherency isn't helping and the
> coherency traffic is probably expensive. Whether the complexity is worth it
> for "real" content I don't know - it may just be silly benchmarks that
> benefit.

Right, Panfrost userspace specifically assumes GPU reads to be expensive
and treats GPU memory as write-only *except* for a few special cases
(compute-like workloads, glReadPixels, some blits, etc).

The vast majority of the GPU memory - everything used in kmscube - will be
write-only to the CPU and fed directly into the display zero-copy (or
read by the GPU later as a dmabuf).

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 167 bytes --]

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
  2020-09-16 17:46       ` Rob Herring
  (?)
  (?)
@ 2020-09-17 12:38         ` Robin Murphy
  -1 siblings, 0 replies; 88+ messages in thread
From: Robin Murphy @ 2020-09-17 12:38 UTC (permalink / raw)
  To: Rob Herring, Alyssa Rosenzweig
  Cc: Tomeu Vizoso, Neil Armstrong, Kevin Hilman, dri-devel,
	Steven Price, Linux IOMMU, open list:ARM/Amlogic Meson...,
	Will Deacon, linux-arm-kernel, Jerome Brunet

On 2020-09-16 18:46, Rob Herring wrote:
> On Wed, Sep 16, 2020 at 11:04 AM Alyssa Rosenzweig
> <alyssa.rosenzweig@collabora.com> wrote:
>>
>>> So I get a performance regression with the dma-coherent approach, even if it's
>>> clearly the cleaner.
>>
>> That's bizarre -- this should really be the faster of the two.
> 
> Coherency may not be free. CortexA9 had something like 4x slower
> memcpy if SMP was enabled as an example. I don't know if there's
> anything going on like that specifically here. If there's never any
> CPU accesses mixed in with kmscube, then there would be no benefit to
> coherency.

There will still be CPU benefits in terms of not having to spend time 
cache-cleaning every BO upon allocation, and less overhead on writing 
out descriptors in the first place (due to cacheable vs. non-cacheable).

I haven't tried the NSh hack on Juno, but I don't see any notable 
performance issue as-is - kmscube hits a solid 60FPS from the off (now 
that it works without spewing faults). Given that the hardware on Juno 
can be generously described as "less good", it would certainly be 
interesting to figure out what difference is at play here...

The usual argument against I/O coherency is that it adds latency to 
every access, but if you already have a coherent interconnect anyway 
then the sensible answer to that is implementing decent snoop filters, 
rather than making software more complicated.

Robin.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-17 12:38         ` Robin Murphy
  0 siblings, 0 replies; 88+ messages in thread
From: Robin Murphy @ 2020-09-17 12:38 UTC (permalink / raw)
  To: Rob Herring, Alyssa Rosenzweig
  Cc: Tomeu Vizoso, Neil Armstrong, Kevin Hilman, dri-devel,
	Steven Price, Linux IOMMU, open list:ARM/Amlogic Meson...,
	Will Deacon, linux-arm-kernel, Jerome Brunet

On 2020-09-16 18:46, Rob Herring wrote:
> On Wed, Sep 16, 2020 at 11:04 AM Alyssa Rosenzweig
> <alyssa.rosenzweig@collabora.com> wrote:
>>
>>> So I get a performance regression with the dma-coherent approach, even if it's
>>> clearly the cleaner.
>>
>> That's bizarre -- this should really be the faster of the two.
> 
> Coherency may not be free. CortexA9 had something like 4x slower
> memcpy if SMP was enabled as an example. I don't know if there's
> anything going on like that specifically here. If there's never any
> CPU accesses mixed in with kmscube, then there would be no benefit to
> coherency.

There will still be CPU benefits in terms of not having to spend time 
cache-cleaning every BO upon allocation, and less overhead on writing 
out descriptors in the first place (due to cacheable vs. non-cacheable).

I haven't tried the NSh hack on Juno, but I don't see any notable 
performance issue as-is - kmscube hits a solid 60FPS from the off (now 
that it works without spewing faults). Given that the hardware on Juno 
can be generously described as "less good", it would certainly be 
interesting to figure out what difference is at play here...

The usual argument against I/O coherency is that it adds latency to 
every access, but if you already have a coherent interconnect anyway 
then the sensible answer to that is implementing decent snoop filters, 
rather than making software more complicated.

Robin.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-17 12:38         ` Robin Murphy
  0 siblings, 0 replies; 88+ messages in thread
From: Robin Murphy @ 2020-09-17 12:38 UTC (permalink / raw)
  To: Rob Herring, Alyssa Rosenzweig
  Cc: Tomeu Vizoso, Neil Armstrong, Kevin Hilman, dri-devel,
	Steven Price, Linux IOMMU, open list:ARM/Amlogic Meson...,
	Will Deacon, linux-arm-kernel, Jerome Brunet

On 2020-09-16 18:46, Rob Herring wrote:
> On Wed, Sep 16, 2020 at 11:04 AM Alyssa Rosenzweig
> <alyssa.rosenzweig@collabora.com> wrote:
>>
>>> So I get a performance regression with the dma-coherent approach, even if it's
>>> clearly the cleaner.
>>
>> That's bizarre -- this should really be the faster of the two.
> 
> Coherency may not be free. CortexA9 had something like 4x slower
> memcpy if SMP was enabled as an example. I don't know if there's
> anything going on like that specifically here. If there's never any
> CPU accesses mixed in with kmscube, then there would be no benefit to
> coherency.

There will still be CPU benefits in terms of not having to spend time 
cache-cleaning every BO upon allocation, and less overhead on writing 
out descriptors in the first place (due to cacheable vs. non-cacheable).

I haven't tried the NSh hack on Juno, but I don't see any notable 
performance issue as-is - kmscube hits a solid 60FPS from the off (now 
that it works without spewing faults). Given that the hardware on Juno 
can be generously described as "less good", it would certainly be 
interesting to figure out what difference is at play here...

The usual argument against I/O coherency is that it adds latency to 
every access, but if you already have a coherent interconnect anyway 
then the sensible answer to that is implementing decent snoop filters, 
rather than making software more complicated.

Robin.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 0/3] drm: panfrost: Coherency support
@ 2020-09-17 12:38         ` Robin Murphy
  0 siblings, 0 replies; 88+ messages in thread
From: Robin Murphy @ 2020-09-17 12:38 UTC (permalink / raw)
  To: Rob Herring, Alyssa Rosenzweig
  Cc: Tomeu Vizoso, Neil Armstrong, Kevin Hilman, dri-devel,
	Steven Price, Linux IOMMU, open list:ARM/Amlogic Meson...,
	Will Deacon, linux-arm-kernel, Jerome Brunet

On 2020-09-16 18:46, Rob Herring wrote:
> On Wed, Sep 16, 2020 at 11:04 AM Alyssa Rosenzweig
> <alyssa.rosenzweig@collabora.com> wrote:
>>
>>> So I get a performance regression with the dma-coherent approach, even if it's
>>> clearly the cleaner.
>>
>> That's bizarre -- this should really be the faster of the two.
> 
> Coherency may not be free. CortexA9 had something like 4x slower
> memcpy if SMP was enabled as an example. I don't know if there's
> anything going on like that specifically here. If there's never any
> CPU accesses mixed in with kmscube, then there would be no benefit to
> coherency.

There will still be CPU benefits in terms of not having to spend time 
cache-cleaning every BO upon allocation, and less overhead on writing 
out descriptors in the first place (due to cacheable vs. non-cacheable).

I haven't tried the NSh hack on Juno, but I don't see any notable 
performance issue as-is - kmscube hits a solid 60FPS from the off (now 
that it works without spewing faults). Given that the hardware on Juno 
can be generously described as "less good", it would certainly be 
interesting to figure out what difference is at play here...

The usual argument against I/O coherency is that it adds latency to 
every access, but if you already have a coherent interconnect anyway 
then the sensible answer to that is implementing decent snoop filters, 
rather than making software more complicated.

Robin.

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 1/3] iommu/io-pgtable-arm: Support coherency for Mali LPAE
  2020-09-15 23:51   ` Robin Murphy
  (?)
  (?)
@ 2020-09-21 17:57     ` Will Deacon
  -1 siblings, 0 replies; 88+ messages in thread
From: Will Deacon @ 2020-09-21 17:57 UTC (permalink / raw)
  To: Robin Murphy
  Cc: robh, tomeu.vizoso, narmstrong, khilman, dri-devel, steven.price,
	iommu, alyssa.rosenzweig, linux-amlogic, linux-arm-kernel,
	jbrunet

On Wed, Sep 16, 2020 at 12:51:05AM +0100, Robin Murphy wrote:
> Midgard GPUs have ACE-Lite master interfaces which allows systems to
> integrate them in an I/O-coherent manner. It seems that from the GPU's
> viewpoint, the rest of the system is its outer shareable domain, and so
> even when snoop signals are wired up, they are only emitted for outer
> shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
> indeed get coherent pagetable walks working nicely for the coherent
> T620 in the Arm Juno SoC.

I can't help but think some of this commentary deserves to be in the code
as well.

Do you know if this sort of thing is done for other SoCs too, or is this
just a Juno quirk?

Will
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 1/3] iommu/io-pgtable-arm: Support coherency for Mali LPAE
@ 2020-09-21 17:57     ` Will Deacon
  0 siblings, 0 replies; 88+ messages in thread
From: Will Deacon @ 2020-09-21 17:57 UTC (permalink / raw)
  To: Robin Murphy
  Cc: robh, tomeu.vizoso, narmstrong, khilman, dri-devel, steven.price,
	iommu, alyssa.rosenzweig, linux-amlogic, linux-arm-kernel,
	jbrunet

On Wed, Sep 16, 2020 at 12:51:05AM +0100, Robin Murphy wrote:
> Midgard GPUs have ACE-Lite master interfaces which allows systems to
> integrate them in an I/O-coherent manner. It seems that from the GPU's
> viewpoint, the rest of the system is its outer shareable domain, and so
> even when snoop signals are wired up, they are only emitted for outer
> shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
> indeed get coherent pagetable walks working nicely for the coherent
> T620 in the Arm Juno SoC.

I can't help but think some of this commentary deserves to be in the code
as well.

Do you know if this sort of thing is done for other SoCs too, or is this
just a Juno quirk?

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 1/3] iommu/io-pgtable-arm: Support coherency for Mali LPAE
@ 2020-09-21 17:57     ` Will Deacon
  0 siblings, 0 replies; 88+ messages in thread
From: Will Deacon @ 2020-09-21 17:57 UTC (permalink / raw)
  To: Robin Murphy
  Cc: tomeu.vizoso, narmstrong, khilman, dri-devel, steven.price,
	iommu, alyssa.rosenzweig, linux-amlogic, linux-arm-kernel,
	jbrunet

On Wed, Sep 16, 2020 at 12:51:05AM +0100, Robin Murphy wrote:
> Midgard GPUs have ACE-Lite master interfaces which allows systems to
> integrate them in an I/O-coherent manner. It seems that from the GPU's
> viewpoint, the rest of the system is its outer shareable domain, and so
> even when snoop signals are wired up, they are only emitted for outer
> shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
> indeed get coherent pagetable walks working nicely for the coherent
> T620 in the Arm Juno SoC.

I can't help but think some of this commentary deserves to be in the code
as well.

Do you know if this sort of thing is done for other SoCs too, or is this
just a Juno quirk?

Will
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 1/3] iommu/io-pgtable-arm: Support coherency for Mali LPAE
@ 2020-09-21 17:57     ` Will Deacon
  0 siblings, 0 replies; 88+ messages in thread
From: Will Deacon @ 2020-09-21 17:57 UTC (permalink / raw)
  To: Robin Murphy
  Cc: robh, tomeu.vizoso, narmstrong, khilman, dri-devel, steven.price,
	iommu, alyssa.rosenzweig, linux-amlogic, linux-arm-kernel,
	jbrunet

On Wed, Sep 16, 2020 at 12:51:05AM +0100, Robin Murphy wrote:
> Midgard GPUs have ACE-Lite master interfaces which allows systems to
> integrate them in an I/O-coherent manner. It seems that from the GPU's
> viewpoint, the rest of the system is its outer shareable domain, and so
> even when snoop signals are wired up, they are only emitted for outer
> shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
> indeed get coherent pagetable walks working nicely for the coherent
> T620 in the Arm Juno SoC.

I can't help but think some of this commentary deserves to be in the code
as well.

Do you know if this sort of thing is done for other SoCs too, or is this
just a Juno quirk?

Will

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 1/3] iommu/io-pgtable-arm: Support coherency for Mali LPAE
  2020-09-21 17:57     ` Will Deacon
  (?)
  (?)
@ 2020-09-21 21:53       ` Robin Murphy
  -1 siblings, 0 replies; 88+ messages in thread
From: Robin Murphy @ 2020-09-21 21:53 UTC (permalink / raw)
  To: Will Deacon
  Cc: robh, tomeu.vizoso, narmstrong, khilman, dri-devel, steven.price,
	iommu, alyssa.rosenzweig, linux-amlogic, linux-arm-kernel,
	jbrunet

On 2020-09-21 18:57, Will Deacon wrote:
> On Wed, Sep 16, 2020 at 12:51:05AM +0100, Robin Murphy wrote:
>> Midgard GPUs have ACE-Lite master interfaces which allows systems to
>> integrate them in an I/O-coherent manner. It seems that from the GPU's
>> viewpoint, the rest of the system is its outer shareable domain, and so
>> even when snoop signals are wired up, they are only emitted for outer
>> shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
>> indeed get coherent pagetable walks working nicely for the coherent
>> T620 in the Arm Juno SoC.
> 
> I can't help but think some of this commentary deserves to be in the code
> as well.

Sure, if you want.

> Do you know if this sort of thing is done for other SoCs too, or is this
> just a Juno quirk?

Yup, this is a "Midgard working as designed" thing. Juno is the coherent 
example I have to hand, but off the top of my head I believe some of the 
Exynos SoCs can also use their GPUs coherently if a switch is flipped in 
the interconnect to change routing between the CCI and a direct-to-RAM 
path; I expect there are probably further Midgard examples that I'm not 
aware of. Then there are definitely coherent Bifrost GPUs like the 
Amlogic S922/A311 that prompted me to revive this patch, which we 
currently drive in "Legacy" mode and thus behave the same way as Midgard 
(Bifrost's "AArch64" mode realigns Ish and Osh with the rest of the 
system, and instead invents a new "Internal Shareable" value in between 
Nsh and Ish to represent the shareability between cores within the GPU 
for which Midgard hijacked Ish).

Robin.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 1/3] iommu/io-pgtable-arm: Support coherency for Mali LPAE
@ 2020-09-21 21:53       ` Robin Murphy
  0 siblings, 0 replies; 88+ messages in thread
From: Robin Murphy @ 2020-09-21 21:53 UTC (permalink / raw)
  To: Will Deacon
  Cc: robh, tomeu.vizoso, narmstrong, khilman, dri-devel, steven.price,
	iommu, alyssa.rosenzweig, linux-amlogic, linux-arm-kernel,
	jbrunet

On 2020-09-21 18:57, Will Deacon wrote:
> On Wed, Sep 16, 2020 at 12:51:05AM +0100, Robin Murphy wrote:
>> Midgard GPUs have ACE-Lite master interfaces which allows systems to
>> integrate them in an I/O-coherent manner. It seems that from the GPU's
>> viewpoint, the rest of the system is its outer shareable domain, and so
>> even when snoop signals are wired up, they are only emitted for outer
>> shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
>> indeed get coherent pagetable walks working nicely for the coherent
>> T620 in the Arm Juno SoC.
> 
> I can't help but think some of this commentary deserves to be in the code
> as well.

Sure, if you want.

> Do you know if this sort of thing is done for other SoCs too, or is this
> just a Juno quirk?

Yup, this is a "Midgard working as designed" thing. Juno is the coherent 
example I have to hand, but off the top of my head I believe some of the 
Exynos SoCs can also use their GPUs coherently if a switch is flipped in 
the interconnect to change routing between the CCI and a direct-to-RAM 
path; I expect there are probably further Midgard examples that I'm not 
aware of. Then there are definitely coherent Bifrost GPUs like the 
Amlogic S922/A311 that prompted me to revive this patch, which we 
currently drive in "Legacy" mode and thus behave the same way as Midgard 
(Bifrost's "AArch64" mode realigns Ish and Osh with the rest of the 
system, and instead invents a new "Internal Shareable" value in between 
Nsh and Ish to represent the shareability between cores within the GPU 
for which Midgard hijacked Ish).

Robin.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 1/3] iommu/io-pgtable-arm: Support coherency for Mali LPAE
@ 2020-09-21 21:53       ` Robin Murphy
  0 siblings, 0 replies; 88+ messages in thread
From: Robin Murphy @ 2020-09-21 21:53 UTC (permalink / raw)
  To: Will Deacon
  Cc: tomeu.vizoso, narmstrong, khilman, dri-devel, steven.price,
	iommu, alyssa.rosenzweig, linux-amlogic, linux-arm-kernel,
	jbrunet

On 2020-09-21 18:57, Will Deacon wrote:
> On Wed, Sep 16, 2020 at 12:51:05AM +0100, Robin Murphy wrote:
>> Midgard GPUs have ACE-Lite master interfaces which allows systems to
>> integrate them in an I/O-coherent manner. It seems that from the GPU's
>> viewpoint, the rest of the system is its outer shareable domain, and so
>> even when snoop signals are wired up, they are only emitted for outer
>> shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
>> indeed get coherent pagetable walks working nicely for the coherent
>> T620 in the Arm Juno SoC.
> 
> I can't help but think some of this commentary deserves to be in the code
> as well.

Sure, if you want.

> Do you know if this sort of thing is done for other SoCs too, or is this
> just a Juno quirk?

Yup, this is a "Midgard working as designed" thing. Juno is the coherent 
example I have to hand, but off the top of my head I believe some of the 
Exynos SoCs can also use their GPUs coherently if a switch is flipped in 
the interconnect to change routing between the CCI and a direct-to-RAM 
path; I expect there are probably further Midgard examples that I'm not 
aware of. Then there are definitely coherent Bifrost GPUs like the 
Amlogic S922/A311 that prompted me to revive this patch, which we 
currently drive in "Legacy" mode and thus behave the same way as Midgard 
(Bifrost's "AArch64" mode realigns Ish and Osh with the rest of the 
system, and instead invents a new "Internal Shareable" value in between 
Nsh and Ish to represent the shareability between cores within the GPU 
for which Midgard hijacked Ish).

Robin.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 1/3] iommu/io-pgtable-arm: Support coherency for Mali LPAE
@ 2020-09-21 21:53       ` Robin Murphy
  0 siblings, 0 replies; 88+ messages in thread
From: Robin Murphy @ 2020-09-21 21:53 UTC (permalink / raw)
  To: Will Deacon
  Cc: robh, tomeu.vizoso, narmstrong, khilman, dri-devel, steven.price,
	iommu, alyssa.rosenzweig, linux-amlogic, linux-arm-kernel,
	jbrunet

On 2020-09-21 18:57, Will Deacon wrote:
> On Wed, Sep 16, 2020 at 12:51:05AM +0100, Robin Murphy wrote:
>> Midgard GPUs have ACE-Lite master interfaces which allows systems to
>> integrate them in an I/O-coherent manner. It seems that from the GPU's
>> viewpoint, the rest of the system is its outer shareable domain, and so
>> even when snoop signals are wired up, they are only emitted for outer
>> shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
>> indeed get coherent pagetable walks working nicely for the coherent
>> T620 in the Arm Juno SoC.
> 
> I can't help but think some of this commentary deserves to be in the code
> as well.

Sure, if you want.

> Do you know if this sort of thing is done for other SoCs too, or is this
> just a Juno quirk?

Yup, this is a "Midgard working as designed" thing. Juno is the coherent 
example I have to hand, but off the top of my head I believe some of the 
Exynos SoCs can also use their GPUs coherently if a switch is flipped in 
the interconnect to change routing between the CCI and a direct-to-RAM 
path; I expect there are probably further Midgard examples that I'm not 
aware of. Then there are definitely coherent Bifrost GPUs like the 
Amlogic S922/A311 that prompted me to revive this patch, which we 
currently drive in "Legacy" mode and thus behave the same way as Midgard 
(Bifrost's "AArch64" mode realigns Ish and Osh with the rest of the 
system, and instead invents a new "Internal Shareable" value in between 
Nsh and Ish to represent the shareability between cores within the GPU 
for which Midgard hijacked Ish).

Robin.

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 1/3] iommu/io-pgtable-arm: Support coherency for Mali LPAE
  2020-09-21 21:53       ` Robin Murphy
  (?)
  (?)
@ 2020-09-21 22:24         ` Will Deacon
  -1 siblings, 0 replies; 88+ messages in thread
From: Will Deacon @ 2020-09-21 22:24 UTC (permalink / raw)
  To: Robin Murphy
  Cc: robh, tomeu.vizoso, narmstrong, khilman, dri-devel, steven.price,
	iommu, alyssa.rosenzweig, linux-amlogic, linux-arm-kernel,
	jbrunet

On Mon, Sep 21, 2020 at 10:53:23PM +0100, Robin Murphy wrote:
> On 2020-09-21 18:57, Will Deacon wrote:
> > On Wed, Sep 16, 2020 at 12:51:05AM +0100, Robin Murphy wrote:
> > > Midgard GPUs have ACE-Lite master interfaces which allows systems to
> > > integrate them in an I/O-coherent manner. It seems that from the GPU's
> > > viewpoint, the rest of the system is its outer shareable domain, and so
> > > even when snoop signals are wired up, they are only emitted for outer
> > > shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
> > > indeed get coherent pagetable walks working nicely for the coherent
> > > T620 in the Arm Juno SoC.
> > 
> > I can't help but think some of this commentary deserves to be in the code
> > as well.
> 
> Sure, if you want.

Yes, please.

> > Do you know if this sort of thing is done for other SoCs too, or is this
> > just a Juno quirk?
> 
> Yup, this is a "Midgard working as designed" thing. Juno is the coherent
> example I have to hand, but off the top of my head I believe some of the
> Exynos SoCs can also use their GPUs coherently if a switch is flipped in the
> interconnect to change routing between the CCI and a direct-to-RAM path; I
> expect there are probably further Midgard examples that I'm not aware of.
> Then there are definitely coherent Bifrost GPUs like the Amlogic S922/A311
> that prompted me to revive this patch, which we currently drive in "Legacy"
> mode and thus behave the same way as Midgard (Bifrost's "AArch64" mode
> realigns Ish and Osh with the rest of the system, and instead invents a new
> "Internal Shareable" value in between Nsh and Ish to represent the
> shareability between cores within the GPU for which Midgard hijacked Ish).

That is more than I wanted to know :) "Internal Shareable", jeez...

Thanks,

Will
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 1/3] iommu/io-pgtable-arm: Support coherency for Mali LPAE
@ 2020-09-21 22:24         ` Will Deacon
  0 siblings, 0 replies; 88+ messages in thread
From: Will Deacon @ 2020-09-21 22:24 UTC (permalink / raw)
  To: Robin Murphy
  Cc: robh, tomeu.vizoso, narmstrong, khilman, dri-devel, steven.price,
	iommu, alyssa.rosenzweig, linux-amlogic, linux-arm-kernel,
	jbrunet

On Mon, Sep 21, 2020 at 10:53:23PM +0100, Robin Murphy wrote:
> On 2020-09-21 18:57, Will Deacon wrote:
> > On Wed, Sep 16, 2020 at 12:51:05AM +0100, Robin Murphy wrote:
> > > Midgard GPUs have ACE-Lite master interfaces which allows systems to
> > > integrate them in an I/O-coherent manner. It seems that from the GPU's
> > > viewpoint, the rest of the system is its outer shareable domain, and so
> > > even when snoop signals are wired up, they are only emitted for outer
> > > shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
> > > indeed get coherent pagetable walks working nicely for the coherent
> > > T620 in the Arm Juno SoC.
> > 
> > I can't help but think some of this commentary deserves to be in the code
> > as well.
> 
> Sure, if you want.

Yes, please.

> > Do you know if this sort of thing is done for other SoCs too, or is this
> > just a Juno quirk?
> 
> Yup, this is a "Midgard working as designed" thing. Juno is the coherent
> example I have to hand, but off the top of my head I believe some of the
> Exynos SoCs can also use their GPUs coherently if a switch is flipped in the
> interconnect to change routing between the CCI and a direct-to-RAM path; I
> expect there are probably further Midgard examples that I'm not aware of.
> Then there are definitely coherent Bifrost GPUs like the Amlogic S922/A311
> that prompted me to revive this patch, which we currently drive in "Legacy"
> mode and thus behave the same way as Midgard (Bifrost's "AArch64" mode
> realigns Ish and Osh with the rest of the system, and instead invents a new
> "Internal Shareable" value in between Nsh and Ish to represent the
> shareability between cores within the GPU for which Midgard hijacked Ish).

That is more than I wanted to know :) "Internal Shareable", jeez...

Thanks,

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 1/3] iommu/io-pgtable-arm: Support coherency for Mali LPAE
@ 2020-09-21 22:24         ` Will Deacon
  0 siblings, 0 replies; 88+ messages in thread
From: Will Deacon @ 2020-09-21 22:24 UTC (permalink / raw)
  To: Robin Murphy
  Cc: tomeu.vizoso, narmstrong, khilman, dri-devel, steven.price,
	iommu, alyssa.rosenzweig, linux-amlogic, linux-arm-kernel,
	jbrunet

On Mon, Sep 21, 2020 at 10:53:23PM +0100, Robin Murphy wrote:
> On 2020-09-21 18:57, Will Deacon wrote:
> > On Wed, Sep 16, 2020 at 12:51:05AM +0100, Robin Murphy wrote:
> > > Midgard GPUs have ACE-Lite master interfaces which allows systems to
> > > integrate them in an I/O-coherent manner. It seems that from the GPU's
> > > viewpoint, the rest of the system is its outer shareable domain, and so
> > > even when snoop signals are wired up, they are only emitted for outer
> > > shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
> > > indeed get coherent pagetable walks working nicely for the coherent
> > > T620 in the Arm Juno SoC.
> > 
> > I can't help but think some of this commentary deserves to be in the code
> > as well.
> 
> Sure, if you want.

Yes, please.

> > Do you know if this sort of thing is done for other SoCs too, or is this
> > just a Juno quirk?
> 
> Yup, this is a "Midgard working as designed" thing. Juno is the coherent
> example I have to hand, but off the top of my head I believe some of the
> Exynos SoCs can also use their GPUs coherently if a switch is flipped in the
> interconnect to change routing between the CCI and a direct-to-RAM path; I
> expect there are probably further Midgard examples that I'm not aware of.
> Then there are definitely coherent Bifrost GPUs like the Amlogic S922/A311
> that prompted me to revive this patch, which we currently drive in "Legacy"
> mode and thus behave the same way as Midgard (Bifrost's "AArch64" mode
> realigns Ish and Osh with the rest of the system, and instead invents a new
> "Internal Shareable" value in between Nsh and Ish to represent the
> shareability between cores within the GPU for which Midgard hijacked Ish).

That is more than I wanted to know :) "Internal Shareable", jeez...

Thanks,

Will
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 1/3] iommu/io-pgtable-arm: Support coherency for Mali LPAE
@ 2020-09-21 22:24         ` Will Deacon
  0 siblings, 0 replies; 88+ messages in thread
From: Will Deacon @ 2020-09-21 22:24 UTC (permalink / raw)
  To: Robin Murphy
  Cc: robh, tomeu.vizoso, narmstrong, khilman, dri-devel, steven.price,
	iommu, alyssa.rosenzweig, linux-amlogic, linux-arm-kernel,
	jbrunet

On Mon, Sep 21, 2020 at 10:53:23PM +0100, Robin Murphy wrote:
> On 2020-09-21 18:57, Will Deacon wrote:
> > On Wed, Sep 16, 2020 at 12:51:05AM +0100, Robin Murphy wrote:
> > > Midgard GPUs have ACE-Lite master interfaces which allows systems to
> > > integrate them in an I/O-coherent manner. It seems that from the GPU's
> > > viewpoint, the rest of the system is its outer shareable domain, and so
> > > even when snoop signals are wired up, they are only emitted for outer
> > > shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
> > > indeed get coherent pagetable walks working nicely for the coherent
> > > T620 in the Arm Juno SoC.
> > 
> > I can't help but think some of this commentary deserves to be in the code
> > as well.
> 
> Sure, if you want.

Yes, please.

> > Do you know if this sort of thing is done for other SoCs too, or is this
> > just a Juno quirk?
> 
> Yup, this is a "Midgard working as designed" thing. Juno is the coherent
> example I have to hand, but off the top of my head I believe some of the
> Exynos SoCs can also use their GPUs coherently if a switch is flipped in the
> interconnect to change routing between the CCI and a direct-to-RAM path; I
> expect there are probably further Midgard examples that I'm not aware of.
> Then there are definitely coherent Bifrost GPUs like the Amlogic S922/A311
> that prompted me to revive this patch, which we currently drive in "Legacy"
> mode and thus behave the same way as Midgard (Bifrost's "AArch64" mode
> realigns Ish and Osh with the rest of the system, and instead invents a new
> "Internal Shareable" value in between Nsh and Ish to represent the
> shareability between cores within the GPU for which Midgard hijacked Ish).

That is more than I wanted to know :) "Internal Shareable", jeez...

Thanks,

Will

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
  2020-09-16  8:26     ` Neil Armstrong
  (?)
  (?)
@ 2020-10-05  8:15       ` Boris Brezillon
  -1 siblings, 0 replies; 88+ messages in thread
From: Boris Brezillon @ 2020-10-05  8:15 UTC (permalink / raw)
  To: Neil Armstrong, tomeu.vizoso
  Cc: robh, khilman, Robin Murphy, dri-devel, steven.price, iommu,
	alyssa.rosenzweig, linux-amlogic, will, linux-arm-kernel,
	jbrunet

Hi Robin, Neil,

On Wed, 16 Sep 2020 10:26:43 +0200
Neil Armstrong <narmstrong@baylibre.com> wrote:

> Hi Robin,
> 
> On 16/09/2020 01:51, Robin Murphy wrote:
> > According to a downstream commit I found in the Khadas vendor kernel,
> > the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
> > how to handle this properly) we should describe it as such. Otherwise
> > the mismatch leads to all manner of fun with mismatched attributes and
> > inadvertently snooping stale data from caches, which would account for
> > at least some of the brokenness observed on this platform.
> > 
> > Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> > ---
> >  arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> > index 9b8548e5f6e5..ee8fcae9f9f0 100644
> > --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> > +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> > @@ -135,3 +135,7 @@ map1 {
> >  		};
> >  	};
> >  };
> > +
> > +&mali {
> > +	dma-coherent;
> > +};
> >   
> 
> Thanks a lot for digging, I'll run a test to confirm it fixes the issue !

Sorry for the late reply. I triggered a dEQP run with this patch applied
and I see a bunch of "panfrost ffe40000.gpu: matching BO is not heap type"
errors (see below for a full backtrace). That doesn't seem to happen when
we drop this dma-coherent property.

[  690.945731] ------------[ cut here ]------------
[  690.950003] panfrost ffe40000.gpu: matching BO is not heap type (GPU VA = 319a000)
[  690.950051] WARNING: CPU: 0 PID: 120 at drivers/gpu/drm/panfrost/panfrost_mmu.c:465 panfrost_mmu_irq_handler_thread+0x47c/0x650
[  690.968854] Modules linked in:
[  690.971878] CPU: 0 PID: 120 Comm: irq/27-panfrost Tainted: G        W         5.9.0-rc5-02434-g7d8109ec5a42 #784
[  690.981964] Hardware name: Khadas VIM3 (DT)
[  690.986107] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
[  690.991627] pc : panfrost_mmu_irq_handler_thread+0x47c/0x650
[  690.997232] lr : panfrost_mmu_irq_handler_thread+0x47c/0x650
[  691.002836] sp : ffff800011bcbcd0
[  691.006114] x29: ffff800011bcbcf0 x28: ffff0000f3fe3800 
[  691.011375] x27: ffff0000ceaf5350 x26: ffff0000ca5fc500 
[  691.016636] x25: ffff0000f32409c0 x24: 0000000000000001 
[  691.021897] x23: ffff0000f3240880 x22: ffff0000f3e3a800 
[  691.027159] x21: 0000000000000000 x20: 0000000000000000 
[  691.032420] x19: 0000000000010001 x18: 0000000000000020 
[  691.037681] x17: 0000000000000000 x16: 0000000000000000 
[  691.042942] x15: ffff0000f3fe3c70 x14: ffffffffffffffff 
[  691.048204] x13: ffff8000116c2428 x12: ffff8000116c2086 
[  691.053466] x11: ffff800011bcbcd0 x10: ffff800011bcbcd0 
[  691.058727] x9 : 00000000fffffffe x8 : 0000000000000000 
[  691.063988] x7 : 7420706165682074 x6 : ffff8000116c1816 
[  691.069249] x5 : 0000000000000000 x4 : 0000000000000000 
[  691.074510] x3 : 00000000ffffffff x2 : ffff8000e348c000 
[  691.079771] x1 : f1b91ff9af2df000 x0 : 0000000000000000 
[  691.085033] Call trace:
[  691.087452]  panfrost_mmu_irq_handler_thread+0x47c/0x650
[  691.092712]  irq_thread_fn+0x2c/0xa0
[  691.096246]  irq_thread+0x184/0x208
[  691.099699]  kthread+0x140/0x160
[  691.102890]  ret_from_fork+0x10/0x34
[  691.106424] ---[ end trace b5dd8c2dfada8236 ]---
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
@ 2020-10-05  8:15       ` Boris Brezillon
  0 siblings, 0 replies; 88+ messages in thread
From: Boris Brezillon @ 2020-10-05  8:15 UTC (permalink / raw)
  To: Neil Armstrong, tomeu.vizoso
  Cc: robh, khilman, Robin Murphy, dri-devel, steven.price, iommu,
	alyssa.rosenzweig, linux-amlogic, will, linux-arm-kernel,
	jbrunet

Hi Robin, Neil,

On Wed, 16 Sep 2020 10:26:43 +0200
Neil Armstrong <narmstrong@baylibre.com> wrote:

> Hi Robin,
> 
> On 16/09/2020 01:51, Robin Murphy wrote:
> > According to a downstream commit I found in the Khadas vendor kernel,
> > the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
> > how to handle this properly) we should describe it as such. Otherwise
> > the mismatch leads to all manner of fun with mismatched attributes and
> > inadvertently snooping stale data from caches, which would account for
> > at least some of the brokenness observed on this platform.
> > 
> > Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> > ---
> >  arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> > index 9b8548e5f6e5..ee8fcae9f9f0 100644
> > --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> > +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> > @@ -135,3 +135,7 @@ map1 {
> >  		};
> >  	};
> >  };
> > +
> > +&mali {
> > +	dma-coherent;
> > +};
> >   
> 
> Thanks a lot for digging, I'll run a test to confirm it fixes the issue !

Sorry for the late reply. I triggered a dEQP run with this patch applied
and I see a bunch of "panfrost ffe40000.gpu: matching BO is not heap type"
errors (see below for a full backtrace). That doesn't seem to happen when
we drop this dma-coherent property.

[  690.945731] ------------[ cut here ]------------
[  690.950003] panfrost ffe40000.gpu: matching BO is not heap type (GPU VA = 319a000)
[  690.950051] WARNING: CPU: 0 PID: 120 at drivers/gpu/drm/panfrost/panfrost_mmu.c:465 panfrost_mmu_irq_handler_thread+0x47c/0x650
[  690.968854] Modules linked in:
[  690.971878] CPU: 0 PID: 120 Comm: irq/27-panfrost Tainted: G        W         5.9.0-rc5-02434-g7d8109ec5a42 #784
[  690.981964] Hardware name: Khadas VIM3 (DT)
[  690.986107] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
[  690.991627] pc : panfrost_mmu_irq_handler_thread+0x47c/0x650
[  690.997232] lr : panfrost_mmu_irq_handler_thread+0x47c/0x650
[  691.002836] sp : ffff800011bcbcd0
[  691.006114] x29: ffff800011bcbcf0 x28: ffff0000f3fe3800 
[  691.011375] x27: ffff0000ceaf5350 x26: ffff0000ca5fc500 
[  691.016636] x25: ffff0000f32409c0 x24: 0000000000000001 
[  691.021897] x23: ffff0000f3240880 x22: ffff0000f3e3a800 
[  691.027159] x21: 0000000000000000 x20: 0000000000000000 
[  691.032420] x19: 0000000000010001 x18: 0000000000000020 
[  691.037681] x17: 0000000000000000 x16: 0000000000000000 
[  691.042942] x15: ffff0000f3fe3c70 x14: ffffffffffffffff 
[  691.048204] x13: ffff8000116c2428 x12: ffff8000116c2086 
[  691.053466] x11: ffff800011bcbcd0 x10: ffff800011bcbcd0 
[  691.058727] x9 : 00000000fffffffe x8 : 0000000000000000 
[  691.063988] x7 : 7420706165682074 x6 : ffff8000116c1816 
[  691.069249] x5 : 0000000000000000 x4 : 0000000000000000 
[  691.074510] x3 : 00000000ffffffff x2 : ffff8000e348c000 
[  691.079771] x1 : f1b91ff9af2df000 x0 : 0000000000000000 
[  691.085033] Call trace:
[  691.087452]  panfrost_mmu_irq_handler_thread+0x47c/0x650
[  691.092712]  irq_thread_fn+0x2c/0xa0
[  691.096246]  irq_thread+0x184/0x208
[  691.099699]  kthread+0x140/0x160
[  691.102890]  ret_from_fork+0x10/0x34
[  691.106424] ---[ end trace b5dd8c2dfada8236 ]---

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
@ 2020-10-05  8:15       ` Boris Brezillon
  0 siblings, 0 replies; 88+ messages in thread
From: Boris Brezillon @ 2020-10-05  8:15 UTC (permalink / raw)
  To: Neil Armstrong, tomeu.vizoso
  Cc: khilman, Robin Murphy, dri-devel, steven.price, iommu,
	alyssa.rosenzweig, linux-amlogic, will, linux-arm-kernel,
	jbrunet

Hi Robin, Neil,

On Wed, 16 Sep 2020 10:26:43 +0200
Neil Armstrong <narmstrong@baylibre.com> wrote:

> Hi Robin,
> 
> On 16/09/2020 01:51, Robin Murphy wrote:
> > According to a downstream commit I found in the Khadas vendor kernel,
> > the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
> > how to handle this properly) we should describe it as such. Otherwise
> > the mismatch leads to all manner of fun with mismatched attributes and
> > inadvertently snooping stale data from caches, which would account for
> > at least some of the brokenness observed on this platform.
> > 
> > Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> > ---
> >  arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> > index 9b8548e5f6e5..ee8fcae9f9f0 100644
> > --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> > +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> > @@ -135,3 +135,7 @@ map1 {
> >  		};
> >  	};
> >  };
> > +
> > +&mali {
> > +	dma-coherent;
> > +};
> >   
> 
> Thanks a lot for digging, I'll run a test to confirm it fixes the issue !

Sorry for the late reply. I triggered a dEQP run with this patch applied
and I see a bunch of "panfrost ffe40000.gpu: matching BO is not heap type"
errors (see below for a full backtrace). That doesn't seem to happen when
we drop this dma-coherent property.

[  690.945731] ------------[ cut here ]------------
[  690.950003] panfrost ffe40000.gpu: matching BO is not heap type (GPU VA = 319a000)
[  690.950051] WARNING: CPU: 0 PID: 120 at drivers/gpu/drm/panfrost/panfrost_mmu.c:465 panfrost_mmu_irq_handler_thread+0x47c/0x650
[  690.968854] Modules linked in:
[  690.971878] CPU: 0 PID: 120 Comm: irq/27-panfrost Tainted: G        W         5.9.0-rc5-02434-g7d8109ec5a42 #784
[  690.981964] Hardware name: Khadas VIM3 (DT)
[  690.986107] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
[  690.991627] pc : panfrost_mmu_irq_handler_thread+0x47c/0x650
[  690.997232] lr : panfrost_mmu_irq_handler_thread+0x47c/0x650
[  691.002836] sp : ffff800011bcbcd0
[  691.006114] x29: ffff800011bcbcf0 x28: ffff0000f3fe3800 
[  691.011375] x27: ffff0000ceaf5350 x26: ffff0000ca5fc500 
[  691.016636] x25: ffff0000f32409c0 x24: 0000000000000001 
[  691.021897] x23: ffff0000f3240880 x22: ffff0000f3e3a800 
[  691.027159] x21: 0000000000000000 x20: 0000000000000000 
[  691.032420] x19: 0000000000010001 x18: 0000000000000020 
[  691.037681] x17: 0000000000000000 x16: 0000000000000000 
[  691.042942] x15: ffff0000f3fe3c70 x14: ffffffffffffffff 
[  691.048204] x13: ffff8000116c2428 x12: ffff8000116c2086 
[  691.053466] x11: ffff800011bcbcd0 x10: ffff800011bcbcd0 
[  691.058727] x9 : 00000000fffffffe x8 : 0000000000000000 
[  691.063988] x7 : 7420706165682074 x6 : ffff8000116c1816 
[  691.069249] x5 : 0000000000000000 x4 : 0000000000000000 
[  691.074510] x3 : 00000000ffffffff x2 : ffff8000e348c000 
[  691.079771] x1 : f1b91ff9af2df000 x0 : 0000000000000000 
[  691.085033] Call trace:
[  691.087452]  panfrost_mmu_irq_handler_thread+0x47c/0x650
[  691.092712]  irq_thread_fn+0x2c/0xa0
[  691.096246]  irq_thread+0x184/0x208
[  691.099699]  kthread+0x140/0x160
[  691.102890]  ret_from_fork+0x10/0x34
[  691.106424] ---[ end trace b5dd8c2dfada8236 ]---
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
@ 2020-10-05  8:15       ` Boris Brezillon
  0 siblings, 0 replies; 88+ messages in thread
From: Boris Brezillon @ 2020-10-05  8:15 UTC (permalink / raw)
  To: Neil Armstrong, tomeu.vizoso
  Cc: robh, khilman, Robin Murphy, dri-devel, steven.price, iommu,
	alyssa.rosenzweig, linux-amlogic, will, linux-arm-kernel,
	jbrunet

Hi Robin, Neil,

On Wed, 16 Sep 2020 10:26:43 +0200
Neil Armstrong <narmstrong@baylibre.com> wrote:

> Hi Robin,
> 
> On 16/09/2020 01:51, Robin Murphy wrote:
> > According to a downstream commit I found in the Khadas vendor kernel,
> > the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
> > how to handle this properly) we should describe it as such. Otherwise
> > the mismatch leads to all manner of fun with mismatched attributes and
> > inadvertently snooping stale data from caches, which would account for
> > at least some of the brokenness observed on this platform.
> > 
> > Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> > ---
> >  arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> > index 9b8548e5f6e5..ee8fcae9f9f0 100644
> > --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> > +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> > @@ -135,3 +135,7 @@ map1 {
> >  		};
> >  	};
> >  };
> > +
> > +&mali {
> > +	dma-coherent;
> > +};
> >   
> 
> Thanks a lot for digging, I'll run a test to confirm it fixes the issue !

Sorry for the late reply. I triggered a dEQP run with this patch applied
and I see a bunch of "panfrost ffe40000.gpu: matching BO is not heap type"
errors (see below for a full backtrace). That doesn't seem to happen when
we drop this dma-coherent property.

[  690.945731] ------------[ cut here ]------------
[  690.950003] panfrost ffe40000.gpu: matching BO is not heap type (GPU VA = 319a000)
[  690.950051] WARNING: CPU: 0 PID: 120 at drivers/gpu/drm/panfrost/panfrost_mmu.c:465 panfrost_mmu_irq_handler_thread+0x47c/0x650
[  690.968854] Modules linked in:
[  690.971878] CPU: 0 PID: 120 Comm: irq/27-panfrost Tainted: G        W         5.9.0-rc5-02434-g7d8109ec5a42 #784
[  690.981964] Hardware name: Khadas VIM3 (DT)
[  690.986107] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
[  690.991627] pc : panfrost_mmu_irq_handler_thread+0x47c/0x650
[  690.997232] lr : panfrost_mmu_irq_handler_thread+0x47c/0x650
[  691.002836] sp : ffff800011bcbcd0
[  691.006114] x29: ffff800011bcbcf0 x28: ffff0000f3fe3800 
[  691.011375] x27: ffff0000ceaf5350 x26: ffff0000ca5fc500 
[  691.016636] x25: ffff0000f32409c0 x24: 0000000000000001 
[  691.021897] x23: ffff0000f3240880 x22: ffff0000f3e3a800 
[  691.027159] x21: 0000000000000000 x20: 0000000000000000 
[  691.032420] x19: 0000000000010001 x18: 0000000000000020 
[  691.037681] x17: 0000000000000000 x16: 0000000000000000 
[  691.042942] x15: ffff0000f3fe3c70 x14: ffffffffffffffff 
[  691.048204] x13: ffff8000116c2428 x12: ffff8000116c2086 
[  691.053466] x11: ffff800011bcbcd0 x10: ffff800011bcbcd0 
[  691.058727] x9 : 00000000fffffffe x8 : 0000000000000000 
[  691.063988] x7 : 7420706165682074 x6 : ffff8000116c1816 
[  691.069249] x5 : 0000000000000000 x4 : 0000000000000000 
[  691.074510] x3 : 00000000ffffffff x2 : ffff8000e348c000 
[  691.079771] x1 : f1b91ff9af2df000 x0 : 0000000000000000 
[  691.085033] Call trace:
[  691.087452]  panfrost_mmu_irq_handler_thread+0x47c/0x650
[  691.092712]  irq_thread_fn+0x2c/0xa0
[  691.096246]  irq_thread+0x184/0x208
[  691.099699]  kthread+0x140/0x160
[  691.102890]  ret_from_fork+0x10/0x34
[  691.106424] ---[ end trace b5dd8c2dfada8236 ]---

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
  2020-10-05  8:15       ` Boris Brezillon
  (?)
  (?)
@ 2020-10-05  8:34         ` Steven Price
  -1 siblings, 0 replies; 88+ messages in thread
From: Steven Price @ 2020-10-05  8:34 UTC (permalink / raw)
  To: Boris Brezillon, Neil Armstrong, tomeu.vizoso
  Cc: robh, khilman, Robin Murphy, dri-devel, iommu, alyssa.rosenzweig,
	linux-amlogic, will, linux-arm-kernel, jbrunet

On 05/10/2020 09:15, Boris Brezillon wrote:
> Hi Robin, Neil,
> 
> On Wed, 16 Sep 2020 10:26:43 +0200
> Neil Armstrong <narmstrong@baylibre.com> wrote:
> 
>> Hi Robin,
>>
>> On 16/09/2020 01:51, Robin Murphy wrote:
>>> According to a downstream commit I found in the Khadas vendor kernel,
>>> the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
>>> how to handle this properly) we should describe it as such. Otherwise
>>> the mismatch leads to all manner of fun with mismatched attributes and
>>> inadvertently snooping stale data from caches, which would account for
>>> at least some of the brokenness observed on this platform.
>>>
>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>> ---
>>>   arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
>>>   1 file changed, 4 insertions(+)
>>>
>>> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>> index 9b8548e5f6e5..ee8fcae9f9f0 100644
>>> --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>> @@ -135,3 +135,7 @@ map1 {
>>>   		};
>>>   	};
>>>   };
>>> +
>>> +&mali {
>>> +	dma-coherent;
>>> +};
>>>    
>>
>> Thanks a lot for digging, I'll run a test to confirm it fixes the issue !
> 
> Sorry for the late reply. I triggered a dEQP run with this patch applied
> and I see a bunch of "panfrost ffe40000.gpu: matching BO is not heap type"
> errors (see below for a full backtrace). That doesn't seem to happen when
> we drop this dma-coherent property.
> 
> [  690.945731] ------------[ cut here ]------------
> [  690.950003] panfrost ffe40000.gpu: matching BO is not heap type (GPU VA = 319a000)
> [  690.950051] WARNING: CPU: 0 PID: 120 at drivers/gpu/drm/panfrost/panfrost_mmu.c:465 panfrost_mmu_irq_handler_thread+0x47c/0x650
> [  690.968854] Modules linked in:
> [  690.971878] CPU: 0 PID: 120 Comm: irq/27-panfrost Tainted: G        W         5.9.0-rc5-02434-g7d8109ec5a42 #784
> [  690.981964] Hardware name: Khadas VIM3 (DT)
> [  690.986107] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
> [  690.991627] pc : panfrost_mmu_irq_handler_thread+0x47c/0x650
> [  690.997232] lr : panfrost_mmu_irq_handler_thread+0x47c/0x650
> [  691.002836] sp : ffff800011bcbcd0
> [  691.006114] x29: ffff800011bcbcf0 x28: ffff0000f3fe3800
> [  691.011375] x27: ffff0000ceaf5350 x26: ffff0000ca5fc500
> [  691.016636] x25: ffff0000f32409c0 x24: 0000000000000001
> [  691.021897] x23: ffff0000f3240880 x22: ffff0000f3e3a800
> [  691.027159] x21: 0000000000000000 x20: 0000000000000000
> [  691.032420] x19: 0000000000010001 x18: 0000000000000020
> [  691.037681] x17: 0000000000000000 x16: 0000000000000000
> [  691.042942] x15: ffff0000f3fe3c70 x14: ffffffffffffffff
> [  691.048204] x13: ffff8000116c2428 x12: ffff8000116c2086
> [  691.053466] x11: ffff800011bcbcd0 x10: ffff800011bcbcd0
> [  691.058727] x9 : 00000000fffffffe x8 : 0000000000000000
> [  691.063988] x7 : 7420706165682074 x6 : ffff8000116c1816
> [  691.069249] x5 : 0000000000000000 x4 : 0000000000000000
> [  691.074510] x3 : 00000000ffffffff x2 : ffff8000e348c000
> [  691.079771] x1 : f1b91ff9af2df000 x0 : 0000000000000000
> [  691.085033] Call trace:
> [  691.087452]  panfrost_mmu_irq_handler_thread+0x47c/0x650
> [  691.092712]  irq_thread_fn+0x2c/0xa0
> [  691.096246]  irq_thread+0x184/0x208
> [  691.099699]  kthread+0x140/0x160
> [  691.102890]  ret_from_fork+0x10/0x34
> [  691.106424] ---[ end trace b5dd8c2dfada8236 ]---
> 

It's quite possible this is caused by the GPU seeing a stale page table 
entry, so perhaps coherency isn't working as well as it should...

Do you get an "Unhandled Page fault" message after this? It might give 
some clues. Coherency issues are a pain to debug though and it's of 
course possible there are issues on this specific platform.

Steve
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
@ 2020-10-05  8:34         ` Steven Price
  0 siblings, 0 replies; 88+ messages in thread
From: Steven Price @ 2020-10-05  8:34 UTC (permalink / raw)
  To: Boris Brezillon, Neil Armstrong, tomeu.vizoso
  Cc: robh, khilman, Robin Murphy, dri-devel, iommu, alyssa.rosenzweig,
	linux-amlogic, will, linux-arm-kernel, jbrunet

On 05/10/2020 09:15, Boris Brezillon wrote:
> Hi Robin, Neil,
> 
> On Wed, 16 Sep 2020 10:26:43 +0200
> Neil Armstrong <narmstrong@baylibre.com> wrote:
> 
>> Hi Robin,
>>
>> On 16/09/2020 01:51, Robin Murphy wrote:
>>> According to a downstream commit I found in the Khadas vendor kernel,
>>> the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
>>> how to handle this properly) we should describe it as such. Otherwise
>>> the mismatch leads to all manner of fun with mismatched attributes and
>>> inadvertently snooping stale data from caches, which would account for
>>> at least some of the brokenness observed on this platform.
>>>
>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>> ---
>>>   arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
>>>   1 file changed, 4 insertions(+)
>>>
>>> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>> index 9b8548e5f6e5..ee8fcae9f9f0 100644
>>> --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>> @@ -135,3 +135,7 @@ map1 {
>>>   		};
>>>   	};
>>>   };
>>> +
>>> +&mali {
>>> +	dma-coherent;
>>> +};
>>>    
>>
>> Thanks a lot for digging, I'll run a test to confirm it fixes the issue !
> 
> Sorry for the late reply. I triggered a dEQP run with this patch applied
> and I see a bunch of "panfrost ffe40000.gpu: matching BO is not heap type"
> errors (see below for a full backtrace). That doesn't seem to happen when
> we drop this dma-coherent property.
> 
> [  690.945731] ------------[ cut here ]------------
> [  690.950003] panfrost ffe40000.gpu: matching BO is not heap type (GPU VA = 319a000)
> [  690.950051] WARNING: CPU: 0 PID: 120 at drivers/gpu/drm/panfrost/panfrost_mmu.c:465 panfrost_mmu_irq_handler_thread+0x47c/0x650
> [  690.968854] Modules linked in:
> [  690.971878] CPU: 0 PID: 120 Comm: irq/27-panfrost Tainted: G        W         5.9.0-rc5-02434-g7d8109ec5a42 #784
> [  690.981964] Hardware name: Khadas VIM3 (DT)
> [  690.986107] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
> [  690.991627] pc : panfrost_mmu_irq_handler_thread+0x47c/0x650
> [  690.997232] lr : panfrost_mmu_irq_handler_thread+0x47c/0x650
> [  691.002836] sp : ffff800011bcbcd0
> [  691.006114] x29: ffff800011bcbcf0 x28: ffff0000f3fe3800
> [  691.011375] x27: ffff0000ceaf5350 x26: ffff0000ca5fc500
> [  691.016636] x25: ffff0000f32409c0 x24: 0000000000000001
> [  691.021897] x23: ffff0000f3240880 x22: ffff0000f3e3a800
> [  691.027159] x21: 0000000000000000 x20: 0000000000000000
> [  691.032420] x19: 0000000000010001 x18: 0000000000000020
> [  691.037681] x17: 0000000000000000 x16: 0000000000000000
> [  691.042942] x15: ffff0000f3fe3c70 x14: ffffffffffffffff
> [  691.048204] x13: ffff8000116c2428 x12: ffff8000116c2086
> [  691.053466] x11: ffff800011bcbcd0 x10: ffff800011bcbcd0
> [  691.058727] x9 : 00000000fffffffe x8 : 0000000000000000
> [  691.063988] x7 : 7420706165682074 x6 : ffff8000116c1816
> [  691.069249] x5 : 0000000000000000 x4 : 0000000000000000
> [  691.074510] x3 : 00000000ffffffff x2 : ffff8000e348c000
> [  691.079771] x1 : f1b91ff9af2df000 x0 : 0000000000000000
> [  691.085033] Call trace:
> [  691.087452]  panfrost_mmu_irq_handler_thread+0x47c/0x650
> [  691.092712]  irq_thread_fn+0x2c/0xa0
> [  691.096246]  irq_thread+0x184/0x208
> [  691.099699]  kthread+0x140/0x160
> [  691.102890]  ret_from_fork+0x10/0x34
> [  691.106424] ---[ end trace b5dd8c2dfada8236 ]---
> 

It's quite possible this is caused by the GPU seeing a stale page table 
entry, so perhaps coherency isn't working as well as it should...

Do you get an "Unhandled Page fault" message after this? It might give 
some clues. Coherency issues are a pain to debug though and it's of 
course possible there are issues on this specific platform.

Steve

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
@ 2020-10-05  8:34         ` Steven Price
  0 siblings, 0 replies; 88+ messages in thread
From: Steven Price @ 2020-10-05  8:34 UTC (permalink / raw)
  To: Boris Brezillon, Neil Armstrong, tomeu.vizoso
  Cc: khilman, Robin Murphy, dri-devel, iommu, alyssa.rosenzweig,
	linux-amlogic, will, linux-arm-kernel, jbrunet

On 05/10/2020 09:15, Boris Brezillon wrote:
> Hi Robin, Neil,
> 
> On Wed, 16 Sep 2020 10:26:43 +0200
> Neil Armstrong <narmstrong@baylibre.com> wrote:
> 
>> Hi Robin,
>>
>> On 16/09/2020 01:51, Robin Murphy wrote:
>>> According to a downstream commit I found in the Khadas vendor kernel,
>>> the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
>>> how to handle this properly) we should describe it as such. Otherwise
>>> the mismatch leads to all manner of fun with mismatched attributes and
>>> inadvertently snooping stale data from caches, which would account for
>>> at least some of the brokenness observed on this platform.
>>>
>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>> ---
>>>   arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
>>>   1 file changed, 4 insertions(+)
>>>
>>> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>> index 9b8548e5f6e5..ee8fcae9f9f0 100644
>>> --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>> @@ -135,3 +135,7 @@ map1 {
>>>   		};
>>>   	};
>>>   };
>>> +
>>> +&mali {
>>> +	dma-coherent;
>>> +};
>>>    
>>
>> Thanks a lot for digging, I'll run a test to confirm it fixes the issue !
> 
> Sorry for the late reply. I triggered a dEQP run with this patch applied
> and I see a bunch of "panfrost ffe40000.gpu: matching BO is not heap type"
> errors (see below for a full backtrace). That doesn't seem to happen when
> we drop this dma-coherent property.
> 
> [  690.945731] ------------[ cut here ]------------
> [  690.950003] panfrost ffe40000.gpu: matching BO is not heap type (GPU VA = 319a000)
> [  690.950051] WARNING: CPU: 0 PID: 120 at drivers/gpu/drm/panfrost/panfrost_mmu.c:465 panfrost_mmu_irq_handler_thread+0x47c/0x650
> [  690.968854] Modules linked in:
> [  690.971878] CPU: 0 PID: 120 Comm: irq/27-panfrost Tainted: G        W         5.9.0-rc5-02434-g7d8109ec5a42 #784
> [  690.981964] Hardware name: Khadas VIM3 (DT)
> [  690.986107] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
> [  690.991627] pc : panfrost_mmu_irq_handler_thread+0x47c/0x650
> [  690.997232] lr : panfrost_mmu_irq_handler_thread+0x47c/0x650
> [  691.002836] sp : ffff800011bcbcd0
> [  691.006114] x29: ffff800011bcbcf0 x28: ffff0000f3fe3800
> [  691.011375] x27: ffff0000ceaf5350 x26: ffff0000ca5fc500
> [  691.016636] x25: ffff0000f32409c0 x24: 0000000000000001
> [  691.021897] x23: ffff0000f3240880 x22: ffff0000f3e3a800
> [  691.027159] x21: 0000000000000000 x20: 0000000000000000
> [  691.032420] x19: 0000000000010001 x18: 0000000000000020
> [  691.037681] x17: 0000000000000000 x16: 0000000000000000
> [  691.042942] x15: ffff0000f3fe3c70 x14: ffffffffffffffff
> [  691.048204] x13: ffff8000116c2428 x12: ffff8000116c2086
> [  691.053466] x11: ffff800011bcbcd0 x10: ffff800011bcbcd0
> [  691.058727] x9 : 00000000fffffffe x8 : 0000000000000000
> [  691.063988] x7 : 7420706165682074 x6 : ffff8000116c1816
> [  691.069249] x5 : 0000000000000000 x4 : 0000000000000000
> [  691.074510] x3 : 00000000ffffffff x2 : ffff8000e348c000
> [  691.079771] x1 : f1b91ff9af2df000 x0 : 0000000000000000
> [  691.085033] Call trace:
> [  691.087452]  panfrost_mmu_irq_handler_thread+0x47c/0x650
> [  691.092712]  irq_thread_fn+0x2c/0xa0
> [  691.096246]  irq_thread+0x184/0x208
> [  691.099699]  kthread+0x140/0x160
> [  691.102890]  ret_from_fork+0x10/0x34
> [  691.106424] ---[ end trace b5dd8c2dfada8236 ]---
> 

It's quite possible this is caused by the GPU seeing a stale page table 
entry, so perhaps coherency isn't working as well as it should...

Do you get an "Unhandled Page fault" message after this? It might give 
some clues. Coherency issues are a pain to debug though and it's of 
course possible there are issues on this specific platform.

Steve
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
@ 2020-10-05  8:34         ` Steven Price
  0 siblings, 0 replies; 88+ messages in thread
From: Steven Price @ 2020-10-05  8:34 UTC (permalink / raw)
  To: Boris Brezillon, Neil Armstrong, tomeu.vizoso
  Cc: robh, khilman, Robin Murphy, dri-devel, iommu, alyssa.rosenzweig,
	linux-amlogic, will, linux-arm-kernel, jbrunet

On 05/10/2020 09:15, Boris Brezillon wrote:
> Hi Robin, Neil,
> 
> On Wed, 16 Sep 2020 10:26:43 +0200
> Neil Armstrong <narmstrong@baylibre.com> wrote:
> 
>> Hi Robin,
>>
>> On 16/09/2020 01:51, Robin Murphy wrote:
>>> According to a downstream commit I found in the Khadas vendor kernel,
>>> the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
>>> how to handle this properly) we should describe it as such. Otherwise
>>> the mismatch leads to all manner of fun with mismatched attributes and
>>> inadvertently snooping stale data from caches, which would account for
>>> at least some of the brokenness observed on this platform.
>>>
>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>> ---
>>>   arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
>>>   1 file changed, 4 insertions(+)
>>>
>>> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>> index 9b8548e5f6e5..ee8fcae9f9f0 100644
>>> --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>> @@ -135,3 +135,7 @@ map1 {
>>>   		};
>>>   	};
>>>   };
>>> +
>>> +&mali {
>>> +	dma-coherent;
>>> +};
>>>    
>>
>> Thanks a lot for digging, I'll run a test to confirm it fixes the issue !
> 
> Sorry for the late reply. I triggered a dEQP run with this patch applied
> and I see a bunch of "panfrost ffe40000.gpu: matching BO is not heap type"
> errors (see below for a full backtrace). That doesn't seem to happen when
> we drop this dma-coherent property.
> 
> [  690.945731] ------------[ cut here ]------------
> [  690.950003] panfrost ffe40000.gpu: matching BO is not heap type (GPU VA = 319a000)
> [  690.950051] WARNING: CPU: 0 PID: 120 at drivers/gpu/drm/panfrost/panfrost_mmu.c:465 panfrost_mmu_irq_handler_thread+0x47c/0x650
> [  690.968854] Modules linked in:
> [  690.971878] CPU: 0 PID: 120 Comm: irq/27-panfrost Tainted: G        W         5.9.0-rc5-02434-g7d8109ec5a42 #784
> [  690.981964] Hardware name: Khadas VIM3 (DT)
> [  690.986107] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
> [  690.991627] pc : panfrost_mmu_irq_handler_thread+0x47c/0x650
> [  690.997232] lr : panfrost_mmu_irq_handler_thread+0x47c/0x650
> [  691.002836] sp : ffff800011bcbcd0
> [  691.006114] x29: ffff800011bcbcf0 x28: ffff0000f3fe3800
> [  691.011375] x27: ffff0000ceaf5350 x26: ffff0000ca5fc500
> [  691.016636] x25: ffff0000f32409c0 x24: 0000000000000001
> [  691.021897] x23: ffff0000f3240880 x22: ffff0000f3e3a800
> [  691.027159] x21: 0000000000000000 x20: 0000000000000000
> [  691.032420] x19: 0000000000010001 x18: 0000000000000020
> [  691.037681] x17: 0000000000000000 x16: 0000000000000000
> [  691.042942] x15: ffff0000f3fe3c70 x14: ffffffffffffffff
> [  691.048204] x13: ffff8000116c2428 x12: ffff8000116c2086
> [  691.053466] x11: ffff800011bcbcd0 x10: ffff800011bcbcd0
> [  691.058727] x9 : 00000000fffffffe x8 : 0000000000000000
> [  691.063988] x7 : 7420706165682074 x6 : ffff8000116c1816
> [  691.069249] x5 : 0000000000000000 x4 : 0000000000000000
> [  691.074510] x3 : 00000000ffffffff x2 : ffff8000e348c000
> [  691.079771] x1 : f1b91ff9af2df000 x0 : 0000000000000000
> [  691.085033] Call trace:
> [  691.087452]  panfrost_mmu_irq_handler_thread+0x47c/0x650
> [  691.092712]  irq_thread_fn+0x2c/0xa0
> [  691.096246]  irq_thread+0x184/0x208
> [  691.099699]  kthread+0x140/0x160
> [  691.102890]  ret_from_fork+0x10/0x34
> [  691.106424] ---[ end trace b5dd8c2dfada8236 ]---
> 

It's quite possible this is caused by the GPU seeing a stale page table 
entry, so perhaps coherency isn't working as well as it should...

Do you get an "Unhandled Page fault" message after this? It might give 
some clues. Coherency issues are a pain to debug though and it's of 
course possible there are issues on this specific platform.

Steve

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
  2020-10-05  8:34         ` Steven Price
  (?)
  (?)
@ 2020-10-05  8:39           ` Boris Brezillon
  -1 siblings, 0 replies; 88+ messages in thread
From: Boris Brezillon @ 2020-10-05  8:39 UTC (permalink / raw)
  To: Steven Price
  Cc: robh, tomeu.vizoso, Neil Armstrong, Robin Murphy, dri-devel,
	iommu, alyssa.rosenzweig, linux-amlogic, will, linux-arm-kernel,
	jbrunet

On Mon, 5 Oct 2020 09:34:06 +0100
Steven Price <steven.price@arm.com> wrote:

> On 05/10/2020 09:15, Boris Brezillon wrote:
> > Hi Robin, Neil,
> > 
> > On Wed, 16 Sep 2020 10:26:43 +0200
> > Neil Armstrong <narmstrong@baylibre.com> wrote:
> >   
> >> Hi Robin,
> >>
> >> On 16/09/2020 01:51, Robin Murphy wrote:  
> >>> According to a downstream commit I found in the Khadas vendor kernel,
> >>> the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
> >>> how to handle this properly) we should describe it as such. Otherwise
> >>> the mismatch leads to all manner of fun with mismatched attributes and
> >>> inadvertently snooping stale data from caches, which would account for
> >>> at least some of the brokenness observed on this platform.
> >>>
> >>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> >>> ---
> >>>   arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
> >>>   1 file changed, 4 insertions(+)
> >>>
> >>> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> >>> index 9b8548e5f6e5..ee8fcae9f9f0 100644
> >>> --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> >>> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> >>> @@ -135,3 +135,7 @@ map1 {
> >>>   		};
> >>>   	};
> >>>   };
> >>> +
> >>> +&mali {
> >>> +	dma-coherent;
> >>> +};
> >>>      
> >>
> >> Thanks a lot for digging, I'll run a test to confirm it fixes the issue !  
> > 
> > Sorry for the late reply. I triggered a dEQP run with this patch applied
> > and I see a bunch of "panfrost ffe40000.gpu: matching BO is not heap type"
> > errors (see below for a full backtrace). That doesn't seem to happen when
> > we drop this dma-coherent property.
> > 
> > [  690.945731] ------------[ cut here ]------------
> > [  690.950003] panfrost ffe40000.gpu: matching BO is not heap type (GPU VA = 319a000)
> > [  690.950051] WARNING: CPU: 0 PID: 120 at drivers/gpu/drm/panfrost/panfrost_mmu.c:465 panfrost_mmu_irq_handler_thread+0x47c/0x650
> > [  690.968854] Modules linked in:
> > [  690.971878] CPU: 0 PID: 120 Comm: irq/27-panfrost Tainted: G        W         5.9.0-rc5-02434-g7d8109ec5a42 #784
> > [  690.981964] Hardware name: Khadas VIM3 (DT)
> > [  690.986107] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
> > [  690.991627] pc : panfrost_mmu_irq_handler_thread+0x47c/0x650
> > [  690.997232] lr : panfrost_mmu_irq_handler_thread+0x47c/0x650
> > [  691.002836] sp : ffff800011bcbcd0
> > [  691.006114] x29: ffff800011bcbcf0 x28: ffff0000f3fe3800
> > [  691.011375] x27: ffff0000ceaf5350 x26: ffff0000ca5fc500
> > [  691.016636] x25: ffff0000f32409c0 x24: 0000000000000001
> > [  691.021897] x23: ffff0000f3240880 x22: ffff0000f3e3a800
> > [  691.027159] x21: 0000000000000000 x20: 0000000000000000
> > [  691.032420] x19: 0000000000010001 x18: 0000000000000020
> > [  691.037681] x17: 0000000000000000 x16: 0000000000000000
> > [  691.042942] x15: ffff0000f3fe3c70 x14: ffffffffffffffff
> > [  691.048204] x13: ffff8000116c2428 x12: ffff8000116c2086
> > [  691.053466] x11: ffff800011bcbcd0 x10: ffff800011bcbcd0
> > [  691.058727] x9 : 00000000fffffffe x8 : 0000000000000000
> > [  691.063988] x7 : 7420706165682074 x6 : ffff8000116c1816
> > [  691.069249] x5 : 0000000000000000 x4 : 0000000000000000
> > [  691.074510] x3 : 00000000ffffffff x2 : ffff8000e348c000
> > [  691.079771] x1 : f1b91ff9af2df000 x0 : 0000000000000000
> > [  691.085033] Call trace:
> > [  691.087452]  panfrost_mmu_irq_handler_thread+0x47c/0x650
> > [  691.092712]  irq_thread_fn+0x2c/0xa0
> > [  691.096246]  irq_thread+0x184/0x208
> > [  691.099699]  kthread+0x140/0x160
> > [  691.102890]  ret_from_fork+0x10/0x34
> > [  691.106424] ---[ end trace b5dd8c2dfada8236 ]---
> >   
> 
> It's quite possible this is caused by the GPU seeing a stale page table 
> entry, so perhaps coherency isn't working as well as it should...
> 
> Do you get an "Unhandled Page fault" message after this?

Yep (see below).

--->8---

[  689.640491] ------------[ cut here ]------------
[  689.644754] panfrost ffe40000.gpu: matching BO is not heap type (GPU VA = 3146000)
[  689.644802] WARNING: CPU: 0 PID: 120 at drivers/gpu/drm/panfrost/panfrost_mmu.c:465 panfrost_mmu_irq_handler_thread+0x47c/0x650
[  689.663607] Modules linked in:
[  689.666631] CPU: 0 PID: 120 Comm: irq/27-panfrost Tainted: G        W         5.9.0-rc5-02434-g7d8109ec5a42 #784
[  689.676717] Hardware name: Khadas VIM3 (DT)
[  689.680860] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
[  689.686380] pc : panfrost_mmu_irq_handler_thread+0x47c/0x650
[  689.691987] lr : panfrost_mmu_irq_handler_thread+0x47c/0x650
[  689.697590] sp : ffff800011bcbcd0
[  689.700867] x29: ffff800011bcbcf0 x28: ffff0000f3fe3800 
[  689.706128] x27: ffff0000d89cf750 x26: ffff0000da34a800 
[  689.711389] x25: ffff0000f32409c0 x24: 0000000000000001 
[  689.716650] x23: ffff0000f3240880 x22: ffff0000d456e000 
[  689.721911] x21: 0000000000000000 x20: 0000000000000000 
[  689.727173] x19: 0000000000010001 x18: 0000000000000020 
[  689.732434] x17: 0000000000000000 x16: 0000000000000000 
[  689.737695] x15: ffff0000f3fe3c70 x14: ffffffffffffffff 
[  689.742957] x13: ffff8000116c2428 x12: ffff8000116c2086 
[  689.748218] x11: ffff800011bcbcd0 x10: ffff800011bcbcd0 
[  689.753479] x9 : 00000000fffffffe x8 : 0000000000000000 
[  689.758740] x7 : 7420706165682074 x6 : ffff8000116c1816 
[  689.764001] x5 : 0000000000000000 x4 : 0000000000000000 
[  689.769263] x3 : 00000000ffffffff x2 : ffff8000e348c000 
[  689.774524] x1 : f1b91ff9af2df000 x0 : 0000000000000000 
[  689.779786] Call trace:
[  689.782204]  panfrost_mmu_irq_handler_thread+0x47c/0x650
[  689.787465]  irq_thread_fn+0x2c/0xa0
[  689.790999]  irq_thread+0x184/0x208
[  689.794451]  kthread+0x140/0x160
[  689.797642]  ret_from_fork+0x10/0x34
[  689.801177] ---[ end trace b5dd8c2dfada8235 ]---
[  689.805864] panfrost ffe40000.gpu: Unhandled Page fault in AS0 at VA 0x0000000003146080
[  689.805864] Reason: TODO
[  689.805864] raw fault status: 0x10003C3
[  689.805864] decoded fault status: SLAVE FAULT
[  689.805864] exception type 0xC3: TRANSLATION_FAULT_LEVEL3
[  689.805864] access type 0x3: WRITE
[  689.805864] source id 0x100
[  690.170419] panfrost ffe40000.gpu: gpu sched timeout, js=1, config=0x7300, status=0x8, head=0x3101100, tail=0x3101100, sched_job=000000004b442768
[  690.770373] panfrost ffe40000.gpu: error powering up gpu shader
[  690.945123] panfrost ffe40000.gpu: error powering up gpu shader
[  690.945731] ------------[ cut here ]------------

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
@ 2020-10-05  8:39           ` Boris Brezillon
  0 siblings, 0 replies; 88+ messages in thread
From: Boris Brezillon @ 2020-10-05  8:39 UTC (permalink / raw)
  To: Steven Price
  Cc: robh, tomeu.vizoso, Neil Armstrong, Robin Murphy, dri-devel,
	iommu, alyssa.rosenzweig, linux-amlogic, will, linux-arm-kernel,
	jbrunet

On Mon, 5 Oct 2020 09:34:06 +0100
Steven Price <steven.price@arm.com> wrote:

> On 05/10/2020 09:15, Boris Brezillon wrote:
> > Hi Robin, Neil,
> > 
> > On Wed, 16 Sep 2020 10:26:43 +0200
> > Neil Armstrong <narmstrong@baylibre.com> wrote:
> >   
> >> Hi Robin,
> >>
> >> On 16/09/2020 01:51, Robin Murphy wrote:  
> >>> According to a downstream commit I found in the Khadas vendor kernel,
> >>> the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
> >>> how to handle this properly) we should describe it as such. Otherwise
> >>> the mismatch leads to all manner of fun with mismatched attributes and
> >>> inadvertently snooping stale data from caches, which would account for
> >>> at least some of the brokenness observed on this platform.
> >>>
> >>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> >>> ---
> >>>   arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
> >>>   1 file changed, 4 insertions(+)
> >>>
> >>> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> >>> index 9b8548e5f6e5..ee8fcae9f9f0 100644
> >>> --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> >>> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> >>> @@ -135,3 +135,7 @@ map1 {
> >>>   		};
> >>>   	};
> >>>   };
> >>> +
> >>> +&mali {
> >>> +	dma-coherent;
> >>> +};
> >>>      
> >>
> >> Thanks a lot for digging, I'll run a test to confirm it fixes the issue !  
> > 
> > Sorry for the late reply. I triggered a dEQP run with this patch applied
> > and I see a bunch of "panfrost ffe40000.gpu: matching BO is not heap type"
> > errors (see below for a full backtrace). That doesn't seem to happen when
> > we drop this dma-coherent property.
> > 
> > [  690.945731] ------------[ cut here ]------------
> > [  690.950003] panfrost ffe40000.gpu: matching BO is not heap type (GPU VA = 319a000)
> > [  690.950051] WARNING: CPU: 0 PID: 120 at drivers/gpu/drm/panfrost/panfrost_mmu.c:465 panfrost_mmu_irq_handler_thread+0x47c/0x650
> > [  690.968854] Modules linked in:
> > [  690.971878] CPU: 0 PID: 120 Comm: irq/27-panfrost Tainted: G        W         5.9.0-rc5-02434-g7d8109ec5a42 #784
> > [  690.981964] Hardware name: Khadas VIM3 (DT)
> > [  690.986107] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
> > [  690.991627] pc : panfrost_mmu_irq_handler_thread+0x47c/0x650
> > [  690.997232] lr : panfrost_mmu_irq_handler_thread+0x47c/0x650
> > [  691.002836] sp : ffff800011bcbcd0
> > [  691.006114] x29: ffff800011bcbcf0 x28: ffff0000f3fe3800
> > [  691.011375] x27: ffff0000ceaf5350 x26: ffff0000ca5fc500
> > [  691.016636] x25: ffff0000f32409c0 x24: 0000000000000001
> > [  691.021897] x23: ffff0000f3240880 x22: ffff0000f3e3a800
> > [  691.027159] x21: 0000000000000000 x20: 0000000000000000
> > [  691.032420] x19: 0000000000010001 x18: 0000000000000020
> > [  691.037681] x17: 0000000000000000 x16: 0000000000000000
> > [  691.042942] x15: ffff0000f3fe3c70 x14: ffffffffffffffff
> > [  691.048204] x13: ffff8000116c2428 x12: ffff8000116c2086
> > [  691.053466] x11: ffff800011bcbcd0 x10: ffff800011bcbcd0
> > [  691.058727] x9 : 00000000fffffffe x8 : 0000000000000000
> > [  691.063988] x7 : 7420706165682074 x6 : ffff8000116c1816
> > [  691.069249] x5 : 0000000000000000 x4 : 0000000000000000
> > [  691.074510] x3 : 00000000ffffffff x2 : ffff8000e348c000
> > [  691.079771] x1 : f1b91ff9af2df000 x0 : 0000000000000000
> > [  691.085033] Call trace:
> > [  691.087452]  panfrost_mmu_irq_handler_thread+0x47c/0x650
> > [  691.092712]  irq_thread_fn+0x2c/0xa0
> > [  691.096246]  irq_thread+0x184/0x208
> > [  691.099699]  kthread+0x140/0x160
> > [  691.102890]  ret_from_fork+0x10/0x34
> > [  691.106424] ---[ end trace b5dd8c2dfada8236 ]---
> >   
> 
> It's quite possible this is caused by the GPU seeing a stale page table 
> entry, so perhaps coherency isn't working as well as it should...
> 
> Do you get an "Unhandled Page fault" message after this?

Yep (see below).

--->8---

[  689.640491] ------------[ cut here ]------------
[  689.644754] panfrost ffe40000.gpu: matching BO is not heap type (GPU VA = 3146000)
[  689.644802] WARNING: CPU: 0 PID: 120 at drivers/gpu/drm/panfrost/panfrost_mmu.c:465 panfrost_mmu_irq_handler_thread+0x47c/0x650
[  689.663607] Modules linked in:
[  689.666631] CPU: 0 PID: 120 Comm: irq/27-panfrost Tainted: G        W         5.9.0-rc5-02434-g7d8109ec5a42 #784
[  689.676717] Hardware name: Khadas VIM3 (DT)
[  689.680860] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
[  689.686380] pc : panfrost_mmu_irq_handler_thread+0x47c/0x650
[  689.691987] lr : panfrost_mmu_irq_handler_thread+0x47c/0x650
[  689.697590] sp : ffff800011bcbcd0
[  689.700867] x29: ffff800011bcbcf0 x28: ffff0000f3fe3800 
[  689.706128] x27: ffff0000d89cf750 x26: ffff0000da34a800 
[  689.711389] x25: ffff0000f32409c0 x24: 0000000000000001 
[  689.716650] x23: ffff0000f3240880 x22: ffff0000d456e000 
[  689.721911] x21: 0000000000000000 x20: 0000000000000000 
[  689.727173] x19: 0000000000010001 x18: 0000000000000020 
[  689.732434] x17: 0000000000000000 x16: 0000000000000000 
[  689.737695] x15: ffff0000f3fe3c70 x14: ffffffffffffffff 
[  689.742957] x13: ffff8000116c2428 x12: ffff8000116c2086 
[  689.748218] x11: ffff800011bcbcd0 x10: ffff800011bcbcd0 
[  689.753479] x9 : 00000000fffffffe x8 : 0000000000000000 
[  689.758740] x7 : 7420706165682074 x6 : ffff8000116c1816 
[  689.764001] x5 : 0000000000000000 x4 : 0000000000000000 
[  689.769263] x3 : 00000000ffffffff x2 : ffff8000e348c000 
[  689.774524] x1 : f1b91ff9af2df000 x0 : 0000000000000000 
[  689.779786] Call trace:
[  689.782204]  panfrost_mmu_irq_handler_thread+0x47c/0x650
[  689.787465]  irq_thread_fn+0x2c/0xa0
[  689.790999]  irq_thread+0x184/0x208
[  689.794451]  kthread+0x140/0x160
[  689.797642]  ret_from_fork+0x10/0x34
[  689.801177] ---[ end trace b5dd8c2dfada8235 ]---
[  689.805864] panfrost ffe40000.gpu: Unhandled Page fault in AS0 at VA 0x0000000003146080
[  689.805864] Reason: TODO
[  689.805864] raw fault status: 0x10003C3
[  689.805864] decoded fault status: SLAVE FAULT
[  689.805864] exception type 0xC3: TRANSLATION_FAULT_LEVEL3
[  689.805864] access type 0x3: WRITE
[  689.805864] source id 0x100
[  690.170419] panfrost ffe40000.gpu: gpu sched timeout, js=1, config=0x7300, status=0x8, head=0x3101100, tail=0x3101100, sched_job=000000004b442768
[  690.770373] panfrost ffe40000.gpu: error powering up gpu shader
[  690.945123] panfrost ffe40000.gpu: error powering up gpu shader
[  690.945731] ------------[ cut here ]------------


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
@ 2020-10-05  8:39           ` Boris Brezillon
  0 siblings, 0 replies; 88+ messages in thread
From: Boris Brezillon @ 2020-10-05  8:39 UTC (permalink / raw)
  To: Steven Price
  Cc: tomeu.vizoso, Neil Armstrong, Robin Murphy, dri-devel, iommu,
	alyssa.rosenzweig, linux-amlogic, will, linux-arm-kernel,
	jbrunet

On Mon, 5 Oct 2020 09:34:06 +0100
Steven Price <steven.price@arm.com> wrote:

> On 05/10/2020 09:15, Boris Brezillon wrote:
> > Hi Robin, Neil,
> > 
> > On Wed, 16 Sep 2020 10:26:43 +0200
> > Neil Armstrong <narmstrong@baylibre.com> wrote:
> >   
> >> Hi Robin,
> >>
> >> On 16/09/2020 01:51, Robin Murphy wrote:  
> >>> According to a downstream commit I found in the Khadas vendor kernel,
> >>> the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
> >>> how to handle this properly) we should describe it as such. Otherwise
> >>> the mismatch leads to all manner of fun with mismatched attributes and
> >>> inadvertently snooping stale data from caches, which would account for
> >>> at least some of the brokenness observed on this platform.
> >>>
> >>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> >>> ---
> >>>   arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
> >>>   1 file changed, 4 insertions(+)
> >>>
> >>> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> >>> index 9b8548e5f6e5..ee8fcae9f9f0 100644
> >>> --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> >>> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> >>> @@ -135,3 +135,7 @@ map1 {
> >>>   		};
> >>>   	};
> >>>   };
> >>> +
> >>> +&mali {
> >>> +	dma-coherent;
> >>> +};
> >>>      
> >>
> >> Thanks a lot for digging, I'll run a test to confirm it fixes the issue !  
> > 
> > Sorry for the late reply. I triggered a dEQP run with this patch applied
> > and I see a bunch of "panfrost ffe40000.gpu: matching BO is not heap type"
> > errors (see below for a full backtrace). That doesn't seem to happen when
> > we drop this dma-coherent property.
> > 
> > [  690.945731] ------------[ cut here ]------------
> > [  690.950003] panfrost ffe40000.gpu: matching BO is not heap type (GPU VA = 319a000)
> > [  690.950051] WARNING: CPU: 0 PID: 120 at drivers/gpu/drm/panfrost/panfrost_mmu.c:465 panfrost_mmu_irq_handler_thread+0x47c/0x650
> > [  690.968854] Modules linked in:
> > [  690.971878] CPU: 0 PID: 120 Comm: irq/27-panfrost Tainted: G        W         5.9.0-rc5-02434-g7d8109ec5a42 #784
> > [  690.981964] Hardware name: Khadas VIM3 (DT)
> > [  690.986107] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
> > [  690.991627] pc : panfrost_mmu_irq_handler_thread+0x47c/0x650
> > [  690.997232] lr : panfrost_mmu_irq_handler_thread+0x47c/0x650
> > [  691.002836] sp : ffff800011bcbcd0
> > [  691.006114] x29: ffff800011bcbcf0 x28: ffff0000f3fe3800
> > [  691.011375] x27: ffff0000ceaf5350 x26: ffff0000ca5fc500
> > [  691.016636] x25: ffff0000f32409c0 x24: 0000000000000001
> > [  691.021897] x23: ffff0000f3240880 x22: ffff0000f3e3a800
> > [  691.027159] x21: 0000000000000000 x20: 0000000000000000
> > [  691.032420] x19: 0000000000010001 x18: 0000000000000020
> > [  691.037681] x17: 0000000000000000 x16: 0000000000000000
> > [  691.042942] x15: ffff0000f3fe3c70 x14: ffffffffffffffff
> > [  691.048204] x13: ffff8000116c2428 x12: ffff8000116c2086
> > [  691.053466] x11: ffff800011bcbcd0 x10: ffff800011bcbcd0
> > [  691.058727] x9 : 00000000fffffffe x8 : 0000000000000000
> > [  691.063988] x7 : 7420706165682074 x6 : ffff8000116c1816
> > [  691.069249] x5 : 0000000000000000 x4 : 0000000000000000
> > [  691.074510] x3 : 00000000ffffffff x2 : ffff8000e348c000
> > [  691.079771] x1 : f1b91ff9af2df000 x0 : 0000000000000000
> > [  691.085033] Call trace:
> > [  691.087452]  panfrost_mmu_irq_handler_thread+0x47c/0x650
> > [  691.092712]  irq_thread_fn+0x2c/0xa0
> > [  691.096246]  irq_thread+0x184/0x208
> > [  691.099699]  kthread+0x140/0x160
> > [  691.102890]  ret_from_fork+0x10/0x34
> > [  691.106424] ---[ end trace b5dd8c2dfada8236 ]---
> >   
> 
> It's quite possible this is caused by the GPU seeing a stale page table 
> entry, so perhaps coherency isn't working as well as it should...
> 
> Do you get an "Unhandled Page fault" message after this?

Yep (see below).

--->8---

[  689.640491] ------------[ cut here ]------------
[  689.644754] panfrost ffe40000.gpu: matching BO is not heap type (GPU VA = 3146000)
[  689.644802] WARNING: CPU: 0 PID: 120 at drivers/gpu/drm/panfrost/panfrost_mmu.c:465 panfrost_mmu_irq_handler_thread+0x47c/0x650
[  689.663607] Modules linked in:
[  689.666631] CPU: 0 PID: 120 Comm: irq/27-panfrost Tainted: G        W         5.9.0-rc5-02434-g7d8109ec5a42 #784
[  689.676717] Hardware name: Khadas VIM3 (DT)
[  689.680860] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
[  689.686380] pc : panfrost_mmu_irq_handler_thread+0x47c/0x650
[  689.691987] lr : panfrost_mmu_irq_handler_thread+0x47c/0x650
[  689.697590] sp : ffff800011bcbcd0
[  689.700867] x29: ffff800011bcbcf0 x28: ffff0000f3fe3800 
[  689.706128] x27: ffff0000d89cf750 x26: ffff0000da34a800 
[  689.711389] x25: ffff0000f32409c0 x24: 0000000000000001 
[  689.716650] x23: ffff0000f3240880 x22: ffff0000d456e000 
[  689.721911] x21: 0000000000000000 x20: 0000000000000000 
[  689.727173] x19: 0000000000010001 x18: 0000000000000020 
[  689.732434] x17: 0000000000000000 x16: 0000000000000000 
[  689.737695] x15: ffff0000f3fe3c70 x14: ffffffffffffffff 
[  689.742957] x13: ffff8000116c2428 x12: ffff8000116c2086 
[  689.748218] x11: ffff800011bcbcd0 x10: ffff800011bcbcd0 
[  689.753479] x9 : 00000000fffffffe x8 : 0000000000000000 
[  689.758740] x7 : 7420706165682074 x6 : ffff8000116c1816 
[  689.764001] x5 : 0000000000000000 x4 : 0000000000000000 
[  689.769263] x3 : 00000000ffffffff x2 : ffff8000e348c000 
[  689.774524] x1 : f1b91ff9af2df000 x0 : 0000000000000000 
[  689.779786] Call trace:
[  689.782204]  panfrost_mmu_irq_handler_thread+0x47c/0x650
[  689.787465]  irq_thread_fn+0x2c/0xa0
[  689.790999]  irq_thread+0x184/0x208
[  689.794451]  kthread+0x140/0x160
[  689.797642]  ret_from_fork+0x10/0x34
[  689.801177] ---[ end trace b5dd8c2dfada8235 ]---
[  689.805864] panfrost ffe40000.gpu: Unhandled Page fault in AS0 at VA 0x0000000003146080
[  689.805864] Reason: TODO
[  689.805864] raw fault status: 0x10003C3
[  689.805864] decoded fault status: SLAVE FAULT
[  689.805864] exception type 0xC3: TRANSLATION_FAULT_LEVEL3
[  689.805864] access type 0x3: WRITE
[  689.805864] source id 0x100
[  690.170419] panfrost ffe40000.gpu: gpu sched timeout, js=1, config=0x7300, status=0x8, head=0x3101100, tail=0x3101100, sched_job=000000004b442768
[  690.770373] panfrost ffe40000.gpu: error powering up gpu shader
[  690.945123] panfrost ffe40000.gpu: error powering up gpu shader
[  690.945731] ------------[ cut here ]------------

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
@ 2020-10-05  8:39           ` Boris Brezillon
  0 siblings, 0 replies; 88+ messages in thread
From: Boris Brezillon @ 2020-10-05  8:39 UTC (permalink / raw)
  To: Steven Price
  Cc: robh, tomeu.vizoso, Neil Armstrong, Robin Murphy, dri-devel,
	iommu, alyssa.rosenzweig, linux-amlogic, will, linux-arm-kernel,
	jbrunet

On Mon, 5 Oct 2020 09:34:06 +0100
Steven Price <steven.price@arm.com> wrote:

> On 05/10/2020 09:15, Boris Brezillon wrote:
> > Hi Robin, Neil,
> > 
> > On Wed, 16 Sep 2020 10:26:43 +0200
> > Neil Armstrong <narmstrong@baylibre.com> wrote:
> >   
> >> Hi Robin,
> >>
> >> On 16/09/2020 01:51, Robin Murphy wrote:  
> >>> According to a downstream commit I found in the Khadas vendor kernel,
> >>> the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
> >>> how to handle this properly) we should describe it as such. Otherwise
> >>> the mismatch leads to all manner of fun with mismatched attributes and
> >>> inadvertently snooping stale data from caches, which would account for
> >>> at least some of the brokenness observed on this platform.
> >>>
> >>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> >>> ---
> >>>   arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
> >>>   1 file changed, 4 insertions(+)
> >>>
> >>> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> >>> index 9b8548e5f6e5..ee8fcae9f9f0 100644
> >>> --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> >>> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> >>> @@ -135,3 +135,7 @@ map1 {
> >>>   		};
> >>>   	};
> >>>   };
> >>> +
> >>> +&mali {
> >>> +	dma-coherent;
> >>> +};
> >>>      
> >>
> >> Thanks a lot for digging, I'll run a test to confirm it fixes the issue !  
> > 
> > Sorry for the late reply. I triggered a dEQP run with this patch applied
> > and I see a bunch of "panfrost ffe40000.gpu: matching BO is not heap type"
> > errors (see below for a full backtrace). That doesn't seem to happen when
> > we drop this dma-coherent property.
> > 
> > [  690.945731] ------------[ cut here ]------------
> > [  690.950003] panfrost ffe40000.gpu: matching BO is not heap type (GPU VA = 319a000)
> > [  690.950051] WARNING: CPU: 0 PID: 120 at drivers/gpu/drm/panfrost/panfrost_mmu.c:465 panfrost_mmu_irq_handler_thread+0x47c/0x650
> > [  690.968854] Modules linked in:
> > [  690.971878] CPU: 0 PID: 120 Comm: irq/27-panfrost Tainted: G        W         5.9.0-rc5-02434-g7d8109ec5a42 #784
> > [  690.981964] Hardware name: Khadas VIM3 (DT)
> > [  690.986107] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
> > [  690.991627] pc : panfrost_mmu_irq_handler_thread+0x47c/0x650
> > [  690.997232] lr : panfrost_mmu_irq_handler_thread+0x47c/0x650
> > [  691.002836] sp : ffff800011bcbcd0
> > [  691.006114] x29: ffff800011bcbcf0 x28: ffff0000f3fe3800
> > [  691.011375] x27: ffff0000ceaf5350 x26: ffff0000ca5fc500
> > [  691.016636] x25: ffff0000f32409c0 x24: 0000000000000001
> > [  691.021897] x23: ffff0000f3240880 x22: ffff0000f3e3a800
> > [  691.027159] x21: 0000000000000000 x20: 0000000000000000
> > [  691.032420] x19: 0000000000010001 x18: 0000000000000020
> > [  691.037681] x17: 0000000000000000 x16: 0000000000000000
> > [  691.042942] x15: ffff0000f3fe3c70 x14: ffffffffffffffff
> > [  691.048204] x13: ffff8000116c2428 x12: ffff8000116c2086
> > [  691.053466] x11: ffff800011bcbcd0 x10: ffff800011bcbcd0
> > [  691.058727] x9 : 00000000fffffffe x8 : 0000000000000000
> > [  691.063988] x7 : 7420706165682074 x6 : ffff8000116c1816
> > [  691.069249] x5 : 0000000000000000 x4 : 0000000000000000
> > [  691.074510] x3 : 00000000ffffffff x2 : ffff8000e348c000
> > [  691.079771] x1 : f1b91ff9af2df000 x0 : 0000000000000000
> > [  691.085033] Call trace:
> > [  691.087452]  panfrost_mmu_irq_handler_thread+0x47c/0x650
> > [  691.092712]  irq_thread_fn+0x2c/0xa0
> > [  691.096246]  irq_thread+0x184/0x208
> > [  691.099699]  kthread+0x140/0x160
> > [  691.102890]  ret_from_fork+0x10/0x34
> > [  691.106424] ---[ end trace b5dd8c2dfada8236 ]---
> >   
> 
> It's quite possible this is caused by the GPU seeing a stale page table 
> entry, so perhaps coherency isn't working as well as it should...
> 
> Do you get an "Unhandled Page fault" message after this?

Yep (see below).

--->8---

[  689.640491] ------------[ cut here ]------------
[  689.644754] panfrost ffe40000.gpu: matching BO is not heap type (GPU VA = 3146000)
[  689.644802] WARNING: CPU: 0 PID: 120 at drivers/gpu/drm/panfrost/panfrost_mmu.c:465 panfrost_mmu_irq_handler_thread+0x47c/0x650
[  689.663607] Modules linked in:
[  689.666631] CPU: 0 PID: 120 Comm: irq/27-panfrost Tainted: G        W         5.9.0-rc5-02434-g7d8109ec5a42 #784
[  689.676717] Hardware name: Khadas VIM3 (DT)
[  689.680860] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
[  689.686380] pc : panfrost_mmu_irq_handler_thread+0x47c/0x650
[  689.691987] lr : panfrost_mmu_irq_handler_thread+0x47c/0x650
[  689.697590] sp : ffff800011bcbcd0
[  689.700867] x29: ffff800011bcbcf0 x28: ffff0000f3fe3800 
[  689.706128] x27: ffff0000d89cf750 x26: ffff0000da34a800 
[  689.711389] x25: ffff0000f32409c0 x24: 0000000000000001 
[  689.716650] x23: ffff0000f3240880 x22: ffff0000d456e000 
[  689.721911] x21: 0000000000000000 x20: 0000000000000000 
[  689.727173] x19: 0000000000010001 x18: 0000000000000020 
[  689.732434] x17: 0000000000000000 x16: 0000000000000000 
[  689.737695] x15: ffff0000f3fe3c70 x14: ffffffffffffffff 
[  689.742957] x13: ffff8000116c2428 x12: ffff8000116c2086 
[  689.748218] x11: ffff800011bcbcd0 x10: ffff800011bcbcd0 
[  689.753479] x9 : 00000000fffffffe x8 : 0000000000000000 
[  689.758740] x7 : 7420706165682074 x6 : ffff8000116c1816 
[  689.764001] x5 : 0000000000000000 x4 : 0000000000000000 
[  689.769263] x3 : 00000000ffffffff x2 : ffff8000e348c000 
[  689.774524] x1 : f1b91ff9af2df000 x0 : 0000000000000000 
[  689.779786] Call trace:
[  689.782204]  panfrost_mmu_irq_handler_thread+0x47c/0x650
[  689.787465]  irq_thread_fn+0x2c/0xa0
[  689.790999]  irq_thread+0x184/0x208
[  689.794451]  kthread+0x140/0x160
[  689.797642]  ret_from_fork+0x10/0x34
[  689.801177] ---[ end trace b5dd8c2dfada8235 ]---
[  689.805864] panfrost ffe40000.gpu: Unhandled Page fault in AS0 at VA 0x0000000003146080
[  689.805864] Reason: TODO
[  689.805864] raw fault status: 0x10003C3
[  689.805864] decoded fault status: SLAVE FAULT
[  689.805864] exception type 0xC3: TRANSLATION_FAULT_LEVEL3
[  689.805864] access type 0x3: WRITE
[  689.805864] source id 0x100
[  690.170419] panfrost ffe40000.gpu: gpu sched timeout, js=1, config=0x7300, status=0x8, head=0x3101100, tail=0x3101100, sched_job=000000004b442768
[  690.770373] panfrost ffe40000.gpu: error powering up gpu shader
[  690.945123] panfrost ffe40000.gpu: error powering up gpu shader
[  690.945731] ------------[ cut here ]------------


_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
  2020-10-05  8:39           ` Boris Brezillon
  (?)
  (?)
@ 2020-10-05 10:05             ` Steven Price
  -1 siblings, 0 replies; 88+ messages in thread
From: Steven Price @ 2020-10-05 10:05 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: robh, tomeu.vizoso, Neil Armstrong, Robin Murphy, dri-devel,
	iommu, alyssa.rosenzweig, linux-amlogic, will, linux-arm-kernel,
	jbrunet

On 05/10/2020 09:39, Boris Brezillon wrote:
> On Mon, 5 Oct 2020 09:34:06 +0100
> Steven Price <steven.price@arm.com> wrote:
> 
>> On 05/10/2020 09:15, Boris Brezillon wrote:
>>> Hi Robin, Neil,
>>>
>>> On Wed, 16 Sep 2020 10:26:43 +0200
>>> Neil Armstrong <narmstrong@baylibre.com> wrote:
>>>    
>>>> Hi Robin,
>>>>
>>>> On 16/09/2020 01:51, Robin Murphy wrote:
>>>>> According to a downstream commit I found in the Khadas vendor kernel,
>>>>> the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
>>>>> how to handle this properly) we should describe it as such. Otherwise
>>>>> the mismatch leads to all manner of fun with mismatched attributes and
>>>>> inadvertently snooping stale data from caches, which would account for
>>>>> at least some of the brokenness observed on this platform.
>>>>>
>>>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>>>> ---
>>>>>    arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
>>>>>    1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>>>> index 9b8548e5f6e5..ee8fcae9f9f0 100644
>>>>> --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>>>> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>>>> @@ -135,3 +135,7 @@ map1 {
>>>>>    		};
>>>>>    	};
>>>>>    };
>>>>> +
>>>>> +&mali {
>>>>> +	dma-coherent;
>>>>> +};
>>>>>       
>>>>
>>>> Thanks a lot for digging, I'll run a test to confirm it fixes the issue !
>>>
>>> Sorry for the late reply. I triggered a dEQP run with this patch applied
>>> and I see a bunch of "panfrost ffe40000.gpu: matching BO is not heap type"
>>> errors (see below for a full backtrace). That doesn't seem to happen when
>>> we drop this dma-coherent property.
>>>
>>> [  690.945731] ------------[ cut here ]------------
>>> [  690.950003] panfrost ffe40000.gpu: matching BO is not heap type (GPU VA = 319a000)
>>> [  690.950051] WARNING: CPU: 0 PID: 120 at drivers/gpu/drm/panfrost/panfrost_mmu.c:465 panfrost_mmu_irq_handler_thread+0x47c/0x650
>>> [  690.968854] Modules linked in:
>>> [  690.971878] CPU: 0 PID: 120 Comm: irq/27-panfrost Tainted: G        W         5.9.0-rc5-02434-g7d8109ec5a42 #784
>>> [  690.981964] Hardware name: Khadas VIM3 (DT)
>>> [  690.986107] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
>>> [  690.991627] pc : panfrost_mmu_irq_handler_thread+0x47c/0x650
>>> [  690.997232] lr : panfrost_mmu_irq_handler_thread+0x47c/0x650
>>> [  691.002836] sp : ffff800011bcbcd0
>>> [  691.006114] x29: ffff800011bcbcf0 x28: ffff0000f3fe3800
>>> [  691.011375] x27: ffff0000ceaf5350 x26: ffff0000ca5fc500
>>> [  691.016636] x25: ffff0000f32409c0 x24: 0000000000000001
>>> [  691.021897] x23: ffff0000f3240880 x22: ffff0000f3e3a800
>>> [  691.027159] x21: 0000000000000000 x20: 0000000000000000
>>> [  691.032420] x19: 0000000000010001 x18: 0000000000000020
>>> [  691.037681] x17: 0000000000000000 x16: 0000000000000000
>>> [  691.042942] x15: ffff0000f3fe3c70 x14: ffffffffffffffff
>>> [  691.048204] x13: ffff8000116c2428 x12: ffff8000116c2086
>>> [  691.053466] x11: ffff800011bcbcd0 x10: ffff800011bcbcd0
>>> [  691.058727] x9 : 00000000fffffffe x8 : 0000000000000000
>>> [  691.063988] x7 : 7420706165682074 x6 : ffff8000116c1816
>>> [  691.069249] x5 : 0000000000000000 x4 : 0000000000000000
>>> [  691.074510] x3 : 00000000ffffffff x2 : ffff8000e348c000
>>> [  691.079771] x1 : f1b91ff9af2df000 x0 : 0000000000000000
>>> [  691.085033] Call trace:
>>> [  691.087452]  panfrost_mmu_irq_handler_thread+0x47c/0x650
>>> [  691.092712]  irq_thread_fn+0x2c/0xa0
>>> [  691.096246]  irq_thread+0x184/0x208
>>> [  691.099699]  kthread+0x140/0x160
>>> [  691.102890]  ret_from_fork+0x10/0x34
>>> [  691.106424] ---[ end trace b5dd8c2dfada8236 ]---
>>>    
>>
>> It's quite possible this is caused by the GPU seeing a stale page table
>> entry, so perhaps coherency isn't working as well as it should...
>>
>> Do you get an "Unhandled Page fault" message after this?
> 
> Yep (see below).
> 
> --->8---
> 
[...]
> [  689.805864] panfrost ffe40000.gpu: Unhandled Page fault in AS0 at VA 0x0000000003146080
> [  689.805864] Reason: TODO
> [  689.805864] raw fault status: 0x10003C3
> [  689.805864] decoded fault status: SLAVE FAULT
> [  689.805864] exception type 0xC3: TRANSLATION_FAULT_LEVEL3
> [  689.805864] access type 0x3: WRITE
> [  689.805864] source id 0x100
> [  690.170419] panfrost ffe40000.gpu: gpu sched timeout, js=1, config=0x7300, status=0x8, head=0x3101100, tail=0x3101100, sched_job=000000004b442768
> [  690.770373] panfrost ffe40000.gpu: error powering up gpu shader
> [  690.945123] panfrost ffe40000.gpu: error powering up gpu shader
> [  690.945731] ------------[ cut here ]------------
> 

That's a write fault from level 3 of the page table triggered by shader
core 0 in a fragment job. So could be writing out the frame buffer.

It would be interesting to see if a patch like below would work round
it:

----8<----
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index e8f7b11352d2..5144860afdea 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -460,9 +460,12 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
  
  	bo = bomapping->obj;
  	if (!bo->is_heap) {
-		dev_WARN(pfdev->dev, "matching BO is not heap type (GPU VA = %llx)",
+		dev_err(pfdev->dev, "matching BO is not heap type (GPU VA = %llx)",
  			 bomapping->mmnode.start << PAGE_SHIFT);
-		ret = -EINVAL;
+		/* Force a flush of the MMU to restart the transaction */
+		mmu_hw_do_operation(pfdev, bomapping->mmu, addr, 0,
+				    AS_COMMAND_FLUSH_MEM);
+		ret = 0;
  		goto err_bo;
  	}
  	WARN_ON(bomapping->mmu->as != as);
---8<---

That's obviously not the correct solution (the fault shouldn't occur),
but if flushing the GPU's caches and retrying works then we know it's
a coherency issue. You can also try backing off from the big hammer of
AS_COMMAND_FLUSH_MEM and trying AS_COMMAND_FLUSH_PT or
AS_COMMAND_UNLOCK.

Steve
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
@ 2020-10-05 10:05             ` Steven Price
  0 siblings, 0 replies; 88+ messages in thread
From: Steven Price @ 2020-10-05 10:05 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: robh, tomeu.vizoso, Neil Armstrong, Robin Murphy, dri-devel,
	iommu, alyssa.rosenzweig, linux-amlogic, will, linux-arm-kernel,
	jbrunet

On 05/10/2020 09:39, Boris Brezillon wrote:
> On Mon, 5 Oct 2020 09:34:06 +0100
> Steven Price <steven.price@arm.com> wrote:
> 
>> On 05/10/2020 09:15, Boris Brezillon wrote:
>>> Hi Robin, Neil,
>>>
>>> On Wed, 16 Sep 2020 10:26:43 +0200
>>> Neil Armstrong <narmstrong@baylibre.com> wrote:
>>>    
>>>> Hi Robin,
>>>>
>>>> On 16/09/2020 01:51, Robin Murphy wrote:
>>>>> According to a downstream commit I found in the Khadas vendor kernel,
>>>>> the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
>>>>> how to handle this properly) we should describe it as such. Otherwise
>>>>> the mismatch leads to all manner of fun with mismatched attributes and
>>>>> inadvertently snooping stale data from caches, which would account for
>>>>> at least some of the brokenness observed on this platform.
>>>>>
>>>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>>>> ---
>>>>>    arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
>>>>>    1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>>>> index 9b8548e5f6e5..ee8fcae9f9f0 100644
>>>>> --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>>>> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>>>> @@ -135,3 +135,7 @@ map1 {
>>>>>    		};
>>>>>    	};
>>>>>    };
>>>>> +
>>>>> +&mali {
>>>>> +	dma-coherent;
>>>>> +};
>>>>>       
>>>>
>>>> Thanks a lot for digging, I'll run a test to confirm it fixes the issue !
>>>
>>> Sorry for the late reply. I triggered a dEQP run with this patch applied
>>> and I see a bunch of "panfrost ffe40000.gpu: matching BO is not heap type"
>>> errors (see below for a full backtrace). That doesn't seem to happen when
>>> we drop this dma-coherent property.
>>>
>>> [  690.945731] ------------[ cut here ]------------
>>> [  690.950003] panfrost ffe40000.gpu: matching BO is not heap type (GPU VA = 319a000)
>>> [  690.950051] WARNING: CPU: 0 PID: 120 at drivers/gpu/drm/panfrost/panfrost_mmu.c:465 panfrost_mmu_irq_handler_thread+0x47c/0x650
>>> [  690.968854] Modules linked in:
>>> [  690.971878] CPU: 0 PID: 120 Comm: irq/27-panfrost Tainted: G        W         5.9.0-rc5-02434-g7d8109ec5a42 #784
>>> [  690.981964] Hardware name: Khadas VIM3 (DT)
>>> [  690.986107] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
>>> [  690.991627] pc : panfrost_mmu_irq_handler_thread+0x47c/0x650
>>> [  690.997232] lr : panfrost_mmu_irq_handler_thread+0x47c/0x650
>>> [  691.002836] sp : ffff800011bcbcd0
>>> [  691.006114] x29: ffff800011bcbcf0 x28: ffff0000f3fe3800
>>> [  691.011375] x27: ffff0000ceaf5350 x26: ffff0000ca5fc500
>>> [  691.016636] x25: ffff0000f32409c0 x24: 0000000000000001
>>> [  691.021897] x23: ffff0000f3240880 x22: ffff0000f3e3a800
>>> [  691.027159] x21: 0000000000000000 x20: 0000000000000000
>>> [  691.032420] x19: 0000000000010001 x18: 0000000000000020
>>> [  691.037681] x17: 0000000000000000 x16: 0000000000000000
>>> [  691.042942] x15: ffff0000f3fe3c70 x14: ffffffffffffffff
>>> [  691.048204] x13: ffff8000116c2428 x12: ffff8000116c2086
>>> [  691.053466] x11: ffff800011bcbcd0 x10: ffff800011bcbcd0
>>> [  691.058727] x9 : 00000000fffffffe x8 : 0000000000000000
>>> [  691.063988] x7 : 7420706165682074 x6 : ffff8000116c1816
>>> [  691.069249] x5 : 0000000000000000 x4 : 0000000000000000
>>> [  691.074510] x3 : 00000000ffffffff x2 : ffff8000e348c000
>>> [  691.079771] x1 : f1b91ff9af2df000 x0 : 0000000000000000
>>> [  691.085033] Call trace:
>>> [  691.087452]  panfrost_mmu_irq_handler_thread+0x47c/0x650
>>> [  691.092712]  irq_thread_fn+0x2c/0xa0
>>> [  691.096246]  irq_thread+0x184/0x208
>>> [  691.099699]  kthread+0x140/0x160
>>> [  691.102890]  ret_from_fork+0x10/0x34
>>> [  691.106424] ---[ end trace b5dd8c2dfada8236 ]---
>>>    
>>
>> It's quite possible this is caused by the GPU seeing a stale page table
>> entry, so perhaps coherency isn't working as well as it should...
>>
>> Do you get an "Unhandled Page fault" message after this?
> 
> Yep (see below).
> 
> --->8---
> 
[...]
> [  689.805864] panfrost ffe40000.gpu: Unhandled Page fault in AS0 at VA 0x0000000003146080
> [  689.805864] Reason: TODO
> [  689.805864] raw fault status: 0x10003C3
> [  689.805864] decoded fault status: SLAVE FAULT
> [  689.805864] exception type 0xC3: TRANSLATION_FAULT_LEVEL3
> [  689.805864] access type 0x3: WRITE
> [  689.805864] source id 0x100
> [  690.170419] panfrost ffe40000.gpu: gpu sched timeout, js=1, config=0x7300, status=0x8, head=0x3101100, tail=0x3101100, sched_job=000000004b442768
> [  690.770373] panfrost ffe40000.gpu: error powering up gpu shader
> [  690.945123] panfrost ffe40000.gpu: error powering up gpu shader
> [  690.945731] ------------[ cut here ]------------
> 

That's a write fault from level 3 of the page table triggered by shader
core 0 in a fragment job. So could be writing out the frame buffer.

It would be interesting to see if a patch like below would work round
it:

----8<----
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index e8f7b11352d2..5144860afdea 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -460,9 +460,12 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
  
  	bo = bomapping->obj;
  	if (!bo->is_heap) {
-		dev_WARN(pfdev->dev, "matching BO is not heap type (GPU VA = %llx)",
+		dev_err(pfdev->dev, "matching BO is not heap type (GPU VA = %llx)",
  			 bomapping->mmnode.start << PAGE_SHIFT);
-		ret = -EINVAL;
+		/* Force a flush of the MMU to restart the transaction */
+		mmu_hw_do_operation(pfdev, bomapping->mmu, addr, 0,
+				    AS_COMMAND_FLUSH_MEM);
+		ret = 0;
  		goto err_bo;
  	}
  	WARN_ON(bomapping->mmu->as != as);
---8<---

That's obviously not the correct solution (the fault shouldn't occur),
but if flushing the GPU's caches and retrying works then we know it's
a coherency issue. You can also try backing off from the big hammer of
AS_COMMAND_FLUSH_MEM and trying AS_COMMAND_FLUSH_PT or
AS_COMMAND_UNLOCK.

Steve

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
@ 2020-10-05 10:05             ` Steven Price
  0 siblings, 0 replies; 88+ messages in thread
From: Steven Price @ 2020-10-05 10:05 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: tomeu.vizoso, Neil Armstrong, Robin Murphy, dri-devel, iommu,
	alyssa.rosenzweig, linux-amlogic, will, linux-arm-kernel,
	jbrunet

On 05/10/2020 09:39, Boris Brezillon wrote:
> On Mon, 5 Oct 2020 09:34:06 +0100
> Steven Price <steven.price@arm.com> wrote:
> 
>> On 05/10/2020 09:15, Boris Brezillon wrote:
>>> Hi Robin, Neil,
>>>
>>> On Wed, 16 Sep 2020 10:26:43 +0200
>>> Neil Armstrong <narmstrong@baylibre.com> wrote:
>>>    
>>>> Hi Robin,
>>>>
>>>> On 16/09/2020 01:51, Robin Murphy wrote:
>>>>> According to a downstream commit I found in the Khadas vendor kernel,
>>>>> the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
>>>>> how to handle this properly) we should describe it as such. Otherwise
>>>>> the mismatch leads to all manner of fun with mismatched attributes and
>>>>> inadvertently snooping stale data from caches, which would account for
>>>>> at least some of the brokenness observed on this platform.
>>>>>
>>>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>>>> ---
>>>>>    arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
>>>>>    1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>>>> index 9b8548e5f6e5..ee8fcae9f9f0 100644
>>>>> --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>>>> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>>>> @@ -135,3 +135,7 @@ map1 {
>>>>>    		};
>>>>>    	};
>>>>>    };
>>>>> +
>>>>> +&mali {
>>>>> +	dma-coherent;
>>>>> +};
>>>>>       
>>>>
>>>> Thanks a lot for digging, I'll run a test to confirm it fixes the issue !
>>>
>>> Sorry for the late reply. I triggered a dEQP run with this patch applied
>>> and I see a bunch of "panfrost ffe40000.gpu: matching BO is not heap type"
>>> errors (see below for a full backtrace). That doesn't seem to happen when
>>> we drop this dma-coherent property.
>>>
>>> [  690.945731] ------------[ cut here ]------------
>>> [  690.950003] panfrost ffe40000.gpu: matching BO is not heap type (GPU VA = 319a000)
>>> [  690.950051] WARNING: CPU: 0 PID: 120 at drivers/gpu/drm/panfrost/panfrost_mmu.c:465 panfrost_mmu_irq_handler_thread+0x47c/0x650
>>> [  690.968854] Modules linked in:
>>> [  690.971878] CPU: 0 PID: 120 Comm: irq/27-panfrost Tainted: G        W         5.9.0-rc5-02434-g7d8109ec5a42 #784
>>> [  690.981964] Hardware name: Khadas VIM3 (DT)
>>> [  690.986107] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
>>> [  690.991627] pc : panfrost_mmu_irq_handler_thread+0x47c/0x650
>>> [  690.997232] lr : panfrost_mmu_irq_handler_thread+0x47c/0x650
>>> [  691.002836] sp : ffff800011bcbcd0
>>> [  691.006114] x29: ffff800011bcbcf0 x28: ffff0000f3fe3800
>>> [  691.011375] x27: ffff0000ceaf5350 x26: ffff0000ca5fc500
>>> [  691.016636] x25: ffff0000f32409c0 x24: 0000000000000001
>>> [  691.021897] x23: ffff0000f3240880 x22: ffff0000f3e3a800
>>> [  691.027159] x21: 0000000000000000 x20: 0000000000000000
>>> [  691.032420] x19: 0000000000010001 x18: 0000000000000020
>>> [  691.037681] x17: 0000000000000000 x16: 0000000000000000
>>> [  691.042942] x15: ffff0000f3fe3c70 x14: ffffffffffffffff
>>> [  691.048204] x13: ffff8000116c2428 x12: ffff8000116c2086
>>> [  691.053466] x11: ffff800011bcbcd0 x10: ffff800011bcbcd0
>>> [  691.058727] x9 : 00000000fffffffe x8 : 0000000000000000
>>> [  691.063988] x7 : 7420706165682074 x6 : ffff8000116c1816
>>> [  691.069249] x5 : 0000000000000000 x4 : 0000000000000000
>>> [  691.074510] x3 : 00000000ffffffff x2 : ffff8000e348c000
>>> [  691.079771] x1 : f1b91ff9af2df000 x0 : 0000000000000000
>>> [  691.085033] Call trace:
>>> [  691.087452]  panfrost_mmu_irq_handler_thread+0x47c/0x650
>>> [  691.092712]  irq_thread_fn+0x2c/0xa0
>>> [  691.096246]  irq_thread+0x184/0x208
>>> [  691.099699]  kthread+0x140/0x160
>>> [  691.102890]  ret_from_fork+0x10/0x34
>>> [  691.106424] ---[ end trace b5dd8c2dfada8236 ]---
>>>    
>>
>> It's quite possible this is caused by the GPU seeing a stale page table
>> entry, so perhaps coherency isn't working as well as it should...
>>
>> Do you get an "Unhandled Page fault" message after this?
> 
> Yep (see below).
> 
> --->8---
> 
[...]
> [  689.805864] panfrost ffe40000.gpu: Unhandled Page fault in AS0 at VA 0x0000000003146080
> [  689.805864] Reason: TODO
> [  689.805864] raw fault status: 0x10003C3
> [  689.805864] decoded fault status: SLAVE FAULT
> [  689.805864] exception type 0xC3: TRANSLATION_FAULT_LEVEL3
> [  689.805864] access type 0x3: WRITE
> [  689.805864] source id 0x100
> [  690.170419] panfrost ffe40000.gpu: gpu sched timeout, js=1, config=0x7300, status=0x8, head=0x3101100, tail=0x3101100, sched_job=000000004b442768
> [  690.770373] panfrost ffe40000.gpu: error powering up gpu shader
> [  690.945123] panfrost ffe40000.gpu: error powering up gpu shader
> [  690.945731] ------------[ cut here ]------------
> 

That's a write fault from level 3 of the page table triggered by shader
core 0 in a fragment job. So could be writing out the frame buffer.

It would be interesting to see if a patch like below would work round
it:

----8<----
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index e8f7b11352d2..5144860afdea 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -460,9 +460,12 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
  
  	bo = bomapping->obj;
  	if (!bo->is_heap) {
-		dev_WARN(pfdev->dev, "matching BO is not heap type (GPU VA = %llx)",
+		dev_err(pfdev->dev, "matching BO is not heap type (GPU VA = %llx)",
  			 bomapping->mmnode.start << PAGE_SHIFT);
-		ret = -EINVAL;
+		/* Force a flush of the MMU to restart the transaction */
+		mmu_hw_do_operation(pfdev, bomapping->mmu, addr, 0,
+				    AS_COMMAND_FLUSH_MEM);
+		ret = 0;
  		goto err_bo;
  	}
  	WARN_ON(bomapping->mmu->as != as);
---8<---

That's obviously not the correct solution (the fault shouldn't occur),
but if flushing the GPU's caches and retrying works then we know it's
a coherency issue. You can also try backing off from the big hammer of
AS_COMMAND_FLUSH_MEM and trying AS_COMMAND_FLUSH_PT or
AS_COMMAND_UNLOCK.

Steve
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
@ 2020-10-05 10:05             ` Steven Price
  0 siblings, 0 replies; 88+ messages in thread
From: Steven Price @ 2020-10-05 10:05 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: robh, tomeu.vizoso, Neil Armstrong, Robin Murphy, dri-devel,
	iommu, alyssa.rosenzweig, linux-amlogic, will, linux-arm-kernel,
	jbrunet

On 05/10/2020 09:39, Boris Brezillon wrote:
> On Mon, 5 Oct 2020 09:34:06 +0100
> Steven Price <steven.price@arm.com> wrote:
> 
>> On 05/10/2020 09:15, Boris Brezillon wrote:
>>> Hi Robin, Neil,
>>>
>>> On Wed, 16 Sep 2020 10:26:43 +0200
>>> Neil Armstrong <narmstrong@baylibre.com> wrote:
>>>    
>>>> Hi Robin,
>>>>
>>>> On 16/09/2020 01:51, Robin Murphy wrote:
>>>>> According to a downstream commit I found in the Khadas vendor kernel,
>>>>> the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
>>>>> how to handle this properly) we should describe it as such. Otherwise
>>>>> the mismatch leads to all manner of fun with mismatched attributes and
>>>>> inadvertently snooping stale data from caches, which would account for
>>>>> at least some of the brokenness observed on this platform.
>>>>>
>>>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>>>> ---
>>>>>    arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
>>>>>    1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>>>> index 9b8548e5f6e5..ee8fcae9f9f0 100644
>>>>> --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>>>> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>>>> @@ -135,3 +135,7 @@ map1 {
>>>>>    		};
>>>>>    	};
>>>>>    };
>>>>> +
>>>>> +&mali {
>>>>> +	dma-coherent;
>>>>> +};
>>>>>       
>>>>
>>>> Thanks a lot for digging, I'll run a test to confirm it fixes the issue !
>>>
>>> Sorry for the late reply. I triggered a dEQP run with this patch applied
>>> and I see a bunch of "panfrost ffe40000.gpu: matching BO is not heap type"
>>> errors (see below for a full backtrace). That doesn't seem to happen when
>>> we drop this dma-coherent property.
>>>
>>> [  690.945731] ------------[ cut here ]------------
>>> [  690.950003] panfrost ffe40000.gpu: matching BO is not heap type (GPU VA = 319a000)
>>> [  690.950051] WARNING: CPU: 0 PID: 120 at drivers/gpu/drm/panfrost/panfrost_mmu.c:465 panfrost_mmu_irq_handler_thread+0x47c/0x650
>>> [  690.968854] Modules linked in:
>>> [  690.971878] CPU: 0 PID: 120 Comm: irq/27-panfrost Tainted: G        W         5.9.0-rc5-02434-g7d8109ec5a42 #784
>>> [  690.981964] Hardware name: Khadas VIM3 (DT)
>>> [  690.986107] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
>>> [  690.991627] pc : panfrost_mmu_irq_handler_thread+0x47c/0x650
>>> [  690.997232] lr : panfrost_mmu_irq_handler_thread+0x47c/0x650
>>> [  691.002836] sp : ffff800011bcbcd0
>>> [  691.006114] x29: ffff800011bcbcf0 x28: ffff0000f3fe3800
>>> [  691.011375] x27: ffff0000ceaf5350 x26: ffff0000ca5fc500
>>> [  691.016636] x25: ffff0000f32409c0 x24: 0000000000000001
>>> [  691.021897] x23: ffff0000f3240880 x22: ffff0000f3e3a800
>>> [  691.027159] x21: 0000000000000000 x20: 0000000000000000
>>> [  691.032420] x19: 0000000000010001 x18: 0000000000000020
>>> [  691.037681] x17: 0000000000000000 x16: 0000000000000000
>>> [  691.042942] x15: ffff0000f3fe3c70 x14: ffffffffffffffff
>>> [  691.048204] x13: ffff8000116c2428 x12: ffff8000116c2086
>>> [  691.053466] x11: ffff800011bcbcd0 x10: ffff800011bcbcd0
>>> [  691.058727] x9 : 00000000fffffffe x8 : 0000000000000000
>>> [  691.063988] x7 : 7420706165682074 x6 : ffff8000116c1816
>>> [  691.069249] x5 : 0000000000000000 x4 : 0000000000000000
>>> [  691.074510] x3 : 00000000ffffffff x2 : ffff8000e348c000
>>> [  691.079771] x1 : f1b91ff9af2df000 x0 : 0000000000000000
>>> [  691.085033] Call trace:
>>> [  691.087452]  panfrost_mmu_irq_handler_thread+0x47c/0x650
>>> [  691.092712]  irq_thread_fn+0x2c/0xa0
>>> [  691.096246]  irq_thread+0x184/0x208
>>> [  691.099699]  kthread+0x140/0x160
>>> [  691.102890]  ret_from_fork+0x10/0x34
>>> [  691.106424] ---[ end trace b5dd8c2dfada8236 ]---
>>>    
>>
>> It's quite possible this is caused by the GPU seeing a stale page table
>> entry, so perhaps coherency isn't working as well as it should...
>>
>> Do you get an "Unhandled Page fault" message after this?
> 
> Yep (see below).
> 
> --->8---
> 
[...]
> [  689.805864] panfrost ffe40000.gpu: Unhandled Page fault in AS0 at VA 0x0000000003146080
> [  689.805864] Reason: TODO
> [  689.805864] raw fault status: 0x10003C3
> [  689.805864] decoded fault status: SLAVE FAULT
> [  689.805864] exception type 0xC3: TRANSLATION_FAULT_LEVEL3
> [  689.805864] access type 0x3: WRITE
> [  689.805864] source id 0x100
> [  690.170419] panfrost ffe40000.gpu: gpu sched timeout, js=1, config=0x7300, status=0x8, head=0x3101100, tail=0x3101100, sched_job=000000004b442768
> [  690.770373] panfrost ffe40000.gpu: error powering up gpu shader
> [  690.945123] panfrost ffe40000.gpu: error powering up gpu shader
> [  690.945731] ------------[ cut here ]------------
> 

That's a write fault from level 3 of the page table triggered by shader
core 0 in a fragment job. So could be writing out the frame buffer.

It would be interesting to see if a patch like below would work round
it:

----8<----
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index e8f7b11352d2..5144860afdea 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -460,9 +460,12 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
  
  	bo = bomapping->obj;
  	if (!bo->is_heap) {
-		dev_WARN(pfdev->dev, "matching BO is not heap type (GPU VA = %llx)",
+		dev_err(pfdev->dev, "matching BO is not heap type (GPU VA = %llx)",
  			 bomapping->mmnode.start << PAGE_SHIFT);
-		ret = -EINVAL;
+		/* Force a flush of the MMU to restart the transaction */
+		mmu_hw_do_operation(pfdev, bomapping->mmu, addr, 0,
+				    AS_COMMAND_FLUSH_MEM);
+		ret = 0;
  		goto err_bo;
  	}
  	WARN_ON(bomapping->mmu->as != as);
---8<---

That's obviously not the correct solution (the fault shouldn't occur),
but if flushing the GPU's caches and retrying works then we know it's
a coherency issue. You can also try backing off from the big hammer of
AS_COMMAND_FLUSH_MEM and trying AS_COMMAND_FLUSH_PT or
AS_COMMAND_UNLOCK.

Steve

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply related	[flat|nested] 88+ messages in thread

end of thread, other threads:[~2020-10-05 10:06 UTC | newest]

Thread overview: 88+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-15 23:51 [PATCH 0/3] drm: panfrost: Coherency support Robin Murphy
2020-09-15 23:51 ` Robin Murphy
2020-09-15 23:51 ` Robin Murphy
2020-09-15 23:51 ` Robin Murphy
2020-09-15 23:51 ` [PATCH 1/3] iommu/io-pgtable-arm: Support coherency for Mali LPAE Robin Murphy
2020-09-15 23:51   ` Robin Murphy
2020-09-15 23:51   ` Robin Murphy
2020-09-15 23:51   ` Robin Murphy
2020-09-21 17:57   ` Will Deacon
2020-09-21 17:57     ` Will Deacon
2020-09-21 17:57     ` Will Deacon
2020-09-21 17:57     ` Will Deacon
2020-09-21 21:53     ` Robin Murphy
2020-09-21 21:53       ` Robin Murphy
2020-09-21 21:53       ` Robin Murphy
2020-09-21 21:53       ` Robin Murphy
2020-09-21 22:24       ` Will Deacon
2020-09-21 22:24         ` Will Deacon
2020-09-21 22:24         ` Will Deacon
2020-09-21 22:24         ` Will Deacon
2020-09-15 23:51 ` [PATCH 2/3] drm/panfrost: Support cache-coherent integrations Robin Murphy
2020-09-15 23:51   ` Robin Murphy
2020-09-15 23:51   ` Robin Murphy
2020-09-15 23:51   ` Robin Murphy
2020-09-17 10:37   ` Steven Price
2020-09-17 10:37     ` Steven Price
2020-09-17 10:37     ` Steven Price
2020-09-17 10:37     ` Steven Price
2020-09-15 23:51 ` [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent Robin Murphy
2020-09-15 23:51   ` Robin Murphy
2020-09-15 23:51   ` Robin Murphy
2020-09-15 23:51   ` Robin Murphy
2020-09-16  8:26   ` Neil Armstrong
2020-09-16  8:26     ` Neil Armstrong
2020-09-16  8:26     ` Neil Armstrong
2020-09-16  8:26     ` Neil Armstrong
2020-10-05  8:15     ` Boris Brezillon
2020-10-05  8:15       ` Boris Brezillon
2020-10-05  8:15       ` Boris Brezillon
2020-10-05  8:15       ` Boris Brezillon
2020-10-05  8:34       ` Steven Price
2020-10-05  8:34         ` Steven Price
2020-10-05  8:34         ` Steven Price
2020-10-05  8:34         ` Steven Price
2020-10-05  8:39         ` Boris Brezillon
2020-10-05  8:39           ` Boris Brezillon
2020-10-05  8:39           ` Boris Brezillon
2020-10-05  8:39           ` Boris Brezillon
2020-10-05 10:05           ` Steven Price
2020-10-05 10:05             ` Steven Price
2020-10-05 10:05             ` Steven Price
2020-10-05 10:05             ` Steven Price
2020-09-16 14:54   ` Neil Armstrong
2020-09-16 14:54     ` Neil Armstrong
2020-09-16 14:54     ` Neil Armstrong
2020-09-16 14:54     ` Neil Armstrong
2020-09-16 14:54 ` [PATCH 0/3] drm: panfrost: Coherency support Neil Armstrong
2020-09-16 14:54   ` Neil Armstrong
2020-09-16 14:54   ` Neil Armstrong
2020-09-16 14:54   ` Neil Armstrong
2020-09-16 17:04   ` Alyssa Rosenzweig
2020-09-16 17:04     ` Alyssa Rosenzweig
2020-09-16 17:04     ` Alyssa Rosenzweig
2020-09-16 17:04     ` Alyssa Rosenzweig
2020-09-16 17:46     ` Rob Herring
2020-09-16 17:46       ` Rob Herring
2020-09-16 17:46       ` Rob Herring
2020-09-16 17:46       ` Rob Herring
2020-09-17 10:38       ` Steven Price
2020-09-17 10:38         ` Steven Price
2020-09-17 10:38         ` Steven Price
2020-09-17 10:38         ` Steven Price
2020-09-17 10:51         ` Tomeu Vizoso
2020-09-17 10:51           ` Tomeu Vizoso
2020-09-17 10:51           ` Tomeu Vizoso
2020-09-17 10:51           ` Tomeu Vizoso
2020-09-17 11:00           ` Steven Price
2020-09-17 11:00             ` Steven Price
2020-09-17 11:00             ` Steven Price
2020-09-17 11:00             ` Steven Price
2020-09-17 12:03         ` Alyssa Rosenzweig
2020-09-17 12:03           ` Alyssa Rosenzweig
2020-09-17 12:03           ` Alyssa Rosenzweig
2020-09-17 12:03           ` Alyssa Rosenzweig
2020-09-17 12:38       ` Robin Murphy
2020-09-17 12:38         ` Robin Murphy
2020-09-17 12:38         ` Robin Murphy
2020-09-17 12:38         ` Robin Murphy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.