* [PATCH 1/6] drm/cache: Build-in drm_clflush_*() functions
@ 2015-04-09 14:34 ` Thierry Reding
  0 siblings, 0 replies; 20+ messages in thread
From: Thierry Reding @ 2015-04-09 14:34 UTC (permalink / raw)
  To: linux-arm-kernel

From: Thierry Reding <treding@nvidia.com>

Irrespective of whether or not the DRM core is built as a module, build
the cache flush functions into the kernel. This is required because the
implementation may require functions that shouldn't be exported to most
drivers. DRM is somewhat of a special case here because it requires the
cache maintenance functions to properly flush pages backed by shmemfs.
These pages are directly given to display hardware for scanout, so they
must be purged from any caches to avoid visual corruption on screen.

To achieve this, add a new boolean Kconfig symbol that drivers can
select if they use any of these functions. drm_cache.c is then built if
and only if this symbol is selected. TTM and the i915 driver already use
these functions, so make them select DRM_CACHE.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/drm/Kconfig      | 5 +++++
 drivers/gpu/drm/Makefile     | 3 ++-
 drivers/gpu/drm/i915/Kconfig | 1 +
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 47f2ce81b412..25bffdb89304 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -21,6 +21,10 @@ menuconfig DRM
 	  details.  You should also select and configure AGP
 	  (/dev/agpgart) support if it is available for your platform.
 
+config DRM_CACHE
+	bool
+	depends on DRM
+
 config DRM_MIPI_DSI
 	bool
 	depends on DRM
@@ -55,6 +59,7 @@ config DRM_LOAD_EDID_FIRMWARE
 config DRM_TTM
 	tristate
 	depends on DRM
+	select DRM_CACHE
 	help
 	  GPU memory management subsystem for devices with multiple
 	  GPU memory types. Will be enabled automatically if a device driver
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 7d4944e1a60c..1ad54a0e4467 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -4,7 +4,7 @@
 
 ccflags-y := -Iinclude/drm
 
-drm-y       :=	drm_auth.o drm_bufs.o drm_cache.o \
+drm-y       :=	drm_auth.o drm_bufs.o \
 		drm_context.o drm_dma.o \
 		drm_fops.o drm_gem.o drm_ioctl.o drm_irq.o \
 		drm_lock.o drm_memory.o drm_drv.o drm_vm.o \
@@ -33,6 +33,7 @@ obj-$(CONFIG_DRM_KMS_HELPER) += drm_kms_helper.o
 CFLAGS_drm_trace_points.o := -I$(src)
 
 obj-$(CONFIG_DRM)	+= drm.o
+obj-$(CONFIG_DRM_CACHE) += drm_cache.o
 obj-$(CONFIG_DRM_MIPI_DSI) += drm_mipi_dsi.o
 obj-$(CONFIG_DRM_TTM)	+= ttm/
 obj-$(CONFIG_DRM_TDFX)	+= tdfx/
diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 74acca9bcd9d..237be03e8a7c 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -10,6 +10,7 @@ config DRM_I915
 	# the shmem_readpage() which depends upon tmpfs
 	select SHMEM
 	select TMPFS
+	select DRM_CACHE
 	select DRM_KMS_HELPER
 	select DRM_PANEL
 	select DRM_MIPI_DSI
-- 
2.3.2

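[Editorial sketch, not part of the original mail: how a driver outside this series would opt in to the new symbol. `DRM_FOO` is a hypothetical driver; the comment reflects the Makefile change above.]

```
config DRM_FOO
	tristate "Hypothetical DRM driver with shmemfs-backed scanout"
	depends on DRM
	# Selecting DRM_CACHE causes drm_cache.c to be built into the
	# kernel image (obj-$(CONFIG_DRM_CACHE) += drm_cache.o) even
	# when CONFIG_DRM=m, so drm_clflush_*() can use symbols that
	# are not exported to modules.
	select DRM_CACHE
```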

* [PATCH 2/6] drm/cache: Implement drm_clflush_*() for ARM
  2015-04-09 14:34 ` Thierry Reding
@ 2015-04-09 14:34   ` Thierry Reding
  -1 siblings, 0 replies; 20+ messages in thread
From: Thierry Reding @ 2015-04-09 14:34 UTC (permalink / raw)
  To: linux-arm-kernel

From: Thierry Reding <treding@nvidia.com>

Add implementations for drm_clflush_*() on ARM by borrowing code from
the DMA mapping API implementation. Unfortunately, ARM doesn't export an
API to flush caches on a page-by-page basis, so this replicates most of
the code.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/drm/drm_cache.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index 9a62d7a53553..200d86c3d72d 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -67,6 +67,41 @@ static void drm_cache_flush_clflush(struct page *pages[],
 }
 #endif
 
+#if defined(CONFIG_ARM)
+
+#include <asm/cacheflush.h>
+#include <asm/cachetype.h>
+#include <asm/highmem.h>
+#include <asm/outercache.h>
+
+static void drm_clflush_page(struct page *page)
+{
+	enum dma_data_direction dir = DMA_TO_DEVICE;
+	phys_addr_t phys = page_to_phys(page);
+	size_t size = PAGE_SIZE;
+	void *virt;
+
+	if (PageHighMem(page)) {
+		if (cache_is_vipt_nonaliasing()) {
+			virt = kmap_atomic(page);
+			dmac_map_area(virt, size, dir);
+			kunmap_atomic(virt);
+		} else {
+			virt = kmap_high_get(page);
+			if (virt) {
+				dmac_map_area(virt, size, dir);
+				kunmap_high(page);
+			}
+		}
+	} else {
+		virt = page_address(page);
+		dmac_map_area(virt, size, dir);
+	}
+
+	outer_flush_range(phys, phys + PAGE_SIZE);
+}
+#endif
+
 void
 drm_clflush_pages(struct page *pages[], unsigned long num_pages)
 {
@@ -94,6 +129,11 @@ drm_clflush_pages(struct page *pages[], unsigned long num_pages)
 				   (unsigned long)page_virtual + PAGE_SIZE);
 		kunmap_atomic(page_virtual);
 	}
+#elif defined(CONFIG_ARM)
+	unsigned long i;
+
+	for (i = 0; i < num_pages; i++)
+		drm_clflush_page(pages[i]);
 #else
 	printk(KERN_ERR "Architecture has no drm_cache.c support\n");
 	WARN_ON_ONCE(1);
@@ -118,6 +158,11 @@ drm_clflush_sg(struct sg_table *st)
 
 	if (wbinvd_on_all_cpus())
 		printk(KERN_ERR "Timed out waiting for cache flush.\n");
+#elif defined(CONFIG_ARM)
+	struct sg_page_iter sg_iter;
+
+	for_each_sg_page(st->sgl, &sg_iter, st->nents, 0)
+		drm_clflush_page(sg_page_iter_page(&sg_iter));
 #else
 	printk(KERN_ERR "Architecture has no drm_cache.c support\n");
 	WARN_ON_ONCE(1);
-- 
2.3.2

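[Editorial sketch, not part of the original mail: the call pattern this series enables, in kernel context. `foo_bo_prepare_scanout()` is a hypothetical helper; the per-architecture behavior noted in the comment is taken from the patches in this thread.]

```c
#include <drm/drmP.h>	/* drm_clflush_pages() */

/*
 * Hypothetical driver path: pages were just allocated from shmemfs
 * (e.g. via drm_gem_get_pages()) and are about to be handed to the
 * scanout hardware, so dirty cache-lines must be written back first.
 */
static void foo_bo_prepare_scanout(struct page **pages,
				   unsigned long num_pages)
{
	/*
	 * On x86 this uses clflush (or wbinvd as a fallback); with
	 * this series it maps to dmac_map_area()/outer_flush_range()
	 * on 32-bit ARM and __dma_flush_range() on 64-bit ARM.
	 */
	drm_clflush_pages(pages, num_pages);
}
```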

* [PATCH 3/6] drm/cache: Implement drm_clflush_*() for 64-bit ARM
  2015-04-09 14:34 ` Thierry Reding
@ 2015-04-09 14:34   ` Thierry Reding
  -1 siblings, 0 replies; 20+ messages in thread
From: Thierry Reding @ 2015-04-09 14:34 UTC (permalink / raw)
  To: linux-arm-kernel

From: Thierry Reding <treding@nvidia.com>

Add implementations for drm_clflush_*() on 64-bit ARM. This shares a lot
of code with the 32-bit ARM implementation.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/drm/drm_cache.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index 200d86c3d72d..0c3072b4cdc9 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -102,6 +102,17 @@ static void drm_clflush_page(struct page *page)
 }
 #endif
 
+#if defined(CONFIG_ARM64)
+static void drm_clflush_page(struct page *page)
+{
+	void *virt;
+
+	virt = kmap_atomic(page);
+	__dma_flush_range(virt, virt + PAGE_SIZE);
+	kunmap_atomic(virt);
+}
+#endif
+
 void
 drm_clflush_pages(struct page *pages[], unsigned long num_pages)
 {
@@ -129,7 +140,7 @@ drm_clflush_pages(struct page *pages[], unsigned long num_pages)
 				   (unsigned long)page_virtual + PAGE_SIZE);
 		kunmap_atomic(page_virtual);
 	}
-#elif defined(CONFIG_ARM)
+#elif defined(CONFIG_ARM) || defined(CONFIG_ARM64)
 	unsigned long i;
 
 	for (i = 0; i < num_pages; i++)
@@ -158,7 +169,7 @@ drm_clflush_sg(struct sg_table *st)
 
 	if (wbinvd_on_all_cpus())
 		printk(KERN_ERR "Timed out waiting for cache flush.\n");
-#elif defined(CONFIG_ARM)
+#elif defined(CONFIG_ARM) || defined(CONFIG_ARM64)
 	struct sg_page_iter sg_iter;
 
 	for_each_sg_page(st->sgl, &sg_iter, st->nents, 0)
-- 
2.3.2


* [PATCH 4/6] drm/tegra: gem: Use drm_clflush_*() functions
  2015-04-09 14:34 ` Thierry Reding
@ 2015-04-09 14:34   ` Thierry Reding
  -1 siblings, 0 replies; 20+ messages in thread
From: Thierry Reding @ 2015-04-09 14:34 UTC (permalink / raw)
  To: linux-arm-kernel

From: Thierry Reding <treding@nvidia.com>

Instead of going through the DMA mapping API for cache maintenance, use
the drm_clflush_*() family of functions to achieve the same effect.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/drm/tegra/Kconfig |  1 +
 drivers/gpu/drm/tegra/gem.c   | 42 +++++++++++-------------------------------
 2 files changed, 12 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/tegra/Kconfig b/drivers/gpu/drm/tegra/Kconfig
index 74d9d621453d..4901f20f99a1 100644
--- a/drivers/gpu/drm/tegra/Kconfig
+++ b/drivers/gpu/drm/tegra/Kconfig
@@ -4,6 +4,7 @@ config DRM_TEGRA
 	depends on COMMON_CLK
 	depends on DRM
 	depends on RESET_CONTROLLER
+	select DRM_CACHE
 	select DRM_KMS_HELPER
 	select DRM_MIPI_DSI
 	select DRM_PANEL
diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c
index 499f86739786..11e97a46e63d 100644
--- a/drivers/gpu/drm/tegra/gem.c
+++ b/drivers/gpu/drm/tegra/gem.c
@@ -203,48 +203,28 @@ static void tegra_bo_free(struct drm_device *drm, struct tegra_bo *bo)
 
 static int tegra_bo_get_pages(struct drm_device *drm, struct tegra_bo *bo)
 {
-	struct scatterlist *s;
-	struct sg_table *sgt;
-	unsigned int i;
-
 	bo->pages = drm_gem_get_pages(&bo->gem);
 	if (IS_ERR(bo->pages))
 		return PTR_ERR(bo->pages);
 
 	bo->num_pages = bo->gem.size >> PAGE_SHIFT;
 
-	sgt = drm_prime_pages_to_sg(bo->pages, bo->num_pages);
-	if (IS_ERR(sgt))
-		goto put_pages;
+	bo->sgt = drm_prime_pages_to_sg(bo->pages, bo->num_pages);
+	if (IS_ERR(bo->sgt)) {
+		drm_gem_put_pages(&bo->gem, bo->pages, false, false);
+		return PTR_ERR(bo->sgt);
+	}
 
-#ifndef CONFIG_ARM64
 	/*
-	 * Fake up the SG table so that dma_map_sg() can be used to flush the
-	 * pages associated with it. Note that this relies on the fact that
-	 * the DMA API doesn't hook into IOMMU on Tegra, therefore mapping is
-	 * only cache maintenance.
-	 *
-	 * TODO: Replace this by drm_clflash_sg() once it can be implemented
-	 * without relying on symbols that are not exported.
+	 * Pages allocated by shmemfs are marked dirty but not flushed on
+	 * ARMv7 and ARMv8. Since this memory is used to back framebuffers,
+	 * however, they must be forced out of caches to avoid corruption
+	 * on screen later on as the result of dirty cache-lines being
+	 * flushed.
 	 */
-	for_each_sg(sgt->sgl, s, sgt->nents, i)
-		sg_dma_address(s) = sg_phys(s);
-
-	if (dma_map_sg(drm->dev, sgt->sgl, sgt->nents, DMA_TO_DEVICE) == 0)
-		goto release_sgt;
-#endif
-
-	bo->sgt = sgt;
+	drm_clflush_sg(bo->sgt);
 
 	return 0;
-
-release_sgt:
-	sg_free_table(sgt);
-	kfree(sgt);
-	sgt = ERR_PTR(-ENOMEM);
-put_pages:
-	drm_gem_put_pages(&bo->gem, bo->pages, false, false);
-	return PTR_ERR(sgt);
 }
 
 static int tegra_bo_alloc(struct drm_device *drm, struct tegra_bo *bo)
-- 
2.3.2

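[Editorial sketch, not part of the original mail: the conversion pattern this patch applies, reduced to its core. `bo` follows the tegra naming above; error handling is abbreviated.]

```c
/*
 * Instead of faking up DMA addresses in the scatterlist and calling
 * dma_map_sg() purely for its cache-maintenance side effect, flush
 * the pages directly through the DRM helper.
 */
bo->sgt = drm_prime_pages_to_sg(bo->pages, bo->num_pages);
if (IS_ERR(bo->sgt)) {
	drm_gem_put_pages(&bo->gem, bo->pages, false, false);
	return PTR_ERR(bo->sgt);
}

drm_clflush_sg(bo->sgt);	/* write back each page backing the sgt */
```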

* [PATCH 5/6] drm/armada: gem: Use drm_clflush_*() functions
  2015-04-09 14:34 ` Thierry Reding
@ 2015-04-09 14:34   ` Thierry Reding
  -1 siblings, 0 replies; 20+ messages in thread
From: Thierry Reding @ 2015-04-09 14:34 UTC (permalink / raw)
  To: linux-arm-kernel

From: Thierry Reding <treding@nvidia.com>

Instead of going through the DMA mapping API for cache maintenance, use
the drm_clflush_*() family of functions to achieve the same effect.

Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/drm/armada/Kconfig      |  1 +
 drivers/gpu/drm/armada/armada_gem.c | 13 ++-----------
 2 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/armada/Kconfig b/drivers/gpu/drm/armada/Kconfig
index 50ae88ad4d76..7b7070128a05 100644
--- a/drivers/gpu/drm/armada/Kconfig
+++ b/drivers/gpu/drm/armada/Kconfig
@@ -4,6 +4,7 @@ config DRM_ARMADA
 	select FB_CFB_FILLRECT
 	select FB_CFB_COPYAREA
 	select FB_CFB_IMAGEBLIT
+	select DRM_CACHE
 	select DRM_KMS_HELPER
 	select DRM_KMS_FB_HELPER
 	help
diff --git a/drivers/gpu/drm/armada/armada_gem.c b/drivers/gpu/drm/armada/armada_gem.c
index 580e10acaa3a..c2d4414031ab 100644
--- a/drivers/gpu/drm/armada/armada_gem.c
+++ b/drivers/gpu/drm/armada/armada_gem.c
@@ -453,19 +453,14 @@ armada_gem_prime_map_dma_buf(struct dma_buf_attachment *attach,
 			sg_set_page(sg, page, PAGE_SIZE, 0);
 		}
 
-		if (dma_map_sg(attach->dev, sgt->sgl, sgt->nents, dir) == 0) {
-			num = sgt->nents;
-			goto release;
-		}
+		drm_clflush_sg(sgt);
 	} else if (dobj->page) {
 		/* Single contiguous page */
 		if (sg_alloc_table(sgt, 1, GFP_KERNEL))
 			goto free_sgt;
 
 		sg_set_page(sgt->sgl, dobj->page, dobj->obj.size, 0);
-
-		if (dma_map_sg(attach->dev, sgt->sgl, sgt->nents, dir) == 0)
-			goto free_table;
+		drm_clflush_sg(sgt);
 	} else if (dobj->linear) {
 		/* Single contiguous physical region - no struct page */
 		if (sg_alloc_table(sgt, 1, GFP_KERNEL))
@@ -480,7 +475,6 @@ armada_gem_prime_map_dma_buf(struct dma_buf_attachment *attach,
  release:
 	for_each_sg(sgt->sgl, sg, num, i)
 		page_cache_release(sg_page(sg));
- free_table:
 	sg_free_table(sgt);
  free_sgt:
 	kfree(sgt);
@@ -494,9 +488,6 @@ static void armada_gem_prime_unmap_dma_buf(struct dma_buf_attachment *attach,
 	struct armada_gem_object *dobj = drm_to_armada_gem(obj);
 	int i;
 
-	if (!dobj->linear)
-		dma_unmap_sg(attach->dev, sgt->sgl, sgt->nents, dir);
-
 	if (dobj->obj.filp) {
 		struct scatterlist *sg;
 		for_each_sg(sgt->sgl, sg, sgt->nents, i)
-- 
2.3.2


* [PATCH 6/6] drm/msm: gem: Use drm_clflush_*() functions
  2015-04-09 14:34 ` Thierry Reding
@ 2015-04-09 14:34   ` Thierry Reding
  -1 siblings, 0 replies; 20+ messages in thread
From: Thierry Reding @ 2015-04-09 14:34 UTC (permalink / raw)
  To: linux-arm-kernel

From: Thierry Reding <treding@nvidia.com>

Instead of going through the DMA mapping API for cache maintenance, use
the drm_clflush_*() family of functions to achieve the same effect.

Cc: Rob Clark <robdclark@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/drm/msm/Kconfig   | 1 +
 drivers/gpu/drm/msm/msm_gem.c | 9 +--------
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/msm/Kconfig b/drivers/gpu/drm/msm/Kconfig
index 0a6f6764a37c..5da7fe793e89 100644
--- a/drivers/gpu/drm/msm/Kconfig
+++ b/drivers/gpu/drm/msm/Kconfig
@@ -5,6 +5,7 @@ config DRM_MSM
 	depends on ARCH_QCOM || (ARM && COMPILE_TEST)
 	depends on OF && COMMON_CLK
 	select REGULATOR
+	select DRM_CACHE
 	select DRM_KMS_HELPER
 	select DRM_PANEL
 	select SHMEM
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 52839769eb6c..ce265085fc57 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -101,8 +101,7 @@ static struct page **get_pages(struct drm_gem_object *obj)
 		 * because display controller, GPU, etc. are not coherent:
 		 */
 		if (msm_obj->flags & (MSM_BO_WC|MSM_BO_UNCACHED))
-			dma_map_sg(dev->dev, msm_obj->sgt->sgl,
-					msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
+			drm_clflush_sg(msm_obj->sgt);
 	}
 
 	return msm_obj->pages;
@@ -113,12 +112,6 @@ static void put_pages(struct drm_gem_object *obj)
 	struct msm_gem_object *msm_obj = to_msm_bo(obj);
 
 	if (msm_obj->pages) {
-		/* For non-cached buffers, ensure the new pages are clean
-		 * because display controller, GPU, etc. are not coherent:
-		 */
-		if (msm_obj->flags & (MSM_BO_WC|MSM_BO_UNCACHED))
-			dma_unmap_sg(obj->dev->dev, msm_obj->sgt->sgl,
-					msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
 		sg_free_table(msm_obj->sgt);
 		kfree(msm_obj->sgt);
 
-- 
2.3.2


* [PATCH 2/6] drm/cache: Implement drm_clflush_*() for ARM
  2015-04-09 14:34   ` Thierry Reding
@ 2015-04-10 12:03     ` Russell King - ARM Linux
  -1 siblings, 0 replies; 20+ messages in thread
From: Russell King - ARM Linux @ 2015-04-10 12:03 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Apr 09, 2015 at 04:34:05PM +0200, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
> 
> Add implementations for drm_clflush_*() on ARM by borrowing code from
> the DMA mapping API implementation. Unfortunately ARM doesn't export an
API to flush caches on a page-by-page basis, so this replicates most of
> the code.

I'm _really_ not happy with this, because it's poking about in ARM
internal implementation details of the DMA API.  It's also not going
to work too well on aliasing caches either - especially when you
consider that the userspace mapping of a page may have no relationship
to the address you get from kmap.

For an aliasing cache, the way things work with the DMA API, we ensure
that the kernel alias is clean whenever pages are un-kmapped, which
means that unmapped highmem pages never have L1 cache lines associated
with the kernel alias of them.  The user aliases are handled separately
via the normal flush_dcache_page()/flush_anon_page() calls.

None of this exists here...
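
As an illustrative sketch (not code from the patch), the per-page handling an aliasing cache needs per the description above might look like the following; on ARM the kernel alias of a highmem page is cleaned when it is un-kmapped, and user aliases go through the common flush_dcache_page() hook:

```c
#include <linux/highmem.h>
#include <linux/mm.h>

static void clean_page_aliases(struct page *page)
{
	/* Touch the page through a temporary kernel mapping; on an
	 * aliasing cache, kunmap_atomic() leaves the kernel alias
	 * of a highmem page clean. */
	void *vaddr = kmap_atomic(page);
	/* ... CPU accesses through vaddr ... */
	kunmap_atomic(vaddr);

	/* Handle any user-space aliases of the page via the normal
	 * architecture hook. */
	flush_dcache_page(page);
}
```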

It gets even more hairy on older ARM CPUs - but I hope no one is
expecting to support DRM's clflush there - we should make that explicit
though, and ensure that clflush support returns an error there.

That aside, we have most of this logic already inside
dma_cache_maint_page(), and even if it was the right thing to be doing,
we shouldn't be duplicating this architecture specific code inside a
driver.

So you can take that as a NAK on this.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 5/6] drm/armada: gem: Use drm_clflush_*() functions
  2015-04-09 14:34   ` Thierry Reding
@ 2015-04-10 12:08     ` Russell King - ARM Linux
  -1 siblings, 0 replies; 20+ messages in thread
From: Russell King - ARM Linux @ 2015-04-10 12:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Apr 09, 2015 at 04:34:08PM +0200, Thierry Reding wrote:
> diff --git a/drivers/gpu/drm/armada/armada_gem.c b/drivers/gpu/drm/armada/armada_gem.c
> index 580e10acaa3a..c2d4414031ab 100644
> --- a/drivers/gpu/drm/armada/armada_gem.c
> +++ b/drivers/gpu/drm/armada/armada_gem.c
> @@ -453,19 +453,14 @@ armada_gem_prime_map_dma_buf(struct dma_buf_attachment *attach,
>  			sg_set_page(sg, page, PAGE_SIZE, 0);
>  		}
>  
> -		if (dma_map_sg(attach->dev, sgt->sgl, sgt->nents, dir) == 0) {
> -			num = sgt->nents;
> -			goto release;
> -		}
> +		drm_clflush_sg(sgt);
>  	} else if (dobj->page) {
>  		/* Single contiguous page */
>  		if (sg_alloc_table(sgt, 1, GFP_KERNEL))
>  			goto free_sgt;
>  
>  		sg_set_page(sgt->sgl, dobj->page, dobj->obj.size, 0);
> -
> -		if (dma_map_sg(attach->dev, sgt->sgl, sgt->nents, dir) == 0)
> -			goto free_table;
> +		drm_clflush_sg(sgt);
>  	} else if (dobj->linear) {
>  		/* Single contiguous physical region - no struct page */
>  		if (sg_alloc_table(sgt, 1, GFP_KERNEL))
> @@ -480,7 +475,6 @@ armada_gem_prime_map_dma_buf(struct dma_buf_attachment *attach,
>   release:
>  	for_each_sg(sgt->sgl, sg, num, i)
>  		page_cache_release(sg_page(sg));
> - free_table:
>  	sg_free_table(sgt);
>   free_sgt:
>  	kfree(sgt);
> @@ -494,9 +488,6 @@ static void armada_gem_prime_unmap_dma_buf(struct dma_buf_attachment *attach,
>  	struct armada_gem_object *dobj = drm_to_armada_gem(obj);
>  	int i;
>  
> -	if (!dobj->linear)
> -		dma_unmap_sg(attach->dev, sgt->sgl, sgt->nents, dir);
> -

I'm really wondering whether this is the right thing to do.

DMA coherency on ARMv6 and ARMv7 CPUs is not just a case of "do something
just before DMA" - it's more complicated than that because of the
speculative prefetching.

What you must remember is this:

	Any memory which is readable to the CPU may be speculatively
	prefetched by the CPU, and cache lines allocated into the L1
	and L2 caches.

What this means is that if you're doing this:

	Flush caches
	Perform DMA to buffer
	Read buffer from CPU

You may or may not see the data you expect in the buffer - it's
indeterminate, depending on how aggressive the CPU has been at
prefetching data.
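
A minimal sketch of the ownership-transfer pattern the DMA API enforces, which a one-shot flush before DMA cannot replicate on ARMv6/v7 (the device and buffer here are illustrative, not from the thread):

```c
#include <linux/dma-mapping.h>

static int buffer_dma_cycle(struct device *dev, void *buf, size_t len)
{
	dma_addr_t dma;

	/* Clean/invalidate the CPU caches and hand the buffer
	 * over to the device. */
	dma = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, dma))
		return -ENOMEM;

	/* ... device performs DMA into the buffer here ... */

	/* Invalidate again when taking the buffer back: speculative
	 * prefetches may have repopulated cache lines while the DMA
	 * was in flight. */
	dma_unmap_single(dev, dma, len, DMA_FROM_DEVICE);

	return 0;
}
```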

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 5/6] drm/armada: gem: Use drm_clflush_*() functions
  2015-04-10 12:08     ` Russell King - ARM Linux
@ 2015-04-10 12:44       ` Thierry Reding
  -1 siblings, 0 replies; 20+ messages in thread
From: Thierry Reding @ 2015-04-10 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 10, 2015 at 01:08:02PM +0100, Russell King - ARM Linux wrote:
> On Thu, Apr 09, 2015 at 04:34:08PM +0200, Thierry Reding wrote:
> > diff --git a/drivers/gpu/drm/armada/armada_gem.c b/drivers/gpu/drm/armada/armada_gem.c
> > index 580e10acaa3a..c2d4414031ab 100644
> > --- a/drivers/gpu/drm/armada/armada_gem.c
> > +++ b/drivers/gpu/drm/armada/armada_gem.c
> > @@ -453,19 +453,14 @@ armada_gem_prime_map_dma_buf(struct dma_buf_attachment *attach,
> >  			sg_set_page(sg, page, PAGE_SIZE, 0);
> >  		}
> >  
> > -		if (dma_map_sg(attach->dev, sgt->sgl, sgt->nents, dir) == 0) {
> > -			num = sgt->nents;
> > -			goto release;
> > -		}
> > +		drm_clflush_sg(sgt);
> >  	} else if (dobj->page) {
> >  		/* Single contiguous page */
> >  		if (sg_alloc_table(sgt, 1, GFP_KERNEL))
> >  			goto free_sgt;
> >  
> >  		sg_set_page(sgt->sgl, dobj->page, dobj->obj.size, 0);
> > -
> > -		if (dma_map_sg(attach->dev, sgt->sgl, sgt->nents, dir) == 0)
> > -			goto free_table;
> > +		drm_clflush_sg(sgt);
> >  	} else if (dobj->linear) {
> >  		/* Single contiguous physical region - no struct page */
> >  		if (sg_alloc_table(sgt, 1, GFP_KERNEL))
> > @@ -480,7 +475,6 @@ armada_gem_prime_map_dma_buf(struct dma_buf_attachment *attach,
> >   release:
> >  	for_each_sg(sgt->sgl, sg, num, i)
> >  		page_cache_release(sg_page(sg));
> > - free_table:
> >  	sg_free_table(sgt);
> >   free_sgt:
> >  	kfree(sgt);
> > @@ -494,9 +488,6 @@ static void armada_gem_prime_unmap_dma_buf(struct dma_buf_attachment *attach,
> >  	struct armada_gem_object *dobj = drm_to_armada_gem(obj);
> >  	int i;
> >  
> > -	if (!dobj->linear)
> > -		dma_unmap_sg(attach->dev, sgt->sgl, sgt->nents, dir);
> > -
> 
> I'm really wondering whether this is the right thing to do.

Indeed, I don't think it is the right thing to do in this case. Both Tegra
and MSM were using this pattern to make sure that caches are invalidated
upon allocation of memory from shmemfs, but I now realize that for the
Armada driver this isn't the case.

Thierry

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 2/6] drm/cache: Implement drm_clflush_*() for ARM
  2015-04-10 12:03     ` Russell King - ARM Linux
@ 2015-04-10 13:05       ` Thierry Reding
  -1 siblings, 0 replies; 20+ messages in thread
From: Thierry Reding @ 2015-04-10 13:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 10, 2015 at 01:03:13PM +0100, Russell King - ARM Linux wrote:
> On Thu, Apr 09, 2015 at 04:34:05PM +0200, Thierry Reding wrote:
> > From: Thierry Reding <treding@nvidia.com>
> > 
> > Add implementations for drm_clflush_*() on ARM by borrowing code from
> > the DMA mapping API implementation. Unfortunately ARM doesn't export an
> > API to flush caches on a page-by-page basis, so this replicates most of
> > the code.
> 
> I'm _really_ not happy with this, because it's poking about in ARM
> internal implementation details of the DMA API.  It's also not going
> to work too well on aliasing caches either - especially when you
> consider that the userspace mapping of a page may have no relationship
> to the address you get from kmap.
> 
> For an aliasing cache, the way things work with the DMA API, we ensure
> that the kernel alias is clean whenever pages are un-kmapped, which
> means that unmapped highmem pages never have L1 cache lines associated
> with the kernel alias of them.  The user aliases are handled separately
> via the normal flush_dcache_page()/flush_anon_page() calls.
> 
> None of this exists here...
> 
> It gets even more hairy on older ARM CPUs - but I hope no one is
> expecting to support DRM's clflush there - we should make that explicit
> though, and ensure that clflush support returns an error there.
> 
> That aside, we have most of this logic already inside
> dma_cache_maint_page(), and even if it was the right thing to be doing,
> we shouldn't be duplicating this architecture specific code inside a
> driver.
> 
> So you can take that as a NAK on this.

Perhaps I should take a step back and explain what I'm trying to solve,
maybe that'll allow us to come up with a more proper solution.

Both the MSM and Tegra drivers allocate framebuffers from shmem in the
presence of an IOMMU. The problem with allocating pages from shmem
is that pages end up not being flushed, resulting in visual artifacts
on the screen (horizontal black lines) when the cachelines from earlier
allocations of these pages get flushed. The drivers also use the IOMMU
API directly to manage the IOVA space.

Currently both drivers work around this by faking up an sg_table and
calling dma_map_sg(), which ends up doing the right thing. Unfortunately
this only works if nothing backs the DMA API and we end up getting the
regular arm_dma_ops. If an IOMMU registers with the ARM/DMA glue, we'll
end up getting a different set of dma_ops and things break (buffers are
mapped through the IOMMU twice, ...).
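
A sketch of the workaround described above (illustrative, not the exact driver code): build a scatterlist over the shmem pages and rely on dma_map_sg() purely for its cache-flushing side effect.

```c
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

static int flush_pages_via_dma_api(struct device *dev, struct page **pages,
				   unsigned int npages)
{
	struct sg_table sgt;
	int ret;

	ret = sg_alloc_table_from_pages(&sgt, pages, npages, 0,
					(size_t)npages << PAGE_SHIFT,
					GFP_KERNEL);
	if (ret)
		return ret;

	/* Only wanted for the cache maintenance side effect; this
	 * breaks if real dma_ops (IOMMU glue, SWIOTLB) back the
	 * device, since the buffer then gets mapped for real. */
	if (!dma_map_sg(dev, sgt.sgl, sgt.nents, DMA_BIDIRECTIONAL))
		ret = -ENOMEM;
	else
		dma_unmap_sg(dev, sgt.sgl, sgt.nents, DMA_BIDIRECTIONAL);

	sg_free_table(&sgt);
	return ret;
}
```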

Still, this has worked fine on Tegra (and I assume the same is true for
MSM) so far because we don't register an IOMMU with the ARM/DMA glue.
However while porting this to 64-bit ARM I started seeing failures to
map buffers because SWIOTLB is enabled by default and gets in the way.

With regards to code duplication, I suspect we could call something like
arm_dma_ops.sync_sg_for_device() directly, but that raises the question
about which device to pass in. It isn't clear at allocation time which
device will use the buffer. It may in fact be used by multiple devices
at once.

Do you have an alternative suggestion on how to fix this?

Thierry

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2015-04-10 13:05 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-09 14:34 [PATCH 1/6] drm/cache: Build-in drm_clflush_*() functions Thierry Reding
2015-04-09 14:34 ` Thierry Reding
2015-04-09 14:34 ` [PATCH 2/6] drm/cache: Implement drm_clflush_*() for ARM Thierry Reding
2015-04-09 14:34   ` Thierry Reding
2015-04-10 12:03   ` Russell King - ARM Linux
2015-04-10 12:03     ` Russell King - ARM Linux
2015-04-10 13:05     ` Thierry Reding
2015-04-10 13:05       ` Thierry Reding
2015-04-09 14:34 ` [PATCH 3/6] drm/cache: Implement drm_clflush_*() for 64-bit ARM Thierry Reding
2015-04-09 14:34   ` Thierry Reding
2015-04-09 14:34 ` [PATCH 4/6] drm/tegra: gem: Use drm_clflush_*() functions Thierry Reding
2015-04-09 14:34   ` Thierry Reding
2015-04-09 14:34 ` [PATCH 5/6] drm/armada: " Thierry Reding
2015-04-09 14:34   ` Thierry Reding
2015-04-10 12:08   ` Russell King - ARM Linux
2015-04-10 12:08     ` Russell King - ARM Linux
2015-04-10 12:44     ` Thierry Reding
2015-04-10 12:44       ` Thierry Reding
2015-04-09 14:34 ` [PATCH 6/6] drm/msm: " Thierry Reding
2015-04-09 14:34   ` Thierry Reding
