* [PATCH v2 0/3] drm/ttm: nouveau: memory coherency for ARM
@ 2014-06-24  9:54 ` Alexandre Courbot
  0 siblings, 0 replies; 63+ messages in thread
From: Alexandre Courbot @ 2014-06-24  9:54 UTC (permalink / raw)
  To: David Airlie, Ben Skeggs, Lucas Stach, Thierry Reding
  Cc: nouveau, dri-devel, linux-tegra, linux-kernel, linux-arm-kernel,
	gnurou, Alexandre Courbot

For this v2 I have fixed the patches that are non-controversial (all Lucas' :))
and am resubmitting them in the hope that they will get merged. This will
just leave the issue of mapping Nouveau's system-memory buffers to be solved.

This issue is quite complex, so let me summarize the situation and the data
I have at hand. ARM caching is like a quantum world where Murphy's law
constantly applies: sometimes things that you don't expect to work happen to
work, but never the ones you would like to.

On ARM the accepted wisdom is that all CPU mappings of the same physical memory
must share the same attributes, and that failure to do so results in undefined
behavior. I have heard claims that recent ARM SoCs are not affected by this,
but have yet to see hard evidence that this is indeed the case.

Most (or all) physical memory is already mapped, cached, in the lowmem area of
the virtual address space. This means that if we want to be safe with respect
to the rule mentioned above, we must perform all subsequent memory mappings
the same way.

Nouveau currently performs its memory mappings cached, since it can rely on PCI
to snoop and invalidate addresses written by the CPU - something that we don't
have on ARM shared-memory systems. In the previous revision of this series I
had proposed a (bad) patch that added a way to flush the CPU cache after each
write
(http://lists.freedesktop.org/archives/dri-devel/2014-May/059893.html), but it
did not meet much approval. Instead, it has been suggested to map such BOs
write-combined.

This would break the "one mapping type only" rule, but I gave it a try by
changing the TTM_PL_TT manager's allowed caching to WC. This immediately
resulted in breakage of applications using the GPU. Digging a little further,
I noticed that kernel mappings could be performed WC or uncached without any
problem, but that user-space mappings *must* be cached under all circumstances.
Failure to do so results in invalid pushbuffers being sent to the GPU, messed
up vertices, and other corruption. Uncached mappings result in the same
breakage.
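
A minimal sketch of that experiment, assuming it is done in the TTM_PL_TT case
of nouveau_bo_init_mem_type() (field names from ttm_mem_type_manager; this is
an illustration, not a patch from this series):

    case TTM_PL_TT:
            /* was: man->available_caching = TTM_PL_MASK_CACHING; */
            man->available_caching = TTM_PL_FLAG_WC;
            /* was: man->default_caching = TTM_PL_FLAG_CACHED; */
            man->default_caching = TTM_PL_FLAG_WC;
            break;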

So, to summarize our options for GK20A:
1) Keeping mappings of TTM buffers cached seems to be the safest option, as it
is consistent with the lowmem mapping that likely already covers the memory
backing our buffers. But we will have to flush kernel CPU writes to these
buffers one way or another (see the sketch after this list).
2) Changing the kernel mappings to WC or uncached seems to be safe. However,
user-space mappings must still be cached or inconsistencies occur. This dual
policy for kernel and user-space mappings is not implemented in TTM, and
nothing so far suggests that it should be.
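
A minimal sketch of what the flush in option 1) could look like, reusing the
helper introduced in patch 2/3; the call site and the variable names below
are illustrative assumptions only:

    /* kernel CPU writes to a cached BO... */
    nouveau_bo_wr32(nvbo, index, value);
    /* ...must be flushed before the GPU consumes the buffer */
    ttm_dma_tt_cache_sync_for_device((struct ttm_dma_tt *)nvbo->bo.ttm,
                                     nv_device_base(device));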

And that is where we stand. I am not considering the other possibilities
(carving memory out of lowmem, etc.) as they have already been discussed many
times by people much smarter than me (e.g.
http://lists.linaro.org/pipermail/linaro-mm-sig/2011-April/000003.html) and
the issue is apparently still with us nonetheless.

At this point, suggestions towards option 1) or 2) (or pointers to where I may
have gone wrong in my understanding of ARM mappings) are welcome. And in the
meantime, let's try to get the three patches below merged!

Changes since v1:
- Removed conditional compilation for Nouveau cache sync handler
- Refactored nouveau_gem_ioctl_cpu_prep() into a new function to keep buffer
  cache management within nouveau_bo.c

Lucas Stach (3):
  drm/ttm: recognize ARM arch in ioprot handler
  drm/ttm: introduce dma cache sync helpers
  drm/nouveau: hook up cache sync functions

 drivers/gpu/drm/nouveau/nouveau_bo.c  | 47 +++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_bo.h  |  1 +
 drivers/gpu/drm/nouveau/nouveau_gem.c | 10 +++-----
 drivers/gpu/drm/ttm/ttm_bo_util.c     |  2 +-
 drivers/gpu/drm/ttm/ttm_tt.c          | 25 +++++++++++++++++++
 include/drm/ttm/ttm_bo_driver.h       | 28 +++++++++++++++++++++
 6 files changed, 105 insertions(+), 8 deletions(-)

-- 
2.0.0


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH v2 1/3] drm/ttm: recognize ARM arch in ioprot handler
  2014-06-24  9:54 ` Alexandre Courbot
  (?)
@ 2014-06-24  9:54   ` Alexandre Courbot
  -1 siblings, 0 replies; 63+ messages in thread
From: Alexandre Courbot @ 2014-06-24  9:54 UTC (permalink / raw)
  To: David Airlie, Ben Skeggs, Lucas Stach, Thierry Reding
  Cc: gnurou, nouveau, linux-kernel, dri-devel, linux-tegra, linux-arm-kernel

From: Lucas Stach <dev@lynxeye.de>

Signed-off-by: Lucas Stach <dev@lynxeye.de>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/drm/ttm/ttm_bo_util.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 1df856f78568..30e5d90cb7bc 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -500,7 +500,7 @@ pgprot_t ttm_io_prot(uint32_t caching_flags, pgprot_t tmp)
 			pgprot_val(tmp) |= _PAGE_GUARDED;
 	}
 #endif
-#if defined(__ia64__)
+#if defined(__ia64__) || defined(__arm__)
 	if (caching_flags & TTM_PL_FLAG_WC)
 		tmp = pgprot_writecombine(tmp);
 	else
-- 
2.0.0

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
@ 2014-06-24  9:54     ` Alexandre Courbot
  0 siblings, 0 replies; 63+ messages in thread
From: Alexandre Courbot @ 2014-06-24  9:54 UTC (permalink / raw)
  To: David Airlie, Ben Skeggs, Lucas Stach, Thierry Reding
  Cc: nouveau, dri-devel, linux-tegra, linux-kernel, linux-arm-kernel,
	gnurou, Alexandre Courbot

From: Lucas Stach <dev@lynxeye.de>

On architectures for which access to GPU memory is non-coherent,
caches need to be flushed and invalidated explicitly at the
appropriate places. Introduce two small helpers to make things
easy for TTM-based drivers.

Signed-off-by: Lucas Stach <dev@lynxeye.de>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/drm/ttm/ttm_tt.c    | 25 +++++++++++++++++++++++++
 include/drm/ttm/ttm_bo_driver.h | 28 ++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 75f319090043..66c16ad35f70 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -38,6 +38,7 @@
 #include <linux/swap.h>
 #include <linux/slab.h>
 #include <linux/export.h>
+#include <linux/dma-mapping.h>
 #include <drm/drm_cache.h>
 #include <drm/drm_mem_util.h>
 #include <drm/ttm/ttm_module.h>
@@ -248,6 +249,30 @@ void ttm_dma_tt_fini(struct ttm_dma_tt *ttm_dma)
 }
 EXPORT_SYMBOL(ttm_dma_tt_fini);
 
+void ttm_dma_tt_cache_sync_for_device(struct ttm_dma_tt *ttm_dma,
+				      struct device *dev)
+{
+	unsigned long i;
+
+	for (i = 0; i < ttm_dma->ttm.num_pages; i++) {
+		dma_sync_single_for_device(dev, ttm_dma->dma_address[i],
+					   PAGE_SIZE, DMA_TO_DEVICE);
+	}
+}
+EXPORT_SYMBOL(ttm_dma_tt_cache_sync_for_device);
+
+void ttm_dma_tt_cache_sync_for_cpu(struct ttm_dma_tt *ttm_dma,
+				   struct device *dev)
+{
+	unsigned long i;
+
+	for (i = 0; i < ttm_dma->ttm.num_pages; i++) {
+		dma_sync_single_for_cpu(dev, ttm_dma->dma_address[i],
+					PAGE_SIZE, DMA_FROM_DEVICE);
+	}
+}
+EXPORT_SYMBOL(ttm_dma_tt_cache_sync_for_cpu);
+
 void ttm_tt_unbind(struct ttm_tt *ttm)
 {
 	int ret;
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index a5183da3ef92..52fb709568fc 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -41,6 +41,7 @@
 #include <linux/fs.h>
 #include <linux/spinlock.h>
 #include <linux/reservation.h>
+#include <linux/device.h>
 
 struct ttm_backend_func {
 	/**
@@ -690,6 +691,33 @@ extern int ttm_tt_swapout(struct ttm_tt *ttm,
  */
 extern void ttm_tt_unpopulate(struct ttm_tt *ttm);
 
+/**
+ * ttm_dma_tt_cache_sync_for_device:
+ *
+ * @ttm_dma: A struct ttm_dma_tt as initialized by ttm_dma_tt_init().
+ * @dev: A struct device representing the device to which to sync.
+ *
+ * This function will flush the CPU caches on arches where snooping in the
+ * TT is not available. On fully coherent arches this will turn into an (almost)
+ * noop. This makes sure that data written by the CPU is visible to the device.
+ */
+extern void ttm_dma_tt_cache_sync_for_device(struct ttm_dma_tt *ttm_dma,
+					     struct device *dev);
+
+/**
+ * ttm_dma_tt_cache_sync_for_cpu:
+ *
+ * @ttm_dma: A struct ttm_dma_tt as initialized by ttm_dma_tt_init().
+ * @dev: A struct device representing the device from which to sync.
+ *
+ * This function will invalidate the CPU caches on arches where snooping in the
+ * TT is not available. On fully coherent arches this will turn into an (almost)
+ * noop. This makes sure that the CPU does not read any stale cached or
+ * prefetched data.
+ */
+extern void ttm_dma_tt_cache_sync_for_cpu(struct ttm_dma_tt *ttm_dma,
+					  struct device *dev);
+
 /*
  * ttm_bo.c
  */
-- 
2.0.0


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 3/3] drm/nouveau: hook up cache sync functions
  2014-06-24  9:54 ` Alexandre Courbot
  (?)
@ 2014-06-24  9:54   ` Alexandre Courbot
  -1 siblings, 0 replies; 63+ messages in thread
From: Alexandre Courbot @ 2014-06-24  9:54 UTC (permalink / raw)
  To: David Airlie, Ben Skeggs, Lucas Stach, Thierry Reding
  Cc: gnurou, nouveau, linux-kernel, dri-devel, linux-tegra, linux-arm-kernel

From: Lucas Stach <dev@lynxeye.de>

Use the newly-introduced TTM cache sync functions in Nouveau.

Signed-off-by: Lucas Stach <dev@lynxeye.de>
[acourbot@nvidia.com: rearrange code, make platform-friendly]
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/drm/nouveau/nouveau_bo.c  | 47 +++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_bo.h  |  1 +
 drivers/gpu/drm/nouveau/nouveau_gem.c | 10 +++-----
 3 files changed, 51 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index b6dc85c614be..74c68c16e777 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -284,6 +284,34 @@ set_placement_range(struct nouveau_bo *nvbo, uint32_t type)
 	}
 }
 
+static void
+nouveau_bo_sync_for_cpu(struct nouveau_bo *nvbo)
+{
+	struct ttm_tt *ttm = nvbo->bo.ttm;
+	struct nouveau_device *device;
+
+	if (!ttm || ttm->caching_state != tt_cached)
+		return;
+	device = nouveau_dev(nouveau_bdev(ttm->bdev)->dev);
+	ttm_dma_tt_cache_sync_for_cpu((struct ttm_dma_tt *)ttm,
+				      nv_device_base(device));
+}
+
+static void
+nouveau_bo_sync_for_device(struct nouveau_bo *nvbo)
+{
+	struct ttm_tt *ttm = nvbo->bo.ttm;
+
+	if (ttm && ttm->caching_state == tt_cached) {
+		struct nouveau_device *device;
+
+		device = nouveau_dev(nouveau_bdev(ttm->bdev)->dev);
+
+		ttm_dma_tt_cache_sync_for_device((struct ttm_dma_tt *)ttm,
+						 nv_device_base(device));
+	}
+}
+
 void
 nouveau_bo_placement_set(struct nouveau_bo *nvbo, uint32_t type, uint32_t busy)
 {
@@ -407,6 +435,8 @@ nouveau_bo_validate(struct nouveau_bo *nvbo, bool interruptible,
 {
 	int ret;
 
+	nouveau_bo_sync_for_device(nvbo);
+
 	ret = ttm_bo_validate(&nvbo->bo, &nvbo->placement,
 			      interruptible, no_wait_gpu);
 	if (ret)
@@ -415,6 +445,23 @@ nouveau_bo_validate(struct nouveau_bo *nvbo, bool interruptible,
 	return 0;
 }
 
+int
+nouveau_bo_wait(struct nouveau_bo *nvbo, bool no_wait)
+{
+	int ret;
+
+	spin_lock(&nvbo->bo.bdev->fence_lock);
+	ret = ttm_bo_wait(&nvbo->bo, true, true, no_wait);
+	spin_unlock(&nvbo->bo.bdev->fence_lock);
+
+	if (ret)
+		return ret;
+
+	nouveau_bo_sync_for_cpu(nvbo);
+
+	return 0;
+}
+
 u16
 nouveau_bo_rd16(struct nouveau_bo *nvbo, unsigned index)
 {
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.h b/drivers/gpu/drm/nouveau/nouveau_bo.h
index ff17c1f432fc..a4e9052d54fd 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.h
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.h
@@ -81,6 +81,7 @@ void nouveau_bo_wr32(struct nouveau_bo *, unsigned index, u32 val);
 void nouveau_bo_fence(struct nouveau_bo *, struct nouveau_fence *);
 int  nouveau_bo_validate(struct nouveau_bo *, bool interruptible,
 			 bool no_wait_gpu);
+int nouveau_bo_wait(struct nouveau_bo *, bool no_wait);
 
 struct nouveau_vma *
 nouveau_bo_vma_find(struct nouveau_bo *, struct nouveau_vm *);
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index c90c0dc0afe8..916cb8ff568c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -884,19 +884,15 @@ nouveau_gem_ioctl_cpu_prep(struct drm_device *dev, void *data,
 {
 	struct drm_nouveau_gem_cpu_prep *req = data;
 	struct drm_gem_object *gem;
-	struct nouveau_bo *nvbo;
 	bool no_wait = !!(req->flags & NOUVEAU_GEM_CPU_PREP_NOWAIT);
-	int ret = -EINVAL;
+	int ret;
 
 	gem = drm_gem_object_lookup(dev, file_priv, req->handle);
 	if (!gem)
 		return -ENOENT;
-	nvbo = nouveau_gem_object(gem);
-
-	spin_lock(&nvbo->bo.bdev->fence_lock);
-	ret = ttm_bo_wait(&nvbo->bo, true, true, no_wait);
-	spin_unlock(&nvbo->bo.bdev->fence_lock);
+	ret = nouveau_bo_wait(nouveau_gem_object(gem), no_wait);
 	drm_gem_object_unreference_unlocked(gem);
+
 	return ret;
 }
 
-- 
2.0.0

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
  2014-06-24  9:54     ` Alexandre Courbot
  (?)
@ 2014-06-24 10:02       ` Russell King - ARM Linux
  -1 siblings, 0 replies; 63+ messages in thread
From: Russell King - ARM Linux @ 2014-06-24 10:02 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: gnurou, nouveau, linux-kernel, dri-devel, Ben Skeggs,
	linux-tegra, linux-arm-kernel

On Tue, Jun 24, 2014 at 06:54:26PM +0900, Alexandre Courbot wrote:
> From: Lucas Stach <dev@lynxeye.de>
> 
> On architectures for which access to GPU memory is non-coherent,
> caches need to be flushed and invalidated explicitly at the
> appropriate places. Introduce two small helpers to make things
> easy for TTM-based drivers.

Have you run this with DMA API debugging enabled?  I suspect you haven't,
and I recommend that you do.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
  2014-06-24 10:02       ` Russell King - ARM Linux
  (?)
@ 2014-06-24 10:33         ` Alexandre Courbot
  -1 siblings, 0 replies; 63+ messages in thread
From: Alexandre Courbot @ 2014-06-24 10:33 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: gnurou, nouveau, linux-kernel, dri-devel, Ben Skeggs,
	linux-tegra, linux-arm-kernel

On 06/24/2014 07:02 PM, Russell King - ARM Linux wrote:
> On Tue, Jun 24, 2014 at 06:54:26PM +0900, Alexandre Courbot wrote:
>> From: Lucas Stach <dev@lynxeye.de>
>>
>> On architectures for which access to GPU memory is non-coherent,
>> caches need to be flushed and invalidated explicitly at the
>> appropriate places. Introduce two small helpers to make things
>> easy for TTM-based drivers.
>
> Have you run this with DMA API debugging enabled?  I suspect you haven't,
> and I recommend that you do.

# cat /sys/kernel/debug/dma-api/error_count
162621

(╯°□°)╯︵ ┻━┻)

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
@ 2014-06-24 10:55             ` Alexandre Courbot
  0 siblings, 0 replies; 63+ messages in thread
From: Alexandre Courbot @ 2014-06-24 10:55 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: David Airlie, Ben Skeggs, Lucas Stach, Thierry Reding, gnurou,
	nouveau, linux-kernel, dri-devel, linux-tegra, linux-arm-kernel

On 06/24/2014 07:33 PM, Alexandre Courbot wrote:
> On 06/24/2014 07:02 PM, Russell King - ARM Linux wrote:
>> On Tue, Jun 24, 2014 at 06:54:26PM +0900, Alexandre Courbot wrote:
>>> From: Lucas Stach <dev@lynxeye.de>
>>>
>>> On architectures for which access to GPU memory is non-coherent,
>>> caches need to be flushed and invalidated explicitly at the
>>> appropriate places. Introduce two small helpers to make things
>>> easy for TTM-based drivers.
>>
>> Have you run this with DMA API debugging enabled?  I suspect you haven't,
>> and I recommend that you do.
>
> # cat /sys/kernel/debug/dma-api/error_count
> 162621
>
> (╯°□°)╯︵ ┻━┻)

*puts table back on its feet*

So, yeah - TTM memory is not allocated using the DMA API, hence we 
cannot use the DMA API to sync it. Thanks Russell for pointing it out.

The only alternative I see here is to flush the CPU caches when syncing 
for the device, and invalidate them for the other direction. Of course 
if the device has caches on its side as well the opposite operation must 
also be done for it. Guess the only way is to handle it all by ourselves 
here. :/
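
As a rough sketch of what handling it ourselves would mean - note that
cpu_cache_clean_range() and cpu_cache_invalidate_range() below are
hypothetical helpers, there is no such portable kernel API, which is exactly
the problem:

    /* hypothetical per-architecture cache maintenance, for illustration only */
    static void sync_for_device(void *cpu_addr, size_t size)
    {
            /* write back dirty CPU cache lines so the device sees the data */
            cpu_cache_clean_range(cpu_addr, cpu_addr + size);
    }

    static void sync_for_cpu(void *cpu_addr, size_t size)
    {
            /* discard possibly stale CPU cache lines before reading device writes */
            cpu_cache_invalidate_range(cpu_addr, cpu_addr + size);
    }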

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
@ 2014-06-24 12:23                 ` Alexandre Courbot
  0 siblings, 0 replies; 63+ messages in thread
From: Alexandre Courbot @ 2014-06-24 12:23 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Russell King - ARM Linux, David Airlie, Ben Skeggs, Lucas Stach,
	Thierry Reding, nouveau, linux-kernel, dri-devel, linux-tegra,
	linux-arm-kernel

On Tue, Jun 24, 2014 at 7:55 PM, Alexandre Courbot <acourbot@nvidia.com> wrote:
> On 06/24/2014 07:33 PM, Alexandre Courbot wrote:
>>
>> On 06/24/2014 07:02 PM, Russell King - ARM Linux wrote:
>>>
>>> On Tue, Jun 24, 2014 at 06:54:26PM +0900, Alexandre Courbot wrote:
>>>>
>>>> From: Lucas Stach <dev@lynxeye.de>
>>>>
>>>> On architectures for which access to GPU memory is non-coherent,
>>>> caches need to be flushed and invalidated explicitly at the
>>>> appropriate places. Introduce two small helpers to make things
>>>> easy for TTM-based drivers.
>>>
>>>
>>> Have you run this with DMA API debugging enabled?  I suspect you haven't,
>>> and I recommend that you do.
>>
>>
>> # cat /sys/kernel/debug/dma-api/error_count
>> 162621
>>
>> (╯°□°)╯︵ ┻━┻)
>
>
> *puts table back on its feet*
>
> So, yeah - TTM memory is not allocated using the DMA API, hence we cannot
> use the DMA API to sync it. Thanks Russell for pointing it out.
>
> The only alternative I see here is to flush the CPU caches when syncing for
> the device, and invalidate them for the other direction. Of course if the
> device has caches on its side as well the opposite operation must also be
> done for it. Guess the only way is to handle it all by ourselves here. :/

... and it really sucks. Basically if we cannot use the DMA API here
we will lose the convenience of having a portable API that does just
the right thing for the underlying platform. Without it we would have
to duplicate arm_iommu_sync_single_for_cpu/device() and we would only
have support for ARM.

The usage of the DMA API that we are doing might be illegal, but in
essence it does exactly what we need - at least for ARM. What are the
alternatives?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Nouveau] [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
  2014-06-24 12:23                 ` Alexandre Courbot
  (?)
@ 2014-06-24 12:27                   ` Maarten Lankhorst
  -1 siblings, 0 replies; 63+ messages in thread
From: Maarten Lankhorst @ 2014-06-24 12:27 UTC (permalink / raw)
  To: Alexandre Courbot, Alexandre Courbot
  Cc: Russell King - ARM Linux, nouveau, linux-kernel, dri-devel,
	Ben Skeggs, linux-tegra, linux-arm-kernel

op 24-06-14 14:23, Alexandre Courbot schreef:
> On Tue, Jun 24, 2014 at 7:55 PM, Alexandre Courbot <acourbot@nvidia.com> wrote:
>> On 06/24/2014 07:33 PM, Alexandre Courbot wrote:
>>> On 06/24/2014 07:02 PM, Russell King - ARM Linux wrote:
>>>> On Tue, Jun 24, 2014 at 06:54:26PM +0900, Alexandre Courbot wrote:
>>>>> From: Lucas Stach <dev@lynxeye.de>
>>>>>
>>>>> On architectures for which access to GPU memory is non-coherent,
>>>>> caches need to be flushed and invalidated explicitly at the
>>>>> appropriate places. Introduce two small helpers to make things
>>>>> easy for TTM-based drivers.
>>>>
>>>> Have you run this with DMA API debugging enabled?  I suspect you haven't,
>>>> and I recommend that you do.
>>>
>>> # cat /sys/kernel/debug/dma-api/error_count
>>> 162621
>>>
>>> (╯°□°)╯︵ ┻━┻)
>>
>> *puts table back on its feet*
>>
>> So, yeah - TTM memory is not allocated using the DMA API, hence we cannot
>> use the DMA API to sync it. Thanks Russell for pointing it out.
>>
>> The only alternative I see here is to flush the CPU caches when syncing for
>> the device, and invalidate them for the other direction. Of course if the
>> device has caches on its side as well the opposite operation must also be
>> done for it. Guess the only way is to handle it all by ourselves here. :/
> ... and it really sucks. Basically if we cannot use the DMA API here
> we will lose the convenience of having a portable API that does just
> the right thing for the underlying platform. Without it we would have
> to duplicate arm_iommu_sync_single_for_cpu/device() and we would only
> have support for ARM.
>
> The usage of the DMA API that we are doing might be illegal, but in
> essence it does exactly what we need - at least for ARM. What are the
> alternatives?
Convert TTM to use the dma api? :-)

~Maarten

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
  2014-06-24 12:23                 ` Alexandre Courbot
  (?)
@ 2014-06-24 13:09                   ` Russell King - ARM Linux
  -1 siblings, 0 replies; 63+ messages in thread
From: Russell King - ARM Linux @ 2014-06-24 13:09 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: nouveau, linux-kernel, dri-devel, Ben Skeggs, linux-tegra,
	linux-arm-kernel

On Tue, Jun 24, 2014 at 09:23:05PM +0900, Alexandre Courbot wrote:
> On Tue, Jun 24, 2014 at 7:55 PM, Alexandre Courbot <acourbot@nvidia.com> wrote:
> > The only alternative I see here is to flush the CPU caches when syncing for
> > the device, and invalidate them for the other direction. Of course if the
> > device has caches on its side as well the opposite operation must also be
> > done for it. Guess the only way is to handle it all by ourselves here. :/
> 
> ... and it really sucks. Basically if we cannot use the DMA API here
> we will lose the convenience of having a portable API that does just
> the right thing for the underlying platform. Without it we would have
> to duplicate arm_iommu_sync_single_for_cpu/device() and we would only
> have support for ARM.
> 
> The usage of the DMA API that we are doing might be illegal, but in
> essence it does exactly what we need - at least for ARM. What are the
> alternatives?

It may seem /to you/ as a driver developer to be the easiest thing in
the world to abuse an API in a way that it's not supposed to be used,
and it is easy to do that.

However, what you're actually saying is that you don't respect your
fellow kernel developers who have to maintain the other side of that
interface.  When they need to change the implementation of that
interface, what if those changes then screw your abuse of that
interface.

The reason we define the behaviours and properties of APIs is to give
both the user and the implementer of the API some degree of latitude
in how that interface works, so that it can be maintained into the
future.  If abuses (such as these) are allowed, then we've lost,
because the interface can no longer be sanely maintained - especially
if driver authors eventually end up not caring about their pile of
abuse they've created after they've moved on to new wonderful hardware.

With an API such as the DMA API, where we have hundreds, if not a
thousand users of it, this /really/ matters.

We've been here before with the DMA API on older ARM platforms, where
we've had people abusing the API or going beneath the API because "it
does what they need it to", which then makes stuff much harder to change
at architecture level.

Last time it happened, it was when ARMv6 came along and ARM moved away
from VIVT caches.  The options were either to break the crap drivers
and support ARMv6+ CPUs, or keep the crap drivers working and not
support DMA in any shape or form on ARMv6+.

Obviously, this was too important for one or two abusive drivers to
block, so I changed the architecture level /anyway/ and just said screw
the drivers which end up being broken by their short-sightedness, they
can either rot or someone else can fix them.

I have no beef for intentionally breaking stuff when people abuse well
defined interfaces and/or refuse to discuss their requirements when
interfaces don't quite do what they want - or worse, refuse to listen
to objections.

As I say, it's disrespectful to your fellow kernel developers to abuse
well defined interfaces.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
@ 2014-06-24 13:25                       ` Alexandre Courbot
  0 siblings, 0 replies; 63+ messages in thread
From: Alexandre Courbot @ 2014-06-24 13:25 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Alexandre Courbot, David Airlie, Ben Skeggs, Lucas Stach,
	Thierry Reding, nouveau, linux-kernel, dri-devel, linux-tegra,
	linux-arm-kernel

On Tue, Jun 24, 2014 at 10:09 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Tue, Jun 24, 2014 at 09:23:05PM +0900, Alexandre Courbot wrote:
>> On Tue, Jun 24, 2014 at 7:55 PM, Alexandre Courbot <acourbot@nvidia.com> wrote:
>> > The only alternative I see here is to flush the CPU caches when syncing for
>> > the device, and invalidate them for the other direction. Of course if the
>> > device has caches on its side as well the opposite operation must also be
>> > done for it. Guess the only way is to handle it all by ourselves here. :/
>>
>> ... and it really sucks. Basically if we cannot use the DMA API here
>> we will lose the convenience of having a portable API that does just
>> the right thing for the underlying platform. Without it we would have
>> to duplicate arm_iommu_sync_single_for_cpu/device() and we would only
>> have support for ARM.
>>
>> The usage of the DMA API that we are doing might be illegal, but in
>> essence it does exactly what we need - at least for ARM. What are the
>> alternatives?
>
> It may seem /to you/ as a driver developer to be the easiest thing in
> the world to abuse an API in a way that it's not supposed to be used,
> and it is easy to do that.
>
> However, what you're actually saying is that you don't respect your
> fellow kernel developers who have to maintain the other side of that
> interface.  When they need to change the implementation of that
> interface, what if those changes then screw your abuse of that
> interface.
>
> The reason we define the behaviours and properties of APIs is to give
> both the user and the implementer of the API some degree of latitude
> in how that interface works, so that it can be maintained into the
> future.  If abuses (such as these) are allowed, then we've lost,
> because the interface can no longer be sanely maintained - especially
> if driver authors eventually end up not caring about their pile of
> abuse they've created after they've moved on to new wonderful hardware.
>
> With an API such as the DMA API, where we have hundreds, if not a
> thousand users of it, this /really/ matters.
>
> We've been here before with the DMA API on older ARM platforms, where
> we've had people abusing the API or going beneath the API because "it
> does what they need it to", which then makes stuff much harder to change
> at architecture level.
>
> Last time it happened, it was when ARMv6 came along and ARM moved away
> from VIVT caches.  The options were either to break the crap drivers
> and support ARMv6+ CPUs, or keep the crap drivers working and not
> support DMA in any shape or form on ARMv6+.
>
> Obviously, this was too important to for one or two abusive drivers to
> block, so I changed the architecture level /anyway/ and just said screw
> the drivers which end up being broken by their short-sightedness, they
> can either rot or someone else can fix them.
>
> I have no beef for intentionally breaking stuff when people abuse well
> defined interfaces and/or refuse to discuss their requirements when
> interfaces don't quite do what they want - or worse, refuse to listen
> to objections.
>
> As I say, it's disrespectful to your fellow kernel developers to abuse
> well defined interfaces.

Apologies if I sounded that way - I wasn't suggesting that we carry on
with this clearly illegal usage of the DMA API, but was merely noting
that if we were to implement the intended behavior on our side it
would look just like what the ARM implementation currently does. My
question about alternatives wasn't rhetorical, I am not so familiar
with this part of the system (as should be obvious by now) and would
like to know whether there aren't other solutions that would spare us
the need to re-implement something that already exists.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Nouveau] [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
  2014-06-24 12:27                   ` Maarten Lankhorst
  (?)
@ 2014-06-24 13:25                     ` Lucas Stach
  -1 siblings, 0 replies; 63+ messages in thread
From: Lucas Stach @ 2014-06-24 13:25 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: Alexandre Courbot, Russell King - ARM Linux, nouveau,
	linux-kernel, dri-devel, Alexandre Courbot, Ben Skeggs,
	linux-tegra, linux-arm-kernel

Am Dienstag, den 24.06.2014, 14:27 +0200 schrieb Maarten Lankhorst:
> op 24-06-14 14:23, Alexandre Courbot schreef:
> > On Tue, Jun 24, 2014 at 7:55 PM, Alexandre Courbot <acourbot@nvidia.com> wrote:
> >> On 06/24/2014 07:33 PM, Alexandre Courbot wrote:
> >>> On 06/24/2014 07:02 PM, Russell King - ARM Linux wrote:
> >>>> On Tue, Jun 24, 2014 at 06:54:26PM +0900, Alexandre Courbot wrote:
> >>>>> From: Lucas Stach <dev@lynxeye.de>
> >>>>>
> >>>>> On architectures for which access to GPU memory is non-coherent,
> >>>>> caches need to be flushed and invalidated explicitly at the
> >>>>> appropriate places. Introduce two small helpers to make things
> >>>>> easy for TTM-based drivers.
> >>>>
> >>>> Have you run this with DMA API debugging enabled?  I suspect you haven't,
> >>>> and I recommend that you do.
> >>>
> >>> # cat /sys/kernel/debug/dma-api/error_count
> >>> 162621
> >>>
> >>> (╯°□°)╯︵ ┻━┻)
> >>
> >> *puts table back on its feet*
> >>
> >> So, yeah - TTM memory is not allocated using the DMA API, hence we cannot
> >> use the DMA API to sync it. Thanks Russell for pointing it out.
> >>
> >> The only alternative I see here is to flush the CPU caches when syncing for
> >> the device, and invalidate them for the other direction. Of course if the
> >> device has caches on its side as well the opposite operation must also be
> >> done for it. Guess the only way is to handle it all by ourselves here. :/
> > ... and it really sucks. Basically if we cannot use the DMA API here
> > we will lose the convenience of having a portable API that does just
> > the right thing for the underlying platform. Without it we would have
> > to duplicate arm_iommu_sync_single_for_cpu/device() and we would only
> > have support for ARM.
> >
> > The usage of the DMA API that we are doing might be illegal, but in
> > essence it does exactly what we need - at least for ARM. What are the
> > alternatives?
> Convert TTM to use the dma api? :-)

Actually TTM already has a page alloc backend using the DMA API. It's
just not used for the standard case right now.

I would argue that we should just use this page allocator (which has the
side effect of getting pages from CMA if available -> you are actually
free to change the caching) and do away with the other allocator in the
ARM case.

Regards,
Lucas
-- 
Pengutronix e.K.             | Lucas Stach                 |
Industrial Linux Solutions   | http://www.pengutronix.de/  |


^ permalink raw reply	[flat|nested] 63+ messages in thread
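
The DMA-API-backed allocator mentioned here lives in ttm's
ttm_page_alloc_dma.c and is opted into from a driver's ttm_tt
populate/unpopulate callbacks. A rough sketch of such a callback, loosely
modelled on how drivers select the DMA pool (my_device_from_ttm and
use_dma_pool are stand-ins, not real symbols):

#include <drm/ttm/ttm_bo_driver.h>
#include <drm/ttm/ttm_page_alloc.h>

/*
 * Rough sketch: populate a TTM backing store through the DMA-API page
 * pool instead of the plain one. ttm_dma_populate() allocates the pages
 * via the DMA API and records their bus addresses in dma_address[].
 */
static int my_ttm_tt_populate(struct ttm_tt *ttm)
{
	struct ttm_dma_tt *ttm_dma = container_of(ttm, struct ttm_dma_tt, ttm);
	struct device *dev = my_device_from_ttm(ttm);	/* stand-in accessor */

	if (use_dma_pool)	/* e.g. when swiotlb is active, or on ARM */
		return ttm_dma_populate(ttm_dma, dev);

	return ttm_pool_populate(ttm);
}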

* Re: [Nouveau] [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
@ 2014-06-24 13:52                         ` Alexandre Courbot
  0 siblings, 0 replies; 63+ messages in thread
From: Alexandre Courbot @ 2014-06-24 13:52 UTC (permalink / raw)
  To: Lucas Stach
  Cc: Maarten Lankhorst, Alexandre Courbot, Russell King - ARM Linux,
	nouveau, linux-kernel, dri-devel, Ben Skeggs, linux-tegra,
	linux-arm-kernel

On Tue, Jun 24, 2014 at 10:25 PM, Lucas Stach <l.stach@pengutronix.de> wrote:
> Am Dienstag, den 24.06.2014, 14:27 +0200 schrieb Maarten Lankhorst:
>> op 24-06-14 14:23, Alexandre Courbot schreef:
>> > On Tue, Jun 24, 2014 at 7:55 PM, Alexandre Courbot <acourbot@nvidia.com> wrote:
>> >> On 06/24/2014 07:33 PM, Alexandre Courbot wrote:
>> >>> On 06/24/2014 07:02 PM, Russell King - ARM Linux wrote:
>> >>>> On Tue, Jun 24, 2014 at 06:54:26PM +0900, Alexandre Courbot wrote:
>> >>>>> From: Lucas Stach <dev@lynxeye.de>
>> >>>>>
>> >>>>> On architectures for which access to GPU memory is non-coherent,
>> >>>>> caches need to be flushed and invalidated explicitly at the
>> >>>>> appropriate places. Introduce two small helpers to make things
>> >>>>> easy for TTM-based drivers.
>> >>>>
>> >>>> Have you run this with DMA API debugging enabled?  I suspect you haven't,
>> >>>> and I recommend that you do.
>> >>>
>> >>> # cat /sys/kernel/debug/dma-api/error_count
>> >>> 162621
>> >>>
>> >>> (╯°□°)╯︵ ┻━┻)
>> >>
>> >> *puts table back on its feet*
>> >>
>> >> So, yeah - TTM memory is not allocated using the DMA API, hence we cannot
>> >> use the DMA API to sync it. Thanks Russell for pointing it out.
>> >>
>> >> The only alternative I see here is to flush the CPU caches when syncing for
>> >> the device, and invalidate them for the other direction. Of course if the
>> >> device has caches on its side as well the opposite operation must also be
>> >> done for it. Guess the only way is to handle it all by ourselves here. :/
>> > ... and it really sucks. Basically if we cannot use the DMA API here
>> > we will lose the convenience of having a portable API that does just
>> > the right thing for the underlying platform. Without it we would have
>> > to duplicate arm_iommu_sync_single_for_cpu/device() and we would only
>> > have support for ARM.
>> >
>> > The usage of the DMA API that we are doing might be illegal, but in
>> > essence it does exactly what we need - at least for ARM. What are the
>> > alternatives?
>> Convert TTM to use the dma api? :-)
>
> Actually TTM already has a page alloc backend using the DMA API. It's
> just not used for the standard case right now.

Indeed, and Nouveau even already makes use of it if CONFIG_SWIOTLB is
set apparently.

> I would argue that we should just use this page allocator (which has the
> side effect of getting pages from CMA if available -> you are actually
> free to change the caching) and do away with the other allocator in the
> ARM case.

Mm? Does it mean that CMA memory is not mapped into lowmem? That would
certainly help in the present case, but I wonder how useful it will be
once the iommu support is in place. Will also need to consider
performance of such coherent memory for e.g. user-space mappings.

Anyway, I will experiment a bit with this tomorrow, thanks!

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Nouveau] [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
  2014-06-24 13:52                         ` Alexandre Courbot
  (?)
@ 2014-06-24 13:58                           ` Lucas Stach
  -1 siblings, 0 replies; 63+ messages in thread
From: Lucas Stach @ 2014-06-24 13:58 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Russell King - ARM Linux, nouveau, linux-kernel, dri-devel,
	linux-tegra, Alexandre Courbot, Ben Skeggs, Maarten Lankhorst,
	linux-arm-kernel

Am Dienstag, den 24.06.2014, 22:52 +0900 schrieb Alexandre Courbot:
> On Tue, Jun 24, 2014 at 10:25 PM, Lucas Stach <l.stach@pengutronix.de> wrote:
> > Am Dienstag, den 24.06.2014, 14:27 +0200 schrieb Maarten Lankhorst:
> >> op 24-06-14 14:23, Alexandre Courbot schreef:
> >> > On Tue, Jun 24, 2014 at 7:55 PM, Alexandre Courbot <acourbot@nvidia.com> wrote:
> >> >> On 06/24/2014 07:33 PM, Alexandre Courbot wrote:
> >> >>> On 06/24/2014 07:02 PM, Russell King - ARM Linux wrote:
> >> >>>> On Tue, Jun 24, 2014 at 06:54:26PM +0900, Alexandre Courbot wrote:
> >> >>>>> From: Lucas Stach <dev@lynxeye.de>
> >> >>>>>
> >> >>>>> On architectures for which access to GPU memory is non-coherent,
> >> >>>>> caches need to be flushed and invalidated explicitly at the
> >> >>>>> appropriate places. Introduce two small helpers to make things
> >> >>>>> easy for TTM-based drivers.
> >> >>>>
> >> >>>> Have you run this with DMA API debugging enabled?  I suspect you haven't,
> >> >>>> and I recommend that you do.
> >> >>>
> >> >>> # cat /sys/kernel/debug/dma-api/error_count
> >> >>> 162621
> >> >>>
> >> >>> (╯°□°)╯︵ ┻━┻)
> >> >>
> >> >> *puts table back on its feet*
> >> >>
> >> >> So, yeah - TTM memory is not allocated using the DMA API, hence we cannot
> >> >> use the DMA API to sync it. Thanks Russell for pointing it out.
> >> >>
> >> >> The only alternative I see here is to flush the CPU caches when syncing for
> >> >> the device, and invalidate them for the other direction. Of course if the
> >> >> device has caches on its side as well the opposite operation must also be
> >> >> done for it. Guess the only way is to handle it all by ourselves here. :/
> >> > ... and it really sucks. Basically if we cannot use the DMA API here
> >> > we will lose the convenience of having a portable API that does just
> >> > the right thing for the underlying platform. Without it we would have
> >> > to duplicate arm_iommu_sync_single_for_cpu/device() and we would only
> >> > have support for ARM.
> >> >
> >> > The usage of the DMA API that we are doing might be illegal, but in
> >> > essence it does exactly what we need - at least for ARM. What are the
> >> > alternatives?
> >> Convert TTM to use the dma api? :-)
> >
> > Actually TTM already has a page alloc backend using the DMA API. It's
> > just not used for the standard case right now.
> 
> Indeed, and Nouveau even already makes use of it if CONFIG_SWIOTLB is
> set apparently.
> 
> > I would argue that we should just use this page allocator (which has the
> > side effect of getting pages from CMA if available -> you are actually
> > free to change the caching) and do away with the other allocator in the
> > ARM case.
> 
> Mm? Does it mean that CMA memory is not mapped into lowmem? That would
> certainly help in the present case, but I wonder how useful it will be
> once the iommu support is in place. Will also need to consider
> performance of such coherent memory for e.g. user-space mappings.
> 
> Anyway, I will experiment a bit with this tomorrow, thanks!

CMA memory is reserved before the lowmem section mapping is set up. It
is then mapped with individual 4k pages before giving it back to the
buddy allocator.
This means CMA pages in use by the kernel are mapped into lowmem, but
they are actually unmapped from lowmem once you allocate them as DMA
memory.

Regards,
Lucas

-- 
Pengutronix e.K.             | Lucas Stach                 |
Industrial Linux Solutions   | http://www.pengutronix.de/  |


^ permalink raw reply	[flat|nested] 63+ messages in thread
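
To illustrate the point about CMA-backed allocations: memory obtained
through the coherent DMA API gets kernel-side attributes chosen by the
platform code, and the same pages can be handed to user space with matching
attributes via dma_mmap_coherent(), which is the user-space mapping concern
raised above. A minimal sketch under those assumptions (my_buf and the
helpers are illustrative only):

#include <linux/dma-mapping.h>
#include <linux/errno.h>
#include <linux/mm.h>

struct my_buf {
	void		*cpu_addr;	/* kernel mapping set up by the DMA API */
	dma_addr_t	 dma_handle;	/* device-visible address */
	size_t		 size;
};

/* Allocate from the coherent pool (CMA-backed when CONFIG_DMA_CMA is set). */
static int my_buf_alloc(struct device *dev, struct my_buf *buf, size_t size)
{
	buf->size = size;
	buf->cpu_addr = dma_alloc_coherent(dev, size, &buf->dma_handle,
					   GFP_KERNEL);
	return buf->cpu_addr ? 0 : -ENOMEM;
}

/*
 * Expose the same pages to user space with attributes matching the kernel
 * side, so all CPU mappings of the buffer stay consistent.
 */
static int my_buf_mmap(struct device *dev, struct my_buf *buf,
		       struct vm_area_struct *vma)
{
	return dma_mmap_coherent(dev, vma, buf->cpu_addr, buf->dma_handle,
				 buf->size);
}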

* [Nouveau] [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
@ 2014-06-24 13:58                           ` Lucas Stach
  0 siblings, 0 replies; 63+ messages in thread
From: Lucas Stach @ 2014-06-24 13:58 UTC (permalink / raw)
  To: linux-arm-kernel

Am Dienstag, den 24.06.2014, 22:52 +0900 schrieb Alexandre Courbot:
> On Tue, Jun 24, 2014 at 10:25 PM, Lucas Stach <l.stach@pengutronix.de> wrote:
> > Am Dienstag, den 24.06.2014, 14:27 +0200 schrieb Maarten Lankhorst:
> >> op 24-06-14 14:23, Alexandre Courbot schreef:
> >> > On Tue, Jun 24, 2014 at 7:55 PM, Alexandre Courbot <acourbot@nvidia.com> wrote:
> >> >> On 06/24/2014 07:33 PM, Alexandre Courbot wrote:
> >> >>> On 06/24/2014 07:02 PM, Russell King - ARM Linux wrote:
> >> >>>> On Tue, Jun 24, 2014 at 06:54:26PM +0900, Alexandre Courbot wrote:
> >> >>>>> From: Lucas Stach <dev@lynxeye.de>
> >> >>>>>
> >> >>>>> On architectures for which access to GPU memory is non-coherent,
> >> >>>>> caches need to be flushed and invalidated explicitly at the
> >> >>>>> appropriate places. Introduce two small helpers to make things
> >> >>>>> easy for TTM-based drivers.
> >> >>>>
> >> >>>> Have you run this with DMA API debugging enabled?  I suspect you haven't,
> >> >>>> and I recommend that you do.
> >> >>>
> >> >>> # cat /sys/kernel/debug/dma-api/error_count
> >> >>> 162621
> >> >>>
> >> >>> (??????? ???)
> >> >>
> >> >> *puts table back on its feet*
> >> >>
> >> >> So, yeah - TTM memory is not allocated using the DMA API, hence we cannot
> >> >> use the DMA API to sync it. Thanks Russell for pointing it out.
> >> >>
> >> >> The only alternative I see here is to flush the CPU caches when syncing for
> >> >> the device, and invalidate them for the other direction. Of course if the
> >> >> device has caches on its side as well the opposite operation must also be
> >> >> done for it. Guess the only way is to handle it all by ourselves here. :/
> >> > ... and it really sucks. Basically if we cannot use the DMA API here
> >> > we will lose the convenience of having a portable API that does just
> >> > the right thing for the underlying platform. Without it we would have
> >> > to duplicate arm_iommu_sync_single_for_cpu/device() and we would only
> >> > have support for ARM.
> >> >
> >> > The usage of the DMA API that we are doing might be illegal, but in
> >> > essence it does exactly what we need - at least for ARM. What are the
> >> > alternatives?
> >> Convert TTM to use the dma api? :-)
> >
> > Actually TTM already has a page alloc backend using the DMA API. It's
> > just not used for the standard case right now.
> 
> Indeed, and Nouveau even already makes use of it if CONFIG_SWIOTLB is
> set apparently.
> 
> > I would argue that we should just use this page allocator (which has the
> > side effect of getting pages from CMA if available -> you are actually
> > free to change the caching) and do away with the other allocator in the
> > ARM case.
> 
> Mm? Does it mean that CMA memory is not mapped into lowmem? That would
> certainly help in the present case, but I wonder how useful it will be
> once the iommu support is in place. Will also need to consider
> performance of such coherent memory for e.g. user-space mappings.
> 
> Anyway, I will experiment a bit with this tomorrow, thanks!

CMA memory is reserved before the lowmem section mapping is set up. It
is then mapped with individual 4k pages before giving it back to the
buddy allocator.
This means CMA pages in use by the kernel are mapped into lowmem, but
they are actually unmapped from lowmem once you allocate them as DMA
memory.

Regards,
Lucas

-- 
Pengutronix e.K.             | Lucas Stach                 |
Industrial Linux Solutions   | http://www.pengutronix.de/  |

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
  2014-06-24 13:58                           ` Lucas Stach
@ 2014-06-24 14:03                               ` Alexandre Courbot
  -1 siblings, 0 replies; 63+ messages in thread
From: Alexandre Courbot @ 2014-06-24 14:03 UTC (permalink / raw)
  To: Lucas Stach
  Cc: Russell King - ARM Linux,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA, Ben Skeggs,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Tue, Jun 24, 2014 at 10:58 PM, Lucas Stach <l.stach@pengutronix.de> wrote:
> Am Dienstag, den 24.06.2014, 22:52 +0900 schrieb Alexandre Courbot:
>> On Tue, Jun 24, 2014 at 10:25 PM, Lucas Stach <l.stach@pengutronix.de> wrote:
>> > Am Dienstag, den 24.06.2014, 14:27 +0200 schrieb Maarten Lankhorst:
>> >> op 24-06-14 14:23, Alexandre Courbot schreef:
>> >> > On Tue, Jun 24, 2014 at 7:55 PM, Alexandre Courbot <acourbot@nvidia.com> wrote:
>> >> >> On 06/24/2014 07:33 PM, Alexandre Courbot wrote:
>> >> >>> On 06/24/2014 07:02 PM, Russell King - ARM Linux wrote:
>> >> >>>> On Tue, Jun 24, 2014 at 06:54:26PM +0900, Alexandre Courbot wrote:
>> >> >>>>> From: Lucas Stach <dev@lynxeye.de>
>> >> >>>>>
>> >> >>>>> On architectures for which access to GPU memory is non-coherent,
>> >> >>>>> caches need to be flushed and invalidated explicitly at the
>> >> >>>>> appropriate places. Introduce two small helpers to make things
>> >> >>>>> easy for TTM-based drivers.
>> >> >>>>
>> >> >>>> Have you run this with DMA API debugging enabled?  I suspect you haven't,
>> >> >>>> and I recommend that you do.
>> >> >>>
>> >> >>> # cat /sys/kernel/debug/dma-api/error_count
>> >> >>> 162621
>> >> >>>
>> >> >>> (╯°□°)╯︵ ┻━┻)
>> >> >>
>> >> >> *puts table back on its feet*
>> >> >>
>> >> >> So, yeah - TTM memory is not allocated using the DMA API, hence we cannot
>> >> >> use the DMA API to sync it. Thanks Russell for pointing it out.
>> >> >>
>> >> >> The only alternative I see here is to flush the CPU caches when syncing for
>> >> >> the device, and invalidate them for the other direction. Of course if the
>> >> >> device has caches on its side as well the opposite operation must also be
>> >> >> done for it. Guess the only way is to handle it all by ourselves here. :/
>> >> > ... and it really sucks. Basically if we cannot use the DMA API here
>> >> > we will lose the convenience of having a portable API that does just
>> >> > the right thing for the underlying platform. Without it we would have
>> >> > to duplicate arm_iommu_sync_single_for_cpu/device() and we would only
>> >> > have support for ARM.
>> >> >
>> >> > The usage of the DMA API that we are doing might be illegal, but in
>> >> > essence it does exactly what we need - at least for ARM. What are the
>> >> > alternatives?
>> >> Convert TTM to use the dma api? :-)
>> >
>> > Actually TTM already has a page alloc backend using the DMA API. It's
>> > just not used for the standard case right now.
>>
>> Indeed, and Nouveau even already makes use of it if CONFIG_SWIOTLB is
>> set apparently.
>>
>> > I would argue that we should just use this page allocator (which has the
>> > side effect of getting pages from CMA if available -> you are actually
>> > free to change the caching) and do away with the other allocator in the
>> > ARM case.
>>
>> Mm? Does it mean that CMA memory is not mapped into lowmem? That would
>> certainly help in the present case, but I wonder how useful it will be
>> once the iommu support is in place. Will also need to consider
>> performance of such coherent memory for e.g. user-space mappings.
>>
>> Anyway, I will experiment a bit with this tomorrow, thanks!
>
> CMA memory is reserved before the lowmem section mapping is set up. It
> is then mapped with individual 4k pages before giving it back to the
> buddy allocator.
> This means CMA pages in use by the kernel are mapped into lowmem, but
> they are actually unmapped from lowmem once you allocate them as DMA
> memory.

Thanks for the explanation. I really need to spend more time studying
the DMA allocator. I wonder if all this is already explained somewhere
in Documentation/ ?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
  2014-06-24 13:25                     ` Lucas Stach
@ 2014-06-25  4:00                         ` Stéphane Marchesin
  -1 siblings, 0 replies; 63+ messages in thread
From: Stéphane Marchesin @ 2014-06-25  4:00 UTC (permalink / raw)
  To: Lucas Stach
  Cc: Russell King - ARM Linux,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA, Ben Skeggs,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Tue, Jun 24, 2014 at 6:25 AM, Lucas Stach <l.stach@pengutronix.de> wrote:
> Am Dienstag, den 24.06.2014, 14:27 +0200 schrieb Maarten Lankhorst:
>> op 24-06-14 14:23, Alexandre Courbot schreef:
>> > On Tue, Jun 24, 2014 at 7:55 PM, Alexandre Courbot <acourbot@nvidia.com> wrote:
>> >> On 06/24/2014 07:33 PM, Alexandre Courbot wrote:
>> >>> On 06/24/2014 07:02 PM, Russell King - ARM Linux wrote:
>> >>>> On Tue, Jun 24, 2014 at 06:54:26PM +0900, Alexandre Courbot wrote:
>> >>>>> From: Lucas Stach <dev@lynxeye.de>
>> >>>>>
>> >>>>> On architectures for which access to GPU memory is non-coherent,
>> >>>>> caches need to be flushed and invalidated explicitly at the
>> >>>>> appropriate places. Introduce two small helpers to make things
>> >>>>> easy for TTM-based drivers.
>> >>>>
>> >>>> Have you run this with DMA API debugging enabled?  I suspect you haven't,
>> >>>> and I recommend that you do.
>> >>>
>> >>> # cat /sys/kernel/debug/dma-api/error_count
>> >>> 162621
>> >>>
>> >>> (╯°□°)╯︵ ┻━┻)
>> >>
>> >> *puts table back on its feet*
>> >>
>> >> So, yeah - TTM memory is not allocated using the DMA API, hence we cannot
>> >> use the DMA API to sync it. Thanks Russell for pointing it out.
>> >>
>> >> The only alternative I see here is to flush the CPU caches when syncing for
>> >> the device, and invalidate them for the other direction. Of course if the
>> >> device has caches on its side as well the opposite operation must also be
>> >> done for it. Guess the only way is to handle it all by ourselves here. :/
>> > ... and it really sucks. Basically if we cannot use the DMA API here
>> > we will lose the convenience of having a portable API that does just
>> > the right thing for the underlying platform. Without it we would have
>> > to duplicate arm_iommu_sync_single_for_cpu/device() and we would only
>> > have support for ARM.
>> >
>> > The usage of the DMA API that we are doing might be illegal, but in
>> > essence it does exactly what we need - at least for ARM. What are the
>> > alternatives?
>> Convert TTM to use the dma api? :-)
>
> Actually TTM already has a page alloc backend using the DMA API. It's
> just not used for the standard case right now.
>
> I would argue that we should just use this page allocator (which has the
> side effect of getting pages from CMA if available -> you are actually
> free to change the caching) and do away with the other allocator in the
> ARM case.

CMA comes with its own set of (severe) limitations though: in
particular, it's not possible to map arbitrary CPU pages into the GPU
without incurring a copy, and it adds arbitrary memory limits, etc.
Overall that's not really a good pick for the long term...

Stéphane

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Nouveau] [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
  2014-06-24 13:58                           ` Lucas Stach
@ 2014-06-26 14:50                               ` Alexandre Courbot
  -1 siblings, 0 replies; 63+ messages in thread
From: Alexandre Courbot @ 2014-06-26 14:50 UTC (permalink / raw)
  To: Lucas Stach
  Cc: Maarten Lankhorst, Alexandre Courbot, Russell King - ARM Linux,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Ben Skeggs,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Tue, Jun 24, 2014 at 10:58 PM, Lucas Stach <l.stach-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org> wrote:
> Am Dienstag, den 24.06.2014, 22:52 +0900 schrieb Alexandre Courbot:
>> On Tue, Jun 24, 2014 at 10:25 PM, Lucas Stach <l.stach@pengutronix.de> wrote:
>> > Am Dienstag, den 24.06.2014, 14:27 +0200 schrieb Maarten Lankhorst:
>> >> op 24-06-14 14:23, Alexandre Courbot schreef:
>> >> > On Tue, Jun 24, 2014 at 7:55 PM, Alexandre Courbot <acourbot@nvidia.com> wrote:
>> >> >> On 06/24/2014 07:33 PM, Alexandre Courbot wrote:
>> >> >>> On 06/24/2014 07:02 PM, Russell King - ARM Linux wrote:
>> >> >>>> On Tue, Jun 24, 2014 at 06:54:26PM +0900, Alexandre Courbot wrote:
>> >> >>>>> From: Lucas Stach <dev-8ppwABl0HbeELgA04lAiVw@public.gmane.org>
>> >> >>>>>
>> >> >>>>> On architectures for which access to GPU memory is non-coherent,
>> >> >>>>> caches need to be flushed and invalidated explicitly at the
>> >> >>>>> appropriate places. Introduce two small helpers to make things
>> >> >>>>> easy for TTM-based drivers.
>> >> >>>>
>> >> >>>> Have you run this with DMA API debugging enabled?  I suspect you haven't,
>> >> >>>> and I recommend that you do.
>> >> >>>
>> >> >>> # cat /sys/kernel/debug/dma-api/error_count
>> >> >>> 162621
>> >> >>>
>> >> >>> (╯°□°)╯︵ ┻━┻)
>> >> >>
>> >> >> *puts table back on its feet*
>> >> >>
>> >> >> So, yeah - TTM memory is not allocated using the DMA API, hence we cannot
>> >> >> use the DMA API to sync it. Thanks Russell for pointing it out.
>> >> >>
>> >> >> The only alternative I see here is to flush the CPU caches when syncing for
>> >> >> the device, and invalidate them for the other direction. Of course if the
>> >> >> device has caches on its side as well the opposite operation must also be
>> >> >> done for it. Guess the only way is to handle it all by ourselves here. :/
>> >> > ... and it really sucks. Basically if we cannot use the DMA API here
>> >> > we will lose the convenience of having a portable API that does just
>> >> > the right thing for the underlying platform. Without it we would have
>> >> > to duplicate arm_iommu_sync_single_for_cpu/device() and we would only
>> >> > have support for ARM.
>> >> >
>> >> > The usage of the DMA API that we are doing might be illegal, but in
>> >> > essence it does exactly what we need - at least for ARM. What are the
>> >> > alternatives?
>> >> Convert TTM to use the dma api? :-)
>> >
>> > Actually TTM already has a page alloc backend using the DMA API. It's
>> > just not used for the standard case right now.
>>
>> Indeed, and Nouveau even already makes use of it if CONFIG_SWIOTLB is
>> set apparently.
>>
>> > I would argue that we should just use this page allocator (which has the
>> > side effect of getting pages from CMA if available -> you are actually
>> > free to change the caching) and do away with the other allocator in the
>> > ARM case.
>>
>> Mm? Does it mean that CMA memory is not mapped into lowmem? That would
>> certainly help in the present case, but I wonder how useful it will be
>> once the iommu support is in place. Will also need to consider
>> performance of such coherent memory for e.g. user-space mappings.
>>
>> Anyway, I will experiment a bit with this tomorrow, thanks!
>
> CMA memory is reserved before the lowmem section mapping is set up. It
> is then mapped with individual 4k pages before giving it back to the
> buddy allocator.
> This means CMA pages in use by the kernel are mapped into lowmem, but
> they are actually unmapped from lowmem once you allocate them as DMA
> memory.

Tried enabling the DMA page allocation for GK20A. The great news is
that with it, caching works as expected and DMA API debugging no longer
complains when the sync functions are called. Actually, since the DMA
page allocator returns coherent memory, there is no need for these sync
functions anymore, which makes things easier. This seems to be the
simplest way towards enabling GK20A, even though performance suffers a
little; we can revisit that later once we have IOMMU support.
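
For reference, a rough sketch of the explicit maintenance that coherent
memory lets us skip - only a hypothetical illustration, and only legal
because the page would go through the streaming DMA API first:

    #include <linux/dma-mapping.h>

    /* Flush CPU caches so the GPU sees what the CPU wrote to the page. */
    static dma_addr_t hand_page_to_gpu(struct device *dev, struct page *page)
    {
            return dma_map_page(dev, page, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
    }

    /* While the mapping is live, ownership ping-pongs via the sync helpers. */
    static void gpu_is_done_cpu_will_read(struct device *dev, dma_addr_t addr)
    {
            dma_sync_single_for_cpu(dev, addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
    }

    static void cpu_is_done_gpu_will_read(struct device *dev, dma_addr_t addr)
    {
            dma_sync_single_for_device(dev, addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
    }

    /* Invalidate CPU caches so the CPU sees what the GPU wrote. */
    static void take_page_back_from_gpu(struct device *dev, dma_addr_t addr)
    {
            dma_unmap_page(dev, addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
    }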

I would not claim that we are fully compliant with what the DMA API
expects though. For instance, pages allocated for TTM via
dma_alloc_coherent() are later re-mapped in ttm_bo_kmap_ttm() using
vmap(), potentially with a different pgprot. The waste of address
space produced by these two simultaneous mappings aside, is this even
allowed? And when it comes to mapping these pages to user-space, TTM
does it without calling dma_mmap_coherent(), and again with whatever
flags we give it. IIUC memory allocated through the DMA API should
only be touched through the vaddr returned by dma_alloc_coherent(), or
through mappings provided by other DMA API functions. So is TTM
misusing the DMA API here?
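
(For comparison, a sketch of what a DMA-API-sanctioned user-space mapping
of such a buffer could look like - the function name is made up and this
is not what TTM does today:)

    #include <linux/dma-mapping.h>
    #include <linux/mm.h>

    /*
     * Let the DMA layer build the user-space mapping with attributes that
     * match the kernel-side coherent mapping, instead of inserting PTEs by
     * hand with an arbitrary pgprot.
     */
    static int gpu_bo_mmap_coherent(struct device *dev,
                                    struct vm_area_struct *vma,
                                    void *cpu_addr, dma_addr_t dma_addr,
                                    size_t size)
    {
            return dma_mmap_coherent(dev, vma, cpu_addr, dma_addr, size);
    }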

In general, should we still be afraid of non-identical mappings on
modern CPUs like the A15 found in Tegra K1? I have heard contradictory
information so far and would really like to be able to understand this
once and for all, as it would give us more choices as to which memory
provider we can use.

Thanks,
Alex.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
  2014-06-25  4:00                         ` Stéphane Marchesin
@ 2014-06-26 14:53                             ` Alexandre Courbot
  -1 siblings, 0 replies; 63+ messages in thread
From: Alexandre Courbot @ 2014-06-26 14:53 UTC (permalink / raw)
  To: Stéphane Marchesin
  Cc: Russell King - ARM Linux,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA, Ben Skeggs,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Lucas Stach

On Wed, Jun 25, 2014 at 1:00 PM, Stéphane Marchesin
<stephane.marchesin@gmail.com> wrote:
> On Tue, Jun 24, 2014 at 6:25 AM, Lucas Stach <l.stach@pengutronix.de> wrote:
>> Am Dienstag, den 24.06.2014, 14:27 +0200 schrieb Maarten Lankhorst:
>>> op 24-06-14 14:23, Alexandre Courbot schreef:
>>> > On Tue, Jun 24, 2014 at 7:55 PM, Alexandre Courbot <acourbot@nvidia.com> wrote:
>>> >> On 06/24/2014 07:33 PM, Alexandre Courbot wrote:
>>> >>> On 06/24/2014 07:02 PM, Russell King - ARM Linux wrote:
>>> >>>> On Tue, Jun 24, 2014 at 06:54:26PM +0900, Alexandre Courbot wrote:
>>> >>>>> From: Lucas Stach <dev@lynxeye.de>
>>> >>>>>
>>> >>>>> On architectures for which access to GPU memory is non-coherent,
>>> >>>>> caches need to be flushed and invalidated explicitly at the
>>> >>>>> appropriate places. Introduce two small helpers to make things
>>> >>>>> easy for TTM-based drivers.
>>> >>>>
>>> >>>> Have you run this with DMA API debugging enabled?  I suspect you haven't,
>>> >>>> and I recommend that you do.
>>> >>>
>>> >>> # cat /sys/kernel/debug/dma-api/error_count
>>> >>> 162621
>>> >>>
>>> >>> (╯°□°)╯︵ ┻━┻)
>>> >>
>>> >> *puts table back on its feet*
>>> >>
>>> >> So, yeah - TTM memory is not allocated using the DMA API, hence we cannot
>>> >> use the DMA API to sync it. Thanks Russell for pointing it out.
>>> >>
>>> >> The only alternative I see here is to flush the CPU caches when syncing for
>>> >> the device, and invalidate them for the other direction. Of course if the
>>> >> device has caches on its side as well the opposite operation must also be
>>> >> done for it. Guess the only way is to handle it all by ourselves here. :/
>>> > ... and it really sucks. Basically if we cannot use the DMA API here
>>> > we will lose the convenience of having a portable API that does just
>>> > the right thing for the underlying platform. Without it we would have
>>> > to duplicate arm_iommu_sync_single_for_cpu/device() and we would only
>>> > have support for ARM.
>>> >
>>> > The usage of the DMA API that we are doing might be illegal, but in
>>> > essence it does exactly what we need - at least for ARM. What are the
>>> > alternatives?
>>> Convert TTM to use the dma api? :-)
>>
>> Actually TTM already has a page alloc backend using the DMA API. It's
>> just not used for the standard case right now.
>>
>> I would argue that we should just use this page allocator (which has the
>> side effect of getting pages from CMA if available -> you are actually
>> free to change the caching) and do away with the other allocator in the
>> ARM case.
>
> CMA comes with its own set of (severe) limitations though, in
> particular it's not possible to map arbitrary CPU pages into the GPU
> without incurring a copy, you add arbitrary memory limits etc. Overall
> that's not really a good pick for the long term...

We don't plan to rely on CMA for too long. IOMMU support is on the way
and should make our life easier, although no matter the source of
memory, we will still have the issue of the lowmem mappings. So far it
sounds like CMA is the only way to "undo" them, so in the end it may
come down to whether or not the multi-mapping constraint applies to
TK1. I will tap into our internal sources of knowledge to try and
figure this one out.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
  2014-06-26 14:53                             ` Alexandre Courbot
@ 2014-06-26 16:10                                 ` Russell King - ARM Linux
  -1 siblings, 0 replies; 63+ messages in thread
From: Russell King - ARM Linux @ 2014-06-26 16:10 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Ben Skeggs,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Lucas Stach

On Thu, Jun 26, 2014 at 11:53:20PM +0900, Alexandre Courbot wrote:
> We don't plan to rely on CMA for too long. IOMMU support is on the way
> and should make our life easier, although no matter the source of
> memory, we will still have the issue of the lowmem mappings.

When it comes to DMA memory, talking about lowmem vs highmem is utterly
meaningless.

The lowmem/highmem split is entirely a software concept and is completely
adjustable.  An extreme example is that you can boot any platform with
more than 32MB of memory with 32MB of lowmem and the remainder as
highmem.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
  2014-06-26 16:10                                 ` Russell King - ARM Linux
@ 2014-06-26 23:17                                     ` Alexandre Courbot
  0 siblings, 0 replies; 63+ messages in thread
From: Alexandre Courbot @ 2014-06-26 23:17 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Ben Skeggs,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Lucas Stach

On Fri, Jun 27, 2014 at 1:10 AM, Russell King - ARM Linux
<linux-lFZ/pmaqli7XmaaqVzeoHQ@public.gmane.org> wrote:
> On Thu, Jun 26, 2014 at 11:53:20PM +0900, Alexandre Courbot wrote:
>> We don't plan to rely on CMA for too long. IOMMU support is on the way
>> and should make our life easier, although no matter the source of
>> memory, we will still have the issue of the lowmem mappings.
>
> When it comes to DMA memory, talking about lowmem vs highmem is utterly
> meaningless.
>
> The lowmem/highmem split is entirely a software concept and is completely
> adjustable.  An extreme example is that you can boot any platform with
> more than 32MB of memory with 32MB of lowmem and the remainder as
> highmem.

True, but isn't it also the case that all lowmem is already mapped in
the kernel address space, and that re-mapping this memory with
different cache settings (e.g. by creating a WC mapping for user-space
to write into) is undefined on ARM and must be avoided? That is the
issue I was referring to.
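
To make the aliasing concern concrete, here is a contrived sketch (not
code from this series) of the two mappings that end up coexisting when a
lowmem page is handed to user space write-combined:

#include <linux/mm.h>
#include <asm/pgtable.h>

/*
 * Illustrative only.  For a lowmem page, alias 1 (the kernel linear
 * mapping) is cacheable; alias 2 (the user-space mapping created here)
 * is write-combined.  Having both live at the same time is the
 * mismatched-attributes situation discussed above.
 */
static int example_mmap_page_wc(struct vm_area_struct *vma,
				struct page *page)
{
	void *cached_alias = page_address(page);	/* alias 1 */

	(void)cached_alias;

	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
	return remap_pfn_range(vma, vma->vm_start, page_to_pfn(page),
			       PAGE_SIZE, vma->vm_page_prot);	/* alias 2 */
}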

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Nouveau] [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
  2014-06-26 23:17                                     ` Alexandre Courbot
@ 2014-06-27 12:08                                       ` Rob Clark
  0 siblings, 0 replies; 63+ messages in thread
From: Rob Clark @ 2014-06-27 12:08 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Russell King - ARM Linux, nouveau, linux-kernel, dri-devel,
	Ben Skeggs, linux-tegra, linux-arm-kernel

On Thu, Jun 26, 2014 at 7:17 PM, Alexandre Courbot <gnurou@gmail.com> wrote:
> On Fri, Jun 27, 2014 at 1:10 AM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
>> On Thu, Jun 26, 2014 at 11:53:20PM +0900, Alexandre Courbot wrote:
>>> We don't plan to rely on CMA for too long. IOMMU support is on the way
>>> and should make our life easier, although no matter the source of
>>> memory, we will still have the issue of the lowmem mappings.
>>
>> When it comes to DMA memory, talking about lowmem vs highmem is utterly
>> meaningless.
>>
>> The lowmem/highmem split is entirely a software concept and is completely
>> adjustable.  An extreme example is that you can boot any platform with
>> more than 32MB of memory with 32MB of lowmem and the remainder as
>> highmem.
>
> True, but isn't it also the case that all lowmem is already mapped in
> the kernel address space, and that re-mapping this memory with
> different cache settings (e.g. by creating a WC mapping for user-space
> to write into) is undefined on ARM and must be avoided? That is the
> issue I was referring to.
>

DMA memory should be removed from the kernel linear map (if needed),
assuming it is allocated with the DMA APIs.
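
For illustration, this is roughly what allocating write-combined memory
through the DMA API looks like with the dma_attrs interface available at
the time of this thread (the wrapper name is made up):

#include <linux/dma-mapping.h>
#include <linux/dma-attrs.h>

/*
 * Illustrative only: the DMA API provides a mapping with attributes
 * that are consistent for the platform, so the driver does not remap
 * pages by hand.  DMA_ATTR_WRITE_COMBINE asks for a write-combined
 * variant.
 */
static void *example_alloc_wc(struct device *dev, size_t size,
			      dma_addr_t *dma_handle)
{
	DEFINE_DMA_ATTRS(attrs);

	dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs);
	return dma_alloc_attrs(dev, size, dma_handle, GFP_KERNEL, &attrs);
}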

Btw, something I've been wondering about for a little while but haven't
had time to investigate, and I'm not sure whether it applies to you as
well: it seems like I have IOMMUs which can be outer-coherent (snoop
L2), but I *think* they are not inner-coherent (L1).  No idea whether
the current DMA memory code can grok this and only do inner-cache ops.

BR,
-R

^ permalink raw reply	[flat|nested] 63+ messages in thread
