* RFC: unpinned DMA-buf exporting
From: Christian König @ 2018-03-09 19:11 UTC
  To: linaro-mm-sig, linux-media, dri-devel, amd-gfx; +Cc: sumit.semwal

This set of patches adds an optional invalidate_mappings callback to each DMA-buf attachment which can be filled in by the importer.

This callback allows the exporter to provide the DMA-buf content without pinning it. The reservation object's lock acts as the synchronization point for buffer moves and for creating mappings.

This set includes an implementation for amdgpu which should be rather easily portable to other DRM drivers.
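
To illustrate the importer side, usage would look roughly like the sketch
below (names like my_invalidate_cb and import_and_map are made up, error
handling is mostly omitted):

    /* Importer-side sketch: register the callback on the attachment and
     * create the mapping under the reservation object's lock.
     */
    static void my_invalidate_cb(struct dma_buf_attachment *attach)
    {
            /* Tear down the device mappings here; the old mapping stays
             * valid until the fences in the reservation object signal.
             */
    }

    static struct sg_table *import_and_map(struct dma_buf *buf,
                                           struct device *dev)
    {
            struct dma_buf_attachment *attach;
            struct sg_table *sgt;

            attach = dma_buf_attach(buf, dev);
            if (IS_ERR(attach))
                    return ERR_CAST(attach);
            attach->invalidate_mappings = my_invalidate_cb;

            reservation_object_lock(buf->resv, NULL);
            sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
            reservation_object_unlock(buf->resv);

            return sgt;
    }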

Please comment,
Christian.


* [PATCH 1/4] dma-buf: add optional invalidate_mappings callback
From: Christian König @ 2018-03-09 19:11 UTC
  To: linaro-mm-sig, linux-media, dri-devel, amd-gfx; +Cc: sumit.semwal

Each importer can now provide an invalidate_mappings callback.

This allows the exporter to provide the mappings without the need to pin
the backing store.
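
On the exporter side an invalidation then looks roughly like this (sketch
only, not part of this patch; the buffer move itself is driver specific):

    /* Exporter-side sketch: tell all importers to drop their mappings
     * before moving the backing store, under the reservation lock.
     */
    reservation_object_lock(dmabuf->resv, NULL);
    dma_buf_invalidate_mappings(dmabuf);
    /* ... move the buffer, add the move fence to dmabuf->resv ... */
    reservation_object_unlock(dmabuf->resv);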

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-buf.c | 25 +++++++++++++++++++++++++
 include/linux/dma-buf.h   | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 61 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index d78d5fc173dc..ed8d5844ae74 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -629,6 +629,9 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach,
 
 	might_sleep();
 
+	if (attach->invalidate_mappings)
+		reservation_object_assert_held(attach->dmabuf->resv);
+
 	if (WARN_ON(!attach || !attach->dmabuf))
 		return ERR_PTR(-EINVAL);
 
@@ -656,6 +659,9 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment *attach,
 {
 	might_sleep();
 
+	if (attach->invalidate_mappings)
+		reservation_object_assert_held(attach->dmabuf->resv);
+
 	if (WARN_ON(!attach || !attach->dmabuf || !sg_table))
 		return;
 
@@ -664,6 +670,25 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment *attach,
 }
 EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment);
 
+/**
+ * dma_buf_invalidate_mappings - invalidate all mappings of this dma_buf
+ *
+ * @dmabuf:	[in]	buffer which mappings should be invalidated
+ *
 + * Informs all attachments that they need to destroy and recreate all their
+ * mappings.
+ */
+void dma_buf_invalidate_mappings(struct dma_buf *dmabuf)
+{
+	struct dma_buf_attachment *attach;
+
+	reservation_object_assert_held(dmabuf->resv);
+
+	list_for_each_entry(attach, &dmabuf->attachments, node)
+		attach->invalidate_mappings(attach);
+}
+EXPORT_SYMBOL_GPL(dma_buf_invalidate_mappings);
+
 /**
  * DOC: cpu access
  *
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 085db2fee2d7..c1e2f7d93509 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -91,6 +91,18 @@ struct dma_buf_ops {
 	 */
 	void (*detach)(struct dma_buf *, struct dma_buf_attachment *);
 
+	/**
+	 * @supports_mapping_invalidation:
+	 *
 +	 * True for exporters which support unpinned DMA-buf operation using
+	 * the reservation lock.
+	 *
+	 * When attachment->invalidate_mappings is set the @map_dma_buf and
+	 * @unmap_dma_buf callbacks can be called with the reservation lock
+	 * held.
+	 */
+	bool supports_mapping_invalidation;
+
 	/**
 	 * @map_dma_buf:
 	 *
@@ -326,6 +338,29 @@ struct dma_buf_attachment {
 	struct device *dev;
 	struct list_head node;
 	void *priv;
+
+	/**
+	 * @invalidate_mappings:
+	 *
+	 * Optional callback provided by the importer of the attachment which
+	 * must be set before mappings are created.
+	 *
 +	 * If provided, the exporter can avoid pinning the backing store while
 +	 * mappings exist.
+	 *
+	 * The function is called with the lock of the reservation object
+	 * associated with the dma_buf held and the mapping function must be
+	 * called with this lock held as well. This makes sure that no mapping
+	 * is created concurrently with an ongoing invalidation.
+	 *
+	 * After the callback all existing mappings are still valid until all
 +	 * fences in the dma_buf's reservation object are signaled, but should be
+	 * destroyed by the importer as soon as possible.
+	 *
+	 * New mappings can be created immediately, but can't be used before the
 +	 * exclusive fence in the dma_buf's reservation object is signaled.
+	 */
+	void (*invalidate_mappings)(struct dma_buf_attachment *attach);
 };
 
 /**
@@ -391,6 +426,7 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *,
 					enum dma_data_direction);
 void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *,
 				enum dma_data_direction);
+void dma_buf_invalidate_mappings(struct dma_buf *dma_buf);
 int dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
 			     enum dma_data_direction dir);
 int dma_buf_end_cpu_access(struct dma_buf *dma_buf,
-- 
2.14.1


* [PATCH 2/4] drm/ttm: keep a reference to transfer pipelined BOs
From: Christian König @ 2018-03-09 19:11 UTC
  To: linaro-mm-sig, linux-media, dri-devel, amd-gfx; +Cc: sumit.semwal

Make sure the transferred BO is never destroyed before the transfer BO.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo_util.c | 50 +++++++++++++++++++++++----------------
 1 file changed, 30 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 1f730b3f18e5..1ee20558ee31 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -39,6 +39,11 @@
 #include <linux/module.h>
 #include <linux/reservation.h>
 
+struct ttm_transfer_obj {
+	struct ttm_buffer_object base;
+	struct ttm_buffer_object *bo;
+};
+
 void ttm_bo_free_old_node(struct ttm_buffer_object *bo)
 {
 	ttm_bo_mem_put(bo, &bo->mem);
@@ -435,7 +440,11 @@ EXPORT_SYMBOL(ttm_bo_move_memcpy);
 
 static void ttm_transfered_destroy(struct ttm_buffer_object *bo)
 {
-	kfree(bo);
+	struct ttm_transfer_obj *fbo;
+
+	fbo = container_of(bo, struct ttm_transfer_obj, base);
+	ttm_bo_unref(&fbo->bo);
+	kfree(fbo);
 }
 
 /**
@@ -456,14 +465,15 @@ static void ttm_transfered_destroy(struct ttm_buffer_object *bo)
 static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
 				      struct ttm_buffer_object **new_obj)
 {
-	struct ttm_buffer_object *fbo;
+	struct ttm_transfer_obj *fbo;
 	int ret;
 
 	fbo = kmalloc(sizeof(*fbo), GFP_KERNEL);
 	if (!fbo)
 		return -ENOMEM;
 
-	*fbo = *bo;
+	fbo->base = *bo;
+	fbo->bo = ttm_bo_reference(bo);
 
 	/**
 	 * Fix up members that we shouldn't copy directly:
@@ -471,25 +481,25 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
 	 */
 
 	atomic_inc(&bo->bdev->glob->bo_count);
-	INIT_LIST_HEAD(&fbo->ddestroy);
-	INIT_LIST_HEAD(&fbo->lru);
-	INIT_LIST_HEAD(&fbo->swap);
-	INIT_LIST_HEAD(&fbo->io_reserve_lru);
-	mutex_init(&fbo->wu_mutex);
-	fbo->moving = NULL;
-	drm_vma_node_reset(&fbo->vma_node);
-	atomic_set(&fbo->cpu_writers, 0);
-
-	kref_init(&fbo->list_kref);
-	kref_init(&fbo->kref);
-	fbo->destroy = &ttm_transfered_destroy;
-	fbo->acc_size = 0;
-	fbo->resv = &fbo->ttm_resv;
-	reservation_object_init(fbo->resv);
-	ret = reservation_object_trylock(fbo->resv);
+	INIT_LIST_HEAD(&fbo->base.ddestroy);
+	INIT_LIST_HEAD(&fbo->base.lru);
+	INIT_LIST_HEAD(&fbo->base.swap);
+	INIT_LIST_HEAD(&fbo->base.io_reserve_lru);
+	mutex_init(&fbo->base.wu_mutex);
+	fbo->base.moving = NULL;
+	drm_vma_node_reset(&fbo->base.vma_node);
+	atomic_set(&fbo->base.cpu_writers, 0);
+
+	kref_init(&fbo->base.list_kref);
+	kref_init(&fbo->base.kref);
+	fbo->base.destroy = &ttm_transfered_destroy;
+	fbo->base.acc_size = 0;
+	fbo->base.resv = &fbo->base.ttm_resv;
+	reservation_object_init(fbo->base.resv);
+	ret = reservation_object_trylock(fbo->base.resv);
 	WARN_ON(!ret);
 
-	*new_obj = fbo;
+	*new_obj = &fbo->base;
 	return 0;
 }
 
-- 
2.14.1


* [PATCH 3/4] drm/amdgpu: add independent DMA-buf export
From: Christian König @ 2018-03-09 19:11 UTC
  To: linaro-mm-sig, linux-media, dri-devel, amd-gfx; +Cc: sumit.semwal

Instead of relying on the DRM helpers, implement our own export
functions. This adds support for exporting DMA-bufs without pinning them.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |   4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c  | 117 ++++++++++++++++++-----------
 4 files changed, 79 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 96bcdb97e7e2..909dc9764a22 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -371,7 +371,6 @@ int amdgpu_gem_object_open(struct drm_gem_object *obj,
 void amdgpu_gem_object_close(struct drm_gem_object *obj,
 				struct drm_file *file_priv);
 unsigned long amdgpu_gem_timeout(uint64_t timeout_ns);
-struct sg_table *amdgpu_gem_prime_get_sg_table(struct drm_gem_object *obj);
 struct drm_gem_object *
 amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
 				 struct dma_buf_attachment *attach,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index e6709362994a..e32dcdf8d3db 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -886,7 +886,6 @@ static struct drm_driver kms_driver = {
 	.gem_prime_export = amdgpu_gem_prime_export,
 	.gem_prime_import = amdgpu_gem_prime_import,
 	.gem_prime_res_obj = amdgpu_gem_prime_res_obj,
-	.gem_prime_get_sg_table = amdgpu_gem_prime_get_sg_table,
 	.gem_prime_import_sg_table = amdgpu_gem_prime_import_sg_table,
 	.gem_prime_vmap = amdgpu_gem_prime_vmap,
 	.gem_prime_vunmap = amdgpu_gem_prime_vunmap,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 48e0115d4f76..d5db5955a70a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -31,6 +31,7 @@
  */
 #include <linux/list.h>
 #include <linux/slab.h>
+#include <linux/dma-buf.h>
 #include <drm/drmP.h>
 #include <drm/amdgpu_drm.h>
 #include <drm/drm_cache.h>
@@ -931,6 +932,9 @@ void amdgpu_bo_move_notify(struct ttm_buffer_object *bo,
 
 	amdgpu_bo_kunmap(abo);
 
+	if (abo->gem_base.dma_buf)
+		dma_buf_invalidate_mappings(abo->gem_base.dma_buf);
+
 	/* remember the eviction */
 	if (evict)
 		atomic64_inc(&adev->num_evictions);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 1c9991738477..f575fb46d7a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -32,13 +32,9 @@
 
 static const struct dma_buf_ops amdgpu_dmabuf_ops;
 
-struct sg_table *amdgpu_gem_prime_get_sg_table(struct drm_gem_object *obj)
-{
-	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
-	int npages = bo->tbo.num_pages;
-
-	return drm_prime_pages_to_sg(bo->tbo.ttm->pages, npages);
-}
+static void amdgpu_gem_unmap_dma_buf(struct dma_buf_attachment *attach,
+				     struct sg_table *sgt,
+				     enum dma_data_direction dir);
 
 void *amdgpu_gem_prime_vmap(struct drm_gem_object *obj)
 {
@@ -126,22 +122,21 @@ amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
 	return ERR_PTR(ret);
 }
 
-static int amdgpu_gem_map_attach(struct dma_buf *dma_buf,
-				 struct device *target_dev,
-				 struct dma_buf_attachment *attach)
+static struct sg_table *
+amdgpu_gem_map_dma_buf(struct dma_buf_attachment *attach,
+		       enum dma_data_direction dir)
 {
+	struct dma_buf *dma_buf = attach->dmabuf;
 	struct drm_gem_object *obj = dma_buf->priv;
 	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
+	struct sg_table *sgt;
 	long r;
 
-	r = drm_gem_map_attach(dma_buf, target_dev, attach);
-	if (r)
-		return r;
-
-	r = amdgpu_bo_reserve(bo, false);
-	if (unlikely(r != 0))
-		goto error_detach;
-
+	if (!attach->invalidate_mappings) {
+		r = amdgpu_bo_reserve(bo, false);
+		if (unlikely(r != 0))
+			return ERR_PTR(r);
+	}
 
 	if (dma_buf->ops != &amdgpu_dmabuf_ops) {
 		/*
@@ -157,41 +152,80 @@ static int amdgpu_gem_map_attach(struct dma_buf *dma_buf,
 		}
 	}
 
-	/* pin buffer into GTT */
-	r = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT, NULL);
-	if (r)
-		goto error_unreserve;
+	if (!attach->invalidate_mappings || true) {
+		/* pin buffer into GTT */
+		r = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT, NULL);
+		if (r)
+			goto error_unreserve;
+
+	} else {
+		/* move buffer into GTT */
+		struct ttm_operation_ctx ctx = { false, false };
+
+		amdgpu_ttm_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
+		r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+		if (r)
+			goto error_unreserve;
+	}
 
 	if (dma_buf->ops != &amdgpu_dmabuf_ops)
 		bo->prime_shared_count++;
 
+	sgt = drm_prime_pages_to_sg(bo->tbo.ttm->pages, bo->tbo.num_pages);
+	if (IS_ERR(sgt)) {
+		r = PTR_ERR(sgt);
+		goto error_unmap;
+	}
+
+	if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir,
+			      DMA_ATTR_SKIP_CPU_SYNC))
+		goto error_free;
+
+	if (!attach->invalidate_mappings)
+		amdgpu_bo_unreserve(bo);
+
+	return sgt;
+
+error_free:
+	sg_free_table(sgt);
+	kfree(sgt);
+	r = -ENOMEM;
+
+error_unmap:
+	amdgpu_gem_unmap_dma_buf(attach, NULL, dir);
+
 error_unreserve:
-	amdgpu_bo_unreserve(bo);
+	if (!attach->invalidate_mappings)
+		amdgpu_bo_unreserve(bo);
 
-error_detach:
-	if (r)
-		drm_gem_map_detach(dma_buf, attach);
-	return r;
+	return ERR_PTR(r);
 }
 
-static void amdgpu_gem_map_detach(struct dma_buf *dma_buf,
-				  struct dma_buf_attachment *attach)
+static void amdgpu_gem_unmap_dma_buf(struct dma_buf_attachment *attach,
+				     struct sg_table *sgt,
+				     enum dma_data_direction dir)
 {
+	struct dma_buf *dma_buf = attach->dmabuf;
 	struct drm_gem_object *obj = dma_buf->priv;
 	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
-	int ret = 0;
 
-	ret = amdgpu_bo_reserve(bo, true);
-	if (unlikely(ret != 0))
-		goto error;
+	if (!attach->invalidate_mappings) {
+		amdgpu_bo_reserve(bo, true);
+		amdgpu_bo_unpin(bo);
+	}
 
-	amdgpu_bo_unpin(bo);
-	if (dma_buf->ops != &amdgpu_dmabuf_ops && bo->prime_shared_count)
+	if (dma_buf->ops != &amdgpu_dmabuf_ops &&
+	    bo->prime_shared_count)
 		bo->prime_shared_count--;
-	amdgpu_bo_unreserve(bo);
 
-error:
-	drm_gem_map_detach(dma_buf, attach);
+	if (!attach->invalidate_mappings)
+		amdgpu_bo_unreserve(bo);
+
+	if (sgt) {
+		dma_unmap_sg(attach->dev, sgt->sgl, sgt->nents, dir);
+		sg_free_table(sgt);
+		kfree(sgt);
+	}
 }
 
 struct reservation_object *amdgpu_gem_prime_res_obj(struct drm_gem_object *obj)
@@ -230,10 +264,9 @@ static int amdgpu_gem_begin_cpu_access(struct dma_buf *dma_buf,
 }
 
 static const struct dma_buf_ops amdgpu_dmabuf_ops = {
-	.attach = amdgpu_gem_map_attach,
-	.detach = amdgpu_gem_map_detach,
-	.map_dma_buf = drm_gem_map_dma_buf,
-	.unmap_dma_buf = drm_gem_unmap_dma_buf,
+	.supports_mapping_invalidation = true,
+	.map_dma_buf = amdgpu_gem_map_dma_buf,
+	.unmap_dma_buf = amdgpu_gem_unmap_dma_buf,
 	.release = drm_gem_dmabuf_release,
 	.begin_cpu_access = amdgpu_gem_begin_cpu_access,
 	.map = drm_gem_dmabuf_kmap,
-- 
2.14.1


* [PATCH 4/4] drm/amdgpu: add independent DMA-buf import
From: Christian König @ 2018-03-09 19:11 UTC
  To: linaro-mm-sig, linux-media, dri-devel, amd-gfx; +Cc: sumit.semwal

Instead of relying on the DRM helpers, implement our own import
functions. This adds support for importing unpinned DMA-bufs.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 64 +++++++++++++++++++++++++------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   | 32 +++++++++++++---
 2 files changed, 80 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index f575fb46d7a8..7963ce329519 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -298,22 +298,64 @@ struct dma_buf *amdgpu_gem_prime_export(struct drm_device *dev,
 	return buf;
 }
 
+static void
+amdgpu_gem_prime_invalidate_mappings(struct dma_buf_attachment *attach)
+{
+	struct ttm_operation_ctx ctx = { false, false };
+	struct drm_gem_object *obj = attach->priv;
+	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
+	int r;
+
+	amdgpu_ttm_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
+	r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+	if (r)
 +		DRM_ERROR("Failed to unmap DMA-buf import (%d)\n", r);
+}
+
 struct drm_gem_object *amdgpu_gem_prime_import(struct drm_device *dev,
 					    struct dma_buf *dma_buf)
 {
+	struct dma_buf_attachment *attach;
 	struct drm_gem_object *obj;
+	int ret;
 
-	if (dma_buf->ops == &amdgpu_dmabuf_ops) {
-		obj = dma_buf->priv;
-		if (obj->dev == dev) {
-			/*
-			 * Importing dmabuf exported from out own gem increases
-			 * refcount on gem itself instead of f_count of dmabuf.
-			 */
-			drm_gem_object_get(obj);
-			return obj;
-		}
+	if (dma_buf->ops != &amdgpu_dmabuf_ops)
+		return drm_gem_prime_import(dev, dma_buf);
+
+	obj = dma_buf->priv;
+	if (obj->dev == dev) {
+		/*
 +		 * Importing dmabuf exported from our own gem increases
+		 * refcount on gem itself instead of f_count of dmabuf.
+		 */
+		drm_gem_object_get(obj);
+		return obj;
+	}
+
+	/*
+	 * Attach, but don't map other amdgpu BOs
+	 */
+	attach = dma_buf_attach(dma_buf, dev->dev);
+	if (IS_ERR(attach))
+		return ERR_CAST(attach);
+
+	get_dma_buf(dma_buf);
+
+	obj = amdgpu_gem_prime_import_sg_table(dev, attach, NULL);
+	if (IS_ERR(obj)) {
+		ret = PTR_ERR(obj);
+		goto fail_detach;
 	}
 
-	return drm_gem_prime_import(dev, dma_buf);
+	obj->import_attach = attach;
+	attach->invalidate_mappings = amdgpu_gem_prime_invalidate_mappings;
+	attach->priv = obj;
+
+	return obj;
+
+fail_detach:
+	dma_buf_detach(dma_buf, attach);
+	dma_buf_put(dma_buf);
+
+	return ERR_PTR(ret);
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 291dd3d600cd..aeead0281e92 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -43,6 +43,7 @@
 #include <linux/pagemap.h>
 #include <linux/debugfs.h>
 #include <linux/iommu.h>
+#include <linux/dma-buf.h>
 #include "amdgpu.h"
 #include "amdgpu_object.h"
 #include "amdgpu_trace.h"
@@ -685,6 +686,7 @@ struct amdgpu_ttm_gup_task_list {
 
 struct amdgpu_ttm_tt {
 	struct ttm_dma_tt	ttm;
+	struct amdgpu_bo	*bo;
 	u64			offset;
 	uint64_t		userptr;
 	struct mm_struct	*usermm;
@@ -993,6 +995,7 @@ static struct ttm_tt *amdgpu_ttm_tt_create(struct ttm_buffer_object *bo,
 		return NULL;
 	}
 	gtt->ttm.ttm.func = &amdgpu_backend_func;
+	gtt->bo = ttm_to_amdgpu_bo(bo);
 	if (ttm_sg_tt_init(&gtt->ttm, bo, page_flags)) {
 		kfree(gtt);
 		return NULL;
@@ -1005,7 +1008,6 @@ static int amdgpu_ttm_tt_populate(struct ttm_tt *ttm,
 {
 	struct amdgpu_device *adev = amdgpu_ttm_adev(ttm->bdev);
 	struct amdgpu_ttm_tt *gtt = (void *)ttm;
-	bool slave = !!(ttm->page_flags & TTM_PAGE_FLAG_SG);
 
 	if (gtt && gtt->userptr) {
 		ttm->sg = kzalloc(sizeof(struct sg_table), GFP_KERNEL);
@@ -1017,7 +1019,19 @@ static int amdgpu_ttm_tt_populate(struct ttm_tt *ttm,
 		return 0;
 	}
 
-	if (slave && ttm->sg) {
+	if (ttm->page_flags & TTM_PAGE_FLAG_SG) {
+		if (!ttm->sg) {
+			struct dma_buf_attachment *attach;
+			struct sg_table *sgt;
+
+			attach = gtt->bo->gem_base.import_attach;
+			sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
+			if (IS_ERR(sgt))
+				return PTR_ERR(sgt);
+
+			ttm->sg = sgt;
+		}
+
 		drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages,
 						 gtt->ttm.dma_address,
 						 ttm->num_pages);
@@ -1036,9 +1050,8 @@ static int amdgpu_ttm_tt_populate(struct ttm_tt *ttm,
 
 static void amdgpu_ttm_tt_unpopulate(struct ttm_tt *ttm)
 {
-	struct amdgpu_device *adev;
 	struct amdgpu_ttm_tt *gtt = (void *)ttm;
-	bool slave = !!(ttm->page_flags & TTM_PAGE_FLAG_SG);
+	struct amdgpu_device *adev;
 
 	if (gtt && gtt->userptr) {
 		amdgpu_ttm_tt_set_user_pages(ttm, NULL);
@@ -1047,7 +1060,16 @@ static void amdgpu_ttm_tt_unpopulate(struct ttm_tt *ttm)
 		return;
 	}
 
-	if (slave)
+	if (ttm->sg && !gtt->bo->tbo.sg) {
+		struct dma_buf_attachment *attach;
+
+		attach = gtt->bo->gem_base.import_attach;
+		dma_buf_unmap_attachment(attach, ttm->sg, DMA_BIDIRECTIONAL);
+		ttm->sg = NULL;
+		return;
+	}
+
+	if (ttm->page_flags & TTM_PAGE_FLAG_SG)
 		return;
 
 	adev = amdgpu_ttm_adev(ttm->bdev);
-- 
2.14.1


* Re: [PATCH 1/4] dma-buf: add optional invalidate_mappings callback
From: Daniel Vetter @ 2018-03-12 17:07 UTC
  To: Christian König; +Cc: linaro-mm-sig, linux-media, dri-devel, amd-gfx

On Fri, Mar 09, 2018 at 08:11:41PM +0100, Christian König wrote:
> Each importer can now provide an invalidate_mappings callback.
> 
> This allows the exporter to provide the mappings without the need to pin
> the backing store.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/dma-buf/dma-buf.c | 25 +++++++++++++++++++++++++
>  include/linux/dma-buf.h   | 36 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 61 insertions(+)
> 
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index d78d5fc173dc..ed8d5844ae74 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -629,6 +629,9 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach,
>  
>  	might_sleep();
>  
> +	if (attach->invalidate_mappings)
> +		reservation_object_assert_held(attach->dmabuf->resv);
> +
>  	if (WARN_ON(!attach || !attach->dmabuf))
>  		return ERR_PTR(-EINVAL);
>  
> @@ -656,6 +659,9 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment *attach,
>  {
>  	might_sleep();
>  
> +	if (attach->invalidate_mappings)
> +		reservation_object_assert_held(attach->dmabuf->resv);
> +
>  	if (WARN_ON(!attach || !attach->dmabuf || !sg_table))
>  		return;
>  
> @@ -664,6 +670,25 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment *attach,
>  }
>  EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment);
>  
> +/**
> + * dma_buf_invalidate_mappings - invalidate all mappings of this dma_buf
> + *
> + * @dmabuf:	[in]	buffer which mappings should be invalidated
> + *
> + * Informs all attachments that they need to destroy and recreate all their
> + * mappings.
> + */
> +void dma_buf_invalidate_mappings(struct dma_buf *dmabuf)
> +{
> +	struct dma_buf_attachment *attach;
> +
> +	reservation_object_assert_held(dmabuf->resv);
> +
> +	list_for_each_entry(attach, &dmabuf->attachments, node)
> +		attach->invalidate_mappings(attach);

To make the locking work I think we also need to require importers to hold
the reservation object while attaching/detaching. Otherwise the list walk
above could go boom.

We could use the existing dma-buf lock, but I think that'll just result in
deadlocks.
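
I.e. roughly this on the importer side (just a sketch, my_invalidate_cb
being whatever hook the importer registers):

	/* attach/detach done while holding the reservation lock, so the
	 * attachment list walk in dma_buf_invalidate_mappings is safe */
	reservation_object_lock(dmabuf->resv, NULL);
	attach = dma_buf_attach(dmabuf, dev);
	attach->invalidate_mappings = my_invalidate_cb;
	reservation_object_unlock(dmabuf->resv);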

> +}
> +EXPORT_SYMBOL_GPL(dma_buf_invalidate_mappings);
> +
>  /**
>   * DOC: cpu access
>   *
> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> index 085db2fee2d7..c1e2f7d93509 100644
> --- a/include/linux/dma-buf.h
> +++ b/include/linux/dma-buf.h
> @@ -91,6 +91,18 @@ struct dma_buf_ops {
>  	 */
>  	void (*detach)(struct dma_buf *, struct dma_buf_attachment *);
>  
> +	/**
> +	 * @supports_mapping_invalidation:
> +	 *
> +	 * True for exporters which support unpinned DMA-buf operation using
> +	 * the reservation lock.
> +	 *
> +	 * When attachment->invalidate_mappings is set the @map_dma_buf and
> +	 * @unmap_dma_buf callbacks can be called with the reservation lock
> +	 * held.
> +	 */
> +	bool supports_mapping_invalidation;

Why do we need this? Importers could simply always register the
invalidate_mappings hook, and exporters could use it when they see fit.
That gives us more lockdep coverage to make sure importers use their
attachment callbacks correctly (aka they hold the reservation object).

> +
>  	/**
>  	 * @map_dma_buf:
>  	 *
> @@ -326,6 +338,29 @@ struct dma_buf_attachment {
>  	struct device *dev;
>  	struct list_head node;
>  	void *priv;
> +
> +	/**
> +	 * @invalidate_mappings:
> +	 *
> +	 * Optional callback provided by the importer of the attachment which
> +	 * must be set before mappings are created.

This doesn't work: it must be set before the attachment is created,
otherwise you race with your invalidate callback.

I think the simplest option would be to add a new dma_buf_attach_dynamic
(well except a less crappy name).
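
Roughly (hypothetical signature, name to be bikeshedded):

	struct dma_buf_attachment *
	dma_buf_attach_dynamic(struct dma_buf *dmabuf, struct device *dev,
			       void (*invalidate)(struct dma_buf_attachment *));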

> +	 *
> +	 * If provided, the exporter can avoid pinning the backing store while
> +	 * mappings exist.
> +	 *
> +	 * The function is called with the lock of the reservation object
> +	 * associated with the dma_buf held and the mapping function must be
> +	 * called with this lock held as well. This makes sure that no mapping
> +	 * is created concurrently with an ongoing invalidation.
> +	 *
> +	 * After the callback all existing mappings are still valid until all
> +	 * fences in the dma_buf's reservation object are signaled, but should be
> +	 * destroyed by the importer as soon as possible.

Do we guarantee that the importer will attach a fence, after which the
mapping will be gone? What about re-trying? Or just best effort (i.e. only
useful for evicting to try to make room).

I think a helper which both unmaps _and_ waits for all the fences to clear
would be best, with some guarantees that it'll either fail or all the
mappings _will_ be gone. The locking for that one will be hilarious, since
we need to figure out dmabuf->lock vs. the reservation. I kinda prefer we
throw away the dmabuf->lock and supersede it entirely with the reservation
lock.
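
Something like this hypothetical helper (sketch only, ignoring the
lock-ordering question above):

	static void dma_buf_unmap_and_wait(struct dma_buf_attachment *attach,
					   struct sg_table *sgt,
					   enum dma_data_direction dir)
	{
		struct dma_buf *dmabuf = attach->dmabuf;

		reservation_object_lock(dmabuf->resv, NULL);
		dma_buf_unmap_attachment(attach, sgt, dir);
		reservation_object_unlock(dmabuf->resv);

		/* wait for all fences, i.e. until the device is really
		 * done with the old mapping */
		reservation_object_wait_timeout_rcu(dmabuf->resv, true, false,
						    MAX_SCHEDULE_TIMEOUT);
	}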


> +	 *
> +	 * New mappings can be created immediately, but can't be used before the
> +	 * exclusive fence in the dma_buf's reservation object is signaled.
> +	 */
> +	void (*invalidate_mappings)(struct dma_buf_attachment *attach);

Bunch of questions about exact semantics, but I very much like this. And I
think besides those technical details, the overall approach seems sound.
-Daniel

>  };
>  
>  /**
> @@ -391,6 +426,7 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *,
>  					enum dma_data_direction);
>  void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *,
>  				enum dma_data_direction);
> +void dma_buf_invalidate_mappings(struct dma_buf *dma_buf);
>  int dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
>  			     enum dma_data_direction dir);
>  int dma_buf_end_cpu_access(struct dma_buf *dma_buf,
> -- 
> 2.14.1

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: RFC: unpinned DMA-buf exporting
@ 2018-03-12 17:24   ` Daniel Vetter
  0 siblings, 0 replies; 34+ messages in thread
From: Daniel Vetter @ 2018-03-12 17:24 UTC (permalink / raw)
  To: Christian König; +Cc: linaro-mm-sig, linux-media, dri-devel, amd-gfx

On Fri, Mar 09, 2018 at 08:11:40PM +0100, Christian König wrote:
> This set of patches adds an optional invalidate_mappings callback to each
> DMA-buf attachment which can be filled in by the importer.
> 
> This callback allows the exporter to provide the DMA-buf content
> without pinning it. The reservation object's lock acts as the synchronization
> point for buffer moves and creating mappings.
> 
> This set includes an implementation for amdgpu which should be rather
> easily portable to other DRM drivers.

Bunch of higher level comments, and one I've forgotten in reply to patch
1:

- What happens when a dma-buf is pinned (e.g. i915 loves to pin buffers
  for scanout)?

- pulling the dma-buf implementations into amdgpu makes sense, that's
  kinda how it was meant to be anyway. The gem prime helpers are a bit too
  much midlayer for my taste (mostly because nvidia wanted to bypass the
  EXPORT_SYMBOL_GPL of core dma-buf, hooray for legal bs). We can always
  extract more helpers once there's more ttm based drivers doing this.

Overall I like it; there are some details to figure out first.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/4] dma-buf: add optional invalidate_mappings callback
  2018-03-12 17:07     ` Daniel Vetter
@ 2018-03-12 19:13       ` Christian König
  -1 siblings, 0 replies; 34+ messages in thread
From: Christian König @ 2018-03-12 19:13 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: linaro-mm-sig, linux-media, dri-devel, amd-gfx

Am 12.03.2018 um 18:07 schrieb Daniel Vetter:
> On Fri, Mar 09, 2018 at 08:11:41PM +0100, Christian König wrote:
>> [SNIP]
>>   
>> +/**
>> + * dma_buf_invalidate_mappings - invalidate all mappings of this dma_buf
>> + *
>> + * @dmabuf:	[in]	buffer whose mappings should be invalidated
>> + *
>> + * Informs all attachments that they need to destroy and recreate all their
>> + * mappings.
>> + */
>> +void dma_buf_invalidate_mappings(struct dma_buf *dmabuf)
>> +{
>> +	struct dma_buf_attachment *attach;
>> +
>> +	reservation_object_assert_held(dmabuf->resv);
>> +
>> +	list_for_each_entry(attach, &dmabuf->attachments, node)
>> +		attach->invalidate_mappings(attach);
> To make the locking work I think we also need to require importers to hold
> the reservation object while attaching/detaching. Otherwise the list walk
> above could go boom.

Oh, good point. Going to fix this.

> [SNIP]
>> +	/**
>> +	 * @supports_mapping_invalidation:
>> +	 *
>> +	 * True for exporters which support unpinned DMA-buf operation using
>> +	 * the reservation lock.
>> +	 *
>> +	 * When attachment->invalidate_mappings is set the @map_dma_buf and
>> +	 * @unmap_dma_buf callbacks can be called with the reservation lock
>> +	 * held.
>> +	 */
>> +	bool supports_mapping_invalidation;
> Why do we need this? Importers could simply always register the
> invalidate_mappings hook, and exporters could use it when they see fit.
> That gives us more lockdep coverage to make sure importers use their
> attachment callbacks correctly (aka they hold the reservation object).

One sole reason: backward compatibility.

I didn't want to audit all those different drivers to see whether they
can handle being called with the reservation lock held.

>
>> +
>>   	/**
>>   	 * @map_dma_buf:
>>   	 *
>> @@ -326,6 +338,29 @@ struct dma_buf_attachment {
>>   	struct device *dev;
>>   	struct list_head node;
>>   	void *priv;
>> +
>> +	/**
>> +	 * @invalidate_mappings:
>> +	 *
>> +	 * Optional callback provided by the importer of the attachment which
>> +	 * must be set before mappings are created.
> This doesn't work: it must be set before the attachment is created,
> otherwise you race with your invalidate callback.

Another good point.

>
> I think the simplest option would be to add a new dma_buf_attach_dynamic
> (well, except with a less crappy name).

Well, how about adding an optional invalidate_mappings parameter to the
existing dma_buf_attach?
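
I.e., as a sketch, something like this, with every existing caller then
passing NULL:

struct dma_buf_attachment *
dma_buf_attach(struct dma_buf *dmabuf, struct device *dev,
               void (*invalidate_mappings)(struct dma_buf_attachment *));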

>
>> +	 *
>> +	 * If provided, the exporter can avoid pinning the backing store while
>> +	 * mappings exist.
>> +	 *
>> +	 * The function is called with the lock of the reservation object
>> +	 * associated with the dma_buf held and the mapping function must be
>> +	 * called with this lock held as well. This makes sure that no mapping
>> +	 * is created concurrently with an ongoing invalidation.
>> +	 *
>> +	 * After the callback, all existing mappings are still valid until all
>> +	 * fences in the dma_buf's reservation object are signaled, but should be
>> +	 * destroyed by the importer as soon as possible.
> Do we guarantee that the importer will attach a fence, after which the
> mapping will be gone? What about re-trying? Or just best effort (i.e. only
> useful for evicting to try to make room).

The importer should attach fences for all its operations with the DMA-buf.

> I think a helper which both unmaps _and_ waits for all the fences to clear
> would be best, with some guarantees that it'll either fail or all the
> mappings _will_ be gone. The locking for that one will be hilarious, since
> we need to figure out dmabuf->lock vs. the reservation. I kinda prefer we
> throw away the dmabuf->lock and supersede it entirely with the reservation
> lock.

Big NAK on that. The whole API is asynchronous, i.e. we never block 
for any operation to finish.

Otherwise you run into big trouble with cross device GPU resets and 
stuff like that.

>> +	 *
>> +	 * New mappings can be created immediately, but can't be used before the
>> +	 * exclusive fence in the dma_buf's reservation object is signaled.
>> +	 */
>> +	void (*invalidate_mappings)(struct dma_buf_attachment *attach);
> Bunch of questions about exact semantics, but I very much like this. And I
> think besides those technical details, the overall approach seems sound.

Yeah this initial implementation was buggy like hell. Just wanted to 
confirm that the idea is going in the right direction.

Thanks for the comments,
Christian.

> -Daniel
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: unpinned DMA-buf exporting
  2018-03-12 17:24   ` Daniel Vetter
@ 2018-03-12 19:15     ` Christian König
  -1 siblings, 0 replies; 34+ messages in thread
From: Christian König @ 2018-03-12 19:15 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: linaro-mm-sig, linux-media, dri-devel, amd-gfx

Am 12.03.2018 um 18:24 schrieb Daniel Vetter:
> On Fri, Mar 09, 2018 at 08:11:40PM +0100, Christian König wrote:
>> This set of patches adds an optional invalidate_mappings callback to each
>> DMA-buf attachment which can be filled in by the importer.
>>
>> This callback allows the exporter to provide the DMA-buf content
>> without pinning it. The reservation object's lock acts as the synchronization
>> point for buffer moves and creating mappings.
>>
>> This set includes an implementation for amdgpu which should be rather
>> easily portable to other DRM drivers.
> Bunch of higher level comments, and one I've forgotten in reply to patch
> 1:
>
> - What happens when a dma-buf is pinned (e.g. i915 loves to pin buffers
>    for scanout)?

When you need to pin an imported DMA-buf, you need to detach and reattach 
without the invalidate_mappings callback.
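
As a sketch, the importer-side flow would then look roughly like this
(struct my_importer and its fields are hypothetical):

#include <linux/dma-buf.h>
#include <linux/err.h>

struct my_importer {                    /* hypothetical importer state */
        struct device *dev;
        struct dma_buf *dmabuf;
        struct dma_buf_attachment *attach;
        struct sg_table *sgt;
};

static int my_importer_pin(struct my_importer *imp)
{
        /* Tear down the dynamic attachment ... */
        dma_buf_unmap_attachment(imp->attach, imp->sgt, DMA_BIDIRECTIONAL);
        dma_buf_detach(imp->dmabuf, imp->attach);

        /* ... and reattach statically, i.e. without invalidate_mappings,
         * which makes the exporter pin the backing store again. */
        imp->attach = dma_buf_attach(imp->dmabuf, imp->dev);
        if (IS_ERR(imp->attach))
                return PTR_ERR(imp->attach);

        imp->sgt = dma_buf_map_attachment(imp->attach, DMA_BIDIRECTIONAL);
        return PTR_ERR_OR_ZERO(imp->sgt);
}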

> - pulling the dma-buf implementations into amdgpu makes sense, that's
>    kinda how it was meant to be anyway. The gem prime helpers are a bit too
>    much midlayer for my taste (mostly because nvidia wanted to bypass the
>    EXPORT_SYMBOL_GPL of core dma-buf, hooray for legal bs). We can always
>    extract more helpers once there's more ttm based drivers doing this.

Yeah, I thought to abstract that similarly to the AGP backend.

Just moving some callbacks around in TTM should be sufficient to 
de-midlayer the whole thing.

Thanks,
Christian.

>
> Overall I like it; there are some details to figure out first.
> -Daniel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RFC: unpinned DMA-buf exporting
@ 2018-03-12 19:41       ` Daniel Vetter
  0 siblings, 0 replies; 34+ messages in thread
From: Daniel Vetter @ 2018-03-12 19:41 UTC (permalink / raw)
  To: Christian König; +Cc: linaro-mm-sig, linux-media, dri-devel, amd-gfx list

On Mon, Mar 12, 2018 at 8:15 PM, Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
> Am 12.03.2018 um 18:24 schrieb Daniel Vetter:
>>
>> On Fri, Mar 09, 2018 at 08:11:40PM +0100, Christian König wrote:
>>>
>>> This set of patches adds an optional invalidate_mappings callback to each
>>> DMA-buf attachment which can be filled in by the importer.
>>>
>>> This callback allows the exporter to provide the DMA-buf content
>>> without pinning it. The reservation object's lock acts as the synchronization
>>> point for buffer moves and creating mappings.
>>>
>>> This set includes an implementation for amdgpu which should be rather
>>> easily portable to other DRM drivers.
>>
>> Bunch of higher level comments, and one I've forgotten in reply to patch
>> 1:
>>
>> - What happens when a dma-buf is pinned (e.g. i915 loves to pin buffers
>>    for scanout)?
>
>
> When you need to pin an imported DMA-buf, you need to detach and reattach
> without the invalidate_mappings callback.

I think that must both be better documented, and also somehow enforced
with checks. Atm nothing makes sure you actually manage to unmap if
you claim to be able to do so.

I think a helper to switch from pinned to unpinned would be lovely
(just need to clear/reset the ->invalidate_mappings pointer while
holding the reservation). Or do you expect to map buffers differently
depending on whether you can move them or not? At least for i915 we'd
need to rework our driver quite a bit if you expect us to throw the
mapping away just to be able to pin it. Atm pinning requires that it's
mapped already (and depending upon platform the gpu might be using
that exact mapping to render, so unmapping for pinning is a bad idea
for us).
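
Something like this minimal sketch (hypothetical name; how the exporter
nails down the current placement at that point is handwaved):

#include <linux/dma-buf.h>
#include <linux/reservation.h>

void dma_buf_attachment_make_static(struct dma_buf_attachment *attach)
{
        reservation_object_lock(attach->dmabuf->resv, NULL);
        /* No more invalidations: the attachment behaves like a pinned
         * one from here on, and the existing mapping stays usable. */
        attach->invalidate_mappings = NULL;
        reservation_object_unlock(attach->dmabuf->resv);
}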

>> - pulling the dma-buf implementations into amdgpu makes sense, that's
>>    kinda how it was meant to be anyway. The gem prime helpers are a bit
>> too
>>    much midlayer for my taste (mostly because nvidia wanted to bypass the
>>    EXPORT_SYMBOL_GPL of core dma-buf, hooray for legal bs). We can always
>>    extract more helpers once there's more ttm based drivers doing this.
>
>
> Yeah, I thought to abstract that similarly to the AGP backend.
>
> Just moving some callbacks around in TTM should be sufficient to de-midlayer
> the whole thing.

Yeah TTM has all the abstractions needed to handle dma-bufs
"properly", it's just sometimes at the wrong level or can't be
overridden. At least per my understanding of TTM (which is most likely
... confused).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/4] dma-buf: add optional invalidate_mappings callback
  2018-03-12 19:13       ` Christian König
@ 2018-03-13 15:17         ` Daniel Vetter
  -1 siblings, 0 replies; 34+ messages in thread
From: Daniel Vetter @ 2018-03-13 15:17 UTC (permalink / raw)
  To: christian.koenig
  Cc: Daniel Vetter, linaro-mm-sig, linux-media, dri-devel, amd-gfx

On Mon, Mar 12, 2018 at 08:13:15PM +0100, Christian König wrote:
> Am 12.03.2018 um 18:07 schrieb Daniel Vetter:
> > On Fri, Mar 09, 2018 at 08:11:41PM +0100, Christian König wrote:
> > > [SNIP]
> > > +/**
> > > + * dma_buf_invalidate_mappings - invalidate all mappings of this dma_buf
> > > + *
> > > + * @dmabuf:	[in]	buffer whose mappings should be invalidated
> > > + *
> > > + * Informs all attachments that they need to destroy and recreate all their
> > > + * mappings.
> > > + */
> > > +void dma_buf_invalidate_mappings(struct dma_buf *dmabuf)
> > > +{
> > > +	struct dma_buf_attachment *attach;
> > > +
> > > +	reservation_object_assert_held(dmabuf->resv);
> > > +
> > > +	list_for_each_entry(attach, &dmabuf->attachments, node)
> > > +		attach->invalidate_mappings(attach);
> > To make the locking work I think we also need to require importers to hold
> > the reservation object while attaching/detaching. Otherwise the list walk
> > above could go boom.
> 
> Oh, good point. Going to fix this.
> 
> > [SNIP]
> > > +	/**
> > > +	 * @supports_mapping_invalidation:
> > > +	 *
> > > +	 * True for exporters which support unpinned DMA-buf operation using
> > > +	 * the reservation lock.
> > > +	 *
> > > +	 * When attachment->invalidate_mappings is set the @map_dma_buf and
> > > +	 * @unmap_dma_buf callbacks can be called with the reservation lock
> > > +	 * held.
> > > +	 */
> > > +	bool supports_mapping_invalidation;
> > Why do we need this? Importers could simply always register the
> > invalidate_mappings hook, and exporters could use it when they see fit.
> > That gives us more lockdep coverage to make sure importers use their
> > attachment callbacks correctly (aka they hold the reservation object).
> 
> One sole reason: backward compatibility.
> 
> I didn't want to audit all those different drivers to see whether they
> can handle being called with the reservation lock held.
> 
> > 
> > > +
> > >   	/**
> > >   	 * @map_dma_buf:
> > >   	 *
> > > @@ -326,6 +338,29 @@ struct dma_buf_attachment {
> > >   	struct device *dev;
> > >   	struct list_head node;
> > >   	void *priv;
> > > +
> > > +	/**
> > > +	 * @invalidate_mappings:
> > > +	 *
> > > +	 * Optional callback provided by the importer of the attachment which
> > > +	 * must be set before mappings are created.
> > This doesn't work: it must be set before the attachment is created,
> > otherwise you race with your invalidate callback.
> 
> Another good point.
> 
> > 
> > I think the simplest option would be to add a new dma_buf_attach_dynamic
> > (well, except with a less crappy name).
> 
> Well how about adding an optional invalidate_mappings parameter to the
> existing dma_buf_attach?

Not sure that's best: it might confuse dumb importers, and you need to
change all the callers. But up to you.

> > > +	 *
> > > +	 * If provided, the exporter can avoid pinning the backing store while
> > > +	 * mappings exist.
> > > +	 *
> > > +	 * The function is called with the lock of the reservation object
> > > +	 * associated with the dma_buf held and the mapping function must be
> > > +	 * called with this lock held as well. This makes sure that no mapping
> > > +	 * is created concurrently with an ongoing invalidation.
> > > +	 *
> > > +	 * After the callback, all existing mappings are still valid until all
> > > +	 * fences in the dma_buf's reservation object are signaled, but should be
> > > +	 * destroyed by the importer as soon as possible.
> > Do we guarantee that the importer will attach a fence, after which the
> > mapping will be gone? What about re-trying? Or just best effort (i.e. only
> > useful for evicting to try to make room).
> 
> The importer should attach fences for all its operations with the DMA-buf.
> 
> > I think a helper which both unmaps _and_ waits for all the fences to clear
> > would be best, with some guarantees that it'll either fail or all the
> > mappings _will_ be gone. The locking for that one will be hilarious, since
> > we need to figure out dmabuf->lock vs. the reservation. I kinda prefer we
> > throw away the dmabuf->lock and supersede it entirely with the reservation
> > lock.
> 
> Big NAK on that. The whole API is asynchronous, i.e. we never block for
> any operation to finish.
> 
> Otherwise you run into big trouble with cross device GPU resets and stuff
> like that.

But how will the unmapping work then? You can't throw the sg list away
before the DMA has stopped. The DMA only stops once the fence is signalled.
The importer can't call dma_buf_detach because the reservation lock is
hogged already by the exporter trying to unmap everything.

How is this supposed to work?

Re GPU might cause a deadlock: Isn't that already a problem if you hold
reservations of buffers used on other gpus, which want those reservations
to complete the gpu reset, but that gpu reset blocks some fence that the
reservation holder is waiting for?

We have tons of fun with deadlocks against GPU resets, and loooooots of
testcases, and I kinda get the impression amdgpu is throwing a lot of
issues under the rug through trylock tricks that shut up lockdep, but
don't fix much really.

btw adding cross-release lockdep annotations for fences will probably turn
up _lots_ more bugs in this area.

> > > +	 *
> > > +	 * New mappings can be created immediately, but can't be used before the
> > > +	 * exclusive fence in the dma_buf's reservation object is signaled.
> > > +	 */
> > > +	void (*invalidate_mappings)(struct dma_buf_attachment *attach);
> > Bunch of questions about exact semantics, but I very much like this. And I
> > think besides those technical details, the overall approach seems sound.
> 
> Yeah this initial implementation was buggy like hell. Just wanted to confirm
> that the idea is going in the right direction.

I wanted this 7 years ago, idea very much acked :-)

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/4] dma-buf: add optional invalidate_mappings callback
  2018-03-13 15:17         ` Daniel Vetter
@ 2018-03-13 15:52           ` Christian König
  -1 siblings, 0 replies; 34+ messages in thread
From: Christian König @ 2018-03-13 15:52 UTC (permalink / raw)
  To: Daniel Vetter, christian.koenig
  Cc: linaro-mm-sig, amd-gfx, dri-devel, linux-media

Am 13.03.2018 um 16:17 schrieb Daniel Vetter:
> [SNIP]
>>> I think a helper which both unmaps _and_ waits for all the fences to clear
>>> would be best, with some guarantees that it'll either fail or all the
>>> mappings _will_ be gone. The locking for that one will be hilarious, since
>>> we need to figure out dmabuf->lock vs. the reservation. I kinda prefer we
>>> throw away the dmabuf->lock and supersede it entirely with the reservation
>>> lock.
>> Big NAK on that. The whole API is asynchronous, i.e. we never block for
>> any operation to finish.
>>
>> Otherwise you run into big trouble with cross device GPU resets and stuff
>> like that.
> But how will the unmapping work then? You can't throw the sg list away
> before the DMA has stopped. The DMA only stops once the fence is signalled.
> The importer can't call dma_buf_detach because the reservation lock is
> hogged already by the exporter trying to unmap everything.
>
> How is this supposed to work?

Even after invalidation the sg list stays alive until it is explicitly 
destroyed by the importer using dma_buf_unmap_attachment(), which in turn 
is only allowed after all fences have signaled.

The implementation is in ttm_bo_pipeline_gutting(): basically we use the 
same functionality as for pipelined moves/evictions, which hangs the old 
backing store on a dummy object and destroys it after all fences have signaled.

While the old sg list is still pending destruction, the importer can 
request a new sg list for the new location of the DMA-buf using 
dma_buf_map_attachment(). This new location becomes valid after the move 
fence in the reservation object is signaled.

So from the CPU point of view multiple sg lists can exist at the same 
time, which allows us to have a seamless transition from the old to the 
new location from the GPU point of view.
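
So an importer's hook can stay completely minimal, roughly like this
sketch (my_* names hypothetical): just mark the mapping stale here and
do the actual unmap/remap later, again under the reservation lock, once
the fences allow it.

#include <linux/dma-buf.h>

struct my_importer;                                     /* hypothetical */
void my_importer_mark_stale(struct my_importer *imp);   /* hypothetical */

static void my_invalidate_mappings(struct dma_buf_attachment *attach)
{
        struct my_importer *imp = attach->priv;

        /* Called with the reservation lock held by the exporter; the
         * old sg table stays valid until the fences have signaled. */
        my_importer_mark_stale(imp);
}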

> Re GPU might cause a deadlock: Isn't that already a problem if you hold
> reservations of buffers used on other gpus, which want those reservations
> to complete the gpu reset, but that gpu reset blocks some fence that the
> reservation holder is waiting for?

Correct, that's why amdgpu and TTM try quite hard to never wait for a 
fence while a reservation object is locked.

The only use case I haven't fixed so far is reaping deleted objects 
during eviction, but fixing that is only a matter of my free time.

> We have tons of fun with deadlocks against GPU resets, and loooooots of
> testcases, and I kinda get the impression amdgpu is throwing a lot of
> issues under the rug through trylock tricks that shut up lockdep, but
> don't fix much really.

Huh? Why do you think that? The only trylock I'm aware of is during 
eviction, and there it isn't a problem.

> btw adding cross-release lockdep annotations for fences will probably turn
> up _lots_ more bugs in this area.

At least for amdgpu that should be handled by now.

>>>> +	 *
>>>> +	 * New mappings can be created immediately, but can't be used before the
>>>> +	 * exclusive fence in the dma_buf's reservation object is signaled.
>>>> +	 */
>>>> +	void (*invalidate_mappings)(struct dma_buf_attachment *attach);
>>> Bunch of questions about exact semantics, but I very much like this. And I
>>> think besides those technical details, the overall approach seems sound.
>> Yeah this initial implementation was buggy like hell. Just wanted to confirm
>> that the idea is going in the right direction.
> I wanted this 7 years ago, idea very much acked :-)
>
Ok, thanks. Good to know.

Christian.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/4] dma-buf: add optional invalidate_mappings callback
@ 2018-03-13 16:00             ` Daniel Vetter
  0 siblings, 0 replies; 34+ messages in thread
From: Daniel Vetter @ 2018-03-13 16:00 UTC (permalink / raw)
  To: christian.koenig
  Cc: Daniel Vetter, linaro-mm-sig, amd-gfx, dri-devel, linux-media

On Tue, Mar 13, 2018 at 04:52:02PM +0100, Christian König wrote:
> Am 13.03.2018 um 16:17 schrieb Daniel Vetter:
> > [SNIP]
> > > > I think a helper which both unmaps _and_ waits for all the fences to clear
> > > > would be best, with some guarantees that it'll either fail or all the
> > > > mappings _will_ be gone. The locking for that one will be hilarious, since
> > > > we need to figure out dmabuf->lock vs. the reservation. I kinda prefer we
> > > > throw away the dmabuf->lock and supersede it entirely with the reservation
> > > > lock.
> > > Big NAK on that. The whole API is asynchronous, i.e. we never block for
> > > any operation to finish.
> > > 
> > > Otherwise you run into big trouble with cross device GPU resets and stuff
> > > like that.
> > But how will the unmapping work then? You can't throw the sg list away
> > before the DMA has stopped. The DMA only stops once the fence is signalled.
> > The importer can't call dma_buf_detach because the reservation lock is
> > hogged already by the exporter trying to unmap everything.
> > 
> > How is this supposed to work?
> 
> Even after invalidation the sg list stays alive until it is explicitly
> destroyed by the importer using dma_buf_unmap_attachment(), which in turn is
> only allowed after all fences have signaled.
> 
> The implementation is in ttm_bo_pipeline_gutting(): basically we use the
> same functionality as for pipelined moves/evictions, which hangs the old
> backing store on a dummy object and destroys it after all fences have signaled.
> 
> While the old sg list is still pending destruction, the importer can
> request a new sg list for the new location of the DMA-buf using
> dma_buf_map_attachment(). This new location becomes valid after the move
> fence in the reservation object is signaled.
> 
> So from the CPU point of view multiple sg lists can exist at the same time,
> which allows us to have a seamless transition from the old to the new
> location from the GPU point of view.

Ok, so the plan is to support fully pipelined moves and everything, with the
old sg tables lazily cleaned up. I was thinking more about evicting stuff
and throwing it out, where there's not going to be any new sg list but the
object is going to be swapped out.

I think some state flow charts (we can do SVG or DOT) in the kerneldoc
would be sweet.

> > Re GPU might cause a deadlock: Isn't that already a problem if you hold
> > reservations of buffers used on other gpus, which want those reservations
> > to complete the gpu reset, but that gpu reset blocks some fence that the
> > reservation holder is waiting for?
> 
> Correct, that's why amdgpu and TTM try quite hard to never wait for a
> fence while a reservation object is locked.

We might have a fairly huge mismatch of expectations here :-/

> The only use case I haven't fixed so far is reaping deleted objects during
> eviction, but fixing that is only a matter of my free time.

Yeah, this is the hard one.

In general the assumption is that dma_fence will get signalled no matter
what you're doing, assuming the only thing you need is to not block
interrupts. The i915 gpu reset logic to make that work is a bit of a work
of art ...

If we expect amdgpu and i915 to cooperate with shared buffers I guess one
has to give in. No idea how to do that best.

> > We have tons of fun with deadlocks against GPU resets, and loooooots of
> > testcases, and I kinda get the impression amdgpu is throwing a lot of
> > issues under the rug through trylock tricks that shut up lockdep, but
> > don't fix much really.
> 
> Huh? Why do you think that? The only trylock I'm aware of is during eviction,
> and there it isn't a problem.

The mmap fault handler had one too, last time I looked, and it smelled fishy.

> > btw adding cross-release lockdep annotations for fences will probably turn
> > up _lots_ more bugs in this area.
> 
> At least for amdgpu that should be handled by now.

You're sure? :-)

Trouble is that cross-release was never even enabled, much less has anyone
typed the dma_fence annotations. And just cross-release alone turned up
_lots_ of deadlocks in i915 between fences, async workers (userptr, gpu
reset) and core mm stuff.

I'd be seriously surprised if it wouldn't find an entire rat's nest of
issues around dma_fence once we enable it.
-Daniel

> > > > > +	 *
> > > > > +	 * New mappings can be created immediately, but can't be used before the
> > > > > +	 * exclusive fence in the dma_buf's reservation object is signaled.
> > > > > +	 */
> > > > > +	void (*invalidate_mappings)(struct dma_buf_attachment *attach);
> > > > Bunch of questions about exact semantics, but I very much like this. And I
> > > > think besides those technical details, the overall approach seems sound.
> > > Yeah this initial implementation was buggy like hell. Just wanted to confirm
> > > that the idea is going in the right direction.
> > I wanted this 7 years ago, idea very much acked :-)
> > 
> Ok, thanks. Good to know.
> 
> Christian.

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/4] dma-buf: add optional invalidate_mappings callback
@ 2018-03-13 17:20               ` Christian König
  0 siblings, 0 replies; 34+ messages in thread
From: Christian König @ 2018-03-13 17:20 UTC (permalink / raw)
  To: Daniel Vetter, christian.koenig
  Cc: linaro-mm-sig, dri-devel, amd-gfx, linux-media

Am 13.03.2018 um 17:00 schrieb Daniel Vetter:
> On Tue, Mar 13, 2018 at 04:52:02PM +0100, Christian König wrote:
>> Am 13.03.2018 um 16:17 schrieb Daniel Vetter:
>> [SNIP]
> Ok, so the plan is to support fully pipelined moves and everything, with the
> old sg tables lazily cleaned up. I was thinking more about evicting stuff
> and throwing it out, where there's not going to be any new sg list but the
> object is going to be swapped out.

Yes, exactly. Well, my example was the unlikely case where the object is 
swapped out and immediately swapped in again because somebody needs it.

>
> I think some state flow charts (we can do SVG or DOT) in the kerneldoc
> would be sweet.

Yeah, probably a good idea.

Sounds good and I find it great that you're volunteering for that :D

Ok, seriously: my drawing capabilities are a bit underdeveloped, so I 
would prefer it if somebody could at least help with that.

>>> Re GPU might cause a deadlock: Isn't that already a problem if you hold
>>> reservations of buffers used on other gpus, which want those reservations
>>> to complete the gpu reset, but that gpu reset blocks some fence that the
>>> reservation holder is waiting for?
>> Correct, that's why amdgpu and TTM try quite hard to never wait for a
>> fence while a reservation object is locked.
> We might have a fairly huge mismatch of expectations here :-/

What do you mean by that?

>> The only use case I haven't fixed so far is reaping deleted objects during
>> eviction, but fixing that is only a matter of my free time.
> Yeah, this is the hard one.

Actually it isn't so hard; it's just that I haven't had time so far to 
clean it up, and we never hit that issue during our reset testing.

The main missing point is just a bit of functionality in the reservation
object, and Chris and I already have a good idea how to implement that.

> In general the assumption is that dma_fence will get signalled no matter
> what you're doing, assuming the only thing you need is to not block
> interrupts. The i915 gpu reset logic to make that work is a bit of a
> work of art ...

Correct, but I don't understand why that is so hard on i915. Our GPU
scheduler makes all of that rather trivial: fences either signal
correctly or are aborted and marked as erroneous after a timeout.
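
In code the timeout case boils down to something like the following
sketch; my_job and its hw_fence are made up here, the real logic lives in
the GPU scheduler:

#include <linux/dma-fence.h>

static void my_sched_job_timedout(struct my_job *job)
{
	/*
	 * Mark the fence as failed before signaling it, so every
	 * waiter makes progress and can see the error.
	 */
	dma_fence_set_error(job->hw_fence, -ETIME);
	dma_fence_signal(job->hw_fence);
}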

> If we expect amdgpu and i915 to cooperate with shared buffers I guess one
> has to give in. No idea how to do that best.

Again, at least from the amdgpu side I don't see much of an issue with
that. So what exactly do you have in mind here?

>>> We have tons of fun with deadlocks against GPU resets, and loooooots of
>>> testcases, and I kinda get the impression amdgpu is throwing a lot of
>>> issues under the rug through trylock tricks that shut up lockdep, but
>>> don't fix much really.
>> Huh? Why do you think that? The only trylock I'm aware of is during eviction
>> and there it isn't a problem.
> mmap fault handler had one too last time I looked, and it smelled fishy.

Good point, I never fully wrapped my head around that one either.

>>> btw adding cross-release lockdep annotations for fences will probably turn
>>> up _lots_ more bugs in this area.
>> At least for amdgpu that should be handled by now.
> You're sure? :-)

Yes, except for fallback paths and bootup self tests we simply never 
wait for fences while holding locks.

> Trouble is that cross-release was never even enabled, much less anyone
> typed the dma_fence annotations. And just cross-release alone turned up
> _lots_ of deadlocks in i915 between fences, async workers (userptr, gpu
> reset) and core mm stuff.

Yeah, we had lots of fun with the mm locks as well but as far as I know 
Felix and I already fixed all of them.

Christian.

> I'd be seriously surprised if it wouldn't find an entire rat's nest of
> issues around dma_fence once we enable it.
> -Daniel
>
>>>>>> +	 *
>>>>>> +	 * New mappings can be created immediately, but can't be used before the
>>>>>> +	 * exclusive fence in the dma_bufs reservation object is signaled.
>>>>>> +	 */
>>>>>> +	void (*invalidate_mappings)(struct dma_buf_attachment *attach);
>>>>> Bunch of questions about exact semantics, but I very much like this. And I
>>>>> think besides those technical details, the overall approach seems sound.
>>>> Yeah this initial implementation was buggy like hell. Just wanted to confirm
>>>> that the idea is going in the right direction.
>>> I wanted this 7 years ago, idea very much acked :-)
>>>
>> Ok, thanks. Good to know.
>>
>> Christian.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/4] dma-buf: add optional invalidate_mappings callback
@ 2018-03-15  9:20                 ` Daniel Vetter
  0 siblings, 0 replies; 34+ messages in thread
From: Daniel Vetter @ 2018-03-15  9:20 UTC (permalink / raw)
  To: christian.koenig
  Cc: Daniel Vetter, linaro-mm-sig, dri-devel, amd-gfx, linux-media

On Tue, Mar 13, 2018 at 06:20:07PM +0100, Christian König wrote:
> On 13.03.2018 at 17:00, Daniel Vetter wrote:
> > On Tue, Mar 13, 2018 at 04:52:02PM +0100, Christian König wrote:
> > > On 13.03.2018 at 16:17, Daniel Vetter wrote:
> > > [SNIP]
> > Ok, so plan is to support fully pipeline moves and everything, with the
> > old sg tables lazily cleaned up. I was thinking more about evicting stuff
> > and throwing it out, where there's not going to be any new sg list but the
> > object is going to be swapped out.
> 
> Yes, exactly. Well, my example was the unlikely case where the object is
> swapped out and immediately swapped in again because somebody needs it.
> 
> > 
> > I think some state flow charts (we can do SVG or DOT) in the kerneldoc
> > would be sweet.
> 
> Yeah, probably a good idea.
> 
> Sounds good and I find it great that you're volunteering for that :D
> 
> Ok seriously, my drawing capabilities are a bit underdeveloped. So I would
> prefer if somebody could at least help with that.

Take a look at the DOT graphs for atomic I've done a while ago. I think we
could make a formidable competition for who's doing the worst diagrams :-)

> > > > Re GPU might cause a deadlock: Isn't that already a problem if you hold
> > > > reservations of buffers used on other gpus, which want those reservations
> > > > to complete the gpu reset, but that gpu reset blocks some fence that the
> > > > reservation holder is waiting for?
> > > Correct, that's why amdgpu and TTM try quite hard to never wait for a
> > > fence while a reservation object is locked.
> > We might have a fairly huge mismatch of expectations here :-/
> 
> What do you mean with that?

i915 expects that other drivers don't have this requirement. Our gpu reset
can proceed even if it's all locked down.

> > > The only use case I haven't fixed so far is reaping deleted objects during
> > > eviction, but that is only a matter of my free time to fix it.
> > Yeah, this is the hard one.
> 
> Actually it isn't so hard; I just haven't had time to clean it up so far,
> and we never hit that issue during our reset testing.
> 
> The main missing point is just a bit of functionality in the reservation
> object, and Chris and I already have a good idea how to implement that.
> 
> > In general the assumption is that dma_fence will get signalled no matter
> > what you're doing, assuming the only thing you need is to not block
> > interrupts. The i915 gpu reset logic to make that work is a bit of a
> > work of art ...
> 
> Correct, but I don't understand why that is so hard on i915. Our GPU
> scheduler makes all of that rather trivial: fences either signal
> correctly or are aborted and marked as erroneous after a timeout.

Yes, i915 does the same. It's the locking requirement we disagree on: i915
can reset while holding locks. I think right now we don't reset while
holding reservation locks, but only while holding our own locks. I think
cross-release would help us model this and uncover all the funny
dependency loops we have.

The issue I'm seeing:

amdgpu: Expects that you never hold any of the heavyweight locks while
waiting for a fence (since gpu resets will need them).

i915: Happily blocks on fences while holding all kinds of locks, expects
gpu reset to be able to recover even in this case.

Both drivers either complete the fence (with or without setting the error
status to EIO or something like that); that's not the difference. The work
of art I referenced is how we managed to complete gpu reset (including
resubmitting) while holding plenty of locks.
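
Spelled out in code, the pattern amdgpu forbids (and i915 relies on
surviving) is roughly the following; purely illustrative, my_bo is made up
and error handling is omitted:

#include <linux/dma-fence.h>
#include <linux/reservation.h>

static void wait_while_locked(struct my_bo *bo)
{
	struct dma_fence *fence;

	reservation_object_lock(bo->resv, NULL);
	fence = reservation_object_get_excl(bo->resv);

	/*
	 * If this fence can only signal after a gpu reset, and the
	 * reset path needs bo->resv, this deadlocks - exactly the
	 * expectation mismatch described above.
	 */
	dma_fence_wait(fence, false);

	reservation_object_unlock(bo->resv);
}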

> > If we expect amdgpu and i915 to cooperate with shared buffers I guess one
> > has to give in. No idea how to do that best.
> 
> Again, at least from the amdgpu side I don't see much of an issue with
> that. So what exactly do you have in mind here?
> 
> > > > We have tons of fun with deadlocks against GPU resets, and loooooots of
> > > > testcases, and I kinda get the impression amdgpu is throwing a lot of
> > > > issues under the rug through trylock tricks that shut up lockdep, but
> > > > don't fix much really.
> > > Huh? Why do you think that? The only trylock I'm aware of is during eviction
> > > and there it isn't a problem.
> > mmap fault handler had one too last time I looked, and it smelled fishy.
> 
> Good point, I never fully wrapped my head around that one either.
> 
> > > > btw adding cross-release lockdep annotations for fences will probably turn
> > > > up _lots_ more bugs in this area.
> > > At least for amdgpu that should be handled by now.
> > You're sure? :-)
> 
> Yes, except for fallback paths and bootup self tests we simply never wait
> for fences while holding locks.

That's not what I meant by "are you sure". Did you enable the
cross-release stuff (after patching the bunch of leftover core kernel
issues still present), annotate dma_fence with the cross-release stuff,
run a bunch of multi-driver (amdgpu vs i915) dma-buf sharing tests and
weep?

I didn't do the full thing yet, but just within i915 we've found tons of
little deadlocks we never really considered, thanks to cross-release,
and that wasn't even including the dma_fence annotation. Luckily nothing
that needed a full-on driver redesign.

I guess I need to ping core kernel maintainers about cross-release again.
I'd much prefer if we could validate ->invalidate_mapping and the
locking/fence dependency issues using that, instead of me having to read
and understand all the drivers.

> > Trouble is that cross-release was never even enabled, much less anyone
> > typed the dma_fence annotations. And just cross-release alone turned up
> > _lots_ of deadlocks in i915 between fences, async workers (userptr, gpu
> > reset) and core mm stuff.
> 
> Yeah, we had lots of fun with the mm locks as well but as far as I know
> Felix and I already fixed all of them.

Are you sure you mean cross-release fun, and not just normal lockdep fun?
Cross-release is orders of magnitude more nasty imo. And we had a few
discussions with core folks where they told us "no way we're going to
break this dependency on our side", involving a chain of cpu hotplug
(suspend/resume does that to shut down non-boot cpus), worker threads,
userptr, gem locking and core mm. All components required to actually
close the loop.

I fear that with the ->invalidate_mapping callback (which inverts the
control flow between importer and exporter) and tying dma_fences into all
this it will be a _lot_ worse. And I'm definitely too stupid to understand
all the dependency chains without the aid of lockdep and a full test suite
(we have a bunch of amdgpu/i915 dma-buf tests in igt btw).
-Daniel

> 
> Christian.
> 
> > I'd be seriously surprised if it wouldn't find an entire rat's nest of
> > issues around dma_fence once we enable it.
> > -Daniel
> > 
> > > > > > > +	 *
> > > > > > > +	 * New mappings can be created immediately, but can't be used before the
> > > > > > > +	 * exclusive fence in the dma_bufs reservation object is signaled.
> > > > > > > +	 */
> > > > > > > +	void (*invalidate_mappings)(struct dma_buf_attachment *attach);
> > > > > > Bunch of questions about exact semantics, but I very much like this. And I
> > > > > > think besides those technical details, the overall approach seems sound.
> > > > > Yeah this initial implementation was buggy like hell. Just wanted to confirm
> > > > > that the idea is going in the right direction.
> > > > I wanted this 7 years ago, idea very much acked :-)
> > > > 
> > > Ok, thanks. Good to know.
> > > 
> > > Christian.
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/4] dma-buf: add optional invalidate_mappings callback
@ 2018-03-15  9:56                   ` Christian König
  0 siblings, 0 replies; 34+ messages in thread
From: Christian König @ 2018-03-15  9:56 UTC (permalink / raw)
  To: Daniel Vetter, christian.koenig
  Cc: linaro-mm-sig, amd-gfx, dri-devel, linux-media

On 15.03.2018 at 10:20, Daniel Vetter wrote:
> On Tue, Mar 13, 2018 at 06:20:07PM +0100, Christian König wrote:
> [SNIP]
> Take a look at the DOT graphs for atomic I've done a while ago. I think we
> could make a formidable competition for who's doing the worst diagrams :-)

Thanks, going to give that a try.

> [SNIP]
> amdgpu: Expects that you never hold any of the heavyweight locks while
> waiting for a fence (since gpu resets will need them).
>
> i915: Happily blocks on fences while holding all kinds of locks, expects
> gpu reset to be able to recover even in this case.

In this case I can comfort you: the locks amdgpu needs to grab during
GPU reset are the reservation locks of the VM page tables. I have strong
doubts that i915 will ever hold those.

Could be that we run into problems because thread A holds lock 1 and tries
to take lock 2, while i915 holds lock 2 and our reset path needs lock 1.
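
Roughly this situation, with plain mutexes standing in for whatever locks
the two drivers actually use:

#include <linux/mutex.h>

static DEFINE_MUTEX(lock_1);	/* e.g. a reservation our reset path needs */
static DEFINE_MUTEX(lock_2);	/* e.g. some i915 lock */

static void thread_a(void)
{
	mutex_lock(&lock_1);
	mutex_lock(&lock_2);	/* blocks while the other side holds lock 2 */
	mutex_unlock(&lock_2);
	mutex_unlock(&lock_1);
}

static void thread_b(void)
{
	mutex_lock(&lock_2);
	mutex_lock(&lock_1);	/* classic ABBA deadlock with thread_a() */
	mutex_unlock(&lock_1);
	mutex_unlock(&lock_2);
}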

> [SNIP]
>> Yes, except for fallback paths and bootup self tests we simply never wait
>> for fences while holding locks.
> That's not what I meant by "are you sure". Did you enable the
> cross-release stuff (after patching the bunch of leftover core kernel
> issues still present), annotate dma_fence with the cross-release stuff,
> run a bunch of multi-driver (amdgpu vs i915) dma-buf sharing tests and
> weep?

Ok, what exactly do you mean by cross-release checking?

> I didn't do the full thing yet, but just within i915 we've found tons of
> little deadlocks we never really considered, thanks to cross-release,
> and that wasn't even including the dma_fence annotation. Luckily nothing
> that needed a full-on driver redesign.
>
> I guess I need to ping core kernel maintainers about cross-release again.
> I'd much prefer if we could validate ->invalidate_mapping and the
> locking/fence dependency issues using that, instead of me having to read
> and understand all the drivers.
[SNIP]
> I fear that with the ->invalidate_mapping callback (which inverts the
> control flow between importer and exporter) and tying dma_fences into all
> this it will be a _lot_ worse. And I'm definitely too stupid to understand
> all the dependency chains without the aid of lockdep and a full test suite
> (we have a bunch of amdgpu/i915 dma-buf tests in igt btw).

Yes, that is also something I worry about.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/4] dma-buf: add optional invalidate_mappings callback
@ 2018-03-15 11:02                     ` Daniel Vetter
  0 siblings, 0 replies; 34+ messages in thread
From: Daniel Vetter @ 2018-03-15 11:02 UTC (permalink / raw)
  To: Christian König
  Cc: moderated list:DMA BUFFER SHARING FRAMEWORK, amd-gfx list,
	dri-devel, open list:DMA BUFFER SHARING FRAMEWORK

On Thu, Mar 15, 2018 at 10:56 AM, Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
> On 15.03.2018 at 10:20, Daniel Vetter wrote:
>>
>> On Tue, Mar 13, 2018 at 06:20:07PM +0100, Christian König wrote:
>> [SNIP]
>> Take a look at the DOT graphs for atomic I've done a while ago. I think we
>> could make a formidable competition for who's doing the worst diagrams :-)
>
>
> Thanks, going to give that a try.
>
>> [SNIP]
>> amdgpu: Expects that you never hold any of the heavyweight locks while
>> waiting for a fence (since gpu resets will need them).
>>
>> i915: Happily blocks on fences while holding all kinds of locks, expects
>> gpu reset to be able to recover even in this case.
>
>
> In this case I can comfort you: the locks amdgpu needs to grab during GPU
> reset are the reservation locks of the VM page tables. I have strong doubts
> that i915 will ever hold those.

Ah good, means that very likely there's at least no huge fundamental
design issue that we run into.

> Could be that we run into problems because thread A holds lock 1 and tries
> to take lock 2, while i915 holds lock 2 and our reset path needs lock 1.

Yeah that might happen, but lockdep will catch those, and generally
those cases can be fixed with slight reordering or re-annotating of
the code to avoid upsetting lockdep. As long as we don't have a
full-on functional dependency (which is what I've feared).

>> [SNIP]
>>>
>>> Yes, except for fallback paths and bootup self tests we simply never wait
>>> for fences while holding locks.
>>
>> That's not what I meant by "are you sure". Did you enable the
>> cross-release stuff (after patching the bunch of leftover core kernel
>> issues still present), annotate dma_fence with the cross-release stuff,
>> run a bunch of multi-driver (amdgpu vs i915) dma-buf sharing tests and
>> weep?
>
>
> Ok, what exactly do you mean with cross-release checking?

Current lockdep doesn't spot deadlocks like the one below:

thread A: holds mutex, waiting for completion.

thread B: acquires mutex before it will ever signal the completion A
is waiting for

->deadlock
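
As a minimal sketch, nothing driver specific, just the pattern:

#include <linux/completion.h>
#include <linux/mutex.h>

static DEFINE_MUTEX(lock);
static DECLARE_COMPLETION(done);

static void waiter(void)		/* thread A */
{
	mutex_lock(&lock);
	wait_for_completion(&done);	/* waits forever ... */
	mutex_unlock(&lock);
}

static void signaller(void)		/* thread B */
{
	mutex_lock(&lock);		/* ... because we never get the lock */
	complete(&done);
	mutex_unlock(&lock);
}

There is no lock cycle here, so classic lockdep stays silent.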

cross-release lockdep support can catch these through new fancy
annotations. Similar waiter/signaller annotations exist for waiting
on workers and anything else, and it would be a perfect fit for
waiter/signaller code around dma_fence.

lwn has you covered as usual: https://lwn.net/Articles/709849/

Cheers, Daniel

>> I didn't do the full thing yet, but just within i915 we've found tons of
>> little deadlocks we never really considered, thanks to cross-release,
>> and that wasn't even including the dma_fence annotation. Luckily nothing
>> that needed a full-on driver redesign.
>>
>> I guess I need to ping core kernel maintainers about cross-release again.
>> I'd much prefer if we could validate ->invalidate_mapping and the
>> locking/fence dependency issues using that, instead of me having to read
>> and understand all the drivers.
>
> [SNIP]
>>
>> I fear that with the ->invalidate_mapping callback (which inverts the
>> control flow between importer and exporter) and tying dma_fences into all
>> this it will be a _lot_ worse. And I'm definitely too stupid to understand
>> all the dependency chains without the aid of lockdep and a full test suite
>> (we have a bunch of amdgpu/i915 dma-buf tests in igt btw).
>
>
> Yes, that is also something I worry about.
>
> Regards,
> Christian.



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 34+ messages in thread
