* [RFC] Implicit vs explicit user fence sync
@ 2021-05-04 13:27 Christian König
  2021-05-04 13:27 ` [PATCH 01/12] dma-buf: add interface for user fence synchronization Christian König
                   ` (12 more replies)
  0 siblings, 13 replies; 26+ messages in thread
From: Christian König @ 2021-05-04 13:27 UTC (permalink / raw)
  To: dri-devel

Hi guys,

with this patch set I want to look into how much additional work it would be to support implicit sync compared to only explicit sync.

It turned out that this is much simpler than expected, since the only addition is that before a command submission or flip the kernel and classic drivers need to wait for the user fence to signal before taking any locks.

For this prototype the patch set doesn't implement any user fence synchronization at all, but just assumes that faulting in the user pages is sufficient to make sure that we can wait for user space to finish submitting the work. If necessary this can be made even stricter; the only use case I could find which blocks this is the radeon driver, and that should be handleable.
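
To make the idea more concrete, here is a minimal sketch of what such a user fence and the wait for it could look like. All names below are made up for illustration and are not part of this patch set; the only point is that a user fence is assumed to be a memory location plus a value, polled with a timeout, and that reading it can fault in the backing page:

#include <linux/errno.h>
#include <linux/ktime.h>
#include <linux/sched.h>
#include <linux/types.h>
#include <linux/uaccess.h>

struct example_user_fence {
	u64 __user *addr;	/* written by the GPU/user space on completion */
	u64 value;		/* signaled once *addr >= value */
};

static int example_user_fence_wait(struct example_user_fence *fence,
				   ktime_t timeout)
{
	u64 seen;

	do {
		/* get_user() may fault in the backing page. */
		if (get_user(seen, fence->addr))
			return -EFAULT;
		if (seen >= fence->value)
			return 0;
		cond_resched();
	} while (ktime_before(ktime_get(), timeout));

	return -ETIMEDOUT;
}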

This of course doesn't give you the same semantics as classic implicit sync, which guarantees that you have exclusive access to a buffer, but that is also not necessary.

So I think the conclusion should be that we don't need to concentrate on implicit vs. explicit sync, but rather on how to get the synchronization and timeout signalling figured out in general.

Regards,
Christian.


* [PATCH 01/12] dma-buf: add interface for user fence synchronization
  2021-05-04 13:27 [RFC] Implicit vs explicit user fence sync Christian König
@ 2021-05-04 13:27 ` Christian König
  2021-05-04 13:27 ` [PATCH 02/12] RDMA/mlx5: add DMA-buf user fence support Christian König
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Christian König @ 2021-05-04 13:27 UTC (permalink / raw)
  To: dri-devel

This is an RFC/WIP patch which just adds the interface and lockdep
annotation without any actual implementation.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-resv.c | 18 ++++++++++++++++++
 include/linux/dma-resv.h   |  1 +
 2 files changed, 19 insertions(+)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 6ddbeb5dfbf6..e0305424957b 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -681,3 +681,21 @@ bool dma_resv_test_signaled_rcu(struct dma_resv *obj, bool test_all)
 	return ret;
 }
 EXPORT_SYMBOL_GPL(dma_resv_test_signaled_rcu);
+
+/**
+ * dma_resv_sync_user_fence - block for user fences to signal
+ *
+ * @obj: The DMA resv object with the user fence attached
+ *
+ * To make sure we have proper synchronization between accesses, block for
+ * user fences before starting a dma_fence based operation on the buffer.
+ */
+int dma_resv_sync_user_fence(struct dma_resv *obj)
+{
+	might_fault();
+
+	/* TODO: Actually come up with an implementation for this! */
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(dma_resv_sync_user_fence);
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index d44a77e8a7e3..c525a36be900 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -289,5 +289,6 @@ long dma_resv_wait_timeout_rcu(struct dma_resv *obj, bool wait_all, bool intr,
 			       unsigned long timeout);
 
 bool dma_resv_test_signaled_rcu(struct dma_resv *obj, bool test_all);
+int dma_resv_sync_user_fence(struct dma_resv *obj);
 
 #endif /* _LINUX_RESERVATION_H */
-- 
2.25.1
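
A usage sketch for the above (not part of the patch): since
dma_resv_sync_user_fence() is annotated with might_fault() and may fault
in user pages, a driver has to call it before taking any dma_resv/ww_mutex
locks in its submission path. driver_submit_example() below is a made-up
function just to show the intended ordering:

#include <linux/dma-resv.h>

static int driver_submit_example(struct dma_resv *resv)
{
	int ret;

	/* May fault in user pages, so no locks may be held here. */
	ret = dma_resv_sync_user_fence(resv);
	if (ret)
		return ret;

	dma_resv_lock(resv, NULL);
	/* ... validate the BO, attach dma_fences, submit the job ... */
	dma_resv_unlock(resv);

	return 0;
}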

* [PATCH 02/12] RDMA/mlx5: add DMA-buf user fence support
  2021-05-04 13:27 [RFC] Implicit vs explicit user fence sync Christian König
  2021-05-04 13:27 ` [PATCH 01/12] dma-buf: add interface for user fence synchronization Christian König
@ 2021-05-04 13:27 ` Christian König
  2021-05-04 13:27 ` [PATCH 03/12] drm/amdgpu: " Christian König
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Christian König @ 2021-05-04 13:27 UTC (permalink / raw)
  To: dri-devel

Just add the call before taking locks.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/infiniband/hw/mlx5/odp.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index b103555b1f5d..6b4d980c02e8 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -804,6 +804,10 @@ static int pagefault_dmabuf_mr(struct mlx5_ib_mr *mr, size_t bcnt,
 	if (flags & MLX5_PF_FLAGS_ENABLE)
 		xlt_flags |= MLX5_IB_UPD_XLT_ENABLE;
 
+	err = dma_resv_sync_user_fence(umem_dmabuf->attach->dmabuf->resv);
+	if (err)
+		return err;
+
 	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
 	err = ib_umem_dmabuf_map_pages(umem_dmabuf);
 	if (err) {
-- 
2.25.1

* [PATCH 03/12] drm/amdgpu: add DMA-buf user fence support
  2021-05-04 13:27 [RFC] Implicit vs explicit user fence sync Christian König
  2021-05-04 13:27 ` [PATCH 01/12] dma-buf: add interface for user fence synchronization Christian König
  2021-05-04 13:27 ` [PATCH 02/12] RDMA/mlx5: add DMA-buf user fence support Christian König
@ 2021-05-04 13:27 ` Christian König
  2021-05-04 13:27 ` [PATCH 04/12] drm/gem: add DMA-buf user fence support for the atomic helper Christian König
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Christian König @ 2021-05-04 13:27 UTC (permalink / raw)
  To: dri-devel

Just add the call before taking locks.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c      | 7 +++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 6 ++++++
 2 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index b5c766998045..afd58c6d88a4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -534,6 +534,13 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 			return r;
 	}
 
+	/* Sync to user fences */
+	amdgpu_bo_list_for_each_entry(e, p->bo_list) {
+		r = dma_resv_sync_user_fence(e->tv.bo->base.resv);
+		if (r)
+			return r;
+	}
+
 	/* One for TTM and one for the CS job */
 	amdgpu_bo_list_for_each_entry(e, p->bo_list)
 		e->tv.num_shared = 2;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
index 9a2f811450ed..3edd6dbae71f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
@@ -181,6 +181,12 @@ int amdgpu_display_crtc_page_flip_target(struct drm_crtc *crtc,
 	obj = fb->obj[0];
 	new_abo = gem_to_amdgpu_bo(obj);
 
+	r = dma_resv_sync_user_fence(obj->resv);
+	if (unlikely(r)) {
+		DRM_ERROR("failed to wait for user fence before flip\n");
+		goto cleanup;
+	}
+
 	/* pin the new buffer */
 	r = amdgpu_bo_reserve(new_abo, false);
 	if (unlikely(r != 0)) {
-- 
2.25.1

* [PATCH 04/12] drm/gem: add DMA-buf user fence support for the atomic helper
  2021-05-04 13:27 [RFC] Implicit vs explicit user fence sync Christian König
                   ` (2 preceding siblings ...)
  2021-05-04 13:27 ` [PATCH 03/12] drm/amdgpu: " Christian König
@ 2021-05-04 13:27 ` Christian König
  2021-05-04 13:27 ` [PATCH 05/12] drm/etnaviv: add DMA-buf user fence support Christian König
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Christian König @ 2021-05-04 13:27 UTC (permalink / raw)
  To: dri-devel

Just add the call before taking locks.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/drm_gem_atomic_helper.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/drm_gem_atomic_helper.c b/drivers/gpu/drm/drm_gem_atomic_helper.c
index a005c5a0ba46..fe0d18486643 100644
--- a/drivers/gpu/drm/drm_gem_atomic_helper.c
+++ b/drivers/gpu/drm/drm_gem_atomic_helper.c
@@ -142,11 +142,15 @@ int drm_gem_plane_helper_prepare_fb(struct drm_plane *plane, struct drm_plane_st
 {
 	struct drm_gem_object *obj;
 	struct dma_fence *fence;
+	int ret;
 
 	if (!state->fb)
 		return 0;
 
 	obj = drm_gem_fb_get_obj(state->fb, 0);
+	ret = dma_resv_sync_user_fence(obj->resv);
+	if (ret)
+		return ret;
 	fence = dma_resv_get_excl_rcu(obj->resv);
 	drm_atomic_set_fence_for_plane(state, fence);
 
-- 
2.25.1

* [PATCH 05/12] drm/etnaviv: add DMA-buf user fence support
  2021-05-04 13:27 [RFC] Implicit vs explicit user fence sync Christian König
                   ` (3 preceding siblings ...)
  2021-05-04 13:27 ` [PATCH 04/12] drm/gem: add DMA-buf user fence support for the atomic helper Christian König
@ 2021-05-04 13:27 ` Christian König
  2021-05-04 13:27 ` [PATCH 06/12] drm/i915: " Christian König
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Christian König @ 2021-05-04 13:27 UTC (permalink / raw)
  To: dri-devel

Just add the call before taking locks.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 23 ++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
index d05c35994579..2e440674ca5b 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
@@ -108,6 +108,21 @@ static int submit_lookup_objects(struct etnaviv_gem_submit *submit,
 	return ret;
 }
 
+static int submit_sync_user(struct etnaviv_gem_submit *submit)
+{
+	unsigned int i;
+	int ret;
+
+	for (i = 0; i < submit->nr_bos; i++) {
+		struct drm_gem_object *obj = &submit->bos[i].obj->base;
+
+		ret = dma_resv_sync_user_fence(obj->resv);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
 static void submit_unlock_object(struct etnaviv_gem_submit *submit, int i)
 {
 	if (submit->bos[i].flags & BO_LOCKED) {
@@ -518,8 +533,6 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
 		}
 	}
 
-	ww_acquire_init(&ticket, &reservation_ww_class);
-
 	submit = submit_create(dev, gpu, args->nr_bos, args->nr_pmrs);
 	if (!submit) {
 		ret = -ENOMEM;
@@ -541,6 +554,12 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
 	if (ret)
 		goto err_submit_objects;
 
+	ret = submit_sync_user(submit);
+	if (ret)
+		goto err_submit_objects;
+
+	ww_acquire_init(&ticket, &reservation_ww_class);
+
 	if ((priv->mmu_global->version != ETNAVIV_IOMMU_V2) &&
 	    !etnaviv_cmd_validate_one(gpu, stream, args->stream_size / 4,
 				      relocs, args->nr_relocs)) {
-- 
2.25.1

* [PATCH 06/12] drm/i915: add DMA-buf user fence support
  2021-05-04 13:27 [RFC] Implicit vs explicit user fence sync Christian König
                   ` (4 preceding siblings ...)
  2021-05-04 13:27 ` [PATCH 05/12] drm/etnaviv: add DMA-buf user fence support Christian König
@ 2021-05-04 13:27 ` Christian König
  2021-05-04 13:27 ` [PATCH 07/12] drm/lima: " Christian König
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Christian König @ 2021-05-04 13:27 UTC (permalink / raw)
  To: dri-devel

Just add the call before taking locks.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 5964e67c7d36..24c575d762db 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -872,6 +872,12 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
 			goto err;
 		}
 
+		err = dma_resv_sync_user_fence(vma->obj->base.resv);
+		if (unlikely(err)) {
+			i915_vma_put(vma);
+			goto err;
+		}
+
 		eb_add_vma(eb, i, batch, vma);
 
 		if (i915_gem_object_is_userptr(vma->obj)) {
-- 
2.25.1

* [PATCH 07/12] drm/lima: add DMA-buf user fence support
  2021-05-04 13:27 [RFC] Implicit vs explicit user fence sync Christian König
                   ` (5 preceding siblings ...)
  2021-05-04 13:27 ` [PATCH 06/12] drm/i915: " Christian König
@ 2021-05-04 13:27 ` Christian König
  2021-05-04 13:27 ` [PATCH 08/12] drm/msm: " Christian König
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Christian König @ 2021-05-04 13:27 UTC (permalink / raw)
  To: dri-devel

Just add the call before taking locks.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/lima/lima_gem.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/lima/lima_gem.c b/drivers/gpu/drm/lima/lima_gem.c
index de62966243cd..d3d68218568d 100644
--- a/drivers/gpu/drm/lima/lima_gem.c
+++ b/drivers/gpu/drm/lima/lima_gem.c
@@ -321,6 +321,12 @@ int lima_gem_submit(struct drm_file *file, struct lima_submit *submit)
 			goto err_out0;
 		}
 
+		err = dma_resv_sync_user_fence(obj->resv);
+		if (err) {
+			drm_gem_object_put(obj);
+			goto err_out0;
+		}
+
 		bo = to_lima_bo(obj);
 
 		/* increase refcnt of gpu va map to prevent unmapped when executing,
-- 
2.25.1

* [PATCH 08/12] drm/msm: add DMA-buf user fence support
  2021-05-04 13:27 [RFC] Implicit vs explicit user fence sync Christian König
                   ` (6 preceding siblings ...)
  2021-05-04 13:27 ` [PATCH 07/12] drm/lima: " Christian König
@ 2021-05-04 13:27 ` Christian König
  2021-05-04 13:27 ` [PATCH 09/12] drm/nouveau: " Christian König
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Christian König @ 2021-05-04 13:27 UTC (permalink / raw)
  To: dri-devel

Just add the call before taking locks.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/msm/msm_gem_submit.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index 5480852bdeda..a77389ce23d0 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -285,6 +285,20 @@ static int submit_lock_objects(struct msm_gem_submit *submit)
 	return ret;
 }
 
+static int submit_sync_user_fence(struct msm_gem_submit *submit)
+{
+	int i, ret;
+
+	for (i = 0; i < submit->nr_bos; i++) {
+		struct msm_gem_object *msm_obj = submit->bos[i].obj;
+
+		ret = dma_resv_sync_user_fence(msm_obj->base.resv);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
 static int submit_fence_sync(struct msm_gem_submit *submit, bool no_implicit)
 {
 	int i, ret = 0;
@@ -769,6 +783,10 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 	 */
 	pm_runtime_get_sync(&gpu->pdev->dev);
 
+	ret = submit_sync_user_fence(submit);
+	if (ret)
+		goto out;
+
 	/* copy_*_user while holding a ww ticket upsets lockdep */
 	ww_acquire_init(&submit->ticket, &reservation_ww_class);
 	has_ww_ticket = true;
-- 
2.25.1

* [PATCH 09/12] drm/nouveau: add DMA-buf user fence support
  2021-05-04 13:27 [RFC] Implicit vs explicit user fence sync Christian König
                   ` (7 preceding siblings ...)
  2021-05-04 13:27 ` [PATCH 08/12] drm/msm: " Christian König
@ 2021-05-04 13:27 ` Christian König
  2021-05-04 13:27 ` [PATCH 10/12] drm/panfrost: " Christian König
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Christian König @ 2021-05-04 13:27 UTC (permalink / raw)
  To: dri-devel

Just add the call before taking locks.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/nouveau/nouveau_gem.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index a70e82413fa7..e349a8b32549 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -552,6 +552,7 @@ nouveau_gem_pushbuf_validate(struct nouveau_channel *chan,
 			     struct validate_op *op, bool *apply_relocs)
 {
 	struct nouveau_cli *cli = nouveau_cli(file_priv);
+	unsigned int i;
 	int ret;
 
 	INIT_LIST_HEAD(&op->list);
@@ -559,6 +560,18 @@ nouveau_gem_pushbuf_validate(struct nouveau_channel *chan,
 	if (nr_buffers == 0)
 		return 0;
 
+	for (i = 0; i < nr_buffers; i++) {
+		struct drm_nouveau_gem_pushbuf_bo *b = &pbbo[i];
+		struct drm_gem_object *gem;
+
+		gem = drm_gem_object_lookup(file_priv, b->handle);
+		if (!gem)
+			return -ENOENT;
+		ret = dma_resv_sync_user_fence(gem->resv);
+		drm_gem_object_put(gem);
+		if (ret)
+			return ret;
+	}
+
 	ret = validate_init(chan, file_priv, pbbo, nr_buffers, op);
 	if (unlikely(ret)) {
 		if (ret != -ERESTARTSYS)
-- 
2.25.1

* [PATCH 10/12] drm/panfrost: add DMA-buf user fence support
  2021-05-04 13:27 [RFC] Implicit vs explicit user fence sync Christian König
                   ` (8 preceding siblings ...)
  2021-05-04 13:27 ` [PATCH 09/12] drm/nouveau: " Christian König
@ 2021-05-04 13:27 ` Christian König
  2021-05-04 13:27 ` [PATCH 11/12] drm/radeon: " Christian König
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Christian König @ 2021-05-04 13:27 UTC (permalink / raw)
  To: dri-devel

Just add the call before taking locks.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/panfrost/panfrost_job.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 6003cfeb1322..9174ceb1d16d 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -216,6 +216,18 @@ static void panfrost_attach_object_fences(struct drm_gem_object **bos,
 		dma_resv_add_excl_fence(bos[i]->resv, fence);
 }
 
+static int panfrost_sync_user_fences(struct drm_gem_object **bos, int bo_count)
+{
+	int i, ret;
+
+	for (i = 0; i < bo_count; i++) {
+		ret = dma_resv_sync_user_fence(bos[i]->resv);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
 int panfrost_job_push(struct panfrost_job *job)
 {
 	struct panfrost_device *pfdev = job->pfdev;
@@ -224,6 +236,10 @@ int panfrost_job_push(struct panfrost_job *job)
 	struct ww_acquire_ctx acquire_ctx;
 	int ret = 0;
 
+	ret = panfrost_sync_user_fences(job->bos, job->bo_count);
+	if (ret)
+		return ret;
+
 	mutex_lock(&pfdev->sched_lock);
 
 	ret = drm_gem_lock_reservations(job->bos, job->bo_count,
-- 
2.25.1

* [PATCH 11/12] drm/radeon: add DMA-buf user fence support
  2021-05-04 13:27 [RFC] Implicit vs explicit user fence sync Christian König
                   ` (9 preceding siblings ...)
  2021-05-04 13:27 ` [PATCH 10/12] drm/panfrost: " Christian König
@ 2021-05-04 13:27 ` Christian König
  2021-05-04 13:27 ` [PATCH 12/12] drm/v3d: " Christian König
  2021-05-04 14:15 ` [RFC] Implicit vs explicit user fence sync Daniel Vetter
  12 siblings, 0 replies; 26+ messages in thread
From: Christian König @ 2021-05-04 13:27 UTC (permalink / raw)
  To: dri-devel

Just add the call before taking locks.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/radeon/radeon_cs.c      | 6 ++++++
 drivers/gpu/drm/radeon/radeon_display.c | 4 ++++
 2 files changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 059431689c2d..fb0e238535f3 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -189,6 +189,12 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
 				      priority);
 	}
 
+	for (i = 0; i < p->nrelocs; i++) {
+		r = dma_resv_sync_user_fence(p->relocs[i].tv.bo->base.resv);
+		if (r)
+			return r;
+	}
+
 	radeon_cs_buckets_get_list(&buckets, &p->validated);
 
 	if (p->cs_flags & RADEON_CS_USE_VM)
diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
index 652af7a134bd..75ebd2338809 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -519,6 +519,10 @@ static int radeon_crtc_page_flip_target(struct drm_crtc *crtc,
 	DRM_DEBUG_DRIVER("flip-ioctl() cur_rbo = %p, new_rbo = %p\n",
 			 work->old_rbo, new_rbo);
 
+	r = dma_resv_sync_user_fence(new_rbo->tbo.base.resv);
+	if (unlikely(r != 0))
+		goto cleanup;
+
 	r = radeon_bo_reserve(new_rbo, false);
 	if (unlikely(r != 0)) {
 		DRM_ERROR("failed to reserve new rbo buffer before flip\n");
-- 
2.25.1

* [PATCH 12/12] drm/v3d: add DMA-buf user fence support
  2021-05-04 13:27 [RFC] Implicit vs explicit user fence sync Christian König
                   ` (10 preceding siblings ...)
  2021-05-04 13:27 ` [PATCH 11/12] drm/radeon: " Christian König
@ 2021-05-04 13:27 ` Christian König
  2021-05-04 14:15 ` [RFC] Implicit vs explicit user fence sync Daniel Vetter
  12 siblings, 0 replies; 26+ messages in thread
From: Christian König @ 2021-05-04 13:27 UTC (permalink / raw)
  To: dri-devel

Just add the call before taking locks.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/v3d/v3d_gem.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 4eb354226972..7c45292c641c 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -345,6 +345,12 @@ v3d_lookup_bos(struct drm_device *dev,
 	}
 	spin_unlock(&file_priv->table_lock);
 
+	for (i = 0; i < job->bo_count; i++) {
+		ret = dma_resv_sync_user_fence(job->bo[i]->resv);
+		if (ret)
+			break;
+	}
+
 fail:
 	kvfree(handles);
 	return ret;
-- 
2.25.1

* Re: [RFC] Implicit vs explicit user fence sync
  2021-05-04 13:27 [RFC] Implicit vs explicit user fence sync Christian König
                   ` (11 preceding siblings ...)
  2021-05-04 13:27 ` [PATCH 12/12] drm/v3d: " Christian König
@ 2021-05-04 14:15 ` Daniel Vetter
  2021-05-04 14:26   ` Christian König
  12 siblings, 1 reply; 26+ messages in thread
From: Daniel Vetter @ 2021-05-04 14:15 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

Hi Christian,

On Tue, May 04, 2021 at 03:27:17PM +0200, Christian König wrote:
> Hi guys,
> 
> with this patch set I want to look into how much more additional work it
> would be to support implicit sync compared to only explicit sync.
> 
> Turned out that this is much simpler than expected since the only
> addition is that before a command submission or flip the kernel and
> classic drivers would need to wait for the user fence to signal before
> taking any locks.

It's a lot more I think
- sync_file/drm_syncobj still need to be supported somehow
- we need userspace to handle the stall in a submit thread at least
- there's nothing here that sets the sync object
- implicit sync isn't just execbuf, it's everything. E.g. the various
  wait_bo ioctl also need to keep working, including timeout and
  everything
- we can't stall in atomic kms where you're currently stalling, that's for
  sure. The uapi says "we're not stalling for fences in there", and you're
  breaking that.
- ... at this point I stopped pondering but there's definitely more

Imo the only way we'll ever get this complete is if we do the following:
1. roll out implicit sync with userspace fences on a driver-by-driver basis
   1a. including all the winsys/modeset stuff
2. roll out support for userspace fences to drm_syncobj timeline for
   interop, both across process/userspace and across drivers
   2a. including all the winsys/modeset stuff, but hopefully that's
       largely solved with 1. already.
3. only then try to figure out how to retroshoehorn this into implicit
   sync, and whether that even makes sense.

Because doing 3 before we've done 1&2 for at least 2 drivers (2 because
interop fun across drivers) is just praying that this time around we're
not collectively idiots and can correctly predict the future. That never
worked :-)

> For this prototype this patch set doesn't implement any user fence
> synchronization at all, but just assumes that faulting user pages is
> sufficient to make sure that we can wait for user space to finish
> submitting the work. If necessary this can be made even more strict, the
> only use case I could find which blocks this is the radeon driver and
> that should be handle able.
> 
> This of course doesn't give you the same semantic as the classic
> implicit sync to guarantee that you have exclusive access to a buffers,
> but this is also not necessary.
> 
> So I think the conclusion should be that we don't need to concentrate on
> implicit vs. explicit sync, but rather how to get the synchronization
> and timeout signalling figured out in general.

I'm not sure what exactly you're proving here aside from "it's possible to
roll out a function with ill-defined semantics to all drivers". This
really is a lot harder than just this one function and just this one patch
set.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [RFC] Implicit vs explicit user fence sync
  2021-05-04 14:15 ` [RFC] Implicit vs explicit user fence sync Daniel Vetter
@ 2021-05-04 14:26   ` Christian König
  2021-05-04 15:11     ` Daniel Vetter
  0 siblings, 1 reply; 26+ messages in thread
From: Christian König @ 2021-05-04 14:26 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: dri-devel

Hi Daniel,

Am 04.05.21 um 16:15 schrieb Daniel Vetter:
> Hi Christian,
>
> On Tue, May 04, 2021 at 03:27:17PM +0200, Christian König wrote:
>> Hi guys,
>>
>> with this patch set I want to look into how much more additional work it
>> would be to support implicit sync compared to only explicit sync.
>>
>> Turned out that this is much simpler than expected since the only
>> addition is that before a command submission or flip the kernel and
>> classic drivers would need to wait for the user fence to signal before
>> taking any locks.
> It's a lot more I think
> - sync_file/drm_syncobj still need to be supported somehow

You need that with explicit fences as well.

I'm just concentrating on what extra burden implicit sync would get us.

> - we need userspace to handle the stall in a submit thread at least
> - there's nothing here that sets the sync object
> - implicit sync isn't just execbuf, it's everything. E.g. the various
>    wait_bo ioctl also need to keep working, including timeout and
>    everything

Good point, but that should be relatively easy to add as well.

> - we can't stall in atomic kms where you're currently stalling, that's for
>    sure. The uapi says "we're not stalling for fences in there", and you're
>    breaking that.

Again as far as I can see we run into the same problem with explicit sync.

So the question is where could we block for atomic modeset for user 
fences in general?

> - ... at this point I stopped pondering but there's definitely more
>
> Imo the only way we'll even get the complete is if we do the following:
> 1. roll out implicit sync with userspace fences on a driver-by-driver basis
>     1a. including all the winsys/modeset stuff

Completely agree, that's why I've split that up into individual patches.

I'm also fine if drivers can just opt out of user fence based 
synchronization and we return an error from dma_buf_dynamic_attach() if 
some driver says it can't handle that.

> 2. roll out support for userspace fences to drm_syncobj timeline for
>     interop, both across process/userspace and across drivers
>     2a. including all the winsys/modeset stuff, but hopefully that's
>         largely solved with 1. already.

Correct, but again we need this for explicit fencing as well.

> 3. only then try to figure out how to retroshoehorn this into implicit
>     sync, and whether that even makes sense.
>
> Because doing 3 before we've done 1&2 for at least 2 drivers (2 because
> interop fun across drivers) is just praying that this time around we're
> not collectively idiots and can correctly predict the future. That never
> worked :-)
>
>> For this prototype this patch set doesn't implement any user fence
>> synchronization at all, but just assumes that faulting user pages is
>> sufficient to make sure that we can wait for user space to finish
>> submitting the work. If necessary this can be made even more strict, the
>> only use case I could find which blocks this is the radeon driver and
>> that should be handle able.
>>
>> This of course doesn't give you the same semantic as the classic
>> implicit sync to guarantee that you have exclusive access to a buffers,
>> but this is also not necessary.
>>
>> So I think the conclusion should be that we don't need to concentrate on
>> implicit vs. explicit sync, but rather how to get the synchronization
>> and timeout signalling figured out in general.
> I'm not sure what exactly you're proving here aside from "it's possible to
> roll out a function with ill-defined semantics to all drivers". This
> really is a lot harder than just this one function and just this one patch
> set.

No it isn't. The hard part is getting the user sync stuff up in general.

Adding implicit synchronization on top of that is then rather trivial.

Christian.

> -Daniel

* Re: [RFC] Implicit vs explicit user fence sync
  2021-05-04 14:26   ` Christian König
@ 2021-05-04 15:11     ` Daniel Vetter
  2021-05-10 18:12       ` Christian König
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel Vetter @ 2021-05-04 15:11 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

On Tue, May 04, 2021 at 04:26:42PM +0200, Christian König wrote:
> Hi Daniel,
> 
> Am 04.05.21 um 16:15 schrieb Daniel Vetter:
> > Hi Christian,
> > 
> > On Tue, May 04, 2021 at 03:27:17PM +0200, Christian König wrote:
> > > Hi guys,
> > > 
> > > with this patch set I want to look into how much more additional work it
> > > would be to support implicit sync compared to only explicit sync.
> > > 
> > > Turned out that this is much simpler than expected since the only
> > > addition is that before a command submission or flip the kernel and
> > > classic drivers would need to wait for the user fence to signal before
> > > taking any locks.
> > It's a lot more I think
> > - sync_file/drm_syncobj still need to be supported somehow
> 
> You need that with explicit fences as well.
> 
> I'm just concentrating on what extra burden implicit sync would get us.

It's not just implicit sync. Currently the best approach we have for
explicit sync is hiding user fences in drm_syncobj, because for that all
the work with intentional stall points and a userspace submit thread
already exists.

None of this work has been done for sync_file. And looking at how much
work it was to get drm_syncobj going, that will be anything but easy.

> > - we need userspace to handle the stall in a submit thread at least
> > - there's nothing here that sets the sync object
> > - implicit sync isn't just execbuf, it's everything. E.g. the various
> >    wait_bo ioctl also need to keep working, including timeout and
> >    everything
> 
> Good point, but that should be relatively easily to add as well.
> 
> > - we can't stall in atomic kms where you're currently stalling, that's for
> >    sure. The uapi says "we're not stalling for fences in there", and you're
> >    breaking that.
> 
> Again as far as I can see we run into the same problem with explicit sync.
> 
> So the question is where could we block for atomic modeset for user fences
> in general?

Nah, I have an idea. But it only works if userspace is aware, because the
rules are essentially:

- when you supply a userspace in-fence, then you only get a userspace
  out-fence
- mixing in-fences between dma-fence and user fence is ok
- mixing out-fences isn't

And we currently do have a sync_file out-fence. So it's not possible to
support implicit user fences in atomic in a way which doesn't break the
uapi somewhere.
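
Spelled out as (purely illustrative) code - the enum and function below
are made up for this mail and exist nowhere in the kernel:

#include <linux/errno.h>
#include <linux/types.h>

enum example_fence_kind {
	EXAMPLE_DMA_FENCE,
	EXAMPLE_USER_FENCE,
};

static int example_check_out_fence(bool have_user_in_fence,
				   enum example_fence_kind out_kind)
{
	/*
	 * In-fences can be mixed freely: a user in-fence is simply waited
	 * on before the commit is armed. But once a user in-fence is
	 * involved, a dma_fence out-fence would have to wait on it to
	 * signal, so only a user out-fence can be handed back.
	 */
	if (have_user_in_fence && out_kind != EXAMPLE_USER_FENCE)
		return -EINVAL;

	return 0;
}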

Doing the explicit user fence support first will make that very obvious.

And that's just the one ioctl I know is big trouble, I'm sure we'll find
more funny corner cases when we roll out explicit user fencing.

Another one that looks very sketchy right now is buffer sharing between
different userspace drivers, like compute <-> media (if you have some
fancy AI pipeline in your media workload, as an example).

> > - ... at this point I stopped pondering but there's definitely more
> > 
> > Imo the only way we'll even get the complete is if we do the following:
> > 1. roll out implicit sync with userspace fences on a driver-by-driver basis
		s/implicit/explicit/

But I think you got that.

> >     1a. including all the winsys/modeset stuff
> 
> Completely agree, that's why I've split that up into individual patches.
> 
> I'm also fine if drivers can just opt out of user fence based
> synchronization and we return an error from dma_buf_dynamic_attach() if some
> driver says it can't handle that.

Yeah, but that boils down to us just breaking those use-cases, which is
exactly what you're trying to avoid by rolling out implicit user fences, I
think.

> > 2. roll out support for userspace fences to drm_syncobj timeline for
> >     interop, both across process/userspace and across drivers
> >     2a. including all the winsys/modeset stuff, but hopefully that's
> >         largely solved with 1. already.
> 
> Correct, but again we need this for explicit fencing as well.
> 
> > 3. only then try to figure out how to retroshoehorn this into implicit
> >     sync, and whether that even makes sense.
> > 
> > Because doing 3 before we've done 1&2 for at least 2 drivers (2 because
> > interop fun across drivers) is just praying that this time around we're
> > not collectively idiots and can correctly predict the future. That never
> > worked :-)
> > 
> > > For this prototype this patch set doesn't implement any user fence
> > > synchronization at all, but just assumes that faulting user pages is
> > > sufficient to make sure that we can wait for user space to finish
> > > submitting the work. If necessary this can be made even more strict, the
> > > only use case I could find which blocks this is the radeon driver and
> > > that should be handle able.
> > > 
> > > This of course doesn't give you the same semantic as the classic
> > > implicit sync to guarantee that you have exclusive access to a buffers,
> > > but this is also not necessary.
> > > 
> > > So I think the conclusion should be that we don't need to concentrate on
> > > implicit vs. explicit sync, but rather how to get the synchronization
> > > and timeout signalling figured out in general.
> > I'm not sure what exactly you're proving here aside from "it's possible to
> > roll out a function with ill-defined semantics to all drivers". This
> > really is a lot harder than just this one function and just this one patch
> > set.
> 
> No it isn't. The hard part is getting the user sync stuff up in general.
> 
> Adding implicit synchronization on top of that is then rather trivial.

Well that's what I disagree with, since I already see some problems that I
don't think we can overcome (the atomic ioctl is one). And that's with us
only having a fairly theoretical understanding of the overall situation.

Like here at intel we have internal code for compute, and we're starting
to hit some interesting cases with interop with media already, but that's
it. Nothing even close to desktop/winsys/kms, and that's where I expect
all the pain will be.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [RFC] Implicit vs explicit user fence sync
  2021-05-04 15:11     ` Daniel Vetter
@ 2021-05-10 18:12       ` Christian König
  2021-05-11  7:31         ` Daniel Vetter
  0 siblings, 1 reply; 26+ messages in thread
From: Christian König @ 2021-05-10 18:12 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: dri-devel

Am 04.05.21 um 17:11 schrieb Daniel Vetter:
> On Tue, May 04, 2021 at 04:26:42PM +0200, Christian König wrote:
>> Hi Daniel,
>>
>> Am 04.05.21 um 16:15 schrieb Daniel Vetter:
>>> Hi Christian,
>>>
>>> On Tue, May 04, 2021 at 03:27:17PM +0200, Christian König wrote:
>>>> Hi guys,
>>>>
>>>> with this patch set I want to look into how much more additional work it
>>>> would be to support implicit sync compared to only explicit sync.
>>>>
>>>> Turned out that this is much simpler than expected since the only
>>>> addition is that before a command submission or flip the kernel and
>>>> classic drivers would need to wait for the user fence to signal before
>>>> taking any locks.
>>> It's a lot more I think
>>> - sync_file/drm_syncobj still need to be supported somehow
>> You need that with explicit fences as well.
>>
>> I'm just concentrating on what extra burden implicit sync would get us.
> It's not just implicit sync. Currently the best approach we have for
> explicit sync is hiding them in drm_syncobj. Because for that all the work
> with intentional stall points and userspace submit thread already exists.
>
> None of this work has been done for sync_file. And looking at how much
> work it was to get drm_syncobj going, that will be anything but easy.

I don't think we will want this for sync_file in the first place.

>>> - we need userspace to handle the stall in a submit thread at least
>>> - there's nothing here that sets the sync object
>>> - implicit sync isn't just execbuf, it's everything. E.g. the various
>>>     wait_bo ioctl also need to keep working, including timeout and
>>>     everything
>> Good point, but that should be relatively easily to add as well.
>>
>>> - we can't stall in atomic kms where you're currently stalling, that's for
>>>     sure. The uapi says "we're not stalling for fences in there", and you're
>>>     breaking that.
>> Again as far as I can see we run into the same problem with explicit sync.
>>
>> So the question is where could we block for atomic modeset for user fences
>> in general?
> Nah, I have an idea. But it only works if userspace is aware, because the
> rules are essentialyl:
>
> - when you supply a userspace in-fence, then you only get a userspace
>    out-fence
> - mixing in fences between dma-fence and user fence is ok
> - mixing out fences isn't
>
> And we currently do have sync_file out fence. So it's not possible to
> support implicit user fence in atomic in a way which doesn't break the
> uapi somewhere.
>
> Doing the explicit user fence support first will make that very obvious.
>
> And that's just the one ioctl I know is big trouble, I'm sure we'll find
> more funny corner cases when we roll out explicit user fencing.

I think we can just ignore sync_file. As far as I'm concerned that UAPI
is pretty much dead.

What we should support is drm_syncobj, but even that only as an in-fence,
since that's what our hardware supports.

> Anotherone that looks very sketchy right now is buffer sharing between
> different userspace drivers, like compute <-> media (if you have some
> fancy AI pipeline in your media workload, as an example).

Yeah, we are certainly going to get that. But only inside the same 
driver, so not much of a problem.

>
>>> - ... at this point I stopped pondering but there's definitely more
>>>
>>> Imo the only way we'll even get the complete is if we do the following:
>>> 1. roll out implicit sync with userspace fences on a driver-by-driver basis
> 		s/implicit/explicit/
>
> But I think you got that.
>
>>>      1a. including all the winsys/modeset stuff
>> Completely agree, that's why I've split that up into individual patches.
>>
>> I'm also fine if drivers can just opt out of user fence based
>> synchronization and we return an error from dma_buf_dynamic_attach() if some
>> driver says it can't handle that.
> Yeah, but that boils down to us just breaking those use-cases. Which is
> exactly what you're trying to avoid by rolling out implicit user fence I
> think.

But we can add support to all drivers as necessary.

>
>>> 2. roll out support for userspace fences to drm_syncobj timeline for
>>>      interop, both across process/userspace and across drivers
>>>      2a. including all the winsys/modeset stuff, but hopefully that's
>>>          largely solved with 1. already.
>> Correct, but again we need this for explicit fencing as well.
>>
>>> 3. only then try to figure out how to retroshoehorn this into implicit
>>>      sync, and whether that even makes sense.
>>>
>>> Because doing 3 before we've done 1&2 for at least 2 drivers (2 because
>>> interop fun across drivers) is just praying that this time around we're
>>> not collectively idiots and can correctly predict the future. That never
>>> worked :-)
>>>
>>>> For this prototype this patch set doesn't implement any user fence
>>>> synchronization at all, but just assumes that faulting user pages is
>>>> sufficient to make sure that we can wait for user space to finish
>>>> submitting the work. If necessary this can be made even more strict, the
>>>> only use case I could find which blocks this is the radeon driver and
>>>> that should be handle able.
>>>>
>>>> This of course doesn't give you the same semantic as the classic
>>>> implicit sync to guarantee that you have exclusive access to a buffers,
>>>> but this is also not necessary.
>>>>
>>>> So I think the conclusion should be that we don't need to concentrate on
>>>> implicit vs. explicit sync, but rather how to get the synchronization
>>>> and timeout signalling figured out in general.
>>> I'm not sure what exactly you're proving here aside from "it's possible to
>>> roll out a function with ill-defined semantics to all drivers". This
>>> really is a lot harder than just this one function and just this one patch
>>> set.
>> No it isn't. The hard part is getting the user sync stuff up in general.
>>
>> Adding implicit synchronization on top of that is then rather trivial.
> Well that's what I disagree with, since I already see some problems that I
> don't think we can overcome (the atomic ioctl is one). And that's with us
> only having a fairly theoretical understanding of the overall situation.

But how should we then ever support user fences with the atomic IOCTL?

We can't wait in user space since that will disable the support for 
waiting in the hardware.

Regards,
Christian.

>
> Like here at intel we have internal code for compute, and we're starting
> to hit some interesting cases with interop with media already, but that's
> it. Nothing even close to desktop/winsys/kms, and that's where I expect
> will all the pain be at.
>
> Cheers, Daniel


* Re: [RFC] Implicit vs explicit user fence sync
  2021-05-10 18:12       ` Christian König
@ 2021-05-11  7:31         ` Daniel Vetter
  2021-05-11  7:47           ` Christian König
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel Vetter @ 2021-05-11  7:31 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

On Mon, May 10, 2021 at 08:12:31PM +0200, Christian König wrote:
> Am 04.05.21 um 17:11 schrieb Daniel Vetter:
> > On Tue, May 04, 2021 at 04:26:42PM +0200, Christian König wrote:
> > > Hi Daniel,
> > > 
> > > Am 04.05.21 um 16:15 schrieb Daniel Vetter:
> > > > Hi Christian,
> > > > 
> > > > On Tue, May 04, 2021 at 03:27:17PM +0200, Christian König wrote:
> > > > > Hi guys,
> > > > > 
> > > > > with this patch set I want to look into how much more additional work it
> > > > > would be to support implicit sync compared to only explicit sync.
> > > > > 
> > > > > Turned out that this is much simpler than expected since the only
> > > > > addition is that before a command submission or flip the kernel and
> > > > > classic drivers would need to wait for the user fence to signal before
> > > > > taking any locks.
> > > > It's a lot more I think
> > > > - sync_file/drm_syncobj still need to be supported somehow
> > > You need that with explicit fences as well.
> > > 
> > > I'm just concentrating on what extra burden implicit sync would get us.
> > It's not just implicit sync. Currently the best approach we have for
> > explicit sync is hiding them in drm_syncobj. Because for that all the work
> > with intentional stall points and userspace submit thread already exists.
> > 
> > None of this work has been done for sync_file. And looking at how much
> > work it was to get drm_syncobj going, that will be anything but easy.
> 
> I don't think we will want this for sync_file in the first place.
> 
> > > > - we need userspace to handle the stall in a submit thread at least
> > > > - there's nothing here that sets the sync object
> > > > - implicit sync isn't just execbuf, it's everything. E.g. the various
> > > >     wait_bo ioctl also need to keep working, including timeout and
> > > >     everything
> > > Good point, but that should be relatively easily to add as well.
> > > 
> > > > - we can't stall in atomic kms where you're currently stalling, that's for
> > > >     sure. The uapi says "we're not stalling for fences in there", and you're
> > > >     breaking that.
> > > Again as far as I can see we run into the same problem with explicit sync.
> > > 
> > > So the question is where could we block for atomic modeset for user fences
> > > in general?
> > Nah, I have an idea. But it only works if userspace is aware, because the
> > rules are essentialyl:
> > 
> > - when you supply a userspace in-fence, then you only get a userspace
> >    out-fence
> > - mixing in fences between dma-fence and user fence is ok
> > - mixing out fences isn't
> > 
> > And we currently do have sync_file out fence. So it's not possible to
> > support implicit user fence in atomic in a way which doesn't break the
> > uapi somewhere.
> > 
> > Doing the explicit user fence support first will make that very obvious.
> > 
> > And that's just the one ioctl I know is big trouble, I'm sure we'll find
> > more funny corner cases when we roll out explicit user fencing.
> 
> I think we can just ignore sync_file. As far as it concerns me that UAPI is
> pretty much dead.

Uh that's rather bold. Android is built on it. Currently atomic kms is
built on it.

> What we should support is drm_syncobj, but that also only as an in-fence
> since that's what our hardware supports.

Convince Android folks, minimally. Probably a lot more. Yes with hindsight
we should have just gone for drm_syncobj instead of the sync_file thing,
but hindsight and all that.

This is kinda why I don't think trying to support the existing uapi with
userspace fences underneath with some magic tricks is a good idea. It's
just a pile of work, plus it's not really architecturally clean.

> > Anotherone that looks very sketchy right now is buffer sharing between
> > different userspace drivers, like compute <-> media (if you have some
> > fancy AI pipeline in your media workload, as an example).
> 
> Yeah, we are certainly going to get that. But only inside the same driver,
> so not much of a problem.

Why is this not much of a problem if it's just within one driver?

> > > > - ... at this point I stopped pondering but there's definitely more
> > > > 
> > > > Imo the only way we'll even get the complete is if we do the following:
> > > > 1. roll out implicit sync with userspace fences on a driver-by-driver basis
> > 		s/implicit/explicit/
> > 
> > But I think you got that.
> > 
> > > >      1a. including all the winsys/modeset stuff
> > > Completely agree, that's why I've split that up into individual patches.
> > > 
> > > I'm also fine if drivers can just opt out of user fence based
> > > synchronization and we return an error from dma_buf_dynamic_attach() if some
> > > driver says it can't handle that.
> > Yeah, but that boils down to us just breaking those use-cases. Which is
> > exactly what you're trying to avoid by rolling out implicit user fence I
> > think.
> 
> But we can add support to all drivers as necessary.
> 
> > 
> > > > 2. roll out support for userspace fences to drm_syncobj timeline for
> > > >      interop, both across process/userspace and across drivers
> > > >      2a. including all the winsys/modeset stuff, but hopefully that's
> > > >          largely solved with 1. already.
> > > Correct, but again we need this for explicit fencing as well.
> > > 
> > > > 3. only then try to figure out how to retroshoehorn this into implicit
> > > >      sync, and whether that even makes sense.
> > > > 
> > > > Because doing 3 before we've done 1&2 for at least 2 drivers (2 because
> > > > interop fun across drivers) is just praying that this time around we're
> > > > not collectively idiots and can correctly predict the future. That never
> > > > worked :-)
> > > > 
> > > > > For this prototype this patch set doesn't implement any user fence
> > > > > synchronization at all, but just assumes that faulting user pages is
> > > > > sufficient to make sure that we can wait for user space to finish
> > > > > submitting the work. If necessary this can be made even more strict, the
> > > > > only use case I could find which blocks this is the radeon driver and
> > > > > that should be handle able.
> > > > > 
> > > > > This of course doesn't give you the same semantic as the classic
> > > > > implicit sync to guarantee that you have exclusive access to a buffers,
> > > > > but this is also not necessary.
> > > > > 
> > > > > So I think the conclusion should be that we don't need to concentrate on
> > > > > implicit vs. explicit sync, but rather how to get the synchronization
> > > > > and timeout signalling figured out in general.
> > > > I'm not sure what exactly you're proving here aside from "it's possible to
> > > > roll out a function with ill-defined semantics to all drivers". This
> > > > really is a lot harder than just this one function and just this one patch
> > > > set.
> > > No it isn't. The hard part is getting the user sync stuff up in general.
> > > 
> > > Adding implicit synchronization on top of that is then rather trivial.
> > Well that's what I disagree with, since I already see some problems that I
> > don't think we can overcome (the atomic ioctl is one). And that's with us
> > only having a fairly theoretical understanding of the overall situation.
> 
> But how should we then ever support user fences with the atomic IOCTL?
> 
> We can't wait in user space since that will disable the support for waiting
> in the hardware.

Well, figure it out :-)

This is exactly why I'm not seeing anything solved with just rolling out a
function call to a bunch of places, because it pretends all things are
solved when clearly that's not the case.

I really think what we need is to first figure out how to support
userspace fences as explicit entities across the stack, maybe with
something like this order:
1. enable them purely within a single userspace driver (like vk with
winsys disabled, or something else like that except not amd because
there's this amdkfd split for "real" compute)
1a. including atomic ioctl, e.g. for vk direct display support this can be
used without cross-process sharing, new winsys protocols and all that fun
2. figure out how to transport these userspace fences with something like
drm_syncobj
2a. figure out the compat story for drivers which don't do userspace fences
2b. figure out how to absorb the overhead if the winsys/compositor doesn't
support explicit sync
3. maybe figure out how to make this all happen magically with implicit
sync, if we really, really care

If we do 3 before we've nailed all these problems, we're just guaranteeing
we'll get the wrong solutions and so we'll then have 3 ways of doing
userspace fences
- the butchered implicit one that didn't quite work
- the explicit one
- the not-so-butchered implicit one with the lessons from the properly
  done explicit one

The thing is, if you have no idea how to integrate userspace fences
explicitly into atomic ioctl, then you definitely have no idea how to do
it implicitly :-)

And "just block" might be good enough for a quick demo, but it still
breaks the contract. The same holds for a bunch of the winsys problems
we'll have to deal with here.
-Daniel

> Regards,
> Christian.
> 
> > 
> > Like here at intel we have internal code for compute, and we're starting
> > to hit some interesting cases with interop with media already, but that's
> > it. Nothing even close to desktop/winsys/kms, and that's where I expect
> > will all the pain be at.
> > 
> > Cheers, Daniel
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [RFC] Implicit vs explicit user fence sync
  2021-05-11  7:31         ` Daniel Vetter
@ 2021-05-11  7:47           ` Christian König
  2021-05-11 14:23             ` Daniel Vetter
  0 siblings, 1 reply; 26+ messages in thread
From: Christian König @ 2021-05-11  7:47 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: dri-devel

Am 11.05.21 um 09:31 schrieb Daniel Vetter:
> [SNIP]
>>> And that's just the one ioctl I know is big trouble, I'm sure we'll find
>>> more funny corner cases when we roll out explicit user fencing.
>> I think we can just ignore sync_file. As far as it concerns me that UAPI is
>> pretty much dead.
> Uh that's rather bold. Android is built on it. Currently atomic kms is
> built on it.

To be honest I don't think we care about Android at all.

>> What we should support is drm_syncobj, but that also only as an in-fence
>> since that's what our hardware supports.
> Convince Android folks, minimally. Probably a lot more. Yes with hindsight
> we should have just gone for drm_syncobj instead of the sync_file thing,
> but hindsight and all that.
>
> This is kinda why I don't think trying to support the existing uapi with
> userspace fences underneath with some magic tricks is a good idea. It's
> just a pile of work, plus it's not really architecturally clean.
>
>>> Anotherone that looks very sketchy right now is buffer sharing between
>>> different userspace drivers, like compute <-> media (if you have some
>>> fancy AI pipeline in your media workload, as an example).
>> Yeah, we are certainly going to get that. But only inside the same driver,
>> so not much of a problem.
> Why is this not much of a problem if it's just within one driver?

Because inside the same driver I can easily add the waits before 
submitting the MM work as necessary.
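
To make that concrete, here is a minimal sketch of what such a wait could
look like. None of this is existing amdgpu code; the structure and helper
names are made up for illustration. The point is just that a user fence is
a memory location plus a value to wait for, so the driver can poll it
(faulting the page in if needed) before it queues the dependent MM job:

#include <linux/types.h>
#include <linux/uaccess.h>
#include <linux/delay.h>
#include <linux/jiffies.h>
#include <linux/errno.h>

/* Hypothetical representation of a user fence, not an existing structure. */
struct user_fence {
	u64 __user *addr;	/* location the producer writes when done */
	u64 wait_value;		/* job may run once *addr >= wait_value */
};

/* Polling wait for illustration; a real driver would sleep on an interrupt. */
static int wait_user_fence(const struct user_fence *uf, unsigned long timeout)
{
	u64 value;

	do {
		if (get_user(value, uf->addr))
			return -EFAULT;	/* faults the page in as a side effect */
		if (value >= uf->wait_value)
			return 0;	/* producer has signalled */
		usleep_range(10, 100);
	} while (time_before(jiffies, timeout));

	return -ETIMEDOUT;
}

The MM submission path would call something like this right before it
queues the dependent job.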

>>>> Adding implicit synchronization on top of that is then rather trivial.
>>> Well that's what I disagree with, since I already see some problems that I
>>> don't think we can overcome (the atomic ioctl is one). And that's with us
>>> only having a fairly theoretical understanding of the overall situation.
>> But how should we then ever support user fences with the atomic IOCTL?
>>
>> We can't wait in user space since that will disable the support for waiting
>> in the hardware.
> Well, figure it out :-)
>
> This is exactly why I'm not seeing anything solved with just rolling a
> function call to a bunch of places, because it's pretending all things are
> solved when clearly that's not the case.
>
> I really think what we need is to first figure out how to support
> userspace fences as explicit entities across the stack, maybe with
> something like this order:
> 1. enable them purely within a single userspace driver (like vk with
> winsys disabled, or something else like that except not amd because
> there's this amdkfd split for "real" compute)
> 1a. including atomic ioctl, e.g. for vk direct display support this can be
> used without cross-process sharing, new winsys protocols and all that fun
> 2. figure out how to transport these userspace fences with something like
> drm_syncobj
> 2a. figure out the compat story for drivers which dont do userspace fences
> 2b. figure out how to absorb the overhead if the winsys/compositor doesn't
> support explicit sync
> 3. maybe figure out how to make this all happen magically with implicit
> sync, if we really, really care
>
> If we do 3 before we've nailed all these problems, we're just guaranteeing
> we'll get the wrong solutions and so we'll then have 3 ways of doing
> userspace fences
> - the butchered implicit one that didn't quite work
> - the explicit one
> - the not-so-butchered implicit one with the lessons from the properly
>    done explicit one
>
> The thing is, if you have no idea how to integrate userspace fences
> explicitly into atomic ioctl, then you definitely have no idea how to do
> it implicitly :-)

Well I agree on that. But the question is still how would you do 
explicit with atomic?

Transporting fences between processes is not the fundamental problem 
here, but rather the question how we represent all this in the kernel?

In other words I think what you outlined above is just approaching it 
from the wrong side again. Instead of looking at what the kernel needs to 
support this, you take a look at userspace and the requirements there.

Regards,
Christian.

>
> And "just block" might be good enough for a quick demo, it still breaks
> the contract. Same holds for a bunch of the winsys problems we'll have to
> deal with here.
> -Daniel
>
>> Regards,
>> Christian.
>>
>>> Like here at intel we have internal code for compute, and we're starting
>>> to hit some interesting cases with interop with media already, but that's
>>> it. Nothing even close to desktop/winsys/kms, and that's where I expect
>>> will all the pain be at.
>>>
>>> Cheers, Daniel


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC] Implicit vs explicit user fence sync
  2021-05-11  7:47           ` Christian König
@ 2021-05-11 14:23             ` Daniel Vetter
  2021-05-11 15:32               ` Christian König
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel Vetter @ 2021-05-11 14:23 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

On Tue, May 11, 2021 at 09:47:56AM +0200, Christian König wrote:
> Am 11.05.21 um 09:31 schrieb Daniel Vetter:
> > [SNIP]
> > > > And that's just the one ioctl I know is big trouble, I'm sure we'll find
> > > > more funny corner cases when we roll out explicit user fencing.
> > > I think we can just ignore sync_file. As far as it concerns me that UAPI is
> > > pretty much dead.
> > Uh that's rather bold. Android is built on it. Currently atomic kms is
> > built on it.
> 
> To be honest I don't think we care about Android at all.

we = amd or we = upstream here?

> > > What we should support is drm_syncobj, but that also only as an in-fence
> > > since that's what our hardware supports.
> > Convince Android folks, minimally. Probably a lot more. Yes with hindsight
> > we should have just gone for drm_syncobj instead of the sync_file thing,
> > but hindsight and all that.
> > 
> > This is kinda why I don't think trying to support the existing uapi with
> > userspace fences underneath with some magic tricks is a good idea. It's
> > just a pile of work, plus it's not really architecturally clean.
> > 
> > > > Anotherone that looks very sketchy right now is buffer sharing between
> > > > different userspace drivers, like compute <-> media (if you have some
> > > > fancy AI pipeline in your media workload, as an example).
> > > Yeah, we are certainly going to get that. But only inside the same driver,
> > > so not much of a problem.
> > Why is this not much of a problem if it's just within one driver?
> 
> Because inside the same driver I can easily add the waits before submitting
> the MM work as necessary.

What is MM work here now?

> > > > > Adding implicit synchronization on top of that is then rather trivial.
> > > > Well that's what I disagree with, since I already see some problems that I
> > > > don't think we can overcome (the atomic ioctl is one). And that's with us
> > > > only having a fairly theoretical understanding of the overall situation.
> > > But how should we then ever support user fences with the atomic IOCTL?
> > > 
> > > We can't wait in user space since that will disable the support for waiting
> > > in the hardware.
> > Well, figure it out :-)
> > 
> > This is exactly why I'm not seeing anything solved with just rolling a
> > function call to a bunch of places, because it's pretending all things are
> > solved when clearly that's not the case.
> > 
> > I really think what we need is to first figure out how to support
> > userspace fences as explicit entities across the stack, maybe with
> > something like this order:
> > 1. enable them purely within a single userspace driver (like vk with
> > winsys disabled, or something else like that except not amd because
> > there's this amdkfd split for "real" compute)
> > 1a. including atomic ioctl, e.g. for vk direct display support this can be
> > used without cross-process sharing, new winsys protocols and all that fun
> > 2. figure out how to transport these userspace fences with something like
> > drm_syncobj
> > 2a. figure out the compat story for drivers which dont do userspace fences
> > 2b. figure out how to absorb the overhead if the winsys/compositor doesn't
> > support explicit sync
> > 3. maybe figure out how to make this all happen magically with implicit
> > sync, if we really, really care
> > 
> > If we do 3 before we've nailed all these problems, we're just guaranteeing
> > we'll get the wrong solutions and so we'll then have 3 ways of doing
> > userspace fences
> > - the butchered implicit one that didn't quite work
> > - the explicit one
> > - the not-so-butchered implicit one with the lessons from the properly
> >    done explicit one
> > 
> > The thing is, if you have no idea how to integrate userspace fences
> > explicitly into atomic ioctl, then you definitely have no idea how to do
> > it implicitly :-)
> 
> Well I agree on that. But the question is still how would you do explicit
> with atomic?

If you supply a userspace fence (is that what we call them now) as
in-fence, then you're only allowed to get a userspace fence as out-fence.
That way we
- don't block anywhere we shouldn't
- don't create a dma_fence out of a userspace fence

The problem is this completely breaks your "magically make implicit
fencing with userspace fences" plan.
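
Spelled out in code form, with invented names (this is not the actual
atomic ioctl code), the rule above is roughly:

#include <linux/errno.h>

/* Hypothetical classification, just to illustrate the constraint. */
enum fence_kind {
	KERNEL_DMA_FENCE,	/* signalled by the kernel in bounded time */
	USERSPACE_FENCE,	/* signalled by userspace, possibly never */
};

struct commit_fences {
	enum fence_kind in;
	enum fence_kind out;
};

static int validate_commit_fences(const struct commit_fences *f)
{
	/*
	 * A userspace fence as input may never signal, so the kernel
	 * cannot promise a dma_fence as output for the same commit
	 * without stalling somewhere it must not.
	 */
	if (f->in == USERSPACE_FENCE && f->out == KERNEL_DMA_FENCE)
		return -EINVAL;

	return 0;
}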

So I have a plan here, what was yours?

> Transporting fences between processes is not the fundamental problem here,
> but rather the question how we represent all this in the kernel?
> 
> In other words I think what you outlined above is just approaching it from
> the wrong side again. Instead of looking what the kernel needs to support
> this you take a look at userspace and the requirements there.

Uh ... that was my idea here? That's why I put "build userspace fences in
userspace only" as the very first thing. Then extend to winsys and
atomic/display and all these cases where things get more tricky.

I agree that transporting the fences is easy, which is why it's not
interesting trying to solve that problem first. Which is kinda what you're
trying to do here by adding implicit userspace fences (well not even that,
just a bunch of function calls without any semantics attached to them).

So if there's more here, you need to flesh it out more or I just don't get
what you're actually trying to demonstrate.
-Daniel

> 
> Regards,
> Christian.
> 
> > 
> > And "just block" might be good enough for a quick demo, it still breaks
> > the contract. Same holds for a bunch of the winsys problems we'll have to
> > deal with here.
> > -Daniel
> > 
> > > Regards,
> > > Christian.
> > > 
> > > > Like here at intel we have internal code for compute, and we're starting
> > > > to hit some interesting cases with interop with media already, but that's
> > > > it. Nothing even close to desktop/winsys/kms, and that's where I expect
> > > > will all the pain be at.
> > > > 
> > > > Cheers, Daniel
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC] Implicit vs explicit user fence sync
  2021-05-11 14:23             ` Daniel Vetter
@ 2021-05-11 15:32               ` Christian König
  2021-05-11 16:48                 ` Daniel Vetter
  0 siblings, 1 reply; 26+ messages in thread
From: Christian König @ 2021-05-11 15:32 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: dri-devel

Am 11.05.21 um 16:23 schrieb Daniel Vetter:
> On Tue, May 11, 2021 at 09:47:56AM +0200, Christian König wrote:
>> Am 11.05.21 um 09:31 schrieb Daniel Vetter:
>>> [SNIP]
>>>>> And that's just the one ioctl I know is big trouble, I'm sure we'll find
>>>>> more funny corner cases when we roll out explicit user fencing.
>>>> I think we can just ignore sync_file. As far as it concerns me that UAPI is
>>>> pretty much dead.
>>> Uh that's rather bold. Android is built on it. Currently atomic kms is
>>> built on it.
>> To be honest I don't think we care about Android at all.
> we = amd or we = upstream here?

we = amd, for everybody else that is certainly a different topic.

But for now AMD is the only one running into this problem.

Could be that Nouveau sees this as well with the next hw generation, but 
who knows?

>>> Why is this not much of a problem if it's just within one driver?
>> Because inside the same driver I can easily add the waits before submitting
>> the MM work as necessary.
> What is MM work here now?

MM=multimedia, e.g. UVD, VCE, VCN engines on AMD hardware.

>>>>>> Adding implicit synchronization on top of that is then rather trivial.
>>>>> Well that's what I disagree with, since I already see some problems that I
>>>>> don't think we can overcome (the atomic ioctl is one). And that's with us
>>>>> only having a fairly theoretical understanding of the overall situation.
>>>> But how should we then ever support user fences with the atomic IOCTL?
>>>>
>>>> We can't wait in user space since that will disable the support for waiting
>>>> in the hardware.
>>> Well, figure it out :-)
>>>
>>> This is exactly why I'm not seeing anything solved with just rolling a
>>> function call to a bunch of places, because it's pretending all things are
>>> solved when clearly that's not the case.
>>>
>>> I really think what we need is to first figure out how to support
>>> userspace fences as explicit entities across the stack, maybe with
>>> something like this order:
>>> 1. enable them purely within a single userspace driver (like vk with
>>> winsys disabled, or something else like that except not amd because
>>> there's this amdkfd split for "real" compute)
>>> 1a. including atomic ioctl, e.g. for vk direct display support this can be
>>> used without cross-process sharing, new winsys protocols and all that fun
>>> 2. figure out how to transport these userspace fences with something like
>>> drm_syncobj
>>> 2a. figure out the compat story for drivers which dont do userspace fences
>>> 2b. figure out how to absorb the overhead if the winsys/compositor doesn't
>>> support explicit sync
>>> 3. maybe figure out how to make this all happen magically with implicit
>>> sync, if we really, really care
>>>
>>> If we do 3 before we've nailed all these problems, we're just guaranteeing
>>> we'll get the wrong solutions and so we'll then have 3 ways of doing
>>> userspace fences
>>> - the butchered implicit one that didn't quite work
>>> - the explicit one
>>> - the not-so-butchered implicit one with the lessons from the properly
>>>     done explicit one
>>>
>>> The thing is, if you have no idea how to integrate userspace fences
>>> explicitly into atomic ioctl, then you definitely have no idea how to do
>>> it implicitly :-)
>> Well I agree on that. But the question is still how would you do explicit
>> with atomic?
> If you supply an userpace fence (is that what we call them now) as
> in-fence, then your only allowed to get a userspace fence as out-fence.

Yeah, that part makes perfect sense. But I don't see the problem with 
that?

> That way we
> - don't block anywhere we shouldn't
> - don't create a dma_fence out of a userspace fence
>
> The problem is this completely breaks your "magically make implicit
> fencing with userspace fences" plan.

Why?

> So I have a plan here, what was yours?

As far as I see that should still work perfectly fine and I have the 
strong feeling I'm missing something here.

>> Transporting fences between processes is not the fundamental problem here,
>> but rather the question how we represent all this in the kernel?
>>
>> In other words I think what you outlined above is just approaching it from
>> the wrong side again. Instead of looking what the kernel needs to support
>> this you take a look at userspace and the requirements there.
> Uh ... that was my idea here? That's why I put "build userspace fences in
> userspace only" as the very first thing. Then extend to winsys and
> atomic/display and all these cases where things get more tricky.
>
> I agree that transporting the fences is easy, which is why it's not
> interesting trying to solve that problem first. Which is kinda what you're
> trying to do here by adding implicit userspace fences (well not even that,
> just a bunch of function calls without any semantics attached to them).
>
> So if there's more here, you need to flesh it out more or I just dont get
> what you're actually trying to demonstrate.

Well I'm trying to figure out why you see it as such a problem to keep 
implicit sync around.

As far as I can tell it is completely orthogonal whether we use 
implicit/explicit and dma_fence/user_fence.

It's just a different implementation inside the kernel.

Christian.

> -Daniel
>
>> Regards,
>> Christian.
>>
>>> And "just block" might be good enough for a quick demo, it still breaks
>>> the contract. Same holds for a bunch of the winsys problems we'll have to
>>> deal with here.
>>> -Daniel
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Like here at intel we have internal code for compute, and we're starting
>>>>> to hit some interesting cases with interop with media already, but that's
>>>>> it. Nothing even close to desktop/winsys/kms, and that's where I expect
>>>>> will all the pain be at.
>>>>>
>>>>> Cheers, Daniel


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC] Implicit vs explicit user fence sync
  2021-05-11 15:32               ` Christian König
@ 2021-05-11 16:48                 ` Daniel Vetter
  2021-05-11 19:34                   ` Christian König
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel Vetter @ 2021-05-11 16:48 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

On Tue, May 11, 2021 at 05:32:29PM +0200, Christian König wrote:
> Am 11.05.21 um 16:23 schrieb Daniel Vetter:
> > On Tue, May 11, 2021 at 09:47:56AM +0200, Christian König wrote:
> > > Am 11.05.21 um 09:31 schrieb Daniel Vetter:
> > > > [SNIP]
> > > > > > And that's just the one ioctl I know is big trouble, I'm sure we'll find
> > > > > > more funny corner cases when we roll out explicit user fencing.
> > > > > I think we can just ignore sync_file. As far as it concerns me that UAPI is
> > > > > pretty much dead.
> > > > Uh that's rather bold. Android is built on it. Currently atomic kms is
> > > > built on it.
> > > To be honest I don't think we care about Android at all.
> > we = amd or we = upstream here?
> 
> we = amd, for everybody else that is certainly a different topic.
> 
> But for now AMD is the only one running into this problem.
> 
> Could be that Nouveau sees this as well with the next hw generation, but who
> knows?
> 
> > > > Why is this not much of a problem if it's just within one driver?
> > > Because inside the same driver I can easily add the waits before submitting
> > > the MM work as necessary.
> > What is MM work here now?
> 
> MM=multimedia, e.g. UVD, VCE, VCN engines on AMD hardware.
> 
> > > > > > > Adding implicit synchronization on top of that is then rather trivial.
> > > > > > Well that's what I disagree with, since I already see some problems that I
> > > > > > don't think we can overcome (the atomic ioctl is one). And that's with us
> > > > > > only having a fairly theoretical understanding of the overall situation.
> > > > > But how should we then ever support user fences with the atomic IOCTL?
> > > > > 
> > > > > We can't wait in user space since that will disable the support for waiting
> > > > > in the hardware.
> > > > Well, figure it out :-)
> > > > 
> > > > This is exactly why I'm not seeing anything solved with just rolling a
> > > > function call to a bunch of places, because it's pretending all things are
> > > > solved when clearly that's not the case.
> > > > 
> > > > I really think what we need is to first figure out how to support
> > > > userspace fences as explicit entities across the stack, maybe with
> > > > something like this order:
> > > > 1. enable them purely within a single userspace driver (like vk with
> > > > winsys disabled, or something else like that except not amd because
> > > > there's this amdkfd split for "real" compute)
> > > > 1a. including atomic ioctl, e.g. for vk direct display support this can be
> > > > used without cross-process sharing, new winsys protocols and all that fun
> > > > 2. figure out how to transport these userspace fences with something like
> > > > drm_syncobj
> > > > 2a. figure out the compat story for drivers which dont do userspace fences
> > > > 2b. figure out how to absorb the overhead if the winsys/compositor doesn't
> > > > support explicit sync
> > > > 3. maybe figure out how to make this all happen magically with implicit
> > > > sync, if we really, really care
> > > > 
> > > > If we do 3 before we've nailed all these problems, we're just guaranteeing
> > > > we'll get the wrong solutions and so we'll then have 3 ways of doing
> > > > userspace fences
> > > > - the butchered implicit one that didn't quite work
> > > > - the explicit one
> > > > - the not-so-butchered implicit one with the lessons from the properly
> > > >     done explicit one
> > > > 
> > > > The thing is, if you have no idea how to integrate userspace fences
> > > > explicitly into atomic ioctl, then you definitely have no idea how to do
> > > > it implicitly :-)
> > > Well I agree on that. But the question is still how would you do explicit
> > > with atomic?
> > If you supply an userpace fence (is that what we call them now) as
> > in-fence, then your only allowed to get a userspace fence as out-fence.
> 
> Yeah, that part makes perfectly sense. But I don't see the problem with
> that?
> 
> > That way we
> > - don't block anywhere we shouldn't
> > - don't create a dma_fence out of a userspace fence
> > 
> > The problem is this completely breaks your "magically make implicit
> > fencing with userspace fences" plan.
> 
> Why?

If you allow implicit fencing then you can end up with
- an implicit userspace fence as the in-fence
- but an explicit dma_fence as the out fence

Which is not allowed. So there's really no way to make this work, except
if you stall in the ioctl, which also doesn't work.
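
For completeness, the construction being ruled out would look roughly like
this (again with invented names, reusing the hypothetical struct user_fence
and wait_user_fence() from the sketch earlier in the thread):

#include <linux/dma-fence.h>
#include <linux/workqueue.h>
#include <linux/sched.h>

struct wrapped_user_fence {
	struct dma_fence base;		/* handed out as the out-fence */
	struct work_struct work;
	struct user_fence *user_fence;	/* the real dependency */
};

static void broken_out_fence_worker(struct work_struct *work)
{
	struct wrapped_user_fence *wf =
		container_of(work, struct wrapped_user_fence, work);

	/*
	 * Blocks until userspace writes the awaited value - possibly
	 * forever. A dma_fence has to signal in bounded time without
	 * depending on userspace, so handing out wf->base as a regular
	 * dma_fence out-fence is exactly what is not allowed.
	 */
	wait_user_fence(wf->user_fence, MAX_SCHEDULE_TIMEOUT);
	dma_fence_signal(&wf->base);
}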

So you have to do an uapi change here. At that point we might as well do
it right.

Of course if you only care about some specific compositors (or maybe only
the -amdgpu Xorg driver even) then this isn't a concern, but atomic is
cross-driver so we can't do that. Or at least I don't see a way how to do
this without causing endless amounts of fun down the road.

> > So I have a plan here, what was yours?
> 
> As far as I see that should still work perfectly fine and I have the strong
> feeling I'm missing something here.
> 
> > > Transporting fences between processes is not the fundamental problem here,
> > > but rather the question how we represent all this in the kernel?
> > > 
> > > In other words I think what you outlined above is just approaching it from
> > > the wrong side again. Instead of looking what the kernel needs to support
> > > this you take a look at userspace and the requirements there.
> > Uh ... that was my idea here? That's why I put "build userspace fences in
> > userspace only" as the very first thing. Then extend to winsys and
> > atomic/display and all these cases where things get more tricky.
> > 
> > I agree that transporting the fences is easy, which is why it's not
> > interesting trying to solve that problem first. Which is kinda what you're
> > trying to do here by adding implicit userspace fences (well not even that,
> > just a bunch of function calls without any semantics attached to them).
> > 
> > So if there's more here, you need to flesh it out more or I just dont get
> > what you're actually trying to demonstrate.
> 
> Well I'm trying to figure out why you see it as such a problem to keep
> implicit sync around.
> 
>> As far as I can tell it is completely orthogonal whether we use implicit/explicit
> and dma_fence/user_fence.
> 
> It's just a different implementation inside the kernel.

See above. It falls apart with the atomic ioctl.
-Daniel

> 
> Christian.
> 
> > -Daniel
> > 
> > > Regards,
> > > Christian.
> > > 
> > > > And "just block" might be good enough for a quick demo, it still breaks
> > > > the contract. Same holds for a bunch of the winsys problems we'll have to
> > > > deal with here.
> > > > -Daniel
> > > > 
> > > > > Regards,
> > > > > Christian.
> > > > > 
> > > > > > Like here at intel we have internal code for compute, and we're starting
> > > > > > to hit some interesting cases with interop with media already, but that's
> > > > > > it. Nothing even close to desktop/winsys/kms, and that's where I expect
> > > > > > will all the pain be at.
> > > > > > 
> > > > > > Cheers, Daniel
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC] Implicit vs explicit user fence sync
  2021-05-11 16:48                 ` Daniel Vetter
@ 2021-05-11 19:34                   ` Christian König
  2021-05-12  8:13                     ` Daniel Vetter
  0 siblings, 1 reply; 26+ messages in thread
From: Christian König @ 2021-05-11 19:34 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: dri-devel

Am 11.05.21 um 18:48 schrieb Daniel Vetter:
> [SNIP]
>> Why?
> If you allow implicit fencing then you can end up with
> - an implicit userspace fence as the in-fence
> - but an explicit dma_fence as the out fence
>
> Which is not allowed. So there's really no way to make this work, except
> if you stall in the ioctl, which also doesn't work.

Ok, wait a second. I really don't understand what's going on here.

The out fence is just to let the userspace know when the frame is 
displayed. Or rather when the old frame is no longer displayed so that 
it can be reused, right?

Then why does that need to be a dma_fence? We don't use that for memory 
management anywhere, do we?

> So you have to do an uapi change here. At that point we might as well do
> it right.

I mean in the worst case we might need to allow user fences with 
sync_files as well when that is really used outside of Android.

But I still don't see the fundamental problem here.

Regards,
Christian.

> Of course if you only care about some specific compositors (or maybe only
> the -amdgpu Xorg driver even) then this isn't a concern, but atomic is
> cross-driver so we can't do that. Or at least I don't see a way how to do
> this without causing endless amounts of fun down the road.
>
>>> So I have a plan here, what was yours?
>> As far as I see that should still work perfectly fine and I have the strong
>> feeling I'm missing something here.
>>
>>>> Transporting fences between processes is not the fundamental problem here,
>>>> but rather the question how we represent all this in the kernel?
>>>>
>>>> In other words I think what you outlined above is just approaching it from
>>>> the wrong side again. Instead of looking what the kernel needs to support
>>>> this you take a look at userspace and the requirements there.
>>> Uh ... that was my idea here? That's why I put "build userspace fences in
>>> userspace only" as the very first thing. Then extend to winsys and
>>> atomic/display and all these cases where things get more tricky.
>>>
>>> I agree that transporting the fences is easy, which is why it's not
>>> interesting trying to solve that problem first. Which is kinda what you're
>>> trying to do here by adding implicit userspace fences (well not even that,
>>> just a bunch of function calls without any semantics attached to them).
>>>
>>> So if there's more here, you need to flesh it out more or I just dont get
>>> what you're actually trying to demonstrate.
>> Well I'm trying to figure out why you see it as such a problem to keep
>> implicit sync around.
>>
>> As far as I can tell it is completely orthogonal whether we use implicit/explicit
>> and dma_fence/user_fence.
>>
>> It's just a different implementation inside the kernel.
> See above. It falls apart with the atomic ioctl.
> -Daniel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC] Implicit vs explicit user fence sync
  2021-05-11 19:34                   ` Christian König
@ 2021-05-12  8:13                     ` Daniel Vetter
  2021-05-12  8:23                       ` Christian König
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel Vetter @ 2021-05-12  8:13 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

On Tue, May 11, 2021 at 09:34:11PM +0200, Christian König wrote:
> Am 11.05.21 um 18:48 schrieb Daniel Vetter:
> > [SNIP]
> > > Why?
> > If you allow implicit fencing then you can end up with
> > - an implicit userspace fence as the in-fence
> > - but an explicit dma_fence as the out fence
> > 
> > Which is not allowed. So there's really no way to make this work, except
> > if you stall in the ioctl, which also doesn't work.
> 
> Ok, wait a second. I really don't understand what's going on here.
> 
> The out fence is just to let the userspace know when the frame is displayed.
> Or rather when the old frame is no longer displayed so that it can be
> reused, right?
> 
> Then why does that need to be a dma_fence? We don't use that for memory
> management anywhere, don't we?

It can be a sync_file, so you can queue up the next rendering targeting
the old backbuffer right away. On memory-constrained devices where
triple-buffering is prohibitive this is apparently a pretty cool trick or
something like that.
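
In userspace terms the trick looks roughly like this (libdrm atomic API;
the property lookup and the render submission path are elided,
prop_out_fence_ptr is the id of the CRTC OUT_FENCE_PTR property and
submit_render() is a hypothetical in-fence aware submit helper):

#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

extern int submit_render(uint32_t buffer_handle, int in_fence_fd);

int flip_and_queue_next_frame(int drm_fd, uint32_t crtc_id,
			      uint32_t prop_out_fence_ptr,
			      uint32_t old_backbuffer)
{
	int32_t out_fence_fd = -1;
	drmModeAtomicReq *req = drmModeAtomicAlloc();
	int ret;

	/* Ask the kernel to write a sync_file fd into out_fence_fd. */
	drmModeAtomicAddProperty(req, crtc_id, prop_out_fence_ptr,
				 (uint64_t)(uintptr_t)&out_fence_fd);
	/* ... FB_ID and friends for the new frontbuffer elided ... */

	ret = drmModeAtomicCommit(drm_fd, req, DRM_MODE_ATOMIC_NONBLOCK, NULL);
	drmModeAtomicFree(req);
	if (ret)
		return ret;

	/*
	 * Queue rendering into the old backbuffer immediately; the fence
	 * makes the GPU wait until that buffer is off the screen, so no
	 * third buffer and no CPU-side wait are needed.
	 */
	return submit_render(old_backbuffer, (int)out_fence_fd);
}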

> > So you have to do an uapi change here. At that point we might as well do
> > it right.
> 
> I mean in the worst case we might need to allow user fences with sync_files
> as well when that is really used outside of Android.
> 
> But I still don't see the fundamental problem here.

Making sync_file use implicit is more pain, it becomes sprawling pretty
quickly.

Anyway I think we're just turning in circles. My take is that your patch
series here demonstrates nothing beyond "adding function calls that do
nothing is easy", the real deal is in making it work. And I think it's
easier to make this all work with explicit userspace fencing first and
then worry about how we maybe make this work implicitly. Instead of what
you propose, which is to rig up a lot of scaffolding without having any idea
whether it's even in the right place. That seems very backwards to me.

Plus I really don't like retro-fitting userspace fences into implicit sync
and sync_file and everything. But that's not the main issue I have here.
-Daniel

> 
> Regards,
> Christian.
> 
> > Of course if you only care about some specific compositors (or maybe only
> > the -amdgpu Xorg driver even) then this isn't a concern, but atomic is
> > cross-driver so we can't do that. Or at least I don't see a way how to do
> > this without causing endless amounts of fun down the road.
> > 
> > > > So I have a plan here, what was yours?
> > > As far as I see that should still work perfectly fine and I have the strong
> > > feeling I'm missing something here.
> > > 
> > > > > Transporting fences between processes is not the fundamental problem here,
> > > > > but rather the question how we represent all this in the kernel?
> > > > > 
> > > > > In other words I think what you outlined above is just approaching it from
> > > > > the wrong side again. Instead of looking what the kernel needs to support
> > > > > this you take a look at userspace and the requirements there.
> > > > Uh ... that was my idea here? That's why I put "build userspace fences in
> > > > userspace only" as the very first thing. Then extend to winsys and
> > > > atomic/display and all these cases where things get more tricky.
> > > > 
> > > > I agree that transporting the fences is easy, which is why it's not
> > > > interesting trying to solve that problem first. Which is kinda what you're
> > > > trying to do here by adding implicit userspace fences (well not even that,
> > > > just a bunch of function calls without any semantics attached to them).
> > > > 
> > > > So if there's more here, you need to flesh it out more or I just dont get
> > > > what you're actually trying to demonstrate.
> > > Well I'm trying to figure out why you see it as such a problem to keep
> > > implicit sync around.
> > > 
> > > As far as I can tell it is completely orthogonal whether we use implicit/explicit
> > > and dma_fence/user_fence.
> > > 
> > > It's just a different implementation inside the kernel.
> > See above. It falls apart with the atomic ioctl.
> > -Daniel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC] Implicit vs explicit user fence sync
  2021-05-12  8:13                     ` Daniel Vetter
@ 2021-05-12  8:23                       ` Christian König
  2021-05-12  8:50                         ` Daniel Vetter
  0 siblings, 1 reply; 26+ messages in thread
From: Christian König @ 2021-05-12  8:23 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: dri-devel

Am 12.05.21 um 10:13 schrieb Daniel Vetter:
> On Tue, May 11, 2021 at 09:34:11PM +0200, Christian König wrote:
>> Am 11.05.21 um 18:48 schrieb Daniel Vetter:
>>> [SNIP]
>>>> Why?
>>> If you allow implicit fencing then you can end up with
>>> - an implicit userspace fence as the in-fence
>>> - but an explicit dma_fence as the out fence
>>>
>>> Which is not allowed. So there's really no way to make this work, except
>>> if you stall in the ioctl, which also doesn't work.
>> Ok, wait a second. I really don't understand what's going on here.
>>
>> The out fence is just to let the userspace know when the frame is displayed.
>> Or rather when the old frame is no longer displayed so that it can be
>> reused, right?
>>
>> Then why does that need to be a dma_fence? We don't use that for memory
>> management anywhere, don't we?
> It can be a sync_file, so you can queue up the next rendering targeting
> the old backbuffer right away. On memory constraint devices where
> triple-buffering is prohibitive this is apparently a pretty cool trick or
> something like that.

Yeah, I'm aware of that.

But we don't really need that for devices we want to interop with 
userspace queues, do we?

>>> So you have to do an uapi change here. At that point we might as well do
>>> it right.
>> I mean in the worst case we might need to allow user fences with sync_files
>> as well when that is really used outside of Android.
>>
>> But I still don't see the fundamental problem here.
> Making sync_file use implicit is more pain, it becomes sprawling pretty
> quickly.

Agreed, but I don't see supporting sync_file and out fences as something 
necessary.

As far as I can see we don't need to support that burden for the use 
cases we have for implicit sync.

And even if we have to it is possible to implement.

> Anyway I think we're just turning in circles. My take is that your patch
> series here demonstrates nothing beyond "adding function calls that do
> nothing is easy", the real deal is in making it work. And I think it's
> easier to make this all work with explicit userspace fencing first and
> then worry about how we maybe make this work implicitly. Instead of what
> you propose, which is rig up a lot of scaffolding without having any idea
> whether it's even in the right place. That seems very backwards to me.

And that's what I disagree on.

Supporting implicit sync in the kernel for the functionality we need for 
amdgpu is relatively easy.

Changing all of userspace to not rely on implicit sync any more is really 
hard compared to that.

Dropping implicit sync in userspace has a lot of advantages and should be 
pushed for, but not because it is a prerequisite of user fences.

Regards,
Christian.

>
> Plus I really don't like retro-fitting userspace fences into implicit sync
> and sync_file and everything. But that's not the main issue I have here.
> -Daniel
>
>> Regards,
>> Christian.
>>
>>> Of course if you only care about some specific compositors (or maybe only
>>> the -amdgpu Xorg driver even) then this isn't a concern, but atomic is
>>> cross-driver so we can't do that. Or at least I don't see a way how to do
>>> this without causing endless amounts of fun down the road.
>>>
>>>>> So I have a plan here, what was yours?
>>>> As far as I see that should still work perfectly fine and I have the strong
>>>> feeling I'm missing something here.
>>>>
>>>>>> Transporting fences between processes is not the fundamental problem here,
>>>>>> but rather the question how we represent all this in the kernel?
>>>>>>
>>>>>> In other words I think what you outlined above is just approaching it from
>>>>>> the wrong side again. Instead of looking what the kernel needs to support
>>>>>> this you take a look at userspace and the requirements there.
>>>>> Uh ... that was my idea here? That's why I put "build userspace fences in
>>>>> userspace only" as the very first thing. Then extend to winsys and
>>>>> atomic/display and all these cases where things get more tricky.
>>>>>
>>>>> I agree that transporting the fences is easy, which is why it's not
>>>>> interesting trying to solve that problem first. Which is kinda what you're
>>>>> trying to do here by adding implicit userspace fences (well not even that,
>>>>> just a bunch of function calls without any semantics attached to them).
>>>>>
>>>>> So if there's more here, you need to flesh it out more or I just dont get
>>>>> what you're actually trying to demonstrate.
>>>> Well I'm trying to figure out why you see it as such a problem to keep
>>>> implicit sync around.
>>>>
>>>> As far as I can tell it is completely orthogonal whether we use implicit/explicit
>>>> and dma_fence/user_fence.
>>>>
>>>> It's just a different implementation inside the kernel.
>>> See above. It falls apart with the atomic ioctl.
>>> -Daniel


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC] Implicit vs explicit user fence sync
  2021-05-12  8:23                       ` Christian König
@ 2021-05-12  8:50                         ` Daniel Vetter
  0 siblings, 0 replies; 26+ messages in thread
From: Daniel Vetter @ 2021-05-12  8:50 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

On Wed, May 12, 2021 at 10:23:04AM +0200, Christian König wrote:
> Am 12.05.21 um 10:13 schrieb Daniel Vetter:
> > On Tue, May 11, 2021 at 09:34:11PM +0200, Christian König wrote:
> > > Am 11.05.21 um 18:48 schrieb Daniel Vetter:
> > > > [SNIP]
> > > > > Why?
> > > > If you allow implicit fencing then you can end up with
> > > > - an implicit userspace fence as the in-fence
> > > > - but an explicit dma_fence as the out fence
> > > > 
> > > > Which is not allowed. So there's really no way to make this work, except
> > > > if you stall in the ioctl, which also doesn't work.
> > > Ok, wait a second. I really don't understand what's going on here.
> > > 
> > > The out fence is just to let the userspace know when the frame is displayed.
> > > Or rather when the old frame is no longer displayed so that it can be
> > > reused, right?
> > > 
> > > Then why does that need to be a dma_fence? We don't use that for memory
> > > management anywhere, don't we?
> > It can be a sync_file, so you can queue up the next rendering targeting
> > the old backbuffer right away. On memory constraint devices where
> > triple-buffering is prohibitive this is apparently a pretty cool trick or
> > something like that.
> 
> Yeah, I'm aware of that.
> 
> But we don't really need that for device we want to interop with userspace
> queues, don't we?
> 
> > > > So you have to do an uapi change here. At that point we might as well do
> > > > it right.
> > > I mean in the worst case we might need to allow user fences with sync_files
> > > as well when that is really used outside of Android.
> > > 
> > > But I still don't see the fundamental problem here.
> > Making sync_file use implicit is more pain, it becomes sprawling pretty
> > quickly.
> 
> Agreed, but I don't see supporting sync_file and out fences as something
> necessarily.
> 
> As far as I can see we don't need to support that burden for the use cases
> we have for implicit sync.
> 
> And even if we have to it is possible to implement.
> 
> > Anyway I think we're just turning in circles. My take is that your patch
> > series here demonstrates nothing beyond "adding function calls that do
> > nothing is easy", the real deal is in making it work. And I think it's
> > easier to make this all work with explicit userspace fencing first and
> > then worry about how we maybe make this work implicitly. Instead of what
> > you propose, which is rig up a lot of scaffolding without having any idea
> > whether it's even in the right place. That seems very backwards to me.
> 
> And that's what I disagree on.
> 
> Supporting implicit sync in the kernel for the functionality we need to
> amdgpu is relatively easily.
> 
> Change all of userspace to not rely on implicit sync any more is really hard
> compared to that.
> 
> Dropping implicit sync in userspace has a lot of advantage and should be
> pushed for, but not because it is a prerequisite of user fences.

Yeah but if we support userspace fences with implicit sync only, because
that's right now the only thing amdgpu needs, then we make really, really
sure the transition to explicit sync will never happen.

Also if the goal really is to only do the minimal thing to keep amdgpu
working on existing compositors, then imo the right solution is to just do
the kernel-augmented userspace submit I've proposed. Not this.

The kernel-augmented userspace submit keeps all dma_fence stuff working
_exactly_ like it currently does, including drm/scheduler sorting
dependencies and everything.
much more solid solution.

And it doesn't require us to create an extremely warty uapi where if you
happen to use implicit userspace fences, a lot of things kinda stop working,
for no reason.

So if your overall goal is to just not touch any compositor/winsys 
protocol at all, there _is_ a very clean solution imo.
-Daniel

> Regards,
> Christian.
> 
> > 
> > Plus I really don't like retro-fitting userspace fences into implicit sync
> > and sync_file and everything. But that's not the main issue I have here.
> > -Daniel
> > 
> > > Regards,
> > > Christian.
> > > 
> > > > Of course if you only care about some specific compositors (or maybe only
> > > > the -amdgpu Xorg driver even) then this isn't a concern, but atomic is
> > > > cross-driver so we can't do that. Or at least I don't see a way how to do
> > > > this without causing endless amounts of fun down the road.
> > > > 
> > > > > > So I have a plan here, what was yours?
> > > > > As far as I see that should still work perfectly fine and I have the strong
> > > > > feeling I'm missing something here.
> > > > > 
> > > > > > > Transporting fences between processes is not the fundamental problem here,
> > > > > > > but rather the question how we represent all this in the kernel?
> > > > > > > 
> > > > > > > In other words I think what you outlined above is just approaching it from
> > > > > > > the wrong side again. Instead of looking what the kernel needs to support
> > > > > > > this you take a look at userspace and the requirements there.
> > > > > > Uh ... that was my idea here? That's why I put "build userspace fences in
> > > > > > userspace only" as the very first thing. Then extend to winsys and
> > > > > > atomic/display and all these cases where things get more tricky.
> > > > > > 
> > > > > > I agree that transporting the fences is easy, which is why it's not
> > > > > > interesting trying to solve that problem first. Which is kinda what you're
> > > > > > trying to do here by adding implicit userspace fences (well not even that,
> > > > > > just a bunch of function calls without any semantics attached to them).
> > > > > > 
> > > > > > So if there's more here, you need to flesh it out more or I just dont get
> > > > > > what you're actually trying to demonstrate.
> > > > > Well I'm trying to figure out why you see it as such a problem to keep
> > > > > implicit sync around.
> > > > > 
> > > > > As far as I can tell it is completely orthogonal whether we use implicit/explicit
> > > > > and dma_fence/user_fence.
> > > > > 
> > > > > It's just a different implementation inside the kernel.
> > > > See above. It falls apart with the atomic ioctl.
> > > > -Daniel
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-05-12  8:50 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-04 13:27 [RFC] Implicit vs explicit user fence sync Christian König
2021-05-04 13:27 ` [PATCH 01/12] dma-buf: add interface for user fence synchronization Christian König
2021-05-04 13:27 ` [PATCH 02/12] RDMA/mlx5: add DMA-buf user fence support Christian König
2021-05-04 13:27 ` [PATCH 03/12] drm/amdgpu: " Christian König
2021-05-04 13:27 ` [PATCH 04/12] drm/gem: dd DMA-buf user fence support for the atomic helper Christian König
2021-05-04 13:27 ` [PATCH 05/12] drm/etnaviv: add DMA-buf user fence support Christian König
2021-05-04 13:27 ` [PATCH 06/12] drm/i915: " Christian König
2021-05-04 13:27 ` [PATCH 07/12] drm/lima: " Christian König
2021-05-04 13:27 ` [PATCH 08/12] drm/msm: " Christian König
2021-05-04 13:27 ` [PATCH 09/12] drm/nouveau: " Christian König
2021-05-04 13:27 ` [PATCH 10/12] drm/panfrost: " Christian König
2021-05-04 13:27 ` [PATCH 11/12] drm/radeon: " Christian König
2021-05-04 13:27 ` [PATCH 12/12] drm/v3d: " Christian König
2021-05-04 14:15 ` [RFC] Implicit vs explicit user fence sync Daniel Vetter
2021-05-04 14:26   ` Christian König
2021-05-04 15:11     ` Daniel Vetter
2021-05-10 18:12       ` Christian König
2021-05-11  7:31         ` Daniel Vetter
2021-05-11  7:47           ` Christian König
2021-05-11 14:23             ` Daniel Vetter
2021-05-11 15:32               ` Christian König
2021-05-11 16:48                 ` Daniel Vetter
2021-05-11 19:34                   ` Christian König
2021-05-12  8:13                     ` Daniel Vetter
2021-05-12  8:23                       ` Christian König
2021-05-12  8:50                         ` Daniel Vetter
