* [PATCH 01/11] drm/amdgpu: Remove call to reservation_object_test_signaled_rcu before wait
@ 2016-08-29  7:08 Chris Wilson
  2016-08-29  7:08 ` [PATCH 02/11] drm/etnaviv: Remove manual " Chris Wilson
                   ` (12 more replies)
  0 siblings, 13 replies; 63+ messages in thread
From: Chris Wilson @ 2016-08-29  7:08 UTC (permalink / raw)
  To: dri-devel; +Cc: Alex Deucher, intel-gfx, Christian König

Since reservation_object_wait_timeout_rcu() with a timeout of 0 becomes
reservation_object_test_signaled_rcu(), we do not need to handle such a
conversion in the caller. The only challenge is for those callers that
wish to differentiate the error code between the nonblocking busy check
and the potentially blocking wait.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 88fbed2389c0..a3e6f883ac2c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -407,10 +407,8 @@ int amdgpu_gem_wait_idle_ioctl(struct drm_device *dev, void *data,
 		return -ENOENT;
 	}
 	robj = gem_to_amdgpu_bo(gobj);
-	if (timeout == 0)
-		ret = reservation_object_test_signaled_rcu(robj->tbo.resv, true);
-	else
-		ret = reservation_object_wait_timeout_rcu(robj->tbo.resv, true, true, timeout);
+	ret = reservation_object_wait_timeout_rcu(robj->tbo.resv, true, true,
+						  timeout);
 
 	/* ret == 0 means not signaled,
 	 * ret > 0 means signaled
-- 
2.9.3

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 02/11] drm/etnaviv: Remove manual call to reservation_object_test_signaled_rcu before wait
  2016-08-29  7:08 [PATCH 01/11] drm/amdgpu: Remove call to reservation_object_test_signaled_rcu before wait Chris Wilson
@ 2016-08-29  7:08 ` Chris Wilson
  2016-09-23 12:55   ` Daniel Vetter
  2016-08-29  7:08 ` [PATCH 03/11] drm/msm: Remove " Chris Wilson
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 63+ messages in thread
From: Chris Wilson @ 2016-08-29  7:08 UTC (permalink / raw)
  To: dri-devel; +Cc: intel-gfx, Lucas Stach, Russell King

Since reservation_object_wait_timeout_rcu() with a timeout of 0 becomes
reservation_object_test_signaled_rcu(), we do not need to handle such a
conversion in the caller. The only challenge is for those callers that
wish to differentiate the error code between the nonblocking busy check
and the potentially blocking wait.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Russell King <linux+etnaviv@armlinux.org.uk>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
---
 drivers/gpu/drm/etnaviv/etnaviv_gem.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index 5ce3603e6eac..9ffca2478e02 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -409,20 +409,16 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op,
 	struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
 	struct drm_device *dev = obj->dev;
 	bool write = !!(op & ETNA_PREP_WRITE);
-	int ret;
-
-	if (op & ETNA_PREP_NOSYNC) {
-		if (!reservation_object_test_signaled_rcu(etnaviv_obj->resv,
-							  write))
-			return -EBUSY;
-	} else {
-		unsigned long remain = etnaviv_timeout_to_jiffies(timeout);
-
-		ret = reservation_object_wait_timeout_rcu(etnaviv_obj->resv,
-							  write, true, remain);
-		if (ret <= 0)
-			return ret == 0 ? -ETIMEDOUT : ret;
-	}
+	unsigned long remain =
+		op & ETNA_PREP_NOSYNC ? 0 : etnaviv_timeout_to_jiffies(timeout);
+	long lret;
+
+	lret = reservation_object_wait_timeout_rcu(etnaviv_obj->resv,
+						   write, true, remain);
+	if (lret < 0)
+		return lret;
+	else if (lret == 0)
+		return remain == 0 ? -EBUSY : -ETIMEDOUT;
 
 	if (etnaviv_obj->flags & ETNA_BO_CACHED) {
 		if (!etnaviv_obj->sgt) {
-- 
2.9.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 03/11] drm/msm: Remove call to reservation_object_test_signaled_rcu before wait
  2016-08-29  7:08 [PATCH 01/11] drm/amdgpu: Remove call to reservation_object_test_signaled_rcu before wait Chris Wilson
  2016-08-29  7:08 ` [PATCH 02/11] drm/etnaviv: Remove manual " Chris Wilson
@ 2016-08-29  7:08 ` Chris Wilson
  2016-09-23 12:55   ` Daniel Vetter
  2016-08-29  7:08 ` [PATCH 04/11] drm/nouveau: " Chris Wilson
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 63+ messages in thread
From: Chris Wilson @ 2016-08-29  7:08 UTC (permalink / raw)
  To: dri-devel; +Cc: intel-gfx

Since reservation_object_wait_timeout_rcu() with a timeout of 0 becomes
reservation_object_test_signaled_rcu(), we do not need to handle such a
conversion in the caller. The only challenge is for those callers that
wish to differentiate the error code between the nonblocking busy check
and the potentially blocking wait.

v2: 9 is only 0 in German.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Rob Clark <robdclark@gmail.com>
---
 drivers/gpu/drm/msm/msm_gem.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 6cd4af443139..45796a88d353 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -584,18 +584,16 @@ int msm_gem_cpu_prep(struct drm_gem_object *obj, uint32_t op, ktime_t *timeout)
 {
 	struct msm_gem_object *msm_obj = to_msm_bo(obj);
 	bool write = !!(op & MSM_PREP_WRITE);
-
-	if (op & MSM_PREP_NOSYNC) {
-		if (!reservation_object_test_signaled_rcu(msm_obj->resv, write))
-			return -EBUSY;
-	} else {
-		int ret;
-
-		ret = reservation_object_wait_timeout_rcu(msm_obj->resv, write,
-				true, timeout_to_jiffies(timeout));
-		if (ret <= 0)
-			return ret == 0 ? -ETIMEDOUT : ret;
-	}
+	unsigned long remain =
+		op & MSM_PREP_NOSYNC ? 0 : timeout_to_jiffies(timeout);
+	long ret;
+
+	ret = reservation_object_wait_timeout_rcu(msm_obj->resv, write,
+						  true, remain);
+	if (ret == 0)
+		return remain == 0 ? -EBUSY : -ETIMEDOUT;
+	else if (ret < 0)
+		return ret;
 
 	/* TODO cache maintenance */
 
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 04/11] drm/nouveau: Remove call to reservation_object_test_signaled_rcu before wait
  2016-08-29  7:08 [PATCH 01/11] drm/amdgpu: Remove call to reservation_object_test_signaled_rcu before wait Chris Wilson
  2016-08-29  7:08 ` [PATCH 02/11] drm/etnaviv: Remove manual " Chris Wilson
  2016-08-29  7:08 ` [PATCH 03/11] drm/msm: Remove " Chris Wilson
@ 2016-08-29  7:08 ` Chris Wilson
  2016-09-23 12:55   ` Daniel Vetter
  2016-08-29  7:08 ` [PATCH 05/11] drm/vmwgfx: " Chris Wilson
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 63+ messages in thread
From: Chris Wilson @ 2016-08-29  7:08 UTC (permalink / raw)
  To: dri-devel; +Cc: intel-gfx, Ben Skeggs

Since reservation_object_wait_timeout_rcu() with a timeout of 0 becomes
reservation_object_test_signaled_rcu(), we do not need to handle such a
conversion in the caller. The only challenge is for those callers that
wish to differentiate the error code between the nonblocking busy check
and the potentially blocking wait.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Skeggs <bskeggs@redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_gem.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 72e2399bce39..b90e21ff1ed8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -861,6 +861,7 @@ nouveau_gem_ioctl_cpu_prep(struct drm_device *dev, void *data,
 	struct nouveau_bo *nvbo;
 	bool no_wait = !!(req->flags & NOUVEAU_GEM_CPU_PREP_NOWAIT);
 	bool write = !!(req->flags & NOUVEAU_GEM_CPU_PREP_WRITE);
+	long lret;
 	int ret;
 
 	gem = drm_gem_object_lookup(file_priv, req->handle);
@@ -868,19 +869,15 @@ nouveau_gem_ioctl_cpu_prep(struct drm_device *dev, void *data,
 		return -ENOENT;
 	nvbo = nouveau_gem_object(gem);
 
-	if (no_wait)
-		ret = reservation_object_test_signaled_rcu(nvbo->bo.resv, write) ? 0 : -EBUSY;
-	else {
-		long lret;
+	lret = reservation_object_wait_timeout_rcu(nvbo->bo.resv, write, true,
+						   no_wait ? 0 : 30 * HZ);
+	if (!lret)
+		ret = -EBUSY;
+	else if (lret > 0)
+		ret = 0;
+	else
+		ret = lret;
 
-		lret = reservation_object_wait_timeout_rcu(nvbo->bo.resv, write, true, 30 * HZ);
-		if (!lret)
-			ret = -EBUSY;
-		else if (lret > 0)
-			ret = 0;
-		else
-			ret = lret;
-	}
 	nouveau_bo_sync_for_cpu(nvbo);
 	drm_gem_object_unreference_unlocked(gem);
 
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 05/11] drm/vmwgfx: Remove call to reservation_object_test_signaled_rcu before wait
  2016-08-29  7:08 [PATCH 01/11] drm/amdgpu: Remove call to reservation_object_test_signaled_rcu before wait Chris Wilson
                   ` (2 preceding siblings ...)
  2016-08-29  7:08 ` [PATCH 04/11] drm/nouveau: " Chris Wilson
@ 2016-08-29  7:08 ` Chris Wilson
  2016-09-23 12:56   ` Daniel Vetter
  2016-08-29  7:08   ` Chris Wilson
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 63+ messages in thread
From: Chris Wilson @ 2016-08-29  7:08 UTC (permalink / raw)
  To: dri-devel; +Cc: intel-gfx, Thomas Hellstrom

Since reservation_object_wait_timeout_rcu() with a timeout of 0 becomes
reservation_object_test_signaled_rcu(), we do not need to handle such a
conversion in the caller. The only challenge is for those callers that
wish to differentiate the error code between the nonblocking busy check
and the potentially blocking wait.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sinclair Yeh <syeh@vmware.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
---
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
index 6a328d507a28..1a85fb2d4dc6 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
@@ -574,10 +574,8 @@ static int vmw_user_dmabuf_synccpu_grab(struct vmw_user_dma_buffer *user_bo,
 		bool nonblock = !!(flags & drm_vmw_synccpu_dontblock);
 		long lret;
 
-		if (nonblock)
-			return reservation_object_test_signaled_rcu(bo->resv, true) ? 0 : -EBUSY;
-
-		lret = reservation_object_wait_timeout_rcu(bo->resv, true, true, MAX_SCHEDULE_TIMEOUT);
+		lret = reservation_object_wait_timeout_rcu(bo->resv, true, true,
+							   nonblock ? 0 : MAX_SCHEDULE_TIMEOUT);
 		if (!lret)
 			return -EBUSY;
 		else if (lret < 0)
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 06/11] dma-buf: Introduce fence_get_rcu_safe()
  2016-08-29  7:08 [PATCH 01/11] drm/amdgpu: Remove call to reservation_object_test_signaled_rcu before wait Chris Wilson
@ 2016-08-29  7:08   ` Chris Wilson
  2016-08-29  7:08 ` [PATCH 03/11] drm/msm: Remove " Chris Wilson
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 63+ messages in thread
From: Chris Wilson @ 2016-08-29  7:08 UTC (permalink / raw)
  To: dri-devel
  Cc: intel-gfx, Chris Wilson, Daniel Vetter, Sumit Semwal,
	linux-media, linaro-mm-sig

This variant of fence_get_rcu() takes an RCU-protected pointer to a
fence and carefully returns a reference to it, ensuring that the fence
is not reallocated as it does so. This is required when mixing fences
and SLAB_DESTROY_BY_RCU, although at the moment it serves a more
pedagogical function.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
---
 include/linux/fence.h | 56 ++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 51 insertions(+), 5 deletions(-)

diff --git a/include/linux/fence.h b/include/linux/fence.h
index 0d763053f97a..c9c5ba98c302 100644
--- a/include/linux/fence.h
+++ b/include/linux/fence.h
@@ -183,6 +183,16 @@ void fence_release(struct kref *kref);
 void fence_free(struct fence *fence);
 
 /**
+ * fence_put - decreases refcount of the fence
+ * @fence:	[in]	fence to reduce refcount of
+ */
+static inline void fence_put(struct fence *fence)
+{
+	if (fence)
+		kref_put(&fence->refcount, fence_release);
+}
+
+/**
  * fence_get - increases refcount of the fence
  * @fence:	[in]	fence to increase refcount of
  *
@@ -210,13 +220,49 @@ static inline struct fence *fence_get_rcu(struct fence *fence)
 }
 
 /**
- * fence_put - decreases refcount of the fence
- * @fence:	[in]	fence to reduce refcount of
+ * fence_get_rcu_safe - acquire a reference to an RCU-tracked fence
+ * @fencep:	[in]	pointer to the fence to increase the refcount of
+ *
+ * Returns the fence on success, or NULL if no refcount could be obtained.
+ * This function handles acquiring a reference to a fence that may be
+ * reallocated within the RCU grace period (such as with SLAB_DESTROY_BY_RCU),
+ * so long as the caller is using RCU on the pointer to the fence.
+ *
+ * An alternative mechanism is to employ a seqlock to protect a bunch of
+ * fences, such as used by struct reservation_object. When using a seqlock,
+ * the seqlock must be taken before and checked after a reference to the
+ * fence is acquired (as shown here).
+ *
+ * The caller is required to hold the RCU read lock.
  */
-static inline void fence_put(struct fence *fence)
+static inline struct fence *fence_get_rcu_safe(struct fence * __rcu *fencep)
 {
-	if (fence)
-		kref_put(&fence->refcount, fence_release);
+	do {
+		struct fence *fence;
+
+		fence = rcu_dereference(*fencep);
+		if (!fence || !fence_get_rcu(fence))
+			return NULL;
+
+		/* The atomic_inc_not_zero() inside fence_get_rcu()
+		 * provides a full memory barrier upon success (such as now).
+		 * This is paired with the write barrier from assigning
+		 * to the __rcu protected fence pointer so that if that
+		 * pointer still matches the current fence, we know we
+		 * have successfully acquired a reference to it. If it no
+		 * longer matches, we are holding a reference to some other
+		 * reallocated pointer. This is possible if the allocator
+		 * is using a freelist like SLAB_DESTROY_BY_RCU where the
+		 * fence remains valid for the RCU grace period, but it
+		 * may be reallocated. When using such allocators, we are
+		 * responsible for ensuring the reference we get is to
+		 * the right fence, as below.
+		 */
+		if (fence == rcu_access_pointer(*fencep))
+			return rcu_pointer_handoff(fence);
+
+		fence_put(fence);
+	} while (1);
 }
 
 int fence_signal(struct fence *fence);
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 07/11] dma-buf: Restart reservation_object_get_fences_rcu() after writes
  2016-08-29  7:08 [PATCH 01/11] drm/amdgpu: Remove call to reservation_object_test_signaled_rcu before wait Chris Wilson
@ 2016-08-29  7:08   ` Chris Wilson
  2016-08-29  7:08 ` [PATCH 03/11] drm/msm: Remove " Chris Wilson
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 63+ messages in thread
From: Chris Wilson @ 2016-08-29  7:08 UTC (permalink / raw)
  To: dri-devel
  Cc: intel-gfx, Chris Wilson, Daniel Vetter, Maarten Lankhorst,
	Christian König, Alex Deucher, Sumit Semwal, linux-media,
	linaro-mm-sig

In order to be completely generic, we have to double-check the read
seqlock after acquiring a reference to the fence. If the driver is
allocating fences from a SLAB_DESTROY_BY_RCU, or similar freelist, then
within an RCU grace period a fence may be freed and reallocated. The RCU
read-side critical section does not prevent this reallocation; instead
we have to inspect the reservation's seqlock to double-check whether the
fences have been reassigned as we were acquiring our reference.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
---
 drivers/dma-buf/reservation.c | 71 +++++++++++++++++++------------------------
 1 file changed, 31 insertions(+), 40 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 723d8af988e5..10fd441dd4ed 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -280,18 +280,24 @@ int reservation_object_get_fences_rcu(struct reservation_object *obj,
 				      unsigned *pshared_count,
 				      struct fence ***pshared)
 {
-	unsigned shared_count = 0;
-	unsigned retry = 1;
-	struct fence **shared = NULL, *fence_excl = NULL;
-	int ret = 0;
+	struct fence **shared = NULL;
+	struct fence *fence_excl;
+	unsigned shared_count;
+	int ret = 1;
 
-	while (retry) {
+	do {
 		struct reservation_object_list *fobj;
 		unsigned seq;
+		unsigned i;
 
-		seq = read_seqcount_begin(&obj->seq);
+		shared_count = i = 0;
 
 		rcu_read_lock();
+		seq = read_seqcount_begin(&obj->seq);
+
+		fence_excl = rcu_dereference(obj->fence_excl);
+		if (fence_excl && !fence_get_rcu(fence_excl))
+			goto unlock;
 
 		fobj = rcu_dereference(obj->fence);
 		if (fobj) {
@@ -309,52 +315,37 @@ int reservation_object_get_fences_rcu(struct reservation_object *obj,
 				}
 
 				ret = -ENOMEM;
-				shared_count = 0;
 				break;
 			}
 			shared = nshared;
-			memcpy(shared, fobj->shared, sz);
 			shared_count = fobj->shared_count;
-		} else
-			shared_count = 0;
-		fence_excl = rcu_dereference(obj->fence_excl);
-
-		retry = read_seqcount_retry(&obj->seq, seq);
-		if (retry)
-			goto unlock;
-
-		if (!fence_excl || fence_get_rcu(fence_excl)) {
-			unsigned i;
 
 			for (i = 0; i < shared_count; ++i) {
-				if (fence_get_rcu(shared[i]))
-					continue;
-
-				/* uh oh, refcount failed, abort and retry */
-				while (i--)
-					fence_put(shared[i]);
-
-				if (fence_excl) {
-					fence_put(fence_excl);
-					fence_excl = NULL;
-				}
-
-				retry = 1;
-				break;
+				shared[i] = rcu_dereference(fobj->shared[i]);
+				if (!fence_get_rcu(shared[i]))
+					break;
 			}
-		} else
-			retry = 1;
+		}
+
+		if (i != shared_count || read_seqcount_retry(&obj->seq, seq)) {
+			while (i--)
+				fence_put(shared[i]);
+			fence_put(fence_excl);
+			goto unlock;
+		}
 
+		ret = 0;
 unlock:
 		rcu_read_unlock();
-	}
-	*pshared_count = shared_count;
-	if (shared_count)
-		*pshared = shared;
-	else {
-		*pshared = NULL;
+	} while (ret);
+
+	if (!shared_count) {
 		kfree(shared);
+		shared = NULL;
 	}
+
+	*pshared_count = shared_count;
+	*pshared = shared;
 	*pfence_excl = fence_excl;
 
 	return ret;
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 08/11] dma-buf: Restart reservation_object_wait_timeout_rcu() after writes
  2016-08-29  7:08 [PATCH 01/11] drm/amdgpu: Remove call to reservation_object_test_signaled_rcu before wait Chris Wilson
@ 2016-08-29  7:08   ` Chris Wilson
  2016-08-29  7:08 ` [PATCH 03/11] drm/msm: Remove " Chris Wilson
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 63+ messages in thread
From: Chris Wilson @ 2016-08-29  7:08 UTC (permalink / raw)
  To: dri-devel
  Cc: intel-gfx, Chris Wilson, Daniel Vetter, Maarten Lankhorst,
	Christian König, Alex Deucher, Sumit Semwal, linux-media,
	linaro-mm-sig

In order to be completely generic, we have to double-check the read
seqlock after acquiring a reference to the fence. If the driver is
allocating fences from a SLAB_DESTROY_BY_RCU, or similar freelist, then
within an RCU grace period a fence may be freed and reallocated. The RCU
read-side critical section does not prevent this reallocation; instead
we have to inspect the reservation's seqlock to double-check whether the
fences have been reassigned as we were acquiring our reference.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
---
 drivers/dma-buf/reservation.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 10fd441dd4ed..3369e4668e96 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -388,9 +388,6 @@ retry:
 		if (fobj)
 			shared_count = fobj->shared_count;
 
-		if (read_seqcount_retry(&obj->seq, seq))
-			goto unlock_retry;
-
 		for (i = 0; i < shared_count; ++i) {
 			struct fence *lfence = rcu_dereference(fobj->shared[i]);
 
@@ -413,9 +410,6 @@ retry:
 	if (!shared_count) {
 		struct fence *fence_excl = rcu_dereference(obj->fence_excl);
 
-		if (read_seqcount_retry(&obj->seq, seq))
-			goto unlock_retry;
-
 		if (fence_excl &&
 		    !test_bit(FENCE_FLAG_SIGNALED_BIT, &fence_excl->flags)) {
 			if (!fence_get_rcu(fence_excl))
@@ -430,6 +424,11 @@ retry:
 
 	rcu_read_unlock();
 	if (fence) {
+		if (read_seqcount_retry(&obj->seq, seq)) {
+			fence_put(fence);
+			goto retry;
+		}
+
 		ret = fence_wait_timeout(fence, intr, ret);
 		fence_put(fence);
 		if (ret > 0 && wait_all && (i + 1 < shared_count))
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 09/11] dma-buf: Restart reservation_object_test_signaled_rcu() after writes
  2016-08-29  7:08 [PATCH 01/11] drm/amdgpu: Remove call to reservation_object_test_signaled_rcu before wait Chris Wilson
@ 2016-08-29  7:08   ` Chris Wilson
  2016-08-29  7:08 ` [PATCH 03/11] drm/msm: Remove " Chris Wilson
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 63+ messages in thread
From: Chris Wilson @ 2016-08-29  7:08 UTC (permalink / raw)
  To: dri-devel
  Cc: intel-gfx, Chris Wilson, Daniel Vetter, Maarten Lankhorst,
	Christian König, Alex Deucher, Sumit Semwal, linux-media,
	linaro-mm-sig

In order to be completely generic, we have to double-check the read
seqlock after acquiring a reference to the fence. If the driver is
allocating fences from a SLAB_DESTROY_BY_RCU, or similar freelist, then
within an RCU grace period a fence may be freed and reallocated. The RCU
read-side critical section does not prevent this reallocation; instead,
we have to inspect the reservation's seqlock to double-check whether the
fences were reassigned as we were acquiring our reference.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
---
 drivers/dma-buf/reservation.c | 30 ++++++++++--------------------
 1 file changed, 10 insertions(+), 20 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 3369e4668e96..e74493e7332b 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -474,12 +474,13 @@ bool reservation_object_test_signaled_rcu(struct reservation_object *obj,
 					  bool test_all)
 {
 	unsigned seq, shared_count;
-	int ret = true;
+	int ret;
 
+	rcu_read_lock();
 retry:
+	ret = true;
 	shared_count = 0;
 	seq = read_seqcount_begin(&obj->seq);
-	rcu_read_lock();
 
 	if (test_all) {
 		unsigned i;
@@ -490,46 +491,35 @@ retry:
 		if (fobj)
 			shared_count = fobj->shared_count;
 
-		if (read_seqcount_retry(&obj->seq, seq))
-			goto unlock_retry;
-
 		for (i = 0; i < shared_count; ++i) {
 			struct fence *fence = rcu_dereference(fobj->shared[i]);
 
 			ret = reservation_object_test_signaled_single(fence);
 			if (ret < 0)
-				goto unlock_retry;
+				goto retry;
 			else if (!ret)
 				break;
 		}
 
-		/*
-		 * There could be a read_seqcount_retry here, but nothing cares
-		 * about whether it's the old or newer fence pointers that are
-		 * signaled. That race could still have happened after checking
-		 * read_seqcount_retry. If you care, use ww_mutex_lock.
-		 */
+		if (read_seqcount_retry(&obj->seq, seq))
+			goto retry;
 	}
 
 	if (!shared_count) {
 		struct fence *fence_excl = rcu_dereference(obj->fence_excl);
 
-		if (read_seqcount_retry(&obj->seq, seq))
-			goto unlock_retry;
-
 		if (fence_excl) {
 			ret = reservation_object_test_signaled_single(
 								fence_excl);
 			if (ret < 0)
-				goto unlock_retry;
+				goto retry;
+
+			if (read_seqcount_retry(&obj->seq, seq))
+				goto retry;
 		}
 	}
 
 	rcu_read_unlock();
 	return ret;
-
-unlock_retry:
-	rcu_read_unlock();
-	goto retry;
 }
 EXPORT_SYMBOL_GPL(reservation_object_test_signaled_rcu);
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 10/11] dma-buf: Use seqlock to close RCU race in test_signaled_single
  2016-08-29  7:08 [PATCH 01/11] drm/amdgpu: Remove call to reservation_object_test_signaled_rcu before wait Chris Wilson
@ 2016-08-29  7:08   ` Chris Wilson
  2016-08-29  7:08 ` [PATCH 03/11] drm/msm: Remove " Chris Wilson
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 63+ messages in thread
From: Chris Wilson @ 2016-08-29  7:08 UTC (permalink / raw)
  To: dri-devel
  Cc: intel-gfx, Chris Wilson, Sumit Semwal, linux-media, linaro-mm-sig

With the seqlock now extended to cover the lookup of the fence and its
testing, we can perform that testing solely under the seqlock guard and
avoid the effective locking and serialisation of acquiring a reference to
the request.  As the fence is RCU-protected, we know it cannot disappear
as we test it, the same guarantee that made it safe to acquire the
reference previously.  The seqlock tells us whether the fence was
replaced as we were testing it, i.e. whether or not we can trust the
result (if not, we just repeat the test until it is stable).
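
The shape of the reference-free test can be sketched in userspace C11.
This is an illustrative model under assumed simplifications (toy_* names
are invented; an atomic counter stands in for the seqcount): no
reference is taken, and any result observed while the seqcount changed
is simply discarded and recomputed.

```c
#include <assert.h>
#include <stdatomic.h>

/* Invented stand-ins: a plain flag models the fence's signaled state. */
struct toy_fence { int signaled; };
struct toy_resv  { atomic_uint seq; struct toy_fence *excl; };

/* Test the fence entirely under the seqlock guard: no reference, no
 * atomic RMW on the fence; if the seqcount shows the fence was replaced
 * mid-test, repeat until the read is stable. */
static int toy_test_signaled(struct toy_resv *r)
{
	unsigned seq;
	int ret;

	do {
		seq = atomic_load(&r->seq);
		ret = r->excl ? r->excl->signaled : 1;
	} while (atomic_load(&r->seq) != seq);

	return ret;
}
```

The retry loop replaces the get/test/put dance: stability of the result
is proven by the unchanged seqcount rather than by holding a reference.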

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
---
 drivers/dma-buf/reservation.c | 32 ++++----------------------------
 1 file changed, 4 insertions(+), 28 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index e74493e7332b..1ddffa5adb5a 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -442,24 +442,6 @@ unlock_retry:
 }
 EXPORT_SYMBOL_GPL(reservation_object_wait_timeout_rcu);
 
-
-static inline int
-reservation_object_test_signaled_single(struct fence *passed_fence)
-{
-	struct fence *fence, *lfence = passed_fence;
-	int ret = 1;
-
-	if (!test_bit(FENCE_FLAG_SIGNALED_BIT, &lfence->flags)) {
-		fence = fence_get_rcu(lfence);
-		if (!fence)
-			return -1;
-
-		ret = !!fence_is_signaled(fence);
-		fence_put(fence);
-	}
-	return ret;
-}
-
 /**
  * reservation_object_test_signaled_rcu - Test if a reservation object's
  * fences have been signaled.
@@ -474,7 +456,7 @@ bool reservation_object_test_signaled_rcu(struct reservation_object *obj,
 					  bool test_all)
 {
 	unsigned seq, shared_count;
-	int ret;
+	bool ret;
 
 	rcu_read_lock();
 retry:
@@ -494,10 +476,8 @@ retry:
 		for (i = 0; i < shared_count; ++i) {
 			struct fence *fence = rcu_dereference(fobj->shared[i]);
 
-			ret = reservation_object_test_signaled_single(fence);
-			if (ret < 0)
-				goto retry;
-			else if (!ret)
+			ret = fence_is_signaled(fence);
+			if (!ret)
 				break;
 		}
 
@@ -509,11 +489,7 @@ retry:
 		struct fence *fence_excl = rcu_dereference(obj->fence_excl);
 
 		if (fence_excl) {
-			ret = reservation_object_test_signaled_single(
-								fence_excl);
-			if (ret < 0)
-				goto retry;
-
+			ret = fence_is_signaled(fence_excl);
 			if (read_seqcount_retry(&obj->seq, seq))
 				goto retry;
 		}
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0
  2016-08-29  7:08 [PATCH 01/11] drm/amdgpu: Remove call to reservation_object_test_signaled_rcu before wait Chris Wilson
@ 2016-08-29  7:08   ` Chris Wilson
  2016-08-29  7:08 ` [PATCH 03/11] drm/msm: Remove " Chris Wilson
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 63+ messages in thread
From: Chris Wilson @ 2016-08-29  7:08 UTC (permalink / raw)
  To: dri-devel
  Cc: intel-gfx, Chris Wilson, Sumit Semwal, linux-media, linaro-mm-sig

Currently we install a callback for performing poll on a dma-buf,
irrespective of the timeout. This involves taking a spinlock, as well as
doing unnecessary work, and it greatly reduces the scaling of
poll(.timeout=0) across multiple threads.

We can query whether the poll will block prior to installing the
callback to make the busy-query fast.

Single thread: 60% faster
8 threads on 4 (+4 HT) cores: 600% faster

Still not quite the perfect scaling we get with a native busy ioctl, but
poll(dmabuf) is faster due to the quicker lookup of the object and
avoiding drm_ioctl().
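
The fast path mirrors the hunk below and can be modelled in a few lines
of userspace C. This is a hedged sketch with invented toy_* names; the
two flags stand in for the two reservation_object_test_signaled_rcu()
queries (all fences for write readiness, the exclusive fence for read
readiness), and poll_does_not_wait is passed as a plain argument.

```c
#include <assert.h>

#define TOY_POLLIN  0x001
#define TOY_POLLOUT 0x004

/* Invented model of the dma-buf's signaled state. */
struct toy_dmabuf { int all_signaled; int excl_signaled; };

/* When the caller cannot block (poll with timeout=0), answer the
 * readiness question with the lockless signaled test and return early,
 * never installing a callback or taking the spinlock. */
static unsigned toy_dmabuf_poll(struct toy_dmabuf *b, unsigned events,
				int poll_does_not_wait)
{
	if (poll_does_not_wait) {
		if ((events & TOY_POLLOUT) && !b->all_signaled)
			events &= ~(TOY_POLLOUT | TOY_POLLIN);
		if ((events & TOY_POLLIN) && !b->excl_signaled)
			events &= ~TOY_POLLIN;
		return events;
	}
	/* the blocking path with fence callbacks would follow here */
	return events;
}
```

A busy buffer reports no events, an idle one reports everything asked
for, and nothing beyond the lockless test runs on the timeout=0 path.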

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/dma-buf/dma-buf.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index cf04d249a6a4..c7a7bc579941 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -156,6 +156,18 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
 	if (!events)
 		return 0;
 
+	if (poll_does_not_wait(poll)) {
+		if (events & POLLOUT &&
+		    !reservation_object_test_signaled_rcu(resv, true))
+			events &= ~(POLLOUT | POLLIN);
+
+		if (events & POLLIN &&
+		    !reservation_object_test_signaled_rcu(resv, false))
+			events &= ~POLLIN;
+
+		return events;
+	}
+
 retry:
 	seq = read_seqcount_begin(&resv->seq);
 	rcu_read_lock();
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* ✗ Fi.CI.BAT: warning for series starting with [01/11] drm/amdgpu: Remove call to reservation_object_test_signaled_rcu before wait
  2016-08-29  7:08 [PATCH 01/11] drm/amdgpu: Remove call to reservation_object_test_signaled_rcu before wait Chris Wilson
                   ` (9 preceding siblings ...)
  2016-08-29  7:08   ` Chris Wilson
@ 2016-08-29  7:50 ` Patchwork
  2016-08-29  8:20 ` [PATCH 01/11] " Christian König
  2016-09-23 12:54 ` Daniel Vetter
  12 siblings, 0 replies; 63+ messages in thread
From: Patchwork @ 2016-08-29  7:50 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/11] drm/amdgpu: Remove call to reservation_object_test_signaled_rcu before wait
URL   : https://patchwork.freedesktop.org/series/11705/
State : warning

== Summary ==

Series 11705v1 Series without cover letter
http://patchwork.freedesktop.org/api/1.0/series/11705/revisions/1/mbox/

Test drv_module_reload_basic:
                pass       -> SKIP       (fi-skl-6260u)
Test kms_cursor_legacy:
        Subgroup basic-cursor-vs-flip-varying-size:
                fail       -> PASS       (fi-bsw-n3050)
Test kms_pipe_crc_basic:
        Subgroup suspend-read-crc-pipe-b:
                incomplete -> PASS       (fi-hsw-4770k)

fi-bdw-5557u     total:252  pass:235  dwarn:0   dfail:0   fail:2   skip:15 
fi-bsw-n3050     total:252  pass:205  dwarn:0   dfail:0   fail:1   skip:46 
fi-hsw-4770k     total:252  pass:228  dwarn:0   dfail:0   fail:2   skip:22 
fi-hsw-4770r     total:252  pass:224  dwarn:0   dfail:0   fail:2   skip:26 
fi-ivb-3520m     total:252  pass:220  dwarn:0   dfail:0   fail:1   skip:31 
fi-skl-6260u     total:252  pass:235  dwarn:0   dfail:0   fail:2   skip:15 
fi-skl-6700k     total:252  pass:222  dwarn:0   dfail:0   fail:2   skip:28 
fi-snb-2520m     total:252  pass:207  dwarn:0   dfail:0   fail:2   skip:43 
fi-snb-2600      total:252  pass:207  dwarn:0   dfail:0   fail:2   skip:43 

Results at /archive/results/CI_IGT_test/Patchwork_2444/

a72d2489965e2ae2d83cbdd784722396138378fc drm-intel-nightly: 2016y-08m-29d-06h-05m-38s UTC integration manifest
dc106f2 dma-buf: Do a fast lockless check for poll with timeout=0
5a2b906 dma-buf: Use seqlock to close RCU race in test_signaled_single
80cd4d6 dma-buf: Restart reservation_object_test_signaled_rcu() after writes
75f4406 dma-buf: Restart reservation_object_wait_timeout_rcu() after writes
244eae2 dma-buf: Restart reservation_object_get_fences_rcu() after writes
d9ce65b dma-buf: Introduce fence_get_rcu_safe()
90c14ac drm/vmwgfx: Remove call to reservation_object_test_signaled_rcu before wait
e90bdbf drm/nouveau: Remove call to reservation_object_test_signaled_rcu before wait
96ef037 drm/msm: Remove call to reservation_object_test_signaled_rcu before wait
462131d drm/etnaviv: Remove manual call to reservation_object_test_signaled_rcu before wait
52e40e9 drm/amdgpu: Remove call to reservation_object_test_signaled_rcu before wait

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 01/11] drm/amdgpu: Remove call to reservation_object_test_signaled_rcu before wait
  2016-08-29  7:08 [PATCH 01/11] drm/amdgpu: Remove call to reservation_object_test_signaled_rcu before wait Chris Wilson
                   ` (10 preceding siblings ...)
  2016-08-29  7:50 ` ✗ Fi.CI.BAT: warning for series starting with [01/11] drm/amdgpu: Remove call to reservation_object_test_signaled_rcu before wait Patchwork
@ 2016-08-29  8:20 ` Christian König
  2016-09-23 12:54 ` Daniel Vetter
  12 siblings, 0 replies; 63+ messages in thread
From: Christian König @ 2016-08-29  8:20 UTC (permalink / raw)
  To: Chris Wilson, dri-devel; +Cc: Alex Deucher, intel-gfx

Am 29.08.2016 um 09:08 schrieb Chris Wilson:
> Since fence_wait_timeout_reservation_object_wait_timeout_rcu() with a
> timeout of 0 becomes reservation_object_test_signaled_rcu(), we do not
> need to handle such conversion in the caller. The only challenge are
> those callers that wish to differentiate the error code between the
> nonblocking busy check and potentially blocking wait.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian König <christian.koenig@amd.com>

Mhm, actually it was one of our developers who added the shortcut into 
reservation_object_wait_timeout_rcu().

But it looks like we forgot to clean this up in the amdgpu driver after 
doing this. So thanks for taking care of this.

Patch #1 as well as patch #6 are Reviewed-by: Christian König 
<christian.koenig@amd.com>.

For patch #6 there is also a use case for this in TTM's 
ttm_bo_add_move_fence().

We could change the lock into an RCU if we can make sure that we get an 
up to date RCU protected fence and not NULL if the fence becomes freed 
right at the wrong time.

Regards,
Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 6 ++----
>   1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index 88fbed2389c0..a3e6f883ac2c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -407,10 +407,8 @@ int amdgpu_gem_wait_idle_ioctl(struct drm_device *dev, void *data,
>   		return -ENOENT;
>   	}
>   	robj = gem_to_amdgpu_bo(gobj);
> -	if (timeout == 0)
> -		ret = reservation_object_test_signaled_rcu(robj->tbo.resv, true);
> -	else
> -		ret = reservation_object_wait_timeout_rcu(robj->tbo.resv, true, true, timeout);
> +	ret = reservation_object_wait_timeout_rcu(robj->tbo.resv, true, true,
> +						  timeout);
>   
>   	/* ret == 0 means not signaled,
>   	 * ret > 0 means signaled


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH] dma-buf/sync-file: Avoid enable fence signaling if poll(.timeout=0)
  2016-08-29  7:08   ` Chris Wilson
@ 2016-08-29 18:16     ` Chris Wilson
  -1 siblings, 0 replies; 63+ messages in thread
From: Chris Wilson @ 2016-08-29 18:16 UTC (permalink / raw)
  To: dri-devel
  Cc: Chris Wilson, Sumit Semwal, Gustavo Padovan, linux-media, linaro-mm-sig

If we are being polled with a timeout of zero, a nonblocking busy query,
we do not need to install any fence callbacks as we will not be waiting.
As we only install the callback once, the overhead comes from the atomic
bit test that also causes serialisation between threads.
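
The once-only install and the new short-circuit can be modelled in
userspace C11. This is a sketch under stated assumptions: toy_* names
are invented, an atomic_exchange stands in for test_and_set_bit, and
poll_does_not_wait is a plain argument rather than the poll_table query.

```c
#include <assert.h>
#include <stdatomic.h>

/* Invented model of a sync_file: the flag gates the one-time callback
 * install, the counter records how many installs actually happened. */
struct toy_sync_file { atomic_int poll_enabled; int callbacks_installed; };

/* Nonblocking polls skip the flag entirely, so they never perform the
 * atomic RMW that serialises concurrent pollers; blocking polls still
 * install the wake-up callback exactly once. */
static void toy_sync_file_poll(struct toy_sync_file *s, int poll_does_not_wait)
{
	if (!poll_does_not_wait && !atomic_exchange(&s->poll_enabled, 1))
		s->callbacks_installed++;	/* fence_add_callback() here */
}
```

Any number of timeout=0 polls leave the flag untouched; the first
blocking poll installs the callback and later ones see the flag set.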

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
---
 drivers/dma-buf/sync_file.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
index 486d29c1a830..abb5fdab75fd 100644
--- a/drivers/dma-buf/sync_file.c
+++ b/drivers/dma-buf/sync_file.c
@@ -306,7 +306,8 @@ static unsigned int sync_file_poll(struct file *file, poll_table *wait)
 
 	poll_wait(file, &sync_file->wq, wait);
 
-	if (!test_and_set_bit(POLL_ENABLED, &sync_file->fence->flags)) {
+	if (!poll_does_not_wait(wait) &&
+	    !test_and_set_bit(POLL_ENABLED, &sync_file->fence->flags)) {
 		if (fence_add_callback(sync_file->fence, &sync_file->cb,
 				       fence_check_cb_func) < 0)
 			wake_up_all(&sync_file->wq);
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH] dma-buf/sync-file: Avoid enable fence signaling if poll(.timeout=0)
  2016-08-29 18:16     ` Chris Wilson
  (?)
@ 2016-08-29 18:26     ` Gustavo Padovan
  2016-09-13 14:46         ` Sumit Semwal
  -1 siblings, 1 reply; 63+ messages in thread
From: Gustavo Padovan @ 2016-08-29 18:26 UTC (permalink / raw)
  To: Chris Wilson; +Cc: dri-devel, Sumit Semwal, linux-media, linaro-mm-sig

Hi Chris,

2016-08-29 Chris Wilson <chris@chris-wilson.co.uk>:

> If we being polled with a timeout of zero, a nonblocking busy query,
> we don't need to install any fence callbacks as we will not be waiting.
> As we only install the callback once, the overhead comes from the atomic
> bit test that also causes serialisation between threads.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: Gustavo Padovan <gustavo@padovan.org>
> Cc: linux-media@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linaro-mm-sig@lists.linaro.org
> ---
>  drivers/dma-buf/sync_file.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

Indeed, we can shortcut this.

Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

Gustavo

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] dma-buf/sync-file: Avoid enable fence signaling if poll(.timeout=0)
  2016-08-29 18:26     ` Gustavo Padovan
@ 2016-09-13 14:46         ` Sumit Semwal
  0 siblings, 0 replies; 63+ messages in thread
From: Sumit Semwal @ 2016-09-13 14:46 UTC (permalink / raw)
  To: Gustavo Padovan, Chris Wilson, DRI mailing list, Sumit Semwal,
	linux-media, Linaro MM SIG Mailman List

Hi Chris,

On 29 August 2016 at 23:56, Gustavo Padovan <gustavo@padovan.org> wrote:
> Hi Chris,
>
> 2016-08-29 Chris Wilson <chris@chris-wilson.co.uk>:
>
>> If we being polled with a timeout of zero, a nonblocking busy query,
>> we don't need to install any fence callbacks as we will not be waiting.
>> As we only install the callback once, the overhead comes from the atomic
>> bit test that also causes serialisation between threads.
>>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Sumit Semwal <sumit.semwal@linaro.org>
>> Cc: Gustavo Padovan <gustavo@padovan.org>
>> Cc: linux-media@vger.kernel.org
>> Cc: dri-devel@lists.freedesktop.org
>> Cc: linaro-mm-sig@lists.linaro.org
>> ---
>>  drivers/dma-buf/sync_file.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> Indeed, we can shortcut this.
>
> Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
>
> Gustavo

Thanks; pushed to drm-misc.

Best,
Sumit.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] dma-buf/sync-file: Avoid enable fence signaling if poll(.timeout=0)
  2016-08-29 18:16     ` Chris Wilson
@ 2016-09-15  0:00     ` Rafael Antognolli
  2016-09-21  7:26       ` Gustavo Padovan
  -1 siblings, 1 reply; 63+ messages in thread
From: Rafael Antognolli @ 2016-09-15  0:00 UTC (permalink / raw)
  To: Chris Wilson; +Cc: dri-devel

Hi Chris and Gustavo,

On Mon, Aug 29, 2016 at 07:16:13PM +0100, Chris Wilson wrote:
> If we are being polled with a timeout of zero, a nonblocking busy query,
> we don't need to install any fence callbacks as we will not be waiting.
> As we only install the callback once, the overhead comes from the atomic
> bit test that also causes serialisation between threads.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: Gustavo Padovan <gustavo@padovan.org>
> Cc: linux-media@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linaro-mm-sig@lists.linaro.org
> ---
>  drivers/dma-buf/sync_file.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> index 486d29c1a830..abb5fdab75fd 100644
> --- a/drivers/dma-buf/sync_file.c
> +++ b/drivers/dma-buf/sync_file.c
> @@ -306,7 +306,8 @@ static unsigned int sync_file_poll(struct file *file, poll_table *wait)
>  
>  	poll_wait(file, &sync_file->wq, wait);
>  
> -	if (!test_and_set_bit(POLL_ENABLED, &sync_file->fence->flags)) {
> +	if (!poll_does_not_wait(wait) &&
> +	    !test_and_set_bit(POLL_ENABLED, &sync_file->fence->flags)) {
>  		if (fence_add_callback(sync_file->fence, &sync_file->cb,
>  				       fence_check_cb_func) < 0)
>  			wake_up_all(&sync_file->wq);

This commit is causing an error on one of the tests that Robert Foss
submitted for i-g-t. The one that does random merge of fences from
different timelines. A simple version of the test that still triggers
this is:

static void test_sync_simple_merge(void)
{
        int fence1, fence2, fence_merge, timeline1, timeline2;
        int ret;

        timeline1 = sw_sync_timeline_create();
        timeline2 = sw_sync_timeline_create();
        fence1 = sw_sync_fence_create(timeline1, 1);
        fence2 = sw_sync_fence_create(timeline2, 2);
        fence_merge = sw_sync_merge(fence1, fence2);
        sw_sync_timeline_inc(timeline1, 5);
        sw_sync_timeline_inc(timeline2, 5);

        ret = sw_sync_wait(fence_merge, 0);
        igt_assert_f(ret > 0, "Failure triggering fence\n");

        sw_sync_fence_destroy(fence_merge);
        sw_sync_fence_destroy(fence1);
        sw_sync_fence_destroy(fence2);
        sw_sync_timeline_destroy(timeline1);
        sw_sync_timeline_destroy(timeline2);
}

It looks like you cannot trust fence_is_signaled() without a
fence_add_callback(). I think the fence_array->num_pending won't get
updated. Although I couldn't figure out why it only happens if you merge
fences from different timelines.

Regards,
Rafael

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] dma-buf/sync-file: Avoid enable fence signaling if poll(.timeout=0)
  2016-09-15  0:00     ` Rafael Antognolli
@ 2016-09-21  7:26       ` Gustavo Padovan
  2016-09-21 11:08         ` Chris Wilson
  0 siblings, 1 reply; 63+ messages in thread
From: Gustavo Padovan @ 2016-09-21  7:26 UTC (permalink / raw)
  To: Rafael Antognolli; +Cc: dri-devel

Hi Rafael,

2016-09-14 Rafael Antognolli <rafael.antognolli@intel.com>:

> Hi Chris and Gustavo,
> 
> On Mon, Aug 29, 2016 at 07:16:13PM +0100, Chris Wilson wrote:
> > If we are being polled with a timeout of zero, a nonblocking busy query,
> > we don't need to install any fence callbacks as we will not be waiting.
> > As we only install the callback once, the overhead comes from the atomic
> > bit test that also causes serialisation between threads.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > Cc: Gustavo Padovan <gustavo@padovan.org>
> > Cc: linux-media@vger.kernel.org
> > Cc: dri-devel@lists.freedesktop.org
> > Cc: linaro-mm-sig@lists.linaro.org
> > ---
> >  drivers/dma-buf/sync_file.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> > index 486d29c1a830..abb5fdab75fd 100644
> > --- a/drivers/dma-buf/sync_file.c
> > +++ b/drivers/dma-buf/sync_file.c
> > @@ -306,7 +306,8 @@ static unsigned int sync_file_poll(struct file *file, poll_table *wait)
> >  
> >  	poll_wait(file, &sync_file->wq, wait);
> >  
> > -	if (!test_and_set_bit(POLL_ENABLED, &sync_file->fence->flags)) {
> > +	if (!poll_does_not_wait(wait) &&
> > +	    !test_and_set_bit(POLL_ENABLED, &sync_file->fence->flags)) {
> >  		if (fence_add_callback(sync_file->fence, &sync_file->cb,
> >  				       fence_check_cb_func) < 0)
> >  			wake_up_all(&sync_file->wq);
> 
> This commit is causing an error on one of the tests that Robert Foss
> submitted for i-g-t. The one that does random merge of fences from
> different timelines. A simple version of the test that still triggers
> this is:
> 
> static void test_sync_simple_merge(void)
> {
>         int fence1, fence2, fence_merge, timeline1, timeline2;
>         int ret;
> 
>         timeline1 = sw_sync_timeline_create();
>         timeline2 = sw_sync_timeline_create();
>         fence1 = sw_sync_fence_create(timeline1, 1);
>         fence2 = sw_sync_fence_create(timeline2, 2);
>         fence_merge = sw_sync_merge(fence1, fence2);
>         sw_sync_timeline_inc(timeline1, 5);
>         sw_sync_timeline_inc(timeline2, 5);
> 
>         ret = sw_sync_wait(fence_merge, 0);
>         igt_assert_f(ret > 0, "Failure triggering fence\n");
> 
>         sw_sync_fence_destroy(fence_merge);
>         sw_sync_fence_destroy(fence1);
>         sw_sync_fence_destroy(fence2);
>         sw_sync_timeline_destroy(timeline1);
>         sw_sync_timeline_destroy(timeline2);
> }
> 
> It looks like you cannot trust fence_is_signaled() without a
> fence_add_callback(). I think the fence_array->num_pending won't get
> updated. Although I couldn't figure out why it only happens if you merge
> fences from different timelines.

Yes, num_pending is only updated when signaling is enabled. It only
happens with different timelines because when you merge fences that are
on the same timeline your final sync_file has only one fence and thus 
a fence_array is not created.

If we want to keep the poll_does_not_wait optimization we need a way
to count the pending fences during fence_is_signaled(). I'd propose
something like this:


Author: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Date:   Tue Sep 20 16:43:06 2016 +0200

    dma-buf/fence-array: get signaled state when signaling is disabled
    
    If the fences in the fence_array signal while the fence_array does not
    have signalling enabled, num_pending will not be updated accordingly.
    
    So when signaling is disabled, check the signaled state of every fence
    with fence_is_signaled() and then compare with num_pending to learn
    whether the fence_array was signalled or not.
    
    If we want to keep the poll_does_not_wait optimization I think we need
    something like this. It keeps the same behaviour if signalling is enabled
    but tries to calculate the state otherwise.
    
    Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

diff --git a/drivers/dma-buf/fence-array.c b/drivers/dma-buf/fence-array.c
index f1989fc..34c9209 100644
--- a/drivers/dma-buf/fence-array.c
+++ b/drivers/dma-buf/fence-array.c
@@ -75,8 +75,18 @@ static bool fence_array_enable_signaling(struct fence *fence)
 static bool fence_array_signaled(struct fence *fence)
 {
        struct fence_array *array = to_fence_array(fence);
+       int i, num_pending;
 
-       return atomic_read(&array->num_pending) <= 0;
+       num_pending = atomic_read(&array->num_pending);
+
+       if (!test_bit(FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags)) {
+               for (i = 0 ; i < array->num_fences; ++i) {
+                       if (fence_is_signaled(array->fences[i]))
+                               num_pending--;
+               }
+       }
+
+       return num_pending <= 0;
 }
 
 static void fence_array_release(struct fence *fence)


Gustavo

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH] dma-buf/sync-file: Avoid enable fence signaling if poll(.timeout=0)
  2016-09-21  7:26       ` Gustavo Padovan
@ 2016-09-21 11:08         ` Chris Wilson
  0 siblings, 0 replies; 63+ messages in thread
From: Chris Wilson @ 2016-09-21 11:08 UTC (permalink / raw)
  To: Gustavo Padovan, Rafael Antognolli, dri-devel

On Wed, Sep 21, 2016 at 10:26:25AM +0300, Gustavo Padovan wrote:
> Hi Rafael,
> 
> 2016-09-14 Rafael Antognolli <rafael.antognolli@intel.com>:
> 
> > Hi Chris and Gustavo,
> > 
> > On Mon, Aug 29, 2016 at 07:16:13PM +0100, Chris Wilson wrote:
> > > If we are being polled with a timeout of zero, a nonblocking busy query,
> > > we don't need to install any fence callbacks as we will not be waiting.
> > > As we only install the callback once, the overhead comes from the atomic
> > > bit test that also causes serialisation between threads.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > > Cc: Gustavo Padovan <gustavo@padovan.org>
> > > Cc: linux-media@vger.kernel.org
> > > Cc: dri-devel@lists.freedesktop.org
> > > Cc: linaro-mm-sig@lists.linaro.org
> > > ---
> > >  drivers/dma-buf/sync_file.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> > > index 486d29c1a830..abb5fdab75fd 100644
> > > --- a/drivers/dma-buf/sync_file.c
> > > +++ b/drivers/dma-buf/sync_file.c
> > > @@ -306,7 +306,8 @@ static unsigned int sync_file_poll(struct file *file, poll_table *wait)
> > >  
> > >  	poll_wait(file, &sync_file->wq, wait);
> > >  
> > > -	if (!test_and_set_bit(POLL_ENABLED, &sync_file->fence->flags)) {
> > > +	if (!poll_does_not_wait(wait) &&
> > > +	    !test_and_set_bit(POLL_ENABLED, &sync_file->fence->flags)) {
> > >  		if (fence_add_callback(sync_file->fence, &sync_file->cb,
> > >  				       fence_check_cb_func) < 0)
> > >  			wake_up_all(&sync_file->wq);
> > 
> > This commit is causing an error on one of the tests that Robert Foss
> > submitted for i-g-t. The one that does random merge of fences from
> > different timelines. A simple version of the test that still triggers
> > this is:
> > 
> > static void test_sync_simple_merge(void)
> > {
> >         int fence1, fence2, fence_merge, timeline1, timeline2;
> >         int ret;
> > 
> >         timeline1 = sw_sync_timeline_create();
> >         timeline2 = sw_sync_timeline_create();
> >         fence1 = sw_sync_fence_create(timeline1, 1);
> >         fence2 = sw_sync_fence_create(timeline2, 2);
> >         fence_merge = sw_sync_merge(fence1, fence2);
> >         sw_sync_timeline_inc(timeline1, 5);
> >         sw_sync_timeline_inc(timeline2, 5);
> > 
> >         ret = sw_sync_wait(fence_merge, 0);
> >         igt_assert_f(ret > 0, "Failure triggering fence\n");
> > 
> >         sw_sync_fence_destroy(fence_merge);
> >         sw_sync_fence_destroy(fence1);
> >         sw_sync_fence_destroy(fence2);
> >         sw_sync_timeline_destroy(timeline1);
> >         sw_sync_timeline_destroy(timeline2);
> > }
> > 
> > It looks like you cannot trust fence_is_signaled() without a
> > fence_add_callback(). I think the fence_array->num_pending won't get
> > updated. Although I couldn't figure out why it only happens if you merge
> > fences from different timelines.
> 
> Yes, num_pending is only updated when signaling is enabled. It only
> happens with different timelines because when you merge fences that are
> on the same timeline your final sync_file has only one fence and thus 
> a fence_array is not created.
> 
> If we want to keep the poll_does_not_wait optimization we need a way
> to count the pending fences during fence_is_signaled(). I'd propose
> something like this:
> 
> 
> Author: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
> Date:   Tue Sep 20 16:43:06 2016 +0200
> 
>     dma-buf/fence-array: get signaled state when signaling is disabled
>     
>     If the fences in the fence_array signal while the fence_array does not
>     have signalling enabled, num_pending will not be updated accordingly.
>     
>     So when signaling is disabled, check the signaled state of every fence
>     with fence_is_signaled() and then compare with num_pending to learn
>     whether the fence_array was signalled or not.
>     
>     If we want to keep the poll_does_not_wait optimization I think we need
>     something like this. It keeps the same behaviour if signalling is enabled
>     but tries to calculate the state otherwise.
>     
>     Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

We need this regardless, so yay for uncovering a bug!
> 
> diff --git a/drivers/dma-buf/fence-array.c b/drivers/dma-buf/fence-array.c
> index f1989fc..34c9209 100644
> --- a/drivers/dma-buf/fence-array.c
> +++ b/drivers/dma-buf/fence-array.c
> @@ -75,8 +75,18 @@ static bool fence_array_enable_signaling(struct fence *fence)
>  static bool fence_array_signaled(struct fence *fence)
>  {
>         struct fence_array *array = to_fence_array(fence);
> +       int i, num_pending;
>  
> -       return atomic_read(&array->num_pending) <= 0;
> +       num_pending = atomic_read(&array->num_pending);
> +
> +       if (!test_bit(FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags)) {

Oh, very sneaky. I thought this was FENCE_FLAG_SIGNALED_BIT!

Throw in a comment like:

/* Before signaling is enabled, num_pending is static (set during array
 * construction as a count of *all* fences). To ensure forward progress,
 * i.e. a while (!fence_is_signaled()) ; busy-loop eventually proceeds,
 * we need to check the current status of our fences.
 */

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 01/11] drm/amdgpu: Remove call to reservation_object_test_signaled_rcu before wait
  2016-08-29  7:08 [PATCH 01/11] drm/amdgpu: Remove call to reservation_object_test_signaled_rcu before wait Chris Wilson
                   ` (11 preceding siblings ...)
  2016-08-29  8:20 ` [PATCH 01/11] " Christian König
@ 2016-09-23 12:54 ` Daniel Vetter
  2016-10-05 16:03   ` Sumit Semwal
  12 siblings, 1 reply; 63+ messages in thread
From: Daniel Vetter @ 2016-09-23 12:54 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Alex Deucher, intel-gfx, Christian König, dri-devel

On Mon, Aug 29, 2016 at 08:08:24AM +0100, Chris Wilson wrote:
> Since fence_wait_timeout_reservation_object_wait_timeout_rcu() with a
> timeout of 0 becomes reservation_object_test_signaled_rcu(), we do not
> need to handle such conversion in the caller. The only challenge are
> those callers that wish to differentiate the error code between the
> nonblocking busy check and potentially blocking wait.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian König <christian.koenig@amd.com>

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index 88fbed2389c0..a3e6f883ac2c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -407,10 +407,8 @@ int amdgpu_gem_wait_idle_ioctl(struct drm_device *dev, void *data,
>  		return -ENOENT;
>  	}
>  	robj = gem_to_amdgpu_bo(gobj);
> -	if (timeout == 0)
> -		ret = reservation_object_test_signaled_rcu(robj->tbo.resv, true);
> -	else
> -		ret = reservation_object_wait_timeout_rcu(robj->tbo.resv, true, true, timeout);
> +	ret = reservation_object_wait_timeout_rcu(robj->tbo.resv, true, true,
> +						  timeout);
>  
>  	/* ret == 0 means not signaled,
>  	 * ret > 0 means signaled
> -- 
> 2.9.3
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 02/11] drm/etnaviv: Remove manual call to reservation_object_test_signaled_rcu before wait
  2016-08-29  7:08 ` [PATCH 02/11] drm/etnaviv: Remove manual " Chris Wilson
@ 2016-09-23 12:55   ` Daniel Vetter
  2016-10-05 16:15     ` Sumit Semwal
  0 siblings, 1 reply; 63+ messages in thread
From: Daniel Vetter @ 2016-09-23 12:55 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, dri-devel, Russell King

On Mon, Aug 29, 2016 at 08:08:25AM +0100, Chris Wilson wrote:
> Since fence_wait_timeout_reservation_object_wait_timeout_rcu() with a
> timeout of 0 becomes reservation_object_test_signaled_rcu(), we do not
> need to handle such conversion in the caller. The only challenge are
> those callers that wish to differentiate the error code between the
> nonblocking busy check and potentially blocking wait.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Lucas Stach <l.stach@pengutronix.de>
> Cc: Russell King <linux+etnaviv@armlinux.org.uk>
> Cc: Christian Gmeiner <christian.gmeiner@gmail.com>

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> ---
>  drivers/gpu/drm/etnaviv/etnaviv_gem.c | 24 ++++++++++--------------
>  1 file changed, 10 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> index 5ce3603e6eac..9ffca2478e02 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> @@ -409,20 +409,16 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op,
>  	struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
>  	struct drm_device *dev = obj->dev;
>  	bool write = !!(op & ETNA_PREP_WRITE);
> -	int ret;
> -
> -	if (op & ETNA_PREP_NOSYNC) {
> -		if (!reservation_object_test_signaled_rcu(etnaviv_obj->resv,
> -							  write))
> -			return -EBUSY;
> -	} else {
> -		unsigned long remain = etnaviv_timeout_to_jiffies(timeout);
> -
> -		ret = reservation_object_wait_timeout_rcu(etnaviv_obj->resv,
> -							  write, true, remain);
> -		if (ret <= 0)
> -			return ret == 0 ? -ETIMEDOUT : ret;
> -	}
> +	unsigned long remain =
> +		op & ETNA_PREP_NOSYNC ? 0 : etnaviv_timeout_to_jiffies(timeout);
> +	long lret;
> +
> +	lret = reservation_object_wait_timeout_rcu(etnaviv_obj->resv,
> +						   write, true, remain);
> +	if (lret < 0)
> +		return lret;
> +	else if (lret == 0)
> +		return remain == 0 ? -EBUSY : -ETIMEDOUT;
>  
>  	if (etnaviv_obj->flags & ETNA_BO_CACHED) {
>  		if (!etnaviv_obj->sgt) {
> -- 
> 2.9.3
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 03/11] drm/msm: Remove call to reservation_object_test_signaled_rcu before wait
  2016-08-29  7:08 ` [PATCH 03/11] drm/msm: Remove " Chris Wilson
@ 2016-09-23 12:55   ` Daniel Vetter
  2016-09-23 13:07     ` [Intel-gfx] " Rob Clark
  0 siblings, 1 reply; 63+ messages in thread
From: Daniel Vetter @ 2016-09-23 12:55 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, dri-devel

On Mon, Aug 29, 2016 at 08:08:26AM +0100, Chris Wilson wrote:
> Since fence_wait_timeout_reservation_object_wait_timeout_rcu() with a
> timeout of 0 becomes reservation_object_test_signaled_rcu(), we do not
> need to handle such conversion in the caller. The only challenge are
> those callers that wish to differentiate the error code between the
> nonblocking busy check and potentially blocking wait.
> 
> v2: 9 is only 0 in German.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Rob Clark <robdclark@gmail.com>

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/msm/msm_gem.c | 22 ++++++++++------------
>  1 file changed, 10 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
> index 6cd4af443139..45796a88d353 100644
> --- a/drivers/gpu/drm/msm/msm_gem.c
> +++ b/drivers/gpu/drm/msm/msm_gem.c
> @@ -584,18 +584,16 @@ int msm_gem_cpu_prep(struct drm_gem_object *obj, uint32_t op, ktime_t *timeout)
>  {
>  	struct msm_gem_object *msm_obj = to_msm_bo(obj);
>  	bool write = !!(op & MSM_PREP_WRITE);
> -
> -	if (op & MSM_PREP_NOSYNC) {
> -		if (!reservation_object_test_signaled_rcu(msm_obj->resv, write))
> -			return -EBUSY;
> -	} else {
> -		int ret;
> -
> -		ret = reservation_object_wait_timeout_rcu(msm_obj->resv, write,
> -				true, timeout_to_jiffies(timeout));
> -		if (ret <= 0)
> -			return ret == 0 ? -ETIMEDOUT : ret;
> -	}
> +	unsigned long remain =
> > +		op & MSM_PREP_NOSYNC ? 0 : timeout_to_jiffies(timeout);
> +	long ret;
> +
> +	ret = reservation_object_wait_timeout_rcu(msm_obj->resv, write,
> > +						  true, remain);
> +	if (ret == 0)
> +		return remain == 0 ? -EBUSY : -ETIMEDOUT;
> +	else if (ret < 0)
> +		return ret;
>  
>  	/* TODO cache maintenance */
>  
> -- 
> 2.9.3
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 04/11] drm/nouveau: Remove call to reservation_object_test_signaled_rcu before wait
  2016-08-29  7:08 ` [PATCH 04/11] drm/nouveau: " Chris Wilson
@ 2016-09-23 12:55   ` Daniel Vetter
  2016-10-05 16:05     ` Sumit Semwal
  0 siblings, 1 reply; 63+ messages in thread
From: Daniel Vetter @ 2016-09-23 12:55 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, Ben Skeggs, dri-devel

On Mon, Aug 29, 2016 at 08:08:27AM +0100, Chris Wilson wrote:
> Since fence_wait_timeout_reservation_object_wait_timeout_rcu() with a
> timeout of 0 becomes reservation_object_test_signaled_rcu(), we do not
> need to handle such conversion in the caller. The only challenge are
> those callers that wish to differentiate the error code between the
> nonblocking busy check and potentially blocking wait.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ben Skeggs <bskeggs@redhat.com>

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> ---
>  drivers/gpu/drm/nouveau/nouveau_gem.c | 21 +++++++++------------
>  1 file changed, 9 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
> index 72e2399bce39..b90e21ff1ed8 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_gem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
> @@ -861,6 +861,7 @@ nouveau_gem_ioctl_cpu_prep(struct drm_device *dev, void *data,
>  	struct nouveau_bo *nvbo;
>  	bool no_wait = !!(req->flags & NOUVEAU_GEM_CPU_PREP_NOWAIT);
>  	bool write = !!(req->flags & NOUVEAU_GEM_CPU_PREP_WRITE);
> +	long lret;
>  	int ret;
>  
>  	gem = drm_gem_object_lookup(file_priv, req->handle);
> @@ -868,19 +869,15 @@ nouveau_gem_ioctl_cpu_prep(struct drm_device *dev, void *data,
>  		return -ENOENT;
>  	nvbo = nouveau_gem_object(gem);
>  
> -	if (no_wait)
> -		ret = reservation_object_test_signaled_rcu(nvbo->bo.resv, write) ? 0 : -EBUSY;
> -	else {
> -		long lret;
> +	lret = reservation_object_wait_timeout_rcu(nvbo->bo.resv, write, true,
> +						   no_wait ? 0 :30 * HZ);
> +	if (!lret)
> +		ret = -EBUSY;
> +	else if (lret > 0)
> +		ret = 0;
> +	else
> +		ret = lret;
>  
> -		lret = reservation_object_wait_timeout_rcu(nvbo->bo.resv, write, true, 30 * HZ);
> -		if (!lret)
> -			ret = -EBUSY;
> -		else if (lret > 0)
> -			ret = 0;
> -		else
> -			ret = lret;
> -	}
>  	nouveau_bo_sync_for_cpu(nvbo);
>  	drm_gem_object_unreference_unlocked(gem);
>  
> -- 
> 2.9.3
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 05/11] drm/vmwgfx: Remove call to reservation_object_test_signaled_rcu before wait
  2016-08-29  7:08 ` [PATCH 05/11] drm/vmwgfx: " Chris Wilson
@ 2016-09-23 12:56   ` Daniel Vetter
  2016-10-05 16:11     ` [Intel-gfx] " Sumit Semwal
  0 siblings, 1 reply; 63+ messages in thread
From: Daniel Vetter @ 2016-09-23 12:56 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, Thomas Hellstrom, dri-devel, Sinclair Yeh

On Mon, Aug 29, 2016 at 08:08:28AM +0100, Chris Wilson wrote:
> Since fence_wait_timeout_reservation_object_wait_timeout_rcu() with a
> timeout of 0 becomes reservation_object_test_signaled_rcu(), we do not
> need to handle such conversion in the caller. The only challenge are
> those callers that wish to differentiate the error code between the
> nonblocking busy check and potentially blocking wait.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Sinclair Yeh <syeh@vmware.com>
> Cc: Thomas Hellstrom <thellstrom@vmware.com>
> Reviewed-by: Sinclair Yeh <syeh@vmware.com>

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_resource.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
> index 6a328d507a28..1a85fb2d4dc6 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
> @@ -574,10 +574,8 @@ static int vmw_user_dmabuf_synccpu_grab(struct vmw_user_dma_buffer *user_bo,
>  		bool nonblock = !!(flags & drm_vmw_synccpu_dontblock);
>  		long lret;
>  
> -		if (nonblock)
> -			return reservation_object_test_signaled_rcu(bo->resv, true) ? 0 : -EBUSY;
> -
> -		lret = reservation_object_wait_timeout_rcu(bo->resv, true, true, MAX_SCHEDULE_TIMEOUT);
> +		lret = reservation_object_wait_timeout_rcu(bo->resv, true, true,
> +							   nonblock ? 0 : MAX_SCHEDULE_TIMEOUT);
>  		if (!lret)
>  			return -EBUSY;
>  		else if (lret < 0)
> -- 
> 2.9.3
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 06/11] dma-buf: Introduce fence_get_rcu_safe()
  2016-08-29  7:08   ` Chris Wilson
@ 2016-09-23 12:59     ` Daniel Vetter
  -1 siblings, 0 replies; 63+ messages in thread
From: Daniel Vetter @ 2016-09-23 12:59 UTC (permalink / raw)
  To: Chris Wilson
  Cc: dri-devel, intel-gfx, Daniel Vetter, Sumit Semwal, linux-media,
	linaro-mm-sig

On Mon, Aug 29, 2016 at 08:08:29AM +0100, Chris Wilson wrote:
> This variant of fence_get_rcu() takes an RCU protected pointer to a
> fence and carefully returns a reference to the fence ensuring that it is
> not reallocated as it does. This is required when mixing fences and
> SLAB_DESTROY_BY_RCU - although it serves a more pedagogical function atm
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: linux-media@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linaro-mm-sig@lists.linaro.org
> ---
>  include/linux/fence.h | 56 ++++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 51 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/fence.h b/include/linux/fence.h
> index 0d763053f97a..c9c5ba98c302 100644
> --- a/include/linux/fence.h
> +++ b/include/linux/fence.h
> @@ -183,6 +183,16 @@ void fence_release(struct kref *kref);
>  void fence_free(struct fence *fence);
>  
>  /**
> + * fence_put - decreases refcount of the fence
> + * @fence:	[in]	fence to reduce refcount of
> + */
> +static inline void fence_put(struct fence *fence)
> +{
> +	if (fence)
> +		kref_put(&fence->refcount, fence_release);
> +}
> +
> +/**
>   * fence_get - increases refcount of the fence
>   * @fence:	[in]	fence to increase refcount of
>   *
> @@ -210,13 +220,49 @@ static inline struct fence *fence_get_rcu(struct fence *fence)
>  }
>  
>  /**
> - * fence_put - decreases refcount of the fence
> - * @fence:	[in]	fence to reduce refcount of
> + * fence_get_rcu_safe  - acquire a reference to an RCU tracked fence
> + * @fence:	[in]	pointer to fence to increase refcount of
> + *
> + * Function returns NULL if no refcount could be obtained, or the fence.
> + * This function handles acquiring a reference to a fence that may be
> + * reallocated within the RCU grace period (such as with SLAB_DESTROY_BY_RCU),
> + * so long as the caller is using RCU on the pointer to the fence.
> + *
> + * An alternative mechanism is to employ a seqlock to protect a bunch of
> + * fences, such as used by struct reservation_object. When using a seqlock,
> + * the seqlock must be taken before and checked after a reference to the
> + * fence is acquired (as shown here).
> + *
> + * The caller is required to hold the RCU read lock.

Would be good to cross reference the various fence_get functions a bit
better in the docs. But since the docs aren't yet pulled into the rst/html
output, that doesn't matter that much. Hence as-is:

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

>   */
> -static inline void fence_put(struct fence *fence)
> +static inline struct fence *fence_get_rcu_safe(struct fence * __rcu *fencep)
>  {
> -	if (fence)
> -		kref_put(&fence->refcount, fence_release);
> +	do {
> +		struct fence *fence;
> +
> +		fence = rcu_dereference(*fencep);
> +		if (!fence || !fence_get_rcu(fence))
> +			return NULL;
> +
> +		/* The atomic_inc_not_zero() inside fence_get_rcu()
> +		 * provides a full memory barrier upon success (such as now).
> +		 * This is paired with the write barrier from assigning
> +		 * to the __rcu protected fence pointer so that if that
> +		 * pointer still matches the current fence, we know we
> +		 * have successfully acquired a reference to it. If it no
> +		 * longer matches, we are holding a reference to some other
> +		 * reallocated pointer. This is possible if the allocator
> +		 * is using a freelist like SLAB_DESTROY_BY_RCU where the
> +		 * fence remains valid for the RCU grace period, but it
> +		 * may be reallocated. When using such allocators, we are
> +		 * responsible for ensuring the reference we get is to
> +		 * the right fence, as below.
> +		 */
> +		if (fence == rcu_access_pointer(*fencep))
> +			return rcu_pointer_handoff(fence);
> +
> +		fence_put(fence);
> +	} while (1);
>  }
>  
>  int fence_signal(struct fence *fence);
> -- 
> 2.9.3
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 63+ messages in thread
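
For readers less familiar with the pattern, the quoted fence_get_rcu_safe() can be modelled in plain user-space C11. This is a hypothetical sketch, not kernel code: `struct obj`, get_unless_zero() and get_safe() stand in for struct fence, fence_get_rcu() and fence_get_rcu_safe(), and ordinary C11 atomics stand in for RCU and kref.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct fence: only the refcount matters here. */
struct obj {
	atomic_int refcount;
};

/* Model of fence_get_rcu()/kref_get_unless_zero(): take a reference
 * only while the object still has a non-zero refcount. */
static int get_unless_zero(struct obj *o)
{
	int v = atomic_load(&o->refcount);

	while (v != 0)
		if (atomic_compare_exchange_weak(&o->refcount, &v, v + 1))
			return 1;
	return 0;
}

/* Model of fence_get_rcu_safe(): dereference the shared slot, try to
 * take a reference, then re-read the slot.  If the pointer changed
 * while we were acquiring, the reference may be to a recycled object
 * (the SLAB_DESTROY_BY_RCU case), so drop it and retry. */
static struct obj *get_safe(struct obj *_Atomic *slot)
{
	for (;;) {
		struct obj *o = atomic_load(slot);

		if (!o || !get_unless_zero(o))
			return NULL;
		if (o == atomic_load(slot))	/* still the same object? */
			return o;
		atomic_fetch_sub(&o->refcount, 1); /* raced; drop and retry */
	}
}
```

The recheck against the slot is exactly the "pointer still matches" test in the comment above; in the kernel version the full barrier it relies on comes from the atomic_inc_not_zero() inside fence_get_rcu().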

* Re: [PATCH 07/11] dma-buf: Restart reservation_object_get_fences_rcu() after writes
  2016-08-29  7:08   ` Chris Wilson
@ 2016-09-23 13:03     ` Daniel Vetter
  -1 siblings, 0 replies; 63+ messages in thread
From: Daniel Vetter @ 2016-09-23 13:03 UTC (permalink / raw)
  To: Chris Wilson
  Cc: dri-devel, intel-gfx, Daniel Vetter, Maarten Lankhorst,
	Christian König, Alex Deucher, Sumit Semwal, linux-media,
	linaro-mm-sig

On Mon, Aug 29, 2016 at 08:08:30AM +0100, Chris Wilson wrote:
> In order to be completely generic, we have to double check the read
> seqlock after acquiring a reference to the fence. If the driver is
> allocating fences from a SLAB_DESTROY_BY_RCU, or similar freelist, then
> within an RCU grace period a fence may be freed and reallocated. The RCU
> read side critical section does not prevent this reallocation; instead
> we have to inspect the reservation's seqlock to double check if the
> fences have been reassigned as we were acquiring our reference.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: linux-media@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linaro-mm-sig@lists.linaro.org
> ---
>  drivers/dma-buf/reservation.c | 71 +++++++++++++++++++------------------------
>  1 file changed, 31 insertions(+), 40 deletions(-)
> 
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index 723d8af988e5..10fd441dd4ed 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -280,18 +280,24 @@ int reservation_object_get_fences_rcu(struct reservation_object *obj,
>  				      unsigned *pshared_count,
>  				      struct fence ***pshared)
>  {
> -	unsigned shared_count = 0;
> -	unsigned retry = 1;
> -	struct fence **shared = NULL, *fence_excl = NULL;
> -	int ret = 0;
> +	struct fence **shared = NULL;
> +	struct fence *fence_excl;
> +	unsigned shared_count;
> +	int ret = 1;

Personally I'd go with ret = -EBUSY here, but that's a bikeshed.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>  
> -	while (retry) {
> +	do {
>  		struct reservation_object_list *fobj;
>  		unsigned seq;
> +		unsigned i;
>  
> -		seq = read_seqcount_begin(&obj->seq);
> +		shared_count = i = 0;
>  
>  		rcu_read_lock();
> +		seq = read_seqcount_begin(&obj->seq);
> +
> +		fence_excl = rcu_dereference(obj->fence_excl);
> +		if (fence_excl && !fence_get_rcu(fence_excl))
> +			goto unlock;
>  
>  		fobj = rcu_dereference(obj->fence);
>  		if (fobj) {
> @@ -309,52 +315,37 @@ int reservation_object_get_fences_rcu(struct reservation_object *obj,
>  				}
>  
>  				ret = -ENOMEM;
> -				shared_count = 0;
>  				break;
>  			}
>  			shared = nshared;
> -			memcpy(shared, fobj->shared, sz);
>  			shared_count = fobj->shared_count;
> -		} else
> -			shared_count = 0;
> -		fence_excl = rcu_dereference(obj->fence_excl);
> -
> -		retry = read_seqcount_retry(&obj->seq, seq);
> -		if (retry)
> -			goto unlock;
> -
> -		if (!fence_excl || fence_get_rcu(fence_excl)) {
> -			unsigned i;
>  
>  			for (i = 0; i < shared_count; ++i) {
> -				if (fence_get_rcu(shared[i]))
> -					continue;
> -
> -				/* uh oh, refcount failed, abort and retry */
> -				while (i--)
> -					fence_put(shared[i]);
> -
> -				if (fence_excl) {
> -					fence_put(fence_excl);
> -					fence_excl = NULL;
> -				}
> -
> -				retry = 1;
> -				break;
> +				shared[i] = rcu_dereference(fobj->shared[i]);
> +				if (!fence_get_rcu(shared[i]))
> +					break;
>  			}
> -		} else
> -			retry = 1;
> +		}
> +
> +		if (i != shared_count || read_seqcount_retry(&obj->seq, seq)) {
> +			while (i--)
> +				fence_put(shared[i]);
> +			fence_put(fence_excl);
> +			goto unlock;
> +		}
>  
> +		ret = 0;
>  unlock:
>  		rcu_read_unlock();
> -	}
> -	*pshared_count = shared_count;
> -	if (shared_count)
> -		*pshared = shared;
> -	else {
> -		*pshared = NULL;
> +	} while (ret);
> +
> +	if (!shared_count) {
>  		kfree(shared);
> +		shared = NULL;
>  	}
> +
> +	*pshared_count = shared_count;
> +	*pshared = shared;
>  	*pfence_excl = fence_excl;
>  
>  	return ret;
> -- 
> 2.9.3
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 63+ messages in thread
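
The control flow the patch arrives at — snapshot everything inside a seqcount read section, then do a single validity check before returning — can be sketched in user-space C. This is a hypothetical model, not the kernel API: seq_begin()/seq_retry() stand in for read_seqcount_begin()/read_seqcount_retry(), and a plain atomic counter stands in for seqcount_t.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a seqcount_t: even means stable, odd means
 * a writer is mid-update. */
static atomic_uint seq;

static unsigned seq_begin(void)
{
	unsigned s;

	while ((s = atomic_load(&seq)) & 1)
		;	/* writer active: wait for an even (stable) value */
	return s;
}

static int seq_retry(unsigned s)
{
	return atomic_load(&seq) != s;	/* changed: snapshot is stale */
}

/* Reader loop with the same shape as the patched
 * reservation_object_get_fences_rcu(): take the whole snapshot first,
 * validate it once, and retry from scratch if a write intervened. */
static int read_stable(const int *shared_value)
{
	unsigned s;
	int snap;

	do {
		s = seq_begin();
		snap = *shared_value;	/* snapshot the protected data ... */
	} while (seq_retry(s));		/* ... then validate in one place */

	return snap;
}
```

The point of the restructuring above is the same as in this toy: per-item rechecks scattered through the loop are replaced by one retry test covering the complete snapshot, including the references taken on each fence.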

* Re: [Intel-gfx] [PATCH 03/11] drm/msm: Remove call to reservation_object_test_signaled_rcu before wait
  2016-09-23 12:55   ` Daniel Vetter
@ 2016-09-23 13:07     ` Rob Clark
  0 siblings, 0 replies; 63+ messages in thread
From: Rob Clark @ 2016-09-23 13:07 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel Graphics Development, dri-devel

On Fri, Sep 23, 2016 at 8:55 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Mon, Aug 29, 2016 at 08:08:26AM +0100, Chris Wilson wrote:
>> Since reservation_object_wait_timeout_rcu() with a
>> timeout of 0 becomes reservation_object_test_signaled_rcu(), we do not
>> need to handle such a conversion in the caller. The only challenge is
>> those callers that wish to differentiate the error code between the
>> nonblocking busy check and the potentially blocking wait.
>>
>> v2: 9 is only 0 in German.
>>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Rob Clark <robdclark@gmail.com>
>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

fyi, this is in my msm-next pull-req sent last week..  (with one small
s/MSG_PREP_NOSYNC/MSM_PREP_NOSYNC/ fixup ;-))

BR,
-R


>> ---
>>  drivers/gpu/drm/msm/msm_gem.c | 22 ++++++++++------------
>>  1 file changed, 10 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
>> index 6cd4af443139..45796a88d353 100644
>> --- a/drivers/gpu/drm/msm/msm_gem.c
>> +++ b/drivers/gpu/drm/msm/msm_gem.c
>> @@ -584,18 +584,16 @@ int msm_gem_cpu_prep(struct drm_gem_object *obj, uint32_t op, ktime_t *timeout)
>>  {
>>       struct msm_gem_object *msm_obj = to_msm_bo(obj);
>>       bool write = !!(op & MSM_PREP_WRITE);
>> -
>> -     if (op & MSM_PREP_NOSYNC) {
>> -             if (!reservation_object_test_signaled_rcu(msm_obj->resv, write))
>> -                     return -EBUSY;
>> -     } else {
>> -             int ret;
>> -
>> -             ret = reservation_object_wait_timeout_rcu(msm_obj->resv, write,
>> -                             true, timeout_to_jiffies(timeout));
>> -             if (ret <= 0)
>> -                     return ret == 0 ? -ETIMEDOUT : ret;
>> -     }
>> +     unsigned long remain =
>> +             op & MSG_PREP_NOSYNC ? 0 : timeout_to_jiffies(timeout);
>> +     long ret;
>> +
>> +     ret = reservation_object_wait_timeout_rcu(msm_obj->resv, write,
>> +                                               true,  remain);
>> +     if (ret == 0)
>> +             return remain == 0 ? -EBUSY : -ETIMEDOUT;
>> +     else if (ret < 0)
>> +             return ret;
>>
>>       /* TODO cache maintenance */
>>
>> --
>> 2.9.3
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread
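
The return-value mapping in the msm hunk quoted above is the subtle part of this conversion, so here it is isolated as a hypothetical user-space helper. `wait_result` models what reservation_object_wait_timeout_rcu() returned (<0 error, 0 not signaled, >0 signaled) and `remain` models the jiffies timeout that was passed in; neither is the kernel API itself.

```c
#include <errno.h>

/* Hypothetical helper mirroring the msm_gem_cpu_prep() hunk above: a
 * timeout of 0 is the nonblocking (MSM_PREP_NOSYNC) case, so "not
 * signaled" means busy rather than timed out. */
static int prep_result(long wait_result, unsigned long remain)
{
	if (wait_result == 0)	/* not signaled within the timeout */
		return remain == 0 ? -EBUSY : -ETIMEDOUT;
	if (wait_result < 0)	/* the wait itself failed, e.g. interrupted */
		return (int)wait_result;
	return 0;		/* signaled */
}
```

This is why the caller still has to distinguish the two cases by the timeout it chose: the wait primitive itself returns 0 for "not signaled" regardless of whether the check was blocking.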

* Re: [PATCH 08/11] dma-buf: Restart reservation_object_wait_timeout_rcu() after writes
  2016-08-29  7:08   ` Chris Wilson
  (?)
@ 2016-09-23 13:18   ` Daniel Vetter
  -1 siblings, 0 replies; 63+ messages in thread
From: Daniel Vetter @ 2016-09-23 13:18 UTC (permalink / raw)
  To: Chris Wilson
  Cc: dri-devel, intel-gfx, Daniel Vetter, Maarten Lankhorst,
	Christian König, Alex Deucher, Sumit Semwal, linux-media,
	linaro-mm-sig

On Mon, Aug 29, 2016 at 08:08:31AM +0100, Chris Wilson wrote:
> In order to be completely generic, we have to double check the read
> seqlock after acquiring a reference to the fence. If the driver is
> allocating fences from a SLAB_DESTROY_BY_RCU, or similar freelist, then
> within an RCU grace period a fence may be freed and reallocated. The RCU
> read side critical section does not prevent this reallocation; instead
> we have to inspect the reservation's seqlock to double check if the
> fences have been reassigned as we were acquiring our reference.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: linux-media@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linaro-mm-sig@lists.linaro.org

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> ---
>  drivers/dma-buf/reservation.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index 10fd441dd4ed..3369e4668e96 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -388,9 +388,6 @@ retry:
>  		if (fobj)
>  			shared_count = fobj->shared_count;
>  
> -		if (read_seqcount_retry(&obj->seq, seq))
> -			goto unlock_retry;
> -
>  		for (i = 0; i < shared_count; ++i) {
>  			struct fence *lfence = rcu_dereference(fobj->shared[i]);
>  
> @@ -413,9 +410,6 @@ retry:
>  	if (!shared_count) {
>  		struct fence *fence_excl = rcu_dereference(obj->fence_excl);
>  
> -		if (read_seqcount_retry(&obj->seq, seq))
> -			goto unlock_retry;
> -
>  		if (fence_excl &&
>  		    !test_bit(FENCE_FLAG_SIGNALED_BIT, &fence_excl->flags)) {
>  			if (!fence_get_rcu(fence_excl))
> @@ -430,6 +424,11 @@ retry:
>  
>  	rcu_read_unlock();
>  	if (fence) {
> +		if (read_seqcount_retry(&obj->seq, seq)) {
> +			fence_put(fence);
> +			goto retry;
> +		}
> +
>  		ret = fence_wait_timeout(fence, intr, ret);
>  		fence_put(fence);
>  		if (ret > 0 && wait_all && (i + 1 < shared_count))
> -- 
> 2.9.3
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 06/11] dma-buf: Introduce fence_get_rcu_safe()
  2016-09-23 12:59     ` Daniel Vetter
@ 2016-09-23 13:34       ` Markus Heiser
  -1 siblings, 0 replies; 63+ messages in thread
From: Markus Heiser @ 2016-09-23 13:34 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Chris Wilson, dri-devel, intel-gfx, Daniel Vetter, Sumit Semwal,
	linux-media, linaro-mm-sig


Am 23.09.2016 um 14:59 schrieb Daniel Vetter <daniel@ffwll.ch>:

>> 
>> /**
>> - * fence_put - decreases refcount of the fence
>> - * @fence:	[in]	fence to reduce refcount of
>> + * fence_get_rcu_safe  - acquire a reference to an RCU tracked fence
>> + * @fence:	[in]	pointer to fence to increase refcount of
>> + *
>> + * Function returns NULL if no refcount could be obtained, or the fence.
>> + * This function handles acquiring a reference to a fence that may be
>> + * reallocated within the RCU grace period (such as with SLAB_DESTROY_BY_RCU),
>> + * so long as the caller is using RCU on the pointer to the fence.
>> + *
>> + * An alternative mechanism is to employ a seqlock to protect a bunch of
>> + * fences, such as used by struct reservation_object. When using a seqlock,
>> + * the seqlock must be taken before and checked after a reference to the
>> + * fence is acquired (as shown here).
>> + *
>> + * The caller is required to hold the RCU read lock.
> 
> Would be good to cross reference the various fence_get functions a bit
> better in the docs. But since the docs aren't yet pulled into the rst/html
> output, that doesn't matter that much

Hi Daniel ... I'm working on it ;-)

* http://return42.github.io/sphkerneldoc/linux_src_doc/include/linux/fence_h.html

-- Markus 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 09/11] dma-buf: Restart reservation_object_test_signaled_rcu() after writes
  2016-08-29  7:08   ` Chris Wilson
@ 2016-09-23 13:43     ` Daniel Vetter
  -1 siblings, 0 replies; 63+ messages in thread
From: Daniel Vetter @ 2016-09-23 13:43 UTC (permalink / raw)
  To: Chris Wilson
  Cc: dri-devel, intel-gfx, Daniel Vetter, Maarten Lankhorst,
	Christian König, Alex Deucher, Sumit Semwal, linux-media,
	linaro-mm-sig

On Mon, Aug 29, 2016 at 08:08:32AM +0100, Chris Wilson wrote:
> In order to be completely generic, we have to double check the read
> seqlock after acquiring a reference to the fence. If the driver is
> allocating fences from a SLAB_DESTROY_BY_RCU, or similar freelist, then
> within an RCU grace period a fence may be freed and reallocated. The RCU
> read side critical section does not prevent this reallocation; instead
> we have to inspect the reservation's seqlock to double check if the
> fences have been reassigned as we were acquiring our reference.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: linux-media@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linaro-mm-sig@lists.linaro.org

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> ---
>  drivers/dma-buf/reservation.c | 30 ++++++++++--------------------
>  1 file changed, 10 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index 3369e4668e96..e74493e7332b 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -474,12 +474,13 @@ bool reservation_object_test_signaled_rcu(struct reservation_object *obj,
>  					  bool test_all)
>  {
>  	unsigned seq, shared_count;
> -	int ret = true;
> +	int ret;
>  
> +	rcu_read_lock();
>  retry:
> +	ret = true;
>  	shared_count = 0;
>  	seq = read_seqcount_begin(&obj->seq);
> -	rcu_read_lock();
>  
>  	if (test_all) {
>  		unsigned i;
> @@ -490,46 +491,35 @@ retry:
>  		if (fobj)
>  			shared_count = fobj->shared_count;
>  
> -		if (read_seqcount_retry(&obj->seq, seq))
> -			goto unlock_retry;
> -
>  		for (i = 0; i < shared_count; ++i) {
>  			struct fence *fence = rcu_dereference(fobj->shared[i]);
>  
>  			ret = reservation_object_test_signaled_single(fence);
>  			if (ret < 0)
> -				goto unlock_retry;
> +				goto retry;
>  			else if (!ret)
>  				break;
>  		}
>  
> -		/*
> -		 * There could be a read_seqcount_retry here, but nothing cares
> -		 * about whether it's the old or newer fence pointers that are
> -		 * signaled. That race could still have happened after checking
> -		 * read_seqcount_retry. If you care, use ww_mutex_lock.
> -		 */
> +		if (read_seqcount_retry(&obj->seq, seq))
> +			goto retry;
>  	}
>  
>  	if (!shared_count) {
>  		struct fence *fence_excl = rcu_dereference(obj->fence_excl);
>  
> -		if (read_seqcount_retry(&obj->seq, seq))
> -			goto unlock_retry;
> -
>  		if (fence_excl) {
>  			ret = reservation_object_test_signaled_single(
>  								fence_excl);
>  			if (ret < 0)
> -				goto unlock_retry;
> +				goto retry;
> +
> +			if (read_seqcount_retry(&obj->seq, seq))
> +				goto retry;
>  		}
>  	}
>  
>  	rcu_read_unlock();
>  	return ret;
> -
> -unlock_retry:
> -	rcu_read_unlock();
> -	goto retry;
>  }
>  EXPORT_SYMBOL_GPL(reservation_object_test_signaled_rcu);
> -- 
> 2.9.3
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Linaro-mm-sig] [PATCH 10/11] dma-buf: Use seqlock to close RCU race in test_signaled_single
  2016-08-29  7:08   ` Chris Wilson
@ 2016-09-23 13:49     ` Daniel Vetter
  -1 siblings, 0 replies; 63+ messages in thread
From: Daniel Vetter @ 2016-09-23 13:49 UTC (permalink / raw)
  To: Chris Wilson; +Cc: dri-devel, linaro-mm-sig, intel-gfx, linux-media

On Mon, Aug 29, 2016 at 08:08:33AM +0100, Chris Wilson wrote:
> With the seqlock now extended to cover the lookup of the fence and its
> testing, we can perform that testing solely under the seqlock guard and
> avoid the effective locking and serialisation of acquiring a reference to
> the request.  As the fence is RCU protected we know it cannot disappear
> as we test it, the same guarantee that made it safe to acquire the
> reference previously.  The seqlock tests whether the fence was replaced
as we are testing it, telling us whether or not we can trust the result
> (if not, we just repeat the test until stable).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: linux-media@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linaro-mm-sig@lists.linaro.org

Not entirely sure this is safe for non-i915 drivers. We might now call
->signalled on a zombie fence (i.e. refcount == 0, but not yet kfreed).
i915 can do that, but other drivers might go boom.

I think in generic code we can't do these kind of tricks unfortunately.
-Daniel

> ---
>  drivers/dma-buf/reservation.c | 32 ++++----------------------------
>  1 file changed, 4 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index e74493e7332b..1ddffa5adb5a 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -442,24 +442,6 @@ unlock_retry:
>  }
>  EXPORT_SYMBOL_GPL(reservation_object_wait_timeout_rcu);
>  
> -
> -static inline int
> -reservation_object_test_signaled_single(struct fence *passed_fence)
> -{
> -	struct fence *fence, *lfence = passed_fence;
> -	int ret = 1;
> -
> -	if (!test_bit(FENCE_FLAG_SIGNALED_BIT, &lfence->flags)) {
> -		fence = fence_get_rcu(lfence);
> -		if (!fence)
> -			return -1;
> -
> -		ret = !!fence_is_signaled(fence);
> -		fence_put(fence);
> -	}
> -	return ret;
> -}
> -
>  /**
>   * reservation_object_test_signaled_rcu - Test if a reservation object's
>   * fences have been signaled.
> @@ -474,7 +456,7 @@ bool reservation_object_test_signaled_rcu(struct reservation_object *obj,
>  					  bool test_all)
>  {
>  	unsigned seq, shared_count;
> -	int ret;
> +	bool ret;
>  
>  	rcu_read_lock();
>  retry:
> @@ -494,10 +476,8 @@ retry:
>  		for (i = 0; i < shared_count; ++i) {
>  			struct fence *fence = rcu_dereference(fobj->shared[i]);
>  
> -			ret = reservation_object_test_signaled_single(fence);
> -			if (ret < 0)
> -				goto retry;
> -			else if (!ret)
> +			ret = fence_is_signaled(fence);
> +			if (!ret)
>  				break;
>  		}
>  
> @@ -509,11 +489,7 @@ retry:
>  		struct fence *fence_excl = rcu_dereference(obj->fence_excl);
>  
>  		if (fence_excl) {
> -			ret = reservation_object_test_signaled_single(
> -								fence_excl);
> -			if (ret < 0)
> -				goto retry;
> -
> +			ret = fence_is_signaled(fence_excl);
>  			if (read_seqcount_retry(&obj->seq, seq))
>  				goto retry;
>  		}
> -- 
> 2.9.3
> 
> _______________________________________________
> Linaro-mm-sig mailing list
> Linaro-mm-sig@lists.linaro.org
> https://lists.linaro.org/mailman/listinfo/linaro-mm-sig

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 63+ messages in thread
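[Editorial note: the guard this patch removes can be modelled in plain userspace C. Everything below is an illustrative sketch, not kernel code: the mock_* names and C11 atomics merely mimic the kref/fence_get_rcu() behaviour, to show why a failed reference acquisition (the -1 case) indicates a zombie fence that the caller must retry.]

```c
/*
 * Userspace model of the guarded test removed by this patch. All names
 * are illustrative, not kernel API: they mimic
 * reservation_object_test_signaled_single(), where a failed reference
 * acquisition (-1) means we raced with the final fence_put() and must
 * retry the lookup.
 */
#include <assert.h>
#include <stdatomic.h>

struct mock_fence {
	atomic_int refcount;	/* models the fence kref */
	atomic_int signaled;	/* models FENCE_FLAG_SIGNALED_BIT */
};

/* models fence_get_rcu(): only succeeds while refcount > 0 */
static int mock_fence_get(struct mock_fence *f)
{
	int old = atomic_load(&f->refcount);

	while (old > 0)
		if (atomic_compare_exchange_weak(&f->refcount, &old, old + 1))
			return 1;
	return 0;
}

static void mock_fence_put(struct mock_fence *f)
{
	atomic_fetch_sub(&f->refcount, 1);
}

/* 1 = signaled, 0 = busy, -1 = zombie fence, caller retries */
static int mock_test_signaled_single(struct mock_fence *f)
{
	int ret = 1;

	if (!atomic_load(&f->signaled)) {
		if (!mock_fence_get(f))
			return -1;
		ret = !!atomic_load(&f->signaled);
		mock_fence_put(f);
	}
	return ret;
}
```

The lockless variant in the patch drops the reference step entirely and relies on the seqlock retry instead, which is exactly the behaviour change questioned above.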

* Re: [Intel-gfx] [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0
  2016-08-29  7:08   ` Chris Wilson
@ 2016-09-23 13:50     ` Daniel Vetter
  -1 siblings, 0 replies; 63+ messages in thread
From: Daniel Vetter @ 2016-09-23 13:50 UTC (permalink / raw)
  To: Chris Wilson
  Cc: dri-devel, linaro-mm-sig, intel-gfx, linux-media, Sumit Semwal

On Mon, Aug 29, 2016 at 08:08:34AM +0100, Chris Wilson wrote:
> Currently we install a callback for performing poll on a dma-buf,
> irrespective of the timeout. This involves taking a spinlock, as well as
> unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
> multiple threads.
> 
> We can query whether the poll will block prior to installing the
> callback to make the busy-query fast.
> 
> Single thread: 60% faster
> 8 threads on 4 (+4 HT) cores: 600% faster
> 
> Still not quite the perfect scaling we get with a native busy ioctl, but
> poll(dmabuf) is faster due to the quicker lookup of the object and
> avoiding drm_ioctl().
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: linux-media@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linaro-mm-sig@lists.linaro.org
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Need to strike the r-b here, since Christian König pointed out that
objects won't magically switch signalling on.
-Daniel

> ---
>  drivers/dma-buf/dma-buf.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index cf04d249a6a4..c7a7bc579941 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -156,6 +156,18 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
>  	if (!events)
>  		return 0;
>  
> +	if (poll_does_not_wait(poll)) {
> +		if (events & POLLOUT &&
> +		    !reservation_object_test_signaled_rcu(resv, true))
> +			events &= ~(POLLOUT | POLLIN);
> +
> +		if (events & POLLIN &&
> +		    !reservation_object_test_signaled_rcu(resv, false))
> +			events &= ~POLLIN;
> +
> +		return events;
> +	}
> +
>  retry:
>  	seq = read_seqcount_begin(&resv->seq);
>  	rcu_read_lock();
> -- 
> 2.9.3
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 63+ messages in thread
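[Editorial note: the semantics being optimised here are just those of poll() with a zero timeout: report current readiness without ever sleeping. An illustrative userspace sketch follows, shown on a pipe since a dma-buf fd needs a driver behind it; the busy-query on a dma-buf file works the same way.]

```c
/*
 * Illustration of the poll(.timeout=0) busy-query the patch optimises:
 * with a zero timeout, poll() never blocks and simply reports the
 * current readiness of the fd. Demonstrated on a pipe, not a dma-buf.
 */
#include <poll.h>
#include <unistd.h>

static int fd_is_readable(int fd)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };

	/* timeout of 0: pure busy-query, never sleeps */
	return poll(&pfd, 1, 0) > 0 && (pfd.revents & POLLIN);
}
```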

* Re: [Linaro-mm-sig] [PATCH 10/11] dma-buf: Use seqlock to close RCU race in test_signaled_single
  2016-09-23 13:49     ` Daniel Vetter
  (?)
@ 2016-09-23 14:02     ` Chris Wilson
  2016-09-25 20:43         ` Daniel Vetter
  -1 siblings, 1 reply; 63+ messages in thread
From: Chris Wilson @ 2016-09-23 14:02 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: dri-devel, linaro-mm-sig, intel-gfx, linux-media

On Fri, Sep 23, 2016 at 03:49:26PM +0200, Daniel Vetter wrote:
> On Mon, Aug 29, 2016 at 08:08:33AM +0100, Chris Wilson wrote:
> > With the seqlock now extended to cover the lookup of the fence and its
> > testing, we can perform that testing solely under the seqlock guard and
> > avoid the effective locking and serialisation of acquiring a reference to
> > the request.  As the fence is RCU protected we know it cannot disappear
> > as we test it, the same guarantee that made it safe to acquire the
> > reference previously.  The seqlock tests whether the fence was replaced
> > as we are testing it, telling us whether or not we can trust the result
> > (if not, we just repeat the test until stable).
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > Cc: linux-media@vger.kernel.org
> > Cc: dri-devel@lists.freedesktop.org
> > Cc: linaro-mm-sig@lists.linaro.org
> 
> Not entirely sure this is safe for non-i915 drivers. We might now call
> ->signalled on a zombie fence (i.e. refcount == 0, but not yet kfreed).
> i915 can do that, but other drivers might go boom.

All fences must be under RCU guard, or is that the sticking point? Given
that, the problem is fence reallocation within the RCU grace period. If
we can mandate that fence reallocation must be safe for concurrent
fence->ops->*(), we can use this technique to avoid the serialisation
barrier otherwise required. In the simple stress test, the difference is
an order of magnitude, and test_signaled_rcu is often on a path where
every memory barrier quickly adds up (at least for us).

So is it just that you worry that others using SLAB_DESTROY_BY_RCU won't
ensure their fence is safe during the reallocation?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 63+ messages in thread
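[Editorial note: the re-check discipline that SLAB_DESTROY_BY_RCU imposes can be sketched in userspace. This is an illustrative model, not the kernel's fence_get_rcu_safe(): because freed memory may be handed out again as a new fence within the RCU grace period, taking a reference is not enough; the reader must re-read the slot afterwards and confirm it still points at the object it pinned.]

```c
/*
 * Userspace sketch of the SLAB_DESTROY_BY_RCU lookup discipline.
 * Illustrative names only: the slab may reuse the same memory for a
 * new fence within the grace period, so after acquiring a reference
 * the reader re-reads the slot and retries if it changed.
 */
#include <stdatomic.h>
#include <stddef.h>

struct slot_fence {
	atomic_int refcount;
};

static int slot_fence_get(struct slot_fence *f)
{
	int old = atomic_load(&f->refcount);

	while (old > 0)
		if (atomic_compare_exchange_weak(&f->refcount, &old, old + 1))
			return 1;
	return 0;
}

/* models fence_get_rcu_safe(): reference the fence currently in *slot */
static struct slot_fence *get_fence_safe(struct slot_fence *_Atomic *slot)
{
	for (;;) {
		struct slot_fence *f = atomic_load(slot);

		if (!f)
			return NULL;
		if (!slot_fence_get(f))
			continue;	/* zombie: retry the lookup */
		if (atomic_load(slot) == f)
			return f;	/* still current: reference valid */
		/* slot was replaced (possibly same memory reused): retry */
		atomic_fetch_sub(&f->refcount, 1);
	}
}
```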

* Re: [Intel-gfx] [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0
  2016-09-23 13:50     ` Daniel Vetter
  (?)
@ 2016-09-23 14:15     ` Chris Wilson
  -1 siblings, 0 replies; 63+ messages in thread
From: Chris Wilson @ 2016-09-23 14:15 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: dri-devel, linaro-mm-sig, intel-gfx, linux-media, Sumit Semwal

On Fri, Sep 23, 2016 at 03:50:44PM +0200, Daniel Vetter wrote:
> On Mon, Aug 29, 2016 at 08:08:34AM +0100, Chris Wilson wrote:
> > Currently we install a callback for performing poll on a dma-buf,
> > irrespective of the timeout. This involves taking a spinlock, as well as
> > unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
> > multiple threads.
> > 
> > We can query whether the poll will block prior to installing the
> > callback to make the busy-query fast.
> > 
> > Single thread: 60% faster
> > 8 threads on 4 (+4 HT) cores: 600% faster
> > 
> > Still not quite the perfect scaling we get with a native busy ioctl, but
> > poll(dmabuf) is faster due to the quicker lookup of the object and
> > avoiding drm_ioctl().
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > Cc: linux-media@vger.kernel.org
> > Cc: dri-devel@lists.freedesktop.org
> > Cc: linaro-mm-sig@lists.linaro.org
> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> Need to strike the r-b here, since Christian König pointed out that
> objects won't magically switch signalling on.

The point here being that we don't even want to switch signaling on! :)

Christian's point was that not all fences guarantee forward progress
irrespective of whether signaling is enabled or not, and fences are not
required to guarantee forward progress without signaling even if they
provide an ops->signaled().
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Intel-gfx] [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0
  2016-09-23 13:50     ` Daniel Vetter
@ 2016-09-23 15:06       ` Chris Wilson
  -1 siblings, 0 replies; 63+ messages in thread
From: Chris Wilson @ 2016-09-23 15:06 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: dri-devel, linaro-mm-sig, intel-gfx, linux-media, Sumit Semwal

On Fri, Sep 23, 2016 at 03:50:44PM +0200, Daniel Vetter wrote:
> On Mon, Aug 29, 2016 at 08:08:34AM +0100, Chris Wilson wrote:
> > Currently we install a callback for performing poll on a dma-buf,
> > irrespective of the timeout. This involves taking a spinlock, as well as
> > unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
> > multiple threads.
> > 
> > We can query whether the poll will block prior to installing the
> > callback to make the busy-query fast.
> > 
> > Single thread: 60% faster
> > 8 threads on 4 (+4 HT) cores: 600% faster
> > 
> > Still not quite the perfect scaling we get with a native busy ioctl, but
> > poll(dmabuf) is faster due to the quicker lookup of the object and
> > avoiding drm_ioctl().
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > Cc: linux-media@vger.kernel.org
> > Cc: dri-devel@lists.freedesktop.org
> > Cc: linaro-mm-sig@lists.linaro.org
> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> Need to strike the r-b here, since Christian König pointed out that
> objects won't magically switch signalling on.

Propagating a flag through to sync_file is trivial, but not through to
the dma_buf->resv. Looks like dma-buf will be without a fast busy query,
which I guess in the grand scheme of things (i.e. dma-buf itself is not
intended to be used as a fence) is not that important.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Intel-gfx] [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0
  2016-09-23 13:50     ` Daniel Vetter
                       ` (2 preceding siblings ...)
  (?)
@ 2016-09-23 15:20     ` Chris Wilson
  2016-09-23 17:59         ` Christian König
  -1 siblings, 1 reply; 63+ messages in thread
From: Chris Wilson @ 2016-09-23 15:20 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: dri-devel, linaro-mm-sig, intel-gfx, linux-media, Sumit Semwal

On Fri, Sep 23, 2016 at 03:50:44PM +0200, Daniel Vetter wrote:
> On Mon, Aug 29, 2016 at 08:08:34AM +0100, Chris Wilson wrote:
> > Currently we install a callback for performing poll on a dma-buf,
> > irrespective of the timeout. This involves taking a spinlock, as well as
> > unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
> > multiple threads.
> > 
> > We can query whether the poll will block prior to installing the
> > callback to make the busy-query fast.
> > 
> > Single thread: 60% faster
> > 8 threads on 4 (+4 HT) cores: 600% faster
> > 
> > Still not quite the perfect scaling we get with a native busy ioctl, but
> > poll(dmabuf) is faster due to the quicker lookup of the object and
> > avoiding drm_ioctl().
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > Cc: linux-media@vger.kernel.org
> > Cc: dri-devel@lists.freedesktop.org
> > Cc: linaro-mm-sig@lists.linaro.org
> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> Need to strike the r-b here, since Christian König pointed out that
> objects won't magically switch signalling on.

Oh, it also means that

commit fb8b7d2b9d80e1e71f379e57355936bd2b024be9
Author: Jammy Zhou <Jammy.Zhou@amd.com>
Date:   Wed Jan 21 18:35:47 2015 +0800

    reservation: wait only with non-zero timeout specified (v3)
    
    When the timeout value passed to reservation_object_wait_timeout_rcu
    is zero, no wait should be done if the fences are not signaled.
    
    Return '1' for idle and '0' for busy if the specified timeout is '0'
    to keep consistent with the case of non-zero timeout.
    
    v2: call fence_put if not signaled in the case of timeout==0
    
    v3: switch to reservation_object_test_signaled_rcu
    
    Signed-off-by: Jammy Zhou <Jammy.Zhou@amd.com>
    Reviewed-by: Christian König <christian.koenig@amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Reviewed-By: Maarten Lankhorst <maarten.lankhorst@canonical.com>
    Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>

is wrong. And reservation_object_test_signaled_rcu() is unreliable.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Intel-gfx] [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0
  2016-09-23 15:20     ` [Intel-gfx] " Chris Wilson
@ 2016-09-23 17:59         ` Christian König
  0 siblings, 0 replies; 63+ messages in thread
From: Christian König @ 2016-09-23 17:59 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, dri-devel, linaro-mm-sig, intel-gfx,
	linux-media, Sumit Semwal

Am 23.09.2016 um 17:20 schrieb Chris Wilson:
> On Fri, Sep 23, 2016 at 03:50:44PM +0200, Daniel Vetter wrote:
>> On Mon, Aug 29, 2016 at 08:08:34AM +0100, Chris Wilson wrote:
>>> Currently we install a callback for performing poll on a dma-buf,
>>> irrespective of the timeout. This involves taking a spinlock, as well as
>>> unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
>>> multiple threads.
>>>
>>> We can query whether the poll will block prior to installing the
>>> callback to make the busy-query fast.
>>>
>>> Single thread: 60% faster
>>> 8 threads on 4 (+4 HT) cores: 600% faster
>>>
>>> Still not quite the perfect scaling we get with a native busy ioctl, but
>>> poll(dmabuf) is faster due to the quicker lookup of the object and
>>> avoiding drm_ioctl().
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Sumit Semwal <sumit.semwal@linaro.org>
>>> Cc: linux-media@vger.kernel.org
>>> Cc: dri-devel@lists.freedesktop.org
>>> Cc: linaro-mm-sig@lists.linaro.org
>>> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Need to strike the r-b here, since Christian König pointed out that
>> objects won't magically switch signalling on.
> Oh, it also means that
>
> commit fb8b7d2b9d80e1e71f379e57355936bd2b024be9
> Author: Jammy Zhou <Jammy.Zhou@amd.com>
> Date:   Wed Jan 21 18:35:47 2015 +0800
>
>      reservation: wait only with non-zero timeout specified (v3)
>      
>      When the timeout value passed to reservation_object_wait_timeout_rcu
>      is zero, no wait should be done if the fences are not signaled.
>      
>      Return '1' for idle and '0' for busy if the specified timeout is '0'
>      to keep consistent with the case of non-zero timeout.
>      
>      v2: call fence_put if not signaled in the case of timeout==0
>      
>      v3: switch to reservation_object_test_signaled_rcu
>      
>      Signed-off-by: Jammy Zhou <Jammy.Zhou@amd.com>
>      Reviewed-by: Christian König <christian.koenig@amd.com>
>      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
>      Reviewed-By: Maarten Lankhorst <maarten.lankhorst@canonical.com>
>      Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
>
> is wrong. And reservation_object_test_signaled_rcu() is unreliable.

Oops, indeed, that patch is wrong as well.

I suggest that we just enable the signaling in this case as well.

Regards,
Christian.

> -Chris
>


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Linaro-mm-sig] [PATCH 10/11] dma-buf: Use seqlock to close RCU race in test_signaled_single
  2016-09-23 14:02     ` Chris Wilson
@ 2016-09-25 20:43         ` Daniel Vetter
  0 siblings, 0 replies; 63+ messages in thread
From: Daniel Vetter @ 2016-09-25 20:43 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, dri-devel, linaro-mm-sig, intel-gfx,
	linux-media

On Fri, Sep 23, 2016 at 03:02:32PM +0100, Chris Wilson wrote:
> On Fri, Sep 23, 2016 at 03:49:26PM +0200, Daniel Vetter wrote:
> > On Mon, Aug 29, 2016 at 08:08:33AM +0100, Chris Wilson wrote:
> > > With the seqlock now extended to cover the lookup of the fence and its
> > > testing, we can perform that testing solely under the seqlock guard and
> > > avoid the effective locking and serialisation of acquiring a reference to
> > > the request.  As the fence is RCU protected we know it cannot disappear
> > > as we test it, the same guarantee that made it safe to acquire the
> > > reference previously.  The seqlock tests whether the fence was replaced
> > > as we are testing it, telling us whether or not we can trust the result
> > > (if not, we just repeat the test until stable).
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > > Cc: linux-media@vger.kernel.org
> > > Cc: dri-devel@lists.freedesktop.org
> > > Cc: linaro-mm-sig@lists.linaro.org
> > 
> > Not entirely sure this is safe for non-i915 drivers. We might now call
> > ->signalled on a zombie fence (i.e. refcount == 0, but not yet kfreed).
> > i915 can do that, but other drivers might go boom.
> 
> All fences must be under RCU guard, or is that the sticking point? Given
> that, the problem is fence reallocation within the RCU grace period. If
> we can mandate that fence reallocation must be safe for concurrent
> fence->ops->*(), we can use this technique to avoid the serialisation
> barrier otherwise required. In the simple stress test, the difference is
> an order of magnitude, and test_signaled_rcu is often on a path where
> every memory barrier quickly adds up (at least for us).
> 
> So is it just that you worry that others using SLAB_DESTROY_BY_RCU won't
> ensure their fence is safe during the reallocation?

Before your patch the RCU-protected part was just the
kref_get_unless_zero. This was done before calling down into any
fence->ops->* functions, which means the code of these functions was
guaranteed to run on a real fence object, and not on a zombie fence in
the process of being destructed.

Afaiui with your patch we might call into fence->ops->* on these zombie
fences. That would be a behaviour change with rather big implications
(since we'd need to audit all existing implementations, and also make sure
all future ones will be ok too). Or I missed something again.

I think we could still to this trick, at least partially, by making sure
we only inspect generic fence state and never call into fence->ops before
we're guaranteed to have a real fence.

But atm (at least per Christian König) a fence won't eventually get
signalled without calling into ->ops functions, so there's a catch 22.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Linaro-mm-sig] [PATCH 10/11] dma-buf: Use seqlock to close RCU race in test_signaled_single
@ 2016-09-25 20:43         ` Daniel Vetter
  0 siblings, 0 replies; 63+ messages in thread
From: Daniel Vetter @ 2016-09-25 20:43 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, dri-devel, linaro-mm-sig, intel-gfx,
	linux-media

On Fri, Sep 23, 2016 at 03:02:32PM +0100, Chris Wilson wrote:
> On Fri, Sep 23, 2016 at 03:49:26PM +0200, Daniel Vetter wrote:
> > On Mon, Aug 29, 2016 at 08:08:33AM +0100, Chris Wilson wrote:
> > > With the seqlock now extended to cover the lookup of the fence and its
> > > testing, we can perform that testing solely under the seqlock guard and
> > > avoid the effective locking and serialisation of acquiring a reference to
> > > the request.  As the fence is RCU protected we know it cannot disappear
> > > as we test it, the same guarantee that made it safe to acquire the
> > > reference previously.  The seqlock tests whether the fence was replaced
> > > as we are testing it telling us whether or not we can trust the result
> > > (if not, we just repeat the test until stable).
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > > Cc: linux-media@vger.kernel.org
> > > Cc: dri-devel@lists.freedesktop.org
> > > Cc: linaro-mm-sig@lists.linaro.org
> > 
> > Not entirely sure this is safe for non-i915 drivers. We might now call
> > ->signalled on a zombie fence (i.e. refcount == 0, but not yet kfreed).
> > i915 can do that, but other drivers might go boom.
> 
> All fences must be under RCU guard, or is that the sticking point? Given
> that, the problem is fence reallocation within the RCU grace period. If
> we can mandate that fence reallocation must be safe for concurrent
> fence->ops->*(), we can use this technique to avoid the serialisation
> barrier otherwise required. In the simple stress test, the difference is
> an order of magnitude, and test_signaled_rcu is often on a path where
> every memory barrier quickly adds up (at least for us).
> 
> So is it just that you worry that others using SLAB_DESTROY_BY_RCU won't
> ensure their fence is safe during the reallocation?

Before your patch the RCU-protected part was just the
kref_get_unless_zero. This was done before calling down into any
fence->ops->* functions, which means the code of these functions was
guaranteed to run on a real fence object, and not on a zombie fence in
the process of being destructed.

Afaiui with your patch we might call into fence->ops->* on these zombie
fences. That would be a behaviour change with rather big implications
(since we'd need to audit all existing implementations, and also make sure
all future ones will be ok too). Or I missed something again.

I think we could still do this trick, at least partially, by making sure
we only inspect generic fence state and never call into fence->ops before
we're guaranteed to have a real fence.

But atm (at least per Christian König) a fence won't eventually get
signalled without calling into ->ops functions, so there's a catch-22.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Intel-gfx] [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0
  2016-09-23 17:59         ` Christian König
@ 2016-09-25 20:44           ` Daniel Vetter
  -1 siblings, 0 replies; 63+ messages in thread
From: Daniel Vetter @ 2016-09-25 20:44 UTC (permalink / raw)
  To: Christian König
  Cc: Chris Wilson, Daniel Vetter, dri-devel, linaro-mm-sig, intel-gfx,
	linux-media, Sumit Semwal

On Fri, Sep 23, 2016 at 07:59:44PM +0200, Christian König wrote:
> On 23.09.2016 at 17:20, Chris Wilson wrote:
> > On Fri, Sep 23, 2016 at 03:50:44PM +0200, Daniel Vetter wrote:
> > > On Mon, Aug 29, 2016 at 08:08:34AM +0100, Chris Wilson wrote:
> > > > Currently we install a callback for performing poll on a dma-buf,
> > > > irrespective of the timeout. This involves taking a spinlock, as well as
> > > > unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
> > > > multiple threads.
> > > > 
> > > > We can query whether the poll will block prior to installing the
> > > > callback to make the busy-query fast.
> > > > 
> > > > Single thread: 60% faster
> > > > 8 threads on 4 (+4 HT) cores: 600% faster
> > > > 
> > > > Still not quite the perfect scaling we get with a native busy ioctl, but
> > > > poll(dmabuf) is faster due to the quicker lookup of the object and
> > > > avoiding drm_ioctl().
> > > > 
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > > > Cc: linux-media@vger.kernel.org
> > > > Cc: dri-devel@lists.freedesktop.org
> > > > Cc: linaro-mm-sig@lists.linaro.org
> > > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Need to strike the r-b here, since Christian König pointed out that
> > > objects won't magically switch signalling on.
> > Oh, it also means that
> > 
> > commit fb8b7d2b9d80e1e71f379e57355936bd2b024be9
> > Author: Jammy Zhou <Jammy.Zhou@amd.com>
> > Date:   Wed Jan 21 18:35:47 2015 +0800
> > 
> >      reservation: wait only with non-zero timeout specified (v3)
> >      When the timeout value passed to reservation_object_wait_timeout_rcu
> >      is zero, no wait should be done if the fences are not signaled.
> >      Return '1' for idle and '0' for busy if the specified timeout is '0'
> >      to keep consistent with the case of non-zero timeout.
> >      v2: call fence_put if not signaled in the case of timeout==0
> >      v3: switch to reservation_object_test_signaled_rcu
> >      Signed-off-by: Jammy Zhou <Jammy.Zhou@amd.com>
> >      Reviewed-by: Christian König <christian.koenig@amd.com>
> >      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
> >      Reviewed-By: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> >      Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
> > 
> > is wrong. And reservation_object_test_signaled_rcu() is unreliable.
> 
> Oops indeed, that patch is wrong as well.
> 
> I suggest that we just enable the signaling in this case as well.

Will you/Zhou take care of this corner case? Just so I can't forget about
it ;-)

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 01/11] drm/amdgpu: Remove call to reservation_object_test_signaled_rcu before wait
  2016-09-23 12:54 ` Daniel Vetter
@ 2016-10-05 16:03   ` Sumit Semwal
  0 siblings, 0 replies; 63+ messages in thread
From: Sumit Semwal @ 2016-10-05 16:03 UTC (permalink / raw)
  To: Alex Deucher; +Cc: intel-gfx, Christian König, DRI mailing list

Hi Alex,

On 23 September 2016 at 18:24, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Mon, Aug 29, 2016 at 08:08:24AM +0100, Chris Wilson wrote:
>> Since fence_wait_timeout_reservation_object_wait_timeout_rcu() with a
>> timeout of 0 becomes reservation_object_test_signaled_rcu(), we do not
>> need to handle such conversion in the caller. The only challenge are
>> those callers that wish to differentiate the error code between the
>> nonblocking busy check and potentially blocking wait.
>>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian König <christian.koenig@amd.com>
>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>> ---
I couldn't find whether it's already applied to your tree, or your
Acked-by; could you please let me know if it's there, or if you'd like
me to pick it up via drm-misc (an Acked-by would be appreciated in the
latter case :) )

Best,
Sumit.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 04/11] drm/nouveau: Remove call to reservation_object_test_signaled_rcu before wait
  2016-09-23 12:55   ` Daniel Vetter
@ 2016-10-05 16:05     ` Sumit Semwal
  0 siblings, 0 replies; 63+ messages in thread
From: Sumit Semwal @ 2016-10-05 16:05 UTC (permalink / raw)
  To: Ben Skeggs; +Cc: intel-gfx, DRI mailing list

Hi Ben,

On 23 September 2016 at 18:25, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Mon, Aug 29, 2016 at 08:08:27AM +0100, Chris Wilson wrote:
>> Since fence_wait_timeout_reservation_object_wait_timeout_rcu() with a
>> timeout of 0 becomes reservation_object_test_signaled_rcu(), we do not
>> need to handle such conversion in the caller. The only challenge are
>> those callers that wish to differentiate the error code between the
>> nonblocking busy check and potentially blocking wait.
>>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Ben Skeggs <bskeggs@redhat.com>
>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
May I please know if this patch is already in your queue, or should I
take it through drm-misc (in which case an Acked-by would be highly
appreciated :) )


Best,
Sumit.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Intel-gfx] [PATCH 05/11] drm/vmwgfx: Remove call to reservation_object_test_signaled_rcu before wait
  2016-09-23 12:56   ` Daniel Vetter
@ 2016-10-05 16:11     ` Sumit Semwal
  2016-10-05 17:03       ` Sinclair Yeh
  0 siblings, 1 reply; 63+ messages in thread
From: Sumit Semwal @ 2016-10-05 16:11 UTC (permalink / raw)
  To: Thomas Hellstrom, syeh; +Cc: intel-gfx, DRI mailing list

Hi Thomas, Sinclair,

On 23 September 2016 at 18:26, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Mon, Aug 29, 2016 at 08:08:28AM +0100, Chris Wilson wrote:
>> Since fence_wait_timeout_reservation_object_wait_timeout_rcu() with a
>> timeout of 0 becomes reservation_object_test_signaled_rcu(), we do not
>> need to handle such conversion in the caller. The only challenge are
>> those callers that wish to differentiate the error code between the
>> nonblocking busy check and potentially blocking wait.
>>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Sinclair Yeh <syeh@vmware.com>
>> Cc: Thomas Hellstrom <thellstrom@vmware.com>
>> Reviewed-by: Sinclair Yeh <syeh@vmware.com>
>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
Could you please let me know if this patch is already queued up at
your end, or should I just take it via drm-misc with Sinclair's r-b?

Thanks and Best,
Sumit.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 02/11] drm/etnaviv: Remove manual call to reservation_object_test_signaled_rcu before wait
  2016-09-23 12:55   ` Daniel Vetter
@ 2016-10-05 16:15     ` Sumit Semwal
  2016-10-10 13:17       ` Lucas Stach
  0 siblings, 1 reply; 63+ messages in thread
From: Sumit Semwal @ 2016-10-05 16:15 UTC (permalink / raw)
  To: l.stach; +Cc: intel-gfx, Russell King, DRI mailing list

Hi Lucas,

On 23 September 2016 at 18:25, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Mon, Aug 29, 2016 at 08:08:25AM +0100, Chris Wilson wrote:
>> Since fence_wait_timeout_reservation_object_wait_timeout_rcu() with a
>> timeout of 0 becomes reservation_object_test_signaled_rcu(), we do not
>> need to handle such conversion in the caller. The only challenge are
>> those callers that wish to differentiate the error code between the
>> nonblocking busy check and potentially blocking wait.
>>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Lucas Stach <l.stach@pengutronix.de>
>> Cc: Russell King <linux+etnaviv@armlinux.org.uk>
>> Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
Could you please let me know if this is in your tree already, or would
you like me to take it via drm-misc (in which case, an Acked-by would
be fabulous!)

Thanks and best,
Sumit.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Intel-gfx] [PATCH 05/11] drm/vmwgfx: Remove call to reservation_object_test_signaled_rcu before wait
  2016-10-05 16:11     ` [Intel-gfx] " Sumit Semwal
@ 2016-10-05 17:03       ` Sinclair Yeh
  0 siblings, 0 replies; 63+ messages in thread
From: Sinclair Yeh @ 2016-10-05 17:03 UTC (permalink / raw)
  To: Sumit Semwal; +Cc: DRI mailing list, Thomas Hellstrom, intel-gfx

Hi,

I'm preparing a fixes pull request, and I'll include this one if it
hasn't been applied by others already.

Sinclair


On Wed, Oct 05, 2016 at 09:41:22PM +0530, Sumit Semwal wrote:
> Hi Thomas, Sinclair,
> 
> On 23 September 2016 at 18:26, Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Mon, Aug 29, 2016 at 08:08:28AM +0100, Chris Wilson wrote:
> >> Since fence_wait_timeout_reservation_object_wait_timeout_rcu() with a
> >> timeout of 0 becomes reservation_object_test_signaled_rcu(), we do not
> >> need to handle such conversion in the caller. The only challenge are
> >> those callers that wish to differentiate the error code between the
> >> nonblocking busy check and potentially blocking wait.
> >>
> >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> Cc: Sinclair Yeh <syeh@vmware.com>
> >> Cc: Thomas Hellstrom <thellstrom@vmware.com>
> >> Reviewed-by: Sinclair Yeh <syeh@vmware.com>
> >
> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> >
> Could you please let me know if this patch is already queued up at
> your end, or should I just take it via drm-misc with Sinclair's r-b?
> 
> Thanks and Best,
> Sumit.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 02/11] drm/etnaviv: Remove manual call to reservation_object_test_signaled_rcu before wait
  2016-10-05 16:15     ` Sumit Semwal
@ 2016-10-10 13:17       ` Lucas Stach
  0 siblings, 0 replies; 63+ messages in thread
From: Lucas Stach @ 2016-10-10 13:17 UTC (permalink / raw)
  To: Sumit Semwal; +Cc: intel-gfx, Russell King, DRI mailing list

On Wednesday, 05.10.2016 at 21:45 +0530, Sumit Semwal wrote:
> Hi Lucas,
> 
> On 23 September 2016 at 18:25, Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Mon, Aug 29, 2016 at 08:08:25AM +0100, Chris Wilson wrote:
> >> Since fence_wait_timeout_reservation_object_wait_timeout_rcu() with a
> >> timeout of 0 becomes reservation_object_test_signaled_rcu(), we do not
> >> need to handle such conversion in the caller. The only challenge are
> >> those callers that wish to differentiate the error code between the
> >> nonblocking busy check and potentially blocking wait.
> >>
> >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> Cc: Lucas Stach <l.stach@pengutronix.de>
> >> Cc: Russell King <linux+etnaviv@armlinux.org.uk>
> >> Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
> >
> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> >
> Could you please let me know if this is in your tree already, or would
> you like me to take it via drm-misc (in which case, an Acked-by would
> be fabulous!)
> 
I haven't picked it up yet. If you are going to take the series through
drm-misc feel free to add:

Acked-by: Lucas Stach <l.stach@pengutronix.de>

Regards,
Lucas


^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2016-10-10 13:17 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-29  7:08 [PATCH 01/11] drm/amdgpu: Remove call to reservation_object_test_signaled_rcu before wait Chris Wilson
2016-08-29  7:08 ` [PATCH 02/11] drm/etnaviv: Remove manual " Chris Wilson
2016-09-23 12:55   ` Daniel Vetter
2016-10-05 16:15     ` Sumit Semwal
2016-10-10 13:17       ` Lucas Stach
2016-08-29  7:08 ` [PATCH 03/11] drm/msm: Remove " Chris Wilson
2016-09-23 12:55   ` Daniel Vetter
2016-09-23 13:07     ` [Intel-gfx] " Rob Clark
2016-08-29  7:08 ` [PATCH 04/11] drm/nouveau: " Chris Wilson
2016-09-23 12:55   ` Daniel Vetter
2016-10-05 16:05     ` Sumit Semwal
2016-08-29  7:08 ` [PATCH 05/11] drm/vmwgfx: " Chris Wilson
2016-09-23 12:56   ` Daniel Vetter
2016-10-05 16:11     ` [Intel-gfx] " Sumit Semwal
2016-10-05 17:03       ` Sinclair Yeh
2016-08-29  7:08 ` [PATCH 06/11] dma-buf: Introduce fence_get_rcu_safe() Chris Wilson
2016-08-29  7:08   ` Chris Wilson
2016-09-23 12:59   ` Daniel Vetter
2016-09-23 12:59     ` Daniel Vetter
2016-09-23 13:34     ` Markus Heiser
2016-09-23 13:34       ` Markus Heiser
2016-08-29  7:08 ` [PATCH 07/11] dma-buf: Restart reservation_object_get_fences_rcu() after writes Chris Wilson
2016-08-29  7:08   ` Chris Wilson
2016-09-23 13:03   ` Daniel Vetter
2016-09-23 13:03     ` Daniel Vetter
2016-08-29  7:08 ` [PATCH 08/11] dma-buf: Restart reservation_object_wait_timeout_rcu() " Chris Wilson
2016-08-29  7:08   ` Chris Wilson
2016-09-23 13:18   ` Daniel Vetter
2016-08-29  7:08 ` [PATCH 09/11] dma-buf: Restart reservation_object_test_signaled_rcu() " Chris Wilson
2016-08-29  7:08   ` Chris Wilson
2016-09-23 13:43   ` Daniel Vetter
2016-09-23 13:43     ` Daniel Vetter
2016-08-29  7:08 ` [PATCH 10/11] dma-buf: Use seqlock to close RCU race in test_signaled_single Chris Wilson
2016-08-29  7:08   ` Chris Wilson
2016-09-23 13:49   ` [Linaro-mm-sig] " Daniel Vetter
2016-09-23 13:49     ` Daniel Vetter
2016-09-23 14:02     ` Chris Wilson
2016-09-25 20:43       ` Daniel Vetter
2016-09-25 20:43         ` Daniel Vetter
2016-08-29  7:08 ` [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0 Chris Wilson
2016-08-29  7:08   ` Chris Wilson
2016-08-29 18:16   ` [PATCH] dma-buf/sync-file: Avoid enable fence signaling if poll(.timeout=0) Chris Wilson
2016-08-29 18:16     ` Chris Wilson
2016-08-29 18:26     ` Gustavo Padovan
2016-09-13 14:46       ` Sumit Semwal
2016-09-13 14:46         ` Sumit Semwal
2016-09-15  0:00     ` Rafael Antognolli
2016-09-21  7:26       ` Gustavo Padovan
2016-09-21 11:08         ` Chris Wilson
2016-09-23 13:50   ` [Intel-gfx] [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0 Daniel Vetter
2016-09-23 13:50     ` Daniel Vetter
2016-09-23 14:15     ` Chris Wilson
2016-09-23 15:06     ` Chris Wilson
2016-09-23 15:06       ` Chris Wilson
2016-09-23 15:20     ` [Intel-gfx] " Chris Wilson
2016-09-23 17:59       ` Christian König
2016-09-23 17:59         ` Christian König
2016-09-25 20:44         ` Daniel Vetter
2016-09-25 20:44           ` Daniel Vetter
2016-08-29  7:50 ` ✗ Fi.CI.BAT: warning for series starting with [01/11] drm/amdgpu: Remove call to reservation_object_test_signaled_rcu before wait Patchwork
2016-08-29  8:20 ` [PATCH 01/11] " Christian König
2016-09-23 12:54 ` Daniel Vetter
2016-10-05 16:03   ` Sumit Semwal
