* [RFC PATCH v1 00/16] Convert all ttm drivers to use the new reservation interface
From: Maarten Lankhorst @ 2014-05-14 14:57 UTC
  To: airlied; +Cc: nouveau, linux-kernel, dri-devel

This series depends on the previously posted reservation api patches.
Two of them are not yet in the for-next-fences branch of
git://git.linaro.org/people/sumit.semwal/linux-3.x.git

The missing patches are still in my vmwgfx_wip branch at
git://people.freedesktop.org/~mlankhorst/linux

All ttm drivers are converted to the fence api; fence_lock is removed
and rcu is used in its place.
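
As a rough illustration of what replacing fence_lock with rcu means for
readers, here is a userspace sketch of the usual lockless lookup: load
the published fence pointer, take a reference only if the refcount is
still non-zero, then re-check that the pointer was not replaced in the
meantime. It uses C11 atomics; struct fence, fence_get_unless_zero(),
fence_put(), fence_publish() and fence_get_published() are hypothetical
names, not the kernel fence API. The sketch never frees memory, which
stands in for the grace period that rcu_read_lock() provides in the
kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted fence; not the kernel's struct fence. */
struct fence {
	atomic_int refcount;
};

/* Take a reference only while the object is still alive (refcount > 0),
 * in the spirit of kref_get_unless_zero(). */
static bool fence_get_unless_zero(struct fence *f)
{
	int old = atomic_load(&f->refcount);

	while (old > 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&f->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference. Freeing is omitted: in the kernel the memory would
 * only be released after an rcu grace period. */
static void fence_put(struct fence *f)
{
	int old = atomic_fetch_sub(&f->refcount, 1);

	assert(old > 0);
}

/* Writer side: the caller holds the buffer's reservation, so writers are
 * serialized. The slot takes over the caller's reference to new_fence. */
static void fence_publish(struct fence *_Atomic *slot, struct fence *new_fence)
{
	struct fence *old = atomic_exchange(slot, new_fence);

	if (old)
		fence_put(old);
}

/* Reader side: takes no lock. Returns a referenced fence or NULL. */
static struct fence *fence_get_published(struct fence *_Atomic *slot)
{
	for (;;) {
		struct fence *f = atomic_load(slot);

		if (!f)
			return NULL;
		if (!fence_get_unless_zero(f))
			continue;	/* being torn down; reload the slot */
		if (f == atomic_load(slot))
			return f;	/* still published, reference is good */
		fence_put(f);		/* replaced under us; try again */
	}
}
```

The re-check after taking the reference is what lets the reader skip
the lock: a fence that was swapped out between the load and the
increment is simply released and the lookup retried.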

qxl is the first driver to use shared fence slots, but once these patches
are applied it's easy to convert nouveau too. I've done that as part of the
cross-device gpu synchronization patch series.

---

Maarten Lankhorst (16):
      drm/ttm: add interruptible parameter to ttm_eu_reserve_buffers
      drm/ttm: kill off some members to ttm_validate_buffer
      drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep
      drm/nouveau: require reservations for nouveau_fence_sync and nouveau_bo_fence
      drm/ttm: call ttm_bo_wait while inside a reservation
      drm/ttm: kill fence_lock
      drm/nouveau: rework to new fence interface
      drm/radeon: use common fence implementation for fences
      drm/qxl: rework to new fence interface
      drm/vmwgfx: get rid of different types of fence_flags entirely
      drm/vmwgfx: rework to new fence interface
      drm/ttm: flip the switch, and convert to dma_fence
      drm/nouveau: use rcu in nouveau_gem_ioctl_cpu_prep
      drm/radeon: use rcu waits in some ioctls
      drm/vmwgfx: use rcu in vmw_user_dmabuf_synccpu_grab
      drm/ttm: use rcu in core ttm

 drivers/gpu/drm/nouveau/core/core/event.c |    4 
 drivers/gpu/drm/nouveau/nouveau_bo.c      |   59 +---
 drivers/gpu/drm/nouveau/nouveau_display.c |   25 +-
 drivers/gpu/drm/nouveau/nouveau_fence.c   |  430 +++++++++++++++++++----------
 drivers/gpu/drm/nouveau/nouveau_fence.h   |   22 +
 drivers/gpu/drm/nouveau/nouveau_gem.c     |   55 +---
 drivers/gpu/drm/nouveau/nv04_fence.c      |    4 
 drivers/gpu/drm/nouveau/nv10_fence.c      |    4 
 drivers/gpu/drm/nouveau/nv17_fence.c      |    2 
 drivers/gpu/drm/nouveau/nv50_fence.c      |    2 
 drivers/gpu/drm/nouveau/nv84_fence.c      |   11 -
 drivers/gpu/drm/qxl/Makefile              |    2 
 drivers/gpu/drm/qxl/qxl_cmd.c             |    7 
 drivers/gpu/drm/qxl/qxl_debugfs.c         |   16 +
 drivers/gpu/drm/qxl/qxl_drv.h             |   20 -
 drivers/gpu/drm/qxl/qxl_fence.c           |   91 ------
 drivers/gpu/drm/qxl/qxl_kms.c             |    1 
 drivers/gpu/drm/qxl/qxl_object.c          |    2 
 drivers/gpu/drm/qxl/qxl_object.h          |    6 
 drivers/gpu/drm/qxl/qxl_release.c         |  172 ++++++++++--
 drivers/gpu/drm/qxl/qxl_ttm.c             |   93 ------
 drivers/gpu/drm/radeon/radeon.h           |   15 -
 drivers/gpu/drm/radeon/radeon_cs.c        |   10 +
 drivers/gpu/drm/radeon/radeon_device.c    |    1 
 drivers/gpu/drm/radeon/radeon_display.c   |   20 +
 drivers/gpu/drm/radeon/radeon_fence.c     |  191 ++++++++++---
 drivers/gpu/drm/radeon/radeon_gem.c       |   19 +
 drivers/gpu/drm/radeon/radeon_object.c    |    8 -
 drivers/gpu/drm/radeon/radeon_ttm.c       |   34 --
 drivers/gpu/drm/radeon/radeon_uvd.c       |   10 -
 drivers/gpu/drm/ttm/ttm_bo.c              |  187 ++++++-------
 drivers/gpu/drm/ttm/ttm_bo_util.c         |   28 --
 drivers/gpu/drm/ttm/ttm_bo_vm.c           |    3 
 drivers/gpu/drm/ttm/ttm_execbuf_util.c    |  146 +++-------
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c    |   47 ---
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h       |    1 
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c   |   24 --
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c     |  329 ++++++++++++----------
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.h     |   35 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c  |   43 +--
 include/drm/ttm/ttm_bo_api.h              |    7 
 include/drm/ttm/ttm_bo_driver.h           |   29 --
 include/drm/ttm/ttm_execbuf_util.h        |   22 +
 43 files changed, 1107 insertions(+), 1130 deletions(-)
 delete mode 100644 drivers/gpu/drm/qxl/qxl_fence.c



* [RFC PATCH v1 01/16] drm/ttm: add interruptible parameter to ttm_eu_reserve_buffers
From: Maarten Lankhorst @ 2014-05-14 14:57 UTC
  To: airlied; +Cc: nouveau, linux-kernel, dri-devel

Some drivers, like vmwgfx, really want interruptibility as a
parameter instead of always waiting interruptibly.
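
To make the contract of the new intr flag concrete (it matches the
updated kerneldoc: with intr set, a signal makes the call fail with
-ERESTARTSYS and leaves nothing reserved; without it, the call keeps
waiting), here is a small single-threaded userspace model. struct
sim_lock, sim_reserve() and the polling loop are hypothetical; the
ERESTARTSYS value is copied from the kernel's internal errno list and
is not visible to userspace.

```c
#include <assert.h>
#include <stdbool.h>

/* Kernel-internal errno (include/linux/errno.h); not exported to userspace. */
#define ERESTARTSYS 512

/* Hypothetical model of a contended reservation: the current holder
 * releases it after 'free_after' polls, and a signal arrives at poll
 * 'signal_at' (never, if signal_at < 0). */
struct sim_lock {
	int free_after;
	int signal_at;
	bool held;
};

static int sim_reserve(struct sim_lock *l, bool intr)
{
	assert(l->free_after >= 0);

	for (int t = 0; ; t++) {
		if (intr && t == l->signal_at)
			return -ERESTARTSYS;	/* bail out holding nothing */
		if (t >= l->free_after) {
			l->held = true;		/* signals were ignored if !intr */
			return 0;
		}
	}
}
```

With intr == false, a caller such as __vmw_execbuf_release_pinned_bo
no longer needs its do { } while (ret == -ERESTARTSYS) retry loop.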

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---
 drivers/gpu/drm/qxl/qxl_release.c        |    2 +-
 drivers/gpu/drm/radeon/radeon_object.c   |    2 +-
 drivers/gpu/drm/radeon/radeon_uvd.c      |    2 +-
 drivers/gpu/drm/radeon/radeon_vm.c       |    2 +-
 drivers/gpu/drm/ttm/ttm_execbuf_util.c   |   22 +++++++++++++---------
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c  |    7 ++-----
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c |    2 +-
 include/drm/ttm/ttm_execbuf_util.h       |    9 +++++----
 8 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 14e776f1d14e..2b43e5deb051 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -159,7 +159,7 @@ int qxl_release_reserve_list(struct qxl_release *release, bool no_intr)
 	if (list_is_singular(&release->bos))
 		return 0;
 
-	ret = ttm_eu_reserve_buffers(&release->ticket, &release->bos);
+	ret = ttm_eu_reserve_buffers(&release->ticket, &release->bos, !no_intr);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 19bec0dbfa38..51bf80cdce5c 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -438,7 +438,7 @@ int radeon_bo_list_validate(struct radeon_device *rdev,
 	u64 bytes_moved = 0, initial_bytes_moved;
 	u64 bytes_moved_threshold = radeon_bo_get_threshold_for_moves(rdev);
 
-	r = ttm_eu_reserve_buffers(ticket, head);
+	r = ttm_eu_reserve_buffers(ticket, head, true);
 	if (unlikely(r != 0)) {
 		return r;
 	}
diff --git a/drivers/gpu/drm/radeon/radeon_uvd.c b/drivers/gpu/drm/radeon/radeon_uvd.c
index 1b65ae2433cd..2f93fef15aab 100644
--- a/drivers/gpu/drm/radeon/radeon_uvd.c
+++ b/drivers/gpu/drm/radeon/radeon_uvd.c
@@ -620,7 +620,7 @@ static int radeon_uvd_send_msg(struct radeon_device *rdev,
 	INIT_LIST_HEAD(&head);
 	list_add(&tv.head, &head);
 
-	r = ttm_eu_reserve_buffers(&ticket, &head);
+	r = ttm_eu_reserve_buffers(&ticket, &head, true);
 	if (r)
 		return r;
 
diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c
index 2aae6ce49d32..f4fd72477a71 100644
--- a/drivers/gpu/drm/radeon/radeon_vm.c
+++ b/drivers/gpu/drm/radeon/radeon_vm.c
@@ -364,7 +364,7 @@ static int radeon_vm_clear_bo(struct radeon_device *rdev,
         INIT_LIST_HEAD(&head);
         list_add(&tv.head, &head);
 
-        r = ttm_eu_reserve_buffers(&ticket, &head);
+        r = ttm_eu_reserve_buffers(&ticket, &head, true);
         if (r)
 		return r;
 
diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
index e8dac8758528..39a11bbd2bac 100644
--- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
+++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
@@ -112,7 +112,7 @@ EXPORT_SYMBOL(ttm_eu_backoff_reservation);
  */
 
 int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
-			   struct list_head *list)
+			   struct list_head *list, bool intr)
 {
 	struct ttm_bo_global *glob;
 	struct ttm_validate_buffer *entry;
@@ -140,7 +140,7 @@ retry:
 		if (entry->reserved)
 			continue;
 
-		ret = __ttm_bo_reserve(bo, true, (ticket == NULL), true,
+		ret = __ttm_bo_reserve(bo, intr, (ticket == NULL), true,
 				       ticket);
 
 		if (ret == -EDEADLK) {
@@ -153,13 +153,17 @@ retry:
 			ttm_eu_backoff_reservation_locked(list);
 			spin_unlock(&glob->lru_lock);
 			ttm_eu_list_ref_sub(list);
-			ret = ww_mutex_lock_slow_interruptible(&bo->resv->lock,
-							       ticket);
-			if (unlikely(ret != 0)) {
-				if (ret == -EINTR)
-					ret = -ERESTARTSYS;
-				goto err_fini;
-			}
+
+			if (intr) {
+				ret = ww_mutex_lock_slow_interruptible(&bo->resv->lock,
+								       ticket);
+				if (unlikely(ret != 0)) {
+					if (ret == -EINTR)
+						ret = -ERESTARTSYS;
+					goto err_fini;
+				}
+			} else
+				ww_mutex_lock_slow(&bo->resv->lock, ticket);
 
 			entry->reserved = true;
 			if (unlikely(atomic_read(&bo->cpu_writers) > 0)) {
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
index 87df0b3674fd..5d7d2e00296b 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
@@ -2465,7 +2465,7 @@ int vmw_execbuf_process(struct drm_file *file_priv,
 	if (unlikely(ret != 0))
 		goto out_err_nores;
 
-	ret = ttm_eu_reserve_buffers(&ticket, &sw_context->validate_nodes);
+	ret = ttm_eu_reserve_buffers(&ticket, &sw_context->validate_nodes, true);
 	if (unlikely(ret != 0))
 		goto out_err;
 
@@ -2655,10 +2655,7 @@ void __vmw_execbuf_release_pinned_bo(struct vmw_private *dev_priv,
 	query_val.bo = ttm_bo_reference(dev_priv->dummy_query_bo);
 	list_add_tail(&query_val.head, &validate_list);
 
-	do {
-		ret = ttm_eu_reserve_buffers(&ticket, &validate_list);
-	} while (ret == -ERESTARTSYS);
-
+	ret = ttm_eu_reserve_buffers(&ticket, &validate_list, false);
 	if (unlikely(ret != 0)) {
 		vmw_execbuf_unpin_panic(dev_priv);
 		goto out_no_reserve;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
index 01d68f0a69dc..873613a16f72 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
@@ -1215,7 +1215,7 @@ vmw_resource_check_buffer(struct vmw_resource *res,
 	INIT_LIST_HEAD(&val_list);
 	val_buf->bo = ttm_bo_reference(&res->backup->base);
 	list_add_tail(&val_buf->head, &val_list);
-	ret = ttm_eu_reserve_buffers(NULL, &val_list);
+	ret = ttm_eu_reserve_buffers(NULL, &val_list, interruptible);
 	if (unlikely(ret != 0))
 		goto out_no_reserve;
 
diff --git a/include/drm/ttm/ttm_execbuf_util.h b/include/drm/ttm/ttm_execbuf_util.h
index 16db7d01a336..fd95fd569ca3 100644
--- a/include/drm/ttm/ttm_execbuf_util.h
+++ b/include/drm/ttm/ttm_execbuf_util.h
@@ -73,6 +73,7 @@ extern void ttm_eu_backoff_reservation(struct ww_acquire_ctx *ticket,
  * @ticket:  [out] ww_acquire_ctx filled in by call, or NULL if only
  *           non-blocking reserves should be tried.
  * @list:    thread private list of ttm_validate_buffer structs.
+ * @intr:    should the wait be interruptible
  *
  * Tries to reserve bos pointed to by the list entries for validation.
  * If the function returns 0, all buffers are marked as "unfenced",
@@ -84,9 +85,9 @@ extern void ttm_eu_backoff_reservation(struct ww_acquire_ctx *ticket,
  * CPU write reservations to be cleared, and for other threads to
  * unreserve their buffers.
  *
- * This function may return -ERESTART or -EAGAIN if the calling process
- * receives a signal while waiting. In that case, no buffers on the list
- * will be reserved upon return.
+ * If intr is set to true, this function may return -ERESTARTSYS if the
+ * calling process receives a signal while waiting. In that case, no
+ * buffers on the list will be reserved upon return.
  *
  * Buffers reserved by this function should be unreserved by
  * a call to either ttm_eu_backoff_reservation() or
@@ -95,7 +96,7 @@ extern void ttm_eu_backoff_reservation(struct ww_acquire_ctx *ticket,
  */
 
 extern int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
-				  struct list_head *list);
+				  struct list_head *list, bool intr);
 
 /**
  * function ttm_eu_fence_buffer_objects.



* [RFC PATCH v1 02/16] drm/ttm: kill off some members to ttm_validate_buffer
From: Maarten Lankhorst @ 2014-05-14 14:57 UTC
  To: airlied; +Cc: nouveau, linux-kernel, dri-devel

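On contention, the contended buffer is moved to the front of the list
once it has been reserved, so every entry before the current position
is always reserved and only those entries need to be unreserved when
backing off.

This gets rid of the reserved, removed and put_count bookkeeping in
struct ttm_validate_buffer, which is no longer needed, and simplifies
the code.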
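
The reworked loop in ttm_eu_reserve_buffers() can be modelled in
userspace roughly as below. The list helpers mimic the shape of
<linux/list.h>; struct buf, try_reserve(), reserve_slow() and the
'contended' counter are hypothetical stand-ins for __ttm_bo_reserve()
returning -EDEADLK and for ww_mutex_lock_slow(). The sketch shows the
invariant: because a contended entry is moved to the front after it is
slow-locked, iteration resumes at the old first element, everything
before the cursor is reserved, and backing off is just a reverse walk.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list in the style of <linux/list.h>. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *h) { h->prev = h->next = h; }

static void list_del(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Insert n right after h, i.e. at the front of the list. */
static void list_add(struct list_node *n, struct list_node *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

/* Insert n right before h, i.e. at the tail of the list. */
static void list_add_tail(struct list_node *n, struct list_node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Hypothetical buffer. 'contended' counts how many more fast-path
 * attempts fail as if another ticket held the lock (-EDEADLK). */
struct buf {
	struct list_node head;	/* first member, so a cast recovers the buf */
	int id;
	bool reserved;
	int contended;
};

static struct buf *to_buf(struct list_node *n) { return (struct buf *)n; }

static bool try_reserve(struct buf *b)
{
	if (b->contended > 0) {
		b->contended--;
		return false;
	}
	assert(!b->reserved);
	b->reserved = true;
	return true;
}

/* Stand-in for ww_mutex_lock_slow(): wait for the holder, then take it. */
static void reserve_slow(struct buf *b)
{
	b->contended = 0;
	b->reserved = true;
}

/* Reserves every buffer; returns the number of back-offs performed. */
static int reserve_all(struct list_node *list)
{
	int backoffs = 0;

	for (struct list_node *pos = list->next; pos != list; pos = pos->next) {
		struct buf *b = to_buf(pos);

		if (try_reserve(b))
			continue;

		/* Everything before pos is reserved: drop it, walking backwards. */
		for (struct list_node *p = pos->prev; p != list; p = p->prev)
			to_buf(p)->reserved = false;
		backoffs++;

		reserve_slow(b);

		/* Move b to the front; pos->next is now the old first entry,
		 * so the loop resumes there with b already held. */
		list_del(pos);
		list_add(pos, list);
	}
	return backoffs;
}
```

No per-entry flags are needed: the position in the list alone tells
which entries are held.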

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---
 drivers/gpu/drm/qxl/qxl_release.c       |    1 
 drivers/gpu/drm/ttm/ttm_execbuf_util.c  |  142 +++++++++++--------------------
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c |    1 
 include/drm/ttm/ttm_execbuf_util.h      |    3 -
 4 files changed, 50 insertions(+), 97 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 2b43e5deb051..e85c4d274dc0 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -350,7 +350,6 @@ void qxl_release_fence_buffer_objects(struct qxl_release *release)
 
 		ttm_bo_add_to_lru(bo);
 		__ttm_bo_unreserve(bo);
-		entry->reserved = false;
 	}
 	spin_unlock(&bdev->fence_lock);
 	spin_unlock(&glob->lru_lock);
diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
index 39a11bbd2bac..6db47a72667e 100644
--- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
+++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
@@ -32,20 +32,12 @@
 #include <linux/sched.h>
 #include <linux/module.h>
 
-static void ttm_eu_backoff_reservation_locked(struct list_head *list)
+static void ttm_eu_backoff_reservation_reverse(struct list_head *list,
+					      struct ttm_validate_buffer *entry)
 {
-	struct ttm_validate_buffer *entry;
-
-	list_for_each_entry(entry, list, head) {
+	list_for_each_entry_continue_reverse(entry, list, head) {
 		struct ttm_buffer_object *bo = entry->bo;
-		if (!entry->reserved)
-			continue;
 
-		entry->reserved = false;
-		if (entry->removed) {
-			ttm_bo_add_to_lru(bo);
-			entry->removed = false;
-		}
 		__ttm_bo_unreserve(bo);
 	}
 }
@@ -56,27 +48,9 @@ static void ttm_eu_del_from_lru_locked(struct list_head *list)
 
 	list_for_each_entry(entry, list, head) {
 		struct ttm_buffer_object *bo = entry->bo;
-		if (!entry->reserved)
-			continue;
+		unsigned put_count = ttm_bo_del_from_lru(bo);
 
-		if (!entry->removed) {
-			entry->put_count = ttm_bo_del_from_lru(bo);
-			entry->removed = true;
-		}
-	}
-}
-
-static void ttm_eu_list_ref_sub(struct list_head *list)
-{
-	struct ttm_validate_buffer *entry;
-
-	list_for_each_entry(entry, list, head) {
-		struct ttm_buffer_object *bo = entry->bo;
-
-		if (entry->put_count) {
-			ttm_bo_list_ref_sub(bo, entry->put_count, true);
-			entry->put_count = 0;
-		}
+		ttm_bo_list_ref_sub(bo, put_count, true);
 	}
 }
 
@@ -91,11 +65,18 @@ void ttm_eu_backoff_reservation(struct ww_acquire_ctx *ticket,
 
 	entry = list_first_entry(list, struct ttm_validate_buffer, head);
 	glob = entry->bo->glob;
+
 	spin_lock(&glob->lru_lock);
-	ttm_eu_backoff_reservation_locked(list);
+	list_for_each_entry(entry, list, head) {
+		struct ttm_buffer_object *bo = entry->bo;
+
+		ttm_bo_add_to_lru(bo);
+		__ttm_bo_unreserve(bo);
+	}
+	spin_unlock(&glob->lru_lock);
+
 	if (ticket)
 		ww_acquire_fini(ticket);
-	spin_unlock(&glob->lru_lock);
 }
 EXPORT_SYMBOL(ttm_eu_backoff_reservation);
 
@@ -121,64 +102,55 @@ int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
 	if (list_empty(list))
 		return 0;
 
-	list_for_each_entry(entry, list, head) {
-		entry->reserved = false;
-		entry->put_count = 0;
-		entry->removed = false;
-	}
-
 	entry = list_first_entry(list, struct ttm_validate_buffer, head);
 	glob = entry->bo->glob;
 
 	if (ticket)
 		ww_acquire_init(ticket, &reservation_ww_class);
-retry:
+
 	list_for_each_entry(entry, list, head) {
 		struct ttm_buffer_object *bo = entry->bo;
 
-		/* already slowpath reserved? */
-		if (entry->reserved)
-			continue;
-
 		ret = __ttm_bo_reserve(bo, intr, (ticket == NULL), true,
 				       ticket);
+		if (!ret && unlikely(atomic_read(&bo->cpu_writers) > 0)) {
+			__ttm_bo_unreserve(bo);
 
-		if (ret == -EDEADLK) {
-			/* uh oh, we lost out, drop every reservation and try
-			 * to only reserve this buffer, then start over if
-			 * this succeeds.
-			 */
-			BUG_ON(ticket == NULL);
-			spin_lock(&glob->lru_lock);
-			ttm_eu_backoff_reservation_locked(list);
-			spin_unlock(&glob->lru_lock);
-			ttm_eu_list_ref_sub(list);
-
-			if (intr) {
-				ret = ww_mutex_lock_slow_interruptible(&bo->resv->lock,
-								       ticket);
-				if (unlikely(ret != 0)) {
-					if (ret == -EINTR)
-						ret = -ERESTARTSYS;
-					goto err_fini;
-				}
-			} else
-				ww_mutex_lock_slow(&bo->resv->lock, ticket);
-
-			entry->reserved = true;
-			if (unlikely(atomic_read(&bo->cpu_writers) > 0)) {
-				ret = -EBUSY;
-				goto err;
-			}
-			goto retry;
-		} else if (ret)
-			goto err;
-
-		entry->reserved = true;
-		if (unlikely(atomic_read(&bo->cpu_writers) > 0)) {
 			ret = -EBUSY;
-			goto err;
 		}
+
+		if (!ret)
+			continue;
+
+		/* uh oh, we lost out, drop every reservation and try
+		 * to only reserve this buffer, then start over if
+		 * this succeeds.
+		 */
+		ttm_eu_backoff_reservation_reverse(list, entry);
+
+		if (ret == -EDEADLK && intr) {
+			ret = ww_mutex_lock_slow_interruptible(&bo->resv->lock,
+							       ticket);
+		} else if (ret == -EDEADLK) {
+			ww_mutex_lock_slow(&bo->resv->lock, ticket);
+			ret = 0;
+		}
+
+		if (unlikely(ret != 0)) {
+			if (ret == -EINTR)
+				ret = -ERESTARTSYS;
+			if (ticket) {
+				ww_acquire_done(ticket);
+				ww_acquire_fini(ticket);
+			}
+			return ret;
+		}
+
+		/* move this item to the front of the list,
+		 * forces correct iteration of the loop without keeping track
+		 */
+		list_del(&entry->head);
+		list_add(&entry->head, list);
 	}
 
 	if (ticket)
@@ -186,20 +158,7 @@ retry:
 	spin_lock(&glob->lru_lock);
 	ttm_eu_del_from_lru_locked(list);
 	spin_unlock(&glob->lru_lock);
-	ttm_eu_list_ref_sub(list);
 	return 0;
-
-err:
-	spin_lock(&glob->lru_lock);
-	ttm_eu_backoff_reservation_locked(list);
-	spin_unlock(&glob->lru_lock);
-	ttm_eu_list_ref_sub(list);
-err_fini:
-	if (ticket) {
-		ww_acquire_done(ticket);
-		ww_acquire_fini(ticket);
-	}
-	return ret;
 }
 EXPORT_SYMBOL(ttm_eu_reserve_buffers);
 
@@ -229,7 +188,6 @@ void ttm_eu_fence_buffer_objects(struct ww_acquire_ctx *ticket,
 		bo->sync_obj = driver->sync_obj_ref(sync_obj);
 		ttm_bo_add_to_lru(bo);
 		__ttm_bo_unreserve(bo);
-		entry->reserved = false;
 	}
 	spin_unlock(&bdev->fence_lock);
 	spin_unlock(&glob->lru_lock);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
index 5d7d2e00296b..f8b25bc4e634 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
@@ -346,7 +346,6 @@ static int vmw_bo_to_validate_list(struct vmw_sw_context *sw_context,
 		++sw_context->cur_val_buf;
 		val_buf = &vval_buf->base;
 		val_buf->bo = ttm_bo_reference(bo);
-		val_buf->reserved = false;
 		list_add_tail(&val_buf->head, &sw_context->validate_nodes);
 		vval_buf->validate_as_mob = validate_as_mob;
 	}
diff --git a/include/drm/ttm/ttm_execbuf_util.h b/include/drm/ttm/ttm_execbuf_util.h
index fd95fd569ca3..8490cb8ee0d8 100644
--- a/include/drm/ttm/ttm_execbuf_util.h
+++ b/include/drm/ttm/ttm_execbuf_util.h
@@ -48,9 +48,6 @@
 struct ttm_validate_buffer {
 	struct list_head head;
 	struct ttm_buffer_object *bo;
-	bool reserved;
-	bool removed;
-	int put_count;
 	void *old_sync_obj;
 };
 



* [RFC PATCH v1 03/16] drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep
From: Maarten Lankhorst @ 2014-05-14 14:57 UTC
  To: airlied; +Cc: nouveau, linux-kernel, dri-devel

Apart from some code inside ttm itself and nouveau_bo_vma_del,
this is the only place where ttm_bo_wait is used without a reservation.
Fix this so we can remove the fence_lock later on.

After the switch to rcu, the reservation lock will be dropped from
this ioctl again.
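
The reworked nouveau_gem_ioctl_cpu_prep() follows a common pattern:
inspect the fence only while the buffer is reserved, take a reference
if the caller wants to wait, drop the reservation, and only then block
on the referenced fence. A single-threaded userspace sketch of that
shape follows; struct bo, struct fence, the boolean standing in for the
reservation and the busy-wait are hypothetical simplifications, not the
TTM or nouveau API.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted fence; another thread would set 'signaled'. */
struct fence {
	atomic_int refcount;
	atomic_bool signaled;
};

/* Hypothetical buffer object; 'reserved' stands in for the ww_mutex. */
struct bo {
	bool reserved;
	struct fence *sync_obj;	/* only accessed while reserved */
};

static struct fence *fence_ref(struct fence *f)
{
	atomic_fetch_add(&f->refcount, 1);
	return f;
}

static void fence_unref(struct fence **f)
{
	atomic_fetch_sub(&(*f)->refcount, 1);	/* freeing omitted */
	*f = NULL;
}

/* Blocks until signaled; busy-waits for brevity where real code sleeps. */
static int fence_wait(struct fence *f)
{
	while (!atomic_load(&f->signaled))
		;
	return 0;
}

static int cpu_prep(struct bo *bo, bool no_wait)
{
	struct fence *fence = NULL;
	int ret = 0;

	assert(!bo->reserved);
	bo->reserved = true;			/* reserve */
	if (bo->sync_obj && !atomic_load(&bo->sync_obj->signaled)) {
		ret = -EBUSY;			/* like a no-wait ttm_bo_wait() */
		if (!no_wait)
			fence = fence_ref(bo->sync_obj);
	}
	bo->reserved = false;			/* unreserve before blocking */

	if (fence) {
		ret = fence_wait(fence);
		fence_unref(&fence);
	}
	return ret;
}
```

Holding only a fence reference while sleeping means other users of
the buffer are never stuck behind a waiter holding its reservation.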

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---
 drivers/gpu/drm/nouveau/nouveau_gem.c |   22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index c90c0dc0afe8..6e1c58a880fe 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -886,17 +886,31 @@ nouveau_gem_ioctl_cpu_prep(struct drm_device *dev, void *data,
 	struct drm_gem_object *gem;
 	struct nouveau_bo *nvbo;
 	bool no_wait = !!(req->flags & NOUVEAU_GEM_CPU_PREP_NOWAIT);
-	int ret = -EINVAL;
+	int ret;
+	struct nouveau_fence *fence = NULL;
 
 	gem = drm_gem_object_lookup(dev, file_priv, req->handle);
 	if (!gem)
 		return -ENOENT;
 	nvbo = nouveau_gem_object(gem);
 
-	spin_lock(&nvbo->bo.bdev->fence_lock);
-	ret = ttm_bo_wait(&nvbo->bo, true, true, no_wait);
-	spin_unlock(&nvbo->bo.bdev->fence_lock);
+	ret = ttm_bo_reserve(&nvbo->bo, true, false, false, 0);
+	if (!ret) {
+		spin_lock(&nvbo->bo.bdev->fence_lock);
+		ret = ttm_bo_wait(&nvbo->bo, true, true, true);
+		if (!no_wait && ret)
+			fence = nouveau_fence_ref(nvbo->bo.sync_obj);
+		spin_unlock(&nvbo->bo.bdev->fence_lock);
+
+		ttm_bo_unreserve(&nvbo->bo);
+	}
 	drm_gem_object_unreference_unlocked(gem);
+
+	if (fence) {
+		ret = nouveau_fence_wait(fence, true, no_wait);
+		nouveau_fence_unref(&fence);
+	}
+
 	return ret;
 }
 



* [RFC PATCH v1 04/16] drm/nouveau: require reservations for nouveau_fence_sync and nouveau_bo_fence
From: Maarten Lankhorst @ 2014-05-14 14:57 UTC
  To: airlied; +Cc: nouveau, linux-kernel, dri-devel

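Take the reservation in nouveau_crtc_page_flip before syncing, and
assert in nouveau_bo_fence that it is held. This ensures we always
hold the required lock when calling those functions.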
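
lockdep_assert_held() turns "the caller must hold the reservation"
from a comment into a check that fires in debug builds. A userspace
analogue is sketched below with an owner-tracking lock; struct
owned_lock, ol_lock(), ol_unlock(), ol_held_by(), ASSERT_HELD() and the
integer task IDs are hypothetical, and unlike lockdep this only checks
the single lock passed in.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock that records its owner; 0 means unlocked. */
struct owned_lock {
	int owner;
};

static void ol_lock(struct owned_lock *l, int task)
{
	assert(l->owner == 0);	/* single-threaded model: must be free */
	l->owner = task;
}

static void ol_unlock(struct owned_lock *l, int task)
{
	assert(l->owner == task);
	l->owner = 0;
}

static bool ol_held_by(const struct owned_lock *l, int task)
{
	return l->owner == task;
}

/* Analogue of lockdep_assert_held(): aborts in debug builds if not held. */
#define ASSERT_HELD(l, task) assert(ol_held_by((l), (task)))

struct bo {
	struct owned_lock resv;
	int fence_seqno;	/* protected by resv */
};

/* Shaped like nouveau_bo_fence() after this patch: caller must hold resv. */
static void bo_fence(struct bo *bo, int task, int seqno)
{
	ASSERT_HELD(&bo->resv, task);
	bo->fence_seqno = seqno;
}
```

Note also the new_bo != old_bo check in nouveau_crtc_page_flip: when
flipping to the same buffer, the reservation already held on new_bo is
kept instead of reserving the same object a second time.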
---
 drivers/gpu/drm/nouveau/nouveau_bo.c      |    2 ++
 drivers/gpu/drm/nouveau/nouveau_display.c |   17 +++++++++++++----
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index b6dc85c614be..33eb7164525a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1431,6 +1431,8 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence)
 	struct nouveau_fence *new_fence = nouveau_fence_ref(fence);
 	struct nouveau_fence *old_fence = NULL;
 
+	lockdep_assert_held(&nvbo->bo.resv->lock.base);
+
 	spin_lock(&nvbo->bo.bdev->fence_lock);
 	old_fence = nvbo->bo.sync_obj;
 	nvbo->bo.sync_obj = new_fence;
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index da764a4ed958..61b8c3375135 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -716,6 +716,9 @@ nouveau_crtc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
 	}
 
 	mutex_lock(&chan->cli->mutex);
+	ret = ttm_bo_reserve(&new_bo->bo, true, false, false, NULL);
+	if (ret)
+		goto fail_unpin;
 
 	/* synchronise rendering channel with the kernel's channel */
 	spin_lock(&new_bo->bo.bdev->fence_lock);
@@ -723,12 +726,18 @@ nouveau_crtc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
 	spin_unlock(&new_bo->bo.bdev->fence_lock);
 	ret = nouveau_fence_sync(fence, chan);
 	nouveau_fence_unref(&fence);
-	if (ret)
+	if (ret) {
+		ttm_bo_unreserve(&new_bo->bo);
 		goto fail_unpin;
+	}
 
-	ret = ttm_bo_reserve(&old_bo->bo, true, false, false, NULL);
-	if (ret)
-		goto fail_unpin;
+	if (new_bo != old_bo) {
+		ttm_bo_unreserve(&new_bo->bo);
+
+		ret = ttm_bo_reserve(&old_bo->bo, true, false, false, NULL);
+		if (ret)
+			goto fail_unpin;
+	}
 
 	/* Initialize a page flip struct */
 	*s = (struct nouveau_page_flip_state)



* [RFC PATCH v1 05/16] drm/ttm: call ttm_bo_wait while inside a reservation
From: Maarten Lankhorst @ 2014-05-14 14:57 UTC
  To: airlied; +Cc: nouveau, linux-kernel, dri-devel

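ttm_bo_cleanup_refs_and_unlock is the last remaining function that
doesn't rely solely on the reservation lock to fence off access to a
buffer. Move its ttm_bo_wait call inside the reservation, and assert
in ttm_bo_wait that the reservation is held.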
---
 drivers/gpu/drm/ttm/ttm_bo.c |   25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 4ab9f7171c4f..d7d34336f108 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -502,17 +502,6 @@ static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
 		if (ret)
 			return ret;
 
-		/*
-		 * remove sync_obj with ttm_bo_wait, the wait should be
-		 * finished, and no new wait object should have been added.
-		 */
-		spin_lock(&bdev->fence_lock);
-		ret = ttm_bo_wait(bo, false, false, true);
-		WARN_ON(ret);
-		spin_unlock(&bdev->fence_lock);
-		if (ret)
-			return ret;
-
 		spin_lock(&glob->lru_lock);
 		ret = __ttm_bo_reserve(bo, false, true, false, 0);
 
@@ -528,8 +517,16 @@ static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
 			spin_unlock(&glob->lru_lock);
 			return 0;
 		}
-	} else
-		spin_unlock(&bdev->fence_lock);
+
+		/*
+		 * remove sync_obj with ttm_bo_wait, the wait should be
+		 * finished, and no new wait object should have been added.
+		 */
+		spin_lock(&bdev->fence_lock);
+		ret = ttm_bo_wait(bo, false, false, true);
+		WARN_ON(ret);
+	}
+	spin_unlock(&bdev->fence_lock);
 
 	if (ret || unlikely(list_empty(&bo->ddestroy))) {
 		__ttm_bo_unreserve(bo);
@@ -1539,6 +1536,8 @@ int ttm_bo_wait(struct ttm_buffer_object *bo,
 	void *sync_obj;
 	int ret = 0;
 
+	lockdep_assert_held(&bo->resv->lock.base);
+
 	if (likely(bo->sync_obj == NULL))
 		return 0;
 



* [RFC PATCH v1 06/16] drm/ttm: kill fence_lock
From: Maarten Lankhorst @ 2014-05-14 14:57 UTC
  To: airlied; +Cc: nouveau, linux-kernel, dri-devel

No users are left, so kill it off! :D
Conversion to the reservation api is next on the list; after
that, the functionality can be restored with rcu.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---
 drivers/gpu/drm/nouveau/nouveau_bo.c      |   25 +++-------
 drivers/gpu/drm/nouveau/nouveau_display.c |    6 --
 drivers/gpu/drm/nouveau/nouveau_gem.c     |   16 +-----
 drivers/gpu/drm/qxl/qxl_cmd.c             |    2 -
 drivers/gpu/drm/qxl/qxl_fence.c           |    4 --
 drivers/gpu/drm/qxl/qxl_object.h          |    2 -
 drivers/gpu/drm/qxl/qxl_release.c         |    2 -
 drivers/gpu/drm/radeon/radeon_display.c   |    2 -
 drivers/gpu/drm/radeon/radeon_object.c    |    2 -
 drivers/gpu/drm/ttm/ttm_bo.c              |   75 +++++++----------------------
 drivers/gpu/drm/ttm/ttm_bo_util.c         |    5 --
 drivers/gpu/drm/ttm/ttm_bo_vm.c           |    3 -
 drivers/gpu/drm/ttm/ttm_execbuf_util.c    |    2 -
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c    |    4 --
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c  |   17 ++-----
 include/drm/ttm/ttm_bo_api.h              |    5 --
 include/drm/ttm/ttm_bo_driver.h           |    3 -
 17 files changed, 36 insertions(+), 139 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 33eb7164525a..e98af2e9a1cb 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1196,9 +1196,7 @@ nouveau_bo_move(struct ttm_buffer_object *bo, bool evict, bool intr,
 	}
 
 	/* Fallback to software copy. */
-	spin_lock(&bo->bdev->fence_lock);
 	ret = ttm_bo_wait(bo, true, intr, no_wait_gpu);
-	spin_unlock(&bo->bdev->fence_lock);
 	if (ret == 0)
 		ret = ttm_bo_move_memcpy(bo, evict, no_wait_gpu, new_mem);
 
@@ -1425,26 +1423,19 @@ nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm)
 	ttm_pool_unpopulate(ttm);
 }
 
+static void
+nouveau_bo_fence_unref(void **sync_obj)
+{
+	nouveau_fence_unref((struct nouveau_fence **)sync_obj);
+}
+
 void
 nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence)
 {
-	struct nouveau_fence *new_fence = nouveau_fence_ref(fence);
-	struct nouveau_fence *old_fence = NULL;
-
 	lockdep_assert_held(&nvbo->bo.resv->lock.base);
 
-	spin_lock(&nvbo->bo.bdev->fence_lock);
-	old_fence = nvbo->bo.sync_obj;
-	nvbo->bo.sync_obj = new_fence;
-	spin_unlock(&nvbo->bo.bdev->fence_lock);
-
-	nouveau_fence_unref(&old_fence);
-}
-
-static void
-nouveau_bo_fence_unref(void **sync_obj)
-{
-	nouveau_fence_unref((struct nouveau_fence **)sync_obj);
+	nouveau_bo_fence_unref(&nvbo->bo.sync_obj);
+	nvbo->bo.sync_obj = nouveau_fence_ref(fence);
 }
 
 static void *
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 61b8c3375135..6a0ca004bd19 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -721,11 +721,7 @@ nouveau_crtc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
 		goto fail_unpin;
 
 	/* synchronise rendering channel with the kernel's channel */
-	spin_lock(&new_bo->bo.bdev->fence_lock);
-	fence = nouveau_fence_ref(new_bo->bo.sync_obj);
-	spin_unlock(&new_bo->bo.bdev->fence_lock);
-	ret = nouveau_fence_sync(fence, chan);
-	nouveau_fence_unref(&fence);
+	ret = nouveau_fence_sync(new_bo->bo.sync_obj, chan);
 	if (ret) {
 		ttm_bo_unreserve(&new_bo->bo);
 		goto fail_unpin;
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 6e1c58a880fe..6cd5298cbb53 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -105,9 +105,7 @@ nouveau_gem_object_unmap(struct nouveau_bo *nvbo, struct nouveau_vma *vma)
 	list_del(&vma->head);
 
 	if (mapped) {
-		spin_lock(&nvbo->bo.bdev->fence_lock);
 		fence = nouveau_fence_ref(nvbo->bo.sync_obj);
-		spin_unlock(&nvbo->bo.bdev->fence_lock);
 	}
 
 	if (fence) {
@@ -432,17 +430,11 @@ retry:
 static int
 validate_sync(struct nouveau_channel *chan, struct nouveau_bo *nvbo)
 {
-	struct nouveau_fence *fence = NULL;
+	struct nouveau_fence *fence = nvbo->bo.sync_obj;
 	int ret = 0;
 
-	spin_lock(&nvbo->bo.bdev->fence_lock);
-	fence = nouveau_fence_ref(nvbo->bo.sync_obj);
-	spin_unlock(&nvbo->bo.bdev->fence_lock);
-
-	if (fence) {
+	if (fence)
 		ret = nouveau_fence_sync(fence, chan);
-		nouveau_fence_unref(&fence);
-	}
 
 	return ret;
 }
@@ -661,9 +653,7 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
 				data |= r->vor;
 		}
 
-		spin_lock(&nvbo->bo.bdev->fence_lock);
 		ret = ttm_bo_wait(&nvbo->bo, false, false, false);
-		spin_unlock(&nvbo->bo.bdev->fence_lock);
 		if (ret) {
 			NV_ERROR(cli, "reloc wait_idle failed: %d\n", ret);
 			break;
@@ -896,11 +886,9 @@ nouveau_gem_ioctl_cpu_prep(struct drm_device *dev, void *data,
 
 	ret = ttm_bo_reserve(&nvbo->bo, true, false, false, 0);
 	if (!ret) {
-		spin_lock(&nvbo->bo.bdev->fence_lock);
 		ret = ttm_bo_wait(&nvbo->bo, true, true, true);
 		if (!no_wait && ret)
 			fence = nouveau_fence_ref(nvbo->bo.sync_obj);
-		spin_unlock(&nvbo->bo.bdev->fence_lock);
 
 		ttm_bo_unreserve(&nvbo->bo);
 	}
diff --git a/drivers/gpu/drm/qxl/qxl_cmd.c b/drivers/gpu/drm/qxl/qxl_cmd.c
index eb89653a7a17..45fad7b45486 100644
--- a/drivers/gpu/drm/qxl/qxl_cmd.c
+++ b/drivers/gpu/drm/qxl/qxl_cmd.c
@@ -628,9 +628,7 @@ static int qxl_reap_surf(struct qxl_device *qdev, struct qxl_bo *surf, bool stal
 	if (stall)
 		mutex_unlock(&qdev->surf_evict_mutex);
 
-	spin_lock(&surf->tbo.bdev->fence_lock);
 	ret = ttm_bo_wait(&surf->tbo, true, true, !stall);
-	spin_unlock(&surf->tbo.bdev->fence_lock);
 
 	if (stall)
 		mutex_lock(&qdev->surf_evict_mutex);
diff --git a/drivers/gpu/drm/qxl/qxl_fence.c b/drivers/gpu/drm/qxl/qxl_fence.c
index ae59e91cfb9a..c7248418117d 100644
--- a/drivers/gpu/drm/qxl/qxl_fence.c
+++ b/drivers/gpu/drm/qxl/qxl_fence.c
@@ -60,9 +60,6 @@ int qxl_fence_remove_release(struct qxl_fence *qfence, uint32_t rel_id)
 {
 	void *ret;
 	int retval = 0;
-	struct qxl_bo *bo = container_of(qfence, struct qxl_bo, fence);
-
-	spin_lock(&bo->tbo.bdev->fence_lock);
 
 	ret = radix_tree_delete(&qfence->tree, rel_id);
 	if (ret == qfence)
@@ -71,7 +68,6 @@ int qxl_fence_remove_release(struct qxl_fence *qfence, uint32_t rel_id)
 		DRM_DEBUG("didn't find fence in radix tree for %d\n", rel_id);
 		retval = -ENOENT;
 	}
-	spin_unlock(&bo->tbo.bdev->fence_lock);
 	return retval;
 }
 
diff --git a/drivers/gpu/drm/qxl/qxl_object.h b/drivers/gpu/drm/qxl/qxl_object.h
index d458a140c024..98395b223ad0 100644
--- a/drivers/gpu/drm/qxl/qxl_object.h
+++ b/drivers/gpu/drm/qxl/qxl_object.h
@@ -76,12 +76,10 @@ static inline int qxl_bo_wait(struct qxl_bo *bo, u32 *mem_type,
 		}
 		return r;
 	}
-	spin_lock(&bo->tbo.bdev->fence_lock);
 	if (mem_type)
 		*mem_type = bo->tbo.mem.mem_type;
 	if (bo->tbo.sync_obj)
 		r = ttm_bo_wait(&bo->tbo, true, true, no_wait);
-	spin_unlock(&bo->tbo.bdev->fence_lock);
 	ttm_bo_unreserve(&bo->tbo);
 	return r;
 }
diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index e85c4d274dc0..4045ba873ab8 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -337,7 +337,6 @@ void qxl_release_fence_buffer_objects(struct qxl_release *release)
 	glob = bo->glob;
 
 	spin_lock(&glob->lru_lock);
-	spin_lock(&bdev->fence_lock);
 
 	list_for_each_entry(entry, &release->bos, head) {
 		bo = entry->bo;
@@ -351,7 +350,6 @@ void qxl_release_fence_buffer_objects(struct qxl_release *release)
 		ttm_bo_add_to_lru(bo);
 		__ttm_bo_unreserve(bo);
 	}
-	spin_unlock(&bdev->fence_lock);
 	spin_unlock(&glob->lru_lock);
 	ww_acquire_fini(&release->ticket);
 }
diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
index 408b6ac53f0b..6a7340289ddd 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -386,10 +386,8 @@ static int radeon_crtc_page_flip(struct drm_crtc *crtc,
 	obj = new_radeon_fb->obj;
 	rbo = gem_to_radeon_bo(obj);
 
-	spin_lock(&rbo->tbo.bdev->fence_lock);
 	if (rbo->tbo.sync_obj)
 		work->fence = radeon_fence_ref(rbo->tbo.sync_obj);
-	spin_unlock(&rbo->tbo.bdev->fence_lock);
 
 	INIT_WORK(&work->work, radeon_unpin_work_func);
 
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 51bf80cdce5c..3b27cac9240b 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -725,12 +725,10 @@ int radeon_bo_wait(struct radeon_bo *bo, u32 *mem_type, bool no_wait)
 	r = ttm_bo_reserve(&bo->tbo, true, no_wait, false, 0);
 	if (unlikely(r != 0))
 		return r;
-	spin_lock(&bo->tbo.bdev->fence_lock);
 	if (mem_type)
 		*mem_type = bo->tbo.mem.mem_type;
 	if (bo->tbo.sync_obj)
 		r = ttm_bo_wait(&bo->tbo, true, true, no_wait);
-	spin_unlock(&bo->tbo.bdev->fence_lock);
 	ttm_bo_unreserve(&bo->tbo);
 	return r;
 }
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index d7d34336f108..ce0434377223 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -414,24 +414,20 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
 	spin_lock(&glob->lru_lock);
 	ret = __ttm_bo_reserve(bo, false, true, false, 0);
 
-	spin_lock(&bdev->fence_lock);
-	(void) ttm_bo_wait(bo, false, false, true);
-	if (!ret && !bo->sync_obj) {
-		spin_unlock(&bdev->fence_lock);
-		put_count = ttm_bo_del_from_lru(bo);
+	if (!ret) {
+		(void) ttm_bo_wait(bo, false, false, true);
 
-		spin_unlock(&glob->lru_lock);
-		ttm_bo_cleanup_memtype_use(bo);
+		if (!bo->sync_obj) {
+			put_count = ttm_bo_del_from_lru(bo);
 
-		ttm_bo_list_ref_sub(bo, put_count, true);
+			spin_unlock(&glob->lru_lock);
+			ttm_bo_cleanup_memtype_use(bo);
 
-		return;
-	}
-	if (bo->sync_obj)
-		sync_obj = driver->sync_obj_ref(bo->sync_obj);
-	spin_unlock(&bdev->fence_lock);
+			ttm_bo_list_ref_sub(bo, put_count, true);
 
-	if (!ret) {
+			return;
+		}
+		sync_obj = driver->sync_obj_ref(bo->sync_obj);
 
 		/*
 		 * Make NO_EVICT bos immediately available to
@@ -480,7 +476,6 @@ static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
 	int put_count;
 	int ret;
 
-	spin_lock(&bdev->fence_lock);
 	ret = ttm_bo_wait(bo, false, false, true);
 
 	if (ret && !no_wait_gpu) {
@@ -492,7 +487,6 @@ static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
 		 * no new sync objects can be attached.
 		 */
 		sync_obj = driver->sync_obj_ref(bo->sync_obj);
-		spin_unlock(&bdev->fence_lock);
 
 		__ttm_bo_unreserve(bo);
 		spin_unlock(&glob->lru_lock);
@@ -522,11 +516,9 @@ static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
 		 * remove sync_obj with ttm_bo_wait, the wait should be
 		 * finished, and no new wait object should have been added.
 		 */
-		spin_lock(&bdev->fence_lock);
 		ret = ttm_bo_wait(bo, false, false, true);
 		WARN_ON(ret);
 	}
-	spin_unlock(&bdev->fence_lock);
 
 	if (ret || unlikely(list_empty(&bo->ddestroy))) {
 		__ttm_bo_unreserve(bo);
@@ -664,9 +656,7 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo, bool interruptible,
 	struct ttm_placement placement;
 	int ret = 0;
 
-	spin_lock(&bdev->fence_lock);
 	ret = ttm_bo_wait(bo, false, interruptible, no_wait_gpu);
-	spin_unlock(&bdev->fence_lock);
 
 	if (unlikely(ret != 0)) {
 		if (ret != -ERESTARTSYS) {
@@ -963,7 +953,6 @@ static int ttm_bo_move_buffer(struct ttm_buffer_object *bo,
 {
 	int ret = 0;
 	struct ttm_mem_reg mem;
-	struct ttm_bo_device *bdev = bo->bdev;
 
 	lockdep_assert_held(&bo->resv->lock.base);
 
@@ -972,9 +961,7 @@ static int ttm_bo_move_buffer(struct ttm_buffer_object *bo,
 	 * Have the driver move function wait for idle when necessary,
 	 * instead of doing it here.
 	 */
-	spin_lock(&bdev->fence_lock);
 	ret = ttm_bo_wait(bo, false, interruptible, no_wait_gpu);
-	spin_unlock(&bdev->fence_lock);
 	if (ret)
 		return ret;
 	mem.num_pages = bo->num_pages;
@@ -1474,7 +1461,6 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev,
 	bdev->glob = glob;
 	bdev->need_dma32 = need_dma32;
 	bdev->val_seq = 0;
-	spin_lock_init(&bdev->fence_lock);
 	mutex_lock(&glob->device_list_mutex);
 	list_add_tail(&bdev->device_list, &glob->device_list);
 	mutex_unlock(&glob->device_list_mutex);
@@ -1532,7 +1518,6 @@ int ttm_bo_wait(struct ttm_buffer_object *bo,
 		bool lazy, bool interruptible, bool no_wait)
 {
 	struct ttm_bo_driver *driver = bo->bdev->driver;
-	struct ttm_bo_device *bdev = bo->bdev;
 	void *sync_obj;
 	int ret = 0;
 
@@ -1541,53 +1526,33 @@ int ttm_bo_wait(struct ttm_buffer_object *bo,
 	if (likely(bo->sync_obj == NULL))
 		return 0;
 
-	while (bo->sync_obj) {
-
+	if (bo->sync_obj) {
 		if (driver->sync_obj_signaled(bo->sync_obj)) {
-			void *tmp_obj = bo->sync_obj;
-			bo->sync_obj = NULL;
+			driver->sync_obj_unref(&bo->sync_obj);
 			clear_bit(TTM_BO_PRIV_FLAG_MOVING, &bo->priv_flags);
-			spin_unlock(&bdev->fence_lock);
-			driver->sync_obj_unref(&tmp_obj);
-			spin_lock(&bdev->fence_lock);
-			continue;
+			return 0;
 		}
 
 		if (no_wait)
 			return -EBUSY;
 
 		sync_obj = driver->sync_obj_ref(bo->sync_obj);
-		spin_unlock(&bdev->fence_lock);
 		ret = driver->sync_obj_wait(sync_obj,
 					    lazy, interruptible);
-		if (unlikely(ret != 0)) {
-			driver->sync_obj_unref(&sync_obj);
-			spin_lock(&bdev->fence_lock);
-			return ret;
-		}
-		spin_lock(&bdev->fence_lock);
-		if (likely(bo->sync_obj == sync_obj)) {
-			void *tmp_obj = bo->sync_obj;
-			bo->sync_obj = NULL;
+
+		if (likely(ret == 0)) {
 			clear_bit(TTM_BO_PRIV_FLAG_MOVING,
 				  &bo->priv_flags);
-			spin_unlock(&bdev->fence_lock);
-			driver->sync_obj_unref(&sync_obj);
-			driver->sync_obj_unref(&tmp_obj);
-			spin_lock(&bdev->fence_lock);
-		} else {
-			spin_unlock(&bdev->fence_lock);
-			driver->sync_obj_unref(&sync_obj);
-			spin_lock(&bdev->fence_lock);
+			driver->sync_obj_unref(&bo->sync_obj);
 		}
+		driver->sync_obj_unref(&sync_obj);
 	}
-	return 0;
+	return ret;
 }
 EXPORT_SYMBOL(ttm_bo_wait);
 
 int ttm_bo_synccpu_write_grab(struct ttm_buffer_object *bo, bool no_wait)
 {
-	struct ttm_bo_device *bdev = bo->bdev;
 	int ret = 0;
 
 	/*
@@ -1597,9 +1562,7 @@ int ttm_bo_synccpu_write_grab(struct ttm_buffer_object *bo, bool no_wait)
 	ret = ttm_bo_reserve(bo, true, no_wait, false, 0);
 	if (unlikely(ret != 0))
 		return ret;
-	spin_lock(&bdev->fence_lock);
 	ret = ttm_bo_wait(bo, false, true, no_wait);
-	spin_unlock(&bdev->fence_lock);
 	if (likely(ret == 0))
 		atomic_inc(&bo->cpu_writers);
 	ttm_bo_unreserve(bo);
@@ -1656,9 +1619,7 @@ static int ttm_bo_swapout(struct ttm_mem_shrink *shrink)
 	 * Wait for GPU, then move to system cached.
 	 */
 
-	spin_lock(&bo->bdev->fence_lock);
 	ret = ttm_bo_wait(bo, false, false, false);
-	spin_unlock(&bo->bdev->fence_lock);
 
 	if (unlikely(ret != 0))
 		goto out;
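With fence_lock gone, the caller's reservation is what keeps bo->sync_obj stable inside ttm_bo_wait(). The old loop used to drop the lock around sync_obj_unref(), re-take it, and recheck bo->sync_obj == sync_obj after waiting. It now collapses into a single pass.

Below is a minimal userspace model of the new control flow. It is not TTM code. It assumes a hypothetical refcounted mock_fence and uses a reserved flag in place of the reservation. -EINTR stands in for the kernel-internal -ERESTARTSYS.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver sync object. */
struct mock_fence {
	int refcount;
	bool signaled;
	bool wait_succeeds;	/* outcome of a blocking wait in this model */
};

/* Hypothetical stand-in for struct ttm_buffer_object. */
struct mock_bo {
	bool reserved;		/* models holding bo->resv */
	struct mock_fence *sync_obj;
};

static struct mock_fence *new_fence(bool signaled, bool wait_succeeds)
{
	struct mock_fence *f = malloc(sizeof(*f));

	if (!f)
		abort();
	f->refcount = 1;
	f->signaled = signaled;
	f->wait_succeeds = wait_succeeds;
	return f;
}

static struct mock_fence *fence_ref(struct mock_fence *f)
{
	f->refcount++;
	return f;
}

/* Drop one reference and clear the caller's pointer, like sync_obj_unref(). */
static void fence_unref(struct mock_fence **f)
{
	if (--(*f)->refcount == 0)
		free(*f);
	*f = NULL;
}

/* Model of a blocking wait: either completes or is "interrupted". */
static int fence_wait(struct mock_fence *f)
{
	if (f->wait_succeeds) {
		f->signaled = true;
		return 0;
	}
	return -EINTR;
}

/* Mirrors the structure of the reworked ttm_bo_wait(). */
static int mock_bo_wait(struct mock_bo *bo, bool no_wait)
{
	struct mock_fence *sync_obj;
	int ret;

	assert(bo->reserved);	/* sync_obj is protected by the reservation */

	if (bo->sync_obj == NULL)
		return 0;

	if (bo->sync_obj->signaled) {
		fence_unref(&bo->sync_obj);
		return 0;
	}

	if (no_wait)
		return -EBUSY;

	sync_obj = fence_ref(bo->sync_obj);	/* local reference held across the wait */
	ret = fence_wait(sync_obj);
	if (ret == 0)
		fence_unref(&bo->sync_obj);	/* no one else can have replaced it */
	fence_unref(&sync_obj);			/* drop the local reference */
	return ret;
}
```

The blocking path holds two references. The local one is taken before the wait. The bo's own reference is dropped only if the wait succeeded.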
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 1df856f78568..23db594e55c0 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -466,12 +466,10 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
 	drm_vma_node_reset(&fbo->vma_node);
 	atomic_set(&fbo->cpu_writers, 0);
 
-	spin_lock(&bdev->fence_lock);
 	if (bo->sync_obj)
 		fbo->sync_obj = driver->sync_obj_ref(bo->sync_obj);
 	else
 		fbo->sync_obj = NULL;
-	spin_unlock(&bdev->fence_lock);
 	kref_init(&fbo->list_kref);
 	kref_init(&fbo->kref);
 	fbo->destroy = &ttm_transfered_destroy;
@@ -657,7 +655,6 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
 	struct ttm_buffer_object *ghost_obj;
 	void *tmp_obj = NULL;
 
-	spin_lock(&bdev->fence_lock);
 	if (bo->sync_obj) {
 		tmp_obj = bo->sync_obj;
 		bo->sync_obj = NULL;
@@ -665,7 +662,6 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
 	bo->sync_obj = driver->sync_obj_ref(sync_obj);
 	if (evict) {
 		ret = ttm_bo_wait(bo, false, false, false);
-		spin_unlock(&bdev->fence_lock);
 		if (tmp_obj)
 			driver->sync_obj_unref(&tmp_obj);
 		if (ret)
@@ -688,7 +684,6 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
 		 */
 
 		set_bit(TTM_BO_PRIV_FLAG_MOVING, &bo->priv_flags);
-		spin_unlock(&bdev->fence_lock);
 		if (tmp_obj)
 			driver->sync_obj_unref(&tmp_obj);
 
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 0ce48e5a9cb4..d05437f219e9 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -45,10 +45,8 @@ static int ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo,
 				struct vm_area_struct *vma,
 				struct vm_fault *vmf)
 {
-	struct ttm_bo_device *bdev = bo->bdev;
 	int ret = 0;
 
-	spin_lock(&bdev->fence_lock);
 	if (likely(!test_bit(TTM_BO_PRIV_FLAG_MOVING, &bo->priv_flags)))
 		goto out_unlock;
 
@@ -82,7 +80,6 @@ static int ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo,
 			VM_FAULT_NOPAGE;
 
 out_unlock:
-	spin_unlock(&bdev->fence_lock);
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
index 6db47a72667e..108730e9147b 100644
--- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
+++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
@@ -180,7 +180,6 @@ void ttm_eu_fence_buffer_objects(struct ww_acquire_ctx *ticket,
 	glob = bo->glob;
 
 	spin_lock(&glob->lru_lock);
-	spin_lock(&bdev->fence_lock);
 
 	list_for_each_entry(entry, list, head) {
 		bo = entry->bo;
@@ -189,7 +188,6 @@ void ttm_eu_fence_buffer_objects(struct ww_acquire_ctx *ticket,
 		ttm_bo_add_to_lru(bo);
 		__ttm_bo_unreserve(bo);
 	}
-	spin_unlock(&bdev->fence_lock);
 	spin_unlock(&glob->lru_lock);
 	if (ticket)
 		ww_acquire_fini(ticket);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
index 6327cfc36805..4a36bb1dc525 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
@@ -829,11 +829,7 @@ static void vmw_move_notify(struct ttm_buffer_object *bo,
  */
 static void vmw_swap_notify(struct ttm_buffer_object *bo)
 {
-	struct ttm_bo_device *bdev = bo->bdev;
-
-	spin_lock(&bdev->fence_lock);
 	ttm_bo_wait(bo, false, false, false);
-	spin_unlock(&bdev->fence_lock);
 }
 
 
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
index 873613a16f72..48e47a100dea 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
@@ -567,12 +567,12 @@ static int vmw_user_dmabuf_synccpu_grab(struct vmw_user_dma_buffer *user_bo,
 	int ret;
 
 	if (flags & drm_vmw_synccpu_allow_cs) {
-		struct ttm_bo_device *bdev = bo->bdev;
-
-		spin_lock(&bdev->fence_lock);
-		ret = ttm_bo_wait(bo, false, true,
-				  !!(flags & drm_vmw_synccpu_dontblock));
-		spin_unlock(&bdev->fence_lock);
+		ret = ttm_bo_reserve(bo, true, !!(flags & drm_vmw_synccpu_dontblock), false, 0);
+		if (!ret) {
+			ret = ttm_bo_wait(bo, false, true,
+					  !!(flags & drm_vmw_synccpu_dontblock));
+			ttm_bo_unreserve(bo);
+		}
 		return ret;
 	}
 
@@ -1429,12 +1429,10 @@ void vmw_fence_single_bo(struct ttm_buffer_object *bo,
 	else
 		driver->sync_obj_ref(fence);
 
-	spin_lock(&bdev->fence_lock);
 
 	old_fence_obj = bo->sync_obj;
 	bo->sync_obj = fence;
 
-	spin_unlock(&bdev->fence_lock);
 
 	if (old_fence_obj)
 		vmw_fence_obj_unreference(&old_fence_obj);
@@ -1475,7 +1473,6 @@ void vmw_resource_move_notify(struct ttm_buffer_object *bo,
 
 	if (mem->mem_type != VMW_PL_MOB) {
 		struct vmw_resource *res, *n;
-		struct ttm_bo_device *bdev = bo->bdev;
 		struct ttm_validate_buffer val_buf;
 
 		val_buf.bo = bo;
@@ -1491,9 +1488,7 @@ void vmw_resource_move_notify(struct ttm_buffer_object *bo,
 			list_del_init(&res->mob_head);
 		}
 
-		spin_lock(&bdev->fence_lock);
 		(void) ttm_bo_wait(bo, false, false, false);
-		spin_unlock(&bdev->fence_lock);
 	}
 }
 
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
index ee127ec33c60..f34d59b67218 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -227,10 +227,7 @@ struct ttm_buffer_object {
 	struct list_head io_reserve_lru;
 
 	/**
-	 * Members protected by struct buffer_object_device::fence_lock
-	 * In addition, setting sync_obj to anything else
-	 * than NULL requires bo::reserved to be held. This allows for
-	 * checking NULL while reserved but not holding the mentioned lock.
+	 * Members protected by a bo reservation.
 	 */
 
 	void *sync_obj;
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index a5183da3ef92..0aa6caa59415 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -518,8 +518,6 @@ struct ttm_bo_global {
  *
  * @driver: Pointer to a struct ttm_bo_driver struct setup by the driver.
  * @man: An array of mem_type_managers.
- * @fence_lock: Protects the synchronizing members on *all* bos belonging
- * to this device.
  * @vma_manager: Address space manager
  * lru_lock: Spinlock that protects the buffer+device lru lists and
  * ddestroy lists.
@@ -539,7 +537,6 @@ struct ttm_bo_device {
 	struct ttm_bo_global *glob;
 	struct ttm_bo_driver *driver;
 	struct ttm_mem_type_manager man[TTM_NUM_MEM_TYPES];
-	spinlock_t fence_lock;
 
 	/*
 	 * Protected by internal locks.
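Callers that previously took fence_lock around ttm_bo_wait() now need the reservation instead. vmw_user_dmabuf_synccpu_grab() shows the non-blocking variant. It passes drm_vmw_synccpu_dontblock to both ttm_bo_reserve() and ttm_bo_wait(), so the call returns -EBUSY rather than sleeping on either the lock or the fence.

Below is a userspace analogue, not driver code. It assumes a pthread mutex in place of the reservation and a boolean in place of the fence state. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a buffer object. */
struct model_bo {
	pthread_mutex_t resv;	/* stands in for the bo reservation */
	bool idle;		/* stands in for "sync_obj has signaled" */
};

/* Reserve, wait for idle, unreserve; never blocks when dontblock is set. */
static int model_synccpu_grab(struct model_bo *bo, bool dontblock)
{
	int ret;

	/* Like ttm_bo_reserve() with no_wait: trylock when asked not to block. */
	ret = dontblock ? pthread_mutex_trylock(&bo->resv)
			: pthread_mutex_lock(&bo->resv);
	if (ret != 0)
		return -ret;	/* trylock reports EBUSY when the mutex is held */

	if (!bo->idle) {
		if (dontblock)
			ret = -EBUSY;		/* like ttm_bo_wait(..., no_wait = true) */
		else
			bo->idle = true;	/* model only: pretend the blocking wait finished */
	}

	pthread_mutex_unlock(&bo->resv);
	return ret;
}
```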



* [RFC PATCH v1 07/16] drm/nouveau: rework to new fence interface
From: Maarten Lankhorst @ 2014-05-14 14:58 UTC (permalink / raw)
  To: airlied; +Cc: nouveau, linux-kernel, dri-devel

From: Maarten Lankhorst <maarten.lankhorst@ubuntu.com>

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---
 drivers/gpu/drm/nouveau/core/core/event.c |    4 
 drivers/gpu/drm/nouveau/nouveau_bo.c      |    6 
 drivers/gpu/drm/nouveau/nouveau_display.c |    4 
 drivers/gpu/drm/nouveau/nouveau_fence.c   |  434 ++++++++++++++++++++---------
 drivers/gpu/drm/nouveau/nouveau_fence.h   |   20 +
 drivers/gpu/drm/nouveau/nouveau_gem.c     |   17 -
 drivers/gpu/drm/nouveau/nv04_fence.c      |    4 
 drivers/gpu/drm/nouveau/nv10_fence.c      |    4 
 drivers/gpu/drm/nouveau/nv17_fence.c      |    2 
 drivers/gpu/drm/nouveau/nv50_fence.c      |    2 
 drivers/gpu/drm/nouveau/nv84_fence.c      |   11 -
 11 files changed, 329 insertions(+), 179 deletions(-)
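One detail in the nouveau_fence.c diff below changes how sequence numbers are compared. nouveau_fence_update() replaces the direct comparison fctx->read(chan) < fence->sequence with (int)(seq - fence->base.seqno) < 0. The subtraction happens in 32-bit unsigned arithmetic. As a result, the ordering stays correct when the hardware sequence counter wraps past 0xffffffff, as long as the two values are less than 2^31 apart.

Here is a small userspace sketch contrasting the two tests. The function names are hypothetical and not part of the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old-style test: wrong once the hardware counter wraps. */
static bool passed_naive(uint32_t hw_seq, uint32_t fence_seq)
{
	return !(hw_seq < fence_seq);
}

/*
 * Wrap-safe test: take the difference modulo 2^32, then read it as signed.
 * Converting an out-of-range value to int32_t is implementation-defined in
 * C; GCC and Clang (and the kernel) treat it as two's complement.
 */
static bool passed_wrapsafe(uint32_t hw_seq, uint32_t fence_seq)
{
	return (int32_t)(hw_seq - fence_seq) >= 0;
}
```

Consider a fence emitted at 0xfffffffe with the counter already wrapped to 2. The naive test keeps reporting that fence as pending, while the wrap-safe test treats it as passed.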

diff --git a/drivers/gpu/drm/nouveau/core/core/event.c b/drivers/gpu/drm/nouveau/core/core/event.c
index 3f3c76581a9e..167408b72099 100644
--- a/drivers/gpu/drm/nouveau/core/core/event.c
+++ b/drivers/gpu/drm/nouveau/core/core/event.c
@@ -118,14 +118,14 @@ nouveau_event_ref(struct nouveau_eventh *handler, struct nouveau_eventh **ref)
 void
 nouveau_event_trigger(struct nouveau_event *event, int index)
 {
-	struct nouveau_eventh *handler;
+	struct nouveau_eventh *handler, *next;
 	unsigned long flags;
 
 	if (WARN_ON(index >= event->index_nr))
 		return;
 
 	spin_lock_irqsave(&event->list_lock, flags);
-	list_for_each_entry(handler, &event->index[index].list, head) {
+	list_for_each_entry_safe(handler, next, &event->index[index].list, head) {
 		if (test_bit(NVKM_EVENT_ENABLE, &handler->flags) &&
 		    handler->func(handler->priv, index) == NVKM_EVENT_DROP)
 			nouveau_event_put(handler);
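nouveau_event_trigger() moves to list_for_each_entry_safe() because, with the new fence code, a handler's callback appears able to unlink its own entry while the list is being walked. nouveau_fence_signal() later in this patch calls list_del(&fence->event.head), and the entry may then be freed. The safe iterator reads the next pointer before the callback runs.

Below is a minimal userspace sketch of that pattern on a hypothetical singly linked handler list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical handler node; "drop" plays the role of NVKM_EVENT_DROP. */
struct handler {
	struct handler *next;
	int id;
	bool drop;
};

static struct handler *push(struct handler *head, int id, bool drop)
{
	struct handler *h = malloc(sizeof(*h));

	if (!h)
		abort();
	h->next = head;
	h->id = id;
	h->drop = drop;
	return h;
}

/*
 * Invoke every handler; those asking to be dropped are unlinked and freed
 * during the walk. Returns the number of handlers visited.
 */
static int trigger(struct handler **head)
{
	struct handler **link = head;	/* pointer that references h */
	struct handler *h, *next;
	int calls = 0;

	for (h = *head; h != NULL; h = next) {
		next = h->next;		/* saved before h can be freed */
		calls++;
		if (h->drop) {
			*link = next;	/* unlink h */
			free(h);
		} else {
			link = &h->next;
		}
	}
	return calls;
}
```

In the sketch the unlink happens inside trigger() itself. In the driver it happens in the callback. The requirement is the same either way: nothing may read h->next after h has been removed.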
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index e98af2e9a1cb..84aba3fa1bd0 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -959,7 +959,7 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict, bool intr,
 	}
 
 	mutex_lock_nested(&chan->cli->mutex, SINGLE_DEPTH_NESTING);
-	ret = nouveau_fence_sync(bo->sync_obj, chan);
+	ret = nouveau_fence_sync(nouveau_bo(bo), chan);
 	if (ret == 0) {
 		ret = drm->ttm.move(chan, bo, &bo->mem, new_mem);
 		if (ret == 0) {
@@ -1432,10 +1432,12 @@ nouveau_bo_fence_unref(void **sync_obj)
 void
 nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence)
 {
-	lockdep_assert_held(&nvbo->bo.resv->lock.base);
+	struct reservation_object *resv = nvbo->bo.resv;
 
 	nouveau_bo_fence_unref(&nvbo->bo.sync_obj);
 	nvbo->bo.sync_obj = nouveau_fence_ref(fence);
+
+	reservation_object_add_excl_fence(resv, &fence->base);
 }
 
 static void *
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 6a0ca004bd19..eeb8762feaf0 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -660,7 +660,7 @@ nouveau_page_flip_emit(struct nouveau_channel *chan,
 	spin_unlock_irqrestore(&dev->event_lock, flags);
 
 	/* Synchronize with the old framebuffer */
-	ret = nouveau_fence_sync(old_bo->bo.sync_obj, chan);
+	ret = nouveau_fence_sync(old_bo, chan);
 	if (ret)
 		goto fail;
 
@@ -721,7 +721,7 @@ nouveau_crtc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
 		goto fail_unpin;
 
 	/* synchronise rendering channel with the kernel's channel */
-	ret = nouveau_fence_sync(new_bo->bo.sync_obj, chan);
+	ret = nouveau_fence_sync(new_bo, chan);
 	if (ret) {
 		ttm_bo_unreserve(&new_bo->bo);
 		goto fail_unpin;
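Two small pieces of arithmetic in the nouveau_fence.c changes that follow can be checked in isolation:

1. nouveau_fence_wait_legacy() polls with a sleep that starts at NSEC_PER_MSEC / 1000 (1 µs), doubles after every pass, and is capped at NSEC_PER_MSEC (1 ms).
2. nouveau_fence_wait() maps the long returned by fence_wait_timeout() (negative error, 0 on timeout, otherwise the remaining time) onto the error, -EBUSY, or 0.

Below is a userspace sketch of both. The helper names are hypothetical, and NSEC_PER_MSEC is redefined locally with the kernel's value.

```c
#include <assert.h>
#include <errno.h>

#define NSEC_PER_MSEC 1000000UL	/* same value as the kernel constant */

/* Next poll interval: double the previous one, capped at 1 ms. */
static unsigned long next_sleep_ns(unsigned long sleep_ns)
{
	sleep_ns *= 2;
	if (sleep_ns > NSEC_PER_MSEC)
		sleep_ns = NSEC_PER_MSEC;
	return sleep_ns;
}

/* Map a fence_wait_timeout()-style result onto nouveau_fence_wait()'s int. */
static int wait_result_to_errno(long ret)
{
	if (ret < 0)
		return (int)ret;	/* e.g. -ERESTARTSYS when interrupted */
	if (ret == 0)
		return -EBUSY;		/* timed out */
	return 0;			/* signaled before the timeout */
}
```

Starting from 1 µs, ten doublings reach the 1 ms cap. After that, every poll sleeps for 1 ms.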
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index 90074d620e31..9a9e04985826 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -32,91 +32,139 @@
 #include "nouveau_drm.h"
 #include "nouveau_dma.h"
 #include "nouveau_fence.h"
+#include <trace/events/fence.h>
 
 #include <engine/fifo.h>
 
-struct fence_work {
-	struct work_struct base;
-	struct list_head head;
-	void (*func)(void *);
-	void *data;
-};
+static const struct fence_ops nouveau_fence_ops_uevent;
+static const struct fence_ops nouveau_fence_ops_legacy;
 
 static void
 nouveau_fence_signal(struct nouveau_fence *fence)
 {
-	struct fence_work *work, *temp;
+	__fence_signal(&fence->base);
+	list_del(&fence->head);
+
+	if (fence->base.ops == &nouveau_fence_ops_uevent &&
+	    fence->event.head.next) {
+		struct nouveau_event *event;
 
-	list_for_each_entry_safe(work, temp, &fence->work, head) {
-		schedule_work(&work->base);
-		list_del(&work->head);
+		list_del(&fence->event.head);
+		fence->event.head.next = NULL;
+
+		event = container_of(fence->base.lock, typeof(*event), list_lock);
+		nouveau_event_put(&fence->event);
 	}
 
-	fence->channel = NULL;
-	list_del(&fence->head);
+	fence_put(&fence->base);
+}
+
+static struct nouveau_fence *
+nouveau_local_fence(struct fence *fence, struct nouveau_drm *drm) {
+	struct nouveau_fence_priv *priv = (void*)drm->fence;
+	struct nouveau_fence *f = container_of(fence,
+					       struct nouveau_fence,
+					       base);
+
+	if (fence->ops != &nouveau_fence_ops_legacy &&
+	    fence->ops != &nouveau_fence_ops_uevent)
+		return NULL;
+
+	if (fence->context < priv->context_base ||
+	    fence->context >= priv->context_base + priv->contexts)
+		return NULL;
+
+	return f;
 }
 
 void
 nouveau_fence_context_del(struct nouveau_fence_chan *fctx)
 {
 	struct nouveau_fence *fence, *fnext;
-	spin_lock(&fctx->lock);
-	list_for_each_entry_safe(fence, fnext, &fctx->pending, head) {
+
+	spin_lock_irq(fctx->lock);
+	list_for_each_entry_safe(fence, fnext, &fctx->pending, head)
 		nouveau_fence_signal(fence);
-	}
-	spin_unlock(&fctx->lock);
+	spin_unlock_irq(fctx->lock);
 }
 
 void
-nouveau_fence_context_new(struct nouveau_fence_chan *fctx)
+nouveau_fence_context_new(struct nouveau_channel *chan, struct nouveau_fence_chan *fctx)
 {
+	struct nouveau_fifo *pfifo = nouveau_fifo(chan->drm->device);
+	struct nouveau_fifo_chan *fifo = (void*)chan->object;
+
+	fctx->lock = &pfifo->uevent->list_lock;
 	INIT_LIST_HEAD(&fctx->flip);
 	INIT_LIST_HEAD(&fctx->pending);
-	spin_lock_init(&fctx->lock);
+
+	snprintf(fctx->name, sizeof(fctx->name) - 1, "nouveau channel %i", fifo->chid);
 }
 
+struct nouveau_fence_work {
+	struct work_struct work;
+	struct fence_cb cb;
+	void (*func)(void *);
+	void *data;
+};
+
 static void
 nouveau_fence_work_handler(struct work_struct *kwork)
 {
-	struct fence_work *work = container_of(kwork, typeof(*work), base);
+	struct nouveau_fence_work *work = container_of(kwork, typeof(*work), work);
 	work->func(work->data);
 	kfree(work);
 }
 
+static void nouveau_fence_work_cb(struct fence *fence, struct fence_cb *cb)
+{
+	struct nouveau_fence_work *work = container_of(cb, typeof(*work), cb);
+
+	schedule_work(&work->work);
+}
+
+/*
+ * In an ideal world, read would not assume the channel context is still alive.
+ * This function may be called from another device, running into freed memory as a
+ * result. The drm node should still be there, so we can derive the index from
+ * the fence context.
+ */
+static bool nouveau_fence_is_signaled(struct fence *f)
+{
+	struct nouveau_fence *fence = container_of(f, struct nouveau_fence, base);
+	struct nouveau_channel *chan = fence->channel;
+	struct nouveau_fence_chan *fctx = chan->fence;
+
+	return (int)(fctx->read(chan) - fence->base.seqno) >= 0;
+}
+
 void
 nouveau_fence_work(struct nouveau_fence *fence,
 		   void (*func)(void *), void *data)
 {
-	struct nouveau_channel *chan = fence->channel;
-	struct nouveau_fence_chan *fctx;
-	struct fence_work *work = NULL;
+	struct nouveau_fence_work *work;
 
-	if (nouveau_fence_done(fence)) {
-		func(data);
-		return;
-	}
+	if (fence_is_signaled(&fence->base))
+		goto err;
 
-	fctx = chan->fence;
 	work = kmalloc(sizeof(*work), GFP_KERNEL);
 	if (!work) {
 		WARN_ON(nouveau_fence_wait(fence, false, false));
-		func(data);
-		return;
+		goto err;
 	}
 
-	spin_lock(&fctx->lock);
-	if (!fence->channel) {
-		spin_unlock(&fctx->lock);
-		kfree(work);
-		func(data);
-		return;
-	}
-
-	INIT_WORK(&work->base, nouveau_fence_work_handler);
+	INIT_WORK(&work->work, nouveau_fence_work_handler);
 	work->func = func;
 	work->data = data;
-	list_add(&work->head, &fence->work);
-	spin_unlock(&fctx->lock);
+
+	if (fence_add_callback(&fence->base, &work->cb, nouveau_fence_work_cb) < 0)
+		goto err_free;
+	return;
+
+err_free:
+	kfree(work);
+err:
+	func(data);
 }
 
 static void
@@ -125,33 +173,45 @@ nouveau_fence_update(struct nouveau_channel *chan)
 	struct nouveau_fence_chan *fctx = chan->fence;
 	struct nouveau_fence *fence, *fnext;
 
-	spin_lock(&fctx->lock);
+	u32 seq = fctx->read(chan);
+
 	list_for_each_entry_safe(fence, fnext, &fctx->pending, head) {
-		if (fctx->read(chan) < fence->sequence)
+		if ((int)(seq - fence->base.seqno) < 0)
 			break;
 
 		nouveau_fence_signal(fence);
-		nouveau_fence_unref(&fence);
 	}
-	spin_unlock(&fctx->lock);
 }
 
 int
 nouveau_fence_emit(struct nouveau_fence *fence, struct nouveau_channel *chan)
 {
 	struct nouveau_fence_chan *fctx = chan->fence;
+	struct nouveau_fifo *pfifo = nouveau_fifo(chan->drm->device);
+	struct nouveau_fifo_chan *fifo = (void*)chan->object;
+	struct nouveau_fence_priv *priv = (void*)chan->drm->fence;
 	int ret;
 
 	fence->channel  = chan;
 	fence->timeout  = jiffies + (15 * HZ);
-	fence->sequence = ++fctx->sequence;
 
+	if (priv->uevent)
+		__fence_init(&fence->base, &nouveau_fence_ops_uevent,
+			    &pfifo->uevent->list_lock,
+			    priv->context_base + fifo->chid, ++fctx->sequence);
+	else
+		__fence_init(&fence->base, &nouveau_fence_ops_legacy,
+			    &pfifo->uevent->list_lock,
+			    priv->context_base + fifo->chid, ++fctx->sequence);
+
+	trace_fence_emit(&fence->base);
 	ret = fctx->emit(fence);
 	if (!ret) {
-		kref_get(&fence->kref);
-		spin_lock(&fctx->lock);
+		fence_get(&fence->base);
+		spin_lock_irq(fctx->lock);
+		nouveau_fence_update(chan);
 		list_add_tail(&fence->head, &fctx->pending);
-		spin_unlock(&fctx->lock);
+		spin_unlock_irq(fctx->lock);
 	}
 
 	return ret;
@@ -160,104 +220,71 @@ nouveau_fence_emit(struct nouveau_fence *fence, struct nouveau_channel *chan)
 bool
 nouveau_fence_done(struct nouveau_fence *fence)
 {
-	if (fence->channel)
+	if (fence->base.ops == &nouveau_fence_ops_legacy ||
+	    fence->base.ops == &nouveau_fence_ops_uevent) {
+		struct nouveau_fence_chan *fctx;
+		unsigned long flags;
+
+		if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))
+			return true;
+
+		fctx = fence->channel->fence;
+		spin_lock_irqsave(fctx->lock, flags);
 		nouveau_fence_update(fence->channel);
-	return !fence->channel;
+		spin_unlock_irqrestore(fctx->lock, flags);
+	}
+	return fence_is_signaled(&fence->base);
 }
 
-static int
-nouveau_fence_wait_uevent_handler(void *data, int index)
+static long
+nouveau_fence_wait_legacy(struct fence *f, bool intr, long wait)
 {
-	struct nouveau_fence_priv *priv = data;
-	wake_up_all(&priv->waiting);
-	return NVKM_EVENT_KEEP;
-}
+	struct nouveau_fence *fence = container_of(f, typeof(*fence), base);
+	unsigned long sleep_time = NSEC_PER_MSEC / 1000;
+	unsigned long t = jiffies, timeout = t + wait;
 
-static int
-nouveau_fence_wait_uevent(struct nouveau_fence *fence, bool intr)
+	while (!nouveau_fence_done(fence)) {
+		ktime_t kt;
 
-{
-	struct nouveau_channel *chan = fence->channel;
-	struct nouveau_fifo *pfifo = nouveau_fifo(chan->drm->device);
-	struct nouveau_fence_priv *priv = chan->drm->fence;
-	struct nouveau_eventh *handler;
-	int ret = 0;
+		t = jiffies;
 
-	ret = nouveau_event_new(pfifo->uevent, 0,
-				nouveau_fence_wait_uevent_handler,
-				priv, &handler);
-	if (ret)
-		return ret;
+		if (wait != MAX_SCHEDULE_TIMEOUT && time_after_eq(t, timeout)) {
+			__set_current_state(TASK_RUNNING);
+			return 0;
+		}
 
-	nouveau_event_get(handler);
+		__set_current_state(intr ? TASK_INTERRUPTIBLE :
+					   TASK_UNINTERRUPTIBLE);
 
-	if (fence->timeout) {
-		unsigned long timeout = fence->timeout - jiffies;
-
-		if (time_before(jiffies, fence->timeout)) {
-			if (intr) {
-				ret = wait_event_interruptible_timeout(
-						priv->waiting,
-						nouveau_fence_done(fence),
-						timeout);
-			} else {
-				ret = wait_event_timeout(priv->waiting,
-						nouveau_fence_done(fence),
-						timeout);
-			}
-		}
+		kt = ktime_set(0, sleep_time);
+		schedule_hrtimeout(&kt, HRTIMER_MODE_REL);
+		sleep_time *= 2;
+		if (sleep_time > NSEC_PER_MSEC)
+			sleep_time = NSEC_PER_MSEC;
 
-		if (ret >= 0) {
-			fence->timeout = jiffies + ret;
-			if (time_after_eq(jiffies, fence->timeout))
-				ret = -EBUSY;
-		}
-	} else {
-		if (intr) {
-			ret = wait_event_interruptible(priv->waiting,
-					nouveau_fence_done(fence));
-		} else {
-			wait_event(priv->waiting, nouveau_fence_done(fence));
-		}
+		if (intr && signal_pending(current))
+			return -ERESTARTSYS;
 	}
 
-	nouveau_event_ref(NULL, &handler);
-	if (unlikely(ret < 0))
-		return ret;
+	__set_current_state(TASK_RUNNING);
 
-	return 0;
+	return timeout - t;
 }
 
-int
-nouveau_fence_wait(struct nouveau_fence *fence, bool lazy, bool intr)
+static int
+nouveau_fence_wait_busy(struct nouveau_fence *fence, bool intr)
 {
-	struct nouveau_channel *chan = fence->channel;
-	struct nouveau_fence_priv *priv = chan ? chan->drm->fence : NULL;
-	unsigned long sleep_time = NSEC_PER_MSEC / 1000;
-	ktime_t t;
 	int ret = 0;
 
-	while (priv && priv->uevent && lazy && !nouveau_fence_done(fence)) {
-		ret = nouveau_fence_wait_uevent(fence, intr);
-		if (ret < 0)
-			return ret;
-	}
-
 	while (!nouveau_fence_done(fence)) {
-		if (fence->timeout && time_after_eq(jiffies, fence->timeout)) {
+		if (time_after_eq(jiffies, fence->timeout)) {
 			ret = -EBUSY;
 			break;
 		}
 
-		__set_current_state(intr ? TASK_INTERRUPTIBLE :
-					   TASK_UNINTERRUPTIBLE);
-		if (lazy) {
-			t = ktime_set(0, sleep_time);
-			schedule_hrtimeout(&t, HRTIMER_MODE_REL);
-			sleep_time *= 2;
-			if (sleep_time > NSEC_PER_MSEC)
-				sleep_time = NSEC_PER_MSEC;
-		}
+		__set_current_state(intr ?
+				    TASK_INTERRUPTIBLE :
+				    TASK_UNINTERRUPTIBLE);
 
 		if (intr && signal_pending(current)) {
 			ret = -ERESTARTSYS;
@@ -270,36 +297,79 @@ nouveau_fence_wait(struct nouveau_fence *fence, bool lazy, bool intr)
 }
 
 int
-nouveau_fence_sync(struct nouveau_fence *fence, struct nouveau_channel *chan)
+nouveau_fence_wait(struct nouveau_fence *fence, bool lazy, bool intr)
+{
+	long ret;
+
+	if (!lazy)
+		return nouveau_fence_wait_busy(fence, intr);
+
+	ret = fence_wait_timeout(&fence->base, intr, 15 * HZ);
+	if (ret < 0)
+		return ret;
+	else if (!ret)
+		return -EBUSY;
+	else
+		return 0;
+}
+
+int
+nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan)
 {
 	struct nouveau_fence_chan *fctx = chan->fence;
-	struct nouveau_channel *prev;
-	int ret = 0;
+	struct fence *fence = NULL;
+	struct reservation_object *resv = nvbo->bo.resv;
+	struct reservation_object_list *fobj;
+	int ret = 0, i;
+
+	fence = nvbo->bo.sync_obj;
+	if (fence && fence_is_signaled(fence)) {
+		nouveau_fence_unref((struct nouveau_fence **)
+				    &nvbo->bo.sync_obj);
+		fence = NULL;
+	}
+
+	if (fence) {
+		struct nouveau_fence *f = container_of(fence,
+						       struct nouveau_fence,
+						       base);
+		struct nouveau_channel *prev = f->channel;
 
-	prev = fence ? fence->channel : NULL;
-	if (prev) {
-		if (unlikely(prev != chan && !nouveau_fence_done(fence))) {
-			ret = fctx->sync(fence, prev, chan);
+		if (prev != chan) {
+			ret = fctx->sync(f, prev, chan);
 			if (unlikely(ret))
-				ret = nouveau_fence_wait(fence, true, false);
+				ret = nouveau_fence_wait(f, true, true);
 		}
 	}
 
-	return ret;
-}
+	if (ret)
+		return ret;
 
-static void
-nouveau_fence_del(struct kref *kref)
-{
-	struct nouveau_fence *fence = container_of(kref, typeof(*fence), kref);
-	kfree(fence);
+	fence = reservation_object_get_excl(resv);
+	if (fence && !nouveau_local_fence(fence, chan->drm))
+		ret = fence_wait(fence, true);
+
+	fobj = reservation_object_get_list(resv);
+	if (!fobj || ret)
+		return ret;
+
+	for (i = 0; i < fobj->shared_count && !ret; ++i) {
+		fence = rcu_dereference_protected(fobj->shared[i],
+						reservation_object_held(resv));
+
+		/* should always be true, for now */
+		if (!nouveau_local_fence(fence, chan->drm))
+			ret = fence_wait(fence, true);
+	}
+
+	return ret;
 }
 
 void
 nouveau_fence_unref(struct nouveau_fence **pfence)
 {
 	if (*pfence)
-		kref_put(&(*pfence)->kref, nouveau_fence_del);
+		fence_put(&(*pfence)->base);
 	*pfence = NULL;
 }
 
@@ -307,7 +377,7 @@ struct nouveau_fence *
 nouveau_fence_ref(struct nouveau_fence *fence)
 {
 	if (fence)
-		kref_get(&fence->kref);
+		fence_get(&fence->base);
 	return fence;
 }
 
@@ -325,9 +395,7 @@ nouveau_fence_new(struct nouveau_channel *chan, bool sysmem,
 	if (!fence)
 		return -ENOMEM;
 
-	INIT_LIST_HEAD(&fence->work);
 	fence->sysmem = sysmem;
-	kref_init(&fence->kref);
 
 	ret = nouveau_fence_emit(fence, chan);
 	if (ret)
@@ -336,3 +404,85 @@ nouveau_fence_new(struct nouveau_channel *chan, bool sysmem,
 	*pfence = fence;
 	return ret;
 }
+
+
+static bool nouveau_fence_no_signaling(struct fence *f)
+{
+	/*
+	 * This needs uevents to work correctly, but fence_add_callback relies on
+	 * being able to enable signaling. It will still get signaled eventually,
+	 * just not right away.
+	 */
+	if (nouveau_fence_is_signaled(f))
+		return false;
+
+	return true;
+}
+
+static const char *nouveau_fence_get_get_driver_name(struct fence *fence)
+{
+	return "nouveau";
+}
+
+static const char *nouveau_fence_get_timeline_name(struct fence *f)
+{
+	struct nouveau_fence *fence =
+		container_of(f, struct nouveau_fence, base);
+	struct nouveau_fence_chan *fctx = fence->channel->fence;
+
+	return fctx ? fctx->name : "dead channel";
+}
+
+static const struct fence_ops nouveau_fence_ops_legacy = {
+	.get_driver_name = nouveau_fence_get_get_driver_name,
+	.get_timeline_name = nouveau_fence_get_timeline_name,
+	.enable_signaling = nouveau_fence_no_signaling,
+	.signaled = nouveau_fence_is_signaled,
+	.wait = nouveau_fence_wait_legacy,
+	.release = NULL
+};
+
+static int
+nouveau_fence_wait_uevent_handler(void *priv, int index)
+{
+	struct nouveau_fence *fence = priv;
+
+	if (nouveau_fence_is_signaled(&fence->base))
+		nouveau_fence_signal(fence);
+
+	/*
+	 * NVKM_EVENT_DROP is never appropriate here, nouveau_fence_signal
+	 * will unlink and free the event if needed.
+	 */
+	return NVKM_EVENT_KEEP;
+}
+
+static bool nouveau_fence_enable_signaling(struct fence *f)
+{
+	struct nouveau_fence *fence = container_of(f, struct nouveau_fence, base);
+	struct nouveau_event *event = container_of(f->lock, struct nouveau_event, list_lock);
+	struct nouveau_eventh *handler = &fence->event;
+
+	handler->event = event;
+	handler->func = nouveau_fence_wait_uevent_handler;
+	handler->priv = fence;
+
+	nouveau_event_get(handler);
+	if (nouveau_fence_is_signaled(f)) {
+		nouveau_event_put(handler);
+		return false;
+	}
+
+	list_add_tail(&handler->head, &event->index[0].list);
+
+	return true;
+}
+
+static const struct fence_ops nouveau_fence_ops_uevent = {
+	.get_driver_name = nouveau_fence_get_get_driver_name,
+	.get_timeline_name = nouveau_fence_get_timeline_name,
+	.enable_signaling = nouveau_fence_enable_signaling,
+	.signaled = nouveau_fence_is_signaled,
+	.wait = fence_default_wait,
+	.release = NULL
+};
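The legacy wait path in this patch (nouveau_fence_wait_legacy) polls nouveau_fence_done() and sleeps between polls. The sleep starts at NSEC_PER_MSEC / 1000 (one microsecond) and doubles on every iteration up to a 1 ms cap. Below is a minimal userspace sketch of that backoff. It assumes nanosleep() in place of schedule_hrtimeout(), a poll budget in place of the jiffies deadline, and a hypothetical done-callback in place of nouveau_fence_done(). It also leaves out the signal_pending() check, so it illustrates the pattern and is not the driver code.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define NSEC_PER_MSEC 1000000UL

/* Double the sleep interval, clamping it to 1 ms as the patch does. */
static unsigned long next_sleep_ns(unsigned long cur)
{
	cur *= 2;
	return cur > NSEC_PER_MSEC ? NSEC_PER_MSEC : cur;
}

/* Hypothetical completion check standing in for nouveau_fence_done(). */
typedef bool (*done_fn)(void *arg);

/*
 * Poll done(arg), sleeping with exponential backoff between checks.
 * Returns the number of sleeps taken before done() reported true, or -1
 * once max_polls sleeps have elapsed without completion (a stand-in for
 * the jiffies deadline used in the kernel).
 */
static int poll_with_backoff(done_fn done, void *arg, int max_polls)
{
	unsigned long sleep_ns = NSEC_PER_MSEC / 1000; /* start at 1 us */
	int polls = 0;

	assert(done != NULL);
	while (!done(arg)) {
		struct timespec ts = { 0, (long)sleep_ns };

		if (polls == max_polls)
			return -1;
		nanosleep(&ts, NULL);
		sleep_ns = next_sleep_ns(sleep_ns);
		polls++;
	}
	return polls;
}

/* Example predicate: reports "done" after *arg unsuccessful checks. */
static bool countdown_done(void *arg)
{
	int *remaining = arg;

	return (*remaining)-- <= 0;
}
```

The short initial sleeps keep waits on nearly finished fences cheap. The 1 ms cap bounds how long a waiter can oversleep after the fence has actually signaled.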
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.h b/drivers/gpu/drm/nouveau/nouveau_fence.h
index c57bb61da58c..1989ec22e66e 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.h
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.h
@@ -1,18 +1,21 @@
 #ifndef __NOUVEAU_FENCE_H__
 #define __NOUVEAU_FENCE_H__
 
+#include <linux/fence.h>
+
 struct nouveau_drm;
+struct nouveau_bo;
 
 struct nouveau_fence {
+	struct fence base;
+
 	struct list_head head;
-	struct list_head work;
-	struct kref kref;
+	struct nouveau_eventh event;
 
 	bool sysmem;
 
 	struct nouveau_channel *channel;
 	unsigned long timeout;
-	u32 sequence;
 };
 
 int  nouveau_fence_new(struct nouveau_channel *, bool sysmem,
@@ -25,7 +28,7 @@ int  nouveau_fence_emit(struct nouveau_fence *, struct nouveau_channel *);
 bool nouveau_fence_done(struct nouveau_fence *);
 void nouveau_fence_work(struct nouveau_fence *, void (*)(void *), void *);
 int  nouveau_fence_wait(struct nouveau_fence *, bool lazy, bool intr);
-int  nouveau_fence_sync(struct nouveau_fence *, struct nouveau_channel *);
+int  nouveau_fence_sync(struct nouveau_bo *, struct nouveau_channel *);
 
 struct nouveau_fence_chan {
 	struct list_head pending;
@@ -38,8 +41,10 @@ struct nouveau_fence_chan {
 	int  (*emit32)(struct nouveau_channel *, u64, u32);
 	int  (*sync32)(struct nouveau_channel *, u64, u32);
 
-	spinlock_t lock;
+	spinlock_t *lock;
 	u32 sequence;
+	u32 context;
+	char name[24];
 };
 
 struct nouveau_fence_priv {
@@ -49,13 +54,14 @@ struct nouveau_fence_priv {
 	int  (*context_new)(struct nouveau_channel *);
 	void (*context_del)(struct nouveau_channel *);
 
-	wait_queue_head_t waiting;
 	bool uevent;
+
+	u32 contexts, context_base;
 };
 
 #define nouveau_fence(drm) ((struct nouveau_fence_priv *)(drm)->fence)
 
-void nouveau_fence_context_new(struct nouveau_fence_chan *);
+void nouveau_fence_context_new(struct nouveau_channel *, struct nouveau_fence_chan *);
 void nouveau_fence_context_del(struct nouveau_fence_chan *);
 
 int nv04_fence_create(struct nouveau_drm *);
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 6cd5298cbb53..a61530becfb9 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -428,18 +428,6 @@ retry:
 }
 
 static int
-validate_sync(struct nouveau_channel *chan, struct nouveau_bo *nvbo)
-{
-	struct nouveau_fence *fence = nvbo->bo.sync_obj;
-	int ret = 0;
-
-	if (fence)
-		ret = nouveau_fence_sync(fence, chan);
-
-	return ret;
-}
-
-static int
 validate_list(struct nouveau_channel *chan, struct nouveau_cli *cli,
 	      struct list_head *list, struct drm_nouveau_gem_pushbuf_bo *pbbo,
 	      uint64_t user_pbbo_ptr)
@@ -468,9 +456,10 @@ validate_list(struct nouveau_channel *chan, struct nouveau_cli *cli,
 			return ret;
 		}
 
-		ret = validate_sync(chan, nvbo);
+		ret = nouveau_fence_sync(nvbo, chan);
 		if (unlikely(ret)) {
-			NV_ERROR(cli, "fail post-validate sync\n");
+			if (ret != -ERESTARTSYS)
+				NV_ERROR(cli, "fail post-validate sync\n");
 			return ret;
 		}
 
diff --git a/drivers/gpu/drm/nouveau/nv04_fence.c b/drivers/gpu/drm/nouveau/nv04_fence.c
index 94eadd1dd10a..997c54122ed9 100644
--- a/drivers/gpu/drm/nouveau/nv04_fence.c
+++ b/drivers/gpu/drm/nouveau/nv04_fence.c
@@ -43,7 +43,7 @@ nv04_fence_emit(struct nouveau_fence *fence)
 	int ret = RING_SPACE(chan, 2);
 	if (ret == 0) {
 		BEGIN_NV04(chan, NvSubSw, 0x0150, 1);
-		OUT_RING  (chan, fence->sequence);
+		OUT_RING  (chan, fence->base.seqno);
 		FIRE_RING (chan);
 	}
 	return ret;
@@ -77,7 +77,7 @@ nv04_fence_context_new(struct nouveau_channel *chan)
 {
 	struct nv04_fence_chan *fctx = kzalloc(sizeof(*fctx), GFP_KERNEL);
 	if (fctx) {
-		nouveau_fence_context_new(&fctx->base);
+		nouveau_fence_context_new(chan, &fctx->base);
 		fctx->base.emit = nv04_fence_emit;
 		fctx->base.sync = nv04_fence_sync;
 		fctx->base.read = nv04_fence_read;
diff --git a/drivers/gpu/drm/nouveau/nv10_fence.c b/drivers/gpu/drm/nouveau/nv10_fence.c
index 06f434f03fba..e8f73f7f31ef 100644
--- a/drivers/gpu/drm/nouveau/nv10_fence.c
+++ b/drivers/gpu/drm/nouveau/nv10_fence.c
@@ -36,7 +36,7 @@ nv10_fence_emit(struct nouveau_fence *fence)
 	int ret = RING_SPACE(chan, 2);
 	if (ret == 0) {
 		BEGIN_NV04(chan, 0, NV10_SUBCHAN_REF_CNT, 1);
-		OUT_RING  (chan, fence->sequence);
+		OUT_RING  (chan, fence->base.seqno);
 		FIRE_RING (chan);
 	}
 	return ret;
@@ -74,7 +74,7 @@ nv10_fence_context_new(struct nouveau_channel *chan)
 	if (!fctx)
 		return -ENOMEM;
 
-	nouveau_fence_context_new(&fctx->base);
+	nouveau_fence_context_new(chan, &fctx->base);
 	fctx->base.emit = nv10_fence_emit;
 	fctx->base.read = nv10_fence_read;
 	fctx->base.sync = nv10_fence_sync;
diff --git a/drivers/gpu/drm/nouveau/nv17_fence.c b/drivers/gpu/drm/nouveau/nv17_fence.c
index 22aa9963ea6f..e404bab31e9d 100644
--- a/drivers/gpu/drm/nouveau/nv17_fence.c
+++ b/drivers/gpu/drm/nouveau/nv17_fence.c
@@ -83,7 +83,7 @@ nv17_fence_context_new(struct nouveau_channel *chan)
 	if (!fctx)
 		return -ENOMEM;
 
-	nouveau_fence_context_new(&fctx->base);
+	nouveau_fence_context_new(chan, &fctx->base);
 	fctx->base.emit = nv10_fence_emit;
 	fctx->base.read = nv10_fence_read;
 	fctx->base.sync = nv17_fence_sync;
diff --git a/drivers/gpu/drm/nouveau/nv50_fence.c b/drivers/gpu/drm/nouveau/nv50_fence.c
index 0ee363840035..19f6fccb84a1 100644
--- a/drivers/gpu/drm/nouveau/nv50_fence.c
+++ b/drivers/gpu/drm/nouveau/nv50_fence.c
@@ -47,7 +47,7 @@ nv50_fence_context_new(struct nouveau_channel *chan)
 	if (!fctx)
 		return -ENOMEM;
 
-	nouveau_fence_context_new(&fctx->base);
+	nouveau_fence_context_new(chan, &fctx->base);
 	fctx->base.emit = nv10_fence_emit;
 	fctx->base.read = nv10_fence_read;
 	fctx->base.sync = nv17_fence_sync;
diff --git a/drivers/gpu/drm/nouveau/nv84_fence.c b/drivers/gpu/drm/nouveau/nv84_fence.c
index 9fd475c89820..8a06727b23d1 100644
--- a/drivers/gpu/drm/nouveau/nv84_fence.c
+++ b/drivers/gpu/drm/nouveau/nv84_fence.c
@@ -89,7 +89,7 @@ nv84_fence_emit(struct nouveau_fence *fence)
 	else
 		addr += fctx->vma.offset;
 
-	return fctx->base.emit32(chan, addr, fence->sequence);
+	return fctx->base.emit32(chan, addr, fence->base.seqno);
 }
 
 static int
@@ -105,7 +105,7 @@ nv84_fence_sync(struct nouveau_fence *fence,
 	else
 		addr += fctx->vma.offset;
 
-	return fctx->base.sync32(chan, addr, fence->sequence);
+	return fctx->base.sync32(chan, addr, fence->base.seqno);
 }
 
 static u32
@@ -149,12 +149,14 @@ nv84_fence_context_new(struct nouveau_channel *chan)
 	if (!fctx)
 		return -ENOMEM;
 
-	nouveau_fence_context_new(&fctx->base);
+	nouveau_fence_context_new(chan, &fctx->base);
 	fctx->base.emit = nv84_fence_emit;
 	fctx->base.sync = nv84_fence_sync;
 	fctx->base.read = nv84_fence_read;
 	fctx->base.emit32 = nv84_fence_emit32;
 	fctx->base.sync32 = nv84_fence_sync32;
+	fctx->base.sequence = nv84_fence_read(chan);
+	fctx->base.context = priv->base.context_base + fifo->chid;
 
 	ret = nouveau_bo_vma_add(priv->bo, client->vm, &fctx->vma);
 	if (ret == 0) {
@@ -239,7 +241,8 @@ nv84_fence_create(struct nouveau_drm *drm)
 	priv->base.context_new = nv84_fence_context_new;
 	priv->base.context_del = nv84_fence_context_del;
 
-	init_waitqueue_head(&priv->base.waiting);
+	priv->base.contexts = pfifo->max + 1;
+	priv->base.context_base = fence_context_alloc(priv->base.contexts);
 	priv->base.uevent = true;
 
 	ret = nouveau_bo_new(drm->dev, 16 * (pfifo->max + 1), 0,
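With this patch, the lazy path of nouveau_fence_wait() calls fence_wait_timeout() with a 15 * HZ limit. It then converts that result back into the driver's 0-or-negative-errno convention. A negative value passes through unchanged, 0 (the timeout expired) becomes -EBUSY, and any positive remaining time means success. In the same patch, validate_list() stops logging "fail post-validate sync" when the error is -ERESTARTSYS. The sketch below restates both rules as plain C. The helper names are hypothetical, and ERESTARTSYS is defined locally because it is a kernel-internal errno value.

```c
#include <errno.h>
#include <stdbool.h>

#ifndef ERESTARTSYS
#define ERESTARTSYS 512 /* kernel-internal errno, not exported to userspace */
#endif

/*
 * Convert a fence_wait_timeout()-style result (remaining time > 0, 0 on
 * timeout, negative errno on failure) into a 0-or-negative-errno return.
 */
static int wait_result_to_errno(long remaining)
{
	if (remaining < 0)
		return (int)remaining; /* e.g. -ERESTARTSYS after a signal */
	if (remaining == 0)
		return -EBUSY;         /* timed out before the fence signaled */
	return 0;                      /* signaled in time */
}

/* Mirrors the validate_list() change: an interrupted wait is not logged. */
static bool sync_error_should_log(int ret)
{
	return ret != 0 && ret != -ERESTARTSYS;
}
```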



* [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences
  2014-05-14 14:57 [RFC PATCH v1 00/16] Convert all ttm drivers to use the new reservation interface Maarten Lankhorst
                   ` (6 preceding siblings ...)
  2014-05-14 14:58 ` [RFC PATCH v1 07/16] drm/nouveau: rework to new fence interface Maarten Lankhorst
@ 2014-05-14 14:58 ` Maarten Lankhorst
  2014-05-14 15:29   ` Christian König
  2014-05-14 14:58 ` [RFC PATCH v1 09/16] drm/qxl: rework to new fence interface Maarten Lankhorst
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Maarten Lankhorst @ 2014-05-14 14:58 UTC (permalink / raw)
  To: airlied; +Cc: nouveau, linux-kernel, dri-devel

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---
 drivers/gpu/drm/radeon/radeon.h        |   15 +--
 drivers/gpu/drm/radeon/radeon_device.c |    1 
 drivers/gpu/drm/radeon/radeon_fence.c  |  189 +++++++++++++++++++++++++-------
 3 files changed, 153 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 68528619834a..a7d839a158ae 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -64,6 +64,7 @@
 #include <linux/wait.h>
 #include <linux/list.h>
 #include <linux/kref.h>
+#include <linux/fence.h>
 
 #include <ttm/ttm_bo_api.h>
 #include <ttm/ttm_bo_driver.h>
@@ -113,9 +114,6 @@ extern int radeon_hard_reset;
 #define RADEONFB_CONN_LIMIT			4
 #define RADEON_BIOS_NUM_SCRATCH			8
 
-/* fence seq are set to this number when signaled */
-#define RADEON_FENCE_SIGNALED_SEQ		0LL
-
 /* internal ring indices */
 /* r1xx+ has gfx CP ring */
 #define RADEON_RING_TYPE_GFX_INDEX		0
@@ -347,12 +345,15 @@ struct radeon_fence_driver {
 };
 
 struct radeon_fence {
+	struct fence base;
+
 	struct radeon_device		*rdev;
-	struct kref			kref;
 	/* protected by radeon_fence.lock */
 	uint64_t			seq;
 	/* RB, DMA, etc. */
 	unsigned			ring;
+
+	wait_queue_t fence_wake;
 };
 
 int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);
@@ -2256,6 +2257,7 @@ struct radeon_device {
 	struct radeon_mman		mman;
 	struct radeon_fence_driver	fence_drv[RADEON_NUM_RINGS];
 	wait_queue_head_t		fence_queue;
+	unsigned			fence_context;
 	struct mutex			ring_lock;
 	struct radeon_ring		ring[RADEON_NUM_RINGS];
 	bool				ib_pool_ready;
@@ -2346,11 +2348,6 @@ u32 cik_mm_rdoorbell(struct radeon_device *rdev, u32 index);
 void cik_mm_wdoorbell(struct radeon_device *rdev, u32 index, u32 v);
 
 /*
- * Cast helper
- */
-#define to_radeon_fence(p) ((struct radeon_fence *)(p))
-
-/*
  * Registers read & write functions.
  */
 #define RREG8(reg) readb((rdev->rmmio) + (reg))
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 0e770bbf7e29..501d0cf9eb8b 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1175,6 +1175,7 @@ int radeon_device_init(struct radeon_device *rdev,
 	for (i = 0; i < RADEON_NUM_RINGS; i++) {
 		rdev->ring[i].idx = i;
 	}
+	rdev->fence_context = fence_context_alloc(RADEON_NUM_RINGS);
 
 	DRM_INFO("initializing kernel modesetting (%s 0x%04X:0x%04X 0x%04X:0x%04X).\n",
 		radeon_family_name[rdev->family], pdev->vendor, pdev->device,
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index a77b1c13ea43..bc844f300d3f 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -39,6 +39,15 @@
 #include "radeon.h"
 #include "radeon_trace.h"
 
+static const struct fence_ops radeon_fence_ops;
+
+#define to_radeon_fence(p) \
+	({								\
+		struct radeon_fence *__f;				\
+		__f = container_of((p), struct radeon_fence, base);	\
+		__f->base.ops == &radeon_fence_ops ? __f : NULL;	\
+	})
+
 /*
  * Fences
  * Fences mark an event in the GPUs pipeline and are used
@@ -111,30 +120,55 @@ int radeon_fence_emit(struct radeon_device *rdev,
 		      struct radeon_fence **fence,
 		      int ring)
 {
+	u64 seq = ++rdev->fence_drv[ring].sync_seq[ring];
+
 	/* we are protected by the ring emission mutex */
 	*fence = kmalloc(sizeof(struct radeon_fence), GFP_KERNEL);
 	if ((*fence) == NULL) {
 		return -ENOMEM;
 	}
-	kref_init(&((*fence)->kref));
-	(*fence)->rdev = rdev;
-	(*fence)->seq = ++rdev->fence_drv[ring].sync_seq[ring];
 	(*fence)->ring = ring;
+	__fence_init(&(*fence)->base, &radeon_fence_ops,
+		     &rdev->fence_queue.lock, rdev->fence_context + ring, seq);
+	(*fence)->rdev = rdev;
+	(*fence)->seq = seq;
 	radeon_fence_ring_emit(rdev, ring, *fence);
 	trace_radeon_fence_emit(rdev->ddev, ring, (*fence)->seq);
 	return 0;
 }
 
 /**
- * radeon_fence_process - process a fence
+ * radeon_fence_check_signaled - callback from fence_queue
  *
- * @rdev: radeon_device pointer
- * @ring: ring index the fence is associated with
- *
- * Checks the current fence value and wakes the fence queue
- * if the sequence number has increased (all asics).
+ * this function is called with fence_queue lock held, which is also used
+ * for the fence locking itself, so unlocked variants are used for
+ * fence_signal, and remove_wait_queue.
  */
-void radeon_fence_process(struct radeon_device *rdev, int ring)
+static int radeon_fence_check_signaled(wait_queue_t *wait, unsigned mode, int flags, void *key)
+{
+	struct radeon_fence *fence;
+	u64 seq;
+
+	fence = container_of(wait, struct radeon_fence, fence_wake);
+
+	seq = atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq);
+	if (seq >= fence->seq) {
+		int ret = __fence_signal(&fence->base);
+
+		if (!ret)
+			FENCE_TRACE(&fence->base, "signaled from irq context\n");
+		else
+			FENCE_TRACE(&fence->base, "was already signaled\n");
+
+		radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
+		__remove_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
+		fence_put(&fence->base);
+	} else
+		FENCE_TRACE(&fence->base, "pending\n");
+	return 0;
+}
+
+static bool __radeon_fence_process(struct radeon_device *rdev, int ring)
 {
 	uint64_t seq, last_seq, last_emitted;
 	unsigned count_loop = 0;
@@ -190,23 +224,22 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
 		}
 	} while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
 
-	if (wake)
-		wake_up_all(&rdev->fence_queue);
+	return wake;
 }
 
 /**
- * radeon_fence_destroy - destroy a fence
+ * radeon_fence_process - process a fence
  *
- * @kref: fence kref
+ * @rdev: radeon_device pointer
+ * @ring: ring index the fence is associated with
  *
- * Frees the fence object (all asics).
+ * Checks the current fence value and wakes the fence queue
+ * if the sequence number has increased (all asics).
  */
-static void radeon_fence_destroy(struct kref *kref)
+void radeon_fence_process(struct radeon_device *rdev, int ring)
 {
-	struct radeon_fence *fence;
-
-	fence = container_of(kref, struct radeon_fence, kref);
-	kfree(fence);
+	if (__radeon_fence_process(rdev, ring))
+		wake_up_all(&rdev->fence_queue);
 }
 
 /**
@@ -237,6 +270,49 @@ static bool radeon_fence_seq_signaled(struct radeon_device *rdev,
 	return false;
 }
 
+static bool __radeon_fence_signaled(struct fence *f)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+
+	return radeon_fence_seq_signaled(fence->rdev, fence->seq, fence->ring);
+}
+
+/**
+ * radeon_fence_enable_signaling - enable signalling on fence
+ * @fence: fence
+ *
+ * This function is called with fence_queue lock held, and adds a callback
+ * to fence_queue that checks if this fence is signaled, and if so it
+ * signals the fence and removes itself.
+ */
+static bool radeon_fence_enable_signaling(struct fence *f)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+
+	if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
+	    !fence->rdev->ddev->irq_enabled)
+		return false;
+
+	radeon_irq_kms_sw_irq_get(fence->rdev, fence->ring);
+
+	if (__radeon_fence_process(fence->rdev, fence->ring))
+		wake_up_all_locked(&fence->rdev->fence_queue);
+
+	/* did fence get signaled after we enabled the sw irq? */
+	if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
+		radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
+		return false;
+	}
+
+	fence->fence_wake.flags = 0;
+	fence->fence_wake.private = NULL;
+	fence->fence_wake.func = radeon_fence_check_signaled;
+	__add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
+	fence_get(f);
+
+	return true;
+}
+
 /**
  * radeon_fence_signaled - check if a fence has signaled
  *
@@ -250,11 +326,13 @@ bool radeon_fence_signaled(struct radeon_fence *fence)
 	if (!fence) {
 		return true;
 	}
-	if (fence->seq == RADEON_FENCE_SIGNALED_SEQ) {
-		return true;
-	}
+
 	if (radeon_fence_seq_signaled(fence->rdev, fence->seq, fence->ring)) {
-		fence->seq = RADEON_FENCE_SIGNALED_SEQ;
+		int ret;
+
+		ret = fence_signal(&fence->base);
+		if (!ret)
+			FENCE_TRACE(&fence->base, "signaled from radeon_fence_signaled\n");
 		return true;
 	}
 	return false;
@@ -386,7 +464,7 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
  * radeon_fence_wait - wait for a fence to signal
  *
  * @fence: radeon fence object
- * @intr: use interruptable sleep
+ * @intr: use interruptible sleep
  *
  * Wait for the requested fence to signal (all asics).
  * @intr selects whether to use interruptable (true) or non-interruptable
@@ -398,20 +476,17 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
 	uint64_t seq[RADEON_NUM_RINGS] = {};
 	int r;
 
-	if (fence == NULL) {
-		WARN(1, "Querying an invalid fence : %p !\n", fence);
-		return -EINVAL;
-	}
-
-	seq[fence->ring] = fence->seq;
-	if (seq[fence->ring] == RADEON_FENCE_SIGNALED_SEQ)
+	if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))
 		return 0;
 
+	seq[fence->ring] = fence->seq;
 	r = radeon_fence_wait_seq(fence->rdev, seq, intr);
-	if (r)
+	if (r) {
 		return r;
-
-	fence->seq = RADEON_FENCE_SIGNALED_SEQ;
+	}
+	r = fence_signal(&fence->base);
+	if (!r)
+		FENCE_TRACE(&fence->base, "signaled from fence_wait\n");
 	return 0;
 }
 
@@ -443,12 +518,13 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
 			continue;
 		}
 
+		if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fences[i]->base.flags)) {
+			/* already signaled */
+			return 0;
+		}
+
 		seq[i] = fences[i]->seq;
 		++num_rings;
-
-		/* test if something was allready signaled */
-		if (seq[i] == RADEON_FENCE_SIGNALED_SEQ)
-			return 0;
 	}
 
 	/* nothing to wait for ? */
@@ -525,7 +601,7 @@ int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
  */
 struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence)
 {
-	kref_get(&fence->kref);
+	fence_get(&fence->base);
 	return fence;
 }
 
@@ -541,9 +617,8 @@ void radeon_fence_unref(struct radeon_fence **fence)
 	struct radeon_fence *tmp = *fence;
 
 	*fence = NULL;
-	if (tmp) {
-		kref_put(&tmp->kref, radeon_fence_destroy);
-	}
+	if (tmp)
+		fence_put(&tmp->base);
 }
 
 /**
@@ -832,3 +907,31 @@ int radeon_debugfs_fence_init(struct radeon_device *rdev)
 	return 0;
 #endif
 }
+
+static const char *radeon_fence_get_driver_name(struct fence *fence)
+{
+	return "radeon";
+}
+
+static const char *radeon_fence_get_timeline_name(struct fence *f)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	switch (fence->ring) {
+	case RADEON_RING_TYPE_GFX_INDEX: return "radeon.gfx";
+	case CAYMAN_RING_TYPE_CP1_INDEX: return "radeon.cp1";
+	case CAYMAN_RING_TYPE_CP2_INDEX: return "radeon.cp2";
+	case R600_RING_TYPE_DMA_INDEX: return "radeon.dma";
+	case CAYMAN_RING_TYPE_DMA1_INDEX: return "radeon.dma1";
+	case R600_RING_TYPE_UVD_INDEX: return "radeon.uvd";
+	default: WARN_ON_ONCE(1); return "radeon.unk";
+	}
+}
+
+static const struct fence_ops radeon_fence_ops = {
+	.get_driver_name = radeon_fence_get_driver_name,
+	.get_timeline_name = radeon_fence_get_timeline_name,
+	.enable_signaling = radeon_fence_enable_signaling,
+	.signaled = __radeon_fence_signaled,
+	.wait = fence_default_wait,
+	.release = NULL,
+};
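The new to_radeon_fence() macro is a checked downcast. It uses container_of() to recover the enclosing struct radeon_fence, and it returns NULL unless the fence's ops pointer is &radeon_fence_ops. This prevents a foreign struct fence from being misread as a radeon one. The userspace sketch below applies the same idea to simplified, hypothetical types: struct fence here carries only an ops pointer, a context and a seqno. Unlike the macro, it checks the ops pointer and NULL before doing any pointer arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified, hypothetical stand-ins for struct fence / struct fence_ops. */
struct fence_ops {
	const char *driver_name;
};

struct fence {
	const struct fence_ops *ops;
	unsigned int context;
	uint64_t seqno;
};

static const struct fence_ops my_fence_ops = { "mydrv" };

/* Driver fence embedding the generic fence, like struct radeon_fence. */
struct my_fence {
	struct fence base;
	unsigned int ring;
};

/*
 * Checked downcast: only trust container_of() when the fence was
 * initialized with this driver's ops table.
 */
static struct my_fence *to_my_fence(struct fence *f)
{
	if (f == NULL || f->ops != &my_fence_ops)
		return NULL;
	return container_of(f, struct my_fence, base);
}
```

Inside the fence_ops callbacks the check cannot fail, because those callbacks are only reached through radeon_fence_ops. That is why the patch dereferences the result there without a NULL test. The check matters for code that receives an arbitrary struct fence.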



* [RFC PATCH v1 09/16] drm/qxl: rework to new fence interface
  2014-05-14 14:57 [RFC PATCH v1 00/16] Convert all ttm drivers to use the new reservation interface Maarten Lankhorst
                   ` (7 preceding siblings ...)
  2014-05-14 14:58 ` [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences Maarten Lankhorst
@ 2014-05-14 14:58 ` Maarten Lankhorst
  2014-05-14 14:58 ` [RFC PATCH v1 10/16] drm/vmwgfx: get rid of different types of fence_flags entirely Maarten Lankhorst
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 50+ messages in thread
From: Maarten Lankhorst @ 2014-05-14 14:58 UTC (permalink / raw)
  To: airlied; +Cc: nouveau, linux-kernel, dri-devel

Final driver! \o/

This is not a proper dma_fence because the hardware may never signal
anything, so don't use dma-buf with qxl, ever.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---
 drivers/gpu/drm/qxl/Makefile      |    2 
 drivers/gpu/drm/qxl/qxl_cmd.c     |    5 -
 drivers/gpu/drm/qxl/qxl_debugfs.c |   12 ++-
 drivers/gpu/drm/qxl/qxl_drv.h     |   22 ++---
 drivers/gpu/drm/qxl/qxl_fence.c   |   87 -------------------
 drivers/gpu/drm/qxl/qxl_kms.c     |    2 
 drivers/gpu/drm/qxl/qxl_object.c  |    2 
 drivers/gpu/drm/qxl/qxl_release.c |  166 ++++++++++++++++++++++++++++++++-----
 drivers/gpu/drm/qxl/qxl_ttm.c     |   97 ++++++++++++----------
 9 files changed, 220 insertions(+), 175 deletions(-)
 delete mode 100644 drivers/gpu/drm/qxl/qxl_fence.c

diff --git a/drivers/gpu/drm/qxl/Makefile b/drivers/gpu/drm/qxl/Makefile
index ea046ba691d2..ac0d74852e11 100644
--- a/drivers/gpu/drm/qxl/Makefile
+++ b/drivers/gpu/drm/qxl/Makefile
@@ -4,6 +4,6 @@
 
 ccflags-y := -Iinclude/drm
 
-qxl-y := qxl_drv.o qxl_kms.o qxl_display.o qxl_ttm.o qxl_fb.o qxl_object.o qxl_gem.o qxl_cmd.o qxl_image.o qxl_draw.o qxl_debugfs.o qxl_irq.o qxl_dumb.o qxl_ioctl.o qxl_fence.o qxl_release.o
+qxl-y := qxl_drv.o qxl_kms.o qxl_display.o qxl_ttm.o qxl_fb.o qxl_object.o qxl_gem.o qxl_cmd.o qxl_image.o qxl_draw.o qxl_debugfs.o qxl_irq.o qxl_dumb.o qxl_ioctl.o qxl_release.o
 
 obj-$(CONFIG_DRM_QXL)+= qxl.o
diff --git a/drivers/gpu/drm/qxl/qxl_cmd.c b/drivers/gpu/drm/qxl/qxl_cmd.c
index 45fad7b45486..97823644d347 100644
--- a/drivers/gpu/drm/qxl/qxl_cmd.c
+++ b/drivers/gpu/drm/qxl/qxl_cmd.c
@@ -620,11 +620,6 @@ static int qxl_reap_surf(struct qxl_device *qdev, struct qxl_bo *surf, bool stal
 	if (ret == -EBUSY)
 		return -EBUSY;
 
-	if (surf->fence.num_active_releases > 0 && stall == false) {
-		qxl_bo_unreserve(surf);
-		return -EBUSY;
-	}
-
 	if (stall)
 		mutex_unlock(&qdev->surf_evict_mutex);
 
diff --git a/drivers/gpu/drm/qxl/qxl_debugfs.c b/drivers/gpu/drm/qxl/qxl_debugfs.c
index c3c2bbdc6674..0d144e0646d6 100644
--- a/drivers/gpu/drm/qxl/qxl_debugfs.c
+++ b/drivers/gpu/drm/qxl/qxl_debugfs.c
@@ -57,11 +57,21 @@ qxl_debugfs_buffers_info(struct seq_file *m, void *data)
 	struct qxl_device *qdev = node->minor->dev->dev_private;
 	struct qxl_bo *bo;
 
+	spin_lock(&qdev->release_lock);
 	list_for_each_entry(bo, &qdev->gem.objects, list) {
+		struct reservation_object_list *fobj;
+		int rel;
+
+		rcu_read_lock();
+		fobj = rcu_dereference(bo->tbo.resv->fence);
+		rel = fobj ? fobj->shared_count : 0;
+		rcu_read_unlock();
+
 		seq_printf(m, "size %ld, pc %d, sync obj %p, num releases %d\n",
 			   (unsigned long)bo->gem_base.size, bo->pin_count,
-			   bo->tbo.sync_obj, bo->fence.num_active_releases);
+			   bo->tbo.sync_obj, rel);
 	}
+	spin_unlock(&qdev->release_lock);
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/qxl/qxl_drv.h b/drivers/gpu/drm/qxl/qxl_drv.h
index 36ed40ba773f..d547cbdebeb4 100644
--- a/drivers/gpu/drm/qxl/qxl_drv.h
+++ b/drivers/gpu/drm/qxl/qxl_drv.h
@@ -31,6 +31,7 @@
  * Definitions taken from spice-protocol, plus kernel driver specific bits.
  */
 
+#include <linux/fence.h>
 #include <linux/workqueue.h>
 #include <linux/firmware.h>
 #include <linux/platform_device.h>
@@ -95,13 +96,6 @@ enum {
 	QXL_INTERRUPT_IO_CMD |\
 	QXL_INTERRUPT_CLIENT_MONITORS_CONFIG)
 
-struct qxl_fence {
-	struct qxl_device *qdev;
-	uint32_t num_active_releases;
-	uint32_t *release_ids;
-	struct radix_tree_root tree;
-};
-
 struct qxl_bo {
 	/* Protected by gem.mutex */
 	struct list_head		list;
@@ -113,13 +107,13 @@ struct qxl_bo {
 	unsigned			pin_count;
 	void				*kptr;
 	int                             type;
+
 	/* Constant after initialization */
 	struct drm_gem_object		gem_base;
 	bool is_primary; /* is this now a primary surface */
 	bool hw_surf_alloc;
 	struct qxl_surface surf;
 	uint32_t surface_id;
-	struct qxl_fence fence; /* per bo fence  - list of releases */
 	struct qxl_release *surf_create;
 };
 #define gem_to_qxl_bo(gobj) container_of((gobj), struct qxl_bo, gem_base)
@@ -191,6 +185,8 @@ enum {
  * spice-protocol/qxl_dev.h */
 #define QXL_MAX_RES 96
 struct qxl_release {
+	struct fence base;
+
 	int id;
 	int type;
 	uint32_t release_offset;
@@ -284,7 +280,11 @@ struct qxl_device {
 	uint8_t		slot_gen_bits;
 	uint64_t	va_slot_mask;
 
+	/* XXX: when rcu becomes available, release_lock can be killed */
+	spinlock_t	release_lock;
+	spinlock_t	fence_lock;
 	struct idr	release_idr;
+	uint32_t	release_seqno;
 	spinlock_t release_idr_lock;
 	struct mutex	async_io_mutex;
 	unsigned int last_sent_io_cmd;
@@ -561,10 +561,4 @@ qxl_surface_lookup(struct drm_device *dev, int surface_id);
 void qxl_surface_evict(struct qxl_device *qdev, struct qxl_bo *surf, bool freeing);
 int qxl_update_surface(struct qxl_device *qdev, struct qxl_bo *surf);
 
-/* qxl_fence.c */
-void qxl_fence_add_release_locked(struct qxl_fence *qfence, uint32_t rel_id);
-int qxl_fence_remove_release(struct qxl_fence *qfence, uint32_t rel_id);
-int qxl_fence_init(struct qxl_device *qdev, struct qxl_fence *qfence);
-void qxl_fence_fini(struct qxl_fence *qfence);
-
 #endif
diff --git a/drivers/gpu/drm/qxl/qxl_fence.c b/drivers/gpu/drm/qxl/qxl_fence.c
deleted file mode 100644
index c7248418117d..000000000000
--- a/drivers/gpu/drm/qxl/qxl_fence.c
+++ /dev/null
@@ -1,87 +0,0 @@
-/*
- * Copyright 2013 Red Hat Inc.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
- * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
- * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
- * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
- * OTHER DEALINGS IN THE SOFTWARE.
- *
- * Authors: Dave Airlie
- *          Alon Levy
- */
-
-
-#include "qxl_drv.h"
-
-/* QXL fencing-
-
-   When we submit operations to the GPU we pass a release reference to the GPU
-   with them, the release reference is then added to the release ring when
-   the GPU is finished with that particular operation and has removed it from
-   its tree.
-
-   So we have can have multiple outstanding non linear fences per object.
-
-   From a TTM POV we only care if the object has any outstanding releases on
-   it.
-
-   we wait until all outstanding releases are processeed.
-
-   sync object is just a list of release ids that represent that fence on
-   that buffer.
-
-   we just add new releases onto the sync object attached to the object.
-
-   This currently uses a radix tree to store the list of release ids.
-
-   For some reason every so often qxl hw fails to release, things go wrong.
-*/
-/* must be called with the fence lock held */
-void qxl_fence_add_release_locked(struct qxl_fence *qfence, uint32_t rel_id)
-{
-	radix_tree_insert(&qfence->tree, rel_id, qfence);
-	qfence->num_active_releases++;
-}
-
-int qxl_fence_remove_release(struct qxl_fence *qfence, uint32_t rel_id)
-{
-	void *ret;
-	int retval = 0;
-
-	ret = radix_tree_delete(&qfence->tree, rel_id);
-	if (ret == qfence)
-		qfence->num_active_releases--;
-	else {
-		DRM_DEBUG("didn't find fence in radix tree for %d\n", rel_id);
-		retval = -ENOENT;
-	}
-	return retval;
-}
-
-
-int qxl_fence_init(struct qxl_device *qdev, struct qxl_fence *qfence)
-{
-	qfence->qdev = qdev;
-	qfence->num_active_releases = 0;
-	INIT_RADIX_TREE(&qfence->tree, GFP_ATOMIC);
-	return 0;
-}
-
-void qxl_fence_fini(struct qxl_fence *qfence)
-{
-	kfree(qfence->release_ids);
-	qfence->num_active_releases = 0;
-}
diff --git a/drivers/gpu/drm/qxl/qxl_kms.c b/drivers/gpu/drm/qxl/qxl_kms.c
index fd88eb4a3f79..a9e7c30e92c5 100644
--- a/drivers/gpu/drm/qxl/qxl_kms.c
+++ b/drivers/gpu/drm/qxl/qxl_kms.c
@@ -223,6 +223,8 @@ static int qxl_device_init(struct qxl_device *qdev,
 
 	idr_init(&qdev->release_idr);
 	spin_lock_init(&qdev->release_idr_lock);
+	spin_lock_init(&qdev->release_lock);
+	spin_lock_init(&qdev->fence_lock);
 
 	idr_init(&qdev->surf_id_idr);
 	spin_lock_init(&qdev->surf_id_idr_lock);
diff --git a/drivers/gpu/drm/qxl/qxl_object.c b/drivers/gpu/drm/qxl/qxl_object.c
index b95f144f0b49..9981962451d7 100644
--- a/drivers/gpu/drm/qxl/qxl_object.c
+++ b/drivers/gpu/drm/qxl/qxl_object.c
@@ -36,7 +36,6 @@ static void qxl_ttm_bo_destroy(struct ttm_buffer_object *tbo)
 	qdev = (struct qxl_device *)bo->gem_base.dev->dev_private;
 
 	qxl_surface_evict(qdev, bo, false);
-	qxl_fence_fini(&bo->fence);
 	mutex_lock(&qdev->gem.mutex);
 	list_del_init(&bo->list);
 	mutex_unlock(&qdev->gem.mutex);
@@ -99,7 +98,6 @@ int qxl_bo_create(struct qxl_device *qdev,
 	bo->type = domain;
 	bo->pin_count = pinned ? 1 : 0;
 	bo->surface_id = 0;
-	qxl_fence_init(qdev, &bo->fence);
 	INIT_LIST_HEAD(&bo->list);
 
 	if (surf)
diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 4045ba873ab8..3b1398d735f4 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -21,6 +21,7 @@
  */
 #include "qxl_drv.h"
 #include "qxl_object.h"
+#include <trace/events/fence.h>
 
 /*
  * drawable cmd cache - allocate a bunch of VRAM pages, suballocate
@@ -39,6 +40,88 @@
 static const int release_size_per_bo[] = { RELEASE_SIZE, SURFACE_RELEASE_SIZE, RELEASE_SIZE };
 static const int releases_per_bo[] = { RELEASES_PER_BO, SURFACE_RELEASES_PER_BO, RELEASES_PER_BO };
 
+static const char *qxl_get_driver_name(struct fence *fence)
+{
+	return "qxl";
+}
+
+static const char *qxl_get_timeline_name(struct fence *fence)
+{
+	return "release";
+}
+
+static bool qxl_nop_signaling(struct fence *fence)
+{
+	/* fences are always signaled automatically, so just pretend we did this. */
+	return true;
+}
+
+static long qxl_fence_wait(struct fence *fence, bool intr, signed long timeout)
+{
+	struct qxl_device *qdev;
+	struct qxl_release *release;
+	int count = 0, sc = 0;
+	bool have_drawable_releases;
+	unsigned long cur, end = jiffies + timeout;
+
+	qdev = container_of(fence->lock, struct qxl_device, release_lock);
+	release = container_of(fence, struct qxl_release, base);
+	have_drawable_releases = release->type == QXL_RELEASE_DRAWABLE;
+
+retry:
+	sc++;
+
+	if (__fence_is_signaled(fence))
+		goto signaled;
+
+	qxl_io_notify_oom(qdev);
+
+	for (count = 0; count < 11; count++) {
+		if (!qxl_queue_garbage_collect(qdev, true))
+			break;
+
+		if (__fence_is_signaled(fence))
+			goto signaled;
+	}
+
+	if (__fence_is_signaled(fence))
+		goto signaled;
+
+	if (have_drawable_releases || sc < 4) {
+		if (sc > 2)
+			/* back off */
+			usleep_range(500, 1000);
+
+		if (time_after(jiffies, end))
+			return 0;
+
+		if (have_drawable_releases && sc > 300) {
+			FENCE_WARN(fence, "failed to wait on release %d "
+					  "after spincount %d\n",
+					  fence->context & ~0xf0000000, sc);
+			goto signaled;
+		}
+		goto retry;
+	}
+	/*
+	 * yeah, original sync_obj_wait gave up after 3 spins when
+	 * have_drawable_releases is not set.
+	 */
+
+signaled:
+	cur = jiffies;
+	if (time_after(cur, end))
+		return 0;
+	return end - cur;
+}
+
+static const struct fence_ops qxl_fence_ops = {
+	.get_driver_name = qxl_get_driver_name,
+	.get_timeline_name = qxl_get_timeline_name,
+	.enable_signaling = qxl_nop_signaling,
+	.wait = qxl_fence_wait,
+};
+
 static uint64_t
 qxl_release_alloc(struct qxl_device *qdev, int type,
 		  struct qxl_release **ret)
@@ -46,13 +129,13 @@ qxl_release_alloc(struct qxl_device *qdev, int type,
 	struct qxl_release *release;
 	int handle;
 	size_t size = sizeof(*release);
-	int idr_ret;
 
 	release = kmalloc(size, GFP_KERNEL);
 	if (!release) {
 		DRM_ERROR("Out of memory\n");
 		return 0;
 	}
+	release->base.ops = NULL;
 	release->type = type;
 	release->release_offset = 0;
 	release->surface_release_id = 0;
@@ -60,44 +143,59 @@ qxl_release_alloc(struct qxl_device *qdev, int type,
 
 	idr_preload(GFP_KERNEL);
 	spin_lock(&qdev->release_idr_lock);
-	idr_ret = idr_alloc(&qdev->release_idr, release, 1, 0, GFP_NOWAIT);
+	handle = idr_alloc(&qdev->release_idr, release, 1, 0, GFP_NOWAIT);
+	release->base.seqno = ++qdev->release_seqno;
 	spin_unlock(&qdev->release_idr_lock);
 	idr_preload_end();
-	handle = idr_ret;
-	if (idr_ret < 0)
-		goto release_fail;
+	if (handle < 0) {
+		kfree(release);
+		*ret = NULL;
+		return handle;
+	}
 	*ret = release;
 	QXL_INFO(qdev, "allocated release %lld\n", handle);
 	release->id = handle;
-release_fail:
-
 	return handle;
 }
 
+static void
+qxl_release_free_list(struct qxl_release *release)
+{
+	while (!list_empty(&release->bos)) {
+		struct ttm_validate_buffer *entry;
+
+		entry = container_of(release->bos.next,
+				     struct ttm_validate_buffer, head);
+
+		list_del(&entry->head);
+		kfree(entry);
+	}
+}
+
 void
 qxl_release_free(struct qxl_device *qdev,
 		 struct qxl_release *release)
 {
-	struct qxl_bo_list *entry, *tmp;
 	QXL_INFO(qdev, "release %d, type %d\n", release->id,
 		 release->type);
 
 	if (release->surface_release_id)
 		qxl_surface_id_dealloc(qdev, release->surface_release_id);
 
-	list_for_each_entry_safe(entry, tmp, &release->bos, tv.head) {
-		struct qxl_bo *bo = to_qxl_bo(entry->tv.bo);
-		QXL_INFO(qdev, "release %llx\n",
-			drm_vma_node_offset_addr(&entry->tv.bo->vma_node)
-						- DRM_FILE_OFFSET);
-		qxl_fence_remove_release(&bo->fence, release->id);
-		qxl_bo_unref(&bo);
-		kfree(entry);
-	}
 	spin_lock(&qdev->release_idr_lock);
 	idr_remove(&qdev->release_idr, release->id);
 	spin_unlock(&qdev->release_idr_lock);
-	kfree(release);
+
+	if (release->base.ops) {
+		WARN_ON(list_empty(&release->bos));
+		qxl_release_free_list(release);
+
+		fence_signal(&release->base);
+		fence_put(&release->base);
+	} else {
+		qxl_release_free_list(release);
+		kfree(release);
+	}
 }
 
 static int qxl_release_bo_alloc(struct qxl_device *qdev,
@@ -142,6 +240,10 @@ static int qxl_release_validate_bo(struct qxl_bo *bo)
 			return ret;
 	}
 
+	ret = reservation_object_reserve_shared(bo->tbo.resv);
+	if (ret)
+		return ret;
+
 	/* allocate a surface for reserved + validated buffers */
 	ret = qxl_bo_check_id(bo->gem_base.dev->dev_private, bo);
 	if (ret)
@@ -199,6 +301,8 @@ int qxl_alloc_surface_release_reserved(struct qxl_device *qdev,
 
 		/* stash the release after the create command */
 		idr_ret = qxl_release_alloc(qdev, QXL_RELEASE_SURFACE_CMD, release);
+		if (idr_ret < 0)
+			return idr_ret;
 		bo = qxl_bo_ref(to_qxl_bo(entry->tv.bo));
 
 		(*release)->release_offset = create_rel->release_offset + 64;
@@ -239,6 +343,11 @@ int qxl_alloc_release_reserved(struct qxl_device *qdev, unsigned long size,
 	}
 
 	idr_ret = qxl_release_alloc(qdev, type, release);
+	if (idr_ret < 0) {
+		if (rbo)
+			*rbo = NULL;
+		return idr_ret;
+	}
 
 	mutex_lock(&qdev->release_mutex);
 	if (qdev->current_release_bo_offset[cur_idx] + 1 >= releases_per_bo[cur_idx]) {
@@ -319,12 +428,13 @@ void qxl_release_unmap(struct qxl_device *qdev,
 
 void qxl_release_fence_buffer_objects(struct qxl_release *release)
 {
-	struct ttm_validate_buffer *entry;
 	struct ttm_buffer_object *bo;
 	struct ttm_bo_global *glob;
 	struct ttm_bo_device *bdev;
 	struct ttm_bo_driver *driver;
 	struct qxl_bo *qbo;
+	struct ttm_validate_buffer *entry;
+	struct qxl_device *qdev;
 
 	/* if only one object on the release its the release itself
 	   since these objects are pinned no need to reserve */
@@ -333,23 +443,35 @@ void qxl_release_fence_buffer_objects(struct qxl_release *release)
 
 	bo = list_first_entry(&release->bos, struct ttm_validate_buffer, head)->bo;
 	bdev = bo->bdev;
+	qdev = container_of(bdev, struct qxl_device, mman.bdev);
+
+	/*
+	 * Since we never really allocated a context and we don't want to conflict,
+	 * set the highest bits. This will break if we really allow exporting of dma-bufs.
+	 */
+	__fence_init(&release->base, &qxl_fence_ops,
+		  &qdev->release_lock, release->id | 0xf0000000, release->base.seqno);
+	trace_fence_emit(&release->base);
+
 	driver = bdev->driver;
 	glob = bo->glob;
 
 	spin_lock(&glob->lru_lock);
+	/* acquire release_lock to protect bo->resv->fence and its contents */
+	spin_lock(&qdev->release_lock);
 
 	list_for_each_entry(entry, &release->bos, head) {
 		bo = entry->bo;
 		qbo = to_qxl_bo(bo);
 
 		if (!entry->bo->sync_obj)
-			entry->bo->sync_obj = &qbo->fence;
-
-		qxl_fence_add_release_locked(&qbo->fence, release->id);
+			entry->bo->sync_obj = qbo;
 
+		reservation_object_add_shared_fence(bo->resv, &release->base);
 		ttm_bo_add_to_lru(bo);
 		__ttm_bo_unreserve(bo);
 	}
+	spin_unlock(&qdev->release_lock);
 	spin_unlock(&glob->lru_lock);
 	ww_acquire_fini(&release->ticket);
 }
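The new comment in qxl_release_fence_buffer_objects() explains that qxl never allocates a fence context of its own. Instead, it builds each release's context by setting the top four bits of the release id, and qxl_fence_wait() masks those bits off again (`fence->context & ~0xf0000000`) when it prints a warning. The sketch below shows that round trip in user space. It assumes release ids stay below 0x10000000, and the helper names are hypothetical, not part of the driver:

```c
#include <assert.h>
#include <stdint.h>

/* Top four bits reserved so qxl's ad-hoc contexts stay clear of contexts
 * handed out by the fence core (hypothetical name). */
#define QXL_CTX_TAG 0xf0000000u

/* Build a fence context from a release id, as
 * __fence_init(..., release->id | 0xf0000000, ...) does. */
static uint32_t qxl_ctx_from_release_id(uint32_t id)
{
	/* An id that already used the tag bits could not be recovered. */
	assert((id & QXL_CTX_TAG) == 0);
	return id | QXL_CTX_TAG;
}

/* Recover the release id, as the FENCE_WARN in qxl_fence_wait() does. */
static uint32_t qxl_release_id_from_ctx(uint32_t ctx)
{
	return ctx & ~QXL_CTX_TAG;
}
```

Tagging avoids allocating a real context per release. As the patch comment notes, this only holds while these fences never leave the driver.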
diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c
index d52c27527b9a..80879e38e447 100644
--- a/drivers/gpu/drm/qxl/qxl_ttm.c
+++ b/drivers/gpu/drm/qxl/qxl_ttm.c
@@ -361,67 +361,67 @@ static int qxl_bo_move(struct ttm_buffer_object *bo,
 	return ttm_bo_move_memcpy(bo, evict, no_wait_gpu, new_mem);
 }
 
+static bool qxl_sync_obj_signaled(void *sync_obj);
 
 static int qxl_sync_obj_wait(void *sync_obj,
 			     bool lazy, bool interruptible)
 {
-	struct qxl_fence *qfence = (struct qxl_fence *)sync_obj;
-	int count = 0, sc = 0;
-	struct qxl_bo *bo = container_of(qfence, struct qxl_bo, fence);
-
-	if (qfence->num_active_releases == 0)
-		return 0;
+	struct qxl_bo *bo = (struct qxl_bo *)sync_obj;
+	struct qxl_device *qdev = bo->gem_base.dev->dev_private;
+	struct reservation_object_list *fobj;
+	int count = 0, sc = 0, num_release = 0;
+	bool have_drawable_releases;
 
 retry:
 	if (sc == 0) {
 		if (bo->type == QXL_GEM_DOMAIN_SURFACE)
-			qxl_update_surface(qfence->qdev, bo);
+			qxl_update_surface(qdev, bo);
 	} else if (sc >= 1) {
-		qxl_io_notify_oom(qfence->qdev);
+		qxl_io_notify_oom(qdev);
 	}
 
 	sc++;
 
 	for (count = 0; count < 10; count++) {
-		bool ret;
-		ret = qxl_queue_garbage_collect(qfence->qdev, true);
-		if (ret == false)
-			break;
-
-		if (qfence->num_active_releases == 0)
+		if (qxl_sync_obj_signaled(sync_obj))
 			return 0;
+
+		if (!qxl_queue_garbage_collect(qdev, true))
+			break;
 	}
 
-	if (qfence->num_active_releases) {
-		bool have_drawable_releases = false;
-		void **slot;
-		struct radix_tree_iter iter;
-		int release_id;
+	have_drawable_releases = false;
+	num_release = 0;
 
-		radix_tree_for_each_slot(slot, &qfence->tree, &iter, 0) {
-			struct qxl_release *release;
+	spin_lock(&qdev->release_lock);
+	fobj = bo->tbo.resv->fence;
+	for (count = 0; fobj && count < fobj->shared_count; count++) {
+		struct qxl_release *release;
 
-			release_id = iter.index;
-			release = qxl_release_from_id_locked(qfence->qdev, release_id);
-			if (release == NULL)
-				continue;
+		release = container_of(fobj->shared[count],
+				       struct qxl_release, base);
 
-			if (release->type == QXL_RELEASE_DRAWABLE)
-				have_drawable_releases = true;
-		}
+		if (fence_is_signaled(&release->base))
+			continue;
+
+		num_release++;
+
+		if (release->type == QXL_RELEASE_DRAWABLE)
+			have_drawable_releases = true;
+	}
+	spin_unlock(&qdev->release_lock);
+
+	qxl_queue_garbage_collect(qdev, true);
 
-		qxl_queue_garbage_collect(qfence->qdev, true);
-
-		if (have_drawable_releases || sc < 4) {
-			if (sc > 2)
-				/* back off */
-				usleep_range(500, 1000);
-			if (have_drawable_releases && sc > 300) {
-				WARN(1, "sync obj %d still has outstanding releases %d %d %d %ld %d\n", sc, bo->surface_id, bo->is_primary, bo->pin_count, (unsigned long)bo->gem_base.size, qfence->num_active_releases);
-				return -EBUSY;
-			}
-			goto retry;
+	if (have_drawable_releases || sc < 4) {
+		if (sc > 2)
+			/* back off */
+			usleep_range(500, 1000);
+		if (have_drawable_releases && sc > 300) {
+			WARN(1, "sync obj %d still has outstanding releases %d %d %d %ld %d\n", sc, bo->surface_id, bo->is_primary, bo->pin_count, (unsigned long)bo->gem_base.size, num_release);
+			return -EBUSY;
 		}
+		goto retry;
 	}
 	return 0;
 }
@@ -443,8 +443,21 @@ static void *qxl_sync_obj_ref(void *sync_obj)
 
 static bool qxl_sync_obj_signaled(void *sync_obj)
 {
-	struct qxl_fence *qfence = (struct qxl_fence *)sync_obj;
-	return (qfence->num_active_releases == 0);
+	struct qxl_bo *qbo = (struct qxl_bo *)sync_obj;
+	struct qxl_device *qdev = qbo->gem_base.dev->dev_private;
+	struct reservation_object_list *fobj;
+	bool ret = true;
+	unsigned i;
+
+	spin_lock(&qdev->release_lock);
+	fobj = qbo->tbo.resv->fence;
+	for (i = 0; fobj && i < fobj->shared_count; ++i) {
+		ret = fence_is_signaled(fobj->shared[i]);
+		if (!ret)
+			break;
+	}
+	spin_unlock(&qdev->release_lock);
+	return ret;
 }
 
 static void qxl_bo_move_notify(struct ttm_buffer_object *bo,
@@ -481,8 +494,6 @@ static struct ttm_bo_driver qxl_bo_driver = {
 	.move_notify = &qxl_bo_move_notify,
 };
 
-
-
 int qxl_ttm_init(struct qxl_device *qdev)
 {
 	int r;

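qxl_fence_wait() turns its relative timeout into an absolute deadline (`end = jiffies + timeout`). On each exit path it returns either the jiffies left or 0 once `time_after(cur, end)` holds. The sketch below models that bookkeeping in user space. The tick counter is passed in explicitly instead of being read from jiffies, and all names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Same arithmetic as the kernel's time_after(a, b): true when a is later
 * than b, even if the tick counter wrapped in between, provided the two
 * values are less than half the counter range apart. */
static bool toy_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Mirrors "end = jiffies + timeout" at the top of qxl_fence_wait(). */
static unsigned long toy_deadline(unsigned long now, long timeout)
{
	assert(timeout >= 0); /* this sketch assumes a non-negative budget */
	return now + (unsigned long)timeout;
}

/* Mirrors the exit paths: the ticks left, or 0 once the deadline passed. */
static long toy_remaining(unsigned long now, unsigned long end)
{
	if (toy_time_after(now, end))
		return 0;
	return (long)(end - now);
}
```

Because the comparison uses a signed difference rather than `now > end`, the deadline stays valid when the counter wraps during the wait.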

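The removed qxl_fence.c comment describes the old rule: a buffer is idle once all of its outstanding releases have been processed. The old code tracked this with a counter and a radix tree of release ids. After this patch, qxl_sync_obj_signaled() expresses the same rule by scanning the shared fences in `bo->tbo.resv->fence` under release_lock. The model below is a minimal user-space version of that rule. The toy types are hypothetical stand-ins for struct fence and struct reservation_object_list, and the sketch omits locking and reference counting:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct fence / reservation_object_list. */
struct toy_fence {
	bool signaled;
};

struct toy_fence_list {
	unsigned shared_count;
	struct toy_fence *shared[8];
};

/* Same shape as the new qxl_sync_obj_signaled(): a BO is idle only when
 * every shared fence attached to it has signaled; no list means idle. */
static bool toy_bo_idle(const struct toy_fence_list *fobj)
{
	unsigned i;

	for (i = 0; fobj && i < fobj->shared_count; ++i) {
		assert(fobj->shared[i] != NULL);
		if (!fobj->shared[i]->signaled)
			return false;
	}
	return true;
}
```

The early return mirrors the `break` in the driver loop. A single unsignaled release is enough to keep the buffer busy.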

* [RFC PATCH v1 10/16] drm/vmwgfx: get rid of different types of fence_flags entirely
  2014-05-14 14:57 [RFC PATCH v1 00/16] Convert all ttm drivers to use the new reservation interface Maarten Lankhorst
                   ` (8 preceding siblings ...)
  2014-05-14 14:58 ` [RFC PATCH v1 09/16] drm/qxl: rework to new fence interface Maarten Lankhorst
@ 2014-05-14 14:58 ` Maarten Lankhorst
  2014-05-14 14:58 ` [RFC PATCH v1 11/16] drm/vmwgfx: rework to new fence interface Maarten Lankhorst
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 50+ messages in thread
From: Maarten Lankhorst @ 2014-05-14 14:58 UTC (permalink / raw)
  To: airlied; +Cc: nouveau, linux-kernel, dri-devel

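Only DRM_VMW_FENCE_FLAG_EXEC was ever used, so remove the fence_flags
tracking and the per-fence signal_mask entirely. This simplifies the
conversion to the new fence interface in the next commit.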

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c  |    5 +--
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h     |    1 -
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c |   14 ++-------
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c   |   50 ++++++++++++-------------------
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.h   |    8 +----
 5 files changed, 26 insertions(+), 52 deletions(-)

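Before this patch, vmw_fence_obj_signaled() masked the requested flags with the fence's signal_mask and then checked that every remaining flag was set. DRM_VMW_FENCE_FLAG_EXEC was the only flag ever set, so for that flag the mask test reduces to a plain boolean. The sketch below compares the two checks for that case. The flag value is an assumption, and the helpers are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for DRM_VMW_FENCE_FLAG_EXEC; the value is assumed. */
#define TOY_FLAG_EXEC 0x1u

/* Old test: every requested flag the fence can signal must be set. */
static bool old_signaled(uint32_t signaled, uint32_t signal_mask,
			 uint32_t flags)
{
	flags &= signal_mask;
	return (signaled & flags) == flags;
}

/* New test: the fence carries a single state, so any nonzero value
 * means signaled (the patch stores 1). */
static bool new_signaled(uint32_t signaled)
{
	return signaled != 0;
}
```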
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
index 4a36bb1dc525..f15718cc631d 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
@@ -792,15 +792,12 @@ static int vmw_sync_obj_flush(void *sync_obj)
 
 static bool vmw_sync_obj_signaled(void *sync_obj)
 {
-	return	vmw_fence_obj_signaled((struct vmw_fence_obj *) sync_obj,
-				       DRM_VMW_FENCE_FLAG_EXEC);
-
+	return vmw_fence_obj_signaled((struct vmw_fence_obj *) sync_obj);
 }
 
 static int vmw_sync_obj_wait(void *sync_obj, bool lazy, bool interruptible)
 {
 	return vmw_fence_obj_wait((struct vmw_fence_obj *) sync_obj,
-				  DRM_VMW_FENCE_FLAG_EXEC,
 				  lazy, interruptible,
 				  VMW_FENCE_WAIT_TIMEOUT);
 }
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
index 6b252a887ae2..f217e9723b9e 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@ -332,7 +332,6 @@ struct vmw_sw_context{
 	uint32_t *cmd_bounce;
 	uint32_t cmd_bounce_size;
 	struct list_head resource_list;
-	uint32_t fence_flags;
 	struct ttm_buffer_object *cur_query_bo;
 	struct list_head res_relocations;
 	uint32_t *buf_start;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
index f8b25bc4e634..db30b790ad24 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
@@ -350,8 +350,6 @@ static int vmw_bo_to_validate_list(struct vmw_sw_context *sw_context,
 		vval_buf->validate_as_mob = validate_as_mob;
 	}
 
-	sw_context->fence_flags |= DRM_VMW_FENCE_FLAG_EXEC;
-
 	if (p_val_node)
 		*p_val_node = val_node;
 
@@ -2308,13 +2306,9 @@ int vmw_execbuf_fence_commands(struct drm_file *file_priv,
 
 	if (p_handle != NULL)
 		ret = vmw_user_fence_create(file_priv, dev_priv->fman,
-					    sequence,
-					    DRM_VMW_FENCE_FLAG_EXEC,
-					    p_fence, p_handle);
+					    sequence, p_fence, p_handle);
 	else
-		ret = vmw_fence_create(dev_priv->fman, sequence,
-				       DRM_VMW_FENCE_FLAG_EXEC,
-				       p_fence);
+		ret = vmw_fence_create(dev_priv->fman, sequence, p_fence);
 
 	if (unlikely(ret != 0 && !synced)) {
 		(void) vmw_fallback_wait(dev_priv, false, false,
@@ -2387,8 +2381,7 @@ vmw_execbuf_copy_fence_user(struct vmw_private *dev_priv,
 		ttm_ref_object_base_unref(vmw_fp->tfile,
 					  fence_handle, TTM_REF_USAGE);
 		DRM_ERROR("Fence copy error. Syncing.\n");
-		(void) vmw_fence_obj_wait(fence, fence->signal_mask,
-					  false, false,
+		(void) vmw_fence_obj_wait(fence, false, false,
 					  VMW_FENCE_WAIT_TIMEOUT);
 	}
 }
@@ -2438,7 +2431,6 @@ int vmw_execbuf_process(struct drm_file *file_priv,
 	sw_context->fp = vmw_fpriv(file_priv);
 	sw_context->cur_reloc = 0;
 	sw_context->cur_val_buf = 0;
-	sw_context->fence_flags = 0;
 	INIT_LIST_HEAD(&sw_context->resource_list);
 	sw_context->cur_query_bo = dev_priv->pinned_bo;
 	sw_context->last_query_ctx = NULL;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 436b013b4231..05b9eea8e875 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -207,9 +207,7 @@ void vmw_fence_manager_takedown(struct vmw_fence_manager *fman)
 }
 
 static int vmw_fence_obj_init(struct vmw_fence_manager *fman,
-			      struct vmw_fence_obj *fence,
-			      u32 seqno,
-			      uint32_t mask,
+			      struct vmw_fence_obj *fence, u32 seqno,
 			      void (*destroy) (struct vmw_fence_obj *fence))
 {
 	unsigned long irq_flags;
@@ -220,7 +218,6 @@ static int vmw_fence_obj_init(struct vmw_fence_manager *fman,
 	INIT_LIST_HEAD(&fence->seq_passed_actions);
 	fence->fman = fman;
 	fence->signaled = 0;
-	fence->signal_mask = mask;
 	kref_init(&fence->kref);
 	fence->destroy = destroy;
 	init_waitqueue_head(&fence->queue);
@@ -356,7 +353,7 @@ static bool vmw_fence_goal_check_locked(struct vmw_fence_obj *fence)
 	u32 goal_seqno;
 	__le32 __iomem *fifo_mem;
 
-	if (fence->signaled & DRM_VMW_FENCE_FLAG_EXEC)
+	if (fence->signaled)
 		return false;
 
 	fifo_mem = fence->fman->dev_priv->mmio_virt;
@@ -386,7 +383,7 @@ rerun:
 	list_for_each_entry_safe(fence, next_fence, &fman->fence_list, head) {
 		if (seqno - fence->seqno < VMW_FENCE_WRAP) {
 			list_del_init(&fence->head);
-			fence->signaled |= DRM_VMW_FENCE_FLAG_EXEC;
+			fence->signaled = 1;
 			INIT_LIST_HEAD(&action_list);
 			list_splice_init(&fence->seq_passed_actions,
 					 &action_list);
@@ -417,8 +414,7 @@ rerun:
 	}
 }
 
-bool vmw_fence_obj_signaled(struct vmw_fence_obj *fence,
-			    uint32_t flags)
+bool vmw_fence_obj_signaled(struct vmw_fence_obj *fence)
 {
 	struct vmw_fence_manager *fman = fence->fman;
 	unsigned long irq_flags;
@@ -428,28 +424,25 @@ bool vmw_fence_obj_signaled(struct vmw_fence_obj *fence,
 	signaled = fence->signaled;
 	spin_unlock_irqrestore(&fman->lock, irq_flags);
 
-	flags &= fence->signal_mask;
-	if ((signaled & flags) == flags)
+	if (signaled)
 		return 1;
 
-	if ((signaled & DRM_VMW_FENCE_FLAG_EXEC) == 0)
-		vmw_fences_update(fman);
+	vmw_fences_update(fman);
 
 	spin_lock_irqsave(&fman->lock, irq_flags);
 	signaled = fence->signaled;
 	spin_unlock_irqrestore(&fman->lock, irq_flags);
 
-	return ((signaled & flags) == flags);
+	return signaled;
 }
 
-int vmw_fence_obj_wait(struct vmw_fence_obj *fence,
-		       uint32_t flags, bool lazy,
+int vmw_fence_obj_wait(struct vmw_fence_obj *fence, bool lazy,
 		       bool interruptible, unsigned long timeout)
 {
 	struct vmw_private *dev_priv = fence->fman->dev_priv;
 	long ret;
 
-	if (likely(vmw_fence_obj_signaled(fence, flags)))
+	if (likely(vmw_fence_obj_signaled(fence)))
 		return 0;
 
 	vmw_fifo_ping_host(dev_priv, SVGA_SYNC_GENERIC);
@@ -458,12 +451,12 @@ int vmw_fence_obj_wait(struct vmw_fence_obj *fence,
 	if (interruptible)
 		ret = wait_event_interruptible_timeout
 			(fence->queue,
-			 vmw_fence_obj_signaled(fence, flags),
+			 vmw_fence_obj_signaled(fence),
 			 timeout);
 	else
 		ret = wait_event_timeout
 			(fence->queue,
-			 vmw_fence_obj_signaled(fence, flags),
+			 vmw_fence_obj_signaled(fence),
 			 timeout);
 
 	vmw_seqno_waiter_remove(dev_priv);
@@ -497,7 +490,6 @@ static void vmw_fence_destroy(struct vmw_fence_obj *fence)
 
 int vmw_fence_create(struct vmw_fence_manager *fman,
 		     uint32_t seqno,
-		     uint32_t mask,
 		     struct vmw_fence_obj **p_fence)
 {
 	struct ttm_mem_global *mem_glob = vmw_mem_glob(fman->dev_priv);
@@ -515,7 +507,7 @@ int vmw_fence_create(struct vmw_fence_manager *fman,
 		goto out_no_object;
 	}
 
-	ret = vmw_fence_obj_init(fman, fence, seqno, mask,
+	ret = vmw_fence_obj_init(fman, fence, seqno,
 				 vmw_fence_destroy);
 	if (unlikely(ret != 0))
 		goto out_err_init;
@@ -559,7 +551,6 @@ static void vmw_user_fence_base_release(struct ttm_base_object **p_base)
 int vmw_user_fence_create(struct drm_file *file_priv,
 			  struct vmw_fence_manager *fman,
 			  uint32_t seqno,
-			  uint32_t mask,
 			  struct vmw_fence_obj **p_fence,
 			  uint32_t *p_handle)
 {
@@ -586,7 +577,7 @@ int vmw_user_fence_create(struct drm_file *file_priv,
 	}
 
 	ret = vmw_fence_obj_init(fman, &ufence->fence, seqno,
-				 mask, vmw_user_fence_destroy);
+				 vmw_user_fence_destroy);
 	if (unlikely(ret != 0)) {
 		kfree(ufence);
 		goto out_no_object;
@@ -647,13 +638,12 @@ void vmw_fence_fifo_down(struct vmw_fence_manager *fman)
 		kref_get(&fence->kref);
 		spin_unlock_irq(&fman->lock);
 
-		ret = vmw_fence_obj_wait(fence, fence->signal_mask,
-					 false, false,
+		ret = vmw_fence_obj_wait(fence, false, false,
 					 VMW_FENCE_WAIT_TIMEOUT);
 
 		if (unlikely(ret != 0)) {
 			list_del_init(&fence->head);
-			fence->signaled |= DRM_VMW_FENCE_FLAG_EXEC;
+			fence->signaled = 1;
 			INIT_LIST_HEAD(&action_list);
 			list_splice_init(&fence->seq_passed_actions,
 					 &action_list);
@@ -716,14 +706,14 @@ int vmw_fence_obj_wait_ioctl(struct drm_device *dev, void *data,
 
 	timeout = jiffies;
 	if (time_after_eq(timeout, (unsigned long)arg->kernel_cookie)) {
-		ret = ((vmw_fence_obj_signaled(fence, arg->flags)) ?
+		ret = ((vmw_fence_obj_signaled(fence)) ?
 		       0 : -EBUSY);
 		goto out;
 	}
 
 	timeout = (unsigned long)arg->kernel_cookie - timeout;
 
-	ret = vmw_fence_obj_wait(fence, arg->flags, arg->lazy, true, timeout);
+	ret = vmw_fence_obj_wait(fence, arg->lazy, true, timeout);
 
 out:
 	ttm_base_object_unref(&base);
@@ -760,10 +750,10 @@ int vmw_fence_obj_signaled_ioctl(struct drm_device *dev, void *data,
 	fence = &(container_of(base, struct vmw_user_fence, base)->fence);
 	fman = fence->fman;
 
-	arg->signaled = vmw_fence_obj_signaled(fence, arg->flags);
+	arg->signaled = vmw_fence_obj_signaled(fence);
 	spin_lock_irq(&fman->lock);
 
-	arg->signaled_flags = fence->signaled;
+	arg->signaled_flags = arg->flags;
 	arg->passed_seqno = dev_priv->last_read_seqno;
 	spin_unlock_irq(&fman->lock);
 
@@ -908,7 +898,7 @@ static void vmw_fence_obj_add_action(struct vmw_fence_obj *fence,
 	spin_lock_irqsave(&fman->lock, irq_flags);
 
 	fman->pending_actions[action->type]++;
-	if (fence->signaled & DRM_VMW_FENCE_FLAG_EXEC) {
+	if (fence->signaled) {
 		struct list_head action_list;
 
 		INIT_LIST_HEAD(&action_list);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.h b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.h
index faf2e7873860..8c18d32bd1c3 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.h
@@ -56,7 +56,6 @@ struct vmw_fence_obj {
 	struct vmw_fence_manager *fman;
 	struct list_head head;
 	uint32_t signaled;
-	uint32_t signal_mask;
 	struct list_head seq_passed_actions;
 	void (*destroy)(struct vmw_fence_obj *fence);
 	wait_queue_head_t queue;
@@ -74,10 +73,9 @@ vmw_fence_obj_reference(struct vmw_fence_obj *fence);
 
 extern void vmw_fences_update(struct vmw_fence_manager *fman);
 
-extern bool vmw_fence_obj_signaled(struct vmw_fence_obj *fence,
-				   uint32_t flags);
+extern bool vmw_fence_obj_signaled(struct vmw_fence_obj *fence);
 
-extern int vmw_fence_obj_wait(struct vmw_fence_obj *fence, uint32_t flags,
+extern int vmw_fence_obj_wait(struct vmw_fence_obj *fence,
 			      bool lazy,
 			      bool interruptible, unsigned long timeout);
 
@@ -85,13 +83,11 @@ extern void vmw_fence_obj_flush(struct vmw_fence_obj *fence);
 
 extern int vmw_fence_create(struct vmw_fence_manager *fman,
 			    uint32_t seqno,
-			    uint32_t mask,
 			    struct vmw_fence_obj **p_fence);
 
 extern int vmw_user_fence_create(struct drm_file *file_priv,
 				 struct vmw_fence_manager *fman,
 				 uint32_t sequence,
-				 uint32_t mask,
 				 struct vmw_fence_obj **p_fence,
 				 uint32_t *p_handle);
 



* [RFC PATCH v1 11/16] drm/vmwgfx: rework to new fence interface
  2014-05-14 14:57 [RFC PATCH v1 00/16] Convert all ttm drivers to use the new reservation interface Maarten Lankhorst
                   ` (9 preceding siblings ...)
  2014-05-14 14:58 ` [RFC PATCH v1 10/16] drm/vmwgfx: get rid of different types of fence_flags entirely Maarten Lankhorst
@ 2014-05-14 14:58 ` Maarten Lankhorst
  2014-05-14 14:58 ` [RFC PATCH v1 12/16] drm/ttm: flip the switch, and convert to dma_fence Maarten Lankhorst
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 50+ messages in thread
From: Maarten Lankhorst @ 2014-05-14 14:58 UTC (permalink / raw)
  To: airlied; +Cc: nouveau, linux-kernel, dri-devel

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c  |    2 
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c    |  299 ++++++++++++++++++------------
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.h    |   29 ++-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c |    9 -
 4 files changed, 200 insertions(+), 139 deletions(-)

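vmw_fence_enable_signaling() reads the last seqno the device reported from the FIFO. It returns false, meaning there is nothing to arm, when `seqno - fence->base.seqno < VMW_FENCE_WRAP`. Because the subtraction is unsigned, the test stays correct when the 32-bit seqno wraps. The sketch below assumes VMW_FENCE_WRAP is half the seqno space (check the driver headers for the actual value), and the helpers are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for VMW_FENCE_WRAP; assumed to be half the 32-bit space. */
#define TOY_FENCE_WRAP (1u << 31)

/* True if the device has reached fence_seqno, given the last seqno it
 * reported; valid while the two are less than TOY_FENCE_WRAP apart. */
static bool toy_seqno_passed(uint32_t hw_seqno, uint32_t fence_seqno)
{
	return (uint32_t)(hw_seqno - fence_seqno) < TOY_FENCE_WRAP;
}

/* Mirrors the decision in vmw_fence_enable_signaling(): false when the
 * fence has already passed, true when the caller must wait for a later
 * update (the driver also pings the host before returning true). */
static bool toy_enable_signaling(uint32_t hw_seqno, uint32_t fence_seqno)
{
	return !toy_seqno_passed(hw_seqno, fence_seqno);
}
```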
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
index db30b790ad24..f3f8caa09cc8 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
@@ -2360,7 +2360,7 @@ vmw_execbuf_copy_fence_user(struct vmw_private *dev_priv,
 		BUG_ON(fence == NULL);
 
 		fence_rep.handle = fence_handle;
-		fence_rep.seqno = fence->seqno;
+		fence_rep.seqno = fence->base.seqno;
 		vmw_update_seqno(dev_priv, &dev_priv->fifo);
 		fence_rep.passed_seqno = dev_priv->last_read_seqno;
 	}
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 05b9eea8e875..5d595ca5d82a 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -46,6 +46,7 @@ struct vmw_fence_manager {
 	bool goal_irq_on; /* Protected by @goal_irq_mutex */
 	bool seqno_valid; /* Protected by @lock, and may not be set to true
 			     without the @goal_irq_mutex held. */
+	unsigned ctx;
 };
 
 struct vmw_user_fence {
@@ -80,6 +81,12 @@ struct vmw_event_fence_action {
 	uint32_t *tv_usec;
 };
 
+static struct vmw_fence_manager *
+fman_from_fence(struct vmw_fence_obj *fence)
+{
+	return container_of(fence->base.lock, struct vmw_fence_manager, lock);
+}
+
 /**
  * Note on fencing subsystem usage of irqs:
  * Typically the vmw_fences_update function is called
@@ -102,25 +109,130 @@ struct vmw_event_fence_action {
  * objects with actions attached to them.
  */
 
-static void vmw_fence_obj_destroy_locked(struct kref *kref)
+static void vmw_fence_obj_destroy(struct fence *f)
 {
 	struct vmw_fence_obj *fence =
-		container_of(kref, struct vmw_fence_obj, kref);
+		container_of(f, struct vmw_fence_obj, base);
 
-	struct vmw_fence_manager *fman = fence->fman;
-	unsigned int num_fences;
+	struct vmw_fence_manager *fman = fman_from_fence(fence);
+	unsigned long irq_flags;
 
+	spin_lock_irqsave(&fman->lock, irq_flags);
 	list_del_init(&fence->head);
-	num_fences = --fman->num_fence_objects;
-	spin_unlock_irq(&fman->lock);
-	if (fence->destroy)
-		fence->destroy(fence);
-	else
-		kfree(fence);
+	--fman->num_fence_objects;
+	spin_unlock_irqrestore(&fman->lock, irq_flags);
+	fence->destroy(fence);
+}
 
-	spin_lock_irq(&fman->lock);
+static const char *vmw_fence_get_driver_name(struct fence *f)
+{
+	return "vmwgfx";
+}
+
+static const char *vmw_fence_get_timeline_name(struct fence *f)
+{
+	return "svga";
+}
+
+static bool vmw_fence_enable_signaling(struct fence *f)
+{
+	struct vmw_fence_obj *fence =
+		container_of(f, struct vmw_fence_obj, base);
+
+	struct vmw_fence_manager *fman = fman_from_fence(fence);
+
+	__le32 __iomem *fifo_mem = fman->dev_priv->mmio_virt;
+	u32 seqno = ioread32(fifo_mem + SVGA_FIFO_FENCE);
+	if (seqno - fence->base.seqno < VMW_FENCE_WRAP)
+		return false;
+
+	vmw_fifo_ping_host(fman->dev_priv, SVGA_SYNC_GENERIC);
+
+	return true;
+}
+
+struct vmwgfx_wait_cb {
+	struct fence_cb base;
+	struct task_struct *task;
+};
+
+static void
+vmwgfx_wait_cb(struct fence *fence, struct fence_cb *cb)
+{
+	struct vmwgfx_wait_cb *wait =
+		container_of(cb, struct vmwgfx_wait_cb, base);
+
+	wake_up_state(wait->task, TASK_NORMAL);
 }
 
+static void __vmw_fences_update(struct vmw_fence_manager *fman);
+
+static long vmw_fence_wait(struct fence *f, bool intr, signed long timeout)
+{
+	struct vmw_fence_obj *fence =
+		container_of(f, struct vmw_fence_obj, base);
+
+	struct vmw_fence_manager *fman = fman_from_fence(fence);
+	struct vmw_private *dev_priv = fman->dev_priv;
+	struct vmwgfx_wait_cb cb;
+	long ret = timeout;
+	unsigned long irq_flags;
+
+	if (likely(vmw_fence_obj_signaled(fence)))
+		return timeout;
+
+	vmw_fifo_ping_host(dev_priv, SVGA_SYNC_GENERIC);
+	vmw_seqno_waiter_add(dev_priv);
+
+	spin_lock_irqsave(f->lock, irq_flags);
+
+	if (intr && signal_pending(current)) {
+		ret = -ERESTARTSYS;
+		goto out;
+	}
+
+	cb.base.func = vmwgfx_wait_cb;
+	cb.task = current;
+	list_add(&cb.base.node, &f->cb_list);
+
+	while (ret > 0) {
+		__vmw_fences_update(fman);
+		if (test_bit(FENCE_FLAG_SIGNALED_BIT, &f->flags))
+			break;
+
+		if (intr)
+			__set_current_state(TASK_INTERRUPTIBLE);
+		else
+			__set_current_state(TASK_UNINTERRUPTIBLE);
+		spin_unlock_irqrestore(f->lock, irq_flags);
+
+		ret = schedule_timeout(ret);
+
+		spin_lock_irqsave(f->lock, irq_flags);
+		if (ret > 0 && intr && signal_pending(current))
+			ret = -ERESTARTSYS;
+	}
+
+	if (!list_empty(&cb.base.node))
+		list_del(&cb.base.node);
+	__set_current_state(TASK_RUNNING);
+
+out:
+	spin_unlock_irqrestore(f->lock, irq_flags);
+
+	vmw_seqno_waiter_remove(dev_priv);
+
+	return ret;
+}
+
+static struct fence_ops vmw_fence_ops = {
+	.get_driver_name = vmw_fence_get_driver_name,
+	.get_timeline_name = vmw_fence_get_timeline_name,
+	.enable_signaling = vmw_fence_enable_signaling,
+	.wait = vmw_fence_wait,
+	.release = vmw_fence_obj_destroy,
+};
+
 
 /**
  * Execute signal actions on fences recently signaled.
@@ -186,6 +298,7 @@ struct vmw_fence_manager *vmw_fence_manager_init(struct vmw_private *dev_priv)
 	fman->event_fence_action_size =
 		ttm_round_pot(sizeof(struct vmw_event_fence_action));
 	mutex_init(&fman->goal_irq_mutex);
+	fman->ctx = fence_context_alloc(1);
 
 	return fman;
 }
@@ -211,16 +324,12 @@ static int vmw_fence_obj_init(struct vmw_fence_manager *fman,
 			      void (*destroy) (struct vmw_fence_obj *fence))
 {
 	unsigned long irq_flags;
-	unsigned int num_fences;
 	int ret = 0;
 
-	fence->seqno = seqno;
+	__fence_init(&fence->base, &vmw_fence_ops, &fman->lock,
+		     fman->ctx, seqno);
 	INIT_LIST_HEAD(&fence->seq_passed_actions);
-	fence->fman = fman;
-	fence->signaled = 0;
-	kref_init(&fence->kref);
 	fence->destroy = destroy;
-	init_waitqueue_head(&fence->queue);
 
 	spin_lock_irqsave(&fman->lock, irq_flags);
 	if (unlikely(fman->fifo_down)) {
@@ -228,7 +337,7 @@ static int vmw_fence_obj_init(struct vmw_fence_manager *fman,
 		goto out_unlock;
 	}
 	list_add_tail(&fence->head, &fman->fence_list);
-	num_fences = ++fman->num_fence_objects;
+	++fman->num_fence_objects;
 
 out_unlock:
 	spin_unlock_irqrestore(&fman->lock, irq_flags);
@@ -236,38 +345,6 @@ out_unlock:
 
 }
 
-struct vmw_fence_obj *vmw_fence_obj_reference(struct vmw_fence_obj *fence)
-{
-	if (unlikely(fence == NULL))
-		return NULL;
-
-	kref_get(&fence->kref);
-	return fence;
-}
-
-/**
- * vmw_fence_obj_unreference
- *
- * Note that this function may not be entered with disabled irqs since
- * it may re-enable them in the destroy function.
- *
- */
-void vmw_fence_obj_unreference(struct vmw_fence_obj **fence_p)
-{
-	struct vmw_fence_obj *fence = *fence_p;
-	struct vmw_fence_manager *fman;
-
-	if (unlikely(fence == NULL))
-		return;
-
-	fman = fence->fman;
-	*fence_p = NULL;
-	spin_lock_irq(&fman->lock);
-	BUG_ON(atomic_read(&fence->kref.refcount) == 0);
-	kref_put(&fence->kref, vmw_fence_obj_destroy_locked);
-	spin_unlock_irq(&fman->lock);
-}
-
 static void vmw_fences_perform_actions(struct vmw_fence_manager *fman,
 				struct list_head *list)
 {
@@ -323,7 +400,7 @@ static bool vmw_fence_goal_new_locked(struct vmw_fence_manager *fman,
 	list_for_each_entry(fence, &fman->fence_list, head) {
 		if (!list_empty(&fence->seq_passed_actions)) {
 			fman->seqno_valid = true;
-			iowrite32(fence->seqno,
+			iowrite32(fence->base.seqno,
 				  fifo_mem + SVGA_FIFO_FENCE_GOAL);
 			break;
 		}
@@ -350,27 +427,27 @@ static bool vmw_fence_goal_new_locked(struct vmw_fence_manager *fman,
  */
 static bool vmw_fence_goal_check_locked(struct vmw_fence_obj *fence)
 {
+	struct vmw_fence_manager *fman = fman_from_fence(fence);
 	u32 goal_seqno;
 	__le32 __iomem *fifo_mem;
 
-	if (fence->signaled)
+	if (__fence_is_signaled(&fence->base))
 		return false;
 
-	fifo_mem = fence->fman->dev_priv->mmio_virt;
+	fifo_mem = fman->dev_priv->mmio_virt;
 	goal_seqno = ioread32(fifo_mem + SVGA_FIFO_FENCE_GOAL);
-	if (likely(fence->fman->seqno_valid &&
-		   goal_seqno - fence->seqno < VMW_FENCE_WRAP))
+	if (likely(fman->seqno_valid &&
+		   goal_seqno - fence->base.seqno < VMW_FENCE_WRAP))
 		return false;
 
-	iowrite32(fence->seqno, fifo_mem + SVGA_FIFO_FENCE_GOAL);
-	fence->fman->seqno_valid = true;
+	iowrite32(fence->base.seqno, fifo_mem + SVGA_FIFO_FENCE_GOAL);
+	fman->seqno_valid = true;
 
 	return true;
 }
 
-void vmw_fences_update(struct vmw_fence_manager *fman)
+static void __vmw_fences_update(struct vmw_fence_manager *fman)
 {
-	unsigned long flags;
 	struct vmw_fence_obj *fence, *next_fence;
 	struct list_head action_list;
 	bool needs_rerun;
@@ -379,32 +456,25 @@ void vmw_fences_update(struct vmw_fence_manager *fman)
 
 	seqno = ioread32(fifo_mem + SVGA_FIFO_FENCE);
 rerun:
-	spin_lock_irqsave(&fman->lock, flags);
 	list_for_each_entry_safe(fence, next_fence, &fman->fence_list, head) {
-		if (seqno - fence->seqno < VMW_FENCE_WRAP) {
+		if (seqno - fence->base.seqno < VMW_FENCE_WRAP) {
 			list_del_init(&fence->head);
-			fence->signaled = 1;
+			__fence_signal(&fence->base);
 			INIT_LIST_HEAD(&action_list);
 			list_splice_init(&fence->seq_passed_actions,
 					 &action_list);
 			vmw_fences_perform_actions(fman, &action_list);
-			wake_up_all(&fence->queue);
 		} else
 			break;
 	}
 
-	needs_rerun = vmw_fence_goal_new_locked(fman, seqno);
-
-	if (!list_empty(&fman->cleanup_list))
-		(void) schedule_work(&fman->work);
-	spin_unlock_irqrestore(&fman->lock, flags);
-
 	/*
 	 * Rerun if the fence goal seqno was updated, and the
 	 * hardware might have raced with that update, so that
 	 * we missed a fence_goal irq.
 	 */
 
+	needs_rerun = vmw_fence_goal_new_locked(fman, seqno);
 	if (unlikely(needs_rerun)) {
 		new_seqno = ioread32(fifo_mem + SVGA_FIFO_FENCE);
 		if (new_seqno != seqno) {
@@ -412,75 +482,58 @@ rerun:
 			goto rerun;
 		}
 	}
+
+	if (!list_empty(&fman->cleanup_list))
+		(void) schedule_work(&fman->work);
 }
 
-bool vmw_fence_obj_signaled(struct vmw_fence_obj *fence)
+void vmw_fences_update(struct vmw_fence_manager *fman)
 {
-	struct vmw_fence_manager *fman = fence->fman;
 	unsigned long irq_flags;
-	uint32_t signaled;
 
 	spin_lock_irqsave(&fman->lock, irq_flags);
-	signaled = fence->signaled;
+	__vmw_fences_update(fman);
 	spin_unlock_irqrestore(&fman->lock, irq_flags);
+}
+
+bool vmw_fence_obj_signaled(struct vmw_fence_obj *fence)
+{
+	struct vmw_fence_manager *fman = fman_from_fence(fence);
 
-	if (signaled)
+	if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))
 		return 1;
 
 	vmw_fences_update(fman);
 
-	spin_lock_irqsave(&fman->lock, irq_flags);
-	signaled = fence->signaled;
-	spin_unlock_irqrestore(&fman->lock, irq_flags);
-
-	return signaled;
+	return __fence_is_signaled(&fence->base);
 }
 
 int vmw_fence_obj_wait(struct vmw_fence_obj *fence, bool lazy,
 		       bool interruptible, unsigned long timeout)
 {
-	struct vmw_private *dev_priv = fence->fman->dev_priv;
-	long ret;
+	long ret = fence_wait_timeout(&fence->base, interruptible, timeout);
 
-	if (likely(vmw_fence_obj_signaled(fence)))
+	if (likely(ret > 0))
 		return 0;
-
-	vmw_fifo_ping_host(dev_priv, SVGA_SYNC_GENERIC);
-	vmw_seqno_waiter_add(dev_priv);
-
-	if (interruptible)
-		ret = wait_event_interruptible_timeout
-			(fence->queue,
-			 vmw_fence_obj_signaled(fence),
-			 timeout);
+	else if (ret == 0)
+		return -EBUSY;
 	else
-		ret = wait_event_timeout
-			(fence->queue,
-			 vmw_fence_obj_signaled(fence),
-			 timeout);
-
-	vmw_seqno_waiter_remove(dev_priv);
-
-	if (unlikely(ret == 0))
-		ret = -EBUSY;
-	else if (likely(ret > 0))
-		ret = 0;
-
-	return ret;
+		return ret;
 }
 
 void vmw_fence_obj_flush(struct vmw_fence_obj *fence)
 {
-	struct vmw_private *dev_priv = fence->fman->dev_priv;
+	struct vmw_private *dev_priv = fman_from_fence(fence)->dev_priv;
 
 	vmw_fifo_ping_host(dev_priv, SVGA_SYNC_GENERIC);
 }
 
 static void vmw_fence_destroy(struct vmw_fence_obj *fence)
 {
-	struct vmw_fence_manager *fman = fence->fman;
+	struct vmw_fence_manager *fman = fman_from_fence(fence);
+
+	free_fence(&fence->base);
 
-	kfree(fence);
 	/*
 	 * Free kernel space accounting.
 	 */
@@ -527,7 +580,7 @@ static void vmw_user_fence_destroy(struct vmw_fence_obj *fence)
 {
 	struct vmw_user_fence *ufence =
 		container_of(fence, struct vmw_user_fence, fence);
-	struct vmw_fence_manager *fman = fence->fman;
+	struct vmw_fence_manager *fman = fman_from_fence(fence);
 
 	ttm_base_object_kfree(ufence, base);
 	/*
@@ -620,7 +673,6 @@ out_no_object:
 
 void vmw_fence_fifo_down(struct vmw_fence_manager *fman)
 {
-	unsigned long irq_flags;
 	struct list_head action_list;
 	int ret;
 
@@ -629,13 +681,13 @@ void vmw_fence_fifo_down(struct vmw_fence_manager *fman)
 	 * restart when we've released the fman->lock.
 	 */
 
-	spin_lock_irqsave(&fman->lock, irq_flags);
+	spin_lock_irq(&fman->lock);
 	fman->fifo_down = true;
 	while (!list_empty(&fman->fence_list)) {
 		struct vmw_fence_obj *fence =
 			list_entry(fman->fence_list.prev, struct vmw_fence_obj,
 				   head);
-		kref_get(&fence->kref);
+		fence_get(&fence->base);
 		spin_unlock_irq(&fman->lock);
 
 		ret = vmw_fence_obj_wait(fence, false, false,
@@ -643,20 +695,18 @@ void vmw_fence_fifo_down(struct vmw_fence_manager *fman)
 
 		if (unlikely(ret != 0)) {
 			list_del_init(&fence->head);
-			fence->signaled = 1;
+			fence_signal(&fence->base);
 			INIT_LIST_HEAD(&action_list);
 			list_splice_init(&fence->seq_passed_actions,
 					 &action_list);
 			vmw_fences_perform_actions(fman, &action_list);
-			wake_up_all(&fence->queue);
 		}
 
-		spin_lock_irq(&fman->lock);
-
 		BUG_ON(!list_empty(&fence->head));
-		kref_put(&fence->kref, vmw_fence_obj_destroy_locked);
+		fence_put(&fence->base);
+		spin_lock_irq(&fman->lock);
 	}
-	spin_unlock_irqrestore(&fman->lock, irq_flags);
+	spin_unlock_irq(&fman->lock);
 }
 
 void vmw_fence_fifo_up(struct vmw_fence_manager *fman)
@@ -748,12 +798,12 @@ int vmw_fence_obj_signaled_ioctl(struct drm_device *dev, void *data,
 	}
 
 	fence = &(container_of(base, struct vmw_user_fence, base)->fence);
-	fman = fence->fman;
+	fman = fman_from_fence(fence);
 
 	arg->signaled = vmw_fence_obj_signaled(fence);
-	spin_lock_irq(&fman->lock);
 
 	arg->signaled_flags = arg->flags;
+	spin_lock_irq(&fman->lock);
 	arg->passed_seqno = dev_priv->last_read_seqno;
 	spin_unlock_irq(&fman->lock);
 
@@ -866,7 +916,7 @@ static void vmw_event_fence_action_cleanup(struct vmw_fence_action *action)
 {
 	struct vmw_event_fence_action *eaction =
 		container_of(action, struct vmw_event_fence_action, action);
-	struct vmw_fence_manager *fman = eaction->fence->fman;
+	struct vmw_fence_manager *fman = fman_from_fence(eaction->fence);
 	unsigned long irq_flags;
 
 	spin_lock_irqsave(&fman->lock, irq_flags);
@@ -890,7 +940,7 @@ static void vmw_event_fence_action_cleanup(struct vmw_fence_action *action)
 static void vmw_fence_obj_add_action(struct vmw_fence_obj *fence,
 			      struct vmw_fence_action *action)
 {
-	struct vmw_fence_manager *fman = fence->fman;
+	struct vmw_fence_manager *fman = fman_from_fence(fence);
 	unsigned long irq_flags;
 	bool run_update = false;
 
@@ -898,7 +948,7 @@ static void vmw_fence_obj_add_action(struct vmw_fence_obj *fence,
 	spin_lock_irqsave(&fman->lock, irq_flags);
 
 	fman->pending_actions[action->type]++;
-	if (fence->signaled) {
+	if (__fence_is_signaled(&fence->base)) {
 		struct list_head action_list;
 
 		INIT_LIST_HEAD(&action_list);
@@ -950,7 +1000,7 @@ int vmw_event_fence_action_queue(struct drm_file *file_priv,
 				 bool interruptible)
 {
 	struct vmw_event_fence_action *eaction;
-	struct vmw_fence_manager *fman = fence->fman;
+	struct vmw_fence_manager *fman = fman_from_fence(fence);
 	struct vmw_fpriv *vmw_fp = vmw_fpriv(file_priv);
 	unsigned long irq_flags;
 
@@ -990,7 +1040,8 @@ static int vmw_event_fence_action_create(struct drm_file *file_priv,
 				  bool interruptible)
 {
 	struct vmw_event_fence_pending *event;
-	struct drm_device *dev = fence->fman->dev_priv->dev;
+	struct vmw_fence_manager *fman = fman_from_fence(fence);
+	struct drm_device *dev = fman->dev_priv->dev;
 	unsigned long irq_flags;
 	int ret;
 
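The vmwgfx_fence.c hunks above decide whether a fence has passed with the unsigned test `seqno - fence->base.seqno < VMW_FENCE_WRAP`. This happens in vmw_fence_enable_signaling(), vmw_fence_goal_check_locked() and __vmw_fences_update(). The standalone sketch below shows why that test still works when the 32-bit counter wraps. It assumes VMW_FENCE_WRAP is half the sequence space (1u << 31). That value is not shown in this patch. `seqno_passed()` and `FENCE_WRAP` are hypothetical names.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumption: "passed" window is half of the 32-bit sequence space. */
#define FENCE_WRAP (1u << 31)

/*
 * Hypothetical helper: true if the hardware sequence number hw_seqno has
 * reached or passed fence_seqno. The unsigned subtraction is computed
 * modulo 2^32, so the answer stays correct after the counter wraps past
 * 0xffffffff back to 0.
 */
static bool seqno_passed(uint32_t hw_seqno, uint32_t fence_seqno)
{
	return (uint32_t)(hw_seqno - fence_seqno) < FENCE_WRAP;
}
```

For example, seqno_passed(5, 0xfffffff0u) is true because the difference wraps to 21. seqno_passed(0xfffffff0u, 5) is false because the difference is close to 2^32.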
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.h b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.h
index 8c18d32bd1c3..26a4add39208 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.h
@@ -27,6 +27,8 @@
 
 #ifndef _VMWGFX_FENCE_H_
 
+#include <linux/fence.h>
+
 #define VMW_FENCE_WAIT_TIMEOUT (5*HZ)
 
 struct vmw_private;
@@ -50,15 +52,11 @@ struct vmw_fence_action {
 };
 
 struct vmw_fence_obj {
-	struct kref kref;
-	u32 seqno;
+	struct fence base;
 
-	struct vmw_fence_manager *fman;
 	struct list_head head;
-	uint32_t signaled;
 	struct list_head seq_passed_actions;
 	void (*destroy)(struct vmw_fence_obj *fence);
-	wait_queue_head_t queue;
 };
 
 extern struct vmw_fence_manager *
@@ -66,10 +64,23 @@ vmw_fence_manager_init(struct vmw_private *dev_priv);
 
 extern void vmw_fence_manager_takedown(struct vmw_fence_manager *fman);
 
-extern void vmw_fence_obj_unreference(struct vmw_fence_obj **fence_p);
-
-extern struct vmw_fence_obj *
-vmw_fence_obj_reference(struct vmw_fence_obj *fence);
+static inline void
+vmw_fence_obj_unreference(struct vmw_fence_obj **fence_p)
+{
+	struct vmw_fence_obj *fence = *fence_p;
+
+	*fence_p = NULL;
+	if (fence)
+		fence_put(&fence->base);
+}
+
+static inline struct vmw_fence_obj *
+vmw_fence_obj_reference(struct vmw_fence_obj *fence)
+{
+	if (fence)
+		fence_get(&fence->base);
+	return fence;
+}
 
 extern void vmw_fences_update(struct vmw_fence_manager *fman);
 
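After this change, struct vmw_fence_obj embeds a struct fence as `base` and no longer keeps an `fman` back-pointer. fman_from_fence() recovers the manager from `fence->base.lock`, because __fence_init() is given `&fman->lock`. The userspace model below mirrors that layout and the new inline vmw_fence_obj_unreference(), which clears the caller's pointer and then drops the reference. All `model_*` names are hypothetical stand-ins. The refcount is a plain int rather than the kernel's kref, and no locking is modeled.

```c
#include <stddef.h>
#include <stdlib.h>

/* Same idea as the kernel's container_of(): step back from a member to its container. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct model_lock { int dummy; };		/* stands in for spinlock_t */

struct model_fence {				/* stands in for struct fence */
	int refcount;
	struct model_lock *lock;		/* points into the owning manager */
	void (*release)(struct model_fence *f);
};

struct model_manager {				/* stands in for vmw_fence_manager */
	int num_fence_objects;
	struct model_lock lock;
};

struct model_fence_obj {			/* stands in for vmw_fence_obj */
	struct model_fence base;
};

/* Mirrors fman_from_fence(): the lock pointer identifies the manager. */
static struct model_manager *model_fman_from_fence(struct model_fence_obj *fence)
{
	return container_of(fence->base.lock, struct model_manager, lock);
}

/* Mirrors the .release op: find the outer object, update accounting, free. */
static void model_fence_release(struct model_fence *f)
{
	struct model_fence_obj *fence = container_of(f, struct model_fence_obj, base);

	model_fman_from_fence(fence)->num_fence_objects--;
	free(fence);
}

static void model_fence_put(struct model_fence *f)
{
	if (--f->refcount == 0)
		f->release(f);
}

/* Mirrors the new inline vmw_fence_obj_unreference(): clear, then drop. */
static void model_fence_obj_unreference(struct model_fence_obj **fence_p)
{
	struct model_fence_obj *fence = *fence_p;

	*fence_p = NULL;
	if (fence)
		model_fence_put(&fence->base);
}

static struct model_fence_obj *model_fence_obj_create(struct model_manager *fman)
{
	struct model_fence_obj *fence = calloc(1, sizeof(*fence));

	if (!fence)
		return NULL;
	fence->base.refcount = 1;
	fence->base.lock = &fman->lock;		/* like __fence_init(..., &fman->lock, ...) */
	fence->base.release = model_fence_release;
	fman->num_fence_objects++;
	return fence;
}
```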
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
index 48e47a100dea..6688a6341486 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
@@ -1419,21 +1419,20 @@ void vmw_fence_single_bo(struct ttm_buffer_object *bo,
 			 struct vmw_fence_obj *fence)
 {
 	struct ttm_bo_device *bdev = bo->bdev;
-	struct ttm_bo_driver *driver = bdev->driver;
 	struct vmw_fence_obj *old_fence_obj;
 	struct vmw_private *dev_priv =
 		container_of(bdev, struct vmw_private, bdev);
 
-	if (fence == NULL)
+	if (fence == NULL) {
 		vmw_execbuf_fence_commands(NULL, dev_priv, &fence, NULL);
-	else
-		driver->sync_obj_ref(fence);
+	} else
+		vmw_fence_obj_reference(fence);
 
+	reservation_object_add_excl_fence(bo->resv, &fence->base);
 
 	old_fence_obj = bo->sync_obj;
 	bo->sync_obj = fence;
 
-
 	if (old_fence_obj)
 		vmw_fence_obj_unreference(&old_fence_obj);
 }


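In this patch, vmw_fence_obj_wait() delegates to fence_wait_timeout() and translates the result:

- A positive value (remaining jiffies) becomes 0.
- Zero (the timeout expired) becomes -EBUSY.
- A negative error such as -ERESTARTSYS is passed through unchanged.

The sketch below isolates that translation. `map_wait_result()` is a hypothetical name. ERESTARTSYS is defined locally because it is a kernel-internal errno (512 in include/linux/errno.h) that userspace headers do not provide.

```c
#include <errno.h>

#ifndef ERESTARTSYS
#define ERESTARTSYS 512	/* kernel-internal value, not exported to userspace */
#endif

/*
 * Hypothetical helper mirroring the tail of the new vmw_fence_obj_wait():
 * ret is the value returned by fence_wait_timeout().
 */
static int map_wait_result(long ret)
{
	if (ret > 0)		/* signaled before the timeout ran out */
		return 0;
	else if (ret == 0)	/* timed out without signaling */
		return -EBUSY;
	else			/* error, e.g. interrupted by a signal */
		return (int)ret;
}
```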

* [RFC PATCH v1 12/16] drm/ttm: flip the switch, and convert to dma_fence
From: Maarten Lankhorst @ 2014-05-14 14:58 UTC
  To: airlied; +Cc: nouveau, linux-kernel, dri-devel


---
 drivers/gpu/drm/nouveau/nouveau_bo.c     |   48 +-------
 drivers/gpu/drm/nouveau/nouveau_fence.c  |   24 +---
 drivers/gpu/drm/nouveau/nouveau_fence.h  |    2 
 drivers/gpu/drm/nouveau/nouveau_gem.c    |   16 ++-
 drivers/gpu/drm/qxl/qxl_debugfs.c        |    6 +
 drivers/gpu/drm/qxl/qxl_drv.h            |    2 
 drivers/gpu/drm/qxl/qxl_kms.c            |    1 
 drivers/gpu/drm/qxl/qxl_object.h         |    4 -
 drivers/gpu/drm/qxl/qxl_release.c        |    3 -
 drivers/gpu/drm/qxl/qxl_ttm.c            |  104 ------------------
 drivers/gpu/drm/radeon/radeon_cs.c       |   10 +-
 drivers/gpu/drm/radeon/radeon_display.c  |   18 +++
 drivers/gpu/drm/radeon/radeon_object.c   |    4 -
 drivers/gpu/drm/radeon/radeon_ttm.c      |   34 ------
 drivers/gpu/drm/radeon/radeon_uvd.c      |    8 +
 drivers/gpu/drm/radeon/radeon_vm.c       |    2 
 drivers/gpu/drm/ttm/ttm_bo.c             |  171 +++++++++++++++++++++---------
 drivers/gpu/drm/ttm/ttm_bo_util.c        |   23 +---
 drivers/gpu/drm/ttm/ttm_execbuf_util.c   |   10 --
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c   |   40 -------
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c |   14 +-
 include/drm/ttm/ttm_bo_api.h             |    2 
 include/drm/ttm/ttm_bo_driver.h          |   26 -----
 include/drm/ttm/ttm_execbuf_util.h       |   10 +-
 24 files changed, 197 insertions(+), 385 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 84aba3fa1bd0..5b8ccc39a282 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -92,13 +92,13 @@ nv10_bo_get_tile_region(struct drm_device *dev, int i)
 
 static void
 nv10_bo_put_tile_region(struct drm_device *dev, struct nouveau_drm_tile *tile,
-			struct nouveau_fence *fence)
+			struct fence *fence)
 {
 	struct nouveau_drm *drm = nouveau_drm(dev);
 
 	if (tile) {
 		spin_lock(&drm->tile.lock);
-		tile->fence = nouveau_fence_ref(fence);
+		tile->fence = nouveau_fence_ref((struct nouveau_fence *)fence);
 		tile->used = false;
 		spin_unlock(&drm->tile.lock);
 	}
@@ -965,7 +965,8 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict, bool intr,
 		if (ret == 0) {
 			ret = nouveau_fence_new(chan, false, &fence);
 			if (ret == 0) {
-				ret = ttm_bo_move_accel_cleanup(bo, fence,
+				ret = ttm_bo_move_accel_cleanup(bo,
+								&fence->base,
 								evict,
 								no_wait_gpu,
 								new_mem);
@@ -1151,8 +1152,9 @@ nouveau_bo_vm_cleanup(struct ttm_buffer_object *bo,
 {
 	struct nouveau_drm *drm = nouveau_bdev(bo->bdev);
 	struct drm_device *dev = drm->dev;
+	struct fence *fence = reservation_object_get_excl(bo->resv);
 
-	nv10_bo_put_tile_region(dev, *old_tile, bo->sync_obj);
+	nv10_bo_put_tile_region(dev, *old_tile, fence);
 	*old_tile = new_tile;
 }
 
@@ -1423,47 +1425,14 @@ nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm)
 	ttm_pool_unpopulate(ttm);
 }
 
-static void
-nouveau_bo_fence_unref(void **sync_obj)
-{
-	nouveau_fence_unref((struct nouveau_fence **)sync_obj);
-}
-
 void
 nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence)
 {
 	struct reservation_object *resv = nvbo->bo.resv;
 
-	nouveau_bo_fence_unref(&nvbo->bo.sync_obj);
-	nvbo->bo.sync_obj = nouveau_fence_ref(fence);
-
 	reservation_object_add_excl_fence(resv, &fence->base);
 }
 
-static void *
-nouveau_bo_fence_ref(void *sync_obj)
-{
-	return nouveau_fence_ref(sync_obj);
-}
-
-static bool
-nouveau_bo_fence_signalled(void *sync_obj)
-{
-	return nouveau_fence_done(sync_obj);
-}
-
-static int
-nouveau_bo_fence_wait(void *sync_obj, bool lazy, bool intr)
-{
-	return nouveau_fence_wait(sync_obj, lazy, intr);
-}
-
-static int
-nouveau_bo_fence_flush(void *sync_obj)
-{
-	return 0;
-}
-
 struct ttm_bo_driver nouveau_bo_driver = {
 	.ttm_tt_create = &nouveau_ttm_tt_create,
 	.ttm_tt_populate = &nouveau_ttm_tt_populate,
@@ -1474,11 +1443,6 @@ struct ttm_bo_driver nouveau_bo_driver = {
 	.move_notify = nouveau_bo_move_ntfy,
 	.move = nouveau_bo_move,
 	.verify_access = nouveau_bo_verify_access,
-	.sync_obj_signaled = nouveau_bo_fence_signalled,
-	.sync_obj_wait = nouveau_bo_fence_wait,
-	.sync_obj_flush = nouveau_bo_fence_flush,
-	.sync_obj_unref = nouveau_bo_fence_unref,
-	.sync_obj_ref = nouveau_bo_fence_ref,
 	.fault_reserve_notify = &nouveau_ttm_fault_reserve_notify,
 	.io_mem_reserve = &nouveau_ttm_io_mem_reserve,
 	.io_mem_free = &nouveau_ttm_io_mem_free,
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index 9a9e04985826..b1aba6f79605 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -139,17 +139,18 @@ static bool nouveau_fence_is_signaled(struct fence *f)
 }
 
 void
-nouveau_fence_work(struct nouveau_fence *fence,
+nouveau_fence_work(struct fence *fence,
 		   void (*func)(void *), void *data)
 {
 	struct nouveau_fence_work *work;
 
-	if (fence_is_signaled(&fence->base))
+	if (fence_is_signaled(fence))
 		goto err;
 
 	work = kmalloc(sizeof(*work), GFP_KERNEL);
 	if (!work) {
-		WARN_ON(nouveau_fence_wait(fence, false, false));
+		WARN_ON(nouveau_fence_wait((struct nouveau_fence *)fence,
+					   false, false));
 		goto err;
 	}
 
@@ -157,7 +158,7 @@ nouveau_fence_work(struct nouveau_fence *fence,
 	work->func = func;
 	work->data = data;
 
-	if (fence_add_callback(&fence->base, &work->cb, nouveau_fence_work_cb) < 0)
+	if (fence_add_callback(fence, &work->cb, nouveau_fence_work_cb) < 0)
 		goto err_free;
 	return;
 
@@ -322,14 +323,9 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan)
 	struct reservation_object_list *fobj;
 	int ret = 0, i;
 
-	fence = nvbo->bo.sync_obj;
-	if (fence && fence_is_signaled(fence)) {
-		nouveau_fence_unref((struct nouveau_fence **)
-				    &nvbo->bo.sync_obj);
-		fence = NULL;
-	}
+	fence = reservation_object_get_excl(resv);
 
-	if (fence) {
+	if (fence && !fence_is_signaled(fence)) {
 		struct nouveau_fence *f = container_of(fence,
 						       struct nouveau_fence,
 						       base);
@@ -345,12 +341,8 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan)
 	if (ret)
 		return ret;
 
-	fence = reservation_object_get_excl(resv);
-	if (fence && !nouveau_local_fence(fence, chan->drm))
-		ret = fence_wait(fence, true);
-
 	fobj = reservation_object_get_list(resv);
-	if (!fobj || ret)
+	if (!fobj)
 		return ret;
 
 	for (i = 0; i < fobj->shared_count && !ret; ++i) {
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.h b/drivers/gpu/drm/nouveau/nouveau_fence.h
index 1989ec22e66e..41abc8a44e3c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.h
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.h
@@ -26,7 +26,7 @@ void nouveau_fence_unref(struct nouveau_fence **);
 
 int  nouveau_fence_emit(struct nouveau_fence *, struct nouveau_channel *);
 bool nouveau_fence_done(struct nouveau_fence *);
-void nouveau_fence_work(struct nouveau_fence *, void (*)(void *), void *);
+void nouveau_fence_work(struct fence *, void (*)(void *), void *);
 int  nouveau_fence_wait(struct nouveau_fence *, bool lazy, bool intr);
 int  nouveau_fence_sync(struct nouveau_bo *, struct nouveau_channel *);
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index a61530becfb9..4beaa897adad 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -100,13 +100,12 @@ static void
 nouveau_gem_object_unmap(struct nouveau_bo *nvbo, struct nouveau_vma *vma)
 {
 	const bool mapped = nvbo->bo.mem.mem_type != TTM_PL_SYSTEM;
-	struct nouveau_fence *fence = NULL;
+	struct fence *fence = NULL;
 
 	list_del(&vma->head);
 
-	if (mapped) {
-		fence = nouveau_fence_ref(nvbo->bo.sync_obj);
-	}
+	if (mapped)
+		fence = reservation_object_get_excl(nvbo->bo.resv);
 
 	if (fence) {
 		nouveau_fence_work(fence, nouveau_gem_object_delete, vma);
@@ -116,7 +115,6 @@ nouveau_gem_object_unmap(struct nouveau_bo *nvbo, struct nouveau_vma *vma)
 		nouveau_vm_put(vma);
 		kfree(vma);
 	}
-	nouveau_fence_unref(&fence);
 }
 
 void
@@ -876,8 +874,12 @@ nouveau_gem_ioctl_cpu_prep(struct drm_device *dev, void *data,
 	ret = ttm_bo_reserve(&nvbo->bo, true, false, false, 0);
 	if (!ret) {
 		ret = ttm_bo_wait(&nvbo->bo, true, true, true);
-		if (!no_wait && ret)
-			fence = nouveau_fence_ref(nvbo->bo.sync_obj);
+		if (!no_wait && ret) {
+			struct fence *excl;
+
+			excl = reservation_object_get_excl(nvbo->bo.resv);
+			fence = nouveau_fence_ref((struct nouveau_fence *)excl);
+		}
 
 		ttm_bo_unreserve(&nvbo->bo);
 	}
diff --git a/drivers/gpu/drm/qxl/qxl_debugfs.c b/drivers/gpu/drm/qxl/qxl_debugfs.c
index 0d144e0646d6..a4a63fd84803 100644
--- a/drivers/gpu/drm/qxl/qxl_debugfs.c
+++ b/drivers/gpu/drm/qxl/qxl_debugfs.c
@@ -67,9 +67,9 @@ qxl_debugfs_buffers_info(struct seq_file *m, void *data)
 		rel = fobj ? fobj->shared_count : 0;
 		rcu_read_unlock();
 
-		seq_printf(m, "size %ld, pc %d, sync obj %p, num releases %d\n",
-			   (unsigned long)bo->gem_base.size, bo->pin_count,
-			   bo->tbo.sync_obj, rel);
+		seq_printf(m, "size %ld, pc %d, num releases %d\n",
+			   (unsigned long)bo->gem_base.size,
+			   bo->pin_count, rel);
 	}
 	spin_unlock(&qdev->release_lock);
 	return 0;
diff --git a/drivers/gpu/drm/qxl/qxl_drv.h b/drivers/gpu/drm/qxl/qxl_drv.h
index d547cbdebeb4..74e2117ee0e6 100644
--- a/drivers/gpu/drm/qxl/qxl_drv.h
+++ b/drivers/gpu/drm/qxl/qxl_drv.h
@@ -280,9 +280,7 @@ struct qxl_device {
 	uint8_t		slot_gen_bits;
 	uint64_t	va_slot_mask;
 
-	/* XXX: when rcu becomes available, release_lock can be killed */
 	spinlock_t	release_lock;
-	spinlock_t	fence_lock;
 	struct idr	release_idr;
 	uint32_t	release_seqno;
 	spinlock_t release_idr_lock;
diff --git a/drivers/gpu/drm/qxl/qxl_kms.c b/drivers/gpu/drm/qxl/qxl_kms.c
index a9e7c30e92c5..7234561e09d9 100644
--- a/drivers/gpu/drm/qxl/qxl_kms.c
+++ b/drivers/gpu/drm/qxl/qxl_kms.c
@@ -224,7 +224,6 @@ static int qxl_device_init(struct qxl_device *qdev,
 	idr_init(&qdev->release_idr);
 	spin_lock_init(&qdev->release_idr_lock);
 	spin_lock_init(&qdev->release_lock);
-	spin_lock_init(&qdev->fence_lock);
 
 	idr_init(&qdev->surf_id_idr);
 	spin_lock_init(&qdev->surf_id_idr_lock);
diff --git a/drivers/gpu/drm/qxl/qxl_object.h b/drivers/gpu/drm/qxl/qxl_object.h
index 98395b223ad0..9da7becbdb34 100644
--- a/drivers/gpu/drm/qxl/qxl_object.h
+++ b/drivers/gpu/drm/qxl/qxl_object.h
@@ -78,8 +78,8 @@ static inline int qxl_bo_wait(struct qxl_bo *bo, u32 *mem_type,
 	}
 	if (mem_type)
 		*mem_type = bo->tbo.mem.mem_type;
-	if (bo->tbo.sync_obj)
-		r = ttm_bo_wait(&bo->tbo, true, true, no_wait);
+
+	r = ttm_bo_wait(&bo->tbo, true, true, no_wait);
 	ttm_bo_unreserve(&bo->tbo);
 	return r;
 }
diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 3b1398d735f4..cfd4b8036269 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -464,9 +464,6 @@ void qxl_release_fence_buffer_objects(struct qxl_release *release)
 		bo = entry->bo;
 		qbo = to_qxl_bo(bo);
 
-		if (!entry->bo->sync_obj)
-			entry->bo->sync_obj = qbo;
-
 		reservation_object_add_shared_fence(bo->resv, &release->base);
 		ttm_bo_add_to_lru(bo);
 		__ttm_bo_unreserve(bo);
diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c
index 80879e38e447..439e2b23465d 100644
--- a/drivers/gpu/drm/qxl/qxl_ttm.c
+++ b/drivers/gpu/drm/qxl/qxl_ttm.c
@@ -361,105 +361,6 @@ static int qxl_bo_move(struct ttm_buffer_object *bo,
 	return ttm_bo_move_memcpy(bo, evict, no_wait_gpu, new_mem);
 }
 
-static bool qxl_sync_obj_signaled(void *sync_obj);
-
-static int qxl_sync_obj_wait(void *sync_obj,
-			     bool lazy, bool interruptible)
-{
-	struct qxl_bo *bo = (struct qxl_bo *)sync_obj;
-	struct qxl_device *qdev = bo->gem_base.dev->dev_private;
-	struct reservation_object_list *fobj;
-	int count = 0, sc = 0, num_release = 0;
-	bool have_drawable_releases;
-
-retry:
-	if (sc == 0) {
-		if (bo->type == QXL_GEM_DOMAIN_SURFACE)
-			qxl_update_surface(qdev, bo);
-	} else if (sc >= 1) {
-		qxl_io_notify_oom(qdev);
-	}
-
-	sc++;
-
-	for (count = 0; count < 10; count++) {
-		if (qxl_sync_obj_signaled(sync_obj))
-			return 0;
-
-		if (!qxl_queue_garbage_collect(qdev, true))
-			break;
-	}
-
-	have_drawable_releases = false;
-	num_release = 0;
-
-	spin_lock(&qdev->release_lock);
-	fobj = bo->tbo.resv->fence;
-	for (count = 0; fobj && count < fobj->shared_count; count++) {
-		struct qxl_release *release;
-
-		release = container_of(fobj->shared[count],
-				       struct qxl_release, base);
-
-		if (fence_is_signaled(&release->base))
-			continue;
-
-		num_release++;
-
-		if (release->type == QXL_RELEASE_DRAWABLE)
-			have_drawable_releases = true;
-	}
-	spin_unlock(&qdev->release_lock);
-
-	qxl_queue_garbage_collect(qdev, true);
-
-	if (have_drawable_releases || sc < 4) {
-		if (sc > 2)
-			/* back off */
-			usleep_range(500, 1000);
-		if (have_drawable_releases && sc > 300) {
-			WARN(1, "sync obj %d still has outstanding releases %d %d %d %ld %d\n", sc, bo->surface_id, bo->is_primary, bo->pin_count, (unsigned long)bo->gem_base.size, num_release);
-			return -EBUSY;
-		}
-		goto retry;
-	}
-	return 0;
-}
-
-static int qxl_sync_obj_flush(void *sync_obj)
-{
-	return 0;
-}
-
-static void qxl_sync_obj_unref(void **sync_obj)
-{
-	*sync_obj = NULL;
-}
-
-static void *qxl_sync_obj_ref(void *sync_obj)
-{
-	return sync_obj;
-}
-
-static bool qxl_sync_obj_signaled(void *sync_obj)
-{
-	struct qxl_bo *qbo = (struct qxl_bo *)sync_obj;
-	struct qxl_device *qdev = qbo->gem_base.dev->dev_private;
-	struct reservation_object_list *fobj;
-	bool ret = true;
-	unsigned i;
-
-	spin_lock(&qdev->release_lock);
-	fobj = qbo->tbo.resv->fence;
-	for (i = 0; fobj && i < fobj->shared_count; ++i) {
-		ret = fence_is_signaled(fobj->shared[i]);
-		if (!ret)
-			break;
-	}
-	spin_unlock(&qdev->release_lock);
-	return ret;
-}
-
 static void qxl_bo_move_notify(struct ttm_buffer_object *bo,
 			       struct ttm_mem_reg *new_mem)
 {
@@ -486,11 +387,6 @@ static struct ttm_bo_driver qxl_bo_driver = {
 	.verify_access = &qxl_verify_access,
 	.io_mem_reserve = &qxl_ttm_io_mem_reserve,
 	.io_mem_free = &qxl_ttm_io_mem_free,
-	.sync_obj_signaled = &qxl_sync_obj_signaled,
-	.sync_obj_wait = &qxl_sync_obj_wait,
-	.sync_obj_flush = &qxl_sync_obj_flush,
-	.sync_obj_unref = &qxl_sync_obj_unref,
-	.sync_obj_ref = &qxl_sync_obj_ref,
 	.move_notify = &qxl_bo_move_notify,
 };
 
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 2b6e0ebcc13a..af1dae9ce829 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -222,11 +222,17 @@ static void radeon_cs_sync_rings(struct radeon_cs_parser *p)
 	int i;
 
 	for (i = 0; i < p->nrelocs; i++) {
+		struct reservation_object *resv;
+		struct fence *fence;
+
 		if (!p->relocs[i].robj)
 			continue;
 
+		resv = p->relocs[i].robj->tbo.resv;
+		fence = reservation_object_get_excl(resv);
+
 		radeon_semaphore_sync_to(p->ib.semaphore,
-					 p->relocs[i].robj->tbo.sync_obj);
+					 (struct radeon_fence *)fence);
 	}
 }
 
@@ -389,7 +395,7 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo
 
 		ttm_eu_fence_buffer_objects(&parser->ticket,
 					    &parser->validated,
-					    parser->ib.fence);
+					    &parser->ib.fence->base);
 	} else if (backoff) {
 		ttm_eu_backoff_reservation(&parser->ticket,
 					   &parser->validated);
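Several hunks in this patch convert a `struct fence *` back to a driver fence with a plain cast. Examples are `(struct radeon_fence *)fence` in radeon_cs_sync_rings() and `(struct nouveau_fence *)excl` in nouveau_gem_ioctl_cpu_prep(). By contrast, nouveau_fence_sync() uses container_of(). The two forms agree only when `base` is the first member of the driver structure. The cast is also only meaningful if the fence was actually created by that driver. The sketch below uses a hypothetical `driver_fence` type and checks the layout assumption at compile time with offsetof().

```c
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct base_fence { unsigned long flags; };	/* stands in for struct fence */

struct driver_fence {				/* hypothetical driver fence */
	struct base_fence base;			/* must stay the first member */
	int ring;
};

/* Compile-time check of the layout assumption behind the plain cast. */
_Static_assert(offsetof(struct driver_fence, base) == 0,
	       "base must be the first member for a plain cast to be valid");

static struct driver_fence *to_driver_fence_cast(struct base_fence *f)
{
	return (struct driver_fence *)f;	/* valid only because base is at offset 0 */
}

static struct driver_fence *to_driver_fence(struct base_fence *f)
{
	return container_of(f, struct driver_fence, base);	/* valid at any offset */
}
```

In kernel code, the same layout check would typically be written with BUILD_BUG_ON().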
diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
index 6a7340289ddd..f977b552e1ca 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -363,6 +363,7 @@ static int radeon_crtc_page_flip(struct drm_crtc *crtc,
 	struct drm_gem_object *obj;
 	struct radeon_bo *rbo;
 	struct radeon_unpin_work *work;
+	struct fence *fence;
 	unsigned long flags;
 	u32 tiling_flags, pitch_pixels;
 	u64 base;
@@ -386,9 +387,6 @@ static int radeon_crtc_page_flip(struct drm_crtc *crtc,
 	obj = new_radeon_fb->obj;
 	rbo = gem_to_radeon_bo(obj);
 
-	if (rbo->tbo.sync_obj)
-		work->fence = radeon_fence_ref(rbo->tbo.sync_obj);
-
 	INIT_WORK(&work->work, radeon_unpin_work_func);
 
 	/* We borrow the event spin lock for protecting unpin_work */
@@ -411,6 +409,20 @@ static int radeon_crtc_page_flip(struct drm_crtc *crtc,
 		DRM_ERROR("failed to reserve new rbo buffer before flip\n");
 		goto pflip_cleanup;
 	}
+
+	fence = reservation_object_get_excl(rbo->tbo.resv);
+	if (fence) {
+		fence_get(fence);
+		work->fence = (struct radeon_fence *)fence;
+
+		if (!fence->ops->signaled)
+			/*
+			* make sure if this fence doesn't belong to this
+			* device that it will still signal completion
+			*/
+			fence_enable_sw_signaling(fence);
+	}
+
 	/* Only 27 bit offset for legacy CRTC */
 	r = radeon_bo_pin_restricted(rbo, RADEON_GEM_DOMAIN_VRAM,
 				     ASIC_IS_AVIVO(rdev) ? 0 : 1 << 27, &base);
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 3b27cac9240b..e082feae53d9 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -727,8 +727,8 @@ int radeon_bo_wait(struct radeon_bo *bo, u32 *mem_type, bool no_wait)
 		return r;
 	if (mem_type)
 		*mem_type = bo->tbo.mem.mem_type;
-	if (bo->tbo.sync_obj)
-		r = ttm_bo_wait(&bo->tbo, true, true, no_wait);
+
+	r = ttm_bo_wait(&bo->tbo, true, true, no_wait);
 	ttm_bo_unreserve(&bo->tbo);
 	return r;
 }
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index c8a8a5144ec1..715e29f984c1 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -265,12 +265,12 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
 	BUILD_BUG_ON((PAGE_SIZE % RADEON_GPU_PAGE_SIZE) != 0);
 
 	/* sync other rings */
-	fence = bo->sync_obj;
+	fence = (struct radeon_fence *)reservation_object_get_excl(bo->resv);
 	r = radeon_copy(rdev, old_start, new_start,
 			new_mem->num_pages * (PAGE_SIZE / RADEON_GPU_PAGE_SIZE), /* GPU pages */
 			&fence);
 	/* FIXME: handle copy error */
-	r = ttm_bo_move_accel_cleanup(bo, (void *)fence,
+	r = ttm_bo_move_accel_cleanup(bo, &fence->base,
 				      evict, no_wait_gpu, new_mem);
 	radeon_fence_unref(&fence);
 	return r;
@@ -483,31 +483,6 @@ static void radeon_ttm_io_mem_free(struct ttm_bo_device *bdev, struct ttm_mem_re
 {
 }
 
-static int radeon_sync_obj_wait(void *sync_obj, bool lazy, bool interruptible)
-{
-	return radeon_fence_wait((struct radeon_fence *)sync_obj, interruptible);
-}
-
-static int radeon_sync_obj_flush(void *sync_obj)
-{
-	return 0;
-}
-
-static void radeon_sync_obj_unref(void **sync_obj)
-{
-	radeon_fence_unref((struct radeon_fence **)sync_obj);
-}
-
-static void *radeon_sync_obj_ref(void *sync_obj)
-{
-	return radeon_fence_ref((struct radeon_fence *)sync_obj);
-}
-
-static bool radeon_sync_obj_signaled(void *sync_obj)
-{
-	return radeon_fence_signaled((struct radeon_fence *)sync_obj);
-}
-
 /*
  * TTM backend functions.
  */
@@ -685,11 +660,6 @@ static struct ttm_bo_driver radeon_bo_driver = {
 	.evict_flags = &radeon_evict_flags,
 	.move = &radeon_bo_move,
 	.verify_access = &radeon_verify_access,
-	.sync_obj_signaled = &radeon_sync_obj_signaled,
-	.sync_obj_wait = &radeon_sync_obj_wait,
-	.sync_obj_flush = &radeon_sync_obj_flush,
-	.sync_obj_unref = &radeon_sync_obj_unref,
-	.sync_obj_ref = &radeon_sync_obj_ref,
 	.move_notify = &radeon_bo_move_notify,
 	.fault_reserve_notify = &radeon_bo_fault_reserve_notify,
 	.io_mem_reserve = &radeon_ttm_io_mem_reserve,
diff --git a/drivers/gpu/drm/radeon/radeon_uvd.c b/drivers/gpu/drm/radeon/radeon_uvd.c
index 2f93fef15aab..08eb88067ffa 100644
--- a/drivers/gpu/drm/radeon/radeon_uvd.c
+++ b/drivers/gpu/drm/radeon/radeon_uvd.c
@@ -356,6 +356,7 @@ static int radeon_uvd_cs_msg(struct radeon_cs_parser *p, struct radeon_bo *bo,
 {
 	int32_t *msg, msg_type, handle;
 	unsigned img_size = 0;
+	struct fence *f;
 	void *ptr;
 
 	int i, r;
@@ -365,8 +366,9 @@ static int radeon_uvd_cs_msg(struct radeon_cs_parser *p, struct radeon_bo *bo,
 		return -EINVAL;
 	}
 
-	if (bo->tbo.sync_obj) {
-		r = radeon_fence_wait(bo->tbo.sync_obj, false);
+	f = reservation_object_get_excl(bo->tbo.resv);
+	if (f) {
+		r = radeon_fence_wait((struct radeon_fence *)f, false);
 		if (r) {
 			DRM_ERROR("Failed waiting for UVD message (%d)!\n", r);
 			return r;
@@ -649,7 +651,7 @@ static int radeon_uvd_send_msg(struct radeon_device *rdev,
 	r = radeon_ib_schedule(rdev, &ib, NULL);
 	if (r)
 		goto err;
-	ttm_eu_fence_buffer_objects(&ticket, &head, ib.fence);
+	ttm_eu_fence_buffer_objects(&ticket, &head, &ib.fence->base);
 
 	if (fence)
 		*fence = radeon_fence_ref(ib.fence);
diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c
index f4fd72477a71..8bf660e5bcc3 100644
--- a/drivers/gpu/drm/radeon/radeon_vm.c
+++ b/drivers/gpu/drm/radeon/radeon_vm.c
@@ -388,7 +388,7 @@ static int radeon_vm_clear_bo(struct radeon_device *rdev,
 	if (r)
                 goto error;
 
-	ttm_eu_fence_buffer_objects(&ticket, &head, ib.fence);
+	ttm_eu_fence_buffer_objects(&ticket, &head, &ib.fence->base);
 	radeon_ib_free(rdev, &ib);
 
 	return 0;
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index ce0434377223..31c4a6dd722d 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -40,6 +40,7 @@
 #include <linux/file.h>
 #include <linux/module.h>
 #include <linux/atomic.h>
+#include <linux/reservation.h>
 
 #define TTM_ASSERT_LOCKED(param)
 #define TTM_DEBUG(fmt, arg...)
@@ -141,7 +142,6 @@ static void ttm_bo_release_list(struct kref *list_kref)
 	BUG_ON(atomic_read(&bo->list_kref.refcount));
 	BUG_ON(atomic_read(&bo->kref.refcount));
 	BUG_ON(atomic_read(&bo->cpu_writers));
-	BUG_ON(bo->sync_obj != NULL);
 	BUG_ON(bo->mem.mm_node != NULL);
 	BUG_ON(!list_empty(&bo->lru));
 	BUG_ON(!list_empty(&bo->ddestroy));
@@ -402,12 +402,30 @@ static void ttm_bo_cleanup_memtype_use(struct ttm_buffer_object *bo)
 	ww_mutex_unlock (&bo->resv->lock);
 }
 
+static void ttm_bo_flush_all_fences(struct ttm_buffer_object *bo)
+{
+	struct reservation_object_list *fobj;
+	struct fence *fence;
+	int i;
+
+	fobj = reservation_object_get_list(bo->resv);
+	fence = reservation_object_get_excl(bo->resv);
+	if (fence && !fence->ops->signaled)
+		fence_enable_sw_signaling(fence);
+
+	for (i = 0; fobj && i < fobj->shared_count; ++i) {
+		fence = rcu_dereference_protected(fobj->shared[i],
+					reservation_object_held(bo->resv));
+
+		if (!fence->ops->signaled)
+			fence_enable_sw_signaling(fence);
+	}
+}
+
 static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
 {
 	struct ttm_bo_device *bdev = bo->bdev;
 	struct ttm_bo_global *glob = bo->glob;
-	struct ttm_bo_driver *driver = bdev->driver;
-	void *sync_obj = NULL;
 	int put_count;
 	int ret;
 
@@ -415,9 +433,7 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
 	ret = __ttm_bo_reserve(bo, false, true, false, 0);
 
 	if (!ret) {
-		(void) ttm_bo_wait(bo, false, false, true);
-
-		if (!bo->sync_obj) {
+		if (!ttm_bo_wait(bo, false, false, true)) {
 			put_count = ttm_bo_del_from_lru(bo);
 
 			spin_unlock(&glob->lru_lock);
@@ -426,8 +442,8 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
 			ttm_bo_list_ref_sub(bo, put_count, true);
 
 			return;
-		}
-		sync_obj = driver->sync_obj_ref(bo->sync_obj);
+		} else
+			ttm_bo_flush_all_fences(bo);
 
 		/*
 		 * Make NO_EVICT bos immediately available to
@@ -446,14 +462,70 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
 	list_add_tail(&bo->ddestroy, &bdev->ddestroy);
 	spin_unlock(&glob->lru_lock);
 
-	if (sync_obj) {
-		driver->sync_obj_flush(sync_obj);
-		driver->sync_obj_unref(&sync_obj);
-	}
 	schedule_delayed_work(&bdev->wq,
 			      ((HZ / 100) < 1) ? 1 : HZ / 100);
 }
 
+static int ttm_bo_unreserve_and_wait(struct ttm_buffer_object *bo,
+				     bool interruptible)
+{
+	struct ttm_bo_global *glob = bo->glob;
+	struct reservation_object_list *fobj;
+	struct fence *excl = NULL;
+	struct fence **shared = NULL;
+	u32 shared_count = 0, i;
+	int ret = 0;
+
+	fobj = reservation_object_get_list(bo->resv);
+	if (fobj && fobj->shared_count) {
+		shared = kmalloc(sizeof(*shared) * fobj->shared_count,
+				 GFP_KERNEL);
+
+		if (!shared) {
+			ret = -ENOMEM;
+			__ttm_bo_unreserve(bo);
+			spin_unlock(&glob->lru_lock);
+			return ret;
+		}
+
+		for (i = 0; i < fobj->shared_count; ++i) {
+			if (!fence_is_signaled(fobj->shared[i])) {
+				fence_get(fobj->shared[i]);
+				shared[shared_count++] = fobj->shared[i];
+			}
+		}
+		if (!shared_count) {
+			kfree(shared);
+			shared = NULL;
+		}
+	}
+
+	excl = reservation_object_get_excl(bo->resv);
+	if (excl && !fence_is_signaled(excl))
+		fence_get(excl);
+	else
+		excl = NULL;
+
+	__ttm_bo_unreserve(bo);
+	spin_unlock(&glob->lru_lock);
+
+	if (excl) {
+		ret = fence_wait(excl, interruptible);
+		fence_put(excl);
+	}
+
+	if (shared_count > 0) {
+		for (i = 0; i < shared_count; ++i) {
+			if (!ret)
+				ret = fence_wait(shared[i], interruptible);
+			fence_put(shared[i]);
+		}
+		kfree(shared);
+	}
+
+	return ret;
+}
+
 /**
  * function ttm_bo_cleanup_refs_and_unlock
  * If bo idle, remove from delayed- and lru lists, and unref.
@@ -470,8 +542,6 @@ static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
 					  bool interruptible,
 					  bool no_wait_gpu)
 {
-	struct ttm_bo_device *bdev = bo->bdev;
-	struct ttm_bo_driver *driver = bdev->driver;
 	struct ttm_bo_global *glob = bo->glob;
 	int put_count;
 	int ret;
@@ -479,20 +549,7 @@ static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
 	ret = ttm_bo_wait(bo, false, false, true);
 
 	if (ret && !no_wait_gpu) {
-		void *sync_obj;
-
-		/*
-		 * Take a reference to the fence and unreserve,
-		 * at this point the buffer should be dead, so
-		 * no new sync objects can be attached.
-		 */
-		sync_obj = driver->sync_obj_ref(bo->sync_obj);
-
-		__ttm_bo_unreserve(bo);
-		spin_unlock(&glob->lru_lock);
-
-		ret = driver->sync_obj_wait(sync_obj, false, interruptible);
-		driver->sync_obj_unref(&sync_obj);
+		ret = ttm_bo_unreserve_and_wait(bo, interruptible);
 		if (ret)
 			return ret;
 
@@ -1513,41 +1570,51 @@ void ttm_bo_unmap_virtual(struct ttm_buffer_object *bo)
 
 EXPORT_SYMBOL(ttm_bo_unmap_virtual);
 
-
 int ttm_bo_wait(struct ttm_buffer_object *bo,
 		bool lazy, bool interruptible, bool no_wait)
 {
-	struct ttm_bo_driver *driver = bo->bdev->driver;
-	void *sync_obj;
-	int ret = 0;
-
-	lockdep_assert_held(&bo->resv->lock.base);
+	struct reservation_object_list *fobj;
+	struct reservation_object *resv;
+	struct fence *excl;
+	long timeout = 15 * HZ;
+	int i;
 
-	if (likely(bo->sync_obj == NULL))
-		return 0;
+	resv = bo->resv;
+	fobj = reservation_object_get_list(resv);
+	excl = reservation_object_get_excl(resv);
+	if (excl) {
+		if (!fence_is_signaled(excl)) {
+			if (no_wait)
+				return -EBUSY;
 
-	if (bo->sync_obj) {
-		if (driver->sync_obj_signaled(bo->sync_obj)) {
-			driver->sync_obj_unref(&bo->sync_obj);
-			clear_bit(TTM_BO_PRIV_FLAG_MOVING, &bo->priv_flags);
-			return 0;
+			timeout = fence_wait_timeout(excl,
+						     interruptible, timeout);
 		}
+	}
 
-		if (no_wait)
-			return -EBUSY;
+	for (i = 0; fobj && timeout > 0 && i < fobj->shared_count; ++i) {
+		struct fence *fence;
+		fence = rcu_dereference_protected(fobj->shared[i],
+						reservation_object_held(resv));
 
-		sync_obj = driver->sync_obj_ref(bo->sync_obj);
-		ret = driver->sync_obj_wait(sync_obj,
-					    lazy, interruptible);
+		if (!fence_is_signaled(fence)) {
+			if (no_wait)
+				return -EBUSY;
 
-		if (likely(ret == 0)) {
-			clear_bit(TTM_BO_PRIV_FLAG_MOVING,
-				  &bo->priv_flags);
-			driver->sync_obj_unref(&bo->sync_obj);
+			timeout = fence_wait_timeout(fence,
+						     interruptible, timeout);
 		}
-		driver->sync_obj_unref(&sync_obj);
 	}
-	return ret;
+
+	if (timeout < 0)
+		return timeout;
+
+	if (timeout == 0)
+		return -EBUSY;
+
+	reservation_object_add_excl_fence(resv, NULL);
+	clear_bit(TTM_BO_PRIV_FLAG_MOVING, &bo->priv_flags);
+	return 0;
 }
 EXPORT_SYMBOL(ttm_bo_wait);
 
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 23db594e55c0..fe806c1ded9e 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -37,6 +37,7 @@
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 #include <linux/module.h>
+#include <linux/reservation.h>
 
 void ttm_bo_free_old_node(struct ttm_buffer_object *bo)
 {
@@ -444,8 +445,6 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
 				      struct ttm_buffer_object **new_obj)
 {
 	struct ttm_buffer_object *fbo;
-	struct ttm_bo_device *bdev = bo->bdev;
-	struct ttm_bo_driver *driver = bdev->driver;
 	int ret;
 
 	fbo = kmalloc(sizeof(*fbo), GFP_KERNEL);
@@ -466,10 +465,6 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
 	drm_vma_node_reset(&fbo->vma_node);
 	atomic_set(&fbo->cpu_writers, 0);
 
-	if (bo->sync_obj)
-		fbo->sync_obj = driver->sync_obj_ref(bo->sync_obj);
-	else
-		fbo->sync_obj = NULL;
 	kref_init(&fbo->list_kref);
 	kref_init(&fbo->kref);
 	fbo->destroy = &ttm_transfered_destroy;
@@ -642,28 +637,20 @@ void ttm_bo_kunmap(struct ttm_bo_kmap_obj *map)
 EXPORT_SYMBOL(ttm_bo_kunmap);
 
 int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
-			      void *sync_obj,
+			      struct fence *fence,
 			      bool evict,
 			      bool no_wait_gpu,
 			      struct ttm_mem_reg *new_mem)
 {
 	struct ttm_bo_device *bdev = bo->bdev;
-	struct ttm_bo_driver *driver = bdev->driver;
 	struct ttm_mem_type_manager *man = &bdev->man[new_mem->mem_type];
 	struct ttm_mem_reg *old_mem = &bo->mem;
 	int ret;
 	struct ttm_buffer_object *ghost_obj;
-	void *tmp_obj = NULL;
 
-	if (bo->sync_obj) {
-		tmp_obj = bo->sync_obj;
-		bo->sync_obj = NULL;
-	}
-	bo->sync_obj = driver->sync_obj_ref(sync_obj);
+	reservation_object_add_excl_fence(bo->resv, fence);
 	if (evict) {
 		ret = ttm_bo_wait(bo, false, false, false);
-		if (tmp_obj)
-			driver->sync_obj_unref(&tmp_obj);
 		if (ret)
 			return ret;
 
@@ -684,13 +671,13 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
 		 */
 
 		set_bit(TTM_BO_PRIV_FLAG_MOVING, &bo->priv_flags);
-		if (tmp_obj)
-			driver->sync_obj_unref(&tmp_obj);
 
 		ret = ttm_buffer_object_transfer(bo, &ghost_obj);
 		if (ret)
 			return ret;
 
+		reservation_object_add_excl_fence(ghost_obj->resv, fence);
+
 		/**
 		 * If we're not moving to fixed memory, the TTM object
 		 * needs to stay alive. Otherwhise hang it on the ghost
diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
index 108730e9147b..adafc0f8ec06 100644
--- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
+++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
@@ -163,7 +163,7 @@ int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
 EXPORT_SYMBOL(ttm_eu_reserve_buffers);
 
 void ttm_eu_fence_buffer_objects(struct ww_acquire_ctx *ticket,
-				 struct list_head *list, void *sync_obj)
+				 struct list_head *list, struct fence *fence)
 {
 	struct ttm_validate_buffer *entry;
 	struct ttm_buffer_object *bo;
@@ -183,18 +183,12 @@ void ttm_eu_fence_buffer_objects(struct ww_acquire_ctx *ticket,
 
 	list_for_each_entry(entry, list, head) {
 		bo = entry->bo;
-		entry->old_sync_obj = bo->sync_obj;
-		bo->sync_obj = driver->sync_obj_ref(sync_obj);
+		reservation_object_add_excl_fence(bo->resv, fence);
 		ttm_bo_add_to_lru(bo);
 		__ttm_bo_unreserve(bo);
 	}
 	spin_unlock(&glob->lru_lock);
 	if (ticket)
 		ww_acquire_fini(ticket);
-
-	list_for_each_entry(entry, list, head) {
-		if (entry->old_sync_obj)
-			driver->sync_obj_unref(&entry->old_sync_obj);
-	}
 }
 EXPORT_SYMBOL(ttm_eu_fence_buffer_objects);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
index f15718cc631d..656c88485e14 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
@@ -768,41 +768,6 @@ static int vmw_ttm_fault_reserve_notify(struct ttm_buffer_object *bo)
 }
 
 /**
- * FIXME: We're using the old vmware polling method to sync.
- * Do this with fences instead.
- */
-
-static void *vmw_sync_obj_ref(void *sync_obj)
-{
-
-	return (void *)
-		vmw_fence_obj_reference((struct vmw_fence_obj *) sync_obj);
-}
-
-static void vmw_sync_obj_unref(void **sync_obj)
-{
-	vmw_fence_obj_unreference((struct vmw_fence_obj **) sync_obj);
-}
-
-static int vmw_sync_obj_flush(void *sync_obj)
-{
-	vmw_fence_obj_flush((struct vmw_fence_obj *) sync_obj);
-	return 0;
-}
-
-static bool vmw_sync_obj_signaled(void *sync_obj)
-{
-	return vmw_fence_obj_signaled((struct vmw_fence_obj *) sync_obj);
-}
-
-static int vmw_sync_obj_wait(void *sync_obj, bool lazy, bool interruptible)
-{
-	return vmw_fence_obj_wait((struct vmw_fence_obj *) sync_obj,
-				  lazy, interruptible,
-				  VMW_FENCE_WAIT_TIMEOUT);
-}
-
-/**
  * vmw_move_notify - TTM move_notify_callback
  *
  * @bo:             The TTM buffer object about to move.
@@ -839,11 +804,6 @@ struct ttm_bo_driver vmw_bo_driver = {
 	.evict_flags = vmw_evict_flags,
 	.move = NULL,
 	.verify_access = vmw_verify_access,
-	.sync_obj_signaled = vmw_sync_obj_signaled,
-	.sync_obj_wait = vmw_sync_obj_wait,
-	.sync_obj_flush = vmw_sync_obj_flush,
-	.sync_obj_unref = vmw_sync_obj_unref,
-	.sync_obj_ref = vmw_sync_obj_ref,
 	.move_notify = vmw_move_notify,
 	.swap_notify = vmw_swap_notify,
 	.fault_reserve_notify = &vmw_ttm_fault_reserve_notify,
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
index 6688a6341486..20a1a866ceeb 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
@@ -1419,22 +1419,16 @@ void vmw_fence_single_bo(struct ttm_buffer_object *bo,
 			 struct vmw_fence_obj *fence)
 {
 	struct ttm_bo_device *bdev = bo->bdev;
-	struct vmw_fence_obj *old_fence_obj;
+
 	struct vmw_private *dev_priv =
 		container_of(bdev, struct vmw_private, bdev);
 
 	if (fence == NULL) {
 		vmw_execbuf_fence_commands(NULL, dev_priv, &fence, NULL);
+		reservation_object_add_excl_fence(bo->resv, &fence->base);
+		fence_put(&fence->base);
 	} else
-		vmw_fence_obj_reference(fence);
-
-	reservation_object_add_excl_fence(bo->resv, &fence->base);
-
-	old_fence_obj = bo->sync_obj;
-	bo->sync_obj = fence;
-
-	if (old_fence_obj)
-		vmw_fence_obj_unreference(&old_fence_obj);
+		reservation_object_add_excl_fence(bo->resv, &fence->base);
 }
 
 /**
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
index f34d59b67218..cdc75814a861 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -163,7 +163,6 @@ struct ttm_tt;
  * @lru: List head for the lru list.
  * @ddestroy: List head for the delayed destroy list.
  * @swap: List head for swap LRU list.
- * @sync_obj: Pointer to a synchronization object.
  * @priv_flags: Flags describing buffer object internal state.
  * @vma_node: Address space manager node.
  * @offset: The current GPU offset, which can have different meanings
@@ -230,7 +229,6 @@ struct ttm_buffer_object {
 	 * Members protected by a bo reservation.
 	 */
 
-	void *sync_obj;
 	unsigned long priv_flags;
 
 	struct drm_vma_offset_node vma_node;
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 0aa6caa59415..71a345ee92d5 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -309,11 +309,6 @@ struct ttm_mem_type_manager {
  * @move: Callback for a driver to hook in accelerated functions to
  * move a buffer.
  * If set to NULL, a potentially slow memcpy() move is used.
- * @sync_obj_signaled: See ttm_fence_api.h
- * @sync_obj_wait: See ttm_fence_api.h
- * @sync_obj_flush: See ttm_fence_api.h
- * @sync_obj_unref: See ttm_fence_api.h
- * @sync_obj_ref: See ttm_fence_api.h
  */
 
 struct ttm_bo_driver {
@@ -415,23 +410,6 @@ struct ttm_bo_driver {
 	int (*verify_access) (struct ttm_buffer_object *bo,
 			      struct file *filp);
 
-	/**
-	 * In case a driver writer dislikes the TTM fence objects,
-	 * the driver writer can replace those with sync objects of
-	 * his / her own. If it turns out that no driver writer is
-	 * using these. I suggest we remove these hooks and plug in
-	 * fences directly. The bo driver needs the following functionality:
-	 * See the corresponding functions in the fence object API
-	 * documentation.
-	 */
-
-	bool (*sync_obj_signaled) (void *sync_obj);
-	int (*sync_obj_wait) (void *sync_obj,
-			      bool lazy, bool interruptible);
-	int (*sync_obj_flush) (void *sync_obj);
-	void (*sync_obj_unref) (void **sync_obj);
-	void *(*sync_obj_ref) (void *sync_obj);
-
 	/* hook to notify driver about a driver move so it
 	 * can do tiling things */
 	void (*move_notify)(struct ttm_buffer_object *bo,
@@ -1031,7 +1009,7 @@ extern void ttm_bo_free_old_node(struct ttm_buffer_object *bo);
  * ttm_bo_move_accel_cleanup.
  *
  * @bo: A pointer to a struct ttm_buffer_object.
- * @sync_obj: A sync object that signals when moving is complete.
+ * @fence: A fence object that signals when moving is complete.
  * @evict: This is an evict move. Don't return until the buffer is idle.
  * @no_wait_gpu: Return immediately if the GPU is busy.
  * @new_mem: struct ttm_mem_reg indicating where to move.
@@ -1045,7 +1023,7 @@ extern void ttm_bo_free_old_node(struct ttm_buffer_object *bo);
  */
 
 extern int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
-				     void *sync_obj,
+				     struct fence *fence,
 				     bool evict, bool no_wait_gpu,
 				     struct ttm_mem_reg *new_mem);
 /**
diff --git a/include/drm/ttm/ttm_execbuf_util.h b/include/drm/ttm/ttm_execbuf_util.h
index 8490cb8ee0d8..ff11a424f752 100644
--- a/include/drm/ttm/ttm_execbuf_util.h
+++ b/include/drm/ttm/ttm_execbuf_util.h
@@ -39,16 +39,11 @@
  *
  * @head:           list head for thread-private list.
  * @bo:             refcounted buffer object pointer.
- * @reserved:       Indicates whether @bo has been reserved for validation.
- * @removed:        Indicates whether @bo has been removed from lru lists.
- * @put_count:      Number of outstanding references on bo::list_kref.
- * @old_sync_obj:   Pointer to a sync object about to be unreferenced
  */
 
 struct ttm_validate_buffer {
 	struct list_head head;
 	struct ttm_buffer_object *bo;
-	void *old_sync_obj;
 };
 
 /**
@@ -100,7 +95,7 @@ extern int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
  *
  * @ticket:      ww_acquire_ctx from reserve call
  * @list:        thread private list of ttm_validate_buffer structs.
- * @sync_obj:    The new sync object for the buffers.
+ * @fence:       The new exclusive fence for the buffers.
  *
  * This function should be called when command submission is complete, and
  * it will add a new sync object to bos pointed to by entries on @list.
@@ -109,6 +104,7 @@ extern int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
  */
 
 extern void ttm_eu_fence_buffer_objects(struct ww_acquire_ctx *ticket,
-					struct list_head *list, void *sync_obj);
+					struct list_head *list,
+					struct fence *fence);
 
 #endif


* [RFC PATCH v1 13/16] drm/nouveau: use rcu in nouveau_gem_ioctl_cpu_prep
  2014-05-14 14:57 [RFC PATCH v1 00/16] Convert all ttm drivers to use the new reservation interface Maarten Lankhorst
                   ` (11 preceding siblings ...)
  2014-05-14 14:58 ` [RFC PATCH v1 12/16] drm/ttm: flip the switch, and convert to dma_fence Maarten Lankhorst
@ 2014-05-14 14:58 ` Maarten Lankhorst
  2014-05-14 14:58 ` [RFC PATCH v1 14/16] drm/radeon: use rcu waits in some ioctls Maarten Lankhorst
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 50+ messages in thread
From: Maarten Lankhorst @ 2014-05-14 14:58 UTC (permalink / raw)
  To: airlied; +Cc: nouveau, linux-kernel, dri-devel

Now that every driver tracks its fences in the reservation object, it should be safe to test and wait on them under RCU, without reserving the buffer first.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---
 drivers/gpu/drm/nouveau/nouveau_gem.c |   28 ++++++++++++----------------
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 4beaa897adad..c2ca894f6507 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -863,33 +863,29 @@ nouveau_gem_ioctl_cpu_prep(struct drm_device *dev, void *data,
 	struct drm_gem_object *gem;
 	struct nouveau_bo *nvbo;
 	bool no_wait = !!(req->flags & NOUVEAU_GEM_CPU_PREP_NOWAIT);
+	bool write = !!(req->flags & NOUVEAU_GEM_CPU_PREP_WRITE);
 	int ret;
-	struct nouveau_fence *fence = NULL;
 
 	gem = drm_gem_object_lookup(dev, file_priv, req->handle);
 	if (!gem)
 		return -ENOENT;
 	nvbo = nouveau_gem_object(gem);
 
-	ret = ttm_bo_reserve(&nvbo->bo, true, false, false, 0);
-	if (!ret) {
-		ret = ttm_bo_wait(&nvbo->bo, true, true, true);
-		if (!no_wait && ret) {
-			struct fence *excl;
-
-			excl = reservation_object_get_excl(nvbo->bo.resv);
-			fence = nouveau_fence_ref((struct nouveau_fence *)excl);
-		}
+	if (no_wait)
+		ret = reservation_object_test_signaled_rcu(nvbo->bo.resv, write) ? 0 : -EBUSY;
+	else {
+		long lret;
 
-		ttm_bo_unreserve(&nvbo->bo);
+		lret = reservation_object_wait_timeout_rcu(nvbo->bo.resv, write, true, 30 * HZ);
+		if (!lret)
+			ret = -EBUSY;
+		else if (lret > 0)
+			ret = 0;
+		else
+			ret = lret;
 	}
 	drm_gem_object_unreference_unlocked(gem);
 
-	if (fence) {
-		ret = nouveau_fence_wait(fence, true, no_wait);
-		nouveau_fence_unref(&fence);
-	}
-
 	return ret;
 }
 


* [RFC PATCH v1 14/16] drm/radeon: use rcu waits in some ioctls
  2014-05-14 14:57 [RFC PATCH v1 00/16] Convert all ttm drivers to use the new reservation interface Maarten Lankhorst
                   ` (12 preceding siblings ...)
  2014-05-14 14:58 ` [RFC PATCH v1 13/16] drm/nouveau: use rcu in nouveau_gem_ioctl_cpu_prep Maarten Lankhorst
@ 2014-05-14 14:58 ` Maarten Lankhorst
  2014-05-14 14:58 ` [RFC PATCH v1 15/16] drm/vmwgfx: use rcu in vmw_user_dmabuf_synccpu_grab Maarten Lankhorst
  2014-05-14 14:58 ` [RFC PATCH v1 16/16] drm/ttm: use rcu in core ttm Maarten Lankhorst
  15 siblings, 0 replies; 50+ messages in thread
From: Maarten Lankhorst @ 2014-05-14 14:58 UTC (permalink / raw)
  To: airlied; +Cc: nouveau, linux-kernel, dri-devel

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---
 drivers/gpu/drm/radeon/radeon_gem.c |   19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
index d09650c1d720..7ba883843668 100644
--- a/drivers/gpu/drm/radeon/radeon_gem.c
+++ b/drivers/gpu/drm/radeon/radeon_gem.c
@@ -107,9 +107,12 @@ static int radeon_gem_set_domain(struct drm_gem_object *gobj,
 	}
 	if (domain == RADEON_GEM_DOMAIN_CPU) {
 		/* Asking for cpu access wait for object idle */
-		r = radeon_bo_wait(robj, NULL, false);
-		if (r) {
-			printk(KERN_ERR "Failed to wait for object !\n");
+		r = reservation_object_wait_timeout_rcu(robj->tbo.resv, true, true, 30 * HZ);
+		if (!r)
+			r = -EBUSY;
+
+		if (r < 0 && r != -EINTR) {
+			printk(KERN_ERR "Failed to wait for object: %i\n", r);
 			return r;
 		}
 	}
@@ -357,14 +360,20 @@ int radeon_gem_wait_idle_ioctl(struct drm_device *dev, void *data,
 	struct drm_radeon_gem_wait_idle *args = data;
 	struct drm_gem_object *gobj;
 	struct radeon_bo *robj;
-	int r;
+	int r = 0;
+	long ret;
 
 	gobj = drm_gem_object_lookup(dev, filp, args->handle);
 	if (gobj == NULL) {
 		return -ENOENT;
 	}
 	robj = gem_to_radeon_bo(gobj);
-	r = radeon_bo_wait(robj, NULL, false);
+	ret = reservation_object_wait_timeout_rcu(robj->tbo.resv, true, true, 30 * HZ);
+	if (ret == 0)
+		r = -EBUSY;
+	else if (ret < 0)
+		r = ret;
+
 	/* callback hw specific functions if any */
 	if (rdev->asic->ioctl_wait_idle)
 		robj->rdev->asic->ioctl_wait_idle(rdev, robj);


* [RFC PATCH v1 15/16] drm/vmwgfx: use rcu in vmw_user_dmabuf_synccpu_grab
  2014-05-14 14:57 [RFC PATCH v1 00/16] Convert all ttm drivers to use the new reservation interface Maarten Lankhorst
                   ` (13 preceding siblings ...)
  2014-05-14 14:58 ` [RFC PATCH v1 14/16] drm/radeon: use rcu waits in some ioctls Maarten Lankhorst
@ 2014-05-14 14:58 ` Maarten Lankhorst
  2014-05-14 14:58 ` [RFC PATCH v1 16/16] drm/ttm: use rcu in core ttm Maarten Lankhorst
  15 siblings, 0 replies; 50+ messages in thread
From: Maarten Lankhorst @ 2014-05-14 14:58 UTC (permalink / raw)
  To: airlied; +Cc: nouveau, linux-kernel, dri-devel

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c |   17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
index 20a1a866ceeb..79e950df3018 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
@@ -567,13 +567,16 @@ static int vmw_user_dmabuf_synccpu_grab(struct vmw_user_dma_buffer *user_bo,
 	int ret;
 
 	if (flags & drm_vmw_synccpu_allow_cs) {
-		ret = ttm_bo_reserve(bo, true, !!(flags & drm_vmw_synccpu_dontblock), false, 0);
-		if (!ret) {
-			ret = ttm_bo_wait(bo, false, true,
-					  !!(flags & drm_vmw_synccpu_dontblock));
-			ttm_bo_unreserve(bo);
-		}
-		return ret;
+		long lret;
+		if (flags & drm_vmw_synccpu_dontblock)
+			return reservation_object_test_signaled_rcu(bo->resv, true) ? 0 : -EBUSY;
+
+		lret = reservation_object_wait_timeout_rcu(bo->resv, true, true, MAX_SCHEDULE_TIMEOUT);
+		if (!lret)
+			return -EBUSY;
+		else if (lret < 0)
+			return lret;
+		return 0;
 	}
 
 	ret = ttm_bo_synccpu_write_grab


* [RFC PATCH v1 16/16] drm/ttm: use rcu in core ttm
  2014-05-14 14:57 [RFC PATCH v1 00/16] Convert all ttm drivers to use the new reservation interface Maarten Lankhorst
                   ` (14 preceding siblings ...)
  2014-05-14 14:58 ` [RFC PATCH v1 15/16] drm/vmwgfx: use rcu in vmw_user_dmabuf_synccpu_grab Maarten Lankhorst
@ 2014-05-14 14:58 ` Maarten Lankhorst
  15 siblings, 0 replies; 50+ messages in thread
From: Maarten Lankhorst @ 2014-05-14 14:58 UTC (permalink / raw)
  To: airlied; +Cc: nouveau, linux-kernel, dri-devel

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c |   76 +++++++-----------------------------------
 1 file changed, 13 insertions(+), 63 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 31c4a6dd722d..6fe1f4bf37ed 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -466,66 +466,6 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
 			      ((HZ / 100) < 1) ? 1 : HZ / 100);
 }
 
-static int ttm_bo_unreserve_and_wait(struct ttm_buffer_object *bo,
-				     bool interruptible)
-{
-	struct ttm_bo_global *glob = bo->glob;
-	struct reservation_object_list *fobj;
-	struct fence *excl = NULL;
-	struct fence **shared = NULL;
-	u32 shared_count = 0, i;
-	int ret = 0;
-
-	fobj = reservation_object_get_list(bo->resv);
-	if (fobj && fobj->shared_count) {
-		shared = kmalloc(sizeof(*shared) * fobj->shared_count,
-				 GFP_KERNEL);
-
-		if (!shared) {
-			ret = -ENOMEM;
-			__ttm_bo_unreserve(bo);
-			spin_unlock(&glob->lru_lock);
-			return ret;
-		}
-
-		for (i = 0; i < fobj->shared_count; ++i) {
-			if (!fence_is_signaled(fobj->shared[i])) {
-				fence_get(fobj->shared[i]);
-				shared[shared_count++] = fobj->shared[i];
-			}
-		}
-		if (!shared_count) {
-			kfree(shared);
-			shared = NULL;
-		}
-	}
-
-	excl = reservation_object_get_excl(bo->resv);
-	if (excl && !fence_is_signaled(excl))
-		fence_get(excl);
-	else
-		excl = NULL;
-
-	__ttm_bo_unreserve(bo);
-	spin_unlock(&glob->lru_lock);
-
-	if (excl) {
-		ret = fence_wait(excl, interruptible);
-		fence_put(excl);
-	}
-
-	if (shared_count > 0) {
-		for (i = 0; i < shared_count; ++i) {
-			if (!ret)
-				ret = fence_wait(shared[i], interruptible);
-			fence_put(shared[i]);
-		}
-		kfree(shared);
-	}
-
-	return ret;
-}
-
 /**
  * function ttm_bo_cleanup_refs_and_unlock
  * If bo idle, remove from delayed- and lru lists, and unref.
@@ -549,9 +489,19 @@ static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
 	ret = ttm_bo_wait(bo, false, false, true);
 
 	if (ret && !no_wait_gpu) {
-		ret = ttm_bo_unreserve_and_wait(bo, interruptible);
-		if (ret)
-			return ret;
+		long lret;
+		ww_mutex_unlock(&bo->resv->lock);
+		spin_unlock(&glob->lru_lock);
+
+		lret = reservation_object_wait_timeout_rcu(bo->resv,
+							   true,
+							   interruptible,
+							   30 * HZ);
+
+		if (lret < 0)
+			return lret;
+		else if (lret == 0)
+			return -EBUSY;
 
 		spin_lock(&glob->lru_lock);
 		ret = __ttm_bo_reserve(bo, false, true, false, 0);



* Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences
  2014-05-14 14:58 ` [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences Maarten Lankhorst
@ 2014-05-14 15:29   ` Christian König
  2014-05-15  1:06       ` Maarten Lankhorst
  0 siblings, 1 reply; 50+ messages in thread
From: Christian König @ 2014-05-14 15:29 UTC (permalink / raw)
  To: Maarten Lankhorst, airlied; +Cc: nouveau, linux-kernel, dri-devel

> +	/* did fence get signaled after we enabled the sw irq? */
> +	if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
> +		radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
> +		return false;
> +	}
> +
> +	fence->fence_wake.flags = 0;
> +	fence->fence_wake.private = NULL;
> +	fence->fence_wake.func = radeon_fence_check_signaled;
> +	__add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
> +	fence_get(f);
That looks like a race condition to me. The fence needs to be added to 
the wait queue before the check, not after.

Apart from that the whole approach looks like a really bad idea to me. 
How for example is lockup detection supposed to happen with this?

Christian.

On 14.05.2014 16:58, Maarten Lankhorst wrote:
> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> ---
>   drivers/gpu/drm/radeon/radeon.h        |   15 +--
>   drivers/gpu/drm/radeon/radeon_device.c |    1
>   drivers/gpu/drm/radeon/radeon_fence.c  |  189 +++++++++++++++++++++++++-------
>   3 files changed, 153 insertions(+), 52 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index 68528619834a..a7d839a158ae 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -64,6 +64,7 @@
>   #include <linux/wait.h>
>   #include <linux/list.h>
>   #include <linux/kref.h>
> +#include <linux/fence.h>
>   
>   #include <ttm/ttm_bo_api.h>
>   #include <ttm/ttm_bo_driver.h>
> @@ -113,9 +114,6 @@ extern int radeon_hard_reset;
>   #define RADEONFB_CONN_LIMIT			4
>   #define RADEON_BIOS_NUM_SCRATCH			8
>   
> -/* fence seq are set to this number when signaled */
> -#define RADEON_FENCE_SIGNALED_SEQ		0LL
> -
>   /* internal ring indices */
>   /* r1xx+ has gfx CP ring */
>   #define RADEON_RING_TYPE_GFX_INDEX		0
> @@ -347,12 +345,15 @@ struct radeon_fence_driver {
>   };
>   
>   struct radeon_fence {
> +	struct fence base;
> +
>   	struct radeon_device		*rdev;
> -	struct kref			kref;
>   	/* protected by radeon_fence.lock */
>   	uint64_t			seq;
>   	/* RB, DMA, etc. */
>   	unsigned			ring;
> +
> +	wait_queue_t fence_wake;
>   };
>   
>   int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);
> @@ -2256,6 +2257,7 @@ struct radeon_device {
>   	struct radeon_mman		mman;
>   	struct radeon_fence_driver	fence_drv[RADEON_NUM_RINGS];
>   	wait_queue_head_t		fence_queue;
> +	unsigned			fence_context;
>   	struct mutex			ring_lock;
>   	struct radeon_ring		ring[RADEON_NUM_RINGS];
>   	bool				ib_pool_ready;
> @@ -2346,11 +2348,6 @@ u32 cik_mm_rdoorbell(struct radeon_device *rdev, u32 index);
>   void cik_mm_wdoorbell(struct radeon_device *rdev, u32 index, u32 v);
>   
>   /*
> - * Cast helper
> - */
> -#define to_radeon_fence(p) ((struct radeon_fence *)(p))
> -
> -/*
>    * Registers read & write functions.
>    */
>   #define RREG8(reg) readb((rdev->rmmio) + (reg))
> diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
> index 0e770bbf7e29..501d0cf9eb8b 100644
> --- a/drivers/gpu/drm/radeon/radeon_device.c
> +++ b/drivers/gpu/drm/radeon/radeon_device.c
> @@ -1175,6 +1175,7 @@ int radeon_device_init(struct radeon_device *rdev,
>   	for (i = 0; i < RADEON_NUM_RINGS; i++) {
>   		rdev->ring[i].idx = i;
>   	}
> +	rdev->fence_context = fence_context_alloc(RADEON_NUM_RINGS);
>   
>   	DRM_INFO("initializing kernel modesetting (%s 0x%04X:0x%04X 0x%04X:0x%04X).\n",
>   		radeon_family_name[rdev->family], pdev->vendor, pdev->device,
> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
> index a77b1c13ea43..bc844f300d3f 100644
> --- a/drivers/gpu/drm/radeon/radeon_fence.c
> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
> @@ -39,6 +39,15 @@
>   #include "radeon.h"
>   #include "radeon_trace.h"
>   
> +static const struct fence_ops radeon_fence_ops;
> +
> +#define to_radeon_fence(p) \
> +	({								\
> +		struct radeon_fence *__f;				\
> +		__f = container_of((p), struct radeon_fence, base);	\
> +		__f->base.ops == &radeon_fence_ops ? __f : NULL;	\
> +	})
> +
>   /*
>    * Fences
>    * Fences mark an event in the GPUs pipeline and are used
> @@ -111,30 +120,55 @@ int radeon_fence_emit(struct radeon_device *rdev,
>   		      struct radeon_fence **fence,
>   		      int ring)
>   {
> +	u64 seq = ++rdev->fence_drv[ring].sync_seq[ring];
> +
>   	/* we are protected by the ring emission mutex */
>   	*fence = kmalloc(sizeof(struct radeon_fence), GFP_KERNEL);
>   	if ((*fence) == NULL) {
>   		return -ENOMEM;
>   	}
> -	kref_init(&((*fence)->kref));
> -	(*fence)->rdev = rdev;
> -	(*fence)->seq = ++rdev->fence_drv[ring].sync_seq[ring];
>   	(*fence)->ring = ring;
> +	__fence_init(&(*fence)->base, &radeon_fence_ops,
> +		     &rdev->fence_queue.lock, rdev->fence_context + ring, seq);
> +	(*fence)->rdev = rdev;
> +	(*fence)->seq = seq;
>   	radeon_fence_ring_emit(rdev, ring, *fence);
>   	trace_radeon_fence_emit(rdev->ddev, ring, (*fence)->seq);
>   	return 0;
>   }
>   
>   /**
> - * radeon_fence_process - process a fence
> + * radeon_fence_check_signaled - callback from fence_queue
>    *
> - * @rdev: radeon_device pointer
> - * @ring: ring index the fence is associated with
> - *
> - * Checks the current fence value and wakes the fence queue
> - * if the sequence number has increased (all asics).
> + * this function is called with fence_queue lock held, which is also used
> + * for the fence locking itself, so unlocked variants are used for
> + * fence_signal, and remove_wait_queue.
>    */
> -void radeon_fence_process(struct radeon_device *rdev, int ring)
> +static int radeon_fence_check_signaled(wait_queue_t *wait, unsigned mode, int flags, void *key)
> +{
> +	struct radeon_fence *fence;
> +	u64 seq;
> +
> +	fence = container_of(wait, struct radeon_fence, fence_wake);
> +
> +	seq = atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq);
> +	if (seq >= fence->seq) {
> +		int ret = __fence_signal(&fence->base);
> +
> +		if (!ret)
> +			FENCE_TRACE(&fence->base, "signaled from irq context\n");
> +		else
> +			FENCE_TRACE(&fence->base, "was already signaled\n");
> +
> +		radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
> +		__remove_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
> +		fence_put(&fence->base);
> +	} else
> +		FENCE_TRACE(&fence->base, "pending\n");
> +	return 0;
> +}
> +
> +static bool __radeon_fence_process(struct radeon_device *rdev, int ring)
>   {
>   	uint64_t seq, last_seq, last_emitted;
>   	unsigned count_loop = 0;
> @@ -190,23 +224,22 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
>   		}
>   	} while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
>   
> -	if (wake)
> -		wake_up_all(&rdev->fence_queue);
> +	return wake;
>   }
>   
>   /**
> - * radeon_fence_destroy - destroy a fence
> + * radeon_fence_process - process a fence
>    *
> - * @kref: fence kref
> + * @rdev: radeon_device pointer
> + * @ring: ring index the fence is associated with
>    *
> - * Frees the fence object (all asics).
> + * Checks the current fence value and wakes the fence queue
> + * if the sequence number has increased (all asics).
>    */
> -static void radeon_fence_destroy(struct kref *kref)
> +void radeon_fence_process(struct radeon_device *rdev, int ring)
>   {
> -	struct radeon_fence *fence;
> -
> -	fence = container_of(kref, struct radeon_fence, kref);
> -	kfree(fence);
> +	if (__radeon_fence_process(rdev, ring))
> +		wake_up_all(&rdev->fence_queue);
>   }
>   
>   /**
> @@ -237,6 +270,49 @@ static bool radeon_fence_seq_signaled(struct radeon_device *rdev,
>   	return false;
>   }
>   
> +static bool __radeon_fence_signaled(struct fence *f)
> +{
> +	struct radeon_fence *fence = to_radeon_fence(f);
> +
> +	return radeon_fence_seq_signaled(fence->rdev, fence->seq, fence->ring);
> +}
> +
> +/**
> + * radeon_fence_enable_signaling - enable signalling on fence
> + * @fence: fence
> + *
> + * This function is called with fence_queue lock held, and adds a callback
> + * to fence_queue that checks if this fence is signaled, and if so it
> + * signals the fence and removes itself.
> + */
> +static bool radeon_fence_enable_signaling(struct fence *f)
> +{
> +	struct radeon_fence *fence = to_radeon_fence(f);
> +
> +	if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
> +	    !fence->rdev->ddev->irq_enabled)
> +		return false;
> +
> +	radeon_irq_kms_sw_irq_get(fence->rdev, fence->ring);
> +
> +	if (__radeon_fence_process(fence->rdev, fence->ring))
> +		wake_up_all_locked(&fence->rdev->fence_queue);
> +
> +	/* did fence get signaled after we enabled the sw irq? */
> +	if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
> +		radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
> +		return false;
> +	}
> +
> +	fence->fence_wake.flags = 0;
> +	fence->fence_wake.private = NULL;
> +	fence->fence_wake.func = radeon_fence_check_signaled;
> +	__add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
> +	fence_get(f);
> +
> +	return true;
> +}
> +
>   /**
>    * radeon_fence_signaled - check if a fence has signaled
>    *
> @@ -250,11 +326,13 @@ bool radeon_fence_signaled(struct radeon_fence *fence)
>   	if (!fence) {
>   		return true;
>   	}
> -	if (fence->seq == RADEON_FENCE_SIGNALED_SEQ) {
> -		return true;
> -	}
> +
>   	if (radeon_fence_seq_signaled(fence->rdev, fence->seq, fence->ring)) {
> -		fence->seq = RADEON_FENCE_SIGNALED_SEQ;
> +		int ret;
> +
> +		ret = fence_signal(&fence->base);
> +		if (!ret)
> +			FENCE_TRACE(&fence->base, "signaled from radeon_fence_signaled\n");
>   		return true;
>   	}
>   	return false;
> @@ -386,7 +464,7 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
>    * radeon_fence_wait - wait for a fence to signal
>    *
>    * @fence: radeon fence object
> - * @intr: use interruptable sleep
> + * @intr: use interruptible sleep
>    *
>    * Wait for the requested fence to signal (all asics).
>    * @intr selects whether to use interruptable (true) or non-interruptable
> @@ -398,20 +476,17 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
>   	uint64_t seq[RADEON_NUM_RINGS] = {};
>   	int r;
>   
> -	if (fence == NULL) {
> -		WARN(1, "Querying an invalid fence : %p !\n", fence);
> -		return -EINVAL;
> -	}
> -
> -	seq[fence->ring] = fence->seq;
> -	if (seq[fence->ring] == RADEON_FENCE_SIGNALED_SEQ)
> +	if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))
>   		return 0;
>   
> +	seq[fence->ring] = fence->seq;
>   	r = radeon_fence_wait_seq(fence->rdev, seq, intr);
> -	if (r)
> +	if (r) {
>   		return r;
> -
> -	fence->seq = RADEON_FENCE_SIGNALED_SEQ;
> +	}
> +	r = fence_signal(&fence->base);
> +	if (!r)
> +		FENCE_TRACE(&fence->base, "signaled from fence_wait\n");
>   	return 0;
>   }
>   
> @@ -443,12 +518,13 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
>   			continue;
>   		}
>   
> +		if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fences[i]->base.flags)) {
> +			/* already signaled */
> +			return 0;
> +		}
> +
>   		seq[i] = fences[i]->seq;
>   		++num_rings;
> -
> -		/* test if something was allready signaled */
> -		if (seq[i] == RADEON_FENCE_SIGNALED_SEQ)
> -			return 0;
>   	}
>   
>   	/* nothing to wait for ? */
> @@ -525,7 +601,7 @@ int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
>    */
>   struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence)
>   {
> -	kref_get(&fence->kref);
> +	fence_get(&fence->base);
>   	return fence;
>   }
>   
> @@ -541,9 +617,8 @@ void radeon_fence_unref(struct radeon_fence **fence)
>   	struct radeon_fence *tmp = *fence;
>   
>   	*fence = NULL;
> -	if (tmp) {
> -		kref_put(&tmp->kref, radeon_fence_destroy);
> -	}
> +	if (tmp)
> +		fence_put(&tmp->base);
>   }
>   
>   /**
> @@ -832,3 +907,31 @@ int radeon_debugfs_fence_init(struct radeon_device *rdev)
>   	return 0;
>   #endif
>   }
> +
> +static const char *radeon_fence_get_driver_name(struct fence *fence)
> +{
> +	return "radeon";
> +}
> +
> +static const char *radeon_fence_get_timeline_name(struct fence *f)
> +{
> +	struct radeon_fence *fence = to_radeon_fence(f);
> +	switch (fence->ring) {
> +	case RADEON_RING_TYPE_GFX_INDEX: return "radeon.gfx";
> +	case CAYMAN_RING_TYPE_CP1_INDEX: return "radeon.cp1";
> +	case CAYMAN_RING_TYPE_CP2_INDEX: return "radeon.cp2";
> +	case R600_RING_TYPE_DMA_INDEX: return "radeon.dma";
> +	case CAYMAN_RING_TYPE_DMA1_INDEX: return "radeon.dma1";
> +	case R600_RING_TYPE_UVD_INDEX: return "radeon.uvd";
> +	default: WARN_ON_ONCE(1); return "radeon.unk";
> +	}
> +}
> +
> +static const struct fence_ops radeon_fence_ops = {
> +	.get_driver_name = radeon_fence_get_driver_name,
> +	.get_timeline_name = radeon_fence_get_timeline_name,
> +	.enable_signaling = radeon_fence_enable_signaling,
> +	.signaled = __radeon_fence_signaled,
> +	.wait = fence_default_wait,
> +	.release = NULL,
> +};
>



* Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences
@ 2014-05-15  1:06       ` Maarten Lankhorst
  0 siblings, 0 replies; 50+ messages in thread
From: Maarten Lankhorst @ 2014-05-15  1:06 UTC (permalink / raw)
  To: Christian König, airlied; +Cc: nouveau, linux-kernel, dri-devel

On 14-05-14 17:29, Christian König wrote:
>> +    /* did fence get signaled after we enabled the sw irq? */
>> +    if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
>> +        radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
>> +        return false;
>> +    }
>> +
>> +    fence->fence_wake.flags = 0;
>> +    fence->fence_wake.private = NULL;
>> +    fence->fence_wake.func = radeon_fence_check_signaled;
>> +    __add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
>> +    fence_get(f);
> That looks like a race condition to me. The fence needs to be added to the wait queue before the check, not after.
>
> Apart from that the whole approach looks like a really bad idea to me. How for example is lockup detection supposed to happen with this? 
It's not a race condition because fence_queue.lock is held when this function is called.
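A minimal user-space sketch of why "check, then enqueue" is safe when both steps run under the same lock the waker takes. This is a hypothetical model for illustration only: struct timeline, struct waiter, enable_signaling_locked(), add_waiter() and complete_up_to() are not radeon or kernel code, and a pthread mutex stands in for fence_queue.lock. A completion either lands before the check, so the check sees it, or after the waiter is queued, so the waiter gets woken.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the pattern; not radeon code. */
struct waiter {
	uint64_t seq;        /* sequence number this waiter needs */
	bool signaled;       /* set by the completion side */
	struct waiter *next; /* singly linked pending list */
};

struct timeline {
	pthread_mutex_t lock; /* stands in for fence_queue.lock */
	uint64_t last_seq;    /* last completed sequence number */
	struct waiter *head;  /* waiters still pending */
};

/* Must be called with tl->lock held, like ->enable_signaling().
 * Returns false if the seq already completed (nothing queued). */
static bool enable_signaling_locked(struct timeline *tl, struct waiter *w)
{
	assert(w != NULL);
	if (tl->last_seq >= w->seq)
		return false;
	w->signaled = false;
	w->next = tl->head;
	tl->head = w;
	return true;
}

/* Take the lock so the check and the enqueue form one critical section. */
static bool add_waiter(struct timeline *tl, struct waiter *w)
{
	pthread_mutex_lock(&tl->lock);
	bool queued = enable_signaling_locked(tl, w);
	pthread_mutex_unlock(&tl->lock);
	return queued;
}

/* Completion side: publish the new seq and wake matching waiters under
 * the same lock, so it cannot slip between the check and the enqueue. */
static void complete_up_to(struct timeline *tl, uint64_t seq)
{
	pthread_mutex_lock(&tl->lock);
	tl->last_seq = seq;
	struct waiter **pp = &tl->head;
	while (*pp) {
		struct waiter *w = *pp;
		if (tl->last_seq >= w->seq) {
			w->signaled = true;
			*pp = w->next; /* unlink the signaled waiter */
		} else {
			pp = &w->next;
		}
	}
	pthread_mutex_unlock(&tl->lock);
}
```

Usage: a waiter for seq 5 queued with add_waiter() stays pending through complete_up_to(&tl, 4) and is marked signaled and unlinked by complete_up_to(&tl, 5); a waiter for an already-completed seq is never queued at all.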

Lockups are a bit of a weird problem: the changes wouldn't allow core ttm code to handle the lockup any more,
but any driver-specific wait code would still handle this. I did this by design, because in future patches the wait
function may be called from outside of the radeon driver. The official wait function takes a timeout parameter,
so lockups wouldn't be fatal: if the timeout is set to something like 30*HZ, for example, it would still return
and report that the function timed out.
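A small sketch of that timeout convention, modelled on how patches 15 and 16 in this series treat the return value of reservation_object_wait_timeout_rcu(): negative means error, 0 means the timeout elapsed, positive means time was left. simulated_wait() is a hypothetical stand-in for the real wait, and it assumes the wait_event_timeout()-style rule of returning at least 1 when the condition becomes true exactly at the deadline.

```c
#include <assert.h>
#include <errno.h>

/* Map a timed-wait result to an errno-style int, as the callers in this
 * series do: <0 error, 0 timed out, >0 signaled with time left. */
static int wait_ret_to_errno(long lret)
{
	if (lret < 0)
		return (int)lret; /* e.g. interrupted by a signal */
	if (lret == 0)
		return -EBUSY;    /* timeout elapsed: report it instead of hanging */
	return 0;                 /* signaled before the deadline */
}

/* Hypothetical simulated wait: the fence signals after `needed` ticks
 * (needed < 0 means never, i.e. a lockup); give up after `timeout`.
 * Returns remaining ticks, at least 1 if it signaled exactly on time,
 * and 0 on timeout. */
static long simulated_wait(long needed, long timeout)
{
	assert(timeout > 0);
	if (needed < 0 || needed > timeout)
		return 0;
	long left = timeout - needed;
	return left > 0 ? left : 1;
}
```

With a locked-up "GPU", wait_ret_to_errno(simulated_wait(-1, 30)) returns -EBUSY after the timeout rather than blocking forever, which is the non-fatal behaviour described above.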

~Maarten

* Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences
@ 2014-05-15  9:21         ` Christian König
  0 siblings, 0 replies; 50+ messages in thread
From: Christian König @ 2014-05-15  9:21 UTC (permalink / raw)
  To: Maarten Lankhorst, airlied; +Cc: nouveau, linux-kernel, dri-devel

On 15.05.2014 03:06, Maarten Lankhorst wrote:
> On 14-05-14 17:29, Christian König wrote:
>>> +    /* did fence get signaled after we enabled the sw irq? */
>>> +    if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
>>> +        radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
>>> +        return false;
>>> +    }
>>> +
>>> +    fence->fence_wake.flags = 0;
>>> +    fence->fence_wake.private = NULL;
>>> +    fence->fence_wake.func = radeon_fence_check_signaled;
>>> +    __add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
>>> +    fence_get(f);
>> That looks like a race condition to me. The fence needs to be added 
>> to the wait queue before the check, not after.
>>
>> Apart from that the whole approach looks like a really bad idea to 
>> me. How for example is lockup detection supposed to happen with this? 
> It's not a race condition because fence_queue.lock is held when this 
> function is called.
Ah, I see. That's also the reason why you moved the wake_up_all out of 
the processing function.
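A hypothetical sketch of that split (every name below is illustrative): the processing step only reports whether progress was made, and each caller picks the wakeup that matches its locking state, much as radeon_fence_process() uses wake_up_all() while radeon_fence_enable_signaling(), which runs with fence_queue.lock held, uses wake_up_all_locked(). A non-recursive pthread mutex stands in for fence_queue.lock, so calling the unlocked path while holding it would deadlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; names are illustrative, not radeon's. */
struct ring {
	pthread_mutex_t queue_lock; /* stands in for fence_queue.lock */
	uint64_t last_seq;          /* last sequence number seen complete */
	uint64_t hw_seq;            /* value the "hardware" reports */
	unsigned wakeups;           /* number of wake-all operations */
};

/* Core step: record progress and report it; no locking, no wakeup.
 * (The real driver updates last_seq atomically; this model does not.) */
static bool ring_process_core(struct ring *r)
{
	if (r->hw_seq > r->last_seq) {
		r->last_seq = r->hw_seq;
		return true;
	}
	return false;
}

/* Caller must already hold queue_lock (cf. wake_up_all_locked()). */
static void wake_all_locked(struct ring *r)
{
	r->wakeups++;
}

/* IRQ-style path: not holding the lock, so take it just for the wakeup
 * (cf. wake_up_all(), which takes the queue lock internally). */
static void ring_process(struct ring *r)
{
	if (ring_process_core(r)) {
		pthread_mutex_lock(&r->queue_lock);
		wake_all_locked(r);
		pthread_mutex_unlock(&r->queue_lock);
	}
}

/* For callers that already hold queue_lock, such as an
 * enable_signaling-style hook; relocking here would deadlock. */
static void ring_process_locked(struct ring *r)
{
	if (ring_process_core(r))
		wake_all_locked(r);
}
```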

>
> Lockups are a bit of a weird problem: the changes wouldn't allow core
> ttm code to handle the lockup any more, but any driver-specific wait
> code would still handle this. I did this by design, because in future
> patches the wait function may be called from outside of the radeon
> driver. The official wait function takes a timeout parameter, so
> lockups wouldn't be fatal: if the timeout is set to something like
> 30*HZ, for example, it would still return and report that the
> function timed out.
Timeouts help with the detection of lockups, but not at all with the
handling of them.

What we essentially need is a wait callback into the driver that is
called in non-atomic context without any locks held.

This way we can block for the fence to become signaled with a timeout
and can then also initiate the reset handling if necessary.

The way you designed the interface now means that the driver never gets
a chance to wait for the hardware to become idle and so never has the
opportunity to reset the whole thing.

Christian.

>
> ~Maarten


* Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences
  2014-05-15  9:21         ` Christian König
  (?)
@ 2014-05-15  9:38         ` Maarten Lankhorst
  2014-05-15  9:42             ` Christian König
  -1 siblings, 1 reply; 50+ messages in thread
From: Maarten Lankhorst @ 2014-05-15  9:38 UTC (permalink / raw)
  To: Christian König, airlied; +Cc: nouveau, linux-kernel, dri-devel

On 15-05-14 11:21, Christian König wrote:
> On 15.05.2014 03:06, Maarten Lankhorst wrote:
>> On 14-05-14 17:29, Christian König wrote:
>>>> +    /* did fence get signaled after we enabled the sw irq? */
>>>> +    if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
>>>> +        radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +    fence->fence_wake.flags = 0;
>>>> +    fence->fence_wake.private = NULL;
>>>> +    fence->fence_wake.func = radeon_fence_check_signaled;
>>>> +    __add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
>>>> +    fence_get(f);
>>> That looks like a race condition to me. The fence needs to be added to the wait queue before the check, not after.
>>>
>>> Apart from that the whole approach looks like a really bad idea to me. How for example is lockup detection supposed to happen with this? 
>> It's not a race condition because fence_queue.lock is held when this function is called.
> Ah, I see. That's also the reason why you moved the wake_up_all out of the processing function.
Correct. :-)
>> Lockups are a bit of a weird problem: the changes wouldn't allow core ttm code to handle the lockup any more,
>> but any driver-specific wait code would still handle this. I did this by design, because in future patches the wait
>> function may be called from outside of the radeon driver. The official wait function takes a timeout parameter,
>> so lockups wouldn't be fatal: if the timeout is set to something like 30*HZ, for example, it would still return
>> and report that the function timed out.
> Timeouts help with the detection of lockups, but not at all with the handling of them.
>
> What we essentially need is a wait callback into the driver that is called in non-atomic context without any locks held.
>
> This way we can block for the fence to become signaled with a timeout and can then also initiate the reset handling if necessary.
>
> The way you designed the interface now means that the driver never gets a chance to wait for the hardware to become idle and so never has the opportunity to reset the whole thing.
You could set up a hangcheck timer like Intel does, and end up with reliable hangcheck detection that doesn't depend on CPU waits. :-) Or override the default wait function and restore the old behavior.

~Maarten



* Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences
@ 2014-05-15  9:42             ` Christian König
  0 siblings, 0 replies; 50+ messages in thread
From: Christian König @ 2014-05-15  9:42 UTC (permalink / raw)
  To: Maarten Lankhorst, airlied; +Cc: nouveau, linux-kernel, dri-devel

On 15.05.2014 11:38, Maarten Lankhorst wrote:
> On 15-05-14 11:21, Christian König wrote:
>> On 15.05.2014 03:06, Maarten Lankhorst wrote:
>>> On 14-05-14 17:29, Christian König wrote:
>>>>> +    /* did fence get signaled after we enabled the sw irq? */
>>>>> +    if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
>>>>> +        radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
>>>>> +        return false;
>>>>> +    }
>>>>> +
>>>>> +    fence->fence_wake.flags = 0;
>>>>> +    fence->fence_wake.private = NULL;
>>>>> +    fence->fence_wake.func = radeon_fence_check_signaled;
>>>>> +    __add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
>>>>> +    fence_get(f);
>>>> That looks like a race condition to me. The fence needs to be added 
>>>> to the wait queue before the check, not after.
>>>>
>>>> Apart from that the whole approach looks like a really bad idea to 
>>>> me. How for example is lockup detection supposed to happen with this? 
>>> It's not a race condition because fence_queue.lock is held when this 
>>> function is called.
>> Ah, I see. That's also the reason why you moved the wake_up_all out 
>> of the processing function.
> Correct. :-)
>>> Lockups are a bit of a weird problem: the changes wouldn't allow
>>> core ttm code to handle the lockup any more, but any driver-specific
>>> wait code would still handle this. I did this by design, because in
>>> future patches the wait function may be called from outside of the
>>> radeon driver. The official wait function takes a timeout parameter,
>>> so lockups wouldn't be fatal: if the timeout is set to something like
>>> 30*HZ, for example, it would still return and report that the
>>> function timed out.
>> Timeouts help with the detection of lockups, but not at all with the
>> handling of them.
>>
>> What we essentially need is a wait callback into the driver that is
>> called in non-atomic context without any locks held.
>>
>> This way we can block for the fence to become signaled with a timeout
>> and can then also initiate the reset handling if necessary.
>>
>> The way you designed the interface now means that the driver never
>> gets a chance to wait for the hardware to become idle and so never
>> has the opportunity to reset the whole thing.
> You could set up a hangcheck timer like Intel does, and end up with
> reliable hangcheck detection that doesn't depend on CPU waits. :-) Or
> override the default wait function and restore the old behavior.

Overriding the default wait function sounds better; please implement it
this way.
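A simplified sketch of what "overriding the default wait" could look like. struct sim_fence_ops, struct sim_fence, generic_wait(), driver_wait() and fence_wait_dispatch() are hypothetical stand-ins, not the kernel's struct fence_ops or fence_default_wait(): a NULL .wait falls back to a generic wait that can only report a timeout, while a driver-supplied .wait, called without locks held, can notice the lockup, reset the (simulated) hardware and wait again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sim_fence;

/* Hypothetical, simplified stand-in for a fence ops table. */
struct sim_fence_ops {
	/* Optional override; NULL means "use the generic wait". */
	long (*wait)(struct sim_fence *f, bool intr, long timeout);
};

struct sim_fence {
	const struct sim_fence_ops *ops;
	long ticks_to_signal; /* simulated latency; < 0 means the GPU hung */
	unsigned resets;      /* how many times the driver reset the GPU */
};

/* Generic wait: can only report success (time left) or a timeout (0). */
static long generic_wait(struct sim_fence *f, bool intr, long timeout)
{
	(void)intr; /* interruption is not modelled */
	if (f->ticks_to_signal < 0 || f->ticks_to_signal > timeout)
		return 0;
	long left = timeout - f->ticks_to_signal;
	return left > 0 ? left : 1;
}

/* Driver override: runs without locks held, so it may block, treat a
 * timeout on a hung GPU as a lockup, reset the hardware and wait again. */
static long driver_wait(struct sim_fence *f, bool intr, long timeout)
{
	long ret = generic_wait(f, intr, timeout);
	if (ret == 0 && f->ticks_to_signal < 0) {
		f->resets++;            /* hypothetical GPU reset */
		f->ticks_to_signal = 0; /* the reset force-completes the fence */
		ret = generic_wait(f, intr, timeout);
	}
	return ret;
}

/* Dispatch: prefer the driver's wait if it provides one. */
static long fence_wait_dispatch(struct sim_fence *f, bool intr, long timeout)
{
	assert(f != NULL);
	if (f->ops && f->ops->wait)
		return f->ops->wait(f, intr, timeout);
	return generic_wait(f, intr, timeout);
}

static const struct sim_fence_ops default_ops = { .wait = NULL };
static const struct sim_fence_ops driver_ops = { .wait = driver_wait };
```

With default_ops a hung fence just times out (return 0, no reset); with driver_ops the same fence triggers one reset and then completes, which is the extra hook the driver needs for reset handling.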

Thanks,
Christian.

>
> ~Maarten
>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences
@ 2014-05-15  9:42             ` Christian König
  0 siblings, 0 replies; 50+ messages in thread
From: Christian König @ 2014-05-15  9:42 UTC (permalink / raw)
  To: Maarten Lankhorst, airlied-cv59FeDIM0c
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 15.05.2014 11:38, schrieb Maarten Lankhorst:
> op 15-05-14 11:21, Christian König schreef:
>> Am 15.05.2014 03:06, schrieb Maarten Lankhorst:
>>> op 14-05-14 17:29, Christian König schreef:
>>>>> +    /* did fence get signaled after we enabled the sw irq? */
>>>>> +    if 
>>>>> (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= 
>>>>> fence->seq) {
>>>>> +        radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
>>>>> +        return false;
>>>>> +    }
>>>>> +
>>>>> +    fence->fence_wake.flags = 0;
>>>>> +    fence->fence_wake.private = NULL;
>>>>> +    fence->fence_wake.func = radeon_fence_check_signaled;
>>>>> +    __add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
>>>>> +    fence_get(f);
>>>> That looks like a race condition to me. The fence needs to be added 
>>>> to the wait queue before the check, not after.
>>>>
>>>> Apart from that the whole approach looks like a really bad idea to 
>>>> me. How for example is lockup detection supposed to happen with this? 
>>> It's not a race condition because fence_queue.lock is held when this 
>>> function is called.
>> Ah, I see. That's also the reason why you moved the wake_up_all out 
>> of the processing function.
> Correct. :-)
>>> Lockup's a bit of a weird problem, the changes wouldn't allow core 
>>> ttm code to handle the lockup any more,
>>> but any driver specific wait code would still handle this. I did 
>>> this by design, because in future patches the wait
>>> function may be called from outside of the radeon driver. The 
>>> official wait function takes a timeout parameter,
>>> so lockups wouldn't be fatal if the timeout is set to something like 
>>> 30*HZ for example, it would still return
>>> and report that the function timed out.
>> Timeouts help with the detection of the lockup, but not at all with 
>> the handling of them.
>>
>> What we essentially need is a wait callback into the driver that is 
>> called in non atomic context without any locks held.
>>
>> This way we can block for the fence to become signaled with a timeout 
>> and can then also initiate the reset handling if necessary.
>>
>> The way you designed the interface now means that the driver never 
>> gets a chance to wait for the hardware to become idle and so never 
>> has the opportunity to reset the whole thing.
> You could set up a hangcheck timer like intel does, and end up with a 
> reliable hangcheck detection that doesn't depend on cpu waits. :-) Or 
> override the default wait function and restore the old behavior.

Overriding the default wait function sounds better, please implement it 
this way.
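
Letting radeon override the default wait instead of relying on fence_default_wait comes down to dispatching every wait through the fence's ops table. A minimal user-space sketch, assuming simplified hypothetical types (sketch_fence, sketch_fence_ops) that only mimic the shape of struct fence_ops and are not the kernel API: the generic wait can only report a timeout, while the driver's override can recognise a lockup and react to it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of a fence with an ops table.
 * The names only mimic the shape of struct fence_ops. */
struct sketch_fence;

struct sketch_fence_ops {
	/* Contract borrowed from the patch: >0 remaining time if signaled,
	 * 0 on timeout, negative errno on error. */
	long (*wait)(struct sketch_fence *f, bool intr, long timeout);
};

struct sketch_fence {
	const struct sketch_fence_ops *ops;
	bool signaled;
	bool hw_hung;	/* hypothetical lockup state only the driver can see */
	int resets;	/* how often the driver started reset handling */
};

/* Generic wait: knows nothing about the hardware, so a hung GPU just
 * looks like a timeout. */
static long sketch_default_wait(struct sketch_fence *f, bool intr, long timeout)
{
	(void)intr;
	return f->signaled ? timeout : 0;
}

/* Driver wait: same contract, but it can recognise a lockup and start
 * its own reset handling before reporting the error. */
static long sketch_driver_wait(struct sketch_fence *f, bool intr, long timeout)
{
	(void)intr;
	if (f->signaled)
		return timeout;
	if (f->hw_hung) {
		f->resets++;
		return -EDEADLK;
	}
	return 0;
}

static const struct sketch_fence_ops sketch_default_ops = {
	.wait = sketch_default_wait,
};

static const struct sketch_fence_ops sketch_driver_ops = {
	.wait = sketch_driver_wait,
};

/* Core entry point: every wait is dispatched through the ops table. */
static long sketch_fence_wait(struct sketch_fence *f, bool intr, long timeout)
{
	assert(f->ops && f->ops->wait);
	return f->ops->wait(f, intr, timeout);
}
```

With the default ops a hung GPU is indistinguishable from a slow one. With the driver ops the same call returns -EDEADLK, and the driver has already had a chance to start its reset handling.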

Thanks,
Christian.

>
> ~Maarten
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences
  2014-05-15  9:42             ` Christian König
  (?)
@ 2014-05-15 13:04             ` Maarten Lankhorst
  2014-05-15 13:19                 ` Christian König
  -1 siblings, 1 reply; 50+ messages in thread
From: Maarten Lankhorst @ 2014-05-15 13:04 UTC (permalink / raw)
  To: Christian König, airlied; +Cc: nouveau, linux-kernel, dri-devel

On 15-05-14 11:42, Christian König wrote:
> On 15.05.2014 11:38, Maarten Lankhorst wrote:
>> On 15-05-14 11:21, Christian König wrote:
>>> On 15.05.2014 03:06, Maarten Lankhorst wrote:
>>>> On 14-05-14 17:29, Christian König wrote:
>>>>>> +    /* did fence get signaled after we enabled the sw irq? */
>>>>>> +    if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
>>>>>> +        radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
>>>>>> +        return false;
>>>>>> +    }
>>>>>> +
>>>>>> +    fence->fence_wake.flags = 0;
>>>>>> +    fence->fence_wake.private = NULL;
>>>>>> +    fence->fence_wake.func = radeon_fence_check_signaled;
>>>>>> +    __add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
>>>>>> +    fence_get(f);
>>>>> That looks like a race condition to me. The fence needs to be added to the wait queue before the check, not after.
>>>>>
>>>>> Apart from that the whole approach looks like a really bad idea to me. How for example is lockup detection supposed to happen with this? 
>>>> It's not a race condition because fence_queue.lock is held when this function is called.
>>> Ah, I see. That's also the reason why you moved the wake_up_all out of the processing function.
>> Correct. :-)
>>>> Lockup's a bit of a weird problem, the changes wouldn't allow core ttm code to handle the lockup any more,
>>>> but any driver specific wait code would still handle this. I did this by design, because in future patches the wait
>>>> function may be called from outside of the radeon driver. The official wait function takes a timeout parameter,
>>>> so lockups wouldn't be fatal if the timeout is set to something like 30*HZ for example, it would still return
>>>> and report that the function timed out.
>>> Timeouts help with the detection of the lockup, but not at all with the handling of them.
>>>
>>> What we essentially need is a wait callback into the driver that is called in non atomic context without any locks held.
>>>
>>> This way we can block for the fence to become signaled with a timeout and can then also initiate the reset handling if necessary.
>>>
>>> The way you designed the interface now means that the driver never gets a chance to wait for the hardware to become idle and so never has the opportunity to reset the whole thing.
>> You could set up a hangcheck timer like intel does, and end up with a reliable hangcheck detection that doesn't depend on cpu waits. :-) Or override the default wait function and restore the old behavior.
>
> Overriding the default wait function sounds better, please implement it this way.
>
> Thanks,
> Christian. 
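
The answer to the race question relies on the last_seq check and the __add_wait_queue() call both happening under fence_queue.lock. A minimal sketch of why that ordering cannot lose a wakeup, assuming the wake-up path (wake_up_all()) takes that same lock only after last_seq has been advanced. The model is hypothetical and user-space: a pthread mutex stands in for the queue lock, a C11 atomic for last_seq, and a fixed-size array for the wait queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_WAITERS 8

/* Hypothetical waiter entry (cf. fence->fence_wake). */
struct sketch_waiter {
	uint64_t seq;                          /* sequence number waited for */
	void (*func)(struct sketch_waiter *w); /* wake callback */
	bool queued;
	int fired;
};

/* Hypothetical per-device queue (cf. rdev->fence_queue). */
struct sketch_queue {
	pthread_mutex_t lock;      /* plays the role of fence_queue.lock */
	_Atomic uint64_t last_seq; /* last completed sequence number */
	struct sketch_waiter *waiters[SKETCH_MAX_WAITERS];
	size_t nr_waiters;
};

static void sketch_queue_init(struct sketch_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	atomic_init(&q->last_seq, 0);
	q->nr_waiters = 0;
}

static void sketch_count_fire(struct sketch_waiter *w)
{
	w->fired++;
}

/* Caller holds q->lock. Returns false if the fence already signaled
 * (nothing is queued), true if the waiter was queued. */
static bool sketch_enable_signaling_locked(struct sketch_queue *q,
					   struct sketch_waiter *w)
{
	if (atomic_load(&q->last_seq) >= w->seq)
		return false;
	assert(q->nr_waiters < SKETCH_MAX_WAITERS);
	w->queued = true;
	q->waiters[q->nr_waiters++] = w;
	return true;
}

/* Signaler: publish progress first, then scan the waiters under the
 * same lock (the wake_up_all() step). The scan cannot run while a
 * check-then-add holds the lock, and it always runs after the store,
 * so a waiter that saw the old last_seq is already queued by then. */
static void sketch_signal(struct sketch_queue *q, uint64_t seq)
{
	size_t i = 0;

	atomic_store(&q->last_seq, seq);

	pthread_mutex_lock(&q->lock);
	while (i < q->nr_waiters) {
		struct sketch_waiter *w = q->waiters[i];

		if (w->seq <= atomic_load(&q->last_seq)) {
			/* unlink by moving the last entry into this slot */
			q->waiters[i] = q->waiters[--q->nr_waiters];
			w->queued = false;
			w->func(w);
		} else {
			i++;
		}
	}
	pthread_mutex_unlock(&q->lock);
}
```

Enabling signaling for a sequence number that has already passed returns false without queueing anything, which mirrors the early `return false` in the quoted hunk.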

Does this modification look sane?

diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index bc844f300d3f..2d415eb2834a 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -361,28 +361,35 @@ static bool radeon_fence_any_seq_signaled(struct radeon_device *rdev, u64 *seq)
  }
  
  /**
- * radeon_fence_wait_seq - wait for a specific sequence numbers
+ * radeon_fence_wait_seq_timeout - wait for specific sequence number(s)
   *
   * @rdev: radeon device pointer
   * @target_seq: sequence number(s) we want to wait for
   * @intr: use interruptable sleep
+ * @timeout: maximum time to wait, or MAX_SCHEDULE_TIMEOUT for infinite wait
   *
   * Wait for the requested sequence number(s) to be written by any ring
   * (all asics).  Sequnce number array is indexed by ring id.
   * @intr selects whether to use interruptable (true) or non-interruptable
   * (false) sleep when waiting for the sequence number.  Helper function
   * for radeon_fence_wait_*().
- * Returns 0 if the sequence number has passed, error for all other cases.
+ * Returns remaining time if the sequence number has passed, 0 when
+ * the wait times out, or an error for all other cases.
   * -EDEADLK is returned when a GPU lockup has been detected.
   */
-static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
-				 bool intr)
+static long radeon_fence_wait_seq_timeout(struct radeon_device *rdev,
+					  u64 *target_seq, bool intr,
+					  long timeout)
  {
  	uint64_t last_seq[RADEON_NUM_RINGS];
  	bool signaled;
-	int i, r;
+	int i;
  
  	while (!radeon_fence_any_seq_signaled(rdev, target_seq)) {
+		long r, waited = timeout;
+
+		waited = timeout < RADEON_FENCE_JIFFIES_TIMEOUT ?
+			 timeout : RADEON_FENCE_JIFFIES_TIMEOUT;
  
  		/* Save current sequence values, used to check for GPU lockups */
  		for (i = 0; i < RADEON_NUM_RINGS; ++i) {
@@ -397,13 +404,15 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
  		if (intr) {
  			r = wait_event_interruptible_timeout(rdev->fence_queue, (
  				(signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
-				 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
+				 || rdev->needs_reset), waited);
  		} else {
  			r = wait_event_timeout(rdev->fence_queue, (
  				(signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
-				 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
+				 || rdev->needs_reset), waited);
  		}
  
+		timeout -= waited - r;
+
  		for (i = 0; i < RADEON_NUM_RINGS; ++i) {
  			if (!target_seq[i])
  				continue;
@@ -415,6 +424,12 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
  		if (unlikely(r < 0))
  			return r;
  
+		/*
+		 * If this is a timed wait and the wait completely timed out just return.
+		 */
+		if (!timeout)
+			break;
+
  		if (unlikely(!signaled)) {
  			if (rdev->needs_reset)
  				return -EDEADLK;
@@ -457,7 +472,7 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
  			}
  		}
  	}
-	return 0;
+	return timeout;
  }
  
  /**
@@ -480,8 +495,8 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
  		return 0;
  
  	seq[fence->ring] = fence->seq;
-	r = radeon_fence_wait_seq(fence->rdev, seq, intr);
-	if (r) {
+	r = radeon_fence_wait_seq_timeout(fence->rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
+	if (r < 0) {
  		return r;
  	}
  	r = fence_signal(&fence->base);
@@ -509,7 +524,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
  {
  	uint64_t seq[RADEON_NUM_RINGS];
  	unsigned i, num_rings = 0;
-	int r;
+	long r;
  
  	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
  		seq[i] = 0;
@@ -531,8 +546,8 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
  	if (num_rings == 0)
  		return -ENOENT;
  
-	r = radeon_fence_wait_seq(rdev, seq, intr);
-	if (r) {
+	r = radeon_fence_wait_seq_timeout(rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
+	if (r < 0) {
  		return r;
  	}
  	return 0;
@@ -551,6 +566,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
  int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
  {
  	uint64_t seq[RADEON_NUM_RINGS] = {};
+	long r;
  
  	seq[ring] = atomic64_read(&rdev->fence_drv[ring].last_seq) + 1ULL;
  	if (seq[ring] >= rdev->fence_drv[ring].sync_seq[ring]) {
@@ -558,7 +574,10 @@ int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
  		   already the last emited fence */
  		return -ENOENT;
  	}
-	return radeon_fence_wait_seq(rdev, seq, false);
+	r = radeon_fence_wait_seq_timeout(rdev, seq, false, MAX_SCHEDULE_TIMEOUT);
+	if (r < 0)
+		return r;
+	return 0;
  }
  
  /**
@@ -580,8 +599,8 @@ int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
  	if (!seq[ring])
  		return 0;
  
-	r = radeon_fence_wait_seq(rdev, seq, false);
-	if (r) {
+	r = radeon_fence_wait_seq_timeout(rdev, seq, false, MAX_SCHEDULE_TIMEOUT);
+	if (r < 0) {
  		if (r == -EDEADLK)
  			return -EDEADLK;
  
@@ -908,6 +927,15 @@ int radeon_debugfs_fence_init(struct radeon_device *rdev)
  #endif
  }
  
+static long __radeon_fence_wait(struct fence *f, bool intr, long timeout)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	u64 target_seq[RADEON_NUM_RINGS] = {};
+
+	target_seq[fence->ring] = fence->seq;
+	return radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr, timeout);
+}
+
  static const char *radeon_fence_get_driver_name(struct fence *fence)
  {
  	return "radeon";
@@ -932,6 +960,6 @@ static const struct fence_ops radeon_fence_ops = {
  	.get_timeline_name = radeon_fence_get_timeline_name,
  	.enable_signaling = radeon_fence_enable_signaling,
  	.signaled = __radeon_fence_signaled,
-	.wait = fence_default_wait,
+	.wait = __radeon_fence_wait,
  	.release = NULL,
  };
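
The core of the modified radeon_fence_wait_seq_timeout() is its bookkeeping: the caller's timeout is consumed in slices of at most RADEON_FENCE_JIFFIES_TIMEOUT, each slice is charged as `waited - r`, and a slice that ends without progress is treated as a possible lockup. A self-contained sketch of just that loop, assuming a simulated ring instead of real hardware (sketch_ring, SKETCH_SLICE and SKETCH_MAX_TIMEOUT are hypothetical stand-ins) and a much simpler lockup test than the driver's. Unlike the patch as posted, it does not count down an infinite (MAX_SCHEDULE_TIMEOUT-style) wait, and it uses long throughout so a large remaining timeout is never truncated.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>

#define SKETCH_MAX_TIMEOUT LONG_MAX /* stands in for MAX_SCHEDULE_TIMEOUT */
#define SKETCH_SLICE 10             /* stands in for RADEON_FENCE_JIFFIES_TIMEOUT */

/* Simulated ring: one sequence number completes every ticks_per_seq
 * jiffies; ticks_per_seq == 0 models a hung GPU. */
struct sketch_ring {
	long now;
	long ticks_per_seq;
};

static uint64_t sketch_seq(const struct sketch_ring *r)
{
	return r->ticks_per_seq ? (uint64_t)(r->now / r->ticks_per_seq) : 0;
}

/* Mimics wait_event_timeout(): remaining jiffies (at least 1) if the
 * target was reached, 0 if the budget ran out first. */
static long sketch_wait_event_timeout(struct sketch_ring *r, uint64_t target,
				      long budget)
{
	for (long t = budget; t > 0; t--) {
		if (sketch_seq(r) >= target)
			return t;
		r->now++;
	}
	return sketch_seq(r) >= target ? 1 : 0;
}

/* Returns remaining time (>0) once target is reached, 0 if the caller's
 * timeout expired, -EDEADLK if a whole slice passed without progress. */
static long sketch_wait_seq_timeout(struct sketch_ring *r, uint64_t target,
				    long timeout)
{
	assert(timeout >= 0);

	while (sketch_seq(r) < target) {
		long slice = timeout < SKETCH_SLICE ? timeout : SKETCH_SLICE;
		uint64_t before = sketch_seq(r);
		long left = sketch_wait_event_timeout(r, target, slice);

		/* Charge only the time actually slept; never count down
		 * an infinite wait. */
		if (timeout != SKETCH_MAX_TIMEOUT)
			timeout -= slice - left;

		if (left > 0)
			break;			/* target reached in this slice */
		if (timeout == 0)
			return 0;		/* caller's budget exhausted */
		if (sketch_seq(r) == before)
			return -EDEADLK;	/* no progress: assume lockup */
	}
	/* Like wait_event_timeout(), never report 0 for a signaled wait. */
	return timeout > 0 ? timeout : 1;
}
```

For a ring that completes one sequence number every 3 ticks, waiting for seq 4 with a budget of 100 takes 12 ticks and returns 88. A ring that never advances returns -EDEADLK after the first full slice.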


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences
@ 2014-05-15 13:19                 ` Christian König
  0 siblings, 0 replies; 50+ messages in thread
From: Christian König @ 2014-05-15 13:19 UTC (permalink / raw)
  To: Maarten Lankhorst, airlied; +Cc: nouveau, linux-kernel, dri-devel

On 15.05.2014 15:04, Maarten Lankhorst wrote:
> On 15-05-14 11:42, Christian König wrote:
>> On 15.05.2014 11:38, Maarten Lankhorst wrote:
>>> On 15-05-14 11:21, Christian König wrote:
>>>> On 15.05.2014 03:06, Maarten Lankhorst wrote:
>>>>> On 14-05-14 17:29, Christian König wrote:
>>>>>>> +    /* did fence get signaled after we enabled the sw irq? */
>>>>>>> +    if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
>>>>>>> +        radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
>>>>>>> +        return false;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    fence->fence_wake.flags = 0;
>>>>>>> +    fence->fence_wake.private = NULL;
>>>>>>> +    fence->fence_wake.func = radeon_fence_check_signaled;
>>>>>>> +    __add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
>>>>>>> +    fence_get(f);
>>>>>> That looks like a race condition to me. The fence needs to be 
>>>>>> added to the wait queue before the check, not after.
>>>>>>
>>>>>> Apart from that the whole approach looks like a really bad idea 
>>>>>> to me. How for example is lockup detection supposed to happen 
>>>>>> with this? 
>>>>> It's not a race condition because fence_queue.lock is held when 
>>>>> this function is called.
>>>> Ah, I see. That's also the reason why you moved the wake_up_all out 
>>>> of the processing function.
>>> Correct. :-)
>>>>> Lockup's a bit of a weird problem, the changes wouldn't allow core 
>>>>> ttm code to handle the lockup any more,
>>>>> but any driver specific wait code would still handle this. I did 
>>>>> this by design, because in future patches the wait
>>>>> function may be called from outside of the radeon driver. The 
>>>>> official wait function takes a timeout parameter,
>>>>> so lockups wouldn't be fatal if the timeout is set to something 
>>>>> like 30*HZ for example, it would still return
>>>>> and report that the function timed out.
>>>> Timeouts help with the detection of the lockup, but not at all with 
>>>> the handling of them.
>>>>
>>>> What we essentially need is a wait callback into the driver that is 
>>>> called in non atomic context without any locks held.
>>>>
>>>> This way we can block for the fence to become signaled with a 
>>>> timeout and can then also initiate the reset handling if necessary.
>>>>
>>>> The way you designed the interface now means that the driver never 
>>>> gets a chance to wait for the hardware to become idle and so never 
>>>> has the opportunity to reset the whole thing.
>>> You could set up a hangcheck timer like intel does, and end up with 
>>> a reliable hangcheck detection that doesn't depend on cpu waits. :-) 
>>> Or override the default wait function and restore the old behavior.
>>
>> Overriding the default wait function sounds better, please implement 
>> it this way.
>>
>> Thanks,
>> Christian. 
>
> Does this modification look sane?
Adding the timeout has been on my todo list for quite some time as well, so 
this part makes sense.

> +static long __radeon_fence_wait(struct fence *f, bool intr, long timeout)
> +{
> +    struct radeon_fence *fence = to_radeon_fence(f);
> +    u64 target_seq[RADEON_NUM_RINGS] = {};
> +
> +    target_seq[fence->ring] = fence->seq;
> +    return radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr, timeout);
> +}
When this call comes from outside the radeon driver, you need to take 
rdev->exclusive_lock here to make sure it doesn't interfere with a 
possible reset.

>      .get_timeline_name = radeon_fence_get_timeline_name,
>      .enable_signaling = radeon_fence_enable_signaling,
>      .signaled = __radeon_fence_signaled,
Do we still need those callbacks now that we have implemented the wait callback?

Christian.

>
> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
> index bc844f300d3f..2d415eb2834a 100644
> --- a/drivers/gpu/drm/radeon/radeon_fence.c
> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
> @@ -361,28 +361,35 @@ static bool radeon_fence_any_seq_signaled(struct radeon_device *rdev, u64 *seq)
>  }
>
>  /**
> - * radeon_fence_wait_seq - wait for a specific sequence numbers
> + * radeon_fence_wait_seq_timeout - wait for specific sequence number(s)
>   *
>   * @rdev: radeon device pointer
>   * @target_seq: sequence number(s) we want to wait for
>   * @intr: use interruptable sleep
> + * @timeout: maximum time to wait, or MAX_SCHEDULE_TIMEOUT for infinite wait
>   *
>   * Wait for the requested sequence number(s) to be written by any ring
>   * (all asics).  Sequnce number array is indexed by ring id.
>   * @intr selects whether to use interruptable (true) or non-interruptable
>   * (false) sleep when waiting for the sequence number.  Helper function
>   * for radeon_fence_wait_*().
> - * Returns 0 if the sequence number has passed, error for all other cases.
> + * Returns remaining time if the sequence number has passed, 0 when
> + * the wait times out, or an error for all other cases.
>   * -EDEADLK is returned when a GPU lockup has been detected.
>   */
> -static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
> -                 bool intr)
> +static long radeon_fence_wait_seq_timeout(struct radeon_device *rdev,
> +                      u64 *target_seq, bool intr,
> +                      long timeout)
>  {
>      uint64_t last_seq[RADEON_NUM_RINGS];
>      bool signaled;
> -    int i, r;
> +    int i;
>
>      while (!radeon_fence_any_seq_signaled(rdev, target_seq)) {
> +        long r, waited = timeout;
> +
> +        waited = timeout < RADEON_FENCE_JIFFIES_TIMEOUT ?
> +             timeout : RADEON_FENCE_JIFFIES_TIMEOUT;
>
>          /* Save current sequence values, used to check for GPU lockups */
>          for (i = 0; i < RADEON_NUM_RINGS; ++i) {
> @@ -397,13 +404,15 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
>          if (intr) {
>              r = wait_event_interruptible_timeout(rdev->fence_queue, (
>                  (signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
> -                 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
> +                 || rdev->needs_reset), waited);
>          } else {
>              r = wait_event_timeout(rdev->fence_queue, (
>                  (signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
> -                 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
> +                 || rdev->needs_reset), waited);
>          }
>
> +        timeout -= waited - r;
> +
>          for (i = 0; i < RADEON_NUM_RINGS; ++i) {
>              if (!target_seq[i])
>                  continue;
> @@ -415,6 +424,12 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
>          if (unlikely(r < 0))
>              return r;
>
> +        /*
> +         * If this is a timed wait and the wait completely timed out just return.
> +         */
> +        if (!timeout)
> +            break;
> +
>          if (unlikely(!signaled)) {
>              if (rdev->needs_reset)
>                  return -EDEADLK;
> @@ -457,7 +472,7 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
>              }
>          }
>      }
> -    return 0;
> +    return timeout;
>  }
>
>  /**
> @@ -480,8 +495,8 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
>          return 0;
>
>      seq[fence->ring] = fence->seq;
> -    r = radeon_fence_wait_seq(fence->rdev, seq, intr);
> -    if (r) {
> +    r = radeon_fence_wait_seq_timeout(fence->rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
> +    if (r < 0) {
>          return r;
>      }
>      r = fence_signal(&fence->base);
> @@ -509,7 +524,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
>  {
>      uint64_t seq[RADEON_NUM_RINGS];
>      unsigned i, num_rings = 0;
> -    int r;
> +    long r;
>
>      for (i = 0; i < RADEON_NUM_RINGS; ++i) {
>          seq[i] = 0;
> @@ -531,8 +546,8 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
>      if (num_rings == 0)
>          return -ENOENT;
>
> -    r = radeon_fence_wait_seq(rdev, seq, intr);
> -    if (r) {
> +    r = radeon_fence_wait_seq_timeout(rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
> +    if (r < 0) {
>          return r;
>      }
>      return 0;
> @@ -551,6 +566,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
>  int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
>  {
>      uint64_t seq[RADEON_NUM_RINGS] = {};
> +    long r;
>
>      seq[ring] = atomic64_read(&rdev->fence_drv[ring].last_seq) + 1ULL;
>      if (seq[ring] >= rdev->fence_drv[ring].sync_seq[ring]) {
> @@ -558,7 +574,10 @@ int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
>             already the last emited fence */
>          return -ENOENT;
>      }
> -    return radeon_fence_wait_seq(rdev, seq, false);
> +    r = radeon_fence_wait_seq_timeout(rdev, seq, false, MAX_SCHEDULE_TIMEOUT);
> +    if (r < 0)
> +        return r;
> +    return 0;
>  }
>
>  /**
> @@ -580,8 +599,8 @@ int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
>      if (!seq[ring])
>          return 0;
>
> -    r = radeon_fence_wait_seq(rdev, seq, false);
> -    if (r) {
> +    r = radeon_fence_wait_seq_timeout(rdev, seq, false, MAX_SCHEDULE_TIMEOUT);
> +    if (r < 0) {
>          if (r == -EDEADLK)
>              return -EDEADLK;
>
> @@ -908,6 +927,15 @@ int radeon_debugfs_fence_init(struct radeon_device *rdev)
>  #endif
>  }
>
> +static long __radeon_fence_wait(struct fence *f, bool intr, long timeout)
> +{
> +    struct radeon_fence *fence = to_radeon_fence(f);
> +    u64 target_seq[RADEON_NUM_RINGS] = {};
> +
> +    target_seq[fence->ring] = fence->seq;
> +    return radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr, timeout);
> +}
> +
>  static const char *radeon_fence_get_driver_name(struct fence *fence)
>  {
>      return "radeon";
> @@ -932,6 +960,6 @@ static const struct fence_ops radeon_fence_ops = {
>      .get_timeline_name = radeon_fence_get_timeline_name,
>      .enable_signaling = radeon_fence_enable_signaling,
>      .signaled = __radeon_fence_signaled,
> -    .wait = fence_default_wait,
> +    .wait = __radeon_fence_wait,
>      .release = NULL,
>  };
>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences
@ 2014-05-15 14:18                   ` Maarten Lankhorst
  0 siblings, 0 replies; 50+ messages in thread
From: Maarten Lankhorst @ 2014-05-15 14:18 UTC (permalink / raw)
  To: Christian König, airlied; +Cc: nouveau, linux-kernel, dri-devel

On 15-05-14 15:19, Christian König wrote:
> On 15.05.2014 15:04, Maarten Lankhorst wrote:
>> On 15-05-14 11:42, Christian König wrote:
>>> On 15.05.2014 11:38, Maarten Lankhorst wrote:
>>>> On 15-05-14 11:21, Christian König wrote:
>>>>> On 15.05.2014 03:06, Maarten Lankhorst wrote:
>>>>>> On 14-05-14 17:29, Christian König wrote:
>>>>>>>> +    /* did fence get signaled after we enabled the sw irq? */
>>>>>>>> +    if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
>>>>>>>> +        radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
>>>>>>>> +        return false;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    fence->fence_wake.flags = 0;
>>>>>>>> +    fence->fence_wake.private = NULL;
>>>>>>>> +    fence->fence_wake.func = radeon_fence_check_signaled;
>>>>>>>> +    __add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
>>>>>>>> +    fence_get(f);
>>>>>>> That looks like a race condition to me. The fence needs to be added to the wait queue before the check, not after.
>>>>>>>
>>>>>>> Apart from that the whole approach looks like a really bad idea to me. How for example is lockup detection supposed to happen with this? 
>>>>>> It's not a race condition because fence_queue.lock is held when this function is called.
>>>>> Ah, I see. That's also the reason why you moved the wake_up_all out of the processing function.
>>>> Correct. :-)
>>>>>> Lockups are a bit of a weird problem: the changes wouldn't allow core ttm code to handle the lockup any more,
>>>>>> but any driver-specific wait code would still handle this. I did this by design, because in future patches the wait
>>>>>> function may be called from outside of the radeon driver. The official wait function takes a timeout parameter,
>>>>>> so lockups wouldn't be fatal if the timeout is set to something like 30*HZ, for example; it would still return
>>>>>> and report that the function timed out.
>>>>> Timeouts help with the detection of the lockup, but not at all with the handling of them.
>>>>>
>>>>> What we essentially need is a wait callback into the driver that is called in non atomic context without any locks held.
>>>>>
>>>>> This way we can block for the fence to become signaled with a timeout and can then also initiate the reset handling if necessary.
>>>>>
>>>>> The way you designed the interface now means that the driver never gets a chance to wait for the hardware to become idle and so never has the opportunity to reset the whole thing.
>>>> You could set up a hangcheck timer like intel does, and end up with a reliable hangcheck detection that doesn't depend on cpu waits. :-) Or override the default wait function and restore the old behavior.
>>>
>>> Overriding the default wait function sounds better, please implement it this way.
>>>
>>> Thanks,
>>> Christian. 
>>
>> Does this modification look sane?
> Adding the timeout has been on my todo list for quite some time as well, so this part makes sense.
>
>> +static long __radeon_fence_wait(struct fence *f, bool intr, long timeout)
>> +{
>> +    struct radeon_fence *fence = to_radeon_fence(f);
>> +    u64 target_seq[RADEON_NUM_RINGS] = {};
>> +
>> +    target_seq[fence->ring] = fence->seq;
>> +    return radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr, timeout);
>> +}
> When this call is coming from outside the radeon driver you need to lock rdev->exclusive_lock here to make sure not to interfere with a possible reset.
Ah thanks, I'll add that.

>>      .get_timeline_name = radeon_fence_get_timeline_name,
>>      .enable_signaling = radeon_fence_enable_signaling,
>>      .signaled = __radeon_fence_signaled,
> Do we still need those callbacks when we implemented the wait callback?
.get_timeline_name is used for debugging (trace events).
.signaled is the non-blocking call to check if the fence is signaled or not.
.enable_signaling is used for adding callbacks upon fence completion. The default 'fence_default_wait' uses it, so
when it works, no separate implementation is needed unless you want to do more than just waiting.
It's also used when fence_add_callback is called. i915 can be patched to use it. ;-)

~Maarten
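The race question at the top of this exchange comes down to lock ordering: the check of `last_seq` and the `__add_wait_queue` call both run with `fence_queue.lock` held, and (per the remark about moving `wake_up_all`) the wake-up side takes the same lock, so a completion cannot land between the check and the enqueue. A minimal single-file model of that, together with the roles described for `.signaled` (non-blocking query) and `.enable_signaling` (used by `fence_add_callback`), might look like the sketch below. All names are hypothetical, a mutex stands in for the wait-queue spin lock, a fixed-size array stands in for the wait queue, and `add_callback` follows the kernel convention of returning -ENOENT for an already-signaled fence instead of running the callback:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_WAITERS 8 /* hypothetical fixed-size stand-in for the wait queue */

struct ring;

struct fence {
	struct ring *ring;
	uint64_t seq;
	int cb_calls; /* how many times the completion callback has run */
};

struct ring {
	pthread_mutex_t lock; /* stands in for fence_queue.lock */
	uint64_t last_seq;
	struct fence *waiters[MAX_WAITERS];
	int nwaiters;
};

/* Like .signaled: a non-blocking "is it done yet?" query. */
static bool fence_is_signaled(struct fence *f)
{
	bool done;

	pthread_mutex_lock(&f->ring->lock);
	done = f->ring->last_seq >= f->seq;
	pthread_mutex_unlock(&f->ring->lock);
	return done;
}

/* Like .enable_signaling: the caller holds ring->lock. Returns false (and
 * queues nothing) if the fence has already signaled. */
static bool enable_signaling_locked(struct fence *f)
{
	struct ring *r = f->ring;

	if (r->last_seq >= f->seq)
		return false;
	assert(r->nwaiters < MAX_WAITERS);
	r->waiters[r->nwaiters++] = f;
	return true;
}

/* Like fence_add_callback: -ENOENT for an already-signaled fence. The check
 * and the enqueue happen under the lock the signaler also takes, so a
 * completion cannot slip in between them. */
static int add_callback(struct fence *f)
{
	int ret = 0;

	pthread_mutex_lock(&f->ring->lock);
	if (!enable_signaling_locked(f))
		ret = -ENOENT;
	pthread_mutex_unlock(&f->ring->lock);
	return ret;
}

/* Interrupt-path stand-in: advance last_seq, run callbacks of completed fences. */
static void ring_signal(struct ring *r, uint64_t seq)
{
	int kept = 0;

	pthread_mutex_lock(&r->lock);
	r->last_seq = seq;
	for (int i = 0; i < r->nwaiters; i++) {
		if (r->waiters[i]->seq <= seq)
			r->waiters[i]->cb_calls++;
		else
			r->waiters[kept++] = r->waiters[i];
	}
	r->nwaiters = kept;
	pthread_mutex_unlock(&r->lock);
}
```

A default wait can then be built on top of `add_callback` plus a sleep, which is why a driver only needs its own `.wait` when it wants to do more than wait, such as lockup handling.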

* Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences
@ 2014-05-15 15:48                     ` Christian König
  0 siblings, 0 replies; 50+ messages in thread
From: Christian König @ 2014-05-15 15:48 UTC (permalink / raw)
  To: Maarten Lankhorst, airlied; +Cc: nouveau, linux-kernel, dri-devel

On 15.05.2014 16:18, Maarten Lankhorst wrote:
> On 15-05-14 15:19, Christian König wrote:
>> On 15.05.2014 15:04, Maarten Lankhorst wrote:
>>> On 15-05-14 11:42, Christian König wrote:
>>>> On 15.05.2014 11:38, Maarten Lankhorst wrote:
>>>>> On 15-05-14 11:21, Christian König wrote:
>>>>>> On 15.05.2014 03:06, Maarten Lankhorst wrote:
>>>>>>> On 14-05-14 17:29, Christian König wrote:
>>>>>>>>> +    /* did fence get signaled after we enabled the sw irq? */
>>>>>>>>> +    if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
>>>>>>>>> +        radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
>>>>>>>>> +        return false;
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +    fence->fence_wake.flags = 0;
>>>>>>>>> +    fence->fence_wake.private = NULL;
>>>>>>>>> +    fence->fence_wake.func = radeon_fence_check_signaled;
>>>>>>>>> +    __add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
>>>>>>>>> +    fence_get(f);
>>>>>>>> That looks like a race condition to me. The fence needs to be 
>>>>>>>> added to the wait queue before the check, not after.
>>>>>>>>
>>>>>>>> Apart from that the whole approach looks like a really bad idea 
>>>>>>>> to me. How for example is lockup detection supposed to happen 
>>>>>>>> with this? 
>>>>>>> It's not a race condition because fence_queue.lock is held when 
>>>>>>> this function is called.
>>>>>> Ah, I see. That's also the reason why you moved the wake_up_all 
>>>>>> out of the processing function.
>>>>> Correct. :-)
>>>>>>> Lockups are a bit of a weird problem: the changes wouldn't allow 
>>>>>>> core ttm code to handle the lockup any more,
>>>>>>> but any driver-specific wait code would still handle this. I did 
>>>>>>> this by design, because in future patches the wait
>>>>>>> function may be called from outside of the radeon driver. The 
>>>>>>> official wait function takes a timeout parameter,
>>>>>>> so lockups wouldn't be fatal if the timeout is set to something 
>>>>>>> like 30*HZ, for example; it would still return
>>>>>>> and report that the function timed out.
>>>>>> Timeouts help with the detection of the lockup, but not at all 
>>>>>> with the handling of them.
>>>>>>
>>>>>> What we essentially need is a wait callback into the driver that 
>>>>>> is called in non atomic context without any locks held.
>>>>>>
>>>>>> This way we can block for the fence to become signaled with a 
>>>>>> timeout and can then also initiate the reset handling if necessary.
>>>>>>
>>>>>> The way you designed the interface now means that the driver 
>>>>>> never gets a chance to wait for the hardware to become idle and 
>>>>>> so never has the opportunity to reset the whole thing.
>>>>> You could set up a hangcheck timer like intel does, and end up 
>>>>> with a reliable hangcheck detection that doesn't depend on cpu 
>>>>> waits. :-) Or override the default wait function and restore the 
>>>>> old behavior.
>>>>
>>>> Overriding the default wait function sounds better, please 
>>>> implement it this way.
>>>>
>>>> Thanks,
>>>> Christian. 
>>>
>>> Does this modification look sane?
>> Adding the timeout has been on my todo list for quite some time as well, so 
>> this part makes sense.
>>
>>> +static long __radeon_fence_wait(struct fence *f, bool intr, long timeout)
>>> +{
>>> +    struct radeon_fence *fence = to_radeon_fence(f);
>>> +    u64 target_seq[RADEON_NUM_RINGS] = {};
>>> +
>>> +    target_seq[fence->ring] = fence->seq;
>>> +    return radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr, timeout);
>>> +}
>> When this call is coming from outside the radeon driver you need to 
>> lock rdev->exclusive_lock here to make sure not to interfere with a 
>> possible reset.
> Ah thanks, I'll add that.
>
>>>      .get_timeline_name = radeon_fence_get_timeline_name,
>>>      .enable_signaling = radeon_fence_enable_signaling,
>>>      .signaled = __radeon_fence_signaled,
>> Do we still need those callbacks when we implemented the wait callback?
> .get_timeline_name is used for debugging (trace events).
> .signaled is the non-blocking call to check if the fence is signaled 
> or not.
> .enable_signaling is used for adding callbacks upon fence completion. 
> The default 'fence_default_wait' uses it, so
> when it works, no separate implementation is needed unless you want to 
> do more than just waiting.
> It's also used when fence_add_callback is called. i915 can be patched 
> to use it. ;-)

I just meant enable_signaling; the other ones are fine with me. The 
problem with enable_signaling is that it's called with a spin lock held, 
so we can't sleep.

While resetting the GPU could be moved out into a timer, the problem here 
is that I can't lock rdev->exclusive_lock in such situations.

This means that if i915 were to call into radeon to enable signaling for a 
fence, we couldn't make sure that no GPU reset is running on another 
CPU. And touching the IRQ registers while a reset is going on is a 
really good recipe to lock up the whole system.

Christian.

>
> ~Maarten
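The locking-context point above can be modeled in userspace. A driver `.wait` override runs in process context and can block on `rdev->exclusive_lock` (the change agreed to earlier in the thread), while `.enable_signaling` runs under a spin lock and must not sleep. The sketch assumes `exclusive_lock` behaves like a reader/writer lock, with normal driver paths as readers and the GPU reset as the writer; all function names are hypothetical. It only illustrates the problem: a caller that may not sleep can at most try the lock, and that attempt fails exactly while a reset is running.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Assumption: exclusive_lock acts as a reader/writer lock, with normal
 * driver paths as readers and the GPU reset as the single writer. */
static pthread_rwlock_t exclusive_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Hypothetical reset path: holds the lock exclusively and may sleep for a
 * long time while the hardware is brought back. */
static void gpu_reset_begin(void) { pthread_rwlock_wrlock(&exclusive_lock); }
static void gpu_reset_end(void) { pthread_rwlock_unlock(&exclusive_lock); }

/* Hypothetical stub for the actual sequence-number wait. */
static long immediate_wait(long timeout) { return timeout; }

/* A driver .wait override runs in process context with no locks held, so it
 * can block on the read side until any running reset has finished. */
static long wait_excluding_reset(long (*do_wait)(long), long timeout)
{
	long r;

	pthread_rwlock_rdlock(&exclusive_lock);
	r = do_wait(timeout);
	pthread_rwlock_unlock(&exclusive_lock);
	return r;
}

/* Code running under a spin lock (like .enable_signaling) must not sleep, so
 * a non-blocking probe is all it has; the probe fails while a reset runs. */
static bool can_exclude_reset_without_sleeping(void)
{
	if (pthread_rwlock_tryrdlock(&exclusive_lock) != 0)
		return false;
	pthread_rwlock_unlock(&exclusive_lock);
	return true;
}
```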


* Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences
  2014-05-15 15:48                     ` Christian König
  (?)
@ 2014-05-15 15:58                     ` Maarten Lankhorst
  2014-05-15 16:13                       ` Christian König
  -1 siblings, 1 reply; 50+ messages in thread
From: Maarten Lankhorst @ 2014-05-15 15:58 UTC (permalink / raw)
  To: Christian König, airlied; +Cc: nouveau, linux-kernel, dri-devel

On 15-05-14 17:48, Christian König wrote:
> On 15.05.2014 16:18, Maarten Lankhorst wrote:
>> On 15-05-14 15:19, Christian König wrote:
>>> On 15.05.2014 15:04, Maarten Lankhorst wrote:
>>>> On 15-05-14 11:42, Christian König wrote:
>>>>> On 15.05.2014 11:38, Maarten Lankhorst wrote:
>>>>>> On 15-05-14 11:21, Christian König wrote:
>>>>>>> On 15.05.2014 03:06, Maarten Lankhorst wrote:
>>>>>>>> On 14-05-14 17:29, Christian König wrote:
>>>>>>>>>> +    /* did fence get signaled after we enabled the sw irq? */
>>>>>>>>>> +    if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
>>>>>>>>>> +        radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
>>>>>>>>>> +        return false;
>>>>>>>>>> +    }
>>>>>>>>>> +
>>>>>>>>>> +    fence->fence_wake.flags = 0;
>>>>>>>>>> +    fence->fence_wake.private = NULL;
>>>>>>>>>> +    fence->fence_wake.func = radeon_fence_check_signaled;
>>>>>>>>>> +    __add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
>>>>>>>>>> +    fence_get(f);
>>>>>>>>> That looks like a race condition to me. The fence needs to be added to the wait queue before the check, not after.
>>>>>>>>>
>>>>>>>>> Apart from that the whole approach looks like a really bad idea to me. How for example is lockup detection supposed to happen with this? 
>>>>>>>> It's not a race condition because fence_queue.lock is held when this function is called.
>>>>>>> Ah, I see. That's also the reason why you moved the wake_up_all out of the processing function.
>>>>>> Correct. :-)
>>>>>>>> Lockups are a bit of a weird problem: the changes wouldn't allow core ttm code to handle the lockup any more,
>>>>>>>> but any driver-specific wait code would still handle this. I did this by design, because in future patches the wait
>>>>>>>> function may be called from outside of the radeon driver. The official wait function takes a timeout parameter,
>>>>>>>> so lockups wouldn't be fatal if the timeout is set to something like 30*HZ, for example; it would still return
>>>>>>>> and report that the function timed out.
>>>>>>> Timeouts help with the detection of the lockup, but not at all with the handling of them.
>>>>>>>
>>>>>>> What we essentially need is a wait callback into the driver that is called in non atomic context without any locks held.
>>>>>>>
>>>>>>> This way we can block for the fence to become signaled with a timeout and can then also initiate the reset handling if necessary.
>>>>>>>
>>>>>>> The way you designed the interface now means that the driver never gets a chance to wait for the hardware to become idle and so never has the opportunity to reset the whole thing.
>>>>>> You could set up a hangcheck timer like intel does, and end up with a reliable hangcheck detection that doesn't depend on cpu waits. :-) Or override the default wait function and restore the old behavior.
>>>>>
>>>>> Overriding the default wait function sounds better, please implement it this way.
>>>>>
>>>>> Thanks,
>>>>> Christian. 
>>>>
>>>> Does this modification look sane?
>>> Adding the timeout has been on my todo list for quite some time as well, so this part makes sense.
>>>
>>>> +static long __radeon_fence_wait(struct fence *f, bool intr, long timeout)
>>>> +{
>>>> +    struct radeon_fence *fence = to_radeon_fence(f);
>>>> +    u64 target_seq[RADEON_NUM_RINGS] = {};
>>>> +
>>>> +    target_seq[fence->ring] = fence->seq;
>>>> +    return radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr, timeout);
>>>> +}
>>> When this call is coming from outside the radeon driver you need to lock rdev->exclusive_lock here to make sure not to interfere with a possible reset.
>> Ah thanks, I'll add that.
>>
>>>>      .get_timeline_name = radeon_fence_get_timeline_name,
>>>>      .enable_signaling = radeon_fence_enable_signaling,
>>>>      .signaled = __radeon_fence_signaled,
>>> Do we still need those callbacks when we implemented the wait callback?
>> .get_timeline_name is used for debugging (trace events).
>> .signaled is the non-blocking call to check if the fence is signaled or not.
>> .enable_signaling is used for adding callbacks upon fence completion. The default 'fence_default_wait' uses it, so
>> when it works, no separate implementation is needed unless you want to do more than just waiting.
>> It's also used when fence_add_callback is called. i915 can be patched to use it. ;-)
>
> I just meant enable_signaling; the other ones are fine with me. The problem with enable_signaling is that it's called with a spin lock held, so we can't sleep.
>
> While resetting the GPU could be moved out into a timer, the problem here is that I can't lock rdev->exclusive_lock in such situations.
>
> This means that if i915 were to call into radeon to enable signaling for a fence, we couldn't make sure that no GPU reset is running on another CPU. And touching the IRQ registers while a reset is going on is a really good recipe to lock up the whole system.
If you increase the irq counter on all rings before doing a GPU reset, adjust the state, and call sw_irq_put when done, this race could never happen. Or am I missing something?

~Maarten
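The proposal above relies on the software IRQ reference count only reprogramming the hardware on its 0->1 and 1->0 transitions: if the reset path holds one reference on every ring for its whole duration, a concurrent `sw_irq_get`/`sw_irq_put` pair from `.enable_signaling` only moves the counter. The counting argument can be checked with a small single-threaded model; `struct irq_state`, `touch_irq_registers`, `reset_begin` and `reset_end` are hypothetical, the transition-only behavior is an assumption modeled on the radeon `sw_irq_get`/`sw_irq_put` helpers rather than a copy of them, and atomics and the real IRQ lock are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_RINGS 4 /* hypothetical ring count */

/* Hypothetical, single-threaded model of the per-ring software IRQ refcount. */
struct irq_state {
	int refcount[NUM_RINGS];
	bool in_reset;
	int hw_writes;              /* every (re)programming of the IRQ registers */
	int hw_writes_during_reset;
};

static void touch_irq_registers(struct irq_state *s)
{
	s->hw_writes++;
	if (s->in_reset)
		s->hw_writes_during_reset++;
}

/* Assumed behavior: only the 0->1 and 1->0 transitions touch the hardware. */
static void sw_irq_get(struct irq_state *s, int ring)
{
	if (s->refcount[ring]++ == 0)
		touch_irq_registers(s);
}

static void sw_irq_put(struct irq_state *s, int ring)
{
	if (--s->refcount[ring] == 0)
		touch_irq_registers(s);
}

/* Pin every ring's interrupt on before the reset starts... */
static void reset_begin(struct irq_state *s)
{
	for (int i = 0; i < NUM_RINGS; i++)
		sw_irq_get(s, i);
	s->in_reset = true;
}

/* ...and drop those references only once the state has been restored. */
static void reset_end(struct irq_state *s)
{
	s->in_reset = false;
	for (int i = 0; i < NUM_RINGS; i++)
		sw_irq_put(s, i);
}
```

As the reply to this message points out, this only covers the interrupt enable/disable path: during a hard PCI reset, touching any register or VRAM can crash the machine regardless of the counter.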


* Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences
  2014-05-15 15:58                     ` Maarten Lankhorst
@ 2014-05-15 16:13                       ` Christian König
  2014-05-19  8:00                         ` Maarten Lankhorst
  0 siblings, 1 reply; 50+ messages in thread
From: Christian König @ 2014-05-15 16:13 UTC (permalink / raw)
  To: Maarten Lankhorst, airlied; +Cc: nouveau, linux-kernel, dri-devel

On 15.05.2014 17:58, Maarten Lankhorst wrote:
> On 15-05-14 17:48, Christian König wrote:
>> On 15.05.2014 16:18, Maarten Lankhorst wrote:
>>> On 15-05-14 15:19, Christian König wrote:
>>>> On 15.05.2014 15:04, Maarten Lankhorst wrote:
>>>>> On 15-05-14 11:42, Christian König wrote:
>>>>>> On 15.05.2014 11:38, Maarten Lankhorst wrote:
>>>>>>> On 15-05-14 11:21, Christian König wrote:
>>>>>>>> On 15.05.2014 03:06, Maarten Lankhorst wrote:
>>>>>>>>> On 14-05-14 17:29, Christian König wrote:
>>>>>>>>>>> +    /* did fence get signaled after we enabled the sw irq? */
>>>>>>>>>>> +    if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
>>>>>>>>>>> +        radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
>>>>>>>>>>> +        return false;
>>>>>>>>>>> +    }
>>>>>>>>>>> +
>>>>>>>>>>> +    fence->fence_wake.flags = 0;
>>>>>>>>>>> +    fence->fence_wake.private = NULL;
>>>>>>>>>>> +    fence->fence_wake.func = radeon_fence_check_signaled;
>>>>>>>>>>> +    __add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
>>>>>>>>>>> +    fence_get(f);
>>>>>>>>>> That looks like a race condition to me. The fence needs to be 
>>>>>>>>>> added to the wait queue before the check, not after.
>>>>>>>>>>
>>>>>>>>>> Apart from that the whole approach looks like a really bad 
>>>>>>>>>> idea to me. How for example is lockup detection supposed to 
>>>>>>>>>> happen with this? 
>>>>>>>>> It's not a race condition because fence_queue.lock is held 
>>>>>>>>> when this function is called.
>>>>>>>> Ah, I see. That's also the reason why you moved the wake_up_all 
>>>>>>>> out of the processing function.
>>>>>>> Correct. :-)
>>>>>>>>> Lockups are a bit of a weird problem: the changes wouldn't allow 
>>>>>>>>> core ttm code to handle the lockup any more,
>>>>>>>>> but any driver-specific wait code would still handle this. I 
>>>>>>>>> did this by design, because in future patches the wait
>>>>>>>>> function may be called from outside of the radeon driver. The 
>>>>>>>>> official wait function takes a timeout parameter,
>>>>>>>>> so lockups wouldn't be fatal if the timeout is set to 
>>>>>>>>> something like 30*HZ, for example; it would still return
>>>>>>>>> and report that the function timed out.
>>>>>>>> Timeouts help with the detection of the lockup, but not at all 
>>>>>>>> with the handling of them.
>>>>>>>>
>>>>>>>> What we essentially need is a wait callback into the driver 
>>>>>>>> that is called in non atomic context without any locks held.
>>>>>>>>
>>>>>>>> This way we can block for the fence to become signaled with a 
>>>>>>>> timeout and can then also initiate the reset handling if 
>>>>>>>> necessary.
>>>>>>>>
>>>>>>>> The way you designed the interface now means that the driver 
>>>>>>>> never gets a chance to wait for the hardware to become idle and 
>>>>>>>> so never has the opportunity to reset the whole thing.
>>>>>>> You could set up a hangcheck timer like intel does, and end up 
>>>>>>> with a reliable hangcheck detection that doesn't depend on cpu 
>>>>>>> waits. :-) Or override the default wait function and restore the 
>>>>>>> old behavior.
>>>>>>
>>>>>> Overriding the default wait function sounds better, please 
>>>>>> implement it this way.
>>>>>>
>>>>>> Thanks,
>>>>>> Christian. 
>>>>>
>>>>> Does this modification look sane?
>>>> Adding the timeout has been on my todo list for quite some time as well, 
>>>> so this part makes sense.
>>>>
>>>>> +static long __radeon_fence_wait(struct fence *f, bool intr, long timeout)
>>>>> +{
>>>>> +    struct radeon_fence *fence = to_radeon_fence(f);
>>>>> +    u64 target_seq[RADEON_NUM_RINGS] = {};
>>>>> +
>>>>> +    target_seq[fence->ring] = fence->seq;
>>>>> +    return radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr, timeout);
>>>>> +}
>>>> When this call is coming from outside the radeon driver you need 
>>>> to lock rdev->exclusive_lock here to make sure not to interfere 
>>>> with a possible reset.
>>> Ah thanks, I'll add that.
>>>
>>>>>      .get_timeline_name = radeon_fence_get_timeline_name,
>>>>>      .enable_signaling = radeon_fence_enable_signaling,
>>>>>      .signaled = __radeon_fence_signaled,
>>>> Do we still need those callbacks when we implemented the wait callback?
>>> .get_timeline_name is used for debugging (trace events).
>>> .signaled is the non-blocking call to check if the fence is signaled 
>>> or not.
>>> .enable_signaling is used for adding callbacks upon fence 
>>> completion. The default 'fence_default_wait' uses it, so
>>> when it works, no separate implementation is needed unless you want 
>>> to do more than just waiting.
>>> It's also used when fence_add_callback is called. i915 can be 
>>> patched to use it. ;-)
>>
>> I just meant enable_signaling; the other ones are fine with me. The 
>> problem with enable_signaling is that it's called with a spin lock 
>> held, so we can't sleep.
>>
>> While resetting the GPU could be moved out into a timer, the problem 
>> here is that I can't lock rdev->exclusive_lock in such situations.
>>
>> This means that if i915 were to call into radeon to enable signaling 
>> for a fence, we couldn't make sure that no GPU reset is running on 
>> another CPU. And touching the IRQ registers while a reset is going on 
>> is a really good recipe to lock up the whole system.
> If you increase the irq counter on all rings before doing a GPU reset, 
> adjust the state, and call sw_irq_put when done, this race could never 
> happen. Or am I missing something?
>
Besides that being extremely ugly, in the case of a hard PCI reset even 
touching any register or just accessing VRAM at that moment can crash 
the box. Just working around the enable/disable of the interrupt here 
won't help us much.

Adding another spin lock won't work so well either, because the reset 
function itself wants to sleep as well.

The only solution I see offhand is making the critical reset code path 
work in atomic context as well, but that's not really doable, because AFAIK 
we need to work with functions from the PCI subsystem, and spinning on a 
lock for up to a second is not really desirable either.

Christian.

> ~Maarten
>


* Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences
  2014-05-15 16:13                       ` Christian König
@ 2014-05-19  8:00                         ` Maarten Lankhorst
  2014-05-19  8:27                           ` Christian König
  0 siblings, 1 reply; 50+ messages in thread
From: Maarten Lankhorst @ 2014-05-19  8:00 UTC (permalink / raw)
  To: Christian König, airlied; +Cc: nouveau, linux-kernel, dri-devel

op 15-05-14 18:13, Christian König schreef:
> Am 15.05.2014 17:58, schrieb Maarten Lankhorst:
>> op 15-05-14 17:48, Christian König schreef:
>>> Am 15.05.2014 16:18, schrieb Maarten Lankhorst:
>>>> op 15-05-14 15:19, Christian König schreef:
>>>>> Am 15.05.2014 15:04, schrieb Maarten Lankhorst:
>>>>>> op 15-05-14 11:42, Christian König schreef:
>>>>>>> Am 15.05.2014 11:38, schrieb Maarten Lankhorst:
>>>>>>>> op 15-05-14 11:21, Christian König schreef:
>>>>>>>>> Am 15.05.2014 03:06, schrieb Maarten Lankhorst:
>>>>>>>>>> op 14-05-14 17:29, Christian König schreef:
>>>>>>>>>>>> +    /* did fence get signaled after we enabled the sw irq? */
>>>>>>>>>>>> +    if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
>>>>>>>>>>>> + radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
>>>>>>>>>>>> +        return false;
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +
>>>>>>>>>>>> +    fence->fence_wake.flags = 0;
>>>>>>>>>>>> +    fence->fence_wake.private = NULL;
>>>>>>>>>>>> +    fence->fence_wake.func = radeon_fence_check_signaled;
>>>>>>>>>>>> + __add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
>>>>>>>>>>>> +    fence_get(f);
>>>>>>>>>>> That looks like a race condition to me. The fence needs to be added to the wait queue before the check, not after.
>>>>>>>>>>>
>>>>>>>>>>> Apart from that the whole approach looks like a really bad idea to me. How for example is lockup detection supposed to happen with this? 
>>>>>>>>>> It's not a race condition because fence_queue.lock is held when this function is called.
>>>>>>>>> Ah, I see. That's also the reason why you moved the wake_up_all out of the processing function.
>>>>>>>> Correct. :-)
>>>>>>>>>> Lockup's a bit of a weird problem, the changes wouldn't allow core ttm code to handle the lockup any more,
>>>>>>>>>> but any driver specific wait code would still handle this. I did this by design, because in future patches the wait
>>>>>>>>>> function may be called from outside of the radeon driver. The official wait function takes a timeout parameter,
>>>>>>>>>> so lockups wouldn't be fatal if the timeout is set to something like 30*HZ for example, it would still return
>>>>>>>>>> and report that the function timed out.
>>>>>>>>> Timeouts help with the detection of the lockup, but not at all with the handling of them.
>>>>>>>>>
>>>>>>>>> What we essentially need is a wait callback into the driver that is called in non atomic context without any locks held.
>>>>>>>>>
>>>>>>>>> This way we can block for the fence to become signaled with a timeout and can then also initiate the reset handling if necessary.
>>>>>>>>>
>>>>>>>>> The way you designed the interface now means that the driver never gets a chance to wait for the hardware to become idle and so never has the opportunity to the reset the whole thing.
>>>>>>>> You could set up a hangcheck timer like intel does, and end up with a reliable hangcheck detection that doesn't depend on cpu waits. :-) Or override the default wait function and restore the old behavior.
>>>>>>>
>>>>>>> Overriding the default wait function sounds better, please implement it this way.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Christian. 
>>>>>>
>>>>>> Does this modification look sane?
>>>>> Adding the timeout is on my todo list for quite some time as well, so this part makes sense.
>>>>>
>>>>>> +static long __radeon_fence_wait(struct fence *f, bool intr, long timeout)
>>>>>> +{
>>>>>> +    struct radeon_fence *fence = to_radeon_fence(f);
>>>>>> +    u64 target_seq[RADEON_NUM_RINGS] = {};
>>>>>> +
>>>>>> +    target_seq[fence->ring] = fence->seq;
>>>>>> +    return radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr, timeout);
>>>>>> +}
>>>>> When this call is comming from outside the radeon driver you need to lock rdev->exclusive_lock here to make sure not to interfere with a possible reset.
>>>> Ah thanks, I'll add that.
>>>>
>>>>>>      .get_timeline_name = radeon_fence_get_timeline_name,
>>>>>>      .enable_signaling = radeon_fence_enable_signaling,
>>>>>>      .signaled = __radeon_fence_signaled,
>>>>> Do we still need those callback when we implemented the wait callback?
>>>> .get_timeline_name is used for debugging (trace events).
>>>> .signaled is the non-blocking call to check if the fence is signaled or not.
>>>> .enable_signaling is used for adding callbacks upon fence completion, the default 'fence_default_wait' uses it, so
>>>> when it works no separate implementation is needed unless you want to do more than just waiting.
>>>> It's also used when fence_add_callback is called. i915 can be patched to use it. ;-)
>>>
>>> I just meant enable_signaling, the other ones are fine with me. The problem with enable_signaling is that it's called with a spin lock held, so we can't sleep.
>>>
>>> While resetting the GPU could be moved out into a timer the problem here is that I can't lock rdev->exclusive_lock in such situations.
>>>
>>> This means when i915 would call into radeon to enable signaling for a fence we can't make sure that there is not GPU reset running on another CPU. And touching the IRQ registers while a reset is going on is a really good recipe to lockup the whole system.
>> If you increase the irq counter on all rings before doing a gpu reset, adjust the state and call sw_irq_put when done this race could never happen. Or am I missing something?
>>
> Beside that's being extremely ugly in the case of a hard PCI reset even touching any register or just accessing VRAM in this moment can crash the box. Just working around the enable/disable of the interrupt here won't help us much.
>
> Adding another spin lock won't work so well either, because the reset function itself wants to sleep as well.
>
> The only solution I see off hand is making the critical reset code path work in atomic context as well, but that's not really doable cause AFAIK we need to work with functions from the PCI subsystem and spinning on a lock for up to a second is not really desirable also.
I've checked the code a little, and that can already happen with the current code as well. The new implementation's __radeon_fence_wait will be protected by the exclusive_lock, but enable_signaling is only protected by the fence_queue.lock, and is_signaled is not protected at all. But this is not a change from the current situation, so it would only become a problem if the GPU hangs in a cross-device situation.

I think adding 1 to the irq refcount in the reset sequence and adding a down_read_trylock on the exclusive lock would help. If the trylock fails, we could perform only the safe actions without touching any of the GPU registers or VRAM; adding the refcount is needed to ensure enable_signaling works as intended.
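A minimal, single-threaded userspace model of this proposal, assuming C11 atomics as stand-ins for the kernel's atomic_t and rw_semaphore. Every name here (model_dev, model_reset_begin, model_enable_signaling, ...) is hypothetical and not a radeon function. It shows the two halves: the reset pins the per-ring refcount so the 0 -> 1 transition in sw_irq_get never reprograms the IRQ registers mid-reset, and the non-sleeping path uses a trylock and skips the hardware-touching work when the reset holds the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: 'exclusive' stands in for rdev->exclusive_lock
 * (>0 = readers, -1 = writer), 'ring_int' for rdev->irq.ring_int[ring]. */
struct model_dev {
	atomic_int exclusive;
	atomic_int ring_int;
	int irq_reg_writes;	/* times the IRQ registers were programmed */
	int processed;		/* times fence processing touched the hw */
};

/* Non-sleeping read-side trylock, in the spirit of down_read_trylock(). */
static bool model_down_read_trylock(atomic_int *sem)
{
	int v = atomic_load(sem);

	while (v >= 0)
		if (atomic_compare_exchange_weak(sem, &v, v + 1))
			return true;
	return false;
}

static void model_up_read(atomic_int *sem)
{
	atomic_fetch_sub(sem, 1);
}

/* Like radeon_irq_kms_sw_irq_get(): only the 0 -> 1 transition
 * reprograms the interrupt registers. */
static void model_sw_irq_get(struct model_dev *d)
{
	if (atomic_fetch_add(&d->ring_int, 1) == 0)
		d->irq_reg_writes++;
}

/* Reset start: take the write side, then pin the refcount so that
 * model_sw_irq_get() never sees a 0 -> 1 transition during the reset. */
static void model_reset_begin(struct model_dev *d)
{
	int unlocked = 0;
	bool ok = atomic_compare_exchange_strong(&d->exclusive, &unlocked, -1);

	assert(ok);	/* single-threaded model: nobody else holds it */
	atomic_fetch_add(&d->ring_int, 1);
}

/* Reset end: drop the pinned reference, resync the registers once
 * (radeon_irq_set() in the diff), then release the write side. */
static void model_reset_end(struct model_dev *d)
{
	atomic_fetch_sub(&d->ring_int, 1);
	d->irq_reg_writes++;
	atomic_store(&d->exclusive, 0);
}

/* Runs under a spinlock, must not sleep. Returns whether the
 * hardware-touching part ran (not the kernel callback's semantics). */
static bool model_enable_signaling(struct model_dev *d)
{
	model_sw_irq_get(d);	/* safe during reset: the count is pinned */
	if (!model_down_read_trylock(&d->exclusive))
		return false;	/* reset running: skip anything touching hw */
	d->processed++;		/* e.g. radeon_fence_process() */
	model_up_read(&d->exclusive);
	return true;
}
```

During a reset the call still takes its IRQ reference (so the interrupt is enabled once the reset resyncs the registers), but neither programs the registers nor processes fences.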

~Maarten

* Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences
  2014-05-19  8:00                         ` Maarten Lankhorst
@ 2014-05-19  8:27                           ` Christian König
  2014-05-19 10:10                             ` Maarten Lankhorst
  0 siblings, 1 reply; 50+ messages in thread
From: Christian König @ 2014-05-19  8:27 UTC (permalink / raw)
  To: Maarten Lankhorst, airlied; +Cc: nouveau, linux-kernel, dri-devel

Am 19.05.2014 10:00, schrieb Maarten Lankhorst:
> op 15-05-14 18:13, Christian König schreef:
>> Am 15.05.2014 17:58, schrieb Maarten Lankhorst:
>>> op 15-05-14 17:48, Christian König schreef:
>>>> Am 15.05.2014 16:18, schrieb Maarten Lankhorst:
>>>>> op 15-05-14 15:19, Christian König schreef:
>>>>>> Am 15.05.2014 15:04, schrieb Maarten Lankhorst:
>>>>>>> op 15-05-14 11:42, Christian König schreef:
>>>>>>>> Am 15.05.2014 11:38, schrieb Maarten Lankhorst:
>>>>>>>>> op 15-05-14 11:21, Christian König schreef:
>>>>>>>>>> Am 15.05.2014 03:06, schrieb Maarten Lankhorst:
>>>>>>>>>>> op 14-05-14 17:29, Christian König schreef:
>>>>>>>>>>>>> +    /* did fence get signaled after we enabled the sw irq? */
>>>>>>>>>>>>> +    if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
>>>>>>>>>>>>> +        radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
>>>>>>>>>>>>> +        return false;
>>>>>>>>>>>>> +    }
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +    fence->fence_wake.flags = 0;
>>>>>>>>>>>>> +    fence->fence_wake.private = NULL;
>>>>>>>>>>>>> +    fence->fence_wake.func = radeon_fence_check_signaled;
>>>>>>>>>>>>> +    __add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
>>>>>>>>>>>>> +    fence_get(f);
>>>>>>>>>>>> That looks like a race condition to me. The fence needs to 
>>>>>>>>>>>> be added to the wait queue before the check, not after.
>>>>>>>>>>>>
>>>>>>>>>>>> Apart from that the whole approach looks like a really bad 
>>>>>>>>>>>> idea to me. How for example is lockup detection supposed to 
>>>>>>>>>>>> happen with this? 
>>>>>>>>>>> It's not a race condition because fence_queue.lock is held 
>>>>>>>>>>> when this function is called.
>>>>>>>>>> Ah, I see. That's also the reason why you moved the 
>>>>>>>>>> wake_up_all out of the processing function.
>>>>>>>>> Correct. :-)
>>>>>>>>>>> Lockup's a bit of a weird problem, the changes wouldn't 
>>>>>>>>>>> allow core ttm code to handle the lockup any more,
>>>>>>>>>>> but any driver specific wait code would still handle this. I 
>>>>>>>>>>> did this by design, because in future patches the wait
>>>>>>>>>>> function may be called from outside of the radeon driver. 
>>>>>>>>>>> The official wait function takes a timeout parameter,
>>>>>>>>>>> so lockups wouldn't be fatal if the timeout is set to 
>>>>>>>>>>> something like 30*HZ for example, it would still return
>>>>>>>>>>> and report that the function timed out.
>>>>>>>>>> Timeouts help with the detection of the lockup, but not at 
>>>>>>>>>> all with the handling of them.
>>>>>>>>>>
>>>>>>>>>> What we essentially need is a wait callback into the driver 
>>>>>>>>>> that is called in non atomic context without any locks held.
>>>>>>>>>>
>>>>>>>>>> This way we can block for the fence to become signaled with a 
>>>>>>>>>> timeout and can then also initiate the reset handling if 
>>>>>>>>>> necessary.
>>>>>>>>>>
>>>>>>>>>> The way you designed the interface now means that the driver 
>>>>>>>>>> never gets a chance to wait for the hardware to become idle 
>>>>>>>>>> and so never has the opportunity to the reset the whole thing.
>>>>>>>>> You could set up a hangcheck timer like intel does, and end up 
>>>>>>>>> with a reliable hangcheck detection that doesn't depend on cpu 
>>>>>>>>> waits. :-) Or override the default wait function and restore 
>>>>>>>>> the old behavior.
>>>>>>>>
>>>>>>>> Overriding the default wait function sounds better, please 
>>>>>>>> implement it this way.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Christian. 
>>>>>>>
>>>>>>> Does this modification look sane?
>>>>>> Adding the timeout is on my todo list for quite some time as 
>>>>>> well, so this part makes sense.
>>>>>>
>>>>>>> +static long __radeon_fence_wait(struct fence *f, bool intr, long timeout)
>>>>>>> +{
>>>>>>> +    struct radeon_fence *fence = to_radeon_fence(f);
>>>>>>> +    u64 target_seq[RADEON_NUM_RINGS] = {};
>>>>>>> +
>>>>>>> +    target_seq[fence->ring] = fence->seq;
>>>>>>> +    return radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr, timeout);
>>>>>>> +}
>>>>>> When this call is comming from outside the radeon driver you need 
>>>>>> to lock rdev->exclusive_lock here to make sure not to interfere 
>>>>>> with a possible reset.
>>>>> Ah thanks, I'll add that.
>>>>>
>>>>>>>      .get_timeline_name = radeon_fence_get_timeline_name,
>>>>>>>      .enable_signaling = radeon_fence_enable_signaling,
>>>>>>>      .signaled = __radeon_fence_signaled,
>>>>>> Do we still need those callback when we implemented the wait 
>>>>>> callback?
>>>>> .get_timeline_name is used for debugging (trace events).
>>>>> .signaled is the non-blocking call to check if the fence is 
>>>>> signaled or not.
>>>>> .enable_signaling is used for adding callbacks upon fence 
>>>>> completion, the default 'fence_default_wait' uses it, so
>>>>> when it works no separate implementation is needed unless you want 
>>>>> to do more than just waiting.
>>>>> It's also used when fence_add_callback is called. i915 can be 
>>>>> patched to use it. ;-)
>>>>
>>>> I just meant enable_signaling, the other ones are fine with me. The 
>>>> problem with enable_signaling is that it's called with a spin lock 
>>>> held, so we can't sleep.
>>>>
>>>> While resetting the GPU could be moved out into a timer the problem 
>>>> here is that I can't lock rdev->exclusive_lock in such situations.
>>>>
>>>> This means when i915 would call into radeon to enable signaling for 
>>>> a fence we can't make sure that there is not GPU reset running on 
>>>> another CPU. And touching the IRQ registers while a reset is going 
>>>> on is a really good recipe to lockup the whole system.
>>> If you increase the irq counter on all rings before doing a gpu 
>>> reset, adjust the state and call sw_irq_put when done this race 
>>> could never happen. Or am I missing something?
>>>
>> Beside that's being extremely ugly in the case of a hard PCI reset 
>> even touching any register or just accessing VRAM in this moment can 
>> crash the box. Just working around the enable/disable of the 
>> interrupt here won't help us much.
>>
>> Adding another spin lock won't work so well either, because the reset 
>> function itself wants to sleep as well.
>>
>> The only solution I see off hand is making the critical reset code 
>> path work in atomic context as well, but that's not really doable 
>> cause AFAIK we need to work with functions from the PCI subsystem and 
>> spinning on a lock for up to a second is not really desirable also.
> I've checked the code a little but that can be the case now as well. 
> the new implementation's __radeon_fence_wait will be protected by the 
> exclusive_lock,, but enable_signaling is only protected by the 
> fence_queue.lock and is_signaled is not protected. But this is not a 
> change from the current situation, so it would only become a problem 
> if the gpu hangs in a cross-device situation.
>
> I think adding 1 to the irq refcount in the reset sequence and adding 
> a down_read_trylock on the exclusive lock would help. If the trylock 
> fails we could perform only the safe actions without touching any of 
> the gpu registers or vram, adding the refcount is needed to ensure 
> enable_signaling works as intended.

The problem here is that the whole approach collides with the way we do 
reset handling from a conceptual point of view. Every IOCTL or other 
call chain into the driver is protected by the read side of the 
exclusive_lock semaphore. So in the case of a GPU lockup we can take the 
write side of the semaphore and thereby make sure that nobody else is 
accessing the hardware or the internal driver structures that are 
normally only changed at init time.

Leaking a driver's IRQ context into another driver, as well as calling 
into a driver in atomic context, is quite an uncommon approach and 
should be considered very carefully.

I would rather vote for a completely synchronous interface that only allows 
blocking waits and checks whether a fence is signaled from non-atomic context.

If a driver needs to avoid blocking, it should just use a workqueue, and 
checking a fence outside your own driver is probably better done in a 
bottom half handler anyway.
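The synchronous alternative proposed above, a wait that may only be called from process context and that takes a timeout, could be modelled roughly as follows. This is a sketch under stated assumptions: a pthread mutex and condition variable stand in for the radeon fence_queue wait queue, and model_timeline, model_wait_seq and model_advance are hypothetical names. Because the caller is allowed to sleep, a driver in this position could also hold its reset lock and start lockup handling when the wait times out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical model of one ring's fence timeline. */
struct model_timeline {
	pthread_mutex_t lock;
	pthread_cond_t advanced;	/* broadcast whenever last_seq moves */
	uint64_t last_seq;
};

/* Blocks until last_seq >= seq or timeout_ms expires; returns whether the
 * sequence number was reached. Sleeps, so process context only. */
static bool model_wait_seq(struct model_timeline *t, uint64_t seq, long timeout_ms)
{
	struct timespec deadline;
	bool done;
	int rc = 0;

	/* pthread_cond_timedwait() uses an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&t->lock);
	/* Loop to tolerate spurious wakeups; stop on ETIMEDOUT. */
	while (t->last_seq < seq && rc == 0)
		rc = pthread_cond_timedwait(&t->advanced, &t->lock, &deadline);
	done = t->last_seq >= seq;
	pthread_mutex_unlock(&t->lock);
	return done;
}

/* Called from the "interrupt" side when the hardware reports progress. */
static void model_advance(struct model_timeline *t, uint64_t seq)
{
	pthread_mutex_lock(&t->lock);
	if (seq > t->last_seq)
		t->last_seq = seq;
	pthread_cond_broadcast(&t->advanced);
	pthread_mutex_unlock(&t->lock);
}
```

A `false` return is where lockup detection would hook in; unlike a spinlock-protected callback, nothing here prevents the caller from then taking a sleeping lock.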

Regards,
Christian.

* Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences
  2014-05-19  8:27                           ` Christian König
@ 2014-05-19 10:10                             ` Maarten Lankhorst
  2014-05-19 12:30                               ` Christian König
  0 siblings, 1 reply; 50+ messages in thread
From: Maarten Lankhorst @ 2014-05-19 10:10 UTC (permalink / raw)
  To: Christian König, airlied; +Cc: nouveau, linux-kernel, dri-devel

op 19-05-14 10:27, Christian König schreef:
> Am 19.05.2014 10:00, schrieb Maarten Lankhorst:
>> op 15-05-14 18:13, Christian König schreef:
>>> Am 15.05.2014 17:58, schrieb Maarten Lankhorst:
>>>> op 15-05-14 17:48, Christian König schreef:
>>>>> Am 15.05.2014 16:18, schrieb Maarten Lankhorst:
>>>>>> op 15-05-14 15:19, Christian König schreef:
>>>>>>> Am 15.05.2014 15:04, schrieb Maarten Lankhorst:
>>>>>>>> op 15-05-14 11:42, Christian König schreef:
>>>>>>>>> Am 15.05.2014 11:38, schrieb Maarten Lankhorst:
>>>>>>>>>> op 15-05-14 11:21, Christian König schreef:
>>>>>>>>>>> Am 15.05.2014 03:06, schrieb Maarten Lankhorst:
>>>>>>>>>>>> op 14-05-14 17:29, Christian König schreef:
>>>>>>>>>>>>>> +    /* did fence get signaled after we enabled the sw irq? */
>>>>>>>>>>>>>> +    if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
>>>>>>>>>>>>>> + radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
>>>>>>>>>>>>>> +        return false;
>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +    fence->fence_wake.flags = 0;
>>>>>>>>>>>>>> +    fence->fence_wake.private = NULL;
>>>>>>>>>>>>>> +    fence->fence_wake.func = radeon_fence_check_signaled;
>>>>>>>>>>>>>> + __add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
>>>>>>>>>>>>>> +    fence_get(f);
>>>>>>>>>>>>> That looks like a race condition to me. The fence needs to be added to the wait queue before the check, not after.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Apart from that the whole approach looks like a really bad idea to me. How for example is lockup detection supposed to happen with this? 
>>>>>>>>>>>> It's not a race condition because fence_queue.lock is held when this function is called.
>>>>>>>>>>> Ah, I see. That's also the reason why you moved the wake_up_all out of the processing function.
>>>>>>>>>> Correct. :-)
>>>>>>>>>>>> Lockup's a bit of a weird problem, the changes wouldn't allow core ttm code to handle the lockup any more,
>>>>>>>>>>>> but any driver specific wait code would still handle this. I did this by design, because in future patches the wait
>>>>>>>>>>>> function may be called from outside of the radeon driver. The official wait function takes a timeout parameter,
>>>>>>>>>>>> so lockups wouldn't be fatal if the timeout is set to something like 30*HZ for example, it would still return
>>>>>>>>>>>> and report that the function timed out.
>>>>>>>>>>> Timeouts help with the detection of the lockup, but not at all with the handling of them.
>>>>>>>>>>>
>>>>>>>>>>> What we essentially need is a wait callback into the driver that is called in non atomic context without any locks held.
>>>>>>>>>>>
>>>>>>>>>>> This way we can block for the fence to become signaled with a timeout and can then also initiate the reset handling if necessary.
>>>>>>>>>>>
>>>>>>>>>>> The way you designed the interface now means that the driver never gets a chance to wait for the hardware to become idle and so never has the opportunity to the reset the whole thing.
>>>>>>>>>> You could set up a hangcheck timer like intel does, and end up with a reliable hangcheck detection that doesn't depend on cpu waits. :-) Or override the default wait function and restore the old behavior.
>>>>>>>>>
>>>>>>>>> Overriding the default wait function sounds better, please implement it this way.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Christian. 
>>>>>>>>
>>>>>>>> Does this modification look sane?
>>>>>>> Adding the timeout is on my todo list for quite some time as well, so this part makes sense.
>>>>>>>
>>>>>>>> +static long __radeon_fence_wait(struct fence *f, bool intr, long timeout)
>>>>>>>> +{
>>>>>>>> +    struct radeon_fence *fence = to_radeon_fence(f);
>>>>>>>> +    u64 target_seq[RADEON_NUM_RINGS] = {};
>>>>>>>> +
>>>>>>>> +    target_seq[fence->ring] = fence->seq;
>>>>>>>> +    return radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr, timeout);
>>>>>>>> +}
>>>>>>> When this call is comming from outside the radeon driver you need to lock rdev->exclusive_lock here to make sure not to interfere with a possible reset.
>>>>>> Ah thanks, I'll add that.
>>>>>>
>>>>>>>>      .get_timeline_name = radeon_fence_get_timeline_name,
>>>>>>>>      .enable_signaling = radeon_fence_enable_signaling,
>>>>>>>>      .signaled = __radeon_fence_signaled,
>>>>>>> Do we still need those callback when we implemented the wait callback?
>>>>>> .get_timeline_name is used for debugging (trace events).
>>>>>> .signaled is the non-blocking call to check if the fence is signaled or not.
>>>>>> .enable_signaling is used for adding callbacks upon fence completion, the default 'fence_default_wait' uses it, so
>>>>>> when it works no separate implementation is needed unless you want to do more than just waiting.
>>>>>> It's also used when fence_add_callback is called. i915 can be patched to use it. ;-)
>>>>>
>>>>> I just meant enable_signaling, the other ones are fine with me. The problem with enable_signaling is that it's called with a spin lock held, so we can't sleep.
>>>>>
>>>>> While resetting the GPU could be moved out into a timer the problem here is that I can't lock rdev->exclusive_lock in such situations.
>>>>>
>>>>> This means when i915 would call into radeon to enable signaling for a fence we can't make sure that there is not GPU reset running on another CPU. And touching the IRQ registers while a reset is going on is a really good recipe to lockup the whole system.
>>>> If you increase the irq counter on all rings before doing a gpu reset, adjust the state and call sw_irq_put when done this race could never happen. Or am I missing something?
>>>>
>>> Beside that's being extremely ugly in the case of a hard PCI reset even touching any register or just accessing VRAM in this moment can crash the box. Just working around the enable/disable of the interrupt here won't help us much.
>>>
>>> Adding another spin lock won't work so well either, because the reset function itself wants to sleep as well.
>>>
>>> The only solution I see off hand is making the critical reset code path work in atomic context as well, but that's not really doable cause AFAIK we need to work with functions from the PCI subsystem and spinning on a lock for up to a second is not really desirable also.
>> I've checked the code a little but that can be the case now as well. the new implementation's __radeon_fence_wait will be protected by the exclusive_lock,, but enable_signaling is only protected by the fence_queue.lock and is_signaled is not protected. But this is not a change from the current situation, so it would only become a problem if the gpu hangs in a cross-device situation.
>>
>> I think adding 1 to the irq refcount in the reset sequence and adding a down_read_trylock on the exclusive lock would help. If the trylock fails we could perform only the safe actions without touching any of the gpu registers or vram, adding the refcount is needed to ensure enable_signaling works as intended.
>
> The problem here is that the whole approach collides with the way we do reset handling from a conceptual point of view. Every IOCTL or other call chain into the driver is protected by the read side of the exclusive_lock semaphore. So in the case of a GPU lockup we can take the write side of the semaphore and so make sure that we have nobody else accessing the hardware or internal driver structures only changed at init time.
>
> Leaking a drivers IRQ context into another driver as well as calling into a driver in atomic context is just a quite uncommon approach and should be considered very carefully.
>
> I would rather vote for a completely synchronous interface only allowing blocking waits and checks if a fence is signaled from not atomic context.
>
> If a driver needs to avoid blocking it should just use a workqueue and checking a fence outside your own driver is probably be better done in a bottom halve handler anyway.

Except that you might want to do something like
fence_is_signaled() in another driver to check whether you need to
defer, or can submit the batch buffer immediately, saving a bunch of
context switches. Running is_signaled in atomic context is really useful
here, because it means you can't do too many scary things in your
is_signaled handler.
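The submit-or-defer decision described above can be sketched as a non-blocking check on another driver's fence. This is a hypothetical model, not the fence API: model_fence, model_fence_is_signaled and model_submit are illustrative names, and a shared atomic sequence counter stands in for whatever the owning driver's .signaled callback actually inspects.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a fence owned by another driver: its IRQ handler
 * advances *last_seq, and the fence is done once last_seq >= seq. */
struct model_fence {
	_Atomic uint64_t *last_seq;
	uint64_t seq;
};

/* Cheap, non-blocking check in the spirit of fence_is_signaled(): it only
 * reads an atomic, so it is safe to call from atomic context. */
static bool model_fence_is_signaled(const struct model_fence *f)
{
	return atomic_load(f->last_seq) >= f->seq;
}

enum model_submit_path { MODEL_SUBMIT_NOW, MODEL_SUBMIT_DEFERRED };

/* Decide whether a batch depending on 'dep' can go to the ring right away. */
static enum model_submit_path model_submit(const struct model_fence *dep)
{
	if (model_fence_is_signaled(dep))
		return MODEL_SUBMIT_NOW;	/* no worker, no context switch */
	return MODEL_SUBMIT_DEFERRED;	/* e.g. add a callback or queue work */
}
```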

In the case of enable_signaling it was the only sane solution, because
fence_signal can be called from irq context, and any calls after that to
fence_add_callback and fence_wait aren't allowed to do anything, so
fence_enable_sw_signaling and the default wait implementation must be
atomic. fence_wait itself doesn't have to be, so it's easy to grab
exclusive_lock there.

Simple fence drivers may drop some state after calling fence_signal, so
calling .enable_signaling after fence_signal is bad, and fence_signal
must also wait for any previous call to enable_signaling to complete.
This means those functions have to be implemented with the atomic
spinlock held and irqs disabled. :-) The .signaled callback could,
strictly speaking, still be called after fence_signal is called, but
this function is optional.
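The ordering guarantee described above can be illustrated with a simplified, hypothetical model in which one spinlock (a C11 atomic_flag standing in for the shared fence lock) protects both the enable-signaling hook and the signal transition. The real fence code uses flag bits and differs in detail; model_fence, model_enable_sw_signaling and model_fence_signal are illustrative names only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified, hypothetical model of the ordering rules. */
struct model_fence {
	atomic_flag lock;	/* stands in for the shared fence spinlock */
	bool signaled;
	bool signaling_enabled;
	int enable_calls;	/* how often the driver hook ran */
};

static void model_lock(struct model_fence *f)
{
	while (atomic_flag_test_and_set_explicit(&f->lock, memory_order_acquire))
		;	/* spin: holders never sleep */
}

static void model_unlock(struct model_fence *f)
{
	atomic_flag_clear_explicit(&f->lock, memory_order_release);
}

/* Driver hook, always called with the lock held, so it must not sleep. */
static void model_drv_enable_signaling(struct model_fence *f)
{
	f->enable_calls++;
}

static void model_enable_sw_signaling(struct model_fence *f)
{
	model_lock(f);
	/* Never call the hook after signaling: the driver may already have
	 * dropped the state it needs. */
	if (!f->signaled && !f->signaling_enabled) {
		f->signaling_enabled = true;
		model_drv_enable_signaling(f);
	}
	model_unlock(f);
}

/* Taking the same lock means signaling waits for a hook that is still
 * running. Returns true if this call did the signaling. */
static bool model_fence_signal(struct model_fence *f)
{
	bool first;

	model_lock(f);
	first = !f->signaled;
	f->signaled = true;
	model_unlock(f);
	return first;
}
```

Using one lock for both paths gives the two guarantees at once, at the price that the hook runs in atomic context, which is exactly the constraint objected to earlier in the thread.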

I tried other locking approaches, but when I used a separate spinlock
for the fence, things got even messier: I ended up with locking
inversions that were impossible to eliminate, or I removed the guarantee
that calling fence_signal meant that enable_signaling had either not been
called or had completed, or I got other bugs and much harder-to-read code.

~Maarten

Revised diff below.
---
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 68528619834a..a7d839a158ae 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -64,6 +64,7 @@
  #include <linux/wait.h>
  #include <linux/list.h>
  #include <linux/kref.h>
+#include <linux/fence.h>
  
  #include <ttm/ttm_bo_api.h>
  #include <ttm/ttm_bo_driver.h>
@@ -113,9 +114,6 @@ extern int radeon_hard_reset;
  #define RADEONFB_CONN_LIMIT			4
  #define RADEON_BIOS_NUM_SCRATCH			8
  
-/* fence seq are set to this number when signaled */
-#define RADEON_FENCE_SIGNALED_SEQ		0LL
-
  /* internal ring indices */
  /* r1xx+ has gfx CP ring */
  #define RADEON_RING_TYPE_GFX_INDEX		0
@@ -347,12 +345,15 @@ struct radeon_fence_driver {
  };
  
  struct radeon_fence {
+	struct fence base;
+
  	struct radeon_device		*rdev;
-	struct kref			kref;
  	/* protected by radeon_fence.lock */
  	uint64_t			seq;
  	/* RB, DMA, etc. */
  	unsigned			ring;
+
+	wait_queue_t fence_wake;
  };
  
  int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);
@@ -2256,6 +2257,7 @@ struct radeon_device {
  	struct radeon_mman		mman;
  	struct radeon_fence_driver	fence_drv[RADEON_NUM_RINGS];
  	wait_queue_head_t		fence_queue;
+	unsigned			fence_context;
  	struct mutex			ring_lock;
  	struct radeon_ring		ring[RADEON_NUM_RINGS];
  	bool				ib_pool_ready;
@@ -2346,11 +2348,6 @@ u32 cik_mm_rdoorbell(struct radeon_device *rdev, u32 index);
  void cik_mm_wdoorbell(struct radeon_device *rdev, u32 index, u32 v);
  
  /*
- * Cast helper
- */
-#define to_radeon_fence(p) ((struct radeon_fence *)(p))
-
-/*
   * Registers read & write functions.
   */
  #define RREG8(reg) readb((rdev->rmmio) + (reg))
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 0e770bbf7e29..19c6911ed49f 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1175,6 +1175,7 @@ int radeon_device_init(struct radeon_device *rdev,
  	for (i = 0; i < RADEON_NUM_RINGS; i++) {
  		rdev->ring[i].idx = i;
  	}
+	rdev->fence_context = fence_context_alloc(RADEON_NUM_RINGS);
  
  	DRM_INFO("initializing kernel modesetting (%s 0x%04X:0x%04X 0x%04X:0x%04X).\n",
  		radeon_family_name[rdev->family], pdev->vendor, pdev->device,
@@ -1565,6 +1566,54 @@ int radeon_resume_kms(struct drm_device *dev, bool resume, bool fbcon)
  	return 0;
  }
  
+static uint32_t radeon_gpu_mask_sw_irq(struct radeon_device *rdev)
+{
+	uint32_t mask = 0;
+	int i;
+
+	if (!rdev->ddev->irq_enabled)
+		return mask;
+
+	/*
+	 * increase refcount on sw interrupts for all rings to stop
+	 * enabling interrupts in radeon_fence_enable_signaling during
+	 * gpu reset.
+	 */
+
+	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+		if (!rdev->ring[i].ready)
+			continue;
+
+		atomic_inc(&rdev->irq.ring_int[i]);
+		mask |= 1 << i;
+	}
+	return mask;
+}
+
+static void radeon_gpu_unmask_sw_irq(struct radeon_device *rdev, uint32_t mask)
+{
+	unsigned long irqflags;
+	int i;
+
+	if (!mask)
+		return;
+
+	/*
+	 * undo refcount increase, and reset irqs to correct value.
+	 */
+
+	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+		if (!(mask & (1 << i)))
+			continue;
+
+		atomic_dec(&rdev->irq.ring_int[i]);
+	}
+
+	spin_lock_irqsave(&rdev->irq.lock, irqflags);
+	radeon_irq_set(rdev);
+	spin_unlock_irqrestore(&rdev->irq.lock, irqflags);
+}
+
  /**
   * radeon_gpu_reset - reset the asic
   *
@@ -1582,6 +1631,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)
  
  	int i, r;
  	int resched;
+	uint32_t sw_mask;
  
  	down_write(&rdev->exclusive_lock);
  
@@ -1595,6 +1645,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)
  	radeon_save_bios_scratch_regs(rdev);
  	/* block TTM */
  	resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
+	sw_mask = radeon_gpu_mask_sw_irq(rdev);
  	radeon_pm_suspend(rdev);
  	radeon_suspend(rdev);
  
@@ -1644,6 +1695,7 @@ retry:
  	radeon_pm_resume(rdev);
  	drm_helper_resume_force_mode(rdev->ddev);
  
+	radeon_gpu_unmask_sw_irq(rdev, sw_mask);
  	ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched);
  	if (r) {
  		/* bad news, how to tell it to userspace ? */
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index a77b1c13ea43..db1f3b4708fa 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -39,6 +39,15 @@
  #include "radeon.h"
  #include "radeon_trace.h"
  
+static const struct fence_ops radeon_fence_ops;
+
+#define to_radeon_fence(p) \
+	({								\
+		struct radeon_fence *__f;				\
+		__f = container_of((p), struct radeon_fence, base);	\
+		__f->base.ops == &radeon_fence_ops ? __f : NULL;	\
+	})
+
  /*
   * Fences
   * Fences mark an event in the GPUs pipeline and are used
@@ -111,30 +120,55 @@ int radeon_fence_emit(struct radeon_device *rdev,
  		      struct radeon_fence **fence,
  		      int ring)
  {
+	u64 seq = ++rdev->fence_drv[ring].sync_seq[ring];
+
  	/* we are protected by the ring emission mutex */
  	*fence = kmalloc(sizeof(struct radeon_fence), GFP_KERNEL);
  	if ((*fence) == NULL) {
  		return -ENOMEM;
  	}
-	kref_init(&((*fence)->kref));
-	(*fence)->rdev = rdev;
-	(*fence)->seq = ++rdev->fence_drv[ring].sync_seq[ring];
  	(*fence)->ring = ring;
+	__fence_init(&(*fence)->base, &radeon_fence_ops,
+		     &rdev->fence_queue.lock, rdev->fence_context + ring, seq);
+	(*fence)->rdev = rdev;
+	(*fence)->seq = seq;
  	radeon_fence_ring_emit(rdev, ring, *fence);
  	trace_radeon_fence_emit(rdev->ddev, ring, (*fence)->seq);
  	return 0;
  }
  
  /**
- * radeon_fence_process - process a fence
- *
- * @rdev: radeon_device pointer
- * @ring: ring index the fence is associated with
+ * radeon_fence_check_signaled - callback from fence_queue
   *
- * Checks the current fence value and wakes the fence queue
- * if the sequence number has increased (all asics).
+ * this function is called with fence_queue lock held, which is also used
+ * for the fence locking itself, so unlocked variants are used for
+ * fence_signal, and remove_wait_queue.
   */
-void radeon_fence_process(struct radeon_device *rdev, int ring)
+static int radeon_fence_check_signaled(wait_queue_t *wait, unsigned mode, int flags, void *key)
+{
+	struct radeon_fence *fence;
+	u64 seq;
+
+	fence = container_of(wait, struct radeon_fence, fence_wake);
+
+	seq = atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq);
+	if (seq >= fence->seq) {
+		int ret = __fence_signal(&fence->base);
+
+		if (!ret)
+			FENCE_TRACE(&fence->base, "signaled from irq context\n");
+		else
+			FENCE_TRACE(&fence->base, "was already signaled\n");
+
+		radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
+		__remove_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
+		fence_put(&fence->base);
+	} else
+		FENCE_TRACE(&fence->base, "pending\n");
+	return 0;
+}
+
+static bool __radeon_fence_process(struct radeon_device *rdev, int ring)
  {
  	uint64_t seq, last_seq, last_emitted;
  	unsigned count_loop = 0;
@@ -190,23 +224,22 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
  		}
  	} while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
  
-	if (wake)
-		wake_up_all(&rdev->fence_queue);
+	return wake;
  }
  
  /**
- * radeon_fence_destroy - destroy a fence
+ * radeon_fence_process - process a fence
   *
- * @kref: fence kref
+ * @rdev: radeon_device pointer
+ * @ring: ring index the fence is associated with
   *
- * Frees the fence object (all asics).
+ * Checks the current fence value and wakes the fence queue
+ * if the sequence number has increased (all asics).
   */
-static void radeon_fence_destroy(struct kref *kref)
+void radeon_fence_process(struct radeon_device *rdev, int ring)
  {
-	struct radeon_fence *fence;
-
-	fence = container_of(kref, struct radeon_fence, kref);
-	kfree(fence);
+	if (__radeon_fence_process(rdev, ring))
+		wake_up_all(&rdev->fence_queue);
  }
  
  /**
@@ -237,6 +270,69 @@ static bool radeon_fence_seq_signaled(struct radeon_device *rdev,
  	return false;
  }
  
+static bool __radeon_fence_signaled(struct fence *f)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	struct radeon_device *rdev = fence->rdev;
+	unsigned ring = fence->ring;
+	u64 seq = fence->seq;
+
+	if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
+		return true;
+	}
+
+	if (down_read_trylock(&rdev->exclusive_lock)) {
+		radeon_fence_process(rdev, ring);
+		up_read(&rdev->exclusive_lock);
+
+		if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
+			return true;
+		}
+	}
+	return false;
+}
+
+/**
+ * radeon_fence_enable_signaling - enable signaling on a fence
+ * @f: fence
+ *
+ * This function is called with the fence_queue lock held, and adds a
+ * callback to fence_queue that checks whether this fence is signaled and,
+ * if so, signals the fence and removes itself.
+ */
+static bool radeon_fence_enable_signaling(struct fence *f)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	struct radeon_device *rdev = fence->rdev;
+
+	if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
+	    !rdev->ddev->irq_enabled)
+		return false;
+
+	radeon_irq_kms_sw_irq_get(rdev, fence->ring);
+
+	if (down_read_trylock(&rdev->exclusive_lock)) {
+		if (__radeon_fence_process(rdev, fence->ring))
+			wake_up_all_locked(&rdev->fence_queue);
+
+		up_read(&rdev->exclusive_lock);
+	}
+
+	/* did fence get signaled after we enabled the sw irq? */
+	if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
+		radeon_irq_kms_sw_irq_put(rdev, fence->ring);
+		return false;
+	}
+
+	fence->fence_wake.flags = 0;
+	fence->fence_wake.private = NULL;
+	fence->fence_wake.func = radeon_fence_check_signaled;
+	__add_wait_queue(&rdev->fence_queue, &fence->fence_wake);
+	fence_get(f);
+
+	return true;
+}
+
  /**
   * radeon_fence_signaled - check if a fence has signaled
   *
@@ -250,11 +346,13 @@ bool radeon_fence_signaled(struct radeon_fence *fence)
  	if (!fence) {
  		return true;
  	}
-	if (fence->seq == RADEON_FENCE_SIGNALED_SEQ) {
-		return true;
-	}
+
  	if (radeon_fence_seq_signaled(fence->rdev, fence->seq, fence->ring)) {
-		fence->seq = RADEON_FENCE_SIGNALED_SEQ;
+		int ret;
+
+		ret = fence_signal(&fence->base);
+		if (!ret)
+			FENCE_TRACE(&fence->base, "signaled from radeon_fence_signaled\n");
  		return true;
  	}
  	return false;
@@ -283,28 +381,35 @@ static bool radeon_fence_any_seq_signaled(struct radeon_device *rdev, u64 *seq)
  }
  
  /**
- * radeon_fence_wait_seq - wait for a specific sequence numbers
+ * radeon_fence_wait_seq_timeout - wait for specific sequence numbers
   *
   * @rdev: radeon device pointer
   * @target_seq: sequence number(s) we want to wait for
   * @intr: use interruptable sleep
+ * @timeout: maximum time to wait, or MAX_SCHEDULE_TIMEOUT for infinite wait
   *
   * Wait for the requested sequence number(s) to be written by any ring
   * (all asics).  Sequnce number array is indexed by ring id.
   * @intr selects whether to use interruptable (true) or non-interruptable
   * (false) sleep when waiting for the sequence number.  Helper function
   * for radeon_fence_wait_*().
- * Returns 0 if the sequence number has passed, error for all other cases.
+ * Returns the remaining time if the sequence number has passed, 0 if
+ * the wait timed out, or an error for all other cases.
   * -EDEADLK is returned when a GPU lockup has been detected.
   */
-static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
-				 bool intr)
+static long radeon_fence_wait_seq_timeout(struct radeon_device *rdev,
+					 u64 *target_seq, bool intr,
+					 long timeout)
  {
  	uint64_t last_seq[RADEON_NUM_RINGS];
  	bool signaled;
-	int i, r;
+	int i;
  
  	while (!radeon_fence_any_seq_signaled(rdev, target_seq)) {
+		long r, waited = timeout;
+
+		waited = timeout < RADEON_FENCE_JIFFIES_TIMEOUT ?
+			 timeout : RADEON_FENCE_JIFFIES_TIMEOUT;
  
  		/* Save current sequence values, used to check for GPU lockups */
  		for (i = 0; i < RADEON_NUM_RINGS; ++i) {
@@ -319,13 +424,15 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
  		if (intr) {
  			r = wait_event_interruptible_timeout(rdev->fence_queue, (
  				(signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
-				 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
+				 || rdev->needs_reset), waited);
  		} else {
  			r = wait_event_timeout(rdev->fence_queue, (
  				(signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
-				 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
+				 || rdev->needs_reset), waited);
  		}
  
+		timeout -= waited - r;
+
  		for (i = 0; i < RADEON_NUM_RINGS; ++i) {
  			if (!target_seq[i])
  				continue;
@@ -337,6 +444,12 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
  		if (unlikely(r < 0))
  			return r;
  
+		/*
+		 * If this is a timed wait and the wait completely timed out, just return.
+		 */
+		if (!timeout)
+			break;
+
  		if (unlikely(!signaled)) {
  			if (rdev->needs_reset)
  				return -EDEADLK;
@@ -379,14 +492,14 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
  			}
  		}
  	}
-	return 0;
+	return timeout;
  }
  
  /**
   * radeon_fence_wait - wait for a fence to signal
   *
   * @fence: radeon fence object
- * @intr: use interruptable sleep
+ * @intr: use interruptible sleep
   *
   * Wait for the requested fence to signal (all asics).
   * @intr selects whether to use interruptable (true) or non-interruptable
@@ -398,20 +511,17 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
  	uint64_t seq[RADEON_NUM_RINGS] = {};
-	int r;
+	long r;
  
-	if (fence == NULL) {
-		WARN(1, "Querying an invalid fence : %p !\n", fence);
-		return -EINVAL;
-	}
-
-	seq[fence->ring] = fence->seq;
-	if (seq[fence->ring] == RADEON_FENCE_SIGNALED_SEQ)
+	if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))
  		return 0;
  
-	r = radeon_fence_wait_seq(fence->rdev, seq, intr);
-	if (r)
+	seq[fence->ring] = fence->seq;
+	r = radeon_fence_wait_seq_timeout(fence->rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
+	if (r < 0) {
  		return r;
-
-	fence->seq = RADEON_FENCE_SIGNALED_SEQ;
+	}
+	r = fence_signal(&fence->base);
+	if (!r)
+		FENCE_TRACE(&fence->base, "signaled from fence_wait\n");
  	return 0;
  }
  
@@ -434,7 +544,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
  {
  	uint64_t seq[RADEON_NUM_RINGS];
  	unsigned i, num_rings = 0;
-	int r;
+	long r;
  
  	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
  		seq[i] = 0;
@@ -443,20 +553,21 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
  			continue;
  		}
  
+		if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fences[i]->base.flags)) {
+			/* already signaled */
+			return 0;
+		}
+
  		seq[i] = fences[i]->seq;
  		++num_rings;
-
-		/* test if something was allready signaled */
-		if (seq[i] == RADEON_FENCE_SIGNALED_SEQ)
-			return 0;
  	}
  
  	/* nothing to wait for ? */
  	if (num_rings == 0)
  		return -ENOENT;
  
-	r = radeon_fence_wait_seq(rdev, seq, intr);
-	if (r) {
+	r = radeon_fence_wait_seq_timeout(rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
+	if (r < 0) {
  		return r;
  	}
  	return 0;
@@ -475,6 +586,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
  int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
  {
  	uint64_t seq[RADEON_NUM_RINGS] = {};
+	long r;
  
  	seq[ring] = atomic64_read(&rdev->fence_drv[ring].last_seq) + 1ULL;
  	if (seq[ring] >= rdev->fence_drv[ring].sync_seq[ring]) {
@@ -482,7 +594,10 @@ int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
  		   already the last emited fence */
  		return -ENOENT;
  	}
-	return radeon_fence_wait_seq(rdev, seq, false);
+	r = radeon_fence_wait_seq_timeout(rdev, seq, false, MAX_SCHEDULE_TIMEOUT);
+	if (r < 0)
+		return r;
+	return 0;
  }
  
  /**
@@ -504,8 +619,8 @@ int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
  	if (!seq[ring])
  		return 0;
  
-	r = radeon_fence_wait_seq(rdev, seq, false);
-	if (r) {
+	r = radeon_fence_wait_seq_timeout(rdev, seq, false, MAX_SCHEDULE_TIMEOUT);
+	if (r < 0) {
  		if (r == -EDEADLK)
  			return -EDEADLK;
  
@@ -525,7 +640,7 @@ int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
   */
  struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence)
  {
-	kref_get(&fence->kref);
+	fence_get(&fence->base);
  	return fence;
  }
  
@@ -541,9 +656,8 @@ void radeon_fence_unref(struct radeon_fence **fence)
  	struct radeon_fence *tmp = *fence;
  
  	*fence = NULL;
-	if (tmp) {
-		kref_put(&tmp->kref, radeon_fence_destroy);
-	}
+	if (tmp)
+		fence_put(&tmp->base);
  }
  
  /**
@@ -832,3 +946,51 @@ int radeon_debugfs_fence_init(struct radeon_device *rdev)
  	return 0;
  #endif
  }
+
+static long __radeon_fence_wait(struct fence *f, bool intr, long timeout)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	u64 target_seq[RADEON_NUM_RINGS] = {};
+	struct radeon_device *rdev = fence->rdev;
+	long r;
+
+	target_seq[fence->ring] = fence->seq;
+
+	down_read(&rdev->exclusive_lock);
+	r = radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr, timeout);
+
+	if (r > 0 && !fence_signal(&fence->base))
+		FENCE_TRACE(&fence->base, "signaled from __radeon_fence_wait\n");
+
+	up_read(&rdev->exclusive_lock);
+	return r;
+
+}
+
+static const char *radeon_fence_get_driver_name(struct fence *fence)
+{
+	return "radeon";
+}
+
+static const char *radeon_fence_get_timeline_name(struct fence *f)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	switch (fence->ring) {
+	case RADEON_RING_TYPE_GFX_INDEX: return "radeon.gfx";
+	case CAYMAN_RING_TYPE_CP1_INDEX: return "radeon.cp1";
+	case CAYMAN_RING_TYPE_CP2_INDEX: return "radeon.cp2";
+	case R600_RING_TYPE_DMA_INDEX: return "radeon.dma";
+	case CAYMAN_RING_TYPE_DMA1_INDEX: return "radeon.dma1";
+	case R600_RING_TYPE_UVD_INDEX: return "radeon.uvd";
+	default: WARN_ON_ONCE(1); return "radeon.unk";
+	}
+}
+
+static const struct fence_ops radeon_fence_ops = {
+	.get_driver_name = radeon_fence_get_driver_name,
+	.get_timeline_name = radeon_fence_get_timeline_name,
+	.enable_signaling = radeon_fence_enable_signaling,
+	.signaled = __radeon_fence_signaled,
+	.wait = __radeon_fence_wait,
+	.release = NULL,
+};
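The budget accounting that `radeon_fence_wait_seq_timeout()` performs in the patch above can be modelled in plain userspace C. This is a minimal simulation, not driver code. `SLICE` stands in for `RADEON_FENCE_JIFFIES_TIMEOUT`, and the hypothetical `sim_ring` and `sim_wait()` replace the fence queue and `wait_event_timeout()`. Each pass sleeps for at most one slice and charges only the time actually slept (`waited - r`). It returns the unused budget, or 0 once the budget is spent.

```c
#include <stdbool.h>

/* Hypothetical stand-in for RADEON_FENCE_JIFFIES_TIMEOUT: the longest
 * single sleep before the driver wakes up to run its lockup check. */
#define SLICE 50L

/* Hypothetical simulated ring: the fence completes at time signal_at. */
struct sim_ring {
	long now;        /* current time, in jiffies */
	long signal_at;  /* time at which the fence signals */
};

static bool sim_signaled(const struct sim_ring *s)
{
	return s->now >= s->signal_at;
}

/*
 * Stand-in for wait_event_timeout(): sleep for at most t jiffies.
 * Returns 0 if the slice expired with the fence still pending, otherwise
 * the jiffies left in the slice (at least 1, like wait_event_timeout()).
 * Only called while the fence is still pending.
 */
static long sim_wait(struct sim_ring *s, long t)
{
	long left = s->signal_at - s->now;

	if (left <= t) {
		s->now += left;
		return t - left > 0 ? t - left : 1;
	}
	s->now += t;
	return 0;
}

/*
 * Wait with a total budget of `timeout` jiffies, sleeping at most SLICE
 * at a time. Returns the unused budget if the fence signaled, or 0 if
 * the budget ran out first.
 */
static long wait_with_budget(struct sim_ring *s, long timeout)
{
	while (!sim_signaled(s)) {
		long waited = timeout < SLICE ? timeout : SLICE;
		long r = sim_wait(s, waited);

		/* Charge only the time actually slept in this slice. */
		timeout -= waited - r;
		if (!timeout)
			break;  /* budget exhausted: report a timeout */

		/* If r == 0 the slice expired; the driver would run its
		 * lockup check here before sleeping again. */
	}
	return timeout;
}
```

The budget can start at `MAX_SCHEDULE_TIMEOUT`, which is `LONG_MAX` in the kernel. Any variable that carries the budget back to a caller therefore needs to be a `long`. Narrowing it to `int` can turn a successful wait into a negative value on 64-bit builds, which then trips the callers' `r < 0` error checks.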


----



* Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences
  2014-05-19 10:10                             ` Maarten Lankhorst
@ 2014-05-19 12:30                               ` Christian König
  2014-05-19 13:35                                   ` Maarten Lankhorst
  0 siblings, 1 reply; 50+ messages in thread
From: Christian König @ 2014-05-19 12:30 UTC
  To: Maarten Lankhorst, airlied; +Cc: nouveau, linux-kernel, dri-devel

On 19.05.2014 12:10, Maarten Lankhorst wrote:
> On 19-05-14 10:27, Christian König wrote:
>> On 19.05.2014 10:00, Maarten Lankhorst wrote:
>> [SNIP]
>> The problem here is that the whole approach conceptually collides with
>> the way we do reset handling. Every IOCTL or other call chain into the
>> driver is protected by the read side of the exclusive_lock semaphore.
>> So in the case of a GPU lockup we can take the write side of the
>> semaphore and thereby make sure that nobody else is accessing the
>> hardware or the internal driver structures that are only changed at
>> init time.
>>
>> Leaking a driver's IRQ context into another driver, as well as calling
>> into a driver in atomic context, is quite an uncommon approach and
>> should be considered very carefully.
>>
>> I would rather vote for a completely synchronous interface that only
>> allows blocking waits and is-signaled checks from non-atomic context.
>>
>> If a driver needs to avoid blocking, it should just use a workqueue,
>> and checking a fence outside your own driver is probably better done
>> in a bottom-half handler anyway.
>
> Except that you might want to do something like
> fence_is_signaled() in another driver to check whether you need to
> defer, or can submit the batch buffer immediately, saving a bunch of
> context switches. Running is_signaled in atomic context is really useful
> here because it means you can't do too many scary things in your
> is_signaled handler.

This is indeed a nice optimization, but nothing more. If you want to
provide an is_signaled interface for atomic context, then it should be
optional, not mandatory.
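To make the "optional, not mandatory" point concrete, here is a sketch of a non-blocking signaled query in which the driver hook may be absent. Everything in it is hypothetical: `hyp_fence`, `hyp_fence_ops` and `hyp_decide()` are illustrative names, and the sketch uses C11 atomics rather than the kernel fence API. The hook mirrors the lock-free `last_seq >= seq` comparison at the top of `__radeon_fence_signaled()` in the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct hyp_fence;

/* Hypothetical ops table: the signaled hook is optional (may be NULL). */
struct hyp_fence_ops {
	bool (*signaled)(struct hyp_fence *f);
};

struct hyp_fence {
	const struct hyp_fence_ops *ops;
	_Atomic uint64_t *last_seq;  /* last seq the ring has completed */
	uint64_t seq;                /* seq this fence waits for */
	atomic_bool done;            /* latched once known to be signaled */
};

/* Optional driver hook: lock-free, so it is safe even in atomic context. */
static bool hyp_seq_signaled(struct hyp_fence *f)
{
	return atomic_load(f->last_seq) >= f->seq;
}

/*
 * Non-blocking query: never sleeps and takes no locks. Without a driver
 * hook it can only report what has already been latched, so "false"
 * means "unknown", not "still pending".
 */
static bool hyp_fence_is_signaled(struct hyp_fence *f)
{
	if (atomic_load(&f->done))
		return true;
	if (f->ops->signaled && f->ops->signaled(f)) {
		atomic_store(&f->done, true);
		return true;
	}
	return false;
}

enum hyp_action { HYP_SUBMIT_NOW, HYP_DEFER };

/* Submit immediately if the dependency is known to be done, else defer. */
static enum hyp_action hyp_decide(struct hyp_fence *dep)
{
	return hyp_fence_is_signaled(dep) ? HYP_SUBMIT_NOW : HYP_DEFER;
}
```

Without a hook, a caller such as `hyp_decide()` simply falls back to deferring. The fast path stays lock-free either way, so making the hook optional costs only the context-switch savings.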

>
> In the case of enable_signaling it was the only sane solution, because
> fence_signal can be called from irq context, and any calls after that to
> fence_add_callback and fence_wait aren't allowed to do anything, so
> fence_enable_sw_signaling and the default wait implementation must be
> atomic. fence_wait itself doesn't have to be, so it's easy to grab
> exclusive_lock there.

I don't think you understood my point here: completely drop
enable_signaling; it's unnecessary and only complicates the interface.

We purposely avoided exactly this paradigm in the past, and I haven't
seen any good argument for starting with it now.
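The quoted reply below describes the ordering contract that enable_signaling imposes. After fence_signal has run, the driver's enable_signaling hook must not be called, and fence_signal must wait for any hook call already in progress. That contract can be sketched with a single lock shared by both paths. This is a hypothetical userspace model: `hyp_sw_fence`, `hyp_enable_sw_signaling()` and `hyp_fence_signal()` are illustrative names, not the kernel fence API. An `atomic_flag` spinlock stands in for the kernel spinlock taken with interrupts disabled.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal userspace spinlock standing in for the shared fence lock. */
struct hyp_spinlock {
	atomic_flag flag;
};

static void hyp_spin_lock(struct hyp_spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
		;  /* spin until the holder releases the lock */
}

static void hyp_spin_unlock(struct hyp_spinlock *l)
{
	atomic_flag_clear_explicit(&l->flag, memory_order_release);
}

struct hyp_sw_fence {
	struct hyp_spinlock lock;
	bool signaled;
	bool enabled;
	int hook_calls;  /* how often the driver's enable hook ran */
};

/* Hypothetical driver hook, always called with f->lock held. Returning
 * false would mean "already passed, nothing armed". */
static bool hyp_driver_enable_signaling(struct hyp_sw_fence *f)
{
	f->hook_calls++;
	return true;
}

/* Arm signaling at most once, and never after the fence has signaled. */
static void hyp_enable_sw_signaling(struct hyp_sw_fence *f)
{
	hyp_spin_lock(&f->lock);
	if (!f->signaled && !f->enabled) {
		f->enabled = true;
		if (!hyp_driver_enable_signaling(f))
			f->signaled = true;
	}
	hyp_spin_unlock(&f->lock);
}

/*
 * Returns 0 if this call signaled the fence, -1 if it was already
 * signaled. Because it takes the same lock, a concurrent enable call has
 * either not entered its critical section yet or has already left it.
 */
static int hyp_fence_signal(struct hyp_sw_fence *f)
{
	int ret = -1;

	hyp_spin_lock(&f->lock);
	if (!f->signaled) {
		f->signaled = true;
		ret = 0;
	}
	hyp_spin_unlock(&f->lock);
	return ret;
}
```

Both paths serialize on one lock, so the hook runs at most once and never after the fence has signaled. In the kernel version, this is also why the hook ends up running with the spinlock held and interrupts disabled.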

Christian.
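The quoted text above describes the exclusive_lock reset protection. The patch works around it with `down_read_trylock()` calls, and that pattern can be modelled as shown below. This is a hypothetical sketch: a tiny reader/writer lock built on C11 atomics stands in for `rdev->exclusive_lock`, and "processing" the ring is reduced to copying a simulated hardware sequence number into `last_seq`. A non-blocking checker takes the read side only if it is free. While the reset path holds the write side, the checker reports "not yet" instead of touching driver state.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical reader/writer lock: state >= 0 is the number of readers,
 * -1 means a writer (the GPU reset path) holds it exclusively.
 */
static bool hyp_down_read_trylock(atomic_int *state)
{
	int v = atomic_load(state);

	while (v >= 0) {
		if (atomic_compare_exchange_weak(state, &v, v + 1))
			return true;
	}
	return false;  /* a writer holds the lock */
}

static void hyp_up_read(atomic_int *state)
{
	atomic_fetch_sub(state, 1);
}

static bool hyp_down_write_trylock(atomic_int *state)
{
	int expected = 0;

	return atomic_compare_exchange_strong(state, &expected, -1);
}

static void hyp_up_write(atomic_int *state)
{
	atomic_store(state, 0);
}

struct hyp_ring {
	atomic_int exclusive;       /* stands in for rdev->exclusive_lock */
	_Atomic uint64_t last_seq;  /* last seq the driver has processed */
	uint64_t hw_seq;            /* seq the simulated hardware wrote back */
};

/*
 * Same shape as __radeon_fence_signaled() in the patch: a lock-free fast
 * path, then a slow path that only touches driver state if the read side
 * of the exclusive lock can be taken without sleeping.
 */
static bool hyp_fence_signaled(struct hyp_ring *r, uint64_t seq)
{
	if (atomic_load(&r->last_seq) >= seq)
		return true;

	if (hyp_down_read_trylock(&r->exclusive)) {
		atomic_store(&r->last_seq, r->hw_seq);  /* "process" the ring */
		hyp_up_read(&r->exclusive);
		return atomic_load(&r->last_seq) >= seq;
	}
	return false;  /* reset in progress: report "not yet", never block */
}
```

The cost of this design is that a fence which has already completed in hardware still reads as pending until the reset path releases the write side.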

>
> Simple fence drivers may drop some state after calling fence_signal, so
> calling .enable_signaling after fence_signal is bad, and fence_signal
> must also wait for any previous call to enable_signaling to complete.
> This means those functions have to be implemented with the atomic
> spinlock held and irqs disabled. :-) The .signaled callback could,
> strictly speaking, still be called after fence_signal is called, but
> this function is optional.
>
> I tried other locking approaches, but when I used a separate spinlock
> for the fence, things got even messier: I ended up with locking
> inversions that were impossible to eliminate, or I broke the guarantee
> that calling fence_signal meant enable_signaling had either not been
> called or had completed, or I got other bugs and much harder-to-read code.
>
> ~Maarten
>
> Revised diff below.
> ---
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index 68528619834a..a7d839a158ae 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -64,6 +64,7 @@
>  #include <linux/wait.h>
>  #include <linux/list.h>
>  #include <linux/kref.h>
> +#include <linux/fence.h>
>
>  #include <ttm/ttm_bo_api.h>
>  #include <ttm/ttm_bo_driver.h>
> @@ -113,9 +114,6 @@ extern int radeon_hard_reset;
>  #define RADEONFB_CONN_LIMIT            4
>  #define RADEON_BIOS_NUM_SCRATCH            8
>
> -/* fence seq are set to this number when signaled */
> -#define RADEON_FENCE_SIGNALED_SEQ        0LL
> -
>  /* internal ring indices */
>  /* r1xx+ has gfx CP ring */
>  #define RADEON_RING_TYPE_GFX_INDEX        0
> @@ -347,12 +345,15 @@ struct radeon_fence_driver {
>  };
>
>  struct radeon_fence {
> +    struct fence base;
> +
>      struct radeon_device        *rdev;
> -    struct kref            kref;
>      /* protected by radeon_fence.lock */
>      uint64_t            seq;
>      /* RB, DMA, etc. */
>      unsigned            ring;
> +
> +    wait_queue_t fence_wake;
>  };
>
>  int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);
> @@ -2256,6 +2257,7 @@ struct radeon_device {
>      struct radeon_mman        mman;
>      struct radeon_fence_driver    fence_drv[RADEON_NUM_RINGS];
>      wait_queue_head_t        fence_queue;
> +    unsigned            fence_context;
>      struct mutex            ring_lock;
>      struct radeon_ring        ring[RADEON_NUM_RINGS];
>      bool                ib_pool_ready;
> @@ -2346,11 +2348,6 @@ u32 cik_mm_rdoorbell(struct radeon_device *rdev, u32 index);
>  void cik_mm_wdoorbell(struct radeon_device *rdev, u32 index, u32 v);
>
>  /*
> - * Cast helper
> - */
> -#define to_radeon_fence(p) ((struct radeon_fence *)(p))
> -
> -/*
>   * Registers read & write functions.
>   */
>  #define RREG8(reg) readb((rdev->rmmio) + (reg))
> diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
> index 0e770bbf7e29..19c6911ed49f 100644
> --- a/drivers/gpu/drm/radeon/radeon_device.c
> +++ b/drivers/gpu/drm/radeon/radeon_device.c
> @@ -1175,6 +1175,7 @@ int radeon_device_init(struct radeon_device *rdev,
>      for (i = 0; i < RADEON_NUM_RINGS; i++) {
>          rdev->ring[i].idx = i;
>      }
> +    rdev->fence_context = fence_context_alloc(RADEON_NUM_RINGS);
>
>      DRM_INFO("initializing kernel modesetting (%s 0x%04X:0x%04X 0x%04X:0x%04X).\n",
>          radeon_family_name[rdev->family], pdev->vendor, pdev->device,
> @@ -1565,6 +1566,54 @@ int radeon_resume_kms(struct drm_device *dev, bool resume, bool fbcon)
>      return 0;
>  }
>
> +static uint32_t radeon_gpu_mask_sw_irq(struct radeon_device *rdev)
> +{
> +    uint32_t mask = 0;
> +    int i;
> +
> +    if (!rdev->ddev->irq_enabled)
> +        return mask;
> +
> +    /*
> +     * increase refcount on sw interrupts for all rings to stop
> +     * enabling interrupts in radeon_fence_enable_signaling during
> +     * gpu reset.
> +     */
> +
> +    for (i = 0; i < RADEON_NUM_RINGS; ++i) {
> +        if (!rdev->ring[i].ready)
> +            continue;
> +
> +        atomic_inc(&rdev->irq.ring_int[i]);
> +        mask |= 1 << i;
> +    }
> +    return mask;
> +}
> +
> +static void radeon_gpu_unmask_sw_irq(struct radeon_device *rdev, uint32_t mask)
> +{
> +    unsigned long irqflags;
> +    int i;
> +
> +    if (!mask)
> +        return;
> +
> +    /*
> +     * undo refcount increase, and reset irqs to correct value.
> +     */
> +
> +    for (i = 0; i < RADEON_NUM_RINGS; ++i) {
> +        if (!(mask & (1 << i)))
> +            continue;
> +
> +        atomic_dec(&rdev->irq.ring_int[i]);
> +    }
> +
> +    spin_lock_irqsave(&rdev->irq.lock, irqflags);
> +    radeon_irq_set(rdev);
> +    spin_unlock_irqrestore(&rdev->irq.lock, irqflags);
> +}
> +
>  /**
>   * radeon_gpu_reset - reset the asic
>   *
> @@ -1582,6 +1631,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)
>
>      int i, r;
>      int resched;
> +    uint32_t sw_mask;
>
>      down_write(&rdev->exclusive_lock);
>
> @@ -1595,6 +1645,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)
>      radeon_save_bios_scratch_regs(rdev);
>      /* block TTM */
>      resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
> +    sw_mask = radeon_gpu_mask_sw_irq(rdev);
>      radeon_pm_suspend(rdev);
>      radeon_suspend(rdev);
>
> @@ -1644,6 +1695,7 @@ retry:
>      radeon_pm_resume(rdev);
>      drm_helper_resume_force_mode(rdev->ddev);
>
> +    radeon_gpu_unmask_sw_irq(rdev, sw_mask);
>      ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched);
>      if (r) {
>          /* bad news, how to tell it to userspace ? */
> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
> index a77b1c13ea43..db1f3b4708fa 100644
> --- a/drivers/gpu/drm/radeon/radeon_fence.c
> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
> @@ -39,6 +39,15 @@
>  #include "radeon.h"
>  #include "radeon_trace.h"
>
> +static const struct fence_ops radeon_fence_ops;
> +
> +#define to_radeon_fence(p) \
> +    ({                                \
> +        struct radeon_fence *__f;                \
> +        __f = container_of((p), struct radeon_fence, base);    \
> +        __f->base.ops == &radeon_fence_ops ? __f : NULL;    \
> +    })
> +
>  /*
>   * Fences
>   * Fences mark an event in the GPUs pipeline and are used
> @@ -111,30 +120,55 @@ int radeon_fence_emit(struct radeon_device *rdev,
>                struct radeon_fence **fence,
>                int ring)
>  {
> +    u64 seq = ++rdev->fence_drv[ring].sync_seq[ring];
> +
>      /* we are protected by the ring emission mutex */
>      *fence = kmalloc(sizeof(struct radeon_fence), GFP_KERNEL);
>      if ((*fence) == NULL) {
>          return -ENOMEM;
>      }
> -    kref_init(&((*fence)->kref));
> -    (*fence)->rdev = rdev;
> -    (*fence)->seq = ++rdev->fence_drv[ring].sync_seq[ring];
>      (*fence)->ring = ring;
> +    __fence_init(&(*fence)->base, &radeon_fence_ops,
> +             &rdev->fence_queue.lock, rdev->fence_context + ring, seq);
> +    (*fence)->rdev = rdev;
> +    (*fence)->seq = seq;
>      radeon_fence_ring_emit(rdev, ring, *fence);
>      trace_radeon_fence_emit(rdev->ddev, ring, (*fence)->seq);
>      return 0;
>  }
>
>  /**
> - * radeon_fence_process - process a fence
> - *
> - * @rdev: radeon_device pointer
> - * @ring: ring index the fence is associated with
> + * radeon_fence_check_signaled - callback from fence_queue
>   *
> - * Checks the current fence value and wakes the fence queue
> - * if the sequence number has increased (all asics).
> + * This function is called with the fence_queue lock held, which is also
> + * used for the fence locking itself, so the unlocked variants
> + * __fence_signal() and __remove_wait_queue() are used.
>   */
> -void radeon_fence_process(struct radeon_device *rdev, int ring)
> +static int radeon_fence_check_signaled(wait_queue_t *wait, unsigned mode, int flags, void *key)
> +{
> +    struct radeon_fence *fence;
> +    u64 seq;
> +
> +    fence = container_of(wait, struct radeon_fence, fence_wake);
> +
> +    seq = atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq);
> +    if (seq >= fence->seq) {
> +        int ret = __fence_signal(&fence->base);
> +
> +        if (!ret)
> +            FENCE_TRACE(&fence->base, "signaled from irq context\n");
> +        else
> +            FENCE_TRACE(&fence->base, "was already signaled\n");
> +
> +        radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
> +        __remove_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
> +        fence_put(&fence->base);
> +    } else
> +        FENCE_TRACE(&fence->base, "pending\n");
> +    return 0;
> +}
> +
> +static bool __radeon_fence_process(struct radeon_device *rdev, int ring)
>  {
>      uint64_t seq, last_seq, last_emitted;
>      unsigned count_loop = 0;
> @@ -190,23 +224,22 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
>          }
>      } while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
>
> -    if (wake)
> -        wake_up_all(&rdev->fence_queue);
> +    return wake;
>  }
>
>  /**
> - * radeon_fence_destroy - destroy a fence
> + * radeon_fence_process - process a fence
>   *
> - * @kref: fence kref
> + * @rdev: radeon_device pointer
> + * @ring: ring index the fence is associated with
>   *
> - * Frees the fence object (all asics).
> + * Checks the current fence value and wakes the fence queue
> + * if the sequence number has increased (all asics).
>   */
> -static void radeon_fence_destroy(struct kref *kref)
> +void radeon_fence_process(struct radeon_device *rdev, int ring)
>  {
> -    struct radeon_fence *fence;
> -
> -    fence = container_of(kref, struct radeon_fence, kref);
> -    kfree(fence);
> +    if (__radeon_fence_process(rdev, ring))
> +        wake_up_all(&rdev->fence_queue);
>  }
>
>  /**
> @@ -237,6 +270,69 @@ static bool radeon_fence_seq_signaled(struct radeon_device *rdev,
>      return false;
>  }
>
> +static bool __radeon_fence_signaled(struct fence *f)
> +{
> +    struct radeon_fence *fence = to_radeon_fence(f);
> +    struct radeon_device *rdev = fence->rdev;
> +    unsigned ring = fence->ring;
> +    u64 seq = fence->seq;
> +
> +    if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
> +        return true;
> +    }
> +
> +    if (down_read_trylock(&rdev->exclusive_lock)) {
> +        radeon_fence_process(rdev, ring);
> +        up_read(&rdev->exclusive_lock);
> +
> +        if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +/**
> + * radeon_fence_enable_signaling - enable signaling on a fence
> + * @f: fence
> + *
> + * This function is called with the fence_queue lock held, and adds a
> + * callback to fence_queue that checks whether this fence is signaled and,
> + * if so, signals the fence and removes itself.
> + */
> +static bool radeon_fence_enable_signaling(struct fence *f)
> +{
> +    struct radeon_fence *fence = to_radeon_fence(f);
> +    struct radeon_device *rdev = fence->rdev;
> +
> +    if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
> +        !rdev->ddev->irq_enabled)
> +        return false;
> +
> +    radeon_irq_kms_sw_irq_get(rdev, fence->ring);
> +
> +    if (down_read_trylock(&rdev->exclusive_lock)) {
> +        if (__radeon_fence_process(rdev, fence->ring))
> +            wake_up_all_locked(&rdev->fence_queue);
> +
> +        up_read(&rdev->exclusive_lock);
> +    }
> +
> +    /* did fence get signaled after we enabled the sw irq? */
> +    if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
> +        radeon_irq_kms_sw_irq_put(rdev, fence->ring);
> +        return false;
> +    }
> +
> +    fence->fence_wake.flags = 0;
> +    fence->fence_wake.private = NULL;
> +    fence->fence_wake.func = radeon_fence_check_signaled;
> +    __add_wait_queue(&rdev->fence_queue, &fence->fence_wake);
> +    fence_get(f);
> +
> +    return true;
> +}
> +
>  /**
>   * radeon_fence_signaled - check if a fence has signaled
>   *
> @@ -250,11 +346,13 @@ bool radeon_fence_signaled(struct radeon_fence *fence)
>      if (!fence) {
>          return true;
>      }
> -    if (fence->seq == RADEON_FENCE_SIGNALED_SEQ) {
> -        return true;
> -    }
> +
>      if (radeon_fence_seq_signaled(fence->rdev, fence->seq, fence->ring)) {
> -        fence->seq = RADEON_FENCE_SIGNALED_SEQ;
> +        int ret;
> +
> +        ret = fence_signal(&fence->base);
> +        if (!ret)
> +            FENCE_TRACE(&fence->base, "signaled from radeon_fence_signaled\n");
>          return true;
>      }
>      return false;
> @@ -283,28 +381,35 @@ static bool radeon_fence_any_seq_signaled(struct radeon_device *rdev, u64 *seq)
>  }
>
>  /**
> - * radeon_fence_wait_seq - wait for a specific sequence numbers
> + * radeon_fence_wait_seq_timeout - wait for specific sequence numbers
>   *
>   * @rdev: radeon device pointer
>   * @target_seq: sequence number(s) we want to wait for
>   * @intr: use interruptable sleep
> + * @timeout: maximum time to wait, or MAX_SCHEDULE_TIMEOUT for infinite wait
>   *
>   * Wait for the requested sequence number(s) to be written by any ring
>   * (all asics).  Sequnce number array is indexed by ring id.
>   * @intr selects whether to use interruptable (true) or non-interruptable
>   * (false) sleep when waiting for the sequence number.  Helper function
>   * for radeon_fence_wait_*().
> - * Returns 0 if the sequence number has passed, error for all other cases.
> + * Returns the remaining time if the sequence number has passed, 0 if
> + * the wait timed out, or an error for all other cases.
>   * -EDEADLK is returned when a GPU lockup has been detected.
>   */
> -static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
> -                 bool intr)
> +static long radeon_fence_wait_seq_timeout(struct radeon_device *rdev,
> +                     u64 *target_seq, bool intr,
> +                     long timeout)
>  {
>      uint64_t last_seq[RADEON_NUM_RINGS];
>      bool signaled;
> -    int i, r;
> +    int i;
>
>      while (!radeon_fence_any_seq_signaled(rdev, target_seq)) {
> +        long r, waited = timeout;
> +
> +        waited = timeout < RADEON_FENCE_JIFFIES_TIMEOUT ?
> +             timeout : RADEON_FENCE_JIFFIES_TIMEOUT;
>
>          /* Save current sequence values, used to check for GPU lockups */
>          for (i = 0; i < RADEON_NUM_RINGS; ++i) {
> @@ -319,13 +424,15 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
>          if (intr) {
>              r = wait_event_interruptible_timeout(rdev->fence_queue, (
>                  (signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
> -                 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
> +                 || rdev->needs_reset), waited);
>          } else {
>              r = wait_event_timeout(rdev->fence_queue, (
>                  (signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
> -                 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
> +                 || rdev->needs_reset), waited);
>          }
>
> +        timeout -= waited - r;
> +
>          for (i = 0; i < RADEON_NUM_RINGS; ++i) {
>              if (!target_seq[i])
>                  continue;
> @@ -337,6 +444,12 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
>          if (unlikely(r < 0))
>              return r;
>
> +        /*
> +         * If this is a timed wait and the wait completely timed out, just return.
> +         */
> +        if (!timeout)
> +            break;
> +
>          if (unlikely(!signaled)) {
>              if (rdev->needs_reset)
>                  return -EDEADLK;
> @@ -379,14 +492,14 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
>              }
>          }
>      }
> -    return 0;
> +    return timeout;
>  }
>
>  /**
>   * radeon_fence_wait - wait for a fence to signal
>   *
>   * @fence: radeon fence object
> - * @intr: use interruptable sleep
> + * @intr: use interruptible sleep
>   *
>   * Wait for the requested fence to signal (all asics).
>   * @intr selects whether to use interruptable (true) or non-interruptable
> @@ -398,20 +511,17 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
>      uint64_t seq[RADEON_NUM_RINGS] = {};
> -    int r;
> +    long r;
>
> -    if (fence == NULL) {
> -        WARN(1, "Querying an invalid fence : %p !\n", fence);
> -        return -EINVAL;
> -    }
> -
> -    seq[fence->ring] = fence->seq;
> -    if (seq[fence->ring] == RADEON_FENCE_SIGNALED_SEQ)
> +    if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))
>          return 0;
>
> -    r = radeon_fence_wait_seq(fence->rdev, seq, intr);
> -    if (r)
> +    seq[fence->ring] = fence->seq;
> +    r = radeon_fence_wait_seq_timeout(fence->rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
> +    if (r < 0) {
>          return r;
> -
> -    fence->seq = RADEON_FENCE_SIGNALED_SEQ;
> +    }
> +    r = fence_signal(&fence->base);
> +    if (!r)
> +        FENCE_TRACE(&fence->base, "signaled from fence_wait\n");
>      return 0;
>  }
>
> @@ -434,7 +544,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
>  {
>      uint64_t seq[RADEON_NUM_RINGS];
>      unsigned i, num_rings = 0;
> -    int r;
> +    long r;
>
>      for (i = 0; i < RADEON_NUM_RINGS; ++i) {
>          seq[i] = 0;
> @@ -443,20 +553,21 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
>              continue;
>          }
>
> +        if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fences[i]->base.flags)) {
> +            /* already signaled */
> +            return 0;
> +        }
> +
>          seq[i] = fences[i]->seq;
>          ++num_rings;
> -
> -        /* test if something was allready signaled */
> -        if (seq[i] == RADEON_FENCE_SIGNALED_SEQ)
> -            return 0;
>      }
>
>      /* nothing to wait for ? */
>      if (num_rings == 0)
>          return -ENOENT;
>
> -    r = radeon_fence_wait_seq(rdev, seq, intr);
> -    if (r) {
> +    r = radeon_fence_wait_seq_timeout(rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
> +    if (r < 0) {
>          return r;
>      }
>      return 0;
> @@ -475,6 +586,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
>  int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
>  {
>      uint64_t seq[RADEON_NUM_RINGS] = {};
> +    long r;
>
>      seq[ring] = atomic64_read(&rdev->fence_drv[ring].last_seq) + 1ULL;
>      if (seq[ring] >= rdev->fence_drv[ring].sync_seq[ring]) {
> @@ -482,7 +594,10 @@ int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
>             already the last emited fence */
>          return -ENOENT;
>      }
> -    return radeon_fence_wait_seq(rdev, seq, false);
> +    r = radeon_fence_wait_seq_timeout(rdev, seq, false, MAX_SCHEDULE_TIMEOUT);
> +    if (r < 0)
> +        return r;
> +    return 0;
>  }
>
>  /**
> @@ -504,8 +619,8 @@ int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
>      if (!seq[ring])
>          return 0;
>
> -    r = radeon_fence_wait_seq(rdev, seq, false);
> -    if (r) {
> +    r = radeon_fence_wait_seq_timeout(rdev, seq, false, MAX_SCHEDULE_TIMEOUT);
> +    if (r < 0) {
>          if (r == -EDEADLK)
>              return -EDEADLK;
>
> @@ -525,7 +640,7 @@ int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
>   */
>  struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence)
>  {
> -    kref_get(&fence->kref);
> +    fence_get(&fence->base);
>      return fence;
>  }
>
> @@ -541,9 +656,8 @@ void radeon_fence_unref(struct radeon_fence **fence)
>      struct radeon_fence *tmp = *fence;
>
>      *fence = NULL;
> -    if (tmp) {
> -        kref_put(&tmp->kref, radeon_fence_destroy);
> -    }
> +    if (tmp)
> +        fence_put(&tmp->base);
>  }
>
>  /**
> @@ -832,3 +946,51 @@ int radeon_debugfs_fence_init(struct radeon_device *rdev)
>      return 0;
>  #endif
>  }
> +
> +static long __radeon_fence_wait(struct fence *f, bool intr, long timeout)
> +{
> +    struct radeon_fence *fence = to_radeon_fence(f);
> +    u64 target_seq[RADEON_NUM_RINGS] = {};
> +    struct radeon_device *rdev = fence->rdev;
> +    long r;
> +
> +    target_seq[fence->ring] = fence->seq;
> +
> +    down_read(&rdev->exclusive_lock);
> +    r = radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr, timeout);
> +
> +    if (r > 0 && !fence_signal(&fence->base))
> +        FENCE_TRACE(&fence->base, "signaled from __radeon_fence_wait\n");
> +
> +    up_read(&rdev->exclusive_lock);
> +    return r;
> +
> +}
> +
> +static const char *radeon_fence_get_driver_name(struct fence *fence)
> +{
> +    return "radeon";
> +}
> +
> +static const char *radeon_fence_get_timeline_name(struct fence *f)
> +{
> +    struct radeon_fence *fence = to_radeon_fence(f);
> +    switch (fence->ring) {
> +    case RADEON_RING_TYPE_GFX_INDEX: return "radeon.gfx";
> +    case CAYMAN_RING_TYPE_CP1_INDEX: return "radeon.cp1";
> +    case CAYMAN_RING_TYPE_CP2_INDEX: return "radeon.cp2";
> +    case R600_RING_TYPE_DMA_INDEX: return "radeon.dma";
> +    case CAYMAN_RING_TYPE_DMA1_INDEX: return "radeon.dma1";
> +    case R600_RING_TYPE_UVD_INDEX: return "radeon.uvd";
> +    default: WARN_ON_ONCE(1); return "radeon.unk";
> +    }
> +}
> +
> +static const struct fence_ops radeon_fence_ops = {
> +    .get_driver_name = radeon_fence_get_driver_name,
> +    .get_timeline_name = radeon_fence_get_timeline_name,
> +    .enable_signaling = radeon_fence_enable_signaling,
> +    .signaled = __radeon_fence_signaled,
> +    .wait = __radeon_fence_wait,
> +    .release = NULL,
> +};
>
>
> ----
>
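
The timed wait in this patch (radeon_fence_wait_seq_timeout) sleeps in RADEON_FENCE_JIFFIES_TIMEOUT-sized slices so lockup detection can run between slices, and charges only the time actually slept against the caller's budget: it returns the remaining time on success, 0 on timeout, or a negative error. A minimal userspace sketch of that accounting, assuming a hypothetical `CHUNK` in place of RADEON_FENCE_JIFFIES_TIMEOUT and a hypothetical `fake_wait()` in place of wait_event_timeout(); it is not driver code.

```c
#include <assert.h>

/* Hypothetical stand-in for RADEON_FENCE_JIFFIES_TIMEOUT (in "jiffies"). */
#define CHUNK 10L

/*
 * Hypothetical stand-in for wait_event_timeout(): the fence completes after
 * *work more jiffies. Like wait_event_timeout(), it returns the unused part
 * of the budget (at least 1) if the condition came true, or 0 on timeout.
 */
static long fake_wait(long *work, long budget)
{
	if (*work <= budget) {
		long left = budget - *work;

		*work = 0;
		return left > 0 ? left : 1;
	}
	*work -= budget;
	return 0;
}

/*
 * Sleep in CHUNK-sized slices, charging only the time actually slept
 * against the caller's timeout. Returns the remaining time (> 0) on
 * success or 0 on timeout. The return type is long, like the timeout
 * itself, so a MAX_SCHEDULE_TIMEOUT-sized budget is not truncated.
 */
static long chunked_wait(long *work, long timeout)
{
	assert(timeout >= 0);

	while (*work > 0) {
		long waited = timeout < CHUNK ? timeout : CHUNK;
		long r = fake_wait(work, waited);

		timeout -= waited - r;
		if (!timeout)
			break;	/* the whole budget was used up */
		/* a real driver would check for a GPU lockup here when r == 0 */
	}
	return timeout;
}
```

With 25 jiffies of outstanding work, a budget of 100 leaves 75 when the wait completes, while a budget of 15 runs out and yields 0.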



* Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences
@ 2014-05-19 13:35                                   ` Maarten Lankhorst
  0 siblings, 0 replies; 50+ messages in thread
From: Maarten Lankhorst @ 2014-05-19 13:35 UTC (permalink / raw)
  To: Christian König, airlied; +Cc: nouveau, linux-kernel, dri-devel

On 19-05-14 14:30, Christian König wrote:
> On 19.05.2014 12:10, Maarten Lankhorst wrote:
>> On 19-05-14 10:27, Christian König wrote:
>>> On 19.05.2014 10:00, Maarten Lankhorst wrote:
>>> [SNIP]
>>> The problem here is that the whole approach collides with the way we do reset handling from a conceptual point of view. Every IOCTL or other call chain into the driver is protected by the read side of the exclusive_lock semaphore. So in the case of a GPU lockup we can take the write side of the semaphore and so make sure that we have nobody else accessing the hardware or internal driver structures only changed at init time.
>>>
>>> Leaking a driver's IRQ context into another driver, as well as calling into a driver in atomic context, is quite an uncommon approach and should be considered very carefully.
>>>
>>> I would rather vote for a completely synchronous interface that only allows blocking waits and checks whether a fence is signaled from non-atomic context.
>>>
>>> If a driver needs to avoid blocking, it should just use a workqueue, and checking a fence outside your own driver is probably better done in a bottom-half handler anyway.
>>
>> Except that you might want to do something like
>> fence_is_signaled() in another driver to check whether you need to
>> defer, or can submit the batch buffer immediately, saving a bunch of
>> context switches. Running is_signaled in atomic context is really useful
>> here because it means you can't do too many scary things in your
>> is_signaled handler.
>
> This is indeed a nice optimization, but nothing more. If you want to provide an is_signaled interface for atomic context, then it should be optional, not mandatory.
See below.
>> In the case of enable_signaling it was the only sane solution, because
>> fence_signal can be called from irq context, and any calls after that to
>> fence_add_callback and fence_wait aren't allowed to do anything, so
>> fence_enable_sw_signaling and the default wait implementation must be
>> atomic. fence_wait itself doesn't have to be, so it's easy to grab
>> exclusive_lock there.
>
> I don't think you understood my point here: completely drop enable_signaling; it's unnecessary and only complicates the interface.
>
> We purposely avoided exactly this paradigm in the past and I haven't seen any good argument to start with it now.

In the common case a lot more fences will be emitted than will be waited on.
This means it makes sense to delay signaling a fence with fence_signal for
as long as possible. But when a fence user wants to work with a fence
some way is needed to ensure that the fence will complete. This is the idea
behind .enable_signaling: it tells the fence driver to call fence_signal on
the fence 'soon' because there are now waiters for it.

The atomic .signaled is optional, and can be set to NULL, but there is
no guarantee that fence_is_signaled will ever return true in that case,
unless fence_enable_sw_signaling is called (which calls .enable_signaling).

Providing a custom wait function is optional in the interface; if the default
wait function is used, all waiters are signaled when fence_signal is called.

Removing enable_signaling would only make sense if fence_signal was removed too,
but that would mean that fence_is_signaled could no longer exist in the core fence
code, and would mean completely rewriting the interface.
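
The lazy-signaling argument above can be modeled with a small toy: emitting a fence costs nothing extra, the driver only arranges a signal (for example by enabling its completion irq) once a waiter calls enable_sw_signaling, and the optional .signaled hook lets is_signaled poll. This is a self-contained sketch; every toy_* and drv_* name is hypothetical and only mirrors the shape of the .signaled/.enable_signaling hooks and fence_is_signaled()/fence_enable_sw_signaling() discussed in this thread, not the actual kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_fence;

/* Hypothetical ops table, shaped like the hooks discussed above. */
struct toy_fence_ops {
	bool (*signaled)(struct toy_fence *f);         /* optional, may be NULL */
	bool (*enable_signaling)(struct toy_fence *f); /* false: already passed */
};

struct toy_fence {
	const struct toy_fence_ops *ops;
	uint64_t seq;
	bool signaled;          /* set once by toy_fence_signal() */
	bool signaling_enabled; /* a waiter asked to be signaled */
};

static uint64_t hw_last_seq; /* sequence number the "GPU" has completed */
static int signal_calls;     /* how often a fence actually got signaled */

static void toy_fence_signal(struct toy_fence *f)
{
	if (!f->signaled) {
		f->signaled = true;
		signal_calls++;
	}
}

/* Driver side: cheap poll of the completed sequence number. */
static bool drv_signaled(struct toy_fence *f)
{
	return hw_last_seq >= f->seq;
}

/* Driver side: called only once somebody actually waits. */
static bool drv_enable_signaling(struct toy_fence *f)
{
	if (hw_last_seq >= f->seq)
		return false; /* already done, core signals right away */
	/* a real driver would enable its completion irq here */
	return true;
}

static const struct toy_fence_ops drv_ops = {
	.signaled = drv_signaled,
	.enable_signaling = drv_enable_signaling,
};

/* Core side. */
static bool toy_fence_is_signaled(struct toy_fence *f)
{
	if (f->signaled)
		return true;
	if (f->ops->signaled && f->ops->signaled(f)) {
		toy_fence_signal(f);
		return true;
	}
	return false;
}

static void toy_fence_enable_sw_signaling(struct toy_fence *f)
{
	if (f->signaled || f->signaling_enabled)
		return;
	f->signaling_enabled = true;
	if (!f->ops->enable_signaling(f))
		toy_fence_signal(f);
}

/* Simulated irq: only fences with signaling enabled are signaled here. */
static void toy_irq(struct toy_fence **fences, int n, uint64_t completed)
{
	hw_last_seq = completed;
	for (int i = 0; i < n; i++)
		if (fences[i]->signaling_enabled && hw_last_seq >= fences[i]->seq)
			toy_fence_signal(fences[i]);
}
```

Of three emitted fences, only the one with a waiter is signaled from the simulated irq; the others are picked up only if someone polls them later.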

~Maarten



* Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences
  2014-05-19 13:35                                   ` Maarten Lankhorst
@ 2014-05-19 14:25                                   ` Christian König
  2014-06-02 10:09                                       ` Maarten Lankhorst
  -1 siblings, 1 reply; 50+ messages in thread
From: Christian König @ 2014-05-19 14:25 UTC (permalink / raw)
  To: Maarten Lankhorst, airlied; +Cc: nouveau, linux-kernel, dri-devel

On 19.05.2014 15:35, Maarten Lankhorst wrote:
> On 19-05-14 14:30, Christian König wrote:
>> On 19.05.2014 12:10, Maarten Lankhorst wrote:
>>> On 19-05-14 10:27, Christian König wrote:
>>>> On 19.05.2014 10:00, Maarten Lankhorst wrote:
>>>> [SNIP]
>>>> The problem here is that the whole approach collides with the way 
>>>> we do reset handling from a conceptual point of view. Every IOCTL 
>>>> or other call chain into the driver is protected by the read side 
>>>> of the exclusive_lock semaphore. So in the case of a GPU lockup we 
>>>> can take the write side of the semaphore and so make sure that we 
>>>> have nobody else accessing the hardware or internal driver 
>>>> structures only changed at init time.
>>>>
>>>> Leaking a driver's IRQ context into another driver, as well as
>>>> calling into a driver in atomic context, is quite an uncommon
>>>> approach and should be considered very carefully.
>>>>
>>>> I would rather vote for a completely synchronous interface that only
>>>> allows blocking waits and checks whether a fence is signaled from
>>>> non-atomic context.
>>>>
>>>> If a driver needs to avoid blocking, it should just use a workqueue,
>>>> and checking a fence outside your own driver is probably better
>>>> done in a bottom-half handler anyway.
>>>
>>> Except that you might want to do something like
>>> fence_is_signaled() in another driver to check whether you need to
>>> defer, or can submit the batch buffer immediately, saving a bunch of
>>> context switches. Running is_signaled in atomic context is really
>>> useful here because it means you can't do too many scary things in
>>> your is_signaled handler.
>>
>> This is indeed a nice optimization, but nothing more. If you want to
>> provide an is_signaled interface for atomic context, then it should
>> be optional, not mandatory.
> See below.
>>> In the case of enable_signaling it was the only sane solution, because
>>> fence_signal can be called from irq context, and any calls after that
>>> to fence_add_callback and fence_wait aren't allowed to do anything, so
>>> fence_enable_sw_signaling and the default wait implementation must be
>>> atomic. fence_wait itself doesn't have to be, so it's easy to grab
>>> exclusive_lock there.
>>
>> I don't think you understood my point here: completely drop
>> enable_signaling; it's unnecessary and only complicates the interface.
>>
>> We purposely avoided exactly this paradigm in the past and I haven't 
>> seen any good argument to start with it now.
>
> In the common case a lot more fences will be emitted than will be
> waited on. This means it makes sense to delay signaling a fence with
> fence_signal for as long as possible. But when a fence user wants to
> work with a fence, some way is needed to ensure that the fence will
> complete. This is the idea behind .enable_signaling: it tells the fence
> driver to call fence_signal on the fence 'soon' because there are now
> waiters for it.
>
> The atomic .signaled is optional, and can be set to NULL, but there is
> no guarantee that fence_is_signaled will ever return true in that case,
> unless fence_enable_sw_signaling is called (which calls
> .enable_signaling).
>
> Providing a custom wait function is optional in the interface; if the
> default wait function is used, all waiters are signaled when
> fence_signal is called.
>
> Removing enable_signaling would only make sense if fence_signal was
> removed too, but that would mean that fence_is_signaled could no longer
> exist in the core fence code, and would mean completely rewriting the
> interface.
>
And this is what I'm suggesting here.

We have tried quite hard to avoid adding any form of such callbacks in
the past, and I don't really see a reason why that should change now. For
example, see the discussion here:
http://lists.freedesktop.org/archives/dri-devel/2012-May/022388.html

Jerome and Dave rejected my approach of handling the sub-allocator
through a callback for exactly the same reason. And that was only for
call chains inside the same driver; you're suggesting this for
cross-driver synchronization.

Christian.


* [RFC PATCH v1.2 08/16] drm/radeon: use common fence implementation for fences
@ 2014-06-02 10:09                                       ` Maarten Lankhorst
  0 siblings, 0 replies; 50+ messages in thread
From: Maarten Lankhorst @ 2014-06-02 10:09 UTC (permalink / raw)
  To: Christian König, airlied; +Cc: nouveau, linux-kernel, dri-devel

Changes since v1:
- Fixed interaction with reset handling.
   + Use exclusive_lock, either with trylock or blocking.
   + Bump sw irq refcount in the recovery function to prevent fiddling
     with irq registers during gpu recovery.
- Add radeon lockup detection to the default fence wait function.
---
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 68528619834a..a7d839a158ae 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -64,6 +64,7 @@
  #include <linux/wait.h>
  #include <linux/list.h>
  #include <linux/kref.h>
+#include <linux/fence.h>
  
  #include <ttm/ttm_bo_api.h>
  #include <ttm/ttm_bo_driver.h>
@@ -113,9 +114,6 @@ extern int radeon_hard_reset;
  #define RADEONFB_CONN_LIMIT			4
  #define RADEON_BIOS_NUM_SCRATCH			8
  
-/* fence seq are set to this number when signaled */
-#define RADEON_FENCE_SIGNALED_SEQ		0LL
-
  /* internal ring indices */
  /* r1xx+ has gfx CP ring */
  #define RADEON_RING_TYPE_GFX_INDEX		0
@@ -347,12 +345,15 @@ struct radeon_fence_driver {
  };
  
  struct radeon_fence {
+	struct fence base;
+
  	struct radeon_device		*rdev;
-	struct kref			kref;
  	/* protected by radeon_fence.lock */
  	uint64_t			seq;
  	/* RB, DMA, etc. */
  	unsigned			ring;
+
+	wait_queue_t fence_wake;
  };
  
  int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);
@@ -2256,6 +2257,7 @@ struct radeon_device {
  	struct radeon_mman		mman;
  	struct radeon_fence_driver	fence_drv[RADEON_NUM_RINGS];
  	wait_queue_head_t		fence_queue;
+	unsigned			fence_context;
  	struct mutex			ring_lock;
  	struct radeon_ring		ring[RADEON_NUM_RINGS];
  	bool				ib_pool_ready;
@@ -2346,11 +2348,6 @@ u32 cik_mm_rdoorbell(struct radeon_device *rdev, u32 index);
  void cik_mm_wdoorbell(struct radeon_device *rdev, u32 index, u32 v);
  
  /*
- * Cast helper
- */
-#define to_radeon_fence(p) ((struct radeon_fence *)(p))
-
-/*
   * Registers read & write functions.
   */
  #define RREG8(reg) readb((rdev->rmmio) + (reg))
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 0e770bbf7e29..6800a0f6dd33 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1175,6 +1175,7 @@ int radeon_device_init(struct radeon_device *rdev,
  	for (i = 0; i < RADEON_NUM_RINGS; i++) {
  		rdev->ring[i].idx = i;
  	}
+	rdev->fence_context = fence_context_alloc(RADEON_NUM_RINGS);
  
  	DRM_INFO("initializing kernel modesetting (%s 0x%04X:0x%04X 0x%04X:0x%04X).\n",
  		radeon_family_name[rdev->family], pdev->vendor, pdev->device,
@@ -1565,6 +1566,54 @@ int radeon_resume_kms(struct drm_device *dev, bool resume, bool fbcon)
  	return 0;
  }
  
+static uint32_t radeon_gpu_mask_sw_irq(struct radeon_device *rdev)
+{
+	uint32_t mask = 0;
+	int i;
+
+	if (!rdev->ddev->irq_enabled)
+		return mask;
+
+	/*
+	 * increase refcount on sw interrupts for all rings to stop
+	 * enabling interrupts in radeon_fence_enable_signaling during
+	 * gpu reset.
+	 */
+
+	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+		if (!rdev->ring[i].ready)
+			continue;
+
+		atomic_inc(&rdev->irq.ring_int[i]);
+		mask |= 1 << i;
+	}
+	return mask;
+}
+
+static void radeon_gpu_unmask_sw_irq(struct radeon_device *rdev, uint32_t mask)
+{
+	unsigned long irqflags;
+	int i;
+
+	if (!mask)
+		return;
+
+	/*
+	 * undo refcount increase, and reset irqs to correct value.
+	 */
+
+	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+		if (!(mask & (1 << i)))
+			continue;
+
+		atomic_dec(&rdev->irq.ring_int[i]);
+	}
+
+	spin_lock_irqsave(&rdev->irq.lock, irqflags);
+	radeon_irq_set(rdev);
+	spin_unlock_irqrestore(&rdev->irq.lock, irqflags);
+}
+
  /**
   * radeon_gpu_reset - reset the asic
   *
@@ -1582,6 +1631,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)
  
  	int i, r;
  	int resched;
+	uint32_t sw_mask;
  
  	down_write(&rdev->exclusive_lock);
  
@@ -1595,6 +1645,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)
  	radeon_save_bios_scratch_regs(rdev);
  	/* block TTM */
  	resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
+	sw_mask = radeon_gpu_mask_sw_irq(rdev);
  	radeon_pm_suspend(rdev);
  	radeon_suspend(rdev);
  
@@ -1644,13 +1695,20 @@ retry:
  	radeon_pm_resume(rdev);
  	drm_helper_resume_force_mode(rdev->ddev);
  
+	radeon_gpu_unmask_sw_irq(rdev, sw_mask);
  	ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched);
  	if (r) {
  		/* bad news, how to tell it to userspace ? */
  		dev_info(rdev->dev, "GPU reset failed\n");
  	}
  
-	up_write(&rdev->exclusive_lock);
+	/*
+	 * force all waiters to recheck, some may have been
+	 * added while the exclusive_lock was unavailable
+	 */
+	downgrade_write(&rdev->exclusive_lock);
+	wake_up_all(&rdev->fence_queue);
+	up_read(&rdev->exclusive_lock);
  	return r;
  }
  
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index a77b1c13ea43..db1f3b4708fa 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -39,6 +39,15 @@
  #include "radeon.h"
  #include "radeon_trace.h"
  
+static const struct fence_ops radeon_fence_ops;
+
+#define to_radeon_fence(p) \
+	({								\
+		struct radeon_fence *__f;				\
+		__f = container_of((p), struct radeon_fence, base);	\
+		__f->base.ops == &radeon_fence_ops ? __f : NULL;	\
+	})
+
  /*
   * Fences
   * Fences mark an event in the GPUs pipeline and are used
@@ -111,30 +120,55 @@ int radeon_fence_emit(struct radeon_device *rdev,
  		      struct radeon_fence **fence,
  		      int ring)
  {
+	u64 seq;
+
  	/* we are protected by the ring emission mutex */
  	*fence = kmalloc(sizeof(struct radeon_fence), GFP_KERNEL);
  	if ((*fence) == NULL) {
  		return -ENOMEM;
  	}
-	kref_init(&((*fence)->kref));
-	(*fence)->rdev = rdev;
-	(*fence)->seq = ++rdev->fence_drv[ring].sync_seq[ring];
  	(*fence)->ring = ring;
+	(*fence)->rdev = rdev;
+	(*fence)->seq = seq = ++rdev->fence_drv[ring].sync_seq[ring];
+	__fence_init(&(*fence)->base, &radeon_fence_ops,
+		     &rdev->fence_queue.lock, rdev->fence_context + ring, seq);
  	radeon_fence_ring_emit(rdev, ring, *fence);
  	trace_radeon_fence_emit(rdev->ddev, ring, (*fence)->seq);
  	return 0;
  }
  
  /**
- * radeon_fence_process - process a fence
- *
- * @rdev: radeon_device pointer
- * @ring: ring index the fence is associated with
+ * radeon_fence_check_signaled - callback from fence_queue
   *
- * Checks the current fence value and wakes the fence queue
- * if the sequence number has increased (all asics).
+ * this function is called with fence_queue lock held, which is also used
+ * for the fence locking itself, so unlocked variants are used for
+ * fence_signal, and remove_wait_queue.
   */
-void radeon_fence_process(struct radeon_device *rdev, int ring)
+static int radeon_fence_check_signaled(wait_queue_t *wait, unsigned mode, int flags, void *key)
+{
+	struct radeon_fence *fence;
+	u64 seq;
+
+	fence = container_of(wait, struct radeon_fence, fence_wake);
+
+	seq = atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq);
+	if (seq >= fence->seq) {
+		int ret = __fence_signal(&fence->base);
+
+		if (!ret)
+			FENCE_TRACE(&fence->base, "signaled from irq context\n");
+		else
+			FENCE_TRACE(&fence->base, "was already signaled\n");
+
+		radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
+		__remove_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
+		fence_put(&fence->base);
+	} else
+		FENCE_TRACE(&fence->base, "pending\n");
+	return 0;
+}
+
+static bool __radeon_fence_process(struct radeon_device *rdev, int ring)
  {
  	uint64_t seq, last_seq, last_emitted;
  	unsigned count_loop = 0;
@@ -190,23 +224,22 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
  		}
  	} while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
  
-	if (wake)
-		wake_up_all(&rdev->fence_queue);
+	return wake;
  }
  
  /**
- * radeon_fence_destroy - destroy a fence
+ * radeon_fence_process - process a fence
   *
- * @kref: fence kref
+ * @rdev: radeon_device pointer
+ * @ring: ring index the fence is associated with
   *
- * Frees the fence object (all asics).
+ * Checks the current fence value and wakes the fence queue
+ * if the sequence number has increased (all asics).
   */
-static void radeon_fence_destroy(struct kref *kref)
+void radeon_fence_process(struct radeon_device *rdev, int ring)
  {
-	struct radeon_fence *fence;
-
-	fence = container_of(kref, struct radeon_fence, kref);
-	kfree(fence);
+	if (__radeon_fence_process(rdev, ring))
+		wake_up_all(&rdev->fence_queue);
  }
  
  /**
@@ -237,6 +270,69 @@ static bool radeon_fence_seq_signaled(struct radeon_device *rdev,
  	return false;
  }
  
+static bool __radeon_fence_signaled(struct fence *f)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	struct radeon_device *rdev = fence->rdev;
+	unsigned ring = fence->ring;
+	u64 seq = fence->seq;
+
+	if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
+		return true;
+	}
+
+	if (down_read_trylock(&rdev->exclusive_lock)) {
+		radeon_fence_process(rdev, ring);
+		up_read(&rdev->exclusive_lock);
+
+		if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
+			return true;
+		}
+	}
+	return false;
+}
+
+/**
+ * radeon_fence_enable_signaling - enable signalling on fence
+ * @f: fence
+ *
+ * This function is called with fence_queue lock held, and adds a callback
+ * to fence_queue that checks if this fence is signaled, and if so it
+ * signals the fence and removes itself.
+ */
+static bool radeon_fence_enable_signaling(struct fence *f)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	struct radeon_device *rdev = fence->rdev;
+
+	if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
+	    !rdev->ddev->irq_enabled)
+		return false;
+
+	radeon_irq_kms_sw_irq_get(rdev, fence->ring);
+
+	if (down_read_trylock(&rdev->exclusive_lock)) {
+		if (__radeon_fence_process(rdev, fence->ring))
+			wake_up_all_locked(&rdev->fence_queue);
+
+		up_read(&rdev->exclusive_lock);
+	}
+
+	/* did fence get signaled after we enabled the sw irq? */
+	if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
+		radeon_irq_kms_sw_irq_put(rdev, fence->ring);
+		return false;
+	}
+
+	fence->fence_wake.flags = 0;
+	fence->fence_wake.private = NULL;
+	fence->fence_wake.func = radeon_fence_check_signaled;
+	__add_wait_queue(&rdev->fence_queue, &fence->fence_wake);
+	fence_get(f);
+
+	return true;
+}
+
  /**
   * radeon_fence_signaled - check if a fence has signaled
   *
@@ -250,11 +346,13 @@ bool radeon_fence_signaled(struct radeon_fence *fence)
  	if (!fence) {
  		return true;
  	}
-	if (fence->seq == RADEON_FENCE_SIGNALED_SEQ) {
-		return true;
-	}
+
  	if (radeon_fence_seq_signaled(fence->rdev, fence->seq, fence->ring)) {
-		fence->seq = RADEON_FENCE_SIGNALED_SEQ;
+		int ret;
+
+		ret = fence_signal(&fence->base);
+		if (!ret)
+			FENCE_TRACE(&fence->base, "signaled from radeon_fence_signaled\n");
  		return true;
  	}
  	return false;
@@ -283,28 +381,35 @@ static bool radeon_fence_any_seq_signaled(struct radeon_device *rdev, u64 *seq)
  }
  
  /**
- * radeon_fence_wait_seq - wait for a specific sequence numbers
+ * radeon_fence_wait_seq_timeout - wait for specific sequence number(s)
   *
   * @rdev: radeon device pointer
   * @target_seq: sequence number(s) we want to wait for
   * @intr: use interruptable sleep
+ * @timeout: maximum time to wait, or MAX_SCHEDULE_TIMEOUT for infinite wait
   *
   * Wait for the requested sequence number(s) to be written by any ring
   * (all asics).  Sequnce number array is indexed by ring id.
   * @intr selects whether to use interruptable (true) or non-interruptable
   * (false) sleep when waiting for the sequence number.  Helper function
   * for radeon_fence_wait_*().
- * Returns 0 if the sequence number has passed, error for all other cases.
+ * Returns the remaining time if the sequence number has passed, 0 when
+ * the wait timed out, or an error for all other cases.
   * -EDEADLK is returned when a GPU lockup has been detected.
   */
-static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
-				 bool intr)
+static long radeon_fence_wait_seq_timeout(struct radeon_device *rdev,
+					  u64 *target_seq, bool intr,
+					  long timeout)
  {
  	uint64_t last_seq[RADEON_NUM_RINGS];
  	bool signaled;
-	int i, r;
+	int i;
  
  	while (!radeon_fence_any_seq_signaled(rdev, target_seq)) {
+		long r, waited = timeout;
+
+		waited = timeout < RADEON_FENCE_JIFFIES_TIMEOUT ?
+			 timeout : RADEON_FENCE_JIFFIES_TIMEOUT;
  
  		/* Save current sequence values, used to check for GPU lockups */
  		for (i = 0; i < RADEON_NUM_RINGS; ++i) {
@@ -319,13 +424,15 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
  		if (intr) {
  			r = wait_event_interruptible_timeout(rdev->fence_queue, (
  				(signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
-				 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
+				 || rdev->needs_reset), waited);
  		} else {
  			r = wait_event_timeout(rdev->fence_queue, (
  				(signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
-				 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
+				 || rdev->needs_reset), waited);
  		}
  
+		timeout -= waited - r;
+
  		for (i = 0; i < RADEON_NUM_RINGS; ++i) {
  			if (!target_seq[i])
  				continue;
@@ -337,6 +444,12 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
  		if (unlikely(r < 0))
  			return r;
  
+		/*
+		 * If this is a timed wait and the wait completely timed out just return.
+		 */
+		if (!timeout)
+			break;
+
  		if (unlikely(!signaled)) {
  			if (rdev->needs_reset)
  				return -EDEADLK;
@@ -379,14 +492,14 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
  			}
  		}
  	}
-	return 0;
+	return timeout;
  }
  
  /**
   * radeon_fence_wait - wait for a fence to signal
   *
   * @fence: radeon fence object
- * @intr: use interruptable sleep
+ * @intr: use interruptible sleep
   *
   * Wait for the requested fence to signal (all asics).
   * @intr selects whether to use interruptable (true) or non-interruptable
@@ -398,20 +511,17 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
  	uint64_t seq[RADEON_NUM_RINGS] = {};
-	int r;
+	long r;
  
-	if (fence == NULL) {
-		WARN(1, "Querying an invalid fence : %p !\n", fence);
-		return -EINVAL;
-	}
-
-	seq[fence->ring] = fence->seq;
-	if (seq[fence->ring] == RADEON_FENCE_SIGNALED_SEQ)
+	if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))
  		return 0;
  
-	r = radeon_fence_wait_seq(fence->rdev, seq, intr);
-	if (r)
+	seq[fence->ring] = fence->seq;
+	r = radeon_fence_wait_seq_timeout(fence->rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
+	if (r < 0) {
  		return r;
-
-	fence->seq = RADEON_FENCE_SIGNALED_SEQ;
+	}
+	r = fence_signal(&fence->base);
+	if (!r)
+		FENCE_TRACE(&fence->base, "signaled from fence_wait\n");
  	return 0;
  }
  
@@ -434,7 +544,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
  {
  	uint64_t seq[RADEON_NUM_RINGS];
  	unsigned i, num_rings = 0;
-	int r;
+	long r;
  
  	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
  		seq[i] = 0;
@@ -443,20 +553,21 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
  			continue;
  		}
  
+		if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fences[i]->base.flags)) {
+			/* already signaled */
+			return 0;
+		}
+
  		seq[i] = fences[i]->seq;
  		++num_rings;
-
-		/* test if something was allready signaled */
-		if (seq[i] == RADEON_FENCE_SIGNALED_SEQ)
-			return 0;
  	}
  
  	/* nothing to wait for ? */
  	if (num_rings == 0)
  		return -ENOENT;
  
-	r = radeon_fence_wait_seq(rdev, seq, intr);
-	if (r) {
+	r = radeon_fence_wait_seq_timeout(rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
+	if (r < 0) {
  		return r;
  	}
  	return 0;
@@ -475,6 +586,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
  int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
  {
  	uint64_t seq[RADEON_NUM_RINGS] = {};
+	long r;
  
  	seq[ring] = atomic64_read(&rdev->fence_drv[ring].last_seq) + 1ULL;
  	if (seq[ring] >= rdev->fence_drv[ring].sync_seq[ring]) {
@@ -482,7 +594,10 @@ int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
  		   already the last emited fence */
  		return -ENOENT;
  	}
-	return radeon_fence_wait_seq(rdev, seq, false);
+	r = radeon_fence_wait_seq_timeout(rdev, seq, false, MAX_SCHEDULE_TIMEOUT);
+	if (r < 0)
+		return r;
+	return 0;
  }
  
  /**
@@ -504,8 +619,8 @@ int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
  	if (!seq[ring])
  		return 0;
  
-	r = radeon_fence_wait_seq(rdev, seq, false);
-	if (r) {
+	r = radeon_fence_wait_seq_timeout(rdev, seq, false, MAX_SCHEDULE_TIMEOUT);
+	if (r < 0) {
  		if (r == -EDEADLK)
  			return -EDEADLK;
  
@@ -525,7 +640,7 @@ int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
   */
  struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence)
  {
-	kref_get(&fence->kref);
+	fence_get(&fence->base);
  	return fence;
  }
  
@@ -541,9 +656,8 @@ void radeon_fence_unref(struct radeon_fence **fence)
  	struct radeon_fence *tmp = *fence;
  
  	*fence = NULL;
-	if (tmp) {
-		kref_put(&tmp->kref, radeon_fence_destroy);
-	}
+	if (tmp)
+		fence_put(&tmp->base);
  }
  
  /**
@@ -832,3 +946,51 @@ int radeon_debugfs_fence_init(struct radeon_device *rdev)
  	return 0;
  #endif
  }
+
+static long __radeon_fence_wait(struct fence *f, bool intr, long timeout)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	u64 target_seq[RADEON_NUM_RINGS] = {};
+	struct radeon_device *rdev = fence->rdev;
+	long r;
+
+	target_seq[fence->ring] = fence->seq;
+
+	down_read(&rdev->exclusive_lock);
+	r = radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr, timeout);
+
+	if (r > 0 && !fence_signal(&fence->base))
+		FENCE_TRACE(&fence->base, "signaled from __radeon_fence_wait\n");
+
+	up_read(&rdev->exclusive_lock);
+	return r;
+
+}
+
+static const char *radeon_fence_get_driver_name(struct fence *fence)
+{
+	return "radeon";
+}
+
+static const char *radeon_fence_get_timeline_name(struct fence *f)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	switch (fence->ring) {
+	case RADEON_RING_TYPE_GFX_INDEX: return "radeon.gfx";
+	case CAYMAN_RING_TYPE_CP1_INDEX: return "radeon.cp1";
+	case CAYMAN_RING_TYPE_CP2_INDEX: return "radeon.cp2";
+	case R600_RING_TYPE_DMA_INDEX: return "radeon.dma";
+	case CAYMAN_RING_TYPE_DMA1_INDEX: return "radeon.dma1";
+	case R600_RING_TYPE_UVD_INDEX: return "radeon.uvd";
+	default: WARN_ON_ONCE(1); return "radeon.unk";
+	}
+}
+
+static const struct fence_ops radeon_fence_ops = {
+	.get_driver_name = radeon_fence_get_driver_name,
+	.get_timeline_name = radeon_fence_get_timeline_name,
+	.enable_signaling = radeon_fence_enable_signaling,
+	.signaled = __radeon_fence_signaled,
+	.wait = __radeon_fence_wait,
+	.release = NULL,
+};
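The timeout accounting in radeon_fence_wait_seq_timeout() can be illustrated with a small userspace C sketch. It splits a total budget into fixed slices and charges only the time actually slept (`timeout -= waited - r`). Everything here is hypothetical: SLICE_JIFFIES stands in for RADEON_FENCE_JIFFIES_TIMEOUT, and struct sim and sim_wait() fake the clock and wait_event_timeout(). The GPU lockup detection that the driver loop performs between slices is left out.

```c
#include <assert.h>

/* Hypothetical slice length standing in for RADEON_FENCE_JIFFIES_TIMEOUT. */
#define SLICE_JIFFIES 50L

/* Simulated clock plus one fence that signals at signal_at (-1 = never). */
struct sim {
	long now;
	long signal_at;
};

static int sim_signaled(const struct sim *s)
{
	return s->signal_at >= 0 && s->now >= s->signal_at;
}

/*
 * Rough stand-in for wait_event_timeout(): returns 0 if the fence is still
 * unsignaled when the slice expires, otherwise the jiffies left (at least 1).
 */
static long sim_wait(struct sim *s, long slice)
{
	if (s->signal_at >= 0 && s->signal_at <= s->now + slice) {
		long elapsed = s->signal_at > s->now ? s->signal_at - s->now : 0;
		long left = slice - elapsed;

		s->now += elapsed;
		return left > 0 ? left : 1;
	}
	s->now += slice;
	return 0;
}

/* Remaining budget if the fence signaled, 0 if the whole budget expired. */
static long wait_seq_timeout(struct sim *s, long timeout)
{
	assert(timeout >= 0);

	while (!sim_signaled(s)) {
		long waited = timeout < SLICE_JIFFIES ? timeout : SLICE_JIFFIES;
		long r = sim_wait(s, waited);

		/* charge only the time actually slept in this slice */
		timeout -= waited - r;
		if (r < 0)
			return r;
		if (!timeout)
			break;
		/* the driver loop also runs its GPU lockup check at this point */
	}
	return timeout;
}
```

With a 200-jiffy budget and a fence that signals at jiffy 120, the sketch returns 80. With a fence that never signals, it returns 0 once exactly the whole budget has elapsed.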



* [RFC PATCH v1.2 08/16] drm/radeon: use common fence implementation for fences
@ 2014-06-02 10:09                                       ` Maarten Lankhorst
From: Maarten Lankhorst @ 2014-06-02 10:09 UTC (permalink / raw)
  To: Christian König, airlied; +Cc: nouveau, linux-kernel, dri-devel

Changes since v1:
- Fixed interaction with reset handling.
   + Use exclusive_lock, either with trylock or blocking.
   + Bump sw irq refcount in the recovery function to prevent fiddling
     with irq registers during gpu recovery.
- Add radeon lockup detection to the default fence wait function.
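The sw irq refcount bump in the recovery path can be pictured with a small userspace C model. It assumes, as in radeon_irq_kms.c, that the per-ring sw irq get/put helpers only reprogram the interrupt registers on a 0 to 1 or 1 to 0 transition of the per-ring count. struct sim_dev, NUM_RINGS and the hw_writes counter are hypothetical stand-ins; the real code uses atomics and calls radeon_irq_set() under irq.lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_RINGS 6  /* hypothetical stand-in for RADEON_NUM_RINGS */

struct sim_dev {
	bool irq_enabled;
	bool ring_ready[NUM_RINGS];
	int ring_int[NUM_RINGS];  /* per-ring sw irq refcount */
	int hw_writes;            /* simulated interrupt register updates */
};

/* Assumed get/put behaviour: only 0 -> 1 and 1 -> 0 touch the hardware. */
static void sw_irq_get(struct sim_dev *d, int ring)
{
	if (++d->ring_int[ring] == 1)
		d->hw_writes++;
}

static void sw_irq_put(struct sim_dev *d, int ring)
{
	assert(d->ring_int[ring] > 0);
	if (--d->ring_int[ring] == 0)
		d->hw_writes++;
}

/* Take one extra reference per ready ring and remember which ones. */
static uint32_t mask_sw_irq(struct sim_dev *d)
{
	uint32_t mask = 0;
	int i;

	if (!d->irq_enabled)
		return mask;
	for (i = 0; i < NUM_RINGS; ++i) {
		if (!d->ring_ready[i])
			continue;
		d->ring_int[i]++;
		mask |= 1u << i;
	}
	return mask;
}

/* Drop those references, then reprogram once from the final counts. */
static void unmask_sw_irq(struct sim_dev *d, uint32_t mask)
{
	int i;

	if (!mask)
		return;
	for (i = 0; i < NUM_RINGS; ++i)
		if (mask & (1u << i))
			d->ring_int[i]--;
	d->hw_writes++;  /* stands in for radeon_irq_set() under irq.lock */
}
```

Because every ready ring already holds one extra reference while the reset runs, a fence that enables signaling in the meantime never causes a register write; the single reprogramming happens in the unmask step.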
---
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 68528619834a..a7d839a158ae 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -64,6 +64,7 @@
  #include <linux/wait.h>
  #include <linux/list.h>
  #include <linux/kref.h>
+#include <linux/fence.h>
  
  #include <ttm/ttm_bo_api.h>
  #include <ttm/ttm_bo_driver.h>
@@ -113,9 +114,6 @@ extern int radeon_hard_reset;
  #define RADEONFB_CONN_LIMIT			4
  #define RADEON_BIOS_NUM_SCRATCH			8
  
-/* fence seq are set to this number when signaled */
-#define RADEON_FENCE_SIGNALED_SEQ		0LL
-
  /* internal ring indices */
  /* r1xx+ has gfx CP ring */
  #define RADEON_RING_TYPE_GFX_INDEX		0
@@ -347,12 +345,15 @@ struct radeon_fence_driver {
  };
  
  struct radeon_fence {
+	struct fence base;
+
  	struct radeon_device		*rdev;
-	struct kref			kref;
  	/* protected by radeon_fence.lock */
  	uint64_t			seq;
  	/* RB, DMA, etc. */
  	unsigned			ring;
+
+	wait_queue_t fence_wake;
  };
  
  int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);
@@ -2256,6 +2257,7 @@ struct radeon_device {
  	struct radeon_mman		mman;
  	struct radeon_fence_driver	fence_drv[RADEON_NUM_RINGS];
  	wait_queue_head_t		fence_queue;
+	unsigned			fence_context;
  	struct mutex			ring_lock;
  	struct radeon_ring		ring[RADEON_NUM_RINGS];
  	bool				ib_pool_ready;
@@ -2346,11 +2348,6 @@ u32 cik_mm_rdoorbell(struct radeon_device *rdev, u32 index);
  void cik_mm_wdoorbell(struct radeon_device *rdev, u32 index, u32 v);
  
  /*
- * Cast helper
- */
-#define to_radeon_fence(p) ((struct radeon_fence *)(p))
-
-/*
   * Registers read & write functions.
   */
  #define RREG8(reg) readb((rdev->rmmio) + (reg))
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 0e770bbf7e29..6800a0f6dd33 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1175,6 +1175,7 @@ int radeon_device_init(struct radeon_device *rdev,
  	for (i = 0; i < RADEON_NUM_RINGS; i++) {
  		rdev->ring[i].idx = i;
  	}
+	rdev->fence_context = fence_context_alloc(RADEON_NUM_RINGS);
  
  	DRM_INFO("initializing kernel modesetting (%s 0x%04X:0x%04X 0x%04X:0x%04X).\n",
  		radeon_family_name[rdev->family], pdev->vendor, pdev->device,
@@ -1565,6 +1566,54 @@ int radeon_resume_kms(struct drm_device *dev, bool resume, bool fbcon)
  	return 0;
  }
  
+static uint32_t radeon_gpu_mask_sw_irq(struct radeon_device *rdev)
+{
+	uint32_t mask = 0;
+	int i;
+
+	if (!rdev->ddev->irq_enabled)
+		return mask;
+
+	/*
+	 * increase refcount on sw interrupts for all rings to stop
+	 * enabling interrupts in radeon_fence_enable_signaling during
+	 * gpu reset.
+	 */
+
+	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+		if (!rdev->ring[i].ready)
+			continue;
+
+		atomic_inc(&rdev->irq.ring_int[i]);
+		mask |= 1 << i;
+	}
+	return mask;
+}
+
+static void radeon_gpu_unmask_sw_irq(struct radeon_device *rdev, uint32_t mask)
+{
+	unsigned long irqflags;
+	int i;
+
+	if (!mask)
+		return;
+
+	/*
+	 * undo refcount increase, and reset irqs to correct value.
+	 */
+
+	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+		if (!(mask & (1 << i)))
+			continue;
+
+		atomic_dec(&rdev->irq.ring_int[i]);
+	}
+
+	spin_lock_irqsave(&rdev->irq.lock, irqflags);
+	radeon_irq_set(rdev);
+	spin_unlock_irqrestore(&rdev->irq.lock, irqflags);
+}
+
  /**
   * radeon_gpu_reset - reset the asic
   *
@@ -1582,6 +1631,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)
  
  	int i, r;
  	int resched;
+	uint32_t sw_mask;
  
  	down_write(&rdev->exclusive_lock);
  
@@ -1595,6 +1645,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)
  	radeon_save_bios_scratch_regs(rdev);
  	/* block TTM */
  	resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
+	sw_mask = radeon_gpu_mask_sw_irq(rdev);
  	radeon_pm_suspend(rdev);
  	radeon_suspend(rdev);
  
@@ -1644,13 +1695,20 @@ retry:
  	radeon_pm_resume(rdev);
  	drm_helper_resume_force_mode(rdev->ddev);
  
+	radeon_gpu_unmask_sw_irq(rdev, sw_mask);
  	ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched);
  	if (r) {
  		/* bad news, how to tell it to userspace ? */
  		dev_info(rdev->dev, "GPU reset failed\n");
  	}
  
-	up_write(&rdev->exclusive_lock);
+	/*
+	 * force all waiters to recheck, some may have been
+	 * added while the exclusive_lock was unavailable
+	 */
+	downgrade_write(&rdev->exclusive_lock);
+	wake_up_all(&rdev->fence_queue);
+	up_read(&rdev->exclusive_lock);
  	return r;
  }
  
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index a77b1c13ea43..db1f3b4708fa 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -39,6 +39,15 @@
  #include "radeon.h"
  #include "radeon_trace.h"
  
+static const struct fence_ops radeon_fence_ops;
+
+#define to_radeon_fence(p) \
+	({								\
+		struct radeon_fence *__f;				\
+		__f = container_of((p), struct radeon_fence, base);	\
+		__f->base.ops == &radeon_fence_ops ? __f : NULL;	\
+	})
+
  /*
   * Fences
   * Fences mark an event in the GPUs pipeline and are used
@@ -111,30 +120,55 @@ int radeon_fence_emit(struct radeon_device *rdev,
  		      struct radeon_fence **fence,
  		      int ring)
  {
+	u64 seq = ++rdev->fence_drv[ring].sync_seq[ring];
+
  	/* we are protected by the ring emission mutex */
  	*fence = kmalloc(sizeof(struct radeon_fence), GFP_KERNEL);
  	if ((*fence) == NULL) {
  		return -ENOMEM;
  	}
-	kref_init(&((*fence)->kref));
-	(*fence)->rdev = rdev;
-	(*fence)->seq = ++rdev->fence_drv[ring].sync_seq[ring];
  	(*fence)->ring = ring;
+	__fence_init(&(*fence)->base, &radeon_fence_ops,
+		     &rdev->fence_queue.lock, rdev->fence_context + ring, seq);
+	(*fence)->rdev = rdev;
+	(*fence)->seq = seq;
  	radeon_fence_ring_emit(rdev, ring, *fence);
  	trace_radeon_fence_emit(rdev->ddev, ring, (*fence)->seq);
  	return 0;
  }
  
  /**
- * radeon_fence_process - process a fence
- *
- * @rdev: radeon_device pointer
- * @ring: ring index the fence is associated with
+ * radeon_fence_check_signaled - callback from fence_queue
   *
- * Checks the current fence value and wakes the fence queue
- * if the sequence number has increased (all asics).
+ * This function is called with the fence_queue lock held, which also
+ * protects the fence itself, so the unlocked variants __fence_signal()
+ * and __remove_wait_queue() are used.
   */
-void radeon_fence_process(struct radeon_device *rdev, int ring)
+static int radeon_fence_check_signaled(wait_queue_t *wait, unsigned mode, int flags, void *key)
+{
+	struct radeon_fence *fence;
+	u64 seq;
+
+	fence = container_of(wait, struct radeon_fence, fence_wake);
+
+	seq = atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq);
+	if (seq >= fence->seq) {
+		int ret = __fence_signal(&fence->base);
+
+		if (!ret)
+			FENCE_TRACE(&fence->base, "signaled from irq context\n");
+		else
+			FENCE_TRACE(&fence->base, "was already signaled\n");
+
+		radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
+		__remove_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
+		fence_put(&fence->base);
+	} else
+		FENCE_TRACE(&fence->base, "pending\n");
+	return 0;
+}
+
+static bool __radeon_fence_process(struct radeon_device *rdev, int ring)
  {
  	uint64_t seq, last_seq, last_emitted;
  	unsigned count_loop = 0;
@@ -190,23 +224,22 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
  		}
  	} while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
  
-	if (wake)
-		wake_up_all(&rdev->fence_queue);
+	return wake;
  }
  
  /**
- * radeon_fence_destroy - destroy a fence
+ * radeon_fence_process - process a fence
   *
- * @kref: fence kref
+ * @rdev: radeon_device pointer
+ * @ring: ring index the fence is associated with
   *
- * Frees the fence object (all asics).
+ * Checks the current fence value and wakes the fence queue
+ * if the sequence number has increased (all asics).
   */
-static void radeon_fence_destroy(struct kref *kref)
+void radeon_fence_process(struct radeon_device *rdev, int ring)
  {
-	struct radeon_fence *fence;
-
-	fence = container_of(kref, struct radeon_fence, kref);
-	kfree(fence);
+	if (__radeon_fence_process(rdev, ring))
+		wake_up_all(&rdev->fence_queue);
  }
  
  /**
@@ -237,6 +270,69 @@ static bool radeon_fence_seq_signaled(struct radeon_device *rdev,
  	return false;
  }
  
+static bool __radeon_fence_signaled(struct fence *f)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	struct radeon_device *rdev = fence->rdev;
+	unsigned ring = fence->ring;
+	u64 seq = fence->seq;
+
+	if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
+		return true;
+	}
+
+	if (down_read_trylock(&rdev->exclusive_lock)) {
+		radeon_fence_process(rdev, ring);
+		up_read(&rdev->exclusive_lock);
+
+		if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
+			return true;
+		}
+	}
+	return false;
+}
+
+/**
+ * radeon_fence_enable_signaling - enable signalling on fence
+ * @fence: fence
+ *
+ * This function is called with fence_queue lock held, and adds a callback
+ * to fence_queue that checks if this fence is signaled, and if so it
+ * signals the fence and removes itself.
+ */
+static bool radeon_fence_enable_signaling(struct fence *f)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	struct radeon_device *rdev = fence->rdev;
+
+	if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
+	    !rdev->ddev->irq_enabled)
+		return false;
+
+	radeon_irq_kms_sw_irq_get(rdev, fence->ring);
+
+	if (down_read_trylock(&rdev->exclusive_lock)) {
+		if (__radeon_fence_process(rdev, fence->ring))
+			wake_up_all_locked(&rdev->fence_queue);
+
+		up_read(&rdev->exclusive_lock);
+	}
+
+	/* did fence get signaled after we enabled the sw irq? */
+	if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
+		radeon_irq_kms_sw_irq_put(rdev, fence->ring);
+		return false;
+	}
+
+	fence->fence_wake.flags = 0;
+	fence->fence_wake.private = NULL;
+	fence->fence_wake.func = radeon_fence_check_signaled;
+	__add_wait_queue(&rdev->fence_queue, &fence->fence_wake);
+	fence_get(f);
+
+	return true;
+}
+
  /**
   * radeon_fence_signaled - check if a fence has signaled
   *
@@ -250,11 +346,13 @@ bool radeon_fence_signaled(struct radeon_fence *fence)
  	if (!fence) {
  		return true;
  	}
-	if (fence->seq == RADEON_FENCE_SIGNALED_SEQ) {
-		return true;
-	}
+
  	if (radeon_fence_seq_signaled(fence->rdev, fence->seq, fence->ring)) {
-		fence->seq = RADEON_FENCE_SIGNALED_SEQ;
+		int ret;
+
+		ret = fence_signal(&fence->base);
+		if (!ret)
+			FENCE_TRACE(&fence->base, "signaled from radeon_fence_signaled\n");
  		return true;
  	}
  	return false;
@@ -283,28 +381,35 @@ static bool radeon_fence_any_seq_signaled(struct radeon_device *rdev, u64 *seq)
  }
  
  /**
- * radeon_fence_wait_seq - wait for a specific sequence numbers
+ * radeon_fence_wait_seq_timeout - wait for specific sequence number(s)
   *
   * @rdev: radeon device pointer
   * @target_seq: sequence number(s) we want to wait for
   * @intr: use interruptable sleep
+ * @timeout: maximum time to wait, or MAX_SCHEDULE_TIMEOUT for infinite wait
   *
   * Wait for the requested sequence number(s) to be written by any ring
   * (all asics).  Sequnce number array is indexed by ring id.
   * @intr selects whether to use interruptable (true) or non-interruptable
   * (false) sleep when waiting for the sequence number.  Helper function
   * for radeon_fence_wait_*().
- * Returns 0 if the sequence number has passed, error for all other cases.
+ * Returns remaining time if the sequence number has passed, 0 when
+ * the wait times out, or an error for all other cases.
   * -EDEADLK is returned when a GPU lockup has been detected.
   */
-static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
-				 bool intr)
+static int radeon_fence_wait_seq_timeout(struct radeon_device *rdev,
+					 u64 *target_seq, bool intr,
+					 long timeout)
  {
  	uint64_t last_seq[RADEON_NUM_RINGS];
  	bool signaled;
-	int i, r;
+	int i;
  
  	while (!radeon_fence_any_seq_signaled(rdev, target_seq)) {
+		long r, waited = timeout;
+
+		waited = timeout < RADEON_FENCE_JIFFIES_TIMEOUT ?
+			 timeout : RADEON_FENCE_JIFFIES_TIMEOUT;
  
  		/* Save current sequence values, used to check for GPU lockups */
  		for (i = 0; i < RADEON_NUM_RINGS; ++i) {
@@ -319,13 +424,15 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
  		if (intr) {
  			r = wait_event_interruptible_timeout(rdev->fence_queue, (
  				(signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
-				 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
+				 || rdev->needs_reset), waited);
  		} else {
  			r = wait_event_timeout(rdev->fence_queue, (
  				(signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
-				 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
+				 || rdev->needs_reset), waited);
  		}
  
+		timeout -= waited - r;
+
  		for (i = 0; i < RADEON_NUM_RINGS; ++i) {
  			if (!target_seq[i])
  				continue;
@@ -337,6 +444,12 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
  		if (unlikely(r < 0))
  			return r;
  
+		/*
+		 * If this is a timed wait and the wait completely timed out, just return.
+		 */
+		if (!timeout)
+			break;
+
  		if (unlikely(!signaled)) {
  			if (rdev->needs_reset)
  				return -EDEADLK;
@@ -379,14 +492,14 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
  			}
  		}
  	}
-	return 0;
+	return timeout;
  }
  
  /**
   * radeon_fence_wait - wait for a fence to signal
   *
   * @fence: radeon fence object
- * @intr: use interruptable sleep
+ * @intr: use interruptible sleep
   *
   * Wait for the requested fence to signal (all asics).
   * @intr selects whether to use interruptable (true) or non-interruptable
@@ -398,20 +511,17 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
  	uint64_t seq[RADEON_NUM_RINGS] = {};
  	int r;
  
-	if (fence == NULL) {
-		WARN(1, "Querying an invalid fence : %p !\n", fence);
-		return -EINVAL;
-	}
-
-	seq[fence->ring] = fence->seq;
-	if (seq[fence->ring] == RADEON_FENCE_SIGNALED_SEQ)
+	if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))
  		return 0;
  
-	r = radeon_fence_wait_seq(fence->rdev, seq, intr);
-	if (r)
+	seq[fence->ring] = fence->seq;
+	r = radeon_fence_wait_seq_timeout(fence->rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
+	if (r < 0) {
  		return r;
-
-	fence->seq = RADEON_FENCE_SIGNALED_SEQ;
+	}
+	r = fence_signal(&fence->base);
+	if (!r)
+		FENCE_TRACE(&fence->base, "signaled from fence_wait\n");
  	return 0;
  }
  
@@ -434,7 +544,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
  {
  	uint64_t seq[RADEON_NUM_RINGS];
  	unsigned i, num_rings = 0;
-	int r;
+	long r;
  
  	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
  		seq[i] = 0;
@@ -443,20 +553,21 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
  			continue;
  		}
  
+		if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fences[i]->base.flags)) {
+			/* already signaled */
+			return 0;
+		}
+
  		seq[i] = fences[i]->seq;
  		++num_rings;
-
-		/* test if something was allready signaled */
-		if (seq[i] == RADEON_FENCE_SIGNALED_SEQ)
-			return 0;
  	}
  
  	/* nothing to wait for ? */
  	if (num_rings == 0)
  		return -ENOENT;
  
-	r = radeon_fence_wait_seq(rdev, seq, intr);
-	if (r) {
+	r = radeon_fence_wait_seq_timeout(rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
+	if (r < 0) {
  		return r;
  	}
  	return 0;
@@ -475,6 +586,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
  int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
  {
  	uint64_t seq[RADEON_NUM_RINGS] = {};
+	long r;
  
  	seq[ring] = atomic64_read(&rdev->fence_drv[ring].last_seq) + 1ULL;
  	if (seq[ring] >= rdev->fence_drv[ring].sync_seq[ring]) {
@@ -482,7 +594,10 @@ int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
  		   already the last emited fence */
  		return -ENOENT;
  	}
-	return radeon_fence_wait_seq(rdev, seq, false);
+	r = radeon_fence_wait_seq_timeout(rdev, seq, false, MAX_SCHEDULE_TIMEOUT);
+	if (r < 0)
+		return r;
+	return 0;
  }
  
  /**
@@ -504,8 +619,8 @@ int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
  	if (!seq[ring])
  		return 0;
  
-	r = radeon_fence_wait_seq(rdev, seq, false);
-	if (r) {
+	r = radeon_fence_wait_seq_timeout(rdev, seq, false, MAX_SCHEDULE_TIMEOUT);
+	if (r < 0) {
  		if (r == -EDEADLK)
  			return -EDEADLK;
  
@@ -525,7 +640,7 @@ int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
   */
  struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence)
  {
-	kref_get(&fence->kref);
+	fence_get(&fence->base);
  	return fence;
  }
  
@@ -541,9 +656,8 @@ void radeon_fence_unref(struct radeon_fence **fence)
  	struct radeon_fence *tmp = *fence;
  
  	*fence = NULL;
-	if (tmp) {
-		kref_put(&tmp->kref, radeon_fence_destroy);
-	}
+	if (tmp)
+		fence_put(&tmp->base);
  }
  
  /**
@@ -832,3 +946,51 @@ int radeon_debugfs_fence_init(struct radeon_device *rdev)
  	return 0;
  #endif
  }
+
+static long __radeon_fence_wait(struct fence *f, bool intr, long timeout)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	u64 target_seq[RADEON_NUM_RINGS] = {};
+	struct radeon_device *rdev = fence->rdev;
+	long r;
+
+	target_seq[fence->ring] = fence->seq;
+
+	down_read(&rdev->exclusive_lock);
+	r = radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr, timeout);
+
+	if (r > 0 && !fence_signal(&fence->base))
+		FENCE_TRACE(&fence->base, "signaled from __radeon_fence_wait\n");
+
+	up_read(&rdev->exclusive_lock);
+	return r;
+
+}
+
+static const char *radeon_fence_get_driver_name(struct fence *fence)
+{
+	return "radeon";
+}
+
+static const char *radeon_fence_get_timeline_name(struct fence *f)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	switch (fence->ring) {
+	case RADEON_RING_TYPE_GFX_INDEX: return "radeon.gfx";
+	case CAYMAN_RING_TYPE_CP1_INDEX: return "radeon.cp1";
+	case CAYMAN_RING_TYPE_CP2_INDEX: return "radeon.cp2";
+	case R600_RING_TYPE_DMA_INDEX: return "radeon.dma";
+	case CAYMAN_RING_TYPE_DMA1_INDEX: return "radeon.dma1";
+	case R600_RING_TYPE_UVD_INDEX: return "radeon.uvd";
+	default: WARN_ON_ONCE(1); return "radeon.unk";
+	}
+}
+
+static const struct fence_ops radeon_fence_ops = {
+	.get_driver_name = radeon_fence_get_driver_name,
+	.get_timeline_name = radeon_fence_get_timeline_name,
+	.enable_signaling = radeon_fence_enable_signaling,
+	.signaled = __radeon_fence_signaled,
+	.wait = __radeon_fence_wait,
+	.release = NULL,
+};
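The to_radeon_fence() macro in this patch is a checked downcast: container_of() recovers the radeon_fence that embeds the generic struct fence, and the ops pointer comparison rejects fences created by other drivers. The userspace sketch below shows the same idea with a plain function and offsetof() instead of a GNU statement expression. All struct names and container_of_sim() are hypothetical, and the sketch tests the ops table before doing the pointer arithmetic rather than after.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal stand-ins for the generic fence types. */
struct fence_ops_sim { const char *name; };
struct fence_sim { const struct fence_ops_sim *ops; };

struct radeon_fence_sim {
	struct fence_sim base;  /* generic fence embedded in the driver fence */
	unsigned ring;
};

static const struct fence_ops_sim radeon_ops_sim = { "radeon" };
static const struct fence_ops_sim other_ops_sim = { "other" };

/* Userspace equivalent of the kernel's container_of(). */
#define container_of_sim(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Checked downcast: NULL unless f was created with radeon_ops_sim. */
static struct radeon_fence_sim *to_radeon_fence_sim(struct fence_sim *f)
{
	assert(f != NULL);
	if (f->ops != &radeon_ops_sim)
		return NULL;
	return container_of_sim(f, struct radeon_fence_sim, base);
}
```

Since the driver's ops table is a single static object, its address can double as a type tag for fences that arrive through the generic interface.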


* Re: [RFC PATCH v1.2 08/16] drm/radeon: use common fence implementation for fences
  2014-06-02 10:09                                       ` Maarten Lankhorst
@ 2014-06-02 10:45                                       ` Christian König
  2014-06-02 13:14                                         ` [RFC PATCH v1.3 08/16 1/2] drm/radeon: add timeout argument to radeon_fence_wait_seq Maarten Lankhorst
  2014-06-02 13:16                                           ` Maarten Lankhorst
From: Christian König @ 2014-06-02 10:45 UTC (permalink / raw)
  To: Maarten Lankhorst, airlied; +Cc: nouveau, linux-kernel, dri-devel

Am 02.06.2014 12:09, schrieb Maarten Lankhorst:
> Changes since v1:
> - Fixed interaction with reset handling.
>   + Use exclusive_lock, either with trylock or blocking.
>   + Bump sw irq refcount in the recovery function to prevent fiddling
>     with irq registers during gpu recovery.
> - Add radeon lockup detection to the default fence wait function.

First of all, please split out the addition of the timeout parameter to
the fence wait function, so we can review it on its own.

Thanks,
Christian.
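Reviewing the timeout parameter on its own means checking the contract documented in the patch: remaining jiffies (> 0) once the sequence number has passed, 0 on timeout, and a negative errno otherwise. The sketch below, with hypothetical names throughout, shows how a fence_ops.wait-style callback and a caller waiting with MAX_SCHEDULE_TIMEOUT might consume that value. It keeps the result in a signed long everywhere, since in the kernel MAX_SCHEDULE_TIMEOUT is LONG_MAX and a negative errno must not pass an `r > 0` test.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* In the kernel, MAX_SCHEDULE_TIMEOUT is defined as LONG_MAX. */
#define SIM_MAX_SCHEDULE_TIMEOUT LONG_MAX

/* Hypothetical stand-in for a driver fence; only tracks signaled state. */
struct sim_fence {
	bool signaled;
};

/*
 * Tail of a wait callback: r is the timed-wait result (> 0 jiffies left,
 * 0 timed out, < 0 errno). A signed r keeps error codes out of "r > 0".
 */
static long sim_wait_tail(struct sim_fence *f, long r)
{
	if (r > 0)
		f->signaled = true;
	return r;
}

/* Callers waiting with SIM_MAX_SCHEDULE_TIMEOUT only need success or errno. */
static int sim_wait_infinite_result(long r)
{
	assert(r != 0);  /* the budget is not expected to run out here */
	return r < 0 ? (int)r : 0;
}
```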

> ---
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index 68528619834a..a7d839a158ae 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -64,6 +64,7 @@
>  #include <linux/wait.h>
>  #include <linux/list.h>
>  #include <linux/kref.h>
> +#include <linux/fence.h>
>
>  #include <ttm/ttm_bo_api.h>
>  #include <ttm/ttm_bo_driver.h>
> @@ -113,9 +114,6 @@ extern int radeon_hard_reset;
>  #define RADEONFB_CONN_LIMIT            4
>  #define RADEON_BIOS_NUM_SCRATCH            8
>
> -/* fence seq are set to this number when signaled */
> -#define RADEON_FENCE_SIGNALED_SEQ        0LL
> -
>  /* internal ring indices */
>  /* r1xx+ has gfx CP ring */
>  #define RADEON_RING_TYPE_GFX_INDEX        0
> @@ -347,12 +345,15 @@ struct radeon_fence_driver {
>  };
>
>  struct radeon_fence {
> +    struct fence base;
> +
>      struct radeon_device        *rdev;
> -    struct kref            kref;
>      /* protected by radeon_fence.lock */
>      uint64_t            seq;
>      /* RB, DMA, etc. */
>      unsigned            ring;
> +
> +    wait_queue_t fence_wake;
>  };
>
>  int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);
> @@ -2256,6 +2257,7 @@ struct radeon_device {
>      struct radeon_mman        mman;
>      struct radeon_fence_driver    fence_drv[RADEON_NUM_RINGS];
>      wait_queue_head_t        fence_queue;
> +    unsigned            fence_context;
>      struct mutex            ring_lock;
>      struct radeon_ring        ring[RADEON_NUM_RINGS];
>      bool                ib_pool_ready;
> @@ -2346,11 +2348,6 @@ u32 cik_mm_rdoorbell(struct radeon_device *rdev, u32 index);
>  void cik_mm_wdoorbell(struct radeon_device *rdev, u32 index, u32 v);
>
>  /*
> - * Cast helper
> - */
> -#define to_radeon_fence(p) ((struct radeon_fence *)(p))
> -
> -/*
>   * Registers read & write functions.
>   */
>  #define RREG8(reg) readb((rdev->rmmio) + (reg))
> diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
> index 0e770bbf7e29..6800a0f6dd33 100644
> --- a/drivers/gpu/drm/radeon/radeon_device.c
> +++ b/drivers/gpu/drm/radeon/radeon_device.c
> @@ -1175,6 +1175,7 @@ int radeon_device_init(struct radeon_device *rdev,
>      for (i = 0; i < RADEON_NUM_RINGS; i++) {
>          rdev->ring[i].idx = i;
>      }
> +    rdev->fence_context = fence_context_alloc(RADEON_NUM_RINGS);
>
>      DRM_INFO("initializing kernel modesetting (%s 0x%04X:0x%04X 0x%04X:0x%04X).\n",
>          radeon_family_name[rdev->family], pdev->vendor, pdev->device,
> @@ -1565,6 +1566,54 @@ int radeon_resume_kms(struct drm_device *dev, bool resume, bool fbcon)
>      return 0;
>  }
>
> +static uint32_t radeon_gpu_mask_sw_irq(struct radeon_device *rdev)
> +{
> +    uint32_t mask = 0;
> +    int i;
> +
> +    if (!rdev->ddev->irq_enabled)
> +        return mask;
> +
> +    /*
> +     * increase refcount on sw interrupts for all rings to stop
> +     * enabling interrupts in radeon_fence_enable_signaling during
> +     * gpu reset.
> +     */
> +
> +    for (i = 0; i < RADEON_NUM_RINGS; ++i) {
> +        if (!rdev->ring[i].ready)
> +            continue;
> +
> +        atomic_inc(&rdev->irq.ring_int[i]);
> +        mask |= 1 << i;
> +    }
> +    return mask;
> +}
> +
> +static void radeon_gpu_unmask_sw_irq(struct radeon_device *rdev, uint32_t mask)
> +{
> +    unsigned long irqflags;
> +    int i;
> +
> +    if (!mask)
> +        return;
> +
> +    /*
> +     * undo refcount increase, and reset irqs to correct value.
> +     */
> +
> +    for (i = 0; i < RADEON_NUM_RINGS; ++i) {
> +        if (!(mask & (1 << i)))
> +            continue;
> +
> +        atomic_dec(&rdev->irq.ring_int[i]);
> +    }
> +
> +    spin_lock_irqsave(&rdev->irq.lock, irqflags);
> +    radeon_irq_set(rdev);
> +    spin_unlock_irqrestore(&rdev->irq.lock, irqflags);
> +}
> +
>  /**
>   * radeon_gpu_reset - reset the asic
>   *
> @@ -1582,6 +1631,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)
>
>      int i, r;
>      int resched;
> +    uint32_t sw_mask;
>
>      down_write(&rdev->exclusive_lock);
>
> @@ -1595,6 +1645,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)
>      radeon_save_bios_scratch_regs(rdev);
>      /* block TTM */
>      resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
> +    sw_mask = radeon_gpu_mask_sw_irq(rdev);
>      radeon_pm_suspend(rdev);
>      radeon_suspend(rdev);
>
> @@ -1644,13 +1695,20 @@ retry:
>      radeon_pm_resume(rdev);
>      drm_helper_resume_force_mode(rdev->ddev);
>
> +    radeon_gpu_unmask_sw_irq(rdev, sw_mask);
>      ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched);
>      if (r) {
>          /* bad news, how to tell it to userspace ? */
>          dev_info(rdev->dev, "GPU reset failed\n");
>      }
>
> -    up_write(&rdev->exclusive_lock);
> +    /*
> +     * force all waiters to recheck, some may have been
> +     * added while the exclusive_lock was unavailable
> +     */
> +    downgrade_write(&rdev->exclusive_lock);
> +    wake_up_all(&rdev->fence_queue);
> +    up_read(&rdev->exclusive_lock);
>      return r;
>  }
>
> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
> index a77b1c13ea43..db1f3b4708fa 100644
> --- a/drivers/gpu/drm/radeon/radeon_fence.c
> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
> @@ -39,6 +39,15 @@
>  #include "radeon.h"
>  #include "radeon_trace.h"
>
> +static const struct fence_ops radeon_fence_ops;
> +
> +#define to_radeon_fence(p) \
> +    ({                                \
> +        struct radeon_fence *__f;                \
> +        __f = container_of((p), struct radeon_fence, base);    \
> +        __f->base.ops == &radeon_fence_ops ? __f : NULL;    \
> +    })
> +
>  /*
>   * Fences
>   * Fences mark an event in the GPUs pipeline and are used
> @@ -111,30 +120,55 @@ int radeon_fence_emit(struct radeon_device *rdev,
>                struct radeon_fence **fence,
>                int ring)
>  {
> +    u64 seq = ++rdev->fence_drv[ring].sync_seq[ring];
> +
>      /* we are protected by the ring emission mutex */
>      *fence = kmalloc(sizeof(struct radeon_fence), GFP_KERNEL);
>      if ((*fence) == NULL) {
>          return -ENOMEM;
>      }
> -    kref_init(&((*fence)->kref));
> -    (*fence)->rdev = rdev;
> -    (*fence)->seq = ++rdev->fence_drv[ring].sync_seq[ring];
>      (*fence)->ring = ring;
> +    __fence_init(&(*fence)->base, &radeon_fence_ops,
> +             &rdev->fence_queue.lock, rdev->fence_context + ring, seq);
> +    (*fence)->rdev = rdev;
> +    (*fence)->seq = seq;
>      radeon_fence_ring_emit(rdev, ring, *fence);
>      trace_radeon_fence_emit(rdev->ddev, ring, (*fence)->seq);
>      return 0;
>  }
>
>  /**
> - * radeon_fence_process - process a fence
> - *
> - * @rdev: radeon_device pointer
> - * @ring: ring index the fence is associated with
> + * radeon_fence_check_signaled - callback from fence_queue
>   *
> - * Checks the current fence value and wakes the fence queue
> - * if the sequence number has increased (all asics).
> + * this function is called with fence_queue lock held, which is also used
> + * for the fence locking itself, so unlocked variants are used for
> + * fence_signal, and remove_wait_queue.
>   */
> -void radeon_fence_process(struct radeon_device *rdev, int ring)
> +static int radeon_fence_check_signaled(wait_queue_t *wait, unsigned mode, int flags, void *key)
> +{
> +    struct radeon_fence *fence;
> +    u64 seq;
> +
> +    fence = container_of(wait, struct radeon_fence, fence_wake);
> +
> +    seq = atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq);
> +    if (seq >= fence->seq) {
> +        int ret = __fence_signal(&fence->base);
> +
> +        if (!ret)
> +            FENCE_TRACE(&fence->base, "signaled from irq context\n");
> +        else
> +            FENCE_TRACE(&fence->base, "was already signaled\n");
> +
> +        radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
> +        __remove_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
> +        fence_put(&fence->base);
> +    } else
> +        FENCE_TRACE(&fence->base, "pending\n");
> +    return 0;
> +}
> +
> +static bool __radeon_fence_process(struct radeon_device *rdev, int ring)
>  {
>      uint64_t seq, last_seq, last_emitted;
>      unsigned count_loop = 0;
> @@ -190,23 +224,22 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
>          }
>      } while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
>
> -    if (wake)
> -        wake_up_all(&rdev->fence_queue);
> +    return wake;
>  }
>
>  /**
> - * radeon_fence_destroy - destroy a fence
> + * radeon_fence_process - process a fence
>   *
> - * @kref: fence kref
> + * @rdev: radeon_device pointer
> + * @ring: ring index the fence is associated with
>   *
> - * Frees the fence object (all asics).
> + * Checks the current fence value and wakes the fence queue
> + * if the sequence number has increased (all asics).
>   */
> -static void radeon_fence_destroy(struct kref *kref)
> +void radeon_fence_process(struct radeon_device *rdev, int ring)
>  {
> -    struct radeon_fence *fence;
> -
> -    fence = container_of(kref, struct radeon_fence, kref);
> -    kfree(fence);
> +    if (__radeon_fence_process(rdev, ring))
> +        wake_up_all(&rdev->fence_queue);
>  }
>
>  /**
> @@ -237,6 +270,69 @@ static bool radeon_fence_seq_signaled(struct radeon_device *rdev,
>      return false;
>  }
>
> +static bool __radeon_fence_signaled(struct fence *f)
> +{
> +    struct radeon_fence *fence = to_radeon_fence(f);
> +    struct radeon_device *rdev = fence->rdev;
> +    unsigned ring = fence->ring;
> +    u64 seq = fence->seq;
> +
> +    if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
> +        return true;
> +    }
> +
> +    if (down_read_trylock(&rdev->exclusive_lock)) {
> +        radeon_fence_process(rdev, ring);
> +        up_read(&rdev->exclusive_lock);
> +
> +        if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +/**
> + * radeon_fence_enable_signaling - enable signalling on fence
> + * @fence: fence
> + *
> + * This function is called with fence_queue lock held, and adds a callback
> + * to fence_queue that checks if this fence is signaled, and if so it
> + * signals the fence and removes itself.
> + */
> +static bool radeon_fence_enable_signaling(struct fence *f)
> +{
> +    struct radeon_fence *fence = to_radeon_fence(f);
> +    struct radeon_device *rdev = fence->rdev;
> +
> +    if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
> +        !rdev->ddev->irq_enabled)
> +        return false;
> +
> +    radeon_irq_kms_sw_irq_get(rdev, fence->ring);
> +
> +    if (down_read_trylock(&rdev->exclusive_lock)) {
> +        if (__radeon_fence_process(rdev, fence->ring))
> +            wake_up_all_locked(&rdev->fence_queue);
> +
> +        up_read(&rdev->exclusive_lock);
> +    }
> +
> +    /* did fence get signaled after we enabled the sw irq? */
> +    if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
> +        radeon_irq_kms_sw_irq_put(rdev, fence->ring);
> +        return false;
> +    }
> +
> +    fence->fence_wake.flags = 0;
> +    fence->fence_wake.private = NULL;
> +    fence->fence_wake.func = radeon_fence_check_signaled;
> +    __add_wait_queue(&rdev->fence_queue, &fence->fence_wake);
> +    fence_get(f);
> +
> +    return true;
> +}
> +
>  /**
>   * radeon_fence_signaled - check if a fence has signaled
>   *
> @@ -250,11 +346,13 @@ bool radeon_fence_signaled(struct radeon_fence *fence)
>      if (!fence) {
>          return true;
>      }
> -    if (fence->seq == RADEON_FENCE_SIGNALED_SEQ) {
> -        return true;
> -    }
> +
>      if (radeon_fence_seq_signaled(fence->rdev, fence->seq, fence->ring)) {
> -        fence->seq = RADEON_FENCE_SIGNALED_SEQ;
> +        int ret;
> +
> +        ret = fence_signal(&fence->base);
> +        if (!ret)
> +            FENCE_TRACE(&fence->base, "signaled from radeon_fence_signaled\n");
>          return true;
>      }
>      return false;
> @@ -283,28 +381,35 @@ static bool radeon_fence_any_seq_signaled(struct radeon_device *rdev, u64 *seq)
>  }
>
>  /**
> - * radeon_fence_wait_seq - wait for a specific sequence numbers
> + * radeon_fence_wait_seq_timeout - wait for a specific sequence numbers
>   *
>   * @rdev: radeon device pointer
>   * @target_seq: sequence number(s) we want to wait for
>   * @intr: use interruptable sleep
> + * @timeout: maximum time to wait, or MAX_SCHEDULE_TIMEOUT for infinite wait
>   *
>   * Wait for the requested sequence number(s) to be written by any ring
>   * (all asics).  Sequnce number array is indexed by ring id.
>   * @intr selects whether to use interruptable (true) or non-interruptable
>   * (false) sleep when waiting for the sequence number.  Helper function
>   * for radeon_fence_wait_*().
> - * Returns 0 if the sequence number has passed, error for all other cases.
> + * Returns remaining time if the sequence number has passed, 0 when
> + * the wait timeout, or an error for all other cases.
>   * -EDEADLK is returned when a GPU lockup has been detected.
>   */
> -static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
> -                 bool intr)
> +static int radeon_fence_wait_seq_timeout(struct radeon_device *rdev,
> +                     u64 *target_seq, bool intr,
> +                     long timeout)
>  {
>      uint64_t last_seq[RADEON_NUM_RINGS];
>      bool signaled;
> -    int i, r;
> +    int i;
>
>      while (!radeon_fence_any_seq_signaled(rdev, target_seq)) {
> +        long r, waited = timeout;
> +
> +        waited = timeout < RADEON_FENCE_JIFFIES_TIMEOUT ?
> +             timeout : RADEON_FENCE_JIFFIES_TIMEOUT;
>
>          /* Save current sequence values, used to check for GPU lockups */
>          for (i = 0; i < RADEON_NUM_RINGS; ++i) {
> @@ -319,13 +424,15 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
>          if (intr) {
>              r = wait_event_interruptible_timeout(rdev->fence_queue, (
>                  (signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
> -                 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
> +                 || rdev->needs_reset), waited);
>          } else {
>              r = wait_event_timeout(rdev->fence_queue, (
>                  (signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
> -                 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
> +                 || rdev->needs_reset), waited);
>          }
>
> +        timeout -= waited - r;
> +
>          for (i = 0; i < RADEON_NUM_RINGS; ++i) {
>              if (!target_seq[i])
>                  continue;
> @@ -337,6 +444,12 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
>          if (unlikely(r < 0))
>              return r;
>
> +        /*
> +         * If this is a timed wait and the wait completely timed out just return.
> +         */
> +        if (!timeout)
> +            break;
> +
>          if (unlikely(!signaled)) {
>              if (rdev->needs_reset)
>                  return -EDEADLK;
> @@ -379,14 +492,14 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
>              }
>          }
>      }
> -    return 0;
> +    return timeout;
>  }
>
>  /**
>   * radeon_fence_wait - wait for a fence to signal
>   *
>   * @fence: radeon fence object
> - * @intr: use interruptable sleep
> + * @intr: use interruptible sleep
>   *
>   * Wait for the requested fence to signal (all asics).
>   * @intr selects whether to use interruptable (true) or non-interruptable
> @@ -398,20 +511,17 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
>      uint64_t seq[RADEON_NUM_RINGS] = {};
>      int r;
>
> -    if (fence == NULL) {
> -        WARN(1, "Querying an invalid fence : %p !\n", fence);
> -        return -EINVAL;
> -    }
> -
> -    seq[fence->ring] = fence->seq;
> -    if (seq[fence->ring] == RADEON_FENCE_SIGNALED_SEQ)
> +    if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))
>          return 0;
>
> -    r = radeon_fence_wait_seq(fence->rdev, seq, intr);
> -    if (r)
> +    seq[fence->ring] = fence->seq;
> +    r = radeon_fence_wait_seq_timeout(fence->rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
> +    if (r < 0) {
>          return r;
> -
> -    fence->seq = RADEON_FENCE_SIGNALED_SEQ;
> +    }
> +    r = fence_signal(&fence->base);
> +    if (!r)
> +        FENCE_TRACE(&fence->base, "signaled from fence_wait\n");
>      return 0;
>  }
>
> @@ -434,7 +544,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
>  {
>      uint64_t seq[RADEON_NUM_RINGS];
>      unsigned i, num_rings = 0;
> -    int r;
> +    long r;
>
>      for (i = 0; i < RADEON_NUM_RINGS; ++i) {
>          seq[i] = 0;
> @@ -443,20 +553,21 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
>              continue;
>          }
>
> +        if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fences[i]->base.flags)) {
> +            /* already signaled */
> +            return 0;
> +        }
> +
>          seq[i] = fences[i]->seq;
>          ++num_rings;
> -
> -        /* test if something was allready signaled */
> -        if (seq[i] == RADEON_FENCE_SIGNALED_SEQ)
> -            return 0;
>      }
>
>      /* nothing to wait for ? */
>      if (num_rings == 0)
>          return -ENOENT;
>
> -    r = radeon_fence_wait_seq(rdev, seq, intr);
> -    if (r) {
> +    r = radeon_fence_wait_seq_timeout(rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
> +    if (r < 0) {
>          return r;
>      }
>      return 0;
> @@ -475,6 +586,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
>  int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
>  {
>      uint64_t seq[RADEON_NUM_RINGS] = {};
> +    long r;
>
>      seq[ring] = atomic64_read(&rdev->fence_drv[ring].last_seq) + 1ULL;
>      if (seq[ring] >= rdev->fence_drv[ring].sync_seq[ring]) {
> @@ -482,7 +594,10 @@ int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
>             already the last emited fence */
>          return -ENOENT;
>      }
> -    return radeon_fence_wait_seq(rdev, seq, false);
> +    r = radeon_fence_wait_seq_timeout(rdev, seq, false, MAX_SCHEDULE_TIMEOUT);
> +    if (r < 0)
> +        return r;
> +    return 0;
>  }
>
>  /**
> @@ -504,8 +619,8 @@ int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
>      if (!seq[ring])
>          return 0;
>
> -    r = radeon_fence_wait_seq(rdev, seq, false);
> -    if (r) {
> +    r = radeon_fence_wait_seq_timeout(rdev, seq, false, MAX_SCHEDULE_TIMEOUT);
> +    if (r < 0) {
>          if (r == -EDEADLK)
>              return -EDEADLK;
>
> @@ -525,7 +640,7 @@ int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
>   */
>  struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence)
>  {
> -    kref_get(&fence->kref);
> +    fence_get(&fence->base);
>      return fence;
>  }
>
> @@ -541,9 +656,8 @@ void radeon_fence_unref(struct radeon_fence **fence)
>      struct radeon_fence *tmp = *fence;
>
>      *fence = NULL;
> -    if (tmp) {
> -        kref_put(&tmp->kref, radeon_fence_destroy);
> -    }
> +    if (tmp)
> +        fence_put(&tmp->base);
>  }
>
>  /**
> @@ -832,3 +946,51 @@ int radeon_debugfs_fence_init(struct radeon_device *rdev)
>      return 0;
>  #endif
>  }
> +
> +static long __radeon_fence_wait(struct fence *f, bool intr, long timeout)
> +{
> +    struct radeon_fence *fence = to_radeon_fence(f);
> +    u64 target_seq[RADEON_NUM_RINGS] = {};
> +    struct radeon_device *rdev = fence->rdev;
> +    unsigned long r;
> +
> +    target_seq[fence->ring] = fence->seq;
> +
> +    down_read(&rdev->exclusive_lock);
> +    r = radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr, timeout);
> +
> +    if (r > 0 && !fence_signal(&fence->base))
> +        FENCE_TRACE(&fence->base, "signaled from __radeon_fence_wait\n");
> +
> +    up_read(&rdev->exclusive_lock);
> +    return r;
> +
> +}
> +
> +static const char *radeon_fence_get_driver_name(struct fence *fence)
> +{
> +    return "radeon";
> +}
> +
> +static const char *radeon_fence_get_timeline_name(struct fence *f)
> +{
> +    struct radeon_fence *fence = to_radeon_fence(f);
> +    switch (fence->ring) {
> +    case RADEON_RING_TYPE_GFX_INDEX: return "radeon.gfx";
> +    case CAYMAN_RING_TYPE_CP1_INDEX: return "radeon.cp1";
> +    case CAYMAN_RING_TYPE_CP2_INDEX: return "radeon.cp2";
> +    case R600_RING_TYPE_DMA_INDEX: return "radeon.dma";
> +    case CAYMAN_RING_TYPE_DMA1_INDEX: return "radeon.dma1";
> +    case R600_RING_TYPE_UVD_INDEX: return "radeon.uvd";
> +    default: WARN_ON_ONCE(1); return "radeon.unk";
> +    }
> +}
> +
> +static const struct fence_ops radeon_fence_ops = {
> +    .get_driver_name = radeon_fence_get_driver_name,
> +    .get_timeline_name = radeon_fence_get_timeline_name,
> +    .enable_signaling = radeon_fence_enable_signaling,
> +    .signaled = __radeon_fence_signaled,
> +    .wait = __radeon_fence_wait,
> +    .release = NULL,
> +};
>
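The to_radeon_fence() macro in the quoted patch combines container_of() with a check of the ops pointer, so a struct fence that belongs to another driver yields NULL instead of a bogus radeon_fence. A minimal userspace sketch of the same checked downcast, assuming simplified stand-in types: container_of is re-implemented with offsetof, and fence_ops, fence and demo_fence are hypothetical, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of; the kernel version also type-checks 'ptr'. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily reduced versions of the generic fence types. */
struct fence_ops { const char *driver_name; };
struct fence { const struct fence_ops *ops; };

/* Hypothetical driver fence embedding the generic one. */
struct demo_fence {
	unsigned ring;
	struct fence base;
};

static const struct fence_ops demo_fence_ops = { "demo" };

/*
 * Checked downcast: only trust container_of() when the ops table proves
 * the fence was created by this driver. The patch's macro computes the
 * pointer first and then tests base.ops; testing first gives the same result.
 */
static struct demo_fence *to_demo_fence(struct fence *f)
{
	if (f->ops != &demo_fence_ops)
		return NULL;
	return container_of(f, struct demo_fence, base);
}
```

Callers such as __radeon_fence_signaled() dereference the result without a NULL check, so they rely on the fence core only invoking radeon_fence_ops callbacks on fences that radeon created.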



* [RFC PATCH v1.3 08/16 1/2] drm/radeon: add timeout argument to radeon_fence_wait_seq
  2014-06-02 10:45                                       ` Christian König
@ 2014-06-02 13:14                                         ` Maarten Lankhorst
  2014-06-02 13:27                                             ` Christian König
  2014-06-02 13:16                                           ` Maarten Lankhorst
  1 sibling, 1 reply; 50+ messages in thread
From: Maarten Lankhorst @ 2014-06-02 13:14 UTC (permalink / raw)
  To: Christian König, airlied; +Cc: nouveau, linux-kernel, dri-devel

This makes it possible to wait for a specific amount of time,
rather than waiting indefinitely.
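The new timeout handling keeps the existing lockup-detection loop and charges the time spent in each slice against the caller's budget (timeout -= waited - r). A minimal userspace sketch of that accounting, assuming a simulated wait: fake_wait, SLICE and FAKE_MAX_SCHEDULE_TIMEOUT are hypothetical stand-ins for wait_event_timeout(), RADEON_FENCE_JIFFIES_TIMEOUT and MAX_SCHEDULE_TIMEOUT, not the driver code itself.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-ins for RADEON_FENCE_JIFFIES_TIMEOUT and MAX_SCHEDULE_TIMEOUT. */
#define SLICE 10L
#define FAKE_MAX_SCHEDULE_TIMEOUT LONG_MAX

/*
 * Hypothetical stand-in for wait_event_timeout(): the condition becomes true
 * after *ready_in ticks. Like the kernel macro, it returns 0 if the budget
 * ran out first, otherwise the remaining ticks (at least 1).
 */
static long fake_wait(long *ready_in, long budget)
{
	if (*ready_in <= budget) {
		long remaining = budget - *ready_in;

		*ready_in = 0;
		return remaining > 0 ? remaining : 1;
	}
	*ready_in -= budget;
	return 0;
}

/* Sliced wait: remaining budget if the condition became true, 0 on timeout. */
static long wait_sliced(long *ready_in, long timeout)
{
	assert(timeout >= 0);

	while (*ready_in > 0) {		/* "not signaled yet" */
		long waited = timeout < SLICE ? timeout : SLICE;
		long r = fake_wait(ready_in, waited);

		/* charge only the time actually slept in this slice */
		timeout -= waited - r;

		/* budget exhausted: a timed wait simply returns 0 */
		if (!timeout)
			break;

		/* the driver checks for GPU lockups between slices here */
	}
	return timeout;
}
```

For example, a 25-tick budget with a condition that becomes true after 23 ticks returns 2, while one that needs 30 ticks returns 0. Passing FAKE_MAX_SCHEDULE_TIMEOUT keeps the old wait-forever behaviour, since the budget never reaches 0 in practice.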

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---
  Split-out version. I noticed that I had forgotten to convert r in radeon_fence_wait_empty to long; fixed.
  drivers/gpu/drm/radeon/radeon_fence.c | 60 +++++++++++++++++++++++------------
  1 file changed, 40 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index a77b1c13ea43..bf4bfe65a050 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -283,28 +283,35 @@ static bool radeon_fence_any_seq_signaled(struct radeon_device *rdev, u64 *seq)
  }
  
  /**
- * radeon_fence_wait_seq - wait for a specific sequence numbers
+ * radeon_fence_wait_seq_timeout - wait for a specific sequence numbers
   *
   * @rdev: radeon device pointer
   * @target_seq: sequence number(s) we want to wait for
   * @intr: use interruptable sleep
+ * @timeout: maximum time to wait, or MAX_SCHEDULE_TIMEOUT for infinite wait
   *
   * Wait for the requested sequence number(s) to be written by any ring
   * (all asics).  Sequnce number array is indexed by ring id.
   * @intr selects whether to use interruptable (true) or non-interruptable
   * (false) sleep when waiting for the sequence number.  Helper function
   * for radeon_fence_wait_*().
- * Returns 0 if the sequence number has passed, error for all other cases.
+ * Returns the remaining time if the sequence number has passed, 0 if
+ * the wait timed out, or a negative error code in all other cases.
   * -EDEADLK is returned when a GPU lockup has been detected.
   */
-static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
-				 bool intr)
+static long radeon_fence_wait_seq_timeout(struct radeon_device *rdev,
+					  u64 *target_seq, bool intr,
+					  long timeout)
  {
  	uint64_t last_seq[RADEON_NUM_RINGS];
  	bool signaled;
-	int i, r;
+	int i;
  
  	while (!radeon_fence_any_seq_signaled(rdev, target_seq)) {
+		long r, waited = timeout;
+
+		waited = timeout < RADEON_FENCE_JIFFIES_TIMEOUT ?
+			 timeout : RADEON_FENCE_JIFFIES_TIMEOUT;
  
  		/* Save current sequence values, used to check for GPU lockups */
  		for (i = 0; i < RADEON_NUM_RINGS; ++i) {
@@ -319,13 +326,15 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
  		if (intr) {
  			r = wait_event_interruptible_timeout(rdev->fence_queue, (
  				(signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
-				 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
+				 || rdev->needs_reset), waited);
  		} else {
  			r = wait_event_timeout(rdev->fence_queue, (
  				(signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
-				 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
+				 || rdev->needs_reset), waited);
  		}
  
+		timeout -= waited - r;
+
  		for (i = 0; i < RADEON_NUM_RINGS; ++i) {
  			if (!target_seq[i])
  				continue;
@@ -337,6 +346,12 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
  		if (unlikely(r < 0))
  			return r;
  
+		/*
+		 * If this is a timed wait and the wait completely timed out just return.
+		 */
+		if (!timeout)
+			break;
+
  		if (unlikely(!signaled)) {
  			if (rdev->needs_reset)
  				return -EDEADLK;
@@ -379,14 +394,14 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
  			}
  		}
  	}
-	return 0;
+	return timeout;
  }
  
  /**
   * radeon_fence_wait - wait for a fence to signal
   *
   * @fence: radeon fence object
- * @intr: use interruptable sleep
+ * @intr: use interruptible sleep
   *
   * Wait for the requested fence to signal (all asics).
   * @intr selects whether to use interruptable (true) or non-interruptable
@@ -396,7 +411,7 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
  int radeon_fence_wait(struct radeon_fence *fence, bool intr)
  {
  	uint64_t seq[RADEON_NUM_RINGS] = {};
-	int r;
+	long r;
  
  	if (fence == NULL) {
  		WARN(1, "Querying an invalid fence : %p !\n", fence);
@@ -407,9 +422,10 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
  	if (seq[fence->ring] == RADEON_FENCE_SIGNALED_SEQ)
  		return 0;
  
-	r = radeon_fence_wait_seq(fence->rdev, seq, intr);
-	if (r)
+	r = radeon_fence_wait_seq_timeout(fence->rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
+	if (r < 0) {
  		return r;
+	}
  
  	fence->seq = RADEON_FENCE_SIGNALED_SEQ;
  	return 0;
@@ -434,7 +450,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
  {
  	uint64_t seq[RADEON_NUM_RINGS];
  	unsigned i, num_rings = 0;
-	int r;
+	long r;
  
  	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
  		seq[i] = 0;
@@ -455,8 +471,8 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
  	if (num_rings == 0)
  		return -ENOENT;
  
-	r = radeon_fence_wait_seq(rdev, seq, intr);
-	if (r) {
+	r = radeon_fence_wait_seq_timeout(rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
+	if (r < 0) {
  		return r;
  	}
  	return 0;
@@ -475,6 +491,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
  int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
  {
  	uint64_t seq[RADEON_NUM_RINGS] = {};
+	long r;
  
  	seq[ring] = atomic64_read(&rdev->fence_drv[ring].last_seq) + 1ULL;
  	if (seq[ring] >= rdev->fence_drv[ring].sync_seq[ring]) {
@@ -482,7 +499,10 @@ int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
  		   already the last emited fence */
  		return -ENOENT;
  	}
-	return radeon_fence_wait_seq(rdev, seq, false);
+	r = radeon_fence_wait_seq_timeout(rdev, seq, false, MAX_SCHEDULE_TIMEOUT);
+	if (r < 0)
+		return r;
+	return 0;
  }
  
  /**
@@ -498,18 +518,18 @@ int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
  int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
  {
  	uint64_t seq[RADEON_NUM_RINGS] = {};
-	int r;
+	long r;
  
  	seq[ring] = rdev->fence_drv[ring].sync_seq[ring];
  	if (!seq[ring])
  		return 0;
  
-	r = radeon_fence_wait_seq(rdev, seq, false);
-	if (r) {
+	r = radeon_fence_wait_seq_timeout(rdev, seq, false, MAX_SCHEDULE_TIMEOUT);
+	if (r < 0) {
  		if (r == -EDEADLK)
  			return -EDEADLK;
  
-		dev_err(rdev->dev, "error waiting for ring[%d] to become idle (%d)\n",
+		dev_err(rdev->dev, "error waiting for ring[%d] to become idle (%ld)\n",
  			ring, r);
  	}
  	return 0;
-- 
1.9.3




* [RFC PATCH v1.3 08/16 2/2] drm/radeon: use common fence implementation for fences
@ 2014-06-02 13:16                                           ` Maarten Lankhorst
  0 siblings, 0 replies; 50+ messages in thread
From: Maarten Lankhorst @ 2014-06-02 13:16 UTC (permalink / raw)
  To: Christian König, airlied; +Cc: nouveau, linux-kernel, dri-devel

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---
Oops: changed unsigned long to long in __radeon_fence_wait, fixing a subtle bug.
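The bug is one of signedness: __radeon_fence_wait() tests `r > 0` before calling fence_signal(), and with `unsigned long r` a negative error code such as -EDEADLK converts to a huge positive value, so the fence would be signaled even though the wait failed. A small userspace illustration, where fake_wait_seq() is a hypothetical stand-in for radeon_fence_wait_seq_timeout():

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in: negative errno on lockup, remaining time otherwise. */
static long fake_wait_seq(bool lockup)
{
	return lockup ? -EDEADLK : 5;
}

/* Mirrors the earlier version, which stored the result in an unsigned long. */
static bool signals_with_unsigned(bool lockup)
{
	unsigned long r = fake_wait_seq(lockup);

	return r > 0;	/* -EDEADLK wraps to a huge positive value */
}

/* Mirrors the fixed version, which uses a signed long. */
static bool signals_with_signed(bool lockup)
{
	long r = fake_wait_seq(lockup);

	return r > 0;
}
```

Both variants agree on success; only the signed one correctly refuses to signal the fence when the wait returns an error.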

  drivers/gpu/drm/radeon/radeon.h        |  15 +--
  drivers/gpu/drm/radeon/radeon_device.c |  60 ++++++++-
  drivers/gpu/drm/radeon/radeon_fence.c  | 223 +++++++++++++++++++++++++++------
  3 files changed, 248 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 8149e7cf4303..32a3f2fe70c5 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -64,6 +64,7 @@
  #include <linux/wait.h>
  #include <linux/list.h>
  #include <linux/kref.h>
+#include <linux/fence.h>
  
  #include <ttm/ttm_bo_api.h>
  #include <ttm/ttm_bo_driver.h>
@@ -113,9 +114,6 @@ extern int radeon_hard_reset;
  #define RADEONFB_CONN_LIMIT			4
  #define RADEON_BIOS_NUM_SCRATCH			8
  
-/* fence seq are set to this number when signaled */
-#define RADEON_FENCE_SIGNALED_SEQ		0LL
-
  /* internal ring indices */
  /* r1xx+ has gfx CP ring */
  #define RADEON_RING_TYPE_GFX_INDEX		0
@@ -347,12 +345,15 @@ struct radeon_fence_driver {
  };
  
  struct radeon_fence {
+	struct fence base;
+
  	struct radeon_device		*rdev;
-	struct kref			kref;
  	/* protected by radeon_fence.lock */
  	uint64_t			seq;
  	/* RB, DMA, etc. */
  	unsigned			ring;
+
+	wait_queue_t fence_wake;
  };
  
  int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);
@@ -2257,6 +2258,7 @@ struct radeon_device {
  	struct radeon_mman		mman;
  	struct radeon_fence_driver	fence_drv[RADEON_NUM_RINGS];
  	wait_queue_head_t		fence_queue;
+	unsigned			fence_context;
  	struct mutex			ring_lock;
  	struct radeon_ring		ring[RADEON_NUM_RINGS];
  	bool				ib_pool_ready;
@@ -2347,11 +2349,6 @@ u32 cik_mm_rdoorbell(struct radeon_device *rdev, u32 index);
  void cik_mm_wdoorbell(struct radeon_device *rdev, u32 index, u32 v);
  
  /*
- * Cast helper
- */
-#define to_radeon_fence(p) ((struct radeon_fence *)(p))
-
-/*
   * Registers read & write functions.
   */
  #define RREG8(reg) readb((rdev->rmmio) + (reg))
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 14671406212f..9a7d9f63203e 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1175,6 +1175,7 @@ int radeon_device_init(struct radeon_device *rdev,
  	for (i = 0; i < RADEON_NUM_RINGS; i++) {
  		rdev->ring[i].idx = i;
  	}
+	rdev->fence_context = fence_context_alloc(RADEON_NUM_RINGS);
  
  	DRM_INFO("initializing kernel modesetting (%s 0x%04X:0x%04X 0x%04X:0x%04X).\n",
  		radeon_family_name[rdev->family], pdev->vendor, pdev->device,
@@ -1566,6 +1567,54 @@ int radeon_resume_kms(struct drm_device *dev, bool resume, bool fbcon)
  	return 0;
  }
  
+static uint32_t radeon_gpu_mask_sw_irq(struct radeon_device *rdev)
+{
+	uint32_t mask = 0;
+	int i;
+
+	if (!rdev->ddev->irq_enabled)
+		return mask;
+
+	/*
+	 * increase refcount on sw interrupts for all rings to stop
+	 * enabling interrupts in radeon_fence_enable_signaling during
+	 * gpu reset.
+	 */
+
+	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+		if (!rdev->ring[i].ready)
+			continue;
+
+		atomic_inc(&rdev->irq.ring_int[i]);
+		mask |= 1 << i;
+	}
+	return mask;
+}
+
+static void radeon_gpu_unmask_sw_irq(struct radeon_device *rdev, uint32_t mask)
+{
+	unsigned long irqflags;
+	int i;
+
+	if (!mask)
+		return;
+
+	/*
+	 * undo refcount increase, and reset irqs to correct value.
+	 */
+
+	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+		if (!(mask & (1 << i)))
+			continue;
+
+		atomic_dec(&rdev->irq.ring_int[i]);
+	}
+
+	spin_lock_irqsave(&rdev->irq.lock, irqflags);
+	radeon_irq_set(rdev);
+	spin_unlock_irqrestore(&rdev->irq.lock, irqflags);
+}
+
  /**
   * radeon_gpu_reset - reset the asic
   *
@@ -1583,6 +1632,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)
  
  	int i, r;
  	int resched;
+	uint32_t sw_mask;
  
  	down_write(&rdev->exclusive_lock);
  
@@ -1596,6 +1646,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)
  	radeon_save_bios_scratch_regs(rdev);
  	/* block TTM */
  	resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
+	sw_mask = radeon_gpu_mask_sw_irq(rdev);
  	radeon_pm_suspend(rdev);
  	radeon_suspend(rdev);
  
@@ -1645,13 +1696,20 @@ retry:
  	radeon_pm_resume(rdev);
  	drm_helper_resume_force_mode(rdev->ddev);
  
+	radeon_gpu_unmask_sw_irq(rdev, sw_mask);
  	ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched);
  	if (r) {
  		/* bad news, how to tell it to userspace ? */
  		dev_info(rdev->dev, "GPU reset failed\n");
  	}
  
-	up_write(&rdev->exclusive_lock);
+	/*
+	 * force all waiters to recheck, some may have been
+	 * added while the exclusive_lock was unavailable
+	 */
+	downgrade_write(&rdev->exclusive_lock);
+	wake_up_all(&rdev->fence_queue);
+	up_read(&rdev->exclusive_lock);
  	return r;
  }
  
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index bf4bfe65a050..ea7a65e564fe 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -39,6 +39,15 @@
  #include "radeon.h"
  #include "radeon_trace.h"
  
+static const struct fence_ops radeon_fence_ops;
+
+#define to_radeon_fence(p) \
+	({								\
+		struct radeon_fence *__f;				\
+		__f = container_of((p), struct radeon_fence, base);	\
+		__f->base.ops == &radeon_fence_ops ? __f : NULL;	\
+	})
+
  /*
   * Fences
   * Fences mark an event in the GPUs pipeline and are used
@@ -111,30 +120,55 @@ int radeon_fence_emit(struct radeon_device *rdev,
  		      struct radeon_fence **fence,
  		      int ring)
  {
+	u64 seq = ++rdev->fence_drv[ring].sync_seq[ring];
+
  	/* we are protected by the ring emission mutex */
  	*fence = kmalloc(sizeof(struct radeon_fence), GFP_KERNEL);
  	if ((*fence) == NULL) {
  		return -ENOMEM;
  	}
-	kref_init(&((*fence)->kref));
-	(*fence)->rdev = rdev;
-	(*fence)->seq = ++rdev->fence_drv[ring].sync_seq[ring];
  	(*fence)->ring = ring;
+	__fence_init(&(*fence)->base, &radeon_fence_ops,
+		     &rdev->fence_queue.lock, rdev->fence_context + ring, seq);
+	(*fence)->rdev = rdev;
+	(*fence)->seq = seq;
  	radeon_fence_ring_emit(rdev, ring, *fence);
  	trace_radeon_fence_emit(rdev->ddev, ring, (*fence)->seq);
  	return 0;
  }
  
  /**
- * radeon_fence_process - process a fence
+ * radeon_fence_check_signaled - callback from fence_queue
   *
- * @rdev: radeon_device pointer
- * @ring: ring index the fence is associated with
- *
- * Checks the current fence value and wakes the fence queue
- * if the sequence number has increased (all asics).
+ * this function is called with fence_queue lock held, which is also used
+ * for the fence locking itself, so unlocked variants are used for
+ * fence_signal, and remove_wait_queue.
   */
-void radeon_fence_process(struct radeon_device *rdev, int ring)
+static int radeon_fence_check_signaled(wait_queue_t *wait, unsigned mode, int flags, void *key)
+{
+	struct radeon_fence *fence;
+	u64 seq;
+
+	fence = container_of(wait, struct radeon_fence, fence_wake);
+
+	seq = atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq);
+	if (seq >= fence->seq) {
+		int ret = __fence_signal(&fence->base);
+
+		if (!ret)
+			FENCE_TRACE(&fence->base, "signaled from irq context\n");
+		else
+			FENCE_TRACE(&fence->base, "was already signaled\n");
+
+		radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
+		__remove_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
+		fence_put(&fence->base);
+	} else
+		FENCE_TRACE(&fence->base, "pending\n");
+	return 0;
+}
+
+static bool __radeon_fence_process(struct radeon_device *rdev, int ring)
  {
  	uint64_t seq, last_seq, last_emitted;
  	unsigned count_loop = 0;
@@ -190,23 +224,22 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
  		}
  	} while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
  
-	if (wake)
-		wake_up_all(&rdev->fence_queue);
+	return wake;
  }
  
  /**
- * radeon_fence_destroy - destroy a fence
+ * radeon_fence_process - process a fence
   *
- * @kref: fence kref
+ * @rdev: radeon_device pointer
+ * @ring: ring index the fence is associated with
   *
- * Frees the fence object (all asics).
+ * Checks the current fence value and wakes the fence queue
+ * if the sequence number has increased (all asics).
   */
-static void radeon_fence_destroy(struct kref *kref)
+void radeon_fence_process(struct radeon_device *rdev, int ring)
  {
-	struct radeon_fence *fence;
-
-	fence = container_of(kref, struct radeon_fence, kref);
-	kfree(fence);
+	if (__radeon_fence_process(rdev, ring))
+		wake_up_all(&rdev->fence_queue);
  }
  
  /**
@@ -237,6 +270,69 @@ static bool radeon_fence_seq_signaled(struct radeon_device *rdev,
  	return false;
  }
  
+static bool __radeon_fence_signaled(struct fence *f)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	struct radeon_device *rdev = fence->rdev;
+	unsigned ring = fence->ring;
+	u64 seq = fence->seq;
+
+	if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
+		return true;
+	}
+
+	if (down_read_trylock(&rdev->exclusive_lock)) {
+		radeon_fence_process(rdev, ring);
+		up_read(&rdev->exclusive_lock);
+
+		if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
+			return true;
+		}
+	}
+	return false;
+}
+
+/**
+ * radeon_fence_enable_signaling - enable signalling on fence
+ * @fence: fence
+ *
+ * This function is called with fence_queue lock held, and adds a callback
+ * to fence_queue that checks if this fence is signaled, and if so it
+ * signals the fence and removes itself.
+ */
+static bool radeon_fence_enable_signaling(struct fence *f)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	struct radeon_device *rdev = fence->rdev;
+
+	if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
+	    !rdev->ddev->irq_enabled)
+		return false;
+
+	radeon_irq_kms_sw_irq_get(rdev, fence->ring);
+
+	if (down_read_trylock(&rdev->exclusive_lock)) {
+		if (__radeon_fence_process(rdev, fence->ring))
+			wake_up_all_locked(&rdev->fence_queue);
+
+		up_read(&rdev->exclusive_lock);
+	}
+
+	/* did fence get signaled after we enabled the sw irq? */
+	if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
+		radeon_irq_kms_sw_irq_put(rdev, fence->ring);
+		return false;
+	}
+
+	fence->fence_wake.flags = 0;
+	fence->fence_wake.private = NULL;
+	fence->fence_wake.func = radeon_fence_check_signaled;
+	__add_wait_queue(&rdev->fence_queue, &fence->fence_wake);
+	fence_get(f);
+
+	return true;
+}
+
  /**
   * radeon_fence_signaled - check if a fence has signaled
   *
@@ -250,11 +346,13 @@ bool radeon_fence_signaled(struct radeon_fence *fence)
  	if (!fence) {
  		return true;
  	}
-	if (fence->seq == RADEON_FENCE_SIGNALED_SEQ) {
-		return true;
-	}
+
  	if (radeon_fence_seq_signaled(fence->rdev, fence->seq, fence->ring)) {
-		fence->seq = RADEON_FENCE_SIGNALED_SEQ;
+		int ret;
+
+		ret = fence_signal(&fence->base);
+		if (!ret)
+			FENCE_TRACE(&fence->base, "signaled from radeon_fence_signaled\n");
  		return true;
  	}
  	return false;
@@ -413,21 +511,18 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
  	uint64_t seq[RADEON_NUM_RINGS] = {};
  	long r;
  
-	if (fence == NULL) {
-		WARN(1, "Querying an invalid fence : %p !\n", fence);
-		return -EINVAL;
-	}
-
-	seq[fence->ring] = fence->seq;
-	if (seq[fence->ring] == RADEON_FENCE_SIGNALED_SEQ)
+	if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))
  		return 0;
  
+	seq[fence->ring] = fence->seq;
  	r = radeon_fence_wait_seq_timeout(fence->rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
  	if (r < 0) {
  		return r;
  	}
  
-	fence->seq = RADEON_FENCE_SIGNALED_SEQ;
+	r = fence_signal(&fence->base);
+	if (!r)
+		FENCE_TRACE(&fence->base, "signaled from fence_wait\n");
  	return 0;
  }
  
@@ -459,12 +554,13 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
  			continue;
  		}
  
+		if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fences[i]->base.flags)) {
+			/* already signaled */
+			return 0;
+		}
+
  		seq[i] = fences[i]->seq;
  		++num_rings;
-
-		/* test if something was allready signaled */
-		if (seq[i] == RADEON_FENCE_SIGNALED_SEQ)
-			return 0;
  	}
  
  	/* nothing to wait for ? */
@@ -545,7 +641,7 @@ int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
   */
  struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence)
  {
-	kref_get(&fence->kref);
+	fence_get(&fence->base);
  	return fence;
  }
  
@@ -561,9 +657,8 @@ void radeon_fence_unref(struct radeon_fence **fence)
  	struct radeon_fence *tmp = *fence;
  
  	*fence = NULL;
-	if (tmp) {
-		kref_put(&tmp->kref, radeon_fence_destroy);
-	}
+	if (tmp)
+		fence_put(&tmp->base);
  }
  
  /**
@@ -852,3 +947,51 @@ int radeon_debugfs_fence_init(struct radeon_device *rdev)
  	return 0;
  #endif
  }
+
+static long __radeon_fence_wait(struct fence *f, bool intr, long timeout)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	u64 target_seq[RADEON_NUM_RINGS] = {};
+	struct radeon_device *rdev = fence->rdev;
+	long r;
+
+	target_seq[fence->ring] = fence->seq;
+
+	down_read(&rdev->exclusive_lock);
+	r = radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr, timeout);
+
+	if (r > 0 && !fence_signal(&fence->base))
+		FENCE_TRACE(&fence->base, "signaled from __radeon_fence_wait\n");
+
+	up_read(&rdev->exclusive_lock);
+	return r;
+
+}
+
+static const char *radeon_fence_get_driver_name(struct fence *fence)
+{
+	return "radeon";
+}
+
+static const char *radeon_fence_get_timeline_name(struct fence *f)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	switch (fence->ring) {
+	case RADEON_RING_TYPE_GFX_INDEX: return "radeon.gfx";
+	case CAYMAN_RING_TYPE_CP1_INDEX: return "radeon.cp1";
+	case CAYMAN_RING_TYPE_CP2_INDEX: return "radeon.cp2";
+	case R600_RING_TYPE_DMA_INDEX: return "radeon.dma";
+	case CAYMAN_RING_TYPE_DMA1_INDEX: return "radeon.dma1";
+	case R600_RING_TYPE_UVD_INDEX: return "radeon.uvd";
+	default: WARN_ON_ONCE(1); return "radeon.unk";
+	}
+}
+
+static const struct fence_ops radeon_fence_ops = {
+	.get_driver_name = radeon_fence_get_driver_name,
+	.get_timeline_name = radeon_fence_get_timeline_name,
+	.enable_signaling = radeon_fence_enable_signaling,
+	.signaled = __radeon_fence_signaled,
+	.wait = __radeon_fence_wait,
+	.release = NULL,
+};
-- 
1.9.3



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC PATCH v1.3 08/16 2/2] drm/radeon: use common fence implementation for fences
@ 2014-06-02 13:16                                           ` Maarten Lankhorst
  0 siblings, 0 replies; 50+ messages in thread
From: Maarten Lankhorst @ 2014-06-02 13:16 UTC (permalink / raw)
  To: Christian König, airlied-cv59FeDIM0c
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Signed-off-by: Maarten Lankhorst <maarten.lankhorst-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
---
Oops, changed unsigned long to long in __radeon_fence_wait, fixing a subtle bug.
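The note does not spell the bug out. One plausible reading, based on the code in this patch, goes as follows. radeon_fence_wait_seq_timeout() returns the remaining time (> 0), 0 on timeout, or a negative error code such as -ERESTARTSYS from an interrupted wait. __radeon_fence_wait() signals the fence only when r > 0. If r is an unsigned long, a negative error code wraps around to a huge positive value and passes that test.

A small user-space illustration follows. The helper names are hypothetical, and EINTR stands in for the kernel-only ERESTARTSYS:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for radeon_fence_wait_seq_timeout(): > 0 remaining
 * time, 0 on timeout, < 0 error code. Here it just echoes its argument. */
static long fake_wait_result(long result)
{
	return result;
}

/* The r > 0 test from __radeon_fence_wait() with r declared long. */
static bool signals_with_long(long result)
{
	long r = fake_wait_result(result);

	return r > 0;
}

/* The same test with r declared unsigned long: negative values wrap around
 * to large positive ones, so an error looks like a successful wait. */
static bool signals_with_unsigned_long(long result)
{
	unsigned long r = fake_wait_result(result);

	return r > 0;
}
```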

  drivers/gpu/drm/radeon/radeon.h        |  15 +--
  drivers/gpu/drm/radeon/radeon_device.c |  60 ++++++++-
  drivers/gpu/drm/radeon/radeon_fence.c  | 223 +++++++++++++++++++++++++++------
  3 files changed, 248 insertions(+), 50 deletions(-)
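The radeon_device.c part of this patch (radeon_gpu_mask_sw_irq() and radeon_gpu_unmask_sw_irq()) takes one extra reference on the sw interrupt count of every ready ring for the duration of a GPU reset. The comment in the patch implies that radeon_irq_kms_sw_irq_get()/put() only touch the interrupt hardware on the 0 -> 1 and 1 -> 0 transitions. Assuming that is the case, a radeon_fence_enable_signaling() call that races with the reset then only bumps a counter.

Below is a hypothetical user-space model of that bookkeeping. NUM_RINGS and all function names are made up, and the final radeon_irq_set() call is only mentioned in a comment:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_RINGS 4 /* hypothetical; the driver uses RADEON_NUM_RINGS */

struct irq_model {
	atomic_int ring_int[NUM_RINGS]; /* like rdev->irq.ring_int[] */
	bool ring_ready[NUM_RINGS];     /* like rdev->ring[i].ready */
};

/* Assumed get/put semantics: only the 0 -> 1 and 1 -> 0 transitions would
 * reprogram the interrupt hardware; the return value reports that. */
static bool sw_irq_get(struct irq_model *m, int ring)
{
	return atomic_fetch_add(&m->ring_int[ring], 1) == 0;
}

static bool sw_irq_put(struct irq_model *m, int ring)
{
	return atomic_fetch_sub(&m->ring_int[ring], 1) == 1;
}

/* Like radeon_gpu_mask_sw_irq(): pin every ready ring at a non-zero count
 * and remember which rings were pinned. */
static uint32_t mask_sw_irq(struct irq_model *m)
{
	uint32_t mask = 0;

	for (int i = 0; i < NUM_RINGS; ++i) {
		if (!m->ring_ready[i])
			continue;
		atomic_fetch_add(&m->ring_int[i], 1);
		mask |= 1u << i;
	}
	return mask;
}

/* Like radeon_gpu_unmask_sw_irq(): drop the extra references again. The
 * driver then calls radeon_irq_set() once to resync the hardware state. */
static void unmask_sw_irq(struct irq_model *m, uint32_t mask)
{
	for (int i = 0; i < NUM_RINGS; ++i)
		if (mask & (1u << i))
			atomic_fetch_sub(&m->ring_int[i], 1);
}
```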

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 8149e7cf4303..32a3f2fe70c5 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -64,6 +64,7 @@
  #include <linux/wait.h>
  #include <linux/list.h>
  #include <linux/kref.h>
+#include <linux/fence.h>
  
  #include <ttm/ttm_bo_api.h>
  #include <ttm/ttm_bo_driver.h>
@@ -113,9 +114,6 @@ extern int radeon_hard_reset;
  #define RADEONFB_CONN_LIMIT			4
  #define RADEON_BIOS_NUM_SCRATCH			8
  
-/* fence seq are set to this number when signaled */
-#define RADEON_FENCE_SIGNALED_SEQ		0LL
-
  /* internal ring indices */
  /* r1xx+ has gfx CP ring */
  #define RADEON_RING_TYPE_GFX_INDEX		0
@@ -347,12 +345,15 @@ struct radeon_fence_driver {
  };
  
  struct radeon_fence {
+	struct fence base;
+
  	struct radeon_device		*rdev;
-	struct kref			kref;
  	/* protected by radeon_fence.lock */
  	uint64_t			seq;
  	/* RB, DMA, etc. */
  	unsigned			ring;
+
+	wait_queue_t fence_wake;
  };
  
  int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);
@@ -2257,6 +2258,7 @@ struct radeon_device {
  	struct radeon_mman		mman;
  	struct radeon_fence_driver	fence_drv[RADEON_NUM_RINGS];
  	wait_queue_head_t		fence_queue;
+	unsigned			fence_context;
  	struct mutex			ring_lock;
  	struct radeon_ring		ring[RADEON_NUM_RINGS];
  	bool				ib_pool_ready;
@@ -2347,11 +2349,6 @@ u32 cik_mm_rdoorbell(struct radeon_device *rdev, u32 index);
  void cik_mm_wdoorbell(struct radeon_device *rdev, u32 index, u32 v);
  
  /*
- * Cast helper
- */
-#define to_radeon_fence(p) ((struct radeon_fence *)(p))
-
-/*
   * Registers read & write functions.
   */
  #define RREG8(reg) readb((rdev->rmmio) + (reg))
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 14671406212f..9a7d9f63203e 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1175,6 +1175,7 @@ int radeon_device_init(struct radeon_device *rdev,
  	for (i = 0; i < RADEON_NUM_RINGS; i++) {
  		rdev->ring[i].idx = i;
  	}
+	rdev->fence_context = fence_context_alloc(RADEON_NUM_RINGS);
  
  	DRM_INFO("initializing kernel modesetting (%s 0x%04X:0x%04X 0x%04X:0x%04X).\n",
  		radeon_family_name[rdev->family], pdev->vendor, pdev->device,
@@ -1566,6 +1567,54 @@ int radeon_resume_kms(struct drm_device *dev, bool resume, bool fbcon)
  	return 0;
  }
  
+static uint32_t radeon_gpu_mask_sw_irq(struct radeon_device *rdev)
+{
+	uint32_t mask = 0;
+	int i;
+
+	if (!rdev->ddev->irq_enabled)
+		return mask;
+
+	/*
+	 * increase refcount on sw interrupts for all rings to stop
+	 * enabling interrupts in radeon_fence_enable_signaling during
+	 * gpu reset.
+	 */
+
+	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+		if (!rdev->ring[i].ready)
+			continue;
+
+		atomic_inc(&rdev->irq.ring_int[i]);
+		mask |= 1 << i;
+	}
+	return mask;
+}
+
+static void radeon_gpu_unmask_sw_irq(struct radeon_device *rdev, uint32_t mask)
+{
+	unsigned long irqflags;
+	int i;
+
+	if (!mask)
+		return;
+
+	/*
+	 * undo refcount increase, and reset irqs to correct value.
+	 */
+
+	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+		if (!(mask & (1 << i)))
+			continue;
+
+		atomic_dec(&rdev->irq.ring_int[i]);
+	}
+
+	spin_lock_irqsave(&rdev->irq.lock, irqflags);
+	radeon_irq_set(rdev);
+	spin_unlock_irqrestore(&rdev->irq.lock, irqflags);
+}
+
  /**
   * radeon_gpu_reset - reset the asic
   *
@@ -1583,6 +1632,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)
  
  	int i, r;
  	int resched;
+	uint32_t sw_mask;
  
  	down_write(&rdev->exclusive_lock);
  
@@ -1596,6 +1646,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)
  	radeon_save_bios_scratch_regs(rdev);
  	/* block TTM */
  	resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
+	sw_mask = radeon_gpu_mask_sw_irq(rdev);
  	radeon_pm_suspend(rdev);
  	radeon_suspend(rdev);
  
@@ -1645,13 +1696,20 @@ retry:
  	radeon_pm_resume(rdev);
  	drm_helper_resume_force_mode(rdev->ddev);
  
+	radeon_gpu_unmask_sw_irq(rdev, sw_mask);
  	ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched);
  	if (r) {
  		/* bad news, how to tell it to userspace ? */
  		dev_info(rdev->dev, "GPU reset failed\n");
  	}
  
-	up_write(&rdev->exclusive_lock);
+	/*
+	 * force all waiters to recheck, some may have been
+	 * added while the exclusive_lock was unavailable
+	 */
+	downgrade_write(&rdev->exclusive_lock);
+	wake_up_all(&rdev->fence_queue);
+	up_read(&rdev->exclusive_lock);
  	return r;
  }
  
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index bf4bfe65a050..ea7a65e564fe 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -39,6 +39,15 @@
  #include "radeon.h"
  #include "radeon_trace.h"
  
+static const struct fence_ops radeon_fence_ops;
+
+#define to_radeon_fence(p) \
+	({								\
+		struct radeon_fence *__f;				\
+		__f = container_of((p), struct radeon_fence, base);	\
+		__f->base.ops == &radeon_fence_ops ? __f : NULL;	\
+	})
+
  /*
   * Fences
   * Fences mark an event in the GPUs pipeline and are used
@@ -111,30 +120,55 @@ int radeon_fence_emit(struct radeon_device *rdev,
  		      struct radeon_fence **fence,
  		      int ring)
  {
+	u64 seq = ++rdev->fence_drv[ring].sync_seq[ring];
+
  	/* we are protected by the ring emission mutex */
  	*fence = kmalloc(sizeof(struct radeon_fence), GFP_KERNEL);
  	if ((*fence) == NULL) {
  		return -ENOMEM;
  	}
-	kref_init(&((*fence)->kref));
-	(*fence)->rdev = rdev;
-	(*fence)->seq = ++rdev->fence_drv[ring].sync_seq[ring];
  	(*fence)->ring = ring;
+	__fence_init(&(*fence)->base, &radeon_fence_ops,
+		     &rdev->fence_queue.lock, rdev->fence_context + ring, seq);
+	(*fence)->rdev = rdev;
+	(*fence)->seq = seq;
  	radeon_fence_ring_emit(rdev, ring, *fence);
  	trace_radeon_fence_emit(rdev->ddev, ring, (*fence)->seq);
  	return 0;
  }
  
  /**
- * radeon_fence_process - process a fence
+ * radeon_fence_check_signaled - callback from fence_queue
   *
- * @rdev: radeon_device pointer
- * @ring: ring index the fence is associated with
- *
- * Checks the current fence value and wakes the fence queue
- * if the sequence number has increased (all asics).
+ * this function is called with fence_queue lock held, which is also used
+ * for the fence locking itself, so unlocked variants are used for
+ * fence_signal, and remove_wait_queue.
   */
-void radeon_fence_process(struct radeon_device *rdev, int ring)
+static int radeon_fence_check_signaled(wait_queue_t *wait, unsigned mode, int flags, void *key)
+{
+	struct radeon_fence *fence;
+	u64 seq;
+
+	fence = container_of(wait, struct radeon_fence, fence_wake);
+
+	seq = atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq);
+	if (seq >= fence->seq) {
+		int ret = __fence_signal(&fence->base);
+
+		if (!ret)
+			FENCE_TRACE(&fence->base, "signaled from irq context\n");
+		else
+			FENCE_TRACE(&fence->base, "was already signaled\n");
+
+		radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
+		__remove_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
+		fence_put(&fence->base);
+	} else
+		FENCE_TRACE(&fence->base, "pending\n");
+	return 0;
+}
+
+static bool __radeon_fence_process(struct radeon_device *rdev, int ring)
  {
  	uint64_t seq, last_seq, last_emitted;
  	unsigned count_loop = 0;
@@ -190,23 +224,22 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
  		}
  	} while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
  
-	if (wake)
-		wake_up_all(&rdev->fence_queue);
+	return wake;
  }
  
  /**
- * radeon_fence_destroy - destroy a fence
+ * radeon_fence_process - process a fence
   *
- * @kref: fence kref
+ * @rdev: radeon_device pointer
+ * @ring: ring index the fence is associated with
   *
- * Frees the fence object (all asics).
+ * Checks the current fence value and wakes the fence queue
+ * if the sequence number has increased (all asics).
   */
-static void radeon_fence_destroy(struct kref *kref)
+void radeon_fence_process(struct radeon_device *rdev, int ring)
  {
-	struct radeon_fence *fence;
-
-	fence = container_of(kref, struct radeon_fence, kref);
-	kfree(fence);
+	if (__radeon_fence_process(rdev, ring))
+		wake_up_all(&rdev->fence_queue);
  }
  
  /**
@@ -237,6 +270,69 @@ static bool radeon_fence_seq_signaled(struct radeon_device *rdev,
  	return false;
  }
  
+static bool __radeon_fence_signaled(struct fence *f)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	struct radeon_device *rdev = fence->rdev;
+	unsigned ring = fence->ring;
+	u64 seq = fence->seq;
+
+	if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
+		return true;
+	}
+
+	if (down_read_trylock(&rdev->exclusive_lock)) {
+		radeon_fence_process(rdev, ring);
+		up_read(&rdev->exclusive_lock);
+
+		if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
+			return true;
+		}
+	}
+	return false;
+}
+
+/**
+ * radeon_fence_enable_signaling - enable signalling on fence
+ * @fence: fence
+ *
+ * This function is called with fence_queue lock held, and adds a callback
+ * to fence_queue that checks if this fence is signaled, and if so it
+ * signals the fence and removes itself.
+ */
+static bool radeon_fence_enable_signaling(struct fence *f)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	struct radeon_device *rdev = fence->rdev;
+
+	if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
+	    !rdev->ddev->irq_enabled)
+		return false;
+
+	radeon_irq_kms_sw_irq_get(rdev, fence->ring);
+
+	if (down_read_trylock(&rdev->exclusive_lock)) {
+		if (__radeon_fence_process(rdev, fence->ring))
+			wake_up_all_locked(&rdev->fence_queue);
+
+		up_read(&rdev->exclusive_lock);
+	}
+
+	/* did fence get signaled after we enabled the sw irq? */
+	if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
+		radeon_irq_kms_sw_irq_put(rdev, fence->ring);
+		return false;
+	}
+
+	fence->fence_wake.flags = 0;
+	fence->fence_wake.private = NULL;
+	fence->fence_wake.func = radeon_fence_check_signaled;
+	__add_wait_queue(&rdev->fence_queue, &fence->fence_wake);
+	fence_get(f);
+
+	return true;
+}
+
  /**
   * radeon_fence_signaled - check if a fence has signaled
   *
@@ -250,11 +346,13 @@ bool radeon_fence_signaled(struct radeon_fence *fence)
  	if (!fence) {
  		return true;
  	}
-	if (fence->seq == RADEON_FENCE_SIGNALED_SEQ) {
-		return true;
-	}
+
  	if (radeon_fence_seq_signaled(fence->rdev, fence->seq, fence->ring)) {
-		fence->seq = RADEON_FENCE_SIGNALED_SEQ;
+		int ret;
+
+		ret = fence_signal(&fence->base);
+		if (!ret)
+			FENCE_TRACE(&fence->base, "signaled from radeon_fence_signaled\n");
  		return true;
  	}
  	return false;
@@ -413,21 +511,18 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
  	uint64_t seq[RADEON_NUM_RINGS] = {};
  	long r;
  
-	if (fence == NULL) {
-		WARN(1, "Querying an invalid fence : %p !\n", fence);
-		return -EINVAL;
-	}
-
-	seq[fence->ring] = fence->seq;
-	if (seq[fence->ring] == RADEON_FENCE_SIGNALED_SEQ)
+	if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))
  		return 0;
  
+	seq[fence->ring] = fence->seq;
  	r = radeon_fence_wait_seq_timeout(fence->rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
  	if (r < 0) {
  		return r;
  	}
  
-	fence->seq = RADEON_FENCE_SIGNALED_SEQ;
+	r = fence_signal(&fence->base);
+	if (!r)
+		FENCE_TRACE(&fence->base, "signaled from fence_wait\n");
  	return 0;
  }
  
@@ -459,12 +554,13 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
  			continue;
  		}
  
+		if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fences[i]->base.flags)) {
+			/* already signaled */
+			return 0;
+		}
+
  		seq[i] = fences[i]->seq;
  		++num_rings;
-
-		/* test if something was allready signaled */
-		if (seq[i] == RADEON_FENCE_SIGNALED_SEQ)
-			return 0;
  	}
  
  	/* nothing to wait for ? */
@@ -545,7 +641,7 @@ int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
   */
  struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence)
  {
-	kref_get(&fence->kref);
+	fence_get(&fence->base);
  	return fence;
  }
  
@@ -561,9 +657,8 @@ void radeon_fence_unref(struct radeon_fence **fence)
  	struct radeon_fence *tmp = *fence;
  
  	*fence = NULL;
-	if (tmp) {
-		kref_put(&tmp->kref, radeon_fence_destroy);
-	}
+	if (tmp)
+		fence_put(&tmp->base);
  }
  
  /**
@@ -852,3 +947,51 @@ int radeon_debugfs_fence_init(struct radeon_device *rdev)
  	return 0;
  #endif
  }
+
+static long __radeon_fence_wait(struct fence *f, bool intr, long timeout)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	u64 target_seq[RADEON_NUM_RINGS] = {};
+	struct radeon_device *rdev = fence->rdev;
+	long r;
+
+	target_seq[fence->ring] = fence->seq;
+
+	down_read(&rdev->exclusive_lock);
+	r = radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr, timeout);
+
+	if (r > 0 && !fence_signal(&fence->base))
+		FENCE_TRACE(&fence->base, "signaled from __radeon_fence_wait\n");
+
+	up_read(&rdev->exclusive_lock);
+	return r;
+
+}
+
+static const char *radeon_fence_get_driver_name(struct fence *fence)
+{
+	return "radeon";
+}
+
+static const char *radeon_fence_get_timeline_name(struct fence *f)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	switch (fence->ring) {
+	case RADEON_RING_TYPE_GFX_INDEX: return "radeon.gfx";
+	case CAYMAN_RING_TYPE_CP1_INDEX: return "radeon.cp1";
+	case CAYMAN_RING_TYPE_CP2_INDEX: return "radeon.cp2";
+	case R600_RING_TYPE_DMA_INDEX: return "radeon.dma";
+	case CAYMAN_RING_TYPE_DMA1_INDEX: return "radeon.dma1";
+	case R600_RING_TYPE_UVD_INDEX: return "radeon.uvd";
+	default: WARN_ON_ONCE(1); return "radeon.unk";
+	}
+}
+
+static const struct fence_ops radeon_fence_ops = {
+	.get_driver_name = radeon_fence_get_driver_name,
+	.get_timeline_name = radeon_fence_get_timeline_name,
+	.enable_signaling = radeon_fence_enable_signaling,
+	.signaled = __radeon_fence_signaled,
+	.wait = __radeon_fence_wait,
+	.release = NULL,
+};
-- 
1.9.3
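This patch turns to_radeon_fence() from a plain pointer cast into a container_of() on the embedded base fence, plus a check that base.ops is &radeon_fence_ops. For a fence created by some other driver it therefore returns NULL instead of a bogus radeon_fence pointer.

Below is a user-space sketch of the same downcast. It assumes a minimal offsetof()-based container_of without the kernel's type checking. The *_model types and radeon_ops/other_ops are hypothetical, and a plain function replaces the GNU statement expression used in the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal container_of: the kernel's version also type-checks ptr. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct fence_ops_model {
	const char *driver_name;
};

struct fence_model {            /* plays the role of struct fence */
	const struct fence_ops_model *ops;
};

struct radeon_fence_model {     /* plays the role of struct radeon_fence */
	struct fence_model base;
	unsigned ring;
};

static const struct fence_ops_model radeon_ops = { "radeon" };
static const struct fence_ops_model other_ops = { "other" };

/* Downcast only when the ops table shows that this driver created f. */
static struct radeon_fence_model *to_radeon_fence_model(struct fence_model *f)
{
	if (f->ops != &radeon_ops)
		return NULL;
	return container_of(f, struct radeon_fence_model, base);
}
```

Here base is the first member, so the offset is 0. container_of() would still work if the embedded struct fence were ever moved within struct radeon_fence.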


* Re: [RFC PATCH v1.3 08/16 1/2] drm/radeon: add timeout argument to radeon_fence_wait_seq
@ 2014-06-02 13:27                                             ` Christian König
  0 siblings, 0 replies; 50+ messages in thread
From: Christian König @ 2014-06-02 13:27 UTC (permalink / raw)
  To: Maarten Lankhorst, airlied; +Cc: nouveau, linux-kernel, dri-devel

On 02.06.2014 15:14, Maarten Lankhorst wrote:
> This makes it possible to wait for a specific amount of time,
> rather than wait until infinity.
>
> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> ---
>  Splitted out version, I've noticed that I forgot to convert 
> radeon_fence_wait_empty to long r, fixed.
>  drivers/gpu/drm/radeon/radeon_fence.c | 60 +++++++++++++++++++++++------------
>  1 file changed, 40 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
> index a77b1c13ea43..bf4bfe65a050 100644
> --- a/drivers/gpu/drm/radeon/radeon_fence.c
> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
> @@ -283,28 +283,35 @@ static bool radeon_fence_any_seq_signaled(struct radeon_device *rdev, u64 *seq)
>  }
>
>  /**
> - * radeon_fence_wait_seq - wait for a specific sequence numbers
> + * radeon_fence_wait_seq_timeout - wait for a specific sequence numbers

Not necessarily a hard requirement, but I would like to keep the old name,
since it's already long enough.

>   *
>   * @rdev: radeon device pointer
>   * @target_seq: sequence number(s) we want to wait for
>   * @intr: use interruptable sleep
> + * @timeout: maximum time to wait, or MAX_SCHEDULE_TIMEOUT for infinite wait
>   *
>   * Wait for the requested sequence number(s) to be written by any ring
>   * (all asics).  Sequnce number array is indexed by ring id.
>   * @intr selects whether to use interruptable (true) or non-interruptable
>   * (false) sleep when waiting for the sequence number.  Helper function
>   * for radeon_fence_wait_*().
> - * Returns 0 if the sequence number has passed, error for all other cases.
> + * Returns remaining time if the sequence number has passed, 0 when
> + * the wait timeout, or an error for all other cases.
>   * -EDEADLK is returned when a GPU lockup has been detected.
>   */
> -static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
> -                 bool intr)
> +static int radeon_fence_wait_seq_timeout(struct radeon_device *rdev,
> +                     u64 *target_seq, bool intr,
> +                     long timeout)
>  {
>      uint64_t last_seq[RADEON_NUM_RINGS];
>      bool signaled;
> -    int i, r;
> +    int i;
>
>      while (!radeon_fence_any_seq_signaled(rdev, target_seq)) {
> +        long r, waited = timeout;

The initialization seems to be unnecessary here.
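A related type detail in the quoted prototype concerns the return value. On success the helper returns the remaining timeout. For callers that pass MAX_SCHEDULE_TIMEOUT (LONG_MAX in the kernel), that value stays close to LONG_MAX, yet the prototype declares an int return type. On 64-bit builds the long-to-int conversion is implementation-defined. GCC and Clang keep the low 32 bits, so such a value comes back negative, and the callers' r < 0 checks would treat a successful wait as an error.

A hypothetical user-space illustration follows (both helper names are made up):

```c
#include <assert.h>
#include <limits.h>

/* User-space stand-in for the kernel's MAX_SCHEDULE_TIMEOUT (LONG_MAX). */
#define MAX_SCHEDULE_TIMEOUT_MODEL LONG_MAX

/* Hypothetical helper with the int return type from the quoted prototype:
 * the long remaining time is implicitly converted to int on return. */
static int remaining_as_int(long remaining)
{
	return remaining;
}

/* The same helper returning long: the value survives unchanged. */
static long remaining_as_long(long remaining)
{
	return remaining;
}
```

Declaring the helper as returning long avoids the truncation. On 32-bit kernels long and int have the same width, so the problem would only show up on 64-bit builds.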

> +
> +        waited = timeout < RADEON_FENCE_JIFFIES_TIMEOUT ?
> +             timeout : RADEON_FENCE_JIFFIES_TIMEOUT;
>
>          /* Save current sequence values, used to check for GPU lockups */
>          for (i = 0; i < RADEON_NUM_RINGS; ++i) {
> @@ -319,13 +326,15 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
>          if (intr) {
>              r = wait_event_interruptible_timeout(rdev->fence_queue, (
>                  (signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
> -                 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
> +                 || rdev->needs_reset), waited);
>          } else {
>              r = wait_event_timeout(rdev->fence_queue, (
>                  (signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
> -                 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
> +                 || rdev->needs_reset), waited);
>          }
>
> +        timeout -= waited - r;
> +
>          for (i = 0; i < RADEON_NUM_RINGS; ++i) {
>              if (!target_seq[i])
>                  continue;
> @@ -337,6 +346,12 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
>          if (unlikely(r < 0))
>              return r;
>
> +        /*
> +         * If this is a timed wait and the wait completely timed out just return.
> +         */

Please move the "timeout -= waited..." here, after the
"if (unlikely(r < 0))". It doesn't really matter for the logic, but my
gut feeling is that we should check for errors first and then do the
calculation.
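Below is a user-space sketch of the sliced wait with the ordering suggested here. Each iteration computes the slice and waits. On an error it bails out; only after that does it charge the elapsed time against the budget, and it stops once a timed wait has used all of it.

The sketch leaves out the GPU lockup detection and needs_reset handling of the real loop. struct fake_ring, fake_wait_slice() and FENCE_SLICE_JIFFIES (standing in for RADEON_FENCE_JIFFIES_TIMEOUT) are hypothetical. fake_wait_slice() follows the wait_event_timeout() convention: it returns 0 when the slice elapsed, and the remaining jiffies (at least 1) when the condition became true.

```c
#include <assert.h>
#include <errno.h>

#define FENCE_SLICE_JIFFIES 10L /* stand-in for RADEON_FENCE_JIFFIES_TIMEOUT */

/* Hypothetical fake ring: the sequence number lands after 'until_signal'
 * jiffies; a non-zero 'error' makes the next wait fail with -error. */
struct fake_ring {
	long until_signal;
	int error;
};

/* Waits at most 'slice' jiffies. Like wait_event_timeout(): returns 0 if
 * the slice elapsed, otherwise the jiffies left (at least 1). */
static long fake_wait_slice(struct fake_ring *ring, long slice)
{
	if (ring->error)
		return -ring->error;
	if (ring->until_signal > slice) {
		ring->until_signal -= slice;
		return 0;
	}
	slice -= ring->until_signal;
	ring->until_signal = 0;
	return slice > 0 ? slice : 1;
}

/* Returns the remaining budget, 0 on timeout, or a negative error. */
static long wait_seq_timeout(struct fake_ring *ring, long timeout)
{
	while (ring->until_signal > 0) {
		long r, waited;

		/* Assigned here every iteration, so no initializer is needed. */
		waited = timeout < FENCE_SLICE_JIFFIES ? timeout : FENCE_SLICE_JIFFIES;

		r = fake_wait_slice(ring, waited);

		/* Check for errors first... */
		if (r < 0)
			return r;

		/* ...then charge the time actually spent in this slice. */
		timeout -= waited - r;

		/* A timed wait that used up its whole budget just returns. */
		if (!timeout)
			break;
	}
	return timeout;
}
```

With the error check first, a failed wait never touches timeout, so the subtraction only ever sees a non-negative r.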

Apart from that, the patch has my rb (Reviewed-by) with those two minor things fixed.

Regards,
Christian.

> +        if (!timeout)
> +            break;
> +
>          if (unlikely(!signaled)) {
>              if (rdev->needs_reset)
>                  return -EDEADLK;
> @@ -379,14 +394,14 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
>              }
>          }
>      }
> -    return 0;
> +    return timeout;
>  }
>
>  /**
>   * radeon_fence_wait - wait for a fence to signal
>   *
>   * @fence: radeon fence object
> - * @intr: use interruptable sleep
> + * @intr: use interruptible sleep
>   *
>   * Wait for the requested fence to signal (all asics).
>   * @intr selects whether to use interruptable (true) or non-interruptable
> @@ -396,7 +411,7 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
>  int radeon_fence_wait(struct radeon_fence *fence, bool intr)
>  {
>      uint64_t seq[RADEON_NUM_RINGS] = {};
> -    int r;
> +    long r;
>
>      if (fence == NULL) {
>          WARN(1, "Querying an invalid fence : %p !\n", fence);
> @@ -407,9 +422,10 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
>      if (seq[fence->ring] == RADEON_FENCE_SIGNALED_SEQ)
>          return 0;
>
> -    r = radeon_fence_wait_seq(fence->rdev, seq, intr);
> -    if (r)
> +    r = radeon_fence_wait_seq_timeout(fence->rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
> +    if (r < 0) {
>          return r;
> +    }
>
>      fence->seq = RADEON_FENCE_SIGNALED_SEQ;
>      return 0;
> @@ -434,7 +450,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
>  {
>      uint64_t seq[RADEON_NUM_RINGS];
>      unsigned i, num_rings = 0;
> -    int r;
> +    long r;
>
>      for (i = 0; i < RADEON_NUM_RINGS; ++i) {
>          seq[i] = 0;
> @@ -455,8 +471,8 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
>      if (num_rings == 0)
>          return -ENOENT;
>
> -    r = radeon_fence_wait_seq(rdev, seq, intr);
> -    if (r) {
> +    r = radeon_fence_wait_seq_timeout(rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
> +    if (r < 0) {
>          return r;
>      }
>      return 0;
> @@ -475,6 +491,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
>  int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
>  {
>      uint64_t seq[RADEON_NUM_RINGS] = {};
> +    long r;
>
>      seq[ring] = atomic64_read(&rdev->fence_drv[ring].last_seq) + 1ULL;
>      if (seq[ring] >= rdev->fence_drv[ring].sync_seq[ring]) {
> @@ -482,7 +499,10 @@ int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
>             already the last emited fence */
>          return -ENOENT;
>      }
> -    return radeon_fence_wait_seq(rdev, seq, false);
> +    r = radeon_fence_wait_seq_timeout(rdev, seq, false, MAX_SCHEDULE_TIMEOUT);
> +    if (r < 0)
> +        return r;
> +    return 0;
>  }
>
>  /**
> @@ -498,18 +518,18 @@ int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
>  int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
>  {
>      uint64_t seq[RADEON_NUM_RINGS] = {};
> -    int r;
> +    long r;
>
>      seq[ring] = rdev->fence_drv[ring].sync_seq[ring];
>      if (!seq[ring])
>          return 0;
>
> -    r = radeon_fence_wait_seq(rdev, seq, false);
> -    if (r) {
> +    r = radeon_fence_wait_seq_timeout(rdev, seq, false, MAX_SCHEDULE_TIMEOUT);
> +    if (r < 0) {
>          if (r == -EDEADLK)
>              return -EDEADLK;
>
> -        dev_err(rdev->dev, "error waiting for ring[%d] to become idle (%d)\n",
> +        dev_err(rdev->dev, "error waiting for ring[%d] to become idle (%ld)\n",
>              ring, r);
>      }
>      return 0;



* Re: [RFC PATCH v1.3 08/16 1/2] drm/radeon: add timeout argument to radeon_fence_wait_seq
@ 2014-06-02 13:27                                             ` Christian König
  0 siblings, 0 replies; 50+ messages in thread
From: Christian König @ 2014-06-02 13:27 UTC (permalink / raw)
  To: Maarten Lankhorst, airlied-cv59FeDIM0c
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 02.06.2014 15:14, Maarten Lankhorst wrote:
> This makes it possible to wait for a specific amount of time,
> rather than wait until infinity.
>
> Signed-off-by: Maarten Lankhorst <maarten.lankhorst-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> ---
>  Splitted out version, I've noticed that I forgot to convert 
> radeon_fence_wait_empty to long r, fixed.
>  drivers/gpu/drm/radeon/radeon_fence.c | 60 +++++++++++++++++++++++------------
>  1 file changed, 40 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
> index a77b1c13ea43..bf4bfe65a050 100644
> --- a/drivers/gpu/drm/radeon/radeon_fence.c
> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
> @@ -283,28 +283,35 @@ static bool radeon_fence_any_seq_signaled(struct radeon_device *rdev, u64 *seq)
>  }
>
>  /**
> - * radeon_fence_wait_seq - wait for a specific sequence numbers
> + * radeon_fence_wait_seq_timeout - wait for a specific sequence numbers

Not necessarily a hard requirement, but I would like to keep the old name,
since it's already long enough.

>   *
>   * @rdev: radeon device pointer
>   * @target_seq: sequence number(s) we want to wait for
>   * @intr: use interruptable sleep
> + * @timeout: maximum time to wait, or MAX_SCHEDULE_TIMEOUT for infinite wait
>   *
>   * Wait for the requested sequence number(s) to be written by any ring
>   * (all asics).  Sequnce number array is indexed by ring id.
>   * @intr selects whether to use interruptable (true) or non-interruptable
>   * (false) sleep when waiting for the sequence number.  Helper function
>   * for radeon_fence_wait_*().
> - * Returns 0 if the sequence number has passed, error for all other cases.
> + * Returns remaining time if the sequence number has passed, 0 when
> + * the wait timeout, or an error for all other cases.
>   * -EDEADLK is returned when a GPU lockup has been detected.
>   */
> -static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
> -                 bool intr)
> +static int radeon_fence_wait_seq_timeout(struct radeon_device *rdev,
> +                     u64 *target_seq, bool intr,
> +                     long timeout)
>  {
>      uint64_t last_seq[RADEON_NUM_RINGS];
>      bool signaled;
> -    int i, r;
> +    int i;
>
>      while (!radeon_fence_any_seq_signaled(rdev, target_seq)) {
> +        long r, waited = timeout;

The initialization seems to be unnecessary here.

> +
> +        waited = timeout < RADEON_FENCE_JIFFIES_TIMEOUT ?
> +             timeout : RADEON_FENCE_JIFFIES_TIMEOUT;
>
>          /* Save current sequence values, used to check for GPU lockups */
>          for (i = 0; i < RADEON_NUM_RINGS; ++i) {
> @@ -319,13 +326,15 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
>          if (intr) {
>              r = wait_event_interruptible_timeout(rdev->fence_queue, (
>                  (signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
> -                 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
> +                 || rdev->needs_reset), waited);
>          } else {
>              r = wait_event_timeout(rdev->fence_queue, (
>                  (signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
> -                 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
> +                 || rdev->needs_reset), waited);
>          }
>
> +        timeout -= waited - r;
> +
>          for (i = 0; i < RADEON_NUM_RINGS; ++i) {
>              if (!target_seq[i])
>                  continue;
> @@ -337,6 +346,12 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
>          if (unlikely(r < 0))
>              return r;
>
> +        /*
> +         * If this is a timed wait and the wait completely timed out just return.
> +         */

Please move the "timeout -= waited..." here, after the
"if (unlikely(r < 0))". It doesn't really matter for the logic, but my
gut feeling is that we should check for errors first and then do the
calculation.

Apart from that, the patch has my rb (Reviewed-by) with those two minor things fixed.

Regards,
Christian.

> +        if (!timeout)
> +            break;
> +
>          if (unlikely(!signaled)) {
>              if (rdev->needs_reset)
>                  return -EDEADLK;
> @@ -379,14 +394,14 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
>              }
>          }
>      }
> -    return 0;
> +    return timeout;
>  }
>
>  /**
>   * radeon_fence_wait - wait for a fence to signal
>   *
>   * @fence: radeon fence object
> - * @intr: use interruptable sleep
> + * @intr: use interruptible sleep
>   *
>   * Wait for the requested fence to signal (all asics).
>   * @intr selects whether to use interruptable (true) or 
> non-interruptable
> @@ -396,7 +411,7 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
>  int radeon_fence_wait(struct radeon_fence *fence, bool intr)
>  {
>      uint64_t seq[RADEON_NUM_RINGS] = {};
> -    int r;
> +    long r;
>
>      if (fence == NULL) {
>          WARN(1, "Querying an invalid fence : %p !\n", fence);
> @@ -407,9 +422,10 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
>      if (seq[fence->ring] == RADEON_FENCE_SIGNALED_SEQ)
>          return 0;
>
> -    r = radeon_fence_wait_seq(fence->rdev, seq, intr);
> -    if (r)
> +    r = radeon_fence_wait_seq_timeout(fence->rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
> +    if (r < 0) {
>          return r;
> +    }
>
>      fence->seq = RADEON_FENCE_SIGNALED_SEQ;
>      return 0;
> @@ -434,7 +450,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
>  {
>      uint64_t seq[RADEON_NUM_RINGS];
>      unsigned i, num_rings = 0;
> -    int r;
> +    long r;
>
>      for (i = 0; i < RADEON_NUM_RINGS; ++i) {
>          seq[i] = 0;
> @@ -455,8 +471,8 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
>      if (num_rings == 0)
>          return -ENOENT;
>
> -    r = radeon_fence_wait_seq(rdev, seq, intr);
> -    if (r) {
> +    r = radeon_fence_wait_seq_timeout(rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
> +    if (r < 0) {
>          return r;
>      }
>      return 0;
> @@ -475,6 +491,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
>  int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
>  {
>      uint64_t seq[RADEON_NUM_RINGS] = {};
> +    long r;
>
>      seq[ring] = atomic64_read(&rdev->fence_drv[ring].last_seq) + 1ULL;
>      if (seq[ring] >= rdev->fence_drv[ring].sync_seq[ring]) {
> @@ -482,7 +499,10 @@ int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
>             already the last emited fence */
>          return -ENOENT;
>      }
> -    return radeon_fence_wait_seq(rdev, seq, false);
> +    r = radeon_fence_wait_seq_timeout(rdev, seq, false, MAX_SCHEDULE_TIMEOUT);
> +    if (r < 0)
> +        return r;
> +    return 0;
>  }
>
>  /**
> @@ -498,18 +518,18 @@ int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
>  int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
>  {
>      uint64_t seq[RADEON_NUM_RINGS] = {};
> -    int r;
> +    long r;
>
>      seq[ring] = rdev->fence_drv[ring].sync_seq[ring];
>      if (!seq[ring])
>          return 0;
>
> -    r = radeon_fence_wait_seq(rdev, seq, false);
> -    if (r) {
> +    r = radeon_fence_wait_seq_timeout(rdev, seq, false, MAX_SCHEDULE_TIMEOUT);
> +    if (r < 0) {
>          if (r == -EDEADLK)
>              return -EDEADLK;
>
> -        dev_err(rdev->dev, "error waiting for ring[%d] to become idle (%d)\n",
> +        dev_err(rdev->dev, "error waiting for ring[%d] to become idle (%ld)\n",
>              ring, r);
>      }
>      return 0;
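The review point above concerns the order of operations inside the sliced wait loop. As a rough userspace sketch (not driver code), the bookkeeping can be modelled as follows. It assumes a hypothetical simulated fence, struct sim, and a hypothetical fake_wait() that mimics the wait_event_timeout() convention (0 when the slice expires with the fence unsignaled, otherwise the unused jiffies, at least 1). SLICE stands in for RADEON_FENCE_JIFFIES_TIMEOUT, and GPU lockup detection and needs_reset handling are left out. Errors are checked before the elapsed time is charged, as the review asks.

```c
#include <assert.h>

/* Stand-in for RADEON_FENCE_JIFFIES_TIMEOUT (the value is arbitrary here). */
#define SLICE 50L

/* Hypothetical simulated fence: it signals once `now` reaches `signal_at`. */
struct sim {
	long now;
	long signal_at;
};

/*
 * Hypothetical stand-in for wait_event_timeout(): sleeps for at most
 * `budget` jiffies and returns 0 if the fence is still unsignaled,
 * otherwise the unused part of the budget (at least 1).
 */
static long fake_wait(struct sim *s, long budget)
{
	long left = s->signal_at - s->now;

	if (left <= 0)
		return budget > 0 ? budget : 1;
	if (left > budget) {
		s->now += budget;
		return 0;
	}
	s->now += left;
	return budget - left > 0 ? budget - left : 1;
}

/* Returns the unused part of `timeout`; 0 means the budget ran out. */
static long wait_seq_timeout(struct sim *s, long timeout)
{
	assert(timeout >= 0);

	while (s->now < s->signal_at) {
		long waited = timeout < SLICE ? timeout : SLICE;
		long r = fake_wait(s, waited);

		/* A real wait can fail (e.g. -ERESTARTSYS); check that first. */
		if (r < 0)
			return r;

		/* Charge only the time that was actually slept. */
		timeout -= waited - r;

		/* The whole budget is used up: report a timeout. */
		if (!timeout)
			break;
	}
	return timeout;
}
```

For example, a fence that signals after 120 jiffies, waited on with a budget of 200, returns 80; a fence that does not signal within the budget returns 0 after four slices.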


* [RFC PATCH v1.4 08/16 1/2] drm/radeon: add timeout argument to radeon_fence_wait_seq
  2014-06-02 13:27                                             ` Christian König
@ 2014-06-03  7:50                                             ` Maarten Lankhorst
From: Maarten Lankhorst @ 2014-06-03  7:50 UTC (permalink / raw)
  To: Christian König, airlied; +Cc: nouveau, linux-kernel, dri-devel

This makes it possible to wait for a specific amount of time,
rather than wait until infinity.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Reviewed-by: Christian König <deathsimple@vodafone.de>
---
  drivers/gpu/drm/radeon/radeon_fence.c | 60 +++++++++++++++++++++++------------
  1 file changed, 40 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index a77b1c13ea43..b25b14231421 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -283,28 +283,35 @@ static bool radeon_fence_any_seq_signaled(struct radeon_device *rdev, u64 *seq)
  }
  
  /**
- * radeon_fence_wait_seq - wait for a specific sequence numbers
+ * radeon_fence_wait_seq_timeout - wait for specific sequence numbers
   *
   * @rdev: radeon device pointer
   * @target_seq: sequence number(s) we want to wait for
   * @intr: use interruptable sleep
+ * @timeout: maximum time to wait, or MAX_SCHEDULE_TIMEOUT for infinite wait
   *
   * Wait for the requested sequence number(s) to be written by any ring
   * (all asics).  Sequnce number array is indexed by ring id.
   * @intr selects whether to use interruptable (true) or non-interruptable
   * (false) sleep when waiting for the sequence number.  Helper function
   * for radeon_fence_wait_*().
- * Returns 0 if the sequence number has passed, error for all other cases.
+ * Returns the remaining time if the sequence number has passed, 0 when
+ * the wait timed out, or an error for all other cases.
   * -EDEADLK is returned when a GPU lockup has been detected.
   */
-static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
-				 bool intr)
+static long radeon_fence_wait_seq_timeout(struct radeon_device *rdev,
+					  u64 *target_seq, bool intr,
+					  long timeout)
  {
  	uint64_t last_seq[RADEON_NUM_RINGS];
  	bool signaled;
-	int i, r;
+	int i;
  
  	while (!radeon_fence_any_seq_signaled(rdev, target_seq)) {
+		long r, waited;
+
+		waited = timeout < RADEON_FENCE_JIFFIES_TIMEOUT ?
+			 timeout : RADEON_FENCE_JIFFIES_TIMEOUT;
  
  		/* Save current sequence values, used to check for GPU lockups */
  		for (i = 0; i < RADEON_NUM_RINGS; ++i) {
@@ -319,11 +326,11 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
  		if (intr) {
  			r = wait_event_interruptible_timeout(rdev->fence_queue, (
  				(signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
-				 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
+				 || rdev->needs_reset), waited);
  		} else {
  			r = wait_event_timeout(rdev->fence_queue, (
  				(signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
-				 || rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
+				 || rdev->needs_reset), waited);
  		}
  
  		for (i = 0; i < RADEON_NUM_RINGS; ++i) {
@@ -337,6 +344,14 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
  		if (unlikely(r < 0))
  			return r;
  
+		timeout -= waited - r;
+
+		/*
+		 * If this is a timed wait and the wait completely timed out just return.
+		 */
+		if (!timeout)
+			break;
+
  		if (unlikely(!signaled)) {
  			if (rdev->needs_reset)
  				return -EDEADLK;
@@ -379,14 +394,14 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
  			}
  		}
  	}
-	return 0;
+	return timeout;
  }
  
  /**
   * radeon_fence_wait - wait for a fence to signal
   *
   * @fence: radeon fence object
- * @intr: use interruptable sleep
+ * @intr: use interruptible sleep
   *
   * Wait for the requested fence to signal (all asics).
   * @intr selects whether to use interruptable (true) or non-interruptable
@@ -396,7 +411,7 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
  int radeon_fence_wait(struct radeon_fence *fence, bool intr)
  {
  	uint64_t seq[RADEON_NUM_RINGS] = {};
-	int r;
+	long r;
  
  	if (fence == NULL) {
  		WARN(1, "Querying an invalid fence : %p !\n", fence);
@@ -407,9 +422,10 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
  	if (seq[fence->ring] == RADEON_FENCE_SIGNALED_SEQ)
  		return 0;
  
-	r = radeon_fence_wait_seq(fence->rdev, seq, intr);
-	if (r)
+	r = radeon_fence_wait_seq_timeout(fence->rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
+	if (r < 0) {
  		return r;
+	}
  
  	fence->seq = RADEON_FENCE_SIGNALED_SEQ;
  	return 0;
@@ -434,7 +450,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
  {
  	uint64_t seq[RADEON_NUM_RINGS];
  	unsigned i, num_rings = 0;
-	int r;
+	long r;
  
  	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
  		seq[i] = 0;
@@ -455,8 +471,8 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
  	if (num_rings == 0)
  		return -ENOENT;
  
-	r = radeon_fence_wait_seq(rdev, seq, intr);
-	if (r) {
+	r = radeon_fence_wait_seq_timeout(rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
+	if (r < 0) {
  		return r;
  	}
  	return 0;
@@ -475,6 +491,7 @@ int radeon_fence_wait_any(struct radeon_device *rdev,
  int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
  {
  	uint64_t seq[RADEON_NUM_RINGS] = {};
+	long r;
  
  	seq[ring] = atomic64_read(&rdev->fence_drv[ring].last_seq) + 1ULL;
  	if (seq[ring] >= rdev->fence_drv[ring].sync_seq[ring]) {
@@ -482,7 +499,10 @@ int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
  		   already the last emited fence */
  		return -ENOENT;
  	}
-	return radeon_fence_wait_seq(rdev, seq, false);
+	r = radeon_fence_wait_seq_timeout(rdev, seq, false, MAX_SCHEDULE_TIMEOUT);
+	if (r < 0)
+		return r;
+	return 0;
  }
  
  /**
@@ -498,18 +518,18 @@ int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
  int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
  {
  	uint64_t seq[RADEON_NUM_RINGS] = {};
-	int r;
+	long r;
  
  	seq[ring] = rdev->fence_drv[ring].sync_seq[ring];
  	if (!seq[ring])
  		return 0;
  
-	r = radeon_fence_wait_seq(rdev, seq, false);
-	if (r) {
+	r = radeon_fence_wait_seq_timeout(rdev, seq, false, MAX_SCHEDULE_TIMEOUT);
+	if (r < 0) {
  		if (r == -EDEADLK)
  			return -EDEADLK;
  
-		dev_err(rdev->dev, "error waiting for ring[%d] to become idle (%d)\n",
+		dev_err(rdev->dev, "error waiting for ring[%d] to become idle (%ld)\n",
  			ring, r);
  	}
  	return 0;
-- 
1.9.3
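The v1.4 helper reports one of three outcomes in a single value: the remaining jiffies (> 0) once the sequence has passed, 0 when the wait timed out, or a negative errno. The callers only need to test r < 0 because they pass MAX_SCHEDULE_TIMEOUT, which the kernel defines as LONG_MAX. A hedged userspace sketch of that convention follows; classify(), narrow() and the enum are hypothetical names used only for illustration. It also shows why the value has to stay a long end to end: converting LONG_MAX to int is implementation-defined in C, and GCC (which reduces modulo 2^32) turns it into -1 on 64-bit builds, which a caller would misread as an error.

```c
#include <assert.h>
#include <limits.h>

/* The kernel defines MAX_SCHEDULE_TIMEOUT as LONG_MAX. */
#define SKETCH_MAX_SCHEDULE_TIMEOUT LONG_MAX

/* Hypothetical labels for the three outcomes of a timed fence wait. */
enum wait_result { WAIT_SIGNALED, WAIT_TIMED_OUT, WAIT_ERROR };

/* Interpret remaining jiffies / 0 / -errno, as documented in the patch. */
static enum wait_result classify(long r)
{
	if (r < 0)
		return WAIT_ERROR;
	if (r == 0)
		return WAIT_TIMED_OUT;
	return WAIT_SIGNALED;
}

/*
 * What an int return type would do to the value. The conversion is
 * implementation-defined; GCC and Clang keep the low 32 bits, so on
 * LP64 targets LONG_MAX becomes -1.
 */
static int narrow(long r)
{
	return (int)r;
}
```

With the value kept as a long, radeon_fence_wait() sees a large positive result on success and goes on to mark the fence as signaled.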





Thread overview: 50+ messages
2014-05-14 14:57 [RFC PATCH v1 00/16] Convert all ttm drivers to use the new reservation interface Maarten Lankhorst
2014-05-14 14:57 ` [RFC PATCH v1 01/16] drm/ttm: add interruptible parameter to ttm_eu_reserve_buffers Maarten Lankhorst
2014-05-14 14:57 ` [RFC PATCH v1 02/16] drm/ttm: kill off some members to ttm_validate_buffer Maarten Lankhorst
2014-05-14 14:57 ` [RFC PATCH v1 03/16] drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep Maarten Lankhorst
2014-05-14 14:57 ` [RFC PATCH v1 04/16] drm/nouveau: require reservations for nouveau_fence_sync and nouveau_bo_fence Maarten Lankhorst
2014-05-14 14:57 ` [RFC PATCH v1 05/16] drm/ttm: call ttm_bo_wait while inside a reservation Maarten Lankhorst
2014-05-14 14:57 ` [RFC PATCH v1 06/16] drm/ttm: kill fence_lock Maarten Lankhorst
2014-05-14 14:58 ` [RFC PATCH v1 07/16] drm/nouveau: rework to new fence interface Maarten Lankhorst
2014-05-14 14:58 ` [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences Maarten Lankhorst
2014-05-14 15:29   ` Christian König
2014-05-15  1:06     ` Maarten Lankhorst
2014-05-15  1:06       ` Maarten Lankhorst
2014-05-15  9:21       ` Christian König
2014-05-15  9:21         ` Christian König
2014-05-15  9:38         ` Maarten Lankhorst
2014-05-15  9:42           ` Christian König
2014-05-15  9:42             ` Christian König
2014-05-15 13:04             ` Maarten Lankhorst
2014-05-15 13:19               ` Christian König
2014-05-15 13:19                 ` Christian König
2014-05-15 14:18                 ` Maarten Lankhorst
2014-05-15 14:18                   ` Maarten Lankhorst
2014-05-15 15:48                   ` Christian König
2014-05-15 15:48                     ` Christian König
2014-05-15 15:58                     ` Maarten Lankhorst
2014-05-15 16:13                       ` Christian König
2014-05-19  8:00                         ` Maarten Lankhorst
2014-05-19  8:27                           ` Christian König
2014-05-19 10:10                             ` Maarten Lankhorst
2014-05-19 12:30                               ` Christian König
2014-05-19 13:35                                 ` Maarten Lankhorst
2014-05-19 13:35                                   ` Maarten Lankhorst
2014-05-19 14:25                                   ` Christian König
2014-06-02 10:09                                     ` [RFC PATCH v1.2 " Maarten Lankhorst
2014-06-02 10:09                                       ` Maarten Lankhorst
2014-06-02 10:45                                       ` Christian König
2014-06-02 13:14                                         ` [RFC PATCH v1.3 08/16 1/2] drm/radeon: add timeout argument to radeon_fence_wait_seq Maarten Lankhorst
2014-06-02 13:27                                           ` Christian König
2014-06-02 13:27                                             ` Christian König
2014-06-03  7:50                                             ` [RFC PATCH v1.4 " Maarten Lankhorst
2014-06-02 13:16                                         ` [RFC PATCH v1.3 08/16 2/2] drm/radeon: use common fence implementation for fences Maarten Lankhorst
2014-06-02 13:16                                           ` Maarten Lankhorst
2014-05-14 14:58 ` [RFC PATCH v1 09/16] drm/qxl: rework to new fence interface Maarten Lankhorst
2014-05-14 14:58 ` [RFC PATCH v1 10/16] drm/vmwgfx: get rid of different types of fence_flags entirely Maarten Lankhorst
2014-05-14 14:58 ` [RFC PATCH v1 11/16] drm/vmwgfx: rework to new fence interface Maarten Lankhorst
2014-05-14 14:58 ` [RFC PATCH v1 12/16] drm/ttm: flip the switch, and convert to dma_fence Maarten Lankhorst
2014-05-14 14:58 ` [RFC PATCH v1 13/16] drm/nouveau: use rcu in nouveau_gem_ioctl_cpu_prep Maarten Lankhorst
2014-05-14 14:58 ` [RFC PATCH v1 14/16] drm/radeon: use rcu waits in some ioctls Maarten Lankhorst
2014-05-14 14:58 ` [RFC PATCH v1 15/16] drm/vmwgfx: use rcu in vmw_user_dmabuf_synccpu_grab Maarten Lankhorst
2014-05-14 14:58 ` [RFC PATCH v1 16/16] drm/ttm: use rcu in core ttm Maarten Lankhorst
