* Put RCU request lookup to use
@ 2016-08-01 18:22 Chris Wilson
  2016-08-01 18:22 ` [PATCH 01/16] drm/i915: Introduce i915_gem_active_wait_unlocked() Chris Wilson
                   ` (16 more replies)
  0 siblings, 17 replies; 61+ messages in thread
From: Chris Wilson @ 2016-08-01 18:22 UTC (permalink / raw)
  To: intel-gfx

Having introduced a means for performing lockless request lookup, put it
to use and implement lockless waiting and friends. The goal is to avoid
holding onto the struct_mutex, and to avoid it entirely where possible,
as execbuf likes to hog it.
-Chris

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH 01/16] drm/i915: Introduce i915_gem_active_wait_unlocked()
  2016-08-01 18:22 Put RCU request lookup to use Chris Wilson
@ 2016-08-01 18:22 ` Chris Wilson
  2016-08-03 11:41   ` Joonas Lahtinen
  2016-08-01 18:22 ` [PATCH 02/16] drm/i915: Convert non-blocking waits for requests over to using RCU Chris Wilson
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-01 18:22 UTC (permalink / raw)
  To: intel-gfx

It is useful to be able to wait on pending rendering without grabbing
the struct_mutex. We can do this by using the i915_gem_active_get_rcu()
primitive to acquire a reference to the pending request without
requiring struct_mutex, just the RCU read lock, and then call
i915_wait_request().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_request.h | 36 +++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 48382ac401fd..d077b023a89f 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -546,6 +546,42 @@ i915_gem_active_wait(const struct i915_gem_active *active, struct mutex *mutex)
 }
 
 /**
+ * i915_gem_active_wait_unlocked - waits until the request is completed
+ * @active - the active request on which to wait
+ * @interruptible - whether the wait can be woken by a userspace signal
+ * @timeout - how long to wait at most
+ * @rps - userspace client to charge for a waitboost
+ *
+ * i915_gem_active_wait_unlocked() waits until the request is completed before
+ * returning, without requiring any locks to be held. Note that it does not
+ * retire any requests before returning.
+ *
+ * This function wraps i915_wait_request(), see it for the full details.
+ *
+ * Returns 0 if successful, or a negative error code.
+ */
+static inline int
+i915_gem_active_wait_unlocked(const struct i915_gem_active *active,
+			      bool interruptible,
+			      s64 *timeout,
+			      struct intel_rps_client *rps)
+{
+	struct drm_i915_gem_request *request;
+	int ret = 0;
+
+	rcu_read_lock();
+	request = i915_gem_active_get_rcu(active);
+	rcu_read_unlock();
+
+	if (request) {
+		ret = i915_wait_request(request, interruptible, timeout, rps);
+		i915_gem_request_put(request);
+	}
+
+	return ret;
+}
+
+/**
  * i915_gem_active_retire - waits until the request is retired
  * @active - the active request on which to wait
  *
-- 
2.8.1


* [PATCH 02/16] drm/i915: Convert non-blocking waits for requests over to using RCU
  2016-08-01 18:22 Put RCU request lookup to use Chris Wilson
  2016-08-01 18:22 ` [PATCH 01/16] drm/i915: Introduce i915_gem_active_wait_unlocked() Chris Wilson
@ 2016-08-01 18:22 ` Chris Wilson
  2016-08-03 13:23   ` Joonas Lahtinen
  2016-08-01 18:22 ` [PATCH 03/16] drm/i915: Convert non-blocking userptr " Chris Wilson
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-01 18:22 UTC (permalink / raw)
  To: intel-gfx

We can completely avoid taking the struct_mutex around the non-blocking
waits by switching over to the RCU request management (trading the mutex
for an RCU read lock and some complex atomic operations). The improvement
is that we gain further contention reduction, and overall the code
becomes simpler due to the reduced mutex dancing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 112 +++++++++++++++++-----------------------
 1 file changed, 47 insertions(+), 65 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d8d3e728230d..f164ad482bdc 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -346,24 +346,20 @@ i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
 	return 0;
 }
 
-/* A nonblocking variant of the above wait. This is a highly dangerous routine
- * as the object state may change during this call.
+/* A nonblocking variant of the above wait. Must be called prior to
+ * acquiring the mutex for the object, as the object state may change
+ * during this call. A reference must be held by the caller for the object.
  */
 static __must_check int
-i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
-					    struct intel_rps_client *rps,
-					    bool readonly)
+__unsafe_wait_rendering(struct drm_i915_gem_object *obj,
+			struct intel_rps_client *rps,
+			bool readonly)
 {
-	struct drm_device *dev = obj->base.dev;
-	struct drm_i915_gem_request *requests[I915_NUM_ENGINES];
 	struct i915_gem_active *active;
 	unsigned long active_mask;
-	int ret, i, n = 0;
-
-	lockdep_assert_held(&dev->struct_mutex);
-	GEM_BUG_ON(!to_i915(dev)->mm.interruptible);
+	int idx;
 
-	active_mask = i915_gem_object_get_active(obj);
+	active_mask = __I915_BO_ACTIVE(obj);
 	if (!active_mask)
 		return 0;
 
@@ -374,25 +370,16 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 		active = &obj->last_write;
 	}
 
-	for_each_active(active_mask, i) {
-		struct drm_i915_gem_request *req;
+	for_each_active(active_mask, idx) {
+		int ret;
 
-		req = i915_gem_active_get(&active[i],
-					  &obj->base.dev->struct_mutex);
-		if (req)
-			requests[n++] = req;
+		ret = i915_gem_active_wait_unlocked(&active[idx],
+						    true, NULL, rps);
+		if (ret)
+			return ret;
 	}
 
-	mutex_unlock(&dev->struct_mutex);
-	ret = 0;
-	for (i = 0; ret == 0 && i < n; i++)
-		ret = i915_wait_request(requests[i], true, NULL, rps);
-	mutex_lock(&dev->struct_mutex);
-
-	for (i = 0; i < n; i++)
-		i915_gem_request_put(requests[i]);
-
-	return ret;
+	return 0;
 }
 
 static struct intel_rps_client *to_rps_client(struct drm_file *file)
@@ -1461,10 +1448,7 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
 	int ret;
 
 	/* Only handle setting domains to types used by the CPU. */
-	if (write_domain & I915_GEM_GPU_DOMAINS)
-		return -EINVAL;
-
-	if (read_domains & I915_GEM_GPU_DOMAINS)
+	if ((write_domain | read_domains) & I915_GEM_GPU_DOMAINS)
 		return -EINVAL;
 
 	/* Having something in the write domain implies it's in the read
@@ -1473,25 +1457,21 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
 	if (write_domain != 0 && read_domains != write_domain)
 		return -EINVAL;
 
-	ret = i915_mutex_lock_interruptible(dev);
-	if (ret)
-		return ret;
-
 	obj = i915_gem_object_lookup(file, args->handle);
-	if (!obj) {
-		ret = -ENOENT;
-		goto unlock;
-	}
+	if (!obj)
+		return -ENOENT;
 
 	/* Try to flush the object off the GPU without holding the lock.
 	 * We will repeat the flush holding the lock in the normal manner
 	 * to catch cases where we are gazumped.
 	 */
-	ret = i915_gem_object_wait_rendering__nonblocking(obj,
-							  to_rps_client(file),
-							  !write_domain);
+	ret = __unsafe_wait_rendering(obj, to_rps_client(file), !write_domain);
 	if (ret)
-		goto unref;
+		goto out_unlocked;
+
+	ret = i915_mutex_lock_interruptible(dev);
+	if (ret)
+		goto out_unlocked;
 
 	if (read_domains & I915_GEM_DOMAIN_GTT)
 		ret = i915_gem_object_set_to_gtt_domain(obj, write_domain != 0);
@@ -1501,11 +1481,13 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
 	if (write_domain != 0)
 		intel_fb_obj_invalidate(obj, write_origin(obj, write_domain));
 
-unref:
 	i915_gem_object_put(obj);
-unlock:
 	mutex_unlock(&dev->struct_mutex);
 	return ret;
+
+out_unlocked:
+	i915_gem_object_put_unlocked(obj);
+	return ret;
 }
 
 /**
@@ -1647,6 +1629,15 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	int ret = 0;
 	bool write = !!(vmf->flags & FAULT_FLAG_WRITE);
 
+	/* Try to flush the object off the GPU first without holding the lock.
+	 * Upon acquiring the lock, we will perform our sanity checks and then
+	 * repeat the flush holding the lock in the normal manner to catch cases
+	 * where we are gazumped.
+	 */
+	ret = __unsafe_wait_rendering(obj, NULL, !write);
+	if (ret)
+		goto err;
+
 	intel_runtime_pm_get(dev_priv);
 
 	/* We don't use vmf->pgoff since that has the fake offset */
@@ -1655,23 +1646,14 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 
 	ret = i915_mutex_lock_interruptible(dev);
 	if (ret)
-		goto out;
+		goto err_rpm;
 
 	trace_i915_gem_object_fault(obj, page_offset, true, write);
 
-	/* Try to flush the object off the GPU first without holding the lock.
-	 * Upon reacquiring the lock, we will perform our sanity checks and then
-	 * repeat the flush holding the lock in the normal manner to catch cases
-	 * where we are gazumped.
-	 */
-	ret = i915_gem_object_wait_rendering__nonblocking(obj, NULL, !write);
-	if (ret)
-		goto unlock;
-
 	/* Access to snoopable pages through the GTT is incoherent. */
 	if (obj->cache_level != I915_CACHE_NONE && !HAS_LLC(dev)) {
 		ret = -EFAULT;
-		goto unlock;
+		goto err_unlock;
 	}
 
 	/* Use a partial view if the object is bigger than the aperture. */
@@ -1692,15 +1674,15 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	/* Now pin it into the GTT if needed */
 	ret = i915_gem_object_ggtt_pin(obj, &view, 0, 0, PIN_MAPPABLE);
 	if (ret)
-		goto unlock;
+		goto err_unlock;
 
 	ret = i915_gem_object_set_to_gtt_domain(obj, write);
 	if (ret)
-		goto unpin;
+		goto err_unpin;
 
 	ret = i915_gem_object_get_fence(obj);
 	if (ret)
-		goto unpin;
+		goto err_unpin;
 
 	/* Finally, remap it using the new GTT offset */
 	pfn = ggtt->mappable_base +
@@ -1745,11 +1727,13 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 					    (unsigned long)vmf->virtual_address,
 					    pfn + page_offset);
 	}
-unpin:
+err_unpin:
 	i915_gem_object_ggtt_unpin_view(obj, &view);
-unlock:
+err_unlock:
 	mutex_unlock(&dev->struct_mutex);
-out:
+err_rpm:
+	intel_runtime_pm_put(dev_priv);
+err:
 	switch (ret) {
 	case -EIO:
 		/*
@@ -1790,8 +1774,6 @@ out:
 		ret = VM_FAULT_SIGBUS;
 		break;
 	}
-
-	intel_runtime_pm_put(dev_priv);
 	return ret;
 }
 
-- 
2.8.1


* [PATCH 03/16] drm/i915: Convert non-blocking userptr waits for requests over to using RCU
  2016-08-01 18:22 Put RCU request lookup to use Chris Wilson
  2016-08-01 18:22 ` [PATCH 01/16] drm/i915: Introduce i915_gem_active_wait_unlocked() Chris Wilson
  2016-08-01 18:22 ` [PATCH 02/16] drm/i915: Convert non-blocking waits for requests over to using RCU Chris Wilson
@ 2016-08-01 18:22 ` Chris Wilson
  2016-08-03 13:27   ` Joonas Lahtinen
  2016-08-01 18:22 ` [PATCH 04/16] drm/i915/userptr: Remove superfluous interruptible=false on waiting Chris Wilson
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-01 18:22 UTC (permalink / raw)
  To: intel-gfx

We can completely avoid taking the struct_mutex around the non-blocking
waits by switching over to the RCU request management (trading the mutex
for an RCU read lock and some complex atomic operations). The improvement
is that we gain further contention reduction, and overall the code
becomes simpler due to the reduced mutex dancing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_userptr.c | 34 +++++++--------------------------
 1 file changed, 7 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 53f64fcc89ef..96ab6161903a 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -63,32 +63,12 @@ struct i915_mmu_object {
 
 static void wait_rendering(struct drm_i915_gem_object *obj)
 {
-	struct drm_device *dev = obj->base.dev;
-	struct drm_i915_gem_request *requests[I915_NUM_ENGINES];
-	int i, n;
-
-	if (!i915_gem_object_is_active(obj))
-		return;
-
-	n = 0;
-	for (i = 0; i < I915_NUM_ENGINES; i++) {
-		struct drm_i915_gem_request *req;
+	unsigned long active = __I915_BO_ACTIVE(obj);
+	int idx;
 
-		req = i915_gem_active_get(&obj->last_read[i],
-					  &obj->base.dev->struct_mutex);
-		if (req)
-			requests[n++] = req;
-	}
-
-	mutex_unlock(&dev->struct_mutex);
-
-	for (i = 0; i < n; i++)
-		i915_wait_request(requests[i], false, NULL, NULL);
-
-	mutex_lock(&dev->struct_mutex);
-
-	for (i = 0; i < n; i++)
-		i915_gem_request_put(requests[i]);
+	for_each_active(active, idx)
+		i915_gem_active_wait_unlocked(&obj->last_read[idx],
+					      false, NULL, NULL);
 }
 
 static void cancel_userptr(struct work_struct *work)
@@ -97,6 +77,8 @@ static void cancel_userptr(struct work_struct *work)
 	struct drm_i915_gem_object *obj = mo->obj;
 	struct drm_device *dev = obj->base.dev;
 
+	wait_rendering(obj);
+
 	mutex_lock(&dev->struct_mutex);
 	/* Cancel any active worker and force us to re-evaluate gup */
 	obj->userptr.work = NULL;
@@ -105,8 +87,6 @@ static void cancel_userptr(struct work_struct *work)
 		struct drm_i915_private *dev_priv = to_i915(dev);
 		bool was_interruptible;
 
-		wait_rendering(obj);
-
 		was_interruptible = dev_priv->mm.interruptible;
 		dev_priv->mm.interruptible = false;
 
-- 
2.8.1


* [PATCH 04/16] drm/i915/userptr: Remove superfluous interruptible=false on waiting
  2016-08-01 18:22 Put RCU request lookup to use Chris Wilson
                   ` (2 preceding siblings ...)
  2016-08-01 18:22 ` [PATCH 03/16] drm/i915: Convert non-blocking userptr " Chris Wilson
@ 2016-08-01 18:22 ` Chris Wilson
  2016-08-03 13:43   ` Joonas Lahtinen
  2016-08-01 18:22 ` [PATCH 05/16] drm/i915: Enable i915_gem_wait_for_idle() without holding struct_mutex Chris Wilson
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-01 18:22 UTC (permalink / raw)
  To: intel-gfx

Inside the kthread context, we can't be interrupted by signals, so
touching the mm.interruptible flag is pointless; wait-request now
consumes EIO itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_userptr.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 96ab6161903a..57218cca7e05 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -84,16 +84,9 @@ static void cancel_userptr(struct work_struct *work)
 	obj->userptr.work = NULL;
 
 	if (obj->pages != NULL) {
-		struct drm_i915_private *dev_priv = to_i915(dev);
-		bool was_interruptible;
-
-		was_interruptible = dev_priv->mm.interruptible;
-		dev_priv->mm.interruptible = false;
-
+		/* We are inside a kthread context and can't be interrupted */
 		WARN_ON(i915_gem_object_unbind(obj));
 		WARN_ON(i915_gem_object_put_pages(obj));
-
-		dev_priv->mm.interruptible = was_interruptible;
 	}
 
 	i915_gem_object_put(obj);
-- 
2.8.1


* [PATCH 05/16] drm/i915: Enable i915_gem_wait_for_idle() without holding struct_mutex
  2016-08-01 18:22 Put RCU request lookup to use Chris Wilson
                   ` (3 preceding siblings ...)
  2016-08-01 18:22 ` [PATCH 04/16] drm/i915/userptr: Remove superfluous interruptible=false on waiting Chris Wilson
@ 2016-08-01 18:22 ` Chris Wilson
  2016-08-01 19:28   ` Chris Wilson
  2016-08-01 18:22 ` [PATCH 06/16] drm/gem/shrinker: Wait before acquiring struct_mutex under oom Chris Wilson
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-01 18:22 UTC (permalink / raw)
  To: intel-gfx

The principal motivation for this was to try and eliminate the
struct_mutex from i915_gem_suspend - but we currently still need to hold
the mutex for i915_gem_context_lost(). (The issue there is that there
may be an indirect lockdep cycle between cpu_hotplug (i.e. suspend) and
struct_mutex via the stop_machine().) For the moment, enabling last
request tracking for the engine allows us to do busyness checking and
waiting without requiring the struct_mutex - which is useful in its own
right.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c         | 37 ++++++++++-----------------------
 drivers/gpu/drm/i915/i915_gem_evict.c   |  2 +-
 drivers/gpu/drm/i915/i915_gem_request.c |  5 +++--
 drivers/gpu/drm/i915/i915_gem_request.h | 11 ++++++++++
 drivers/gpu/drm/i915/i915_gpu_error.c   |  2 +-
 drivers/gpu/drm/i915/i915_irq.c         |  3 +--
 drivers/gpu/drm/i915/intel_engine_cs.c  |  8 ++++++-
 drivers/gpu/drm/i915/intel_pm.c         |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c | 16 +++-----------
 drivers/gpu/drm/i915/intel_ringbuffer.h | 24 +++++++++++----------
 10 files changed, 52 insertions(+), 58 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f164ad482bdc..8ac9605d5125 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2432,13 +2432,18 @@ static void i915_gem_reset_engine_status(struct intel_engine_cs *engine)
 
 static void i915_gem_reset_engine_cleanup(struct intel_engine_cs *engine)
 {
+	struct drm_i915_gem_request *request;
 	struct intel_ring *ring;
 
+	request = i915_gem_active_peek(&engine->last_request,
+				       &engine->i915->drm.struct_mutex);
+
 	/* Mark all pending requests as complete so that any concurrent
 	 * (lockless) lookup doesn't try and wait upon the request as we
 	 * reset it.
 	 */
-	intel_engine_init_seqno(engine, engine->last_submitted_seqno);
+	if (request)
+		intel_engine_init_seqno(engine, request->fence.seqno);
 
 	/*
 	 * Clear the execlists queue up before freeing the requests, as those
@@ -2460,15 +2465,9 @@ static void i915_gem_reset_engine_cleanup(struct intel_engine_cs *engine)
 	 * implicit references on things like e.g. ppgtt address spaces through
 	 * the request.
 	 */
-	if (!list_empty(&engine->request_list)) {
-		struct drm_i915_gem_request *request;
-
-		request = list_last_entry(&engine->request_list,
-					  struct drm_i915_gem_request,
-					  link);
-
+	if (request)
 		i915_gem_request_retire_upto(request);
-	}
+	GEM_BUG_ON(intel_engine_is_active(engine));
 
 	/* Having flushed all requests from all queues, we know that all
 	 * ringbuffers must now be empty. However, since we do not reclaim
@@ -2896,8 +2895,6 @@ int i915_gem_wait_for_idle(struct drm_i915_private *dev_priv)
 	struct intel_engine_cs *engine;
 	int ret;
 
-	lockdep_assert_held(&dev_priv->drm.struct_mutex);
-
 	for_each_engine(engine, dev_priv) {
 		if (engine->last_context == NULL)
 			continue;
@@ -4069,21 +4066,10 @@ struct i915_vma *i915_gem_obj_to_ggtt_view(struct drm_i915_gem_object *obj,
 	return NULL;
 }
 
-static void
-i915_gem_stop_engines(struct drm_device *dev)
+int i915_gem_suspend(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
-	struct intel_engine_cs *engine;
-
-	for_each_engine(engine, dev_priv)
-		dev_priv->gt.stop_engine(engine);
-}
-
-int
-i915_gem_suspend(struct drm_device *dev)
-{
-	struct drm_i915_private *dev_priv = to_i915(dev);
-	int ret = 0;
+	int ret;
 
 	intel_suspend_gt_powersave(dev_priv);
 
@@ -4112,7 +4098,6 @@ i915_gem_suspend(struct drm_device *dev)
 	 * and similar for all logical context images (to ensure they are
 	 * all ready for hibernation).
 	 */
-	i915_gem_stop_engines(dev);
 	i915_gem_context_lost(dev_priv);
 	mutex_unlock(&dev->struct_mutex);
 
@@ -4128,7 +4113,7 @@ i915_gem_suspend(struct drm_device *dev)
 	return 0;
 
 err:
-	mutex_unlock(&dev->struct_mutex);
+	mutex_unlock(&dev_priv->drm.struct_mutex);
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 7be425826539..624c0f016c84 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -39,7 +39,7 @@ gpu_is_idle(struct drm_i915_private *dev_priv)
 	struct intel_engine_cs *engine;
 
 	for_each_engine(engine, dev_priv) {
-		if (!list_empty(&engine->request_list))
+		if (intel_engine_is_active(engine))
 			return false;
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 3fecb8f0e041..8289d31c0ef5 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -486,7 +486,8 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 	 */
 	request->emitted_jiffies = jiffies;
 	request->previous_seqno = engine->last_submitted_seqno;
-	smp_store_mb(engine->last_submitted_seqno, request->fence.seqno);
+	engine->last_submitted_seqno = request->fence.seqno;
+	i915_gem_active_set(&engine->last_request, request);
 	list_add_tail(&request->link, &engine->request_list);
 	list_add_tail(&request->ring_link, &ring->request_list);
 
@@ -757,7 +758,7 @@ void i915_gem_retire_requests(struct drm_i915_private *dev_priv)
 
 	for_each_engine(engine, dev_priv) {
 		engine_retire_requests(engine);
-		if (list_empty(&engine->request_list))
+		if (!intel_engine_is_active(engine))
 			dev_priv->gt.active_engines &= ~intel_engine_flag(engine);
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index d077b023a89f..3d7c63dec5b3 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -29,6 +29,17 @@
 
 #include "i915_gem.h"
 
+struct intel_wait {
+	struct rb_node node;
+	struct task_struct *tsk;
+	u32 seqno;
+};
+
+struct intel_signal_node {
+	struct rb_node node;
+	struct intel_wait wait;
+};
+
 /**
  * Request queue structure.
  *
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index cc28ad429dd8..0edae7d0207d 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1002,7 +1002,7 @@ static void error_record_engine_registers(struct drm_i915_error_state *error,
 	ee->instpm = I915_READ(RING_INSTPM(engine->mmio_base));
 	ee->acthd = intel_engine_get_active_head(engine);
 	ee->seqno = intel_engine_get_seqno(engine);
-	ee->last_seqno = engine->last_submitted_seqno;
+	ee->last_seqno = __active_get_seqno(&engine->last_request);
 	ee->start = I915_READ_START(engine);
 	ee->head = I915_READ_HEAD(engine);
 	ee->tail = I915_READ_TAIL(engine);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index e58650096426..e35af9ba1546 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2807,8 +2807,7 @@ static void gen8_disable_vblank(struct drm_device *dev, unsigned int pipe)
 static bool
 ring_idle(struct intel_engine_cs *engine, u32 seqno)
 {
-	return i915_seqno_passed(seqno,
-				 READ_ONCE(engine->last_submitted_seqno));
+	return !intel_engine_is_active(engine);
 }
 
 static bool
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 6665068e583c..b2bcd468028d 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -166,6 +166,12 @@ void intel_engine_init_hangcheck(struct intel_engine_cs *engine)
 	memset(&engine->hangcheck, 0, sizeof(engine->hangcheck));
 }
 
+static void intel_engine_init_requests(struct intel_engine_cs *engine)
+{
+	init_request_active(&engine->last_request, NULL);
+	INIT_LIST_HEAD(&engine->request_list);
+}
+
 /**
  * intel_engines_setup_common - setup engine state not requiring hw access
  * @engine: Engine to setup.
@@ -177,13 +183,13 @@ void intel_engine_init_hangcheck(struct intel_engine_cs *engine)
  */
 void intel_engine_setup_common(struct intel_engine_cs *engine)
 {
-	INIT_LIST_HEAD(&engine->request_list);
 	INIT_LIST_HEAD(&engine->buffers);
 	INIT_LIST_HEAD(&engine->execlist_queue);
 	spin_lock_init(&engine->execlist_lock);
 
 	engine->fence_context = fence_context_alloc(1);
 
+	intel_engine_init_requests(engine);
 	intel_engine_init_hangcheck(engine);
 	i915_gem_batch_pool_init(engine, &engine->batch_pool);
 }
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 1ac32428d4db..d8a41c7acb3d 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6328,7 +6328,7 @@ bool i915_gpu_busy(void)
 	dev_priv = i915_mch_dev;
 
 	for_each_engine(engine, dev_priv)
-		ret |= !list_empty(&engine->request_list);
+		ret |= intel_engine_is_active(engine);
 
 out_unlock:
 	spin_unlock_irq(&mchdev_lock);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index cb1131d3a9d0..7f6633b03916 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2232,20 +2232,10 @@ void intel_engine_cleanup(struct intel_engine_cs *engine)
 
 int intel_engine_idle(struct intel_engine_cs *engine)
 {
-	struct drm_i915_gem_request *req;
-
 	/* Wait upon the last request to be completed */
-	if (list_empty(&engine->request_list))
-		return 0;
-
-	req = list_entry(engine->request_list.prev,
-			 struct drm_i915_gem_request,
-			 link);
-
-	/* Make sure we do not trigger any retires */
-	return i915_wait_request(req,
-				 req->i915->mm.interruptible,
-				 NULL, NULL);
+	return i915_gem_active_wait_unlocked(&engine->last_request,
+					     engine->i915->mm.interruptible,
+					     NULL, NULL);
 }
 
 int intel_ring_alloc_request_extras(struct drm_i915_gem_request *request)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 6c240356cccb..903c06ef6fff 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -3,6 +3,7 @@
 
 #include <linux/hashtable.h>
 #include "i915_gem_batch_pool.h"
+#include "i915_gem_request.h"
 
 #define I915_CMD_HASH_ORDER 9
 
@@ -307,6 +308,13 @@ struct intel_engine_cs {
 	 */
 	u32 last_submitted_seqno;
 
+	/* An RCU guarded pointer to the last request. No reference is
+	 * held to the request, users must carefully acquire a reference to
+	 * the request using i915_gem_active_get_request_rcu(), or hold the
+	 * struct_mutex.
+	 */
+	struct i915_gem_active last_request;
+
 	struct i915_gem_context *last_context;
 
 	struct intel_engine_hangcheck hangcheck;
@@ -503,17 +511,6 @@ static inline u32 intel_hws_seqno_address(struct intel_engine_cs *engine)
 }
 
 /* intel_breadcrumbs.c -- user interrupt bottom-half for waiters */
-struct intel_wait {
-	struct rb_node node;
-	struct task_struct *tsk;
-	u32 seqno;
-};
-
-struct intel_signal_node {
-	struct rb_node node;
-	struct intel_wait wait;
-};
-
 int intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
 
 static inline void intel_wait_init(struct intel_wait *wait, u32 seqno)
@@ -560,4 +557,9 @@ void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
 unsigned int intel_kick_waiters(struct drm_i915_private *i915);
 unsigned int intel_kick_signalers(struct drm_i915_private *i915);
 
+static inline bool intel_engine_is_active(struct intel_engine_cs *engine)
+{
+	return __i915_gem_active_is_busy(&engine->last_request);
+}
+
 #endif /* _INTEL_RINGBUFFER_H_ */
-- 
2.8.1


* [PATCH 06/16] drm/gem/shrinker: Wait before acquiring struct_mutex under oom
  2016-08-01 18:22 Put RCU request lookup to use Chris Wilson
                   ` (4 preceding siblings ...)
  2016-08-01 18:22 ` [PATCH 05/16] drm/i915: Enable i915_gem_wait_for_idle() without holding struct_mutex Chris Wilson
@ 2016-08-01 18:22 ` Chris Wilson
  2016-08-04  6:46   ` Joonas Lahtinen
  2016-08-01 18:22 ` [PATCH 07/16] drm/i915: Tidy generation of the GTT mmap offset Chris Wilson
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-01 18:22 UTC (permalink / raw)
  To: intel-gfx

We can now wait for the GPU (all engines) to become idle without
requiring the struct_mutex. Inside the shrinker, we currently need to
take the struct_mutex in order to purge objects, and to purge the
objects we need the GPU to be idle - causing a stall whilst we hold the
struct_mutex. We can hide most of that stall by performing the wait
before taking the struct_mutex, and only doing essential waits for
new rendering on objects to be freed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index 1341cb55b6f1..43e53e419982 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -326,9 +326,14 @@ i915_gem_shrinker_lock_uninterruptible(struct drm_i915_private *dev_priv,
 	unsigned long timeout = msecs_to_jiffies(timeout_ms) + 1;
 
 	while (!i915_gem_shrinker_lock(&dev_priv->drm, &slu->unlock)) {
+		if (i915_gem_wait_for_idle(dev_priv) == 0 &&
+		    i915_gem_shrinker_lock(&dev_priv->drm, &slu->unlock))
+			break;
+
 		schedule_timeout_killable(1);
 		if (fatal_signal_pending(current))
 			return false;
+
 		if (--timeout == 0) {
 			pr_err("Unable to lock GPU to purge memory.\n");
 			return false;
-- 
2.8.1


* [PATCH 07/16] drm/i915: Tidy generation of the GTT mmap offset
  2016-08-01 18:22 Put RCU request lookup to use Chris Wilson
                   ` (5 preceding siblings ...)
  2016-08-01 18:22 ` [PATCH 06/16] drm/gem/shrinker: Wait before acquiring struct_mutex under oom Chris Wilson
@ 2016-08-01 18:22 ` Chris Wilson
  2016-08-04  7:25   ` Joonas Lahtinen
  2016-08-01 18:22 ` [PATCH 08/16] drm/i915: Remove unused no-shrinker-steal Chris Wilson
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-01 18:22 UTC (permalink / raw)
  To: intel-gfx

If we make the observation that mmap-offsets are only released when we
free an object, we can then deduce that the shrinker only creates free
space in the mmap arena indirectly by flushing the request list and
freeing expired objects. If we combine this with the lockless
vma-manager and lockless idling, we can avoid taking our big struct_mutex
until we need to actually free the requests.

One side-effect is that we defer the madvise checking until we need the
pages (i.e. the fault handler). This brings us into line with the other
delayed checks (and madvise in general).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 65 +++++++++++++----------------------------
 1 file changed, 20 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 8ac9605d5125..0c5d2872649c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1891,35 +1891,27 @@ u64 i915_gem_get_ggtt_alignment(struct drm_i915_private *dev_priv, u64 size,
 
 static int i915_gem_object_create_mmap_offset(struct drm_i915_gem_object *obj)
 {
-	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
+	struct drm_device *dev = obj->base.dev;
 	int ret;
 
-	dev_priv->mm.shrinker_no_lock_stealing = true;
-
 	ret = drm_gem_create_mmap_offset(&obj->base);
-	if (ret != -ENOSPC)
-		goto out;
+	if (ret == 0)
+		return 0;
 
-	/* Badly fragmented mmap space? The only way we can recover
-	 * space is by destroying unwanted objects. We can't randomly release
-	 * mmap_offsets as userspace expects them to be persistent for the
-	 * lifetime of the objects. The closest we can is to release the
-	 * offsets on purgeable objects by truncating it and marking it purged,
-	 * which prevents userspace from ever using that object again.
+	/* We can idle the GPU locklessly to flush stale objects, but in order
+	 * to claim that space for ourselves, we need to take the big
+	 * struct_mutex to free the requests+objects and allocate our slot.
 	 */
-	i915_gem_shrink(dev_priv,
-			obj->base.size >> PAGE_SHIFT,
-			I915_SHRINK_BOUND |
-			I915_SHRINK_UNBOUND |
-			I915_SHRINK_PURGEABLE);
-	ret = drm_gem_create_mmap_offset(&obj->base);
-	if (ret != -ENOSPC)
-		goto out;
+	ret = i915_gem_wait_for_idle(to_i915(dev));
+	if (ret)
+		return ret;
 
-	i915_gem_shrink_all(dev_priv);
-	ret = drm_gem_create_mmap_offset(&obj->base);
-out:
-	dev_priv->mm.shrinker_no_lock_stealing = false;
+	ret = i915_mutex_lock_interruptible(dev);
+	if (ret == 0) {
+		i915_gem_retire_requests(to_i915(dev));
+		ret = drm_gem_create_mmap_offset(&obj->base);
+		mutex_unlock(&dev->struct_mutex);
+	}
 
 	return ret;
 }
@@ -1938,32 +1930,15 @@ i915_gem_mmap_gtt(struct drm_file *file,
 	struct drm_i915_gem_object *obj;
 	int ret;
 
-	ret = i915_mutex_lock_interruptible(dev);
-	if (ret)
-		return ret;
-
 	obj = i915_gem_object_lookup(file, handle);
-	if (!obj) {
-		ret = -ENOENT;
-		goto unlock;
-	}
-
-	if (obj->madv != I915_MADV_WILLNEED) {
-		DRM_DEBUG("Attempting to mmap a purgeable buffer\n");
-		ret = -EFAULT;
-		goto out;
-	}
+	if (!obj)
+		return -ENOENT;
 
 	ret = i915_gem_object_create_mmap_offset(obj);
-	if (ret)
-		goto out;
+	if (ret == 0)
+		*offset = drm_vma_node_offset_addr(&obj->base.vma_node);
 
-	*offset = drm_vma_node_offset_addr(&obj->base.vma_node);
-
-out:
-	i915_gem_object_put(obj);
-unlock:
-	mutex_unlock(&dev->struct_mutex);
+	i915_gem_object_put_unlocked(obj);
 	return ret;
 }
 
-- 
2.8.1


* [PATCH 08/16] drm/i915: Remove unused no-shrinker-steal
  2016-08-01 18:22 Put RCU request lookup to use Chris Wilson
                   ` (6 preceding siblings ...)
  2016-08-01 18:22 ` [PATCH 07/16] drm/i915: Tidy generation of the GTT mmap offset Chris Wilson
@ 2016-08-01 18:22 ` Chris Wilson
  2016-08-04  7:26   ` Joonas Lahtinen
  2016-08-01 18:22 ` [PATCH 09/16] drm/i915: Do a nonblocking wait first in pread/pwrite Chris Wilson
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-01 18:22 UTC (permalink / raw)
  To: intel-gfx

After removing the user of this wart, we can remove the wart entirely.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h          | 1 -
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 3 ---
 2 files changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index db5dc5bd78d8..d09c87fb207d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1318,7 +1318,6 @@ struct i915_gem_mm {
 	struct notifier_block oom_notifier;
 	struct notifier_block vmap_notifier;
 	struct shrinker shrinker;
-	bool shrinker_no_lock_stealing;
 
 	/** LRU list of objects with fence regs on them. */
 	struct list_head fence_list;
diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index 43e53e419982..7a604813226d 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -244,9 +244,6 @@ static bool i915_gem_shrinker_lock(struct drm_device *dev, bool *unlock)
 		if (!mutex_is_locked_by(&dev->struct_mutex, current))
 			return false;
 
-		if (to_i915(dev)->mm.shrinker_no_lock_stealing)
-			return false;
-
 		*unlock = false;
 	} else
 		*unlock = true;
-- 
2.8.1


* [PATCH 09/16] drm/i915: Do a nonblocking wait first in pread/pwrite
  2016-08-01 18:22 Put RCU request lookup to use Chris Wilson
                   ` (7 preceding siblings ...)
  2016-08-01 18:22 ` [PATCH 08/16] drm/i915: Remove unused no-shrinker-steal Chris Wilson
@ 2016-08-01 18:22 ` Chris Wilson
  2016-08-04  7:53   ` Joonas Lahtinen
  2016-08-01 18:22 ` [PATCH 10/16] drm/i915: Remove (struct_mutex) locking for wait-ioctl Chris Wilson
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-01 18:22 UTC (permalink / raw)
  To: intel-gfx

If we try to read or write to an active request, we must first wait
upon the GPU completing that request. Let's do that without holding the
mutex (and so allow someone else to access the GPU whilst we wait). Upon
completion, we will reacquire the mutex and only then start the
operation (i.e. we do not rely on state from before dropping the mutex).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 62 +++++++++++++++++++++++------------------
 1 file changed, 35 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0c5d2872649c..a19bbcf0c19b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -953,25 +953,26 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
 		       args->size))
 		return -EFAULT;
 
-	ret = i915_mutex_lock_interruptible(dev);
-	if (ret)
-		return ret;
-
 	obj = i915_gem_object_lookup(file, args->handle);
-	if (!obj) {
-		ret = -ENOENT;
-		goto unlock;
-	}
+	if (!obj)
+		return -ENOENT;
 
 	/* Bounds check source.  */
 	if (args->offset > obj->base.size ||
 	    args->size > obj->base.size - args->offset) {
 		ret = -EINVAL;
-		goto out;
+		goto out_unlocked;
 	}
 
-	trace_i915_gem_object_pread(obj, args->offset, args->size);
+	ret = __unsafe_wait_rendering(obj, to_rps_client(file), true);
+	if (ret)
+		goto out_unlocked;
 
+	ret = i915_mutex_lock_interruptible(dev);
+	if (ret)
+		goto out_unlocked;
+
+	trace_i915_gem_object_pread(obj, args->offset, args->size);
 	ret = i915_gem_shmem_pread(dev, obj, args, file);
 
 	/* pread for non shmem backed objects */
@@ -979,10 +980,13 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
 		ret = i915_gem_gtt_pread(dev, obj, args->size,
 					args->offset, args->data_ptr);
 
-out:
 	i915_gem_object_put(obj);
-unlock:
 	mutex_unlock(&dev->struct_mutex);
+
+	return ret;
+
+out_unlocked:
+	i915_gem_object_put_unlocked(obj);
 	return ret;
 }
 
@@ -1368,27 +1372,28 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 			return -EFAULT;
 	}
 
-	intel_runtime_pm_get(dev_priv);
-
-	ret = i915_mutex_lock_interruptible(dev);
-	if (ret)
-		goto put_rpm;
-
 	obj = i915_gem_object_lookup(file, args->handle);
-	if (!obj) {
-		ret = -ENOENT;
-		goto unlock;
-	}
+	if (!obj)
+		return -ENOENT;
 
 	/* Bounds check destination. */
 	if (args->offset > obj->base.size ||
 	    args->size > obj->base.size - args->offset) {
 		ret = -EINVAL;
-		goto out;
+		goto out_unlocked;
 	}
 
-	trace_i915_gem_object_pwrite(obj, args->offset, args->size);
+	ret = __unsafe_wait_rendering(obj, to_rps_client(file), false);
+	if (ret)
+		goto out_unlocked;
+
+	intel_runtime_pm_get(dev_priv);
 
+	ret = i915_mutex_lock_interruptible(dev);
+	if (ret)
+		goto out_rpm;
+
+	trace_i915_gem_object_pwrite(obj, args->offset, args->size);
 	ret = -EFAULT;
 	/* We can only do the GTT pwrite on untiled buffers, as otherwise
 	 * it would end up going through the fenced access, and we'll get
@@ -1413,14 +1418,17 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 			ret = -ENODEV;
 	}
 
-out:
 	i915_gem_object_put(obj);
-unlock:
 	mutex_unlock(&dev->struct_mutex);
-put_rpm:
 	intel_runtime_pm_put(dev_priv);
 
 	return ret;
+
+out_rpm:
+	intel_runtime_pm_put(dev_priv);
+out_unlocked:
+	i915_gem_object_put_unlocked(obj);
+	return ret;
 }
 
 static enum fb_op_origin
-- 
2.8.1


* [PATCH 10/16] drm/i915: Remove (struct_mutex) locking for wait-ioctl
  2016-08-01 18:22 Put RCU request lookup to use Chris Wilson
                   ` (8 preceding siblings ...)
  2016-08-01 18:22 ` [PATCH 09/16] drm/i915: Do a nonblocking wait first in pread/pwrite Chris Wilson
@ 2016-08-01 18:22 ` Chris Wilson
  2016-08-04  8:26   ` Joonas Lahtinen
  2016-08-01 18:22 ` [PATCH 11/16] drm/i915: Remove (struct_mutex) locking for busy-ioctl Chris Wilson
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-01 18:22 UTC (permalink / raw)
  To: intel-gfx

With a bit of care (and leniency) we can iterate over the object and
wait for previous rendering to complete with judicious use of atomic
reference counting. The ABI requires us to ensure that an active object
is eventually flushed (like the busy-ioctl) which is guaranteed by our
management of requests (i.e. everything that is submitted to hardware is
flushed in the same request). All we have to do is ensure that we can
detect when the requests are complete for reporting when the object is
idle (without triggering ETIME) - this is handled by
__i915_wait_request.

The biggest danger in the code is walking the object without holding any
locks. We iterate over the set of last requests and carefully grab a
reference to each. (If a request is changing beneath us, that is the
usual userspace race and even with locking you get the same
indeterminate results.) If the request is unreferenced beneath us, it
will be disposed of into the request cache - so we have to carefully
order the retrieval of the request pointer with its removal, and to do
this we employ RCU on the request cache and upon the last_request
pointer tracking.

The impact of this is actually quite small - the return to userspace
following the wait was already lockless. What we achieve here is
completing an already finished wait without hitting the struct_mutex,
our hold is quite short and so we are typically just a victim of
contention rather than a cause.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 42 +++++++++++------------------------------
 1 file changed, 11 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a19bbcf0c19b..43069b05bdd2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2615,46 +2615,26 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
 	struct drm_i915_gem_wait *args = data;
 	struct drm_i915_gem_object *obj;
-	struct drm_i915_gem_request *requests[I915_NUM_ENGINES];
-	int i, n = 0;
-	int ret;
+	unsigned long active;
+	int idx, ret = 0;
 
 	if (args->flags != 0)
 		return -EINVAL;
 
-	ret = i915_mutex_lock_interruptible(dev);
-	if (ret)
-		return ret;
-
 	obj = i915_gem_object_lookup(file, args->bo_handle);
-	if (!obj) {
-		mutex_unlock(&dev->struct_mutex);
+	if (!obj)
 		return -ENOENT;
-	}
-
-	if (!i915_gem_object_is_active(obj))
-		goto out;
 
-	for (i = 0; i < I915_NUM_ENGINES; i++) {
-		struct drm_i915_gem_request *req;
-
-		req = i915_gem_active_get(&obj->last_read[i],
-					  &obj->base.dev->struct_mutex);
-		if (req)
-			requests[n++] = req;
+	active = __I915_BO_ACTIVE(obj);
+	for_each_active(active, idx) {
+		ret = i915_gem_active_wait_unlocked(&obj->last_read[idx], true,
+						    args->timeout_ns >= 0 ? &args->timeout_ns : NULL,
+						    to_rps_client(file));
+		if (ret)
+			break;
 	}
 
-out:
-	i915_gem_object_put(obj);
-	mutex_unlock(&dev->struct_mutex);
-
-	for (i = 0; i < n; i++) {
-		if (ret == 0)
-			ret = i915_wait_request(requests[i], true,
-						args->timeout_ns > 0 ? &args->timeout_ns : NULL,
-						to_rps_client(file));
-		i915_gem_request_put(requests[i]);
-	}
+	i915_gem_object_put_unlocked(obj);
 	return ret;
 }
 
-- 
2.8.1


* [PATCH 11/16] drm/i915: Remove (struct_mutex) locking for busy-ioctl
  2016-08-01 18:22 Put RCU request lookup to use Chris Wilson
                   ` (9 preceding siblings ...)
  2016-08-01 18:22 ` [PATCH 10/16] drm/i915: Remove (struct_mutex) locking for wait-ioctl Chris Wilson
@ 2016-08-01 18:22 ` Chris Wilson
  2016-08-04 10:25   ` Joonas Lahtinen
  2016-08-05  7:05   ` Joonas Lahtinen
  2016-08-01 18:22 ` [PATCH 12/16] drm/i915: Reduce locking inside swfinish ioctl Chris Wilson
                   ` (5 subsequent siblings)
  16 siblings, 2 replies; 61+ messages in thread
From: Chris Wilson @ 2016-08-01 18:22 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel

By applying the same logic as for wait-ioctl, we can query whether a
request has completed without holding struct_mutex. The biggest impact
system-wide is removing the flush_active and the contention it causes.

Testcase: igt/gem_busy
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 110 +++++++++++++++++++++++++++++-----------
 1 file changed, 80 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 43069b05bdd2..f2f70f5ff9f4 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3721,49 +3721,99 @@ i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
 	i915_vma_unpin(i915_gem_obj_to_ggtt_view(obj, view));
 }
 
+static __always_inline unsigned
+__busy_read_flag(const struct drm_i915_gem_request *request)
+{
+	return 0x10000 << request->engine->exec_id;
+}
+
+static __always_inline unsigned int
+__busy_write_flag(const struct drm_i915_gem_request *request)
+{
+	return request->engine->exec_id;
+}
+
+static __always_inline unsigned
+__busy_flag(const struct i915_gem_active *active,
+	    unsigned int (*flag)(const struct drm_i915_gem_request *))
+{
+	struct drm_i915_gem_request *request;
+
+	request = rcu_dereference(active->request);
+	if (!request || i915_gem_request_completed(request))
+		return 0;
+
+	return flag(request);
+}
+
+static inline unsigned
+busy_read_flag(const struct i915_gem_active *active)
+{
+	return __busy_flag(active, __busy_read_flag);
+}
+
+static inline unsigned
+busy_write_flag(const struct i915_gem_active *active)
+{
+	return __busy_flag(active, __busy_write_flag);
+}
+
 int
 i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 		    struct drm_file *file)
 {
 	struct drm_i915_gem_busy *args = data;
 	struct drm_i915_gem_object *obj;
-	int ret;
-
-	ret = i915_mutex_lock_interruptible(dev);
-	if (ret)
-		return ret;
+	unsigned long active;
 
 	obj = i915_gem_object_lookup(file, args->handle);
-	if (!obj) {
-		ret = -ENOENT;
-		goto unlock;
-	}
+	if (!obj)
+		return -ENOENT;
 
-	/* Count all active objects as busy, even if they are currently not used
-	 * by the gpu. Users of this interface expect objects to eventually
-	 * become non-busy without any further actions.
-	 */
 	args->busy = 0;
-	if (i915_gem_object_is_active(obj)) {
-		struct drm_i915_gem_request *req;
-		int i;
+	active = __I915_BO_ACTIVE(obj);
+	if (active) {
+		int idx;
 
-		for (i = 0; i < I915_NUM_ENGINES; i++) {
-			req = i915_gem_active_peek(&obj->last_read[i],
-						   &obj->base.dev->struct_mutex);
-			if (req)
-				args->busy |= 1 << (16 + req->engine->exec_id);
-		}
-		req = i915_gem_active_peek(&obj->last_write,
-					   &obj->base.dev->struct_mutex);
-		if (req)
-			args->busy |= req->engine->exec_id;
+		/* Yes, the lookups are intentionally racy.
+		 *
+		 * Even though we guard the pointer lookup by RCU, that only
+		 * guarantees that the pointer and its contents remain
+		 * dereferencable and does *not* mean that the request we
+		 * have is the same as the one being tracked by the object.
+		 *
+		 * Consider that we lookup the request just as it is being
+		 * retired and freed. We take a local copy of the pointer,
+		 * but before we add its engine into the busy set, the other
+		 * thread reallocates it and assigns it to a task on another
+		 * engine with a fresh and incomplete seqno.
+		 *
+		 * So after we lookup the engine's id, we double check that
+		 * the active request is the same and only then do we add it
+		 * into the busy set.
+		 */
+		rcu_read_lock();
+
+		for_each_active(active, idx)
+			args->busy |= busy_read_flag(&obj->last_read[idx]);
+
+		/* For ABI sanity, we only care that the write engine is in
+		 * the set of read engines. This is ensured by the ordering
+		 * of setting last_read/last_write in i915_vma_move_to_active,
+		 * and then in reverse in retire.
+		 *
+		 * We don't care that the set of active read/write engines
+		 * may change during construction of the result, as it is
+		 * equally liable to change before userspace can inspect
+		 * the result.
+		 */
+		args->busy |= busy_write_flag(&obj->last_write);
+
+		rcu_read_unlock();
 	}
 
-	i915_gem_object_put(obj);
-unlock:
-	mutex_unlock(&dev->struct_mutex);
-	return ret;
+	i915_gem_object_put_unlocked(obj);
+	return 0;
 }
 
 int
-- 
2.8.1


* [PATCH 12/16] drm/i915: Reduce locking inside swfinish ioctl
  2016-08-01 18:22 Put RCU request lookup to use Chris Wilson
                   ` (10 preceding siblings ...)
  2016-08-01 18:22 ` [PATCH 11/16] drm/i915: Remove (struct_mutex) locking for busy-ioctl Chris Wilson
@ 2016-08-01 18:22 ` Chris Wilson
  2016-08-04 10:32   ` Joonas Lahtinen
  2016-08-01 18:22 ` [PATCH 13/16] drm/i915: Remove pinned check from madvise ioctl Chris Wilson
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-01 18:22 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

We only need to take the struct_mutex if the object is pinned to the
display engine and so requires checking for clflush. (The race with
userspace pinning the object to a framebuffer is irrelevant.)

v2: Use access once for compiler hints (or not as it is a bitfield)
v3: READ_ONCE, obj->pin_display is not a bitfield anymore

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f2f70f5ff9f4..51ec5cd1c6ca 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1510,25 +1510,28 @@ i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
 {
 	struct drm_i915_gem_sw_finish *args = data;
 	struct drm_i915_gem_object *obj;
-	int ret = 0;
-
-	ret = i915_mutex_lock_interruptible(dev);
-	if (ret)
-		return ret;
+	int ret;
 
 	obj = i915_gem_object_lookup(file, args->handle);
-	if (!obj) {
-		ret = -ENOENT;
-		goto unlock;
-	}
+	if (!obj)
+		return -ENOENT;
 
 	/* Pinned buffers may be scanout, so flush the cache */
-	if (obj->pin_display)
+	if (READ_ONCE(obj->pin_display)) {
+		ret = i915_mutex_lock_interruptible(dev);
+		if (ret)
+			goto unref;
+
 		i915_gem_object_flush_cpu_write_domain(obj);
 
-	i915_gem_object_put(obj);
-unlock:
-	mutex_unlock(&dev->struct_mutex);
+		i915_gem_object_put(obj);
+		mutex_unlock(&dev->struct_mutex);
+	} else {
+		ret = 0;
+unref:
+		i915_gem_object_put_unlocked(obj);
+	}
+
 	return ret;
 }
 
-- 
2.8.1


* [PATCH 13/16] drm/i915: Remove pinned check from madvise ioctl
  2016-08-01 18:22 Put RCU request lookup to use Chris Wilson
                   ` (11 preceding siblings ...)
  2016-08-01 18:22 ` [PATCH 12/16] drm/i915: Reduce locking inside swfinish ioctl Chris Wilson
@ 2016-08-01 18:22 ` Chris Wilson
  2016-08-04 10:36   ` Joonas Lahtinen
  2016-08-01 18:22 ` [PATCH 14/16] drm/i915: Remove locking for get_tiling Chris Wilson
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-01 18:22 UTC (permalink / raw)
  To: intel-gfx

We don't need to incur the overhead of checking whether the object is
pinned prior to changing its madvise. If the object is pinned, the
madvise will not take effect until it is unpinned and so we cannot free
the pages being pointed at by hardware. Marking a pinned object with
allocated pages as DONTNEED will not trigger any undue warnings. The check
is therefore superfluous, and by removing it we can avoid a linear walk
over all the VMAs the object has.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 51ec5cd1c6ca..4b8a391912bc 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3853,11 +3853,6 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
 		goto unlock;
 	}
 
-	if (i915_gem_obj_is_pinned(obj)) {
-		ret = -EINVAL;
-		goto out;
-	}
-
 	if (obj->pages &&
 	    obj->tiling_mode != I915_TILING_NONE &&
 	    dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
@@ -3876,7 +3871,6 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
 
 	args->retained = obj->madv != __I915_MADV_PURGED;
 
-out:
 	i915_gem_object_put(obj);
 unlock:
 	mutex_unlock(&dev->struct_mutex);
-- 
2.8.1


* [PATCH 14/16] drm/i915: Remove locking for get_tiling
  2016-08-01 18:22 Put RCU request lookup to use Chris Wilson
                   ` (12 preceding siblings ...)
  2016-08-01 18:22 ` [PATCH 13/16] drm/i915: Remove pinned check from madvise ioctl Chris Wilson
@ 2016-08-01 18:22 ` Chris Wilson
  2016-08-04 10:40   ` Joonas Lahtinen
  2016-08-01 18:22 ` [PATCH 15/16] drm/i915: Repack fence tiling mode and stride into a single integer Chris Wilson
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-01 18:22 UTC (permalink / raw)
  To: intel-gfx

Since we are not concerned with userspace racing itself with set-tiling
(the order is indeterminate even if we take a lock), we can safely
read back the single obj->tiling_mode and do the static lookup of
swizzle mode without having to take a lock.

get-tiling is reasonably frequent due to the back-channel passing around
of tiling parameters in DRI2/DRI3.

v2: Make tiling_mode a full unsigned int so that we can trivially use it
with READ_ONCE(). Separating it out into manual control over the flags
field was too noisy for a simple patch. Note that we can use the lower
bits of obj->stride for the tiling mode.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h        | 15 ++++++++-------
 drivers/gpu/drm/i915/i915_gem_tiling.c | 10 +++-------
 2 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d09c87fb207d..88b4fd8cb275 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2184,10 +2184,6 @@ struct drm_i915_gem_object {
 	unsigned int madv:2;
 
 	/**
-	 * Current tiling mode for the object.
-	 */
-	unsigned int tiling_mode:2;
-	/**
 	 * Whether the tiling parameters for the currently associated fence
 	 * register have changed. Note that for the purposes of tracking
 	 * tiling changes we also treat the unfenced register, the register
@@ -2219,6 +2215,14 @@ struct drm_i915_gem_object {
 
 	atomic_t frontbuffer_bits;
 
+	/**
+	 * Current tiling mode for the object.
+	 */
+	unsigned int tiling_mode;
+
+	/** Current tiling stride for the object, if it's tiled. */
+	uint32_t stride;
+
 	unsigned int has_wc_mmap;
 	/** Count of VMA actually bound by this object */
 	unsigned int bind_count;
@@ -2246,9 +2250,6 @@ struct drm_i915_gem_object {
 	struct i915_gem_active last_write;
 	struct i915_gem_active last_fence;
 
-	/** Current tiling stride for the object, if it's tiled. */
-	uint32_t stride;
-
 	/** References from framebuffers, locks out tiling changes. */
 	unsigned long framebuffer_references;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c
index b7f9875f69b4..c0e01333bddf 100644
--- a/drivers/gpu/drm/i915/i915_gem_tiling.c
+++ b/drivers/gpu/drm/i915/i915_gem_tiling.c
@@ -303,10 +303,8 @@ i915_gem_get_tiling(struct drm_device *dev, void *data,
 	if (!obj)
 		return -ENOENT;
 
-	mutex_lock(&dev->struct_mutex);
-
-	args->tiling_mode = obj->tiling_mode;
-	switch (obj->tiling_mode) {
+	args->tiling_mode = READ_ONCE(obj->tiling_mode);
+	switch (args->tiling_mode) {
 	case I915_TILING_X:
 		args->swizzle_mode = dev_priv->mm.bit_6_swizzle_x;
 		break;
@@ -330,8 +328,6 @@ i915_gem_get_tiling(struct drm_device *dev, void *data,
 	if (args->swizzle_mode == I915_BIT_6_SWIZZLE_9_10_17)
 		args->swizzle_mode = I915_BIT_6_SWIZZLE_9_10;
 
-	i915_gem_object_put(obj);
-	mutex_unlock(&dev->struct_mutex);
-
+	i915_gem_object_put_unlocked(obj);
 	return 0;
 }
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 15/16] drm/i915: Repack fence tiling mode and stride into a single integer
  2016-08-01 18:22 Put RCU request lookup to use Chris Wilson
                   ` (13 preceding siblings ...)
  2016-08-01 18:22 ` [PATCH 14/16] drm/i915: Remove locking for get_tiling Chris Wilson
@ 2016-08-01 18:22 ` Chris Wilson
  2016-08-04 11:17   ` Joonas Lahtinen
  2016-08-01 18:22 ` [PATCH 16/16] drm/i915: Assert that the request hasn't been retired Chris Wilson
  2016-08-02  5:00 ` ✗ Ro.CI.BAT: failure for series starting with [01/16] drm/i915: Introduce i915_gem_active_wait_unlocked() Patchwork
  16 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-01 18:22 UTC (permalink / raw)
  To: intel-gfx

In the previous commit, we moved the obj->tiling_mode out of a bitfield
and into its own integer so that we could safely use READ_ONCE(). Let us
now repair some of that damage by sharing the tiling_mode with its
companion, the fence stride.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |  2 +-
 drivers/gpu/drm/i915/i915_drv.h            | 29 ++++++++++++++++------
 drivers/gpu/drm/i915/i915_gem.c            | 20 ++++++++-------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  2 +-
 drivers/gpu/drm/i915/i915_gem_fence.c      | 39 ++++++++++++++++++------------
 drivers/gpu/drm/i915/i915_gem_tiling.c     | 16 ++++++------
 drivers/gpu/drm/i915/i915_gpu_error.c      |  2 +-
 drivers/gpu/drm/i915/intel_display.c       | 34 +++++++++++++-------------
 drivers/gpu/drm/i915/intel_fbc.c           |  2 +-
 drivers/gpu/drm/i915/intel_overlay.c       |  2 +-
 drivers/gpu/drm/i915/intel_pm.c            |  2 +-
 drivers/gpu/drm/i915/intel_sprite.c        | 12 ++++-----
 12 files changed, 94 insertions(+), 68 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 02c077caa282..b7d0933fbbc6 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -101,7 +101,7 @@ static char get_pin_flag(struct drm_i915_gem_object *obj)
 
 static char get_tiling_flag(struct drm_i915_gem_object *obj)
 {
-	switch (obj->tiling_mode) {
+	switch (i915_gem_object_get_tiling(obj)) {
 	default:
 	case I915_TILING_NONE: return ' ';
 	case I915_TILING_X: return 'X';
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 88b4fd8cb275..74f3e08a8a14 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2215,13 +2215,10 @@ struct drm_i915_gem_object {
 
 	atomic_t frontbuffer_bits;
 
-	/**
-	 * Current tiling mode for the object.
-	 */
-	unsigned int tiling_mode;
-
 	/** Current tiling stride for the object, if it's tiled. */
-	uint32_t stride;
+	unsigned int tiling_and_stride;
+#define TILING_MASK 0x3
+#define STRIDE_MASK (~TILING_MASK)
 
 	unsigned int has_wc_mmap;
 	/** Count of VMA actually bound by this object */
@@ -2360,6 +2357,24 @@ i915_gem_object_has_active_engine(const struct drm_i915_gem_object *obj,
 	return obj->flags & BIT(engine + I915_BO_ACTIVE_SHIFT);
 }
 
+static inline unsigned int
+i915_gem_object_get_tiling(struct drm_i915_gem_object *obj)
+{
+	return obj->tiling_and_stride & TILING_MASK;
+}
+
+static inline bool
+i915_gem_object_is_tiled(struct drm_i915_gem_object *obj)
+{
+	return i915_gem_object_get_tiling(obj) != I915_TILING_NONE;
+}
+
+static inline unsigned int
+i915_gem_object_get_stride(struct drm_i915_gem_object *obj)
+{
+	return obj->tiling_and_stride & STRIDE_MASK;
+}
+
 /*
  * Optimised SGL iterator for GEM objects
  */
@@ -3457,7 +3472,7 @@ static inline bool i915_gem_object_needs_bit17_swizzle(struct drm_i915_gem_objec
 	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
 
 	return dev_priv->mm.bit_6_swizzle_x == I915_BIT_6_SWIZZLE_9_10_17 &&
-		obj->tiling_mode != I915_TILING_NONE;
+		i915_gem_object_is_tiled(obj);
 }
 
 /* i915_debugfs.c */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 4b8a391912bc..50472dfb4f30 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1035,7 +1035,7 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,
 	int ret;
 	bool hit_slow_path = false;
 
-	if (obj->tiling_mode != I915_TILING_NONE)
+	if (i915_gem_object_is_tiled(obj))
 		return -EFAULT;
 
 	ret = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
@@ -1669,7 +1669,7 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 
 	/* Use a partial view if the object is bigger than the aperture. */
 	if (obj->base.size >= ggtt->mappable_end &&
-	    obj->tiling_mode == I915_TILING_NONE) {
+	    !i915_gem_object_is_tiled(obj)) {
 		static const unsigned int chunk_size = 256; // 1 MiB
 
 		memset(&view, 0, sizeof(view));
@@ -2187,7 +2187,7 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 	if (i915_gem_object_needs_bit17_swizzle(obj))
 		i915_gem_object_do_bit_17_swizzle(obj);
 
-	if (obj->tiling_mode != I915_TILING_NONE &&
+	if (i915_gem_object_is_tiled(obj) &&
 	    dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES)
 		i915_gem_object_pin_pages(obj);
 
@@ -2929,10 +2929,12 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 
 	size = max(size, vma->size);
 	if (flags & PIN_MAPPABLE)
-		size = i915_gem_get_ggtt_size(dev_priv, size, obj->tiling_mode);
+		size = i915_gem_get_ggtt_size(dev_priv, size,
+					      i915_gem_object_get_tiling(obj));
 
 	min_alignment =
-		i915_gem_get_ggtt_alignment(dev_priv, size, obj->tiling_mode,
+		i915_gem_get_ggtt_alignment(dev_priv, size,
+					    i915_gem_object_get_tiling(obj),
 					    flags & PIN_MAPPABLE);
 	if (alignment == 0)
 		alignment = min_alignment;
@@ -3628,10 +3630,10 @@ void __i915_vma_set_map_and_fenceable(struct i915_vma *vma)
 
 	fence_size = i915_gem_get_ggtt_size(dev_priv,
 					    obj->base.size,
-					    obj->tiling_mode);
+					    i915_gem_object_get_tiling(obj));
 	fence_alignment = i915_gem_get_ggtt_alignment(dev_priv,
 						      obj->base.size,
-						      obj->tiling_mode,
+						      i915_gem_object_get_tiling(obj),
 						      true);
 
 	fenceable = (vma->node.size == fence_size &&
@@ -3854,7 +3856,7 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
 	}
 
 	if (obj->pages &&
-	    obj->tiling_mode != I915_TILING_NONE &&
+	    i915_gem_object_is_tiled(obj) &&
 	    dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
 		if (obj->madv == I915_MADV_WILLNEED)
 			i915_gem_object_unpin_pages(obj);
@@ -4024,7 +4026,7 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
 
 	if (obj->pages && obj->madv == I915_MADV_WILLNEED &&
 	    dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES &&
-	    obj->tiling_mode != I915_TILING_NONE)
+	    i915_gem_object_is_tiled(obj))
 		i915_gem_object_unpin_pages(obj);
 
 	if (WARN_ON(obj->pages_pin_count))
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index f92bfafd1f49..1251810115ef 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -802,7 +802,7 @@ i915_gem_execbuffer_reserve(struct intel_engine_cs *engine,
 			entry->flags &= ~EXEC_OBJECT_NEEDS_FENCE;
 		need_fence =
 			entry->flags & EXEC_OBJECT_NEEDS_FENCE &&
-			obj->tiling_mode != I915_TILING_NONE;
+			i915_gem_object_is_tiled(obj);
 		need_mappable = need_fence || need_reloc_mappable(vma);
 
 		if (entry->flags & EXEC_OBJECT_PINNED)
diff --git a/drivers/gpu/drm/i915/i915_gem_fence.c b/drivers/gpu/drm/i915/i915_gem_fence.c
index 3b462da612ca..9e8173fe2a09 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence.c
@@ -86,20 +86,22 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg,
 
 	if (obj) {
 		u32 size = i915_gem_obj_ggtt_size(obj);
+		unsigned int tiling = i915_gem_object_get_tiling(obj);
+		unsigned int stride = i915_gem_object_get_stride(obj);
 		uint64_t val;
 
 		/* Adjust fence size to match tiled area */
-		if (obj->tiling_mode != I915_TILING_NONE) {
-			uint32_t row_size = obj->stride *
-				(obj->tiling_mode == I915_TILING_Y ? 32 : 8);
+		if (tiling != I915_TILING_NONE) {
+			uint32_t row_size = stride *
+				(tiling == I915_TILING_Y ? 32 : 8);
 			size = (size / row_size) * row_size;
 		}
 
 		val = (uint64_t)((i915_gem_obj_ggtt_offset(obj) + size - 4096) &
 				 0xfffff000) << 32;
 		val |= i915_gem_obj_ggtt_offset(obj) & 0xfffff000;
-		val |= (uint64_t)((obj->stride / 128) - 1) << fence_pitch_shift;
-		if (obj->tiling_mode == I915_TILING_Y)
+		val |= (uint64_t)((stride / 128) - 1) << fence_pitch_shift;
+		if (tiling == I915_TILING_Y)
 			val |= 1 << I965_FENCE_TILING_Y_SHIFT;
 		val |= I965_FENCE_REG_VALID;
 
@@ -122,6 +124,8 @@ static void i915_write_fence_reg(struct drm_device *dev, int reg,
 
 	if (obj) {
 		u32 size = i915_gem_obj_ggtt_size(obj);
+		unsigned int tiling = i915_gem_object_get_tiling(obj);
+		unsigned int stride = i915_gem_object_get_stride(obj);
 		int pitch_val;
 		int tile_width;
 
@@ -131,17 +135,17 @@ static void i915_write_fence_reg(struct drm_device *dev, int reg,
 		     "object 0x%08llx [fenceable? %d] not 1M or pot-size (0x%08x) aligned\n",
 		     i915_gem_obj_ggtt_offset(obj), obj->map_and_fenceable, size);
 
-		if (obj->tiling_mode == I915_TILING_Y && HAS_128_BYTE_Y_TILING(dev))
+		if (tiling == I915_TILING_Y && HAS_128_BYTE_Y_TILING(dev))
 			tile_width = 128;
 		else
 			tile_width = 512;
 
 		/* Note: pitch better be a power of two tile widths */
-		pitch_val = obj->stride / tile_width;
+		pitch_val = stride / tile_width;
 		pitch_val = ffs(pitch_val) - 1;
 
 		val = i915_gem_obj_ggtt_offset(obj);
-		if (obj->tiling_mode == I915_TILING_Y)
+		if (tiling == I915_TILING_Y)
 			val |= 1 << I830_FENCE_TILING_Y_SHIFT;
 		val |= I915_FENCE_SIZE_BITS(size);
 		val |= pitch_val << I830_FENCE_PITCH_SHIFT;
@@ -161,6 +165,8 @@ static void i830_write_fence_reg(struct drm_device *dev, int reg,
 
 	if (obj) {
 		u32 size = i915_gem_obj_ggtt_size(obj);
+		unsigned int tiling = i915_gem_object_get_tiling(obj);
+		unsigned int stride = i915_gem_object_get_stride(obj);
 		uint32_t pitch_val;
 
 		WARN((i915_gem_obj_ggtt_offset(obj) & ~I830_FENCE_START_MASK) ||
@@ -169,11 +175,11 @@ static void i830_write_fence_reg(struct drm_device *dev, int reg,
 		     "object 0x%08llx not 512K or pot-size 0x%08x aligned\n",
 		     i915_gem_obj_ggtt_offset(obj), size);
 
-		pitch_val = obj->stride / 128;
+		pitch_val = stride / 128;
 		pitch_val = ffs(pitch_val) - 1;
 
 		val = i915_gem_obj_ggtt_offset(obj);
-		if (obj->tiling_mode == I915_TILING_Y)
+		if (tiling == I915_TILING_Y)
 			val |= 1 << I830_FENCE_TILING_Y_SHIFT;
 		val |= I830_FENCE_SIZE_BITS(size);
 		val |= pitch_val << I830_FENCE_PITCH_SHIFT;
@@ -201,9 +207,12 @@ static void i915_gem_write_fence(struct drm_device *dev, int reg,
 	if (i915_gem_object_needs_mb(dev_priv->fence_regs[reg].obj))
 		mb();
 
-	WARN(obj && (!obj->stride || !obj->tiling_mode),
+	WARN(obj &&
+	     (!i915_gem_object_get_stride(obj) ||
+	      !i915_gem_object_get_tiling(obj)),
 	     "bogus fence setup with stride: 0x%x, tiling mode: %i\n",
-	     obj->stride, obj->tiling_mode);
+	     i915_gem_object_get_stride(obj),
+	     i915_gem_object_get_tiling(obj));
 
 	if (IS_GEN2(dev))
 		i830_write_fence_reg(dev, reg, obj);
@@ -248,7 +257,7 @@ static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
 
 static inline void i915_gem_object_fence_lost(struct drm_i915_gem_object *obj)
 {
-	if (obj->tiling_mode)
+	if (i915_gem_object_is_tiled(obj))
 		i915_gem_release_mmap(obj);
 
 	/* As we do not have an associated fence register, we will force
@@ -361,7 +370,7 @@ i915_gem_object_get_fence(struct drm_i915_gem_object *obj)
 {
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = to_i915(dev);
-	bool enable = obj->tiling_mode != I915_TILING_NONE;
+	bool enable = i915_gem_object_is_tiled(obj);
 	struct drm_i915_fence_reg *reg;
 	int ret;
 
@@ -477,7 +486,7 @@ void i915_gem_restore_fences(struct drm_device *dev)
 		 */
 		if (reg->obj) {
 			i915_gem_object_update_fence(reg->obj, reg,
-						     reg->obj->tiling_mode);
+						     i915_gem_object_get_tiling(reg->obj));
 		} else {
 			i915_gem_write_fence(dev, i, NULL);
 		}
diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c
index c0e01333bddf..44ac67feff92 100644
--- a/drivers/gpu/drm/i915/i915_gem_tiling.c
+++ b/drivers/gpu/drm/i915/i915_gem_tiling.c
@@ -214,8 +214,8 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
 		}
 	}
 
-	if (args->tiling_mode != obj->tiling_mode ||
-	    args->stride != obj->stride) {
+	if (args->tiling_mode != i915_gem_object_get_tiling(obj) ||
+	    args->stride != i915_gem_object_get_stride(obj)) {
 		/* We need to rebind the object if its current allocation
 		 * no longer meets the alignment restrictions for its new
 		 * tiling mode. Otherwise we can just leave it alone, but
@@ -238,7 +238,7 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
 			    dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
 				if (args->tiling_mode == I915_TILING_NONE)
 					i915_gem_object_unpin_pages(obj);
-				if (obj->tiling_mode == I915_TILING_NONE)
+				if (!i915_gem_object_is_tiled(obj))
 					i915_gem_object_pin_pages(obj);
 			}
 
@@ -247,16 +247,16 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
 							 &dev->struct_mutex) ||
 				obj->fence_reg != I915_FENCE_REG_NONE;
 
-			obj->tiling_mode = args->tiling_mode;
-			obj->stride = args->stride;
+			obj->tiling_and_stride =
+				args->stride | args->tiling_mode;
 
 			/* Force the fence to be reacquired for GTT access */
 			i915_gem_release_mmap(obj);
 		}
 	}
 	/* we have to maintain this existing ABI... */
-	args->stride = obj->stride;
-	args->tiling_mode = obj->tiling_mode;
+	args->stride = i915_gem_object_get_stride(obj);
+	args->tiling_mode = i915_gem_object_get_tiling(obj);
 
 	/* Try to preallocate memory required to save swizzling on put-pages */
 	if (i915_gem_object_needs_bit17_swizzle(obj)) {
@@ -303,7 +303,7 @@ i915_gem_get_tiling(struct drm_device *dev, void *data,
 	if (!obj)
 		return -ENOENT;
 
-	args->tiling_mode = READ_ONCE(obj->tiling_mode);
+	args->tiling_mode = READ_ONCE(obj->tiling_and_stride) & TILING_MASK;
 	switch (args->tiling_mode) {
 	case I915_TILING_X:
 		args->swizzle_mode = dev_priv->mm.bit_6_swizzle_x;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 0edae7d0207d..887577eaf367 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -781,7 +781,7 @@ static void capture_bo(struct drm_i915_error_buffer *err,
 	err->pinned = 0;
 	if (i915_gem_obj_is_pinned(obj))
 		err->pinned = 1;
-	err->tiling = obj->tiling_mode;
+	err->tiling = i915_gem_object_get_tiling(obj);
 	err->dirty = obj->dirty;
 	err->purgeable = obj->madv != I915_MADV_WILLNEED;
 	err->userptr = obj->userptr.mm != NULL;
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 33e815c20e55..9240f3a6508d 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -2465,9 +2465,8 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
 		return false;
 	}
 
-	obj->tiling_mode = plane_config->tiling;
-	if (obj->tiling_mode == I915_TILING_X)
-		obj->stride = fb->pitches[0];
+	if (plane_config->tiling == I915_TILING_X)
+		obj->tiling_and_stride = fb->pitches[0] | I915_TILING_X;
 
 	mode_cmd.pixel_format = fb->pixel_format;
 	mode_cmd.width = fb->width;
@@ -2593,7 +2592,7 @@ valid_fb:
 	intel_state->dst.y2 = plane_state->crtc_y + plane_state->crtc_h;
 
 	obj = intel_fb_obj(fb);
-	if (obj->tiling_mode != I915_TILING_NONE)
+	if (i915_gem_object_is_tiled(obj))
 		dev_priv->preserve_bios_swizzle = true;
 
 	drm_framebuffer_reference(fb);
@@ -2671,8 +2670,7 @@ static void i9xx_update_primary_plane(struct drm_plane *primary,
 		BUG();
 	}
 
-	if (INTEL_INFO(dev)->gen >= 4 &&
-	    obj->tiling_mode != I915_TILING_NONE)
+	if (INTEL_INFO(dev)->gen >= 4 && i915_gem_object_is_tiled(obj))
 		dspcntr |= DISPPLANE_TILED;
 
 	if (IS_G4X(dev))
@@ -2781,7 +2779,7 @@ static void ironlake_update_primary_plane(struct drm_plane *primary,
 		BUG();
 	}
 
-	if (obj->tiling_mode != I915_TILING_NONE)
+	if (i915_gem_object_is_tiled(obj))
 		dspcntr |= DISPPLANE_TILED;
 
 	if (!IS_HASWELL(dev) && !IS_BROADWELL(dev))
@@ -11199,7 +11197,7 @@ static int intel_gen4_queue_flip(struct drm_device *dev,
 			MI_DISPLAY_FLIP_PLANE(intel_crtc->plane));
 	intel_ring_emit(ring, fb->pitches[0]);
 	intel_ring_emit(ring, intel_crtc->flip_work->gtt_offset |
-			obj->tiling_mode);
+			i915_gem_object_get_tiling(obj));
 
 	/* XXX Enabling the panel-fitter across page-flip is so far
 	 * untested on non-native modes, so ignore it for now.
@@ -11231,7 +11229,7 @@ static int intel_gen6_queue_flip(struct drm_device *dev,
 
 	intel_ring_emit(ring, MI_DISPLAY_FLIP |
 			MI_DISPLAY_FLIP_PLANE(intel_crtc->plane));
-	intel_ring_emit(ring, fb->pitches[0] | obj->tiling_mode);
+	intel_ring_emit(ring, fb->pitches[0] | i915_gem_object_get_tiling(obj));
 	intel_ring_emit(ring, intel_crtc->flip_work->gtt_offset);
 
 	/* Contrary to the suggestions in the documentation,
@@ -11334,7 +11332,7 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
 	}
 
 	intel_ring_emit(ring, MI_DISPLAY_FLIP_I915 | plane_bit);
-	intel_ring_emit(ring, (fb->pitches[0] | obj->tiling_mode));
+	intel_ring_emit(ring, fb->pitches[0] | i915_gem_object_get_tiling(obj));
 	intel_ring_emit(ring, intel_crtc->flip_work->gtt_offset);
 	intel_ring_emit(ring, (MI_NOOP));
 
@@ -11441,7 +11439,7 @@ static void ilk_do_mmio_flip(struct intel_crtc *intel_crtc,
 
 	dspcntr = I915_READ(reg);
 
-	if (obj->tiling_mode != I915_TILING_NONE)
+	if (i915_gem_object_is_tiled(obj))
 		dspcntr |= DISPPLANE_TILED;
 	else
 		dspcntr &= ~DISPPLANE_TILED;
@@ -11669,7 +11667,8 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 
 	if (IS_VALLEYVIEW(dev) || IS_CHERRYVIEW(dev)) {
 		engine = &dev_priv->engine[BCS];
-		if (obj->tiling_mode != intel_fb_obj(work->old_fb)->tiling_mode)
+		if (i915_gem_object_get_tiling(obj) !=
+		    i915_gem_object_get_tiling(intel_fb_obj(work->old_fb)))
 			/* vlv: DISPLAY_FLIP fails to change tiling */
 			engine = NULL;
 	} else if (IS_IVYBRIDGE(dev) || IS_HASWELL(dev)) {
@@ -14919,15 +14918,15 @@ static int intel_framebuffer_init(struct drm_device *dev,
 	if (mode_cmd->flags & DRM_MODE_FB_MODIFIERS) {
 		/* Enforce that fb modifier and tiling mode match, but only for
 		 * X-tiled. This is needed for FBC. */
-		if (!!(obj->tiling_mode == I915_TILING_X) !=
+		if (!!(i915_gem_object_get_tiling(obj) == I915_TILING_X) !=
 		    !!(mode_cmd->modifier[0] == I915_FORMAT_MOD_X_TILED)) {
 			DRM_DEBUG("tiling_mode doesn't match fb modifier\n");
 			return -EINVAL;
 		}
 	} else {
-		if (obj->tiling_mode == I915_TILING_X)
+		if (i915_gem_object_get_tiling(obj) == I915_TILING_X)
 			mode_cmd->modifier[0] = I915_FORMAT_MOD_X_TILED;
-		else if (obj->tiling_mode == I915_TILING_Y) {
+		else if (i915_gem_object_get_tiling(obj) == I915_TILING_Y) {
 			DRM_DEBUG("No Y tiling for legacy addfb\n");
 			return -EINVAL;
 		}
@@ -14971,9 +14970,10 @@ static int intel_framebuffer_init(struct drm_device *dev,
 	}
 
 	if (mode_cmd->modifier[0] == I915_FORMAT_MOD_X_TILED &&
-	    mode_cmd->pitches[0] != obj->stride) {
+	    mode_cmd->pitches[0] != i915_gem_object_get_stride(obj)) {
 		DRM_DEBUG("pitch (%d) must match tiling stride (%d)\n",
-			  mode_cmd->pitches[0], obj->stride);
+			  mode_cmd->pitches[0],
+			  i915_gem_object_get_stride(obj));
 		return -EINVAL;
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_fbc.c b/drivers/gpu/drm/i915/intel_fbc.c
index 781e2f5f7cd8..6e2fe695cdc4 100644
--- a/drivers/gpu/drm/i915/intel_fbc.c
+++ b/drivers/gpu/drm/i915/intel_fbc.c
@@ -741,7 +741,7 @@ static void intel_fbc_update_state_cache(struct intel_crtc *crtc,
 	cache->fb.pixel_format = fb->pixel_format;
 	cache->fb.stride = fb->pitches[0];
 	cache->fb.fence_reg = obj->fence_reg;
-	cache->fb.tiling_mode = obj->tiling_mode;
+	cache->fb.tiling_mode = i915_gem_object_get_tiling(obj);
 }
 
 static bool intel_fbc_can_activate(struct intel_crtc *crtc)
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index ad08df49ed48..09418ae64068 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -1128,7 +1128,7 @@ int intel_overlay_put_image_ioctl(struct drm_device *dev, void *data,
 	drm_modeset_lock_all(dev);
 	mutex_lock(&dev->struct_mutex);
 
-	if (new_bo->tiling_mode) {
+	if (i915_gem_object_is_tiled(new_bo)) {
 		DRM_DEBUG_KMS("buffer used for overlay image can not be tiled\n");
 		ret = -EINVAL;
 		goto out_unlock;
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index d8a41c7acb3d..668f282c7faf 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -1580,7 +1580,7 @@ static void i9xx_update_wm(struct drm_crtc *unused_crtc)
 		obj = intel_fb_obj(enabled->primary->state->fb);
 
 		/* self-refresh seems busted with untiled */
-		if (obj->tiling_mode == I915_TILING_NONE)
+		if (!i915_gem_object_is_tiled(obj))
 			enabled = NULL;
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_sprite.c b/drivers/gpu/drm/i915/intel_sprite.c
index 0de935ad01c2..87e62d07c347 100644
--- a/drivers/gpu/drm/i915/intel_sprite.c
+++ b/drivers/gpu/drm/i915/intel_sprite.c
@@ -430,7 +430,7 @@ vlv_update_plane(struct drm_plane *dplane,
 	 */
 	sprctl |= SP_GAMMA_ENABLE;
 
-	if (obj->tiling_mode != I915_TILING_NONE)
+	if (i915_gem_object_is_tiled(obj))
 		sprctl |= SP_TILED;
 
 	/* Sizes are 0 based */
@@ -467,7 +467,7 @@ vlv_update_plane(struct drm_plane *dplane,
 	I915_WRITE(SPSTRIDE(pipe, plane), fb->pitches[0]);
 	I915_WRITE(SPPOS(pipe, plane), (crtc_y << 16) | crtc_x);
 
-	if (obj->tiling_mode != I915_TILING_NONE)
+	if (i915_gem_object_is_tiled(obj))
 		I915_WRITE(SPTILEOFF(pipe, plane), (y << 16) | x);
 	else
 		I915_WRITE(SPLINOFF(pipe, plane), linear_offset);
@@ -552,7 +552,7 @@ ivb_update_plane(struct drm_plane *plane,
 	 */
 	sprctl |= SPRITE_GAMMA_ENABLE;
 
-	if (obj->tiling_mode != I915_TILING_NONE)
+	if (i915_gem_object_is_tiled(obj))
 		sprctl |= SPRITE_TILED;
 
 	if (IS_HASWELL(dev) || IS_BROADWELL(dev))
@@ -606,7 +606,7 @@ ivb_update_plane(struct drm_plane *plane,
 	 * register */
 	if (IS_HASWELL(dev) || IS_BROADWELL(dev))
 		I915_WRITE(SPROFFSET(pipe), (y << 16) | x);
-	else if (obj->tiling_mode != I915_TILING_NONE)
+	else if (i915_gem_object_is_tiled(obj))
 		I915_WRITE(SPRTILEOFF(pipe), (y << 16) | x);
 	else
 		I915_WRITE(SPRLINOFF(pipe), linear_offset);
@@ -693,7 +693,7 @@ ilk_update_plane(struct drm_plane *plane,
 	 */
 	dvscntr |= DVS_GAMMA_ENABLE;
 
-	if (obj->tiling_mode != I915_TILING_NONE)
+	if (i915_gem_object_is_tiled(obj))
 		dvscntr |= DVS_TILED;
 
 	if (IS_GEN6(dev))
@@ -736,7 +736,7 @@ ilk_update_plane(struct drm_plane *plane,
 	I915_WRITE(DVSSTRIDE(pipe), fb->pitches[0]);
 	I915_WRITE(DVSPOS(pipe), (crtc_y << 16) | crtc_x);
 
-	if (obj->tiling_mode != I915_TILING_NONE)
+	if (i915_gem_object_is_tiled(obj))
 		I915_WRITE(DVSTILEOFF(pipe), (y << 16) | x);
 	else
 		I915_WRITE(DVSLINOFF(pipe), linear_offset);
-- 
2.8.1
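The repacking in this patch works because the tiling mode only needs two bits (NONE/X/Y) and any tiled stride is aligned well beyond 4 bytes, leaving the two low bits of the stride free. A userspace sketch of the invariant (names here are illustrative, not the driver's):

```c
#include <assert.h>

/* Sketch of the tiling/stride packing above: the tiling mode (0..2) is
 * stored in the two low bits of the stride. This is lossless only
 * because real strides are aligned far beyond 4 bytes. */
enum { DEMO_TILING_NONE = 0, DEMO_TILING_X = 1, DEMO_TILING_Y = 2 };

#define DEMO_TILING_MASK 0x3u
#define DEMO_STRIDE_MASK (~DEMO_TILING_MASK)

static unsigned int demo_pack(unsigned int stride, unsigned int tiling)
{
	/* the stride's low bits must be clear for the packing to be lossless */
	assert((stride & DEMO_TILING_MASK) == 0);
	return stride | tiling;
}

static unsigned int demo_get_tiling(unsigned int v)
{
	return v & DEMO_TILING_MASK;
}

static unsigned int demo_get_stride(unsigned int v)
{
	return v & DEMO_STRIDE_MASK;
}
```

This mirrors the i915_gem_object_get_tiling()/get_stride() accessors introduced in i915_drv.h, which decode the combined field the same way.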


* [PATCH 16/16] drm/i915: Assert that the request hasn't been retired
  2016-08-01 18:22 Put RCU request lookup to use Chris Wilson
                   ` (14 preceding siblings ...)
  2016-08-01 18:22 ` [PATCH 15/16] drm/i915: Repack fence tiling mode and stride into a single integer Chris Wilson
@ 2016-08-01 18:22 ` Chris Wilson
  2016-08-04 11:18   ` Joonas Lahtinen
  2016-08-02  5:00 ` ✗ Ro.CI.BAT: failure for series starting with [01/16] drm/i915: Introduce i915_gem_active_wait_unlocked() Patchwork
  16 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-01 18:22 UTC (permalink / raw)
  To: intel-gfx

With all callers now not playing tricks with dropping the struct_mutex
between waiting and retiring, we can assert that the request is ready to
be retired.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_request.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 8289d31c0ef5..b79ee5a35079 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -170,7 +170,7 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 	struct i915_gem_active *active, *next;
 
 	trace_i915_gem_request_retire(request);
-	list_del_init(&request->link);
+	list_del(&request->link);
 
 	/* We know the GPU must have read the request to have
 	 * sent us the seqno + interrupt, so use the position
@@ -228,9 +228,7 @@ void i915_gem_request_retire_upto(struct drm_i915_gem_request *req)
 	struct drm_i915_gem_request *tmp;
 
 	lockdep_assert_held(&req->i915->drm.struct_mutex);
-
-	if (list_empty(&req->link))
-		return;
+	GEM_BUG_ON(list_empty(&req->link));
 
 	do {
 		tmp = list_first_entry(&engine->request_list,
-- 
2.8.1
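The switch from list_del_init() to list_del() here, and from an early return to GEM_BUG_ON(), hinges on a property of the kernel's intrusive lists: only list_del_init() relinks the removed node to itself, which is what made the old "if (list_empty(&req->link)) return;" re-retire check work. A minimal userspace model of that distinction (simplified; the real list_del() poisons the pointers rather than leaving them stale):

```c
#include <assert.h>

/* Minimal intrusive list in the style of <linux/list.h>, just enough to
 * show why retiring twice was detectable with list_del_init() but
 * becomes a bug assertion once plain list_del() is used. */
struct demo_list_head { struct demo_list_head *prev, *next; };

static void demo_init(struct demo_list_head *h) { h->prev = h->next = h; }

static void demo_add(struct demo_list_head *n, struct demo_list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

static void demo_del(struct demo_list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	/* like list_del(): the node is NOT relinked to itself */
}

static void demo_del_init(struct demo_list_head *n)
{
	demo_del(n);
	demo_init(n); /* node now reads as "empty", i.e. already retired */
}

static int demo_empty(const struct demo_list_head *h)
{
	return h->next == h;
}
```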


* Re: [PATCH 05/16] drm/i915: Enable i915_gem_wait_for_idle() without holding struct_mutex
  2016-08-01 18:22 ` [PATCH 05/16] drm/i915: Enable i915_gem_wait_for_idle() without holding struct_mutex Chris Wilson
@ 2016-08-01 19:28   ` Chris Wilson
  2016-08-04 11:50     ` Joonas Lahtinen
  0 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-01 19:28 UTC (permalink / raw)
  To: intel-gfx

On Mon, Aug 01, 2016 at 07:22:27PM +0100, Chris Wilson wrote:
> The principal motivation for this was to try and eliminate the
> struct_mutex from i915_gem_suspend - but we still need to hold the mutex
> current for the i915_gem_context_lost(). (The issue there is that there
> may be an indirect lockdep cycle between cpu_hotplug (i.e. suspend) and
> struct_mutex via the stop_machine().) For the moment, enabling last
> request tracking for the engine, allows us to do busyness checking and
> waiting without requiring the struct_mutex - which is useful in its own
> right.

A couple of mistakes here: the stop-engines tweaking is left over from
when this was only aiming at making i915_gem_suspend() lockless (now
broken out to a separate patch), and more importantly,
dev_priv->mm.interruptible is accessed without any controlling lock.
Fixing that means passing around a bool interruptible, and suddenly we
have a bigger patch.
:|
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
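The dev_priv->mm.interruptible problem described above has a standard shape: once the lock that used to serialise reads of a shared flag is gone, a deep helper can no longer read it safely, so the caller samples it once and passes it down by value. A tiny sketch of the difference (names are illustrative, not the driver's):

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the two styles: a helper that reads shared mutable state
 * directly versus one that takes the sampled value as a parameter. */
struct demo_dev { bool interruptible; };

/* racy once the controlling lock is gone: result depends on when the
 * flag is read relative to concurrent writers */
static int demo_wait_racy(struct demo_dev *dev)
{
	return dev->interruptible ? 1 : 0;
}

/* behaviour pinned down by the caller's snapshot of the flag */
static int demo_wait_explicit(bool interruptible)
{
	return interruptible ? 1 : 0;
}
```

The cost Chris notes is real: threading bool interruptible through every call chain touches many signatures, which is what makes the patch grow.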

* ✗ Ro.CI.BAT: failure for series starting with [01/16] drm/i915: Introduce i915_gem_active_wait_unlocked()
  2016-08-01 18:22 Put RCU request lookup to use Chris Wilson
                   ` (15 preceding siblings ...)
  2016-08-01 18:22 ` [PATCH 16/16] drm/i915: Assert that the request hasn't been retired Chris Wilson
@ 2016-08-02  5:00 ` Patchwork
  16 siblings, 0 replies; 61+ messages in thread
From: Patchwork @ 2016-08-02  5:00 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/16] drm/i915: Introduce i915_gem_active_wait_unlocked()
URL   : https://patchwork.freedesktop.org/series/10469/
State : failure

== Summary ==

Applying: drm/i915: Introduce i915_gem_active_wait_unlocked()
Using index info to reconstruct a base tree...
M	drivers/gpu/drm/i915/i915_gem_request.h
Falling back to patching base and 3-way merge...
Auto-merging drivers/gpu/drm/i915/i915_gem_request.h
CONFLICT (content): Merge conflict in drivers/gpu/drm/i915/i915_gem_request.h
error: Failed to merge in the changes.
Patch failed at 0001 drm/i915: Introduce i915_gem_active_wait_unlocked()
The copy of the patch that failed is found in: .git/rebase-apply/patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


* Re: [PATCH 01/16] drm/i915: Introduce i915_gem_active_wait_unlocked()
  2016-08-01 18:22 ` [PATCH 01/16] drm/i915: Introduce i915_gem_active_wait_unlocked() Chris Wilson
@ 2016-08-03 11:41   ` Joonas Lahtinen
  2016-08-03 11:56     ` Chris Wilson
  0 siblings, 1 reply; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-03 11:41 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> It is useful to be able to wait on pending rendering without grabbing
> the struct_mutex. We can do this by using the i915_gem_active_get_rcu()
> primitive to acquire a reference to the pending request without
> requiring struct_mutex, just the RCU read lock, and then call
> i915_wait_request().
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem_request.h | 36 +++++++++++++++++++++++++++++++++
>  1 file changed, 36 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> index 48382ac401fd..d077b023a89f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.h
> +++ b/drivers/gpu/drm/i915/i915_gem_request.h
> @@ -546,6 +546,42 @@ i915_gem_active_wait(const struct i915_gem_active *active, struct mutex *mutex)
>  }
>  
>  /**
> + * i915_gem_active_wait_unlocked - waits until the request is completed
> + * @active - the active request on which to wait
> + * @interruptible - whether the wait can be woken by a userspace signal
> + * @timeout - how long to wait at most
> + * @rps - userspace client to charge for a waitboost
> + *
> + * i915_gem_active_wait_unlocked() waits until the request is completed before
> + * returning, without requiring any locks to be held. Note that it does not
> + * retire any requests before returning.
> + *
> + * This function wraps i915_wait_request(), see it for the full details.
> + *
> + * Returns 0 if successful, or a negative error code.
> + */
> +static inline int
> +i915_gem_active_wait_unlocked(const struct i915_gem_active *active,
> +			      bool interruptible,
> +			      s64 *timeout,
> +			      struct intel_rps_client *rps)
> +{
> +	struct drm_i915_gem_request *request;
> +	int ret = 0;
> +
> +	rcu_read_lock();
> +	request = i915_gem_active_get_rcu(active);
> +	rcu_read_unlock();

This looks weird compared to the usual way RCU is used; the
documentation explicitly states that a pointer obtained under
rcu_read_lock() cannot be dereferenced after rcu_read_unlock(). I'd put
the rcu_read_lock() section inside i915_gem_active_get_rcu() to make
this less confusing.

Regards, Joonas

> +
> +	if (request) {
> +		ret = i915_wait_request(request, interruptible, timeout, rps);
> +		i915_gem_request_put(request);
> +	}
> +
> +	return ret;
> +}
> +
> +/**
>   * i915_gem_active_retire - waits until the request is retired
>   * @active - the active request on which to wait
>   *
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 01/16] drm/i915: Introduce i915_gem_active_wait_unlocked()
  2016-08-03 11:41   ` Joonas Lahtinen
@ 2016-08-03 11:56     ` Chris Wilson
  2016-08-03 12:04       ` Chris Wilson
  0 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-03 11:56 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Wed, Aug 03, 2016 at 02:41:05PM +0300, Joonas Lahtinen wrote:
> On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> > It is useful to be able to wait on pending rendering without grabbing
> > the struct_mutex. We can do this by using the i915_gem_active_get_rcu()
> > primitive to acquire a reference to the pending request without
> > requiring struct_mutex, just the RCU read lock, and then call
> > i915_wait_request().
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_gem_request.h | 36 +++++++++++++++++++++++++++++++++
> >  1 file changed, 36 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> > index 48382ac401fd..d077b023a89f 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_request.h
> > +++ b/drivers/gpu/drm/i915/i915_gem_request.h
> > @@ -546,6 +546,42 @@ i915_gem_active_wait(const struct i915_gem_active *active, struct mutex *mutex)
> >  }
> >  
> >  /**
> > + * i915_gem_active_wait_unlocked - waits until the request is completed
> > + * @active - the active request on which to wait
> > + * @interruptible - whether the wait can be woken by a userspace signal
> > + * @timeout - how long to wait at most
> > + * @rps - userspace client to charge for a waitboost
> > + *
> > + * i915_gem_active_wait_unlocked() waits until the request is completed before
> > + * returning, without requiring any locks to be held. Note that it does not
> > + * retire any requests before returning.
> > + *
> > + * This function wraps i915_wait_request(), see it for the full details.
> > + *
> > + * Returns 0 if successful, or a negative error code.
> > + */
> > +static inline int
> > +i915_gem_active_wait_unlocked(const struct i915_gem_active *active,
> > +			      bool interruptible,
> > +			      s64 *timeout,
> > +			      struct intel_rps_client *rps)
> > +{
> > +	struct drm_i915_gem_request *request;
> > +	int ret = 0;
> > +
> > +	rcu_read_lock();
> > +	request = i915_gem_active_get_rcu(active);
> > +	rcu_read_unlock();
> 
> This looks weird compared to the usual way RCU is used, documentation
> explicitly specifies that stuff obtained under rcu_read_lock() can not
> be referenced after rcu_read_unlock(). I'd put the rcu_read_lock()
> section inside i915_gem_active_get_rcu() to make this less confusing.

That's not true for i915_gem_active_get_rcu(). I even did the
rcu_pointer_handoff() just so that it should be clear that the returned
pointer is now protected outside of RCU.

To be clearer then,

static inline struct drm_i915_gem_request *
i915_gem_active_get_rcu(const struct i915_gem_active *active)
{
	struct drm_i915_gem_request *request;

	rcu_read_lock();
	request = __i915_gem_active_get_rcu(active);
	rcu_read_unlock();

	return request;
}

?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
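The pattern being debated — acquiring a full reference under the RCU read lock, then using the object after rcu_read_unlock() — can be modelled in userspace with an atomic pointer and a refcount. This is a simplified single-threaded sketch with illustrative names; the real i915_gem_active_get_rcu() additionally has to cope with the request being freed and reallocated under it (hence rcu_pointer_handoff() and the recheck), which this model omits:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Model: under the read section, convert the RCU-protected pointer into
 * a refcounted reference. After that, the reference (not RCU) keeps the
 * object alive, so it may legally outlive rcu_read_unlock(). */
struct demo_request {
	atomic_int refcount;
};

static _Atomic(struct demo_request *) demo_active;

static void demo_rcu_read_lock(void)   { /* no-op in this model */ }
static void demo_rcu_read_unlock(void) { /* no-op in this model */ }

static struct demo_request *demo_active_get(void)
{
	struct demo_request *req;

	demo_rcu_read_lock();
	req = atomic_load(&demo_active);
	if (req)
		atomic_fetch_add(&req->refcount, 1); /* handoff: ref, not RCU */
	demo_rcu_read_unlock();

	return req; /* safe to use outside the read section */
}

static void demo_request_put(struct demo_request *req)
{
	if (atomic_fetch_sub(&req->refcount, 1) == 1)
		free(req);
}
```

Wrapping the read section inside the get, as Joonas suggests, keeps callers from ever seeing the bare RCU-protected pointer.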

* Re: [PATCH 01/16] drm/i915: Introduce i915_gem_active_wait_unlocked()
  2016-08-03 11:56     ` Chris Wilson
@ 2016-08-03 12:04       ` Chris Wilson
  2016-08-03 13:30         ` Joonas Lahtinen
  0 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-03 12:04 UTC (permalink / raw)
  To: Joonas Lahtinen, intel-gfx

On Wed, Aug 03, 2016 at 12:56:39PM +0100, Chris Wilson wrote:
> static inline struct drm_i915_gem_request *
> i915_gem_active_get_rcu(const struct i915_gem_active *active)

Alternative name would be i915_gem_active_get_unlocked()
(Starting to get too unwieldy.)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 02/16] drm/i915: Convert non-blocking waits for requests over to using RCU
  2016-08-01 18:22 ` [PATCH 02/16] drm/i915: Convert non-blocking waits for requests over to using RCU Chris Wilson
@ 2016-08-03 13:23   ` Joonas Lahtinen
  2016-08-03 13:36     ` Chris Wilson
  0 siblings, 1 reply; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-03 13:23 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> -	if (!obj) {
> -		ret = -ENOENT;
> -		goto unlock;
> -	}
> +	if (!obj)
> +		return -ENOENT;
>  
>  	/* Try to flush the object off the GPU without holding the lock.
>  	 * We will repeat the flush holding the lock in the normal manner
>  	 * to catch cases where we are gazumped.
>  	 */
> -	ret = i915_gem_object_wait_rendering__nonblocking(obj,
> -							  to_rps_client(file),
> -							  !write_domain);
> +	ret = __unsafe_wait_rendering(obj, to_rps_client(file), !write_domain);
>  	if (ret)
> -		goto unref;
> +		goto out_unlocked;
> +
> +	ret = i915_mutex_lock_interruptible(dev);
> +	if (ret)
> +		goto out_unlocked;
>  
>  	if (read_domains & I915_GEM_DOMAIN_GTT)
>  		ret = i915_gem_object_set_to_gtt_domain(obj, write_domain != 0);
> @@ -1501,11 +1481,13 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
>  	if (write_domain != 0)
>  		intel_fb_obj_invalidate(obj, write_origin(obj, write_domain));
>  
> -unref:
>  	i915_gem_object_put(obj);
> -unlock:
>  	mutex_unlock(&dev->struct_mutex);
>  	return ret;
> +
> +out_unlocked:

This is the sole label, you could call 'err' too.

> +	i915_gem_object_put_unlocked(obj);
> +	return ret;
>  }
>  
>  /**
> @@ -1647,6 +1629,15 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>  	int ret = 0;
>  	bool write = !!(vmf->flags & FAULT_FLAG_WRITE);
>  
> +	/* Try to flush the object off the GPU first without holding the lock.
> +	 * Upon acquiring the lock, we will perform our sanity checks and then
> +	 * repeat the flush holding the lock in the normal manner to catch cases
> +	 * where we are gazumped.
> +	 */
> +	ret = __unsafe_wait_rendering(obj, NULL, !write);
> +	if (ret)
> +		goto err;
> +

Why do you lift this call super early, tracing will be affected at
least.

>  	intel_runtime_pm_get(dev_priv);
>  
>  	/* We don't use vmf->pgoff since that has the fake offset */
> @@ -1655,23 +1646,14 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>  
>  	ret = i915_mutex_lock_interruptible(dev);
>  	if (ret)
> -		goto out;
> +		goto err_rpm;
>  
>  	trace_i915_gem_object_fault(obj, page_offset, true, write);
>  
> -	/* Try to flush the object off the GPU first without holding the lock.
> -	 * Upon reacquiring the lock, we will perform our sanity checks and then
> -	 * repeat the flush holding the lock in the normal manner to catch cases
> -	 * where we are gazumped.
> -	 */
> -	ret = i915_gem_object_wait_rendering__nonblocking(obj, NULL, !write);
> -	if (ret)
> -		goto unlock;
> -
>  	/* Access to snoopable pages through the GTT is incoherent. */
>  	if (obj->cache_level != I915_CACHE_NONE && !HAS_LLC(dev)) {
>  		ret = -EFAULT;
> -		goto unlock;
> +		goto err_unlock;
>  	}
>  
>  	/* Use a partial view if the object is bigger than the aperture. */
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 03/16] drm/i915: Convert non-blocking userptr waits for requests over to using RCU
  2016-08-01 18:22 ` [PATCH 03/16] drm/i915: Convert non-blocking userptr " Chris Wilson
@ 2016-08-03 13:27   ` Joonas Lahtinen
  0 siblings, 0 replies; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-03 13:27 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 01/16] drm/i915: Introduce i915_gem_active_wait_unlocked()
  2016-08-03 12:04       ` Chris Wilson
@ 2016-08-03 13:30         ` Joonas Lahtinen
  2016-08-03 13:43           ` Chris Wilson
  0 siblings, 1 reply; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-03 13:30 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ke, 2016-08-03 at 13:04 +0100, Chris Wilson wrote:
> On Wed, Aug 03, 2016 at 12:56:39PM +0100, Chris Wilson wrote:
> > 
> > static inline struct drm_i915_gem_request *
> > i915_gem_active_get_rcu(const struct i915_gem_active *active)
> Alternative name would be i915_gem_active_get_unlocked()
> (Starting to get too unwieldy.)

It's less confusing.

I assume you intend to extend the rcu_read_lock() section?

Regards, Joonas

> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 02/16] drm/i915: Convert non-blocking waits for requests over to using RCU
  2016-08-03 13:23   ` Joonas Lahtinen
@ 2016-08-03 13:36     ` Chris Wilson
  2016-08-03 13:41       ` Joonas Lahtinen
  0 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-03 13:36 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Wed, Aug 03, 2016 at 04:23:16PM +0300, Joonas Lahtinen wrote:
> On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> >  /**
> > @@ -1647,6 +1629,15 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> >  	int ret = 0;
> >  	bool write = !!(vmf->flags & FAULT_FLAG_WRITE);
> >  
> > +	/* Try to flush the object off the GPU first without holding the lock.
> > +	 * Upon acquiring the lock, we will perform our sanity checks and then
> > +	 * repeat the flush holding the lock in the normal manner to catch cases
> > +	 * where we are gazumped.
> > +	 */
> > +	ret = __unsafe_wait_rendering(obj, NULL, !write);
> > +	if (ret)
> > +		goto err;
> > +
> 
> Why do you lift this call super early, tracing will be affected at
> least.

I was moving it out of the rpm_get, since we don't need to be inside that
runtime barrier. (That rpm get is very interesting btw, but that's for
later!)

The trace can be moved as well.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 02/16] drm/i915: Convert non-blocking waits for requests over to using RCU
  2016-08-03 13:36     ` Chris Wilson
@ 2016-08-03 13:41       ` Joonas Lahtinen
  0 siblings, 0 replies; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-03 13:41 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On ke, 2016-08-03 at 14:36 +0100, Chris Wilson wrote:
> On Wed, Aug 03, 2016 at 04:23:16PM +0300, Joonas Lahtinen wrote:
> > 
> > On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> > > 
> > >  /**
> > > @@ -1647,6 +1629,15 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> > >  	int ret = 0;
> > >  	bool write = !!(vmf->flags & FAULT_FLAG_WRITE);
> > >  
> > > +	/* Try to flush the object off the GPU first without holding the lock.
> > > +	 * Upon acquiring the lock, we will perform our sanity checks and then
> > > +	 * repeat the flush holding the lock in the normal manner to catch cases
> > > +	 * where we are gazumped.
> > > +	 */
> > > +	ret = __unsafe_wait_rendering(obj, NULL, !write);
> > > +	if (ret)
> > > +		goto err;
> > > +
> > Why do you lift this call super early, tracing will be affected at
> > least.
> I was moving it out of the rpm_get, since we don't need to be inside that
> runtime barrier. (That rpm get is very interesting btw, but that's for
> later!)
> 
> The trace can be moved as well.

With the trace moved and the label simplified,

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 01/16] drm/i915: Introduce i915_gem_active_wait_unlocked()
  2016-08-03 13:30         ` Joonas Lahtinen
@ 2016-08-03 13:43           ` Chris Wilson
  2016-08-04 11:51             ` Joonas Lahtinen
  0 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-03 13:43 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Wed, Aug 03, 2016 at 04:30:35PM +0300, Joonas Lahtinen wrote:
> On ke, 2016-08-03 at 13:04 +0100, Chris Wilson wrote:
> > On Wed, Aug 03, 2016 at 12:56:39PM +0100, Chris Wilson wrote:
> > > 
> > > static inline struct drm_i915_gem_request *
> > > i915_gem_active_get_rcu(const struct i915_gem_active *active)
> > Alternative name would be i915_gem_active_get_unlocked()
> > (Starting to get too unwieldy.)
> 
> It's less confusing.
> 
> I assume you intend to extend the rcu_read_lock() section?

Yes, I had intended to. At the moment, the other caller has been removed
because I need the struct_mutex as an execbuf-barrier, so for now there
was no value in using RCU there and I reverted to the simple form.

I still think it is more flexible to allow the caller to control the
locking.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 04/16] drm/i915/userptr: Remove superfluous interruptible=false on waiting
  2016-08-01 18:22 ` [PATCH 04/16] drm/i915/userptr: Remove superfluous interruptible=false on waiting Chris Wilson
@ 2016-08-03 13:43   ` Joonas Lahtinen
  2016-08-03 13:49     ` Chris Wilson
  0 siblings, 1 reply; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-03 13:43 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> Inside the kthread context, we can't be interrupted by signals so
> touching the mm.interruptible flag is pointless and wait-request now
> consumes EIO itself.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem_userptr.c | 9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> index 96ab6161903a..57218cca7e05 100644
> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> @@ -84,16 +84,9 @@ static void cancel_userptr(struct work_struct *work)
>  	obj->userptr.work = NULL;
>  
>  	if (obj->pages != NULL) {
> -		struct drm_i915_private *dev_priv = to_i915(dev);
> -		bool was_interruptible;
> -
> -		was_interruptible = dev_priv->mm.interruptible;
> -		dev_priv->mm.interruptible = false;
> -

GEM_BUG_ON(dev_priv->mm.interruptible) too much paranoia?

> +		/* We are inside a kthread context and can't be interrupted */
>  		WARN_ON(i915_gem_object_unbind(obj));
>  		WARN_ON(i915_gem_object_put_pages(obj));
> -
> -		dev_priv->mm.interruptible = was_interruptible;
>  	}
>  
>  	i915_gem_object_put(obj);
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 04/16] drm/i915/userptr: Remove superfluous interruptible=false on waiting
  2016-08-03 13:43   ` Joonas Lahtinen
@ 2016-08-03 13:49     ` Chris Wilson
  2016-08-04 11:53       ` Joonas Lahtinen
  0 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-03 13:49 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Wed, Aug 03, 2016 at 04:43:38PM +0300, Joonas Lahtinen wrote:
> On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> > Inside the kthread context, we can't be interrupted by signals so
> > touching the mm.interruptible flag is pointless and wait-request now
> > consumes EIO itself.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_gem_userptr.c | 9 +--------
> >  1 file changed, 1 insertion(+), 8 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> > index 96ab6161903a..57218cca7e05 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> > @@ -84,16 +84,9 @@ static void cancel_userptr(struct work_struct *work)
> >  	obj->userptr.work = NULL;
> >  
> >  	if (obj->pages != NULL) {
> > -		struct drm_i915_private *dev_priv = to_i915(dev);
> > -		bool was_interruptible;
> > -
> > -		was_interruptible = dev_priv->mm.interruptible;
> > -		dev_priv->mm.interruptible = false;
> > -
> 
> GEM_BUG_ON(dev_priv->mm.interruptible) too much paranoia?

It's immaterial at this point whether or not that is set. That BUG is
something I've considered but never really found a good home for. The best
idea I have is i915_mutex_lock_interruptible() (though that catches the
victim and not the guilty party; we'd need an i915_mutex_unlock or something).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 06/16] drm/gem/shrinker: Wait before acquiring struct_mutex under oom
  2016-08-01 18:22 ` [PATCH 06/16] drm/gem/shrinker: Wait before acquiring struct_mutex under oom Chris Wilson
@ 2016-08-04  6:46   ` Joonas Lahtinen
  2016-08-04  6:52     ` Chris Wilson
  0 siblings, 1 reply; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-04  6:46 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> We can now wait for the GPU (all engines) to become idle without
> requiring the struct_mutex. Inside the shrinker, we need to currently
> take the struct_mutex in order to purge objects and to purge the objects
> we need the GPU to be idle - causing a stall whilst we hold the
> struct_mutex. We can hide most of that stall by performing the wait
> before taking the struct_mutex and only doing essential waits for
> new rendering on objects to be freed.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem_shrinker.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> index 1341cb55b6f1..43e53e419982 100644
> --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> @@ -326,9 +326,14 @@ i915_gem_shrinker_lock_uninterruptible(struct drm_i915_private *dev_priv,
>  	unsigned long timeout = msecs_to_jiffies(timeout_ms) + 1;
>  
>  	while (!i915_gem_shrinker_lock(&dev_priv->drm, &slu->unlock)) {
> +		if (i915_gem_wait_for_idle(dev_priv) == 0 &&

continue would be much cleaner here, to avoid repeating the lock
calling line? Or how likely is it for engines to be idle but
struct_mutex held for extended period?

Regards, Joonas

> +		    i915_gem_shrinker_lock(&dev_priv->drm, &slu->unlock))
> +			break;
> +
>  		schedule_timeout_killable(1);
>  		if (fatal_signal_pending(current))
>  			return false;
> +
>  		if (--timeout == 0) {
>  			pr_err("Unable to lock GPU to purge memory.\n");
>  			return false;
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 06/16] drm/gem/shrinker: Wait before acquiring struct_mutex under oom
  2016-08-04  6:46   ` Joonas Lahtinen
@ 2016-08-04  6:52     ` Chris Wilson
  0 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2016-08-04  6:52 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Thu, Aug 04, 2016 at 09:46:59AM +0300, Joonas Lahtinen wrote:
> On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> > We can now wait for the GPU (all engines) to become idle without
> > requiring the struct_mutex. Inside the shrinker, we need to currently
> > take the struct_mutex in order to purge objects and to purge the objects
> > we need the GPU to be idle - causing a stall whilst we hold the
> > struct_mutex. We can hide most of that stall by performing the wait
> > before taking the struct_mutex and only doing essential waits for
> > new rendering on objects to be freed.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_gem_shrinker.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > index 1341cb55b6f1..43e53e419982 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > @@ -326,9 +326,14 @@ i915_gem_shrinker_lock_uninterruptible(struct drm_i915_private *dev_priv,
> >  	unsigned long timeout = msecs_to_jiffies(timeout_ms) + 1;
> >  
> >  	while (!i915_gem_shrinker_lock(&dev_priv->drm, &slu->unlock)) {
> > +		if (i915_gem_wait_for_idle(dev_priv) == 0 &&
> 
> continue would be much cleaner here, to avoid repeating the lock
> calling line?

Yes. This was a small change, but adjusting the loop wasn't that much
bigger.

> Or how likely is it for engines to be idle but
> struct_mutex held for extended period?

Currently, quite possible. struct_mutex is the de facto BKL; we may be
causing memory pressure by swapping in gigabytes of an object on another
thread.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 07/16] drm/i915: Tidy generation of the GTT mmap offset
  2016-08-01 18:22 ` [PATCH 07/16] drm/i915: Tidy generation of the GTT mmap offset Chris Wilson
@ 2016-08-04  7:25   ` Joonas Lahtinen
  2016-08-04  7:30     ` Chris Wilson
  0 siblings, 1 reply; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-04  7:25 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> If we make the observation that mmap-offsets are only released when we
> free an object, we can then deduce that the shrinker only creates free
> space in the mmap arena indirectly by flushing the request list and
> freeing expired objects. If we combine this with the lockless
> vma-manager and lockless idling, we can avoid taking our big struct_mutex
> until we need to actually free the requests.
> 
> One side-effect is that we defer the madvise checking until we need the
> pages (i.e. the fault handler). This brings us into line with the other
> delayed checks (and madvise in general).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Don't necessarily agree with all the "== 0", because it's in the
minority but;

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 08/16] drm/i915: Remove unused no-shrinker-steal
  2016-08-01 18:22 ` [PATCH 08/16] drm/i915: Remove unused no-shrinker-steal Chris Wilson
@ 2016-08-04  7:26   ` Joonas Lahtinen
  0 siblings, 0 replies; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-04  7:26 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> After removing the user of this wart, we can remove the wart entirely.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 07/16] drm/i915: Tidy generation of the GTT mmap offset
  2016-08-04  7:25   ` Joonas Lahtinen
@ 2016-08-04  7:30     ` Chris Wilson
  2016-08-04 11:57       ` Joonas Lahtinen
  0 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-04  7:30 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Thu, Aug 04, 2016 at 10:25:42AM +0300, Joonas Lahtinen wrote:
> On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> > If we make the observation that mmap-offsets are only released when we
> > free an object, we can then deduce that the shrinker only creates free
> > space in the mmap arena indirectly by flushing the request list and
> > freeing expired objects. If we combine this with the lockless
> > vma-manager and lockless idling, we can avoid taking our big struct_mutex
> > until we need to actually free the requests.
> > 
> > One side-effect is that we defer the madvise checking until we need the
> > pages (i.e. the fault handler). This brings us into line with the other
> > delayed checks (and madvise in general).
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Don't necessarily agree with all the "== 0", because it's in the
> minority but;

How about if I take this opportunity to start a new life using

int err;
if (!err) ...

?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 09/16] drm/i915: Do a nonblocking wait first in pread/pwrite
  2016-08-01 18:22 ` [PATCH 09/16] drm/i915: Do a nonblocking wait first in pread/pwrite Chris Wilson
@ 2016-08-04  7:53   ` Joonas Lahtinen
  0 siblings, 0 replies; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-04  7:53 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> If we try and read or write to an active request, we first must wait
> upon the GPU completing that request. Let's do that without holding the
> mutex (and so allow someone else to access the GPU whilst we wait). Upn
                                                               Upon --^
> completion, we will reacquire the mutex and only then start the
> operation (i.e. we do not rely on state from before dropping the mutex).
> 

<SNIP>

> @@ -953,25 +953,26 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
>  		       args->size))
>  		return -EFAULT;
>  
> -	ret = i915_mutex_lock_interruptible(dev);
> -	if (ret)
> -		return ret;
> -
>  	obj = i915_gem_object_lookup(file, args->handle);
> -	if (!obj) {
> -		ret = -ENOENT;
> -		goto unlock;
> -	}
> +	if (!obj)
> +		return -ENOENT;
>  
>  	/* Bounds check source.  */
>  	if (args->offset > obj->base.size ||
>  	    args->size > obj->base.size - args->offset) {
>  		ret = -EINVAL;
> -		goto out;
> +		goto out_unlocked;

Again the sole exit path, could be just 'err'

> @@ -1368,27 +1372,28 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
>  			return -EFAULT;
>  	}
>  
> -	intel_runtime_pm_get(dev_priv);
> -
> -	ret = i915_mutex_lock_interruptible(dev);
> -	if (ret)
> -		goto put_rpm;
> -
>  	obj = i915_gem_object_lookup(file, args->handle);
> -	if (!obj) {
> -		ret = -ENOENT;
> -		goto unlock;
> -	}
> +	if (!obj)
> +		return -ENOENT;
>  
>  	/* Bounds check destination. */
>  	if (args->offset > obj->base.size ||
>  	    args->size > obj->base.size - args->offset) {
>  		ret = -EINVAL;
> -		goto out;
> +		goto out_unlocked;

Ditto in this func, just 'err'

>  	}
>  
> -	trace_i915_gem_object_pwrite(obj, args->offset, args->size);
> +	ret = __unsafe_wait_rendering(obj, to_rps_client(file), false);
> +	if (ret)
> +		goto out_unlocked;
> +
> +	intel_runtime_pm_get(dev_priv);
>  

As discussed in IRC, pread_ioctl does not take RPM for the fallback
path, it should.

> +	ret = i915_mutex_lock_interruptible(dev);
> +	if (ret)
> +		goto out_rpm;
> +
> +	trace_i915_gem_object_pwrite(obj, args->offset, args->size);

For tracing, I'm not quite sure if we should emit failed attempts too,
I guess it's a policy decision?

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 10/16] drm/i915: Remove (struct_mutex) locking for wait-ioctl
  2016-08-01 18:22 ` [PATCH 10/16] drm/i915: Remove (struct_mutex) locking for wait-ioctl Chris Wilson
@ 2016-08-04  8:26   ` Joonas Lahtinen
  2016-08-04  8:37     ` Chris Wilson
  2016-08-04 10:02     ` Chris Wilson
  0 siblings, 2 replies; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-04  8:26 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> With a bit of care (and leniency) we can iterate over the object and
> wait for previous rendering to complete with judicial use of atomic
> reference counting. The ABI requires us to ensure that an active object
> is eventually flushed (like the busy-ioctl) which is guaranteed by our
> management of requests (i.e. everything that is submitted to hardware is
> flushed in the same request). All we have to do is ensure that we can
> detect when the requests are complete for reporting when the object is
> idle (without triggering ETIME) - this is handled by
> __i915_wait_request.
> 
> The biggest danger in the code is walking the object without holding any
> locks. We iterate over the set of last requests and carefully grab a
> reference upon it. (If it is changing beneath us, that is the usual
> userspace race and even with locking you get the same indeterminate
> results.) If the request is unreferenced beneath us, it will be disposed
> of into the request cache - so we have to carefully order the retrieval
> of the request pointer with its removal, and to do this we employ RCU on
> the request cache and upon the last_request pointer tracking.
> 
> The impact of this is actually quite small - the return to userspace
> following the wait was already lockless. What we achieve here is
> completing an already finished wait without hitting the struct_mutex,
> our hold is quite short and so we are typically just a victim of
> contention rather than a cause.
> 

The commit message seems a little disconnected from the code, making
the patch sound much more complex than it is. Is it up to date? Or
maybe parts of this explanation would belong to an earlier patch?

> +	active = __I915_BO_ACTIVE(obj);
> +	for_each_active(active, idx) {
> +		ret = i915_gem_active_wait_unlocked(&obj->last_read[idx], true,
> +						    args->timeout_ns >= 0 ? &args->timeout_ns : NULL,
> +						    to_rps_client(file));

Long line.

This and explanation touched up,

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 10/16] drm/i915: Remove (struct_mutex) locking for wait-ioctl
  2016-08-04  8:26   ` Joonas Lahtinen
@ 2016-08-04  8:37     ` Chris Wilson
  2016-08-04 10:02     ` Chris Wilson
  1 sibling, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2016-08-04  8:37 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Thu, Aug 04, 2016 at 11:26:04AM +0300, Joonas Lahtinen wrote:
> On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> > With a bit of care (and leniency) we can iterate over the object and
> > wait for previous rendering to complete with judicial use of atomic
> > reference counting. The ABI requires us to ensure that an active object
> > is eventually flushed (like the busy-ioctl) which is guaranteed by our
> > management of requests (i.e. everything that is submitted to hardware is
> > flushed in the same request). All we have to do is ensure that we can
> > detect when the requests are complete for reporting when the object is
> > idle (without triggering ETIME) - this is handled by
> > __i915_wait_request.
> > 
> > The biggest danger in the code is walking the object without holding any
> > locks. We iterate over the set of last requests and carefully grab a
> > reference upon it. (If it is changing beneath us, that is the usual
> > userspace race and even with locking you get the same indeterminate
> > results.) If the request is unreferenced beneath us, it will be disposed
> > of into the request cache - so we have to carefully order the retrieval
> > of the request pointer with its removal, and to do this we employ RCU on
> > the request cache and upon the last_request pointer tracking.
> > 
> > The impact of this is actually quite small - the return to userspace
> > following the wait was already lockless. What we achieve here is
> > completing an already finished wait without hitting the struct_mutex,
> > our hold is quite short and so we are typically just a victim of
> > contention rather than a cause.
> > 
> 
> The commit message seems a little disconnected from the code, making
> the patch sound much more complex than it is. Is it up to date? Or
> maybe parts of this explanation would belong to an earlier patch?

Well, you caught me. This used to be quite ugly until it was hidden
away behind a new helper, and this code has been through multiple
phases gradually removing the struct_mutex further and further.

With a couple of glaring exceptions (the bits not talking about the
innards but about the top level implementation), it is describing what is
going on under the covers.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 10/16] drm/i915: Remove (struct_mutex) locking for wait-ioctl
  2016-08-04  8:26   ` Joonas Lahtinen
  2016-08-04  8:37     ` Chris Wilson
@ 2016-08-04 10:02     ` Chris Wilson
  2016-08-04 12:00       ` Joonas Lahtinen
  1 sibling, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-04 10:02 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Thu, Aug 04, 2016 at 11:26:04AM +0300, Joonas Lahtinen wrote:
> On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> > With a bit of care (and leniency) we can iterate over the object and
> > wait for previous rendering to complete with judicial use of atomic
> > reference counting. The ABI requires us to ensure that an active object
> > is eventually flushed (like the busy-ioctl) which is guaranteed by our
> > management of requests (i.e. everything that is submitted to hardware is
> > flushed in the same request). All we have to do is ensure that we can
> > detect when the requests are complete for reporting when the object is
> > idle (without triggering ETIME) - this is handled by
> > __i915_wait_request.
> > 
> > The biggest danger in the code is walking the object without holding any
> > locks. We iterate over the set of last requests and carefully grab a
> > reference upon it. (If it is changing beneath us, that is the usual
> > userspace race and even with locking you get the same indeterminate
> > results.) If the request is unreferenced beneath us, it will be disposed
> > of into the request cache - so we have to carefully order the retrieval
> > of the request pointer with its removal, and to do this we employ RCU on
> > the request cache and upon the last_request pointer tracking.
> > 
> > The impact of this is actually quite small - the return to userspace
> > following the wait was already lockless. What we achieve here is
> > completing an already finished wait without hitting the struct_mutex,
> > our hold is quite short and so we are typically just a victim of
> > contention rather than a cause.
> > 
> 
> The commit message seems a little bit disconnected from the code, making
> the patch sound much more complex than it is. Is it up to date? Or
> maybe parts of this explanation would belong to an earlier patch?

"""
With a bit of care (and leniency) we can iterate over the object and
wait for previous rendering to complete with judicial use of atomic
reference counting. The ABI requires us to ensure that an active object
is eventually flushed (like the busy-ioctl) which is guaranteed by our
management of requests (i.e. everything that is submitted to hardware is
flushed in the same request). All we have to do is ensure that we can
detect when the requests are complete for reporting when the object is
idle (without triggering ETIME), locklessly - this is handled by
i915_gem_active_wait_unlocked().

The impact of this is actually quite small - the return to userspace
following the wait was already lockless and so we don't see much
latency improvement upon completing the wait. What we do achieve here is
completing an already finished wait without hitting the struct_mutex,
our hold is quite short and so we are typically just a victim of
contention rather than a cause - but it is still one less contention 
point!
"""
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 11/16] drm/i915: Remove (struct_mutex) locking for busy-ioctl
  2016-08-01 18:22 ` [PATCH 11/16] drm/i915: Remove (struct_mutex) locking for busy-ioctl Chris Wilson
@ 2016-08-04 10:25   ` Joonas Lahtinen
  2016-08-04 10:30     ` Chris Wilson
  2016-08-05  7:05   ` Joonas Lahtinen
  1 sibling, 1 reply; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-04 10:25 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Akash Goel

On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> +static __always_inline unsigned
> +__busy_read_flag(const struct drm_i915_gem_request *request)
> +{
> +	return 0x10000 << request->engine->exec_id;

Duh, this really is our ABI? No BIT(NUM_ENGINES) or something sane?

Add a comment noting this situation.

> +		/* Yes, the lookups are intentionally racy.
> +		 *
> +		 * Even though we guard the pointer lookup by RCU, that only
> +		 * guarantees that the pointer and its contents remain
> +		 * dereferencable and does *not* mean that the request we
> +		 * have is the same as the one being tracked by the object.
> +		 *
> +		 * Consider that we lookup the request just as it is being
> +		 * retired and free. We take a local copy of the pointer,

s/free/freed/

> +		 * but before we add its engine into the busy set, the other
> +		 * thread reallocates it and assigns it to a task on another
> +		 * engine with a fresh and incomplete seqno.
> +		 *
> +		 * So after we lookup the engine's id, we double check that

This double check is nowhere to be seen, time to update this comment
too?

The code itself is quite OK, but the comments mislead my understanding
of the code again.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 11/16] drm/i915: Remove (struct_mutex) locking for busy-ioctl
  2016-08-04 10:25   ` Joonas Lahtinen
@ 2016-08-04 10:30     ` Chris Wilson
  0 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2016-08-04 10:30 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx, Akash Goel

On Thu, Aug 04, 2016 at 01:25:08PM +0300, Joonas Lahtinen wrote:
> On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> > +static __always_inline unsigned
> > +__busy_read_flag(const struct drm_i915_gem_request *request)
> > +{
> > +	return 0x10000 << request->engine->exec_id;
> 
> Duh, this really is our ABI? No BIT(NUM_ENGINES) or something sane?

BIT(NUM_ENGINES) as uabi, you call that sane :)

> Add a comment noting this situation.

Like BUILD_BUG_ON(NUM_ENGINES > 16).
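Purely as a sketch, the compile-time guard could look like the following;
the engine count and the 0x10000 base are taken from this discussion and
should be treated as illustrative, not as the canonical uAPI values:

```c
#include <assert.h>

/* Sketch of the busy-ioctl read flag with the suggested compile-time
 * guard: each engine's exec_id selects one bit in the upper 16 bits
 * of the busy word.  NUM_ENGINES here is an assumed count. */
#define NUM_ENGINES 5

/* Every engine's read flag must fit within the upper 16 bits */
_Static_assert(NUM_ENGINES <= 16, "busy read flags overflow 32 bits");

static unsigned int busy_read_flag(unsigned int exec_id)
{
	return 0x10000u << exec_id;
}
```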

> > +		/* Yes, the lookups are intentionally racy.
> > +		 *
> > +		 * Even though we guard the pointer lookup by RCU, that only
> > +		 * guarantees that the pointer and its contents remain
> > +		 * dereferencable and does *not* mean that the request we
> > +		 * have is the same as the one being tracked by the object.
> > +		 *
> > +		 * Consider that we lookup the request just as it is being
> > +		 * retired and free. We take a local copy of the pointer,
> 
> s/free/freed/
> 
> > +		 * but before we add its engine into the busy set, the other
> > +		 * thread reallocates it and assigns it to a task on another
> > +		 * engine with a fresh and incomplete seqno.
> > +		 *
> > +		 * So after we lookup the engine's id, we double check that
> 
> This double check is nowhere to be seen, time to update this comment
> too?

Actually, I snuck it back in. I took it out thinking that there wasn't
a sufficient read-dependency to guarantee the ordering (but have now
reversed my decision there) after re-reading my comment about why the
check is required.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 12/16] drm/i915: Reduce locking inside swfinish ioctl
  2016-08-01 18:22 ` [PATCH 12/16] drm/i915: Reduce locking inside swfinish ioctl Chris Wilson
@ 2016-08-04 10:32   ` Joonas Lahtinen
  2016-08-04 10:48     ` Chris Wilson
  0 siblings, 1 reply; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-04 10:32 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Daniel Vetter

On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
>  	/* Pinned buffers may be scanout, so flush the cache */
> -	if (obj->pin_display)
> +	if (READ_ONCE(obj->pin_display)) {
> +		ret = i915_mutex_lock_interruptible(dev);
> +		if (ret)
> +			goto unref;

See below.

> +
>  		i915_gem_object_flush_cpu_write_domain(obj);
>  
> -	i915_gem_object_put(obj);
> -unlock:
> -	mutex_unlock(&dev->struct_mutex);
> +		i915_gem_object_put(obj);
> +		mutex_unlock(&dev->struct_mutex);
> +	} else {
> +		ret = 0;
> +unref:

No, nope, nein, ei, njet, inte, nack; this shall not pass.

The most inappropriate use of goto I've seen in a while.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 13/16] drm/i915: Remove pinned check from madvise ioctl
  2016-08-01 18:22 ` [PATCH 13/16] drm/i915: Remove pinned check from madvise ioctl Chris Wilson
@ 2016-08-04 10:36   ` Joonas Lahtinen
  2016-08-04 10:42     ` Chris Wilson
  0 siblings, 1 reply; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-04 10:36 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> We don't need to incur the overhead of checking whether the object is
> pinned prior to changing its madvise. If the object is pinned, the
> madvise will not take effect until it is unpinned and so we cannot free
> the pages being pointed at by hardware. Marking a pinned object with
> allocated pages as DONTNEED will not trigger any undue warnings. The check
> is therefore superfluous, and by removing it we can remove a linear walk
> over all the vma the object has.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 6 ------
>  1 file changed, 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 51ec5cd1c6ca..4b8a391912bc 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3853,11 +3853,6 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
>  		goto unlock;
>  	}
>  
> -	if (i915_gem_obj_is_pinned(obj)) {
> -		ret = -EINVAL;
> -		goto out;
> -	}
> -

Does not this change our ABI too?

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 14/16] drm/i915: Remove locking for get_tiling
  2016-08-01 18:22 ` [PATCH 14/16] drm/i915: Remove locking for get_tiling Chris Wilson
@ 2016-08-04 10:40   ` Joonas Lahtinen
  0 siblings, 0 replies; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-04 10:40 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> Since we are not concerned with userspace racing itself with set-tiling
> (the order is indeterminate even if we take a lock), then we can safely
> read back the single obj->tiling_mode and do the static lookup of
> swizzle mode without having to take a lock.
> 
> get-tiling is reasonably frequent due to the back-channel passing around
> of tiling parameters in DRI2/DRI3.
> 
> v2: Make tiling_mode a full unsigned int so that we can trivially use it
> with READ_ONCE(). Separating it out into manual control over the flags
> field was too noisy for a simple patch. Note, that we can use the lower

s/, / /,s/can/could/

With that,

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 13/16] drm/i915: Remove pinned check from madvise ioctl
  2016-08-04 10:36   ` Joonas Lahtinen
@ 2016-08-04 10:42     ` Chris Wilson
  2016-08-04 11:47       ` Joonas Lahtinen
  0 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-04 10:42 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Thu, Aug 04, 2016 at 01:36:24PM +0300, Joonas Lahtinen wrote:
> On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> > We don't need to incur the overhead of checking whether the object is
> > pinned prior to changing its madvise. If the object is pinned, the
> > madvise will not take effect until it is unpinned and so we cannot free
> > the pages being pointed at by hardware. Marking a pinned object with
> > allocated pages as DONTNEED will not trigger any undue warnings. The check
> > is therefore superfluous, and by removing it we can remove a linear walk
> > over all the vma the object has.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 6 ------
> >  1 file changed, 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 51ec5cd1c6ca..4b8a391912bc 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3853,11 +3853,6 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
> >  		goto unlock;
> >  	}
> >  
> > -	if (i915_gem_obj_is_pinned(obj)) {
> > -		ret = -EINVAL;
> > -		goto out;
> > -	}
> > -
> 
> Does not this change our ABI too?

Yes. It relaxes an immediate failure condition and enforces it later.

Anyone who tried to purge the scanout object now has their BUG hidden.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 12/16] drm/i915: Reduce locking inside swfinish ioctl
  2016-08-04 10:32   ` Joonas Lahtinen
@ 2016-08-04 10:48     ` Chris Wilson
  0 siblings, 0 replies; 61+ messages in thread
From: Chris Wilson @ 2016-08-04 10:48 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: Daniel Vetter, intel-gfx

On Thu, Aug 04, 2016 at 01:32:52PM +0300, Joonas Lahtinen wrote:
> On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> >  	/* Pinned buffers may be scanout, so flush the cache */
> > -	if (obj->pin_display)
> > +	if (READ_ONCE(obj->pin_display)) {
> > +		ret = i915_mutex_lock_interruptible(dev);
> > +		if (ret)
> > +			goto unref;
> 
> See below.
> 
> > +
> >  		i915_gem_object_flush_cpu_write_domain(obj);
> >  
> > -	i915_gem_object_put(obj);
> > -unlock:
> > -	mutex_unlock(&dev->struct_mutex);
> > +		i915_gem_object_put(obj);
> > +		mutex_unlock(&dev->struct_mutex);
> > +	} else {
> > +		ret = 0;
> > +unref:
> 
> No, nope, nein, ei, njet, inte, nack; this shall not pass.
> 
> The most inappropriate use of goto I've seen in a while.

It is correct though :-p

Jump forward a few patches, so I can write:

        obj = i915_gem_object_lookup(file, args->handle);
        if (!obj)
                return -ENOENT;

        /* Pinned buffers may be scanout, so flush the cache */
        ret = 0;
        if (READ_ONCE(obj->pin_display)) {
                ret = i915_mutex_lock_interruptible(dev);
                if (ret == 0) {
                        i915_gem_object_flush_cpu_write_domain(obj);
                        mutex_unlock(&dev->struct_mutex);
                }
        }

        i915_gem_object_put(obj);
        return ret;

Hmm, and the struct_mutex there is to protect the obj->base.write_domain.
Maybe cmpxchg...

Anyway, I don't care about swfinish that much, so the above with
put_unlocked for now.
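
For what the cmpxchg musing might look like, here is a hedged userspace
sketch; the field and flag names are illustrative stand-ins for the real
drm_i915_gem_object layout:

```c
#include <assert.h>
#include <stdatomic.h>

/* Sketch of the "maybe cmpxchg" idea: clear the CPU write-domain bit
 * without holding struct_mutex, by atomically swapping it out.  The
 * DOMAIN_CPU value is assumed for this sketch. */
enum { DOMAIN_CPU = 0x1 };

/* Returns 1 if we cleared the bit (and so would flush), 0 otherwise */
static int flush_cpu_write_domain_lockless(atomic_uint *write_domain)
{
	unsigned int old = atomic_load(write_domain);

	do {
		if (!(old & DOMAIN_CPU))
			return 0;	/* nothing to flush */
	} while (!atomic_compare_exchange_weak(write_domain, &old,
					       old & ~DOMAIN_CPU));

	/* the real driver would clflush and update frontbuffer state here */
	return 1;
}
```

Exactly one of any number of concurrent callers wins the exchange, which
is what would make the flush safe without the mutex.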
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 15/16] drm/i915: Repack fence tiling mode and stride into a single integer
  2016-08-01 18:22 ` [PATCH 15/16] drm/i915: Repack fence tiling mode and stride into a single integer Chris Wilson
@ 2016-08-04 11:17   ` Joonas Lahtinen
  2016-08-04 11:34     ` Chris Wilson
  2016-08-04 11:41     ` Chris Wilson
  0 siblings, 2 replies; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-04 11:17 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2215,13 +2215,10 @@ struct drm_i915_gem_object {
>  
>  	atomic_t frontbuffer_bits;
>  
> -	/**
> -	 * Current tiling mode for the object.
> -	 */
> -	unsigned int tiling_mode;
> -
>  	/** Current tiling stride for the object, if it's tiled. */
> -	uint32_t stride;
> +	unsigned int tiling_and_stride;
> +#define TILING_MASK 0x3

No magic numbers, and add an appropriate BUILD_BUG_ON when stuffing
bits inside the same variable.

#define TILING_MASK (BIT(31 - __builtin_clz(I915_TILING_LAST)) -1)

or something :P

> +#define STRIDE_MASK (~TILING_MASK)
>  
>  	unsigned int has_wc_mmap;
>  	/** Count of VMA actually bound by this object */
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -2465,9 +2465,8 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
>  		return false;
>  	}
>  
> -	obj->tiling_mode = plane_config->tiling;
> -	if (obj->tiling_mode == I915_TILING_X)
> -		obj->stride = fb->pitches[0];
> +	if (plane_config->tiling == I915_TILING_X)
> +		obj->tiling_and_stride = fb->pitches[0] | I915_TILING_X;

This is not equivalent code.

>  	} else if (IS_IVYBRIDGE(dev) || IS_HASWELL(dev)) {
> @@ -14919,15 +14918,15 @@ static int intel_framebuffer_init(struct drm_device *dev,
>  	if (mode_cmd->flags & DRM_MODE_FB_MODIFIERS) {
>  		/* Enforce that fb modifier and tiling mode match, but only for
>  		 * X-tiled. This is needed for FBC. */
> -		if (!!(obj->tiling_mode == I915_TILING_X) !=
> +		if (!!(i915_gem_object_get_tiling(obj) == I915_TILING_X) !=

A note, !! are redundant as == returns bool already. Could remove while
touching.

Regards, Joonas

>  		    !!(mode_cmd->modifier[0] == I915_FORMAT_MOD_X_TILED)) {
>  			DRM_DEBUG("tiling_mode doesn't match fb modifier\n");
>  			return -EINVAL;
>  		}
>  	} else {

-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 16/16] drm/i915: Assert that the request hasn't been retired
  2016-08-01 18:22 ` [PATCH 16/16] drm/i915: Assert that the request hasn't been retired Chris Wilson
@ 2016-08-04 11:18   ` Joonas Lahtinen
  0 siblings, 0 replies; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-04 11:18 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> With all callers now not playing tricks with dropping the struct_mutex
> between waiting and retiring, we can assert that the request is ready to
> be retired.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem_request.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index 8289d31c0ef5..b79ee5a35079 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -170,7 +170,7 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>  	struct i915_gem_active *active, *next;
>  
>  	trace_i915_gem_request_retire(request);
> -	list_del_init(&request->link);
> +	list_del(&request->link);
>  
>  	/* We know the GPU must have read the request to have
>  	 * sent us the seqno + interrupt, so use the position
> @@ -228,9 +228,7 @@ void i915_gem_request_retire_upto(struct drm_i915_gem_request *req)
>  	struct drm_i915_gem_request *tmp;
>  
>  	lockdep_assert_held(&req->i915->drm.struct_mutex);
> -
> -	if (list_empty(&req->link))
> -		return;
> +	GEM_BUG_ON(list_empty(&req->link));
>  
>  	do {
>  		tmp = list_first_entry(&engine->request_list,
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 15/16] drm/i915: Repack fence tiling mode and stride into a single integer
  2016-08-04 11:17   ` Joonas Lahtinen
@ 2016-08-04 11:34     ` Chris Wilson
  2016-08-04 11:36       ` Joonas Lahtinen
  2016-08-04 11:41     ` Chris Wilson
  1 sibling, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-04 11:34 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Thu, Aug 04, 2016 at 02:17:22PM +0300, Joonas Lahtinen wrote:
> > --- a/drivers/gpu/drm/i915/intel_display.c
> > +++ b/drivers/gpu/drm/i915/intel_display.c
> > @@ -2465,9 +2465,8 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
> >  		return false;
> >  	}
> >  
> > -	obj->tiling_mode = plane_config->tiling;
> > -	if (obj->tiling_mode == I915_TILING_X)
> > -		obj->stride = fb->pitches[0];
> > +	if (plane_config->tiling == I915_TILING_X)
> > +		obj->tiling_and_stride = fb->pitches[0] | I915_TILING_X;
> 
> This is not equivalent code.

Setting tiling mode and not stride is illegal, as is setting stride for
I915_TILING_NONE. Initial tiling_and_stride here is 0 (from object
allocation), so the result is the same.

What did I miss?

> >  	} else if (IS_IVYBRIDGE(dev) || IS_HASWELL(dev)) {
> > @@ -14919,15 +14918,15 @@ static int intel_framebuffer_init(struct drm_device *dev,
> >  	if (mode_cmd->flags & DRM_MODE_FB_MODIFIERS) {
> >  		/* Enforce that fb modifier and tiling mode match, but only for
> >  		 * X-tiled. This is needed for FBC. */
> > -		if (!!(obj->tiling_mode == I915_TILING_X) !=
> > +		if (!!(i915_gem_object_get_tiling(obj) == I915_TILING_X) !=
> 
> A note, !! are redundant as == returns bool already. Could remove while
> touching.

Petition Daniel, maybe we can now have I915_TILING_Y support here as
well so the API is not intentionally left incomplete. Grr.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 15/16] drm/i915: Repack fence tiling mode and stride into a single integer
  2016-08-04 11:34     ` Chris Wilson
@ 2016-08-04 11:36       ` Joonas Lahtinen
  0 siblings, 0 replies; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-04 11:36 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On to, 2016-08-04 at 12:34 +0100, Chris Wilson wrote:
> On Thu, Aug 04, 2016 at 02:17:22PM +0300, Joonas Lahtinen wrote:
> > 
> > > 
> > > --- a/drivers/gpu/drm/i915/intel_display.c
> > > +++ b/drivers/gpu/drm/i915/intel_display.c
> > > @@ -2465,9 +2465,8 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
> > >  		return false;
> > >  	}
> > >  
> > > -	obj->tiling_mode = plane_config->tiling;
> > > -	if (obj->tiling_mode == I915_TILING_X)
> > > -		obj->stride = fb->pitches[0];
> > > +	if (plane_config->tiling == I915_TILING_X)
> > > +		obj->tiling_and_stride = fb->pitches[0] | I915_TILING_X;
> > This is not equivalent code.
> Setting tiling mode and not stride is illegal, as is setting stride for
> I915_TILING_NONE. Initial tiling_and_stride here is 0 (from object
> allocation), so the result is the same.
> 
> What did I miss?
> 

Patch splitting. If you mix a couple hundred lines of mechanical
changes with some functional improvements, that's really depressing to
review.

Regards, Joonas

> > 
> > > 
> > >  	} else if (IS_IVYBRIDGE(dev) || IS_HASWELL(dev)) {
> > > @@ -14919,15 +14918,15 @@ static int intel_framebuffer_init(struct drm_device *dev,
> > >  	if (mode_cmd->flags & DRM_MODE_FB_MODIFIERS) {
> > >  		/* Enforce that fb modifier and tiling mode match, but only for
> > >  		 * X-tiled. This is needed for FBC. */
> > > -		if (!!(obj->tiling_mode == I915_TILING_X) !=
> > > +		if (!!(i915_gem_object_get_tiling(obj) == I915_TILING_X) !=
> > A note, !! are redundant as == returns bool already. Could remove while
> > touching.
> Petition Daniel, maybe we can now have I915_TILING_Y support here as
> well so the API is not intentionally left incomplete. Grr.
> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 15/16] drm/i915: Repack fence tiling mode and stride into a single integer
  2016-08-04 11:17   ` Joonas Lahtinen
  2016-08-04 11:34     ` Chris Wilson
@ 2016-08-04 11:41     ` Chris Wilson
  2016-08-04 12:02       ` Joonas Lahtinen
  1 sibling, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-04 11:41 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Thu, Aug 04, 2016 at 02:17:22PM +0300, Joonas Lahtinen wrote:
> On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2215,13 +2215,10 @@ struct drm_i915_gem_object {
> >  
> >  	atomic_t frontbuffer_bits;
> >  
> > -	/**
> > -	 * Current tiling mode for the object.
> > -	 */
> > -	unsigned int tiling_mode;
> > -
> >  	/** Current tiling stride for the object, if it's tiled. */
> > -	uint32_t stride;
> > +	unsigned int tiling_and_stride;
> > +#define TILING_MASK 0x3
> 
> No magic numbers, and add an appropriate BUILD_BUG_ON when stuffing
> bits inside the same variable.
> 
> #define TILING_MASK (BIT(31 - __builtin_clz(I915_TILING_LAST)) -1)

#define FENCE_MINIMUM_STRIDE 128
#define TILING_MASK (FENCE_MINIMUM_STRIDE-1)
#define STRIDE_MASK ~TILING_MASK

Presumably you can't complain about the hw being full of magic numbers?
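
As a sketch of how those masks would pack and unpack (the constants mirror
the snippet above but the helper names are assumptions, not the driver's
actual accessors):

```c
#include <assert.h>

/* Sketch of packing tiling mode and fence stride into one word: the
 * minimum fence stride (128) leaves the low bits free for the tiling
 * mode.  Names and values are illustrative. */
#define FENCE_MINIMUM_STRIDE 128u
#define TILING_MASK (FENCE_MINIMUM_STRIDE - 1)
#define STRIDE_MASK (~TILING_MASK)

enum { TILING_NONE = 0, TILING_X = 1, TILING_Y = 2 };

static unsigned int pack_tiling_and_stride(unsigned int tiling,
					   unsigned int stride)
{
	/* stride must be a multiple of the minimum fence stride (or 0) */
	assert((stride & TILING_MASK) == 0);
	return stride | tiling;
}

static unsigned int get_tiling(unsigned int packed)
{
	return packed & TILING_MASK;
}

static unsigned int get_stride(unsigned int packed)
{
	return packed & STRIDE_MASK;
}
```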
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 13/16] drm/i915: Remove pinned check from madvise ioctl
  2016-08-04 10:42     ` Chris Wilson
@ 2016-08-04 11:47       ` Joonas Lahtinen
  0 siblings, 0 replies; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-04 11:47 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On to, 2016-08-04 at 11:42 +0100, Chris Wilson wrote:
> On Thu, Aug 04, 2016 at 01:36:24PM +0300, Joonas Lahtinen wrote:
> > 
> > On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> > > 
> > > We don't need to incur the overhead of checking whether the object is
> > > pinned prior to changing its madvise. If the object is pinned, the
> > > madvise will not take effect until it is unpinned and so we cannot free
> > > the pages being pointed at by hardware. Marking a pinned object with
> > > allocated pages as DONTNEED will not trigger any undue warnings. The check
> > > is therefore superfluous, and by removing it we can remove a linear walk
> > > over all the vma the object has.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > ---
> > >  drivers/gpu/drm/i915/i915_gem.c | 6 ------
> > >  1 file changed, 6 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > index 51ec5cd1c6ca..4b8a391912bc 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > @@ -3853,11 +3853,6 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
> > >  		goto unlock;
> > >  	}
> > >  
> > > -	if (i915_gem_obj_is_pinned(obj)) {
> > > -		ret = -EINVAL;
> > > -		goto out;
> > > -	}
> > > -
> > Does not this change our ABI too?
> Yes. It relaxes an immediate failure condition and enforces it later.
> 

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

But you might also want to ping Daniel for an A-b?

Regards, Joonas

> Anyone who tried to purge the scanout object now has their BUG hidden.
> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 05/16] drm/i915: Enable i915_gem_wait_for_idle() without holding struct_mutex
  2016-08-01 19:28   ` Chris Wilson
@ 2016-08-04 11:50     ` Joonas Lahtinen
  0 siblings, 0 replies; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-04 11:50 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ma, 2016-08-01 at 20:28 +0100, Chris Wilson wrote:
> On Mon, Aug 01, 2016 at 07:22:27PM +0100, Chris Wilson wrote:
> > 
> > The principal motivation for this was to try and eliminate the
> > struct_mutex from i915_gem_suspend - but we still need to hold the mutex
> > current for the i915_gem_context_lost(). (The issue there is that there
> > may be an indirect lockdep cycle between cpu_hotplug (i.e. suspend) and
> > struct_mutex via the stop_machine().) For the moment, enabling last
> > request tracking for the engine, allows us to do busyness checking and
> > waiting without requiring the struct_mutex - which is useful in its own
> > right.
> A couple of mistakes here: stop-engines tweaking left over from when
> this was only aiming at making i915_gem_suspend() lockless (now broken
> out to a separate patch) and, more importantly, accessing
> dev_priv->mm.interruptible without any controlling lock. That requires
> passing around a bool interruptible, and suddenly we have a bigger patch.
> :|

Not sure what to do with this information, will you send a new
revision?

Regards, Joonas

> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 01/16] drm/i915: Introduce i915_gem_active_wait_unlocked()
  2016-08-03 13:43           ` Chris Wilson
@ 2016-08-04 11:51             ` Joonas Lahtinen
  0 siblings, 0 replies; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-04 11:51 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On ke, 2016-08-03 at 14:43 +0100, Chris Wilson wrote:
> On Wed, Aug 03, 2016 at 04:30:35PM +0300, Joonas Lahtinen wrote:
> > 
> > On ke, 2016-08-03 at 13:04 +0100, Chris Wilson wrote:
> > > 
> > > On Wed, Aug 03, 2016 at 12:56:39PM +0100, Chris Wilson wrote:
> > > > 
> > > > 
> > > > static inline struct drm_i915_gem_request *
> > > > i915_gem_active_get_rcu(const struct i915_gem_active *active)
> > > Alternative name would be i915_gem_active_get_unlocked()
> > > (Starting to get too unwieldy.)
> > It's less confusing.
> > 
> > I assume you intend to extend the rcu_read_lock() section?
> Yes, I had intended to. At the moment, the other caller has been removed
> because I need the struct_mutex as an execbuf barrier, so for now there
> was no value in using RCU there and I reverted to the simple form.
> 
> I still think it is more flexible to allow the caller to control the
> locking.

Yes, if there's something else to do too. But it's not a biggie with
the function renamed;

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 04/16] drm/i915/userptr: Remove superfluous interruptible=false on waiting
  2016-08-03 13:49     ` Chris Wilson
@ 2016-08-04 11:53       ` Joonas Lahtinen
  0 siblings, 0 replies; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-04 11:53 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On ke, 2016-08-03 at 14:49 +0100, Chris Wilson wrote:
> On Wed, Aug 03, 2016 at 04:43:38PM +0300, Joonas Lahtinen wrote:
> > 
> > On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> > > 
> > > Inside the kthread context, we can't be interrupted by signals so
> > > touching the mm.interruptible flag is pointless and wait-request now
> > > consumes EIO itself.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > ---
> > >  drivers/gpu/drm/i915/i915_gem_userptr.c | 9 +--------
> > >  1 file changed, 1 insertion(+), 8 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> > > index 96ab6161903a..57218cca7e05 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> > > @@ -84,16 +84,9 @@ static void cancel_userptr(struct work_struct *work)
> > >  	obj->userptr.work = NULL;
> > >  
> > >  	if (obj->pages != NULL) {
> > > -		struct drm_i915_private *dev_priv = to_i915(dev);
> > > -		bool was_interruptible;
> > > -
> > > -		was_interruptible = dev_priv->mm.interruptible;
> > > -		dev_priv->mm.interruptible = false;
> > > -
> > GEM_BUG_ON(dev_priv->mm.interruptible) too much paranoia?
> It's immaterial at this point whether or not that is set. That BUG is
> something I've considered but never really found a good home. Best idea
> I have is i915_mutex_lock_interruptible() (still it catches the victim
> and not the guilty party, we need i915_mutex_unlock or something).

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 07/16] drm/i915: Tidy generation of the GTT mmap offset
  2016-08-04  7:30     ` Chris Wilson
@ 2016-08-04 11:57       ` Joonas Lahtinen
  0 siblings, 0 replies; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-04 11:57 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On to, 2016-08-04 at 08:30 +0100, Chris Wilson wrote:
> On Thu, Aug 04, 2016 at 10:25:42AM +0300, Joonas Lahtinen wrote:
> > 
> > On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> > > 
> > > If we make the observation that mmap-offsets are only released when we
> > > free an object, we can then deduce that the shrinker only creates free
> > > space in the mmap arena indirectly by flushing the request list and
> > > freeing expired objects. If we combine this with the lockless
> > > vma-manager and lockless idling, we can avoid taking our big struct_mutex
> > > until we need to actually free the requests.
> > > 
> > > One side-effect is that we defer the madvise checking until we need the
> > > pages (i.e. the fault handler). This brings us into line with the other
> > > delayed checks (and madvise in general).
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Don't necessarily agree with all the "== 0", because it's in the
> > minority but;
> How about if I take this opportunity to start a new life using
> 
> int err;
> if (!err) ...

"int ret;" is the de facto standard, and so is if (!ret);

$ git grep '!ret\b' | wc -l
3826
$ git grep 'ret == 0\b' | wc -l
1869
$ git grep '!err\b' | wc -l
2064

So I'm inclined towards ret and !ret...

Regards, Joonas

> 
> ?
> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 10/16] drm/i915: Remove (struct_mutex) locking for wait-ioctl
  2016-08-04 10:02     ` Chris Wilson
@ 2016-08-04 12:00       ` Joonas Lahtinen
  0 siblings, 0 replies; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-04 12:00 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On to, 2016-08-04 at 11:02 +0100, Chris Wilson wrote:
> On Thu, Aug 04, 2016 at 11:26:04AM +0300, Joonas Lahtinen wrote:
> > 
> > On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> > > 
> > > With a bit of care (and leniency) we can iterate over the object and
> > > wait for previous rendering to complete with judicial use of atomic
> > > reference counting. The ABI requires us to ensure that an active object
> > > is eventually flushed (like the busy-ioctl) which is guaranteed by our
> > > management of requests (i.e. everything that is submitted to hardware is
> > > flushed in the same request). All we have to do is ensure that we can
> > > detect when the requests are complete for reporting when the object is
> > > idle (without triggering ETIME) - this is handled by
> > > __i915_wait_request.
> > > 
> > > The biggest danger in the code is walking the object without holding any
> > > locks. We iterate over the set of last requests and carefully grab a
> > > reference upon it. (If it is changing beneath us, that is the usual
> > > userspace race and even with locking you get the same indeterminate
> > > results.) If the request is unreferenced beneath us, it will be disposed
> > > of into the request cache - so we have to carefully order the retrieval
> > > of the request pointer with its removal, and to do this we employ RCU on
> > > the request cache and upon the last_request pointer tracking.
> > > 
> > > The impact of this is actually quite small - the return to userspace
> > > following the wait was already lockless. What we achieve here is
> > > completing an already finished wait without hitting the struct_mutex,
> > > our hold is quite short and so we are typically just a victim of
> > > contention rather than a cause.
> > > 
> > The commit message seems a little bit disconnected from the code, making
> > the patch sound much more complex than it is. Is it up to date? Or
> > maybe parts of this explanation would belong in an earlier patch?
> """
> With a bit of care (and leniency) we can iterate over the object and
> wait for previous rendering to complete with judicial use of atomic
> reference counting. The ABI requires us to ensure that an active object
> is eventually flushed (like the busy-ioctl) which is guaranteed by our
> management of requests (i.e. everything that is submitted to hardware is
> flushed in the same request). All we have to do is ensure that we can
> detect when the requests are complete for reporting when the object is
> idle (without triggering ETIME), locklessly - this is handled by
> i915_gem_active_wait_unlocked().
> 
> The impact of this is actually quite small - the return to userspace
> following the wait was already lockless and so we don't see much gain in 
> latency improvement upon completing the wait. What we do achieve here is
> completing an already finished wait without hitting the struct_mutex,
> our hold is quite short and so we are typically just a victim of
> contention rather than a cause - but it is still one less contention 
> point!
> """

Better, although perhaps the explanation could go above
i915_gem_active_wait_unlocked() as a comment?

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 15/16] drm/i915: Repack fence tiling mode and stride into a single integer
  2016-08-04 11:41     ` Chris Wilson
@ 2016-08-04 12:02       ` Joonas Lahtinen
  0 siblings, 0 replies; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-04 12:02 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On to, 2016-08-04 at 12:41 +0100, Chris Wilson wrote:
> On Thu, Aug 04, 2016 at 02:17:22PM +0300, Joonas Lahtinen wrote:
> > 
> > On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> > > 
> > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > @@ -2215,13 +2215,10 @@ struct drm_i915_gem_object {
> > >  
> > >  	atomic_t frontbuffer_bits;
> > >  
> > > -	/**
> > > -	 * Current tiling mode for the object.
> > > -	 */
> > > -	unsigned int tiling_mode;
> > > -
> > >  	/** Current tiling stride for the object, if it's tiled. */
> > > -	uint32_t stride;
> > > +	unsigned int tiling_and_stride;
> > > +#define TILING_MASK 0x3
> > No magics and add appropriate BUILD_BUG_ON too when stuffing bits
> > inside same variable.
> > 
> > #define TILING_MASK (BIT(31 - __builtin_clz(I915_TILING_LAST)) -1)
> #define FENCE_MINIMUM_STRIDE 128
> #define TILING_MASK (FENCE_MINIMUM_STRIDE-1)
> #define STRIDE_MASK ~TILING_MASK
> 
> Presumably you can't complain about the hw being full of magic numbers?

I can live with that, just stuff the BUILD_BUG_ON in there too and
it's;

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 11/16] drm/i915: Remove (struct_mutex) locking for busy-ioctl
  2016-08-01 18:22 ` [PATCH 11/16] drm/i915: Remove (struct_mutex) locking for busy-ioctl Chris Wilson
  2016-08-04 10:25   ` Joonas Lahtinen
@ 2016-08-05  7:05   ` Joonas Lahtinen
  2016-08-05  7:34     ` Chris Wilson
  1 sibling, 1 reply; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-05  7:05 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Akash Goel

On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> By applying the same logic as for wait-ioctl, we can query whether a
> request has completed without holding struct_mutex. The biggest impact
> system-wide is removing the flush_active and the contention that causes.
> 
> Testcase: igt/gem_busy
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Akash Goel <akash.goel@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 110 +++++++++++++++++++++++++++++-----------
>  1 file changed, 80 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 43069b05bdd2..f2f70f5ff9f4 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3721,49 +3721,99 @@ i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
>  	i915_vma_unpin(i915_gem_obj_to_ggtt_view(obj, view));
>  }
>  
> +static __always_inline unsigned
> +__busy_read_flag(const struct drm_i915_gem_request *request)
> +{
> +	return 0x10000 << request->engine->exec_id;
> +}
> +
> +static __always_inline unsigned int
> +__busy_write_flag(const struct drm_i915_gem_request *request)
> +{
> +	return request->engine->exec_id;

Just realized (to my horror) this is not a flag, it's a bare ID, so
better not call the function _flag, but rather _id?

> +}
> +
> +static __always_inline unsigned
> +__busy_flag(const struct i915_gem_active *active,
> +	    unsigned int (*flag)(const struct drm_i915_gem_request *))
> +{
> +	struct drm_i915_gem_request *request;
> +
> +	request = rcu_dereference(active->request);
> +	if (!request || i915_gem_request_completed(request))
> +		return 0;
> +
> +	return flag(request);
> +}
> +
> +static inline unsigned
> +busy_read_flag(const struct i915_gem_active *active)
> +{
> +	return __busy_flag(active, __busy_read_flag);
> +}
> +
> +static inline unsigned
> +busy_write_flag(const struct i915_gem_active *active)
> +{
> +	return __busy_flag(active, __busy_write_flag);
> +}
> +
>  int
>  i915_gem_busy_ioctl(struct drm_device *dev, void *data,
>  		    struct drm_file *file)
>  {
>  	struct drm_i915_gem_busy *args = data;
>  	struct drm_i915_gem_object *obj;
> -	int ret;
> -
> -	ret = i915_mutex_lock_interruptible(dev);
> -	if (ret)
> -		return ret;
> +	unsigned long active;
>  
>  	obj = i915_gem_object_lookup(file, args->handle);
> -	if (!obj) {
> -		ret = -ENOENT;
> -		goto unlock;
> -	}
> +	if (!obj)
> +		return -ENOENT;
>  
> -	/* Count all active objects as busy, even if they are currently not used
> -	 * by the gpu. Users of this interface expect objects to eventually
> -	 * become non-busy without any further actions.
> -	 */
>  	args->busy = 0;
> -	if (i915_gem_object_is_active(obj)) {
> -		struct drm_i915_gem_request *req;
> -		int i;
> +	active = __I915_BO_ACTIVE(obj);
> +	if (active) {
> +		int idx;
>  
> -		for (i = 0; i < I915_NUM_ENGINES; i++) {
> -			req = i915_gem_active_peek(&obj->last_read[i],
> -						   &obj->base.dev->struct_mutex);
> -			if (req)
> -				args->busy |= 1 << (16 + req->engine->exec_id);
> -		}
> -		req = i915_gem_active_peek(&obj->last_write,
> -					   &obj->base.dev->struct_mutex);
> -		if (req)
> -			args->busy |= req->engine->exec_id;
> +		/* Yes, the lookups are intentionally racy.
> +		 *
> +		 * Even though we guard the pointer lookup by RCU, that only
> +		 * guarantees that the pointer and its contents remain
> +		 * dereferencable and does *not* mean that the request we
> +		 * have is the same as the one being tracked by the object.
> +		 *
> +		 * Consider that we lookup the request just as it is being
> +		 * retired and free. We take a local copy of the pointer,

still s/free/freed/

> +		 * but before we add its engine into the busy set, the other
> +		 * thread reallocates it and assigns it to a task on another
> +		 * engine with a fresh and incomplete seqno.
> +		 *
> +		 * So after we lookup the engine's id, we double check that
> +		 * the active request is the same and only then do we add it
> +		 * into the busy set.
> +		 */
> +		rcu_read_lock();
> +
> +		for_each_active(active, idx)
> +			args->busy |= busy_read_flag(&obj->last_read[idx]);

So you mean this is a double check against __I915_BO_ACTIVE, right?

We're getting there, though. With above fixed;

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 11/16] drm/i915: Remove (struct_mutex) locking for busy-ioctl
  2016-08-05  7:05   ` Joonas Lahtinen
@ 2016-08-05  7:34     ` Chris Wilson
  2016-08-05  8:06       ` Joonas Lahtinen
  0 siblings, 1 reply; 61+ messages in thread
From: Chris Wilson @ 2016-08-05  7:34 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx, Akash Goel

On Fri, Aug 05, 2016 at 10:05:38AM +0300, Joonas Lahtinen wrote:
> On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> > By applying the same logic as for wait-ioctl, we can query whether a
> > request has completed without holding struct_mutex. The biggest impact
> > system-wide is removing the flush_active and the contention that causes.
> > 
> > Testcase: igt/gem_busy
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Akash Goel <akash.goel@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 110 +++++++++++++++++++++++++++++-----------
> >  1 file changed, 80 insertions(+), 30 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 43069b05bdd2..f2f70f5ff9f4 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3721,49 +3721,99 @@ i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
> >  	i915_vma_unpin(i915_gem_obj_to_ggtt_view(obj, view));
> >  }
> >  
> > +static __always_inline unsigned
> > +__busy_read_flag(const struct drm_i915_gem_request *request)
> > +{
> > +	return 0x10000 << request->engine->exec_id;
> > +}
> > +
> > +static __always_inline unsigned int
> > +__busy_write_flag(const struct drm_i915_gem_request *request)
> > +{
> > +	return request->engine->exec_id;
> 
> Just realized (to my horror) this is not a flag, it's a bare ID, so
> better not call the function _flag, but rather _id?

Bah.

__busy_write_id
__busy_read_flag
__busy_set_if_active

> busy_set_active_write_id() { __busy_set_if_active(ptr, __busy_write_id); }
busy_set_active_read_flag() { __busy_set_if_active(ptr, __busy_read_flag); }

> > +		 * but before we add its engine into the busy set, the other
> > +		 * thread reallocates it and assigns it to a task on another
> > +		 * engine with a fresh and incomplete seqno.
> > +		 *
> > +		 * So after we lookup the engine's id, we double check that
> > +		 * the active request is the same and only then do we add it
> > +		 * into the busy set.
> > +		 */
> > +		rcu_read_lock();
> > +
> > +		for_each_active(active, idx)
> > +			args->busy |= busy_read_flag(&obj->last_read[idx]);
> 
> So you mean this is double check against __I915_BO_ACTIVE, right?

Yes. The ABI guarantees forward progress but __I915_BO_ACTIVE itself
does not, so we confirm each of the active requests with the hardware.

Will add.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 11/16] drm/i915: Remove (struct_mutex) locking for busy-ioctl
  2016-08-05  7:34     ` Chris Wilson
@ 2016-08-05  8:06       ` Joonas Lahtinen
  0 siblings, 0 replies; 61+ messages in thread
From: Joonas Lahtinen @ 2016-08-05  8:06 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, Akash Goel

On pe, 2016-08-05 at 08:34 +0100, Chris Wilson wrote:
> On Fri, Aug 05, 2016 at 10:05:38AM +0300, Joonas Lahtinen wrote:
> > 
> > On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> > > 
> > > By applying the same logic as for wait-ioctl, we can query whether a
> > > request has completed without holding struct_mutex. The biggest impact
> > > system-wide is removing the flush_active and the contention that causes.
> > > 
> > > Testcase: igt/gem_busy
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Akash Goel <akash.goel@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/i915_gem.c | 110 +++++++++++++++++++++++++++++-----------
> > >  1 file changed, 80 insertions(+), 30 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > index 43069b05bdd2..f2f70f5ff9f4 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > @@ -3721,49 +3721,99 @@ i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
> > >  	i915_vma_unpin(i915_gem_obj_to_ggtt_view(obj, view));
> > >  }
> > >  
> > > +static __always_inline unsigned
> > > +__busy_read_flag(const struct drm_i915_gem_request *request)
> > > +{
> > > +	return 0x10000 << request->engine->exec_id;
> > > +}
> > > +
> > > +static __always_inline unsigned int
> > > +__busy_write_flag(const struct drm_i915_gem_request *request)
> > > +{
> > > +	return request->engine->exec_id;
> > Just realized (to my horror) this is not a flag, it's a bare ID, so
> > better not call the function _flag, but rather _id?
> Bah.
> 
> __busy_write_id
> __busy_read_flag
> __busy_set_if_active
> 
> > busy_set_active_write_id() { __busy_set_if_active(ptr, __busy_write_id); }
> busy_set_active_read_flag() { __busy_set_if_active(ptr, __busy_read_flag); }
> 

This would be OK,

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

> > 
> > > 
> > > +		 * but before we add its engine into the busy set, the other
> > > +		 * thread reallocates it and assigns it to a task on another
> > > +		 * engine with a fresh and incomplete seqno.
> > > +		 *
> > > +		 * So after we lookup the engine's id, we double check that
> > > +		 * the active request is the same and only then do we add it
> > > +		 * into the busy set.
> > > +		 */
> > > +		rcu_read_lock();
> > > +
> > > +		for_each_active(active, idx)
> > > +			args->busy |= busy_read_flag(&obj->last_read[idx]);
> > So you mean this is double check against __I915_BO_ACTIVE, right?
> Yes. The ABI guarantees forward progress but __I915_BO_ACTIVE itself
> does not, so we confirm each of the active requests with the hardware.
> 
> Will add.
> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

end of thread, other threads:[~2016-08-05  8:06 UTC | newest]

Thread overview: 61+ messages
2016-08-01 18:22 Put RCU request lookup to use Chris Wilson
2016-08-01 18:22 ` [PATCH 01/16] drm/i915: Introduce i915_gem_active_wait_unlocked() Chris Wilson
2016-08-03 11:41   ` Joonas Lahtinen
2016-08-03 11:56     ` Chris Wilson
2016-08-03 12:04       ` Chris Wilson
2016-08-03 13:30         ` Joonas Lahtinen
2016-08-03 13:43           ` Chris Wilson
2016-08-04 11:51             ` Joonas Lahtinen
2016-08-01 18:22 ` [PATCH 02/16] drm/i915: Convert non-blocking waits for requests over to using RCU Chris Wilson
2016-08-03 13:23   ` Joonas Lahtinen
2016-08-03 13:36     ` Chris Wilson
2016-08-03 13:41       ` Joonas Lahtinen
2016-08-01 18:22 ` [PATCH 03/16] drm/i915: Convert non-blocking userptr " Chris Wilson
2016-08-03 13:27   ` Joonas Lahtinen
2016-08-01 18:22 ` [PATCH 04/16] drm/i915/userptr: Remove superfluous interruptible=false on waiting Chris Wilson
2016-08-03 13:43   ` Joonas Lahtinen
2016-08-03 13:49     ` Chris Wilson
2016-08-04 11:53       ` Joonas Lahtinen
2016-08-01 18:22 ` [PATCH 05/16] drm/i915: Enable i915_gem_wait_for_idle() without holding struct_mutex Chris Wilson
2016-08-01 19:28   ` Chris Wilson
2016-08-04 11:50     ` Joonas Lahtinen
2016-08-01 18:22 ` [PATCH 06/16] drm/gem/shrinker: Wait before acquiring struct_mutex under oom Chris Wilson
2016-08-04  6:46   ` Joonas Lahtinen
2016-08-04  6:52     ` Chris Wilson
2016-08-01 18:22 ` [PATCH 07/16] drm/i915: Tidy generation of the GTT mmap offset Chris Wilson
2016-08-04  7:25   ` Joonas Lahtinen
2016-08-04  7:30     ` Chris Wilson
2016-08-04 11:57       ` Joonas Lahtinen
2016-08-01 18:22 ` [PATCH 08/16] drm/i915: Remove unused no-shrinker-steal Chris Wilson
2016-08-04  7:26   ` Joonas Lahtinen
2016-08-01 18:22 ` [PATCH 09/16] drm/i915: Do a nonblocking wait first in pread/pwrite Chris Wilson
2016-08-04  7:53   ` Joonas Lahtinen
2016-08-01 18:22 ` [PATCH 10/16] drm/i915: Remove (struct_mutex) locking for wait-ioctl Chris Wilson
2016-08-04  8:26   ` Joonas Lahtinen
2016-08-04  8:37     ` Chris Wilson
2016-08-04 10:02     ` Chris Wilson
2016-08-04 12:00       ` Joonas Lahtinen
2016-08-01 18:22 ` [PATCH 11/16] drm/i915: Remove (struct_mutex) locking for busy-ioctl Chris Wilson
2016-08-04 10:25   ` Joonas Lahtinen
2016-08-04 10:30     ` Chris Wilson
2016-08-05  7:05   ` Joonas Lahtinen
2016-08-05  7:34     ` Chris Wilson
2016-08-05  8:06       ` Joonas Lahtinen
2016-08-01 18:22 ` [PATCH 12/16] drm/i915: Reduce locking inside swfinish ioctl Chris Wilson
2016-08-04 10:32   ` Joonas Lahtinen
2016-08-04 10:48     ` Chris Wilson
2016-08-01 18:22 ` [PATCH 13/16] drm/i915: Remove pinned check from madvise ioctl Chris Wilson
2016-08-04 10:36   ` Joonas Lahtinen
2016-08-04 10:42     ` Chris Wilson
2016-08-04 11:47       ` Joonas Lahtinen
2016-08-01 18:22 ` [PATCH 14/16] drm/i915: Remove locking for get_tiling Chris Wilson
2016-08-04 10:40   ` Joonas Lahtinen
2016-08-01 18:22 ` [PATCH 15/16] drm/i915: Repack fence tiling mode and stride into a single integer Chris Wilson
2016-08-04 11:17   ` Joonas Lahtinen
2016-08-04 11:34     ` Chris Wilson
2016-08-04 11:36       ` Joonas Lahtinen
2016-08-04 11:41     ` Chris Wilson
2016-08-04 12:02       ` Joonas Lahtinen
2016-08-01 18:22 ` [PATCH 16/16] drm/i915: Assert that the request hasn't been retired Chris Wilson
2016-08-04 11:18   ` Joonas Lahtinen
2016-08-02  5:00 ` ✗ Ro.CI.BAT: failure for series starting with [01/16] drm/i915: Introduce i915_gem_active_wait_unlocked() Patchwork
