All of lore.kernel.org
 help / color / mirror / Atom feed
* RPS tuning
@ 2015-04-27 12:41 Chris Wilson
  2015-04-27 12:41 ` [PATCH 01/16] drm/i915: Drop i915_gem_obj_is_pinned() from set-cache-level Chris Wilson
                   ` (15 more replies)
  0 siblings, 16 replies; 38+ messages in thread
From: Chris Wilson @ 2015-04-27 12:41 UTC (permalink / raw)
  To: intel-gfx

I have so many outstanding patches, that my smtp relay reject them for
being a spamhost...

These are the simple patches for further tuning RPS to begin with. 5
trivial/bug fixes, a couple of patches for read-read concurrency, and 
then 8 patches to both limit the number of RPS boosts on systems without
semaphore support (*cough* Broadwell *cough*) and then one to prevent
downclocking whilst clients are still waiting, along with a boost for
severe stalls. Finally the icing on the cake is enabling waitboosting
for Ironlake.
-Chris

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 01/16] drm/i915: Drop i915_gem_obj_is_pinned() from set-cache-level
  2015-04-27 12:41 RPS tuning Chris Wilson
@ 2015-04-27 12:41 ` Chris Wilson
  2015-04-29 14:50   ` Tvrtko Ursulin
  2015-04-27 12:41 ` [PATCH 02/16] drm/i915: Only remove objects pinned to the display from the available aperture Chris Wilson
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2015-04-27 12:41 UTC (permalink / raw)
  To: intel-gfx

Since the remove of the pin-ioctl, we only care about not changing the
cache level on buffers pinned to the hardware as indicated by
obj->pin_display. So we can safely replace i915_gem_object_is_pinned()
here with a plain obj->pin_display check. During rebinding, we will check
sanity checks in case vma->pin_count is erroneously set.

At the same time, we can micro-optimise GTT mmap() behaviour since we
only need to relinquish the mmaps before Sandybridge.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f0a6d03e9ba5..afdb604e4005 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3768,31 +3768,34 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 {
 	struct drm_device *dev = obj->base.dev;
 	struct i915_vma *vma, *next;
+	bool bound = false;
 	int ret;
 
 	if (obj->cache_level == cache_level)
 		return 0;
 
-	if (i915_gem_obj_is_pinned(obj)) {
+	if (obj->pin_display) {
 		DRM_DEBUG("can not change the cache level of pinned objects\n");
 		return -EBUSY;
 	}
 
 	list_for_each_entry_safe(vma, next, &obj->vma_list, vma_link) {
+		if (!drm_mm_node_allocated(&vma->node))
+			continue;
+
 		if (!i915_gem_valid_gtt_space(vma, cache_level)) {
 			ret = i915_vma_unbind(vma);
 			if (ret)
 				return ret;
-		}
+		} else
+			bound = true;
 	}
 
-	if (i915_gem_obj_bound_any(obj)) {
+	if (bound) {
 		ret = i915_gem_object_finish_gpu(obj);
 		if (ret)
 			return ret;
 
-		i915_gem_object_finish_gtt(obj);
-
 		/* Before SandyBridge, you could not use tiling or fence
 		 * registers with snooped memory, so relinquish any fences
 		 * currently pointing to our region in the aperture.
@@ -3801,15 +3804,21 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 			ret = i915_gem_object_put_fence(obj);
 			if (ret)
 				return ret;
+
+			i915_gem_release_mmap(obj);
 		}
 
-		list_for_each_entry(vma, &obj->vma_list, vma_link)
-			if (drm_mm_node_allocated(&vma->node)) {
-				ret = i915_vma_bind(vma, cache_level,
-						    PIN_UPDATE);
-				if (ret)
-					return ret;
-			}
+		list_for_each_entry(vma, &obj->vma_list, vma_link) {
+			if (!drm_mm_node_allocated(&vma->node))
+				continue;
+
+			if (vma->pin_count)
+				return -EBUSY;
+
+			ret = i915_vma_bind(vma, cache_level, PIN_UPDATE);
+			if (ret)
+				return ret;
+		}
 	}
 
 	list_for_each_entry(vma, &obj->vma_list, vma_link)
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 02/16] drm/i915: Only remove objects pinned to the display from the available aperture
  2015-04-27 12:41 RPS tuning Chris Wilson
  2015-04-27 12:41 ` [PATCH 01/16] drm/i915: Drop i915_gem_obj_is_pinned() from set-cache-level Chris Wilson
@ 2015-04-27 12:41 ` Chris Wilson
  2015-04-29 15:05   ` Tvrtko Ursulin
  2015-04-27 12:41 ` [PATCH 03/16] drm/i915: Remove domain flubbing from i915_gem_object_finish_gpu() Chris Wilson
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2015-04-27 12:41 UTC (permalink / raw)
  To: intel-gfx

With the removal of the pin_ioctl, we need only consider
obj->pin_display when looking at available aperture space.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index afdb604e4005..ec9e36e9ec78 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -157,7 +157,7 @@ i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
 	pinned = 0;
 	mutex_lock(&dev->struct_mutex);
 	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list)
-		if (i915_gem_obj_is_pinned(obj))
+		if (obj->pin_display)
 			pinned += i915_gem_obj_ggtt_size(obj);
 	mutex_unlock(&dev->struct_mutex);
 
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 03/16] drm/i915: Remove domain flubbing from i915_gem_object_finish_gpu()
  2015-04-27 12:41 RPS tuning Chris Wilson
  2015-04-27 12:41 ` [PATCH 01/16] drm/i915: Drop i915_gem_obj_is_pinned() from set-cache-level Chris Wilson
  2015-04-27 12:41 ` [PATCH 02/16] drm/i915: Only remove objects pinned to the display from the available aperture Chris Wilson
@ 2015-04-27 12:41 ` Chris Wilson
  2015-05-11 16:43   ` Daniel Vetter
  2015-04-27 12:41 ` [PATCH 04/16] drm/i915: Ensure cache flushes prior to doing CS flips Chris Wilson
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2015-04-27 12:41 UTC (permalink / raw)
  To: intel-gfx

We no longer interpolate domains in the same manner, and even if we did,
we should trust setting either of the other write domains would trigger
an invalidation rather than force it. Remove the tweaking of the
read_domains since it serves no purpose and use
i915_gem_object_wait_rendering() directly.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h      |  4 +++-
 drivers/gpu/drm/i915/i915_gem.c      | 26 +++-----------------------
 drivers/gpu/drm/i915/intel_display.c | 13 ++++++++-----
 3 files changed, 14 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f3c77ca6b838..c9161c3d1e34 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2791,7 +2791,6 @@ static inline bool i915_stop_ring_allow_warn(struct drm_i915_private *dev_priv)
 
 void i915_gem_reset(struct drm_device *dev);
 bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
-int __must_check i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj);
 int __must_check i915_gem_init(struct drm_device *dev);
 int i915_gem_init_rings(struct drm_device *dev);
 int __must_check i915_gem_init_hw(struct drm_device *dev);
@@ -2813,6 +2812,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 int __must_check i915_wait_request(struct drm_i915_gem_request *req);
 int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
 int __must_check
+i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
+			       bool readonly);
+int __must_check
 i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj,
 				  bool write);
 int __must_check
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ec9e36e9ec78..cd0ac1b54122 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -40,9 +40,6 @@
 
 static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
 static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
-static __must_check int
-i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
-			       bool readonly);
 static void
 i915_gem_object_retire(struct drm_i915_gem_object *obj);
 
@@ -1399,7 +1396,7 @@ i915_gem_object_wait_rendering__tail(struct drm_i915_gem_object *obj)
  * Ensures that all rendering to the object has completed and the object is
  * safe to unbind from the GTT or access from the CPU.
  */
-static __must_check int
+int
 i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
 			       bool readonly)
 {
@@ -3039,7 +3036,7 @@ int i915_vma_unbind(struct i915_vma *vma)
 
 	BUG_ON(obj->pages == NULL);
 
-	ret = i915_gem_object_finish_gpu(obj);
+	ret = i915_gem_object_wait_rendering(obj, false);
 	if (ret)
 		return ret;
 	/* Continue on if we fail due to EIO, the GPU is hung so we
@@ -3792,7 +3789,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 	}
 
 	if (bound) {
-		ret = i915_gem_object_finish_gpu(obj);
+		ret = i915_gem_object_wait_rendering(obj, false);
 		if (ret)
 			return ret;
 
@@ -3996,23 +3993,6 @@ i915_gem_object_unpin_from_display_plane(struct drm_i915_gem_object *obj,
 	obj->pin_display--;
 }
 
-int
-i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj)
-{
-	int ret;
-
-	if ((obj->base.read_domains & I915_GEM_GPU_DOMAINS) == 0)
-		return 0;
-
-	ret = i915_gem_object_wait_rendering(obj, false);
-	if (ret)
-		return ret;
-
-	/* Ensure that we invalidate the GPU's caches and TLBs. */
-	obj->base.read_domains &= ~I915_GEM_GPU_DOMAINS;
-	return 0;
-}
-
 /**
  * Moves a single object to the CPU read, and possibly write domain.
  *
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 3094b0807b40..4334e59bf0aa 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3235,27 +3235,30 @@ void intel_finish_reset(struct drm_device *dev)
 	drm_modeset_unlock_all(dev);
 }
 
-static int
+static void
 intel_finish_fb(struct drm_framebuffer *old_fb)
 {
 	struct drm_i915_gem_object *obj = intel_fb_obj(old_fb);
-	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
+	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
 	bool was_interruptible = dev_priv->mm.interruptible;
 	int ret;
 
 	/* Big Hammer, we also need to ensure that any pending
 	 * MI_WAIT_FOR_EVENT inside a user batch buffer on the
 	 * current scanout is retired before unpinning the old
-	 * framebuffer.
+	 * framebuffer. Note that we rely on userspace rendering
+	 * into the buffer attached to the pipe they are waiting
+	 * on. If not, userspace generates a GPU hang with IPEHR
+	 * point to the MI_WAIT_FOR_EVENT.
 	 *
 	 * This should only fail upon a hung GPU, in which case we
 	 * can safely continue.
 	 */
 	dev_priv->mm.interruptible = false;
-	ret = i915_gem_object_finish_gpu(obj);
+	ret = i915_gem_object_wait_rendering(obj, true);
 	dev_priv->mm.interruptible = was_interruptible;
 
-	return ret;
+	WARN_ON(ret);
 }
 
 static bool intel_crtc_has_pending_flip(struct drm_crtc *crtc)
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 04/16] drm/i915: Ensure cache flushes prior to doing CS flips
  2015-04-27 12:41 RPS tuning Chris Wilson
                   ` (2 preceding siblings ...)
  2015-04-27 12:41 ` [PATCH 03/16] drm/i915: Remove domain flubbing from i915_gem_object_finish_gpu() Chris Wilson
@ 2015-04-27 12:41 ` Chris Wilson
  2015-05-11 16:46   ` Daniel Vetter
  2015-04-27 12:41 ` [PATCH 05/16] drm/i915: Fix race on unreferencing the wrong mmio-flip-request Chris Wilson
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2015-04-27 12:41 UTC (permalink / raw)
  To: intel-gfx

Synchronising to an object active on the same ring is a no-op, for the
benefit of execbuffer scheduler. However, for CS flips this means that
we can forgo checking whether the last write request of the object is
actually queued and more importantly whether the cache flush for the
write was emitted.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_display.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 4334e59bf0aa..e3399af8e0cd 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -10898,6 +10898,12 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 		i915_gem_request_assign(&work->flip_queued_req,
 					obj->last_write_req);
 	} else {
+		if (obj->last_write_req) {
+			ret = i915_gem_check_olr(obj->last_write_req);
+			if (ret)
+				goto cleanup_unpin;
+		}
+
 		ret = dev_priv->display.queue_flip(dev, crtc, fb, obj, ring,
 						   page_flip_flags);
 		if (ret)
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 05/16] drm/i915: Fix race on unreferencing the wrong mmio-flip-request
  2015-04-27 12:41 RPS tuning Chris Wilson
                   ` (3 preceding siblings ...)
  2015-04-27 12:41 ` [PATCH 04/16] drm/i915: Ensure cache flushes prior to doing CS flips Chris Wilson
@ 2015-04-27 12:41 ` Chris Wilson
  2015-05-11 16:51   ` Daniel Vetter
  2015-04-27 12:41 ` [PATCH 06/16] drm/i915: Implement inter-engine read-read optimisations Chris Wilson
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2015-04-27 12:41 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ander Conselvan de Oliveira

As we perform the mmio-flip without any locking and then try to acquire
the struct_mutex prior to dereferencing the request, it is possible for
userspace to queue a new pageflip before the worker can finish clearing
the old state - and then it will clear the new flip request. The result
is that the new flip could be completed before the GPU has finished
rendering.

The bugs stems from removing the seqno checking in
commit 536f5b5e86b225dab94c7ff8061ae482b6077387
Author: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Date:   Thu Nov 6 11:03:40 2014 +0200

    drm/i915: Make mmio flip wait for seqno in the work function

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h      |  6 ++++--
 drivers/gpu/drm/i915/intel_display.c | 39 ++++++++++++++++++------------------
 drivers/gpu/drm/i915/intel_drv.h     |  4 ++--
 3 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c9161c3d1e34..0a412eaf7a31 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2136,10 +2136,12 @@ i915_gem_request_get_ring(struct drm_i915_gem_request *req)
 	return req ? req->ring : NULL;
 }
 
-static inline void
+static inline struct drm_i915_gem_request *
 i915_gem_request_reference(struct drm_i915_gem_request *req)
 {
-	kref_get(&req->ref);
+	if (req)
+		kref_get(&req->ref);
+	return req;
 }
 
 static inline void
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index e3399af8e0cd..2ca71eeaf2d7 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -10651,22 +10651,18 @@ static void intel_do_mmio_flip(struct intel_crtc *intel_crtc)
 
 static void intel_mmio_flip_work_func(struct work_struct *work)
 {
-	struct intel_crtc *crtc =
-		container_of(work, struct intel_crtc, mmio_flip.work);
-	struct intel_mmio_flip *mmio_flip;
+	struct intel_mmio_flip *mmio_flip =
+		container_of(work, struct intel_mmio_flip, work);
 
-	mmio_flip = &crtc->mmio_flip;
-	if (mmio_flip->req)
-		WARN_ON(__i915_wait_request(mmio_flip->req,
-					    crtc->reset_counter,
-					    false, NULL, NULL) != 0);
+	if (mmio_flip->rq)
+		WARN_ON(__i915_wait_request(mmio_flip->rq,
+					    mmio_flip->crtc->reset_counter,
+					    false, NULL, NULL));
 
-	intel_do_mmio_flip(crtc);
-	if (mmio_flip->req) {
-		mutex_lock(&crtc->base.dev->struct_mutex);
-		i915_gem_request_assign(&mmio_flip->req, NULL);
-		mutex_unlock(&crtc->base.dev->struct_mutex);
-	}
+	intel_do_mmio_flip(mmio_flip->crtc);
+
+	i915_gem_request_unreference__unlocked(mmio_flip->rq);
+	kfree(mmio_flip);
 }
 
 static int intel_queue_mmio_flip(struct drm_device *dev,
@@ -10676,12 +10672,17 @@ static int intel_queue_mmio_flip(struct drm_device *dev,
 				 struct intel_engine_cs *ring,
 				 uint32_t flags)
 {
-	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	struct intel_mmio_flip *mmio_flip;
+
+	mmio_flip = kmalloc(sizeof(*mmio_flip), GFP_KERNEL);
+	if (mmio_flip == NULL)
+		return -ENOMEM;
 
-	i915_gem_request_assign(&intel_crtc->mmio_flip.req,
-				obj->last_write_req);
+	mmio_flip->rq = i915_gem_request_reference(obj->last_write_req);
+	mmio_flip->crtc = to_intel_crtc(crtc);
 
-	schedule_work(&intel_crtc->mmio_flip.work);
+	INIT_WORK(&mmio_flip->work, intel_mmio_flip_work_func);
+	schedule_work(&mmio_flip->work);
 
 	return 0;
 }
@@ -13669,8 +13670,6 @@ static void intel_crtc_init(struct drm_device *dev, int pipe)
 	dev_priv->plane_to_crtc_mapping[intel_crtc->plane] = &intel_crtc->base;
 	dev_priv->pipe_to_crtc_mapping[intel_crtc->pipe] = &intel_crtc->base;
 
-	INIT_WORK(&intel_crtc->mmio_flip.work, intel_mmio_flip_work_func);
-
 	drm_crtc_helper_add(&intel_crtc->base, &intel_helper_funcs);
 
 	WARN_ON(drm_crtc_index(&intel_crtc->base) != intel_crtc->pipe);
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 23a42a40abae..7f8cce797ba2 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -468,8 +468,9 @@ struct intel_pipe_wm {
 };
 
 struct intel_mmio_flip {
-	struct drm_i915_gem_request *req;
 	struct work_struct work;
+	struct drm_i915_gem_request *rq;
+	struct intel_crtc *crtc;
 };
 
 struct skl_pipe_wm {
@@ -554,7 +555,6 @@ struct intel_crtc {
 	} wm;
 
 	int scanline_offset;
-	struct intel_mmio_flip mmio_flip;
 
 	struct intel_crtc_atomic_commit atomic;
 
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 06/16] drm/i915: Implement inter-engine read-read optimisations
  2015-04-27 12:41 RPS tuning Chris Wilson
                   ` (4 preceding siblings ...)
  2015-04-27 12:41 ` [PATCH 05/16] drm/i915: Fix race on unreferencing the wrong mmio-flip-request Chris Wilson
@ 2015-04-27 12:41 ` Chris Wilson
  2015-04-29 13:51   ` Tvrtko Ursulin
  2015-04-27 12:41 ` [PATCH 07/16] drm/i915: Inline check required for object syncing prior to execbuf Chris Wilson
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2015-04-27 12:41 UTC (permalink / raw)
  To: intel-gfx; +Cc: Lionel Landwerlin

Currently, we only track the last request globally across all engines.
This prevents us from issuing concurrent read requests on e.g. the RCS
and BCS engines (or more likely the render and media engines). Without
semaphores, we incur costly stalls as we synchronise between rings -
greatly impacting the current performance of Broadwell versus Haswell in
certain workloads (like video decode). With the introduction of
reference counted requests, it is much easier to track the last request
per ring, as well as the last global write request so that we can
optimise inter-engine read read requests (as well as better optimise
certain CPU waits).

v2: Fix inverted readonly condition for nonblocking waits.
v3: Handle non-continguous engine array after waits
v4: Rebase, tidy, rewrite ring list debugging
v5: Use obj->active as a bitfield, it looks cool
v6: Micro-optimise, mostly involving moving code around
v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf)
v8: Rebase
v9: Refactor i915_gem_object_sync() to allow the compiler to better
optimise it.

Benchmark: igt/gem_read_read_speed
hsw:gt3e (with semaphores):
Before: Time to read-read 1024k:		275.794µs
After:  Time to read-read 1024k:		123.260µs

hsw:gt3e (w/o semaphores):
Before: Time to read-read 1024k:		230.433µs
After:  Time to read-read 1024k:		124.593µs

bdw-u (w/o semaphores):             Before          After
Time to read-read 1x1:            26.274µs       10.350µs
Time to read-read 128x128:        40.097µs       21.366µs
Time to read-read 256x256:        77.087µs       42.608µs
Time to read-read 512x512:       281.999µs      181.155µs
Time to read-read 1024x1024:    1196.141µs     1118.223µs
Time to read-read 2048x2048:    5639.072µs     5225.837µs
Time to read-read 4096x4096:   22401.662µs    21137.067µs
Time to read-read 8192x8192:   89617.735µs    85637.681µs

Testcase: igt/gem_concurrent_blit (read-read and friends)
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v8]
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  16 +-
 drivers/gpu/drm/i915/i915_drv.h         |  19 +-
 drivers/gpu/drm/i915/i915_gem.c         | 540 ++++++++++++++++++++------------
 drivers/gpu/drm/i915/i915_gem_context.c |   2 -
 drivers/gpu/drm/i915/i915_gem_debug.c   |  92 ++----
 drivers/gpu/drm/i915/i915_gpu_error.c   |  19 +-
 drivers/gpu/drm/i915/intel_display.c    |   6 +-
 drivers/gpu/drm/i915/intel_lrc.c        |  19 +-
 drivers/gpu/drm/i915/intel_overlay.c    |   2 -
 drivers/gpu/drm/i915/intel_ringbuffer.c |  28 +-
 10 files changed, 417 insertions(+), 326 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 9c2b9e450799..ab55641a0b78 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -120,10 +120,13 @@ static inline const char *get_global_flag(struct drm_i915_gem_object *obj)
 static void
 describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 {
+	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
+	struct intel_engine_cs *ring;
 	struct i915_vma *vma;
 	int pin_count = 0;
+	int i;
 
-	seq_printf(m, "%pK: %s%s%s%s %8zdKiB %02x %02x %x %x %x%s%s%s",
+	seq_printf(m, "%pK: %s%s%s%s %8zdKiB %02x %02x [ ",
 		   &obj->base,
 		   obj->active ? "*" : " ",
 		   get_pin_flag(obj),
@@ -131,8 +134,11 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		   get_global_flag(obj),
 		   obj->base.size / 1024,
 		   obj->base.read_domains,
-		   obj->base.write_domain,
-		   i915_gem_request_get_seqno(obj->last_read_req),
+		   obj->base.write_domain);
+	for_each_ring(ring, dev_priv, i)
+		seq_printf(m, "%x ",
+				i915_gem_request_get_seqno(obj->last_read_req[i]));
+	seq_printf(m, "] %x %x%s%s%s",
 		   i915_gem_request_get_seqno(obj->last_write_req),
 		   i915_gem_request_get_seqno(obj->last_fenced_req),
 		   i915_cache_level_str(to_i915(obj->base.dev), obj->cache_level),
@@ -169,9 +175,9 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		*t = '\0';
 		seq_printf(m, " (%s mappable)", s);
 	}
-	if (obj->last_read_req != NULL)
+	if (obj->last_write_req != NULL)
 		seq_printf(m, " (%s)",
-			   i915_gem_request_get_ring(obj->last_read_req)->name);
+			   i915_gem_request_get_ring(obj->last_write_req)->name);
 	if (obj->frontbuffer_bits)
 		seq_printf(m, " (frontbuffer: 0x%03x)", obj->frontbuffer_bits);
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0a412eaf7a31..caee59bf94ba 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -508,7 +508,7 @@ struct drm_i915_error_state {
 	struct drm_i915_error_buffer {
 		u32 size;
 		u32 name;
-		u32 rseqno, wseqno;
+		u32 rseqno[I915_NUM_RINGS], wseqno;
 		u32 gtt_offset;
 		u32 read_domains;
 		u32 write_domain;
@@ -1914,7 +1914,7 @@ struct drm_i915_gem_object {
 	struct drm_mm_node *stolen;
 	struct list_head global_list;
 
-	struct list_head ring_list;
+	struct list_head ring_list[I915_NUM_RINGS];
 	/** Used in execbuf to temporarily hold a ref */
 	struct list_head obj_exec_link;
 
@@ -1925,7 +1925,7 @@ struct drm_i915_gem_object {
 	 * rendering and so a non-zero seqno), and is not set if it i s on
 	 * inactive (ready to be unbound) list.
 	 */
-	unsigned int active:1;
+	unsigned int active:I915_NUM_RINGS;
 
 	/**
 	 * This is set if the object has been written to since last bound
@@ -1996,8 +1996,17 @@ struct drm_i915_gem_object {
 	void *dma_buf_vmapping;
 	int vmapping_count;
 
-	/** Breadcrumb of last rendering to the buffer. */
-	struct drm_i915_gem_request *last_read_req;
+	/** Breadcrumb of last rendering to the buffer.
+	 * There can only be one writer, but we allow for multiple readers.
+	 * If there is a writer that necessarily implies that all other
+	 * read requests are complete - but we may only be lazily clearing
+	 * the read requests. A read request is naturally the most recent
+	 * request on a ring, so we may have two different write and read
+	 * requests on one ring where the write request is older than the
+	 * read request. This allows for the CPU to read from an active
+	 * buffer by only waiting for the write to complete.
+	 * */
+	struct drm_i915_gem_request *last_read_req[I915_NUM_RINGS];
 	struct drm_i915_gem_request *last_write_req;
 	/** Breadcrumb of last fenced GPU access to the buffer. */
 	struct drm_i915_gem_request *last_fenced_req;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index cd0ac1b54122..d60eca03e306 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -38,11 +38,14 @@
 #include <linux/pci.h>
 #include <linux/dma-buf.h>
 
+#define RQ_BUG_ON(expr)
+
 static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
 static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
 static void
-i915_gem_object_retire(struct drm_i915_gem_object *obj);
-
+i915_gem_object_retire__write(struct drm_i915_gem_object *obj);
+static void
+i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring);
 static void i915_gem_write_fence(struct drm_device *dev, int reg,
 				 struct drm_i915_gem_object *obj);
 static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
@@ -517,8 +520,6 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
 		ret = i915_gem_object_wait_rendering(obj, true);
 		if (ret)
 			return ret;
-
-		i915_gem_object_retire(obj);
 	}
 
 	ret = i915_gem_object_get_pages(obj);
@@ -938,8 +939,6 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
 		ret = i915_gem_object_wait_rendering(obj, false);
 		if (ret)
 			return ret;
-
-		i915_gem_object_retire(obj);
 	}
 	/* Same trick applies to invalidate partially written cachelines read
 	 * before writing. */
@@ -1238,6 +1237,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 
 	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
 
+	if (list_empty(&req->list))
+		return 0;
+
 	if (i915_gem_request_completed(req, true))
 		return 0;
 
@@ -1337,6 +1339,63 @@ out:
 	return ret;
 }
 
+static inline void
+i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
+{
+	struct drm_i915_file_private *file_priv = request->file_priv;
+
+	if (!file_priv)
+		return;
+
+	spin_lock(&file_priv->mm.lock);
+	list_del(&request->client_list);
+	request->file_priv = NULL;
+	spin_unlock(&file_priv->mm.lock);
+}
+
+static void i915_gem_request_retire(struct drm_i915_gem_request *request)
+{
+	trace_i915_gem_request_retire(request);
+
+	/* We know the GPU must have read the request to have
+	 * sent us the seqno + interrupt, so use the position
+	 * of tail of the request to update the last known position
+	 * of the GPU head.
+	 *
+	 * Note this requires that we are always called in request
+	 * completion order.
+	 */
+	request->ringbuf->last_retired_head = request->postfix;
+
+	list_del_init(&request->list);
+	i915_gem_request_remove_from_client(request);
+
+	put_pid(request->pid);
+
+	i915_gem_request_unreference(request);
+}
+
+static void
+__i915_gem_request_retire__upto(struct drm_i915_gem_request *rq)
+{
+	struct intel_engine_cs *engine = rq->ring;
+	struct drm_i915_gem_request *tmp;
+
+	lockdep_assert_held(&engine->dev->struct_mutex);
+
+	if (list_empty(&rq->list))
+		return;
+
+	do {
+		tmp = list_first_entry(&engine->request_list,
+				       typeof(*tmp), list);
+
+		i915_gem_request_retire(tmp);
+	} while (tmp != rq);
+
+	WARN_ON(i915_verify_lists(engine->dev));
+}
+
 /**
  * Waits for a request to be signaled, and cleans up the
  * request and object lists appropriately for that event.
@@ -1347,7 +1406,6 @@ i915_wait_request(struct drm_i915_gem_request *req)
 	struct drm_device *dev;
 	struct drm_i915_private *dev_priv;
 	bool interruptible;
-	unsigned reset_counter;
 	int ret;
 
 	BUG_ON(req == NULL);
@@ -1366,29 +1424,13 @@ i915_wait_request(struct drm_i915_gem_request *req)
 	if (ret)
 		return ret;
 
-	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
-	i915_gem_request_reference(req);
-	ret = __i915_wait_request(req, reset_counter,
+	ret = __i915_wait_request(req,
+				  atomic_read(&dev_priv->gpu_error.reset_counter),
 				  interruptible, NULL, NULL);
-	i915_gem_request_unreference(req);
-	return ret;
-}
-
-static int
-i915_gem_object_wait_rendering__tail(struct drm_i915_gem_object *obj)
-{
-	if (!obj->active)
-		return 0;
-
-	/* Manually manage the write flush as we may have not yet
-	 * retired the buffer.
-	 *
-	 * Note that the last_write_req is always the earlier of
-	 * the two (read/write) requests, so if we haved successfully waited,
-	 * we know we have passed the last write.
-	 */
-	i915_gem_request_assign(&obj->last_write_req, NULL);
+	if (ret)
+		return ret;
 
+	__i915_gem_request_retire__upto(req);
 	return 0;
 }
 
@@ -1400,18 +1442,52 @@ int
 i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
 			       bool readonly)
 {
-	struct drm_i915_gem_request *req;
-	int ret;
+	int ret, i;
 
-	req = readonly ? obj->last_write_req : obj->last_read_req;
-	if (!req)
+	if (!obj->active)
 		return 0;
 
-	ret = i915_wait_request(req);
-	if (ret)
-		return ret;
+	if (readonly) {
+		if (obj->last_write_req != NULL) {
+			ret = i915_wait_request(obj->last_write_req);
+			if (ret)
+				return ret;
 
-	return i915_gem_object_wait_rendering__tail(obj);
+			i = obj->last_write_req->ring->id;
+			if (obj->last_read_req[i] == obj->last_write_req)
+				i915_gem_object_retire__read(obj, i);
+			else
+				i915_gem_object_retire__write(obj);
+		}
+	} else {
+		for (i = 0; i < I915_NUM_RINGS; i++) {
+			if (obj->last_read_req[i] == NULL)
+				continue;
+
+			ret = i915_wait_request(obj->last_read_req[i]);
+			if (ret)
+				return ret;
+
+			i915_gem_object_retire__read(obj, i);
+		}
+		RQ_BUG_ON(obj->active);
+	}
+
+	return 0;
+}
+
+static void
+i915_gem_object_retire_request(struct drm_i915_gem_object *obj,
+			       struct drm_i915_gem_request *rq)
+{
+	int ring = rq->ring->id;
+
+	if (obj->last_read_req[ring] == rq)
+		i915_gem_object_retire__read(obj, ring);
+	else if (obj->last_write_req == rq)
+		i915_gem_object_retire__write(obj);
+
+	__i915_gem_request_retire__upto(rq);
 }
 
 /* A nonblocking variant of the above wait. This is a highly dangerous routine
@@ -1422,37 +1498,66 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 					    struct drm_i915_file_private *file_priv,
 					    bool readonly)
 {
-	struct drm_i915_gem_request *req;
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_gem_request *requests[I915_NUM_RINGS];
 	unsigned reset_counter;
-	int ret;
+	int ret, i, n = 0;
 
 	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
 	BUG_ON(!dev_priv->mm.interruptible);
 
-	req = readonly ? obj->last_write_req : obj->last_read_req;
-	if (!req)
+	if (!obj->active)
 		return 0;
 
 	ret = i915_gem_check_wedge(&dev_priv->gpu_error, true);
 	if (ret)
 		return ret;
 
-	ret = i915_gem_check_olr(req);
-	if (ret)
-		return ret;
-
 	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
-	i915_gem_request_reference(req);
+
+	if (readonly) {
+		struct drm_i915_gem_request *rq;
+
+		rq = obj->last_write_req;
+		if (rq == NULL)
+			return 0;
+
+		ret = i915_gem_check_olr(rq);
+		if (ret)
+			goto err;
+
+		requests[n++] = i915_gem_request_reference(rq);
+	} else {
+		for (i = 0; i < I915_NUM_RINGS; i++) {
+			struct drm_i915_gem_request *rq;
+
+			rq = obj->last_read_req[i];
+			if (rq == NULL)
+				continue;
+
+			ret = i915_gem_check_olr(rq);
+			if (ret)
+				goto err;
+
+			requests[n++] = i915_gem_request_reference(rq);
+		}
+	}
+
 	mutex_unlock(&dev->struct_mutex);
-	ret = __i915_wait_request(req, reset_counter, true, NULL, file_priv);
+	for (i = 0; ret == 0 && i < n; i++)
+		ret = __i915_wait_request(requests[i], reset_counter, true,
+					  NULL, file_priv);
 	mutex_lock(&dev->struct_mutex);
-	i915_gem_request_unreference(req);
-	if (ret)
-		return ret;
 
-	return i915_gem_object_wait_rendering__tail(obj);
+err:
+	for (i = 0; i < n; i++) {
+		if (ret == 0)
+			i915_gem_object_retire_request(obj, requests[i]);
+		i915_gem_request_unreference(requests[i]);
+	}
+
+	return ret;
 }
 
 /**
@@ -2192,78 +2297,58 @@ i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
 	return 0;
 }
 
-static void
-i915_gem_object_move_to_active(struct drm_i915_gem_object *obj,
-			       struct intel_engine_cs *ring)
+void i915_vma_move_to_active(struct i915_vma *vma,
+			     struct intel_engine_cs *ring)
 {
-	struct drm_i915_gem_request *req;
-	struct intel_engine_cs *old_ring;
-
-	BUG_ON(ring == NULL);
-
-	req = intel_ring_get_request(ring);
-	old_ring = i915_gem_request_get_ring(obj->last_read_req);
-
-	if (old_ring != ring && obj->last_write_req) {
-		/* Keep the request relative to the current ring */
-		i915_gem_request_assign(&obj->last_write_req, req);
-	}
+	struct drm_i915_gem_object *obj = vma->obj;
 
 	/* Add a reference if we're newly entering the active list. */
-	if (!obj->active) {
+	if (obj->active == 0)
 		drm_gem_object_reference(&obj->base);
-		obj->active = 1;
-	}
+	obj->active |= intel_ring_flag(ring);
 
-	list_move_tail(&obj->ring_list, &ring->active_list);
+	list_move_tail(&obj->ring_list[ring->id], &ring->active_list);
+	i915_gem_request_assign(&obj->last_read_req[ring->id],
+				intel_ring_get_request(ring));
 
-	i915_gem_request_assign(&obj->last_read_req, req);
+	list_move_tail(&vma->mm_list, &vma->vm->active_list);
 }
 
-void i915_vma_move_to_active(struct i915_vma *vma,
-			     struct intel_engine_cs *ring)
+static void
+i915_gem_object_retire__write(struct drm_i915_gem_object *obj)
 {
-	list_move_tail(&vma->mm_list, &vma->vm->active_list);
-	return i915_gem_object_move_to_active(vma->obj, ring);
+	RQ_BUG_ON(obj->last_write_req == NULL);
+	RQ_BUG_ON(!(obj->active & intel_ring_flag(obj->last_write_req->ring)));
+
+	i915_gem_request_assign(&obj->last_write_req, NULL);
+	intel_fb_obj_flush(obj, true);
 }
 
 static void
-i915_gem_object_move_to_inactive(struct drm_i915_gem_object *obj)
+i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
 {
 	struct i915_vma *vma;
 
-	BUG_ON(obj->base.write_domain & ~I915_GEM_GPU_DOMAINS);
-	BUG_ON(!obj->active);
+	RQ_BUG_ON(obj->last_read_req[ring] == NULL);
+	RQ_BUG_ON(!(obj->active & (1 << ring)));
+
+	list_del_init(&obj->ring_list[ring]);
+	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
+
+	if (obj->last_write_req && obj->last_write_req->ring->id == ring)
+		i915_gem_object_retire__write(obj);
+
+	obj->active &= ~(1 << ring);
+	if (obj->active)
+		return;
 
 	list_for_each_entry(vma, &obj->vma_list, vma_link) {
 		if (!list_empty(&vma->mm_list))
 			list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
 	}
 
-	intel_fb_obj_flush(obj, true);
-
-	list_del_init(&obj->ring_list);
-
-	i915_gem_request_assign(&obj->last_read_req, NULL);
-	i915_gem_request_assign(&obj->last_write_req, NULL);
-	obj->base.write_domain = 0;
-
 	i915_gem_request_assign(&obj->last_fenced_req, NULL);
-
-	obj->active = 0;
 	drm_gem_object_unreference(&obj->base);
-
-	WARN_ON(i915_verify_lists(dev));
-}
-
-static void
-i915_gem_object_retire(struct drm_i915_gem_object *obj)
-{
-	if (obj->last_read_req == NULL)
-		return;
-
-	if (i915_gem_request_completed(obj->last_read_req, true))
-		i915_gem_object_move_to_inactive(obj);
 }
 
 static int
@@ -2440,20 +2525,6 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	return 0;
 }
 
-static inline void
-i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
-{
-	struct drm_i915_file_private *file_priv = request->file_priv;
-
-	if (!file_priv)
-		return;
-
-	spin_lock(&file_priv->mm.lock);
-	list_del(&request->client_list);
-	request->file_priv = NULL;
-	spin_unlock(&file_priv->mm.lock);
-}
-
 static bool i915_context_is_banned(struct drm_i915_private *dev_priv,
 				   const struct intel_context *ctx)
 {
@@ -2499,16 +2570,6 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
 	}
 }
 
-static void i915_gem_free_request(struct drm_i915_gem_request *request)
-{
-	list_del(&request->list);
-	i915_gem_request_remove_from_client(request);
-
-	put_pid(request->pid);
-
-	i915_gem_request_unreference(request);
-}
-
 void i915_gem_request_free(struct kref *req_ref)
 {
 	struct drm_i915_gem_request *req = container_of(req_ref,
@@ -2609,9 +2670,9 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 
 		obj = list_first_entry(&ring->active_list,
 				       struct drm_i915_gem_object,
-				       ring_list);
+				       ring_list[ring->id]);
 
-		i915_gem_object_move_to_inactive(obj);
+		i915_gem_object_retire__read(obj, ring->id);
 	}
 
 	/*
@@ -2647,7 +2708,7 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 					   struct drm_i915_gem_request,
 					   list);
 
-		i915_gem_free_request(request);
+		i915_gem_request_retire(request);
 	}
 
 	/* This may not have been flushed before the reset, so clean it now */
@@ -2695,6 +2756,8 @@ void i915_gem_reset(struct drm_device *dev)
 	i915_gem_context_reset(dev);
 
 	i915_gem_restore_fences(dev);
+
+	WARN_ON(i915_verify_lists(dev));
 }
 
 /**
@@ -2703,11 +2766,11 @@ void i915_gem_reset(struct drm_device *dev)
 void
 i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 {
-	if (list_empty(&ring->request_list))
-		return;
-
 	WARN_ON(i915_verify_lists(ring->dev));
 
+	if (list_empty(&ring->active_list))
+		return;
+
 	/* Retire requests first as we use it above for the early return.
 	 * If we retire requests last, we may use a later seqno and so clear
 	 * the requests lists without clearing the active list, leading to
@@ -2723,16 +2786,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 		if (!i915_gem_request_completed(request, true))
 			break;
 
-		trace_i915_gem_request_retire(request);
-
-		/* We know the GPU must have read the request to have
-		 * sent us the seqno + interrupt, so use the position
-		 * of tail of the request to update the last known position
-		 * of the GPU head.
-		 */
-		request->ringbuf->last_retired_head = request->postfix;
-
-		i915_gem_free_request(request);
+		i915_gem_request_retire(request);
 	}
 
 	/* Move any buffers on the active list that are no longer referenced
@@ -2744,12 +2798,12 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 
 		obj = list_first_entry(&ring->active_list,
 				      struct drm_i915_gem_object,
-				      ring_list);
+				      ring_list[ring->id]);
 
-		if (!i915_gem_request_completed(obj->last_read_req, true))
+		if (!list_empty(&obj->last_read_req[ring->id]->list))
 			break;
 
-		i915_gem_object_move_to_inactive(obj);
+		i915_gem_object_retire__read(obj, ring->id);
 	}
 
 	if (unlikely(ring->trace_irq_req &&
@@ -2844,17 +2898,30 @@ i915_gem_idle_work_handler(struct work_struct *work)
 static int
 i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 {
-	struct intel_engine_cs *ring;
-	int ret;
+	int ret, i;
+
+	if (!obj->active)
+		return 0;
+
+	for (i = 0; i < I915_NUM_RINGS; i++) {
+		struct drm_i915_gem_request *rq;
+
+		rq = obj->last_read_req[i];
+		if (rq == NULL)
+			continue;
 
-	if (obj->active) {
-		ring = i915_gem_request_get_ring(obj->last_read_req);
+		if (list_empty(&rq->list))
+			goto retire;
 
-		ret = i915_gem_check_olr(obj->last_read_req);
+		ret = i915_gem_check_olr(rq);
 		if (ret)
 			return ret;
 
-		i915_gem_retire_requests_ring(ring);
+		if (i915_gem_request_completed(rq, true)) {
+			__i915_gem_request_retire__upto(rq);
+retire:
+			i915_gem_object_retire__read(obj, i);
+		}
 	}
 
 	return 0;
@@ -2888,9 +2955,10 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_wait *args = data;
 	struct drm_i915_gem_object *obj;
-	struct drm_i915_gem_request *req;
+	struct drm_i915_gem_request *req[I915_NUM_RINGS];
 	unsigned reset_counter;
-	int ret = 0;
+	int i, n = 0;
+	int ret;
 
 	if (args->flags != 0)
 		return -EINVAL;
@@ -2910,11 +2978,9 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	if (ret)
 		goto out;
 
-	if (!obj->active || !obj->last_read_req)
+	if (!obj->active)
 		goto out;
 
-	req = obj->last_read_req;
-
 	/* Do this after OLR check to make sure we make forward progress polling
 	 * on this IOCTL with a timeout == 0 (like busy ioctl)
 	 */
@@ -2925,13 +2991,23 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 
 	drm_gem_object_unreference(&obj->base);
 	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
-	i915_gem_request_reference(req);
+
+	for (i = 0; i < I915_NUM_RINGS; i++) {
+		if (obj->last_read_req[i] == NULL)
+			continue;
+
+		req[n++] = i915_gem_request_reference(obj->last_read_req[i]);
+	}
+
 	mutex_unlock(&dev->struct_mutex);
 
-	ret = __i915_wait_request(req, reset_counter, true,
-				  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
-				  file->driver_priv);
-	i915_gem_request_unreference__unlocked(req);
+	for (i = 0; i < n; i++) {
+		if (ret == 0)
+			ret = __i915_wait_request(req[i], reset_counter, true,
+						  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
+						  file->driver_priv);
+		i915_gem_request_unreference__unlocked(req[i]);
+	}
 	return ret;
 
 out:
@@ -2940,6 +3016,56 @@ out:
 	return ret;
 }
 
+static int
+__i915_gem_object_sync(struct drm_i915_gem_object *obj,
+		       struct intel_engine_cs *to,
+		       struct drm_i915_gem_request *rq)
+{
+	struct intel_engine_cs *from;
+	int ret;
+
+	from = i915_gem_request_get_ring(rq);
+	if (to == from)
+		return 0;
+
+	if (i915_gem_request_completed(rq, true))
+		return 0;
+
+	ret = i915_gem_check_olr(rq);
+	if (ret)
+		return ret;
+
+	if (!i915_semaphore_is_enabled(obj->base.dev)) {
+		ret = __i915_wait_request(rq,
+					  atomic_read(&to_i915(obj->base.dev)->gpu_error.reset_counter),
+					  to_i915(obj->base.dev)->mm.interruptible, NULL, NULL);
+		if (ret)
+			return ret;
+
+		i915_gem_object_retire_request(obj, rq);
+	} else {
+		int idx = intel_ring_sync_index(from, to);
+		u32 seqno = i915_gem_request_get_seqno(rq);
+
+		if (seqno <= from->semaphore.sync_seqno[idx])
+			return 0;
+
+		trace_i915_gem_ring_sync_to(from, to, rq);
+		ret = to->semaphore.sync_to(to, from, seqno);
+		if (ret)
+			return ret;
+
+		/* We use last_read_req because sync_to()
+		 * might have just caused seqno wrap under
+		 * the radar.
+		 */
+		from->semaphore.sync_seqno[idx] =
+			i915_gem_request_get_seqno(obj->last_read_req[from->id]);
+	}
+
+	return 0;
+}
+
 /**
  * i915_gem_object_sync - sync an object to a ring.
  *
@@ -2948,7 +3074,17 @@ out:
  *
  * This code is meant to abstract object synchronization with the GPU.
  * Calling with NULL implies synchronizing the object with the CPU
- * rather than a particular GPU ring.
+ * rather than a particular GPU ring. Conceptually we serialise writes
+ * between engines inside the GPU. We only allow on engine to write
+ * into a buffer at any time, but multiple readers. To ensure each has
+ * a coherent view of memory, we must:
+ *
+ * - If there is an outstanding write request to the object, the new
+ *   request must wait for it to complete (either CPU or in hw, requests
+ *   on the same ring will be naturally ordered).
+ *
+ * - If we are a write request (pending_write_domain is set), the new
+ *   request must wait for outstanding read requests to complete.
  *
  * Returns 0 if successful, else propagates up the lower layer error.
  */
@@ -2956,41 +3092,32 @@ int
 i915_gem_object_sync(struct drm_i915_gem_object *obj,
 		     struct intel_engine_cs *to)
 {
-	struct intel_engine_cs *from;
-	u32 seqno;
-	int ret, idx;
-
-	from = i915_gem_request_get_ring(obj->last_read_req);
-
-	if (from == NULL || to == from)
-		return 0;
+	const bool readonly = obj->base.pending_write_domain == 0;
+	struct drm_i915_gem_request *rq[I915_NUM_RINGS];
+	int ret, i, n;
 
-	if (to == NULL || !i915_semaphore_is_enabled(obj->base.dev))
-		return i915_gem_object_wait_rendering(obj, false);
-
-	idx = intel_ring_sync_index(from, to);
-
-	seqno = i915_gem_request_get_seqno(obj->last_read_req);
-	/* Optimization: Avoid semaphore sync when we are sure we already
-	 * waited for an object with higher seqno */
-	if (seqno <= from->semaphore.sync_seqno[idx])
+	if (!obj->active)
 		return 0;
 
-	ret = i915_gem_check_olr(obj->last_read_req);
-	if (ret)
-		return ret;
+	if (to == NULL)
+		return i915_gem_object_wait_rendering(obj, readonly);
 
-	trace_i915_gem_ring_sync_to(from, to, obj->last_read_req);
-	ret = to->semaphore.sync_to(to, from, seqno);
-	if (!ret)
-		/* We use last_read_req because sync_to()
-		 * might have just caused seqno wrap under
-		 * the radar.
-		 */
-		from->semaphore.sync_seqno[idx] =
-				i915_gem_request_get_seqno(obj->last_read_req);
+	n = 0;
+	if (readonly) {
+		if (obj->last_write_req)
+			rq[n++] = obj->last_write_req;
+	} else {
+		for (i = 0; i < I915_NUM_RINGS; i++)
+			if (obj->last_read_req[i])
+				rq[n++] = obj->last_read_req[i];
+	}
+	for (i = 0; i < n; i++) {
+		ret = __i915_gem_object_sync(obj, to, rq[i]);
+		if (ret)
+			return ret;
+	}
 
-	return ret;
+	return 0;
 }
 
 static void i915_gem_object_finish_gtt(struct drm_i915_gem_object *obj)
@@ -3075,10 +3202,6 @@ int i915_vma_unbind(struct i915_vma *vma)
 	/* Since the unbound list is global, only move to that list if
 	 * no more VMAs exist. */
 	if (list_empty(&obj->vma_list)) {
-		/* Throw away the active reference before
-		 * moving to the unbound list. */
-		i915_gem_object_retire(obj);
-
 		i915_gem_gtt_finish_object(obj);
 		list_move_tail(&obj->global_list, &dev_priv->mm.unbound_list);
 	}
@@ -3111,6 +3234,7 @@ int i915_gpu_idle(struct drm_device *dev)
 			return ret;
 	}
 
+	WARN_ON(i915_verify_lists(dev));
 	return 0;
 }
 
@@ -3707,8 +3831,6 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 	if (ret)
 		return ret;
 
-	i915_gem_object_retire(obj);
-
 	/* Flush and acquire obj->pages so that we are coherent through
 	 * direct access in memory with previous cached writes through
 	 * shmemfs and that our cache domain tracking remains valid.
@@ -3924,11 +4046,9 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 	u32 old_read_domains, old_write_domain;
 	int ret;
 
-	if (pipelined != i915_gem_request_get_ring(obj->last_read_req)) {
-		ret = i915_gem_object_sync(obj, pipelined);
-		if (ret)
-			return ret;
-	}
+	ret = i915_gem_object_sync(obj, pipelined);
+	if (ret)
+		return ret;
 
 	/* Mark the pin_display early so that we account for the
 	 * display coherency whilst setting up the cache domains.
@@ -4012,7 +4132,6 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
 	if (ret)
 		return ret;
 
-	i915_gem_object_retire(obj);
 	i915_gem_object_flush_gtt_write_domain(obj);
 
 	old_write_domain = obj->base.write_domain;
@@ -4300,15 +4419,15 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 	 * necessary flushes here.
 	 */
 	ret = i915_gem_object_flush_active(obj);
+	if (ret)
+		goto unref;
 
-	args->busy = obj->active;
-	if (obj->last_read_req) {
-		struct intel_engine_cs *ring;
-		BUILD_BUG_ON(I915_NUM_RINGS > 16);
-		ring = i915_gem_request_get_ring(obj->last_read_req);
-		args->busy |= intel_ring_flag(ring) << 16;
-	}
+	BUILD_BUG_ON(I915_NUM_RINGS > 16);
+	args->busy = obj->active << 16;
+	if (obj->last_write_req)
+		args->busy |= obj->last_write_req->ring->id;
 
+unref:
 	drm_gem_object_unreference(&obj->base);
 unlock:
 	mutex_unlock(&dev->struct_mutex);
@@ -4382,8 +4501,11 @@ unlock:
 void i915_gem_object_init(struct drm_i915_gem_object *obj,
 			  const struct drm_i915_gem_object_ops *ops)
 {
+	int i;
+
 	INIT_LIST_HEAD(&obj->global_list);
-	INIT_LIST_HEAD(&obj->ring_list);
+	for (i = 0; i < I915_NUM_RINGS; i++)
+		INIT_LIST_HEAD(&obj->ring_list[i]);
 	INIT_LIST_HEAD(&obj->obj_exec_link);
 	INIT_LIST_HEAD(&obj->vma_list);
 	INIT_LIST_HEAD(&obj->batch_pool_link);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index fa8eed57838d..477355101053 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -748,8 +748,6 @@ static int do_switch(struct intel_engine_cs *ring,
 		 * swapped, but there is no way to do that yet.
 		 */
 		from->legacy_hw_ctx.rcs_state->dirty = 1;
-		BUG_ON(i915_gem_request_get_ring(
-			from->legacy_hw_ctx.rcs_state->last_read_req) != ring);
 
 		/* obj is kept alive until the next request by its active ref */
 		i915_gem_object_ggtt_unpin(from->legacy_hw_ctx.rcs_state);
diff --git a/drivers/gpu/drm/i915/i915_gem_debug.c b/drivers/gpu/drm/i915/i915_gem_debug.c
index f462d1b51d97..17299d04189f 100644
--- a/drivers/gpu/drm/i915/i915_gem_debug.c
+++ b/drivers/gpu/drm/i915/i915_gem_debug.c
@@ -34,82 +34,34 @@ int
 i915_verify_lists(struct drm_device *dev)
 {
 	static int warned;
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct drm_i915_gem_object *obj;
+	struct intel_engine_cs *ring;
 	int err = 0;
+	int i;
 
 	if (warned)
 		return 0;
 
-	list_for_each_entry(obj, &dev_priv->render_ring.active_list, list) {
-		if (obj->base.dev != dev ||
-		    !atomic_read(&obj->base.refcount.refcount)) {
-			DRM_ERROR("freed render active %p\n", obj);
-			err++;
-			break;
-		} else if (!obj->active ||
-			   (obj->base.read_domains & I915_GEM_GPU_DOMAINS) == 0) {
-			DRM_ERROR("invalid render active %p (a %d r %x)\n",
-				  obj,
-				  obj->active,
-				  obj->base.read_domains);
-			err++;
-		} else if (obj->base.write_domain && list_empty(&obj->gpu_write_list)) {
-			DRM_ERROR("invalid render active %p (w %x, gwl %d)\n",
-				  obj,
-				  obj->base.write_domain,
-				  !list_empty(&obj->gpu_write_list));
-			err++;
-		}
-	}
-
-	list_for_each_entry(obj, &dev_priv->mm.flushing_list, list) {
-		if (obj->base.dev != dev ||
-		    !atomic_read(&obj->base.refcount.refcount)) {
-			DRM_ERROR("freed flushing %p\n", obj);
-			err++;
-			break;
-		} else if (!obj->active ||
-			   (obj->base.write_domain & I915_GEM_GPU_DOMAINS) == 0 ||
-			   list_empty(&obj->gpu_write_list)) {
-			DRM_ERROR("invalid flushing %p (a %d w %x gwl %d)\n",
-				  obj,
-				  obj->active,
-				  obj->base.write_domain,
-				  !list_empty(&obj->gpu_write_list));
-			err++;
-		}
-	}
-
-	list_for_each_entry(obj, &dev_priv->mm.gpu_write_list, gpu_write_list) {
-		if (obj->base.dev != dev ||
-		    !atomic_read(&obj->base.refcount.refcount)) {
-			DRM_ERROR("freed gpu write %p\n", obj);
-			err++;
-			break;
-		} else if (!obj->active ||
-			   (obj->base.write_domain & I915_GEM_GPU_DOMAINS) == 0) {
-			DRM_ERROR("invalid gpu write %p (a %d w %x)\n",
-				  obj,
-				  obj->active,
-				  obj->base.write_domain);
-			err++;
-		}
-	}
-
-	list_for_each_entry(obj, &i915_gtt_vm->inactive_list, list) {
-		if (obj->base.dev != dev ||
-		    !atomic_read(&obj->base.refcount.refcount)) {
-			DRM_ERROR("freed inactive %p\n", obj);
-			err++;
-			break;
-		} else if (obj->pin_count || obj->active ||
-			   (obj->base.write_domain & I915_GEM_GPU_DOMAINS)) {
-			DRM_ERROR("invalid inactive %p (p %d a %d w %x)\n",
-				  obj,
-				  obj->pin_count, obj->active,
-				  obj->base.write_domain);
-			err++;
+	for_each_ring(ring, dev_priv, i) {
+		list_for_each_entry(obj, &ring->active_list, ring_list[ring->id]) {
+			if (obj->base.dev != dev ||
+			    !atomic_read(&obj->base.refcount.refcount)) {
+				DRM_ERROR("%s: freed active obj %p\n",
+					  ring->name, obj);
+				err++;
+				break;
+			} else if (!obj->active ||
+				   obj->last_read_req[ring->id] == NULL) {
+				DRM_ERROR("%s: invalid active obj %p\n",
+					  ring->name, obj);
+				err++;
+			} else if (obj->base.write_domain) {
+				DRM_ERROR("%s: invalid write obj %p (w %x)\n",
+					  ring->name,
+					  obj, obj->base.write_domain);
+				err++;
+			}
 		}
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index ac22614dbb0e..17dc2fcaba10 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -192,15 +192,20 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m,
 				struct drm_i915_error_buffer *err,
 				int count)
 {
+	int i;
+
 	err_printf(m, "  %s [%d]:\n", name, count);
 
 	while (count--) {
-		err_printf(m, "    %08x %8u %02x %02x %x %x",
+		err_printf(m, "    %08x %8u %02x %02x [ ",
 			   err->gtt_offset,
 			   err->size,
 			   err->read_domains,
-			   err->write_domain,
-			   err->rseqno, err->wseqno);
+			   err->write_domain);
+		for (i = 0; i < I915_NUM_RINGS; i++)
+			err_printf(m, "%02x ", err->rseqno[i]);
+
+		err_printf(m, "] %02x", err->wseqno);
 		err_puts(m, pin_flag(err->pinned));
 		err_puts(m, tiling_flag(err->tiling));
 		err_puts(m, dirty_flag(err->dirty));
@@ -680,10 +685,12 @@ static void capture_bo(struct drm_i915_error_buffer *err,
 		       struct i915_vma *vma)
 {
 	struct drm_i915_gem_object *obj = vma->obj;
+	int i;
 
 	err->size = obj->base.size;
 	err->name = obj->base.name;
-	err->rseqno = i915_gem_request_get_seqno(obj->last_read_req);
+	for (i = 0; i < I915_NUM_RINGS; i++)
+		err->rseqno[i] = i915_gem_request_get_seqno(obj->last_read_req[i]);
 	err->wseqno = i915_gem_request_get_seqno(obj->last_write_req);
 	err->gtt_offset = vma->node.start;
 	err->read_domains = obj->base.read_domains;
@@ -696,8 +703,8 @@ static void capture_bo(struct drm_i915_error_buffer *err,
 	err->dirty = obj->dirty;
 	err->purgeable = obj->madv != I915_MADV_WILLNEED;
 	err->userptr = obj->userptr.mm != NULL;
-	err->ring = obj->last_read_req ?
-			i915_gem_request_get_ring(obj->last_read_req)->id : -1;
+	err->ring = obj->last_write_req ?
+			i915_gem_request_get_ring(obj->last_write_req)->id : -1;
 	err->cache_level = obj->cache_level;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 2ca71eeaf2d7..104c0fcb575b 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -10551,7 +10551,7 @@ static bool use_mmio_flip(struct intel_engine_cs *ring,
 	else if (i915.enable_execlists)
 		return true;
 	else
-		return ring != i915_gem_request_get_ring(obj->last_read_req);
+		return ring != i915_gem_request_get_ring(obj->last_write_req);
 }
 
 static void skl_do_mmio_flip(struct intel_crtc *intel_crtc)
@@ -10867,7 +10867,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 	} else if (IS_IVYBRIDGE(dev) || IS_HASWELL(dev)) {
 		ring = &dev_priv->ring[BCS];
 	} else if (INTEL_INFO(dev)->gen >= 7) {
-		ring = i915_gem_request_get_ring(obj->last_read_req);
+		ring = i915_gem_request_get_ring(obj->last_write_req);
 		if (ring == NULL || ring->id != RCS)
 			ring = &dev_priv->ring[BCS];
 	} else {
@@ -10883,7 +10883,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 	 */
 	ret = intel_pin_and_fence_fb_obj(crtc->primary, fb,
 					 crtc->primary->state,
-					 mmio_flip ? i915_gem_request_get_ring(obj->last_read_req) : ring);
+					 mmio_flip ? i915_gem_request_get_ring(obj->last_write_req) : ring);
 	if (ret)
 		goto cleanup_pending;
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 732fd633e73a..6b9c12e807a8 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -673,7 +673,8 @@ static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf,
 {
 	struct intel_engine_cs *ring = ringbuf->ring;
 	struct drm_i915_gem_request *request;
-	int ret, new_space;
+	unsigned space;
+	int ret;
 
 	if (intel_ring_space(ringbuf) >= bytes)
 		return 0;
@@ -684,14 +685,13 @@ static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf,
 		 * from multiple ringbuffers. Here, we must ignore any that
 		 * aren't from the ringbuffer we're considering.
 		 */
-		struct intel_context *ctx = request->ctx;
-		if (ctx->engine[ring->id].ringbuf != ringbuf)
+		if (request->ringbuf != ringbuf)
 			continue;
 
 		/* Would completion of this request free enough space? */
-		new_space = __intel_ring_space(request->postfix, ringbuf->tail,
-				       ringbuf->size);
-		if (new_space >= bytes)
+		space = __intel_ring_space(request->postfix, ringbuf->tail,
+					   ringbuf->size);
+		if (space >= bytes)
 			break;
 	}
 
@@ -702,11 +702,8 @@ static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf,
 	if (ret)
 		return ret;
 
-	i915_gem_retire_requests_ring(ring);
-
-	WARN_ON(intel_ring_space(ringbuf) < new_space);
-
-	return intel_ring_space(ringbuf) >= bytes ? 0 : -ENOSPC;
+	ringbuf->space = space;
+	return 0;
 }
 
 /*
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index 5fd2d5ac02e2..25c8ec697da1 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -228,7 +228,6 @@ static int intel_overlay_do_wait_request(struct intel_overlay *overlay,
 	ret = i915_wait_request(overlay->last_flip_req);
 	if (ret)
 		return ret;
-	i915_gem_retire_requests(dev);
 
 	i915_gem_request_assign(&overlay->last_flip_req, NULL);
 	return 0;
@@ -376,7 +375,6 @@ static int intel_overlay_recover_from_interrupt(struct intel_overlay *overlay)
 	ret = i915_wait_request(overlay->last_flip_req);
 	if (ret)
 		return ret;
-	i915_gem_retire_requests(overlay->dev);
 
 	if (overlay->flip_tail)
 		overlay->flip_tail(overlay);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index de8c0747aaef..3004fd87f1c6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2081,15 +2081,16 @@ static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
 {
 	struct intel_ringbuffer *ringbuf = ring->buffer;
 	struct drm_i915_gem_request *request;
-	int ret, new_space;
+	unsigned space;
+	int ret;
 
 	if (intel_ring_space(ringbuf) >= n)
 		return 0;
 
 	list_for_each_entry(request, &ring->request_list, list) {
-		new_space = __intel_ring_space(request->postfix, ringbuf->tail,
-				       ringbuf->size);
-		if (new_space >= n)
+		space = __intel_ring_space(request->postfix, ringbuf->tail,
+					   ringbuf->size);
+		if (space >= n)
 			break;
 	}
 
@@ -2100,10 +2101,7 @@ static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
 	if (ret)
 		return ret;
 
-	i915_gem_retire_requests_ring(ring);
-
-	WARN_ON(intel_ring_space(ringbuf) < new_space);
-
+	ringbuf->space = space;
 	return 0;
 }
 
@@ -2132,7 +2130,7 @@ static int intel_wrap_ring_buffer(struct intel_engine_cs *ring)
 
 int intel_ring_idle(struct intel_engine_cs *ring)
 {
-	struct drm_i915_gem_request *req;
+	struct drm_i915_gem_request *rq;
 	int ret;
 
 	/* We need to add any requests required to flush the objects and ring */
@@ -2146,11 +2144,15 @@ int intel_ring_idle(struct intel_engine_cs *ring)
 	if (list_empty(&ring->request_list))
 		return 0;
 
-	req = list_entry(ring->request_list.prev,
-			   struct drm_i915_gem_request,
-			   list);
+	rq = list_entry(ring->request_list.prev,
+			struct drm_i915_gem_request,
+			list);
 
-	return i915_wait_request(req);
+	/* Make sure we do not trigger any retires */
+	return __i915_wait_request(rq,
+				   atomic_read(&to_i915(ring->dev)->gpu_error.reset_counter),
+				   to_i915(ring->dev)->mm.interruptible,
+				   NULL, NULL);
 }
 
 int intel_ring_alloc_request_extras(struct drm_i915_gem_request *request)
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 07/16] drm/i915: Inline check required for object syncing prior to execbuf
  2015-04-27 12:41 RPS tuning Chris Wilson
                   ` (5 preceding siblings ...)
  2015-04-27 12:41 ` [PATCH 06/16] drm/i915: Implement inter-engine read-read optimisations Chris Wilson
@ 2015-04-27 12:41 ` Chris Wilson
  2015-04-29 14:03   ` Tvrtko Ursulin
  2015-04-27 12:41 ` [PATCH 08/16] drm/i915: Add RPS thresholds to debugfs/i915_frequency_info Chris Wilson
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2015-04-27 12:41 UTC (permalink / raw)
  To: intel-gfx

This trims a little overhead from the common case of not needing to
synchronize between rings.

v2: execlists is special and likes to duplicate code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 10 +++++++---
 drivers/gpu/drm/i915/intel_lrc.c           |  9 ++++++---
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 45d74da7f6c4..4fe09568089b 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -889,6 +889,7 @@ static int
 i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
 				struct list_head *vmas)
 {
+	const unsigned other_rings = ~intel_ring_flag(ring);
 	struct i915_vma *vma;
 	uint32_t flush_domains = 0;
 	bool flush_chipset = false;
@@ -896,9 +897,12 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
 
 	list_for_each_entry(vma, vmas, exec_list) {
 		struct drm_i915_gem_object *obj = vma->obj;
-		ret = i915_gem_object_sync(obj, ring);
-		if (ret)
-			return ret;
+
+		if (obj->active & other_rings) {
+			ret = i915_gem_object_sync(obj, ring);
+			if (ret)
+				return ret;
+		}
 
 		if (obj->base.write_domain & I915_GEM_DOMAIN_CPU)
 			flush_chipset |= i915_gem_clflush_object(obj, false);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 6b9c12e807a8..7b7cad017aa0 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -622,6 +622,7 @@ static int execlists_move_to_gpu(struct intel_ringbuffer *ringbuf,
 				 struct list_head *vmas)
 {
 	struct intel_engine_cs *ring = ringbuf->ring;
+	const unsigned other_rings = ~intel_ring_flag(ring);
 	struct i915_vma *vma;
 	uint32_t flush_domains = 0;
 	bool flush_chipset = false;
@@ -630,9 +631,11 @@ static int execlists_move_to_gpu(struct intel_ringbuffer *ringbuf,
 	list_for_each_entry(vma, vmas, exec_list) {
 		struct drm_i915_gem_object *obj = vma->obj;
 
-		ret = i915_gem_object_sync(obj, ring);
-		if (ret)
-			return ret;
+		if (obj->active & other_rings) {
+			ret = i915_gem_object_sync(obj, ring);
+			if (ret)
+				return ret;
+		}
 
 		if (obj->base.write_domain & I915_GEM_DOMAIN_CPU)
 			flush_chipset |= i915_gem_clflush_object(obj, false);
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 08/16] drm/i915: Add RPS thresholds to debugfs/i915_frequency_info
  2015-04-27 12:41 RPS tuning Chris Wilson
                   ` (6 preceding siblings ...)
  2015-04-27 12:41 ` [PATCH 07/16] drm/i915: Inline check required for object syncing prior to execbuf Chris Wilson
@ 2015-04-27 12:41 ` Chris Wilson
  2015-05-04 14:36   ` Daniel Vetter
  2015-04-27 12:41 ` [PATCH 09/16] drm/i915: Limit ring synchronisation (sw sempahores) RPS boosts Chris Wilson
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2015-04-27 12:41 UTC (permalink / raw)
  To: intel-gfx

Expose some more of our internal RPS bookkeeping for debugging.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index ab55641a0b78..2cbb3e9266f0 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1217,12 +1217,17 @@ static int i915_frequency_info(struct seq_file *m, void *unused)
 			   GEN6_CURBSYTAVG_MASK);
 		seq_printf(m, "RP PREV UP: %dus\n", rpprevup &
 			   GEN6_CURBSYTAVG_MASK);
+		seq_printf(m, "Up threshold: %d%%\n",
+			   dev_priv->rps.up_threshold);
+
 		seq_printf(m, "RP CUR DOWN EI: %dus\n", rpdownei &
 			   GEN6_CURIAVG_MASK);
 		seq_printf(m, "RP CUR DOWN: %dus\n", rpcurdown &
 			   GEN6_CURBSYTAVG_MASK);
 		seq_printf(m, "RP PREV DOWN: %dus\n", rpprevdown &
 			   GEN6_CURBSYTAVG_MASK);
+		seq_printf(m, "Down threshold: %d%%\n",
+			   dev_priv->rps.down_threshold);
 
 		max_freq = (rp_state_cap & 0xff0000) >> 16;
 		max_freq *= (IS_SKYLAKE(dev) ? GEN9_FREQ_SCALER : 1);
@@ -1238,12 +1243,21 @@ static int i915_frequency_info(struct seq_file *m, void *unused)
 		max_freq *= (IS_SKYLAKE(dev) ? GEN9_FREQ_SCALER : 1);
 		seq_printf(m, "Max non-overclocked (RP0) frequency: %dMHz\n",
 			   intel_gpu_freq(dev_priv, max_freq));
-
 		seq_printf(m, "Max overclocked frequency: %dMHz\n",
 			   intel_gpu_freq(dev_priv, dev_priv->rps.max_freq));
 
+		seq_printf(m, "Current freq: %d MHz\n",
+			   intel_gpu_freq(dev_priv, dev_priv->rps.cur_freq));
+		seq_printf(m, "Actual freq: %d MHz\n", cagf);
 		seq_printf(m, "Idle freq: %d MHz\n",
 			   intel_gpu_freq(dev_priv, dev_priv->rps.idle_freq));
+		seq_printf(m, "Min freq: %d MHz\n",
+			   intel_gpu_freq(dev_priv, dev_priv->rps.min_freq));
+		seq_printf(m, "Max freq: %d MHz\n",
+			   intel_gpu_freq(dev_priv, dev_priv->rps.max_freq));
+		seq_printf(m,
+			   "efficient (RPe) frequency: %d MHz\n",
+			   intel_gpu_freq(dev_priv, dev_priv->rps.efficient_freq));
 	} else if (IS_VALLEYVIEW(dev)) {
 		u32 freq_sts;
 
@@ -1252,6 +1266,12 @@ static int i915_frequency_info(struct seq_file *m, void *unused)
 		seq_printf(m, "PUNIT_REG_GPU_FREQ_STS: 0x%08x\n", freq_sts);
 		seq_printf(m, "DDR freq: %d MHz\n", dev_priv->mem_freq);
 
+		seq_printf(m, "actual GPU freq: %d MHz\n",
+			   intel_gpu_freq(dev_priv, (freq_sts >> 8) & 0xff));
+
+		seq_printf(m, "current GPU freq: %d MHz\n",
+			   intel_gpu_freq(dev_priv, dev_priv->rps.cur_freq));
+
 		seq_printf(m, "max GPU freq: %d MHz\n",
 			   intel_gpu_freq(dev_priv, dev_priv->rps.max_freq));
 
@@ -1264,9 +1284,6 @@ static int i915_frequency_info(struct seq_file *m, void *unused)
 		seq_printf(m,
 			   "efficient (RPe) frequency: %d MHz\n",
 			   intel_gpu_freq(dev_priv, dev_priv->rps.efficient_freq));
-
-		seq_printf(m, "current GPU freq: %d MHz\n",
-			   intel_gpu_freq(dev_priv, (freq_sts >> 8) & 0xff));
 		mutex_unlock(&dev_priv->rps.hw_lock);
 	} else {
 		seq_puts(m, "no P-state info available\n");
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 09/16] drm/i915: Limit ring synchronisation (sw sempahores) RPS boosts
  2015-04-27 12:41 RPS tuning Chris Wilson
                   ` (7 preceding siblings ...)
  2015-04-27 12:41 ` [PATCH 08/16] drm/i915: Add RPS thresholds to debugfs/i915_frequency_info Chris Wilson
@ 2015-04-27 12:41 ` Chris Wilson
  2015-05-04 14:38   ` Daniel Vetter
  2015-04-27 12:41 ` [PATCH 10/16] drm/i915: Limit mmio flip " Chris Wilson
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2015-04-27 12:41 UTC (permalink / raw)
  To: intel-gfx

Ring switches can occur many times per frame, and are often out of
control, causing frequent RPS boosting for no practical benefit. Treat
the sw semaphore synchronisation as a separate client and only allow it
to boost once per busy/idle cycle.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c |  1 +
 drivers/gpu/drm/i915/i915_drv.h     | 34 ++++++++++++++++++----------------
 drivers/gpu/drm/i915/i915_gem.c     |  7 +++++--
 drivers/gpu/drm/i915/intel_pm.c     |  1 +
 4 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 2cbb3e9266f0..1d68e3ecaa00 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2311,6 +2311,7 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
 			   list_empty(&file_priv->rps_boost) ? "" : ", active");
 		rcu_read_unlock();
 	}
+	seq_printf(m, "Semaphore boosts: %d\n", dev_priv->rps.semaphores.rps_boosts);
 	seq_printf(m, "Kernel boosts: %d\n", dev_priv->rps.boosts);
 
 	mutex_unlock(&dev_priv->rps.hw_lock);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index caee59bf94ba..415a8e756e48 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -272,6 +272,22 @@ struct drm_i915_private;
 struct i915_mm_struct;
 struct i915_mmu_object;
 
+struct drm_i915_file_private {
+	struct drm_i915_private *dev_priv;
+	struct drm_file *file;
+
+	struct {
+		spinlock_t lock;
+		struct list_head request_list;
+	} mm;
+	struct idr context_idr;
+
+	struct list_head rps_boost;
+	struct intel_engine_cs *bsd_ring;
+
+	unsigned rps_boosts;
+};
+
 enum intel_dpll_id {
 	DPLL_ID_PRIVATE = -1, /* non-shared dpll in use */
 	/* real shared dpll ids must be >= 0 */
@@ -1054,6 +1070,8 @@ struct intel_gen6_power_mgmt {
 	struct list_head clients;
 	unsigned boosts;
 
+	struct drm_i915_file_private semaphores;
+
 	/* manual wa residency calculations */
 	struct intel_rps_ei up_ei, down_ei;
 
@@ -2191,22 +2209,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
  * a later patch when the call to i915_seqno_passed() is obsoleted...
  */
 
-struct drm_i915_file_private {
-	struct drm_i915_private *dev_priv;
-	struct drm_file *file;
-
-	struct {
-		spinlock_t lock;
-		struct list_head request_list;
-	} mm;
-	struct idr context_idr;
-
-	struct list_head rps_boost;
-	struct intel_engine_cs *bsd_ring;
-
-	unsigned rps_boosts;
-};
-
 /*
  * A command that requires special handling by the command parser.
  */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d60eca03e306..90c33912ffd5 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3036,9 +3036,12 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 		return ret;
 
 	if (!i915_semaphore_is_enabled(obj->base.dev)) {
+		struct drm_i915_private *i915 = to_i915(obj->base.dev);
 		ret = __i915_wait_request(rq,
-					  atomic_read(&to_i915(obj->base.dev)->gpu_error.reset_counter),
-					  to_i915(obj->base.dev)->mm.interruptible, NULL, NULL);
+					  atomic_read(&i915->gpu_error.reset_counter),
+					  i915->mm.interruptible,
+					  NULL,
+					  &i915->rps.semaphores);
 		if (ret)
 			return ret;
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index a7516ed24eee..8dc158adba14 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6861,6 +6861,7 @@ void intel_pm_setup(struct drm_device *dev)
 	INIT_DELAYED_WORK(&dev_priv->rps.delayed_resume_work,
 			  intel_gen6_powersave_work);
 	INIT_LIST_HEAD(&dev_priv->rps.clients);
+	INIT_LIST_HEAD(&dev_priv->rps.semaphores.rps_boost);
 
 	dev_priv->pm.suspended = false;
 }
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 10/16] drm/i915: Limit mmio flip RPS boosts
  2015-04-27 12:41 RPS tuning Chris Wilson
                   ` (8 preceding siblings ...)
  2015-04-27 12:41 ` [PATCH 09/16] drm/i915: Limit ring synchronisation (sw sempahores) RPS boosts Chris Wilson
@ 2015-04-27 12:41 ` Chris Wilson
  2015-04-27 12:41 ` [PATCH 11/16] drm/i915: Convert RPS tracking to a intel_rps_client struct Chris Wilson
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2015-04-27 12:41 UTC (permalink / raw)
  To: intel-gfx

Since we will often pageflip to an active surface, we will often have to
wait for the surface to be written before issuing the flip. Also we are
likely to wait on that surface in plenty of time before the vblank.
Since we have a mechanism for boosting when a flip misses the expected
vblank, curtain the number of times we RPS boost when simply waiting for
mmioflip.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c  | 1 +
 drivers/gpu/drm/i915/i915_drv.h      | 1 +
 drivers/gpu/drm/i915/intel_display.c | 4 +++-
 drivers/gpu/drm/i915/intel_drv.h     | 1 +
 drivers/gpu/drm/i915/intel_pm.c      | 1 +
 5 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 1d68e3ecaa00..07742562c2a7 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2312,6 +2312,7 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
 		rcu_read_unlock();
 	}
 	seq_printf(m, "Semaphore boosts: %d\n", dev_priv->rps.semaphores.rps_boosts);
+	seq_printf(m, "MMIO flip boosts: %d\n", dev_priv->rps.mmioflips.rps_boosts);
 	seq_printf(m, "Kernel boosts: %d\n", dev_priv->rps.boosts);
 
 	mutex_unlock(&dev_priv->rps.hw_lock);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 415a8e756e48..7dd908962e12 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1071,6 +1071,7 @@ struct intel_gen6_power_mgmt {
 	unsigned boosts;
 
 	struct drm_i915_file_private semaphores;
+	struct drm_i915_file_private mmioflips;
 
 	/* manual wa residency calculations */
 	struct intel_rps_ei up_ei, down_ei;
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 104c0fcb575b..1adcfa15221d 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -10657,7 +10657,8 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
 	if (mmio_flip->rq)
 		WARN_ON(__i915_wait_request(mmio_flip->rq,
 					    mmio_flip->crtc->reset_counter,
-					    false, NULL, NULL));
+					    false, NULL,
+					    &mmio_flip->i915->rps.mmioflips));
 
 	intel_do_mmio_flip(mmio_flip->crtc);
 
@@ -10678,6 +10679,7 @@ static int intel_queue_mmio_flip(struct drm_device *dev,
 	if (mmio_flip == NULL)
 		return -ENOMEM;
 
+	mmio_flip->i915 = to_i915(dev);
 	mmio_flip->rq = i915_gem_request_reference(obj->last_write_req);
 	mmio_flip->crtc = to_intel_crtc(crtc);
 
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 7f8cce797ba2..cc37ba1e98fb 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -469,6 +469,7 @@ struct intel_pipe_wm {
 
 struct intel_mmio_flip {
 	struct work_struct work;
+	struct drm_i915_private *i915;
 	struct drm_i915_gem_request *rq;
 	struct intel_crtc *crtc;
 };
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 8dc158adba14..16fb303c08cf 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6862,6 +6862,7 @@ void intel_pm_setup(struct drm_device *dev)
 			  intel_gen6_powersave_work);
 	INIT_LIST_HEAD(&dev_priv->rps.clients);
 	INIT_LIST_HEAD(&dev_priv->rps.semaphores.rps_boost);
+	INIT_LIST_HEAD(&dev_priv->rps.mmioflips.rps_boost);
 
 	dev_priv->pm.suspended = false;
 }
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 11/16] drm/i915: Convert RPS tracking to a intel_rps_client struct
  2015-04-27 12:41 RPS tuning Chris Wilson
                   ` (9 preceding siblings ...)
  2015-04-27 12:41 ` [PATCH 10/16] drm/i915: Limit mmio flip " Chris Wilson
@ 2015-04-27 12:41 ` Chris Wilson
  2015-04-27 12:41 ` [PATCH 12/16] drm/i915: Don't downclock whilst we have clients waiting for GPU results Chris Wilson
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2015-04-27 12:41 UTC (permalink / raw)
  To: intel-gfx

Now that we have internal clients, rather than faking a whole
drm_i915_file_private just for tracking RPS boosts, create a new struct
intel_rps_client and pass it along when waiting.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 12 ++++++++----
 drivers/gpu/drm/i915/i915_drv.h     | 13 +++++++------
 drivers/gpu/drm/i915/i915_gem.c     | 24 +++++++++++++++---------
 drivers/gpu/drm/i915/intel_drv.h    |  2 +-
 drivers/gpu/drm/i915/intel_pm.c     | 14 +++++++-------
 5 files changed, 38 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 07742562c2a7..cc9ed6bdf155 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2307,12 +2307,16 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
 		seq_printf(m, "%s [%d]: %d boosts%s\n",
 			   task ? task->comm : "<unknown>",
 			   task ? task->pid : -1,
-			   file_priv->rps_boosts,
-			   list_empty(&file_priv->rps_boost) ? "" : ", active");
+			   file_priv->rps.boosts,
+			   list_empty(&file_priv->rps.link) ? "" : ", active");
 		rcu_read_unlock();
 	}
-	seq_printf(m, "Semaphore boosts: %d\n", dev_priv->rps.semaphores.rps_boosts);
-	seq_printf(m, "MMIO flip boosts: %d\n", dev_priv->rps.mmioflips.rps_boosts);
+	seq_printf(m, "Semaphore boosts: %d%s\n",
+		   dev_priv->rps.semaphores.boosts,
+		   list_empty(&dev_priv->rps.semaphores.link) ? "" : ", active");
+	seq_printf(m, "MMIO flip boosts: %d%s\n",
+		   dev_priv->rps.mmioflips.boosts,
+		   list_empty(&dev_priv->rps.mmioflips.link) ? "" : ", active");
 	seq_printf(m, "Kernel boosts: %d\n", dev_priv->rps.boosts);
 
 	mutex_unlock(&dev_priv->rps.hw_lock);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7dd908962e12..4a73afd62c34 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -282,10 +282,12 @@ struct drm_i915_file_private {
 	} mm;
 	struct idr context_idr;
 
-	struct list_head rps_boost;
-	struct intel_engine_cs *bsd_ring;
+	struct intel_rps_client {
+		struct list_head link;
+		unsigned boosts;
+	} rps;
 
-	unsigned rps_boosts;
+	struct intel_engine_cs *bsd_ring;
 };
 
 enum intel_dpll_id {
@@ -1070,8 +1072,7 @@ struct intel_gen6_power_mgmt {
 	struct list_head clients;
 	unsigned boosts;
 
-	struct drm_i915_file_private semaphores;
-	struct drm_i915_file_private mmioflips;
+	struct intel_rps_client semaphores, mmioflips;
 
 	/* manual wa residency calculations */
 	struct intel_rps_ei up_ei, down_ei;
@@ -2822,7 +2823,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			unsigned reset_counter,
 			bool interruptible,
 			s64 *timeout,
-			struct drm_i915_file_private *file_priv);
+			struct intel_rps_client *rps);
 int __must_check i915_wait_request(struct drm_i915_gem_request *req);
 int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
 int __must_check
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 90c33912ffd5..43ac75834e61 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1223,7 +1223,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			unsigned reset_counter,
 			bool interruptible,
 			s64 *timeout,
-			struct drm_i915_file_private *file_priv)
+			struct intel_rps_client *rps)
 {
 	struct intel_engine_cs *ring = i915_gem_request_get_ring(req);
 	struct drm_device *dev = ring->dev;
@@ -1246,8 +1246,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	timeout_expire = timeout ?
 		jiffies + nsecs_to_jiffies_timeout((u64)*timeout) : 0;
 
-	if (INTEL_INFO(dev)->gen >= 6)
-		gen6_rps_boost(dev_priv, file_priv);
+	if (INTEL_INFO(dev_priv)->gen >= 6)
+		gen6_rps_boost(dev_priv, rps);
 
 	/* Record current time in case interrupted by signal, or wedged */
 	trace_i915_gem_request_wait_begin(req);
@@ -1495,7 +1495,7 @@ i915_gem_object_retire_request(struct drm_i915_gem_object *obj,
  */
 static __must_check int
 i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
-					    struct drm_i915_file_private *file_priv,
+					    struct intel_rps_client *rps,
 					    bool readonly)
 {
 	struct drm_device *dev = obj->base.dev;
@@ -1547,7 +1547,7 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 	mutex_unlock(&dev->struct_mutex);
 	for (i = 0; ret == 0 && i < n; i++)
 		ret = __i915_wait_request(requests[i], reset_counter, true,
-					  NULL, file_priv);
+					  NULL, rps);
 	mutex_lock(&dev->struct_mutex);
 
 err:
@@ -1560,6 +1560,12 @@ err:
 	return ret;
 }
 
+static struct intel_rps_client *to_rps_client(struct drm_file *file)
+{
+	struct drm_i915_file_private *fpriv = file->driver_priv;
+	return &fpriv->rps;
+}
+
 /**
  * Called when user space prepares to use an object with the CPU, either
  * through the mmap ioctl's mapping or a GTT mapping.
@@ -1602,7 +1608,7 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
 	 * to catch cases where we are gazumped.
 	 */
 	ret = i915_gem_object_wait_rendering__nonblocking(obj,
-							  file->driver_priv,
+							  to_rps_client(file),
 							  !write_domain);
 	if (ret)
 		goto unref;
@@ -5167,9 +5173,9 @@ void i915_gem_release(struct drm_device *dev, struct drm_file *file)
 	}
 	spin_unlock(&file_priv->mm.lock);
 
-	if (!list_empty(&file_priv->rps_boost)) {
+	if (!list_empty(&file_priv->rps.link)) {
 		mutex_lock(&to_i915(dev)->rps.hw_lock);
-		list_del(&file_priv->rps_boost);
+		list_del(&file_priv->rps.link);
 		mutex_unlock(&to_i915(dev)->rps.hw_lock);
 	}
 }
@@ -5188,7 +5194,7 @@ int i915_gem_open(struct drm_device *dev, struct drm_file *file)
 	file->driver_priv = file_priv;
 	file_priv->dev_priv = dev->dev_private;
 	file_priv->file = file;
-	INIT_LIST_HEAD(&file_priv->rps_boost);
+	INIT_LIST_HEAD(&file_priv->rps.link);
 
 	spin_lock_init(&file_priv->mm.lock);
 	INIT_LIST_HEAD(&file_priv->mm.request_list);
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index cc37ba1e98fb..9eb0a911343a 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -1361,7 +1361,7 @@ void gen6_rps_busy(struct drm_i915_private *dev_priv);
 void gen6_rps_reset_ei(struct drm_i915_private *dev_priv);
 void gen6_rps_idle(struct drm_i915_private *dev_priv);
 void gen6_rps_boost(struct drm_i915_private *dev_priv,
-		    struct drm_i915_file_private *file_priv);
+		    struct intel_rps_client *rps);
 void intel_queue_rps_boost_for_request(struct drm_device *dev,
 				       struct drm_i915_gem_request *rq);
 void ilk_wm_get_hw_state(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 16fb303c08cf..dcc52f928650 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -4122,7 +4122,7 @@ void gen6_rps_idle(struct drm_i915_private *dev_priv)
 }
 
 void gen6_rps_boost(struct drm_i915_private *dev_priv,
-		    struct drm_i915_file_private *file_priv)
+		    struct intel_rps_client *rps)
 {
 	u32 val;
 
@@ -4131,13 +4131,13 @@ void gen6_rps_boost(struct drm_i915_private *dev_priv,
 	if (dev_priv->rps.enabled &&
 	    dev_priv->mm.busy &&
 	    dev_priv->rps.cur_freq < val &&
-	    (file_priv == NULL || list_empty(&file_priv->rps_boost))) {
+	    (rps == NULL || list_empty(&rps->link))) {
 		intel_set_rps(dev_priv->dev, val);
 		dev_priv->rps.last_adj = 0;
 
-		if (file_priv != NULL) {
-			list_add(&file_priv->rps_boost, &dev_priv->rps.clients);
-			file_priv->rps_boosts++;
+		if (rps != NULL) {
+			list_add(&rps->link, &dev_priv->rps.clients);
+			rps->boosts++;
 		} else
 			dev_priv->rps.boosts++;
 	}
@@ -6861,8 +6861,8 @@ void intel_pm_setup(struct drm_device *dev)
 	INIT_DELAYED_WORK(&dev_priv->rps.delayed_resume_work,
 			  intel_gen6_powersave_work);
 	INIT_LIST_HEAD(&dev_priv->rps.clients);
-	INIT_LIST_HEAD(&dev_priv->rps.semaphores.rps_boost);
-	INIT_LIST_HEAD(&dev_priv->rps.mmioflips.rps_boost);
+	INIT_LIST_HEAD(&dev_priv->rps.semaphores.link);
+	INIT_LIST_HEAD(&dev_priv->rps.mmioflips.link);
 
 	dev_priv->pm.suspended = false;
 }
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 12/16] drm/i915: Don't downclock whilst we have clients waiting for GPU results
  2015-04-27 12:41 RPS tuning Chris Wilson
                   ` (10 preceding siblings ...)
  2015-04-27 12:41 ` [PATCH 11/16] drm/i915: Convert RPS tracking to a intel_rps_client struct Chris Wilson
@ 2015-04-27 12:41 ` Chris Wilson
  2015-04-27 12:41 ` [PATCH 13/16] drm/i915: Free RPS boosts for all laggards Chris Wilson
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2015-04-27 12:41 UTC (permalink / raw)
  To: intel-gfx

If we have clients stalled waiting for requests, ignore the GPU if it
signals that it should downclock due to low load. This helps prevent
the automatic timeout from causing extremely long running batches from
taking even longer.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 21 +++++++++++++++++++++
 drivers/gpu/drm/i915/i915_irq.c     | 14 ++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index cc9ed6bdf155..d4eb67ca27fb 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2282,6 +2282,18 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
 	return 0;
 }
 
+static int count_irq_waiters(struct drm_i915_private *i915)
+{
+	struct intel_engine_cs *ring;
+	int count = 0;
+	int i;
+
+	for_each_ring(ring, i915, i)
+		count += ring->irq_refcount;
+
+	return count;
+}
+
 static int i915_rps_boost_info(struct seq_file *m, void *data)
 {
 	struct drm_info_node *node = m->private;
@@ -2298,6 +2310,15 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
 	if (ret)
 		goto unlock;
 
+	seq_printf(m, "RPS enabled? %d\n", dev_priv->rps.enabled);
+	seq_printf(m, "GPU busy? %d\n", dev_priv->mm.busy);
+	seq_printf(m, "CPU waiting? %d\n", count_irq_waiters(dev_priv));
+	seq_printf(m, "Frequency requested %d; min hard:%d, soft:%d; max soft:%d, hard:%d\n",
+		   intel_gpu_freq(dev_priv, dev_priv->rps.cur_freq),
+		   intel_gpu_freq(dev_priv, dev_priv->rps.min_freq),
+		   intel_gpu_freq(dev_priv, dev_priv->rps.min_freq_softlimit),
+		   intel_gpu_freq(dev_priv, dev_priv->rps.max_freq_softlimit),
+		   intel_gpu_freq(dev_priv, dev_priv->rps.max_freq));
 	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
 		struct drm_i915_file_private *file_priv = file->driver_priv;
 		struct task_struct *task;
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 9da955e4f355..6ad2764d26b7 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1070,6 +1070,18 @@ static u32 vlv_wa_c0_ei(struct drm_i915_private *dev_priv, u32 pm_iir)
 	return events;
 }
 
+static bool any_waiters(struct drm_i915_private *dev_priv)
+{
+	struct intel_engine_cs *ring;
+	int i;
+
+	for_each_ring(ring, dev_priv, i)
+		if (ring->irq_refcount)
+			return true;
+
+	return false;
+}
+
 static void gen6_pm_rps_work(struct work_struct *work)
 {
 	struct drm_i915_private *dev_priv =
@@ -1114,6 +1126,8 @@ static void gen6_pm_rps_work(struct work_struct *work)
 			new_delay = dev_priv->rps.efficient_freq;
 			adj = 0;
 		}
+	} else if (any_waiters(dev_priv)) {
+		adj = 0;
 	} else if (pm_iir & GEN6_PM_RP_DOWN_TIMEOUT) {
 		if (dev_priv->rps.cur_freq > dev_priv->rps.efficient_freq)
 			new_delay = dev_priv->rps.efficient_freq;
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 13/16] drm/i915: Free RPS boosts for all laggards
  2015-04-27 12:41 RPS tuning Chris Wilson
                   ` (11 preceding siblings ...)
  2015-04-27 12:41 ` [PATCH 12/16] drm/i915: Don't downclock whilst we have clients waiting for GPU results Chris Wilson
@ 2015-04-27 12:41 ` Chris Wilson
  2015-05-21 12:50   ` Daniel Vetter
  2015-05-21 12:56   ` Daniel Vetter
  2015-04-27 12:41 ` [PATCH 14/16] drm/i915: Make the RPS interface gen agnostic Chris Wilson
                   ` (2 subsequent siblings)
  15 siblings, 2 replies; 38+ messages in thread
From: Chris Wilson @ 2015-04-27 12:41 UTC (permalink / raw)
  To: intel-gfx

If the client stalls on a congested request, chosen to be 20ms old to
match throttling, allow the client a free RPS boost.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c  |  2 +-
 drivers/gpu/drm/i915/intel_drv.h |  3 ++-
 drivers/gpu/drm/i915/intel_pm.c  | 19 +++++++++++++++----
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 43ac75834e61..83bccb9f62d6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1247,7 +1247,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 		jiffies + nsecs_to_jiffies_timeout((u64)*timeout) : 0;
 
 	if (INTEL_INFO(dev_priv)->gen >= 6)
-		gen6_rps_boost(dev_priv, rps);
+		gen6_rps_boost(dev_priv, rps, req->emitted_jiffies);
 
 	/* Record current time in case interrupted by signal, or wedged */
 	trace_i915_gem_request_wait_begin(req);
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 9eb0a911343a..34cfa61f3321 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -1361,7 +1361,8 @@ void gen6_rps_busy(struct drm_i915_private *dev_priv);
 void gen6_rps_reset_ei(struct drm_i915_private *dev_priv);
 void gen6_rps_idle(struct drm_i915_private *dev_priv);
 void gen6_rps_boost(struct drm_i915_private *dev_priv,
-		    struct intel_rps_client *rps);
+		    struct intel_rps_client *rps,
+		    unsigned long submitted);
 void intel_queue_rps_boost_for_request(struct drm_device *dev,
 				       struct drm_i915_gem_request *rq);
 void ilk_wm_get_hw_state(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index dcc52f928650..850e02e1c7eb 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -4122,10 +4122,17 @@ void gen6_rps_idle(struct drm_i915_private *dev_priv)
 }
 
 void gen6_rps_boost(struct drm_i915_private *dev_priv,
-		    struct intel_rps_client *rps)
+		    struct intel_rps_client *rps,
+		    unsigned long submitted)
 {
 	u32 val;
 
+	/* Force a RPS boost (and don't count it against the client) if
+	 * the GPU is severely congested.
+	 */
+	if (rps && time_after(jiffies, submitted + msecs_to_jiffies(20)))
+		rps = NULL;
+
 	mutex_lock(&dev_priv->rps.hw_lock);
 	val = dev_priv->rps.max_freq_softlimit;
 	if (dev_priv->rps.enabled &&
@@ -6825,11 +6832,12 @@ struct request_boost {
 static void __intel_rps_boost_work(struct work_struct *work)
 {
 	struct request_boost *boost = container_of(work, struct request_boost, work);
+	struct drm_i915_gem_request *rq = boost->rq;
 
-	if (!i915_gem_request_completed(boost->rq, true))
-		gen6_rps_boost(to_i915(boost->rq->ring->dev), NULL);
+	if (!i915_gem_request_completed(rq, true))
+		gen6_rps_boost(to_i915(rq->ring->dev), 0, rq->emitted_jiffies);
 
-	i915_gem_request_unreference__unlocked(boost->rq);
+	i915_gem_request_unreference__unlocked(rq);
 	kfree(boost);
 }
 
@@ -6841,6 +6849,9 @@ void intel_queue_rps_boost_for_request(struct drm_device *dev,
 	if (rq == NULL || INTEL_INFO(dev)->gen < 6)
 		return;
 
+	if (i915_gem_request_completed(rq, true))
+		return;
+
 	boost = kmalloc(sizeof(*boost), GFP_ATOMIC);
 	if (boost == NULL)
 		return;
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 14/16] drm/i915: Make the RPS interface gen agnostic
  2015-04-27 12:41 RPS tuning Chris Wilson
                   ` (12 preceding siblings ...)
  2015-04-27 12:41 ` [PATCH 13/16] drm/i915: Free RPS boosts for all laggards Chris Wilson
@ 2015-04-27 12:41 ` Chris Wilson
  2015-04-27 12:41 ` [PATCH 15/16] drm/i915, intel_ips: Enable GPU wait-boosting with IPS Chris Wilson
  2015-04-27 12:41 ` [PATCH 16/16] drm/i915: Allow RPS waitboosting to use max GPU frequency Chris Wilson
  15 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2015-04-27 12:41 UTC (permalink / raw)
  To: intel-gfx

This is a preparation patch to change the interface over from gen6+ to
any so that we can extend the RPS infrastructure to support earlier
generations in subsequent patches.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  4 +--
 drivers/gpu/drm/i915/i915_drv.h      |  2 +-
 drivers/gpu/drm/i915/i915_gem.c      |  3 +-
 drivers/gpu/drm/i915/i915_irq.c      | 24 +++++++-------
 drivers/gpu/drm/i915/i915_sysfs.c    |  6 ++--
 drivers/gpu/drm/i915/intel_display.c |  6 ++--
 drivers/gpu/drm/i915/intel_drv.h     |  6 ++--
 drivers/gpu/drm/i915/intel_pm.c      | 64 +++++++++++++++++-------------------
 8 files changed, 54 insertions(+), 61 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index d4eb67ca27fb..37d03a95d137 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4454,7 +4454,7 @@ i915_max_freq_set(void *data, u64 val)
 
 	dev_priv->rps.max_freq_softlimit = val;
 
-	intel_set_rps(dev, val);
+	intel_set_rps(dev_priv, val);
 
 	mutex_unlock(&dev_priv->rps.hw_lock);
 
@@ -4521,7 +4521,7 @@ i915_min_freq_set(void *data, u64 val)
 
 	dev_priv->rps.min_freq_softlimit = val;
 
-	intel_set_rps(dev, val);
+	intel_set_rps(dev_priv, val);
 
 	mutex_unlock(&dev_priv->rps.hw_lock);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4a73afd62c34..7a260da815ad 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3177,7 +3177,7 @@ extern void i915_redisable_vga(struct drm_device *dev);
 extern void i915_redisable_vga_power_on(struct drm_device *dev);
 extern bool ironlake_set_drps(struct drm_device *dev, u8 val);
 extern void intel_init_pch_refclk(struct drm_device *dev);
-extern void intel_set_rps(struct drm_device *dev, u8 val);
+extern void intel_set_rps(struct drm_i915_private *dev_priv, u8 val);
 extern void intel_set_memory_cxsr(struct drm_i915_private *dev_priv,
 				  bool enable);
 extern void intel_detect_pch(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 83bccb9f62d6..ba3c89bc83b6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1246,8 +1246,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	timeout_expire = timeout ?
 		jiffies + nsecs_to_jiffies_timeout((u64)*timeout) : 0;
 
-	if (INTEL_INFO(dev_priv)->gen >= 6)
-		gen6_rps_boost(dev_priv, rps, req->emitted_jiffies);
+	intel_rps_boost(dev_priv, rps, req->emitted_jiffies);
 
 	/* Record current time in case interrupted by signal, or wedged */
 	trace_i915_gem_request_wait_begin(req);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 6ad2764d26b7..375633356ac0 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1087,7 +1087,7 @@ static void gen6_pm_rps_work(struct work_struct *work)
 	struct drm_i915_private *dev_priv =
 		container_of(work, struct drm_i915_private, rps.work);
 	u32 pm_iir;
-	int new_delay, adj;
+	int freq, adj;
 
 	spin_lock_irq(&dev_priv->irq_lock);
 	/* Speed up work cancelation during disabling rps interrupts. */
@@ -1103,7 +1103,6 @@ static void gen6_pm_rps_work(struct work_struct *work)
 
 	/* Make sure we didn't queue anything we're not going to process. */
 	WARN_ON(pm_iir & ~dev_priv->pm_rps_events);
-
 	if ((pm_iir & dev_priv->pm_rps_events) == 0)
 		return;
 
@@ -1112,7 +1111,7 @@ static void gen6_pm_rps_work(struct work_struct *work)
 	pm_iir |= vlv_wa_c0_ei(dev_priv, pm_iir);
 
 	adj = dev_priv->rps.last_adj;
-	new_delay = dev_priv->rps.cur_freq;
+	freq = dev_priv->rps.cur_freq;
 	if (pm_iir & GEN6_PM_RP_UP_THRESHOLD) {
 		if (adj > 0)
 			adj *= 2;
@@ -1122,17 +1121,17 @@ static void gen6_pm_rps_work(struct work_struct *work)
 		 * For better performance, jump directly
 		 * to RPe if we're below it.
 		 */
-		if (new_delay < dev_priv->rps.efficient_freq - adj) {
-			new_delay = dev_priv->rps.efficient_freq;
+		if (freq < dev_priv->rps.efficient_freq - adj) {
+			freq = dev_priv->rps.efficient_freq;
 			adj = 0;
 		}
 	} else if (any_waiters(dev_priv)) {
 		adj = 0;
 	} else if (pm_iir & GEN6_PM_RP_DOWN_TIMEOUT) {
 		if (dev_priv->rps.cur_freq > dev_priv->rps.efficient_freq)
-			new_delay = dev_priv->rps.efficient_freq;
+			freq = dev_priv->rps.efficient_freq;
 		else
-			new_delay = dev_priv->rps.min_freq_softlimit;
+			freq = dev_priv->rps.min_freq_softlimit;
 		adj = 0;
 	} else if (pm_iir & GEN6_PM_RP_DOWN_THRESHOLD) {
 		if (adj < 0)
@@ -1148,12 +1147,11 @@ static void gen6_pm_rps_work(struct work_struct *work)
 	/* sysfs frequency interfaces may have snuck in while servicing the
 	 * interrupt
 	 */
-	new_delay += adj;
-	new_delay = clamp_t(int, new_delay,
-			    dev_priv->rps.min_freq_softlimit,
-			    dev_priv->rps.max_freq_softlimit);
-
-	intel_set_rps(dev_priv->dev, new_delay);
+	freq += adj;
+	freq = clamp_t(int, freq,
+		       dev_priv->rps.min_freq_softlimit,
+		       dev_priv->rps.max_freq_softlimit);
+	intel_set_rps(dev_priv, freq);
 
 	mutex_unlock(&dev_priv->rps.hw_lock);
 }
diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index 247626885f49..df5636093397 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -423,7 +423,7 @@ static ssize_t gt_max_freq_mhz_store(struct device *kdev,
 	/* We still need *_set_rps to process the new max_delay and
 	 * update the interrupt limits and PMINTRMSK even though
 	 * frequency request may be unchanged. */
-	intel_set_rps(dev, val);
+	intel_set_rps(dev_priv, val);
 
 	mutex_unlock(&dev_priv->rps.hw_lock);
 
@@ -482,7 +482,7 @@ static ssize_t gt_min_freq_mhz_store(struct device *kdev,
 	/* We still need *_set_rps to process the new min_delay and
 	 * update the interrupt limits and PMINTRMSK even though
 	 * frequency request may be unchanged. */
-	intel_set_rps(dev, val);
+	intel_set_rps(dev_priv, val);
 
 	mutex_unlock(&dev_priv->rps.hw_lock);
 
@@ -511,7 +511,7 @@ static ssize_t gt_rp_mhz_show(struct device *kdev, struct device_attribute *attr
 	u32 val;
 
 	if (attr == &dev_attr_gt_RP0_freq_mhz)
-		val = intel_gpu_freq(dev_priv, dev_priv->rps.rp0_freq);
+		val = intel_gpu_freq(dev_priv, dev_priv->rps.max_freq);
 	else if (attr == &dev_attr_gt_RP1_freq_mhz)
 		val = intel_gpu_freq(dev_priv, dev_priv->rps.rp1_freq);
 	else if (attr == &dev_attr_gt_RPn_freq_mhz)
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 1adcfa15221d..f63f194141db 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -10082,8 +10082,7 @@ void intel_mark_busy(struct drm_device *dev)
 
 	intel_runtime_pm_get(dev_priv);
 	i915_update_gfx_val(dev_priv);
-	if (INTEL_INFO(dev)->gen >= 6)
-		gen6_rps_busy(dev_priv);
+	intel_rps_busy(dev_priv);
 	dev_priv->mm.busy = true;
 }
 
@@ -10104,8 +10103,7 @@ void intel_mark_idle(struct drm_device *dev)
 		intel_decrease_pllclock(crtc);
 	}
 
-	if (INTEL_INFO(dev)->gen >= 6)
-		gen6_rps_idle(dev->dev_private);
+	intel_rps_idle(dev->dev_private);
 
 	intel_runtime_pm_put(dev_priv);
 }
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 34cfa61f3321..b48ac7112ab8 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -1357,10 +1357,10 @@ void intel_disable_gt_powersave(struct drm_device *dev);
 void intel_suspend_gt_powersave(struct drm_device *dev);
 void intel_reset_gt_powersave(struct drm_device *dev);
 void gen6_update_ring_freq(struct drm_device *dev);
-void gen6_rps_busy(struct drm_i915_private *dev_priv);
 void gen6_rps_reset_ei(struct drm_i915_private *dev_priv);
-void gen6_rps_idle(struct drm_i915_private *dev_priv);
-void gen6_rps_boost(struct drm_i915_private *dev_priv,
+void intel_rps_busy(struct drm_i915_private *dev_priv);
+void intel_rps_idle(struct drm_i915_private *dev_priv);
+void intel_rps_boost(struct drm_i915_private *dev_priv,
 		    struct intel_rps_client *rps,
 		    unsigned long submitted);
 void intel_queue_rps_boost_for_request(struct drm_device *dev,
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 850e02e1c7eb..0dcc7bb47f71 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3980,10 +3980,8 @@ static u32 gen6_rps_pm_mask(struct drm_i915_private *dev_priv, u8 val)
 /* gen6_set_rps is called to update the frequency request, but should also be
  * called when the range (min_delay and max_delay) is modified so that we can
  * update the GEN6_RP_INTERRUPT_LIMITS register accordingly. */
-static void gen6_set_rps(struct drm_device *dev, u8 val)
+static void gen6_set_rps(struct drm_i915_private *dev_priv, u8 val)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
 	WARN_ON(!mutex_is_locked(&dev_priv->rps.hw_lock));
 	WARN_ON(val > dev_priv->rps.max_freq);
 	WARN_ON(val < dev_priv->rps.min_freq);
@@ -3994,10 +3992,10 @@ static void gen6_set_rps(struct drm_device *dev, u8 val)
 	if (val != dev_priv->rps.cur_freq) {
 		gen6_set_rps_thresholds(dev_priv, val);
 
-		if (IS_GEN9(dev))
+		if (IS_GEN9(dev_priv))
 			I915_WRITE(GEN6_RPNSWREQ,
 				   GEN9_FREQUENCY(val));
-		else if (IS_HASWELL(dev) || IS_BROADWELL(dev))
+		else if (IS_HASWELL(dev_priv) || IS_BROADWELL(dev_priv))
 			I915_WRITE(GEN6_RPNSWREQ,
 				   HSW_FREQUENCY(val));
 		else
@@ -4019,15 +4017,13 @@ static void gen6_set_rps(struct drm_device *dev, u8 val)
 	trace_intel_gpu_freq_change(val * 50);
 }
 
-static void valleyview_set_rps(struct drm_device *dev, u8 val)
+static void valleyview_set_rps(struct drm_i915_private *dev_priv, u8 val)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
 	WARN_ON(!mutex_is_locked(&dev_priv->rps.hw_lock));
 	WARN_ON(val > dev_priv->rps.max_freq);
 	WARN_ON(val < dev_priv->rps.min_freq);
 
-	if (WARN_ONCE(IS_CHERRYVIEW(dev) && (val & 1),
+	if (WARN_ONCE(IS_CHERRYVIEW(dev_priv) && (val & 1),
 		      "Odd GPU freq value\n"))
 		val &= ~1;
 
@@ -4059,7 +4055,7 @@ static void vlv_set_rps_idle(struct drm_i915_private *dev_priv)
 
 	/* CHV and latest VLV don't need to force the gfx clock */
 	if (IS_CHERRYVIEW(dev) || dev->pdev->revision >= 0xd) {
-		valleyview_set_rps(dev_priv->dev, val);
+		valleyview_set_rps(dev_priv, val);
 		return;
 	}
 
@@ -4090,8 +4086,11 @@ static void vlv_set_rps_idle(struct drm_i915_private *dev_priv)
 	I915_WRITE(GEN6_PMINTRMSK, gen6_rps_pm_mask(dev_priv, val));
 }
 
-void gen6_rps_busy(struct drm_i915_private *dev_priv)
+void intel_rps_busy(struct drm_i915_private *dev_priv)
 {
+	if (INTEL_INFO(dev_priv)->gen < 6)
+		return;
+
 	mutex_lock(&dev_priv->rps.hw_lock);
 	if (dev_priv->rps.enabled) {
 		if (dev_priv->pm_rps_events & (GEN6_PM_RP_DOWN_EI_EXPIRED | GEN6_PM_RP_UP_EI_EXPIRED))
@@ -4102,16 +4101,17 @@ void gen6_rps_busy(struct drm_i915_private *dev_priv)
 	mutex_unlock(&dev_priv->rps.hw_lock);
 }
 
-void gen6_rps_idle(struct drm_i915_private *dev_priv)
+void intel_rps_idle(struct drm_i915_private *dev_priv)
 {
-	struct drm_device *dev = dev_priv->dev;
+	if (INTEL_INFO(dev_priv)->gen < 6)
+		return;
 
 	mutex_lock(&dev_priv->rps.hw_lock);
 	if (dev_priv->rps.enabled) {
-		if (IS_VALLEYVIEW(dev))
+		if (IS_VALLEYVIEW(dev_priv))
 			vlv_set_rps_idle(dev_priv);
 		else
-			gen6_set_rps(dev_priv->dev, dev_priv->rps.idle_freq);
+			gen6_set_rps(dev_priv, dev_priv->rps.idle_freq);
 		dev_priv->rps.last_adj = 0;
 		I915_WRITE(GEN6_PMINTRMSK, 0xffffffff);
 	}
@@ -4121,9 +4121,9 @@ void gen6_rps_idle(struct drm_i915_private *dev_priv)
 	mutex_unlock(&dev_priv->rps.hw_lock);
 }
 
-void gen6_rps_boost(struct drm_i915_private *dev_priv,
-		    struct intel_rps_client *rps,
-		    unsigned long submitted)
+void intel_rps_boost(struct drm_i915_private *dev_priv,
+		     struct intel_rps_client *rps,
+		     unsigned long submitted)
 {
 	u32 val;
 
@@ -4139,7 +4139,7 @@ void gen6_rps_boost(struct drm_i915_private *dev_priv,
 	    dev_priv->mm.busy &&
 	    dev_priv->rps.cur_freq < val &&
 	    (rps == NULL || list_empty(&rps->link))) {
-		intel_set_rps(dev_priv->dev, val);
+		intel_set_rps(dev_priv, val);
 		dev_priv->rps.last_adj = 0;
 
 		if (rps != NULL) {
@@ -4151,12 +4151,12 @@ void gen6_rps_boost(struct drm_i915_private *dev_priv,
 	mutex_unlock(&dev_priv->rps.hw_lock);
 }
 
-void intel_set_rps(struct drm_device *dev, u8 val)
+void intel_set_rps(struct drm_i915_private *dev_priv, u8 val)
 {
-	if (IS_VALLEYVIEW(dev))
-		valleyview_set_rps(dev, val);
-	else
-		gen6_set_rps(dev, val);
+	if (IS_VALLEYVIEW(dev_priv))
+		valleyview_set_rps(dev_priv, val);
+	else if (INTEL_INFO(dev_priv)->gen > 6)
+		gen6_set_rps(dev_priv, val);
 }
 
 static void gen9_disable_rps(struct drm_device *dev)
@@ -4333,7 +4333,7 @@ static void gen9_enable_rps(struct drm_device *dev)
 	 * Up/Down EI & threshold registers, as well as the RP_CONTROL,
 	 * RP_INTERRUPT_LIMITS & RPNSWREQ registers */
 	dev_priv->rps.power = HIGH_POWER; /* force a reset */
-	gen6_set_rps(dev_priv->dev, dev_priv->rps.min_freq_softlimit);
+	gen6_set_rps(dev_priv, dev_priv->rps.min_freq_softlimit);
 
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 }
@@ -4466,7 +4466,7 @@ static void gen8_enable_rps(struct drm_device *dev)
 	/* 6: Ring frequency + overclocking (our driver does this later */
 
 	dev_priv->rps.power = HIGH_POWER; /* force a reset */
-	gen6_set_rps(dev_priv->dev, dev_priv->rps.idle_freq);
+	gen6_set_rps(dev_priv, dev_priv->rps.idle_freq);
 
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 }
@@ -4560,7 +4560,7 @@ static void gen6_enable_rps(struct drm_device *dev)
 	}
 
 	dev_priv->rps.power = HIGH_POWER; /* force a reset */
-	gen6_set_rps(dev_priv->dev, dev_priv->rps.idle_freq);
+	gen6_set_rps(dev_priv, dev_priv->rps.idle_freq);
 
 	rc6vids = 0;
 	ret = sandybridge_pcode_read(dev_priv, GEN6_PCODE_READ_RC6VIDS, &rc6vids);
@@ -5106,7 +5106,7 @@ static void cherryview_enable_rps(struct drm_device *dev)
 			 intel_gpu_freq(dev_priv, dev_priv->rps.efficient_freq),
 			 dev_priv->rps.efficient_freq);
 
-	valleyview_set_rps(dev_priv->dev, dev_priv->rps.efficient_freq);
+	valleyview_set_rps(dev_priv, dev_priv->rps.efficient_freq);
 
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 }
@@ -5190,7 +5190,7 @@ static void valleyview_enable_rps(struct drm_device *dev)
 			 intel_gpu_freq(dev_priv, dev_priv->rps.efficient_freq),
 			 dev_priv->rps.efficient_freq);
 
-	valleyview_set_rps(dev_priv->dev, dev_priv->rps.efficient_freq);
+	valleyview_set_rps(dev_priv, dev_priv->rps.efficient_freq);
 
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 }
@@ -5720,15 +5720,13 @@ static void gen6_suspend_rps(struct drm_device *dev)
  */
 void intel_suspend_gt_powersave(struct drm_device *dev)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
 	if (INTEL_INFO(dev)->gen < 6)
 		return;
 
 	gen6_suspend_rps(dev);
 
 	/* Force GPU to min freq during suspend */
-	gen6_rps_idle(dev_priv);
+	intel_rps_idle(to_i915(dev));
 }
 
 void intel_disable_gt_powersave(struct drm_device *dev)
@@ -6835,7 +6833,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
 	struct drm_i915_gem_request *rq = boost->rq;
 
 	if (!i915_gem_request_completed(rq, true))
-		gen6_rps_boost(to_i915(rq->ring->dev), 0, rq->emitted_jiffies);
+		intel_rps_boost(to_i915(rq->ring->dev), 0, rq->emitted_jiffies);
 
 	i915_gem_request_unreference__unlocked(rq);
 	kfree(boost);
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 15/16] drm/i915, intel_ips: Enable GPU wait-boosting with IPS
  2015-04-27 12:41 RPS tuning Chris Wilson
                   ` (13 preceding siblings ...)
  2015-04-27 12:41 ` [PATCH 14/16] drm/i915: Make the RPS interface gen agnostic Chris Wilson
@ 2015-04-27 12:41 ` Chris Wilson
  2015-04-27 12:41 ` [PATCH 16/16] drm/i915: Allow RPS waitboosting to use max GPU frequency Chris Wilson
  15 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2015-04-27 12:41 UTC (permalink / raw)
  To: intel-gfx; +Cc: Jesse Barnes

Refactor the reclocking logic used by RPS on Ironlake to reuse the
infrastructure developed for RPS on Sandybridge+, along with the
waitboosting support for stalled clients and missed frames.

Reported-by: dimon@gmx.net
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90137
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jesse Barnes <jesse@virtuousgeek.org>
---
 drivers/gpu/drm/i915/i915_drv.h      |   8 ---
 drivers/gpu/drm/i915/i915_irq.c      |  83 +++++++++++------------
 drivers/gpu/drm/i915/i915_sysfs.c    |  10 +++
 drivers/gpu/drm/i915/intel_display.c |   1 -
 drivers/gpu/drm/i915/intel_pm.c      | 124 ++++++++++++++++++++++++-----------
 drivers/platform/x86/intel_ips.c     |  14 +++-
 include/drm/i915_drm.h               |   1 +
 7 files changed, 149 insertions(+), 92 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7a260da815ad..c35723ace814 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1088,12 +1088,6 @@ struct intel_gen6_power_mgmt {
 extern spinlock_t mchdev_lock;
 
 struct intel_ilk_power_mgmt {
-	u8 cur_delay;
-	u8 min_delay;
-	u8 max_delay;
-	u8 fmax;
-	u8 fstart;
-
 	u64 last_count1;
 	unsigned long last_time1;
 	unsigned long chipset_power;
@@ -2533,7 +2527,6 @@ extern int i915_reset(struct drm_device *dev);
 extern unsigned long i915_chipset_val(struct drm_i915_private *dev_priv);
 extern unsigned long i915_mch_val(struct drm_i915_private *dev_priv);
 extern unsigned long i915_gfx_val(struct drm_i915_private *dev_priv);
-extern void i915_update_gfx_val(struct drm_i915_private *dev_priv);
 int vlv_force_gfx_clock(struct drm_i915_private *dev_priv, bool on);
 void intel_hpd_cancel_work(struct drm_i915_private *dev_priv);
 
@@ -3175,7 +3168,6 @@ extern void intel_modeset_setup_hw_state(struct drm_device *dev,
 					 bool force_restore);
 extern void i915_redisable_vga(struct drm_device *dev);
 extern void i915_redisable_vga_power_on(struct drm_device *dev);
-extern bool ironlake_set_drps(struct drm_device *dev, u8 val);
 extern void intel_init_pch_refclk(struct drm_device *dev);
 extern void intel_set_rps(struct drm_i915_private *dev_priv, u8 val);
 extern void intel_set_memory_cxsr(struct drm_i915_private *dev_priv,
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 375633356ac0..234a6e004a4d 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -952,45 +952,6 @@ static void i915_hotplug_work_func(struct work_struct *work)
 		drm_kms_helper_hotplug_event(dev);
 }
 
-static void ironlake_rps_change_irq_handler(struct drm_device *dev)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	u32 busy_up, busy_down, max_avg, min_avg;
-	u8 new_delay;
-
-	spin_lock(&mchdev_lock);
-
-	I915_WRITE16(MEMINTRSTS, I915_READ(MEMINTRSTS));
-
-	new_delay = dev_priv->ips.cur_delay;
-
-	I915_WRITE16(MEMINTRSTS, MEMINT_EVAL_CHG);
-	busy_up = I915_READ(RCPREVBSYTUPAVG);
-	busy_down = I915_READ(RCPREVBSYTDNAVG);
-	max_avg = I915_READ(RCBMAXAVG);
-	min_avg = I915_READ(RCBMINAVG);
-
-	/* Handle RCS change request from hw */
-	if (busy_up > max_avg) {
-		if (dev_priv->ips.cur_delay != dev_priv->ips.max_delay)
-			new_delay = dev_priv->ips.cur_delay - 1;
-		if (new_delay < dev_priv->ips.max_delay)
-			new_delay = dev_priv->ips.max_delay;
-	} else if (busy_down < min_avg) {
-		if (dev_priv->ips.cur_delay != dev_priv->ips.min_delay)
-			new_delay = dev_priv->ips.cur_delay + 1;
-		if (new_delay > dev_priv->ips.min_delay)
-			new_delay = dev_priv->ips.min_delay;
-	}
-
-	if (ironlake_set_drps(dev, new_delay))
-		dev_priv->ips.cur_delay = new_delay;
-
-	spin_unlock(&mchdev_lock);
-
-	return;
-}
-
 static void notify_ring(struct intel_engine_cs *ring)
 {
 	if (!intel_ring_initialized(ring))
@@ -1039,6 +1000,36 @@ void gen6_rps_reset_ei(struct drm_i915_private *dev_priv)
 	dev_priv->rps.up_ei = dev_priv->rps.down_ei;
 }
 
+static u32 ilk_compute_pm_iir(struct drm_i915_private *dev_priv)
+{
+	u32 pm_iir;
+
+	spin_lock(&mchdev_lock);
+	I915_WRITE16(MEMINTRSTS, I915_READ(MEMINTRSTS));
+	I915_WRITE16(MEMINTRSTS, MEMINT_EVAL_CHG);
+
+#define busy_up I915_READ(RCPREVBSYTUPAVG)
+#define busy_down I915_READ(RCPREVBSYTDNAVG)
+#define max_avg I915_READ(RCBMAXAVG)
+#define min_avg I915_READ(RCBMINAVG)
+
+	if (busy_up > max_avg)
+		pm_iir = GEN6_PM_RP_UP_THRESHOLD;
+	else if (busy_down < min_avg)
+		pm_iir = GEN6_PM_RP_DOWN_THRESHOLD;
+	else
+		pm_iir = 0;
+
+#undef busy_up
+#undef busy_down
+#undef max_avg
+#undef min_avg
+
+	spin_unlock(&mchdev_lock);
+
+	return pm_iir;
+}
+
 static u32 vlv_wa_c0_ei(struct drm_i915_private *dev_priv, u32 pm_iir)
 {
 	struct intel_rps_ei now;
@@ -1095,10 +1086,14 @@ static void gen6_pm_rps_work(struct work_struct *work)
 		spin_unlock_irq(&dev_priv->irq_lock);
 		return;
 	}
-	pm_iir = dev_priv->rps.pm_iir;
-	dev_priv->rps.pm_iir = 0;
-	/* Make sure not to corrupt PMIMR state used by ringbuffer on GEN6 */
-	gen6_enable_pm_irq(dev_priv, dev_priv->pm_rps_events);
+	if (IS_GEN5(dev_priv)) {
+		pm_iir = ilk_compute_pm_iir(dev_priv);
+	} else {
+		pm_iir = dev_priv->rps.pm_iir;
+		dev_priv->rps.pm_iir = 0;
+		/* Make sure not to corrupt PMIMR state used by ringbuffer */
+		gen6_enable_pm_irq(dev_priv, dev_priv->pm_rps_events);
+	}
 	spin_unlock_irq(&dev_priv->irq_lock);
 
 	/* Make sure we didn't queue anything we're not going to process. */
@@ -2045,7 +2040,7 @@ static void ilk_display_irq_handler(struct drm_device *dev, u32 de_iir)
 	}
 
 	if (IS_GEN5(dev) && de_iir & DE_PCU_EVENT)
-		ironlake_rps_change_irq_handler(dev);
+		queue_work(dev_priv->wq, &dev_priv->rps.work);
 }
 
 static void ivb_display_irq_handler(struct drm_device *dev, u32 de_iir)
diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index df5636093397..6f770e7f92db 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -522,6 +522,14 @@ static ssize_t gt_rp_mhz_show(struct device *kdev, struct device_attribute *attr
 	return snprintf(buf, PAGE_SIZE, "%d\n", val);
 }
 
+static const struct attribute *gen5_attrs[] = {
+	&dev_attr_gt_cur_freq_mhz.attr,
+	&dev_attr_gt_max_freq_mhz.attr,
+	&dev_attr_gt_min_freq_mhz.attr,
+	&dev_attr_gt_RP0_freq_mhz.attr,
+	&dev_attr_gt_RPn_freq_mhz.attr,
+	NULL,
+};
 static const struct attribute *gen6_attrs[] = {
 	&dev_attr_gt_act_freq_mhz.attr,
 	&dev_attr_gt_cur_freq_mhz.attr,
@@ -652,6 +660,8 @@ void i915_setup_sysfs(struct drm_device *dev)
 		ret = sysfs_create_files(&dev->primary->kdev->kobj, vlv_attrs);
 	else if (INTEL_INFO(dev)->gen >= 6)
 		ret = sysfs_create_files(&dev->primary->kdev->kobj, gen6_attrs);
+	else if (INTEL_INFO(dev)->gen >= 5)
+		ret = sysfs_create_files(&dev->primary->kdev->kobj, gen5_attrs);
 	if (ret)
 		DRM_ERROR("RPS sysfs setup failed\n");
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index f63f194141db..e228070031ed 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -10081,7 +10081,6 @@ void intel_mark_busy(struct drm_device *dev)
 		return;
 
 	intel_runtime_pm_get(dev_priv);
-	i915_update_gfx_val(dev_priv);
 	intel_rps_busy(dev_priv);
 	dev_priv->mm.busy = true;
 }
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 0dcc7bb47f71..0cc9e95f70d3 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3727,21 +3727,29 @@ DEFINE_SPINLOCK(mchdev_lock);
  * mchdev_lock. */
 static struct drm_i915_private *i915_mch_dev;
 
-bool ironlake_set_drps(struct drm_device *dev, u8 val)
+static bool __ironlake_set_rps(struct drm_i915_private *dev_priv, u8 val)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
 	u16 rgvswctl;
 
+	if (WARN_ON(val < dev_priv->rps.min_freq))
+		return false;
+	if (WARN_ON(val > dev_priv->rps.max_freq))
+		return false;
+
 	assert_spin_locked(&mchdev_lock);
 
-	rgvswctl = I915_READ16(MEMSWCTL);
-	if (rgvswctl & MEMCTL_CMD_STS) {
+	if (wait_for_atomic((I915_READ(MEMSWCTL) & MEMCTL_CMD_STS) == 0, 10)) {
 		DRM_DEBUG("gpu busy, RCS change rejected\n");
 		return false; /* still busy with another command */
 	}
 
-	rgvswctl = (MEMCTL_CMD_CHFREQ << MEMCTL_CMD_SHIFT) |
-		(val << MEMCTL_FREQ_SHIFT) | MEMCTL_SFCAVM;
+	dev_priv->rps.cur_freq = val;
+	val = dev_priv->rps.max_freq - val + dev_priv->rps.min_freq;
+
+	rgvswctl =
+		(MEMCTL_CMD_CHFREQ << MEMCTL_CMD_SHIFT) |
+		(val << MEMCTL_FREQ_SHIFT) |
+		MEMCTL_SFCAVM;
 	I915_WRITE16(MEMSWCTL, rgvswctl);
 	POSTING_READ16(MEMSWCTL);
 
@@ -3751,6 +3759,13 @@ bool ironlake_set_drps(struct drm_device *dev, u8 val)
 	return true;
 }
 
+static void ironlake_set_rps(struct drm_i915_private *dev_priv, u8 val)
+{
+	spin_lock_irq(&mchdev_lock);
+	__ironlake_set_rps(dev_priv, val);
+	spin_unlock_irq(&mchdev_lock);
+}
+
 static void ironlake_enable_drps(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -3782,16 +3797,18 @@ static void ironlake_enable_drps(struct drm_device *dev)
 	vstart = (I915_READ(PXVFREQ_BASE + (fstart * 4)) & PXVFREQ_PX_MASK) >>
 		PXVFREQ_PX_SHIFT;
 
-	dev_priv->ips.fmax = fmax; /* IPS callback will increase this */
-	dev_priv->ips.fstart = fstart;
-
-	dev_priv->ips.max_delay = fstart;
-	dev_priv->ips.min_delay = fmin;
-	dev_priv->ips.cur_delay = fstart;
+	dev_priv->rps.max_freq = fmin;
+	dev_priv->rps.min_freq = fmax;
+	dev_priv->rps.cur_freq = fmin - fstart;
 
 	DRM_DEBUG_DRIVER("fmax: %d, fmin: %d, fstart: %d\n",
 			 fmax, fmin, fstart);
 
+	dev_priv->rps.max_freq_softlimit = dev_priv->rps.min_freq;
+	dev_priv->rps.min_freq_softlimit = dev_priv->rps.min_freq;
+	dev_priv->rps.efficient_freq = dev_priv->rps.cur_freq;
+	dev_priv->rps.idle_freq = dev_priv->rps.min_freq;
+
 	I915_WRITE(MEMINTREN, MEMINT_CX_SUPR_EN | MEMINT_EVAL_CHG_EN);
 
 	/*
@@ -3808,7 +3825,7 @@ static void ironlake_enable_drps(struct drm_device *dev)
 		DRM_ERROR("stuck trying to change perf mode\n");
 	mdelay(1);
 
-	ironlake_set_drps(dev, fstart);
+	__ironlake_set_rps(dev_priv, dev_priv->rps.cur_freq);
 
 	dev_priv->ips.last_count1 = I915_READ(0x112e4) + I915_READ(0x112e8) +
 		I915_READ(0x112e0);
@@ -3836,7 +3853,7 @@ static void ironlake_disable_drps(struct drm_device *dev)
 	I915_WRITE(DEIMR, I915_READ(DEIMR) | DE_PCU_EVENT);
 
 	/* Go back to the starting frequency */
-	ironlake_set_drps(dev, dev_priv->ips.fstart);
+	__ironlake_set_rps(dev_priv, dev_priv->rps.efficient_freq);
 	mdelay(1);
 	rgvswctl |= MEMCTL_CMD_STS;
 	I915_WRITE(MEMSWCTL, rgvswctl);
@@ -4086,29 +4103,37 @@ static void vlv_set_rps_idle(struct drm_i915_private *dev_priv)
 	I915_WRITE(GEN6_PMINTRMSK, gen6_rps_pm_mask(dev_priv, val));
 }
 
+static void i915_update_gfx_val(struct drm_i915_private *dev_priv);
+
 void intel_rps_busy(struct drm_i915_private *dev_priv)
 {
-	if (INTEL_INFO(dev_priv)->gen < 6)
+	if (INTEL_INFO(dev_priv)->gen < 5)
 		return;
 
-	mutex_lock(&dev_priv->rps.hw_lock);
-	if (dev_priv->rps.enabled) {
-		if (dev_priv->pm_rps_events & (GEN6_PM_RP_DOWN_EI_EXPIRED | GEN6_PM_RP_UP_EI_EXPIRED))
-			gen6_rps_reset_ei(dev_priv);
-		I915_WRITE(GEN6_PMINTRMSK,
-			   gen6_rps_pm_mask(dev_priv, dev_priv->rps.cur_freq));
+	if (INTEL_INFO(dev_priv)->gen >= 6) {
+		mutex_lock(&dev_priv->rps.hw_lock);
+		if (dev_priv->rps.enabled) {
+			if (dev_priv->pm_rps_events & (GEN6_PM_RP_DOWN_EI_EXPIRED | GEN6_PM_RP_UP_EI_EXPIRED))
+				gen6_rps_reset_ei(dev_priv);
+			I915_WRITE(GEN6_PMINTRMSK,
+				   gen6_rps_pm_mask(dev_priv, dev_priv->rps.cur_freq));
+		}
+		mutex_unlock(&dev_priv->rps.hw_lock);
 	}
-	mutex_unlock(&dev_priv->rps.hw_lock);
+
+	i915_update_gfx_val(dev_priv);
 }
 
 void intel_rps_idle(struct drm_i915_private *dev_priv)
 {
-	if (INTEL_INFO(dev_priv)->gen < 6)
+	if (INTEL_INFO(dev_priv)->gen < 5)
 		return;
 
 	mutex_lock(&dev_priv->rps.hw_lock);
 	if (dev_priv->rps.enabled) {
-		if (IS_VALLEYVIEW(dev_priv))
+		if (IS_GEN5(dev_priv))
+			ironlake_set_rps(dev_priv, dev_priv->rps.idle_freq);
+		else if (IS_VALLEYVIEW(dev_priv))
 			vlv_set_rps_idle(dev_priv);
 		else
 			gen6_set_rps(dev_priv, dev_priv->rps.idle_freq);
@@ -4119,6 +4144,8 @@ void intel_rps_idle(struct drm_i915_private *dev_priv)
 	while (!list_empty(&dev_priv->rps.clients))
 		list_del_init(dev_priv->rps.clients.next);
 	mutex_unlock(&dev_priv->rps.hw_lock);
+
+	i915_update_gfx_val(dev_priv);
 }
 
 void intel_rps_boost(struct drm_i915_private *dev_priv,
@@ -4153,7 +4180,9 @@ void intel_rps_boost(struct drm_i915_private *dev_priv,
 
 void intel_set_rps(struct drm_i915_private *dev_priv, u8 val)
 {
-	if (IS_VALLEYVIEW(dev_priv))
+	if (IS_GEN5(dev_priv))
+		ironlake_set_rps(dev_priv, val);
+	else if (IS_VALLEYVIEW(dev_priv))
 		valleyview_set_rps(dev_priv, val);
 	else if (INTEL_INFO(dev_priv)->gen > 6)
 		gen6_set_rps(dev_priv, val);
@@ -5366,11 +5395,9 @@ static void __i915_update_gfx_val(struct drm_i915_private *dev_priv)
 	dev_priv->ips.gfx_power = diff;
 }
 
-void i915_update_gfx_val(struct drm_i915_private *dev_priv)
+static void i915_update_gfx_val(struct drm_i915_private *dev_priv)
 {
-	struct drm_device *dev = dev_priv->dev;
-
-	if (INTEL_INFO(dev)->gen != 5)
+	if (INTEL_INFO(dev_priv)->gen != 5)
 		return;
 
 	spin_lock_irq(&mchdev_lock);
@@ -5419,10 +5446,9 @@ static unsigned long __i915_gfx_val(struct drm_i915_private *dev_priv)
 
 unsigned long i915_gfx_val(struct drm_i915_private *dev_priv)
 {
-	struct drm_device *dev = dev_priv->dev;
 	unsigned long val;
 
-	if (INTEL_INFO(dev)->gen != 5)
+	if (INTEL_INFO(dev_priv)->gen != 5)
 		return 0;
 
 	spin_lock_irq(&mchdev_lock);
@@ -5479,8 +5505,8 @@ bool i915_gpu_raise(void)
 	}
 	dev_priv = i915_mch_dev;
 
-	if (dev_priv->ips.max_delay > dev_priv->ips.fmax)
-		dev_priv->ips.max_delay--;
+	if (dev_priv->rps.max_freq_softlimit < dev_priv->rps.max_freq)
+		dev_priv->rps.max_freq_softlimit++;
 
 out_unlock:
 	spin_unlock_irq(&mchdev_lock);
@@ -5507,8 +5533,8 @@ bool i915_gpu_lower(void)
 	}
 	dev_priv = i915_mch_dev;
 
-	if (dev_priv->ips.max_delay < dev_priv->ips.min_delay)
-		dev_priv->ips.max_delay++;
+	if (dev_priv->rps.max_freq_softlimit > dev_priv->rps.min_freq)
+		dev_priv->rps.max_freq_softlimit--;
 
 out_unlock:
 	spin_unlock_irq(&mchdev_lock);
@@ -5562,9 +5588,10 @@ bool i915_gpu_turbo_disable(void)
 	}
 	dev_priv = i915_mch_dev;
 
-	dev_priv->ips.max_delay = dev_priv->ips.fstart;
+	dev_priv->rps.max_freq_softlimit = dev_priv->rps.min_freq;
+	dev_priv->rps.enabled = false;
 
-	if (!ironlake_set_drps(dev_priv->dev, dev_priv->ips.fstart))
+	if (!__ironlake_set_rps(dev_priv, dev_priv->rps.min_freq))
 		ret = false;
 
 out_unlock:
@@ -5574,6 +5601,27 @@ out_unlock:
 }
 EXPORT_SYMBOL_GPL(i915_gpu_turbo_disable);
 
+bool i915_gpu_turbo_enable(void)
+{
+	struct drm_i915_private *dev_priv;
+	bool ret = true;
+
+	spin_lock_irq(&mchdev_lock);
+	if (!i915_mch_dev) {
+		ret = false;
+		goto out_unlock;
+	}
+	dev_priv = i915_mch_dev;
+
+	dev_priv->rps.max_freq_softlimit = dev_priv->rps.max_freq;
+	dev_priv->rps.enabled = true;
+
+out_unlock:
+	spin_unlock_irq(&mchdev_lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(i915_gpu_turbo_enable);
+
 /**
  * Tells the intel_ips driver that the i915 driver is now loaded, if
  * IPS got loaded first.
@@ -6844,7 +6892,7 @@ void intel_queue_rps_boost_for_request(struct drm_device *dev,
 {
 	struct request_boost *boost;
 
-	if (rq == NULL || INTEL_INFO(dev)->gen < 6)
+	if (rq == NULL || INTEL_INFO(dev)->gen < 5)
 		return;
 
 	if (i915_gem_request_completed(rq, true))
diff --git a/drivers/platform/x86/intel_ips.c b/drivers/platform/x86/intel_ips.c
index e2065e06a3f3..16030fbbd611 100644
--- a/drivers/platform/x86/intel_ips.c
+++ b/drivers/platform/x86/intel_ips.c
@@ -338,6 +338,7 @@ struct ips_driver {
 	bool (*gpu_lower)(void);
 	bool (*gpu_busy)(void);
 	bool (*gpu_turbo_disable)(void);
+	bool (*gpu_turbo_enable)(void);
 
 	/* For restoration at unload */
 	u64 orig_turbo_limit;
@@ -577,7 +578,11 @@ static void ips_enable_gpu_turbo(struct ips_driver *ips)
 {
 	if (ips->__gpu_turbo_on)
 		return;
-	ips->__gpu_turbo_on = true;
+
+	if (!ips->gpu_turbo_enable())
+		dev_err(&ips->dev->dev, "failed to enable graphics turbo\n");
+	else
+		ips->__gpu_turbo_on = true;
 }
 
 /**
@@ -1438,9 +1443,14 @@ static bool ips_get_i915_syms(struct ips_driver *ips)
 	ips->gpu_turbo_disable = symbol_get(i915_gpu_turbo_disable);
 	if (!ips->gpu_turbo_disable)
 		goto out_put_busy;
+	ips->gpu_turbo_enable = symbol_get(i915_gpu_turbo_enable);
+	if (!ips->gpu_turbo_enable)
+		goto out_put_disable;
 
 	return true;
 
+out_put_disable:
+	symbol_put(i915_gpu_turbo_disable);
 out_put_busy:
 	symbol_put(i915_gpu_busy);
 out_put_lower:
@@ -1702,6 +1712,8 @@ static void ips_remove(struct pci_dev *dev)
 		symbol_put(i915_gpu_busy);
 	if (ips->gpu_turbo_disable)
 		symbol_put(i915_gpu_turbo_disable);
+	if (ips->gpu_turbo_enable)
+		symbol_put(i915_gpu_turbo_enable);
 
 	rdmsrl(TURBO_POWER_CURRENT_LIMIT, turbo_override);
 	turbo_override &= ~(TURBO_TDC_OVR_EN | TURBO_TDP_OVR_EN);
diff --git a/include/drm/i915_drm.h b/include/drm/i915_drm.h
index 595f85c392ac..406710c30658 100644
--- a/include/drm/i915_drm.h
+++ b/include/drm/i915_drm.h
@@ -35,6 +35,7 @@ extern bool i915_gpu_raise(void);
 extern bool i915_gpu_lower(void);
 extern bool i915_gpu_busy(void);
 extern bool i915_gpu_turbo_disable(void);
+extern bool i915_gpu_turbo_enable(void);
 
 /*
  * The Bridge device's PCI config space has information about the
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 16/16] drm/i915: Allow RPS waitboosting to use max GPU frequency
  2015-04-27 12:41 RPS tuning Chris Wilson
                   ` (14 preceding siblings ...)
  2015-04-27 12:41 ` [PATCH 15/16] drm/i915, intel_ips: Enable GPU wait-boosting with IPS Chris Wilson
@ 2015-04-27 12:41 ` Chris Wilson
  2015-05-04 14:51   ` Daniel Vetter
  2015-05-21 12:55   ` Daniel Vetter
  15 siblings, 2 replies; 38+ messages in thread
From: Chris Wilson @ 2015-04-27 12:41 UTC (permalink / raw)
  To: intel-gfx

Ignore the restriction imposed by the user for when the GPU is stalling
the clients and dropping frames. We will return back to the user limits
immediately once the stall is over.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_pm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 0cc9e95f70d3..9205b5fe2186 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -4161,7 +4161,7 @@ void intel_rps_boost(struct drm_i915_private *dev_priv,
 		rps = NULL;
 
 	mutex_lock(&dev_priv->rps.hw_lock);
-	val = dev_priv->rps.max_freq_softlimit;
+	val = dev_priv->rps.max_freq;
 	if (dev_priv->rps.enabled &&
 	    dev_priv->mm.busy &&
 	    dev_priv->rps.cur_freq < val &&
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH 06/16] drm/i915: Implement inter-engine read-read optimisations
  2015-04-27 12:41 ` [PATCH 06/16] drm/i915: Implement inter-engine read-read optimisations Chris Wilson
@ 2015-04-29 13:51   ` Tvrtko Ursulin
  0 siblings, 0 replies; 38+ messages in thread
From: Tvrtko Ursulin @ 2015-04-29 13:51 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Lionel Landwerlin


On 04/27/2015 01:41 PM, Chris Wilson wrote:
> Currently, we only track the last request globally across all engines.
> This prevents us from issuing concurrent read requests on e.g. the RCS
> and BCS engines (or more likely the render and media engines). Without
> semaphores, we incur costly stalls as we synchronise between rings -
> greatly impacting the current performance of Broadwell versus Haswell in
> certain workloads (like video decode). With the introduction of
> reference counted requests, it is much easier to track the last request
> per ring, as well as the last global write request so that we can
> optimise inter-engine read read requests (as well as better optimise
> certain CPU waits).
>
> v2: Fix inverted readonly condition for nonblocking waits.
> v3: Handle non-continguous engine array after waits
> v4: Rebase, tidy, rewrite ring list debugging
> v5: Use obj->active as a bitfield, it looks cool
> v6: Micro-optimise, mostly involving moving code around
> v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf)
> v8: Rebase
> v9: Refactor i915_gem_object_sync() to allow the compiler to better
> optimise it.

Looks OK, you can upgrade my r-b to v9.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 07/16] drm/i915: Inline check required for object syncing prior to execbuf
  2015-04-27 12:41 ` [PATCH 07/16] drm/i915: Inline check required for object syncing prior to execbuf Chris Wilson
@ 2015-04-29 14:03   ` Tvrtko Ursulin
  2015-04-29 14:22     ` Chris Wilson
  0 siblings, 1 reply; 38+ messages in thread
From: Tvrtko Ursulin @ 2015-04-29 14:03 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


Hi,

On 04/27/2015 01:41 PM, Chris Wilson wrote:
> This trims a little overhead from the common case of not needing to
> synchronize between rings.
>
> v2: execlists is special and likes to duplicate code.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 10 +++++++---
>   drivers/gpu/drm/i915/intel_lrc.c           |  9 ++++++---
>   2 files changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 45d74da7f6c4..4fe09568089b 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -889,6 +889,7 @@ static int
>   i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
>   				struct list_head *vmas)
>   {
> +	const unsigned other_rings = ~intel_ring_flag(ring);
>   	struct i915_vma *vma;
>   	uint32_t flush_domains = 0;
>   	bool flush_chipset = false;
> @@ -896,9 +897,12 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
>
>   	list_for_each_entry(vma, vmas, exec_list) {
>   		struct drm_i915_gem_object *obj = vma->obj;
> -		ret = i915_gem_object_sync(obj, ring);
> -		if (ret)
> -			return ret;
> +
> +		if (obj->active & other_rings) {
> +			ret = i915_gem_object_sync(obj, ring);
> +			if (ret)
> +				return ret;
> +		}

Just to avoid bailing out near the top of i915_gem_object_sync?

Anyway,

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 07/16] drm/i915: Inline check required for object syncing prior to execbuf
  2015-04-29 14:03   ` Tvrtko Ursulin
@ 2015-04-29 14:22     ` Chris Wilson
  0 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2015-04-29 14:22 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Wed, Apr 29, 2015 at 03:03:43PM +0100, Tvrtko Ursulin wrote:
> 
> Hi,
> 
> On 04/27/2015 01:41 PM, Chris Wilson wrote:
> >This trims a little overhead from the common case of not needing to
> >synchronize between rings.
> >
> >v2: execlists is special and likes to duplicate code.
> >
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >---
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 10 +++++++---
> >  drivers/gpu/drm/i915/intel_lrc.c           |  9 ++++++---
> >  2 files changed, 13 insertions(+), 6 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >index 45d74da7f6c4..4fe09568089b 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >@@ -889,6 +889,7 @@ static int
> >  i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
> >  				struct list_head *vmas)
> >  {
> >+	const unsigned other_rings = ~intel_ring_flag(ring);
> >  	struct i915_vma *vma;
> >  	uint32_t flush_domains = 0;
> >  	bool flush_chipset = false;
> >@@ -896,9 +897,12 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
> >
> >  	list_for_each_entry(vma, vmas, exec_list) {
> >  		struct drm_i915_gem_object *obj = vma->obj;
> >-		ret = i915_gem_object_sync(obj, ring);
> >-		if (ret)
> >-			return ret;
> >+
> >+		if (obj->active & other_rings) {
> >+			ret = i915_gem_object_sync(obj, ring);
> >+			if (ret)
> >+				return ret;
> >+		}
> 
> Just to avoid bailing out near the top of i915_gem_object_sync?

Yes. Every function where we spend more time in the push/pop preamble
than in the body is a function I don't want to call :)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/16] drm/i915: Drop i915_gem_obj_is_pinned() from set-cache-level
  2015-04-27 12:41 ` [PATCH 01/16] drm/i915: Drop i915_gem_obj_is_pinned() from set-cache-level Chris Wilson
@ 2015-04-29 14:50   ` Tvrtko Ursulin
  2015-04-29 15:15     ` Chris Wilson
  0 siblings, 1 reply; 38+ messages in thread
From: Tvrtko Ursulin @ 2015-04-29 14:50 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 04/27/2015 01:41 PM, Chris Wilson wrote:
> Since the remove of the pin-ioctl, we only care about not changing the
> cache level on buffers pinned to the hardware as indicated by
> obj->pin_display. So we can safely replace i915_gem_object_is_pinned()
> here with a plain obj->pin_display check. During rebinding, we will check
> sanity checks in case vma->pin_count is erroneously set.
>
> At the same time, we can micro-optimise GTT mmap() behaviour since we
> only need to relinquish the mmaps before Sandybridge.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem.c | 33 +++++++++++++++++++++------------
>   1 file changed, 21 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index f0a6d03e9ba5..afdb604e4005 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3768,31 +3768,34 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>   {
>   	struct drm_device *dev = obj->base.dev;
>   	struct i915_vma *vma, *next;
> +	bool bound = false;
>   	int ret;
>
>   	if (obj->cache_level == cache_level)
>   		return 0;
>
> -	if (i915_gem_obj_is_pinned(obj)) {
> +	if (obj->pin_display) {
>   		DRM_DEBUG("can not change the cache level of pinned objects\n");
>   		return -EBUSY;
>   	}
>
>   	list_for_each_entry_safe(vma, next, &obj->vma_list, vma_link) {
> +		if (!drm_mm_node_allocated(&vma->node))
> +			continue;
> +

Another micro-optimisation?

Side note - this was puzzling and then I realized 
i915_gem_valid_gtt_space says true when node is not allocated. It seems 
to be only concerned by node colouring - so is that one badly name 
function or I am missing something?

>   		if (!i915_gem_valid_gtt_space(vma, cache_level)) {
>   			ret = i915_vma_unbind(vma);
>   			if (ret)
>   				return ret;
> -		}
> +		} else
> +			bound = true;
>   	}
>
> -	if (i915_gem_obj_bound_any(obj)) {
> +	if (bound) {
>   		ret = i915_gem_object_finish_gpu(obj);
>   		if (ret)
>   			return ret;
>
> -		i915_gem_object_finish_gtt(obj);
> -
>   		/* Before SandyBridge, you could not use tiling or fence
>   		 * registers with snooped memory, so relinquish any fences
>   		 * currently pointing to our region in the aperture.
> @@ -3801,15 +3804,21 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>   			ret = i915_gem_object_put_fence(obj);
>   			if (ret)
>   				return ret;
> +
> +			i915_gem_release_mmap(obj);
>   		}

If only < gen6 we need to drop the mmap, what happens with the domain 
tracking - who removes the GTT bit from there now?

>
> -		list_for_each_entry(vma, &obj->vma_list, vma_link)
> -			if (drm_mm_node_allocated(&vma->node)) {
> -				ret = i915_vma_bind(vma, cache_level,
> -						    PIN_UPDATE);
> -				if (ret)
> -					return ret;
> -			}
> +		list_for_each_entry(vma, &obj->vma_list, vma_link) {
> +			if (!drm_mm_node_allocated(&vma->node))
> +				continue;
> +
> +			if (vma->pin_count)
> +				return -EBUSY;

Preserve DRM_DEBUG as it was before?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 02/16] drm/i915: Only remove objects pinned to the display from the available aperture
  2015-04-27 12:41 ` [PATCH 02/16] drm/i915: Only remove objects pinned to the display from the available aperture Chris Wilson
@ 2015-04-29 15:05   ` Tvrtko Ursulin
  2015-04-29 15:48     ` Chris Wilson
  0 siblings, 1 reply; 38+ messages in thread
From: Tvrtko Ursulin @ 2015-04-29 15:05 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 04/27/2015 01:41 PM, Chris Wilson wrote:
> With the removal of the pin_ioctl, we need only consider
> obj->pin_display when looking at available aperture space.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index afdb604e4005..ec9e36e9ec78 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -157,7 +157,7 @@ i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
>   	pinned = 0;
>   	mutex_lock(&dev->struct_mutex);
>   	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list)
> -		if (i915_gem_obj_is_pinned(obj))
> +		if (obj->pin_display)
>   			pinned += i915_gem_obj_ggtt_size(obj);
>   	mutex_unlock(&dev->struct_mutex);

The only thing I can think of are transients from execbuf, pre-ppgtt, 
but I suppose we don't care about that a lot? Or I misunderstand how 
something works?

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/16] drm/i915: Drop i915_gem_obj_is_pinned() from set-cache-level
  2015-04-29 14:50   ` Tvrtko Ursulin
@ 2015-04-29 15:15     ` Chris Wilson
  0 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2015-04-29 15:15 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Wed, Apr 29, 2015 at 03:50:13PM +0100, Tvrtko Ursulin wrote:
> 
> On 04/27/2015 01:41 PM, Chris Wilson wrote:
> >Since the remove of the pin-ioctl, we only care about not changing the
> >cache level on buffers pinned to the hardware as indicated by
> >obj->pin_display. So we can safely replace i915_gem_object_is_pinned()
> >here with a plain obj->pin_display check. During rebinding, we will check
> >sanity checks in case vma->pin_count is erroneously set.
> >
> >At the same time, we can micro-optimise GTT mmap() behaviour since we
> >only need to relinquish the mmaps before Sandybridge.
> >
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >---
> >  drivers/gpu/drm/i915/i915_gem.c | 33 +++++++++++++++++++++------------
> >  1 file changed, 21 insertions(+), 12 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >index f0a6d03e9ba5..afdb604e4005 100644
> >--- a/drivers/gpu/drm/i915/i915_gem.c
> >+++ b/drivers/gpu/drm/i915/i915_gem.c
> >@@ -3768,31 +3768,34 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> >  {
> >  	struct drm_device *dev = obj->base.dev;
> >  	struct i915_vma *vma, *next;
> >+	bool bound = false;
> >  	int ret;
> >
> >  	if (obj->cache_level == cache_level)
> >  		return 0;
> >
> >-	if (i915_gem_obj_is_pinned(obj)) {
> >+	if (obj->pin_display) {
> >  		DRM_DEBUG("can not change the cache level of pinned objects\n");
> >  		return -EBUSY;
> >  	}
> >
> >  	list_for_each_entry_safe(vma, next, &obj->vma_list, vma_link) {
> >+		if (!drm_mm_node_allocated(&vma->node))
> >+			continue;
> >+
> 
> Another micro-optimisation?
> 
> Side note - this was puzzling and then I realized
> i915_gem_valid_gtt_space says true when node is not allocated. It
> seems to be only concerned by node colouring - so is that one badly
> name function or I am missing something?

It's only concerned with colouring, yes. I felt like moving the two
checks together but it also acts as a sanity check in i915_vma_insert.
So it's not a micro-optimisation as such, except that I had two
conflicting uses and this was the easiest path to take.
 
> >  		if (!i915_gem_valid_gtt_space(vma, cache_level)) {
> >  			ret = i915_vma_unbind(vma);
> >  			if (ret)
> >  				return ret;
> >-		}
> >+		} else
> >+			bound = true;
> >  	}
> >
> >-	if (i915_gem_obj_bound_any(obj)) {
> >+	if (bound) {
> >  		ret = i915_gem_object_finish_gpu(obj);
> >  		if (ret)
> >  			return ret;
> >
> >-		i915_gem_object_finish_gtt(obj);
> >-
> >  		/* Before SandyBridge, you could not use tiling or fence
> >  		 * registers with snooped memory, so relinquish any fences
> >  		 * currently pointing to our region in the aperture.
> >@@ -3801,15 +3804,21 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> >  			ret = i915_gem_object_put_fence(obj);
> >  			if (ret)
> >  				return ret;
> >+
> >+			i915_gem_release_mmap(obj);
> >  		}
> 
> If only < gen6 we need to drop the mmap, what happens with the
> domain tracking - who removes the GTT bit from there now?

We don't care. The GTT domain bits remains valid. Access through the GTT
will remain consistent as we change the caching bits - except that on
snooped architectures access may suddenly become verboten and in those
cases we should throwaway the mmap and force the client to fault them
back in (so we can do the sanity checks in i915_gem_fault() and throw
SIGBUS to protect ourself).

To be pendantic we should split this into:


if (gen < 6)
	i915_gem_put_fence();
if (!HAS_LLC && cache_level != I915_CACHE_NONE)
	i915_gem_release_mmap(obj);

> >-		list_for_each_entry(vma, &obj->vma_list, vma_link)
> >-			if (drm_mm_node_allocated(&vma->node)) {
> >-				ret = i915_vma_bind(vma, cache_level,
> >-						    PIN_UPDATE);
> >-				if (ret)
> >-					return ret;
> >-			}
> >+		list_for_each_entry(vma, &obj->vma_list, vma_link) {
> >+			if (!drm_mm_node_allocated(&vma->node))
> >+				continue;
> >+
> >+			if (vma->pin_count)
> >+				return -EBUSY;
> 
> Preserve DRM_DEBUG as it was before?

Oh, that can actually die. of be if (WARN_ON()) since the preposition is
that I have already checked for the only vma that can be legally pinned
at this point.

Bah, this entire patch is bogus - because of how set_cache_level is used
by pin_to_display(). Whoops. time to thow it out and start again.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 02/16] drm/i915: Only remove objects pinned to the display from the available aperture
  2015-04-29 15:05   ` Tvrtko Ursulin
@ 2015-04-29 15:48     ` Chris Wilson
  0 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2015-04-29 15:48 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Wed, Apr 29, 2015 at 04:05:40PM +0100, Tvrtko Ursulin wrote:
> 
> On 04/27/2015 01:41 PM, Chris Wilson wrote:
> >With the removal of the pin_ioctl, we need only consider
> >obj->pin_display when looking at available aperture space.
> >
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >---
> >  drivers/gpu/drm/i915/i915_gem.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >index afdb604e4005..ec9e36e9ec78 100644
> >--- a/drivers/gpu/drm/i915/i915_gem.c
> >+++ b/drivers/gpu/drm/i915/i915_gem.c
> >@@ -157,7 +157,7 @@ i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
> >  	pinned = 0;
> >  	mutex_lock(&dev->struct_mutex);
> >  	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list)
> >-		if (i915_gem_obj_is_pinned(obj))
> >+		if (obj->pin_display)
> >  			pinned += i915_gem_obj_ggtt_size(obj);
> >  	mutex_unlock(&dev->struct_mutex);
> 
> The only thing I can think of are transients from execbuf,
> pre-ppgtt, but I suppose we don't care about that a lot? Or I
> misunderstand how something works?

They are only pinned whilst we process reservations for the batch. The
only things that are permenantly pinned are the display objects, which
hardware is pointed at. Hmm, I guess they are actually a few others we
do care about such as overlay regs and legacy ringbuffer objs, default
contects and status pages and the like. On the other hand perma-pinning
into the GGTT is now quite rare, so perhaps it is time to start tracking
it again.

I guess I am going to have to revisit this patch (my goal is try and
clarify what i915_gem_obj_is_pinned() means by eliminating it ;-)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 08/16] drm/i915: Add RPS thresholds to debugfs/i915_frequency_info
  2015-04-27 12:41 ` [PATCH 08/16] drm/i915: Add RPS thresholds to debugfs/i915_frequency_info Chris Wilson
@ 2015-05-04 14:36   ` Daniel Vetter
  0 siblings, 0 replies; 38+ messages in thread
From: Daniel Vetter @ 2015-05-04 14:36 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Mon, Apr 27, 2015 at 01:41:19PM +0100, Chris Wilson wrote:
> Expose some more of our internal RPS bookkeeping for debugging.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c | 25 +++++++++++++++++++++----
>  1 file changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index ab55641a0b78..2cbb3e9266f0 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1217,12 +1217,17 @@ static int i915_frequency_info(struct seq_file *m, void *unused)
>  			   GEN6_CURBSYTAVG_MASK);
>  		seq_printf(m, "RP PREV UP: %dus\n", rpprevup &
>  			   GEN6_CURBSYTAVG_MASK);
> +		seq_printf(m, "Up threshold: %d%%\n",
> +			   dev_priv->rps.up_threshold);
> +
>  		seq_printf(m, "RP CUR DOWN EI: %dus\n", rpdownei &
>  			   GEN6_CURIAVG_MASK);
>  		seq_printf(m, "RP CUR DOWN: %dus\n", rpcurdown &
>  			   GEN6_CURBSYTAVG_MASK);
>  		seq_printf(m, "RP PREV DOWN: %dus\n", rpprevdown &
>  			   GEN6_CURBSYTAVG_MASK);
> +		seq_printf(m, "Down threshold: %d%%\n",
> +			   dev_priv->rps.down_threshold);
>  
>  		max_freq = (rp_state_cap & 0xff0000) >> 16;
>  		max_freq *= (IS_SKYLAKE(dev) ? GEN9_FREQ_SCALER : 1);
> @@ -1238,12 +1243,21 @@ static int i915_frequency_info(struct seq_file *m, void *unused)
>  		max_freq *= (IS_SKYLAKE(dev) ? GEN9_FREQ_SCALER : 1);
>  		seq_printf(m, "Max non-overclocked (RP0) frequency: %dMHz\n",
>  			   intel_gpu_freq(dev_priv, max_freq));
> -
>  		seq_printf(m, "Max overclocked frequency: %dMHz\n",
>  			   intel_gpu_freq(dev_priv, dev_priv->rps.max_freq));
>  
> +		seq_printf(m, "Current freq: %d MHz\n",
> +			   intel_gpu_freq(dev_priv, dev_priv->rps.cur_freq));
> +		seq_printf(m, "Actual freq: %d MHz\n", cagf);
>  		seq_printf(m, "Idle freq: %d MHz\n",
>  			   intel_gpu_freq(dev_priv, dev_priv->rps.idle_freq));
> +		seq_printf(m, "Min freq: %d MHz\n",
> +			   intel_gpu_freq(dev_priv, dev_priv->rps.min_freq));
> +		seq_printf(m, "Max freq: %d MHz\n",
> +			   intel_gpu_freq(dev_priv, dev_priv->rps.max_freq));
> +		seq_printf(m,
> +			   "efficient (RPe) frequency: %d MHz\n",
> +			   intel_gpu_freq(dev_priv, dev_priv->rps.efficient_freq));
>  	} else if (IS_VALLEYVIEW(dev)) {
>  		u32 freq_sts;
>  
> @@ -1252,6 +1266,12 @@ static int i915_frequency_info(struct seq_file *m, void *unused)
>  		seq_printf(m, "PUNIT_REG_GPU_FREQ_STS: 0x%08x\n", freq_sts);
>  		seq_printf(m, "DDR freq: %d MHz\n", dev_priv->mem_freq);
>  
> +		seq_printf(m, "actual GPU freq: %d MHz\n",
> +			   intel_gpu_freq(dev_priv, (freq_sts >> 8) & 0xff));
> +
> +		seq_printf(m, "current GPU freq: %d MHz\n",
> +			   intel_gpu_freq(dev_priv, dev_priv->rps.cur_freq));
> +
>  		seq_printf(m, "max GPU freq: %d MHz\n",
>  			   intel_gpu_freq(dev_priv, dev_priv->rps.max_freq));

Refactoring the common code here would be pretty. Anyway, queued for
-next, thanks for the patch.
-Daniel

>  
> @@ -1264,9 +1284,6 @@ static int i915_frequency_info(struct seq_file *m, void *unused)
>  		seq_printf(m,
>  			   "efficient (RPe) frequency: %d MHz\n",
>  			   intel_gpu_freq(dev_priv, dev_priv->rps.efficient_freq));
> -
> -		seq_printf(m, "current GPU freq: %d MHz\n",
> -			   intel_gpu_freq(dev_priv, (freq_sts >> 8) & 0xff));
>  		mutex_unlock(&dev_priv->rps.hw_lock);
>  	} else {
>  		seq_puts(m, "no P-state info available\n");
> -- 
> 2.1.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 09/16] drm/i915: Limit ring synchronisation (sw sempahores) RPS boosts
  2015-04-27 12:41 ` [PATCH 09/16] drm/i915: Limit ring synchronisation (sw sempahores) RPS boosts Chris Wilson
@ 2015-05-04 14:38   ` Daniel Vetter
  2015-05-04 14:46     ` Daniel Vetter
  0 siblings, 1 reply; 38+ messages in thread
From: Daniel Vetter @ 2015-05-04 14:38 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Mon, Apr 27, 2015 at 01:41:20PM +0100, Chris Wilson wrote:
> Ring switches can occur many times per frame, and are often out of
> control, causing frequent RPS boosting for no practical benefit. Treat
> the sw semaphore synchronisation as a separate client and only allow it
> to boost once per busy/idle cycle.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c |  1 +
>  drivers/gpu/drm/i915/i915_drv.h     | 34 ++++++++++++++++++----------------
>  drivers/gpu/drm/i915/i915_gem.c     |  7 +++++--
>  drivers/gpu/drm/i915/intel_pm.c     |  1 +
>  4 files changed, 25 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 2cbb3e9266f0..1d68e3ecaa00 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -2311,6 +2311,7 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
>  			   list_empty(&file_priv->rps_boost) ? "" : ", active");
>  		rcu_read_unlock();
>  	}
> +	seq_printf(m, "Semaphore boosts: %d\n", dev_priv->rps.semaphores.rps_boosts);
>  	seq_printf(m, "Kernel boosts: %d\n", dev_priv->rps.boosts);
>  
>  	mutex_unlock(&dev_priv->rps.hw_lock);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index caee59bf94ba..415a8e756e48 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -272,6 +272,22 @@ struct drm_i915_private;
>  struct i915_mm_struct;
>  struct i915_mmu_object;
>  
> +struct drm_i915_file_private {
> +	struct drm_i915_private *dev_priv;
> +	struct drm_file *file;
> +
> +	struct {
> +		spinlock_t lock;
> +		struct list_head request_list;
> +	} mm;
> +	struct idr context_idr;
> +
> +	struct list_head rps_boost;
> +	struct intel_engine_cs *bsd_ring;
> +
> +	unsigned rps_boosts;
> +};
> +
>  enum intel_dpll_id {
>  	DPLL_ID_PRIVATE = -1, /* non-shared dpll in use */
>  	/* real shared dpll ids must be >= 0 */
> @@ -1054,6 +1070,8 @@ struct intel_gen6_power_mgmt {
>  	struct list_head clients;
>  	unsigned boosts;
>  
> +	struct drm_i915_file_private semaphores;

Still not in favour of reusing the entire file_private here. Imo
extracting the rps_* stuff from that into a separate struct would be
better and much less confusing.
-Daniel

> +
>  	/* manual wa residency calculations */
>  	struct intel_rps_ei up_ei, down_ei;
>  
> @@ -2191,22 +2209,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>   * a later patch when the call to i915_seqno_passed() is obsoleted...
>   */
>  
> -struct drm_i915_file_private {
> -	struct drm_i915_private *dev_priv;
> -	struct drm_file *file;
> -
> -	struct {
> -		spinlock_t lock;
> -		struct list_head request_list;
> -	} mm;
> -	struct idr context_idr;
> -
> -	struct list_head rps_boost;
> -	struct intel_engine_cs *bsd_ring;
> -
> -	unsigned rps_boosts;
> -};
> -
>  /*
>   * A command that requires special handling by the command parser.
>   */
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d60eca03e306..90c33912ffd5 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3036,9 +3036,12 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
>  		return ret;
>  
>  	if (!i915_semaphore_is_enabled(obj->base.dev)) {
> +		struct drm_i915_private *i915 = to_i915(obj->base.dev);
>  		ret = __i915_wait_request(rq,
> -					  atomic_read(&to_i915(obj->base.dev)->gpu_error.reset_counter),
> -					  to_i915(obj->base.dev)->mm.interruptible, NULL, NULL);
> +					  atomic_read(&i915->gpu_error.reset_counter),
> +					  i915->mm.interruptible,
> +					  NULL,
> +					  &i915->rps.semaphores);
>  		if (ret)
>  			return ret;
>  
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index a7516ed24eee..8dc158adba14 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -6861,6 +6861,7 @@ void intel_pm_setup(struct drm_device *dev)
>  	INIT_DELAYED_WORK(&dev_priv->rps.delayed_resume_work,
>  			  intel_gen6_powersave_work);
>  	INIT_LIST_HEAD(&dev_priv->rps.clients);
> +	INIT_LIST_HEAD(&dev_priv->rps.semaphores.rps_boost);
>  
>  	dev_priv->pm.suspended = false;
>  }
> -- 
> 2.1.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 09/16] drm/i915: Limit ring synchronisation (sw sempahores) RPS boosts
  2015-05-04 14:38   ` Daniel Vetter
@ 2015-05-04 14:46     ` Daniel Vetter
  0 siblings, 0 replies; 38+ messages in thread
From: Daniel Vetter @ 2015-05-04 14:46 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Mon, May 04, 2015 at 04:38:02PM +0200, Daniel Vetter wrote:
> On Mon, Apr 27, 2015 at 01:41:20PM +0100, Chris Wilson wrote:
> > Ring switches can occur many times per frame, and are often out of
> > control, causing frequent RPS boosting for no practical benefit. Treat
> > the sw semaphore synchronisation as a separate client and only allow it
> > to boost once per busy/idle cycle.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_debugfs.c |  1 +
> >  drivers/gpu/drm/i915/i915_drv.h     | 34 ++++++++++++++++++----------------
> >  drivers/gpu/drm/i915/i915_gem.c     |  7 +++++--
> >  drivers/gpu/drm/i915/intel_pm.c     |  1 +
> >  4 files changed, 25 insertions(+), 18 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > index 2cbb3e9266f0..1d68e3ecaa00 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -2311,6 +2311,7 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
> >  			   list_empty(&file_priv->rps_boost) ? "" : ", active");
> >  		rcu_read_unlock();
> >  	}
> > +	seq_printf(m, "Semaphore boosts: %d\n", dev_priv->rps.semaphores.rps_boosts);
> >  	seq_printf(m, "Kernel boosts: %d\n", dev_priv->rps.boosts);
> >  
> >  	mutex_unlock(&dev_priv->rps.hw_lock);
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index caee59bf94ba..415a8e756e48 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -272,6 +272,22 @@ struct drm_i915_private;
> >  struct i915_mm_struct;
> >  struct i915_mmu_object;
> >  
> > +struct drm_i915_file_private {
> > +	struct drm_i915_private *dev_priv;
> > +	struct drm_file *file;
> > +
> > +	struct {
> > +		spinlock_t lock;
> > +		struct list_head request_list;
> > +	} mm;
> > +	struct idr context_idr;
> > +
> > +	struct list_head rps_boost;
> > +	struct intel_engine_cs *bsd_ring;
> > +
> > +	unsigned rps_boosts;
> > +};
> > +
> >  enum intel_dpll_id {
> >  	DPLL_ID_PRIVATE = -1, /* non-shared dpll in use */
> >  	/* real shared dpll ids must be >= 0 */
> > @@ -1054,6 +1070,8 @@ struct intel_gen6_power_mgmt {
> >  	struct list_head clients;
> >  	unsigned boosts;
> >  
> > +	struct drm_i915_file_private semaphores;
> 
> Still not in favour of reusing the entire file_private here. Imo
> extracting the rps_* stuff from that into a separate struct would be
> better and much less confusing.

Ah it's in a follow-up so I'm all happy, except that all the patches kinda
depend on each another and not all of the seemingly required prereqs for
are reviewed yet. So unfortunately I need to pass on these for now.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 16/16] drm/i915: Allow RPS waitboosting to use max GPU frequency
  2015-04-27 12:41 ` [PATCH 16/16] drm/i915: Allow RPS waitboosting to use max GPU frequency Chris Wilson
@ 2015-05-04 14:51   ` Daniel Vetter
  2015-05-04 14:58     ` Chris Wilson
  2015-05-21 12:55   ` Daniel Vetter
  1 sibling, 1 reply; 38+ messages in thread
From: Daniel Vetter @ 2015-05-04 14:51 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Mon, Apr 27, 2015 at 01:41:27PM +0100, Chris Wilson wrote:
> Ignore the restriction imposed by the user for when the GPU is stalling
> the clients and dropping frames. We will return back to the user limits
> immediately once the stall is over.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Unsure about this one since I don't know who's all using the sysfs
interface. Probably needs acks from people who actually do use it that we
don't accidentally break their setup - which could happen right at the
border where we might miss an occasional frame every few seconds or so.
-Daniel

> ---
>  drivers/gpu/drm/i915/intel_pm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 0cc9e95f70d3..9205b5fe2186 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -4161,7 +4161,7 @@ void intel_rps_boost(struct drm_i915_private *dev_priv,
>  		rps = NULL;
>  
>  	mutex_lock(&dev_priv->rps.hw_lock);
> -	val = dev_priv->rps.max_freq_softlimit;
> +	val = dev_priv->rps.max_freq;
>  	if (dev_priv->rps.enabled &&
>  	    dev_priv->mm.busy &&
>  	    dev_priv->rps.cur_freq < val &&
> -- 
> 2.1.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 16/16] drm/i915: Allow RPS waitboosting to use max GPU frequency
  2015-05-04 14:51   ` Daniel Vetter
@ 2015-05-04 14:58     ` Chris Wilson
  0 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2015-05-04 14:58 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Mon, May 04, 2015 at 04:51:24PM +0200, Daniel Vetter wrote:
> On Mon, Apr 27, 2015 at 01:41:27PM +0100, Chris Wilson wrote:
> > Ignore the restriction imposed by the user for when the GPU is stalling
> > the clients and dropping frames. We will return back to the user limits
> > immediately once the stall is over.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Unsure about this one since I don't know who's all using the sysfs
> interface. Probably needs acks from people who actually do use it that we
> don't accidentally break their setup - which could happen right at the
> border where we might miss an occasional frame every few seconds or so.

Don't mind, this was more of a thought experiment (principally to
circumvent the IPS limiting in the previous patch for ILK). I was
tempted to introduce a boost frequency for separate control.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 03/16] drm/i915: Remove domain flubbing from i915_gem_object_finish_gpu()
  2015-04-27 12:41 ` [PATCH 03/16] drm/i915: Remove domain flubbing from i915_gem_object_finish_gpu() Chris Wilson
@ 2015-05-11 16:43   ` Daniel Vetter
  0 siblings, 0 replies; 38+ messages in thread
From: Daniel Vetter @ 2015-05-11 16:43 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Mon, Apr 27, 2015 at 01:41:14PM +0100, Chris Wilson wrote:
> We no longer interpolate domains in the same manner, and even if we did,
> we should trust setting either of the other write domains would trigger
> an invalidation rather than force it. Remove the tweaking of the
> read_domains since it serves no purpose and use
> i915_gem_object_wait_rendering() directly.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Merged with a bit of history digging added.

Thanks, Daniel
> ---
>  drivers/gpu/drm/i915/i915_drv.h      |  4 +++-
>  drivers/gpu/drm/i915/i915_gem.c      | 26 +++-----------------------
>  drivers/gpu/drm/i915/intel_display.c | 13 ++++++++-----
>  3 files changed, 14 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index f3c77ca6b838..c9161c3d1e34 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2791,7 +2791,6 @@ static inline bool i915_stop_ring_allow_warn(struct drm_i915_private *dev_priv)
>  
>  void i915_gem_reset(struct drm_device *dev);
>  bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
> -int __must_check i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj);
>  int __must_check i915_gem_init(struct drm_device *dev);
>  int i915_gem_init_rings(struct drm_device *dev);
>  int __must_check i915_gem_init_hw(struct drm_device *dev);
> @@ -2813,6 +2812,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  int __must_check i915_wait_request(struct drm_i915_gem_request *req);
>  int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
>  int __must_check
> +i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
> +			       bool readonly);
> +int __must_check
>  i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj,
>  				  bool write);
>  int __must_check
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ec9e36e9ec78..cd0ac1b54122 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -40,9 +40,6 @@
>  
>  static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
>  static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
> -static __must_check int
> -i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
> -			       bool readonly);
>  static void
>  i915_gem_object_retire(struct drm_i915_gem_object *obj);
>  
> @@ -1399,7 +1396,7 @@ i915_gem_object_wait_rendering__tail(struct drm_i915_gem_object *obj)
>   * Ensures that all rendering to the object has completed and the object is
>   * safe to unbind from the GTT or access from the CPU.
>   */
> -static __must_check int
> +int
>  i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
>  			       bool readonly)
>  {
> @@ -3039,7 +3036,7 @@ int i915_vma_unbind(struct i915_vma *vma)
>  
>  	BUG_ON(obj->pages == NULL);
>  
> -	ret = i915_gem_object_finish_gpu(obj);
> +	ret = i915_gem_object_wait_rendering(obj, false);
>  	if (ret)
>  		return ret;
>  	/* Continue on if we fail due to EIO, the GPU is hung so we
> @@ -3792,7 +3789,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>  	}
>  
>  	if (bound) {
> -		ret = i915_gem_object_finish_gpu(obj);
> +		ret = i915_gem_object_wait_rendering(obj, false);
>  		if (ret)
>  			return ret;
>  
> @@ -3996,23 +3993,6 @@ i915_gem_object_unpin_from_display_plane(struct drm_i915_gem_object *obj,
>  	obj->pin_display--;
>  }
>  
> -int
> -i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj)
> -{
> -	int ret;
> -
> -	if ((obj->base.read_domains & I915_GEM_GPU_DOMAINS) == 0)
> -		return 0;
> -
> -	ret = i915_gem_object_wait_rendering(obj, false);
> -	if (ret)
> -		return ret;
> -
> -	/* Ensure that we invalidate the GPU's caches and TLBs. */
> -	obj->base.read_domains &= ~I915_GEM_GPU_DOMAINS;
> -	return 0;
> -}
> -
>  /**
>   * Moves a single object to the CPU read, and possibly write domain.
>   *
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 3094b0807b40..4334e59bf0aa 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -3235,27 +3235,30 @@ void intel_finish_reset(struct drm_device *dev)
>  	drm_modeset_unlock_all(dev);
>  }
>  
> -static int
> +static void
>  intel_finish_fb(struct drm_framebuffer *old_fb)
>  {
>  	struct drm_i915_gem_object *obj = intel_fb_obj(old_fb);
> -	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
> +	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
>  	bool was_interruptible = dev_priv->mm.interruptible;
>  	int ret;
>  
>  	/* Big Hammer, we also need to ensure that any pending
>  	 * MI_WAIT_FOR_EVENT inside a user batch buffer on the
>  	 * current scanout is retired before unpinning the old
> -	 * framebuffer.
> +	 * framebuffer. Note that we rely on userspace rendering
> +	 * into the buffer attached to the pipe they are waiting
> +	 * on. If not, userspace generates a GPU hang with IPEHR
> +	 * point to the MI_WAIT_FOR_EVENT.
>  	 *
>  	 * This should only fail upon a hung GPU, in which case we
>  	 * can safely continue.
>  	 */
>  	dev_priv->mm.interruptible = false;
> -	ret = i915_gem_object_finish_gpu(obj);
> +	ret = i915_gem_object_wait_rendering(obj, true);
>  	dev_priv->mm.interruptible = was_interruptible;
>  
> -	return ret;
> +	WARN_ON(ret);
>  }
>  
>  static bool intel_crtc_has_pending_flip(struct drm_crtc *crtc)
> -- 
> 2.1.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 04/16] drm/i915: Ensure cache flushes prior to doing CS flips
  2015-04-27 12:41 ` [PATCH 04/16] drm/i915: Ensure cache flushes prior to doing CS flips Chris Wilson
@ 2015-05-11 16:46   ` Daniel Vetter
  0 siblings, 0 replies; 38+ messages in thread
From: Daniel Vetter @ 2015-05-11 16:46 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Mon, Apr 27, 2015 at 01:41:15PM +0100, Chris Wilson wrote:
> Synchronising to an object active on the same ring is a no-op, for the
> benefit of execbuffer scheduler. However, for CS flips this means that
> we can forgo checking whether the last write request of the object is
> actually queued and more importantly whether the cache flush for the
> write was emitted.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Nice catch. Hopefully this all goes soon away once we preallocate request.

Queued for -next, thanks for the patch.
-Daniel

> ---
>  drivers/gpu/drm/i915/intel_display.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 4334e59bf0aa..e3399af8e0cd 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -10898,6 +10898,12 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
>  		i915_gem_request_assign(&work->flip_queued_req,
>  					obj->last_write_req);
>  	} else {
> +		if (obj->last_write_req) {
> +			ret = i915_gem_check_olr(obj->last_write_req);
> +			if (ret)
> +				goto cleanup_unpin;
> +		}
> +
>  		ret = dev_priv->display.queue_flip(dev, crtc, fb, obj, ring,
>  						   page_flip_flags);
>  		if (ret)
> -- 
> 2.1.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 05/16] drm/i915: Fix race on unreferencing the wrong mmio-flip-request
  2015-04-27 12:41 ` [PATCH 05/16] drm/i915: Fix race on unreferencing the wrong mmio-flip-request Chris Wilson
@ 2015-05-11 16:51   ` Daniel Vetter
  2015-05-11 20:23     ` Chris Wilson
  0 siblings, 1 reply; 38+ messages in thread
From: Daniel Vetter @ 2015-05-11 16:51 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Ander Conselvan de Oliveira, intel-gfx

On Mon, Apr 27, 2015 at 01:41:16PM +0100, Chris Wilson wrote:
> As we perform the mmio-flip without any locking and then try to acquire
> the struct_mutex prior to dereferencing the request, it is possible for
> userspace to queue a new pageflip before the worker can finish clearing
> the old state - and then it will clear the new flip request. The result
> is that the new flip could be completed before the GPU has finished
> rendering.
> 
> The bugs stems from removing the seqno checking in
> commit 536f5b5e86b225dab94c7ff8061ae482b6077387
> Author: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
> Date:   Thu Nov 6 11:03:40 2014 +0200
> 
>     drm/i915: Make mmio flip wait for seqno in the work function
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>

I think I grumbled about this before, but the rq vs. req distinction
elludes me. rq = runqueue in my reading ... What do we need to use "req"
for that we're forced to have such an ambigious name for requests?

Anyway, queued for -next, thanks for the patch.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_drv.h      |  6 ++++--
>  drivers/gpu/drm/i915/intel_display.c | 39 ++++++++++++++++++------------------
>  drivers/gpu/drm/i915/intel_drv.h     |  4 ++--
>  3 files changed, 25 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index c9161c3d1e34..0a412eaf7a31 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2136,10 +2136,12 @@ i915_gem_request_get_ring(struct drm_i915_gem_request *req)
>  	return req ? req->ring : NULL;
>  }
>  
> -static inline void
> +static inline struct drm_i915_gem_request *
>  i915_gem_request_reference(struct drm_i915_gem_request *req)
>  {
> -	kref_get(&req->ref);
> +	if (req)
> +		kref_get(&req->ref);
> +	return req;
>  }
>  
>  static inline void
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index e3399af8e0cd..2ca71eeaf2d7 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -10651,22 +10651,18 @@ static void intel_do_mmio_flip(struct intel_crtc *intel_crtc)
>  
>  static void intel_mmio_flip_work_func(struct work_struct *work)
>  {
> -	struct intel_crtc *crtc =
> -		container_of(work, struct intel_crtc, mmio_flip.work);
> -	struct intel_mmio_flip *mmio_flip;
> +	struct intel_mmio_flip *mmio_flip =
> +		container_of(work, struct intel_mmio_flip, work);
>  
> -	mmio_flip = &crtc->mmio_flip;
> -	if (mmio_flip->req)
> -		WARN_ON(__i915_wait_request(mmio_flip->req,
> -					    crtc->reset_counter,
> -					    false, NULL, NULL) != 0);
> +	if (mmio_flip->rq)
> +		WARN_ON(__i915_wait_request(mmio_flip->rq,
> +					    mmio_flip->crtc->reset_counter,
> +					    false, NULL, NULL));
>  
> -	intel_do_mmio_flip(crtc);
> -	if (mmio_flip->req) {
> -		mutex_lock(&crtc->base.dev->struct_mutex);
> -		i915_gem_request_assign(&mmio_flip->req, NULL);
> -		mutex_unlock(&crtc->base.dev->struct_mutex);
> -	}
> +	intel_do_mmio_flip(mmio_flip->crtc);
> +
> +	i915_gem_request_unreference__unlocked(mmio_flip->rq);
> +	kfree(mmio_flip);
>  }
>  
>  static int intel_queue_mmio_flip(struct drm_device *dev,
> @@ -10676,12 +10672,17 @@ static int intel_queue_mmio_flip(struct drm_device *dev,
>  				 struct intel_engine_cs *ring,
>  				 uint32_t flags)
>  {
> -	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> +	struct intel_mmio_flip *mmio_flip;
> +
> +	mmio_flip = kmalloc(sizeof(*mmio_flip), GFP_KERNEL);
> +	if (mmio_flip == NULL)
> +		return -ENOMEM;
>  
> -	i915_gem_request_assign(&intel_crtc->mmio_flip.req,
> -				obj->last_write_req);
> +	mmio_flip->rq = i915_gem_request_reference(obj->last_write_req);
> +	mmio_flip->crtc = to_intel_crtc(crtc);
>  
> -	schedule_work(&intel_crtc->mmio_flip.work);
> +	INIT_WORK(&mmio_flip->work, intel_mmio_flip_work_func);
> +	schedule_work(&mmio_flip->work);
>  
>  	return 0;
>  }
> @@ -13669,8 +13670,6 @@ static void intel_crtc_init(struct drm_device *dev, int pipe)
>  	dev_priv->plane_to_crtc_mapping[intel_crtc->plane] = &intel_crtc->base;
>  	dev_priv->pipe_to_crtc_mapping[intel_crtc->pipe] = &intel_crtc->base;
>  
> -	INIT_WORK(&intel_crtc->mmio_flip.work, intel_mmio_flip_work_func);
> -
>  	drm_crtc_helper_add(&intel_crtc->base, &intel_helper_funcs);
>  
>  	WARN_ON(drm_crtc_index(&intel_crtc->base) != intel_crtc->pipe);
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 23a42a40abae..7f8cce797ba2 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -468,8 +468,9 @@ struct intel_pipe_wm {
>  };
>  
>  struct intel_mmio_flip {
> -	struct drm_i915_gem_request *req;
>  	struct work_struct work;
> +	struct drm_i915_gem_request *rq;
> +	struct intel_crtc *crtc;
>  };
>  
>  struct skl_pipe_wm {
> @@ -554,7 +555,6 @@ struct intel_crtc {
>  	} wm;
>  
>  	int scanline_offset;
> -	struct intel_mmio_flip mmio_flip;
>  
>  	struct intel_crtc_atomic_commit atomic;
>  
> -- 
> 2.1.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 05/16] drm/i915: Fix race on unreferencing the wrong mmio-flip-request
  2015-05-11 16:51   ` Daniel Vetter
@ 2015-05-11 20:23     ` Chris Wilson
  2015-05-12  8:43       ` Daniel Vetter
  0 siblings, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2015-05-11 20:23 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Ander Conselvan de Oliveira, intel-gfx

On Mon, May 11, 2015 at 06:51:03PM +0200, Daniel Vetter wrote:
> On Mon, Apr 27, 2015 at 01:41:16PM +0100, Chris Wilson wrote:
> > As we perform the mmio-flip without any locking and then try to acquire
> > the struct_mutex prior to dereferencing the request, it is possible for
> > userspace to queue a new pageflip before the worker can finish clearing
> > the old state - and then it will clear the new flip request. The result
> > is that the new flip could be completed before the GPU has finished
> > rendering.
> > 
> > The bugs stems from removing the seqno checking in
> > commit 536f5b5e86b225dab94c7ff8061ae482b6077387
> > Author: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
> > Date:   Thu Nov 6 11:03:40 2014 +0200
> > 
> >     drm/i915: Make mmio flip wait for seqno in the work function
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
> 
> I think I grumbled about this before, but the rq vs. req distinction
> elludes me. rq = runqueue in my reading ... What do we need to use "req"
> for that we're forced to have such an ambigious name for requests?

Because I use rq everywhere and runqueues aren't very common in the
kernel? Besides which why did you change some of my _request to _req?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 05/16] drm/i915: Fix race on unreferencing the wrong mmio-flip-request
  2015-05-11 20:23     ` Chris Wilson
@ 2015-05-12  8:43       ` Daniel Vetter
  0 siblings, 0 replies; 38+ messages in thread
From: Daniel Vetter @ 2015-05-12  8:43 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, intel-gfx, Ander Conselvan de Oliveira

On Mon, May 11, 2015 at 10:23 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> I think I grumbled about this before, but the rq vs. req distinction
>> elludes me. rq = runqueue in my reading ... What do we need to use "req"
>> for that we're forced to have such an ambigious name for requests?
>
> Because I use rq everywhere and runqueues aren't very common in the
> kernel? Besides which why did you change some of my _request to _req?

Not intentionally, maybe as part of resolving a conflict. I haven't
decided yet what to do about the overall situation, but giving our irc
discussion I'm indeed inclined to run all patches through sed -e
's/\<rq\>/req/' before applying. Overall this really looks like a
bikeshed (with a slight bias towards req from grepping other sources),
and whomever's first wins with the color choice, just to avoid
conflicts.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 13/16] drm/i915: Free RPS boosts for all laggards
  2015-04-27 12:41 ` [PATCH 13/16] drm/i915: Free RPS boosts for all laggards Chris Wilson
@ 2015-05-21 12:50   ` Daniel Vetter
  2015-05-21 12:56   ` Daniel Vetter
  1 sibling, 0 replies; 38+ messages in thread
From: Daniel Vetter @ 2015-05-21 12:50 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Mon, Apr 27, 2015 at 01:41:24PM +0100, Chris Wilson wrote:
> If the client stalls on a congested request, chosen to be 20ms old to
> match throttling, allow the client a free RPS boost.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem.c  |  2 +-
>  drivers/gpu/drm/i915/intel_drv.h |  3 ++-
>  drivers/gpu/drm/i915/intel_pm.c  | 19 +++++++++++++++----
>  3 files changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 43ac75834e61..83bccb9f62d6 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1247,7 +1247,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  		jiffies + nsecs_to_jiffies_timeout((u64)*timeout) : 0;
>  
>  	if (INTEL_INFO(dev_priv)->gen >= 6)
> -		gen6_rps_boost(dev_priv, rps);
> +		gen6_rps_boost(dev_priv, rps, req->emitted_jiffies);
>  
>  	/* Record current time in case interrupted by signal, or wedged */
>  	trace_i915_gem_request_wait_begin(req);
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 9eb0a911343a..34cfa61f3321 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -1361,7 +1361,8 @@ void gen6_rps_busy(struct drm_i915_private *dev_priv);
>  void gen6_rps_reset_ei(struct drm_i915_private *dev_priv);
>  void gen6_rps_idle(struct drm_i915_private *dev_priv);
>  void gen6_rps_boost(struct drm_i915_private *dev_priv,
> -		    struct intel_rps_client *rps);
> +		    struct intel_rps_client *rps,
> +		    unsigned long submitted);
>  void intel_queue_rps_boost_for_request(struct drm_device *dev,
>  				       struct drm_i915_gem_request *rq);
>  void ilk_wm_get_hw_state(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index dcc52f928650..850e02e1c7eb 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -4122,10 +4122,17 @@ void gen6_rps_idle(struct drm_i915_private *dev_priv)
>  }
>  
>  void gen6_rps_boost(struct drm_i915_private *dev_priv,
> -		    struct intel_rps_client *rps)
> +		    struct intel_rps_client *rps,
> +		    unsigned long submitted)
>  {
>  	u32 val;
>  
> +	/* Force a RPS boost (and don't count it against the client) if
> +	 * the GPU is severely congested.
> +	 */
> +	if (rps && time_after(jiffies, submitted + msecs_to_jiffies(20)))

A #define for these 20 ms used both here and in the throttle ioctl would
be neat, just to reinforce the connection. Can you please follow-up with
that patch?
-Daniel

> +		rps = NULL;
> +
>  	mutex_lock(&dev_priv->rps.hw_lock);
>  	val = dev_priv->rps.max_freq_softlimit;
>  	if (dev_priv->rps.enabled &&
> @@ -6825,11 +6832,12 @@ struct request_boost {
>  static void __intel_rps_boost_work(struct work_struct *work)
>  {
>  	struct request_boost *boost = container_of(work, struct request_boost, work);
> +	struct drm_i915_gem_request *rq = boost->rq;
>  
> -	if (!i915_gem_request_completed(boost->rq, true))
> -		gen6_rps_boost(to_i915(boost->rq->ring->dev), NULL);
> +	if (!i915_gem_request_completed(rq, true))
> +		gen6_rps_boost(to_i915(rq->ring->dev), 0, rq->emitted_jiffies);
>  
> -	i915_gem_request_unreference__unlocked(boost->rq);
> +	i915_gem_request_unreference__unlocked(rq);
>  	kfree(boost);
>  }
>  
> @@ -6841,6 +6849,9 @@ void intel_queue_rps_boost_for_request(struct drm_device *dev,
>  	if (rq == NULL || INTEL_INFO(dev)->gen < 6)
>  		return;
>  
> +	if (i915_gem_request_completed(rq, true))
> +		return;
> +
>  	boost = kmalloc(sizeof(*boost), GFP_ATOMIC);
>  	if (boost == NULL)
>  		return;
> -- 
> 2.1.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 16/16] drm/i915: Allow RPS waitboosting to use max GPU frequency
  2015-04-27 12:41 ` [PATCH 16/16] drm/i915: Allow RPS waitboosting to use max GPU frequency Chris Wilson
  2015-05-04 14:51   ` Daniel Vetter
@ 2015-05-21 12:55   ` Daniel Vetter
  2015-05-21 13:05     ` Chris Wilson
  1 sibling, 1 reply; 38+ messages in thread
From: Daniel Vetter @ 2015-05-21 12:55 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Mon, Apr 27, 2015 at 01:41:27PM +0100, Chris Wilson wrote:
> Ignore the restriction imposed by the user for when the GPU is stalling
> the clients and dropping frames. We will return back to the user limits
> immediately once the stall is over.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/intel_pm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 0cc9e95f70d3..9205b5fe2186 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -4161,7 +4161,7 @@ void intel_rps_boost(struct drm_i915_private *dev_priv,
>  		rps = NULL;
>  
>  	mutex_lock(&dev_priv->rps.hw_lock);
> -	val = dev_priv->rps.max_freq_softlimit;
> +	val = dev_priv->rps.max_freq;

On second thought this has the risk of going into the overclocking range,
which we probably don't want to do behind the user's back. I've left this
one out for now.
-Daniel

>  	if (dev_priv->rps.enabled &&
>  	    dev_priv->mm.busy &&
>  	    dev_priv->rps.cur_freq < val &&
> -- 
> 2.1.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 13/16] drm/i915: Free RPS boosts for all laggards
  2015-04-27 12:41 ` [PATCH 13/16] drm/i915: Free RPS boosts for all laggards Chris Wilson
  2015-05-21 12:50   ` Daniel Vetter
@ 2015-05-21 12:56   ` Daniel Vetter
  1 sibling, 0 replies; 38+ messages in thread
From: Daniel Vetter @ 2015-05-21 12:56 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Mon, Apr 27, 2015 at 01:41:24PM +0100, Chris Wilson wrote:
> If the client stalls on a congested request, chosen to be 20ms old to
> match throttling, allow the client a free RPS boost.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Merged up to this on with an s/rq/req/ per the patch I've just submitted.
I'm not sure about the two follow-up ilk patches, maybe hunt for a few
acks? Jesse might still care ;-)

Thanks, Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem.c  |  2 +-
>  drivers/gpu/drm/i915/intel_drv.h |  3 ++-
>  drivers/gpu/drm/i915/intel_pm.c  | 19 +++++++++++++++----
>  3 files changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 43ac75834e61..83bccb9f62d6 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1247,7 +1247,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  		jiffies + nsecs_to_jiffies_timeout((u64)*timeout) : 0;
>  
>  	if (INTEL_INFO(dev_priv)->gen >= 6)
> -		gen6_rps_boost(dev_priv, rps);
> +		gen6_rps_boost(dev_priv, rps, req->emitted_jiffies);
>  
>  	/* Record current time in case interrupted by signal, or wedged */
>  	trace_i915_gem_request_wait_begin(req);
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 9eb0a911343a..34cfa61f3321 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -1361,7 +1361,8 @@ void gen6_rps_busy(struct drm_i915_private *dev_priv);
>  void gen6_rps_reset_ei(struct drm_i915_private *dev_priv);
>  void gen6_rps_idle(struct drm_i915_private *dev_priv);
>  void gen6_rps_boost(struct drm_i915_private *dev_priv,
> -		    struct intel_rps_client *rps);
> +		    struct intel_rps_client *rps,
> +		    unsigned long submitted);
>  void intel_queue_rps_boost_for_request(struct drm_device *dev,
>  				       struct drm_i915_gem_request *rq);
>  void ilk_wm_get_hw_state(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index dcc52f928650..850e02e1c7eb 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -4122,10 +4122,17 @@ void gen6_rps_idle(struct drm_i915_private *dev_priv)
>  }
>  
>  void gen6_rps_boost(struct drm_i915_private *dev_priv,
> -		    struct intel_rps_client *rps)
> +		    struct intel_rps_client *rps,
> +		    unsigned long submitted)
>  {
>  	u32 val;
>  
> +	/* Force a RPS boost (and don't count it against the client) if
> +	 * the GPU is severely congested.
> +	 */
> +	if (rps && time_after(jiffies, submitted + msecs_to_jiffies(20)))
> +		rps = NULL;
> +
>  	mutex_lock(&dev_priv->rps.hw_lock);
>  	val = dev_priv->rps.max_freq_softlimit;
>  	if (dev_priv->rps.enabled &&
> @@ -6825,11 +6832,12 @@ struct request_boost {
>  static void __intel_rps_boost_work(struct work_struct *work)
>  {
>  	struct request_boost *boost = container_of(work, struct request_boost, work);
> +	struct drm_i915_gem_request *rq = boost->rq;
>  
> -	if (!i915_gem_request_completed(boost->rq, true))
> -		gen6_rps_boost(to_i915(boost->rq->ring->dev), NULL);
> +	if (!i915_gem_request_completed(rq, true))
> +		gen6_rps_boost(to_i915(rq->ring->dev), 0, rq->emitted_jiffies);
>  
> -	i915_gem_request_unreference__unlocked(boost->rq);
> +	i915_gem_request_unreference__unlocked(rq);
>  	kfree(boost);
>  }
>  
> @@ -6841,6 +6849,9 @@ void intel_queue_rps_boost_for_request(struct drm_device *dev,
>  	if (rq == NULL || INTEL_INFO(dev)->gen < 6)
>  		return;
>  
> +	if (i915_gem_request_completed(rq, true))
> +		return;
> +
>  	boost = kmalloc(sizeof(*boost), GFP_ATOMIC);
>  	if (boost == NULL)
>  		return;
> -- 
> 2.1.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 16/16] drm/i915: Allow RPS waitboosting to use max GPU frequency
  2015-05-21 12:55   ` Daniel Vetter
@ 2015-05-21 13:05     ` Chris Wilson
  0 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2015-05-21 13:05 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Thu, May 21, 2015 at 02:55:16PM +0200, Daniel Vetter wrote:
> On Mon, Apr 27, 2015 at 01:41:27PM +0100, Chris Wilson wrote:
> > Ignore the restriction imposed by the user for when the GPU is stalling
> > the clients and dropping frames. We will return back to the user limits
> > immediately once the stall is over.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/intel_pm.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > index 0cc9e95f70d3..9205b5fe2186 100644
> > --- a/drivers/gpu/drm/i915/intel_pm.c
> > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > @@ -4161,7 +4161,7 @@ void intel_rps_boost(struct drm_i915_private *dev_priv,
> >  		rps = NULL;
> >  
> >  	mutex_lock(&dev_priv->rps.hw_lock);
> > -	val = dev_priv->rps.max_freq_softlimit;
> > +	val = dev_priv->rps.max_freq;
> 
> On second thought this has the risk of going into the overclocking range,
> which we probably don't want to do behind the user's back. I've left this
> one out for now.

Hmm, I thought I resent this using rps.boost_freq... Anyway I have
already respun this using boost_freq which fortunately avoids the
overclock range (since it is initialised before we define the
overclocking) and then exposes it to the user through sysfs (which
should also allow the naysayers to disable RPS boosting).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2015-05-21 13:05 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-27 12:41 RPS tuning Chris Wilson
2015-04-27 12:41 ` [PATCH 01/16] drm/i915: Drop i915_gem_obj_is_pinned() from set-cache-level Chris Wilson
2015-04-29 14:50   ` Tvrtko Ursulin
2015-04-29 15:15     ` Chris Wilson
2015-04-27 12:41 ` [PATCH 02/16] drm/i915: Only remove objects pinned to the display from the available aperture Chris Wilson
2015-04-29 15:05   ` Tvrtko Ursulin
2015-04-29 15:48     ` Chris Wilson
2015-04-27 12:41 ` [PATCH 03/16] drm/i915: Remove domain flubbing from i915_gem_object_finish_gpu() Chris Wilson
2015-05-11 16:43   ` Daniel Vetter
2015-04-27 12:41 ` [PATCH 04/16] drm/i915: Ensure cache flushes prior to doing CS flips Chris Wilson
2015-05-11 16:46   ` Daniel Vetter
2015-04-27 12:41 ` [PATCH 05/16] drm/i915: Fix race on unreferencing the wrong mmio-flip-request Chris Wilson
2015-05-11 16:51   ` Daniel Vetter
2015-05-11 20:23     ` Chris Wilson
2015-05-12  8:43       ` Daniel Vetter
2015-04-27 12:41 ` [PATCH 06/16] drm/i915: Implement inter-engine read-read optimisations Chris Wilson
2015-04-29 13:51   ` Tvrtko Ursulin
2015-04-27 12:41 ` [PATCH 07/16] drm/i915: Inline check required for object syncing prior to execbuf Chris Wilson
2015-04-29 14:03   ` Tvrtko Ursulin
2015-04-29 14:22     ` Chris Wilson
2015-04-27 12:41 ` [PATCH 08/16] drm/i915: Add RPS thresholds to debugfs/i915_frequency_info Chris Wilson
2015-05-04 14:36   ` Daniel Vetter
2015-04-27 12:41 ` [PATCH 09/16] drm/i915: Limit ring synchronisation (sw sempahores) RPS boosts Chris Wilson
2015-05-04 14:38   ` Daniel Vetter
2015-05-04 14:46     ` Daniel Vetter
2015-04-27 12:41 ` [PATCH 10/16] drm/i915: Limit mmio flip " Chris Wilson
2015-04-27 12:41 ` [PATCH 11/16] drm/i915: Convert RPS tracking to a intel_rps_client struct Chris Wilson
2015-04-27 12:41 ` [PATCH 12/16] drm/i915: Don't downclock whilst we have clients waiting for GPU results Chris Wilson
2015-04-27 12:41 ` [PATCH 13/16] drm/i915: Free RPS boosts for all laggards Chris Wilson
2015-05-21 12:50   ` Daniel Vetter
2015-05-21 12:56   ` Daniel Vetter
2015-04-27 12:41 ` [PATCH 14/16] drm/i915: Make the RPS interface gen agnostic Chris Wilson
2015-04-27 12:41 ` [PATCH 15/16] drm/i915, intel_ips: Enable GPU wait-boosting with IPS Chris Wilson
2015-04-27 12:41 ` [PATCH 16/16] drm/i915: Allow RPS waitboosting to use max GPU frequency Chris Wilson
2015-05-04 14:51   ` Daniel Vetter
2015-05-04 14:58     ` Chris Wilson
2015-05-21 12:55   ` Daniel Vetter
2015-05-21 13:05     ` Chris Wilson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.