* A picking of low hanging fruit
@ 2015-03-27 11:01 Chris Wilson
  2015-03-27 11:01 ` [PATCH 01/49] drm/i915: Cache last obj->pages location for i915_gem_object_get_page() Chris Wilson
                   ` (48 more replies)
  0 siblings, 49 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx

I was looking at the performance degradation due to execlists and
decided to pick a few of the easier micro-optimisations. Individually
they may not amount to much (except for spinning on requests!), but the
volume quickly adds up, giving a benefit to quite a few of our more
driver-bound tests.

Almost all of these I have posted before.
-Chris



* [PATCH 01/49] drm/i915: Cache last obj->pages location for i915_gem_object_get_page()
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-03-27 11:01 ` [PATCH 02/49] drm/i915: Aggressive downclocking on Baytrail Chris Wilson
                   ` (47 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx

The biggest user of i915_gem_object_get_page() is the relocation
processing during execbuffer. Typically userspace passes in a set of
relocations in sorted order. Sadly, we alternate between relocations
increasing from the start of the buffers, and relocations decreasing
from the end. However, the majority of consecutive lookups will still be
in the same page. We could also cache the start of the last sg chain,
but for most callers the entire sgl is inside a single chain and so we
see no improvement from that extra layer of caching.
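
To illustrate the access pattern this is optimised for, here is a
standalone userspace sketch (made-up chunk sizes and names, not the
kernel scatterlist code): with a cached cursor, mostly-ascending page
lookups cost O(1) amortised instead of rescanning the list every time.

#include <stdio.h>

/* Hypothetical stand-in for a scatterlist: variable-sized page chunks. */
struct chunk { int npages; };

static struct chunk chunks[] = { {4}, {8}, {2}, {16} };
static int cached_chunk, cached_first; /* cursor: chunk index + its first page */

/* Return the chunk containing page n; only reset when walking backwards. */
static int lookup(int n)
{
	if (n < cached_first) {
		cached_chunk = 0;
		cached_first = 0;
	}
	while (cached_first + chunks[cached_chunk].npages <= n)
		cached_first += chunks[cached_chunk++].npages;
	return cached_chunk;
}

int main(void)
{
	for (int n = 0; n < 30; n++) /* mostly-ascending lookups hit the cache */
		printf("page %2d -> chunk %d\n", n, lookup(n));
	return 0;
}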

v2: Avoid the double increment inside unlikely()

References: https://bugs.freedesktop.org/show_bug.cgi?id=88308
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h | 31 ++++++++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_gem.c |  4 ++++
 2 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9490403db23a..701079429832 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1985,6 +1985,10 @@ struct drm_i915_gem_object {
 
 	struct sg_table *pages;
 	int pages_pin_count;
+	struct get_page {
+		struct scatterlist *sg;
+		int last;
+	} get_page;
 
 	/* prime dma-buf support */
 	void *dma_buf_vmapping;
@@ -2642,15 +2646,32 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
 				    int *needs_clflush);
 
 int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
-static inline struct page *i915_gem_object_get_page(struct drm_i915_gem_object *obj, int n)
+
+static inline int __sg_page_count(struct scatterlist *sg)
+{
+	return sg->length >> PAGE_SHIFT;
+}
+
+static inline struct page *
+i915_gem_object_get_page(struct drm_i915_gem_object *obj, int n)
 {
-	struct sg_page_iter sg_iter;
+	if (WARN_ON(n >= obj->base.size >> PAGE_SHIFT))
+		return NULL;
 
-	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, n)
-		return sg_page_iter_page(&sg_iter);
+	if (n < obj->get_page.last) {
+		obj->get_page.sg = obj->pages->sgl;
+		obj->get_page.last = 0;
+	}
+
+	while (obj->get_page.last + __sg_page_count(obj->get_page.sg) <= n) {
+		obj->get_page.last += __sg_page_count(obj->get_page.sg++);
+		if (unlikely(sg_is_chain(obj->get_page.sg)))
+			obj->get_page.sg = sg_chain_ptr(obj->get_page.sg);
+	}
 
-	return NULL;
+	return nth_page(sg_page(obj->get_page.sg), n - obj->get_page.last);
 }
+
 static inline void i915_gem_object_pin_pages(struct drm_i915_gem_object *obj)
 {
 	BUG_ON(obj->pages == NULL);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c4762d58e97a..476687a9d067 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2178,6 +2178,10 @@ i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
 		return ret;
 
 	list_add_tail(&obj->global_list, &dev_priv->mm.unbound_list);
+
+	obj->get_page.sg = obj->pages->sgl;
+	obj->get_page.last = 0;
+
 	return 0;
 }
 
-- 
2.1.4


* [PATCH 02/49] drm/i915: Aggressive downclocking on Baytrail
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
  2015-03-27 11:01 ` [PATCH 01/49] drm/i915: Cache last obj->pages location for i915_gem_object_get_page() Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-04-02 11:21   ` Deepak S
  2015-03-27 11:01 ` [PATCH 03/49] drm/i915: Fix computation of last_adjustment for RPS autotuning Chris Wilson
                   ` (46 subsequent siblings)
  48 siblings, 1 reply; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Rodrigo Vivi

Reuse the same reclocking strategy for Baytrail as on its bigger
brethren, Sandybridge and Ivybridge. In particular, this makes the
device quicker to reclock (both up and down), though the tendency now is
to downclock more aggressively to compensate for the RPS boosts.

v2: Rebase
v3: Exclude Cherrytrail as Deepak was concerned that the increased
number of register writes would wake the common powerwell too often.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_drv.h | 3 +++
 drivers/gpu/drm/i915/i915_irq.c | 4 ++--
 drivers/gpu/drm/i915/i915_reg.h | 2 --
 drivers/gpu/drm/i915/intel_pm.c | 8 +++++++-
 4 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 701079429832..c80e2e5e591a 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1033,6 +1033,9 @@ struct intel_gen6_power_mgmt {
 	u8 rp0_freq;		/* Non-overclocked max frequency. */
 	u32 cz_freq;
 
+	u8 up_threshold; /* Current %busy required to uplock */
+	u8 down_threshold; /* Current %busy required to downclock */
+
 	int last_adj;
 	enum { LOW_POWER, BETWEEN, HIGH_POWER } power;
 
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 14ecb4d13a1a..128a6f40b450 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1049,7 +1049,7 @@ static u32 vlv_wa_c0_ei(struct drm_i915_private *dev_priv, u32 pm_iir)
 	if (pm_iir & GEN6_PM_RP_DOWN_EI_EXPIRED) {
 		if (!vlv_c0_above(dev_priv,
 				  &dev_priv->rps.down_ei, &now,
-				  VLV_RP_DOWN_EI_THRESHOLD))
+				  dev_priv->rps.down_threshold))
 			events |= GEN6_PM_RP_DOWN_THRESHOLD;
 		dev_priv->rps.down_ei = now;
 	}
@@ -1057,7 +1057,7 @@ static u32 vlv_wa_c0_ei(struct drm_i915_private *dev_priv, u32 pm_iir)
 	if (pm_iir & GEN6_PM_RP_UP_EI_EXPIRED) {
 		if (vlv_c0_above(dev_priv,
 				 &dev_priv->rps.up_ei, &now,
-				 VLV_RP_UP_EI_THRESHOLD))
+				 dev_priv->rps.up_threshold))
 			events |= GEN6_PM_RP_UP_THRESHOLD;
 		dev_priv->rps.up_ei = now;
 	}
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index b522eb6e59a4..faf8f829e61f 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -671,8 +671,6 @@ enum skl_disp_power_wells {
 #define   FB_FMAX_VMIN_FREQ_LO_MASK		0xf8000000
 
 #define VLV_CZ_CLOCK_TO_MILLI_SEC		100000
-#define VLV_RP_UP_EI_THRESHOLD			90
-#define VLV_RP_DOWN_EI_THRESHOLD		70
 
 /* vlv2 north clock has */
 #define CCK_FUSE_REG				0x8
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index fa4ccb346389..65b33a4f82fc 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3930,6 +3930,8 @@ static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
 		    GEN6_RP_DOWN_IDLE_AVG);
 
 	dev_priv->rps.power = new_power;
+	dev_priv->rps.up_threshold = threshold_up;
+	dev_priv->rps.down_threshold = threshold_down;
 	dev_priv->rps.last_adj = 0;
 }
 
@@ -4001,8 +4003,11 @@ static void valleyview_set_rps(struct drm_device *dev, u8 val)
 		      "Odd GPU freq value\n"))
 		val &= ~1;
 
-	if (val != dev_priv->rps.cur_freq)
+	if (val != dev_priv->rps.cur_freq) {
 		vlv_punit_write(dev_priv, PUNIT_REG_GPU_FREQ_REQ, val);
+		if (!IS_CHERRYVIEW(dev_priv))
+			gen6_set_rps_thresholds(dev_priv, val);
+	}
 
 	I915_WRITE(GEN6_PMINTRMSK, gen6_rps_pm_mask(dev_priv, val));
 
@@ -4051,6 +4056,7 @@ static void vlv_set_rps_idle(struct drm_i915_private *dev_priv)
 				& GENFREQSTATUS) == 0, 100))
 		DRM_ERROR("timed out waiting for Punit\n");
 
+	gen6_set_rps_thresholds(dev_priv, val);
 	vlv_force_gfx_clock(dev_priv, false);
 
 	I915_WRITE(GEN6_PMINTRMSK, gen6_rps_pm_mask(dev_priv, val));
-- 
2.1.4


* [PATCH 03/49] drm/i915: Fix computation of last_adjustment for RPS autotuning
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
  2015-03-27 11:01 ` [PATCH 01/49] drm/i915: Cache last obj->pages location for i915_gem_object_get_page() Chris Wilson
  2015-03-27 11:01 ` [PATCH 02/49] drm/i915: Aggressive downclocking on Baytrail Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-03-27 11:01 ` [PATCH 04/49] drm/i915: Add i915_gem_request_unreference__unlocked Chris Wilson
                   ` (45 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

The issue is that by computing the last_adj value after applying the
clamping, we can end up feeding a bogus value into the next RPS
autotuning step.
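
As a worked illustration (a standalone userspace sketch with
hypothetical frequency values, not the driver code): if the current
frequency sits below RPe, the old scheme records the whole jump to RPe
as last_adj, and the next up event then doubles that jump instead of
continuing the intended 1, 2, 4, ... ramp.

#include <stdio.h>

static int clamp(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

int main(void)
{
	int cur = 6, rpe = 12, min_soft = 4, max_soft = 20; /* made-up values */
	int last_adj = 0, adj, new_delay;

	/* First UP event under the old scheme. */
	adj = last_adj > 0 ? last_adj * 2 : 1;
	new_delay = cur + adj;			/* 7 */
	if (new_delay < rpe)
		new_delay = rpe;		/* jump straight to RPe = 12 */
	new_delay = clamp(new_delay, min_soft, max_soft);
	last_adj = new_delay - cur;		/* old scheme records +6 */
	cur = new_delay;

	/* Second UP event: the ramp now doubles the bogus +6. */
	adj = last_adj > 0 ? last_adj * 2 : 1;
	printf("old scheme: next step is +%d\n", adj);
	return 0;
}

With the fix, jumping to RPe leaves last_adj at 0, so the next ramp step
starts again from +1 (+2 on CHV).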

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Deepak S <deepak.s@linux.intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 128a6f40b450..8b5e0358c592 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1095,21 +1095,20 @@ static void gen6_pm_rps_work(struct work_struct *work)
 	pm_iir |= vlv_wa_c0_ei(dev_priv, pm_iir);
 
 	adj = dev_priv->rps.last_adj;
+	new_delay = dev_priv->rps.cur_freq;
 	if (pm_iir & GEN6_PM_RP_UP_THRESHOLD) {
 		if (adj > 0)
 			adj *= 2;
-		else {
-			/* CHV needs even encode values */
-			adj = IS_CHERRYVIEW(dev_priv->dev) ? 2 : 1;
-		}
-		new_delay = dev_priv->rps.cur_freq + adj;
-
+		else /* CHV needs even encode values */
+			adj = IS_CHERRYVIEW(dev_priv) ? 2 : 1;
 		/*
 		 * For better performance, jump directly
 		 * to RPe if we're below it.
 		 */
-		if (new_delay < dev_priv->rps.efficient_freq)
+		if (new_delay < dev_priv->rps.efficient_freq - adj) {
 			new_delay = dev_priv->rps.efficient_freq;
+			adj = 0;
+		}
 	} else if (pm_iir & GEN6_PM_RP_DOWN_TIMEOUT) {
 		if (dev_priv->rps.cur_freq > dev_priv->rps.efficient_freq)
 			new_delay = dev_priv->rps.efficient_freq;
@@ -1119,24 +1118,22 @@ static void gen6_pm_rps_work(struct work_struct *work)
 	} else if (pm_iir & GEN6_PM_RP_DOWN_THRESHOLD) {
 		if (adj < 0)
 			adj *= 2;
-		else {
-			/* CHV needs even encode values */
-			adj = IS_CHERRYVIEW(dev_priv->dev) ? -2 : -1;
-		}
-		new_delay = dev_priv->rps.cur_freq + adj;
+		else /* CHV needs even encode values */
+			adj = IS_CHERRYVIEW(dev_priv) ? -2 : -1;
 	} else { /* unknown event */
-		new_delay = dev_priv->rps.cur_freq;
+		adj = 0;
 	}
 
+	dev_priv->rps.last_adj = adj;
+
 	/* sysfs frequency interfaces may have snuck in while servicing the
 	 * interrupt
 	 */
+	new_delay += adj;
 	new_delay = clamp_t(int, new_delay,
 			    dev_priv->rps.min_freq_softlimit,
 			    dev_priv->rps.max_freq_softlimit);
 
-	dev_priv->rps.last_adj = new_delay - dev_priv->rps.cur_freq;
-
 	intel_set_rps(dev_priv->dev, new_delay);
 
 	mutex_unlock(&dev_priv->rps.hw_lock);
-- 
2.1.4


* [PATCH 04/49] drm/i915: Add i915_gem_request_unreference__unlocked
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (2 preceding siblings ...)
  2015-03-27 11:01 ` [PATCH 03/49] drm/i915: Fix computation of last_adjustment for RPS autotuning Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-03-27 16:42   ` Tvrtko Ursulin
  2015-03-27 11:01 ` [PATCH 05/49] drm/i915: Fix race on unreferencing the wrong mmio-flip-request Chris Wilson
                   ` (44 subsequent siblings)
  48 siblings, 1 reply; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx

We were missing a convenience stub to acquire the right mutex whilst
dropping the request, so add it.
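
The trick in the stub (see the diff below) is to take struct_mutex only
when this could be the final reference: the refcount is dropped
locklessly unless it would hit zero. A standalone userspace sketch of
that pattern, with hypothetical names, looks like this:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct object {
	atomic_int refcount;
};

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Drop a reference without the lock unless the count would reach zero. */
static int put_unless_last(struct object *obj)
{
	int v = atomic_load(&obj->refcount);

	while (v > 1)
		if (atomic_compare_exchange_weak(&obj->refcount, &v, v - 1))
			return 1;	/* non-final reference dropped, lock untouched */
	return 0;			/* we may hold the last reference */
}

static void object_put_unlocked(struct object *obj)
{
	if (!obj || put_unless_last(obj))
		return;

	pthread_mutex_lock(&big_lock);
	if (atomic_fetch_sub(&obj->refcount, 1) == 1) {
		printf("final reference dropped under the lock\n");
		free(obj);
	}
	pthread_mutex_unlock(&big_lock);
}

int main(void)
{
	struct object *obj = calloc(1, sizeof(*obj));

	atomic_store(&obj->refcount, 2);
	object_put_unlocked(obj);	/* fast path */
	object_put_unlocked(obj);	/* slow path: frees under the lock */
	return 0;
}

This mirrors the kref_put_mutex() idea: the mutex is only needed for the
teardown of the final reference.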

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h | 13 +++++++++++++
 drivers/gpu/drm/i915/i915_gem.c |  8 ++------
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c80e2e5e591a..fa91ca33d07c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2149,6 +2149,19 @@ i915_gem_request_unreference(struct drm_i915_gem_request *req)
 	kref_put(&req->ref, i915_gem_request_free);
 }
 
+static inline void
+i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
+{
+	if (req && !atomic_add_unless(&req->ref.refcount, -1, 1)) {
+		struct drm_device *dev = req->ring->dev;
+
+		mutex_lock(&dev->struct_mutex);
+		if (likely(atomic_dec_and_test(&req->ref.refcount)))
+			i915_gem_request_free(&req->ref);
+		mutex_unlock(&dev->struct_mutex);
+	}
+}
+
 static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
 					   struct drm_i915_gem_request *src)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 476687a9d067..a46372ebb3bc 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2870,9 +2870,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	ret = __i915_wait_request(req, reset_counter, true,
 				  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
 				  file->driver_priv);
-	mutex_lock(&dev->struct_mutex);
-	i915_gem_request_unreference(req);
-	mutex_unlock(&dev->struct_mutex);
+	i915_gem_request_unreference__unlocked(req);
 	return ret;
 
 out:
@@ -4104,9 +4102,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	if (ret == 0)
 		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
 
-	mutex_lock(&dev->struct_mutex);
-	i915_gem_request_unreference(target);
-	mutex_unlock(&dev->struct_mutex);
+	i915_gem_request_unreference__unlocked(target);
 
 	return ret;
 }
-- 
2.1.4


* [PATCH 05/49] drm/i915: Fix race on unreferencing the wrong mmio-flip-request
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (3 preceding siblings ...)
  2015-03-27 11:01 ` [PATCH 04/49] drm/i915: Add i915_gem_request_unreference__unlocked Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-03-27 11:01 ` [PATCH 06/49] drm/i915: Boost GPU frequency if we detect outstanding pageflips Chris Wilson
                   ` (43 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ander Conselvan de Oliveira

As we perform the mmio-flip without any locking and then try to acquire
the struct_mutex prior to dereferencing the request, it is possible for
userspace to queue a new pageflip before the worker can finish clearing
the old state - and then it will clear the new flip request. The result
is that the new flip could be completed before the GPU has finished
rendering.

The bug stems from removing the seqno checking in
commit 536f5b5e86b225dab94c7ff8061ae482b6077387
Author: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Date:   Thu Nov 6 11:03:40 2014 +0200

    drm/i915: Make mmio flip wait for seqno in the work function

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h      |  6 ++++--
 drivers/gpu/drm/i915/intel_display.c | 39 ++++++++++++++++++------------------
 drivers/gpu/drm/i915/intel_drv.h     |  4 ++--
 3 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index fa91ca33d07c..18cefd8226c1 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2136,10 +2136,12 @@ i915_gem_request_get_ring(struct drm_i915_gem_request *req)
 	return req ? req->ring : NULL;
 }
 
-static inline void
+static inline struct drm_i915_gem_request *
 i915_gem_request_reference(struct drm_i915_gem_request *req)
 {
-	kref_get(&req->ref);
+	if (req)
+		kref_get(&req->ref);
+	return req;
 }
 
 static inline void
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 2afa3acf5452..0d944afe5427 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -9984,22 +9984,18 @@ static void intel_do_mmio_flip(struct intel_crtc *intel_crtc)
 
 static void intel_mmio_flip_work_func(struct work_struct *work)
 {
-	struct intel_crtc *crtc =
-		container_of(work, struct intel_crtc, mmio_flip.work);
-	struct intel_mmio_flip *mmio_flip;
+	struct intel_mmio_flip *mmio_flip =
+		container_of(work, struct intel_mmio_flip, work);
 
-	mmio_flip = &crtc->mmio_flip;
-	if (mmio_flip->req)
-		WARN_ON(__i915_wait_request(mmio_flip->req,
-					    crtc->reset_counter,
-					    false, NULL, NULL) != 0);
+	if (mmio_flip->rq)
+		WARN_ON(__i915_wait_request(mmio_flip->rq,
+					    mmio_flip->crtc->reset_counter,
+					    false, NULL, NULL));
 
-	intel_do_mmio_flip(crtc);
-	if (mmio_flip->req) {
-		mutex_lock(&crtc->base.dev->struct_mutex);
-		i915_gem_request_assign(&mmio_flip->req, NULL);
-		mutex_unlock(&crtc->base.dev->struct_mutex);
-	}
+	intel_do_mmio_flip(mmio_flip->crtc);
+
+	i915_gem_request_unreference__unlocked(mmio_flip->rq);
+	kfree(mmio_flip);
 }
 
 static int intel_queue_mmio_flip(struct drm_device *dev,
@@ -10009,12 +10005,17 @@ static int intel_queue_mmio_flip(struct drm_device *dev,
 				 struct intel_engine_cs *ring,
 				 uint32_t flags)
 {
-	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	struct intel_mmio_flip *mmio_flip;
+
+	mmio_flip = kmalloc(sizeof(*mmio_flip), GFP_KERNEL);
+	if (mmio_flip == NULL)
+		return -ENOMEM;
 
-	i915_gem_request_assign(&intel_crtc->mmio_flip.req,
-				obj->last_write_req);
+	mmio_flip->rq = i915_gem_request_reference(obj->last_write_req);
+	mmio_flip->crtc = to_intel_crtc(crtc);
 
-	schedule_work(&intel_crtc->mmio_flip.work);
+	INIT_WORK(&mmio_flip->work, intel_mmio_flip_work_func);
+	schedule_work(&mmio_flip->work);
 
 	return 0;
 }
@@ -12912,8 +12913,6 @@ static void intel_crtc_init(struct drm_device *dev, int pipe)
 	dev_priv->plane_to_crtc_mapping[intel_crtc->plane] = &intel_crtc->base;
 	dev_priv->pipe_to_crtc_mapping[intel_crtc->pipe] = &intel_crtc->base;
 
-	INIT_WORK(&intel_crtc->mmio_flip.work, intel_mmio_flip_work_func);
-
 	drm_crtc_helper_add(&intel_crtc->base, &intel_helper_funcs);
 
 	WARN_ON(drm_crtc_index(&intel_crtc->base) != intel_crtc->pipe);
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 6036e3b73b7b..62dae400d600 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -403,8 +403,9 @@ struct intel_pipe_wm {
 };
 
 struct intel_mmio_flip {
-	struct drm_i915_gem_request *req;
 	struct work_struct work;
+	struct drm_i915_gem_request *rq;
+	struct intel_crtc *crtc;
 };
 
 struct skl_pipe_wm {
@@ -490,7 +491,6 @@ struct intel_crtc {
 	} wm;
 
 	int scanline_offset;
-	struct intel_mmio_flip mmio_flip;
 
 	struct intel_crtc_atomic_commit atomic;
 };
-- 
2.1.4


* [PATCH 06/49] drm/i915: Boost GPU frequency if we detect outstanding pageflips
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (4 preceding siblings ...)
  2015-03-27 11:01 ` [PATCH 05/49] drm/i915: Fix race on unreferencing the wrong mmio-flip-request Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-03-27 11:01 ` [PATCH 07/49] drm/i915: Diminish contribution of wait-boosting from clients Chris Wilson
                   ` (42 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

If we hit a vblank and see that we have a pageflip queued but not yet
processed, ensure that the GPU is running at maximum in order to clear
the backlog. Pageflips are only queued for the following vblank; if we
miss it, there will be a visible stutter. Boosting the GPU frequency
doesn't prevent us from missing the target vblank, but it should help
the subsequent frames hit theirs.

v2: Reorder vblank vs flip-complete so that we only check for a missed
flip after processing the completion events, and avoid spurious boosts.

v3: Rename missed_vblank
v4: Rebase
v5: Cancel the outstanding work in runtime suspend
v6: Rebase
v7: Rebase required fixing

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Deepak S <deepak.s@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_display.c | 11 ++++++++---
 drivers/gpu/drm/i915/intel_drv.h     |  2 ++
 drivers/gpu/drm/i915/intel_pm.c      | 35 +++++++++++++++++++++++++++++++++++
 3 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 0d944afe5427..5eb159bcd599 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -10074,6 +10074,7 @@ void intel_check_page_flip(struct drm_device *dev, int pipe)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_crtc *crtc = dev_priv->pipe_to_crtc_mapping[pipe];
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	struct intel_unpin_work *work;
 
 	WARN_ON(!in_interrupt());
 
@@ -10081,12 +10082,16 @@ void intel_check_page_flip(struct drm_device *dev, int pipe)
 		return;
 
 	spin_lock(&dev->event_lock);
-	if (intel_crtc->unpin_work && __intel_pageflip_stall_check(dev, crtc)) {
+	work = intel_crtc->unpin_work;
+	if (work != NULL && __intel_pageflip_stall_check(dev, crtc)) {
 		WARN_ONCE(1, "Kicking stuck page flip: queued at %d, now %d\n",
-			 intel_crtc->unpin_work->flip_queued_vblank,
-			 drm_vblank_count(dev, pipe));
+			 work->flip_queued_vblank, drm_vblank_count(dev, pipe));
 		page_flip_completed(intel_crtc);
+		work = NULL;
 	}
+	if (work != NULL &&
+	    drm_vblank_count(dev, pipe) - work->flip_queued_vblank > 1)
+		intel_queue_rps_boost_for_request(dev, work->flip_queued_req);
 	spin_unlock(&dev->event_lock);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 62dae400d600..4a6a51b99b22 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -1265,6 +1265,8 @@ void gen6_rps_busy(struct drm_i915_private *dev_priv);
 void gen6_rps_reset_ei(struct drm_i915_private *dev_priv);
 void gen6_rps_idle(struct drm_i915_private *dev_priv);
 void gen6_rps_boost(struct drm_i915_private *dev_priv);
+void intel_queue_rps_boost_for_request(struct drm_device *dev,
+				       struct drm_i915_gem_request *rq);
 void ilk_wm_get_hw_state(struct drm_device *dev);
 void skl_wm_get_hw_state(struct drm_device *dev);
 void skl_ddb_get_hw_state(struct drm_i915_private *dev_priv,
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 65b33a4f82fc..55f754aa11b6 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6768,6 +6768,41 @@ int intel_freq_opcode(struct drm_i915_private *dev_priv, int val)
 		return val / GT_FREQUENCY_MULTIPLIER;
 }
 
+struct request_boost {
+	struct work_struct work;
+	struct drm_i915_gem_request *rq;
+};
+
+static void __intel_rps_boost_work(struct work_struct *work)
+{
+	struct request_boost *boost = container_of(work, struct request_boost, work);
+
+	if (!i915_gem_request_completed(boost->rq, true))
+		gen6_rps_boost(to_i915(boost->rq->ring->dev));
+
+	i915_gem_request_unreference__unlocked(boost->rq);
+	kfree(boost);
+}
+
+void intel_queue_rps_boost_for_request(struct drm_device *dev,
+				       struct drm_i915_gem_request *rq)
+{
+	struct request_boost *boost;
+
+	if (rq == NULL || INTEL_INFO(dev)->gen < 6)
+		return;
+
+	boost = kmalloc(sizeof(*boost), GFP_ATOMIC);
+	if (boost == NULL)
+		return;
+
+	i915_gem_request_reference(rq);
+	boost->rq = rq;
+
+	INIT_WORK(&boost->work, __intel_rps_boost_work);
+	queue_work(to_i915(dev)->wq, &boost->work);
+}
+
 void intel_pm_setup(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-- 
2.1.4


* [PATCH 07/49] drm/i915: Diminish contribution of wait-boosting from clients
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (5 preceding siblings ...)
  2015-03-27 11:01 ` [PATCH 06/49] drm/i915: Boost GPU frequency if we detect outstanding pageflips Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-03-27 11:01 ` [PATCH 08/49] drm/i915: Re-enable RPS wait-boosting for all engines Chris Wilson
                   ` (41 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx

With boosting for missed pageflips, we have a much stronger indication
of when we need to (temporarily) boost the GPU frequency to ensure
smooth delivery of frames. So now only allow each client one RPS boost
per period of GPU activity when it stalls waiting on results.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 39 +++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_drv.h     |  9 ++++++---
 drivers/gpu/drm/i915/i915_gem.c     | 35 ++++++++-------------------------
 drivers/gpu/drm/i915/intel_drv.h    |  3 ++-
 drivers/gpu/drm/i915/intel_pm.c     | 18 ++++++++++++++---
 5 files changed, 70 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 007c7d7d8295..c537cc7d617c 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2226,6 +2226,44 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
 	return 0;
 }
 
+static int i915_rps_boost_info(struct seq_file *m, void *data)
+{
+	struct drm_info_node *node = m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_file *file;
+	int ret;
+
+	ret = mutex_lock_interruptible(&dev->struct_mutex);
+	if (ret)
+		return ret;
+
+	ret = mutex_lock_interruptible(&dev_priv->rps.hw_lock);
+	if (ret)
+		goto unlock;
+
+	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
+		struct drm_i915_file_private *file_priv = file->driver_priv;
+		struct task_struct *task;
+
+		rcu_read_lock();
+		task = pid_task(file->pid, PIDTYPE_PID);
+		seq_printf(m, "%s [%d]: %d boosts%s\n",
+			   task ? task->comm : "<unknown>",
+			   task ? task->pid : -1,
+			   file_priv->rps_boosts,
+			   list_empty(&file_priv->rps_boost) ? "" : ", active");
+		rcu_read_unlock();
+	}
+	seq_printf(m, "Kernel boosts: %d\n", dev_priv->rps.boosts);
+
+	mutex_unlock(&dev_priv->rps.hw_lock);
+unlock:
+	mutex_unlock(&dev->struct_mutex);
+
+	return ret;
+}
+
 static int i915_llc(struct seq_file *m, void *data)
 {
 	struct drm_info_node *node = m->private;
@@ -4691,6 +4729,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_ddb_info", i915_ddb_info, 0},
 	{"i915_sseu_status", i915_sseu_status, 0},
 	{"i915_drrs_status", i915_drrs_status, 0},
+	{"i915_rps_boost_info", i915_rps_boost_info, 0},
 };
 #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list)
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 18cefd8226c1..1ff8629264a5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1041,6 +1041,8 @@ struct intel_gen6_power_mgmt {
 
 	bool enabled;
 	struct delayed_work delayed_resume_work;
+	struct list_head clients;
+	unsigned boosts;
 
 	/* manual wa residency calculations */
 	struct intel_rps_ei up_ei, down_ei;
@@ -2189,12 +2191,13 @@ struct drm_i915_file_private {
 	struct {
 		spinlock_t lock;
 		struct list_head request_list;
-		struct delayed_work idle_work;
 	} mm;
 	struct idr context_idr;
 
-	atomic_t rps_wait_boost;
-	struct  intel_engine_cs *bsd_ring;
+	struct list_head rps_boost;
+	struct intel_engine_cs *bsd_ring;
+
+	unsigned rps_boosts;
 };
 
 /*
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a46372ebb3bc..d54f6a277d82 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1181,14 +1181,6 @@ static bool missed_irq(struct drm_i915_private *dev_priv,
 	return test_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings);
 }
 
-static bool can_wait_boost(struct drm_i915_file_private *file_priv)
-{
-	if (file_priv == NULL)
-		return true;
-
-	return !atomic_xchg(&file_priv->rps_wait_boost, true);
-}
-
 /**
  * __i915_wait_request - wait until execution of request has finished
  * @req: duh!
@@ -1230,13 +1222,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	timeout_expire = timeout ?
 		jiffies + nsecs_to_jiffies_timeout((u64)*timeout) : 0;
 
-	if (INTEL_INFO(dev)->gen >= 6 && ring->id == RCS && can_wait_boost(file_priv)) {
-		gen6_rps_boost(dev_priv);
-		if (file_priv)
-			mod_delayed_work(dev_priv->wq,
-					 &file_priv->mm.idle_work,
-					 msecs_to_jiffies(100));
-	}
+	if (ring->id == RCS && INTEL_INFO(dev)->gen >= 6)
+		gen6_rps_boost(dev_priv, file_priv);
 
 	if (!irq_test_in_progress && WARN_ON(!ring->irq_get(ring)))
 		return -ENODEV;
@@ -5046,8 +5033,6 @@ void i915_gem_release(struct drm_device *dev, struct drm_file *file)
 {
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 
-	cancel_delayed_work_sync(&file_priv->mm.idle_work);
-
 	/* Clean up our request list when the client is going away, so that
 	 * later retire_requests won't dereference our soon-to-be-gone
 	 * file_priv.
@@ -5063,15 +5048,12 @@ void i915_gem_release(struct drm_device *dev, struct drm_file *file)
 		request->file_priv = NULL;
 	}
 	spin_unlock(&file_priv->mm.lock);
-}
-
-static void
-i915_gem_file_idle_work_handler(struct work_struct *work)
-{
-	struct drm_i915_file_private *file_priv =
-		container_of(work, typeof(*file_priv), mm.idle_work.work);
 
-	atomic_set(&file_priv->rps_wait_boost, false);
+	if (!list_empty(&file_priv->rps_boost)) {
+		mutex_lock(&to_i915(dev)->rps.hw_lock);
+		list_del(&file_priv->rps_boost);
+		mutex_unlock(&to_i915(dev)->rps.hw_lock);
+	}
 }
 
 int i915_gem_open(struct drm_device *dev, struct drm_file *file)
@@ -5088,11 +5070,10 @@ int i915_gem_open(struct drm_device *dev, struct drm_file *file)
 	file->driver_priv = file_priv;
 	file_priv->dev_priv = dev->dev_private;
 	file_priv->file = file;
+	INIT_LIST_HEAD(&file_priv->rps_boost);
 
 	spin_lock_init(&file_priv->mm.lock);
 	INIT_LIST_HEAD(&file_priv->mm.request_list);
-	INIT_DELAYED_WORK(&file_priv->mm.idle_work,
-			  i915_gem_file_idle_work_handler);
 
 	ret = i915_gem_context_open(dev, file);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 4a6a51b99b22..7e0ff13d2aea 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -1264,7 +1264,8 @@ void gen6_update_ring_freq(struct drm_device *dev);
 void gen6_rps_busy(struct drm_i915_private *dev_priv);
 void gen6_rps_reset_ei(struct drm_i915_private *dev_priv);
 void gen6_rps_idle(struct drm_i915_private *dev_priv);
-void gen6_rps_boost(struct drm_i915_private *dev_priv);
+void gen6_rps_boost(struct drm_i915_private *dev_priv,
+		    struct drm_i915_file_private *file_priv);
 void intel_queue_rps_boost_for_request(struct drm_device *dev,
 				       struct drm_i915_gem_request *rq);
 void ilk_wm_get_hw_state(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 55f754aa11b6..bcb86cdd1be5 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -4087,10 +4087,14 @@ void gen6_rps_idle(struct drm_i915_private *dev_priv)
 		dev_priv->rps.last_adj = 0;
 		I915_WRITE(GEN6_PMINTRMSK, 0xffffffff);
 	}
+
+	while (!list_empty(&dev_priv->rps.clients))
+		list_del_init(dev_priv->rps.clients.next);
 	mutex_unlock(&dev_priv->rps.hw_lock);
 }
 
-void gen6_rps_boost(struct drm_i915_private *dev_priv)
+void gen6_rps_boost(struct drm_i915_private *dev_priv,
+		    struct drm_i915_file_private *file_priv)
 {
 	u32 val;
 
@@ -4098,9 +4102,16 @@ void gen6_rps_boost(struct drm_i915_private *dev_priv)
 	val = dev_priv->rps.max_freq_softlimit;
 	if (dev_priv->rps.enabled &&
 	    dev_priv->mm.busy &&
-	    dev_priv->rps.cur_freq < val) {
+	    dev_priv->rps.cur_freq < val &&
+	    (file_priv == NULL || list_empty(&file_priv->rps_boost))) {
 		intel_set_rps(dev_priv->dev, val);
 		dev_priv->rps.last_adj = 0;
+
+		if (file_priv != NULL) {
+			list_add(&file_priv->rps_boost, &dev_priv->rps.clients);
+			file_priv->rps_boosts++;
+		} else
+			dev_priv->rps.boosts++;
 	}
 	mutex_unlock(&dev_priv->rps.hw_lock);
 }
@@ -6778,7 +6789,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
 	struct request_boost *boost = container_of(work, struct request_boost, work);
 
 	if (!i915_gem_request_completed(boost->rq, true))
-		gen6_rps_boost(to_i915(boost->rq->ring->dev));
+		gen6_rps_boost(to_i915(boost->rq->ring->dev), NULL);
 
 	i915_gem_request_unreference__unlocked(boost->rq);
 	kfree(boost);
@@ -6811,6 +6822,7 @@ void intel_pm_setup(struct drm_device *dev)
 
 	INIT_DELAYED_WORK(&dev_priv->rps.delayed_resume_work,
 			  intel_gen6_powersave_work);
+	INIT_LIST_HEAD(&dev_priv->rps.clients);
 
 	dev_priv->pm.suspended = false;
 }
-- 
2.1.4


* [PATCH 08/49] drm/i915: Re-enable RPS wait-boosting for all engines
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (6 preceding siblings ...)
  2015-03-27 11:01 ` [PATCH 07/49] drm/i915: Diminish contribution of wait-boosting from clients Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-04-02 11:09   ` Deepak S
  2015-03-27 11:01 ` [PATCH 09/49] drm/i915: Split i915_gem_batch_pool into its own header Chris Wilson
                   ` (40 subsequent siblings)
  48 siblings, 1 reply; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx

This reverts commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jun 12 10:28:55 2014 +0100

    drm/i915: Restrict GPU boost to the RCS engine

The premise that media/blitter workloads are not affected by boosting
is patently false after a trip through igt. The question that remains is
what exactly was going wrong with the media workload that prompted the
restriction in the first place? Hopefully that will be fixed by the
previously missing aggressive downclocking, in addition to the extra
restrictions imposed on how frequently a process is allowed to boost.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll>
---
 drivers/gpu/drm/i915/i915_gem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d54f6a277d82..05f94ee8ea37 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1222,7 +1222,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	timeout_expire = timeout ?
 		jiffies + nsecs_to_jiffies_timeout((u64)*timeout) : 0;
 
-	if (ring->id == RCS && INTEL_INFO(dev)->gen >= 6)
+	if (INTEL_INFO(dev)->gen >= 6)
 		gen6_rps_boost(dev_priv, file_priv);
 
 	if (!irq_test_in_progress && WARN_ON(!ring->irq_get(ring)))
-- 
2.1.4


* [PATCH 09/49] drm/i915: Split i915_gem_batch_pool into its own header
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (7 preceding siblings ...)
  2015-03-27 11:01 ` [PATCH 08/49] drm/i915: Re-enable RPS wait-boosting for all engines Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-03-27 11:01 ` [PATCH 10/49] drm/i915: Tidy batch pool logic Chris Wilson
                   ` (39 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx

In the next patch, I want to use the structure elsewhere and so require
it defined earlier. Rather than move the definition to an earlier location
where it feels very odd, place it in its own header file.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            | 13 +--------
 drivers/gpu/drm/i915/i915_gem_batch_pool.c |  1 +
 drivers/gpu/drm/i915/i915_gem_batch_pool.h | 42 ++++++++++++++++++++++++++++++
 3 files changed, 44 insertions(+), 12 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_batch_pool.h

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1ff8629264a5..3a551f07baff 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -37,6 +37,7 @@
 #include "intel_bios.h"
 #include "intel_ringbuffer.h"
 #include "intel_lrc.h"
+#include "i915_gem_batch_pool.h"
 #include "i915_gem_gtt.h"
 #include "i915_gem_render_state.h"
 #include <linux/io-mapping.h>
@@ -1140,11 +1141,6 @@ struct intel_l3_parity {
 	int which_slice;
 };
 
-struct i915_gem_batch_pool {
-	struct drm_device *dev;
-	struct list_head cache_list;
-};
-
 struct i915_gem_mm {
 	/** Memory allocator for GTT stolen memory */
 	struct drm_mm stolen;
@@ -3067,13 +3063,6 @@ void i915_destroy_error_state(struct drm_device *dev);
 void i915_get_extra_instdone(struct drm_device *dev, uint32_t *instdone);
 const char *i915_cache_level_str(struct drm_i915_private *i915, int type);
 
-/* i915_gem_batch_pool.c */
-void i915_gem_batch_pool_init(struct drm_device *dev,
-			      struct i915_gem_batch_pool *pool);
-void i915_gem_batch_pool_fini(struct i915_gem_batch_pool *pool);
-struct drm_i915_gem_object*
-i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool, size_t size);
-
 /* i915_cmd_parser.c */
 int i915_cmd_parser_get_version(void);
 int i915_cmd_parser_init_ring(struct intel_engine_cs *ring);
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index c690170a1c4f..564be7c5ea7e 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -23,6 +23,7 @@
  */
 
 #include "i915_drv.h"
+#include "i915_gem_batch_pool.h"
 
 /**
  * DOC: batch pool
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.h b/drivers/gpu/drm/i915/i915_gem_batch_pool.h
new file mode 100644
index 000000000000..5ed70ef6a887
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.h
@@ -0,0 +1,42 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#ifndef I915_GEM_BATCH_POOL_H
+#define I915_GEM_BATCH_POOL_H
+
+#include "i915_drv.h"
+
+struct i915_gem_batch_pool {
+	struct drm_device *dev;
+	struct list_head cache_list;
+};
+
+/* i915_gem_batch_pool.c */
+void i915_gem_batch_pool_init(struct drm_device *dev,
+			      struct i915_gem_batch_pool *pool);
+void i915_gem_batch_pool_fini(struct i915_gem_batch_pool *pool);
+struct drm_i915_gem_object*
+i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool, size_t size);
+
+#endif /* I915_GEM_BATCH_POOL_H */
-- 
2.1.4


* [PATCH 10/49] drm/i915: Tidy batch pool logic
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (8 preceding siblings ...)
  2015-03-27 11:01 ` [PATCH 09/49] drm/i915: Split i915_gem_batch_pool into its own header Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-03-27 11:59   ` Tvrtko Ursulin
  2015-03-27 11:01 ` [PATCH 11/49] drm/i915: Split the batch pool by engine Chris Wilson
                   ` (38 subsequent siblings)
  48 siblings, 1 reply; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx

Move the madvise logic out of the execbuffer main path into the
relatively rare allocation path, making the execbuffer manipulation less
fragile.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c     | 12 +++------
 drivers/gpu/drm/i915/i915_gem_batch_pool.c | 39 +++++++++++++++---------------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 13 ++++------
 3 files changed, 27 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 61ae8ff4eaed..9605ff8f2fcd 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -869,6 +869,9 @@ static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
 	    batch_len + batch_start_offset > src_obj->base.size)
 		return ERR_PTR(-E2BIG);
 
+	if (WARN_ON(dest_obj->pages_pin_count == 0))
+		return ERR_PTR(-ENODEV);
+
 	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
 	if (ret) {
 		DRM_DEBUG_DRIVER("CMD: failed to prepare shadow batch\n");
@@ -882,13 +885,6 @@ static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
 		goto unpin_src;
 	}
 
-	ret = i915_gem_object_get_pages(dest_obj);
-	if (ret) {
-		DRM_DEBUG_DRIVER("CMD: Failed to get pages for shadow batch\n");
-		goto unmap_src;
-	}
-	i915_gem_object_pin_pages(dest_obj);
-
 	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
 	if (ret) {
 		DRM_DEBUG_DRIVER("CMD: Failed to set shadow batch to CPU\n");
@@ -898,7 +894,6 @@ static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
 	dst = vmap_batch(dest_obj, 0, batch_len);
 	if (!dst) {
 		DRM_DEBUG_DRIVER("CMD: Failed to vmap shadow batch\n");
-		i915_gem_object_unpin_pages(dest_obj);
 		ret = -ENOMEM;
 		goto unmap_src;
 	}
@@ -1129,7 +1124,6 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 	}
 
 	vunmap(batch_base);
-	i915_gem_object_unpin_pages(shadow_batch_obj);
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index 564be7c5ea7e..21f3356cc0ab 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -67,25 +67,23 @@ void i915_gem_batch_pool_fini(struct i915_gem_batch_pool *pool)
 					 struct drm_i915_gem_object,
 					 batch_pool_list);
 
-		WARN_ON(obj->active);
-
-		list_del_init(&obj->batch_pool_list);
+		list_del(&obj->batch_pool_list);
 		drm_gem_object_unreference(&obj->base);
 	}
 }
 
 /**
- * i915_gem_batch_pool_get() - select a buffer from the pool
+ * i915_gem_batch_pool_get() - allocate a buffer from the pool
  * @pool: the batch buffer pool
  * @size: the minimum desired size of the returned buffer
  *
- * Finds or allocates a batch buffer in the pool with at least the requested
- * size. The caller is responsible for any domain, active/inactive, or
- * purgeability management for the returned buffer.
+ * Returns an inactive buffer from @pool with at least @size bytes,
+ * with the pages pinned. The caller must i915_gem_object_unpin_pages()
+ * on the returned object.
  *
  * Note: Callers must hold the struct_mutex
  *
- * Return: the selected batch buffer object
+ * Return: the buffer object or an error pointer
  */
 struct drm_i915_gem_object *
 i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
@@ -97,8 +95,7 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 	WARN_ON(!mutex_is_locked(&pool->dev->struct_mutex));
 
 	list_for_each_entry_safe(tmp, next,
-			&pool->cache_list, batch_pool_list) {
-
+				 &pool->cache_list, batch_pool_list) {
 		if (tmp->active)
 			continue;
 
@@ -114,25 +111,27 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 		 * but not 'too much' bigger. A better way to do this
 		 * might be to bucket the pool objects based on size.
 		 */
-		if (tmp->base.size >= size &&
-		    tmp->base.size <= (2 * size)) {
+		if (tmp->base.size >= size && tmp->base.size <= 2 * size) {
 			obj = tmp;
 			break;
 		}
 	}
 
-	if (!obj) {
+	if (obj == NULL) {
+		int ret;
+
 		obj = i915_gem_alloc_object(pool->dev, size);
-		if (!obj)
+		if (obj == NULL)
 			return ERR_PTR(-ENOMEM);
 
-		list_add_tail(&obj->batch_pool_list, &pool->cache_list);
-	}
-	else
-		/* Keep list in LRU order */
-		list_move_tail(&obj->batch_pool_list, &pool->cache_list);
+		ret = i915_gem_object_get_pages(obj);
+		if (ret)
+			return ERR_PTR(ret);
 
-	obj->madv = I915_MADV_WILLNEED;
+		obj->madv = I915_MADV_DONTNEED;
+	}
 
+	list_move_tail(&obj->batch_pool_list, &pool->cache_list);
+	i915_gem_object_pin_pages(obj);
 	return obj;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index cfb6526b42c9..e13dcde5038c 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -37,7 +37,6 @@
 #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
 #define  __EXEC_OBJECT_NEEDS_MAP (1<<29)
 #define  __EXEC_OBJECT_NEEDS_BIAS (1<<28)
-#define  __EXEC_OBJECT_PURGEABLE (1<<27)
 
 #define BATCH_OFFSET_BIAS (256*1024)
 
@@ -224,12 +223,7 @@ i915_gem_execbuffer_unreserve_vma(struct i915_vma *vma)
 	if (entry->flags & __EXEC_OBJECT_HAS_PIN)
 		vma->pin_count--;
 
-	if (entry->flags & __EXEC_OBJECT_PURGEABLE)
-		obj->madv = I915_MADV_DONTNEED;
-
-	entry->flags &= ~(__EXEC_OBJECT_HAS_FENCE |
-			  __EXEC_OBJECT_HAS_PIN |
-			  __EXEC_OBJECT_PURGEABLE);
+	entry->flags &= ~(__EXEC_OBJECT_HAS_FENCE | __EXEC_OBJECT_HAS_PIN);
 }
 
 static void eb_destroy(struct eb_vmas *eb)
@@ -1185,11 +1179,13 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
 	if (ret)
 		goto err;
 
+	i915_gem_object_unpin_pages(shadow_batch_obj);
+
 	memset(shadow_exec_entry, 0, sizeof(*shadow_exec_entry));
 
 	vma = i915_gem_obj_to_ggtt(shadow_batch_obj);
 	vma->exec_entry = shadow_exec_entry;
-	vma->exec_entry->flags = __EXEC_OBJECT_PURGEABLE | __EXEC_OBJECT_HAS_PIN;
+	vma->exec_entry->flags = __EXEC_OBJECT_HAS_PIN;
 	drm_gem_object_reference(&shadow_batch_obj->base);
 	list_add_tail(&vma->exec_list, &eb->vmas);
 
@@ -1198,6 +1194,7 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
 	return shadow_batch_obj;
 
 err:
+	i915_gem_object_unpin_pages(shadow_batch_obj);
 	if (ret == -EACCES) /* unhandled chained batch */
 		return batch_obj;
 	else
-- 
2.1.4


* [PATCH 11/49] drm/i915: Split the batch pool by engine
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (9 preceding siblings ...)
  2015-03-27 11:01 ` [PATCH 10/49] drm/i915: Tidy batch pool logic Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-03-27 11:01 ` [PATCH 12/49] drm/i915: Free batch pool when idle Chris Wilson
                   ` (37 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx

I woke up one morning and found 50k objects sitting in the batch pool
and every search seemed to iterate the entire list... Painting the
screen in oils would provide a more fluid display.

One issue with the current design is that we only check for retirements
on the current ring when preparing to submit a new batch. This means
that we can have thousands of "active" batches on another ring that we
have to walk over. The simplest way to avoid that is to split the pools
per ring and then our LRU execution ordering will also ensure that the
inactive buffers remain at the front.
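
A standalone sketch of the resulting lookup (userspace C, made-up types
and sizes, not the driver code): because buffers are moved to the tail
of their ring's list whenever they are used, the scan can stop at the
first still-active entry instead of walking past thousands of them.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical pool entry: kept in least-recently-used order per ring. */
struct pool_buf {
	size_t size;
	bool active;		/* still referenced by the GPU */
};

/* Return an index usable for a request of 'size' bytes, or -1 if none. */
static int pool_get(struct pool_buf *bufs, int count, size_t size)
{
	for (int i = 0; i < count; i++) {
		if (bufs[i].active)
			break;	/* LRU order: everything after this is active too */
		if (bufs[i].size >= size && bufs[i].size <= 2 * size)
			return i;
	}
	return -1;		/* caller would allocate a fresh buffer */
}

int main(void)
{
	struct pool_buf bufs[] = {
		{ 4096, false }, { 16384, false }, { 8192, true }, { 8192, true },
	};

	printf("8K request -> slot %d\n", pool_get(bufs, 4, 8192));
	printf("32K request -> slot %d\n", pool_get(bufs, 4, 32768));
	return 0;
}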

v2: execlists still requires duplicate code.
v3: execlists requires more duplicate code

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by:  Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c        | 33 ++++++++++++++++++------------
 drivers/gpu/drm/i915/i915_dma.c            |  1 -
 drivers/gpu/drm/i915/i915_drv.h            |  8 --------
 drivers/gpu/drm/i915/i915_gem.c            |  2 --
 drivers/gpu/drm/i915/i915_gem_batch_pool.c |  3 ++-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +--
 drivers/gpu/drm/i915/intel_lrc.c           |  2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.c    |  2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.h    |  8 ++++++++
 9 files changed, 35 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index c537cc7d617c..1c803ffe4a3b 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -377,13 +377,17 @@ static void print_batch_pool_stats(struct seq_file *m,
 {
 	struct drm_i915_gem_object *obj;
 	struct file_stats stats;
+	struct intel_engine_cs *ring;
+	int i;
 
 	memset(&stats, 0, sizeof(stats));
 
-	list_for_each_entry(obj,
-			    &dev_priv->mm.batch_pool.cache_list,
-			    batch_pool_list)
-		per_file_stats(0, obj, &stats);
+	for_each_ring(ring, dev_priv, i) {
+		list_for_each_entry(obj,
+				    &ring->batch_pool.cache_list,
+				    batch_pool_list)
+			per_file_stats(0, obj, &stats);
+	}
 
 	print_file_stats(m, "batch pool", stats);
 }
@@ -613,21 +617,24 @@ static int i915_gem_batch_pool_info(struct seq_file *m, void *data)
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *obj;
+	struct intel_engine_cs *ring;
 	int count = 0;
-	int ret;
+	int ret, i;
 
 	ret = mutex_lock_interruptible(&dev->struct_mutex);
 	if (ret)
 		return ret;
 
-	seq_puts(m, "cache:\n");
-	list_for_each_entry(obj,
-			    &dev_priv->mm.batch_pool.cache_list,
-			    batch_pool_list) {
-		seq_puts(m, "   ");
-		describe_obj(m, obj);
-		seq_putc(m, '\n');
-		count++;
+	for_each_ring(ring, dev_priv, i) {
+		seq_printf(m, "%s cache:\n", ring->name);
+		list_for_each_entry(obj,
+				    &ring->batch_pool.cache_list,
+				    batch_pool_list) {
+			seq_puts(m, "   ");
+			describe_obj(m, obj);
+			seq_putc(m, '\n');
+			count++;
+		}
 	}
 
 	seq_printf(m, "total: %d\n", count);
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 68e0c85a17cf..8f5428b46a27 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1072,7 +1072,6 @@ int i915_driver_unload(struct drm_device *dev)
 
 	mutex_lock(&dev->struct_mutex);
 	i915_gem_cleanup_ringbuffer(dev);
-	i915_gem_batch_pool_fini(&dev_priv->mm.batch_pool);
 	i915_gem_context_fini(dev);
 	mutex_unlock(&dev->struct_mutex);
 	i915_gem_cleanup_stolen(dev);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3a551f07baff..84dcf391b244 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -37,7 +37,6 @@
 #include "intel_bios.h"
 #include "intel_ringbuffer.h"
 #include "intel_lrc.h"
-#include "i915_gem_batch_pool.h"
 #include "i915_gem_gtt.h"
 #include "i915_gem_render_state.h"
 #include <linux/io-mapping.h>
@@ -1154,13 +1153,6 @@ struct i915_gem_mm {
 	 */
 	struct list_head unbound_list;
 
-	/*
-	 * A pool of objects to use as shadow copies of client batch buffers
-	 * when the command parser is enabled. Prevents the client from
-	 * modifying the batch contents after software parsing.
-	 */
-	struct i915_gem_batch_pool batch_pool;
-
 	/** Usable portion of the GTT for GEM */
 	unsigned long stolen_base; /* limited to low memory (32-bit) */
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 05f94ee8ea37..d6be4cf3d64b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5024,8 +5024,6 @@ i915_gem_load(struct drm_device *dev)
 
 	i915_gem_shrinker_init(dev_priv);
 
-	i915_gem_batch_pool_init(dev, &dev_priv->mm.batch_pool);
-
 	mutex_init(&dev_priv->fb_tracking.lock);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index 21f3356cc0ab..1287abf55b84 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -96,8 +96,9 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 
 	list_for_each_entry_safe(tmp, next,
 				 &pool->cache_list, batch_pool_list) {
+		/* The batches are strictly LRU ordered */
 		if (tmp->active)
-			continue;
+			break;
 
 		/* While we're looping, do some clean up */
 		if (tmp->madv == __I915_MADV_PURGED) {
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index e13dcde5038c..c30334435e8e 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1156,12 +1156,11 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
 			  u32 batch_len,
 			  bool is_master)
 {
-	struct drm_i915_private *dev_priv = to_i915(batch_obj->base.dev);
 	struct drm_i915_gem_object *shadow_batch_obj;
 	struct i915_vma *vma;
 	int ret;
 
-	shadow_batch_obj = i915_gem_batch_pool_get(&dev_priv->mm.batch_pool,
+	shadow_batch_obj = i915_gem_batch_pool_get(&ring->batch_pool,
 						   PAGE_ALIGN(batch_len));
 	if (IS_ERR(shadow_batch_obj))
 		return shadow_batch_obj;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index fcb074bd55dc..1d0fb8450adc 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1404,6 +1404,7 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 		ring->cleanup(ring);
 
 	i915_cmd_parser_fini_ring(ring);
+	i915_gem_batch_pool_fini(&ring->batch_pool);
 
 	if (ring->status_page.obj) {
 		kunmap(sg_page(ring->status_page.obj->pages->sgl));
@@ -1421,6 +1422,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	ring->dev = dev;
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	i915_gem_batch_pool_init(dev, &ring->batch_pool);
 	init_waitqueue_head(&ring->irq_queue);
 
 	INIT_LIST_HEAD(&ring->execlist_queue);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 441e2502b889..a351178913f7 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1972,6 +1972,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
 	INIT_LIST_HEAD(&ring->execlist_queue);
+	i915_gem_batch_pool_init(dev, &ring->batch_pool);
 	ringbuf->size = 32 * PAGE_SIZE;
 	ringbuf->ring = ring;
 	memset(ring->semaphore.sync_seqno, 0, sizeof(ring->semaphore.sync_seqno));
@@ -2050,6 +2051,7 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring)
 	cleanup_status_page(ring);
 
 	i915_cmd_parser_fini_ring(ring);
+	i915_gem_batch_pool_fini(&ring->batch_pool);
 
 	kfree(ringbuf);
 	ring->buffer = NULL;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index c761fe05ad6f..1d08d8f9149d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -2,6 +2,7 @@
 #define _INTEL_RINGBUFFER_H_
 
 #include <linux/hashtable.h>
+#include "i915_gem_batch_pool.h"
 
 #define I915_CMD_HASH_ORDER 9
 
@@ -133,6 +134,13 @@ struct  intel_engine_cs {
 	struct		drm_device *dev;
 	struct intel_ringbuffer *buffer;
 
+	/*
+	 * A pool of objects to use as shadow copies of client batch buffers
+	 * when the command parser is enabled. Prevents the client from
+	 * modifying the batch contents after software parsing.
+	 */
+	struct i915_gem_batch_pool batch_pool;
+
 	struct intel_hw_status_page status_page;
 
 	unsigned irq_refcount; /* protected by dev_priv->irq_lock */
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 12/49] drm/i915: Free batch pool when idle
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (10 preceding siblings ...)
  2015-03-27 11:01 ` [PATCH 11/49] drm/i915: Split the batch pool by engine Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-03-27 11:01 ` [PATCH 13/49] drm/i915: Split batch pool into size buckets Chris Wilson
                   ` (36 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx

At runtime, this helps ensure that the batch pools are kept trim and
fast. Then at suspend, this releases memory that we do not need to
restore. It also ties into the oom-notifier to ensure that we recover as
much kernel memory as possible during OOM.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d6be4cf3d64b..a99e434126ba 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2758,8 +2758,19 @@ i915_gem_idle_work_handler(struct work_struct *work)
 {
 	struct drm_i915_private *dev_priv =
 		container_of(work, typeof(*dev_priv), mm.idle_work.work);
+	struct drm_device *dev = dev_priv->dev;
+
+	intel_mark_idle(dev);
 
-	intel_mark_idle(dev_priv->dev);
+	if (mutex_trylock(&dev->struct_mutex)) {
+		struct intel_engine_cs *ring;
+		int i;
+
+		for_each_ring(ring, dev_priv, i)
+			i915_gem_batch_pool_fini(&ring->batch_pool);
+
+		mutex_unlock(&dev->struct_mutex);
+	}
 }
 
 /**
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 13/49] drm/i915: Split batch pool into size buckets
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (11 preceding siblings ...)
  2015-03-27 11:01 ` [PATCH 12/49] drm/i915: Free batch pool when idle Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-03-27 11:01 ` [PATCH 14/49] drm/i915: Include active flag when describing objects in debugfs Chris Wilson
                   ` (35 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx

Now with the trimmed memcpy before the command parser, we try to
allocate many different sizes of batches, predominantly one or two
pages. We can therefore speed up searching for a good-sized batch by
keeping the objects in buckets of roughly the same size.

v2: Add a comment about bucket sizes
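
Aside (not part of the patch): the bucket selection used below is simply
fls() on the page count, clamped to the last bucket. A rough, self-contained
userspace sketch of the same arithmetic follows; the helper name, NUM_BUCKETS
and the 4KiB PAGE_SHIFT are assumptions for illustration only.

#include <stdio.h>

#define PAGE_SHIFT	12	/* assume 4KiB pages for the example */
#define NUM_BUCKETS	4

/* Mirrors "n = fls(size >> PAGE_SHIFT) - 1, clamped": bucket 0 holds
 * 1-page batches, bucket 1 holds 2-3 pages, bucket 2 holds 4-7 pages
 * and bucket 3 holds everything of 8 pages and up.
 */
static int batch_bucket(unsigned long size)
{
	unsigned long pages = size >> PAGE_SHIFT;
	int n = 0;

	while (pages >>= 1)	/* open-coded fls(pages) - 1 */
		n++;

	return n < NUM_BUCKETS ? n : NUM_BUCKETS - 1;
}

int main(void)
{
	/* prints: 0 1 2 3 */
	printf("%d %d %d %d\n",
	       batch_bucket(1ul << PAGE_SHIFT),	  /*  1 page  */
	       batch_bucket(3ul << PAGE_SHIFT),	  /*  3 pages */
	       batch_bucket(7ul << PAGE_SHIFT),	  /*  7 pages */
	       batch_bucket(64ul << PAGE_SHIFT)); /* 64 pages */
	return 0;
}

A lookup for an n-page batch therefore only ever walks objects of a similar
size, instead of wading through every page-sized leftover in one long list.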

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c        | 46 ++++++++++++++++++----------
 drivers/gpu/drm/i915/i915_drv.h            |  2 +-
 drivers/gpu/drm/i915/i915_gem.c            |  2 +-
 drivers/gpu/drm/i915/i915_gem_batch_pool.c | 49 +++++++++++++++++++-----------
 drivers/gpu/drm/i915/i915_gem_batch_pool.h |  2 +-
 5 files changed, 64 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 1c803ffe4a3b..b37b7c2ae5e2 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -378,15 +378,17 @@ static void print_batch_pool_stats(struct seq_file *m,
 	struct drm_i915_gem_object *obj;
 	struct file_stats stats;
 	struct intel_engine_cs *ring;
-	int i;
+	int i, j;
 
 	memset(&stats, 0, sizeof(stats));
 
 	for_each_ring(ring, dev_priv, i) {
-		list_for_each_entry(obj,
-				    &ring->batch_pool.cache_list,
-				    batch_pool_list)
-			per_file_stats(0, obj, &stats);
+		for (j = 0; j < ARRAY_SIZE(ring->batch_pool.cache_list); j++) {
+			list_for_each_entry(obj,
+					    &ring->batch_pool.cache_list[j],
+					    batch_pool_link)
+				per_file_stats(0, obj, &stats);
+		}
 	}
 
 	print_file_stats(m, "batch pool", stats);
@@ -618,26 +620,38 @@ static int i915_gem_batch_pool_info(struct seq_file *m, void *data)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *obj;
 	struct intel_engine_cs *ring;
-	int count = 0;
-	int ret, i;
+	int total = 0;
+	int ret, i, j;
 
 	ret = mutex_lock_interruptible(&dev->struct_mutex);
 	if (ret)
 		return ret;
 
 	for_each_ring(ring, dev_priv, i) {
-		seq_printf(m, "%s cache:\n", ring->name);
-		list_for_each_entry(obj,
-				    &ring->batch_pool.cache_list,
-				    batch_pool_list) {
-			seq_puts(m, "   ");
-			describe_obj(m, obj);
-			seq_putc(m, '\n');
-			count++;
+		for (j = 0; j < ARRAY_SIZE(ring->batch_pool.cache_list); j++) {
+			int count;
+
+			count = 0;
+			list_for_each_entry(obj,
+					    &ring->batch_pool.cache_list[j],
+					    batch_pool_link)
+				count++;
+			seq_printf(m, "%s cache[%d]: %d objects\n",
+				   ring->name, j, count);
+
+			list_for_each_entry(obj,
+					    &ring->batch_pool.cache_list[j],
+					    batch_pool_link) {
+				seq_puts(m, "   ");
+				describe_obj(m, obj);
+				seq_putc(m, '\n');
+			}
+
+			total += count;
 		}
 	}
 
-	seq_printf(m, "total: %d\n", count);
+	seq_printf(m, "total: %d\n", total);
 
 	mutex_unlock(&dev->struct_mutex);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 84dcf391b244..896aae1c10ac 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1909,7 +1909,7 @@ struct drm_i915_gem_object {
 	/** Used in execbuf to temporarily hold a ref */
 	struct list_head obj_exec_link;
 
-	struct list_head batch_pool_list;
+	struct list_head batch_pool_link;
 
 	/**
 	 * This is set if the object is on the active lists (has pending
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a99e434126ba..413b0e78b161 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4413,7 +4413,7 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
 	INIT_LIST_HEAD(&obj->ring_list);
 	INIT_LIST_HEAD(&obj->obj_exec_link);
 	INIT_LIST_HEAD(&obj->vma_list);
-	INIT_LIST_HEAD(&obj->batch_pool_list);
+	INIT_LIST_HEAD(&obj->batch_pool_link);
 
 	obj->ops = ops;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index 1287abf55b84..7bf2f3f2968e 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -47,8 +47,12 @@
 void i915_gem_batch_pool_init(struct drm_device *dev,
 			      struct i915_gem_batch_pool *pool)
 {
+	int n;
+
 	pool->dev = dev;
-	INIT_LIST_HEAD(&pool->cache_list);
+
+	for (n = 0; n < ARRAY_SIZE(pool->cache_list); n++)
+		INIT_LIST_HEAD(&pool->cache_list[n]);
 }
 
 /**
@@ -59,16 +63,20 @@ void i915_gem_batch_pool_init(struct drm_device *dev,
  */
 void i915_gem_batch_pool_fini(struct i915_gem_batch_pool *pool)
 {
+	int n;
+
 	WARN_ON(!mutex_is_locked(&pool->dev->struct_mutex));
 
-	while (!list_empty(&pool->cache_list)) {
-		struct drm_i915_gem_object *obj =
-			list_first_entry(&pool->cache_list,
-					 struct drm_i915_gem_object,
-					 batch_pool_list);
+	for (n = 0; n < ARRAY_SIZE(pool->cache_list); n++) {
+		while (!list_empty(&pool->cache_list[n])) {
+			struct drm_i915_gem_object *obj =
+				list_first_entry(&pool->cache_list[n],
+						 struct drm_i915_gem_object,
+						 batch_pool_link);
 
-		list_del(&obj->batch_pool_list);
-		drm_gem_object_unreference(&obj->base);
+			list_del(&obj->batch_pool_link);
+			drm_gem_object_unreference(&obj->base);
+		}
 	}
 }
 
@@ -91,28 +99,33 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 {
 	struct drm_i915_gem_object *obj = NULL;
 	struct drm_i915_gem_object *tmp, *next;
+	struct list_head *list;
+	int n;
 
 	WARN_ON(!mutex_is_locked(&pool->dev->struct_mutex));
 
-	list_for_each_entry_safe(tmp, next,
-				 &pool->cache_list, batch_pool_list) {
+	/* Compute a power-of-two bucket, but throw everything greater than
+	 * 16KiB into the same bucket: i.e. the buckets hold objects of
+	 * (1 page, 2 pages, 4 pages, 8+ pages).
+	 */
+	n = fls(size >> PAGE_SHIFT) - 1;
+	if (n >= ARRAY_SIZE(pool->cache_list))
+		n = ARRAY_SIZE(pool->cache_list) - 1;
+	list = &pool->cache_list[n];
+
+	list_for_each_entry_safe(tmp, next, list, batch_pool_link) {
 		/* The batches are strictly LRU ordered */
 		if (tmp->active)
 			break;
 
 		/* While we're looping, do some clean up */
 		if (tmp->madv == __I915_MADV_PURGED) {
-			list_del(&tmp->batch_pool_list);
+			list_del(&tmp->batch_pool_link);
 			drm_gem_object_unreference(&tmp->base);
 			continue;
 		}
 
-		/*
-		 * Select a buffer that is at least as big as needed
-		 * but not 'too much' bigger. A better way to do this
-		 * might be to bucket the pool objects based on size.
-		 */
-		if (tmp->base.size >= size && tmp->base.size <= 2 * size) {
+		if (tmp->base.size >= size) {
 			obj = tmp;
 			break;
 		}
@@ -132,7 +145,7 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 		obj->madv = I915_MADV_DONTNEED;
 	}
 
-	list_move_tail(&obj->batch_pool_list, &pool->cache_list);
+	list_move_tail(&obj->batch_pool_link, list);
 	i915_gem_object_pin_pages(obj);
 	return obj;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.h b/drivers/gpu/drm/i915/i915_gem_batch_pool.h
index 5ed70ef6a887..848e90703eed 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.h
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.h
@@ -29,7 +29,7 @@
 
 struct i915_gem_batch_pool {
 	struct drm_device *dev;
-	struct list_head cache_list;
+	struct list_head cache_list[4];
 };
 
 /* i915_gem_batch_pool.c */
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 14/49] drm/i915: Include active flag when describing objects in debugfs
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (12 preceding siblings ...)
  2015-03-27 11:01 ` [PATCH 13/49] drm/i915: Split batch pool into size buckets Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-03-27 11:01 ` [PATCH 15/49] drm/i915: Suppress empty lines from debugfs/i915_gem_objects Chris Wilson
                   ` (34 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx

Since we use obj->active as a hint in many places throughout the code,
knowing its state in debugfs is extremely useful.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index b37b7c2ae5e2..14322bbaced6 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -123,8 +123,9 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 	struct i915_vma *vma;
 	int pin_count = 0;
 
-	seq_printf(m, "%pK: %s%s%s %8zdKiB %02x %02x %x %x %x%s%s%s",
+	seq_printf(m, "%pK: %s%s%s%s %8zdKiB %02x %02x %x %x %x%s%s%s",
 		   &obj->base,
+		   obj->active ? "*" : " ",
 		   get_pin_flag(obj),
 		   get_tiling_flag(obj),
 		   get_global_flag(obj),
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 15/49] drm/i915: Suppress empty lines from debugfs/i915_gem_objects
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (13 preceding siblings ...)
  2015-03-27 11:01 ` [PATCH 14/49] drm/i915: Include active flag when describing objects in debugfs Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-03-27 11:01 ` [PATCH 16/49] drm/i915: Optimistically spin for the request completion Chris Wilson
                   ` (33 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx

This is just so that I don't have to read about the batch pool on
systems that are not using it! Rather than using a newline between the
kernel clients and userspace clients, just distinguish the internal
allocations with a '[k]' prefix.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 14322bbaced6..7ef6295438e9 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -362,16 +362,18 @@ static int per_file_stats(int id, void *ptr, void *data)
 	return 0;
 }
 
-#define print_file_stats(m, name, stats) \
-	seq_printf(m, "%s: %u objects, %zu bytes (%zu active, %zu inactive, %zu global, %zu shared, %zu unbound)\n", \
-		   name, \
-		   stats.count, \
-		   stats.total, \
-		   stats.active, \
-		   stats.inactive, \
-		   stats.global, \
-		   stats.shared, \
-		   stats.unbound)
+#define print_file_stats(m, name, stats) do { \
+	if (stats.count) \
+		seq_printf(m, "%s: %u objects, %zu bytes (%zu active, %zu inactive, %zu global, %zu shared, %zu unbound)\n", \
+			   name, \
+			   stats.count, \
+			   stats.total, \
+			   stats.active, \
+			   stats.inactive, \
+			   stats.global, \
+			   stats.shared, \
+			   stats.unbound); \
+} while (0)
 
 static void print_batch_pool_stats(struct seq_file *m,
 				   struct drm_i915_private *dev_priv)
@@ -392,7 +394,7 @@ static void print_batch_pool_stats(struct seq_file *m,
 		}
 	}
 
-	print_file_stats(m, "batch pool", stats);
+	print_file_stats(m, "[k]batch pool", stats);
 }
 
 #define count_vmas(list, member) do { \
@@ -478,8 +480,6 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
 
 	seq_putc(m, '\n');
 	print_batch_pool_stats(m, dev_priv);
-
-	seq_putc(m, '\n');
 	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
 		struct file_stats stats;
 		struct task_struct *task;
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 16/49] drm/i915: Optimistically spin for the request completion
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (14 preceding siblings ...)
  2015-03-27 11:01 ` [PATCH 15/49] drm/i915: Suppress empty lines from debugfs/i915_gem_objects Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-03-27 11:42   ` Tvrtko Ursulin
  2015-03-27 11:01 ` [PATCH 17/49] drm/i915: Implement inter-engine read-read optimisations Chris Wilson
                   ` (32 subsequent siblings)
  48 siblings, 1 reply; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Eero Tamminen, Rantala, Valtteri

This provides a nice boost to mesa in swap-bound scenarios (as mesa
throttles itself to the previous frame which, in that scenario, will
complete shortly). It will also provide a good boost to systems running
with semaphores disabled and so frequently waiting on the GPU as it
switches rings. In the most favourable of microbenchmarks, this can
increase performance by around 15% - though in practice improvements
will be marginal and rarely noticeable.

v2: Account for user timeouts
v3: Limit the spinning to a single jiffy (one scheduler tick) at most. On an
otherwise idle system, there is no scheduler contention and so without a
limit we would spin until the GPU is ready.
v4: Drop forcewake - the lazy coherent access doesn't require it, and we
have no reason to believe that the forcewake itself improves seqno
coherency - it only adds delay.
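
Aside (not part of the patch): the shape of the wait is the classic "poll
cheaply for a bounded budget, then fall back to the expensive sleeping
path". A rough userspace analogue using C11 atomics is below; the function
name and the 1ms budget are illustrative assumptions, whereas the patch
itself bounds the spin with jiffies, need_resched() and
cpu_relax_lowlatency().

#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Spin on a completion flag for at most ~1ms; returning false tells the
 * caller to fall back to a blocking wait (the IRQ wait path in the patch).
 */
static bool spin_for_completion(atomic_bool *done)
{
	struct timespec start, now;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		if (atomic_load_explicit(done, memory_order_acquire))
			return true;	/* completed while spinning */
		clock_gettime(CLOCK_MONOTONIC, &now);
	} while ((now.tv_sec - start.tv_sec) * 1000000000L +
		 (now.tv_nsec - start.tv_nsec) < 1000000L);

	return false;	/* budget spent, go and sleep instead */
}

The need_resched() check in the kernel version additionally aborts the spin
as soon as the scheduler wants the CPU back, so a waiter never burns more
than its own remaining timeslice.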

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 44 +++++++++++++++++++++++++++++++++++------
 1 file changed, 38 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 413b0e78b161..dc3eafe7f7d4 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1181,6 +1181,29 @@ static bool missed_irq(struct drm_i915_private *dev_priv,
 	return test_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings);
 }
 
+static int __i915_spin_request(struct drm_i915_gem_request *rq)
+{
+	unsigned long timeout;
+
+	if (i915_gem_request_get_ring(rq)->irq_refcount)
+		return -EBUSY;
+
+	timeout = jiffies + 1;
+	while (!need_resched()) {
+		if (i915_gem_request_completed(rq, true))
+			return 0;
+
+		if (time_after_eq(jiffies, timeout))
+			break;
+
+		cpu_relax_lowlatency();
+	}
+	if (i915_gem_request_completed(rq, false))
+		return 0;
+
+	return -EAGAIN;
+}
+
 /**
  * __i915_wait_request - wait until execution of request has finished
  * @req: duh!
@@ -1225,12 +1248,20 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	if (INTEL_INFO(dev)->gen >= 6)
 		gen6_rps_boost(dev_priv, file_priv);
 
-	if (!irq_test_in_progress && WARN_ON(!ring->irq_get(ring)))
-		return -ENODEV;
-
 	/* Record current time in case interrupted by signal, or wedged */
 	trace_i915_gem_request_wait_begin(req);
 	before = ktime_get_raw_ns();
+
+	/* Optimistic spin for the next jiffie before touching IRQs */
+	ret = __i915_spin_request(req);
+	if (ret == 0)
+		goto out;
+
+	if (!irq_test_in_progress && WARN_ON(!ring->irq_get(ring))) {
+		ret = -ENODEV;
+		goto out;
+	}
+
 	for (;;) {
 		struct timer_list timer;
 
@@ -1279,14 +1310,15 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			destroy_timer_on_stack(&timer);
 		}
 	}
-	now = ktime_get_raw_ns();
-	trace_i915_gem_request_wait_end(req);
-
 	if (!irq_test_in_progress)
 		ring->irq_put(ring);
 
 	finish_wait(&ring->irq_queue, &wait);
 
+out:
+	now = ktime_get_raw_ns();
+	trace_i915_gem_request_wait_end(req);
+
 	if (timeout) {
 		s64 tres = *timeout - (now - before);
 
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 17/49] drm/i915: Implement inter-engine read-read optimisations
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (15 preceding siblings ...)
  2015-03-27 11:01 ` [PATCH 16/49] drm/i915: Optimistically spin for the request completion Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-03-30 13:52   ` Tvrtko Ursulin
  2015-03-27 11:01 ` [PATCH 18/49] drm/i915: Reduce frequency of unspecific HSW reg debugging Chris Wilson
                   ` (31 subsequent siblings)
  48 siblings, 1 reply; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx; +Cc: Lionel Landwerlin

Currently, we only track the last request globally across all engines.
This prevents us from issuing concurrent read requests on e.g. the RCS
and BCS engines (or more likely the render and media engines). Without
semaphores, we incur costly stalls as we synchronise between rings -
greatly impacting the current performance of Broadwell versus Haswell in
certain workloads (like video decode). With the introduction of
reference counted requests, it is much easier to track the last request
per ring, as well as the last global write request so that we can
optimise inter-engine read-read requests (as well as better optimise
certain CPU waits).

v2: Fix inverted readonly condition for nonblocking waits.
v3: Handle non-contiguous engine array after waits
v4: Rebase, tidy, rewrite ring list debugging
v5: Use obj->active as a bitfield, it looks cool
v6: Micro-optimise, mostly involving moving code around
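
Aside (not part of the patch): stripped to its bones, the new tracking is a
single outstanding write request plus a per-engine array of read requests,
with obj->active doubling as a bitmask of engines that still hold a read. A
minimal sketch of that invariant is below; the type and helper names are
invented for illustration and are not the driver's.

#include <stdbool.h>

#define NUM_ENGINES 5	/* stand-in for I915_NUM_RINGS */

struct request;		/* opaque: a refcounted GPU request */

struct tracked_obj {
	struct request *last_read[NUM_ENGINES];	/* newest read per engine */
	struct request *last_write;	/* at most one outstanding writer */
	unsigned int active;	/* bit i set while last_read[i] != NULL */
};

/* A read-only access only has to wait for the single outstanding write;
 * a write access has to wait for every engine that still has a read in
 * flight.
 */
static bool needs_wait(const struct tracked_obj *obj, bool readonly)
{
	return readonly ? obj->last_write != NULL : obj->active != 0;
}

This is also why the busy ioctl below can simply report obj->active << 16:
the bitmask already names every engine with an outstanding read, and the
single write engine's flag fits in the low bits.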

Benchmark: igt/gem_read_read_speed
hsw:gt3e (with semaphores):
Before: Time to read-read 1024k:		275.794µs
After:  Time to read-read 1024k:		123.260µs

hsw:gt3e (w/o semaphores):
Before: Time to read-read 1024k:		230.433µs
After:  Time to read-read 1024k:		124.593µs

bdw-u (w/o semaphores):             Before          After
Time to read-read 1x1:            26.274µs       10.350µs
Time to read-read 128x128:        40.097µs       21.366µs
Time to read-read 256x256:        77.087µs       42.608µs
Time to read-read 512x512:       281.999µs      181.155µs
Time to read-read 1024x1024:    1196.141µs     1118.223µs
Time to read-read 2048x2048:    5639.072µs     5225.837µs
Time to read-read 4096x4096:   22401.662µs    21137.067µs
Time to read-read 8192x8192:   89617.735µs    85637.681µs

Testcase: igt/gem_concurrent_blit (read-read and friends)
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  16 +-
 drivers/gpu/drm/i915/i915_drv.h         |  19 +-
 drivers/gpu/drm/i915/i915_gem.c         | 516 +++++++++++++++++++-------------
 drivers/gpu/drm/i915/i915_gem_context.c |   2 -
 drivers/gpu/drm/i915/i915_gem_debug.c   |  92 ++----
 drivers/gpu/drm/i915/i915_gpu_error.c   |  19 +-
 drivers/gpu/drm/i915/intel_display.c    |   4 +-
 drivers/gpu/drm/i915/intel_lrc.c        |  15 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |  10 +-
 9 files changed, 386 insertions(+), 307 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 7ef6295438e9..5cea9a9c1cb9 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -120,10 +120,13 @@ static inline const char *get_global_flag(struct drm_i915_gem_object *obj)
 static void
 describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 {
+	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
+	struct intel_engine_cs *ring;
 	struct i915_vma *vma;
 	int pin_count = 0;
+	int i;
 
-	seq_printf(m, "%pK: %s%s%s%s %8zdKiB %02x %02x %x %x %x%s%s%s",
+	seq_printf(m, "%pK: %s%s%s%s %8zdKiB %02x %02x [ ",
 		   &obj->base,
 		   obj->active ? "*" : " ",
 		   get_pin_flag(obj),
@@ -131,8 +134,11 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		   get_global_flag(obj),
 		   obj->base.size / 1024,
 		   obj->base.read_domains,
-		   obj->base.write_domain,
-		   i915_gem_request_get_seqno(obj->last_read_req),
+		   obj->base.write_domain);
+	for_each_ring(ring, dev_priv, i)
+		seq_printf(m, "%x ",
+				i915_gem_request_get_seqno(obj->last_read_req[i]));
+	seq_printf(m, "] %x %x%s%s%s",
 		   i915_gem_request_get_seqno(obj->last_write_req),
 		   i915_gem_request_get_seqno(obj->last_fenced_req),
 		   i915_cache_level_str(to_i915(obj->base.dev), obj->cache_level),
@@ -169,9 +175,9 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		*t = '\0';
 		seq_printf(m, " (%s mappable)", s);
 	}
-	if (obj->last_read_req != NULL)
+	if (obj->last_write_req != NULL)
 		seq_printf(m, " (%s)",
-			   i915_gem_request_get_ring(obj->last_read_req)->name);
+			   i915_gem_request_get_ring(obj->last_write_req)->name);
 	if (obj->frontbuffer_bits)
 		seq_printf(m, " (frontbuffer: 0x%03x)", obj->frontbuffer_bits);
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 896aae1c10ac..7cf5d1b0a749 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -500,7 +500,7 @@ struct drm_i915_error_state {
 	struct drm_i915_error_buffer {
 		u32 size;
 		u32 name;
-		u32 rseqno, wseqno;
+		u32 rseqno[I915_NUM_RINGS], wseqno;
 		u32 gtt_offset;
 		u32 read_domains;
 		u32 write_domain;
@@ -1905,7 +1905,7 @@ struct drm_i915_gem_object {
 	struct drm_mm_node *stolen;
 	struct list_head global_list;
 
-	struct list_head ring_list;
+	struct list_head ring_list[I915_NUM_RINGS];
 	/** Used in execbuf to temporarily hold a ref */
 	struct list_head obj_exec_link;
 
@@ -1916,7 +1916,7 @@ struct drm_i915_gem_object {
 	 * rendering and so a non-zero seqno), and is not set if it i s on
 	 * inactive (ready to be unbound) list.
 	 */
-	unsigned int active:1;
+	unsigned int active:I915_NUM_RINGS;
 
 	/**
 	 * This is set if the object has been written to since last bound
@@ -1987,8 +1987,17 @@ struct drm_i915_gem_object {
 	void *dma_buf_vmapping;
 	int vmapping_count;
 
-	/** Breadcrumb of last rendering to the buffer. */
-	struct drm_i915_gem_request *last_read_req;
+	/** Breadcrumb of last rendering to the buffer.
+	 * There can only be one writer, but we allow for multiple readers.
+	 * If there is a writer that necessarily implies that all other
+	 * read requests are complete - but we may only be lazily clearing
+	 * the read requests. A read request is naturally the most recent
+	 * request on a ring, so we may have two different write and read
+	 * requests on one ring where the write request is older than the
+	 * read request. This allows for the CPU to read from an active
+	 * buffer by only waiting for the write to complete.
+	 * */
+	struct drm_i915_gem_request *last_read_req[I915_NUM_RINGS];
 	struct drm_i915_gem_request *last_write_req;
 	/** Breadcrumb of last fenced GPU access to the buffer. */
 	struct drm_i915_gem_request *last_fenced_req;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index dc3eafe7f7d4..7e6f2560bf35 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -38,14 +38,17 @@
 #include <linux/pci.h>
 #include <linux/dma-buf.h>
 
+#define RQ_BUG_ON(expr)
+
 static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
 static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
+static void
+i915_gem_object_retire__write(struct drm_i915_gem_object *obj);
+static void
+i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring);
 static __must_check int
 i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
 			       bool readonly);
-static void
-i915_gem_object_retire(struct drm_i915_gem_object *obj);
-
 static void i915_gem_write_fence(struct drm_device *dev, int reg,
 				 struct drm_i915_gem_object *obj);
 static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
@@ -518,8 +521,6 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
 		ret = i915_gem_object_wait_rendering(obj, true);
 		if (ret)
 			return ret;
-
-		i915_gem_object_retire(obj);
 	}
 
 	ret = i915_gem_object_get_pages(obj);
@@ -939,8 +940,6 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
 		ret = i915_gem_object_wait_rendering(obj, false);
 		if (ret)
 			return ret;
-
-		i915_gem_object_retire(obj);
 	}
 	/* Same trick applies to invalidate partially written cachelines read
 	 * before writing. */
@@ -1239,6 +1238,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 
 	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
 
+	if (list_empty(&req->list))
+		return 0;
+
 	if (i915_gem_request_completed(req, true))
 		return 0;
 
@@ -1338,6 +1340,56 @@ out:
 	return ret;
 }
 
+static inline void
+i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
+{
+	struct drm_i915_file_private *file_priv = request->file_priv;
+
+	if (!file_priv)
+		return;
+
+	spin_lock(&file_priv->mm.lock);
+	list_del(&request->client_list);
+	request->file_priv = NULL;
+	spin_unlock(&file_priv->mm.lock);
+}
+
+static void i915_gem_request_retire(struct drm_i915_gem_request *request)
+{
+	trace_i915_gem_request_retire(request);
+
+	list_del_init(&request->list);
+	i915_gem_request_remove_from_client(request);
+
+	put_pid(request->pid);
+
+	i915_gem_request_unreference(request);
+}
+
+static void
+__i915_gem_request_retire__upto(struct drm_i915_gem_request *rq)
+{
+	struct intel_engine_cs *engine = rq->ring;
+
+	lockdep_assert_held(&engine->dev->struct_mutex);
+
+	if (list_empty(&rq->list))
+		return;
+
+	rq->ringbuf->last_retired_head = rq->postfix;
+
+	do {
+		struct drm_i915_gem_request *prev =
+			list_entry(rq->list.prev, typeof(*rq), list);
+
+		i915_gem_request_retire(rq);
+
+		rq = prev;
+	} while (&rq->list != &engine->request_list);
+
+	WARN_ON(i915_verify_lists(engine->dev));
+}
+
 /**
  * Waits for a request to be signaled, and cleans up the
  * request and object lists appropriately for that event.
@@ -1348,7 +1400,6 @@ i915_wait_request(struct drm_i915_gem_request *req)
 	struct drm_device *dev;
 	struct drm_i915_private *dev_priv;
 	bool interruptible;
-	unsigned reset_counter;
 	int ret;
 
 	BUG_ON(req == NULL);
@@ -1367,29 +1418,13 @@ i915_wait_request(struct drm_i915_gem_request *req)
 	if (ret)
 		return ret;
 
-	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
-	i915_gem_request_reference(req);
-	ret = __i915_wait_request(req, reset_counter,
+	ret = __i915_wait_request(req,
+				  atomic_read(&dev_priv->gpu_error.reset_counter),
 				  interruptible, NULL, NULL);
-	i915_gem_request_unreference(req);
-	return ret;
-}
-
-static int
-i915_gem_object_wait_rendering__tail(struct drm_i915_gem_object *obj)
-{
-	if (!obj->active)
-		return 0;
-
-	/* Manually manage the write flush as we may have not yet
-	 * retired the buffer.
-	 *
-	 * Note that the last_write_req is always the earlier of
-	 * the two (read/write) requests, so if we haved successfully waited,
-	 * we know we have passed the last write.
-	 */
-	i915_gem_request_assign(&obj->last_write_req, NULL);
+	if (ret)
+		return ret;
 
+	__i915_gem_request_retire__upto(req);
 	return 0;
 }
 
@@ -1401,18 +1436,38 @@ static __must_check int
 i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
 			       bool readonly)
 {
-	struct drm_i915_gem_request *req;
-	int ret;
+	int ret, i;
 
-	req = readonly ? obj->last_write_req : obj->last_read_req;
-	if (!req)
+	if (!obj->active)
 		return 0;
 
-	ret = i915_wait_request(req);
-	if (ret)
-		return ret;
+	if (readonly) {
+		if (obj->last_write_req != NULL) {
+			ret = i915_wait_request(obj->last_write_req);
+			if (ret)
+				return ret;
 
-	return i915_gem_object_wait_rendering__tail(obj);
+			i = obj->last_write_req->ring->id;
+			if (obj->last_read_req[i] == obj->last_write_req)
+				i915_gem_object_retire__read(obj, i);
+			else
+				i915_gem_object_retire__write(obj);
+		}
+	} else {
+		for (i = 0; i < I915_NUM_RINGS; i++) {
+			if (obj->last_read_req[i] == NULL)
+				continue;
+
+			ret = i915_wait_request(obj->last_read_req[i]);
+			if (ret)
+				return ret;
+
+			i915_gem_object_retire__read(obj, i);
+		}
+		RQ_BUG_ON(obj->active);
+	}
+
+	return 0;
 }
 
 /* A nonblocking variant of the above wait. This is a highly dangerous routine
@@ -1423,37 +1478,72 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 					    struct drm_i915_file_private *file_priv,
 					    bool readonly)
 {
-	struct drm_i915_gem_request *req;
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_gem_request *requests[I915_NUM_RINGS];
 	unsigned reset_counter;
-	int ret;
+	int ret, i, n = 0;
 
 	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
 	BUG_ON(!dev_priv->mm.interruptible);
 
-	req = readonly ? obj->last_write_req : obj->last_read_req;
-	if (!req)
+	if (!obj->active)
 		return 0;
 
 	ret = i915_gem_check_wedge(&dev_priv->gpu_error, true);
 	if (ret)
 		return ret;
 
-	ret = i915_gem_check_olr(req);
-	if (ret)
-		return ret;
-
 	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
-	i915_gem_request_reference(req);
+
+	if (readonly) {
+		struct drm_i915_gem_request *rq;
+
+		rq = obj->last_write_req;
+		if (rq == NULL)
+			return 0;
+
+		ret = i915_gem_check_olr(rq);
+		if (ret)
+			goto err;
+
+		requests[n++] = i915_gem_request_reference(rq);
+	} else {
+		for (i = 0; i < I915_NUM_RINGS; i++) {
+			struct drm_i915_gem_request *rq;
+
+			rq = obj->last_read_req[i];
+			if (rq == NULL)
+				continue;
+
+			ret = i915_gem_check_olr(rq);
+			if (ret)
+				goto err;
+
+			requests[n++] = i915_gem_request_reference(rq);
+		}
+	}
+
 	mutex_unlock(&dev->struct_mutex);
-	ret = __i915_wait_request(req, reset_counter, true, NULL, file_priv);
+	for (i = 0; ret == 0 && i < n; i++)
+		ret = __i915_wait_request(requests[i], reset_counter, true,
+					  NULL, file_priv);
 	mutex_lock(&dev->struct_mutex);
-	i915_gem_request_unreference(req);
-	if (ret)
-		return ret;
 
-	return i915_gem_object_wait_rendering__tail(obj);
+err:
+	for (i = 0; i < n; i++) {
+		if (ret == 0) {
+			int ring = requests[i]->ring->id;
+			if (obj->last_read_req[ring] == requests[i])
+				i915_gem_object_retire__read(obj, ring);
+			if (obj->last_write_req == requests[i])
+				i915_gem_object_retire__write(obj);
+			__i915_gem_request_retire__upto(requests[i]);
+		}
+		i915_gem_request_unreference(requests[i]);
+	}
+
+	return ret;
 }
 
 /**
@@ -2204,78 +2294,58 @@ i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
 	return 0;
 }
 
-static void
-i915_gem_object_move_to_active(struct drm_i915_gem_object *obj,
-			       struct intel_engine_cs *ring)
+void i915_vma_move_to_active(struct i915_vma *vma,
+			     struct intel_engine_cs *ring)
 {
-	struct drm_i915_gem_request *req;
-	struct intel_engine_cs *old_ring;
-
-	BUG_ON(ring == NULL);
-
-	req = intel_ring_get_request(ring);
-	old_ring = i915_gem_request_get_ring(obj->last_read_req);
-
-	if (old_ring != ring && obj->last_write_req) {
-		/* Keep the request relative to the current ring */
-		i915_gem_request_assign(&obj->last_write_req, req);
-	}
+	struct drm_i915_gem_object *obj = vma->obj;
 
 	/* Add a reference if we're newly entering the active list. */
-	if (!obj->active) {
+	if (obj->active == 0)
 		drm_gem_object_reference(&obj->base);
-		obj->active = 1;
-	}
+	obj->active |= intel_ring_flag(ring);
 
-	list_move_tail(&obj->ring_list, &ring->active_list);
+	list_move_tail(&obj->ring_list[ring->id], &ring->active_list);
+	i915_gem_request_assign(&obj->last_read_req[ring->id],
+				intel_ring_get_request(ring));
 
-	i915_gem_request_assign(&obj->last_read_req, req);
+	list_move_tail(&vma->mm_list, &vma->vm->active_list);
 }
 
-void i915_vma_move_to_active(struct i915_vma *vma,
-			     struct intel_engine_cs *ring)
+static void
+i915_gem_object_retire__write(struct drm_i915_gem_object *obj)
 {
-	list_move_tail(&vma->mm_list, &vma->vm->active_list);
-	return i915_gem_object_move_to_active(vma->obj, ring);
+	RQ_BUG_ON(obj->last_write_req == NULL);
+	RQ_BUG_ON(!(obj->active & intel_ring_flag(obj->last_write_req->ring)));
+
+	i915_gem_request_assign(&obj->last_write_req, NULL);
+	intel_fb_obj_flush(obj, true);
 }
 
 static void
-i915_gem_object_move_to_inactive(struct drm_i915_gem_object *obj)
+i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
 {
 	struct i915_vma *vma;
 
-	BUG_ON(obj->base.write_domain & ~I915_GEM_GPU_DOMAINS);
-	BUG_ON(!obj->active);
+	RQ_BUG_ON(obj->last_read_req[ring] == NULL);
+	RQ_BUG_ON(!(obj->active & (1 << ring)));
+
+	list_del_init(&obj->ring_list[ring]);
+	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
+
+	if (obj->last_write_req && obj->last_write_req->ring->id == ring)
+		i915_gem_object_retire__write(obj);
+
+	obj->active &= ~(1 << ring);
+	if (obj->active)
+		return;
 
 	list_for_each_entry(vma, &obj->vma_list, vma_link) {
 		if (!list_empty(&vma->mm_list))
 			list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
 	}
 
-	intel_fb_obj_flush(obj, true);
-
-	list_del_init(&obj->ring_list);
-
-	i915_gem_request_assign(&obj->last_read_req, NULL);
-	i915_gem_request_assign(&obj->last_write_req, NULL);
-	obj->base.write_domain = 0;
-
 	i915_gem_request_assign(&obj->last_fenced_req, NULL);
-
-	obj->active = 0;
 	drm_gem_object_unreference(&obj->base);
-
-	WARN_ON(i915_verify_lists(dev));
-}
-
-static void
-i915_gem_object_retire(struct drm_i915_gem_object *obj)
-{
-	if (obj->last_read_req == NULL)
-		return;
-
-	if (i915_gem_request_completed(obj->last_read_req, true))
-		i915_gem_object_move_to_inactive(obj);
 }
 
 static int
@@ -2452,20 +2522,6 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	return 0;
 }
 
-static inline void
-i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
-{
-	struct drm_i915_file_private *file_priv = request->file_priv;
-
-	if (!file_priv)
-		return;
-
-	spin_lock(&file_priv->mm.lock);
-	list_del(&request->client_list);
-	request->file_priv = NULL;
-	spin_unlock(&file_priv->mm.lock);
-}
-
 static bool i915_context_is_banned(struct drm_i915_private *dev_priv,
 				   const struct intel_context *ctx)
 {
@@ -2511,16 +2567,6 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
 	}
 }
 
-static void i915_gem_free_request(struct drm_i915_gem_request *request)
-{
-	list_del(&request->list);
-	i915_gem_request_remove_from_client(request);
-
-	put_pid(request->pid);
-
-	i915_gem_request_unreference(request);
-}
-
 void i915_gem_request_free(struct kref *req_ref)
 {
 	struct drm_i915_gem_request *req = container_of(req_ref,
@@ -2583,9 +2629,9 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 
 		obj = list_first_entry(&ring->active_list,
 				       struct drm_i915_gem_object,
-				       ring_list);
+				       ring_list[ring->id]);
 
-		i915_gem_object_move_to_inactive(obj);
+		i915_gem_object_retire__read(obj, ring->id);
 	}
 
 	/*
@@ -2622,7 +2668,8 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 					   struct drm_i915_gem_request,
 					   list);
 
-		i915_gem_free_request(request);
+		request->ringbuf->last_retired_head = request->tail;
+		i915_gem_request_retire(request);
 	}
 
 	/* This may not have been flushed before the reset, so clean it now */
@@ -2670,6 +2717,8 @@ void i915_gem_reset(struct drm_device *dev)
 	i915_gem_context_reset(dev);
 
 	i915_gem_restore_fences(dev);
+
+	WARN_ON(i915_verify_lists(dev));
 }
 
 /**
@@ -2678,11 +2727,11 @@ void i915_gem_reset(struct drm_device *dev)
 void
 i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 {
-	if (list_empty(&ring->request_list))
-		return;
-
 	WARN_ON(i915_verify_lists(ring->dev));
 
+	if (list_empty(&ring->active_list))
+		return;
+
 	/* Retire requests first as we use it above for the early return.
 	 * If we retire requests last, we may use a later seqno and so clear
 	 * the requests lists without clearing the active list, leading to
@@ -2698,16 +2747,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 		if (!i915_gem_request_completed(request, true))
 			break;
 
-		trace_i915_gem_request_retire(request);
-
 		/* We know the GPU must have read the request to have
 		 * sent us the seqno + interrupt, so use the position
 		 * of tail of the request to update the last known position
 		 * of the GPU head.
 		 */
 		request->ringbuf->last_retired_head = request->postfix;
-
-		i915_gem_free_request(request);
+		i915_gem_request_retire(request);
 	}
 
 	/* Move any buffers on the active list that are no longer referenced
@@ -2719,12 +2765,12 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 
 		obj = list_first_entry(&ring->active_list,
 				      struct drm_i915_gem_object,
-				      ring_list);
+				      ring_list[ring->id]);
 
-		if (!i915_gem_request_completed(obj->last_read_req, true))
+		if (!list_empty(&obj->last_read_req[ring->id]->list))
 			break;
 
-		i915_gem_object_move_to_inactive(obj);
+		i915_gem_object_retire__read(obj, ring->id);
 	}
 
 	if (unlikely(ring->trace_irq_req &&
@@ -2813,17 +2859,23 @@ i915_gem_idle_work_handler(struct work_struct *work)
 static int
 i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 {
-	struct intel_engine_cs *ring;
-	int ret;
+	int ret, i;
 
-	if (obj->active) {
-		ring = i915_gem_request_get_ring(obj->last_read_req);
+	if (!obj->active)
+		return 0;
+
+	for (i = 0; i < I915_NUM_RINGS; i++) {
+		struct drm_i915_gem_request *rq;
+
+		rq = obj->last_read_req[i];
+		if (rq == NULL)
+			continue;
 
-		ret = i915_gem_check_olr(obj->last_read_req);
+		ret = i915_gem_check_olr(rq);
 		if (ret)
 			return ret;
 
-		i915_gem_retire_requests_ring(ring);
+		i915_gem_retire_requests_ring(rq->ring);
 	}
 
 	return 0;
@@ -2857,9 +2909,10 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_wait *args = data;
 	struct drm_i915_gem_object *obj;
-	struct drm_i915_gem_request *req;
+	struct drm_i915_gem_request *req[I915_NUM_RINGS];
 	unsigned reset_counter;
-	int ret = 0;
+	int i, n = 0;
+	int ret;
 
 	if (args->flags != 0)
 		return -EINVAL;
@@ -2879,11 +2932,9 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	if (ret)
 		goto out;
 
-	if (!obj->active || !obj->last_read_req)
+	if (!obj->active)
 		goto out;
 
-	req = obj->last_read_req;
-
 	/* Do this after OLR check to make sure we make forward progress polling
 	 * on this IOCTL with a timeout == 0 (like busy ioctl)
 	 */
@@ -2894,13 +2945,23 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 
 	drm_gem_object_unreference(&obj->base);
 	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
-	i915_gem_request_reference(req);
+
+	for (i = 0; i < I915_NUM_RINGS; i++) {
+		if (obj->last_read_req[i] == NULL)
+			continue;
+
+		req[n++] = i915_gem_request_reference(obj->last_read_req[i]);
+	}
+
 	mutex_unlock(&dev->struct_mutex);
 
-	ret = __i915_wait_request(req, reset_counter, true,
-				  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
-				  file->driver_priv);
-	i915_gem_request_unreference__unlocked(req);
+	for (i = 0; i < n; i++) {
+		if (ret == 0)
+			ret = __i915_wait_request(req[i], reset_counter, true,
+						  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
+						  file->driver_priv);
+		i915_gem_request_unreference__unlocked(req[i]);
+	}
 	return ret;
 
 out:
@@ -2909,6 +2970,62 @@ out:
 	return ret;
 }
 
+static int
+__i915_gem_object_sync(struct drm_i915_gem_object *obj,
+		       struct intel_engine_cs *to,
+		       struct drm_i915_gem_request *rq)
+{
+	struct intel_engine_cs *from;
+	int ret;
+
+	if (rq == NULL)
+		return 0;
+
+	from = i915_gem_request_get_ring(rq);
+	if (to == from)
+		return 0;
+
+	if (i915_gem_request_completed(rq, true))
+		return 0;
+
+	ret = i915_gem_check_olr(rq);
+	if (ret)
+		return ret;
+
+	if (!i915_semaphore_is_enabled(obj->base.dev)) {
+		ret = __i915_wait_request(rq,
+					  atomic_read(&to_i915(obj->base.dev)->gpu_error.reset_counter),
+					  to_i915(obj->base.dev)->mm.interruptible, NULL, NULL);
+		if (ret)
+			return ret;
+
+		if (obj->last_read_req[from->id] == rq)
+			i915_gem_object_retire__read(obj, from->id);
+		if (obj->last_write_req == rq)
+			i915_gem_object_retire__write(obj);
+	} else {
+		int idx = intel_ring_sync_index(from, to);
+		u32 seqno = i915_gem_request_get_seqno(rq);
+
+		if (seqno <= from->semaphore.sync_seqno[idx])
+			return 0;
+
+		trace_i915_gem_ring_sync_to(from, to, rq);
+		ret = to->semaphore.sync_to(to, from, seqno);
+		if (ret)
+			return ret;
+
+		/* We use last_read_req because sync_to()
+		 * might have just caused seqno wrap under
+		 * the radar.
+		 */
+		from->semaphore.sync_seqno[idx] =
+			i915_gem_request_get_seqno(obj->last_read_req[from->id]);
+	}
+
+	return 0;
+}
+
 /**
  * i915_gem_object_sync - sync an object to a ring.
  *
@@ -2917,7 +3034,17 @@ out:
  *
  * This code is meant to abstract object synchronization with the GPU.
  * Calling with NULL implies synchronizing the object with the CPU
- * rather than a particular GPU ring.
+ * rather than a particular GPU ring. Conceptually we serialise writes
+ * between engines inside the GPU. We only allow one engine to write
+ * into a buffer at any time, but multiple readers. To ensure each has
+ * a coherent view of memory, we must:
+ *
+ * - If there is an outstanding write request to the object, the new
+ *   request must wait for it to complete (either CPU or in hw, requests
+ *   on the same ring will be naturally ordered).
+ *
+ * - If we are a write request (pending_write_domain is set), the new
+ *   request must wait for outstanding read requests to complete.
  *
  * Returns 0 if successful, else propagates up the lower layer error.
  */
@@ -2925,39 +3052,25 @@ int
 i915_gem_object_sync(struct drm_i915_gem_object *obj,
 		     struct intel_engine_cs *to)
 {
-	struct intel_engine_cs *from;
-	u32 seqno;
-	int ret, idx;
-
-	from = i915_gem_request_get_ring(obj->last_read_req);
-
-	if (from == NULL || to == from)
-		return 0;
-
-	if (to == NULL || !i915_semaphore_is_enabled(obj->base.dev))
-		return i915_gem_object_wait_rendering(obj, false);
-
-	idx = intel_ring_sync_index(from, to);
+	const bool readonly = obj->base.pending_write_domain == 0;
+	int ret, i;
 
-	seqno = i915_gem_request_get_seqno(obj->last_read_req);
-	/* Optimization: Avoid semaphore sync when we are sure we already
-	 * waited for an object with higher seqno */
-	if (seqno <= from->semaphore.sync_seqno[idx])
+	if (!obj->active)
 		return 0;
 
-	ret = i915_gem_check_olr(obj->last_read_req);
-	if (ret)
-		return ret;
-
-	trace_i915_gem_ring_sync_to(from, to, obj->last_read_req);
-	ret = to->semaphore.sync_to(to, from, seqno);
-	if (!ret)
-		/* We use last_read_req because sync_to()
-		 * might have just caused seqno wrap under
-		 * the radar.
-		 */
-		from->semaphore.sync_seqno[idx] =
-				i915_gem_request_get_seqno(obj->last_read_req);
+	if (to == NULL) {
+		ret = i915_gem_object_wait_rendering(obj, readonly);
+	} else if (readonly) {
+		ret = __i915_gem_object_sync(obj, to,
+					     obj->last_write_req);
+	} else {
+		for (i = 0; i < I915_NUM_RINGS; i++) {
+			ret = __i915_gem_object_sync(obj, to,
+						     obj->last_read_req[i]);
+			if (ret)
+				break;
+		}
+	}
 
 	return ret;
 }
@@ -3044,10 +3157,6 @@ int i915_vma_unbind(struct i915_vma *vma)
 	/* Since the unbound list is global, only move to that list if
 	 * no more VMAs exist. */
 	if (list_empty(&obj->vma_list)) {
-		/* Throw away the active reference before
-		 * moving to the unbound list. */
-		i915_gem_object_retire(obj);
-
 		i915_gem_gtt_finish_object(obj);
 		list_move_tail(&obj->global_list, &dev_priv->mm.unbound_list);
 	}
@@ -3080,6 +3189,7 @@ int i915_gpu_idle(struct drm_device *dev)
 			return ret;
 	}
 
+	WARN_ON(i915_verify_lists(dev));
 	return 0;
 }
 
@@ -3713,8 +3823,6 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 	if (ret)
 		return ret;
 
-	i915_gem_object_retire(obj);
-
 	/* Flush and acquire obj->pages so that we are coherent through
 	 * direct access in memory with previous cached writes through
 	 * shmemfs and that our cache domain tracking remains valid.
@@ -3940,11 +4048,9 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 	bool was_pin_display;
 	int ret;
 
-	if (pipelined != i915_gem_request_get_ring(obj->last_read_req)) {
-		ret = i915_gem_object_sync(obj, pipelined);
-		if (ret)
-			return ret;
-	}
+	ret = i915_gem_object_sync(obj, pipelined);
+	if (ret)
+		return ret;
 
 	/* Mark the pin_display early so that we account for the
 	 * display coherency whilst setting up the cache domains.
@@ -4049,7 +4155,6 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
 	if (ret)
 		return ret;
 
-	i915_gem_object_retire(obj);
 	i915_gem_object_flush_gtt_write_domain(obj);
 
 	old_write_domain = obj->base.write_domain;
@@ -4359,15 +4464,15 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 	 * necessary flushes here.
 	 */
 	ret = i915_gem_object_flush_active(obj);
+	if (ret)
+		goto unref;
 
-	args->busy = obj->active;
-	if (obj->last_read_req) {
-		struct intel_engine_cs *ring;
-		BUILD_BUG_ON(I915_NUM_RINGS > 16);
-		ring = i915_gem_request_get_ring(obj->last_read_req);
-		args->busy |= intel_ring_flag(ring) << 16;
-	}
+	BUILD_BUG_ON(I915_NUM_RINGS > 16);
+	args->busy = obj->active << 16;
+	if (obj->last_write_req)
+		args->busy |= intel_ring_flag(obj->last_write_req->ring);
 
+unref:
 	drm_gem_object_unreference(&obj->base);
 unlock:
 	mutex_unlock(&dev->struct_mutex);
@@ -4441,8 +4546,11 @@ unlock:
 void i915_gem_object_init(struct drm_i915_gem_object *obj,
 			  const struct drm_i915_gem_object_ops *ops)
 {
+	int i;
+
 	INIT_LIST_HEAD(&obj->global_list);
-	INIT_LIST_HEAD(&obj->ring_list);
+	for (i = 0; i < I915_NUM_RINGS; i++)
+		INIT_LIST_HEAD(&obj->ring_list[i]);
 	INIT_LIST_HEAD(&obj->obj_exec_link);
 	INIT_LIST_HEAD(&obj->vma_list);
 	INIT_LIST_HEAD(&obj->batch_pool_link);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index f3e84c44d009..18900f745bc6 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -768,8 +768,6 @@ static int do_switch(struct intel_engine_cs *ring,
 		 * swapped, but there is no way to do that yet.
 		 */
 		from->legacy_hw_ctx.rcs_state->dirty = 1;
-		BUG_ON(i915_gem_request_get_ring(
-			from->legacy_hw_ctx.rcs_state->last_read_req) != ring);
 
 		/* obj is kept alive until the next request by its active ref */
 		i915_gem_object_ggtt_unpin(from->legacy_hw_ctx.rcs_state);
diff --git a/drivers/gpu/drm/i915/i915_gem_debug.c b/drivers/gpu/drm/i915/i915_gem_debug.c
index f462d1b51d97..17299d04189f 100644
--- a/drivers/gpu/drm/i915/i915_gem_debug.c
+++ b/drivers/gpu/drm/i915/i915_gem_debug.c
@@ -34,82 +34,34 @@ int
 i915_verify_lists(struct drm_device *dev)
 {
 	static int warned;
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct drm_i915_gem_object *obj;
+	struct intel_engine_cs *ring;
 	int err = 0;
+	int i;
 
 	if (warned)
 		return 0;
 
-	list_for_each_entry(obj, &dev_priv->render_ring.active_list, list) {
-		if (obj->base.dev != dev ||
-		    !atomic_read(&obj->base.refcount.refcount)) {
-			DRM_ERROR("freed render active %p\n", obj);
-			err++;
-			break;
-		} else if (!obj->active ||
-			   (obj->base.read_domains & I915_GEM_GPU_DOMAINS) == 0) {
-			DRM_ERROR("invalid render active %p (a %d r %x)\n",
-				  obj,
-				  obj->active,
-				  obj->base.read_domains);
-			err++;
-		} else if (obj->base.write_domain && list_empty(&obj->gpu_write_list)) {
-			DRM_ERROR("invalid render active %p (w %x, gwl %d)\n",
-				  obj,
-				  obj->base.write_domain,
-				  !list_empty(&obj->gpu_write_list));
-			err++;
-		}
-	}
-
-	list_for_each_entry(obj, &dev_priv->mm.flushing_list, list) {
-		if (obj->base.dev != dev ||
-		    !atomic_read(&obj->base.refcount.refcount)) {
-			DRM_ERROR("freed flushing %p\n", obj);
-			err++;
-			break;
-		} else if (!obj->active ||
-			   (obj->base.write_domain & I915_GEM_GPU_DOMAINS) == 0 ||
-			   list_empty(&obj->gpu_write_list)) {
-			DRM_ERROR("invalid flushing %p (a %d w %x gwl %d)\n",
-				  obj,
-				  obj->active,
-				  obj->base.write_domain,
-				  !list_empty(&obj->gpu_write_list));
-			err++;
-		}
-	}
-
-	list_for_each_entry(obj, &dev_priv->mm.gpu_write_list, gpu_write_list) {
-		if (obj->base.dev != dev ||
-		    !atomic_read(&obj->base.refcount.refcount)) {
-			DRM_ERROR("freed gpu write %p\n", obj);
-			err++;
-			break;
-		} else if (!obj->active ||
-			   (obj->base.write_domain & I915_GEM_GPU_DOMAINS) == 0) {
-			DRM_ERROR("invalid gpu write %p (a %d w %x)\n",
-				  obj,
-				  obj->active,
-				  obj->base.write_domain);
-			err++;
-		}
-	}
-
-	list_for_each_entry(obj, &i915_gtt_vm->inactive_list, list) {
-		if (obj->base.dev != dev ||
-		    !atomic_read(&obj->base.refcount.refcount)) {
-			DRM_ERROR("freed inactive %p\n", obj);
-			err++;
-			break;
-		} else if (obj->pin_count || obj->active ||
-			   (obj->base.write_domain & I915_GEM_GPU_DOMAINS)) {
-			DRM_ERROR("invalid inactive %p (p %d a %d w %x)\n",
-				  obj,
-				  obj->pin_count, obj->active,
-				  obj->base.write_domain);
-			err++;
+	for_each_ring(ring, dev_priv, i) {
+		list_for_each_entry(obj, &ring->active_list, ring_list[ring->id]) {
+			if (obj->base.dev != dev ||
+			    !atomic_read(&obj->base.refcount.refcount)) {
+				DRM_ERROR("%s: freed active obj %p\n",
+					  ring->name, obj);
+				err++;
+				break;
+			} else if (!obj->active ||
+				   obj->last_read_req[ring->id] == NULL) {
+				DRM_ERROR("%s: invalid active obj %p\n",
+					  ring->name, obj);
+				err++;
+			} else if (obj->base.write_domain) {
+				DRM_ERROR("%s: invalid write obj %p (w %x)\n",
+					  ring->name,
+					  obj, obj->base.write_domain);
+				err++;
+			}
 		}
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 1d4e60df8883..5f798961266f 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -192,15 +192,20 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m,
 				struct drm_i915_error_buffer *err,
 				int count)
 {
+	int i;
+
 	err_printf(m, "  %s [%d]:\n", name, count);
 
 	while (count--) {
-		err_printf(m, "    %08x %8u %02x %02x %x %x",
+		err_printf(m, "    %08x %8u %02x %02x [ ",
 			   err->gtt_offset,
 			   err->size,
 			   err->read_domains,
-			   err->write_domain,
-			   err->rseqno, err->wseqno);
+			   err->write_domain);
+		for (i = 0; i < I915_NUM_RINGS; i++)
+			err_printf(m, "%02x ", err->rseqno[i]);
+
+		err_printf(m, "] %02x", err->wseqno);
 		err_puts(m, pin_flag(err->pinned));
 		err_puts(m, tiling_flag(err->tiling));
 		err_puts(m, dirty_flag(err->dirty));
@@ -679,10 +684,12 @@ static void capture_bo(struct drm_i915_error_buffer *err,
 		       struct i915_vma *vma)
 {
 	struct drm_i915_gem_object *obj = vma->obj;
+	int i;
 
 	err->size = obj->base.size;
 	err->name = obj->base.name;
-	err->rseqno = i915_gem_request_get_seqno(obj->last_read_req);
+	for (i = 0; i < I915_NUM_RINGS; i++)
+		err->rseqno[i] = i915_gem_request_get_seqno(obj->last_read_req[i]);
 	err->wseqno = i915_gem_request_get_seqno(obj->last_write_req);
 	err->gtt_offset = vma->node.start;
 	err->read_domains = obj->base.read_domains;
@@ -695,8 +702,8 @@ static void capture_bo(struct drm_i915_error_buffer *err,
 	err->dirty = obj->dirty;
 	err->purgeable = obj->madv != I915_MADV_WILLNEED;
 	err->userptr = obj->userptr.mm != NULL;
-	err->ring = obj->last_read_req ?
-			i915_gem_request_get_ring(obj->last_read_req)->id : -1;
+	err->ring = obj->last_write_req ?
+			i915_gem_request_get_ring(obj->last_write_req)->id : -1;
 	err->cache_level = obj->cache_level;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 5eb159bcd599..64b67df94d33 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -9895,7 +9895,7 @@ static bool use_mmio_flip(struct intel_engine_cs *ring,
 	else if (i915.enable_execlists)
 		return true;
 	else
-		return ring != i915_gem_request_get_ring(obj->last_read_req);
+		return ring != i915_gem_request_get_ring(obj->last_write_req);
 }
 
 static void skl_do_mmio_flip(struct intel_crtc *intel_crtc)
@@ -10199,7 +10199,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 	} else if (IS_IVYBRIDGE(dev) || IS_HASWELL(dev)) {
 		ring = &dev_priv->ring[BCS];
 	} else if (INTEL_INFO(dev)->gen >= 7) {
-		ring = i915_gem_request_get_ring(obj->last_read_req);
+		ring = i915_gem_request_get_ring(obj->last_write_req);
 		if (ring == NULL || ring->id != RCS)
 			ring = &dev_priv->ring[BCS];
 	} else {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 1d0fb8450adc..fb4f3792fd78 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -901,6 +901,7 @@ static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf,
 {
 	struct intel_engine_cs *ring = ringbuf->ring;
 	struct drm_i915_gem_request *request;
+	unsigned space;
 	int ret;
 
 	if (intel_ring_space(ringbuf) >= bytes)
@@ -912,15 +913,14 @@ static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf,
 		 * from multiple ringbuffers. Here, we must ignore any that
 		 * aren't from the ringbuffer we're considering.
 		 */
-		struct intel_context *ctx = request->ctx;
-		if (ctx->engine[ring->id].ringbuf != ringbuf)
+		if (request->ringbuf != ringbuf)
 			continue;
 
 		/* Would completion of this request free enough space? */
-		if (__intel_ring_space(request->tail, ringbuf->tail,
-				       ringbuf->size) >= bytes) {
+		space = __intel_ring_space(request->tail, ringbuf->tail,
+					   ringbuf->size);
+		if (space >= bytes)
 			break;
-		}
 	}
 
 	if (&request->list == &ring->request_list)
@@ -930,9 +930,8 @@ static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf,
 	if (ret)
 		return ret;
 
-	i915_gem_retire_requests_ring(ring);
-
-	return intel_ring_space(ringbuf) >= bytes ? 0 : -ENOSPC;
+	ringbuf->space = bytes;
+	return 0;
 }
 
 static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf,
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index a351178913f7..a1184e700d1d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2061,16 +2061,17 @@ static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
 {
 	struct intel_ringbuffer *ringbuf = ring->buffer;
 	struct drm_i915_gem_request *request;
+	unsigned space;
 	int ret;
 
 	if (intel_ring_space(ringbuf) >= n)
 		return 0;
 
 	list_for_each_entry(request, &ring->request_list, list) {
-		if (__intel_ring_space(request->postfix, ringbuf->tail,
-				       ringbuf->size) >= n) {
+		space = __intel_ring_space(request->postfix, ringbuf->tail,
+					   ringbuf->size);
+		if (space >= n)
 			break;
-		}
 	}
 
 	if (&request->list == &ring->request_list)
@@ -2080,8 +2081,7 @@ static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
 	if (ret)
 		return ret;
 
-	i915_gem_retire_requests_ring(ring);
-
+	ringbuf->space = space;
 	return 0;
 }
 
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 18/49] drm/i915: Reduce frequency of unspecific HSW reg debugging
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (16 preceding siblings ...)
  2015-03-27 11:01 ` [PATCH 17/49] drm/i915: Implement inter-engine read-read optimisations Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-03-27 15:34   ` Paulo Zanoni
  2015-03-27 11:01 ` [PATCH 19/49] drm/i915: Record ring->start address in error state Chris Wilson
                   ` (30 subsequent siblings)
  48 siblings, 1 reply; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Mika Kuoppala, Paulo Zanoni

Delay the expensive read of the FPGA_DBG register from once per mmio to
once per forcewake section when we are doing the general well-being
check rather than the targeted error detection. This reduces the
overhead of the debug facility (for example when submitting execlists)
to almost zero whilst keeping the debug checks around.

v2: Enable one-shot mmio debugging from the interrupt check as well as a
    safeguard to catch invalid display writes from outside the powerwell.
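
In outline (condensed from the diff below), the extra FPGA_DBG read drops
out of the per-register write macros and instead piggybacks on the point
where forcewake is released, so it runs once per forcewake section:

	/* before: in every gen6/gen8 write macro */
	hsw_unclaimed_reg_debug(dev_priv, reg, false, false);
	hsw_unclaimed_reg_detect(dev_priv);  /* FPGA_DBG read per mmio write */

	/* after: hooked into the forcewake put path instead */
	hsw_unclaimed_reg_detect(dev_priv);  /* one FPGA_DBG read per section */
	fw_domains_put(dev_priv, fw_domains);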

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
---
 drivers/gpu/drm/i915/intel_uncore.c | 56 ++++++++++++++++++++-----------------
 1 file changed, 30 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index ab5cc94588e1..0e32bbbcada8 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -149,6 +149,30 @@ fw_domains_put(struct drm_i915_private *dev_priv, enum forcewake_domains fw_doma
 }
 
 static void
+hsw_unclaimed_reg_detect(struct drm_i915_private *dev_priv)
+{
+	static bool mmio_debug_once = true;
+
+	if (i915.mmio_debug || !mmio_debug_once)
+		return;
+
+	if (__raw_i915_read32(dev_priv, FPGA_DBG) & FPGA_DBG_RM_NOCLAIM) {
+		DRM_DEBUG("Unclaimed register detected, "
+			  "enabling oneshot unclaimed register reporting. "
+			  "Please use i915.mmio_debug=N for more information.\n");
+		__raw_i915_write32(dev_priv, FPGA_DBG, FPGA_DBG_RM_NOCLAIM);
+		i915.mmio_debug = mmio_debug_once--;
+	}
+}
+
+static void
+fw_domains_put_debug(struct drm_i915_private *dev_priv, enum forcewake_domains fw_domains)
+{
+	hsw_unclaimed_reg_detect(dev_priv);
+	fw_domains_put(dev_priv, fw_domains);
+}
+
+static void
 fw_domains_posting_read(struct drm_i915_private *dev_priv)
 {
 	struct intel_uncore_forcewake_domain *d;
@@ -561,23 +585,6 @@ hsw_unclaimed_reg_debug(struct drm_i915_private *dev_priv, u32 reg, bool read,
 	}
 }
 
-static void
-hsw_unclaimed_reg_detect(struct drm_i915_private *dev_priv)
-{
-	static bool mmio_debug_once = true;
-
-	if (i915.mmio_debug || !mmio_debug_once)
-		return;
-
-	if (__raw_i915_read32(dev_priv, FPGA_DBG) & FPGA_DBG_RM_NOCLAIM) {
-		DRM_DEBUG("Unclaimed register detected, "
-			  "enabling oneshot unclaimed register reporting. "
-			  "Please use i915.mmio_debug=N for more information.\n");
-		__raw_i915_write32(dev_priv, FPGA_DBG, FPGA_DBG_RM_NOCLAIM);
-		i915.mmio_debug = mmio_debug_once--;
-	}
-}
-
 #define GEN2_READ_HEADER(x) \
 	u##x val = 0; \
 	assert_device_not_suspended(dev_priv);
@@ -829,7 +836,6 @@ hsw_write##x(struct drm_i915_private *dev_priv, off_t reg, u##x val, bool trace)
 		gen6_gt_check_fifodbg(dev_priv); \
 	} \
 	hsw_unclaimed_reg_debug(dev_priv, reg, false, false); \
-	hsw_unclaimed_reg_detect(dev_priv); \
 	GEN6_WRITE_FOOTER; \
 }
 
@@ -871,7 +877,6 @@ gen8_write##x(struct drm_i915_private *dev_priv, off_t reg, u##x val, bool trace
 		__force_wake_get(dev_priv, FORCEWAKE_RENDER); \
 	__raw_i915_write##x(dev_priv, reg, val); \
 	hsw_unclaimed_reg_debug(dev_priv, reg, false, false); \
-	hsw_unclaimed_reg_detect(dev_priv); \
 	GEN6_WRITE_FOOTER; \
 }
 
@@ -1120,6 +1125,10 @@ static void intel_uncore_fw_domains_init(struct drm_device *dev)
 			       FORCEWAKE, FORCEWAKE_ACK);
 	}
 
+	if (HAS_FPGA_DBG_UNCLAIMED(dev) &&
+	    dev_priv->uncore.funcs.force_wake_put == fw_domains_put)
+		dev_priv->uncore.funcs.force_wake_put = fw_domains_put_debug;
+
 	/* All future platforms are expected to require complex power gating */
 	WARN_ON(dev_priv->uncore.fw_domains == 0);
 }
@@ -1411,11 +1420,6 @@ int intel_gpu_reset(struct drm_device *dev)
 
 void intel_uncore_check_errors(struct drm_device *dev)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
-	if (HAS_FPGA_DBG_UNCLAIMED(dev) &&
-	    (__raw_i915_read32(dev_priv, FPGA_DBG) & FPGA_DBG_RM_NOCLAIM)) {
-		DRM_ERROR("Unclaimed register before interrupt\n");
-		__raw_i915_write32(dev_priv, FPGA_DBG, FPGA_DBG_RM_NOCLAIM);
-	}
+	if (HAS_FPGA_DBG_UNCLAIMED(dev))
+		hsw_unclaimed_reg_detect(to_i915(dev));
 }
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 19/49] drm/i915: Record ring->start address in error state
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (17 preceding siblings ...)
  2015-03-27 11:01 ` [PATCH 18/49] drm/i915: Reduce frequency of unspecific HSW reg debugging Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-03-27 11:01 ` [PATCH 20/49] drm/i915: Use simpler form of spin_lock_irq(execlist_lock) Chris Wilson
                   ` (29 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx

This is mostly useful for execlists where the rings switch between
contexts (and so checking that the ring's start register matches the
context is important).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h       |  1 +
 drivers/gpu/drm/i915/i915_gpu_error.c | 10 ++++++----
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7cf5d1b0a749..68a50891830f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -455,6 +455,7 @@ struct drm_i915_error_state {
 		u32 semaphore_seqno[I915_NUM_RINGS - 1];
 
 		/* Register state */
+		u32 start;
 		u32 tail;
 		u32 head;
 		u32 ctl;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 5f798961266f..17dc2fcaba10 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -256,10 +256,11 @@ static void i915_ring_error_state(struct drm_i915_error_state_buf *m,
 		return;
 
 	err_printf(m, "%s command stream:\n", ring_str(ring_idx));
-	err_printf(m, "  HEAD: 0x%08x\n", ring->head);
-	err_printf(m, "  TAIL: 0x%08x\n", ring->tail);
-	err_printf(m, "  CTL: 0x%08x\n", ring->ctl);
-	err_printf(m, "  HWS: 0x%08x\n", ring->hws);
+	err_printf(m, "  START: 0x%08x\n", ring->start);
+	err_printf(m, "  HEAD:  0x%08x\n", ring->head);
+	err_printf(m, "  TAIL:  0x%08x\n", ring->tail);
+	err_printf(m, "  CTL:   0x%08x\n", ring->ctl);
+	err_printf(m, "  HWS:   0x%08x\n", ring->hws);
 	err_printf(m, "  ACTHD: 0x%08x %08x\n", (u32)(ring->acthd>>32), (u32)ring->acthd);
 	err_printf(m, "  IPEIR: 0x%08x\n", ring->ipeir);
 	err_printf(m, "  IPEHR: 0x%08x\n", ring->ipehr);
@@ -890,6 +891,7 @@ static void i915_record_ring_state(struct drm_device *dev,
 	ering->instpm = I915_READ(RING_INSTPM(ring->mmio_base));
 	ering->seqno = ring->get_seqno(ring, false);
 	ering->acthd = intel_ring_get_active_head(ring);
+	ering->start = I915_READ_START(ring);
 	ering->head = I915_READ_HEAD(ring);
 	ering->tail = I915_READ_TAIL(ring);
 	ering->ctl = I915_READ_CTL(ring);
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 20/49] drm/i915: Use simpler form of spin_lock_irq(execlist_lock)
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (18 preceding siblings ...)
  2015-03-27 11:01 ` [PATCH 19/49] drm/i915: Record ring->start address in error state Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-03-27 11:01 ` [PATCH 21/49] drm/i915: Use the global runtime-pm wakelock for a busy GPU for execlists Chris Wilson
                   ` (28 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx

We can use the simpler spin_lock_irq() form to disable interrupts as we
are never called from inside an irq/softirq handler here, so there is no
prior interrupt state that needs to be saved and restored.
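
For reference, a minimal contrast of the two locking forms (an
illustrative fragment, not a complete function):

	unsigned long flags;

	/* needed when the caller may already have interrupts disabled:
	 * the previous interrupt state is saved and later restored */
	spin_lock_irqsave(&ring->execlist_lock, flags);
	/* ... critical section ... */
	spin_unlock_irqrestore(&ring->execlist_lock, flags);

	/* sufficient here: interrupts are known to be enabled on entry,
	 * so they can simply be disabled and then re-enabled */
	spin_lock_irq(&ring->execlist_lock);
	/* ... critical section ... */
	spin_unlock_irq(&ring->execlist_lock);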

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index fb4f3792fd78..9b7824ac35dc 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -501,7 +501,6 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 {
 	struct drm_i915_gem_request *cursor;
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
-	unsigned long flags;
 	int num_elements = 0;
 
 	if (to != ring->default_context)
@@ -528,7 +527,7 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 
 	intel_runtime_pm_get(dev_priv);
 
-	spin_lock_irqsave(&ring->execlist_lock, flags);
+	spin_lock_irq(&ring->execlist_lock);
 
 	list_for_each_entry(cursor, &ring->execlist_queue, execlist_link)
 		if (++num_elements > 2)
@@ -554,7 +553,7 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 	if (num_elements == 0)
 		execlists_context_unqueue(ring);
 
-	spin_unlock_irqrestore(&ring->execlist_lock, flags);
+	spin_unlock_irq(&ring->execlist_lock);
 
 	return 0;
 }
@@ -723,7 +722,6 @@ void intel_execlists_retire_requests(struct intel_engine_cs *ring)
 {
 	struct drm_i915_gem_request *req, *tmp;
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
-	unsigned long flags;
 	struct list_head retired_list;
 
 	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
@@ -731,9 +729,9 @@ void intel_execlists_retire_requests(struct intel_engine_cs *ring)
 		return;
 
 	INIT_LIST_HEAD(&retired_list);
-	spin_lock_irqsave(&ring->execlist_lock, flags);
+	spin_lock_irq(&ring->execlist_lock);
 	list_replace_init(&ring->execlist_retired_req_list, &retired_list);
-	spin_unlock_irqrestore(&ring->execlist_lock, flags);
+	spin_unlock_irq(&ring->execlist_lock);
 
 	list_for_each_entry_safe(req, tmp, &retired_list, execlist_link) {
 		struct intel_context *ctx = req->ctx;
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 21/49] drm/i915: Use the global runtime-pm wakelock for a busy GPU for execlists
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (19 preceding siblings ...)
  2015-03-27 11:01 ` [PATCH 20/49] drm/i915: Use simpler form of spin_lock_irq(execlist_lock) Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-03-27 14:19   ` Daniel Vetter
  2015-03-27 11:01 ` [PATCH 22/49] drm/i915: Map the execlists context regs once during pinning Chris Wilson
                   ` (27 subsequent siblings)
  48 siblings, 1 reply; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx

When we submit a request to the GPU, we first take the rpm wakelock, and
only release it once the GPU has been idle for a small period of time
after all requests have completed. This means that we are sure no
new interrupt can arrive whilst we do not hold the rpm wakelock and so
can drop the individual get/put around every single request inside
execlists.

Note: to close one potential issue, we should mark the GPU as busy
earlier in __i915_add_request.
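
A rough model of the invariant being relied upon; mark_busy()/mark_idle()
and the busy flag are hypothetical stand-ins for the driver's actual
bookkeeping around __i915_add_request and the idle worker:

	/* hypothetical: taken once when the first request is submitted */
	static void mark_busy(struct drm_i915_private *dev_priv)
	{
		if (!dev_priv->gpu_busy) {
			/* held while any request is outstanding */
			intel_runtime_pm_get(dev_priv);
			dev_priv->gpu_busy = true;
		}
	}

	/* hypothetical: dropped only after the rings have drained and idled */
	static void mark_idle(struct drm_i915_private *dev_priv)
	{
		dev_priv->gpu_busy = false;
		intel_runtime_pm_put(dev_priv);
	}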

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c  | 1 -
 drivers/gpu/drm/i915/intel_lrc.c | 3 ---
 2 files changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7e6f2560bf35..4ec195a63d60 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2646,7 +2646,6 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 				struct drm_i915_gem_request,
 				execlist_link);
 		list_del(&submit_req->execlist_link);
-		intel_runtime_pm_put(dev_priv);
 
 		if (submit_req->ctx != ring->default_context)
 			intel_lr_context_unpin(ring, submit_req->ctx);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9b7824ac35dc..2ed1cf448c6f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -525,8 +525,6 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 	}
 	request->tail = tail;
 
-	intel_runtime_pm_get(dev_priv);
-
 	spin_lock_irq(&ring->execlist_lock);
 
 	list_for_each_entry(cursor, &ring->execlist_queue, execlist_link)
@@ -740,7 +738,6 @@ void intel_execlists_retire_requests(struct intel_engine_cs *ring)
 
 		if (ctx_obj && (ctx != ring->default_context))
 			intel_lr_context_unpin(ring, ctx);
-		intel_runtime_pm_put(dev_priv);
 		list_del(&req->execlist_link);
 		i915_gem_request_unreference(req);
 	}
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 22/49] drm/i915: Map the execlists context regs once during pinning
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (20 preceding siblings ...)
  2015-03-27 11:01 ` [PATCH 21/49] drm/i915: Use the global runtime-pm wakelock for a busy GPU for execlists Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-03-27 11:01 ` [PATCH 23/49] drm/i915: Remove vestigial DRI1 ring quiescing code Chris Wilson
                   ` (26 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx

When we pin the execlists context on queuing, it is the ideal time to map
the register page that we need to update when we submit the request to
the hardware (and to keep that mapping around for future requests).

This avoids having to do an atomic kmap on every submission. On the
other hand, it does depend upon correct request construction.
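
In outline (condensed from the diff below), the page holding the context
registers is mapped once at pin time and the pointer cached in the
ringbuffer, so submission becomes a plain store:

	/* at pin time, once per context/engine */
	ringbuf->regs = kmap(i915_gem_object_get_page(ctx_obj, 1));
	ringbuf->regs[CTX_RING_BUFFER_START+1] =
		i915_gem_obj_ggtt_offset(ringbuf->obj);

	/* at submit time, no kmap_atomic()/kunmap_atomic() pair any more */
	ringbuf->regs[CTX_RING_TAIL+1] = request->tail;

	/* at unpin time */
	kunmap(i915_gem_object_get_page(ctx_obj, 1));
	ringbuf->regs = NULL;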

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c         |  10 --
 drivers/gpu/drm/i915/intel_lrc.c        | 157 ++++++++++++--------------------
 drivers/gpu/drm/i915/intel_lrc.h        |   2 -
 drivers/gpu/drm/i915/intel_ringbuffer.h |   1 +
 4 files changed, 57 insertions(+), 113 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 4ec195a63d60..cc23a8773a89 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2574,13 +2574,6 @@ void i915_gem_request_free(struct kref *req_ref)
 	struct intel_context *ctx = req->ctx;
 
 	if (ctx) {
-		if (i915.enable_execlists) {
-			struct intel_engine_cs *ring = req->ring;
-
-			if (ctx != ring->default_context)
-				intel_lr_context_unpin(ring, ctx);
-		}
-
 		i915_gem_context_unreference(ctx);
 	}
 
@@ -2647,9 +2640,6 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 				execlist_link);
 		list_del(&submit_req->execlist_link);
 
-		if (submit_req->ctx != ring->default_context)
-			intel_lr_context_unpin(ring, submit_req->ctx);
-
 		i915_gem_request_unreference(submit_req);
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 2ed1cf448c6f..61c103b9ba22 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -203,9 +203,6 @@ enum {
 };
 #define GEN8_CTX_ID_SHIFT 32
 
-static int intel_lr_context_pin(struct intel_engine_cs *ring,
-		struct intel_context *ctx);
-
 /**
  * intel_sanitize_enable_execlists() - sanitize i915.enable_execlists
  * @dev: DRM device.
@@ -318,47 +315,18 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 }
 
-static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
-				    struct drm_i915_gem_object *ring_obj,
-				    u32 tail)
-{
-	struct page *page;
-	uint32_t *reg_state;
-
-	page = i915_gem_object_get_page(ctx_obj, 1);
-	reg_state = kmap_atomic(page);
-
-	reg_state[CTX_RING_TAIL+1] = tail;
-	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
-
-	kunmap_atomic(reg_state);
-
-	return 0;
-}
-
 static void execlists_submit_contexts(struct intel_engine_cs *ring,
 				      struct intel_context *to0, u32 tail0,
 				      struct intel_context *to1, u32 tail1)
 {
 	struct drm_i915_gem_object *ctx_obj0 = to0->engine[ring->id].state;
-	struct intel_ringbuffer *ringbuf0 = to0->engine[ring->id].ringbuf;
 	struct drm_i915_gem_object *ctx_obj1 = NULL;
-	struct intel_ringbuffer *ringbuf1 = NULL;
-
-	BUG_ON(!ctx_obj0);
-	WARN_ON(!i915_gem_obj_is_pinned(ctx_obj0));
-	WARN_ON(!i915_gem_obj_is_pinned(ringbuf0->obj));
 
-	execlists_update_context(ctx_obj0, ringbuf0->obj, tail0);
+	to0->engine[ring->id].ringbuf->regs[CTX_RING_TAIL+1] = tail0;
 
 	if (to1) {
-		ringbuf1 = to1->engine[ring->id].ringbuf;
+		to1->engine[ring->id].ringbuf->regs[CTX_RING_TAIL+1] = tail1;
 		ctx_obj1 = to1->engine[ring->id].state;
-		BUG_ON(!ctx_obj1);
-		WARN_ON(!i915_gem_obj_is_pinned(ctx_obj1));
-		WARN_ON(!i915_gem_obj_is_pinned(ringbuf1->obj));
-
-		execlists_update_context(ctx_obj1, ringbuf1->obj, tail1);
 	}
 
 	execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
@@ -500,29 +468,17 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 				   struct drm_i915_gem_request *request)
 {
 	struct drm_i915_gem_request *cursor;
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	int num_elements = 0;
 
-	if (to != ring->default_context)
-		intel_lr_context_pin(ring, to);
+	if (WARN_ON(request == NULL))
+		return -ENODEV;
+
+	if (WARN_ON(to->engine[ring->id].pin_count == 0))
+		return -ENODEV;
+
+	i915_gem_request_reference(request);
+	WARN_ON(to != request->ctx);
 
-	if (!request) {
-		/*
-		 * If there isn't a request associated with this submission,
-		 * create one as a temporary holder.
-		 */
-		request = kzalloc(sizeof(*request), GFP_KERNEL);
-		if (request == NULL)
-			return -ENOMEM;
-		request->ring = ring;
-		request->ctx = to;
-		kref_init(&request->ref);
-		request->uniq = dev_priv->request_uniq++;
-		i915_gem_context_reference(request->ctx);
-	} else {
-		i915_gem_request_reference(request);
-		WARN_ON(to != request->ctx);
-	}
 	request->tail = tail;
 
 	spin_lock_irq(&ring->execlist_lock);
@@ -716,30 +672,42 @@ int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 	return 0;
 }
 
+static void intel_lr_context_unpin(struct intel_engine_cs *ring,
+				   struct intel_context *ctx)
+{
+	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
+	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
+
+	if (--ctx->engine[ring->id].pin_count)
+		return;
+
+	kunmap(i915_gem_object_get_page(ctx_obj, 1));
+	ringbuf->regs = NULL;
+
+	intel_unpin_ringbuffer_obj(ringbuf);
+	i915_gem_object_ggtt_unpin(ctx_obj);
+}
+
 void intel_execlists_retire_requests(struct intel_engine_cs *ring)
 {
-	struct drm_i915_gem_request *req, *tmp;
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
-	struct list_head retired_list;
+	struct list_head list;
 
 	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
 	if (list_empty(&ring->execlist_retired_req_list))
 		return;
 
-	INIT_LIST_HEAD(&retired_list);
 	spin_lock_irq(&ring->execlist_lock);
-	list_replace_init(&ring->execlist_retired_req_list, &retired_list);
+	list_replace_init(&ring->execlist_retired_req_list, &list);
 	spin_unlock_irq(&ring->execlist_lock);
 
-	list_for_each_entry_safe(req, tmp, &retired_list, execlist_link) {
-		struct intel_context *ctx = req->ctx;
-		struct drm_i915_gem_object *ctx_obj =
-				ctx->engine[ring->id].state;
+	while (!list_empty(&list)) {
+		struct drm_i915_gem_request *rq;
+
+		rq = list_first_entry(&list, typeof(*rq), execlist_link);
+		list_del(&rq->execlist_link);
 
-		if (ctx_obj && (ctx != ring->default_context))
-			intel_lr_context_unpin(ring, ctx);
-		list_del(&req->execlist_link);
-		i915_gem_request_unreference(req);
+		intel_lr_context_unpin(ring, rq->ctx);
+		i915_gem_request_unreference(rq);
 	}
 }
 
@@ -807,25 +775,29 @@ intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf,
 }
 
 static int intel_lr_context_pin(struct intel_engine_cs *ring,
-		struct intel_context *ctx)
+				struct intel_context *ctx)
 {
 	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
 	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
-	int ret = 0;
+	int ret;
 
 	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
-	if (ctx->engine[ring->id].pin_count++ == 0) {
-		ret = i915_gem_obj_ggtt_pin(ctx_obj,
-				GEN8_LR_CONTEXT_ALIGN, 0);
-		if (ret)
-			goto reset_pin_count;
+	if (ctx->engine[ring->id].pin_count++)
+		return 0;
 
-		ret = intel_pin_and_map_ringbuffer_obj(ring->dev, ringbuf);
-		if (ret)
-			goto unpin_ctx_obj;
-	}
+	ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
+	if (ret)
+		goto reset_pin_count;
 
-	return ret;
+	ret = intel_pin_and_map_ringbuffer_obj(ring->dev, ringbuf);
+	if (ret)
+		goto unpin_ctx_obj;
+
+	ringbuf->regs = kmap(i915_gem_object_get_page(ctx_obj, 1));
+	ringbuf->regs[CTX_RING_BUFFER_START+1] =
+		i915_gem_obj_ggtt_offset(ringbuf->obj);
+
+	return 0;
 
 unpin_ctx_obj:
 	i915_gem_object_ggtt_unpin(ctx_obj);
@@ -835,21 +807,6 @@ reset_pin_count:
 	return ret;
 }
 
-void intel_lr_context_unpin(struct intel_engine_cs *ring,
-		struct intel_context *ctx)
-{
-	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
-	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
-
-	if (ctx_obj) {
-		WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
-		if (--ctx->engine[ring->id].pin_count == 0) {
-			intel_unpin_ringbuffer_obj(ringbuf);
-			i915_gem_object_ggtt_unpin(ctx_obj);
-		}
-	}
-}
-
 static int logical_ring_alloc_request(struct intel_engine_cs *ring,
 				      struct intel_context *ctx)
 {
@@ -864,12 +821,10 @@ static int logical_ring_alloc_request(struct intel_engine_cs *ring,
 	if (request == NULL)
 		return -ENOMEM;
 
-	if (ctx != ring->default_context) {
-		ret = intel_lr_context_pin(ring, ctx);
-		if (ret) {
-			kfree(request);
-			return ret;
-		}
+	ret = intel_lr_context_pin(ring, ctx);
+	if (ret) {
+		kfree(request);
+		return ret;
 	}
 
 	kref_init(&request->ref);
@@ -1990,7 +1945,7 @@ error_unpin_ctx:
 }
 
 void intel_lr_context_reset(struct drm_device *dev,
-			struct intel_context *ctx)
+			    struct intel_context *ctx)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *ring;
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index adb731e49c57..1d24d4f963f1 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -71,8 +71,6 @@ int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf,
 void intel_lr_context_free(struct intel_context *ctx);
 int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring);
-void intel_lr_context_unpin(struct intel_engine_cs *ring,
-		struct intel_context *ctx);
 void intel_lr_context_reset(struct drm_device *dev,
 			struct intel_context *ctx);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 1d08d8f9149d..2477cf3e3906 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -97,6 +97,7 @@ struct intel_ring_hangcheck {
 struct intel_ringbuffer {
 	struct drm_i915_gem_object *obj;
 	void __iomem *virtual_start;
+	uint32_t *regs;
 
 	struct intel_engine_cs *ring;
 
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 23/49] drm/i915: Remove vestigial DRI1 ring quiescing code
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (21 preceding siblings ...)
  2015-03-27 11:01 ` [PATCH 22/49] drm/i915: Map the execlists context regs once during pinning Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-03-27 11:01 ` [PATCH 24/49] drm/i915: Tidy execlist submission Chris Wilson
                   ` (25 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx

After the removal of DRI1, all access to the rings is through requests
and so we can always be sure that there is a request to wait upon to
free up available space. The fallback code only existed so that we could
quiesce the GPU following unmediated access by DRI1.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_trace.h       | 27 ----------------
 drivers/gpu/drm/i915/intel_lrc.c        | 57 +++------------------------------
 drivers/gpu/drm/i915/intel_ringbuffer.c | 56 ++------------------------------
 3 files changed, 6 insertions(+), 134 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index b3070a4501ab..97483e21c9b4 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -597,33 +597,6 @@ DEFINE_EVENT(i915_gem_request, i915_gem_request_wait_end,
 	    TP_ARGS(req)
 );
 
-DECLARE_EVENT_CLASS(i915_ring,
-	    TP_PROTO(struct intel_engine_cs *ring),
-	    TP_ARGS(ring),
-
-	    TP_STRUCT__entry(
-			     __field(u32, dev)
-			     __field(u32, ring)
-			     ),
-
-	    TP_fast_assign(
-			   __entry->dev = ring->dev->primary->index;
-			   __entry->ring = ring->id;
-			   ),
-
-	    TP_printk("dev=%u, ring=%u", __entry->dev, __entry->ring)
-);
-
-DEFINE_EVENT(i915_ring, i915_ring_wait_begin,
-	    TP_PROTO(struct intel_engine_cs *ring),
-	    TP_ARGS(ring)
-);
-
-DEFINE_EVENT(i915_ring, i915_ring_wait_end,
-	    TP_PROTO(struct intel_engine_cs *ring),
-	    TP_ARGS(ring)
-);
-
 TRACE_EVENT(i915_flip_request,
 	    TP_PROTO(int plane, struct drm_i915_gem_object *obj),
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 61c103b9ba22..3cd40699522e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -846,8 +846,9 @@ static int logical_ring_alloc_request(struct intel_engine_cs *ring,
 	return 0;
 }
 
-static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf,
-				     int bytes)
+static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf,
+				       struct intel_context *ctx,
+				       int bytes)
 {
 	struct intel_engine_cs *ring = ringbuf->ring;
 	struct drm_i915_gem_request *request;
@@ -873,7 +874,7 @@ static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf,
 			break;
 	}
 
-	if (&request->list == &ring->request_list)
+	if (WARN_ON(&request->list == &ring->request_list))
 		return -ENOSPC;
 
 	ret = i915_wait_request(request);
@@ -884,56 +885,6 @@ static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf,
 	return 0;
 }
 
-static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf,
-				       struct intel_context *ctx,
-				       int bytes)
-{
-	struct intel_engine_cs *ring = ringbuf->ring;
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long end;
-	int ret;
-
-	ret = logical_ring_wait_request(ringbuf, bytes);
-	if (ret != -ENOSPC)
-		return ret;
-
-	/* Force the context submission in case we have been skipping it */
-	intel_logical_ring_advance_and_submit(ringbuf, ctx, NULL);
-
-	/* With GEM the hangcheck timer should kick us out of the loop,
-	 * leaving it early runs the risk of corrupting GEM state (due
-	 * to running on almost untested codepaths). But on resume
-	 * timers don't work yet, so prevent a complete hang in that
-	 * case by choosing an insanely large timeout. */
-	end = jiffies + 60 * HZ;
-
-	ret = 0;
-	do {
-		if (intel_ring_space(ringbuf) >= bytes)
-			break;
-
-		msleep(1);
-
-		if (dev_priv->mm.interruptible && signal_pending(current)) {
-			ret = -ERESTARTSYS;
-			break;
-		}
-
-		ret = i915_gem_check_wedge(&dev_priv->gpu_error,
-					   dev_priv->mm.interruptible);
-		if (ret)
-			break;
-
-		if (time_after(jiffies, end)) {
-			ret = -EBUSY;
-			break;
-		}
-	} while (1);
-
-	return ret;
-}
-
 static int logical_ring_wrap_buffer(struct intel_ringbuffer *ringbuf,
 				    struct intel_context *ctx)
 {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index a1184e700d1d..7e3281de417c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2057,7 +2057,7 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring)
 	ring->buffer = NULL;
 }
 
-static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
+static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
 {
 	struct intel_ringbuffer *ringbuf = ring->buffer;
 	struct drm_i915_gem_request *request;
@@ -2074,7 +2074,7 @@ static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
 			break;
 	}
 
-	if (&request->list == &ring->request_list)
+	if (WARN_ON(&request->list == &ring->request_list))
 		return -ENOSPC;
 
 	ret = i915_wait_request(request);
@@ -2085,58 +2085,6 @@ static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
 	return 0;
 }
 
-static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
-{
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct intel_ringbuffer *ringbuf = ring->buffer;
-	unsigned long end;
-	int ret;
-
-	ret = intel_ring_wait_request(ring, n);
-	if (ret != -ENOSPC)
-		return ret;
-
-	/* force the tail write in case we have been skipping them */
-	__intel_ring_advance(ring);
-
-	/* With GEM the hangcheck timer should kick us out of the loop,
-	 * leaving it early runs the risk of corrupting GEM state (due
-	 * to running on almost untested codepaths). But on resume
-	 * timers don't work yet, so prevent a complete hang in that
-	 * case by choosing an insanely large timeout. */
-	end = jiffies + 60 * HZ;
-
-	ret = 0;
-	trace_i915_ring_wait_begin(ring);
-	do {
-		if (intel_ring_space(ringbuf) >= n)
-			break;
-		ringbuf->head = I915_READ_HEAD(ring);
-		if (intel_ring_space(ringbuf) >= n)
-			break;
-
-		msleep(1);
-
-		if (dev_priv->mm.interruptible && signal_pending(current)) {
-			ret = -ERESTARTSYS;
-			break;
-		}
-
-		ret = i915_gem_check_wedge(&dev_priv->gpu_error,
-					   dev_priv->mm.interruptible);
-		if (ret)
-			break;
-
-		if (time_after(jiffies, end)) {
-			ret = -EBUSY;
-			break;
-		}
-	} while (1);
-	trace_i915_ring_wait_end(ring);
-	return ret;
-}
-
 static int intel_wrap_ring_buffer(struct intel_engine_cs *ring)
 {
 	uint32_t __iomem *virt;
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 24/49] drm/i915: Tidy execlist submission
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (22 preceding siblings ...)
  2015-03-27 11:01 ` [PATCH 23/49] drm/i915: Remove vestigial DRI1 ring quiescing code Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-03-27 11:01 ` [PATCH 25/49] drm/i915: Move the execlists retirement to the right spot Chris Wilson
                   ` (24 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx

The list handling during submission was quite confusing as the retired
requests were out of order, making it much harder in future to reduce
the extra lists. Simplify the submission mechanism to explicitly track
the requests currently on each ELSP port, trimming the amount of work
required to track the hardware and making execlists more consistent with
the GEM core.
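
The resulting bookkeeping, in outline (condensed from the diff below, and
ignoring the coalescing of consecutive requests from the same context):

	struct drm_i915_gem_request *execlist_port[2]; /* what the ELSP holds */
	struct list_head execlist_queue;     /* submitted, awaiting the ELSP */
	struct list_head execlist_completed; /* finished, awaiting retirement */

	/* context-switch interrupt, simplified: retire off port 0 */
	while ((rq = ring->execlist_port[0]) &&
	       i915_seqno_passed(seqno, rq->seqno)) {
		list_move_tail(&rq->execlist_link, &ring->execlist_completed);
		ring->execlist_port[0] = ring->execlist_port[1];
		ring->execlist_port[1] = NULL;
	}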

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  21 +--
 drivers/gpu/drm/i915/i915_drv.h         |   4 -
 drivers/gpu/drm/i915/i915_gem.c         |  15 +-
 drivers/gpu/drm/i915/intel_lrc.c        | 293 ++++++++++++--------------------
 drivers/gpu/drm/i915/intel_lrc.h        |   1 -
 drivers/gpu/drm/i915/intel_ringbuffer.h |   3 +-
 6 files changed, 125 insertions(+), 212 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 5cea9a9c1cb9..21e2d67d3e23 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1929,8 +1929,7 @@ static void i915_dump_lrc_obj(struct seq_file *m,
 		return;
 	}
 
-	seq_printf(m, "CONTEXT: %s %u\n", ring->name,
-		   intel_execlists_ctx_id(ctx_obj));
+	seq_printf(m, "CONTEXT: %s\n", ring->name);
 
 	if (!i915_gem_obj_ggtt_bound(ctx_obj))
 		seq_puts(m, "\tNot bound in GGTT\n");
@@ -2016,7 +2015,7 @@ static int i915_execlists(struct seq_file *m, void *data)
 	intel_runtime_pm_get(dev_priv);
 
 	for_each_ring(ring, dev_priv, ring_id) {
-		struct drm_i915_gem_request *head_req = NULL;
+		struct drm_i915_gem_request *rq[2];
 		int count = 0;
 		unsigned long flags;
 
@@ -2046,22 +2045,16 @@ static int i915_execlists(struct seq_file *m, void *data)
 		}
 
 		spin_lock_irqsave(&ring->execlist_lock, flags);
+		memcpy(rq, ring->execlist_port, sizeof(rq));
 		list_for_each(cursor, &ring->execlist_queue)
 			count++;
-		head_req = list_first_entry_or_null(&ring->execlist_queue,
-				struct drm_i915_gem_request, execlist_link);
 		spin_unlock_irqrestore(&ring->execlist_lock, flags);
 
 		seq_printf(m, "\t%d requests in queue\n", count);
-		if (head_req) {
-			struct drm_i915_gem_object *ctx_obj;
-
-			ctx_obj = head_req->ctx->engine[ring_id].state;
-			seq_printf(m, "\tHead request id: %u\n",
-				   intel_execlists_ctx_id(ctx_obj));
-			seq_printf(m, "\tHead request tail: %u\n",
-				   head_req->tail);
-		}
+		seq_printf(m, "\tPort[0] seqno: %u\n",
+			   rq[0] ? rq[0]->seqno : 0);
+		seq_printf(m, "\tPort[1] seqno: %u\n",
+			   rq[1] ? rq[1]->seqno : 0);
 
 		seq_putc(m, '\n');
 	}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 68a50891830f..ee51540e169a 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2116,10 +2116,6 @@ struct drm_i915_gem_request {
 
 	/** Execlist link in the submission queue.*/
 	struct list_head execlist_link;
-
-	/** Execlists no. of times this request has been sent to the ELSP */
-	int elsp_submitted;
-
 };
 
 void i915_gem_request_free(struct kref *req_ref);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index cc23a8773a89..db4a53f248a2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2632,15 +2632,14 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 	 * are the ones that keep the context and ringbuffer backing objects
 	 * pinned in place.
 	 */
-	while (!list_empty(&ring->execlist_queue)) {
-		struct drm_i915_gem_request *submit_req;
-
-		submit_req = list_first_entry(&ring->execlist_queue,
-				struct drm_i915_gem_request,
-				execlist_link);
-		list_del(&submit_req->execlist_link);
+	if (i915.enable_execlists) {
+		spin_lock_irq(&ring->execlist_lock);
+		list_splice_tail_init(&ring->execlist_queue,
+				      &ring->execlist_completed);
+		memset(&ring->execlist_port, 0, sizeof(ring->execlist_port));
+		spin_unlock_irq(&ring->execlist_lock);
 
-		i915_gem_request_unreference(submit_req);
+		intel_execlists_retire_requests(ring);
 	}
 
 	/*
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3cd40699522e..a013239f5e26 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -230,78 +230,54 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 	return 0;
 }
 
-/**
- * intel_execlists_ctx_id() - get the Execlists Context ID
- * @ctx_obj: Logical Ring Context backing object.
- *
- * Do not confuse with ctx->id! Unfortunately we have a name overload
- * here: the old context ID we pass to userspace as a handler so that
- * they can refer to a context, and the new context ID we pass to the
- * ELSP so that the GPU can inform us of the context status via
- * interrupts.
- *
- * Return: 20-bits globally unique context ID.
- */
-u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj)
-{
-	u32 lrca = i915_gem_obj_ggtt_offset(ctx_obj);
-
-	/* LRCA is required to be 4K aligned so the more significant 20 bits
-	 * are globally unique */
-	return lrca >> 12;
-}
-
-static uint64_t execlists_ctx_descriptor(struct intel_engine_cs *ring,
+static uint32_t execlists_ctx_descriptor(struct intel_engine_cs *ring,
 					 struct drm_i915_gem_object *ctx_obj)
 {
-	struct drm_device *dev = ring->dev;
-	uint64_t desc;
-	uint64_t lrca = i915_gem_obj_ggtt_offset(ctx_obj);
-
-	WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
+	uint32_t desc;
 
 	desc = GEN8_CTX_VALID;
 	desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
 	desc |= GEN8_CTX_L3LLC_COHERENT;
 	desc |= GEN8_CTX_PRIVILEGE;
-	desc |= lrca;
-	desc |= (u64)intel_execlists_ctx_id(ctx_obj) << GEN8_CTX_ID_SHIFT;
+	desc |= i915_gem_obj_ggtt_offset(ctx_obj);
 
 	/* TODO: WaDisableLiteRestore when we start using semaphore
 	 * signalling between Command Streamers */
 	/* desc |= GEN8_CTX_FORCE_RESTORE; */
 
 	/* WaEnableForceRestoreInCtxtDescForVCS:skl */
-	if (IS_GEN9(dev) &&
-	    INTEL_REVID(dev) <= SKL_REVID_B0 &&
+	if (IS_GEN9(ring->dev) && INTEL_REVID(ring->dev) <= SKL_REVID_B0 &&
 	    (ring->id == BCS || ring->id == VCS ||
-	    ring->id == VECS || ring->id == VCS2))
+	     ring->id == VECS || ring->id == VCS2))
 		desc |= GEN8_CTX_FORCE_RESTORE;
 
 	return desc;
 }
 
-static void execlists_elsp_write(struct intel_engine_cs *ring,
-				 struct drm_i915_gem_object *ctx_obj0,
-				 struct drm_i915_gem_object *ctx_obj1)
+static uint32_t execlists_request_write_tail(struct intel_engine_cs *ring,
+					     struct drm_i915_gem_request *rq)
+
 {
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	uint64_t temp = 0;
+	rq->ctx->engine[ring->id].ringbuf->regs[CTX_RING_TAIL+1] = rq->tail;
+	return execlists_ctx_descriptor(ring, rq->ctx->engine[ring->id].state);
+}
+
+static void execlists_submit_pair(struct intel_engine_cs *ring)
+{
+	struct drm_i915_private *dev_priv = to_i915(ring->dev);
 	uint32_t desc[4];
 
-	/* XXX: You must always write both descriptors in the order below. */
-	if (ctx_obj1)
-		temp = execlists_ctx_descriptor(ring, ctx_obj1);
-	else
-		temp = 0;
-	desc[1] = (u32)(temp >> 32);
-	desc[0] = (u32)temp;
+	if (ring->execlist_port[1]) {
+		desc[0] = execlists_request_write_tail(ring,
+						       ring->execlist_port[1]);
+		desc[1] = ring->execlist_port[1]->seqno;
+	} else
+		desc[1] = desc[0] = 0;
 
-	temp = execlists_ctx_descriptor(ring, ctx_obj0);
-	desc[3] = (u32)(temp >> 32);
-	desc[2] = (u32)temp;
+	desc[2] = execlists_request_write_tail(ring, ring->execlist_port[0]);
+	desc[3] = ring->execlist_port[0]->seqno;
 
+	/* Note: You must always write both descriptors in the order below. */
 	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
 	I915_WRITE(RING_ELSP(ring), desc[1]);
 	I915_WRITE(RING_ELSP(ring), desc[0]);
@@ -310,96 +286,82 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 	/* The context is automatically loaded after the following */
 	I915_WRITE(RING_ELSP(ring), desc[2]);
 
-	/* ELSP is a wo register, so use another nearby reg for posting instead */
+	/* ELSP is a wo register, use another nearby reg for posting instead */
 	POSTING_READ(RING_EXECLIST_STATUS(ring));
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 }
 
-static void execlists_submit_contexts(struct intel_engine_cs *ring,
-				      struct intel_context *to0, u32 tail0,
-				      struct intel_context *to1, u32 tail1)
+static void execlists_context_unqueue(struct intel_engine_cs *ring)
 {
-	struct drm_i915_gem_object *ctx_obj0 = to0->engine[ring->id].state;
-	struct drm_i915_gem_object *ctx_obj1 = NULL;
+	struct drm_i915_gem_request *cursor;
+	bool submit = false;
+	int i = 0;
 
-	to0->engine[ring->id].ringbuf->regs[CTX_RING_TAIL+1] = tail0;
+	assert_spin_locked(&ring->execlist_lock);
 
-	if (to1) {
-		to1->engine[ring->id].ringbuf->regs[CTX_RING_TAIL+1] = tail1;
-		ctx_obj1 = to1->engine[ring->id].state;
+	/* Try to read in pairs */
+	cursor = ring->execlist_port[0];
+	if (cursor == NULL)
+		cursor = list_first_entry(&ring->execlist_queue,
+					  typeof(*cursor),
+					  execlist_link);
+	else
+		cursor = list_next_entry(cursor, execlist_link);
+	while (&cursor->execlist_link != &ring->execlist_queue) {
+		/* Same ctx: ignore earlier request, as the
+		 * second request extends the first.
+		 */
+		if (ring->execlist_port[i] &&
+		    cursor->ctx != ring->execlist_port[i]->ctx) {
+			if (++i == ARRAY_SIZE(ring->execlist_port))
+				break;
+		}
+
+		ring->execlist_port[i] = cursor;
+		submit = true;
+
+		cursor = list_next_entry(cursor, execlist_link);
 	}
 
-	execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
+	if (submit)
+		execlists_submit_pair(ring);
 }
 
-static void execlists_context_unqueue(struct intel_engine_cs *ring)
+static bool execlists_complete_requests(struct intel_engine_cs *ring,
+					u32 seqno)
 {
-	struct drm_i915_gem_request *req0 = NULL, *req1 = NULL;
-	struct drm_i915_gem_request *cursor = NULL, *tmp = NULL;
-
 	assert_spin_locked(&ring->execlist_lock);
 
-	if (list_empty(&ring->execlist_queue))
-		return;
-
-	/* Try to read in pairs */
-	list_for_each_entry_safe(cursor, tmp, &ring->execlist_queue,
-				 execlist_link) {
-		if (!req0) {
-			req0 = cursor;
-		} else if (req0->ctx == cursor->ctx) {
-			/* Same ctx: ignore first request, as second request
-			 * will update tail past first request's workload */
-			cursor->elsp_submitted = req0->elsp_submitted;
-			list_del(&req0->execlist_link);
-			list_add_tail(&req0->execlist_link,
-				&ring->execlist_retired_req_list);
-			req0 = cursor;
-		} else {
-			req1 = cursor;
-			break;
-		}
-	}
+	if (seqno == 0)
+		return false;
 
-	WARN_ON(req1 && req1->elsp_submitted);
+	do {
+		struct drm_i915_gem_request *rq;
 
-	execlists_submit_contexts(ring, req0->ctx, req0->tail,
-				  req1 ? req1->ctx : NULL,
-				  req1 ? req1->tail : 0);
+		rq = ring->execlist_port[0];
+		if (rq == NULL)
+			break;
 
-	req0->elsp_submitted++;
-	if (req1)
-		req1->elsp_submitted++;
-}
+		if (!i915_seqno_passed(seqno, rq->seqno))
+			break;
 
-static bool execlists_check_remove_request(struct intel_engine_cs *ring,
-					   u32 request_id)
-{
-	struct drm_i915_gem_request *head_req;
+		do {
+			struct drm_i915_gem_request *prev =
+				list_entry(rq->execlist_link.prev,
+					   typeof(*rq),
+					   execlist_link);
 
-	assert_spin_locked(&ring->execlist_lock);
+			list_move_tail(&rq->execlist_link,
+				       &ring->execlist_completed);
 
-	head_req = list_first_entry_or_null(&ring->execlist_queue,
-					    struct drm_i915_gem_request,
-					    execlist_link);
+			rq = prev;
+		} while (&rq->execlist_link != &ring->execlist_queue);
 
-	if (head_req != NULL) {
-		struct drm_i915_gem_object *ctx_obj =
-				head_req->ctx->engine[ring->id].state;
-		if (intel_execlists_ctx_id(ctx_obj) == request_id) {
-			WARN(head_req->elsp_submitted == 0,
-			     "Never submitted head request\n");
-
-			if (--head_req->elsp_submitted <= 0) {
-				list_del(&head_req->execlist_link);
-				list_add_tail(&head_req->execlist_link,
-					&ring->execlist_retired_req_list);
-				return true;
-			}
-		}
-	}
+		ring->execlist_port[0] = ring->execlist_port[1];
+		ring->execlist_port[1] = NULL;
+	} while (1);
 
-	return false;
+	return ring->execlist_port[1] == NULL;
 }
 
 /**
@@ -411,53 +373,34 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
  */
 void intel_lrc_irq_handler(struct intel_engine_cs *ring)
 {
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
-	u32 status_pointer;
-	u8 read_pointer;
-	u8 write_pointer;
-	u32 status;
-	u32 status_id;
-	u32 submit_contexts = 0;
-
-	status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
-
-	read_pointer = ring->next_context_status_buffer;
-	write_pointer = status_pointer & 0x07;
-	if (read_pointer > write_pointer)
-		write_pointer += 6;
-
-	spin_lock(&ring->execlist_lock);
-
-	while (read_pointer < write_pointer) {
-		read_pointer++;
-		status = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
-				(read_pointer % 6) * 8);
-		status_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
-				(read_pointer % 6) * 8 + 4);
-
-		if (status & GEN8_CTX_STATUS_PREEMPTED) {
-			if (status & GEN8_CTX_STATUS_LITE_RESTORE) {
-				if (execlists_check_remove_request(ring, status_id))
-					WARN(1, "Lite Restored request removed from queue\n");
-			} else
-				WARN(1, "Preemption without Lite Restore\n");
-		}
-
-		 if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) ||
-		     (status & GEN8_CTX_STATUS_ELEMENT_SWITCH)) {
-			if (execlists_check_remove_request(ring, status_id))
-				submit_contexts++;
+	struct drm_i915_private *dev_priv = to_i915(ring->dev);
+	u8 head, tail;
+	u32 seqno = 0;
+
+	head = ring->next_context_status_buffer;
+	tail = I915_READ(RING_CONTEXT_STATUS_PTR(ring)) & 0x7;
+	if (head > tail)
+		tail += 6;
+
+	while (head++ < tail) {
+		u32 reg = RING_CONTEXT_STATUS_BUF(ring) + (head % 6)*8;
+		u32 status = I915_READ(reg);
+		if (unlikely(status & GEN8_CTX_STATUS_PREEMPTED && 0)) {
+			DRM_ERROR("Pre-empted request %x %s Lite Restore\n",
+				  I915_READ(reg + 4),
+				  status & GEN8_CTX_STATUS_LITE_RESTORE ? "with" : "without");
 		}
+		if (status & (GEN8_CTX_STATUS_ACTIVE_IDLE |
+			      GEN8_CTX_STATUS_ELEMENT_SWITCH))
+			seqno = I915_READ(reg + 4);
 	}
 
-	if (submit_contexts != 0)
+	spin_lock(&ring->execlist_lock);
+	if (execlists_complete_requests(ring, seqno))
 		execlists_context_unqueue(ring);
-
 	spin_unlock(&ring->execlist_lock);
 
-	WARN(submit_contexts > 2, "More than two context complete events?\n");
-	ring->next_context_status_buffer = write_pointer % 6;
-
+	ring->next_context_status_buffer = tail % 6;
 	I915_WRITE(RING_CONTEXT_STATUS_PTR(ring),
 		   ((u32)ring->next_context_status_buffer & 0x07) << 8);
 }
@@ -467,9 +410,6 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 				   u32 tail,
 				   struct drm_i915_gem_request *request)
 {
-	struct drm_i915_gem_request *cursor;
-	int num_elements = 0;
-
 	if (WARN_ON(request == NULL))
 		return -ENODEV;
 
@@ -483,28 +423,8 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 
 	spin_lock_irq(&ring->execlist_lock);
 
-	list_for_each_entry(cursor, &ring->execlist_queue, execlist_link)
-		if (++num_elements > 2)
-			break;
-
-	if (num_elements > 2) {
-		struct drm_i915_gem_request *tail_req;
-
-		tail_req = list_last_entry(&ring->execlist_queue,
-					   struct drm_i915_gem_request,
-					   execlist_link);
-
-		if (to == tail_req->ctx) {
-			WARN(tail_req->elsp_submitted != 0,
-				"More than 2 already-submitted reqs queued\n");
-			list_del(&tail_req->execlist_link);
-			list_add_tail(&tail_req->execlist_link,
-				&ring->execlist_retired_req_list);
-		}
-	}
-
 	list_add_tail(&request->execlist_link, &ring->execlist_queue);
-	if (num_elements == 0)
+	if (ring->execlist_port[0] == NULL)
 		execlists_context_unqueue(ring);
 
 	spin_unlock_irq(&ring->execlist_lock);
@@ -693,11 +613,11 @@ void intel_execlists_retire_requests(struct intel_engine_cs *ring)
 	struct list_head list;
 
 	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
-	if (list_empty(&ring->execlist_retired_req_list))
+	if (list_empty(&ring->execlist_completed))
 		return;
 
 	spin_lock_irq(&ring->execlist_lock);
-	list_replace_init(&ring->execlist_retired_req_list, &list);
+	list_replace_init(&ring->execlist_completed, &list);
 	spin_unlock_irq(&ring->execlist_lock);
 
 	while (!list_empty(&list)) {
@@ -789,6 +709,11 @@ static int intel_lr_context_pin(struct intel_engine_cs *ring,
 	if (ret)
 		goto reset_pin_count;
 
+	if (WARN_ON(i915_gem_obj_ggtt_offset(ctx_obj) & 0xFFFFFFFF00000FFFULL)) {
+		ret = -ENODEV;
+		goto unpin_ctx_obj;
+	}
+
 	ret = intel_pin_and_map_ringbuffer_obj(ring->dev, ringbuf);
 	if (ret)
 		goto unpin_ctx_obj;
@@ -1326,7 +1251,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	init_waitqueue_head(&ring->irq_queue);
 
 	INIT_LIST_HEAD(&ring->execlist_queue);
-	INIT_LIST_HEAD(&ring->execlist_retired_req_list);
+	INIT_LIST_HEAD(&ring->execlist_completed);
 	spin_lock_init(&ring->execlist_lock);
 
 	ret = i915_cmd_parser_init_ring(ring);
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 1d24d4f963f1..03e69c8636b0 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -83,7 +83,6 @@ int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 			       struct list_head *vmas,
 			       struct drm_i915_gem_object *batch_obj,
 			       u64 exec_start, u32 dispatch_flags);
-u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
 
 void intel_lrc_irq_handler(struct intel_engine_cs *ring);
 void intel_execlists_retire_requests(struct intel_engine_cs *ring);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 2477cf3e3906..870a1d008db9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -239,8 +239,9 @@ struct  intel_engine_cs {
 
 	/* Execlists */
 	spinlock_t execlist_lock;
+	struct drm_i915_gem_request *execlist_port[2];
 	struct list_head execlist_queue;
-	struct list_head execlist_retired_req_list;
+	struct list_head execlist_completed;
 	u8 next_context_status_buffer;
 	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
 	int		(*emit_request)(struct intel_ringbuffer *ringbuf,
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 25/49] drm/i915: Move the execlists retirement to the right spot
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (23 preceding siblings ...)
  2015-03-27 11:01 ` [PATCH 24/49] drm/i915: Tidy execlist submission Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-03-27 11:01 ` [PATCH 26/49] drm/i915: Map the ringbuffer using WB on LLC machines Chris Wilson
                   ` (23 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx

We want to run the execlists retire-ring callback whilst we retire the
requests on a particular ring. Having done so, we know that the per-ring
request list is a superset of all outstanding requests and so we can simplify the
is-idle check.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index db4a53f248a2..5366162e4983 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2720,6 +2720,9 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 	if (list_empty(&ring->active_list))
 		return;
 
+	if (i915.enable_execlists)
+		intel_execlists_retire_requests(ring);
+
 	/* Retire requests first as we use it above for the early return.
 	 * If we retire requests last, we may use a later seqno and so clear
 	 * the requests lists without clearing the active list, leading to
@@ -2781,15 +2784,6 @@ i915_gem_retire_requests(struct drm_device *dev)
 	for_each_ring(ring, dev_priv, i) {
 		i915_gem_retire_requests_ring(ring);
 		idle &= list_empty(&ring->request_list);
-		if (i915.enable_execlists) {
-			unsigned long flags;
-
-			spin_lock_irqsave(&ring->execlist_lock, flags);
-			idle &= list_empty(&ring->execlist_queue);
-			spin_unlock_irqrestore(&ring->execlist_lock, flags);
-
-			intel_execlists_retire_requests(ring);
-		}
 	}
 
 	if (idle)
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 26/49] drm/i915: Map the ringbuffer using WB on LLC machines
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (24 preceding siblings ...)
  2015-03-27 11:01 ` [PATCH 25/49] drm/i915: Move the execlists retirement to the right spot Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-03-27 11:01 ` [PATCH 27/49] drm/i915: Use a separate slab for requests Chris Wilson
                   ` (22 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx

If we have LLC coherency, we can write directly into the ringbuffer
using ordinary cached (WB) writes rather than forcing WC access.
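
In outline, the mapping choice becomes (a sketch taken from the diff
below, with the error handling elided):

    /* Sketch: WB vmap of the object pages on LLC parts, WC ioremap otherwise. */
    if (HAS_LLC(dev_priv) && !obj->stolen)
            ringbuf->virtual_start = vmap_obj(obj);           /* cached, WB */
    else
            ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base +
                                                i915_gem_obj_ggtt_offset(obj),
                                                ringbuf->size); /* uncached, WC */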

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 60 +++++++++++++++++++++++++++------
 1 file changed, 49 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 7e3281de417c..2e5c39123d24 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1896,11 +1896,35 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
 
 void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
 {
-	iounmap(ringbuf->virtual_start);
+	if (HAS_LLC(ringbuf->obj->base.dev) && !ringbuf->obj->stolen)
+		vunmap(ringbuf->virtual_start);
+	else
+		iounmap(ringbuf->virtual_start);
 	ringbuf->virtual_start = NULL;
 	i915_gem_object_ggtt_unpin(ringbuf->obj);
 }
 
+static u32 *vmap_obj(struct drm_i915_gem_object *obj)
+{
+	struct sg_page_iter sg_iter;
+	struct page **pages;
+	void *addr;
+	int i;
+
+	pages = drm_malloc_ab(obj->base.size >> PAGE_SHIFT, sizeof(*pages));
+	if (pages == NULL)
+		return NULL;
+
+	i = 0;
+	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0)
+		pages[i++] = sg_page_iter_page(&sg_iter);
+
+	addr = vmap(pages, i, 0, PAGE_KERNEL);
+	drm_free_large(pages);
+
+	return addr;
+}
+
 int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
 				     struct intel_ringbuffer *ringbuf)
 {
@@ -1912,17 +1936,31 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
 	if (ret)
 		return ret;
 
-	ret = i915_gem_object_set_to_gtt_domain(obj, true);
-	if (ret) {
-		i915_gem_object_ggtt_unpin(obj);
-		return ret;
-	}
+	if (HAS_LLC(dev_priv) && !obj->stolen) {
+		ret = i915_gem_object_set_to_cpu_domain(obj, true);
+		if (ret) {
+			i915_gem_object_ggtt_unpin(obj);
+			return ret;
+		}
 
-	ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base +
-			i915_gem_obj_ggtt_offset(obj), ringbuf->size);
-	if (ringbuf->virtual_start == NULL) {
-		i915_gem_object_ggtt_unpin(obj);
-		return -EINVAL;
+		ringbuf->virtual_start = vmap_obj(obj);
+		if (ringbuf->virtual_start == NULL) {
+			i915_gem_object_ggtt_unpin(obj);
+			return -ENOMEM;
+		}
+	} else {
+		ret = i915_gem_object_set_to_gtt_domain(obj, true);
+		if (ret) {
+			i915_gem_object_ggtt_unpin(obj);
+			return ret;
+		}
+
+		ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base +
+						    i915_gem_obj_ggtt_offset(obj), ringbuf->size);
+		if (ringbuf->virtual_start == NULL) {
+			i915_gem_object_ggtt_unpin(obj);
+			return -EINVAL;
+		}
 	}
 
 	return 0;
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 27/49] drm/i915: Use a separate slab for requests
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (25 preceding siblings ...)
  2015-03-27 11:01 ` [PATCH 26/49] drm/i915: Map the ringbuffer using WB on LLC machines Chris Wilson
@ 2015-03-27 11:01 ` Chris Wilson
  2015-03-27 14:20   ` Daniel Vetter
  2015-03-27 11:02 ` [PATCH 28/49] drm/i915: Use the new rq->i915 field where appropriate Chris Wilson
                   ` (21 subsequent siblings)
  48 siblings, 1 reply; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:01 UTC (permalink / raw)
  To: intel-gfx

Requests are allocated even more frequently than objects and benefit
equally from having a dedicated slab.
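
The lifecycle follows the standard kmem_cache pattern; a minimal sketch
(error handling elided, names as in the diff below):

    /* Sketch: dedicated slab for requests, mirroring the object slab. */
    dev_priv->requests = kmem_cache_create("i915_gem_request",
                                           sizeof(struct drm_i915_gem_request),
                                           0, SLAB_HWCACHE_ALIGN, NULL);

    rq = kmem_cache_zalloc(dev_priv->requests, GFP_KERNEL);  /* allocate */
    /* ... */
    kmem_cache_free(rq->i915->requests, rq);                  /* free */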

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_dma.c         | 12 ++++++++----
 drivers/gpu/drm/i915/i915_drv.h         |  5 ++++-
 drivers/gpu/drm/i915/i915_gem.c         | 26 ++++++++++++++++++++++----
 drivers/gpu/drm/i915/intel_lrc.c        | 11 +++++------
 drivers/gpu/drm/i915/intel_ringbuffer.c |  9 ++++-----
 5 files changed, 43 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 8f5428b46a27..180b5d92b279 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1006,8 +1006,10 @@ out_regs:
 put_bridge:
 	pci_dev_put(dev_priv->bridge_dev);
 free_priv:
-	if (dev_priv->slab)
-		kmem_cache_destroy(dev_priv->slab);
+	if (dev_priv->requests)
+		kmem_cache_destroy(dev_priv->requests);
+	if (dev_priv->objects)
+		kmem_cache_destroy(dev_priv->objects);
 	kfree(dev_priv);
 	return ret;
 }
@@ -1090,8 +1092,10 @@ int i915_driver_unload(struct drm_device *dev)
 	if (dev_priv->regs != NULL)
 		pci_iounmap(dev->pdev, dev_priv->regs);
 
-	if (dev_priv->slab)
-		kmem_cache_destroy(dev_priv->slab);
+	if (dev_priv->requests)
+		kmem_cache_destroy(dev_priv->requests);
+	if (dev_priv->objects)
+		kmem_cache_destroy(dev_priv->objects);
 
 	pci_dev_put(dev_priv->bridge_dev);
 	kfree(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ee51540e169a..b728250d6550 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1556,7 +1556,8 @@ struct i915_virtual_gpu {
 
 struct drm_i915_private {
 	struct drm_device *dev;
-	struct kmem_cache *slab;
+	struct kmem_cache *objects;
+	struct kmem_cache *requests;
 
 	const struct intel_device_info info;
 
@@ -2052,6 +2053,7 @@ struct drm_i915_gem_request {
 	struct kref ref;
 
 	/** On Which ring this request was generated */
+	struct drm_i915_private *i915;
 	struct intel_engine_cs *ring;
 
 	/** GEM sequence number associated with this request. */
@@ -2118,6 +2120,7 @@ struct drm_i915_gem_request {
 	struct list_head execlist_link;
 };
 
+struct drm_i915_gem_request *i915_gem_request_alloc(struct drm_i915_private *i915);
 void i915_gem_request_free(struct kref *req_ref);
 
 static inline uint32_t
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5366162e4983..900cbe17c49a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -381,13 +381,13 @@ out:
 void *i915_gem_object_alloc(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	return kmem_cache_zalloc(dev_priv->slab, GFP_KERNEL);
+	return kmem_cache_zalloc(dev_priv->objects, GFP_KERNEL);
 }
 
 void i915_gem_object_free(struct drm_i915_gem_object *obj)
 {
 	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
-	kmem_cache_free(dev_priv->slab, obj);
+	kmem_cache_free(dev_priv->objects, obj);
 }
 
 static int
@@ -2567,6 +2567,19 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
 	}
 }
 
+struct drm_i915_gem_request *i915_gem_request_alloc(struct drm_i915_private *i915)
+{
+	struct drm_i915_gem_request *rq;
+
+	rq = kmem_cache_zalloc(i915->requests, GFP_KERNEL);
+	if (rq == NULL)
+		return ERR_PTR(-ENOMEM);
+
+	kref_init(&rq->ref);
+	rq->i915 = i915;
+	return rq;
+}
+
 void i915_gem_request_free(struct kref *req_ref)
 {
 	struct drm_i915_gem_request *req = container_of(req_ref,
@@ -2577,7 +2590,7 @@ void i915_gem_request_free(struct kref *req_ref)
 		i915_gem_context_unreference(ctx);
 	}
 
-	kfree(req);
+	kmem_cache_free(req->i915->requests, req);
 }
 
 struct drm_i915_gem_request *
@@ -5110,11 +5123,16 @@ i915_gem_load(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int i;
 
-	dev_priv->slab =
+	dev_priv->objects =
 		kmem_cache_create("i915_gem_object",
 				  sizeof(struct drm_i915_gem_object), 0,
 				  SLAB_HWCACHE_ALIGN,
 				  NULL);
+	dev_priv->requests =
+		kmem_cache_create("i915_gem_request",
+				  sizeof(struct drm_i915_gem_request), 0,
+				  SLAB_HWCACHE_ALIGN,
+				  NULL);
 
 	INIT_LIST_HEAD(&dev_priv->vm_list);
 	i915_init_vm(dev_priv, &dev_priv->gtt.base);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a013239f5e26..5e51ed5232e8 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -742,24 +742,23 @@ static int logical_ring_alloc_request(struct intel_engine_cs *ring,
 	if (ring->outstanding_lazy_request)
 		return 0;
 
-	request = kzalloc(sizeof(*request), GFP_KERNEL);
-	if (request == NULL)
-		return -ENOMEM;
+	request = i915_gem_request_alloc(dev_private);
+	if (IS_ERR(request))
+		return PTR_ERR(request);
 
 	ret = intel_lr_context_pin(ring, ctx);
 	if (ret) {
-		kfree(request);
+		i915_gem_request_free(&request->ref);
 		return ret;
 	}
 
-	kref_init(&request->ref);
 	request->ring = ring;
 	request->uniq = dev_private->request_uniq++;
 
 	ret = i915_gem_get_seqno(ring->dev, &request->seqno);
 	if (ret) {
 		intel_lr_context_unpin(ring, ctx);
-		kfree(request);
+		i915_gem_request_free(&request->ref);
 		return ret;
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 2e5c39123d24..f7097a80dea3 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2179,18 +2179,17 @@ intel_ring_alloc_request(struct intel_engine_cs *ring)
 	if (ring->outstanding_lazy_request)
 		return 0;
 
-	request = kzalloc(sizeof(*request), GFP_KERNEL);
-	if (request == NULL)
-		return -ENOMEM;
+	request = i915_gem_request_alloc(dev_private);
+	if (IS_ERR(request))
+		return PTR_ERR(request);
 
-	kref_init(&request->ref);
 	request->ring = ring;
 	request->ringbuf = ring->buffer;
 	request->uniq = dev_private->request_uniq++;
 
 	ret = i915_gem_get_seqno(ring->dev, &request->seqno);
 	if (ret) {
-		kfree(request);
+		i915_gem_request_free(&request->ref);
 		return ret;
 	}
 
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 28/49] drm/i915: Use the new rq->i915 field where appropriate
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (26 preceding siblings ...)
  2015-03-27 11:01 ` [PATCH 27/49] drm/i915: Use a separate slab for requests Chris Wilson
@ 2015-03-27 11:02 ` Chris Wilson
  2015-03-27 11:02 ` [PATCH 29/49] drm/i915: Reduce the pointer dance of i915_is_ggtt() Chris Wilson
                   ` (20 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:02 UTC (permalink / raw)
  To: intel-gfx

In a few cases, having a direct pointer to the drm_i915_private from the
request is useful.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 11 ++++-------
 drivers/gpu/drm/i915/intel_pm.c |  2 +-
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 900cbe17c49a..721213c7d9d0 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1227,8 +1227,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			struct drm_i915_file_private *file_priv)
 {
 	struct intel_engine_cs *ring = i915_gem_request_get_ring(req);
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_private *dev_priv = req->i915;
 	const bool irq_test_in_progress =
 		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_ring_flag(ring);
 	DEFINE_WAIT(wait);
@@ -1247,7 +1246,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	timeout_expire = timeout ?
 		jiffies + nsecs_to_jiffies_timeout((u64)*timeout) : 0;
 
-	if (INTEL_INFO(dev)->gen >= 6)
+	if (INTEL_INFO(dev_priv)->gen >= 6)
 		gen6_rps_boost(dev_priv, file_priv);
 
 	/* Record current time in case interrupted by signal, or wedged */
@@ -1397,18 +1396,16 @@ __i915_gem_request_retire__upto(struct drm_i915_gem_request *rq)
 int
 i915_wait_request(struct drm_i915_gem_request *req)
 {
-	struct drm_device *dev;
 	struct drm_i915_private *dev_priv;
 	bool interruptible;
 	int ret;
 
 	BUG_ON(req == NULL);
 
-	dev = req->ring->dev;
-	dev_priv = dev->dev_private;
+	dev_priv = req->i915;
 	interruptible = dev_priv->mm.interruptible;
 
-	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
+	BUG_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
 
 	ret = i915_gem_check_wedge(&dev_priv->gpu_error, interruptible);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index bcb86cdd1be5..0f1242c9d29b 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6789,7 +6789,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
 	struct request_boost *boost = container_of(work, struct request_boost, work);
 
 	if (!i915_gem_request_completed(boost->rq, true))
-		gen6_rps_boost(to_i915(boost->rq->ring->dev), NULL);
+		gen6_rps_boost(boost->rq->i915, NULL);
 
 	i915_gem_request_unreference__unlocked(boost->rq);
 	kfree(boost);
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 29/49] drm/i915: Reduce the pointer dance of i915_is_ggtt()
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (27 preceding siblings ...)
  2015-03-27 11:02 ` [PATCH 28/49] drm/i915: Use the new rq->i915 field where appropriate Chris Wilson
@ 2015-03-27 11:02 ` Chris Wilson
  2015-03-27 14:26   ` Daniel Vetter
  2015-03-27 11:02 ` [PATCH 30/49] drm/i915: Squash more pointer indirection for i915_is_gtt Chris Wilson
                   ` (19 subsequent siblings)
  48 siblings, 1 reply; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:02 UTC (permalink / raw)
  To: intel-gfx

The multiple levels of indirection do nothing but hinder the compiler,
and the pointer chasing turns out to be quite painful to follow but
painless to fix.
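
The effect is simply to trade two pointer dereferences for a flag
cached on the address space at creation time; the helper reduces to
(as in the diff below):

    static inline bool i915_is_ggtt(struct i915_address_space *vm)
    {
            return vm->is_ggtt;    /* set once in i915_gem_gtt_init() */
    }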

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h     | 4 +---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 1 +
 drivers/gpu/drm/i915/i915_gem_gtt.h | 2 ++
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b728250d6550..209c9b612509 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2880,9 +2880,7 @@ bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj);
 	(&((struct drm_i915_private *)(obj)->base.dev->dev_private)->gtt.base)
 static inline bool i915_is_ggtt(struct i915_address_space *vm)
 {
-	struct i915_address_space *ggtt =
-		&((struct drm_i915_private *)(vm)->dev->dev_private)->gtt.base;
-	return vm == ggtt;
+	return vm->is_ggtt;
 }
 
 static inline struct i915_hw_ppgtt *
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e33b1214c4d8..68c1f49f2864 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2504,6 +2504,7 @@ int i915_gem_gtt_init(struct drm_device *dev)
 		return ret;
 
 	gtt->base.dev = dev;
+	gtt->base.is_ggtt = true;
 
 	/* GMADR is the PCI mmio aperture into the global GTT. */
 	DRM_INFO("Memory usable by graphics device = %zdM\n",
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 3f0ad9f25441..20398a18a8a6 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -234,6 +234,8 @@ struct i915_address_space {
 	unsigned long start;		/* Start offset always 0 for dri2 */
 	size_t total;		/* size addr space maps (ex. 2GB for ggtt) */
 
+	bool is_ggtt;
+
 	struct {
 		dma_addr_t addr;
 		struct page *page;
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 30/49] drm/i915: Squash more pointer indirection for i915_is_gtt
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (28 preceding siblings ...)
  2015-03-27 11:02 ` [PATCH 29/49] drm/i915: Reduce the pointer dance of i915_is_ggtt() Chris Wilson
@ 2015-03-27 11:02 ` Chris Wilson
  2015-03-27 11:02 ` [PATCH 31/49] drm/i915: Reduce locking in execlist command submission Chris Wilson
                   ` (18 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:02 UTC (permalink / raw)
  To: intel-gfx

12:58 < jlahtine> there're actually equally many i915_is_ggtt(vma->vm)
calls
12:58 < jlahtine> (one less)
12:59 < jlahtine> so while at it I'd make it vm->is_ggtt and
vma->is_ggtt
12:59 < jlahtine> then get rid of the whole helper, maybe
13:00 < ickle> you preempted my beautiful macro
13:03 < ickle> just don't complain about the increased churn

* to be squashed into the previous patch if desired
---
 drivers/gpu/drm/i915/i915_debugfs.c        |  4 ++--
 drivers/gpu/drm/i915/i915_drv.h            |  7 +------
 drivers/gpu/drm/i915/i915_gem.c            | 32 ++++++++++++++----------------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  5 ++---
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 21 ++++++++++----------
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  1 +
 drivers/gpu/drm/i915/i915_gpu_error.c      |  2 +-
 drivers/gpu/drm/i915/i915_trace.h          | 18 ++++++-----------
 8 files changed, 39 insertions(+), 51 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 21e2d67d3e23..fc0d1c8aa117 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -156,7 +156,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 	if (obj->fence_reg != I915_FENCE_REG_NONE)
 		seq_printf(m, " (fence: %d)", obj->fence_reg);
 	list_for_each_entry(vma, &obj->vma_list, vma_link) {
-		if (!i915_is_ggtt(vma->vm))
+		if (!vma->is_ggtt)
 			seq_puts(m, " (pp");
 		else
 			seq_puts(m, " (g");
@@ -335,7 +335,7 @@ static int per_file_stats(int id, void *ptr, void *data)
 			if (!drm_mm_node_allocated(&vma->node))
 				continue;
 
-			if (i915_is_ggtt(vma->vm)) {
+			if (vma->is_ggtt) {
 				stats->global += obj->base.size;
 				continue;
 			}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 209c9b612509..0c6e4356fa06 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2878,16 +2878,11 @@ bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj);
 /* Some GGTT VM helpers */
 #define i915_obj_to_ggtt(obj) \
 	(&((struct drm_i915_private *)(obj)->base.dev->dev_private)->gtt.base)
-static inline bool i915_is_ggtt(struct i915_address_space *vm)
-{
-	return vm->is_ggtt;
-}
 
 static inline struct i915_hw_ppgtt *
 i915_vm_to_ppgtt(struct i915_address_space *vm)
 {
-	WARN_ON(i915_is_ggtt(vm));
-
+	WARN_ON(vm->is_ggtt);
 	return container_of(vm, struct i915_hw_ppgtt, base);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 721213c7d9d0..91b0c1db05ca 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3118,8 +3118,7 @@ int i915_vma_unbind(struct i915_vma *vma)
 	 * cause memory corruption through use-after-free.
 	 */
 
-	if (i915_is_ggtt(vma->vm) &&
-	    vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL) {
+	if (vma->is_ggtt && vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL) {
 		i915_gem_object_finish_gtt(obj);
 
 		/* release the fence reg _after_ flushing */
@@ -3133,7 +3132,7 @@ int i915_vma_unbind(struct i915_vma *vma)
 	vma->unbind_vma(vma);
 
 	list_del_init(&vma->mm_list);
-	if (i915_is_ggtt(vma->vm)) {
+	if (vma->is_ggtt) {
 		if (vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL) {
 			obj->map_and_fenceable = false;
 		} else if (vma->ggtt_view.pages) {
@@ -3576,7 +3575,7 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 	struct i915_vma *vma;
 	int ret;
 
-	if(WARN_ON(i915_is_ggtt(vm) != !!ggtt_view))
+	if (WARN_ON(vm->is_ggtt != !!ggtt_view))
 		return ERR_PTR(-EINVAL);
 
 	fence_size = i915_gem_get_gtt_size(dev,
@@ -3674,8 +3673,7 @@ search_free:
 
 	/*  allocate before insert / bind */
 	if (vma->vm->allocate_va_range) {
-		trace_i915_va_alloc(vma->vm, vma->node.start, vma->node.size,
-				VM_TO_TRACE_NAME(vma->vm));
+		trace_i915_va_alloc(vma->vm, vma->node.start, vma->node.size);
 		ret = vma->vm->allocate_va_range(vma->vm,
 						vma->node.start,
 						vma->node.size);
@@ -4278,13 +4276,13 @@ i915_gem_object_do_pin(struct drm_i915_gem_object *obj,
 	if (WARN_ON(vm == &dev_priv->mm.aliasing_ppgtt->base))
 		return -ENODEV;
 
-	if (WARN_ON(flags & (PIN_GLOBAL | PIN_MAPPABLE) && !i915_is_ggtt(vm)))
+	if (WARN_ON(flags & (PIN_GLOBAL | PIN_MAPPABLE) && !vm->is_ggtt))
 		return -EINVAL;
 
 	if (WARN_ON((flags & (PIN_MAPPABLE | PIN_GLOBAL)) == PIN_MAPPABLE))
 		return -EINVAL;
 
-	if (WARN_ON(i915_is_ggtt(vm) != !!ggtt_view))
+	if (WARN_ON(vm->is_ggtt != !!ggtt_view))
 		return -EINVAL;
 
 	vma = ggtt_view ? i915_gem_obj_to_ggtt_view(obj, ggtt_view) :
@@ -4374,7 +4372,7 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj,
 		    uint64_t flags)
 {
 	return i915_gem_object_do_pin(obj, vm,
-				      i915_is_ggtt(vm) ? &i915_ggtt_view_normal : NULL,
+				      vm->is_ggtt ? &i915_ggtt_view_normal : NULL,
 				      size, alignment, flags);
 }
 
@@ -4706,7 +4704,7 @@ struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
 {
 	struct i915_vma *vma;
 	list_for_each_entry(vma, &obj->vma_list, vma_link) {
-		if (i915_is_ggtt(vma->vm) &&
+		if (vma->is_ggtt &&
 		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
 			continue;
 		if (vma->vm == vm)
@@ -4741,7 +4739,7 @@ void i915_gem_vma_destroy(struct i915_vma *vma)
 
 	vm = vma->vm;
 
-	if (!i915_is_ggtt(vm))
+	if (!vm->is_ggtt)
 		i915_ppgtt_put(i915_vm_to_ppgtt(vm));
 
 	list_del(&vma->vma_link);
@@ -5105,7 +5103,7 @@ init_ring_lists(struct intel_engine_cs *ring)
 void i915_init_vm(struct drm_i915_private *dev_priv,
 		  struct i915_address_space *vm)
 {
-	if (!i915_is_ggtt(vm))
+	if (!vm->is_ggtt)
 		drm_mm_init(&vm->mm, vm->start, vm->total);
 	vm->dev = dev_priv->dev;
 	INIT_LIST_HEAD(&vm->active_list);
@@ -5265,7 +5263,7 @@ i915_gem_obj_offset(struct drm_i915_gem_object *o,
 	WARN_ON(vm == &dev_priv->mm.aliasing_ppgtt->base);
 
 	list_for_each_entry(vma, &o->vma_list, vma_link) {
-		if (i915_is_ggtt(vma->vm) &&
+		if (vma->is_ggtt &&
 		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
 			continue;
 		if (vma->vm == vm)
@@ -5273,7 +5271,7 @@ i915_gem_obj_offset(struct drm_i915_gem_object *o,
 	}
 
 	WARN(1, "%s vma for this object not found.\n",
-	     i915_is_ggtt(vm) ? "global" : "ppgtt");
+	     vm->is_ggtt ? "global" : "ppgtt");
 	return -1;
 }
 
@@ -5298,7 +5296,7 @@ bool i915_gem_obj_bound(struct drm_i915_gem_object *o,
 	struct i915_vma *vma;
 
 	list_for_each_entry(vma, &o->vma_list, vma_link) {
-		if (i915_is_ggtt(vma->vm) &&
+		if (vma->is_ggtt &&
 		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
 			continue;
 		if (vma->vm == vm && drm_mm_node_allocated(&vma->node))
@@ -5345,7 +5343,7 @@ unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o,
 	BUG_ON(list_empty(&o->vma_list));
 
 	list_for_each_entry(vma, &o->vma_list, vma_link) {
-		if (i915_is_ggtt(vma->vm) &&
+		if (vma->is_ggtt &&
 		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
 			continue;
 		if (vma->vm == vm)
@@ -5358,7 +5356,7 @@ bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj)
 {
 	struct i915_vma *vma;
 	list_for_each_entry(vma, &obj->vma_list, vma_link) {
-		if (i915_is_ggtt(vma->vm) &&
+		if (vma->is_ggtt &&
 		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
 			continue;
 		if (vma->pin_count > 0)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index c30334435e8e..9345436e4d95 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -642,7 +642,7 @@ need_reloc_mappable(struct i915_vma *vma)
 	if (entry->relocation_count == 0)
 		return false;
 
-	if (!i915_is_ggtt(vma->vm))
+	if (!vma->is_ggtt)
 		return false;
 
 	/* See also use_cpu_reloc() */
@@ -661,8 +661,7 @@ eb_vma_misplaced(struct i915_vma *vma)
 	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
 	struct drm_i915_gem_object *obj = vma->obj;
 
-	WARN_ON(entry->flags & __EXEC_OBJECT_NEEDS_MAP &&
-	       !i915_is_ggtt(vma->vm));
+	WARN_ON(entry->flags & __EXEC_OBJECT_NEEDS_MAP && !vma->is_ggtt);
 
 	if (entry->alignment &&
 	    vma->node.start & (entry->alignment - 1))
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 68c1f49f2864..543fff104401 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1706,7 +1706,7 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 					container_of(vm, struct i915_hw_ppgtt,
 						     base);
 
-			if (i915_is_ggtt(vm))
+			if (vm->is_ggtt)
 				ppgtt = dev_priv->mm.aliasing_ppgtt;
 
 			gen6_write_page_range(dev_priv, &ppgtt->pd,
@@ -1884,7 +1884,7 @@ static void i915_ggtt_bind_vma(struct i915_vma *vma,
 	unsigned int flags = (cache_level == I915_CACHE_NONE) ?
 		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
 
-	BUG_ON(!i915_is_ggtt(vma->vm));
+	BUG_ON(!vma->is_ggtt);
 	intel_gtt_insert_sg_entries(vma->ggtt_view.pages, entry, flags);
 	vma->bound = GLOBAL_BIND;
 }
@@ -1904,7 +1904,7 @@ static void i915_ggtt_unbind_vma(struct i915_vma *vma)
 	const unsigned int first = vma->node.start >> PAGE_SHIFT;
 	const unsigned int size = vma->obj->base.size >> PAGE_SHIFT;
 
-	BUG_ON(!i915_is_ggtt(vma->vm));
+	BUG_ON(!vma->is_ggtt);
 	vma->bound = 0;
 	intel_gtt_clear_range(first, size);
 }
@@ -1922,7 +1922,7 @@ static void ggtt_bind_vma(struct i915_vma *vma,
 	if (obj->gt_ro)
 		flags |= PTE_READ_ONLY;
 
-	if (i915_is_ggtt(vma->vm))
+	if (vma->is_ggtt)
 		pages = vma->ggtt_view.pages;
 
 	/* If there is no aliasing PPGTT, or the caller needs a global mapping,
@@ -2534,7 +2534,7 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 {
 	struct i915_vma *vma;
 
-	if (WARN_ON(i915_is_ggtt(vm) != !!ggtt_view))
+	if (WARN_ON(vm->is_ggtt != !!ggtt_view))
 		return ERR_PTR(-EINVAL);
 	vma = kzalloc(sizeof(*vma), GFP_KERNEL);
 	if (vma == NULL)
@@ -2545,9 +2545,10 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 	INIT_LIST_HEAD(&vma->exec_list);
 	vma->vm = vm;
 	vma->obj = obj;
+	vma->is_ggtt = vm->is_ggtt;
 
 	if (INTEL_INFO(vm->dev)->gen >= 6) {
-		if (i915_is_ggtt(vm)) {
+		if (vm->is_ggtt) {
 			vma->ggtt_view = *ggtt_view;
 
 			vma->unbind_vma = ggtt_unbind_vma;
@@ -2557,14 +2558,14 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 			vma->bind_vma = ppgtt_bind_vma;
 		}
 	} else {
-		BUG_ON(!i915_is_ggtt(vm));
+		BUG_ON(!vm->is_ggtt);
 		vma->ggtt_view = *ggtt_view;
 		vma->unbind_vma = i915_ggtt_unbind_vma;
 		vma->bind_vma = i915_ggtt_bind_vma;
 	}
 
 	list_add_tail(&vma->vma_link, &obj->vma_list);
-	if (!i915_is_ggtt(vm))
+	if (!vm->is_ggtt)
 		i915_ppgtt_get(i915_vm_to_ppgtt(vm));
 
 	return vma;
@@ -2579,7 +2580,7 @@ i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
 	vma = i915_gem_obj_to_vma(obj, vm);
 	if (!vma)
 		vma = __i915_gem_vma_create(obj, vm,
-					    i915_is_ggtt(vm) ? &i915_ggtt_view_normal : NULL);
+					    vm->is_ggtt ? &i915_ggtt_view_normal : NULL);
 
 	return vma;
 }
@@ -2750,7 +2751,7 @@ i915_get_ggtt_vma_pages(struct i915_vma *vma)
 int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
 		  u32 flags)
 {
-	if (i915_is_ggtt(vma->vm)) {
+	if (vma->is_ggtt) {
 		int ret = i915_get_ggtt_vma_pages(vma);
 
 		if (ret)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 20398a18a8a6..dafb3b0da466 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -159,6 +159,7 @@ struct i915_vma {
 #define LOCAL_BIND	(1<<1)
 #define PTE_READ_ONLY	(1<<2)
 	unsigned int bound : 4;
+	unsigned is_ggtt : 1;
 
 	/**
 	 * Support different GGTT views into the same object.
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 17dc2fcaba10..8832f1b2a495 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -606,7 +606,7 @@ i915_error_object_create(struct drm_i915_private *dev_priv,
 		dst->gtt_offset = -1;
 
 	reloc_offset = dst->gtt_offset;
-	if (i915_is_ggtt(vm))
+	if (vm->is_ggtt)
 		vma = i915_gem_obj_to_ggtt(src);
 	use_ggtt = (src->cache_level == I915_CACHE_NONE &&
 		   vma && (vma->bound & GLOBAL_BIND) &&
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 97483e21c9b4..ce8ee9e8bced 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -156,35 +156,29 @@ TRACE_EVENT(i915_vma_unbind,
 		      __entry->obj, __entry->offset, __entry->size, __entry->vm)
 );
 
-#define VM_TO_TRACE_NAME(vm) \
-	(i915_is_ggtt(vm) ? "G" : \
-		      "P")
-
 DECLARE_EVENT_CLASS(i915_va,
-	TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
-	TP_ARGS(vm, start, length, name),
+	TP_PROTO(struct i915_address_space *vm, u64 start, u64 length),
+	TP_ARGS(vm, start, length),
 
 	TP_STRUCT__entry(
 		__field(struct i915_address_space *, vm)
 		__field(u64, start)
 		__field(u64, end)
-		__string(name, name)
 	),
 
 	TP_fast_assign(
 		__entry->vm = vm;
 		__entry->start = start;
 		__entry->end = start + length - 1;
-		__assign_str(name, name);
 	),
 
-	TP_printk("vm=%p (%s), 0x%llx-0x%llx",
-		  __entry->vm, __get_str(name),  __entry->start, __entry->end)
+	TP_printk("vm=%p (%c), 0x%llx-0x%llx",
+		  __entry->vm, __entry->vm->is_ggtt ? 'G' : 'P',  __entry->start, __entry->end)
 );
 
 DEFINE_EVENT(i915_va, i915_va_alloc,
-	     TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
-	     TP_ARGS(vm, start, length, name)
+	     TP_PROTO(struct i915_address_space *vm, u64 start, u64 length),
+	     TP_ARGS(vm, start, length)
 );
 
 DECLARE_EVENT_CLASS(i915_page_table_entry,
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 31/49] drm/i915: Reduce locking in execlist command submission
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (29 preceding siblings ...)
  2015-03-27 11:02 ` [PATCH 30/49] drm/i915: Squash more pointer indirection for i915_is_gtt Chris Wilson
@ 2015-03-27 11:02 ` Chris Wilson
  2015-03-27 11:40   ` Tvrtko Ursulin
  2015-03-27 11:02 ` [PATCH 32/49] drm/i915: Reduce more " Chris Wilson
                   ` (17 subsequent siblings)
  48 siblings, 1 reply; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:02 UTC (permalink / raw)
  To: intel-gfx

This eliminates six needless spin lock/unlock pairs when writing out
ELSP.

v2: Respin with my preferred colour.
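
Usage-wise the new helpers bracket a burst of raw accessors; a sketch
of the intended pattern as introduced by this patch:

    /* Sketch: hold uncore.lock + forcewake across the whole ELSP write. */
    intel_uncore_forcewake_irqlock(dev_priv, FORCEWAKE_ALL);
    I915_WRITE_FW(RING_ELSP(ring), desc[1]);
    I915_WRITE_FW(RING_ELSP(ring), desc[0]);
    I915_WRITE_FW(RING_ELSP(ring), desc[3]);
    I915_WRITE_FW(RING_ELSP(ring), desc[2]);    /* triggers the context load */
    POSTING_READ_FW(RING_EXECLIST_STATUS(ring));
    intel_uncore_forcewake_irqunlock(dev_priv, FORCEWAKE_ALL);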

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [v2]
---
 drivers/gpu/drm/i915/i915_drv.h     | 18 ++++++++
 drivers/gpu/drm/i915/intel_lrc.c    | 14 +++---
 drivers/gpu/drm/i915/intel_uncore.c | 86 ++++++++++++++++++++++++++-----------
 3 files changed, 86 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0c6e4356fa06..4b51169c37ea 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2540,6 +2540,13 @@ void intel_uncore_forcewake_get(struct drm_i915_private *dev_priv,
 				enum forcewake_domains domains);
 void intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
 				enum forcewake_domains domains);
+/* Like above but take and hold the uncore lock for the duration.
+ * Must be used with I915_READ_FW and friends.
+ */
+void intel_uncore_forcewake_irqlock(struct drm_i915_private *dev_priv,
+				enum forcewake_domains domains);
+void intel_uncore_forcewake_irqunlock(struct drm_i915_private *dev_priv,
+				   enum forcewake_domains domains);
 void assert_forcewakes_inactive(struct drm_i915_private *dev_priv);
 static inline bool intel_vgpu_active(struct drm_device *dev)
 {
@@ -3232,6 +3239,17 @@ int intel_freq_opcode(struct drm_i915_private *dev_priv, int val);
 #define POSTING_READ(reg)	(void)I915_READ_NOTRACE(reg)
 #define POSTING_READ16(reg)	(void)I915_READ16_NOTRACE(reg)
 
+/* These are untraced mmio-accessors that are only valid to be used inside
+ * critical sections inside IRQ handlers where forcewake is explicitly
+ * controlled.
+ * Think twice, and think again, before using these.
+ * Note: Should only be used between intel_uncore_forcewake_irqlock() and
+ * intel_uncore_forcewake_irqunlock().
+ */
+#define I915_READ_FW(reg__) readl(dev_priv->regs + (reg__))
+#define I915_WRITE_FW(reg__, val__) writel(val__, dev_priv->regs + (reg__))
+#define POSTING_READ_FW(reg__) (void)I915_READ_FW(reg__)
+
 /* "Broadcast RGB" property */
 #define INTEL_BROADCAST_RGB_AUTO 0
 #define INTEL_BROADCAST_RGB_FULL 1
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 5e51ed5232e8..454bb7df27fe 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -278,17 +278,17 @@ static void execlists_submit_pair(struct intel_engine_cs *ring)
 	desc[3] = ring->execlist_port[0]->seqno;
 
 	/* Note: You must always write both descriptors in the order below. */
-	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
-	I915_WRITE(RING_ELSP(ring), desc[1]);
-	I915_WRITE(RING_ELSP(ring), desc[0]);
-	I915_WRITE(RING_ELSP(ring), desc[3]);
+	intel_uncore_forcewake_irqlock(dev_priv, FORCEWAKE_ALL);
+	I915_WRITE_FW(RING_ELSP(ring), desc[1]);
+	I915_WRITE_FW(RING_ELSP(ring), desc[0]);
+	I915_WRITE_FW(RING_ELSP(ring), desc[3]);
 
 	/* The context is automatically loaded after the following */
-	I915_WRITE(RING_ELSP(ring), desc[2]);
+	I915_WRITE_FW(RING_ELSP(ring), desc[2]);
 
 	/* ELSP is a wo register, use another nearby reg for posting instead */
-	POSTING_READ(RING_EXECLIST_STATUS(ring));
-	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+	POSTING_READ_FW(RING_EXECLIST_STATUS(ring));
+	intel_uncore_forcewake_irqunlock(dev_priv, FORCEWAKE_ALL);
 }
 
 static void execlists_context_unqueue(struct intel_engine_cs *ring)
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 0e32bbbcada8..a063f7d9f31b 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -399,6 +399,26 @@ void intel_uncore_sanitize(struct drm_device *dev)
 	intel_disable_gt_powersave(dev);
 }
 
+static void __intel_uncore_forcewake_get(struct drm_i915_private *dev_priv,
+					 enum forcewake_domains fw_domains)
+{
+	struct intel_uncore_forcewake_domain *domain;
+	enum forcewake_domain_id id;
+
+	if (!dev_priv->uncore.funcs.force_wake_get)
+		return;
+
+	fw_domains &= dev_priv->uncore.fw_domains;
+
+	for_each_fw_domain_mask(domain, fw_domains, dev_priv, id) {
+		if (domain->wake_count++)
+			fw_domains &= ~(1 << id);
+	}
+
+	if (fw_domains)
+		dev_priv->uncore.funcs.force_wake_get(dev_priv, fw_domains);
+}
+
 /**
  * intel_uncore_forcewake_get - grab forcewake domain references
  * @dev_priv: i915 device instance
@@ -416,41 +436,30 @@ void intel_uncore_forcewake_get(struct drm_i915_private *dev_priv,
 				enum forcewake_domains fw_domains)
 {
 	unsigned long irqflags;
-	struct intel_uncore_forcewake_domain *domain;
-	enum forcewake_domain_id id;
 
 	if (!dev_priv->uncore.funcs.force_wake_get)
 		return;
 
 	WARN_ON(dev_priv->pm.suspended);
 
-	fw_domains &= dev_priv->uncore.fw_domains;
-
 	spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
+	__intel_uncore_forcewake_get(dev_priv, fw_domains);
+	spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
+}
 
-	for_each_fw_domain_mask(domain, fw_domains, dev_priv, id) {
-		if (domain->wake_count++)
-			fw_domains &= ~(1 << id);
-	}
-
-	if (fw_domains)
-		dev_priv->uncore.funcs.force_wake_get(dev_priv, fw_domains);
+void intel_uncore_forcewake_irqlock(struct drm_i915_private *dev_priv,
+				    enum forcewake_domains fw_domains)
+{
+	if (!dev_priv->uncore.funcs.force_wake_get)
+		return;
 
-	spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
+	spin_lock(&dev_priv->uncore.lock);
+	__intel_uncore_forcewake_get(dev_priv, fw_domains);
 }
 
-/**
- * intel_uncore_forcewake_put - release a forcewake domain reference
- * @dev_priv: i915 device instance
- * @fw_domains: forcewake domains to put references
- *
- * This function drops the device-level forcewakes for specified
- * domains obtained by intel_uncore_forcewake_get().
- */
-void intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
-				enum forcewake_domains fw_domains)
+static void __intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
+					 enum forcewake_domains fw_domains)
 {
-	unsigned long irqflags;
 	struct intel_uncore_forcewake_domain *domain;
 	enum forcewake_domain_id id;
 
@@ -459,8 +468,6 @@ void intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
 
 	fw_domains &= dev_priv->uncore.fw_domains;
 
-	spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
-
 	for_each_fw_domain_mask(domain, fw_domains, dev_priv, id) {
 		if (WARN_ON(domain->wake_count == 0))
 			continue;
@@ -471,10 +478,39 @@ void intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
 		domain->wake_count++;
 		fw_domain_arm_timer(domain);
 	}
+}
+
+/**
+ * intel_uncore_forcewake_put - release a forcewake domain reference
+ * @dev_priv: i915 device instance
+ * @fw_domains: forcewake domains to put references
+ *
+ * This function drops the device-level forcewakes for specified
+ * domains obtained by intel_uncore_forcewake_get().
+ */
+void intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
+				enum forcewake_domains fw_domains)
+{
+	unsigned long irqflags;
+
+	if (!dev_priv->uncore.funcs.force_wake_put)
+		return;
 
+	spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
+	__intel_uncore_forcewake_put(dev_priv, fw_domains);
 	spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
 }
 
+void intel_uncore_forcewake_irqunlock(struct drm_i915_private *dev_priv,
+				      enum forcewake_domains fw_domains)
+{
+	if (!dev_priv->uncore.funcs.force_wake_put)
+		return;
+
+	__intel_uncore_forcewake_put(dev_priv, fw_domains);
+	spin_unlock(&dev_priv->uncore.lock);
+}
+
 void assert_forcewakes_inactive(struct drm_i915_private *dev_priv)
 {
 	struct intel_uncore_forcewake_domain *domain;
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 32/49] drm/i915: Reduce more locking in execlist command submission
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (30 preceding siblings ...)
  2015-03-27 11:02 ` [PATCH 31/49] drm/i915: Reduce locking in execlist command submission Chris Wilson
@ 2015-03-27 11:02 ` Chris Wilson
  2015-03-27 11:02 ` [PATCH 33/49] drm/i915: Reduce locking in gen8 IRQ handler Chris Wilson
                   ` (16 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:02 UTC (permalink / raw)
  To: intel-gfx

Slightly more extravagant than the previous patch is to use the
I915_READ_FW() accessors for all of the bounded register reads in
intel_lrc_irq_handler() - for even more spinlock reduction.
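
In sketch form, the handler now takes the execlist lock and forcewake
once and uses the raw accessors throughout (details as in the diff
below):

    /* Sketch: one lock/forcewake section around all CSB accesses. */
    spin_lock(&ring->execlist_lock);
    intel_uncore_forcewake_irqlock(dev_priv, FORCEWAKE_ALL);
    /* ... I915_READ_FW(RING_CONTEXT_STATUS_PTR/BUF), unqueue ... */
    intel_uncore_forcewake_irqunlock(dev_priv, FORCEWAKE_ALL);
    spin_unlock(&ring->execlist_lock);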

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 454bb7df27fe..1c768c05e52e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -278,17 +278,12 @@ static void execlists_submit_pair(struct intel_engine_cs *ring)
 	desc[3] = ring->execlist_port[0]->seqno;
 
 	/* Note: You must always write both descriptors in the order below. */
-	intel_uncore_forcewake_irqlock(dev_priv, FORCEWAKE_ALL);
 	I915_WRITE_FW(RING_ELSP(ring), desc[1]);
 	I915_WRITE_FW(RING_ELSP(ring), desc[0]);
 	I915_WRITE_FW(RING_ELSP(ring), desc[3]);
 
 	/* The context is automatically loaded after the following */
 	I915_WRITE_FW(RING_ELSP(ring), desc[2]);
-
-	/* ELSP is a wo register, use another nearby reg for posting instead */
-	POSTING_READ_FW(RING_EXECLIST_STATUS(ring));
-	intel_uncore_forcewake_irqunlock(dev_priv, FORCEWAKE_ALL);
 }
 
 static void execlists_context_unqueue(struct intel_engine_cs *ring)
@@ -377,32 +372,36 @@ void intel_lrc_irq_handler(struct intel_engine_cs *ring)
 	u8 head, tail;
 	u32 seqno = 0;
 
+	spin_lock(&ring->execlist_lock);
+	intel_uncore_forcewake_irqlock(dev_priv, FORCEWAKE_ALL);
+
 	head = ring->next_context_status_buffer;
-	tail = I915_READ(RING_CONTEXT_STATUS_PTR(ring)) & 0x7;
+	tail = I915_READ_FW(RING_CONTEXT_STATUS_PTR(ring)) & 0x7;
 	if (head > tail)
 		tail += 6;
 
 	while (head++ < tail) {
 		u32 reg = RING_CONTEXT_STATUS_BUF(ring) + (head % 6)*8;
-		u32 status = I915_READ(reg);
+		u32 status = I915_READ_FW(reg);
 		if (unlikely(status & GEN8_CTX_STATUS_PREEMPTED && 0)) {
 			DRM_ERROR("Pre-empted request %x %s Lite Restore\n",
-				  I915_READ(reg + 4),
+				  I915_READ_FW(reg + 4),
 				  status & GEN8_CTX_STATUS_LITE_RESTORE ? "with" : "without");
 		}
 		if (status & (GEN8_CTX_STATUS_ACTIVE_IDLE |
 			      GEN8_CTX_STATUS_ELEMENT_SWITCH))
-			seqno = I915_READ(reg + 4);
+			seqno = I915_READ_FW(reg + 4);
 	}
 
-	spin_lock(&ring->execlist_lock);
 	if (execlists_complete_requests(ring, seqno))
 		execlists_context_unqueue(ring);
-	spin_unlock(&ring->execlist_lock);
 
 	ring->next_context_status_buffer = tail % 6;
-	I915_WRITE(RING_CONTEXT_STATUS_PTR(ring),
-		   ((u32)ring->next_context_status_buffer & 0x07) << 8);
+	I915_WRITE_FW(RING_CONTEXT_STATUS_PTR(ring),
+		      ((u32)ring->next_context_status_buffer & 0x07) << 8);
+
+	intel_uncore_forcewake_irqunlock(dev_priv, FORCEWAKE_ALL);
+	spin_unlock(&ring->execlist_lock);
 }
 
 static int execlists_context_queue(struct intel_engine_cs *ring,
@@ -424,8 +423,13 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 	spin_lock_irq(&ring->execlist_lock);
 
 	list_add_tail(&request->execlist_link, &ring->execlist_queue);
-	if (ring->execlist_port[0] == NULL)
+	if (ring->execlist_port[0] == NULL) {
+		intel_uncore_forcewake_irqlock(to_i915(ring->dev),
+					       FORCEWAKE_ALL);
 		execlists_context_unqueue(ring);
+		intel_uncore_forcewake_irqunlock(to_i915(ring->dev),
+						 FORCEWAKE_ALL);
+	}
 
 	spin_unlock_irq(&ring->execlist_lock);
 
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 33/49] drm/i915: Reduce locking in gen8 IRQ handler
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (31 preceding siblings ...)
  2015-03-27 11:02 ` [PATCH 32/49] drm/i915: Reduce more " Chris Wilson
@ 2015-03-27 11:02 ` Chris Wilson
  2015-03-27 14:13   ` Daniel Vetter
  2015-03-27 11:02 ` [PATCH 34/49] drm/i915: Tidy " Chris Wilson
                   ` (15 subsequent siblings)
  48 siblings, 1 reply; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:02 UTC (permalink / raw)
  To: intel-gfx

In a similar vein to reducing the number of unneeded spinlocks used for
execlist command submission (where forcewake is required but manually
controlled), we know that the IRQ registers sit outside of the
powerwell and so can be accessed directly. Since we now have direct
access exported via I915_READ_FW/I915_WRITE_FW, let's put those to use
in the IRQ handlers as well.

In the process, reorder the execlist submission to happen as early as
possible.
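
As a rough sketch of the new per-bank handling (taken from the diff
below), each GT IIR is read and acked without touching the uncore
lock, and the context-switch event is dispatched before the user
interrupt:

    /* Sketch: IIR registers sit outside the powerwell, raw access is safe. */
    tmp = I915_READ_FW(GEN8_GT_IIR(0));
    if (tmp) {
            I915_WRITE_FW(GEN8_GT_IIR(0), tmp);     /* ack first */
            /* ... intel_lrc_irq_handler() before notify_ring() ... */
    }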

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_irq.c | 63 ++++++++++++++++++++---------------------
 1 file changed, 31 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 8b5e0358c592..da3b76b9ebd9 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1285,56 +1285,56 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 	irqreturn_t ret = IRQ_NONE;
 
 	if (master_ctl & (GEN8_GT_RCS_IRQ | GEN8_GT_BCS_IRQ)) {
-		tmp = I915_READ(GEN8_GT_IIR(0));
+		tmp = I915_READ_FW(GEN8_GT_IIR(0));
 		if (tmp) {
-			I915_WRITE(GEN8_GT_IIR(0), tmp);
+			I915_WRITE_FW(GEN8_GT_IIR(0), tmp);
 			ret = IRQ_HANDLED;
 
 			rcs = tmp >> GEN8_RCS_IRQ_SHIFT;
 			ring = &dev_priv->ring[RCS];
-			if (rcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, ring);
 			if (rcs & GT_CONTEXT_SWITCH_INTERRUPT)
 				intel_lrc_irq_handler(ring);
+			if (rcs & GT_RENDER_USER_INTERRUPT)
+				notify_ring(dev, ring);
 
 			bcs = tmp >> GEN8_BCS_IRQ_SHIFT;
 			ring = &dev_priv->ring[BCS];
-			if (bcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, ring);
 			if (bcs & GT_CONTEXT_SWITCH_INTERRUPT)
 				intel_lrc_irq_handler(ring);
+			if (bcs & GT_RENDER_USER_INTERRUPT)
+				notify_ring(dev, ring);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT0)!\n");
 	}
 
 	if (master_ctl & (GEN8_GT_VCS1_IRQ | GEN8_GT_VCS2_IRQ)) {
-		tmp = I915_READ(GEN8_GT_IIR(1));
+		tmp = I915_READ_FW(GEN8_GT_IIR(1));
 		if (tmp) {
-			I915_WRITE(GEN8_GT_IIR(1), tmp);
+			I915_WRITE_FW(GEN8_GT_IIR(1), tmp);
 			ret = IRQ_HANDLED;
 
 			vcs = tmp >> GEN8_VCS1_IRQ_SHIFT;
 			ring = &dev_priv->ring[VCS];
-			if (vcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, ring);
 			if (vcs & GT_CONTEXT_SWITCH_INTERRUPT)
 				intel_lrc_irq_handler(ring);
+			if (vcs & GT_RENDER_USER_INTERRUPT)
+				notify_ring(dev, ring);
 
 			vcs = tmp >> GEN8_VCS2_IRQ_SHIFT;
 			ring = &dev_priv->ring[VCS2];
-			if (vcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, ring);
 			if (vcs & GT_CONTEXT_SWITCH_INTERRUPT)
 				intel_lrc_irq_handler(ring);
+			if (vcs & GT_RENDER_USER_INTERRUPT)
+				notify_ring(dev, ring);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT1)!\n");
 	}
 
 	if (master_ctl & GEN8_GT_PM_IRQ) {
-		tmp = I915_READ(GEN8_GT_IIR(2));
+		tmp = I915_READ_FW(GEN8_GT_IIR(2));
 		if (tmp & dev_priv->pm_rps_events) {
-			I915_WRITE(GEN8_GT_IIR(2),
-				   tmp & dev_priv->pm_rps_events);
+			I915_WRITE_FW(GEN8_GT_IIR(2),
+				      tmp & dev_priv->pm_rps_events);
 			ret = IRQ_HANDLED;
 			gen6_rps_irq_handler(dev_priv, tmp);
 		} else
@@ -1342,17 +1342,17 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 	}
 
 	if (master_ctl & GEN8_GT_VECS_IRQ) {
-		tmp = I915_READ(GEN8_GT_IIR(3));
+		tmp = I915_READ_FW(GEN8_GT_IIR(3));
 		if (tmp) {
-			I915_WRITE(GEN8_GT_IIR(3), tmp);
+			I915_WRITE_FW(GEN8_GT_IIR(3), tmp);
 			ret = IRQ_HANDLED;
 
 			vcs = tmp >> GEN8_VECS_IRQ_SHIFT;
 			ring = &dev_priv->ring[VECS];
-			if (vcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, ring);
 			if (vcs & GT_CONTEXT_SWITCH_INTERRUPT)
 				intel_lrc_irq_handler(ring);
+			if (vcs & GT_RENDER_USER_INTERRUPT)
+				notify_ring(dev, ring);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT3)!\n");
 	}
@@ -2178,22 +2178,21 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
 		aux_mask |=  GEN9_AUX_CHANNEL_B | GEN9_AUX_CHANNEL_C |
 			GEN9_AUX_CHANNEL_D;
 
-	master_ctl = I915_READ(GEN8_MASTER_IRQ);
+	master_ctl = I915_READ_FW(GEN8_MASTER_IRQ);
 	master_ctl &= ~GEN8_MASTER_IRQ_CONTROL;
 	if (!master_ctl)
 		return IRQ_NONE;
 
-	I915_WRITE(GEN8_MASTER_IRQ, 0);
-	POSTING_READ(GEN8_MASTER_IRQ);
+	I915_WRITE_FW(GEN8_MASTER_IRQ, 0);
 
 	/* Find, clear, then process each source of interrupt */
 
 	ret = gen8_gt_irq_handler(dev, dev_priv, master_ctl);
 
 	if (master_ctl & GEN8_DE_MISC_IRQ) {
-		tmp = I915_READ(GEN8_DE_MISC_IIR);
+		tmp = I915_READ_FW(GEN8_DE_MISC_IIR);
 		if (tmp) {
-			I915_WRITE(GEN8_DE_MISC_IIR, tmp);
+			I915_WRITE_FW(GEN8_DE_MISC_IIR, tmp);
 			ret = IRQ_HANDLED;
 			if (tmp & GEN8_DE_MISC_GSE)
 				intel_opregion_asle_intr(dev);
@@ -2205,9 +2204,9 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
 	}
 
 	if (master_ctl & GEN8_DE_PORT_IRQ) {
-		tmp = I915_READ(GEN8_DE_PORT_IIR);
+		tmp = I915_READ_FW(GEN8_DE_PORT_IIR);
 		if (tmp) {
-			I915_WRITE(GEN8_DE_PORT_IIR, tmp);
+			I915_WRITE_FW(GEN8_DE_PORT_IIR, tmp);
 			ret = IRQ_HANDLED;
 
 			if (tmp & aux_mask)
@@ -2225,10 +2224,10 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
 		if (!(master_ctl & GEN8_DE_PIPE_IRQ(pipe)))
 			continue;
 
-		pipe_iir = I915_READ(GEN8_DE_PIPE_IIR(pipe));
+		pipe_iir = I915_READ_FW(GEN8_DE_PIPE_IIR(pipe));
 		if (pipe_iir) {
 			ret = IRQ_HANDLED;
-			I915_WRITE(GEN8_DE_PIPE_IIR(pipe), pipe_iir);
+			I915_WRITE_FW(GEN8_DE_PIPE_IIR(pipe), pipe_iir);
 
 			if (pipe_iir & GEN8_PIPE_VBLANK &&
 			    intel_pipe_handle_vblank(dev, pipe))
@@ -2271,9 +2270,9 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
 		 * scheme also closed the SDE interrupt handling race we've seen
 		 * on older pch-split platforms. But this needs testing.
 		 */
-		u32 pch_iir = I915_READ(SDEIIR);
+		u32 pch_iir = I915_READ_FW(SDEIIR);
 		if (pch_iir) {
-			I915_WRITE(SDEIIR, pch_iir);
+			I915_WRITE_FW(SDEIIR, pch_iir);
 			ret = IRQ_HANDLED;
 			cpt_irq_handler(dev, pch_iir);
 		} else
@@ -2281,8 +2280,8 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
 
 	}
 
-	I915_WRITE(GEN8_MASTER_IRQ, GEN8_MASTER_IRQ_CONTROL);
-	POSTING_READ(GEN8_MASTER_IRQ);
+	I915_WRITE_FW(GEN8_MASTER_IRQ, GEN8_MASTER_IRQ_CONTROL);
+	POSTING_READ_FW(GEN8_MASTER_IRQ);
 
 	return ret;
 }
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 34/49] drm/i915: Tidy gen8 IRQ handler
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (32 preceding siblings ...)
  2015-03-27 11:02 ` [PATCH 33/49] drm/i915: Reduce locking in gen8 IRQ handler Chris Wilson
@ 2015-03-27 11:02 ` Chris Wilson
  2015-03-27 11:02 ` [PATCH 35/49] drm/i915: Remove request retirement before each batch Chris Wilson
                   ` (14 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:02 UTC (permalink / raw)
  To: intel-gfx

Remove some needless variables and parameter passing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_irq.c | 113 +++++++++++++++++-----------------------
 1 file changed, 49 insertions(+), 64 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index da3b76b9ebd9..d5679297616c 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -985,8 +985,7 @@ static void ironlake_rps_change_irq_handler(struct drm_device *dev)
 	return;
 }
 
-static void notify_ring(struct drm_device *dev,
-			struct intel_engine_cs *ring)
+static void notify_ring(struct intel_engine_cs *ring)
 {
 	if (!intel_ring_initialized(ring))
 		return;
@@ -1248,9 +1247,9 @@ static void ilk_gt_irq_handler(struct drm_device *dev,
 {
 	if (gt_iir &
 	    (GT_RENDER_USER_INTERRUPT | GT_RENDER_PIPECTL_NOTIFY_INTERRUPT))
-		notify_ring(dev, &dev_priv->ring[RCS]);
+		notify_ring(&dev_priv->ring[RCS]);
 	if (gt_iir & ILK_BSD_USER_INTERRUPT)
-		notify_ring(dev, &dev_priv->ring[VCS]);
+		notify_ring(&dev_priv->ring[VCS]);
 }
 
 static void snb_gt_irq_handler(struct drm_device *dev,
@@ -1260,11 +1259,11 @@ static void snb_gt_irq_handler(struct drm_device *dev,
 
 	if (gt_iir &
 	    (GT_RENDER_USER_INTERRUPT | GT_RENDER_PIPECTL_NOTIFY_INTERRUPT))
-		notify_ring(dev, &dev_priv->ring[RCS]);
+		notify_ring(&dev_priv->ring[RCS]);
 	if (gt_iir & GT_BSD_USER_INTERRUPT)
-		notify_ring(dev, &dev_priv->ring[VCS]);
+		notify_ring(&dev_priv->ring[VCS]);
 	if (gt_iir & GT_BLT_USER_INTERRUPT)
-		notify_ring(dev, &dev_priv->ring[BCS]);
+		notify_ring(&dev_priv->ring[BCS]);
 
 	if (gt_iir & (GT_BLT_CS_ERROR_INTERRUPT |
 		      GT_BSD_CS_ERROR_INTERRUPT |
@@ -1275,63 +1274,65 @@ static void snb_gt_irq_handler(struct drm_device *dev,
 		ivybridge_parity_error_irq_handler(dev, gt_iir);
 }
 
-static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
-				       struct drm_i915_private *dev_priv,
+static irqreturn_t gen8_gt_irq_handler(struct drm_i915_private *dev_priv,
 				       u32 master_ctl)
 {
-	struct intel_engine_cs *ring;
-	u32 rcs, bcs, vcs;
-	uint32_t tmp = 0;
 	irqreturn_t ret = IRQ_NONE;
 
 	if (master_ctl & (GEN8_GT_RCS_IRQ | GEN8_GT_BCS_IRQ)) {
-		tmp = I915_READ_FW(GEN8_GT_IIR(0));
+		u32 tmp = I915_READ_FW(GEN8_GT_IIR(0));
 		if (tmp) {
 			I915_WRITE_FW(GEN8_GT_IIR(0), tmp);
 			ret = IRQ_HANDLED;
 
-			rcs = tmp >> GEN8_RCS_IRQ_SHIFT;
-			ring = &dev_priv->ring[RCS];
-			if (rcs & GT_CONTEXT_SWITCH_INTERRUPT)
-				intel_lrc_irq_handler(ring);
-			if (rcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, ring);
-
-			bcs = tmp >> GEN8_BCS_IRQ_SHIFT;
-			ring = &dev_priv->ring[BCS];
-			if (bcs & GT_CONTEXT_SWITCH_INTERRUPT)
-				intel_lrc_irq_handler(ring);
-			if (bcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, ring);
+			if (tmp & (GT_CONTEXT_SWITCH_INTERRUPT << GEN8_RCS_IRQ_SHIFT))
+				intel_lrc_irq_handler(&dev_priv->ring[RCS]);
+			if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT))
+				notify_ring(&dev_priv->ring[RCS]);
+
+			if (tmp & (GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT))
+				intel_lrc_irq_handler(&dev_priv->ring[BCS]);
+			if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT))
+				notify_ring(&dev_priv->ring[BCS]);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT0)!\n");
 	}
 
 	if (master_ctl & (GEN8_GT_VCS1_IRQ | GEN8_GT_VCS2_IRQ)) {
-		tmp = I915_READ_FW(GEN8_GT_IIR(1));
+		u32 tmp = I915_READ_FW(GEN8_GT_IIR(1));
 		if (tmp) {
 			I915_WRITE_FW(GEN8_GT_IIR(1), tmp);
 			ret = IRQ_HANDLED;
 
-			vcs = tmp >> GEN8_VCS1_IRQ_SHIFT;
-			ring = &dev_priv->ring[VCS];
-			if (vcs & GT_CONTEXT_SWITCH_INTERRUPT)
-				intel_lrc_irq_handler(ring);
-			if (vcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, ring);
-
-			vcs = tmp >> GEN8_VCS2_IRQ_SHIFT;
-			ring = &dev_priv->ring[VCS2];
-			if (vcs & GT_CONTEXT_SWITCH_INTERRUPT)
-				intel_lrc_irq_handler(ring);
-			if (vcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, ring);
+			if (tmp & (GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT))
+				intel_lrc_irq_handler(&dev_priv->ring[VCS]);
+			if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT))
+				notify_ring(&dev_priv->ring[VCS]);
+
+			if (tmp & (GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT))
+				intel_lrc_irq_handler(&dev_priv->ring[VCS2]);
+			if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT))
+				notify_ring(&dev_priv->ring[VCS2]);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT1)!\n");
 	}
 
+	if (master_ctl & GEN8_GT_VECS_IRQ) {
+		u32 tmp = I915_READ_FW(GEN8_GT_IIR(3));
+		if (tmp) {
+			I915_WRITE_FW(GEN8_GT_IIR(3), tmp);
+			ret = IRQ_HANDLED;
+
+			if (tmp & (GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT))
+				intel_lrc_irq_handler(&dev_priv->ring[VECS]);
+			if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT))
+				notify_ring(&dev_priv->ring[VECS]);
+		} else
+			DRM_ERROR("The master control interrupt lied (GT3)!\n");
+	}
+
 	if (master_ctl & GEN8_GT_PM_IRQ) {
-		tmp = I915_READ_FW(GEN8_GT_IIR(2));
+		u32 tmp = I915_READ_FW(GEN8_GT_IIR(2));
 		if (tmp & dev_priv->pm_rps_events) {
 			I915_WRITE_FW(GEN8_GT_IIR(2),
 				      tmp & dev_priv->pm_rps_events);
@@ -1341,22 +1342,6 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 			DRM_ERROR("The master control interrupt lied (PM)!\n");
 	}
 
-	if (master_ctl & GEN8_GT_VECS_IRQ) {
-		tmp = I915_READ_FW(GEN8_GT_IIR(3));
-		if (tmp) {
-			I915_WRITE_FW(GEN8_GT_IIR(3), tmp);
-			ret = IRQ_HANDLED;
-
-			vcs = tmp >> GEN8_VECS_IRQ_SHIFT;
-			ring = &dev_priv->ring[VECS];
-			if (vcs & GT_CONTEXT_SWITCH_INTERRUPT)
-				intel_lrc_irq_handler(ring);
-			if (vcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, ring);
-		} else
-			DRM_ERROR("The master control interrupt lied (GT3)!\n");
-	}
-
 	return ret;
 }
 
@@ -1651,7 +1636,7 @@ static void gen6_rps_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir)
 
 	if (HAS_VEBOX(dev_priv->dev)) {
 		if (pm_iir & PM_VEBOX_USER_INTERRUPT)
-			notify_ring(dev_priv->dev, &dev_priv->ring[VECS]);
+			notify_ring(&dev_priv->ring[VECS]);
 
 		if (pm_iir & PM_VEBOX_CS_ERROR_INTERRUPT)
 			DRM_DEBUG("Command parser error, pm_iir 0x%08x\n", pm_iir);
@@ -1845,7 +1830,7 @@ static irqreturn_t cherryview_irq_handler(int irq, void *arg)
 			I915_WRITE(VLV_IIR, iir);
 		}
 
-		gen8_gt_irq_handler(dev, dev_priv, master_ctl);
+		gen8_gt_irq_handler(dev_priv, master_ctl);
 
 		/* Call regardless, as some status bits might not be
 		 * signalled in iir */
@@ -2187,7 +2172,7 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
 
 	/* Find, clear, then process each source of interrupt */
 
-	ret = gen8_gt_irq_handler(dev, dev_priv, master_ctl);
+	ret = gen8_gt_irq_handler(dev_priv, master_ctl);
 
 	if (master_ctl & GEN8_DE_MISC_IRQ) {
 		tmp = I915_READ_FW(GEN8_DE_MISC_IIR);
@@ -3692,7 +3677,7 @@ static irqreturn_t i8xx_irq_handler(int irq, void *arg)
 		new_iir = I915_READ16(IIR); /* Flush posted writes */
 
 		if (iir & I915_USER_INTERRUPT)
-			notify_ring(dev, &dev_priv->ring[RCS]);
+			notify_ring(&dev_priv->ring[RCS]);
 
 		for_each_pipe(dev_priv, pipe) {
 			int plane = pipe;
@@ -3883,7 +3868,7 @@ static irqreturn_t i915_irq_handler(int irq, void *arg)
 		new_iir = I915_READ(IIR); /* Flush posted writes */
 
 		if (iir & I915_USER_INTERRUPT)
-			notify_ring(dev, &dev_priv->ring[RCS]);
+			notify_ring(&dev_priv->ring[RCS]);
 
 		for_each_pipe(dev_priv, pipe) {
 			int plane = pipe;
@@ -4110,9 +4095,9 @@ static irqreturn_t i965_irq_handler(int irq, void *arg)
 		new_iir = I915_READ(IIR); /* Flush posted writes */
 
 		if (iir & I915_USER_INTERRUPT)
-			notify_ring(dev, &dev_priv->ring[RCS]);
+			notify_ring(&dev_priv->ring[RCS]);
 		if (iir & I915_BSD_USER_INTERRUPT)
-			notify_ring(dev, &dev_priv->ring[VCS]);
+			notify_ring(&dev_priv->ring[VCS]);
 
 		for_each_pipe(dev_priv, pipe) {
 			if (pipe_stats[pipe] & PIPE_START_VBLANK_INTERRUPT_STATUS &&
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 35/49] drm/i915: Remove request retirement before each batch
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (33 preceding siblings ...)
  2015-03-27 11:02 ` [PATCH 34/49] drm/i915: Tidy " Chris Wilson
@ 2015-03-27 11:02 ` Chris Wilson
  2015-03-27 11:02 ` [PATCH 36/49] drm/i915: Cache the GGTT offset for the execlists context Chris Wilson
                   ` (13 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:02 UTC (permalink / raw)
  To: intel-gfx

This reimplements the denial-of-service protection against igt from

commit 227f782e4667fc622810bce8be8ccdeee45f89c2
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu May 15 10:41:42 2014 +0100

    drm/i915: Retire requests before creating a new one

and transfers the stall from before each batch into a new close handler.
The issue is that the stall increases latency between batches, which is
detrimental in some cases (especially when coupled with execlists) to
keeping the GPU well fed. We also make the observation that retiring
requests can itself free objects (and requests) and therefore makes a
good first step when shrinking. However, we do still wish to retire
before forcing the allocation of a new batch pool object (prior to an
execbuffer), but we make the optimisation that we only need to do so if
the oldest available batch pool object is still active.
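
In outline (a condensed sketch of the i915_gem_batch_pool.c hunk below),
we only pay for the retirement when the least-recently-used entry in the
pool is still busy:

	/* Sketch: the pool list is strictly LRU ordered, so only if the
	 * oldest entry is still active must we retire to find a
	 * reusable batch object.
	 */
	tmp = list_first_entry_or_null(list, typeof(*tmp), batch_pool_link);
	if (tmp && tmp->active)
		i915_gem_retire_requests_ring(pool->ring);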

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c            |  1 +
 drivers/gpu/drm/i915/i915_drv.h            |  2 ++
 drivers/gpu/drm/i915/i915_gem.c            | 15 +++++++++++++++
 drivers/gpu/drm/i915/i915_gem_batch_pool.c |  9 +++++++--
 drivers/gpu/drm/i915/i915_gem_batch_pool.h |  6 +++++-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  2 --
 drivers/gpu/drm/i915/intel_lrc.c           |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c    |  2 +-
 8 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 82f8be4b6745..df85bfeabc40 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1590,6 +1590,7 @@ static struct drm_driver driver = {
 	.debugfs_init = i915_debugfs_init,
 	.debugfs_cleanup = i915_debugfs_cleanup,
 #endif
+	.gem_close_object = i915_gem_close_object,
 	.gem_free_object = i915_gem_free_object,
 	.gem_vm_ops = &i915_gem_vm_ops,
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4b51169c37ea..8fcf923aae3b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2638,6 +2638,8 @@ struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
 						  size_t size);
 void i915_init_vm(struct drm_i915_private *dev_priv,
 		  struct i915_address_space *vm);
+void i915_gem_close_object(struct drm_gem_object *obj,
+			   struct drm_file *file);
 void i915_gem_free_object(struct drm_gem_object *obj);
 void i915_gem_vma_destroy(struct i915_vma *vma);
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 91b0c1db05ca..f87e7b90939c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2181,6 +2181,10 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 	for (i = 0; i < page_count; i++) {
 		page = shmem_read_mapping_page_gfp(mapping, i, gfp);
 		if (IS_ERR(page)) {
+			i915_gem_retire_requests(dev_priv->dev);
+			page = shmem_read_mapping_page_gfp(mapping, i, gfp);
+		}
+		if (IS_ERR(page)) {
 			i915_gem_shrink(dev_priv,
 					page_count,
 					I915_SHRINK_BOUND |
@@ -2873,6 +2877,17 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 	return 0;
 }
 
+void i915_gem_close_object(struct drm_gem_object *gem,
+			   struct drm_file *file)
+{
+	struct drm_i915_gem_object *obj = to_intel_bo(gem);
+
+	if (obj->active && mutex_trylock(&obj->base.dev->struct_mutex)) {
+		(void)i915_gem_object_flush_active(obj);
+		mutex_unlock(&obj->base.dev->struct_mutex);
+	}
+}
+
 /**
  * i915_gem_wait_ioctl - implements DRM_IOCTL_I915_GEM_WAIT
  * @DRM_IOCTL_ARGS: standard ioctl arguments
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index 7bf2f3f2968e..03c67f4ad773 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -44,12 +44,13 @@
  * @dev: the drm device
  * @pool: the batch buffer pool
  */
-void i915_gem_batch_pool_init(struct drm_device *dev,
+void i915_gem_batch_pool_init(struct intel_engine_cs *ring,
 			      struct i915_gem_batch_pool *pool)
 {
 	int n;
 
-	pool->dev = dev;
+	pool->dev = ring->dev;
+	pool->ring = ring;
 
 	for (n = 0; n < ARRAY_SIZE(pool->cache_list); n++)
 		INIT_LIST_HEAD(&pool->cache_list[n]);
@@ -113,6 +114,10 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 		n = ARRAY_SIZE(pool->cache_list) - 1;
 	list = &pool->cache_list[n];
 
+	tmp = list_first_entry_or_null(list, typeof(*tmp), batch_pool_link);
+	if (tmp && tmp->active)
+		i915_gem_retire_requests_ring(pool->ring);
+
 	list_for_each_entry_safe(tmp, next, list, batch_pool_link) {
 		/* The batches are strictly LRU ordered */
 		if (tmp->active)
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.h b/drivers/gpu/drm/i915/i915_gem_batch_pool.h
index 848e90703eed..467578c621bc 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.h
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.h
@@ -27,13 +27,17 @@
 
 #include "i915_drv.h"
 
+struct drm_device;
+struct intel_engine_cs;
+
 struct i915_gem_batch_pool {
 	struct drm_device *dev;
+	struct intel_engine_cs *ring;
 	struct list_head cache_list[4];
 };
 
 /* i915_gem_batch_pool.c */
-void i915_gem_batch_pool_init(struct drm_device *dev,
+void i915_gem_batch_pool_init(struct intel_engine_cs *ring,
 			      struct i915_gem_batch_pool *pool);
 void i915_gem_batch_pool_fini(struct i915_gem_batch_pool *pool);
 struct drm_i915_gem_object*
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 9345436e4d95..403450f4e4ee 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -697,8 +697,6 @@ i915_gem_execbuffer_reserve(struct intel_engine_cs *ring,
 	bool has_fenced_gpu_access = INTEL_INFO(ring->dev)->gen < 4;
 	int retry;
 
-	i915_gem_retire_requests_ring(ring);
-
 	vm = list_first_entry(vmas, struct i915_vma, exec_list)->vm;
 
 	INIT_LIST_HEAD(&ordered_vmas);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 1c768c05e52e..80f00feb6bf4 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1250,7 +1250,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	ring->dev = dev;
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
-	i915_gem_batch_pool_init(dev, &ring->batch_pool);
+	i915_gem_batch_pool_init(ring, &ring->batch_pool);
 	init_waitqueue_head(&ring->irq_queue);
 
 	INIT_LIST_HEAD(&ring->execlist_queue);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index f7097a80dea3..01eacaf4dac1 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2010,7 +2010,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
 	INIT_LIST_HEAD(&ring->execlist_queue);
-	i915_gem_batch_pool_init(dev, &ring->batch_pool);
+	i915_gem_batch_pool_init(ring, &ring->batch_pool);
 	ringbuf->size = 32 * PAGE_SIZE;
 	ringbuf->ring = ring;
 	memset(ring->semaphore.sync_seqno, 0, sizeof(ring->semaphore.sync_seqno));
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 36/49] drm/i915: Cache the GGTT offset for the execlists context
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (34 preceding siblings ...)
  2015-03-27 11:02 ` [PATCH 35/49] drm/i915: Remove request retirement before each batch Chris Wilson
@ 2015-03-27 11:02 ` Chris Wilson
  2015-03-27 11:02 ` [PATCH 37/49] drm/i915: Prefer to check for idleness in worker rather than sync-flush Chris Wilson
                   ` (12 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:02 UTC (permalink / raw)
  To: intel-gfx

The offset doesn't change once the context is pinned, but the lookup
turns out to be comparatively costly as it gets repeated for every
request.
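
Condensed from the hunks below: the GGTT offset is captured once when
the context is pinned and then simply reused for every descriptor,
avoiding the per-request i915_gem_obj_ggtt_offset() lookup.

	/* intel_lr_context_pin(): look up the offset exactly once */
	ringbuf->ggtt_offset = i915_gem_obj_ggtt_offset(ctx_obj);

	/* execlists_ctx_descriptor(): reuse the cached value */
	desc |= ggtt_offset;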

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 22 ++++++++++++----------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 80f00feb6bf4..cf00b507a853 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -230,8 +230,8 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 	return 0;
 }
 
-static uint32_t execlists_ctx_descriptor(struct intel_engine_cs *ring,
-					 struct drm_i915_gem_object *ctx_obj)
+static uint32_t execlists_ctx_descriptor(struct intel_engine_cs *engine,
+					 uint32_t ggtt_offset)
 {
 	uint32_t desc;
 
@@ -239,27 +239,28 @@ static uint32_t execlists_ctx_descriptor(struct intel_engine_cs *ring,
 	desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
 	desc |= GEN8_CTX_L3LLC_COHERENT;
 	desc |= GEN8_CTX_PRIVILEGE;
-	desc |= i915_gem_obj_ggtt_offset(ctx_obj);
+	desc |= ggtt_offset;
 
 	/* TODO: WaDisableLiteRestore when we start using semaphore
 	 * signalling between Command Streamers */
 	/* desc |= GEN8_CTX_FORCE_RESTORE; */
 
 	/* WaEnableForceRestoreInCtxtDescForVCS:skl */
-	if (IS_GEN9(ring->dev) && INTEL_REVID(ring->dev) <= SKL_REVID_B0 &&
-	    (ring->id == BCS || ring->id == VCS ||
-	     ring->id == VECS || ring->id == VCS2))
+	if (IS_GEN9(engine->dev) && INTEL_REVID(engine->dev) <= SKL_REVID_B0 &&
+	    (engine->id == BCS || engine->id == VCS ||
+	     engine->id == VECS || engine->id == VCS2))
 		desc |= GEN8_CTX_FORCE_RESTORE;
 
 	return desc;
 }
 
-static uint32_t execlists_request_write_tail(struct intel_engine_cs *ring,
+static uint32_t execlists_request_write_tail(struct intel_engine_cs *engine,
 					     struct drm_i915_gem_request *rq)
 
 {
-	rq->ctx->engine[ring->id].ringbuf->regs[CTX_RING_TAIL+1] = rq->tail;
-	return execlists_ctx_descriptor(ring, rq->ctx->engine[ring->id].state);
+	struct intel_ringbuffer *ring = rq->ctx->engine[engine->id].ringbuf;
+	ring->regs[CTX_RING_TAIL+1] = rq->tail;
+	return execlists_ctx_descriptor(engine, ring->ggtt_offset);
 }
 
 static void execlists_submit_pair(struct intel_engine_cs *ring)
@@ -713,7 +714,8 @@ static int intel_lr_context_pin(struct intel_engine_cs *ring,
 	if (ret)
 		goto reset_pin_count;
 
-	if (WARN_ON(i915_gem_obj_ggtt_offset(ctx_obj) & 0xFFFFFFFF00000FFFULL)) {
+	ringbuf->ggtt_offset = i915_gem_obj_ggtt_offset(ctx_obj);
+	if (WARN_ON(ringbuf->ggtt_offset & 0xFFFFFFFF00000FFFULL)) {
 		ret = -ENODEV;
 		goto unpin_ctx_obj;
 	}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 870a1d008db9..1f04b607fbcc 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -98,6 +98,7 @@ struct intel_ringbuffer {
 	struct drm_i915_gem_object *obj;
 	void __iomem *virtual_start;
 	uint32_t *regs;
+	uint32_t ggtt_offset;
 
 	struct intel_engine_cs *ring;
 
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 37/49] drm/i915: Prefer to check for idleness in worker rather than sync-flush
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (35 preceding siblings ...)
  2015-03-27 11:02 ` [PATCH 36/49] drm/i915: Cache the GGTT offset for the execlists context Chris Wilson
@ 2015-03-27 11:02 ` Chris Wilson
  2015-03-27 11:02 ` [PATCH 38/49] drm/i915: Skip allocating shadow batch for 0-length batches Chris Wilson
                   ` (11 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:02 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f87e7b90939c..1104a21abc08 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2514,7 +2514,6 @@ int __i915_add_request(struct intel_engine_cs *ring,
 
 	i915_queue_hangcheck(ring->dev);
 
-	cancel_delayed_work_sync(&dev_priv->mm.idle_work);
 	queue_delayed_work(dev_priv->wq,
 			   &dev_priv->mm.retire_work,
 			   round_jiffies_up_relative(HZ));
@@ -2833,6 +2832,12 @@ i915_gem_idle_work_handler(struct work_struct *work)
 	struct drm_i915_private *dev_priv =
 		container_of(work, typeof(*dev_priv), mm.idle_work.work);
 	struct drm_device *dev = dev_priv->dev;
+	struct intel_engine_cs *ring;
+	int i;
+
+	for_each_ring(ring, dev_priv, i)
+		if (!list_empty(&ring->request_list))
+			return;
 
 	intel_mark_idle(dev);
 
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 38/49] drm/i915: Skip allocating shadow batch for 0-length batches
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (36 preceding siblings ...)
  2015-03-27 11:02 ` [PATCH 37/49] drm/i915: Prefer to check for idleness in worker rather than sync-flush Chris Wilson
@ 2015-03-27 11:02 ` Chris Wilson
  2015-03-27 14:28   ` Daniel Vetter
  2015-03-30 12:02   ` Chris Wilson
  2015-03-27 11:02 ` [PATCH 39/49] drm/i915: Remove request->uniq Chris Wilson
                   ` (10 subsequent siblings)
  48 siblings, 2 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:02 UTC (permalink / raw)
  To: intel-gfx

Since

commit 17cabf571e50677d980e9ab2a43c5f11213003ae
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jan 14 11:20:57 2015 +0000

    drm/i915: Trim the command parser allocations

we may then try to allocate a zero-sized object and attempt to extract
its pages. Understandably this fails.

Testcase: igt/gem_exec_nop #ivb,byt,hsw
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 403450f4e4ee..19c5fc6ae1e0 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1561,7 +1561,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		goto err;
 	}
 
-	if (i915_needs_cmd_parser(ring)) {
+	if (i915_needs_cmd_parser(ring) && args->batch_len) {
 		batch_obj = i915_gem_execbuffer_parse(ring,
 						      &shadow_exec_entry,
 						      eb,
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 39/49] drm/i915: Remove request->uniq
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (37 preceding siblings ...)
  2015-03-27 11:02 ` [PATCH 38/49] drm/i915: Skip allocating shadow batch for 0-length batches Chris Wilson
@ 2015-03-27 11:02 ` Chris Wilson
  2015-03-27 11:02 ` [PATCH 40/49] drm/i915: Cache the reset_counter for the request Chris Wilson
                   ` (9 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:02 UTC (permalink / raw)
  To: intel-gfx; +Cc: Jani Nikula

We already assign a unique identifier to every request: the seqno. That
someone felt the need to add a second one, without even mentioning why
or that it tweaks the ABI, smells very fishy.

Fixes regression from
commit b3a38998f042b862f5ba4d7f2268f3a8dfb4883a
Author: Nick Hoath <nicholas.hoath@intel.com>
Date:   Thu Feb 19 16:30:47 2015 +0000

    drm/i915: Fix a use after free, and unbalanced refcounting

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Nick Hoath <nicholas.hoath@intel.com>
Cc: Thomas Daniel <thomas.daniel@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Jani Nikula <jani.nikula@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |  4 ----
 drivers/gpu/drm/i915/i915_trace.h       | 13 ++++---------
 drivers/gpu/drm/i915/intel_lrc.c        |  1 -
 drivers/gpu/drm/i915/intel_ringbuffer.c |  1 -
 4 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8fcf923aae3b..536344b99596 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1821,8 +1821,6 @@ struct drm_i915_private {
 		void (*stop_ring)(struct intel_engine_cs *ring);
 	} gt;
 
-	uint32_t request_uniq;
-
 	/*
 	 * NOTE: This is the dri1/ums dungeon, don't add stuff here. Your patch
 	 * will be rejected. Instead look for a better place.
@@ -2101,8 +2099,6 @@ struct drm_i915_gem_request {
 	/** process identifier submitting this request */
 	struct pid *pid;
 
-	uint32_t uniq;
-
 	/**
 	 * The ELSP only accepts two elements at a time, so we queue
 	 * context/tail pairs on a given queue (ring->execlist_queue) until the
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index ce8ee9e8bced..6e2eee52aaa2 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -499,7 +499,6 @@ DECLARE_EVENT_CLASS(i915_gem_request,
 	    TP_STRUCT__entry(
 			     __field(u32, dev)
 			     __field(u32, ring)
-			     __field(u32, uniq)
 			     __field(u32, seqno)
 			     ),
 
@@ -508,13 +507,11 @@ DECLARE_EVENT_CLASS(i915_gem_request,
 						i915_gem_request_get_ring(req);
 			   __entry->dev = ring->dev->primary->index;
 			   __entry->ring = ring->id;
-			   __entry->uniq = req ? req->uniq : 0;
 			   __entry->seqno = i915_gem_request_get_seqno(req);
 			   ),
 
-	    TP_printk("dev=%u, ring=%u, uniq=%u, seqno=%u",
-		      __entry->dev, __entry->ring, __entry->uniq,
-		      __entry->seqno)
+	    TP_printk("dev=%u, ring=%u, seqno=%u",
+		      __entry->dev, __entry->ring, __entry->seqno)
 );
 
 DEFINE_EVENT(i915_gem_request, i915_gem_request_add,
@@ -559,7 +556,6 @@ TRACE_EVENT(i915_gem_request_wait_begin,
 	    TP_STRUCT__entry(
 			     __field(u32, dev)
 			     __field(u32, ring)
-			     __field(u32, uniq)
 			     __field(u32, seqno)
 			     __field(bool, blocking)
 			     ),
@@ -575,14 +571,13 @@ TRACE_EVENT(i915_gem_request_wait_begin,
 						i915_gem_request_get_ring(req);
 			   __entry->dev = ring->dev->primary->index;
 			   __entry->ring = ring->id;
-			   __entry->uniq = req ? req->uniq : 0;
 			   __entry->seqno = i915_gem_request_get_seqno(req);
 			   __entry->blocking =
 				     mutex_is_locked(&ring->dev->struct_mutex);
 			   ),
 
-	    TP_printk("dev=%u, ring=%u, uniq=%u, seqno=%u, blocking=%s",
-		      __entry->dev, __entry->ring, __entry->uniq,
+	    TP_printk("dev=%u, ring=%u, seqno=%u, blocking=%s",
+		      __entry->dev, __entry->ring,
 		      __entry->seqno, __entry->blocking ?  "yes (NB)" : "no")
 );
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index cf00b507a853..f8ff3cf154a1 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -759,7 +759,6 @@ static int logical_ring_alloc_request(struct intel_engine_cs *ring,
 	}
 
 	request->ring = ring;
-	request->uniq = dev_private->request_uniq++;
 
 	ret = i915_gem_get_seqno(ring->dev, &request->seqno);
 	if (ret) {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 01eacaf4dac1..c3a34eec917a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2185,7 +2185,6 @@ intel_ring_alloc_request(struct intel_engine_cs *ring)
 
 	request->ring = ring;
 	request->ringbuf = ring->buffer;
-	request->uniq = dev_private->request_uniq++;
 
 	ret = i915_gem_get_seqno(ring->dev, &request->seqno);
 	if (ret) {
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 40/49] drm/i915: Cache the reset_counter for the request
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (38 preceding siblings ...)
  2015-03-27 11:02 ` [PATCH 39/49] drm/i915: Remove request->uniq Chris Wilson
@ 2015-03-27 11:02 ` Chris Wilson
  2015-03-27 11:02 ` [PATCH 41/49] drm/i915: Allocate context objects from stolen Chris Wilson
                   ` (8 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:02 UTC (permalink / raw)
  To: intel-gfx

Instead of querying the reset counter before every access to the ring,
query it the first time we touch the ring, and do a final compare when
submitting the request. For correctness, we then need to sanitize how
the reset_counter is incremented to prevent broken submission and
waiting across resets, in the process fixing the persistent -EIO we
still see today on failed waits.
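
In outline (a simplified condensation of the hunks below), the counter
is sampled once when the request is allocated and thereafter only
compared:

	/* i915_gem_request_alloc(): sample once, fail early if wedged */
	rq->reset_counter = i915_reset_counter(&i915->gpu_error);

	/* __i915_add_request(): refuse to submit across a reset */
	if (request->reset_counter != i915_reset_counter(&dev_priv->gpu_error))
		return -EAGAIN;

	/* __i915_wait_request(): a reset in between means the request is
	 * effectively complete, so stop reporting -EAGAIN/-EIO here.
	 */
	if (req->reset_counter != i915_reset_counter(&dev_priv->gpu_error))
		ret = 0;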

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c         | 32 +++++++-----
 drivers/gpu/drm/i915/i915_drv.h         | 29 +++++++----
 drivers/gpu/drm/i915/i915_gem.c         | 89 ++++++++++++++-------------------
 drivers/gpu/drm/i915/i915_irq.c         | 28 ++++-------
 drivers/gpu/drm/i915/intel_display.c    | 10 ++--
 drivers/gpu/drm/i915/intel_lrc.c        | 10 +---
 drivers/gpu/drm/i915/intel_ringbuffer.c |  6 ---
 7 files changed, 90 insertions(+), 114 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index df85bfeabc40..d97b04f91e6f 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -814,6 +814,8 @@ int i915_resume_legacy(struct drm_device *dev)
 int i915_reset(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_gpu_error *error = &dev_priv->gpu_error;
+	unsigned reset_counter;
 	bool simulated;
 	int ret;
 
@@ -823,17 +825,23 @@ int i915_reset(struct drm_device *dev)
 	intel_reset_gt_powersave(dev);
 
 	mutex_lock(&dev->struct_mutex);
+	reset_counter = atomic_inc_return(&error->reset_counter);
+	if (WARN_ON(__i915_reset_in_progress(reset_counter))) {
+		atomic_set_mask(I915_WEDGED, &error->reset_counter);
+		mutex_unlock(&dev->struct_mutex);
+		return -EIO;
+	}
 
 	i915_gem_reset(dev);
 
-	simulated = dev_priv->gpu_error.stop_rings != 0;
+	simulated = error->stop_rings != 0;
 
 	ret = intel_gpu_reset(dev);
 
 	/* Also reset the gpu hangman. */
 	if (simulated) {
 		DRM_INFO("Simulated gpu hang, resetting stop_rings\n");
-		dev_priv->gpu_error.stop_rings = 0;
+		error->stop_rings = 0;
 		if (ret == -ENODEV) {
 			DRM_INFO("Reset not implemented, but ignoring "
 				 "error for simulated gpu hangs\n");
@@ -846,8 +854,7 @@ int i915_reset(struct drm_device *dev)
 
 	if (ret) {
 		DRM_ERROR("Failed to reset chip: %i\n", ret);
-		mutex_unlock(&dev->struct_mutex);
-		return ret;
+		goto error;
 	}
 
 	intel_overlay_reset(dev_priv);
@@ -866,20 +873,14 @@ int i915_reset(struct drm_device *dev)
 	 * was running at the time of the reset (i.e. we weren't VT
 	 * switched away).
 	 */
-
-	/* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset */
-	dev_priv->gpu_error.reload_in_reset = true;
-
 	ret = i915_gem_init_hw(dev);
-
-	dev_priv->gpu_error.reload_in_reset = false;
-
-	mutex_unlock(&dev->struct_mutex);
 	if (ret) {
 		DRM_ERROR("Failed hw init on reset %d\n", ret);
-		return ret;
+		goto error;
 	}
 
+	mutex_unlock(&dev->struct_mutex);
+
 	/*
 	 * rps/rc6 re-init is necessary to restore state lost after the
 	 * reset and the re-install of gt irqs. Skip for ironlake per
@@ -890,6 +891,11 @@ int i915_reset(struct drm_device *dev)
 		intel_enable_gt_powersave(dev);
 
 	return 0;
+
+error:
+	atomic_set_mask(I915_WEDGED, &error->reset_counter);
+	mutex_unlock(&dev->struct_mutex);
+	return ret;
 }
 
 static int i915_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 536344b99596..fa8f18d2c9b4 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1286,9 +1286,6 @@ struct i915_gpu_error {
 
 	/* For missed irq/seqno simulation. */
 	unsigned int test_irq_rings;
-
-	/* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset   */
-	bool reload_in_reset;
 };
 
 enum modeset_restore {
@@ -2053,6 +2050,7 @@ struct drm_i915_gem_request {
 	/** On Which ring this request was generated */
 	struct drm_i915_private *i915;
 	struct intel_engine_cs *ring;
+	unsigned reset_counter;
 
 	/** GEM sequence number associated with this request. */
 	uint32_t seqno;
@@ -2750,24 +2748,38 @@ i915_gem_find_active_request(struct intel_engine_cs *ring);
 
 bool i915_gem_retire_requests(struct drm_device *dev);
 void i915_gem_retire_requests_ring(struct intel_engine_cs *ring);
-int __must_check i915_gem_check_wedge(struct i915_gpu_error *error,
+int __must_check i915_gem_check_wedge(unsigned reset_counter,
 				      bool interruptible);
 int __must_check i915_gem_check_olr(struct drm_i915_gem_request *req);
 
+static inline u32 i915_reset_counter(struct i915_gpu_error *error)
+{
+	return atomic_read(&error->reset_counter);
+}
+
+static inline bool __i915_reset_in_progress(u32 reset)
+{
+	return unlikely(reset & (I915_RESET_IN_PROGRESS_FLAG | I915_WEDGED));
+}
+
+static inline bool __i915_terminally_wedged(u32 reset)
+{
+	return reset & I915_WEDGED;
+}
+
 static inline bool i915_reset_in_progress(struct i915_gpu_error *error)
 {
-	return unlikely(atomic_read(&error->reset_counter)
-			& (I915_RESET_IN_PROGRESS_FLAG | I915_WEDGED));
+	return __i915_reset_in_progress(i915_reset_counter(error));
 }
 
 static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
 {
-	return atomic_read(&error->reset_counter) & I915_WEDGED;
+	return __i915_terminally_wedged(i915_reset_counter(error));
 }
 
 static inline u32 i915_reset_count(struct i915_gpu_error *error)
 {
-	return ((atomic_read(&error->reset_counter) & ~I915_WEDGED) + 1) / 2;
+	return ((i915_reset_counter(error) & ~I915_WEDGED) + 1) / 2;
 }
 
 static inline bool i915_stop_ring_allow_ban(struct drm_i915_private *dev_priv)
@@ -2799,7 +2811,6 @@ int __i915_add_request(struct intel_engine_cs *ring,
 #define i915_add_request(ring) \
 	__i915_add_request(ring, NULL, NULL)
 int __i915_wait_request(struct drm_i915_gem_request *req,
-			unsigned reset_counter,
 			bool interruptible,
 			s64 *timeout,
 			struct drm_i915_file_private *file_priv);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1104a21abc08..abd858701307 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -100,14 +100,19 @@ static void i915_gem_info_remove_obj(struct drm_i915_private *dev_priv,
 	spin_unlock(&dev_priv->mm.object_stat_lock);
 }
 
+inline static bool reset_complete(struct i915_gpu_error *error)
+{
+	unsigned reset_counter = i915_reset_counter(error);
+	return (!__i915_reset_in_progress(reset_counter) ||
+		__i915_terminally_wedged(reset_counter));
+}
+
 static int
 i915_gem_wait_for_error(struct i915_gpu_error *error)
 {
 	int ret;
 
-#define EXIT_COND (!i915_reset_in_progress(error) || \
-		   i915_terminally_wedged(error))
-	if (EXIT_COND)
+	if (reset_complete(error))
 		return 0;
 
 	/*
@@ -116,17 +121,16 @@ i915_gem_wait_for_error(struct i915_gpu_error *error)
 	 * we should simply try to bail out and fail as gracefully as possible.
 	 */
 	ret = wait_event_interruptible_timeout(error->reset_queue,
-					       EXIT_COND,
+					       reset_complete(error),
 					       10*HZ);
 	if (ret == 0) {
 		DRM_ERROR("Timed out waiting for the gpu reset to complete\n");
 		return -EIO;
 	} else if (ret < 0) {
 		return ret;
+	} else {
+		return 0;
 	}
-#undef EXIT_COND
-
-	return 0;
 }
 
 int i915_mutex_lock_interruptible(struct drm_device *dev)
@@ -1127,26 +1131,18 @@ put_rpm:
 }
 
 int
-i915_gem_check_wedge(struct i915_gpu_error *error,
+i915_gem_check_wedge(unsigned reset_counter,
 		     bool interruptible)
 {
-	if (i915_reset_in_progress(error)) {
+	if (__i915_reset_in_progress(reset_counter)) {
 		/* Non-interruptible callers can't handle -EAGAIN, hence return
 		 * -EIO unconditionally for these. */
 		if (!interruptible)
 			return -EIO;
 
 		/* Recovery complete, but the reset failed ... */
-		if (i915_terminally_wedged(error))
+		if (__i915_terminally_wedged(reset_counter))
 			return -EIO;
-
-		/*
-		 * Check if GPU Reset is in progress - we need intel_ring_begin
-		 * to work properly to reinit the hw state while the gpu is
-		 * still marked as reset-in-progress. Handle this with a flag.
-		 */
-		if (!error->reload_in_reset)
-			return -EAGAIN;
 	}
 
 	return 0;
@@ -1206,7 +1202,6 @@ static int __i915_spin_request(struct drm_i915_gem_request *rq)
 /**
  * __i915_wait_request - wait until execution of request has finished
  * @req: duh!
- * @reset_counter: reset sequence associated with the given request
  * @interruptible: do an interruptible wait (normally yes)
  * @timeout: in - how long to wait (NULL forever); out - how much time remaining
  *
@@ -1221,7 +1216,6 @@ static int __i915_spin_request(struct drm_i915_gem_request *rq)
  * errno with remaining time filled in timeout argument.
  */
 int __i915_wait_request(struct drm_i915_gem_request *req,
-			unsigned reset_counter,
 			bool interruptible,
 			s64 *timeout,
 			struct drm_i915_file_private *file_priv)
@@ -1271,12 +1265,12 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 
 		/* We need to check whether any gpu reset happened in between
 		 * the caller grabbing the seqno and now ... */
-		if (reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter)) {
-			/* ... but upgrade the -EAGAIN to an -EIO if the gpu
-			 * is truely gone. */
-			ret = i915_gem_check_wedge(&dev_priv->gpu_error, interruptible);
-			if (ret == 0)
-				ret = -EAGAIN;
+		if (req->reset_counter != i915_reset_counter(&dev_priv->gpu_error)) {
+			/* As we do not requeue the request over a GPU reset,
+			 * if one does occur we know that the request is
+			 * effectively complete.
+			 */
+			ret = 0;
 			break;
 		}
 
@@ -1407,17 +1401,11 @@ i915_wait_request(struct drm_i915_gem_request *req)
 
 	BUG_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
 
-	ret = i915_gem_check_wedge(&dev_priv->gpu_error, interruptible);
-	if (ret)
-		return ret;
-
 	ret = i915_gem_check_olr(req);
 	if (ret)
 		return ret;
 
-	ret = __i915_wait_request(req,
-				  atomic_read(&dev_priv->gpu_error.reset_counter),
-				  interruptible, NULL, NULL);
+	ret = __i915_wait_request(req, interruptible, NULL, NULL);
 	if (ret)
 		return ret;
 
@@ -1478,7 +1466,6 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_request *requests[I915_NUM_RINGS];
-	unsigned reset_counter;
 	int ret, i, n = 0;
 
 	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
@@ -1487,12 +1474,6 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 	if (!obj->active)
 		return 0;
 
-	ret = i915_gem_check_wedge(&dev_priv->gpu_error, true);
-	if (ret)
-		return ret;
-
-	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
-
 	if (readonly) {
 		struct drm_i915_gem_request *rq;
 
@@ -1523,8 +1504,7 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 
 	mutex_unlock(&dev->struct_mutex);
 	for (i = 0; ret == 0 && i < n; i++)
-		ret = __i915_wait_request(requests[i], reset_counter, true,
-					  NULL, file_priv);
+		ret = __i915_wait_request(requests[i], true, NULL, file_priv);
 	mutex_lock(&dev->struct_mutex);
 
 err:
@@ -2433,6 +2413,9 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	if (WARN_ON(request == NULL))
 		return -ENOMEM;
 
+	if (request->reset_counter != i915_reset_counter(&dev_priv->gpu_error))
+		return -EAGAIN;
+
 	if (i915.enable_execlists) {
 		ringbuf = request->ctx->engine[ring->id].ringbuf;
 	} else
@@ -2569,7 +2552,13 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
 
 struct drm_i915_gem_request *i915_gem_request_alloc(struct drm_i915_private *i915)
 {
+	unsigned reset_counter = i915_reset_counter(&i915->gpu_error);
 	struct drm_i915_gem_request *rq;
+	int ret;
+
+	ret = i915_gem_check_wedge(reset_counter, i915->mm.interruptible);
+	if (ret)
+		return ERR_PTR(ret);
 
 	rq = kmem_cache_zalloc(i915->requests, GFP_KERNEL);
 	if (rq == NULL)
@@ -2577,6 +2566,8 @@ struct drm_i915_gem_request *i915_gem_request_alloc(struct drm_i915_private *i91
 
 	kref_init(&rq->ref);
 	rq->i915 = i915;
+	rq->reset_counter = reset_counter;
+
 	return rq;
 }
 
@@ -2918,11 +2909,9 @@ void i915_gem_close_object(struct drm_gem_object *gem,
 int
 i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_wait *args = data;
 	struct drm_i915_gem_object *obj;
 	struct drm_i915_gem_request *req[I915_NUM_RINGS];
-	unsigned reset_counter;
 	int i, n = 0;
 	int ret;
 
@@ -2956,7 +2945,6 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	}
 
 	drm_gem_object_unreference(&obj->base);
-	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		if (obj->last_read_req[i] == NULL)
@@ -2969,7 +2957,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 
 	for (i = 0; i < n; i++) {
 		if (ret == 0)
-			ret = __i915_wait_request(req[i], reset_counter, true,
+			ret = __i915_wait_request(req[i], true,
 						  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
 						  file->driver_priv);
 		i915_gem_request_unreference__unlocked(req[i]);
@@ -3006,7 +2994,6 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 
 	if (!i915_semaphore_is_enabled(obj->base.dev)) {
 		ret = __i915_wait_request(rq,
-					  atomic_read(&to_i915(obj->base.dev)->gpu_error.reset_counter),
 					  to_i915(obj->base.dev)->mm.interruptible, NULL, NULL);
 		if (ret)
 			return ret;
@@ -4217,14 +4204,15 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 	unsigned long recent_enough = jiffies - msecs_to_jiffies(20);
 	struct drm_i915_gem_request *request, *target = NULL;
-	unsigned reset_counter;
 	int ret;
 
 	ret = i915_gem_wait_for_error(&dev_priv->gpu_error);
 	if (ret)
 		return ret;
 
-	ret = i915_gem_check_wedge(&dev_priv->gpu_error, false);
+	/* ABI: return -EIO if wedged */
+	ret = i915_gem_check_wedge(i915_reset_counter(&dev_priv->gpu_error),
+				   false);
 	if (ret)
 		return ret;
 
@@ -4235,7 +4223,6 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 
 		target = request;
 	}
-	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
 	if (target)
 		i915_gem_request_reference(target);
 	spin_unlock(&file_priv->mm.lock);
@@ -4243,7 +4230,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	if (target == NULL)
 		return 0;
 
-	ret = __i915_wait_request(target, reset_counter, true, NULL, NULL);
+	ret = __i915_wait_request(target, true, NULL, NULL);
 	if (ret == 0)
 		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
 
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index d5679297616c..6e9f435bf40a 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2299,6 +2299,13 @@ static void i915_error_wake_up(struct drm_i915_private *dev_priv,
 		wake_up_all(&dev_priv->gpu_error.reset_queue);
 }
 
+static bool reset_pending(struct i915_gpu_error *error)
+{
+	unsigned reset_counter = i915_reset_counter(error);
+	return (__i915_reset_in_progress(reset_counter) &&
+		!__i915_terminally_wedged(reset_counter));
+}
+
 /**
  * i915_reset_and_wakeup - do process context error handling work
  *
@@ -2308,7 +2315,6 @@ static void i915_error_wake_up(struct drm_i915_private *dev_priv,
 static void i915_reset_and_wakeup(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
-	struct i915_gpu_error *error = &dev_priv->gpu_error;
 	char *error_event[] = { I915_ERROR_UEVENT "=1", NULL };
 	char *reset_event[] = { I915_RESET_UEVENT "=1", NULL };
 	char *reset_done_event[] = { I915_ERROR_UEVENT "=0", NULL };
@@ -2326,7 +2332,7 @@ static void i915_reset_and_wakeup(struct drm_device *dev)
 	 * the reset in-progress bit is only ever set by code outside of this
 	 * work we don't need to worry about any other races.
 	 */
-	if (i915_reset_in_progress(error) && !i915_terminally_wedged(error)) {
+	if (reset_pending(&dev_priv->gpu_error)) {
 		DRM_DEBUG_DRIVER("resetting chip\n");
 		kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE,
 				   reset_event);
@@ -2354,25 +2360,9 @@ static void i915_reset_and_wakeup(struct drm_device *dev)
 
 		intel_runtime_pm_put(dev_priv);
 
-		if (ret == 0) {
-			/*
-			 * After all the gem state is reset, increment the reset
-			 * counter and wake up everyone waiting for the reset to
-			 * complete.
-			 *
-			 * Since unlock operations are a one-sided barrier only,
-			 * we need to insert a barrier here to order any seqno
-			 * updates before
-			 * the counter increment.
-			 */
-			smp_mb__before_atomic();
-			atomic_inc(&dev_priv->gpu_error.reset_counter);
-
+		if (ret == 0)
 			kobject_uevent_env(&dev->primary->kdev->kobj,
 					   KOBJ_CHANGE, reset_done_event);
-		} else {
-			atomic_set_mask(I915_WEDGED, &error->reset_counter);
-		}
 
 		/*
 		 * Note: The wake_up also serves as a memory barrier so that
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 64b67df94d33..c830abe3e242 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3198,8 +3198,7 @@ static bool intel_crtc_has_pending_flip(struct drm_crtc *crtc)
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	bool pending;
 
-	if (i915_reset_in_progress(&dev_priv->gpu_error) ||
-	    intel_crtc->reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))
+	if (intel_crtc->reset_counter != i915_reset_counter(&dev_priv->gpu_error))
 		return false;
 
 	spin_lock_irq(&dev->event_lock);
@@ -9568,8 +9567,7 @@ static bool page_flip_finished(struct intel_crtc *crtc)
 	struct drm_device *dev = crtc->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	if (i915_reset_in_progress(&dev_priv->gpu_error) ||
-	    crtc->reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))
+	if (crtc->reset_counter != i915_reset_counter(&dev_priv->gpu_error))
 		return true;
 
 	/*
@@ -9988,9 +9986,7 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
 		container_of(work, struct intel_mmio_flip, work);
 
 	if (mmio_flip->rq)
-		WARN_ON(__i915_wait_request(mmio_flip->rq,
-					    mmio_flip->crtc->reset_counter,
-					    false, NULL, NULL));
+		WARN_ON(__i915_wait_request(mmio_flip->rq, false, NULL, NULL));
 
 	intel_do_mmio_flip(mmio_flip->crtc);
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f8ff3cf154a1..42712c2017e9 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -874,22 +874,14 @@ static int logical_ring_prepare(struct intel_ringbuffer *ringbuf,
 int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf,
 			     struct intel_context *ctx, int num_dwords)
 {
-	struct intel_engine_cs *ring = ringbuf->ring;
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
 	int ret;
 
-	ret = i915_gem_check_wedge(&dev_priv->gpu_error,
-				   dev_priv->mm.interruptible);
-	if (ret)
-		return ret;
-
 	ret = logical_ring_prepare(ringbuf, ctx, num_dwords * sizeof(uint32_t));
 	if (ret)
 		return ret;
 
 	/* Preallocate the olr before touching the ring */
-	ret = logical_ring_alloc_request(ring, ctx);
+	ret = logical_ring_alloc_request(ringbuf->ring, ctx);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index c3a34eec917a..2197ed878263 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2220,14 +2220,8 @@ static int __intel_ring_prepare(struct intel_engine_cs *ring,
 int intel_ring_begin(struct intel_engine_cs *ring,
 		     int num_dwords)
 {
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	int ret;
 
-	ret = i915_gem_check_wedge(&dev_priv->gpu_error,
-				   dev_priv->mm.interruptible);
-	if (ret)
-		return ret;
-
 	ret = __intel_ring_prepare(ring, num_dwords * sizeof(uint32_t));
 	if (ret)
 		return ret;
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 41/49] drm/i915: Allocate context objects from stolen
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (39 preceding siblings ...)
  2015-03-27 11:02 ` [PATCH 40/49] drm/i915: Cache the reset_counter for the request Chris Wilson
@ 2015-03-27 11:02 ` Chris Wilson
  2015-03-27 11:02 ` [PATCH 42/49] drm/i915: Introduce an internal allocator for disposable private objects Chris Wilson
                   ` (7 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:02 UTC (permalink / raw)
  To: intel-gfx

As we never expose context objects directly to userspace, we can forgo
allocating a first-class GEM object for them and prefer to use the
limited resource of reserved/stolen memory. Note this means that their
initial contents are undefined.

However, a downside of using stolen objects for execlists is that we
cannot access the physical address directly (thanks MCH!), which
prevents their use for the execlist contexts.
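
The allocation itself is a simple stolen-first fallback (as in the
i915_gem_context.c hunk below):

	obj = i915_gem_object_create_stolen(dev, size);
	if (obj == NULL)
		obj = i915_gem_alloc_object(dev, size);
	if (obj == NULL)
		return ERR_PTR(-ENOMEM);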

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 4 +++-
 drivers/gpu/drm/i915/intel_lrc.c        | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 18900f745bc6..b9c6b0ad1d0f 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -157,7 +157,9 @@ i915_gem_alloc_context_obj(struct drm_device *dev, size_t size)
 	struct drm_i915_gem_object *obj;
 	int ret;
 
-	obj = i915_gem_alloc_object(dev, size);
+	obj = i915_gem_object_create_stolen(dev, size);
+	if (obj == NULL)
+		obj = i915_gem_alloc_object(dev, size);
 	if (obj == NULL)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 42712c2017e9..940dbaece3ae 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1720,7 +1720,7 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 
 	context_size = round_up(get_lr_context_size(ring), 4096);
 
-	ctx_obj = i915_gem_alloc_context_obj(dev, context_size);
+	ctx_obj = i915_gem_alloc_object(dev, context_size);
 	if (IS_ERR(ctx_obj)) {
 		ret = PTR_ERR(ctx_obj);
 		DRM_DEBUG_DRIVER("Alloc LRC backing obj failed: %d\n", ret);
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 42/49] drm/i915: Introduce an internal allocator for disposable private objects
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (40 preceding siblings ...)
  2015-03-27 11:02 ` [PATCH 41/49] drm/i915: Allocate context objects from stolen Chris Wilson
@ 2015-03-27 11:02 ` Chris Wilson
  2015-03-27 11:02 ` [PATCH 43/49] drm/i915: Do not zero initialise page tables Chris Wilson
                   ` (6 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:02 UTC (permalink / raw)
  To: intel-gfx

Quite a few of our objects used for internal hardware programming do not
benefit from being swappable or from being zero initialised. As such
they do not benefit from using a shmemfs backing storage and, since they
are internal and never directly exposed to the user, we do not need to
worry about providing a filp. For these we can use a
drm_i915_gem_object wrapper around an sg_table of plain struct pages.
They are not swap backed and not automatically pinned. If they are
reaped by the shrinker, the pages are released and the contents
discarded. For the internal use case this is fine as, for example,
ringbuffers are pinned from the time the request is written to the time
the hardware stops reading them. Once they are idle, they can be
discarded entirely. As such they are a good match for execlist
ringbuffers and a small variety of other internal objects.
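
A minimal usage sketch (mirroring the batch-pool and ringbuffer callers
below); note the contents are only valid whilst the pages are pinned:

	obj = i915_gem_object_create_internal(ring->dev, 4096);
	if (obj == NULL)
		return -ENOMEM;

	ret = i915_gem_object_get_pages(obj);
	if (ret)
		return ret;
	i915_gem_object_pin_pages(obj);

	/* ... fill with commands and let the GPU read them ... */

	i915_gem_object_unpin_pages(obj); /* may now be reaped and discarded */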

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/Makefile                |   1 +
 drivers/gpu/drm/i915/i915_drv.h              |   5 +
 drivers/gpu/drm/i915/i915_gem_batch_pool.c   |  25 ++---
 drivers/gpu/drm/i915/i915_gem_internal.c     | 149 +++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_render_state.c |   2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c      |  10 +-
 6 files changed, 168 insertions(+), 24 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_internal.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index a69002e2257d..0054a058477d 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -27,6 +27,7 @@ i915-y += i915_cmd_parser.o \
 	  i915_gem_evict.o \
 	  i915_gem_execbuffer.o \
 	  i915_gem_gtt.o \
+	  i915_gem_internal.o \
 	  i915_gem.o \
 	  i915_gem_shrinker.o \
 	  i915_gem_stolen.o \
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index fa8f18d2c9b4..0d61215f2817 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3010,6 +3010,11 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 					       u32 gtt_offset,
 					       u32 size);
 
+/* i915_gem_internal.c */
+struct drm_i915_gem_object *
+i915_gem_object_create_internal(struct drm_device *dev,
+				unsigned size);
+
 /* i915_gem_shrinker.c */
 unsigned long i915_gem_shrink(struct drm_i915_private *dev_priv,
 			      long target,
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index 03c67f4ad773..a81f008ce688 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -101,7 +101,7 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 	struct drm_i915_gem_object *obj = NULL;
 	struct drm_i915_gem_object *tmp, *next;
 	struct list_head *list;
-	int n;
+	int n, ret;
 
 	WARN_ON(!mutex_is_locked(&pool->dev->struct_mutex));
 
@@ -123,13 +123,6 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 		if (tmp->active)
 			break;
 
-		/* While we're looping, do some clean up */
-		if (tmp->madv == __I915_MADV_PURGED) {
-			list_del(&tmp->batch_pool_link);
-			drm_gem_object_unreference(&tmp->base);
-			continue;
-		}
-
 		if (tmp->base.size >= size) {
 			obj = tmp;
 			break;
@@ -137,20 +130,16 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 	}
 
 	if (obj == NULL) {
-		int ret;
-
-		obj = i915_gem_alloc_object(pool->dev, size);
+		obj = i915_gem_object_create_internal(pool->dev, size);
 		if (obj == NULL)
 			return ERR_PTR(-ENOMEM);
-
-		ret = i915_gem_object_get_pages(obj);
-		if (ret)
-			return ERR_PTR(ret);
-
-		obj->madv = I915_MADV_DONTNEED;
 	}
 
-	list_move_tail(&obj->batch_pool_link, list);
+	ret = i915_gem_object_get_pages(obj);
+	if (ret)
+		return ERR_PTR(ret);
+
 	i915_gem_object_pin_pages(obj);
+	list_move_tail(&obj->batch_pool_link, list);
 	return obj;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_internal.c b/drivers/gpu/drm/i915/i915_gem_internal.c
new file mode 100644
index 000000000000..583908392ff5
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_gem_internal.c
@@ -0,0 +1,149 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include <drm/drmP.h>
+#include <drm/i915_drm.h>
+#include "i915_drv.h"
+
+static void __i915_gem_object_free_pages(struct sg_table *st)
+{
+	struct sg_page_iter sg_iter;
+
+	for_each_sg_page(st->sgl, &sg_iter, st->nents, 0)
+		page_cache_release(sg_page_iter_page(&sg_iter));
+
+	sg_free_table(st);
+	kfree(st);
+}
+
+static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj)
+{
+	const unsigned npages = obj->base.size / PAGE_SIZE;
+	struct sg_table *st;
+	struct scatterlist *sg;
+	unsigned long last_pfn = 0;	/* suppress gcc warning */
+	gfp_t gfp;
+	int i;
+
+	st = kmalloc(sizeof(*st), GFP_KERNEL);
+	if (st == NULL)
+		return -ENOMEM;
+
+	if (sg_alloc_table(st, npages, GFP_KERNEL)) {
+		kfree(st);
+		return -ENOMEM;
+	}
+
+	sg = st->sgl;
+	st->nents = 0;
+
+	gfp = GFP_KERNEL;
+	gfp |= __GFP_NORETRY | __GFP_NOWARN | __GFP_NO_KSWAPD;
+	gfp &= ~(__GFP_IO | __GFP_WAIT);
+	for (i = 0; i < npages; i++) {
+		struct page *page;
+
+		page = alloc_page(gfp);
+		if (page == NULL) {
+			i915_gem_shrink_all(to_i915(obj->base.dev));
+			page = alloc_page(GFP_KERNEL);
+			if (page == NULL)
+				goto err;
+		}
+
+		/* XXX page allocator needs to check for SNB bugs */
+
+#ifdef CONFIG_SWIOTLB
+		if (swiotlb_nr_tbl()) {
+			st->nents++;
+			sg_set_page(sg, page, PAGE_SIZE, 0);
+			sg = sg_next(sg);
+			continue;
+		}
+#endif
+		if (!i || page_to_pfn(page) != last_pfn + 1) {
+			if (i)
+				sg = sg_next(sg);
+			st->nents++;
+			sg_set_page(sg, page, PAGE_SIZE, 0);
+		} else {
+			sg->length += PAGE_SIZE;
+		}
+		last_pfn = page_to_pfn(page);
+	}
+#ifdef CONFIG_SWIOTLB
+	if (!swiotlb_nr_tbl())
+#endif
+		sg_mark_end(sg);
+	obj->pages = st;
+	obj->madv = I915_MADV_DONTNEED;
+
+	return 0;
+
+err:
+	sg_mark_end(sg);
+	__i915_gem_object_free_pages(st);
+	return -ENOMEM;
+}
+
+static void i915_gem_object_put_pages_internal(struct drm_i915_gem_object *obj)
+{
+	__i915_gem_object_free_pages(obj->pages);
+
+	obj->dirty = 0;
+	obj->madv = I915_MADV_WILLNEED;
+}
+
+static const struct drm_i915_gem_object_ops i915_gem_object_internal_ops = {
+	.get_pages = i915_gem_object_get_pages_internal,
+	.put_pages = i915_gem_object_put_pages_internal,
+};
+
+
+/**
+ * Creates a new object that wraps some internal memory for private use.
+ * This object is not backed by swappable storage, and as such its contents
+ * are volatile and only valid whilst pinned. If the object is reaped by the
+ * shrinker, its pages and data will be discarded. Equally, it is not a full
+ * GEM object and so not valid for access from userspace. This makes it useful
+ * for hardware interfaces like ringbuffers (which are pinned from the time
+ * the request is written to the time the hardware stops accessing it), but
+ * not for contexts (which need to be preserved when not active for later
+ * reuse).
+ */
+struct drm_i915_gem_object *
+i915_gem_object_create_internal(struct drm_device *dev,
+				unsigned size)
+{
+	struct drm_i915_gem_object *obj;
+
+	obj = i915_gem_object_alloc(dev);
+	if (obj == NULL)
+		return NULL;
+
+	drm_gem_private_object_init(dev, &obj->base, size);
+	i915_gem_object_init(obj, &i915_gem_object_internal_ops);
+
+	return obj;
+}
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index 521548a08578..4bb91cdadec9 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -57,7 +57,7 @@ static int render_state_init(struct render_state *so, struct drm_device *dev)
 	if (so->rodata->batch_items * 4 > 4096)
 		return -EINVAL;
 
-	so->obj = i915_gem_alloc_object(dev, 4096);
+	so->obj = i915_gem_object_create_internal(dev, 4096);
 	if (so->obj == NULL)
 		return -ENOMEM;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 2197ed878263..6003e13e05b6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -669,7 +669,7 @@ intel_init_pipe_control(struct intel_engine_cs *ring)
 
 	WARN_ON(ring->scratch.obj);
 
-	ring->scratch.obj = i915_gem_alloc_object(ring->dev, 4096);
+	ring->scratch.obj = i915_gem_object_create_internal(ring->dev, 4096);
 	if (ring->scratch.obj == NULL) {
 		DRM_ERROR("Failed to allocate seqno page\n");
 		ret = -ENOMEM;
@@ -1834,7 +1834,7 @@ static int init_status_page(struct intel_engine_cs *ring)
 		unsigned flags;
 		int ret;
 
-		obj = i915_gem_alloc_object(ring->dev, 4096);
+		obj = i915_gem_object_create_internal(ring->dev, 4096);
 		if (obj == NULL) {
 			DRM_ERROR("Failed to allocate status page\n");
 			return -ENOMEM;
@@ -1981,7 +1981,7 @@ int intel_alloc_ringbuffer_obj(struct drm_device *dev,
 	if (!HAS_LLC(dev))
 		obj = i915_gem_object_create_stolen(dev, ringbuf->size);
 	if (obj == NULL)
-		obj = i915_gem_alloc_object(dev, ringbuf->size);
+		obj = i915_gem_object_create_internal(dev, ringbuf->size);
 	if (obj == NULL)
 		return -ENOMEM;
 
@@ -2477,7 +2477,7 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 
 	if (INTEL_INFO(dev)->gen >= 8) {
 		if (i915_semaphore_is_enabled(dev)) {
-			obj = i915_gem_alloc_object(dev, 4096);
+			obj = i915_gem_object_create_internal(dev, 4096);
 			if (obj == NULL) {
 				DRM_ERROR("Failed to allocate semaphore bo. Disabling semaphores\n");
 				i915.semaphores = 0;
@@ -2583,7 +2583,7 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 
 	/* Workaround batchbuffer to combat CS tlb bug. */
 	if (HAS_BROKEN_CS_TLB(dev)) {
-		obj = i915_gem_alloc_object(dev, I830_WA_SIZE);
+		obj = i915_gem_object_create_internal(dev, I830_WA_SIZE);
 		if (obj == NULL) {
 			DRM_ERROR("Failed to allocate batch bo\n");
 			return -ENOMEM;
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 43/49] drm/i915: Do not zero initialise page tables
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (41 preceding siblings ...)
  2015-03-27 11:02 ` [PATCH 42/49] drm/i915: Introduce an internal allocator for disposable private objects Chris Wilson
@ 2015-03-27 11:02 ` Chris Wilson
  2015-04-07 14:46   ` Mika Kuoppala
  2015-03-27 11:02 ` [PATCH 44/49] drm/i915: The argument for postfix is redundant Chris Wilson
                   ` (5 subsequent siblings)
  48 siblings, 1 reply; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:02 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Mika Kuoppala

After we successfully allocate them, we will fill them with their
initial contents (either the chain of page tables, or a pointer to the
scratch page).
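
As a sketch of why the zeroing is redundant (all names below are
placeholders for illustration, not the driver's real allocation paths):

	/* The page is allocated... */
	pd->page = alloc_page(GFP_KERNEL);	/* was GFP_KERNEL | __GFP_ZERO */
	if (!pd->page)
		return ERR_PTR(-ENOMEM);

	/* ...and every entry is then written before the table is used,
	 * either with the scratch page or with the chain of page tables,
	 * so pre-zeroing the freshly allocated page is wasted effort.
	 */
	for (i = 0; i < ENTRIES_PER_PAGE; i++)
		write_entry(pd, i, scratch);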

Regression from
commit 06fda602dbca9c59d87db7da71192e4b54c9f5ff
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date:   Tue Feb 24 16:22:36 2015 +0000

    drm/i915: Create page table allocators

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com> (v3+)
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 543fff104401..4a50e1db63dc 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -426,7 +426,7 @@ static struct i915_page_directory_entry *alloc_pd_single(void)
 	if (!pd)
 		return ERR_PTR(-ENOMEM);
 
-	pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	pd->page = alloc_page(GFP_KERNEL);
 	if (!pd->page) {
 		kfree(pd);
 		return ERR_PTR(-ENOMEM);
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 44/49] drm/i915: The argument for postfix is redundant
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (42 preceding siblings ...)
  2015-03-27 11:02 ` [PATCH 43/49] drm/i915: Do not zero initialise page tables Chris Wilson
@ 2015-03-27 11:02 ` Chris Wilson
  2015-03-27 11:02 ` [PATCH 45/49] drm/i915: Record the position of the start of the request Chris Wilson
                   ` (4 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:02 UTC (permalink / raw)
  To: intel-gfx

We are conservative on the amount of free space available in the ring to
avoid overrunning the potential MI_INTERRUPT after the seqno write.
Further undermining the justification for the change was that it was
applied incorrectly.
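
For reference, the wrap-around space calculation this relies on looks
roughly like the following (a sketch of __intel_ring_space(); the exact
reserve handling in the tree may differ slightly):

	/* Free space between the consumer (head) and producer (tail),
	 * less a conservative reserve so we never write right up to the
	 * location the GPU may still be executing, e.g. the seqno write
	 * and interrupt at the end of the previous request.
	 */
	int __intel_ring_space(int head, int tail, int size)
	{
		int space = head - tail;
		if (space <= 0)
			space += size;
		return space - I915_RING_FREE_SPACE;
	}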

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h         | 14 ++------------
 drivers/gpu/drm/i915/i915_gem.c         | 11 ++---------
 drivers/gpu/drm/i915/i915_gpu_error.c   |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c        |  6 ++----
 drivers/gpu/drm/i915/intel_ringbuffer.c |  2 +-
 5 files changed, 8 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0d61215f2817..8d6827347fef 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2055,18 +2055,8 @@ struct drm_i915_gem_request {
 	/** GEM sequence number associated with this request. */
 	uint32_t seqno;
 
-	/** Position in the ringbuffer of the start of the request */
-	u32 head;
-
-	/**
-	 * Position in the ringbuffer of the start of the postfix.
-	 * This is required to calculate the maximum available ringbuffer
-	 * space without overwriting the postfix.
-	 */
-	 u32 postfix;
-
-	/** Position in the ringbuffer of the end of the whole request */
-	u32 tail;
+	/** Position in the ringbuffer of the request */
+	u32 head, tail;
 
 	/**
 	 * Context and ring buffer related to this request
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index abd858701307..5fef69b2ce9f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1369,7 +1369,7 @@ __i915_gem_request_retire__upto(struct drm_i915_gem_request *rq)
 	if (list_empty(&rq->list))
 		return;
 
-	rq->ringbuf->last_retired_head = rq->postfix;
+	rq->ringbuf->last_retired_head = rq->tail;
 
 	do {
 		struct drm_i915_gem_request *prev =
@@ -2439,13 +2439,6 @@ int __i915_add_request(struct intel_engine_cs *ring,
 			return ret;
 	}
 
-	/* Record the position of the start of the request so that
-	 * should we detect the updated seqno part-way through the
-	 * GPU processing the request, we never over-estimate the
-	 * position of the head.
-	 */
-	request->postfix = intel_ring_get_tail(ringbuf);
-
 	if (i915.enable_execlists) {
 		ret = ring->emit_request(ringbuf, request);
 		if (ret)
@@ -2747,7 +2740,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 		 * of tail of the request to update the last known position
 		 * of the GPU head.
 		 */
-		request->ringbuf->last_retired_head = request->postfix;
+		request->ringbuf->last_retired_head = request->tail;
 		i915_gem_request_retire(request);
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 8832f1b2a495..b7a00e464ba4 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1072,7 +1072,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
 			erq = &error->ring[i].requests[count++];
 			erq->seqno = request->seqno;
 			erq->jiffies = request->emitted_jiffies;
-			erq->tail = request->postfix;
+			erq->tail = request->tail;
 		}
 	}
 }
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 940dbaece3ae..8fa44c3e8c3c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -407,7 +407,6 @@ void intel_lrc_irq_handler(struct intel_engine_cs *ring)
 
 static int execlists_context_queue(struct intel_engine_cs *ring,
 				   struct intel_context *to,
-				   u32 tail,
 				   struct drm_i915_gem_request *request)
 {
 	if (WARN_ON(request == NULL))
@@ -419,8 +418,6 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 	i915_gem_request_reference(request);
 	WARN_ON(to != request->ctx);
 
-	request->tail = tail;
-
 	spin_lock_irq(&ring->execlist_lock);
 
 	list_add_tail(&request->execlist_link, &ring->execlist_queue);
@@ -696,7 +693,8 @@ intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf,
 	if (intel_ring_stopped(ring))
 		return;
 
-	execlists_context_queue(ring, ctx, ringbuf->tail, request);
+	request->tail = ringbuf->tail;
+	execlists_context_queue(ring, ctx, request);
 }
 
 static int intel_lr_context_pin(struct intel_engine_cs *ring,
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 6003e13e05b6..f44e7be17104 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2106,7 +2106,7 @@ static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
 		return 0;
 
 	list_for_each_entry(request, &ring->request_list, list) {
-		space = __intel_ring_space(request->postfix, ringbuf->tail,
+		space = __intel_ring_space(request->tail, ringbuf->tail,
 					   ringbuf->size);
 		if (space >= n)
 			break;
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 45/49] drm/i915: Record the position of the start of the request
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (43 preceding siblings ...)
  2015-03-27 11:02 ` [PATCH 44/49] drm/i915: The argument for postfix is redundant Chris Wilson
@ 2015-03-27 11:02 ` Chris Wilson
  2015-03-27 11:02 ` [PATCH 46/49] drm/i915: Cache the execlist ctx descriptor Chris Wilson
                   ` (3 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:02 UTC (permalink / raw)
  To: intel-gfx

Not only does it make for good documentation and a debugging aid, but it
is also vital when we want to unwind requests - such as when throwing
away an incomplete request.
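
Concretely (sketch only; both statements appear in the hunks below):

	/* On request allocation, remember where it starts in the ring. */
	request->head = intel_ring_get_tail(request->ringbuf);

	/* To throw away an incomplete request, rewind the software tail
	 * back to that recorded position.
	 */
	request->ringbuf->tail = request->head;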

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c         | 12 +++++++----
 drivers/gpu/drm/i915/i915_gem_context.c |  9 +--------
 drivers/gpu/drm/i915/intel_lrc.c        | 36 +--------------------------------
 drivers/gpu/drm/i915/intel_lrc.h        |  2 --
 drivers/gpu/drm/i915/intel_ringbuffer.c |  1 +
 5 files changed, 11 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5fef69b2ce9f..706ec143ff1b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2406,7 +2406,6 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	struct drm_i915_gem_request *request;
 	struct intel_ringbuffer *ringbuf;
-	u32 request_start;
 	int ret;
 
 	request = ring->outstanding_lazy_request;
@@ -2421,7 +2420,6 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	} else
 		ringbuf = ring->buffer;
 
-	request_start = intel_ring_get_tail(ringbuf);
 	/*
 	 * Emit any outstanding flushes - execbuf can fail to emit the flush
 	 * after having emitted the batchbuffer command. Hence we need to fix
@@ -2449,7 +2447,6 @@ int __i915_add_request(struct intel_engine_cs *ring,
 			return ret;
 	}
 
-	request->head = request_start;
 	request->tail = intel_ring_get_tail(ringbuf);
 
 	/* Whilst this request exists, batch_obj will be on the
@@ -2658,7 +2655,14 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 	}
 
 	/* This may not have been flushed before the reset, so clean it now */
-	i915_gem_request_assign(&ring->outstanding_lazy_request, NULL);
+	if (ring->outstanding_lazy_request) {
+		struct drm_i915_gem_request *request;
+
+		request = ring->outstanding_lazy_request;
+		request->ringbuf->tail = request->head;
+
+		i915_gem_request_assign(&ring->outstanding_lazy_request, NULL);
+	}
 }
 
 void i915_gem_restore_fences(struct drm_device *dev)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index b9c6b0ad1d0f..43e58249235b 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -298,15 +298,8 @@ void i915_gem_context_reset(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int i;
 
-	if (i915.enable_execlists) {
-		struct intel_context *ctx;
-
-		list_for_each_entry(ctx, &dev_priv->context_list, link) {
-			intel_lr_context_reset(dev, ctx);
-		}
-
+	if (i915.enable_execlists)
 		return;
-	}
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		struct intel_engine_cs *ring = &dev_priv->ring[i];
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8fa44c3e8c3c..f4535832cf53 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -768,6 +768,7 @@ static int logical_ring_alloc_request(struct intel_engine_cs *ring,
 	request->ctx = ctx;
 	i915_gem_context_reference(request->ctx);
 	request->ringbuf = ctx->engine[ring->id].ringbuf;
+	request->head = intel_ring_get_tail(request->ringbuf);
 
 	ring->outstanding_lazy_request = request;
 	return 0;
@@ -1813,38 +1814,3 @@ error_unpin_ctx:
 	drm_gem_object_unreference(&ctx_obj->base);
 	return ret;
 }
-
-void intel_lr_context_reset(struct drm_device *dev,
-			    struct intel_context *ctx)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct intel_engine_cs *ring;
-	int i;
-
-	for_each_ring(ring, dev_priv, i) {
-		struct drm_i915_gem_object *ctx_obj =
-				ctx->engine[ring->id].state;
-		struct intel_ringbuffer *ringbuf =
-				ctx->engine[ring->id].ringbuf;
-		uint32_t *reg_state;
-		struct page *page;
-
-		if (!ctx_obj)
-			continue;
-
-		if (i915_gem_object_get_pages(ctx_obj)) {
-			WARN(1, "Failed get_pages for context obj\n");
-			continue;
-		}
-		page = i915_gem_object_get_page(ctx_obj, 1);
-		reg_state = kmap_atomic(page);
-
-		reg_state[CTX_RING_HEAD+1] = 0;
-		reg_state[CTX_RING_TAIL+1] = 0;
-
-		kunmap_atomic(reg_state);
-
-		ringbuf->head = 0;
-		ringbuf->tail = 0;
-	}
-}
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 03e69c8636b0..b276e00773d9 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -71,8 +71,6 @@ int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf,
 void intel_lr_context_free(struct intel_context *ctx);
 int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring);
-void intel_lr_context_reset(struct drm_device *dev,
-			struct intel_context *ctx);
 
 /* Execlists */
 int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index f44e7be17104..984bfefb8373 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2185,6 +2185,7 @@ intel_ring_alloc_request(struct intel_engine_cs *ring)
 
 	request->ring = ring;
 	request->ringbuf = ring->buffer;
+	request->head = intel_ring_get_tail(ring->buffer);
 
 	ret = i915_gem_get_seqno(ring->dev, &request->seqno);
 	if (ret) {
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 46/49] drm/i915: Cache the execlist ctx descriptor
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (44 preceding siblings ...)
  2015-03-27 11:02 ` [PATCH 45/49] drm/i915: Record the position of the start of the request Chris Wilson
@ 2015-03-27 11:02 ` Chris Wilson
  2015-03-27 11:02 ` [PATCH 47/49] drm/i915: Treat ringbuffer writes as write to normal memory Chris Wilson
                   ` (2 subsequent siblings)
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:02 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 56 +++++++++++++++++----------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  3 +-
 2 files changed, 31 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f4535832cf53..98c0a76fc560 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -230,37 +230,13 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 	return 0;
 }
 
-static uint32_t execlists_ctx_descriptor(struct intel_engine_cs *engine,
-					 uint32_t ggtt_offset)
-{
-	uint32_t desc;
-
-	desc = GEN8_CTX_VALID;
-	desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
-	desc |= GEN8_CTX_L3LLC_COHERENT;
-	desc |= GEN8_CTX_PRIVILEGE;
-	desc |= ggtt_offset;
-
-	/* TODO: WaDisableLiteRestore when we start using semaphore
-	 * signalling between Command Streamers */
-	/* desc |= GEN8_CTX_FORCE_RESTORE; */
-
-	/* WaEnableForceRestoreInCtxtDescForVCS:skl */
-	if (IS_GEN9(engine->dev) && INTEL_REVID(engine->dev) <= SKL_REVID_B0 &&
-	    (engine->id == BCS || engine->id == VCS ||
-	     engine->id == VECS || engine->id == VCS2))
-		desc |= GEN8_CTX_FORCE_RESTORE;
-
-	return desc;
-}
-
 static uint32_t execlists_request_write_tail(struct intel_engine_cs *engine,
 					     struct drm_i915_gem_request *rq)
 
 {
 	struct intel_ringbuffer *ring = rq->ctx->engine[engine->id].ringbuf;
 	ring->regs[CTX_RING_TAIL+1] = rq->tail;
-	return execlists_ctx_descriptor(engine, ring->ggtt_offset);
+	return ring->descriptor;
 }
 
 static void execlists_submit_pair(struct intel_engine_cs *ring)
@@ -702,6 +678,7 @@ static int intel_lr_context_pin(struct intel_engine_cs *ring,
 {
 	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
 	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
+	u32 ggtt_offset;
 	int ret;
 
 	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
@@ -712,11 +689,12 @@ static int intel_lr_context_pin(struct intel_engine_cs *ring,
 	if (ret)
 		goto reset_pin_count;
 
-	ringbuf->ggtt_offset = i915_gem_obj_ggtt_offset(ctx_obj);
-	if (WARN_ON(ringbuf->ggtt_offset & 0xFFFFFFFF00000FFFULL)) {
+	ggtt_offset = i915_gem_obj_ggtt_offset(ctx_obj);
+	if (WARN_ON(ggtt_offset & 0xFFFFFFFF00000FFFULL)) {
 		ret = -ENODEV;
 		goto unpin_ctx_obj;
 	}
+	ringbuf->descriptor = ggtt_offset | ring->execlist_ctx_descriptor;
 
 	ret = intel_pin_and_map_ringbuffer_obj(ring->dev, ringbuf);
 	if (ret)
@@ -1232,6 +1210,28 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 	}
 }
 
+static uint32_t base_ctx_descriptor(struct intel_engine_cs *engine)
+{
+	uint32_t desc;
+
+	desc = GEN8_CTX_VALID;
+	desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
+	desc |= GEN8_CTX_L3LLC_COHERENT;
+	desc |= GEN8_CTX_PRIVILEGE;
+
+	/* TODO: WaDisableLiteRestore when we start using semaphore
+	 * signalling between Command Streamers */
+	/* desc |= GEN8_CTX_FORCE_RESTORE; */
+
+	/* WaEnableForceRestoreInCtxtDescForVCS:skl */
+	if (IS_GEN9(engine->dev) && INTEL_REVID(engine->dev) <= SKL_REVID_B0 &&
+	    (engine->id == BCS || engine->id == VCS ||
+	     engine->id == VECS || engine->id == VCS2))
+		desc |= GEN8_CTX_FORCE_RESTORE;
+
+	return desc;
+}
+
 static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *ring)
 {
 	int ret;
@@ -1253,6 +1253,8 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	if (ret)
 		return ret;
 
+	ring->execlist_ctx_descriptor = base_ctx_descriptor(ring);
+
 	ret = intel_lr_context_deferred_create(ring->default_context, ring);
 
 	return ret;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 1f04b607fbcc..dc9f5ac21833 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -98,7 +98,7 @@ struct intel_ringbuffer {
 	struct drm_i915_gem_object *obj;
 	void __iomem *virtual_start;
 	uint32_t *regs;
-	uint32_t ggtt_offset;
+	uint32_t descriptor;
 
 	struct intel_engine_cs *ring;
 
@@ -243,6 +243,7 @@ struct  intel_engine_cs {
 	struct drm_i915_gem_request *execlist_port[2];
 	struct list_head execlist_queue;
 	struct list_head execlist_completed;
+	u32 execlist_ctx_descriptor;
 	u8 next_context_status_buffer;
 	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
 	int		(*emit_request)(struct intel_ringbuffer *ringbuf,
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 47/49] drm/i915: Treat ringbuffer writes as write to normal memory
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (45 preceding siblings ...)
  2015-03-27 11:02 ` [PATCH 46/49] drm/i915: Cache the execlist ctx descriptor Chris Wilson
@ 2015-03-27 11:02 ` Chris Wilson
  2015-03-27 11:02 ` [PATCH 48/49] drm/i915: Eliminate vmap overhead for cmd parser Chris Wilson
  2015-03-27 11:02 ` [PATCH 49/49] drm/i915: Cache last cmd descriptor when parsing Chris Wilson
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:02 UTC (permalink / raw)
  To: intel-gfx

The hardware is documented as treating the TAIL register update as
serialising, so we can relax the barriers when filling the rings.
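
In other words (illustrative sketch; the barrier placement is implied by
the TAIL behaviour described above, and I915_WRITE_TAIL stands in for
whichever path performs the MMIO tail update):

	/* Before: each dword emitted went through an mmio-style accessor. */
	iowrite32(data, ringbuf->virtual_start + ringbuf->tail);

	/* After: a plain store into the mapping... */
	*(u32 __force *)(ringbuf->virtual_start + ringbuf->tail) = data;

	/* ...with ordering deferred to the serialising TAIL register
	 * update that kicks the hardware.
	 */
	I915_WRITE_TAIL(ring, ringbuf->tail);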

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.h        |  7 ++++---
 drivers/gpu/drm/i915/intel_ringbuffer.h | 17 ++++++++++++-----
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index b276e00773d9..979229970a91 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -50,8 +50,9 @@ int logical_ring_flush_all_caches(struct intel_ringbuffer *ringbuf,
  */
 static inline void intel_logical_ring_advance(struct intel_ringbuffer *ringbuf)
 {
-	ringbuf->tail &= ringbuf->size - 1;
+	intel_ringbuffer_advance(ringbuf);
 }
+
 /**
  * intel_logical_ring_emit() - write a DWORD to the ringbuffer.
  * @ringbuf: Ringbuffer to write to.
@@ -60,9 +61,9 @@ static inline void intel_logical_ring_advance(struct intel_ringbuffer *ringbuf)
 static inline void intel_logical_ring_emit(struct intel_ringbuffer *ringbuf,
 					   u32 data)
 {
-	iowrite32(data, ringbuf->virtual_start + ringbuf->tail);
-	ringbuf->tail += 4;
+	intel_ringbuffer_emit(ringbuf, data);
 }
+
 int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf,
 			     struct intel_context *ctx,
 			     int num_dwords);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index dc9f5ac21833..b4d3dc1922f7 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -404,17 +404,24 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring);
 
 int __must_check intel_ring_begin(struct intel_engine_cs *ring, int n);
 int __must_check intel_ring_cacheline_align(struct intel_engine_cs *ring);
+static inline void intel_ringbuffer_emit(struct intel_ringbuffer *rb,
+					 u32 data)
+{
+	*(uint32_t __force *)(rb->virtual_start + rb->tail) = data;
+	rb->tail += 4;
+}
+static inline void intel_ringbuffer_advance(struct intel_ringbuffer *rb)
+{
+	rb->tail &= rb->size - 1;
+}
 static inline void intel_ring_emit(struct intel_engine_cs *ring,
 				   u32 data)
 {
-	struct intel_ringbuffer *ringbuf = ring->buffer;
-	iowrite32(data, ringbuf->virtual_start + ringbuf->tail);
-	ringbuf->tail += 4;
+	intel_ringbuffer_emit(ring->buffer, data);
 }
 static inline void intel_ring_advance(struct intel_engine_cs *ring)
 {
-	struct intel_ringbuffer *ringbuf = ring->buffer;
-	ringbuf->tail &= ringbuf->size - 1;
+	intel_ringbuffer_advance(ring->buffer);
 }
 int __intel_ring_space(int head, int tail, int size);
 void intel_ring_update_space(struct intel_ringbuffer *ringbuf);
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 48/49] drm/i915: Eliminate vmap overhead for cmd parser
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (46 preceding siblings ...)
  2015-03-27 11:02 ` [PATCH 47/49] drm/i915: Treat ringbuffer writes as write to normal memory Chris Wilson
@ 2015-03-27 11:02 ` Chris Wilson
  2015-03-27 11:02 ` [PATCH 49/49] drm/i915: Cache last cmd descriptor when parsing Chris Wilson
  48 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:02 UTC (permalink / raw)
  To: intel-gfx

With a little complexity to handle cmds straddling page boundaries, we
can completely avoid having to vmap the batch and the shadow batch
objects whilst running the command parser.
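
The core of the new loop is roughly the following (heavily condensed
sketch; for_each_source_page() and cmd_length() are placeholders, and
check_cmd() is shown with a simplified signature - the full version in
the diff below also copies into the shadow batch and handles clflush):

	u32 tmp[128];		/* staging for a command straddling pages */
	unsigned int partial = 0, length = 0;

	for_each_source_page(page) {
		void *src = kmap_atomic(page);
		u32 *cmd = src, *page_end = cmd + PAGE_SIZE / 4;

		if (partial) {	/* finish the command split by the last page */
			memcpy(tmp + partial, cmd, (length - partial) * 4);
			check_cmd(tmp);
			cmd += length - partial;
			partial = 0;
		}

		while (cmd < page_end) {
			length = cmd_length(*cmd);
			if (cmd + length > page_end) {
				partial = page_end - cmd;
				memcpy(tmp, cmd, partial * 4);
				break;	/* resume on the next page */
			}
			check_cmd(cmd);
			cmd += length;
		}

		kunmap_atomic(src);
	}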

On ivb i7-3720MQ:

x11perf -dot before 54.3M, after 53.2M (max 203M)
glxgears before 7110 fps, after 7300 fps (max 7860 fps)

Before:
Time to blt 16384 bytes x      1:	 12.400µs, 1.2GiB/s
Time to blt 16384 bytes x   4096:	  3.055µs, 5.0GiB/s

After:
Time to blt 16384 bytes x      1:	  8.600µs, 1.8GiB/s
Time to blt 16384 bytes x   4096:	  2.456µs, 6.2GiB/s

Removing the vmap is mostly a win, except we lose in a few cases where
the batch size is greater than a page due to the extra complexity (loss
of a simple cache-efficient large copy, and boundary handling).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 297 +++++++++++++++++----------------
 1 file changed, 150 insertions(+), 147 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 9605ff8f2fcd..60b30b4165d4 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -818,100 +818,6 @@ static bool valid_reg(const u32 *table, int count, u32 addr)
 	return false;
 }
 
-static u32 *vmap_batch(struct drm_i915_gem_object *obj,
-		       unsigned start, unsigned len)
-{
-	int i;
-	void *addr = NULL;
-	struct sg_page_iter sg_iter;
-	int first_page = start >> PAGE_SHIFT;
-	int last_page = (len + start + 4095) >> PAGE_SHIFT;
-	int npages = last_page - first_page;
-	struct page **pages;
-
-	pages = drm_malloc_ab(npages, sizeof(*pages));
-	if (pages == NULL) {
-		DRM_DEBUG_DRIVER("Failed to get space for pages\n");
-		goto finish;
-	}
-
-	i = 0;
-	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, first_page) {
-		pages[i++] = sg_page_iter_page(&sg_iter);
-		if (i == npages)
-			break;
-	}
-
-	addr = vmap(pages, i, 0, PAGE_KERNEL);
-	if (addr == NULL) {
-		DRM_DEBUG_DRIVER("Failed to vmap pages\n");
-		goto finish;
-	}
-
-finish:
-	if (pages)
-		drm_free_large(pages);
-	return (u32*)addr;
-}
-
-/* Returns a vmap'd pointer to dest_obj, which the caller must unmap */
-static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
-		       struct drm_i915_gem_object *src_obj,
-		       u32 batch_start_offset,
-		       u32 batch_len)
-{
-	int needs_clflush = 0;
-	void *src_base, *src;
-	void *dst = NULL;
-	int ret;
-
-	if (batch_len > dest_obj->base.size ||
-	    batch_len + batch_start_offset > src_obj->base.size)
-		return ERR_PTR(-E2BIG);
-
-	if (WARN_ON(dest_obj->pages_pin_count == 0))
-		return ERR_PTR(-ENODEV);
-
-	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
-	if (ret) {
-		DRM_DEBUG_DRIVER("CMD: failed to prepare shadow batch\n");
-		return ERR_PTR(ret);
-	}
-
-	src_base = vmap_batch(src_obj, batch_start_offset, batch_len);
-	if (!src_base) {
-		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
-		ret = -ENOMEM;
-		goto unpin_src;
-	}
-
-	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
-	if (ret) {
-		DRM_DEBUG_DRIVER("CMD: Failed to set shadow batch to CPU\n");
-		goto unmap_src;
-	}
-
-	dst = vmap_batch(dest_obj, 0, batch_len);
-	if (!dst) {
-		DRM_DEBUG_DRIVER("CMD: Failed to vmap shadow batch\n");
-		ret = -ENOMEM;
-		goto unmap_src;
-	}
-
-	src = src_base + offset_in_page(batch_start_offset);
-	if (needs_clflush)
-		drm_clflush_virt_range(src, batch_len);
-
-	memcpy(dst, src, batch_len);
-
-unmap_src:
-	vunmap(src_base);
-unpin_src:
-	i915_gem_object_unpin_pages(src_obj);
-
-	return ret ? ERR_PTR(ret) : dst;
-}
-
 /**
  * i915_needs_cmd_parser() - should a given ring use software command parsing?
  * @ring: the ring in question
@@ -1046,16 +952,34 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 		    u32 batch_len,
 		    bool is_master)
 {
-	u32 *cmd, *batch_base, *batch_end;
+	u32 tmp[128];
+	struct sg_page_iter src_iter, dst_iter;
+	const struct drm_i915_cmd_descriptor *desc;
+	int needs_clflush = 0;
+	void *src, *dst;
+	unsigned in, out;
+	u32 *buf, partial = 0, length;
 	struct drm_i915_cmd_descriptor default_desc = { 0 };
 	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
 	int ret = 0;
 
-	batch_base = copy_batch(shadow_batch_obj, batch_obj,
-				batch_start_offset, batch_len);
-	if (IS_ERR(batch_base)) {
-		DRM_DEBUG_DRIVER("CMD: Failed to copy batch\n");
-		return PTR_ERR(batch_base);
+	if (batch_len > shadow_batch_obj->base.size ||
+	    batch_len + batch_start_offset > batch_obj->base.size)
+		return -E2BIG;
+
+	if (WARN_ON(shadow_batch_obj->pages_pin_count == 0))
+		return -ENODEV;
+
+	ret = i915_gem_obj_prepare_shmem_read(batch_obj, &needs_clflush);
+	if (ret) {
+		DRM_DEBUG_DRIVER("CMD: failed to prepare shadow batch\n");
+		return ret;
+	}
+
+	ret = i915_gem_object_set_to_cpu_domain(shadow_batch_obj, true);
+	if (ret) {
+		DRM_DEBUG_DRIVER("CMD: Failed to set shadow batch to CPU\n");
+		goto unpin;
 	}
 
 	/*
@@ -1063,54 +987,136 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 	 * large or larger and copy_batch() will write MI_NOPs to the extra
 	 * space. Parsing should be faster in some cases this way.
 	 */
-	batch_end = batch_base + (batch_len / sizeof(*batch_end));
+	ret = -EINVAL;
+
+	__sg_page_iter_start(&dst_iter,
+			     shadow_batch_obj->pages->sgl,
+			     shadow_batch_obj->pages->nents,
+			     0);
+	__sg_page_iter_next(&dst_iter);
+	dst = kmap_atomic(sg_page_iter_page(&dst_iter));
+
+	in = offset_in_page(batch_start_offset);
+	out = 0;
+	for_each_sg_page(batch_obj->pages->sgl,
+			 &src_iter,
+			 batch_obj->pages->nents,
+			 batch_start_offset >> PAGE_SHIFT) {
+		u32 this, i, j, k;
+		u32 *cmd, *page_end, *batch_end;
+
+		this = batch_len;
+		if (this > PAGE_SIZE - in)
+			this = PAGE_SIZE - in;
+
+		src = kmap_atomic(sg_page_iter_page(&src_iter));
+		if (needs_clflush)
+			drm_clflush_virt_range(src + in, this);
+
+		i = this;
+		j = in;
+		do {
+			/* We keep dst around so that we do not blow
+			 * the CPU caches immediately after the copy (due
+			 * to the kunmap_atomic(dst) doing a TLB flush.
+			 */
+			if (out == PAGE_SIZE) {
+				__sg_page_iter_next(&dst_iter);
+				kunmap_atomic(dst);
+				dst = kmap_atomic(sg_page_iter_page(&dst_iter));
+				out = 0;
+			}
 
-	cmd = batch_base;
-	while (cmd < batch_end) {
-		const struct drm_i915_cmd_descriptor *desc;
-		u32 length;
+			k = i;
+			if (k > PAGE_SIZE - out)
+				k = PAGE_SIZE - out;
+			if (k == PAGE_SIZE)
+				copy_page(dst, src);
+			else
+				memcpy(dst + out, src + j, k);
+
+			out += k;
+			j += k;
+			i -= k;
+		} while (i);
+
+		cmd = src + in;
+		page_end = (void *)cmd + this;
+		batch_end = (void *)cmd + batch_len;
+
+		if (partial) {
+			memcpy(tmp + partial, cmd, (length - partial)*4);
+			cmd -= partial;
+			partial = 0;
+			buf = tmp;
+			goto check;
+		}
 
-		if (*cmd == MI_BATCH_BUFFER_END)
-			break;
+		do {
+			if (*cmd == MI_BATCH_BUFFER_END) {
+				ret = 0;
+				goto unmap_src;
+			}
 
-		desc = find_cmd(ring, *cmd, &default_desc);
-		if (!desc) {
-			DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
-					 *cmd);
-			ret = -EINVAL;
-			break;
-		}
+			desc = find_cmd(ring, *cmd, &default_desc);
+			if (!desc) {
+				DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
+						 *cmd);
+				goto unmap_src;
+			}
 
-		/*
-		 * If the batch buffer contains a chained batch, return an
-		 * error that tells the caller to abort and dispatch the
-		 * workload as a non-secure batch.
-		 */
-		if (desc->cmd.value == MI_BATCH_BUFFER_START) {
-			ret = -EACCES;
-			break;
-		}
+			/*
+			 * If the batch buffer contains a chained batch, return an
+			 * error that tells the caller to abort and dispatch the
+			 * workload as a non-secure batch.
+			 */
+			if (desc->cmd.value == MI_BATCH_BUFFER_START) {
+				ret = -EACCES;
+				goto unmap_src;
+			}
 
-		if (desc->flags & CMD_DESC_FIXED)
-			length = desc->length.fixed;
-		else
-			length = ((*cmd & desc->length.mask) + LENGTH_BIAS);
-
-		if ((batch_end - cmd) < length) {
-			DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X length=%u batchlen=%td\n",
-					 *cmd,
-					 length,
-					 batch_end - cmd);
-			ret = -EINVAL;
-			break;
-		}
+			if (desc->flags & CMD_DESC_FIXED)
+				length = desc->length.fixed;
+			else
+				length = ((*cmd & desc->length.mask) + LENGTH_BIAS);
+
+			if (cmd + length > page_end) {
+				if (length + cmd > batch_end) {
+					DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X length=%u batchlen=%td\n",
+							 *cmd, length, batch_end - cmd);
+					goto unmap_src;
+				}
+
+				if (WARN_ON(length > sizeof(tmp)/4)) {
+					ret = -ENODEV;
+					goto unmap_src;
+				}
+
+				partial = page_end - cmd;
+				memcpy(tmp, cmd, partial*4);
+				break;
+			}
+
+			buf = cmd;
+check:
+			if (!check_cmd(ring, desc, buf, is_master, &oacontrol_set))
+				goto unmap_src;
 
-		if (!check_cmd(ring, desc, cmd, is_master, &oacontrol_set)) {
-			ret = -EINVAL;
+			cmd += length;
+		} while (cmd < page_end);
+
+		kunmap_atomic(src);
+
+		batch_len -= this;
+		if (batch_len == 0)
 			break;
-		}
 
-		cmd += length;
+		in = 0;
+		continue;
+
+unmap_src:
+		kunmap_atomic(src);
+		goto unmap_dst;
 	}
 
 	if (oacontrol_set) {
@@ -1118,13 +1124,10 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 		ret = -EINVAL;
 	}
 
-	if (cmd >= batch_end) {
-		DRM_DEBUG_DRIVER("CMD: Got to the end of the buffer w/o a BBE cmd!\n");
-		ret = -EINVAL;
-	}
-
-	vunmap(batch_base);
-
+unmap_dst:
+	kunmap_atomic(dst);
+unpin:
+	i915_gem_object_unpin_pages(batch_obj);
 	return ret;
 }
 
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 49/49] drm/i915: Cache last cmd descriptor when parsing
  2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
                   ` (47 preceding siblings ...)
  2015-03-27 11:02 ` [PATCH 48/49] drm/i915: Eliminate vmap overhead for cmd parser Chris Wilson
@ 2015-03-27 11:02 ` Chris Wilson
  2015-03-28  6:21   ` shuang.he
  48 siblings, 1 reply; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:02 UTC (permalink / raw)
  To: intel-gfx

The cmd parser has the biggest impact on the BLT ring, because it is
relatively verbose compared to the other engines as the vertex data is
inline. It also typically has runs of repeating commands (again since
the vertex data is inline, it typically has sequences of XY_SETUP_BLT,
XY_SCANLINE_BLT...). We can easily reduce the impact of cmd parsing on
benchmarks by caching the last descriptor and comparing it against the
next command header. To get maximum benefit, we also want to skip a few
validations and length determinations if the header is unchanged between
commands.
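
The caching then boils down to (abbreviated sketch; cmd_length() is a
placeholder for the fixed/mask length selection, and the complete logic
is in the hunk below):

	if (*cmd != last_cmd_header) {
		/* Slow path: a new header, do the full lookup and checks. */
		desc = find_cmd_in_table(ring, *cmd);
		length = cmd_length(desc, *cmd);
		last_cmd_header = *cmd;
	}
	/* Otherwise reuse desc and length from the previous identical
	 * header and go straight to the per-command checks.
	 */
	if (!check_cmd(ring, desc, cmd, is_master, &oacontrol_set))
		goto unmap_src;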

ivb i7-3720QM:
x11perf -dot: before 52.3M, after 124M (max 203M)
glxgears: before 7310 fps, after 7550 fps (max 7860 fps)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 120 +++++++++++++++------------------
 1 file changed, 54 insertions(+), 66 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 60b30b4165d4..2843bce1b83c 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -754,6 +754,9 @@ void i915_cmd_parser_fini_ring(struct intel_engine_cs *ring)
 	fini_hash_table(ring);
 }
 
+/*
+ * Returns a pointer to a descriptor for the command specified by cmd_header.
+ */
 static const struct drm_i915_cmd_descriptor*
 find_cmd_in_table(struct intel_engine_cs *ring,
 		  u32 cmd_header)
@@ -773,37 +776,6 @@ find_cmd_in_table(struct intel_engine_cs *ring,
 	return NULL;
 }
 
-/*
- * Returns a pointer to a descriptor for the command specified by cmd_header.
- *
- * The caller must supply space for a default descriptor via the default_desc
- * parameter. If no descriptor for the specified command exists in the ring's
- * command parser tables, this function fills in default_desc based on the
- * ring's default length encoding and returns default_desc.
- */
-static const struct drm_i915_cmd_descriptor*
-find_cmd(struct intel_engine_cs *ring,
-	 u32 cmd_header,
-	 struct drm_i915_cmd_descriptor *default_desc)
-{
-	const struct drm_i915_cmd_descriptor *desc;
-	u32 mask;
-
-	desc = find_cmd_in_table(ring, cmd_header);
-	if (desc)
-		return desc;
-
-	mask = ring->get_cmd_length_mask(cmd_header);
-	if (!mask)
-		return NULL;
-
-	BUG_ON(!default_desc);
-	default_desc->flags = CMD_DESC_SKIP;
-	default_desc->length.mask = mask;
-
-	return default_desc;
-}
-
 static bool valid_reg(const u32 *table, int count, u32 addr)
 {
 	if (table && count != 0) {
@@ -844,17 +816,6 @@ static bool check_cmd(const struct intel_engine_cs *ring,
 		      const bool is_master,
 		      bool *oacontrol_set)
 {
-	if (desc->flags & CMD_DESC_REJECT) {
-		DRM_DEBUG_DRIVER("CMD: Rejected command: 0x%08X\n", *cmd);
-		return false;
-	}
-
-	if ((desc->flags & CMD_DESC_MASTER) && !is_master) {
-		DRM_DEBUG_DRIVER("CMD: Rejected master-only command: 0x%08X\n",
-				 *cmd);
-		return false;
-	}
-
 	if (desc->flags & CMD_DESC_REGISTER) {
 		u32 reg_addr = cmd[desc->reg.offset] & desc->reg.mask;
 
@@ -953,13 +914,14 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 		    bool is_master)
 {
 	u32 tmp[128];
+	struct drm_i915_cmd_descriptor default_desc;
 	struct sg_page_iter src_iter, dst_iter;
 	const struct drm_i915_cmd_descriptor *desc;
+	u32 last_cmd_header = 0;
 	int needs_clflush = 0;
 	void *src, *dst;
 	unsigned in, out;
 	u32 *buf, partial = 0, length;
-	struct drm_i915_cmd_descriptor default_desc = { 0 };
 	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
 	int ret = 0;
 
@@ -1053,33 +1015,59 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 		}
 
 		do {
-			if (*cmd == MI_BATCH_BUFFER_END) {
-				ret = 0;
-				goto unmap_src;
-			}
+			if (*cmd != last_cmd_header) {
+				if (*cmd == MI_BATCH_BUFFER_END) {
+					ret = 0;
+					goto unmap_src;
+				}
 
-			desc = find_cmd(ring, *cmd, &default_desc);
-			if (!desc) {
-				DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
-						 *cmd);
-				goto unmap_src;
-			}
+				desc = find_cmd_in_table(ring, *cmd);
+				if (desc) {
+					if (unlikely(desc->flags & CMD_DESC_REJECT)) {
+						DRM_DEBUG_DRIVER("CMD: Rejected command: 0x%08X\n", *cmd);
+						goto unmap_src;
+					}
+
+					if (unlikely((desc->flags & CMD_DESC_MASTER) && !is_master)) {
+						DRM_DEBUG_DRIVER("CMD: Rejected master-only command: 0x%08X\n",
+								 *cmd);
+						goto unmap_src;
+					}
+
+					/*
+					 * If the batch buffer contains a
+					 * chained batch, return an error that
+					 * tells the caller to abort and
+					 * dispatch the workload as a
+					 * non-secure batch.
+					 */
+					if (unlikely(desc->cmd.value == MI_BATCH_BUFFER_START)) {
+						ret = -EACCES;
+						goto unmap_src;
+					}
+
+					if (desc->flags & CMD_DESC_FIXED)
+						length = desc->length.fixed;
+					else
+						length = ((*cmd & desc->length.mask) + LENGTH_BIAS);
+				} else {
+					u32 mask = ring->get_cmd_length_mask(*cmd);
+					if (unlikely(!mask)) {
+						DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
+								 *cmd);
+						goto unmap_src;
+					}
+
+					default_desc.flags = CMD_DESC_SKIP;
+					default_desc.length.mask = mask;
+					desc = &default_desc;
+
+					length = ((*cmd & mask) + LENGTH_BIAS);
+				}
 
-			/*
-			 * If the batch buffer contains a chained batch, return an
-			 * error that tells the caller to abort and dispatch the
-			 * workload as a non-secure batch.
-			 */
-			if (desc->cmd.value == MI_BATCH_BUFFER_START) {
-				ret = -EACCES;
-				goto unmap_src;
+				last_cmd_header = *cmd;
 			}
 
-			if (desc->flags & CMD_DESC_FIXED)
-				length = desc->length.fixed;
-			else
-				length = ((*cmd & desc->length.mask) + LENGTH_BIAS);
-
 			if (cmd + length > page_end) {
 				if (length + cmd > batch_end) {
 					DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X length=%u batchlen=%td\n",
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH 31/49] drm/i915: Reduce locking in execlist command submission
  2015-03-27 11:02 ` [PATCH 31/49] drm/i915: Reduce locking in execlist command submission Chris Wilson
@ 2015-03-27 11:40   ` Tvrtko Ursulin
  2015-03-27 11:47     ` Chris Wilson
  0 siblings, 1 reply; 80+ messages in thread
From: Tvrtko Ursulin @ 2015-03-27 11:40 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


Hi,

On 03/27/2015 11:02 AM, Chris Wilson wrote:
> This eliminates six needless spin lock/unlock pairs when writing out
> ELSP.
>
> v2: Respin with my preferred colour.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [v2]
> ---
>   drivers/gpu/drm/i915/i915_drv.h     | 18 ++++++++
>   drivers/gpu/drm/i915/intel_lrc.c    | 14 +++---
>   drivers/gpu/drm/i915/intel_uncore.c | 86 ++++++++++++++++++++++++++-----------
>   3 files changed, 86 insertions(+), 32 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 0c6e4356fa06..4b51169c37ea 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2540,6 +2540,13 @@ void intel_uncore_forcewake_get(struct drm_i915_private *dev_priv,
>   				enum forcewake_domains domains);
>   void intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
>   				enum forcewake_domains domains);
> +/* Like above but take and hold the uncore lock for the duration.
> + * Must be used with I915_READ_FW and friends.
> + */
> +void intel_uncore_forcewake_irqlock(struct drm_i915_private *dev_priv,
> +				enum forcewake_domains domains);
> +void intel_uncore_forcewake_irqunlock(struct drm_i915_private *dev_priv,
> +				   enum forcewake_domains domains);

Oh well I don't like your colour. :)

I would make the comment clearer by saying the function itself will take 
the lock and not release it, since "take and hold the uncore lock for the 
duration" reads as ambiguous to me.

Also, not sure about the _irqlock suffix. It is well established in 
spinlocks and the function even does the opposite of that!

Maybe _get_and_lock / _put_and_unlock, or the other way round?

>   void assert_forcewakes_inactive(struct drm_i915_private *dev_priv);
>   static inline bool intel_vgpu_active(struct drm_device *dev)
>   {
> @@ -3232,6 +3239,17 @@ int intel_freq_opcode(struct drm_i915_private *dev_priv, int val);
>   #define POSTING_READ(reg)	(void)I915_READ_NOTRACE(reg)
>   #define POSTING_READ16(reg)	(void)I915_READ16_NOTRACE(reg)
>
> +/* These are untraced mmio-accessors that are only valid to be used inside
> + * criticial sections inside IRQ handlers where forcewake is explicitly
> + * controlled.
> + * Think twice, and think again, before using these.
> + * Note: Should only be used between intel_uncore_forcewake_irqlock() and
> + * intel_uncore_forcewake_irqunlock().
> + */
> +#define I915_READ_FW(reg__) readl(dev_priv->regs + (reg__))
> +#define I915_WRITE_FW(reg__, val__) writel(val__, dev_priv->regs + (reg__))
> +#define POSTING_READ_FW(reg__) (void)I915_READ_FW(reg__)
> +
>   /* "Broadcast RGB" property */
>   #define INTEL_BROADCAST_RGB_AUTO 0
>   #define INTEL_BROADCAST_RGB_FULL 1
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 5e51ed5232e8..454bb7df27fe 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -278,17 +278,17 @@ static void execlists_submit_pair(struct intel_engine_cs *ring)
>   	desc[3] = ring->execlist_port[0]->seqno;
>
>   	/* Note: You must always write both descriptors in the order below. */
> -	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
> -	I915_WRITE(RING_ELSP(ring), desc[1]);
> -	I915_WRITE(RING_ELSP(ring), desc[0]);
> -	I915_WRITE(RING_ELSP(ring), desc[3]);
> +	intel_uncore_forcewake_irqlock(dev_priv, FORCEWAKE_ALL);
> +	I915_WRITE_FW(RING_ELSP(ring), desc[1]);
> +	I915_WRITE_FW(RING_ELSP(ring), desc[0]);
> +	I915_WRITE_FW(RING_ELSP(ring), desc[3]);
>
>   	/* The context is automatically loaded after the following */
> -	I915_WRITE(RING_ELSP(ring), desc[2]);
> +	I915_WRITE_FW(RING_ELSP(ring), desc[2]);
>
>   	/* ELSP is a wo register, use another nearby reg for posting instead */
> -	POSTING_READ(RING_EXECLIST_STATUS(ring));
> -	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> +	POSTING_READ_FW(RING_EXECLIST_STATUS(ring));
> +	intel_uncore_forcewake_irqunlock(dev_priv, FORCEWAKE_ALL);
>   }
>
>   static void execlists_context_unqueue(struct intel_engine_cs *ring)
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 0e32bbbcada8..a063f7d9f31b 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -399,6 +399,26 @@ void intel_uncore_sanitize(struct drm_device *dev)
>   	intel_disable_gt_powersave(dev);
>   }
>
> +static void __intel_uncore_forcewake_get(struct drm_i915_private *dev_priv,
> +					 enum forcewake_domains fw_domains)
> +{
> +	struct intel_uncore_forcewake_domain *domain;
> +	enum forcewake_domain_id id;
> +
> +	if (!dev_priv->uncore.funcs.force_wake_get)
> +		return;
> +
> +	fw_domains &= dev_priv->uncore.fw_domains;
> +
> +	for_each_fw_domain_mask(domain, fw_domains, dev_priv, id) {
> +		if (domain->wake_count++)
> +			fw_domains &= ~(1 << id);
> +	}
> +
> +	if (fw_domains)
> +		dev_priv->uncore.funcs.force_wake_get(dev_priv, fw_domains);
> +}
> +
>   /**
>    * intel_uncore_forcewake_get - grab forcewake domain references
>    * @dev_priv: i915 device instance
> @@ -416,41 +436,30 @@ void intel_uncore_forcewake_get(struct drm_i915_private *dev_priv,
>   				enum forcewake_domains fw_domains)
>   {
>   	unsigned long irqflags;
> -	struct intel_uncore_forcewake_domain *domain;
> -	enum forcewake_domain_id id;
>
>   	if (!dev_priv->uncore.funcs.force_wake_get)
>   		return;
>
>   	WARN_ON(dev_priv->pm.suspended);
>
> -	fw_domains &= dev_priv->uncore.fw_domains;
> -
>   	spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
> +	__intel_uncore_forcewake_get(dev_priv, fw_domains);
> +	spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
> +}
>
> -	for_each_fw_domain_mask(domain, fw_domains, dev_priv, id) {
> -		if (domain->wake_count++)
> -			fw_domains &= ~(1 << id);
> -	}
> -
> -	if (fw_domains)
> -		dev_priv->uncore.funcs.force_wake_get(dev_priv, fw_domains);
> +void intel_uncore_forcewake_irqlock(struct drm_i915_private *dev_priv,
> +				    enum forcewake_domains fw_domains)
> +{

And kerneldoc probably if we are finalising this.

> +	if (!dev_priv->uncore.funcs.force_wake_get)
> +		return;
>
> -	spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
> +	spin_lock(&dev_priv->uncore.lock);
> +	__intel_uncore_forcewake_get(dev_priv, fw_domains);
>   }

As for this one - why plain spin_lock? It can only be called from irq 
context now, but the comment does not say that and there aren't any 
asserts (if they are even possible nowadays).

>
> -/**
> - * intel_uncore_forcewake_put - release a forcewake domain reference
> - * @dev_priv: i915 device instance
> - * @fw_domains: forcewake domains to put references
> - *
> - * This function drops the device-level forcewakes for specified
> - * domains obtained by intel_uncore_forcewake_get().
> - */
> -void intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
> -				enum forcewake_domains fw_domains)
> +static void __intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
> +					 enum forcewake_domains fw_domains)
>   {
> -	unsigned long irqflags;
>   	struct intel_uncore_forcewake_domain *domain;
>   	enum forcewake_domain_id id;
>
> @@ -459,8 +468,6 @@ void intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
>
>   	fw_domains &= dev_priv->uncore.fw_domains;
>
> -	spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
> -
>   	for_each_fw_domain_mask(domain, fw_domains, dev_priv, id) {
>   		if (WARN_ON(domain->wake_count == 0))
>   			continue;
> @@ -471,10 +478,39 @@ void intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
>   		domain->wake_count++;
>   		fw_domain_arm_timer(domain);
>   	}
> +}
> +
> +/**
> + * intel_uncore_forcewake_put - release a forcewake domain reference
> + * @dev_priv: i915 device instance
> + * @fw_domains: forcewake domains to put references
> + *
> + * This function drops the device-level forcewakes for specified
> + * domains obtained by intel_uncore_forcewake_get().
> + */
> +void intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
> +				enum forcewake_domains fw_domains)
> +{
> +	unsigned long irqflags;
> +
> +	if (!dev_priv->uncore.funcs.force_wake_put)
> +		return;
>
> +	spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
> +	__intel_uncore_forcewake_put(dev_priv, fw_domains);
>   	spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
>   }
>
> +void intel_uncore_forcewake_irqunlock(struct drm_i915_private *dev_priv,
> +				      enum forcewake_domains fw_domains)
> +{
> +	if (!dev_priv->uncore.funcs.force_wake_put)
> +		return;
> +
> +	__intel_uncore_forcewake_put(dev_priv, fw_domains);
> +	spin_unlock(&dev_priv->uncore.lock);
> +}
> +
>   void assert_forcewakes_inactive(struct drm_i915_private *dev_priv)
>   {
>   	struct intel_uncore_forcewake_domain *domain;
>

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 16/49] drm/i915: Optimistically spin for the request completion
  2015-03-27 11:01 ` [PATCH 16/49] drm/i915: Optimistically spin for the request completion Chris Wilson
@ 2015-03-27 11:42   ` Tvrtko Ursulin
  0 siblings, 0 replies; 80+ messages in thread
From: Tvrtko Ursulin @ 2015-03-27 11:42 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Daniel Vetter, Eero Tamminen, Rantala, Valtteri


On 03/27/2015 11:01 AM, Chris Wilson wrote:
> This provides a nice boost to mesa in swap bound scenarios (as mesa
> throttles itself to the previous frame and given the scenario that will
> complete shortly). It will also provide a good boost to systems running
> with semaphores disabled and so frequently waiting on the GPU as it
> switches rings. In the most favourable of microbenchmarks, this can
> increase performance by around 15% - though in practice improvements
> will be marginal and rarely noticeable.
>
> v2: Account for user timeouts
> v3: Limit the spinning to a single jiffie (~1us) at most. On an
> otherwise idle system, there is no scheduler contention and so without a
> limit we would spin until the GPU is ready.
> v4: Drop forcewake - the lazy coherent access doesn't require it, and we
> have no reason to believe that the forcewake itself improves seqno
> coherency - it only adds delay.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>

I gave my r-b for v4 already, here it is again:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 31/49] drm/i915: Reduce locking in execlist command submission
  2015-03-27 11:40   ` Tvrtko Ursulin
@ 2015-03-27 11:47     ` Chris Wilson
  2015-03-27 11:54       ` Tvrtko Ursulin
  2015-03-27 14:15       ` Daniel Vetter
  0 siblings, 2 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 11:47 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Fri, Mar 27, 2015 at 11:40:12AM +0000, Tvrtko Ursulin wrote:
> 
> >+/* Like above but take and hold the uncore lock for the duration.
> >+ * Must be used with I915_READ_FW and friends.
> >+ */
> >+void intel_uncore_forcewake_irqlock(struct drm_i915_private *dev_priv,
> >+				enum forcewake_domains domains);
> >+void intel_uncore_forcewake_irqunlock(struct drm_i915_private *dev_priv,
> >+				   enum forcewake_domains domains);
> 
> Oh well I don't like your colour. :)
> 
> I would make the comment clearer in saying the function itself will
> take the lock and not release it since "take and hold the uncore
> lock for the duration" to me reads ambiguous.

"duration of the critical section".

> Also, not sure about the _irqlock suffix. It is well established in
> spinlocks and the functions even does the opposite from that!
> 
> Maybe _get_and_lock / _put_and_unlock, or other way round?

How about _irq_get, for the reason that I don't expect this to be widely
used elsewhere? We are trading off debugging for performance, and that is
only really justifiable inside irqs or busy-waits (and for busy-waits we
already have the notrace variant).

Actually _get_irq/_put_irq.
 
> So this, why plain spin_lock? It can only be called from irq context
> now but the comment does not say that and there aren't any assert
> (if they are even possible nowadays).

Exactly.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 31/49] drm/i915: Reduce locking in execlist command submission
  2015-03-27 11:47     ` Chris Wilson
@ 2015-03-27 11:54       ` Tvrtko Ursulin
  2015-03-27 14:15       ` Daniel Vetter
  1 sibling, 0 replies; 80+ messages in thread
From: Tvrtko Ursulin @ 2015-03-27 11:54 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/27/2015 11:47 AM, Chris Wilson wrote:
> On Fri, Mar 27, 2015 at 11:40:12AM +0000, Tvrtko Ursulin wrote:
>>
>>> +/* Like above but take and hold the uncore lock for the duration.
>>> + * Must be used with I915_READ_FW and friends.
>>> + */
>>> +void intel_uncore_forcewake_irqlock(struct drm_i915_private *dev_priv,
>>> +				enum forcewake_domains domains);
>>> +void intel_uncore_forcewake_irqunlock(struct drm_i915_private *dev_priv,
>>> +				   enum forcewake_domains domains);
>>
>> Oh well I don't like your colour. :)
>>
>> I would make the comment clearer in saying the function itself will
>> take the lock and not release it since "take and hold the uncore
>> lock for the duration" to me reads ambiguous.
>
> "duration of the critical section".

I would like it to be explicit that the function takes the lock, and not that the 
caller has to, for the duration of the critical section. Maybe it is my 
non-native English but from "take and hold the uncore lock for the 
duration.." I am not sure which one of the two it is.

>> Also, not sure about the _irqlock suffix. It is well established in
>> spinlocks and the functions even do the opposite from that!
>>
>> Maybe _get_and_lock / _put_and_unlock, or other way round?
>
> How about _irq_get, for the reason that I don't want this to be widely used
> elsewhere. We are trading off debugging for performance, and that's only
> really justifiable inside irqs or busy-waits (and for busy-waits we
> already have the notrace variant).
>
> Actually _get_irq/_put_irq.

OK, but comment needs to say that in my opinion.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 10/49] drm/i915: Tidy batch pool logic
  2015-03-27 11:01 ` [PATCH 10/49] drm/i915: Tidy batch pool logic Chris Wilson
@ 2015-03-27 11:59   ` Tvrtko Ursulin
  0 siblings, 0 replies; 80+ messages in thread
From: Tvrtko Ursulin @ 2015-03-27 11:59 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


Hi,

On 03/27/2015 11:01 AM, Chris Wilson wrote:
> Move the madvise logic out of the execbuffer main path into the
> relatively rare allocation path, making the execbuffer manipulation less
> fragile.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 33/49] drm/i915: Reduce locking in gen8 IRQ handler
  2015-03-27 11:02 ` [PATCH 33/49] drm/i915: Reduce locking in gen8 IRQ handler Chris Wilson
@ 2015-03-27 14:13   ` Daniel Vetter
  2015-03-27 14:14     ` Chris Wilson
  0 siblings, 1 reply; 80+ messages in thread
From: Daniel Vetter @ 2015-03-27 14:13 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Mar 27, 2015 at 11:02:05AM +0000, Chris Wilson wrote:
> Similar in vein to reducing the number of unrequired spinlocks used for
> execlist command submission (where the forcewake is required but
> manually controlled), we know that the IRQ registers are outside of the
> powerwell and so we can access them directly. Since we now have direct
> access exported via I915_READ_FW/I915_WRITE_FW, let's put those to use in
> the irq handlers as well.
> 
> In the process, reorder the execlist submission to happen as early as
> possible.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

port/pipe interrupts are in display power domains afaik, so I'd prefer not
to lose pm debug with those. But they're also gated behind the master_ctl
bits, so can we have all of the speedups still without touching those?
Same for the pch, just in case.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_irq.c | 63 ++++++++++++++++++++---------------------
>  1 file changed, 31 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 8b5e0358c592..da3b76b9ebd9 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1285,56 +1285,56 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
>  	irqreturn_t ret = IRQ_NONE;
>  
>  	if (master_ctl & (GEN8_GT_RCS_IRQ | GEN8_GT_BCS_IRQ)) {
> -		tmp = I915_READ(GEN8_GT_IIR(0));
> +		tmp = I915_READ_FW(GEN8_GT_IIR(0));
>  		if (tmp) {
> -			I915_WRITE(GEN8_GT_IIR(0), tmp);
> +			I915_WRITE_FW(GEN8_GT_IIR(0), tmp);
>  			ret = IRQ_HANDLED;
>  
>  			rcs = tmp >> GEN8_RCS_IRQ_SHIFT;
>  			ring = &dev_priv->ring[RCS];
> -			if (rcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, ring);
>  			if (rcs & GT_CONTEXT_SWITCH_INTERRUPT)
>  				intel_lrc_irq_handler(ring);
> +			if (rcs & GT_RENDER_USER_INTERRUPT)
> +				notify_ring(dev, ring);
>  
>  			bcs = tmp >> GEN8_BCS_IRQ_SHIFT;
>  			ring = &dev_priv->ring[BCS];
> -			if (bcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, ring);
>  			if (bcs & GT_CONTEXT_SWITCH_INTERRUPT)
>  				intel_lrc_irq_handler(ring);
> +			if (bcs & GT_RENDER_USER_INTERRUPT)
> +				notify_ring(dev, ring);
>  		} else
>  			DRM_ERROR("The master control interrupt lied (GT0)!\n");
>  	}
>  
>  	if (master_ctl & (GEN8_GT_VCS1_IRQ | GEN8_GT_VCS2_IRQ)) {
> -		tmp = I915_READ(GEN8_GT_IIR(1));
> +		tmp = I915_READ_FW(GEN8_GT_IIR(1));
>  		if (tmp) {
> -			I915_WRITE(GEN8_GT_IIR(1), tmp);
> +			I915_WRITE_FW(GEN8_GT_IIR(1), tmp);
>  			ret = IRQ_HANDLED;
>  
>  			vcs = tmp >> GEN8_VCS1_IRQ_SHIFT;
>  			ring = &dev_priv->ring[VCS];
> -			if (vcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, ring);
>  			if (vcs & GT_CONTEXT_SWITCH_INTERRUPT)
>  				intel_lrc_irq_handler(ring);
> +			if (vcs & GT_RENDER_USER_INTERRUPT)
> +				notify_ring(dev, ring);
>  
>  			vcs = tmp >> GEN8_VCS2_IRQ_SHIFT;
>  			ring = &dev_priv->ring[VCS2];
> -			if (vcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, ring);
>  			if (vcs & GT_CONTEXT_SWITCH_INTERRUPT)
>  				intel_lrc_irq_handler(ring);
> +			if (vcs & GT_RENDER_USER_INTERRUPT)
> +				notify_ring(dev, ring);
>  		} else
>  			DRM_ERROR("The master control interrupt lied (GT1)!\n");
>  	}
>  
>  	if (master_ctl & GEN8_GT_PM_IRQ) {
> -		tmp = I915_READ(GEN8_GT_IIR(2));
> +		tmp = I915_READ_FW(GEN8_GT_IIR(2));
>  		if (tmp & dev_priv->pm_rps_events) {
> -			I915_WRITE(GEN8_GT_IIR(2),
> -				   tmp & dev_priv->pm_rps_events);
> +			I915_WRITE_FW(GEN8_GT_IIR(2),
> +				      tmp & dev_priv->pm_rps_events);
>  			ret = IRQ_HANDLED;
>  			gen6_rps_irq_handler(dev_priv, tmp);
>  		} else
> @@ -1342,17 +1342,17 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
>  	}
>  
>  	if (master_ctl & GEN8_GT_VECS_IRQ) {
> -		tmp = I915_READ(GEN8_GT_IIR(3));
> +		tmp = I915_READ_FW(GEN8_GT_IIR(3));
>  		if (tmp) {
> -			I915_WRITE(GEN8_GT_IIR(3), tmp);
> +			I915_WRITE_FW(GEN8_GT_IIR(3), tmp);
>  			ret = IRQ_HANDLED;
>  
>  			vcs = tmp >> GEN8_VECS_IRQ_SHIFT;
>  			ring = &dev_priv->ring[VECS];
> -			if (vcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, ring);
>  			if (vcs & GT_CONTEXT_SWITCH_INTERRUPT)
>  				intel_lrc_irq_handler(ring);
> +			if (vcs & GT_RENDER_USER_INTERRUPT)
> +				notify_ring(dev, ring);
>  		} else
>  			DRM_ERROR("The master control interrupt lied (GT3)!\n");
>  	}
> @@ -2178,22 +2178,21 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
>  		aux_mask |=  GEN9_AUX_CHANNEL_B | GEN9_AUX_CHANNEL_C |
>  			GEN9_AUX_CHANNEL_D;
>  
> -	master_ctl = I915_READ(GEN8_MASTER_IRQ);
> +	master_ctl = I915_READ_FW(GEN8_MASTER_IRQ);
>  	master_ctl &= ~GEN8_MASTER_IRQ_CONTROL;
>  	if (!master_ctl)
>  		return IRQ_NONE;
>  
> -	I915_WRITE(GEN8_MASTER_IRQ, 0);
> -	POSTING_READ(GEN8_MASTER_IRQ);
> +	I915_WRITE_FW(GEN8_MASTER_IRQ, 0);
>  
>  	/* Find, clear, then process each source of interrupt */
>  
>  	ret = gen8_gt_irq_handler(dev, dev_priv, master_ctl);
>  
>  	if (master_ctl & GEN8_DE_MISC_IRQ) {
> -		tmp = I915_READ(GEN8_DE_MISC_IIR);
> +		tmp = I915_READ_FW(GEN8_DE_MISC_IIR);
>  		if (tmp) {
> -			I915_WRITE(GEN8_DE_MISC_IIR, tmp);
> +			I915_WRITE_FW(GEN8_DE_MISC_IIR, tmp);
>  			ret = IRQ_HANDLED;
>  			if (tmp & GEN8_DE_MISC_GSE)
>  				intel_opregion_asle_intr(dev);
> @@ -2205,9 +2204,9 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
>  	}
>  
>  	if (master_ctl & GEN8_DE_PORT_IRQ) {
> -		tmp = I915_READ(GEN8_DE_PORT_IIR);
> +		tmp = I915_READ_FW(GEN8_DE_PORT_IIR);
>  		if (tmp) {
> -			I915_WRITE(GEN8_DE_PORT_IIR, tmp);
> +			I915_WRITE_FW(GEN8_DE_PORT_IIR, tmp);
>  			ret = IRQ_HANDLED;
>  
>  			if (tmp & aux_mask)
> @@ -2225,10 +2224,10 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
>  		if (!(master_ctl & GEN8_DE_PIPE_IRQ(pipe)))
>  			continue;
>  
> -		pipe_iir = I915_READ(GEN8_DE_PIPE_IIR(pipe));
> +		pipe_iir = I915_READ_FW(GEN8_DE_PIPE_IIR(pipe));
>  		if (pipe_iir) {
>  			ret = IRQ_HANDLED;
> -			I915_WRITE(GEN8_DE_PIPE_IIR(pipe), pipe_iir);
> +			I915_WRITE_FW(GEN8_DE_PIPE_IIR(pipe), pipe_iir);
>  
>  			if (pipe_iir & GEN8_PIPE_VBLANK &&
>  			    intel_pipe_handle_vblank(dev, pipe))
> @@ -2271,9 +2270,9 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
>  		 * scheme also closed the SDE interrupt handling race we've seen
>  		 * on older pch-split platforms. But this needs testing.
>  		 */
> -		u32 pch_iir = I915_READ(SDEIIR);
> +		u32 pch_iir = I915_READ_FW(SDEIIR);
>  		if (pch_iir) {
> -			I915_WRITE(SDEIIR, pch_iir);
> +			I915_WRITE_FW(SDEIIR, pch_iir);
>  			ret = IRQ_HANDLED;
>  			cpt_irq_handler(dev, pch_iir);
>  		} else
> @@ -2281,8 +2280,8 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
>  
>  	}
>  
> -	I915_WRITE(GEN8_MASTER_IRQ, GEN8_MASTER_IRQ_CONTROL);
> -	POSTING_READ(GEN8_MASTER_IRQ);
> +	I915_WRITE_FW(GEN8_MASTER_IRQ, GEN8_MASTER_IRQ_CONTROL);
> +	POSTING_READ_FW(GEN8_MASTER_IRQ);
>  
>  	return ret;
>  }
> -- 
> 2.1.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 33/49] drm/i915: Reduce locking in gen8 IRQ handler
  2015-03-27 14:13   ` Daniel Vetter
@ 2015-03-27 14:14     ` Chris Wilson
  0 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 14:14 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Fri, Mar 27, 2015 at 03:13:08PM +0100, Daniel Vetter wrote:
> On Fri, Mar 27, 2015 at 11:02:05AM +0000, Chris Wilson wrote:
> > Similar in vein to reducing the number of unrequired spinlocks used for
> > execlist command submission (where the forcewake is required but
> > manually controlled), we know that the IRQ registers are outside of the
> > powerwell and so we can access them directly. Since we now have direct
> > access exported via I915_READ_FW/I915_WRITE_FW, let's put those to use in
> > the irq handlers as well.
> > 
> > In the process, reorder the execlist submission to happen as early as
> > possible.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> port/pipe interrupts are in display power domains afaik, so I'd prefer not
> to lose pm debug with those. But they're also gated behind the master_ctl
> bits, so can we have all of the speedups still without touching those?

Sure, execlists triggers gen8_gt_irq_handler() a lot, so we can just
limit the use of raw reads/writes there. Probably safer in the long run
as well.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 31/49] drm/i915: Reduce locking in execlist command submission
  2015-03-27 11:47     ` Chris Wilson
  2015-03-27 11:54       ` Tvrtko Ursulin
@ 2015-03-27 14:15       ` Daniel Vetter
  1 sibling, 0 replies; 80+ messages in thread
From: Daniel Vetter @ 2015-03-27 14:15 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, intel-gfx

On Fri, Mar 27, 2015 at 11:47:12AM +0000, Chris Wilson wrote:
> On Fri, Mar 27, 2015 at 11:40:12AM +0000, Tvrtko Ursulin wrote:
> > 
> > >+/* Like above but take and hold the uncore lock for the duration.
> > >+ * Must be used with I915_READ_FW and friends.
> > >+ */
> > >+void intel_uncore_forcewake_irqlock(struct drm_i915_private *dev_priv,
> > >+				enum forcewake_domains domains);
> > >+void intel_uncore_forcewake_irqunlock(struct drm_i915_private *dev_priv,
> > >+				   enum forcewake_domains domains);
> > 
> > Oh well I don't like your colour. :)
> > 
> > I would make the comment clearer in saying the function itself will
> > take the lock and not release it since "take and hold the uncore
> > lock for the duration" to me reads ambiguous.
> 
> "duration of the critical section".
> 
> > Also, not sure about the _irqlock suffix. It is well established in
> > spinlocks and the functions even do the opposite from that!
> > 
> > Maybe _get_and_lock / _put_and_unlock, or other way round?
> 
> How about _irq_get, for the reason that I don't want this to be widely used
> elsewhere. We are trading off debugging for performance, and that's only
> really justifiable inside irqs or busy-waits (and for busy-waits we
> already have the notrace variant).
> 
> Actually _get_irq/_put_irq.

If we bikeshed this, what about forcewake_get/put_locked and making the
lock acquisition explicit in the callers? spin_lock_irq is already a big
red flag asking for close scrutinity, not hiding would be a feature.
Especially if we're concerned with usage creep of these optimized
functions.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread
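
Daniel's alternative keeps the lock acquisition visible at the call site. A
hypothetical caller would then look roughly like the fragment below; the
__locked helper names are placeholders following his suggestion, not part of
this series, and the register traffic is elided.

        spin_lock_irq(&dev_priv->uncore.lock);
        intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);

        /* ... raw register access with I915_READ_FW/I915_WRITE_FW only:
         * no per-mmio locking and no unclaimed-register debug here ...
         */

        intel_uncore_forcewake_put__locked(dev_priv, FORCEWAKE_ALL);
        spin_unlock_irq(&dev_priv->uncore.lock);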

* Re: [PATCH 21/49] drm/i915: Use the global runtime-pm wakelock for a busy GPU for execlists
  2015-03-27 11:01 ` [PATCH 21/49] drm/i915: Use the global runtime-pm wakelock for a busy GPU for execlists Chris Wilson
@ 2015-03-27 14:19   ` Daniel Vetter
  2015-03-27 14:25     ` Chris Wilson
  0 siblings, 1 reply; 80+ messages in thread
From: Daniel Vetter @ 2015-03-27 14:19 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Mar 27, 2015 at 11:01:53AM +0000, Chris Wilson wrote:
> When we submit a request to the GPU, we first take the rpm wakelock, and
> only release it once the GPU has been idle for a small period of time
> after all requests have been complete. This means that we are sure no
> new interrupt can arrive whilst we do not hold the rpm wakelock and so
> can drop the individual get/put around every single request inside
> execlists.
> 
> Note: to close one potential issue we should mark the GPU as busy
> earlier in __i915_add_request.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

With the create/add_request rework from John, should we do the idle->busy
check in the request alloc function, together with latching the worker?
Not perfect if the execbuf doesn't go through, but leaves no races and
userspace better submit valid execbufs anyway if it expects performance.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem.c  | 1 -
>  drivers/gpu/drm/i915/intel_lrc.c | 3 ---
>  2 files changed, 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 7e6f2560bf35..4ec195a63d60 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2646,7 +2646,6 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
>  				struct drm_i915_gem_request,
>  				execlist_link);
>  		list_del(&submit_req->execlist_link);
> -		intel_runtime_pm_put(dev_priv);
>  
>  		if (submit_req->ctx != ring->default_context)
>  			intel_lr_context_unpin(ring, submit_req->ctx);
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 9b7824ac35dc..2ed1cf448c6f 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -525,8 +525,6 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
>  	}
>  	request->tail = tail;
>  
> -	intel_runtime_pm_get(dev_priv);
> -
>  	spin_lock_irq(&ring->execlist_lock);
>  
>  	list_for_each_entry(cursor, &ring->execlist_queue, execlist_link)
> @@ -740,7 +738,6 @@ void intel_execlists_retire_requests(struct intel_engine_cs *ring)
>  
>  		if (ctx_obj && (ctx != ring->default_context))
>  			intel_lr_context_unpin(ring, ctx);
> -		intel_runtime_pm_put(dev_priv);
>  		list_del(&req->execlist_link);
>  		i915_gem_request_unreference(req);
>  	}
> -- 
> 2.1.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 27/49] drm/i915: Use a separate slab for requests
  2015-03-27 11:01 ` [PATCH 27/49] drm/i915: Use a separate slab for requests Chris Wilson
@ 2015-03-27 14:20   ` Daniel Vetter
  2015-03-27 14:27     ` Chris Wilson
  0 siblings, 1 reply; 80+ messages in thread
From: Daniel Vetter @ 2015-03-27 14:20 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Mar 27, 2015 at 11:01:59AM +0000, Chris Wilson wrote:
> requests are even more frequently allocated than objects and equally
> benefit from having a dedicated slab.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

vmas, while we're at it?
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_dma.c         | 12 ++++++++----
>  drivers/gpu/drm/i915/i915_drv.h         |  5 ++++-
>  drivers/gpu/drm/i915/i915_gem.c         | 26 ++++++++++++++++++++++----
>  drivers/gpu/drm/i915/intel_lrc.c        | 11 +++++------
>  drivers/gpu/drm/i915/intel_ringbuffer.c |  9 ++++-----
>  5 files changed, 43 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 8f5428b46a27..180b5d92b279 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1006,8 +1006,10 @@ out_regs:
>  put_bridge:
>  	pci_dev_put(dev_priv->bridge_dev);
>  free_priv:
> -	if (dev_priv->slab)
> -		kmem_cache_destroy(dev_priv->slab);
> +	if (dev_priv->requests)
> +		kmem_cache_destroy(dev_priv->requests);
> +	if (dev_priv->objects)
> +		kmem_cache_destroy(dev_priv->objects);
>  	kfree(dev_priv);
>  	return ret;
>  }
> @@ -1090,8 +1092,10 @@ int i915_driver_unload(struct drm_device *dev)
>  	if (dev_priv->regs != NULL)
>  		pci_iounmap(dev->pdev, dev_priv->regs);
>  
> -	if (dev_priv->slab)
> -		kmem_cache_destroy(dev_priv->slab);
> +	if (dev_priv->requests)
> +		kmem_cache_destroy(dev_priv->requests);
> +	if (dev_priv->objects)
> +		kmem_cache_destroy(dev_priv->objects);
>  
>  	pci_dev_put(dev_priv->bridge_dev);
>  	kfree(dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index ee51540e169a..b728250d6550 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1556,7 +1556,8 @@ struct i915_virtual_gpu {
>  
>  struct drm_i915_private {
>  	struct drm_device *dev;
> -	struct kmem_cache *slab;
> +	struct kmem_cache *objects;
> +	struct kmem_cache *requests;
>  
>  	const struct intel_device_info info;
>  
> @@ -2052,6 +2053,7 @@ struct drm_i915_gem_request {
>  	struct kref ref;
>  
>  	/** On Which ring this request was generated */
> +	struct drm_i915_private *i915;
>  	struct intel_engine_cs *ring;
>  
>  	/** GEM sequence number associated with this request. */
> @@ -2118,6 +2120,7 @@ struct drm_i915_gem_request {
>  	struct list_head execlist_link;
>  };
>  
> +struct drm_i915_gem_request *i915_gem_request_alloc(struct drm_i915_private *i915);
>  void i915_gem_request_free(struct kref *req_ref);
>  
>  static inline uint32_t
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 5366162e4983..900cbe17c49a 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -381,13 +381,13 @@ out:
>  void *i915_gem_object_alloc(struct drm_device *dev)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
> -	return kmem_cache_zalloc(dev_priv->slab, GFP_KERNEL);
> +	return kmem_cache_zalloc(dev_priv->objects, GFP_KERNEL);
>  }
>  
>  void i915_gem_object_free(struct drm_i915_gem_object *obj)
>  {
>  	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
> -	kmem_cache_free(dev_priv->slab, obj);
> +	kmem_cache_free(dev_priv->objects, obj);
>  }
>  
>  static int
> @@ -2567,6 +2567,19 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
>  	}
>  }
>  
> +struct drm_i915_gem_request *i915_gem_request_alloc(struct drm_i915_private *i915)
> +{
> +	struct drm_i915_gem_request *rq;
> +
> +	rq = kmem_cache_zalloc(i915->requests, GFP_KERNEL);
> +	if (rq == NULL)
> +		return ERR_PTR(-ENOMEM);
> +
> +	kref_init(&rq->ref);
> +	rq->i915 = i915;
> +	return rq;
> +}
> +
>  void i915_gem_request_free(struct kref *req_ref)
>  {
>  	struct drm_i915_gem_request *req = container_of(req_ref,
> @@ -2577,7 +2590,7 @@ void i915_gem_request_free(struct kref *req_ref)
>  		i915_gem_context_unreference(ctx);
>  	}
>  
> -	kfree(req);
> +	kmem_cache_free(req->i915->requests, req);
>  }
>  
>  struct drm_i915_gem_request *
> @@ -5110,11 +5123,16 @@ i915_gem_load(struct drm_device *dev)
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	int i;
>  
> -	dev_priv->slab =
> +	dev_priv->objects =
>  		kmem_cache_create("i915_gem_object",
>  				  sizeof(struct drm_i915_gem_object), 0,
>  				  SLAB_HWCACHE_ALIGN,
>  				  NULL);
> +	dev_priv->requests =
> +		kmem_cache_create("i915_gem_request",
> +				  sizeof(struct drm_i915_gem_request), 0,
> +				  SLAB_HWCACHE_ALIGN,
> +				  NULL);
>  
>  	INIT_LIST_HEAD(&dev_priv->vm_list);
>  	i915_init_vm(dev_priv, &dev_priv->gtt.base);
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index a013239f5e26..5e51ed5232e8 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -742,24 +742,23 @@ static int logical_ring_alloc_request(struct intel_engine_cs *ring,
>  	if (ring->outstanding_lazy_request)
>  		return 0;
>  
> -	request = kzalloc(sizeof(*request), GFP_KERNEL);
> -	if (request == NULL)
> -		return -ENOMEM;
> +	request = i915_gem_request_alloc(dev_private);
> +	if (IS_ERR(request))
> +		return PTR_ERR(request);
>  
>  	ret = intel_lr_context_pin(ring, ctx);
>  	if (ret) {
> -		kfree(request);
> +		i915_gem_request_free(&request->ref);
>  		return ret;
>  	}
>  
> -	kref_init(&request->ref);
>  	request->ring = ring;
>  	request->uniq = dev_private->request_uniq++;
>  
>  	ret = i915_gem_get_seqno(ring->dev, &request->seqno);
>  	if (ret) {
>  		intel_lr_context_unpin(ring, ctx);
> -		kfree(request);
> +		i915_gem_request_free(&request->ref);
>  		return ret;
>  	}
>  
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 2e5c39123d24..f7097a80dea3 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2179,18 +2179,17 @@ intel_ring_alloc_request(struct intel_engine_cs *ring)
>  	if (ring->outstanding_lazy_request)
>  		return 0;
>  
> -	request = kzalloc(sizeof(*request), GFP_KERNEL);
> -	if (request == NULL)
> -		return -ENOMEM;
> +	request = i915_gem_request_alloc(dev_private);
> +	if (IS_ERR(request))
> +		return PTR_ERR(request);
>  
> -	kref_init(&request->ref);
>  	request->ring = ring;
>  	request->ringbuf = ring->buffer;
>  	request->uniq = dev_private->request_uniq++;
>  
>  	ret = i915_gem_get_seqno(ring->dev, &request->seqno);
>  	if (ret) {
> -		kfree(request);
> +		i915_gem_request_free(&request->ref);
>  		return ret;
>  	}
>  
> -- 
> 2.1.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 21/49] drm/i915: Use the global runtime-pm wakelock for a busy GPU for execlists
  2015-03-27 14:19   ` Daniel Vetter
@ 2015-03-27 14:25     ` Chris Wilson
  0 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 14:25 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Fri, Mar 27, 2015 at 03:19:20PM +0100, Daniel Vetter wrote:
> On Fri, Mar 27, 2015 at 11:01:53AM +0000, Chris Wilson wrote:
> > When we submit a request to the GPU, we first take the rpm wakelock, and
> > only release it once the GPU has been idle for a small period of time
> > after all requests have been complete. This means that we are sure no
> > new interrupt can arrive whilst we do not hold the rpm wakelock and so
> > can drop the individual get/put around every single request inside
> > execlists.
> > 
> > Note: to close one potential issue we should mark the GPU as busy
> > earlier in __i915_add_request.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> With the create/add_request rework from John, should we do the idle->busy
> check in the request alloc function, together with latching the worker?
> Not perfect if the execbuf doesn't go through, but leaves no races and
> userspace better submit valid execbufs anyway if it expects performance.

I think it should be done in i915_request_commit(). The argument is that
there are quite a few places where we may start building a request only to
decide that it is a no-op (e.g. fixing i915_gpu_idle() for execlists).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread
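
A rough sketch of the "mark the GPU busy when the request is committed" idea
from the exchange above. Everything here is hypothetical: i915_request_commit()
is only being proposed, and the busy flag and worker wiring are a guess at how
it could look, not code from this series or from John's rework.

static void i915_request_commit__mark_busy(struct drm_i915_gem_request *req)
{
        struct drm_i915_private *dev_priv = req->i915;

        if (dev_priv->mm.busy)
                return;

        /* One long-lived rpm reference for the whole busy period, released
         * again from the idle worker, instead of a get/put per request.
         */
        intel_runtime_pm_get(dev_priv);
        dev_priv->mm.busy = true;

        queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work,
                           round_jiffies_up_relative(HZ));
}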

* Re: [PATCH 29/49] drm/i915: Reduce the pointer dance of i915_is_ggtt()
  2015-03-27 11:02 ` [PATCH 29/49] drm/i915: Reduce the pointer dance of i915_is_ggtt() Chris Wilson
@ 2015-03-27 14:26   ` Daniel Vetter
  0 siblings, 0 replies; 80+ messages in thread
From: Daniel Vetter @ 2015-03-27 14:26 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Mar 27, 2015 at 11:02:01AM +0000, Chris Wilson wrote:
> The multiple levels of indirection do nothing but hinder the compiler, and
> the pointer chasing turns out to be quite painful but painless to fix.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Hm I did a quick git grep when this came up a few days back on irc, and
just removing the checks from the execbuf hotpaths looked like the simpler
approach. We don't need them, since NORMAL_VIEW == 0 is intentional.

There's still some left afterwards ofc, but no check beats even a fast
check ;-)
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_drv.h     | 4 +---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 1 +
>  drivers/gpu/drm/i915/i915_gem_gtt.h | 2 ++
>  3 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index b728250d6550..209c9b612509 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2880,9 +2880,7 @@ bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj);
>  	(&((struct drm_i915_private *)(obj)->base.dev->dev_private)->gtt.base)
>  static inline bool i915_is_ggtt(struct i915_address_space *vm)
>  {
> -	struct i915_address_space *ggtt =
> -		&((struct drm_i915_private *)(vm)->dev->dev_private)->gtt.base;
> -	return vm == ggtt;
> +	return vm->is_ggtt;
>  }
>  
>  static inline struct i915_hw_ppgtt *
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index e33b1214c4d8..68c1f49f2864 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -2504,6 +2504,7 @@ int i915_gem_gtt_init(struct drm_device *dev)
>  		return ret;
>  
>  	gtt->base.dev = dev;
> +	gtt->base.is_ggtt = true;
>  
>  	/* GMADR is the PCI mmio aperture into the global GTT. */
>  	DRM_INFO("Memory usable by graphics device = %zdM\n",
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 3f0ad9f25441..20398a18a8a6 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -234,6 +234,8 @@ struct i915_address_space {
>  	unsigned long start;		/* Start offset always 0 for dri2 */
>  	size_t total;		/* size addr space maps (ex. 2GB for ggtt) */
>  
> +	bool is_ggtt;
> +
>  	struct {
>  		dma_addr_t addr;
>  		struct page *page;
> -- 
> 2.1.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 27/49] drm/i915: Use a separate slab for requests
  2015-03-27 14:20   ` Daniel Vetter
@ 2015-03-27 14:27     ` Chris Wilson
  0 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 14:27 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Fri, Mar 27, 2015 at 03:20:08PM +0100, Daniel Vetter wrote:
> On Fri, Mar 27, 2015 at 11:01:59AM +0000, Chris Wilson wrote:
> > requests are even more frequently allocated than objects and equally
> > benefit from having a dedicated slab.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> vmas, while we're at it?

As a second patch probably. The test cases I was looking at stressed
requests rather than ctx. Though if we get a lightweight full-ppgtt,
there are a few synmark tests that would definitely benefit from a vma slab.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread
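
For reference, the follow-up Daniel is asking about would mirror what this
patch does for requests; a sketch of the corresponding i915_gem_load() hunk,
with the vmas cache being hypothetical and not part of this series:

        /* Hypothetical third cache alongside objects and requests. */
        dev_priv->vmas =
                kmem_cache_create("i915_gem_vma",
                                  sizeof(struct i915_vma), 0,
                                  SLAB_HWCACHE_ALIGN,
                                  NULL);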

* Re: [PATCH 38/49] drm/i915: Skip allocating shadow batch for 0-length batches
  2015-03-27 11:02 ` [PATCH 38/49] drm/i915: Skip allocating shadow batch for 0-length batches Chris Wilson
@ 2015-03-27 14:28   ` Daniel Vetter
  2015-03-30 12:02   ` Chris Wilson
  1 sibling, 0 replies; 80+ messages in thread
From: Daniel Vetter @ 2015-03-27 14:28 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Mar 27, 2015 at 11:02:10AM +0000, Chris Wilson wrote:
> Since
> 
> commit 17cabf571e50677d980e9ab2a43c5f11213003ae
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Wed Jan 14 11:20:57 2015 +0000
> 
>     drm/i915: Trim the command parser allocations
> 
> we may then try to allocate a zero-sized object and attempt to extract
> its pages. Understandably this fails.
> 
> Testcase: igt/gem_exec_nop #ivb,byt,hsw
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Queued for -next, thanks for the patch.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 403450f4e4ee..19c5fc6ae1e0 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1561,7 +1561,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  		goto err;
>  	}
>  
> -	if (i915_needs_cmd_parser(ring)) {
> +	if (i915_needs_cmd_parser(ring) && args->batch_len) {
>  		batch_obj = i915_gem_execbuffer_parse(ring,
>  						      &shadow_exec_entry,
>  						      eb,
> -- 
> 2.1.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 18/49] drm/i915: Reduce frequency of unspecific HSW reg debugging
  2015-03-27 11:01 ` [PATCH 18/49] drm/i915: Reduce frequency of unspecific HSW reg debugging Chris Wilson
@ 2015-03-27 15:34   ` Paulo Zanoni
  2015-03-27 16:12     ` Chris Wilson
  0 siblings, 1 reply; 80+ messages in thread
From: Paulo Zanoni @ 2015-03-27 15:34 UTC (permalink / raw)
  To: Chris Wilson
  Cc: Daniel Vetter, Intel Graphics Development, Paulo Zanoni, Mika Kuoppala

2015-03-27 8:01 GMT-03:00 Chris Wilson <chris@chris-wilson.co.uk>:
> Delay the expensive read on the FPGA_DBG register from once per mmio to
> once per forcewake section when we are doing the general wellbeing
> check rather than the targeted error detection. This reduces the overhead
> of the debug facility (for example when submitting execlists) to almost
> zero whilst keeping the debug checks around.
>
> v2: Enable one-shot mmio debugging from the interrupt check as well as a
>     safeguard to catch invalid display writes from outside the powerwell.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_uncore.c | 56 ++++++++++++++++++++-----------------
>  1 file changed, 30 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index ab5cc94588e1..0e32bbbcada8 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -149,6 +149,30 @@ fw_domains_put(struct drm_i915_private *dev_priv, enum forcewake_domains fw_doma
>  }
>
>  static void
> +hsw_unclaimed_reg_detect(struct drm_i915_private *dev_priv)
> +{
> +       static bool mmio_debug_once = true;
> +
> +       if (i915.mmio_debug || !mmio_debug_once)
> +               return;
> +
> +       if (__raw_i915_read32(dev_priv, FPGA_DBG) & FPGA_DBG_RM_NOCLAIM) {
> +               DRM_DEBUG("Unclaimed register detected, "
> +                         "enabling oneshot unclaimed register reporting. "
> +                         "Please use i915.mmio_debug=N for more information.\n");
> +               __raw_i915_write32(dev_priv, FPGA_DBG, FPGA_DBG_RM_NOCLAIM);
> +               i915.mmio_debug = mmio_debug_once--;
> +       }
> +}
> +
> +static void
> +fw_domains_put_debug(struct drm_i915_private *dev_priv, enum forcewake_domains fw_domains)
> +{
> +       hsw_unclaimed_reg_detect(dev_priv);
> +       fw_domains_put(dev_priv, fw_domains);
> +}

This means we won't check during the forcewake puts that are on the
register read/write macros. Is this intentional? I tried checking the
FW code calls, and it seems to me that we're not really going to run
hsw_unclaimed_reg_detect very frequently anymore. I wonder if there's
the risk of staying a long time without running it. But maybe I'm just
wrong.


> +
> +static void
>  fw_domains_posting_read(struct drm_i915_private *dev_priv)
>  {
>         struct intel_uncore_forcewake_domain *d;
> @@ -561,23 +585,6 @@ hsw_unclaimed_reg_debug(struct drm_i915_private *dev_priv, u32 reg, bool read,
>         }
>  }
>
> -static void
> -hsw_unclaimed_reg_detect(struct drm_i915_private *dev_priv)
> -{
> -       static bool mmio_debug_once = true;
> -
> -       if (i915.mmio_debug || !mmio_debug_once)
> -               return;
> -
> -       if (__raw_i915_read32(dev_priv, FPGA_DBG) & FPGA_DBG_RM_NOCLAIM) {
> -               DRM_DEBUG("Unclaimed register detected, "
> -                         "enabling oneshot unclaimed register reporting. "
> -                         "Please use i915.mmio_debug=N for more information.\n");
> -               __raw_i915_write32(dev_priv, FPGA_DBG, FPGA_DBG_RM_NOCLAIM);
> -               i915.mmio_debug = mmio_debug_once--;
> -       }
> -}
> -
>  #define GEN2_READ_HEADER(x) \
>         u##x val = 0; \
>         assert_device_not_suspended(dev_priv);
> @@ -829,7 +836,6 @@ hsw_write##x(struct drm_i915_private *dev_priv, off_t reg, u##x val, bool trace)
>                 gen6_gt_check_fifodbg(dev_priv); \
>         } \
>         hsw_unclaimed_reg_debug(dev_priv, reg, false, false); \
> -       hsw_unclaimed_reg_detect(dev_priv); \
>         GEN6_WRITE_FOOTER; \
>  }
>
> @@ -871,7 +877,6 @@ gen8_write##x(struct drm_i915_private *dev_priv, off_t reg, u##x val, bool trace
>                 __force_wake_get(dev_priv, FORCEWAKE_RENDER); \
>         __raw_i915_write##x(dev_priv, reg, val); \
>         hsw_unclaimed_reg_debug(dev_priv, reg, false, false); \
> -       hsw_unclaimed_reg_detect(dev_priv); \
>         GEN6_WRITE_FOOTER; \
>  }
>
> @@ -1120,6 +1125,10 @@ static void intel_uncore_fw_domains_init(struct drm_device *dev)
>                                FORCEWAKE, FORCEWAKE_ACK);
>         }
>
> +       if (HAS_FPGA_DBG_UNCLAIMED(dev) &&
> +           dev_priv->uncore.funcs.force_wake_put == fw_domains_put)

My fear here is that simple changes to the FW code by a future
programmer could unintentionally kill the unclaimed register detection
feature, and we probably wouldn't notice for a looong time. Why not
just omit this fw_domains_put check, since it is true for all
platforms where HAS_FPGA_DBG_UNCLAIMED is also true? The side effect of
calling fw_domains_put() when we shouldn't is probably more noticeable
than having unclaimed register detection gone.


> +               dev_priv->uncore.funcs.force_wake_put = fw_domains_put_debug;
> +
>         /* All future platforms are expected to require complex power gating */
>         WARN_ON(dev_priv->uncore.fw_domains == 0);
>  }
> @@ -1411,11 +1420,6 @@ int intel_gpu_reset(struct drm_device *dev)
>
>  void intel_uncore_check_errors(struct drm_device *dev)
>  {
> -       struct drm_i915_private *dev_priv = dev->dev_private;
> -
> -       if (HAS_FPGA_DBG_UNCLAIMED(dev) &&
> -           (__raw_i915_read32(dev_priv, FPGA_DBG) & FPGA_DBG_RM_NOCLAIM)) {
> -               DRM_ERROR("Unclaimed register before interrupt\n");
> -               __raw_i915_write32(dev_priv, FPGA_DBG, FPGA_DBG_RM_NOCLAIM);
> -       }
> +       if (HAS_FPGA_DBG_UNCLAIMED(dev))
> +               hsw_unclaimed_reg_detect(to_i915(dev));

This means we won't check for unclaimed registers during interrupts if
i915.mmio_debug is being used, which is probably not what we want.


One of the things that worries me a little is that now we'll be
running a mostly-display-related feature only when certain pieces of
non-display-related code run. Instead of doing the checks at the
forcewake puts, maybe we could tie the frequency of the checks to
something in the display code, or just do the check every X seconds. I
don't really know what would be ideal here, I'm just throwing the
ideas. I'm also not blocking this patch, just pointing things that
could maybe be improved.

Since it's impacting performance, perhaps we could even completely
kill unclaimed register detection from the normal use case, hiding it
behind i915.mmio_debug=1 (and maybe a kconfig option)? We would then
instruct QA and developers to always have the option enabled. Just
like QA needs to have lockdep enabled, we could ask it to have
mmio_debug enabled all the time too.

>  }
> --
> 2.1.4
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx



-- 
Paulo Zanoni
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread
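
Paulo's "check every X seconds" idea could be read as a self-rearming delayed
work, decoupled from mmio traffic entirely. The sketch below is purely
illustrative: the poll_work member and the 10 second period are invented for
the example, and nothing like this is actually proposed in the thread.

static void unclaimed_reg_poll(struct work_struct *work)
{
        struct drm_i915_private *dev_priv =
                container_of(work, typeof(*dev_priv), uncore.poll_work.work);

        /* Same one-shot FPGA_DBG check as the patch, just on a timer. */
        intel_uncore_check_errors(dev_priv->dev);

        schedule_delayed_work(&dev_priv->uncore.poll_work, 10 * HZ);
}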

* Re: [PATCH 18/49] drm/i915: Reduce frequency of unspecific HSW reg debugging
  2015-03-27 15:34   ` Paulo Zanoni
@ 2015-03-27 16:12     ` Chris Wilson
  2015-03-30 19:15       ` Paulo Zanoni
  0 siblings, 1 reply; 80+ messages in thread
From: Chris Wilson @ 2015-03-27 16:12 UTC (permalink / raw)
  To: Paulo Zanoni
  Cc: Daniel Vetter, Intel Graphics Development, Paulo Zanoni, Mika Kuoppala

On Fri, Mar 27, 2015 at 12:34:05PM -0300, Paulo Zanoni wrote:
> 2015-03-27 8:01 GMT-03:00 Chris Wilson <chris@chris-wilson.co.uk>:
> > Delay the expensive read on the FPGA_DBG register from once per mmio to
> > once per forcewake section when we are doing the general wellbeing
> > check rather than the targeted error detection. This reduces the overhead
> > of the debug facility (for example when submitting execlists) to almost
> > zero whilst keeping the debug checks around.
> >
> > v2: Enable one-shot mmio debugging from the interrupt check as well as a
> >     safeguard to catch invalid display writes from outside the powerwell.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_uncore.c | 56 ++++++++++++++++++++-----------------
> >  1 file changed, 30 insertions(+), 26 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> > index ab5cc94588e1..0e32bbbcada8 100644
> > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > @@ -149,6 +149,30 @@ fw_domains_put(struct drm_i915_private *dev_priv, enum forcewake_domains fw_doma
> >  }
> >
> >  static void
> > +hsw_unclaimed_reg_detect(struct drm_i915_private *dev_priv)
> > +{
> > +       static bool mmio_debug_once = true;
> > +
> > +       if (i915.mmio_debug || !mmio_debug_once)
> > +               return;
> > +
> > +       if (__raw_i915_read32(dev_priv, FPGA_DBG) & FPGA_DBG_RM_NOCLAIM) {
> > +               DRM_DEBUG("Unclaimed register detected, "
> > +                         "enabling oneshot unclaimed register reporting. "
> > +                         "Please use i915.mmio_debug=N for more information.\n");
> > +               __raw_i915_write32(dev_priv, FPGA_DBG, FPGA_DBG_RM_NOCLAIM);
> > +               i915.mmio_debug = mmio_debug_once--;
> > +       }
> > +}
> > +
> > +static void
> > +fw_domains_put_debug(struct drm_i915_private *dev_priv, enum forcewake_domains fw_domains)
> > +{
> > +       hsw_unclaimed_reg_detect(dev_priv);
> > +       fw_domains_put(dev_priv, fw_domains);
> > +}
> 
> This means we won't check during the forcewake puts that are on the
> register read/write macros. Is this intentional?

Not really. But the check still catches any mistakes there even though
we know they are safe.

> I tried checking the
> FW code calls, and it seems to me that we're not really going to run
> hsw_unclaimed_reg_detect very frequently anymore. I wonder if there's
> the risk of staying a long time without running it. But maybe I'm just
> wrong.

It gets run after every set of register writes (where set is defined as
activity on a single cpu within 1ms). It gets run before the powerwell
is disabled. Look at the profiles, you will see that hsw detect is still
called quite frequently. And by its nature it does not need to be run very
often to catch issues anyway.

> > +       if (HAS_FPGA_DBG_UNCLAIMED(dev) &&
> > +           dev_priv->uncore.funcs.force_wake_put == fw_domains_put)
> 
> My fear here is that simple changes to the FW code by a future
> programmer could unintentionally kill the unclaimed register detection
> feature, and we probably wouldn't notice for a looong time. Why not
> just omit this fw_domains_put check, since it is true for all
> platforms where HAS_FPGA_DBG_UNCLAIMED is also true? The side effect of
> calling fw_domains_put() when we shouldn't is probably more noticeable
> than having unclaimed register detection gone.

Pardon?

> > +               dev_priv->uncore.funcs.force_wake_put = fw_domains_put_debug;
> > +
> >         /* All future platforms are expected to require complex power gating */
> >         WARN_ON(dev_priv->uncore.fw_domains == 0);
> >  }
> > @@ -1411,11 +1420,6 @@ int intel_gpu_reset(struct drm_device *dev)
> >
> >  void intel_uncore_check_errors(struct drm_device *dev)
> >  {
> > -       struct drm_i915_private *dev_priv = dev->dev_private;
> > -
> > -       if (HAS_FPGA_DBG_UNCLAIMED(dev) &&
> > -           (__raw_i915_read32(dev_priv, FPGA_DBG) & FPGA_DBG_RM_NOCLAIM)) {
> > -               DRM_ERROR("Unclaimed register before interrupt\n");
> > -               __raw_i915_write32(dev_priv, FPGA_DBG, FPGA_DBG_RM_NOCLAIM);
> > -       }
> > +       if (HAS_FPGA_DBG_UNCLAIMED(dev))
> > +               hsw_unclaimed_reg_detect(to_i915(dev));
> 
> This means we won't check for unclaimed registers during interrupts if
> i915.mmio_debug is being used, which is probably not what we want.

It's exactly what you want. It still does the debug check if you have
mmio_debug unset. If you have mmio_debug set, it means you are debugging
i915 register mmio, since that is all we can reliably debug.
 
> One of the things that worries me a little is that now we'll be
> running a mostly-display-related feature only when certain pieces of
> non-display-related code run. Instead of doing the checks at the
> forcewake puts, maybe we could tie the frequency of the checks to
> something in the display code, or just do the check every X seconds. I
> don't really know what would be ideal here, I'm just throwing the
> ideas. I'm also not blocking this patch, just pointing things that
> could maybe be improved.

Sure, all you would need to do is add the check to every rpm_put() if you
feel paranoid (it will be run before the powerwell is dropped by
design).

> Since it's impacting performance, perhaps we could even completely
> kill unclaimed register detection from the normal use case, hiding it
> behind i915.mmio_debug=1 (and maybe a kconfig option)? We would then
> instruct QA and developers to always have the option enabled. Just
> like QA needs to have lockdep enabled, we could ask it to have
> mmio_debug enabled all the time too.

Whilst I like the idea, having debug code running in the wild (at a
frequency high enough to catch bugs, but low enough not to be noticed)
is invaluable.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread
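
Chris's closing suggestion, running the check from every rpm_put() so it still
fires before the power well drops, would amount to something like this. The
placement is a sketch, not part of the posted series, and the usual
HAS_RUNTIME_PM()/refcount assertions of the real function are omitted for
brevity.

void intel_runtime_pm_put(struct drm_i915_private *dev_priv)
{
        struct device *device = &dev_priv->dev->pdev->dev;

        /* One-shot unclaimed register check before we allow the device
         * (and its power wells) to suspend.
         */
        intel_uncore_check_errors(dev_priv->dev);

        pm_runtime_mark_last_busy(device);
        pm_runtime_put_autosuspend(device);
}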

* Re: [PATCH 04/49] drm/i915: Add i915_gem_request_unreference__unlocked
  2015-03-27 11:01 ` [PATCH 04/49] drm/i915: Add i915_gem_request_unreference__unlocked Chris Wilson
@ 2015-03-27 16:42   ` Tvrtko Ursulin
  0 siblings, 0 replies; 80+ messages in thread
From: Tvrtko Ursulin @ 2015-03-27 16:42 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


Hi,

I've noticed this one is a pre-requisite for read-read optimisation...

On 03/27/2015 11:01 AM, Chris Wilson wrote:
> We were missing a convenience stub to acquire the right mutex whilst
> dropping the request, so add it.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_drv.h | 13 +++++++++++++
>   drivers/gpu/drm/i915/i915_gem.c |  8 ++------
>   2 files changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index c80e2e5e591a..fa91ca33d07c 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2149,6 +2149,19 @@ i915_gem_request_unreference(struct drm_i915_gem_request *req)
>   	kref_put(&req->ref, i915_gem_request_free);
>   }
>
> +static inline void
> +i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
> +{
> +	if (req && !atomic_add_unless(&req->ref.refcount, -1, 1)) {
> +		struct drm_device *dev = req->ring->dev;
> +
> +		mutex_lock(&dev->struct_mutex);
> +		if (likely(atomic_dec_and_test(&req->ref.refcount)))
> +			i915_gem_request_free(&req->ref);
> +		mutex_unlock(&dev->struct_mutex);
> +	}
> +}
> +
>   static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>   					   struct drm_i915_gem_request *src)
>   {
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 476687a9d067..a46372ebb3bc 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2870,9 +2870,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>   	ret = __i915_wait_request(req, reset_counter, true,
>   				  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
>   				  file->driver_priv);
> -	mutex_lock(&dev->struct_mutex);
> -	i915_gem_request_unreference(req);
> -	mutex_unlock(&dev->struct_mutex);
> +	i915_gem_request_unreference__unlocked(req);
>   	return ret;
>
>   out:
> @@ -4104,9 +4102,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
>   	if (ret == 0)
>   		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
>
> -	mutex_lock(&dev->struct_mutex);
> -	i915_gem_request_unreference(target);
> -	mutex_unlock(&dev->struct_mutex);
> +	i915_gem_request_unreference__unlocked(target);
>
>   	return ret;
>   }
>

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 49/49] drm/i915: Cache last cmd descriptor when parsing
  2015-03-27 11:02 ` [PATCH 49/49] drm/i915: Cache last cmd descriptor when parsing Chris Wilson
@ 2015-03-28  6:21   ` shuang.he
  0 siblings, 0 replies; 80+ messages in thread
From: shuang.he @ 2015-03-28  6:21 UTC (permalink / raw)
  To: shuang.he, ethan.gao, intel-gfx, chris

Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 6074
-------------------------------------Summary-------------------------------------
Platform          Delta          drm-intel-nightly          Series Applied
PNV                 -4              270/270              266/270
ILK                 -1              303/303              302/303
SNB                                  304/304              304/304
IVB                 -26              337/337              311/337
BYT                 -21              287/287              266/287
HSW                 -28              361/361              333/361
BDW                 -5              309/309              304/309
-------------------------------------Detailed-------------------------------------
Platform  Test                                drm-intel-nightly          Series Applied
*PNV  igt@gem_fence_thrash@bo-write-verify-threaded-none      PASS(1)      FAIL(1)PASS(1)
*PNV  igt@gem_fence_thrash@bo-write-verify-x      PASS(1)      FAIL(1)PASS(1)
*PNV  igt@gem_fence_thrash@bo-write-verify-y      PASS(1)      FAIL(1)PASS(1)
*PNV  igt@gem_tiled_pread_pwrite      PASS(1)      FAIL(1)PASS(1)
*ILK  igt@kms_flip@blocking-absolute-wf_vblank-interruptible      PASS(1)      DMESG_WARN(1)PASS(1)
*IVB  igt@gem_reloc_overflow@single-overflow      PASS(1)      FAIL(2)
*IVB  igt@gem_exec_bad_domains@conflicting-write-domain      PASS(1)      FAIL(2)
*IVB  igt@drm_import_export@prime      PASS(1)      DMESG_WARN(2)
(dmesg patch applied)WARNING:at_drivers/gpu/drm/i915/i915_cmd_parser.c:#i915_parse_cmds[i915]()@WARNING:.* at .* i915_parse_cmds+0x
*IVB  igt@gem_reloc_overflow@batch-end-unaligned      PASS(1)      FAIL(2)
*IVB  igt@gem_reloc_overflow@invalid-address      PASS(1)      FAIL(2)
*IVB  igt@prime_self_import@reimport-vs-gem_close-race      PASS(1)      FAIL(2)
*IVB  igt@gem_write_read_ring_switch@blt2bsd-interruptible      PASS(1)      FAIL(2)
*IVB  igt@gem_write_read_ring_switch@blt2bsd      PASS(1)      FAIL(2)
*IVB  igt@gem_reloc_overflow@batch-start-unaligned      PASS(1)      FAIL(2)
*IVB  igt@gem_reloc_overflow@source-offset-end-reloc-cpu      PASS(1)      FAIL(2)
*IVB  igt@gem_write_read_ring_switch@blt2render      PASS(1)      FAIL(2)
*IVB  igt@gem_reloc_overflow@buffercount-overflow      PASS(1)      FAIL(2)
*IVB  igt@gem_reloc_overflow@source-offset-big-reloc-gtt      PASS(1)      FAIL(2)
*IVB  igt@gem_reloc_overflow@source-offset-negative-reloc-gtt      PASS(1)      FAIL(2)
*IVB  igt@gem_reloc_overflow@source-offset-unaligned-reloc-cpu      PASS(1)      FAIL(2)
*IVB  igt@drm_import_export@flink      PASS(1)      DMESG_WARN(1)
(dmesg patch applied)WARNING:at_drivers/gpu/drm/i915/i915_cmd_parser.c:#i915_parse_cmds[i915]()@WARNING:.* at .* i915_parse_cmds+0x
*IVB  igt@gem_ctx_exec@basic      PASS(1)      FAIL(1)
*IVB  igt@gem_reloc_overflow@wrapped-overflow      PASS(1)      FAIL(1)
*IVB  igt@gem_reloc_overflow@source-offset-negative-reloc-cpu      PASS(1)      FAIL(1)
*IVB  igt@gem_reloc_overflow@source-offset-unaligned-reloc-gtt      PASS(1)      FAIL(1)
*IVB  igt@gem_reloc_overflow@source-offset-end-reloc-gtt      PASS(1)      FAIL(1)
*IVB  igt@gem_exec_parse@oacontrol-tracking      PASS(1)      FAIL(1)
*IVB  igt@gem_reloc_overflow@source-offset-big-reloc-cpu      PASS(1)      FAIL(1)
*IVB  igt@gem_exec_parse@cmd-crossing-page      PASS(1)      FAIL(1)
*IVB  igt@gem_write_read_ring_switch@blt2render-interruptible      PASS(1)      FAIL(1)
*IVB  igt@gem_flink_race@flink_close      PASS(1)      FAIL(1)
*BYT  igt@gem_reloc_overflow@single-overflow      PASS(1)      FAIL(2)
*BYT  igt@gem_exec_bad_domains@conflicting-write-domain      PASS(1)      FAIL(2)
*BYT  igt@gem_reloc_overflow@batch-end-unaligned      PASS(1)      FAIL(2)
*BYT  igt@gem_reloc_overflow@invalid-address      PASS(1)      FAIL(2)
*BYT  igt@prime_self_import@reimport-vs-gem_close-race      PASS(1)      FAIL(2)
*BYT  igt@gem_reloc_overflow@batch-start-unaligned      PASS(1)      FAIL(2)
*BYT  igt@gem_reloc_overflow@source-offset-end-reloc-cpu      PASS(1)      FAIL(2)
*BYT  igt@gem_reloc_overflow@buffercount-overflow      PASS(1)      FAIL(2)
*BYT  igt@gem_reloc_overflow@source-offset-big-reloc-gtt      PASS(1)      FAIL(2)
*BYT  igt@gem_reloc_overflow@source-offset-negative-reloc-gtt      PASS(1)      FAIL(2)
*BYT  igt@gem_reloc_overflow@source-offset-unaligned-reloc-cpu      PASS(1)      FAIL(2)
*BYT  igt@gem_ctx_exec@basic      PASS(1)      FAIL(2)
*BYT  igt@gem_reloc_overflow@wrapped-overflow      PASS(1)      FAIL(2)
*BYT  igt@gem_reloc_overflow@source-offset-negative-reloc-cpu      PASS(1)      FAIL(2)
*BYT  igt@gem_reloc_overflow@source-offset-unaligned-reloc-gtt      PASS(1)      FAIL(2)
*BYT  igt@gem_reloc_overflow@source-offset-end-reloc-gtt      PASS(1)      FAIL(2)
*BYT  igt@gem_exec_parse@oacontrol-tracking      PASS(1)      FAIL(2)
*BYT  igt@gem_reloc_overflow@source-offset-big-reloc-cpu      PASS(1)      FAIL(2)
*BYT  igt@gem_exec_parse@cmd-crossing-page      PASS(1)      FAIL(2)
*BYT  igt@gem_seqno_wrap      PASS(1)      FAIL(2)
*BYT  igt@gem_flink_race@flink_close      PASS(1)      FAIL(2)
*HSW  igt@gem_reloc_overflow@single-overflow      PASS(1)      FAIL(2)
*HSW  igt@gem_exec_bad_domains@conflicting-write-domain      PASS(1)      FAIL(2)
*HSW  igt@drm_import_export@prime      PASS(1)      DMESG_WARN(2)
(dmesg patch applied)WARNING:at_drivers/gpu/drm/i915/i915_cmd_parser.c:#i915_parse_cmds[i915]()@WARNING:.* at .* i915_parse_cmds+0x
*HSW  igt@gem_reloc_overflow@batch-end-unaligned      PASS(1)      FAIL(2)
*HSW  igt@gem_reloc_overflow@invalid-address      PASS(1)      FAIL(2)
*HSW  igt@prime_self_import@reimport-vs-gem_close-race      PASS(1)      FAIL(2)
*HSW  igt@gem_write_read_ring_switch@blt2bsd-interruptible      PASS(1)      FAIL(2)
*HSW  igt@gem_write_read_ring_switch@blt2bsd      PASS(1)      FAIL(2)
*HSW  igt@gem_reloc_overflow@batch-start-unaligned      PASS(1)      FAIL(2)
*HSW  igt@gem_reloc_overflow@source-offset-end-reloc-cpu      PASS(1)      FAIL(2)
*HSW  igt@gem_write_read_ring_switch@blt2render      PASS(1)      FAIL(2)
*HSW  igt@gem_reloc_overflow@buffercount-overflow      PASS(1)      FAIL(2)
*HSW  igt@gem_reloc_overflow@source-offset-big-reloc-gtt      PASS(1)      FAIL(2)
*HSW  igt@gem_reloc_overflow@source-offset-negative-reloc-gtt      PASS(1)      FAIL(2)
*HSW  igt@gem_reloc_overflow@source-offset-unaligned-reloc-cpu      PASS(1)      FAIL(2)
*HSW  igt@gem_write_read_ring_switch@blt2vebox-interruptible      PASS(1)      FAIL(2)
*HSW  igt@drm_import_export@flink      PASS(1)      DMESG_WARN(2)
(dmesg patch applied)WARNING:at_drivers/gpu/drm/i915/i915_cmd_parser.c:#i915_parse_cmds[i915]()@WARNING:.* at .* i915_parse_cmds+0x
*HSW  igt@gem_ctx_exec@basic      PASS(1)      FAIL(2)
*HSW  igt@gem_reloc_overflow@wrapped-overflow      PASS(1)      FAIL(2)
*HSW  igt@gem_reloc_overflow@source-offset-negative-reloc-cpu      PASS(1)      FAIL(2)
*HSW  igt@gem_reloc_overflow@source-offset-unaligned-reloc-gtt      PASS(1)      FAIL(2)
*HSW  igt@gem_reloc_overflow@source-offset-end-reloc-gtt      PASS(1)      FAIL(2)
*HSW  igt@gem_exec_parse@oacontrol-tracking      PASS(1)      FAIL(2)
*HSW  igt@gem_reloc_overflow@source-offset-big-reloc-cpu      PASS(1)      FAIL(2)
*HSW  igt@gem_exec_parse@cmd-crossing-page      PASS(1)      FAIL(2)
*HSW  igt@gem_write_read_ring_switch@blt2render-interruptible      PASS(1)      FAIL(2)
*HSW  igt@gem_flink_race@flink_close      PASS(1)      FAIL(2)
*HSW  igt@gem_write_read_ring_switch@blt2vebox      PASS(1)      FAIL(2)
*BDW  igt@gem_evict_everything@forked-multifd-interruptible      PASS(1)      DMESG_FAIL(1)PASS(1)
(dmesg patch applied)Out_of_memory:Kill_process@Out of memory: Kill process
page_allocation_failure:order:#,mode@page allocation failure:
*BDW  igt@gem_evict_everything@forked-normal      PASS(1)      NSPT(1)PASS(1)
*BDW  igt@gem_evict_everything@forked-interruptible      PASS(1)      NSPT(1)PASS(1)
*BDW  igt@gem_userptr_blits@forked-sync-multifd-normal      PASS(1)      NSPT(1)PASS(1)
*BDW  igt@gem_write_read_ring_switch@blt2vebox      PASS(1)      NO_RESULT(1)PASS(1)
Note: You need to pay more attention to lines starting with '*'
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 38/49] drm/i915: Skip allocating shadow batch for 0-length batches
  2015-03-27 11:02 ` [PATCH 38/49] drm/i915: Skip allocating shadow batch for 0-length batches Chris Wilson
  2015-03-27 14:28   ` Daniel Vetter
@ 2015-03-30 12:02   ` Chris Wilson
  2015-03-30 14:59     ` Daniel Vetter
  1 sibling, 1 reply; 80+ messages in thread
From: Chris Wilson @ 2015-03-30 12:02 UTC (permalink / raw)
  To: intel-gfx; +Cc: Jani Nikula

On Fri, Mar 27, 2015 at 11:02:10AM +0000, Chris Wilson wrote:
> Since
> 
> commit 17cabf571e50677d980e9ab2a43c5f11213003ae
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Wed Jan 14 11:20:57 2015 +0000
> 
>     drm/i915: Trim the command parser allocations
> 
> we may then try to allocate a zero-sized object and attempt to extract
> its pages. Understandably this fails.

The original failure was in

commit b9ffd80ed659c559152c042e74741f4f60cac691
Author: Brad Volkin <bradley.d.volkin@intel.com>
Date:   Thu Dec 11 12:13:10 2014 -0800

    drm/i915: Use batch length instead of object size in command parser

merged in the v4.0 cycle.
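
For reference, the fix is essentially an early-out in execbuffer. A rough
sketch of the idea (not the literal hunk from the patch, and the exact
condition is from memory):

    /* A zero-length batch has nothing for the command parser to copy,
     * so skip the shadow batch entirely instead of allocating (and then
     * calling get_pages on) a zero-sized object.
     */
    if (i915_needs_cmd_parser(ring) && args->batch_len) {
            /* allocate and parse the shadow batch as before */
    }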

Jani, pretty please?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 17/49] drm/i915: Implement inter-engine read-read optimisations
  2015-03-27 11:01 ` [PATCH 17/49] drm/i915: Implement inter-engine read-read optimisations Chris Wilson
@ 2015-03-30 13:52   ` Tvrtko Ursulin
  2015-03-30 14:09     ` Chris Wilson
  0 siblings, 1 reply; 80+ messages in thread
From: Tvrtko Ursulin @ 2015-03-30 13:52 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Lionel Landwerlin


Hi,

On 03/27/2015 11:01 AM, Chris Wilson wrote:
> Currently, we only track the last request globally across all engines.
> This prevents us from issuing concurrent read requests on e.g. the RCS
> and BCS engines (or more likely the render and media engines). Without
> semaphores, we incur costly stalls as we synchronise between rings -
> greatly impacting the current performance of Broadwell versus Haswell in
> certain workloads (like video decode). With the introduction of
> reference counted requests, it is much easier to track the last request
> per ring, as well as the last global write request so that we can
> optimise inter-engine read-read requests (as well as better optimise
> certain CPU waits).
>
> v2: Fix inverted readonly condition for nonblocking waits.
> v3: Handle non-contiguous engine array after waits
> v4: Rebase, tidy, rewrite ring list debugging
> v5: Use obj->active as a bitfield, it looks cool
> v6: Micro-optimise, mostly involving moving code around
>
> Benchmark: igt/gem_read_read_speed
> hsw:gt3e (with semaphores):
> Before: Time to read-read 1024k:		275.794µs
> After:  Time to read-read 1024k:		123.260µs
>
> hsw:gt3e (w/o semaphores):
> Before: Time to read-read 1024k:		230.433µs
> After:  Time to read-read 1024k:		124.593µs
>
> bdw-u (w/o semaphores):             Before          After
> Time to read-read 1x1:            26.274µs       10.350µs
> Time to read-read 128x128:        40.097µs       21.366µs
> Time to read-read 256x256:        77.087µs       42.608µs
> Time to read-read 512x512:       281.999µs      181.155µs
> Time to read-read 1024x1024:    1196.141µs     1118.223µs
> Time to read-read 2048x2048:    5639.072µs     5225.837µs
> Time to read-read 4096x4096:   22401.662µs    21137.067µs
> Time to read-read 8192x8192:   89617.735µs    85637.681µs
>
> Testcase: igt/gem_concurrent_blit (read-read and friends)
> Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c     |  16 +-
>   drivers/gpu/drm/i915/i915_drv.h         |  19 +-
>   drivers/gpu/drm/i915/i915_gem.c         | 516 +++++++++++++++++++-------------
>   drivers/gpu/drm/i915/i915_gem_context.c |   2 -
>   drivers/gpu/drm/i915/i915_gem_debug.c   |  92 ++----
>   drivers/gpu/drm/i915/i915_gpu_error.c   |  19 +-
>   drivers/gpu/drm/i915/intel_display.c    |   4 +-
>   drivers/gpu/drm/i915/intel_lrc.c        |  15 +-
>   drivers/gpu/drm/i915/intel_ringbuffer.c |  10 +-
>   9 files changed, 386 insertions(+), 307 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 7ef6295438e9..5cea9a9c1cb9 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -120,10 +120,13 @@ static inline const char *get_global_flag(struct drm_i915_gem_object *obj)
>   static void
>   describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>   {
> +	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
> +	struct intel_engine_cs *ring;
>   	struct i915_vma *vma;
>   	int pin_count = 0;
> +	int i;
>
> -	seq_printf(m, "%pK: %s%s%s%s %8zdKiB %02x %02x %x %x %x%s%s%s",
> +	seq_printf(m, "%pK: %s%s%s%s %8zdKiB %02x %02x [ ",
>   		   &obj->base,
>   		   obj->active ? "*" : " ",
>   		   get_pin_flag(obj),
> @@ -131,8 +134,11 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>   		   get_global_flag(obj),
>   		   obj->base.size / 1024,
>   		   obj->base.read_domains,
> -		   obj->base.write_domain,
> -		   i915_gem_request_get_seqno(obj->last_read_req),
> +		   obj->base.write_domain);
> +	for_each_ring(ring, dev_priv, i)
> +		seq_printf(m, "%x ",
> +				i915_gem_request_get_seqno(obj->last_read_req[i]));
> +	seq_printf(m, "] %x %x%s%s%s",
>   		   i915_gem_request_get_seqno(obj->last_write_req),
>   		   i915_gem_request_get_seqno(obj->last_fenced_req),
>   		   i915_cache_level_str(to_i915(obj->base.dev), obj->cache_level),
> @@ -169,9 +175,9 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>   		*t = '\0';
>   		seq_printf(m, " (%s mappable)", s);
>   	}
> -	if (obj->last_read_req != NULL)
> +	if (obj->last_write_req != NULL)
>   		seq_printf(m, " (%s)",
> -			   i915_gem_request_get_ring(obj->last_read_req)->name);
> +			   i915_gem_request_get_ring(obj->last_write_req)->name);
>   	if (obj->frontbuffer_bits)
>   		seq_printf(m, " (frontbuffer: 0x%03x)", obj->frontbuffer_bits);
>   }
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 896aae1c10ac..7cf5d1b0a749 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -500,7 +500,7 @@ struct drm_i915_error_state {
>   	struct drm_i915_error_buffer {
>   		u32 size;
>   		u32 name;
> -		u32 rseqno, wseqno;
> +		u32 rseqno[I915_NUM_RINGS], wseqno;
>   		u32 gtt_offset;
>   		u32 read_domains;
>   		u32 write_domain;
> @@ -1905,7 +1905,7 @@ struct drm_i915_gem_object {
>   	struct drm_mm_node *stolen;
>   	struct list_head global_list;
>
> -	struct list_head ring_list;
> +	struct list_head ring_list[I915_NUM_RINGS];
>   	/** Used in execbuf to temporarily hold a ref */
>   	struct list_head obj_exec_link;
>
> @@ -1916,7 +1916,7 @@ struct drm_i915_gem_object {
>   	 * rendering and so a non-zero seqno), and is not set if it is on
>   	 * inactive (ready to be unbound) list.
>   	 */
> -	unsigned int active:1;
> +	unsigned int active:I915_NUM_RINGS;
>
>   	/**
>   	 * This is set if the object has been written to since last bound
> @@ -1987,8 +1987,17 @@ struct drm_i915_gem_object {
>   	void *dma_buf_vmapping;
>   	int vmapping_count;
>
> -	/** Breadcrumb of last rendering to the buffer. */
> -	struct drm_i915_gem_request *last_read_req;
> +	/** Breadcrumb of last rendering to the buffer.
> +	 * There can only be one writer, but we allow for multiple readers.
> +	 * If there is a writer that necessarily implies that all other
> +	 * read requests are complete - but we may only be lazily clearing
> +	 * the read requests. A read request is naturally the most recent
> +	 * request on a ring, so we may have two different write and read
> +	 * requests on one ring where the write request is older than the
> +	 * read request. This allows for the CPU to read from an active
> +	 * buffer by only waiting for the write to complete.
> +	 * */
> +	struct drm_i915_gem_request *last_read_req[I915_NUM_RINGS];
>   	struct drm_i915_gem_request *last_write_req;
>   	/** Breadcrumb of last fenced GPU access to the buffer. */
>   	struct drm_i915_gem_request *last_fenced_req;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index dc3eafe7f7d4..7e6f2560bf35 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -38,14 +38,17 @@
>   #include <linux/pci.h>
>   #include <linux/dma-buf.h>
>
> +#define RQ_BUG_ON(expr)
> +
>   static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
>   static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
> +static void
> +i915_gem_object_retire__write(struct drm_i915_gem_object *obj);
> +static void
> +i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring);
>   static __must_check int
>   i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
>   			       bool readonly);
> -static void
> -i915_gem_object_retire(struct drm_i915_gem_object *obj);
> -
>   static void i915_gem_write_fence(struct drm_device *dev, int reg,
>   				 struct drm_i915_gem_object *obj);
>   static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
> @@ -518,8 +521,6 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
>   		ret = i915_gem_object_wait_rendering(obj, true);
>   		if (ret)
>   			return ret;
> -
> -		i915_gem_object_retire(obj);
>   	}
>
>   	ret = i915_gem_object_get_pages(obj);
> @@ -939,8 +940,6 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
>   		ret = i915_gem_object_wait_rendering(obj, false);
>   		if (ret)
>   			return ret;
> -
> -		i915_gem_object_retire(obj);
>   	}
>   	/* Same trick applies to invalidate partially written cachelines read
>   	 * before writing. */
> @@ -1239,6 +1238,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>
>   	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
>
> +	if (list_empty(&req->list))
> +		return 0;
> +
>   	if (i915_gem_request_completed(req, true))
>   		return 0;
>
> @@ -1338,6 +1340,56 @@ out:
>   	return ret;
>   }
>
> +static inline void
> +i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
> +{
> +	struct drm_i915_file_private *file_priv = request->file_priv;
> +
> +	if (!file_priv)
> +		return;
> +
> +	spin_lock(&file_priv->mm.lock);
> +	list_del(&request->client_list);
> +	request->file_priv = NULL;
> +	spin_unlock(&file_priv->mm.lock);
> +}
> +
> +static void i915_gem_request_retire(struct drm_i915_gem_request *request)
> +{
> +	trace_i915_gem_request_retire(request);
> +
> +	list_del_init(&request->list);
> +	i915_gem_request_remove_from_client(request);
> +
> +	put_pid(request->pid);
> +
> +	i915_gem_request_unreference(request);
> +}
> +
> +static void
> +__i915_gem_request_retire__upto(struct drm_i915_gem_request *rq)

It is a bit annoying (for readability) that it can be rq, req and request.

> +{
> +	struct intel_engine_cs *engine = rq->ring;
> +
> +	lockdep_assert_held(&engine->dev->struct_mutex);
> +
> +	if (list_empty(&rq->list))
> +		return;
> +
> +	rq->ringbuf->last_retired_head = rq->postfix;
> +
> +	do {
> +		struct drm_i915_gem_request *prev =
> +			list_entry(rq->list.prev, typeof(*rq), list);
> +
> +		i915_gem_request_retire(rq);
> +
> +		rq = prev;
> +	} while (&rq->list != &engine->request_list);
> +
> +	WARN_ON(i915_verify_lists(engine->dev));
> +}
> +
>   /**
>    * Waits for a request to be signaled, and cleans up the
>    * request and object lists appropriately for that event.
> @@ -1348,7 +1400,6 @@ i915_wait_request(struct drm_i915_gem_request *req)
>   	struct drm_device *dev;
>   	struct drm_i915_private *dev_priv;
>   	bool interruptible;
> -	unsigned reset_counter;
>   	int ret;
>
>   	BUG_ON(req == NULL);
> @@ -1367,29 +1418,13 @@ i915_wait_request(struct drm_i915_gem_request *req)
>   	if (ret)
>   		return ret;
>
> -	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
> -	i915_gem_request_reference(req);
> -	ret = __i915_wait_request(req, reset_counter,
> +	ret = __i915_wait_request(req,
> +				  atomic_read(&dev_priv->gpu_error.reset_counter),
>   				  interruptible, NULL, NULL);
> -	i915_gem_request_unreference(req);
> -	return ret;
> -}
> -
> -static int
> -i915_gem_object_wait_rendering__tail(struct drm_i915_gem_object *obj)
> -{
> -	if (!obj->active)
> -		return 0;
> -
> -	/* Manually manage the write flush as we may have not yet
> -	 * retired the buffer.
> -	 *
> -	 * Note that the last_write_req is always the earlier of
> -	 * the two (read/write) requests, so if we haved successfully waited,
> -	 * we know we have passed the last write.
> -	 */
> -	i915_gem_request_assign(&obj->last_write_req, NULL);
> +	if (ret)
> +		return ret;
>
> +	__i915_gem_request_retire__upto(req);
>   	return 0;
>   }
>
> @@ -1401,18 +1436,38 @@ static __must_check int
>   i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
>   			       bool readonly)
>   {
> -	struct drm_i915_gem_request *req;
> -	int ret;
> +	int ret, i;
>
> -	req = readonly ? obj->last_write_req : obj->last_read_req;
> -	if (!req)
> +	if (!obj->active)
>   		return 0;
>
> -	ret = i915_wait_request(req);
> -	if (ret)
> -		return ret;
> +	if (readonly) {
> +		if (obj->last_write_req != NULL) {
> +			ret = i915_wait_request(obj->last_write_req);
> +			if (ret)
> +				return ret;
>
> -	return i915_gem_object_wait_rendering__tail(obj);
> +			i = obj->last_write_req->ring->id;
> +			if (obj->last_read_req[i] == obj->last_write_req)
> +				i915_gem_object_retire__read(obj, i);
> +			else
> +				i915_gem_object_retire__write(obj);
> +		}
> +	} else {
> +		for (i = 0; i < I915_NUM_RINGS; i++) {
> +			if (obj->last_read_req[i] == NULL)
> +				continue;
> +
> +			ret = i915_wait_request(obj->last_read_req[i]);
> +			if (ret)
> +				return ret;
> +
> +			i915_gem_object_retire__read(obj, i);
> +		}
> +		RQ_BUG_ON(obj->active);
> +	}
> +
> +	return 0;
>   }
>
>   /* A nonblocking variant of the above wait. This is a highly dangerous routine
> @@ -1423,37 +1478,72 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
>   					    struct drm_i915_file_private *file_priv,
>   					    bool readonly)
>   {
> -	struct drm_i915_gem_request *req;
>   	struct drm_device *dev = obj->base.dev;
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_i915_gem_request *requests[I915_NUM_RINGS];
>   	unsigned reset_counter;
> -	int ret;
> +	int ret, i, n = 0;
>
>   	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
>   	BUG_ON(!dev_priv->mm.interruptible);
>
> -	req = readonly ? obj->last_write_req : obj->last_read_req;
> -	if (!req)
> +	if (!obj->active)
>   		return 0;
>
>   	ret = i915_gem_check_wedge(&dev_priv->gpu_error, true);
>   	if (ret)
>   		return ret;
>
> -	ret = i915_gem_check_olr(req);
> -	if (ret)
> -		return ret;
> -
>   	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
> -	i915_gem_request_reference(req);
> +
> +	if (readonly) {
> +		struct drm_i915_gem_request *rq;
> +
> +		rq = obj->last_write_req;
> +		if (rq == NULL)
> +			return 0;
> +
> +		ret = i915_gem_check_olr(rq);
> +		if (ret)
> +			goto err;
> +
> +		requests[n++] = i915_gem_request_reference(rq);
> +	} else {
> +		for (i = 0; i < I915_NUM_RINGS; i++) {
> +			struct drm_i915_gem_request *rq;
> +
> +			rq = obj->last_read_req[i];
> +			if (rq == NULL)
> +				continue;
> +
> +			ret = i915_gem_check_olr(rq);
> +			if (ret)
> +				goto err;
> +
> +			requests[n++] = i915_gem_request_reference(rq);
> +		}
> +	}
> +
>   	mutex_unlock(&dev->struct_mutex);
> -	ret = __i915_wait_request(req, reset_counter, true, NULL, file_priv);
> +	for (i = 0; ret == 0 && i < n; i++)
> +		ret = __i915_wait_request(requests[i], reset_counter, true,
> +					  NULL, file_priv);
>   	mutex_lock(&dev->struct_mutex);
> -	i915_gem_request_unreference(req);
> -	if (ret)
> -		return ret;
>
> -	return i915_gem_object_wait_rendering__tail(obj);
> +err:
> +	for (i = 0; i < n; i++) {
> +		if (ret == 0) {
> +			int ring = requests[i]->ring->id;
> +			if (obj->last_read_req[ring] == requests[i])
> +				i915_gem_object_retire__read(obj, ring);
> +			if (obj->last_write_req == requests[i])
> +				i915_gem_object_retire__write(obj);

The above four lines seem to be functionally identical to a similar four 
in __i915_gem_object_sync.

Also, _retire__read will do _retire__write if there is one on the same 
ring. And here by definition they are since it is the same request, no?

> +			__i915_gem_request_retire__upto(requests[i]);
> +		}
> +		i915_gem_request_unreference(requests[i]);
> +	}
> +
> +	return ret;
>   }
>
>   /**
> @@ -2204,78 +2294,58 @@ i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
>   	return 0;
>   }
>
> -static void
> -i915_gem_object_move_to_active(struct drm_i915_gem_object *obj,
> -			       struct intel_engine_cs *ring)
> +void i915_vma_move_to_active(struct i915_vma *vma,
> +			     struct intel_engine_cs *ring)
>   {
> -	struct drm_i915_gem_request *req;
> -	struct intel_engine_cs *old_ring;
> -
> -	BUG_ON(ring == NULL);
> -
> -	req = intel_ring_get_request(ring);
> -	old_ring = i915_gem_request_get_ring(obj->last_read_req);
> -
> -	if (old_ring != ring && obj->last_write_req) {
> -		/* Keep the request relative to the current ring */
> -		i915_gem_request_assign(&obj->last_write_req, req);
> -	}
> +	struct drm_i915_gem_object *obj = vma->obj;
>
>   	/* Add a reference if we're newly entering the active list. */
> -	if (!obj->active) {
> +	if (obj->active == 0)
>   		drm_gem_object_reference(&obj->base);
> -		obj->active = 1;
> -	}
> +	obj->active |= intel_ring_flag(ring);
>
> -	list_move_tail(&obj->ring_list, &ring->active_list);
> +	list_move_tail(&obj->ring_list[ring->id], &ring->active_list);
> +	i915_gem_request_assign(&obj->last_read_req[ring->id],
> +				intel_ring_get_request(ring));
>
> -	i915_gem_request_assign(&obj->last_read_req, req);
> +	list_move_tail(&vma->mm_list, &vma->vm->active_list);
>   }
>
> -void i915_vma_move_to_active(struct i915_vma *vma,
> -			     struct intel_engine_cs *ring)
> +static void
> +i915_gem_object_retire__write(struct drm_i915_gem_object *obj)
>   {
> -	list_move_tail(&vma->mm_list, &vma->vm->active_list);
> -	return i915_gem_object_move_to_active(vma->obj, ring);
> +	RQ_BUG_ON(obj->last_write_req == NULL);
> +	RQ_BUG_ON(!(obj->active & intel_ring_flag(obj->last_write_req->ring)));
> +
> +	i915_gem_request_assign(&obj->last_write_req, NULL);
> +	intel_fb_obj_flush(obj, true);
>   }
>
>   static void
> -i915_gem_object_move_to_inactive(struct drm_i915_gem_object *obj)
> +i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
>   {
>   	struct i915_vma *vma;
>
> -	BUG_ON(obj->base.write_domain & ~I915_GEM_GPU_DOMAINS);
> -	BUG_ON(!obj->active);
> +	RQ_BUG_ON(obj->last_read_req[ring] == NULL);
> +	RQ_BUG_ON(!(obj->active & (1 << ring)));
> +
> +	list_del_init(&obj->ring_list[ring]);
> +	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
> +
> +	if (obj->last_write_req && obj->last_write_req->ring->id == ring)
> +		i915_gem_object_retire__write(obj);
> +
> +	obj->active &= ~(1 << ring);
> +	if (obj->active)
> +		return;
>
>   	list_for_each_entry(vma, &obj->vma_list, vma_link) {
>   		if (!list_empty(&vma->mm_list))
>   			list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
>   	}
>
> -	intel_fb_obj_flush(obj, true);
> -
> -	list_del_init(&obj->ring_list);
> -
> -	i915_gem_request_assign(&obj->last_read_req, NULL);
> -	i915_gem_request_assign(&obj->last_write_req, NULL);
> -	obj->base.write_domain = 0;
> -
>   	i915_gem_request_assign(&obj->last_fenced_req, NULL);
> -
> -	obj->active = 0;
>   	drm_gem_object_unreference(&obj->base);
> -
> -	WARN_ON(i915_verify_lists(dev));
> -}
> -
> -static void
> -i915_gem_object_retire(struct drm_i915_gem_object *obj)
> -{
> -	if (obj->last_read_req == NULL)
> -		return;
> -
> -	if (i915_gem_request_completed(obj->last_read_req, true))
> -		i915_gem_object_move_to_inactive(obj);
>   }
>
>   static int
> @@ -2452,20 +2522,6 @@ int __i915_add_request(struct intel_engine_cs *ring,
>   	return 0;
>   }
>
> -static inline void
> -i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
> -{
> -	struct drm_i915_file_private *file_priv = request->file_priv;
> -
> -	if (!file_priv)
> -		return;
> -
> -	spin_lock(&file_priv->mm.lock);
> -	list_del(&request->client_list);
> -	request->file_priv = NULL;
> -	spin_unlock(&file_priv->mm.lock);
> -}
> -
>   static bool i915_context_is_banned(struct drm_i915_private *dev_priv,
>   				   const struct intel_context *ctx)
>   {
> @@ -2511,16 +2567,6 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
>   	}
>   }
>
> -static void i915_gem_free_request(struct drm_i915_gem_request *request)
> -{
> -	list_del(&request->list);
> -	i915_gem_request_remove_from_client(request);
> -
> -	put_pid(request->pid);
> -
> -	i915_gem_request_unreference(request);
> -}
> -
>   void i915_gem_request_free(struct kref *req_ref)
>   {
>   	struct drm_i915_gem_request *req = container_of(req_ref,
> @@ -2583,9 +2629,9 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
>
>   		obj = list_first_entry(&ring->active_list,
>   				       struct drm_i915_gem_object,
> -				       ring_list);
> +				       ring_list[ring->id]);
>
> -		i915_gem_object_move_to_inactive(obj);
> +		i915_gem_object_retire__read(obj, ring->id);
>   	}
>
>   	/*
> @@ -2622,7 +2668,8 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
>   					   struct drm_i915_gem_request,
>   					   list);
>
> -		i915_gem_free_request(request);
> +		request->ringbuf->last_retired_head = request->tail;
> +		i915_gem_request_retire(request);
>   	}

This loop looks awfully similar to __i915_gem_request_retire__upto on 
the last request on the engine?

>   	/* This may not have been flushed before the reset, so clean it now */
> @@ -2670,6 +2717,8 @@ void i915_gem_reset(struct drm_device *dev)
>   	i915_gem_context_reset(dev);
>
>   	i915_gem_restore_fences(dev);
> +
> +	WARN_ON(i915_verify_lists(dev));
>   }
>
>   /**
> @@ -2678,11 +2727,11 @@ void i915_gem_reset(struct drm_device *dev)
>   void
>   i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>   {
> -	if (list_empty(&ring->request_list))
> -		return;
> -
>   	WARN_ON(i915_verify_lists(ring->dev));
>
> +	if (list_empty(&ring->active_list))
> +		return;
> +
>   	/* Retire requests first as we use it above for the early return.
>   	 * If we retire requests last, we may use a later seqno and so clear
>   	 * the requests lists without clearing the active list, leading to
> @@ -2698,16 +2747,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>   		if (!i915_gem_request_completed(request, true))
>   			break;
>
> -		trace_i915_gem_request_retire(request);
> -
>   		/* We know the GPU must have read the request to have
>   		 * sent us the seqno + interrupt, so use the position
>   		 * of tail of the request to update the last known position
>   		 * of the GPU head.
>   		 */
>   		request->ringbuf->last_retired_head = request->postfix;
> -
> -		i915_gem_free_request(request);
> +		i915_gem_request_retire(request);
>   	}

This loop could also use __i915_gem_request_retire__upto if it found the 
first completed request first. Not sure how much code that would save, 
but it would maybe be more readable, a little bit more self-documenting.
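
Something along these lines is what I had in mind (rough, untested sketch;
not code from the patch):

    struct drm_i915_gem_request *request, *last = NULL;

    /* Find the newest completed request on this ring... */
    list_for_each_entry(request, &ring->request_list, list) {
            if (!i915_gem_request_completed(request, true))
                    break;
            last = request;
    }

    /* ...and let __i915_gem_request_retire__upto() retire it together
     * with everything before it (it also updates last_retired_head).
     */
    if (last)
            __i915_gem_request_retire__upto(last);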

>
>   	/* Move any buffers on the active list that are no longer referenced
> @@ -2719,12 +2765,12 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>
>   		obj = list_first_entry(&ring->active_list,
>   				      struct drm_i915_gem_object,
> -				      ring_list);
> +				      ring_list[ring->id]);
>
> -		if (!i915_gem_request_completed(obj->last_read_req, true))
> +		if (!list_empty(&obj->last_read_req[ring->id]->list))
>   			break;
>
> -		i915_gem_object_move_to_inactive(obj);
> +		i915_gem_object_retire__read(obj, ring->id);
>   	}
>
>   	if (unlikely(ring->trace_irq_req &&
> @@ -2813,17 +2859,23 @@ i915_gem_idle_work_handler(struct work_struct *work)
>   static int
>   i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
>   {
> -	struct intel_engine_cs *ring;
> -	int ret;
> +	int ret, i;
>
> -	if (obj->active) {
> -		ring = i915_gem_request_get_ring(obj->last_read_req);
> +	if (!obj->active)
> +		return 0;
> +
> +	for (i = 0; i < I915_NUM_RINGS; i++) {
> +		struct drm_i915_gem_request *rq;
> +
> +		rq = obj->last_read_req[i];
> +		if (rq == NULL)
> +			continue;
>
> -		ret = i915_gem_check_olr(obj->last_read_req);
> +		ret = i915_gem_check_olr(rq);
>   		if (ret)
>   			return ret;
>
> -		i915_gem_retire_requests_ring(ring);
> +		i915_gem_retire_requests_ring(rq->ring);
>   	}
>
>   	return 0;
> @@ -2857,9 +2909,10 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>   	struct drm_i915_private *dev_priv = dev->dev_private;
>   	struct drm_i915_gem_wait *args = data;
>   	struct drm_i915_gem_object *obj;
> -	struct drm_i915_gem_request *req;
> +	struct drm_i915_gem_request *req[I915_NUM_RINGS];
>   	unsigned reset_counter;
> -	int ret = 0;
> +	int i, n = 0;
> +	int ret;
>
>   	if (args->flags != 0)
>   		return -EINVAL;
> @@ -2879,11 +2932,9 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>   	if (ret)
>   		goto out;
>
> -	if (!obj->active || !obj->last_read_req)
> +	if (!obj->active)
>   		goto out;
>
> -	req = obj->last_read_req;
> -
>   	/* Do this after OLR check to make sure we make forward progress polling
>   	 * on this IOCTL with a timeout == 0 (like busy ioctl)
>   	 */
> @@ -2894,13 +2945,23 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>
>   	drm_gem_object_unreference(&obj->base);
>   	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
> -	i915_gem_request_reference(req);
> +
> +	for (i = 0; i < I915_NUM_RINGS; i++) {
> +		if (obj->last_read_req[i] == NULL)
> +			continue;
> +
> +		req[n++] = i915_gem_request_reference(obj->last_read_req[i]);
> +	}
> +
>   	mutex_unlock(&dev->struct_mutex);
>
> -	ret = __i915_wait_request(req, reset_counter, true,
> -				  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
> -				  file->driver_priv);
> -	i915_gem_request_unreference__unlocked(req);
> +	for (i = 0; i < n; i++) {
> +		if (ret == 0)
> +			ret = __i915_wait_request(req[i], reset_counter, true,
> +						  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
> +						  file->driver_priv);
> +		i915_gem_request_unreference__unlocked(req[i]);
> +	}
>   	return ret;
>
>   out:
> @@ -2909,6 +2970,62 @@ out:
>   	return ret;
>   }
>
> +static int
> +__i915_gem_object_sync(struct drm_i915_gem_object *obj,
> +		       struct intel_engine_cs *to,
> +		       struct drm_i915_gem_request *rq)
> +{
> +	struct intel_engine_cs *from;
> +	int ret;
> +
> +	if (rq == NULL)
> +		return 0;
> +
> +	from = i915_gem_request_get_ring(rq);
> +	if (to == from)
> +		return 0;
> +
> +	if (i915_gem_request_completed(rq, true))
> +		return 0;
> +
> +	ret = i915_gem_check_olr(rq);
> +	if (ret)
> +		return ret;
> +
> +	if (!i915_semaphore_is_enabled(obj->base.dev)) {
> +		ret = __i915_wait_request(rq,
> +					  atomic_read(&to_i915(obj->base.dev)->gpu_error.reset_counter),
> +					  to_i915(obj->base.dev)->mm.interruptible, NULL, NULL);
> +		if (ret)
> +			return ret;
> +
> +		if (obj->last_read_req[from->id] == rq)
> +			i915_gem_object_retire__read(obj, from->id);
> +		if (obj->last_write_req == rq)
> +			i915_gem_object_retire__write(obj);
> +	} else {
> +		int idx = intel_ring_sync_index(from, to);
> +		u32 seqno = i915_gem_request_get_seqno(rq);
> +
> +		if (seqno <= from->semaphore.sync_seqno[idx])
> +			return 0;
> +
> +		trace_i915_gem_ring_sync_to(from, to, rq);
> +		ret = to->semaphore.sync_to(to, from, seqno);
> +		if (ret)
> +			return ret;
> +
> +		/* We use last_read_req because sync_to()
> +		 * might have just caused seqno wrap under
> +		 * the radar.
> +		 */
> +		from->semaphore.sync_seqno[idx] =
> +			i915_gem_request_get_seqno(obj->last_read_req[from->id]);
> +	}
> +
> +	return 0;
> +}
> +
>   /**
>    * i915_gem_object_sync - sync an object to a ring.
>    *
> @@ -2917,7 +3034,17 @@ out:
>    *
>    * This code is meant to abstract object synchronization with the GPU.
>    * Calling with NULL implies synchronizing the object with the CPU
> - * rather than a particular GPU ring.
> + * rather than a particular GPU ring. Conceptually we serialise writes
> + * between engines inside the GPU. We only allow one engine to write
> + * into a buffer at any time, but multiple readers. To ensure each has
> + * a coherent view of memory, we must:
> + *
> + * - If there is an outstanding write request to the object, the new
> + *   request must wait for it to complete (either CPU or in hw, requests
> + *   on the same ring will be naturally ordered).
> + *
> + * - If we are a write request (pending_write_domain is set), the new
> + *   request must wait for outstanding read requests to complete.
>    *
>    * Returns 0 if successful, else propagates up the lower layer error.
>    */
> @@ -2925,39 +3052,25 @@ int
>   i915_gem_object_sync(struct drm_i915_gem_object *obj,
>   		     struct intel_engine_cs *to)
>   {
> -	struct intel_engine_cs *from;
> -	u32 seqno;
> -	int ret, idx;
> -
> -	from = i915_gem_request_get_ring(obj->last_read_req);
> -
> -	if (from == NULL || to == from)
> -		return 0;
> -
> -	if (to == NULL || !i915_semaphore_is_enabled(obj->base.dev))
> -		return i915_gem_object_wait_rendering(obj, false);
> -
> -	idx = intel_ring_sync_index(from, to);
> +	const bool readonly = obj->base.pending_write_domain == 0;
> +	int ret, i;
>
> -	seqno = i915_gem_request_get_seqno(obj->last_read_req);
> -	/* Optimization: Avoid semaphore sync when we are sure we already
> -	 * waited for an object with higher seqno */
> -	if (seqno <= from->semaphore.sync_seqno[idx])
> +	if (!obj->active)
>   		return 0;
>
> -	ret = i915_gem_check_olr(obj->last_read_req);
> -	if (ret)
> -		return ret;
> -
> -	trace_i915_gem_ring_sync_to(from, to, obj->last_read_req);
> -	ret = to->semaphore.sync_to(to, from, seqno);
> -	if (!ret)
> -		/* We use last_read_req because sync_to()
> -		 * might have just caused seqno wrap under
> -		 * the radar.
> -		 */
> -		from->semaphore.sync_seqno[idx] =
> -				i915_gem_request_get_seqno(obj->last_read_req);
> +	if (to == NULL) {
> +		ret = i915_gem_object_wait_rendering(obj, readonly);
> +	} else if (readonly) {
> +		ret = __i915_gem_object_sync(obj, to,
> +					     obj->last_write_req);
> +	} else {
> +		for (i = 0; i < I915_NUM_RINGS; i++) {
> +			ret = __i915_gem_object_sync(obj, to,
> +						     obj->last_read_req[i]);
> +			if (ret)
> +				break;
> +		}
> +	}
>
>   	return ret;
>   }
> @@ -3044,10 +3157,6 @@ int i915_vma_unbind(struct i915_vma *vma)
>   	/* Since the unbound list is global, only move to that list if
>   	 * no more VMAs exist. */
>   	if (list_empty(&obj->vma_list)) {
> -		/* Throw away the active reference before
> -		 * moving to the unbound list. */
> -		i915_gem_object_retire(obj);
> -
>   		i915_gem_gtt_finish_object(obj);
>   		list_move_tail(&obj->global_list, &dev_priv->mm.unbound_list);
>   	}
> @@ -3080,6 +3189,7 @@ int i915_gpu_idle(struct drm_device *dev)
>   			return ret;
>   	}
>
> +	WARN_ON(i915_verify_lists(dev));
>   	return 0;
>   }
>
> @@ -3713,8 +3823,6 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>   	if (ret)
>   		return ret;
>
> -	i915_gem_object_retire(obj);
> -
>   	/* Flush and acquire obj->pages so that we are coherent through
>   	 * direct access in memory with previous cached writes through
>   	 * shmemfs and that our cache domain tracking remains valid.
> @@ -3940,11 +4048,9 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
>   	bool was_pin_display;
>   	int ret;
>
> -	if (pipelined != i915_gem_request_get_ring(obj->last_read_req)) {
> -		ret = i915_gem_object_sync(obj, pipelined);
> -		if (ret)
> -			return ret;
> -	}
> +	ret = i915_gem_object_sync(obj, pipelined);
> +	if (ret)
> +		return ret;
>
>   	/* Mark the pin_display early so that we account for the
>   	 * display coherency whilst setting up the cache domains.
> @@ -4049,7 +4155,6 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
>   	if (ret)
>   		return ret;
>
> -	i915_gem_object_retire(obj);
>   	i915_gem_object_flush_gtt_write_domain(obj);
>
>   	old_write_domain = obj->base.write_domain;
> @@ -4359,15 +4464,15 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
>   	 * necessary flushes here.
>   	 */
>   	ret = i915_gem_object_flush_active(obj);
> +	if (ret)
> +		goto unref;
>
> -	args->busy = obj->active;
> -	if (obj->last_read_req) {
> -		struct intel_engine_cs *ring;
> -		BUILD_BUG_ON(I915_NUM_RINGS > 16);
> -		ring = i915_gem_request_get_ring(obj->last_read_req);
> -		args->busy |= intel_ring_flag(ring) << 16;
> -	}
> +	BUILD_BUG_ON(I915_NUM_RINGS > 16);
> +	args->busy = obj->active << 16;
> +	if (obj->last_write_req)
> +		args->busy |= intel_ring_flag(obj->last_write_req->ring);
>
> +unref:
>   	drm_gem_object_unreference(&obj->base);
>   unlock:
>   	mutex_unlock(&dev->struct_mutex);
> @@ -4441,8 +4546,11 @@ unlock:
>   void i915_gem_object_init(struct drm_i915_gem_object *obj,
>   			  const struct drm_i915_gem_object_ops *ops)
>   {
> +	int i;
> +
>   	INIT_LIST_HEAD(&obj->global_list);
> -	INIT_LIST_HEAD(&obj->ring_list);
> +	for (i = 0; i < I915_NUM_RINGS; i++)
> +		INIT_LIST_HEAD(&obj->ring_list[i]);
>   	INIT_LIST_HEAD(&obj->obj_exec_link);
>   	INIT_LIST_HEAD(&obj->vma_list);
>   	INIT_LIST_HEAD(&obj->batch_pool_link);
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index f3e84c44d009..18900f745bc6 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -768,8 +768,6 @@ static int do_switch(struct intel_engine_cs *ring,
>   		 * swapped, but there is no way to do that yet.
>   		 */
>   		from->legacy_hw_ctx.rcs_state->dirty = 1;
> -		BUG_ON(i915_gem_request_get_ring(
> -			from->legacy_hw_ctx.rcs_state->last_read_req) != ring);
>
>   		/* obj is kept alive until the next request by its active ref */
>   		i915_gem_object_ggtt_unpin(from->legacy_hw_ctx.rcs_state);
> diff --git a/drivers/gpu/drm/i915/i915_gem_debug.c b/drivers/gpu/drm/i915/i915_gem_debug.c
> index f462d1b51d97..17299d04189f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_debug.c
> +++ b/drivers/gpu/drm/i915/i915_gem_debug.c
> @@ -34,82 +34,34 @@ int
>   i915_verify_lists(struct drm_device *dev)
>   {
>   	static int warned;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_i915_private *dev_priv = to_i915(dev);
>   	struct drm_i915_gem_object *obj;
> +	struct intel_engine_cs *ring;
>   	int err = 0;
> +	int i;
>
>   	if (warned)
>   		return 0;
>
> -	list_for_each_entry(obj, &dev_priv->render_ring.active_list, list) {
> -		if (obj->base.dev != dev ||
> -		    !atomic_read(&obj->base.refcount.refcount)) {
> -			DRM_ERROR("freed render active %p\n", obj);
> -			err++;
> -			break;
> -		} else if (!obj->active ||
> -			   (obj->base.read_domains & I915_GEM_GPU_DOMAINS) == 0) {
> -			DRM_ERROR("invalid render active %p (a %d r %x)\n",
> -				  obj,
> -				  obj->active,
> -				  obj->base.read_domains);
> -			err++;
> -		} else if (obj->base.write_domain && list_empty(&obj->gpu_write_list)) {
> -			DRM_ERROR("invalid render active %p (w %x, gwl %d)\n",
> -				  obj,
> -				  obj->base.write_domain,
> -				  !list_empty(&obj->gpu_write_list));
> -			err++;
> -		}
> -	}
> -
> -	list_for_each_entry(obj, &dev_priv->mm.flushing_list, list) {
> -		if (obj->base.dev != dev ||
> -		    !atomic_read(&obj->base.refcount.refcount)) {
> -			DRM_ERROR("freed flushing %p\n", obj);
> -			err++;
> -			break;
> -		} else if (!obj->active ||
> -			   (obj->base.write_domain & I915_GEM_GPU_DOMAINS) == 0 ||
> -			   list_empty(&obj->gpu_write_list)) {
> -			DRM_ERROR("invalid flushing %p (a %d w %x gwl %d)\n",
> -				  obj,
> -				  obj->active,
> -				  obj->base.write_domain,
> -				  !list_empty(&obj->gpu_write_list));
> -			err++;
> -		}
> -	}
> -
> -	list_for_each_entry(obj, &dev_priv->mm.gpu_write_list, gpu_write_list) {
> -		if (obj->base.dev != dev ||
> -		    !atomic_read(&obj->base.refcount.refcount)) {
> -			DRM_ERROR("freed gpu write %p\n", obj);
> -			err++;
> -			break;
> -		} else if (!obj->active ||
> -			   (obj->base.write_domain & I915_GEM_GPU_DOMAINS) == 0) {
> -			DRM_ERROR("invalid gpu write %p (a %d w %x)\n",
> -				  obj,
> -				  obj->active,
> -				  obj->base.write_domain);
> -			err++;
> -		}
> -	}
> -
> -	list_for_each_entry(obj, &i915_gtt_vm->inactive_list, list) {
> -		if (obj->base.dev != dev ||
> -		    !atomic_read(&obj->base.refcount.refcount)) {
> -			DRM_ERROR("freed inactive %p\n", obj);
> -			err++;
> -			break;
> -		} else if (obj->pin_count || obj->active ||
> -			   (obj->base.write_domain & I915_GEM_GPU_DOMAINS)) {
> -			DRM_ERROR("invalid inactive %p (p %d a %d w %x)\n",
> -				  obj,
> -				  obj->pin_count, obj->active,
> -				  obj->base.write_domain);
> -			err++;

Nice and stale. :)

> +	for_each_ring(ring, dev_priv, i) {
> +		list_for_each_entry(obj, &ring->active_list, ring_list[ring->id]) {
> +			if (obj->base.dev != dev ||
> +			    !atomic_read(&obj->base.refcount.refcount)) {
> +				DRM_ERROR("%s: freed active obj %p\n",
> +					  ring->name, obj);
> +				err++;
> +				break;
> +			} else if (!obj->active ||
> +				   obj->last_read_req[ring->id] == NULL) {
> +				DRM_ERROR("%s: invalid active obj %p\n",
> +					  ring->name, obj);
> +				err++;
> +			} else if (obj->base.write_domain) {
> +				DRM_ERROR("%s: invalid write obj %p (w %x)\n",
> +					  ring->name,
> +					  obj, obj->base.write_domain);
> +				err++;
> +			}
>   		}
>   	}
>
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 1d4e60df8883..5f798961266f 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -192,15 +192,20 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m,
>   				struct drm_i915_error_buffer *err,
>   				int count)
>   {
> +	int i;
> +
>   	err_printf(m, "  %s [%d]:\n", name, count);
>
>   	while (count--) {
> -		err_printf(m, "    %08x %8u %02x %02x %x %x",
> +		err_printf(m, "    %08x %8u %02x %02x [ ",
>   			   err->gtt_offset,
>   			   err->size,
>   			   err->read_domains,
> -			   err->write_domain,
> -			   err->rseqno, err->wseqno);
> +			   err->write_domain);
> +		for (i = 0; i < I915_NUM_RINGS; i++)
> +			err_printf(m, "%02x ", err->rseqno[i]);
> +
> +		err_printf(m, "] %02x", err->wseqno);
>   		err_puts(m, pin_flag(err->pinned));
>   		err_puts(m, tiling_flag(err->tiling));
>   		err_puts(m, dirty_flag(err->dirty));
> @@ -679,10 +684,12 @@ static void capture_bo(struct drm_i915_error_buffer *err,
>   		       struct i915_vma *vma)
>   {
>   	struct drm_i915_gem_object *obj = vma->obj;
> +	int i;
>
>   	err->size = obj->base.size;
>   	err->name = obj->base.name;
> -	err->rseqno = i915_gem_request_get_seqno(obj->last_read_req);
> +	for (i = 0; i < I915_NUM_RINGS; i++)
> +		err->rseqno[i] = i915_gem_request_get_seqno(obj->last_read_req[i]);
>   	err->wseqno = i915_gem_request_get_seqno(obj->last_write_req);
>   	err->gtt_offset = vma->node.start;
>   	err->read_domains = obj->base.read_domains;
> @@ -695,8 +702,8 @@ static void capture_bo(struct drm_i915_error_buffer *err,
>   	err->dirty = obj->dirty;
>   	err->purgeable = obj->madv != I915_MADV_WILLNEED;
>   	err->userptr = obj->userptr.mm != NULL;
> -	err->ring = obj->last_read_req ?
> -			i915_gem_request_get_ring(obj->last_read_req)->id : -1;
> +	err->ring = obj->last_write_req ?
> +			i915_gem_request_get_ring(obj->last_write_req)->id : -1;
>   	err->cache_level = obj->cache_level;
>   }
>
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 5eb159bcd599..64b67df94d33 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -9895,7 +9895,7 @@ static bool use_mmio_flip(struct intel_engine_cs *ring,
>   	else if (i915.enable_execlists)
>   		return true;
>   	else
> -		return ring != i915_gem_request_get_ring(obj->last_read_req);
> +		return ring != i915_gem_request_get_ring(obj->last_write_req);
>   }
>
>   static void skl_do_mmio_flip(struct intel_crtc *intel_crtc)
> @@ -10199,7 +10199,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
>   	} else if (IS_IVYBRIDGE(dev) || IS_HASWELL(dev)) {
>   		ring = &dev_priv->ring[BCS];
>   	} else if (INTEL_INFO(dev)->gen >= 7) {
> -		ring = i915_gem_request_get_ring(obj->last_read_req);
> +		ring = i915_gem_request_get_ring(obj->last_write_req);
>   		if (ring == NULL || ring->id != RCS)
>   			ring = &dev_priv->ring[BCS];
>   	} else {
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 1d0fb8450adc..fb4f3792fd78 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -901,6 +901,7 @@ static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf,
>   {
>   	struct intel_engine_cs *ring = ringbuf->ring;
>   	struct drm_i915_gem_request *request;
> +	unsigned space;
>   	int ret;
>
>   	if (intel_ring_space(ringbuf) >= bytes)
> @@ -912,15 +913,14 @@ static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf,
>   		 * from multiple ringbuffers. Here, we must ignore any that
>   		 * aren't from the ringbuffer we're considering.
>   		 */
> -		struct intel_context *ctx = request->ctx;
> -		if (ctx->engine[ring->id].ringbuf != ringbuf)
> +		if (request->ringbuf != ringbuf)
>   			continue;
>
>   		/* Would completion of this request free enough space? */
> -		if (__intel_ring_space(request->tail, ringbuf->tail,
> -				       ringbuf->size) >= bytes) {
> +		space = __intel_ring_space(request->tail, ringbuf->tail,
> +					   ringbuf->size);
> +		if (space >= bytes)
>   			break;
> -		}
>   	}
>
>   	if (&request->list == &ring->request_list)
> @@ -930,9 +930,8 @@ static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf,
>   	if (ret)
>   		return ret;
>
> -	i915_gem_retire_requests_ring(ring);
> -
> -	return intel_ring_space(ringbuf) >= bytes ? 0 : -ENOSPC;
> +	ringbuf->space = bytes;
> +	return 0;
>   }
>
>   static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf,
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index a351178913f7..a1184e700d1d 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2061,16 +2061,17 @@ static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
>   {
>   	struct intel_ringbuffer *ringbuf = ring->buffer;
>   	struct drm_i915_gem_request *request;
> +	unsigned space;
>   	int ret;
>
>   	if (intel_ring_space(ringbuf) >= n)
>   		return 0;
>
>   	list_for_each_entry(request, &ring->request_list, list) {
> -		if (__intel_ring_space(request->postfix, ringbuf->tail,
> -				       ringbuf->size) >= n) {
> +		space = __intel_ring_space(request->postfix, ringbuf->tail,
> +					   ringbuf->size);
> +		if (space >= n)
>   			break;
> -		}
>   	}
>
>   	if (&request->list == &ring->request_list)
> @@ -2080,8 +2081,7 @@ static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
>   	if (ret)
>   		return ret;
>
> -	i915_gem_retire_requests_ring(ring);
> -
> +	ringbuf->space = space;
>   	return 0;
>   }
>
>

So far it all looks reasonable to me, but apart from the comments above, 
I want to do another pass anyway.

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 17/49] drm/i915: Implement inter-engine read-read optimisations
  2015-03-30 13:52   ` Tvrtko Ursulin
@ 2015-03-30 14:09     ` Chris Wilson
  2015-03-30 14:45       ` Tvrtko Ursulin
  0 siblings, 1 reply; 80+ messages in thread
From: Chris Wilson @ 2015-03-30 14:09 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Lionel Landwerlin, intel-gfx

On Mon, Mar 30, 2015 at 02:52:26PM +0100, Tvrtko Ursulin wrote:
> >+static void
> >+__i915_gem_request_retire__upto(struct drm_i915_gem_request *rq)
> 
> It is a bit annoying (for readability) that it can be rq, req and request.

Nonsense they are all rq and struct i915_request. Or once have been and
so will again. /prophecy

> >+err:
> >+	for (i = 0; i < n; i++) {
> >+		if (ret == 0) {
> >+			int ring = requests[i]->ring->id;
> >+			if (obj->last_read_req[ring] == requests[i])
> >+				i915_gem_object_retire__read(obj, ring);
> >+			if (obj->last_write_req == requests[i])
> >+				i915_gem_object_retire__write(obj);
> 
> The above four lines seem to be functionally identical to a similar four
> in __i915_gem_object_sync.

Yes. Extracting them ended up looking worse (imo).
 
> Also, _retire__read will do _retire__write if there is one on the
> same ring. And here by definition they are since it is the same
> request, no?

No. It's subtle but here is the bug I pointed out from before. Once we
drop the lock, we no longer can make assumptions about the state of obj.

> >@@ -2698,16 +2747,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
> >  		if (!i915_gem_request_completed(request, true))
> >  			break;
> >
> >-		trace_i915_gem_request_retire(request);
> >-
> >  		/* We know the GPU must have read the request to have
> >  		 * sent us the seqno + interrupt, so use the position
> >  		 * of tail of the request to update the last known position
> >  		 * of the GPU head.
> >  		 */
> >  		request->ringbuf->last_retired_head = request->postfix;
> >-
> >-		i915_gem_free_request(request);
> >+		i915_gem_request_retire(request);
> >  	}
> 
> This loop could also use __i915_gem_request_retire__upto if it found
> the first completed request first. Not sure how much code that would
> save, but it would maybe be more readable, a little bit more
> self-documenting.

Actually this loop here should be pushed back to the engine (as part of
later patches). After that transformation, using i915_gem_request_retire()
is even clearer. But _retire__upto does become the main way in which we
retire requests (having killed off retire_requests_ring in favour of
explicit wait/poll+retire).

> So far it all looks reasonable to me, but apart from the comments
> above, I want to do another pass anyway.

There are a few more changes afoot as well (minor ones concerning
retire__upto and unexporting retire_requests_ring).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 17/49] drm/i915: Implement inter-engine read-read optimisations
  2015-03-30 14:09     ` Chris Wilson
@ 2015-03-30 14:45       ` Tvrtko Ursulin
  2015-03-30 15:07         ` Chris Wilson
  0 siblings, 1 reply; 80+ messages in thread
From: Tvrtko Ursulin @ 2015-03-30 14:45 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Lionel Landwerlin


On 03/30/2015 03:09 PM, Chris Wilson wrote:
> On Mon, Mar 30, 2015 at 02:52:26PM +0100, Tvrtko Ursulin wrote:
>>> +static void
>>> +__i915_gem_request_retire__upto(struct drm_i915_gem_request *rq)
>>
>> It is a bit annoying (for readability) that it can be rq, req and request.
>
> Nonsense they are all rq and struct i915_request. Or once have been and
> so will again. /prophecy

rq is the least widespread of the three in the codebase, and even the 
worst option, since it sounds like a queue of some sort.

>>> +err:
>>> +	for (i = 0; i < n; i++) {
>>> +		if (ret == 0) {
>>> +			int ring = requests[i]->ring->id;
>>> +			if (obj->last_read_req[ring] == requests[i])
>>> +				i915_gem_object_retire__read(obj, ring);
>>> +			if (obj->last_write_req == requests[i])
>>> +				i915_gem_object_retire__write(obj);
>>
>> Above four lines seem to be identical functionality to similar four
>> in __i915_gem_object_sync.
>
> Yes. Extracting them ended up looking worse (imo).

It would be a single function call taking an object and a request; how can 
it be worse? It should be more readable with a good name.
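
Something like this is all it would take; just wrapping the four quoted
lines in a helper (the name here is only a suggestion, not something from
the patch):

    static void
    i915_gem_object_retire_request(struct drm_i915_gem_object *obj,
                                   struct drm_i915_gem_request *rq)
    {
            int ring = rq->ring->id;

            /* Same four lines as quoted above, just shared between
             * the two callers.
             */
            if (obj->last_read_req[ring] == rq)
                    i915_gem_object_retire__read(obj, ring);
            if (obj->last_write_req == rq)
                    i915_gem_object_retire__write(obj);
    }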

>> Also, _retire__read will do _retire__write if there is one on the
>> same ring. And here by definition they are since it is the same
>> request, no?
>
> No. It's subtle but here is the bug I pointed out from before. Once we
> drop the lock, we no longer can make assumptions about the state of obj.

You mean the request might have disappeared from last_read_req but still 
be on last_write_req? But how, since if it was retired that shouldn't be 
possible.

>>> @@ -2698,16 +2747,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>>>   		if (!i915_gem_request_completed(request, true))
>>>   			break;
>>>
>>> -		trace_i915_gem_request_retire(request);
>>> -
>>>   		/* We know the GPU must have read the request to have
>>>   		 * sent us the seqno + interrupt, so use the position
>>>   		 * of tail of the request to update the last known position
>>>   		 * of the GPU head.
>>>   		 */
>>>   		request->ringbuf->last_retired_head = request->postfix;
>>> -
>>> -		i915_gem_free_request(request);
>>> +		i915_gem_request_retire(request);
>>>   	}
>>
>> This loop could also use __i915_gem_request_retire__upto if it found
>> the first completed request first. Not sure how much code that would
>> save, but it would maybe be more readable, a little bit more
>> self-documenting.
>
> Actually this loop here should be pushed back to the engine (as part of
> later patches). After that transformation, using i915_gem_request_retire()
> is even clearer. But _retire__upto does become the main way in which we
> retire requests (having killed off retire_requests_ring in favour of
> explicit wait/poll+retire).

That sounds good.

>> So far it all looks reasonable to me, but apart from the comments
>> above, I want to do another pass anyway.
>
> There are a few more changes afoot as well (minor ones concerning
> retire__upto and unexporting retire_requests_ring).

Ok.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 38/49] drm/i915: Skip allocating shadow batch for 0-length batches
  2015-03-30 12:02   ` Chris Wilson
@ 2015-03-30 14:59     ` Daniel Vetter
  0 siblings, 0 replies; 80+ messages in thread
From: Daniel Vetter @ 2015-03-30 14:59 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Jani Nikula

On Mon, Mar 30, 2015 at 01:02:50PM +0100, Chris Wilson wrote:
> On Fri, Mar 27, 2015 at 11:02:10AM +0000, Chris Wilson wrote:
> > Since
> > 
> > commit 17cabf571e50677d980e9ab2a43c5f11213003ae
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Wed Jan 14 11:20:57 2015 +0000
> > 
> >     drm/i915: Trim the command parser allocations
> > 
> > we may then try to allocate a zero-sized object and attempt to extract
> > its pages. Understandably this fails.
> 
> The original failure was in
> 
> commit b9ffd80ed659c559152c042e74741f4f60cac691
> Author: Brad Volkin <bradley.d.volkin@intel.com>
> Date:   Thu Dec 11 12:13:10 2014 -0800
> 
>     drm/i915: Use batch length instead of object size in command parser
> 
> merged in the v4.0 cycle.
> 
> Jani, pretty please?

Cherry-picked and added the real sha1 citation, thanks.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 17/49] drm/i915: Implement inter-engine read-read optimisations
  2015-03-30 14:45       ` Tvrtko Ursulin
@ 2015-03-30 15:07         ` Chris Wilson
  0 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-03-30 15:07 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Lionel Landwerlin, intel-gfx

On Mon, Mar 30, 2015 at 03:45:04PM +0100, Tvrtko Ursulin wrote:
> 
> On 03/30/2015 03:09 PM, Chris Wilson wrote:
> >On Mon, Mar 30, 2015 at 02:52:26PM +0100, Tvrtko Ursulin wrote:
> >>>+static void
> >>>+__i915_gem_request_retire__upto(struct drm_i915_gem_request *rq)
> >>
> >>It is a bit annoying (for readability) that it can be rq, req and request.
> >
> >Nonsense they are all rq and struct i915_request. Or once have been and
> >so will again. /prophecy
> 
> rq is the least widespread of the three in the codebase, and even the
> worst option, since it sounds like a queue of some sort.

It's much clearer than req, and matches equivalent implementations in
userspace ;-) rq is the local variable, and request is the verbose version
for structure members.

> >>>+err:
> >>>+	for (i = 0; i < n; i++) {
> >>>+		if (ret == 0) {
> >>>+			int ring = requests[i]->ring->id;
> >>>+			if (obj->last_read_req[ring] == requests[i])
> >>>+				i915_gem_object_retire__read(obj, ring);
> >>>+			if (obj->last_write_req == requests[i])
> >>>+				i915_gem_object_retire__write(obj);
> >>
> >>The above four lines seem to be functionally identical to a similar four
> >>in __i915_gem_object_sync.
> >
> >Yes. Extracting them ended up looking worse (imo).
> 
> It would be a single function call taking an object and a request; how
> can it be worse? It should be more readable with a good name.

I wanted i915_gem_object_retire_request. However, done.
 
> >>Also, _retire__read will do _retire__write if there is one on the
> >>same ring. And here by definition they are since it is the same
> >>request, no?
> >
> >No. It's subtle, but here is the bug I pointed out before. Once we
> >drop the lock, we can no longer make assumptions about the state of obj.
> 
> You mean request might have disappeared from last_read_req but is
> still on last_write_req? But how, since if it was retired that
> shouldn't be possible.

Just that last_write_req != last_read_req either before or after the
wait (and doubly so after the wait, which is where we had the previous
bug).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 18/49] drm/i915: Reduce frequency of unspecific HSW reg debugging
  2015-03-27 16:12     ` Chris Wilson
@ 2015-03-30 19:15       ` Paulo Zanoni
  0 siblings, 0 replies; 80+ messages in thread
From: Paulo Zanoni @ 2015-03-30 19:15 UTC (permalink / raw)
  To: Chris Wilson, Paulo Zanoni, Intel Graphics Development,
	Daniel Vetter, Mika Kuoppala, Paulo Zanoni

2015-03-27 13:12 GMT-03:00 Chris Wilson <chris@chris-wilson.co.uk>:
> On Fri, Mar 27, 2015 at 12:34:05PM -0300, Paulo Zanoni wrote:
>> 2015-03-27 8:01 GMT-03:00 Chris Wilson <chris@chris-wilson.co.uk>:
>> > Delay the expensive read on the FPGA_DBG register from once per mmio to
>> > once per forcewake section when we are doing the general wellbeing
>> > check rather than the targeted error detection. This almost reduces
>> > the overhead of the debug facility (for example when submitting execlists)
>> > to zero whilst keeping the debug checks around.
>> >
>> > v2: Enable one-shot mmio debugging from the interrupt check as well as a
>> >     safeguard to catch invalid display writes from outside the powerwell.
>> >
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>> > Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
>> > ---
>> >  drivers/gpu/drm/i915/intel_uncore.c | 56 ++++++++++++++++++++-----------------
>> >  1 file changed, 30 insertions(+), 26 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
>> > index ab5cc94588e1..0e32bbbcada8 100644
>> > --- a/drivers/gpu/drm/i915/intel_uncore.c
>> > +++ b/drivers/gpu/drm/i915/intel_uncore.c
>> > @@ -149,6 +149,30 @@ fw_domains_put(struct drm_i915_private *dev_priv, enum forcewake_domains fw_doma
>> >  }
>> >
>> >  static void
>> > +hsw_unclaimed_reg_detect(struct drm_i915_private *dev_priv)
>> > +{
>> > +       static bool mmio_debug_once = true;
>> > +
>> > +       if (i915.mmio_debug || !mmio_debug_once)
>> > +               return;
>> > +
>> > +       if (__raw_i915_read32(dev_priv, FPGA_DBG) & FPGA_DBG_RM_NOCLAIM) {
>> > +               DRM_DEBUG("Unclaimed register detected, "
>> > +                         "enabling oneshot unclaimed register reporting. "
>> > +                         "Please use i915.mmio_debug=N for more information.\n");
>> > +               __raw_i915_write32(dev_priv, FPGA_DBG, FPGA_DBG_RM_NOCLAIM);
>> > +               i915.mmio_debug = mmio_debug_once--;
>> > +       }
>> > +}
>> > +
>> > +static void
>> > +fw_domains_put_debug(struct drm_i915_private *dev_priv, enum forcewake_domains fw_domains)
>> > +{
>> > +       hsw_unclaimed_reg_detect(dev_priv);
>> > +       fw_domains_put(dev_priv, fw_domains);
>> > +}
>>
>> This means we won't check during the forcewake puts that are on the
>> register read/write macros. Is this intentional?
>
> Not really. But the check still catches any mistakes there even though
> we know they are safe.
>
>> I tried checking the
>> FW code calls, and it seems to me that we're not really going to run
>> hsw_unclaimed_reg_detect very frequently anymore. I wonder if there's
>> the risk of staying a long time without running it. But maybe I'm just
>> wrong.
>
> It gets run after every set of register writes (where set is defined as
> activity on a single cpu within 1ms). It gets run before the powerwell
> is disabled. Look at the profiles, you will see that hsw detect is still
> called quite frequently. And by virtue it does not need to be run very
> often to catch issues anyway.
>
>> > +       if (HAS_FPGA_DBG_UNCLAIMED(dev) &&
>> > +           dev_priv->uncore.funcs.force_wake_put == fw_domains_put)
>>
>> My fear here is that simple changes to the FW code by a future
>> programmer could unintentionally kill the unclaimed register detection
>> feature, and we probably wouldn't notice for a looong time. Why not
>> just omit this fw_domains_put check, since it is true for all
>> platforms where HAS_FPGA_DBG_UNCLAIMED is also true? The side effect of
>> calling fw_domains_put() when we shouldn't is probably more noticeable
>> than having unclaimed register detection gone.
>
> Pardon?

My suggestion was to find a way to transform this "if" statement above
somehow into a check just for "if (has_fpga_dbg_unclaimed())", without
relying on whatever is assigned to funcs.force_wake_put, due to the
fear that a refactoring could accidentally kill the unclaimed register
detection. But it was just a suggestion.
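
For concreteness, the shape I had in mind is roughly this (the surrounding
forcewake-domain init code is assumed here, not reproduced):

/* Key the debug variant purely off the platform capability, rather than
 * off whichever put vfunc happens to be assigned at this point. */
if (HAS_FPGA_DBG_UNCLAIMED(dev))
	dev_priv->uncore.funcs.force_wake_put = fw_domains_put_debug;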

>
>> > +               dev_priv->uncore.funcs.force_wake_put = fw_domains_put_debug;
>> > +
>> >         /* All future platforms are expected to require complex power gating */
>> >         WARN_ON(dev_priv->uncore.fw_domains == 0);
>> >  }
>> > @@ -1411,11 +1420,6 @@ int intel_gpu_reset(struct drm_device *dev)
>> >
>> >  void intel_uncore_check_errors(struct drm_device *dev)
>> >  {
>> > -       struct drm_i915_private *dev_priv = dev->dev_private;
>> > -
>> > -       if (HAS_FPGA_DBG_UNCLAIMED(dev) &&
>> > -           (__raw_i915_read32(dev_priv, FPGA_DBG) & FPGA_DBG_RM_NOCLAIM)) {
>> > -               DRM_ERROR("Unclaimed register before interrupt\n");
>> > -               __raw_i915_write32(dev_priv, FPGA_DBG, FPGA_DBG_RM_NOCLAIM);
>> > -       }
>> > +       if (HAS_FPGA_DBG_UNCLAIMED(dev))
>> > +               hsw_unclaimed_reg_detect(to_i915(dev));
>>
>> This means we won't check for unclaimed registers during interrupts if
>> i915.mmio_debug is being used, which is probably not what we want.
>
> It's exactly what you want. It still does the debug check if you have
> mmio_debug unset. If you have mmio_debug set, it means you are debugging
> i915 register mmio, since that is all we can reliably debug.
>
>> One of the things that worries me a little is that now we'll be
>> running a mostly-display-related feature only when certain pieces of
>> non-display-related code run. Instead of doing the checks at the
>> forcewake puts, maybe we could tie the frequency of the checks to
>> something in the display code, or just do the check every X seconds. I
>> don't really know what would be ideal here, I'm just throwing the
>> ideas. I'm also not blocking this patch, just pointing things that
>> could maybe be improved.
>
> Sure, all you would need to do is add the check to every rpm_put() if you
> feel paranoid (it will be run before the powerwell is dropped by
> design).
>
>> Since it's impacting performance, perhaps we could even completely
>> kill unclaimed register detection from the normal use case, hiding it
>> behind i915.mmio_debug=1 (and maybe a kconfig option)? We would then
>> instruct QA and developers to always have the option enabled. Just
>> like QA needs to have lockdep enabled, we could ask it to have
>> mmio_debug enabled all the time too.
>
> Whilst I like the idea, having debug code running in the wild (at a
> frequency high enough to catch bugs, but low enough not to be noticed)
> is invaluable.

Some other ideas that could be worth discussing:

- Adding a check for the range of registers that are covered by
FPGA_DBG. I imagine most/all of your performance sensitive registers
are outside it, so maybe this change would be enough to fix the issues
you're seeing, potentially replacing this patch.

- It seems that most of the time we call I915_WRITE/READ, the
register argument is seen by the compiler as a constant (I checked
this with __builtin_constant_p()). Maybe we could try to exploit this
builtin to make a special case that allows the compiler to optimize
away all the "if" statements we have in the register-writing macros.
Of course, since not all our reg writes are constant, we'd still need
the older/slower version (a rough sketch follows below).

- Didn't we ever discuss replacing I915_WRITE with more specialized
macros that would be used just on specific register ranges? On gen8 I
can see, for example: a range requiring forcewake, a range requiring
fpga_dbg and a range requiring nothing. On gen9 I see even more. Would
the little performance gain justify the change?

It seems you're doing some optimizations, so maybe one of the ideas
could be interesting...
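
To make the second idea a bit more concrete (combined with a range cut-off
as in the first one): NEEDS_FPGA_DBG_CHECK, example_write32 and the 0x40000
boundary below are made-up names purely for illustration, not existing code.

/* For a compile-time-constant register known to sit below the (assumed)
 * FPGA_DBG-covered range, the compiler can drop the unclaimed register
 * check entirely; non-constant offsets conservatively keep the check. */
#define NEEDS_FPGA_DBG_CHECK(reg) \
	(!__builtin_constant_p(reg) || (reg) >= 0x40000)

static inline void example_write32(struct drm_i915_private *dev_priv,
				   u32 reg, u32 val)
{
	__raw_i915_write32(dev_priv, reg, val);

	if (NEEDS_FPGA_DBG_CHECK(reg))
		hsw_unclaimed_reg_detect(dev_priv);
}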

Thanks,
Paulo

> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre



-- 
Paulo Zanoni
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 08/49] drm/i915: Re-enable RPS wait-boosting for all engines
  2015-03-27 11:01 ` [PATCH 08/49] drm/i915: Re-enable RPS wait-boosting for all engines Chris Wilson
@ 2015-04-02 11:09   ` Deepak S
  2015-04-02 11:39     ` Chris Wilson
  0 siblings, 1 reply; 80+ messages in thread
From: Deepak S @ 2015-04-02 11:09 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx



On Friday 27 March 2015 04:31 PM, Chris Wilson wrote:
> This reverts commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Thu Jun 12 10:28:55 2014 +0100
>
>      drm/i915: Restrict GPU boost to the RCS engine
>
> The premise that media/blitter workloads are not affected by boosting is
> patently false with a trip through igt. The question that remains is
> what exactly is going wrong with the media workload that prompted this?
> Hopefully that would be fixed by the missing aggressive downclocking, in
> addition to the extra restrictions imposed on how frequently a process is
> allowed to boost.

We may have to look at the media workload. Last time we observed that for
a 1080p HD clip the GPU frequency was staying at RP0 most of the time.
Hopefully the aggressive downclocking should help.

Acked-by: Deepak S  <deepak.s@linux.intel.com>

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Deepak S <deepak.s@linux.intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>   drivers/gpu/drm/i915/i915_gem.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d54f6a277d82..05f94ee8ea37 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1222,7 +1222,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   	timeout_expire = timeout ?
>   		jiffies + nsecs_to_jiffies_timeout((u64)*timeout) : 0;
>   
> -	if (ring->id == RCS && INTEL_INFO(dev)->gen >= 6)
> +	if (INTEL_INFO(dev)->gen >= 6)
>   		gen6_rps_boost(dev_priv, file_priv);
>   
>   	if (!irq_test_in_progress && WARN_ON(!ring->irq_get(ring)))

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 02/49] drm/i915: Agressive downclocking on Baytrail
  2015-03-27 11:01 ` [PATCH 02/49] drm/i915: Agressive downclocking on Baytrail Chris Wilson
@ 2015-04-02 11:21   ` Deepak S
  0 siblings, 0 replies; 80+ messages in thread
From: Deepak S @ 2015-04-02 11:21 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Daniel Vetter, Rodrigo Vivi



On Friday 27 March 2015 04:31 PM, Chris Wilson wrote:
> Reuse the same reclocking strategy for Baytrail as on its bigger brethren,
> Sandybridge and Ivybridge. In particular, this makes the device quicker
> to reclock (both up and down), though the tendency now is to downclock
> more aggressively to compensate for the RPS boosts.
>
> v2: Rebase
> v3: Exclude Cherrytrail as Deepak was concerned that the increased
> number of register writes would wake the common powerwell too often.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Deepak S <deepak.s@linux.intel.com>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>   drivers/gpu/drm/i915/i915_drv.h | 3 +++
>   drivers/gpu/drm/i915/i915_irq.c | 4 ++--
>   drivers/gpu/drm/i915/i915_reg.h | 2 --
>   drivers/gpu/drm/i915/intel_pm.c | 8 +++++++-
>   4 files changed, 12 insertions(+), 5 deletions(-)

Looks fine to me.
Reviewed-by: Deepak S <deepak.s@linux.intel.com>

> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 701079429832..c80e2e5e591a 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1033,6 +1033,9 @@ struct intel_gen6_power_mgmt {
>   	u8 rp0_freq;		/* Non-overclocked max frequency. */
>   	u32 cz_freq;
>   
> +	u8 up_threshold; /* Current %busy required to uplock */
> +	u8 down_threshold; /* Current %busy required to downclock */
> +
>   	int last_adj;
>   	enum { LOW_POWER, BETWEEN, HIGH_POWER } power;
>   
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 14ecb4d13a1a..128a6f40b450 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1049,7 +1049,7 @@ static u32 vlv_wa_c0_ei(struct drm_i915_private *dev_priv, u32 pm_iir)
>   	if (pm_iir & GEN6_PM_RP_DOWN_EI_EXPIRED) {
>   		if (!vlv_c0_above(dev_priv,
>   				  &dev_priv->rps.down_ei, &now,
> -				  VLV_RP_DOWN_EI_THRESHOLD))
> +				  dev_priv->rps.down_threshold))

>   			events |= GEN6_PM_RP_DOWN_THRESHOLD;
>   		dev_priv->rps.down_ei = now;
>   	}
> @@ -1057,7 +1057,7 @@ static u32 vlv_wa_c0_ei(struct drm_i915_private *dev_priv, u32 pm_iir)
>   	if (pm_iir & GEN6_PM_RP_UP_EI_EXPIRED) {
>   		if (vlv_c0_above(dev_priv,
>   				 &dev_priv->rps.up_ei, &now,
> -				 VLV_RP_UP_EI_THRESHOLD))
> +				 dev_priv->rps.up_threshold))
>   			events |= GEN6_PM_RP_UP_THRESHOLD;
>   		dev_priv->rps.up_ei = now;
>   	}
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index b522eb6e59a4..faf8f829e61f 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -671,8 +671,6 @@ enum skl_disp_power_wells {
>   #define   FB_FMAX_VMIN_FREQ_LO_MASK		0xf8000000
>   
>   #define VLV_CZ_CLOCK_TO_MILLI_SEC		100000
> -#define VLV_RP_UP_EI_THRESHOLD			90
> -#define VLV_RP_DOWN_EI_THRESHOLD		70
>   
>   /* vlv2 north clock has */
>   #define CCK_FUSE_REG				0x8
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index fa4ccb346389..65b33a4f82fc 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -3930,6 +3930,8 @@ static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
>   		    GEN6_RP_DOWN_IDLE_AVG);
>   
>   	dev_priv->rps.power = new_power;
> +	dev_priv->rps.up_threshold = threshold_up;
> +	dev_priv->rps.down_threshold = threshold_down;
>   	dev_priv->rps.last_adj = 0;
>   }
>   
> @@ -4001,8 +4003,11 @@ static void valleyview_set_rps(struct drm_device *dev, u8 val)
>   		      "Odd GPU freq value\n"))
>   		val &= ~1;
>   
> -	if (val != dev_priv->rps.cur_freq)
> +	if (val != dev_priv->rps.cur_freq) {
>   		vlv_punit_write(dev_priv, PUNIT_REG_GPU_FREQ_REQ, val);
> +		if (!IS_CHERRYVIEW(dev_priv))
> +			gen6_set_rps_thresholds(dev_priv, val);
> +	}
>   
>   	I915_WRITE(GEN6_PMINTRMSK, gen6_rps_pm_mask(dev_priv, val));
>   
> @@ -4051,6 +4056,7 @@ static void vlv_set_rps_idle(struct drm_i915_private *dev_priv)
>   				& GENFREQSTATUS) == 0, 100))
>   		DRM_ERROR("timed out waiting for Punit\n");
>   
> +	gen6_set_rps_thresholds(dev_priv, val);
>   	vlv_force_gfx_clock(dev_priv, false);
>   
>   	I915_WRITE(GEN6_PMINTRMSK, gen6_rps_pm_mask(dev_priv, val));

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 08/49] drm/i915: Re-enable RPS wait-boosting for all engines
  2015-04-02 11:09   ` Deepak S
@ 2015-04-02 11:39     ` Chris Wilson
  0 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-04-02 11:39 UTC (permalink / raw)
  To: Deepak S; +Cc: intel-gfx

On Thu, Apr 02, 2015 at 04:39:56PM +0530, Deepak S wrote:
> 
> 
> On Friday 27 March 2015 04:31 PM, Chris Wilson wrote:
> >This reverts commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b
> >Author: Chris Wilson <chris@chris-wilson.co.uk>
> >Date:   Thu Jun 12 10:28:55 2014 +0100
> >
> >     drm/i915: Restrict GPU boost to the RCS engine
> >
> >The premise that media/blitter workloads are not affected by boosting is
> >patently false with a trip through igt. The question that remains is
> >what exactly is going wrong with the media workload that prompted this?
> >Hopefully that would be fixed by the missing aggressive downclocking, in
> >addition to the extra restrictions imposed on how frequently a process is
> >allowed to boost.
> 
> We may have to look at the media workload. Last time we observed that for
> a 1080p HD clip the GPU frequency was staying at RP0 most of the time.
> Hopefully the aggressive downclocking should help.
> 
> Acked-by: Deepak S  <deepak.s@linux.intel.com>

I think what will help most here is limiting the RPS boost to once per
client (per busy period). I've actually found a couple of other places
where we will artificially boost clocks: mmioflips and sw-semaphores.
I've patches to also restrict those to once per busy period. The plan is
that we only give RPS boosts to missed pageflips (via the vblank
tracker) and only the first time a client stalls on a bo.

I think with those in place, we can have the best of both worlds -
instant boost for compute/gpu bound applications, and low render
frequencies for sustained throughput.
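
As a hand-wavy sketch of what "once per busy period" means (the rps_boosted
flag, its home in drm_i915_file_private and the point at which it is cleared
on idle are all assumptions for illustration, not the actual patches):

static void intel_rps_client_boost(struct drm_i915_private *dev_priv,
				   struct drm_i915_file_private *file_priv)
{
	/* Each client gets at most one boost per busy period. */
	if (file_priv->rps_boosted)
		return;

	file_priv->rps_boosted = true; /* cleared again when the GPU idles */
	gen6_rps_boost(dev_priv, file_priv);
}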
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 43/49] drm/i915: Do not zero initialise page tables
  2015-03-27 11:02 ` [PATCH 43/49] drm/i915: Do not zero initialise page tables Chris Wilson
@ 2015-04-07 14:46   ` Mika Kuoppala
  2015-04-07 15:00     ` Chris Wilson
  0 siblings, 1 reply; 80+ messages in thread
From: Mika Kuoppala @ 2015-04-07 14:46 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Daniel Vetter

Chris Wilson <chris@chris-wilson.co.uk> writes:

> After we successfully allocate them, we will fill them with their
> initial contents (either the chain of page tables, or a pointer to the
> scratch page).
>
> Regression from
> commit 06fda602dbca9c59d87db7da71192e4b54c9f5ff
> Author: Ben Widawsky <benjamin.widawsky@intel.com>
> Date:   Tue Feb 24 16:22:36 2015 +0000
>
>     drm/i915: Create page table allocators
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michel Thierry <michel.thierry@intel.com> (v3+)
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---

The gen8 parts of the dynamic page table series, which Michel will resend
in the near future, address this by not zero-filling but instead pointing
unused page directory entries at a scratch page table.
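
Very roughly, dropping __GFP_ZERO then means the allocation path writes every
entry itself, along these lines (the helper name, the PDE bit layout and the
512-entry count are assumptions for illustration, not the actual series):

static void sketch_init_pd_to_scratch(struct i915_page_directory_entry *pd,
				      dma_addr_t scratch_pt_addr)
{
	u64 *vaddr = kmap_atomic(pd->page);
	u64 scratch_pde = scratch_pt_addr | _PAGE_PRESENT | _PAGE_RW;
	int i;

	/* Point every PDE of the freshly allocated (and deliberately not
	 * zeroed) page directory at the scratch page table. */
	for (i = 0; i < 512; i++)
		vaddr[i] = scratch_pde;

	kunmap_atomic(vaddr);
}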

-Mika


>  drivers/gpu/drm/i915/i915_gem_gtt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 543fff104401..4a50e1db63dc 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -426,7 +426,7 @@ static struct i915_page_directory_entry *alloc_pd_single(void)
>  	if (!pd)
>  		return ERR_PTR(-ENOMEM);
>  
> -	pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> +	pd->page = alloc_page(GFP_KERNEL);
>  	if (!pd->page) {
>  		kfree(pd);
>  		return ERR_PTR(-ENOMEM);
> -- 
> 2.1.4
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 43/49] drm/i915: Do not zero initialise page tables
  2015-04-07 14:46   ` Mika Kuoppala
@ 2015-04-07 15:00     ` Chris Wilson
  0 siblings, 0 replies; 80+ messages in thread
From: Chris Wilson @ 2015-04-07 15:00 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: Daniel Vetter, intel-gfx

On Tue, Apr 07, 2015 at 05:46:19PM +0300, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > After we successfully allocate them, we will fill them with their
> > initial contents (either the chain of page tables, or a pointer to the
> > scratch page).
> >
> > Regression from
> > commit 06fda602dbca9c59d87db7da71192e4b54c9f5ff
> > Author: Ben Widawsky <benjamin.widawsky@intel.com>
> > Date:   Tue Feb 24 16:22:36 2015 +0000
> >
> >     drm/i915: Create page table allocators
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Michel Thierry <michel.thierry@intel.com> (v3+)
> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > ---
> 
> The gen8 parts of the dynamic page table series, which Michel will resend
> in the near future, address this by not zero-filling but instead pointing
> unused page directory entries at a scratch page table.

However, it is currently a regression.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 80+ messages in thread

end of thread, other threads:[~2015-04-07 15:01 UTC | newest]

Thread overview: 80+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-27 11:01 A picking of low hanging fruit Chris Wilson
2015-03-27 11:01 ` [PATCH 01/49] drm/i915: Cache last obj->pages location for i915_gem_object_get_page() Chris Wilson
2015-03-27 11:01 ` [PATCH 02/49] drm/i915: Agressive downclocking on Baytrail Chris Wilson
2015-04-02 11:21   ` Deepak S
2015-03-27 11:01 ` [PATCH 03/49] drm/i915: Fix computation of last_adjustment for RPS autotuning Chris Wilson
2015-03-27 11:01 ` [PATCH 04/49] drm/i915: Add i915_gem_request_unreference__unlocked Chris Wilson
2015-03-27 16:42   ` Tvrtko Ursulin
2015-03-27 11:01 ` [PATCH 05/49] drm/i915: Fix race on unreferencing the wrong mmio-flip-request Chris Wilson
2015-03-27 11:01 ` [PATCH 06/49] drm/i915: Boost GPU frequency if we detect outstanding pageflips Chris Wilson
2015-03-27 11:01 ` [PATCH 07/49] drm/i915: Deminish contribution of wait-boosting from clients Chris Wilson
2015-03-27 11:01 ` [PATCH 08/49] drm/i915: Re-enable RPS wait-boosting for all engines Chris Wilson
2015-04-02 11:09   ` Deepak S
2015-04-02 11:39     ` Chris Wilson
2015-03-27 11:01 ` [PATCH 09/49] drm/i915: Split i915_gem_batch_pool into its own header Chris Wilson
2015-03-27 11:01 ` [PATCH 10/49] drm/i915: Tidy batch pool logic Chris Wilson
2015-03-27 11:59   ` Tvrtko Ursulin
2015-03-27 11:01 ` [PATCH 11/49] drm/i915: Split the batch pool by engine Chris Wilson
2015-03-27 11:01 ` [PATCH 12/49] drm/i915: Free batch pool when idle Chris Wilson
2015-03-27 11:01 ` [PATCH 13/49] drm/i915: Split batch pool into size buckets Chris Wilson
2015-03-27 11:01 ` [PATCH 14/49] drm/i915: Include active flag when describing objects in debugfs Chris Wilson
2015-03-27 11:01 ` [PATCH 15/49] drm/i915: Suppress empty lines from debugfs/i915_gem_objects Chris Wilson
2015-03-27 11:01 ` [PATCH 16/49] drm/i915: Optimistically spin for the request completion Chris Wilson
2015-03-27 11:42   ` Tvrtko Ursulin
2015-03-27 11:01 ` [PATCH 17/49] drm/i915: Implement inter-engine read-read optimisations Chris Wilson
2015-03-30 13:52   ` Tvrtko Ursulin
2015-03-30 14:09     ` Chris Wilson
2015-03-30 14:45       ` Tvrtko Ursulin
2015-03-30 15:07         ` Chris Wilson
2015-03-27 11:01 ` [PATCH 18/49] drm/i915: Reduce frequency of unspecific HSW reg debugging Chris Wilson
2015-03-27 15:34   ` Paulo Zanoni
2015-03-27 16:12     ` Chris Wilson
2015-03-30 19:15       ` Paulo Zanoni
2015-03-27 11:01 ` [PATCH 19/49] drm/i915: Record ring->start address in error state Chris Wilson
2015-03-27 11:01 ` [PATCH 20/49] drm/i915: Use simpler form of spin_lock_irq(execlist_lock) Chris Wilson
2015-03-27 11:01 ` [PATCH 21/49] drm/i915: Use the global runtime-pm wakelock for a busy GPU for execlists Chris Wilson
2015-03-27 14:19   ` Daniel Vetter
2015-03-27 14:25     ` Chris Wilson
2015-03-27 11:01 ` [PATCH 22/49] drm/i915: Map the execlists context regs once during pinning Chris Wilson
2015-03-27 11:01 ` [PATCH 23/49] drm/i915: Remove vestigal DRI1 ring quiescing code Chris Wilson
2015-03-27 11:01 ` [PATCH 24/49] drm/i915: Tidy execlist submission Chris Wilson
2015-03-27 11:01 ` [PATCH 25/49] drm/i915: Move the execlists retirement to the right spot Chris Wilson
2015-03-27 11:01 ` [PATCH 26/49] drm/i915: Map the ringbuffer using WB on LLC machines Chris Wilson
2015-03-27 11:01 ` [PATCH 27/49] drm/i915: Use a separate slab for requests Chris Wilson
2015-03-27 14:20   ` Daniel Vetter
2015-03-27 14:27     ` Chris Wilson
2015-03-27 11:02 ` [PATCH 28/49] drm/i915: Use the new rq->i915 field where appropriate Chris Wilson
2015-03-27 11:02 ` [PATCH 29/49] drm/i915: Reduce the pointer dance of i915_is_ggtt() Chris Wilson
2015-03-27 14:26   ` Daniel Vetter
2015-03-27 11:02 ` [PATCH 30/49] drm/i915: Squash more pointer indirection for i915_is_gtt Chris Wilson
2015-03-27 11:02 ` [PATCH 31/49] drm/i915: Reduce locking in execlist command submission Chris Wilson
2015-03-27 11:40   ` Tvrtko Ursulin
2015-03-27 11:47     ` Chris Wilson
2015-03-27 11:54       ` Tvrtko Ursulin
2015-03-27 14:15       ` Daniel Vetter
2015-03-27 11:02 ` [PATCH 32/49] drm/i915: Reduce more " Chris Wilson
2015-03-27 11:02 ` [PATCH 33/49] drm/i915: Reduce locking in gen8 IRQ handler Chris Wilson
2015-03-27 14:13   ` Daniel Vetter
2015-03-27 14:14     ` Chris Wilson
2015-03-27 11:02 ` [PATCH 34/49] drm/i915: Tidy " Chris Wilson
2015-03-27 11:02 ` [PATCH 35/49] drm/i915: Remove request retirement before each batch Chris Wilson
2015-03-27 11:02 ` [PATCH 36/49] drm/i915: Cache the GGTT offset for the execlists context Chris Wilson
2015-03-27 11:02 ` [PATCH 37/49] drm/i915: Prefer to check for idleness in worker rather than sync-flush Chris Wilson
2015-03-27 11:02 ` [PATCH 38/49] drm/i915: Skip allocating shadow batch for 0-length batches Chris Wilson
2015-03-27 14:28   ` Daniel Vetter
2015-03-30 12:02   ` Chris Wilson
2015-03-30 14:59     ` Daniel Vetter
2015-03-27 11:02 ` [PATCH 39/49] drm/i915: Remove request->uniq Chris Wilson
2015-03-27 11:02 ` [PATCH 40/49] drm/i915: Cache the reset_counter for the request Chris Wilson
2015-03-27 11:02 ` [PATCH 41/49] drm/i915: Allocate context objects from stolen Chris Wilson
2015-03-27 11:02 ` [PATCH 42/49] drm/i915: Introduce an internal allocator for disposable private objects Chris Wilson
2015-03-27 11:02 ` [PATCH 43/49] drm/i915: Do not zero initialise page tables Chris Wilson
2015-04-07 14:46   ` Mika Kuoppala
2015-04-07 15:00     ` Chris Wilson
2015-03-27 11:02 ` [PATCH 44/49] drm/i915: The argument for postfix is redundant Chris Wilson
2015-03-27 11:02 ` [PATCH 45/49] drm/i915: Record the position of the start of the request Chris Wilson
2015-03-27 11:02 ` [PATCH 46/49] drm/i915: Cache the execlist ctx descriptor Chris Wilson
2015-03-27 11:02 ` [PATCH 47/49] drm/i915: Treat ringbuffer writes as write to normal memory Chris Wilson
2015-03-27 11:02 ` [PATCH 48/49] drm/i915: Eliminate vmap overhead for cmd parser Chris Wilson
2015-03-27 11:02 ` [PATCH 49/49] drm/i915: Cache last cmd descriptor when parsing Chris Wilson
2015-03-28  6:21   ` shuang.he
