* Low hanging fruit take 2
@ 2015-04-07 15:20 Chris Wilson
  2015-04-07 15:20 ` [PATCH 01/70] drm/i915: Cache last obj->pages location for i915_gem_object_get_page() Chris Wilson
                   ` (59 more replies)
  0 siblings, 60 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC
  To: intel-gfx

Lots of pickings here that improve microbenchmarks and beyond, across
several generations. Have fun!
-Chris

* [PATCH 01/70] drm/i915: Cache last obj->pages location for i915_gem_object_get_page()
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-08 11:16   ` Daniel Vetter
  2015-04-07 15:20 ` [PATCH 02/70] drm/i915: Fix the flip synchronisation to consider mmioflips Chris Wilson
                   ` (58 subsequent siblings)
  59 siblings, 1 reply; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC
  To: intel-gfx

The biggest user of i915_gem_object_get_page() is the relocation
processing during execbuffer. Typically userspace passes in a set of
relocations in sorted order. Sadly, we alternate between relocations
increasing from the start of the buffers, and relocations decreasing
from the end. However, the majority of consecutive lookups will still be
in the same page. We could cache the start of the last sg chain, but for
most callers the entire sgl is inside a single chain and so we would see
no improvement from the extra layer of caching.
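
As a standalone illustration of the idea (made-up segment sizes and a
plain array instead of sg chains - not the kernel code), the scheme
caches a cursor into the segmented list and only restarts the walk on a
backwards jump, so mostly-ascending lookups are O(1) amortised:

#include <stdio.h>

struct seg { int npages; };		/* stands in for one sg entry */

static struct seg segs[] = { {4}, {2}, {8}, {1} };
static struct { int i, base; } cache;	/* last segment + its first page */

static int lookup_segment(int n)
{
	if (n < cache.base) {		/* backwards jump: restart the walk */
		cache.i = 0;
		cache.base = 0;
	}
	while (cache.base + segs[cache.i].npages <= n)
		cache.base += segs[cache.i++].npages;
	return cache.i;			/* page n is page n - cache.base here */
}

int main(void)
{
	int n;

	for (n = 0; n < 15; n++)	/* ascending lookups never rescan */
		printf("page %d -> segment %d\n", n, lookup_segment(n));
	return 0;
}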

v2: Avoid the double increment inside unlikely()

References: https://bugs.freedesktop.org/show_bug.cgi?id=88308
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h | 31 ++++++++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_gem.c |  4 ++++
 2 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4f5dae9a23f9..51b21483b95f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1987,6 +1987,10 @@ struct drm_i915_gem_object {
 
 	struct sg_table *pages;
 	int pages_pin_count;
+	struct get_page {
+		struct scatterlist *sg;
+		int last;
+	} get_page;
 
 	/* prime dma-buf support */
 	void *dma_buf_vmapping;
@@ -2665,15 +2669,32 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
 				    int *needs_clflush);
 
 int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
-static inline struct page *i915_gem_object_get_page(struct drm_i915_gem_object *obj, int n)
+
+static inline int __sg_page_count(struct scatterlist *sg)
+{
+	return sg->length >> PAGE_SHIFT;
+}
+
+static inline struct page *
+i915_gem_object_get_page(struct drm_i915_gem_object *obj, int n)
 {
-	struct sg_page_iter sg_iter;
+	if (WARN_ON(n >= obj->base.size >> PAGE_SHIFT))
+		return NULL;
 
-	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, n)
-		return sg_page_iter_page(&sg_iter);
+	if (n < obj->get_page.last) {
+		obj->get_page.sg = obj->pages->sgl;
+		obj->get_page.last = 0;
+	}
+
+	while (obj->get_page.last + __sg_page_count(obj->get_page.sg) <= n) {
+		obj->get_page.last += __sg_page_count(obj->get_page.sg++);
+		if (unlikely(sg_is_chain(obj->get_page.sg)))
+			obj->get_page.sg = sg_chain_ptr(obj->get_page.sg);
+	}
 
-	return NULL;
+	return nth_page(sg_page(obj->get_page.sg), n - obj->get_page.last);
 }
+
 static inline void i915_gem_object_pin_pages(struct drm_i915_gem_object *obj)
 {
 	BUG_ON(obj->pages == NULL);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index be4f2645b637..567affeafec4 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2178,6 +2178,10 @@ i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
 		return ret;
 
 	list_add_tail(&obj->global_list, &dev_priv->mm.unbound_list);
+
+	obj->get_page.sg = obj->pages->sgl;
+	obj->get_page.last = 0;
+
 	return 0;
 }
 
-- 
2.1.4

* [PATCH 02/70] drm/i915: Fix the flip synchronisation to consider mmioflips
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
  2015-04-07 15:20 ` [PATCH 01/70] drm/i915: Cache last obj->pages location for i915_gem_object_get_page() Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-07 15:20 ` [PATCH 03/70] drm/i915: Ensure cache flushes prior to doing CS flips Chris Wilson
                   ` (57 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC
  To: intel-gfx

Currently we emit semaphore synchronisation as if we were going to flip
using the target CS engine, but we then change our minds and do the flip
using the CPU. Consequently we write instructions to the ring but never
use them - even to the point of filling that ring up entirely and never
submitting a request.

The wrinkle in the ointment is that we have to tell a white lie to
pin-to-display for it to skip the synchronisation for mmio flips, as we
will create a task specifically for that slow synchronisation. One
oddity of note is the discrepancy between the request we tell
pin-to-display to serialise to and the one we eventually wait upon;
this is due to a limitation in the i915_gem_object_sync() routine that
will be lifted later.
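
The white lie in sketch form ("sync_to" is an illustrative name; the
real code passes this choice straight into intel_pin_and_fence_fb_obj()):

	/* CS flip: emit semaphores to serialise with the flipping ring.
	 * mmio flip: "sync" to the ring the object last used - a no-op -
	 * and let the mmio-flip task perform the real wait later on.
	 */
	sync_to = mmio_flip ? i915_gem_request_get_ring(obj->last_read_req)
			    : ring;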

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_display.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 3af87e4b0f66..4af89c27504e 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -10224,6 +10224,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 	enum pipe pipe = intel_crtc->pipe;
 	struct intel_unpin_work *work;
 	struct intel_engine_cs *ring;
+	bool mmio_flip;
 	int ret;
 
 	/*
@@ -10321,15 +10322,23 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 		ring = &dev_priv->ring[RCS];
 	}
 
+	mmio_flip = use_mmio_flip(ring, obj);
+
+	/* When using CS flips, we want to emit semaphores between rings.
+	 * However, when using mmio flips we will create a task to do the
+	 * synchronisation, so all we want here is to pin the framebuffer
+	 * into the display plane and skip any waits.
+	 */
 	ret = intel_pin_and_fence_fb_obj(crtc->primary, fb,
-					 crtc->primary->state, ring);
+					 crtc->primary->state,
+					 mmio_flip ? i915_gem_request_get_ring(obj->last_read_req) : ring);
 	if (ret)
 		goto cleanup_pending;
 
 	work->gtt_offset = intel_plane_obj_offset(to_intel_plane(primary), obj)
 						  + intel_crtc->dspaddr_offset;
 
-	if (use_mmio_flip(ring, obj)) {
+	if (mmio_flip) {
 		ret = intel_queue_mmio_flip(dev, crtc, fb, obj, ring,
 					    page_flip_flags);
 		if (ret)
-- 
2.1.4

* [PATCH 03/70] drm/i915: Ensure cache flushes prior to doing CS flips
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
  2015-04-07 15:20 ` [PATCH 01/70] drm/i915: Cache last obj->pages location for i915_gem_object_get_page() Chris Wilson
  2015-04-07 15:20 ` [PATCH 02/70] drm/i915: Fix the flip synchronisation to consider mmioflips Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-08 11:23   ` Daniel Vetter
  2015-04-07 15:20 ` [PATCH 04/70] drm/i915: Aggressive downclocking on Baytrail Chris Wilson
                   ` (56 subsequent siblings)
  59 siblings, 1 reply; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC
  To: intel-gfx

Synchronising to an object active on the same ring is a no-op, for the
benefit of the execbuffer scheduler. However, for CS flips this means
that we forgo checking whether the last write request of the object has
actually been queued and, more importantly, whether the cache flush for
the write was emitted.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_display.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 4af89c27504e..0415e40cef6e 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -10347,6 +10347,12 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 		i915_gem_request_assign(&work->flip_queued_req,
 					obj->last_write_req);
 	} else {
+		if (obj->last_write_req) {
+			ret = i915_gem_check_olr(obj->last_write_req);
+			if (ret)
+				goto cleanup_unpin;
+		}
+
 		ret = dev_priv->display.queue_flip(dev, crtc, fb, obj, ring,
 						   page_flip_flags);
 		if (ret)
-- 
2.1.4

* [PATCH 04/70] drm/i915: Aggressive downclocking on Baytrail
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (2 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 03/70] drm/i915: Ensure cache flushes prior to doing CS flips Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-07 15:20 ` [PATCH 05/70] drm/i915: Fix computation of last_adjustment for RPS autotuning Chris Wilson
                   ` (55 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC
  To: intel-gfx; +Cc: Daniel Vetter, Rodrigo Vivi

Reuse the same reclocking strategy for Baytrail as on its bigger
brethren, Sandybridge and Ivybridge. In particular, this makes the
device quicker to reclock (both up and down), though the tendency now
is to downclock more aggressively to compensate for the RPS boosts.
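
As a toy model of the decision those thresholds drive (illustrative
names only; the real comparison is made against C0 residency counters
in vlv_wa_c0_ei(), not a precomputed percentage):

static int next_freq(int cur, int busy_pct, int up, int down)
{
	if (busy_pct > up)
		return cur + 1;		/* above up_threshold: reclock up */
	if (busy_pct < down)
		return cur - 1;		/* below down_threshold: reclock down */
	return cur;
}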

v2: Rebase
v3: Exclude Cherrytrail as Deepak was concerned that the increased
number of register writes would wake the common powerwell too often.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h | 3 +++
 drivers/gpu/drm/i915/i915_irq.c | 4 ++--
 drivers/gpu/drm/i915/i915_reg.h | 2 --
 drivers/gpu/drm/i915/intel_pm.c | 8 +++++++-
 4 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 51b21483b95f..31011988d153 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1036,6 +1036,9 @@ struct intel_gen6_power_mgmt {
 	u8 rp0_freq;		/* Non-overclocked max frequency. */
 	u32 cz_freq;
 
+	u8 up_threshold; /* Current %busy required to upclock */
+	u8 down_threshold; /* Current %busy required to downclock */
+
 	int last_adj;
 	enum { LOW_POWER, BETWEEN, HIGH_POWER } power;
 
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 14ecb4d13a1a..128a6f40b450 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1049,7 +1049,7 @@ static u32 vlv_wa_c0_ei(struct drm_i915_private *dev_priv, u32 pm_iir)
 	if (pm_iir & GEN6_PM_RP_DOWN_EI_EXPIRED) {
 		if (!vlv_c0_above(dev_priv,
 				  &dev_priv->rps.down_ei, &now,
-				  VLV_RP_DOWN_EI_THRESHOLD))
+				  dev_priv->rps.down_threshold))
 			events |= GEN6_PM_RP_DOWN_THRESHOLD;
 		dev_priv->rps.down_ei = now;
 	}
@@ -1057,7 +1057,7 @@ static u32 vlv_wa_c0_ei(struct drm_i915_private *dev_priv, u32 pm_iir)
 	if (pm_iir & GEN6_PM_RP_UP_EI_EXPIRED) {
 		if (vlv_c0_above(dev_priv,
 				 &dev_priv->rps.up_ei, &now,
-				 VLV_RP_UP_EI_THRESHOLD))
+				 dev_priv->rps.up_threshold))
 			events |= GEN6_PM_RP_UP_THRESHOLD;
 		dev_priv->rps.up_ei = now;
 	}
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 12e6fd1bc1f0..626db2b134c7 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -671,8 +671,6 @@ enum skl_disp_power_wells {
 #define   FB_FMAX_VMIN_FREQ_LO_MASK		0xf8000000
 
 #define VLV_CZ_CLOCK_TO_MILLI_SEC		100000
-#define VLV_RP_UP_EI_THRESHOLD			90
-#define VLV_RP_DOWN_EI_THRESHOLD		70
 
 /* vlv2 north clock has */
 #define CCK_FUSE_REG				0x8
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 63e642bc9713..50c03472ea41 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3934,6 +3934,8 @@ static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
 		    GEN6_RP_DOWN_IDLE_AVG);
 
 	dev_priv->rps.power = new_power;
+	dev_priv->rps.up_threshold = threshold_up;
+	dev_priv->rps.down_threshold = threshold_down;
 	dev_priv->rps.last_adj = 0;
 }
 
@@ -4005,8 +4007,11 @@ static void valleyview_set_rps(struct drm_device *dev, u8 val)
 		      "Odd GPU freq value\n"))
 		val &= ~1;
 
-	if (val != dev_priv->rps.cur_freq)
+	if (val != dev_priv->rps.cur_freq) {
 		vlv_punit_write(dev_priv, PUNIT_REG_GPU_FREQ_REQ, val);
+		if (!IS_CHERRYVIEW(dev_priv))
+			gen6_set_rps_thresholds(dev_priv, val);
+	}
 
 	I915_WRITE(GEN6_PMINTRMSK, gen6_rps_pm_mask(dev_priv, val));
 
@@ -4055,6 +4060,7 @@ static void vlv_set_rps_idle(struct drm_i915_private *dev_priv)
 				& GENFREQSTATUS) == 0, 100))
 		DRM_ERROR("timed out waiting for Punit\n");
 
+	gen6_set_rps_thresholds(dev_priv, val);
 	vlv_force_gfx_clock(dev_priv, false);
 
 	I915_WRITE(GEN6_PMINTRMSK, gen6_rps_pm_mask(dev_priv, val));
-- 
2.1.4

* [PATCH 05/70] drm/i915: Fix computation of last_adjustment for RPS autotuning
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (3 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 04/70] drm/i915: Aggressive downclocking on Baytrail Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-07 15:20 ` [PATCH 06/70] drm/i915: Fix race on unreferencing the wrong mmio-flip-request Chris Wilson
                   ` (54 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC
  To: intel-gfx; +Cc: Daniel Vetter

The issue is that, by computing the last_adj value after applying the
clamping, we could end up with a bogus value being fed into the next
RPS autotuning step.
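
A worked example with made-up numbers: say cur_freq = 4 and
efficient_freq = 10. On an up-threshold event the old code jumped
directly to RPe and then computed

	last_adj = new_delay - cur_freq = 10 - 4 = 6

so the next up-threshold event doubled that to 12 and requested
frequency 22. The new code records adj before any clamping (and resets
it to 0 on the jump to RPe), so the autotuner only ever sees the step
it actually asked for.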

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Deepak S <deepak.s@linux.intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 128a6f40b450..8b5e0358c592 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1095,21 +1095,20 @@ static void gen6_pm_rps_work(struct work_struct *work)
 	pm_iir |= vlv_wa_c0_ei(dev_priv, pm_iir);
 
 	adj = dev_priv->rps.last_adj;
+	new_delay = dev_priv->rps.cur_freq;
 	if (pm_iir & GEN6_PM_RP_UP_THRESHOLD) {
 		if (adj > 0)
 			adj *= 2;
-		else {
-			/* CHV needs even encode values */
-			adj = IS_CHERRYVIEW(dev_priv->dev) ? 2 : 1;
-		}
-		new_delay = dev_priv->rps.cur_freq + adj;
-
+		else /* CHV needs even encode values */
+			adj = IS_CHERRYVIEW(dev_priv) ? 2 : 1;
 		/*
 		 * For better performance, jump directly
 		 * to RPe if we're below it.
 		 */
-		if (new_delay < dev_priv->rps.efficient_freq)
+		if (new_delay < dev_priv->rps.efficient_freq - adj) {
 			new_delay = dev_priv->rps.efficient_freq;
+			adj = 0;
+		}
 	} else if (pm_iir & GEN6_PM_RP_DOWN_TIMEOUT) {
 		if (dev_priv->rps.cur_freq > dev_priv->rps.efficient_freq)
 			new_delay = dev_priv->rps.efficient_freq;
@@ -1119,24 +1118,22 @@ static void gen6_pm_rps_work(struct work_struct *work)
 	} else if (pm_iir & GEN6_PM_RP_DOWN_THRESHOLD) {
 		if (adj < 0)
 			adj *= 2;
-		else {
-			/* CHV needs even encode values */
-			adj = IS_CHERRYVIEW(dev_priv->dev) ? -2 : -1;
-		}
-		new_delay = dev_priv->rps.cur_freq + adj;
+		else /* CHV needs even encode values */
+			adj = IS_CHERRYVIEW(dev_priv) ? -2 : -1;
 	} else { /* unknown event */
-		new_delay = dev_priv->rps.cur_freq;
+		adj = 0;
 	}
 
+	dev_priv->rps.last_adj = adj;
+
 	/* sysfs frequency interfaces may have snuck in while servicing the
 	 * interrupt
 	 */
+	new_delay += adj;
 	new_delay = clamp_t(int, new_delay,
 			    dev_priv->rps.min_freq_softlimit,
 			    dev_priv->rps.max_freq_softlimit);
 
-	dev_priv->rps.last_adj = new_delay - dev_priv->rps.cur_freq;
-
 	intel_set_rps(dev_priv->dev, new_delay);
 
 	mutex_unlock(&dev_priv->rps.hw_lock);
-- 
2.1.4

* [PATCH 06/70] drm/i915: Fix race on unreferencing the wrong mmio-flip-request
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (4 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 05/70] drm/i915: Fix computation of last_adjustment for RPS autotuning Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-08 11:30   ` Daniel Vetter
  2015-04-07 15:20 ` [PATCH 07/70] drm/i915: Boost GPU frequency if we detect outstanding pageflips Chris Wilson
                   ` (53 subsequent siblings)
  59 siblings, 1 reply; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC
  To: intel-gfx; +Cc: Ander Conselvan de Oliveira

As we perform the mmio-flip without any locking and then try to acquire
the struct_mutex prior to dereferencing the request, it is possible for
userspace to queue a new pageflip before the worker can finish clearing
the old state - and then it will clear the new flip request. The result
is that the new flip could be completed before the GPU has finished
rendering.

The bug stems from removing the seqno checking in
commit 536f5b5e86b225dab94c7ff8061ae482b6077387
Author: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Date:   Thu Nov 6 11:03:40 2014 +0200

    drm/i915: Make mmio flip wait for seqno in the work function
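
The shape of the fix, in sketch form: the per-flip state moves out of
the long-lived crtc into an allocation that lives exactly as long as
the worker, holding its own reference on the request:

	struct intel_mmio_flip {
		struct work_struct work;
		struct drm_i915_gem_request *rq; /* ref held until the worker is done */
		struct intel_crtc *crtc;
	};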

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h      |  6 ++++--
 drivers/gpu/drm/i915/intel_display.c | 39 ++++++++++++++++++------------------
 drivers/gpu/drm/i915/intel_drv.h     |  4 ++--
 3 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 31011988d153..0bc913934d3f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2140,10 +2140,12 @@ i915_gem_request_get_ring(struct drm_i915_gem_request *req)
 	return req ? req->ring : NULL;
 }
 
-static inline void
+static inline struct drm_i915_gem_request *
 i915_gem_request_reference(struct drm_i915_gem_request *req)
 {
-	kref_get(&req->ref);
+	if (req)
+		kref_get(&req->ref);
+	return req;
 }
 
 static inline void
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 0415e40cef6e..94c09bf0047d 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -10105,22 +10105,18 @@ static void intel_do_mmio_flip(struct intel_crtc *intel_crtc)
 
 static void intel_mmio_flip_work_func(struct work_struct *work)
 {
-	struct intel_crtc *crtc =
-		container_of(work, struct intel_crtc, mmio_flip.work);
-	struct intel_mmio_flip *mmio_flip;
+	struct intel_mmio_flip *mmio_flip =
+		container_of(work, struct intel_mmio_flip, work);
 
-	mmio_flip = &crtc->mmio_flip;
-	if (mmio_flip->req)
-		WARN_ON(__i915_wait_request(mmio_flip->req,
-					    crtc->reset_counter,
-					    false, NULL, NULL) != 0);
+	if (mmio_flip->rq)
+		WARN_ON(__i915_wait_request(mmio_flip->rq,
+					    mmio_flip->crtc->reset_counter,
+					    false, NULL, NULL));
 
-	intel_do_mmio_flip(crtc);
-	if (mmio_flip->req) {
-		mutex_lock(&crtc->base.dev->struct_mutex);
-		i915_gem_request_assign(&mmio_flip->req, NULL);
-		mutex_unlock(&crtc->base.dev->struct_mutex);
-	}
+	intel_do_mmio_flip(mmio_flip->crtc);
+
+	i915_gem_request_unreference__unlocked(mmio_flip->rq);
+	kfree(mmio_flip);
 }
 
 static int intel_queue_mmio_flip(struct drm_device *dev,
@@ -10130,12 +10126,17 @@ static int intel_queue_mmio_flip(struct drm_device *dev,
 				 struct intel_engine_cs *ring,
 				 uint32_t flags)
 {
-	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	struct intel_mmio_flip *mmio_flip;
+
+	mmio_flip = kmalloc(sizeof(*mmio_flip), GFP_KERNEL);
+	if (mmio_flip == NULL)
+		return -ENOMEM;
 
-	i915_gem_request_assign(&intel_crtc->mmio_flip.req,
-				obj->last_write_req);
+	mmio_flip->rq = i915_gem_request_reference(obj->last_write_req);
+	mmio_flip->crtc = to_intel_crtc(crtc);
 
-	schedule_work(&intel_crtc->mmio_flip.work);
+	INIT_WORK(&mmio_flip->work, intel_mmio_flip_work_func);
+	schedule_work(&mmio_flip->work);
 
 	return 0;
 }
@@ -13059,8 +13060,6 @@ static void intel_crtc_init(struct drm_device *dev, int pipe)
 	dev_priv->plane_to_crtc_mapping[intel_crtc->plane] = &intel_crtc->base;
 	dev_priv->pipe_to_crtc_mapping[intel_crtc->pipe] = &intel_crtc->base;
 
-	INIT_WORK(&intel_crtc->mmio_flip.work, intel_mmio_flip_work_func);
-
 	drm_crtc_helper_add(&intel_crtc->base, &intel_helper_funcs);
 
 	WARN_ON(drm_crtc_index(&intel_crtc->base) != intel_crtc->pipe);
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 686014bd5ec0..0bcc5f36a810 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -403,8 +403,9 @@ struct intel_pipe_wm {
 };
 
 struct intel_mmio_flip {
-	struct drm_i915_gem_request *req;
 	struct work_struct work;
+	struct drm_i915_gem_request *rq;
+	struct intel_crtc *crtc;
 };
 
 struct skl_pipe_wm {
@@ -489,7 +490,6 @@ struct intel_crtc {
 	} wm;
 
 	int scanline_offset;
-	struct intel_mmio_flip mmio_flip;
 
 	struct intel_crtc_atomic_commit atomic;
 };
-- 
2.1.4

* [PATCH 07/70] drm/i915: Boost GPU frequency if we detect outstanding pageflips
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (5 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 06/70] drm/i915: Fix race on unreferencing the wrong mmio-flip-request Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-08 11:31   ` Daniel Vetter
  2015-04-07 15:20 ` [PATCH 08/70] drm/i915: Diminish contribution of wait-boosting from clients Chris Wilson
                   ` (52 subsequent siblings)
  59 siblings, 1 reply; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC
  To: intel-gfx; +Cc: Daniel Vetter

If we hit a vblank and see that we have a pageflip queued but not yet
processed, ensure that the GPU is running at maximum in order to clear
the backlog. Pageflips are only queued for the following vblank; if we
miss it, there will be a visible stutter. Boosting the GPU frequency
doesn't prevent us from missing the target vblank, but it should help
the subsequent frames hit theirs.
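
The heuristic with made-up numbers: a flip queued at vblank 100 should
complete on vblank 101; if we are already servicing vblank 102 and the
flip is still pending, the GPU is the bottleneck, hence (from the hunk
below):

	if (drm_vblank_count(dev, pipe) - work->flip_queued_vblank > 1)
		intel_queue_rps_boost_for_request(dev, work->flip_queued_req);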

v2: Reorder vblank vs flip-complete so that we only check for a missed
flip after processing the completion events, and avoid spurious boosts.

v3: Rename missed_vblank
v4: Rebase
v5: Cancel the outstanding work in runtime suspend
v6: Rebase
v7: Rebase required fixing

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Deepak S <deepak.s@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_display.c | 11 ++++++++---
 drivers/gpu/drm/i915/intel_drv.h     |  2 ++
 drivers/gpu/drm/i915/intel_pm.c      | 35 +++++++++++++++++++++++++++++++++++
 3 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 94c09bf0047d..1846fb510ebb 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -10195,6 +10195,7 @@ void intel_check_page_flip(struct drm_device *dev, int pipe)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_crtc *crtc = dev_priv->pipe_to_crtc_mapping[pipe];
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	struct intel_unpin_work *work;
 
 	WARN_ON(!in_interrupt());
 
@@ -10202,12 +10203,16 @@ void intel_check_page_flip(struct drm_device *dev, int pipe)
 		return;
 
 	spin_lock(&dev->event_lock);
-	if (intel_crtc->unpin_work && __intel_pageflip_stall_check(dev, crtc)) {
+	work = intel_crtc->unpin_work;
+	if (work != NULL && __intel_pageflip_stall_check(dev, crtc)) {
 		WARN_ONCE(1, "Kicking stuck page flip: queued at %d, now %d\n",
-			 intel_crtc->unpin_work->flip_queued_vblank,
-			 drm_vblank_count(dev, pipe));
+			 work->flip_queued_vblank, drm_vblank_count(dev, pipe));
 		page_flip_completed(intel_crtc);
+		work = NULL;
 	}
+	if (work != NULL &&
+	    drm_vblank_count(dev, pipe) - work->flip_queued_vblank > 1)
+		intel_queue_rps_boost_for_request(dev, work->flip_queued_req);
 	spin_unlock(&dev->event_lock);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 0bcc5f36a810..4f1d02af1237 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -1263,6 +1263,8 @@ void gen6_rps_busy(struct drm_i915_private *dev_priv);
 void gen6_rps_reset_ei(struct drm_i915_private *dev_priv);
 void gen6_rps_idle(struct drm_i915_private *dev_priv);
 void gen6_rps_boost(struct drm_i915_private *dev_priv);
+void intel_queue_rps_boost_for_request(struct drm_device *dev,
+				       struct drm_i915_gem_request *rq);
 void ilk_wm_get_hw_state(struct drm_device *dev);
 void skl_wm_get_hw_state(struct drm_device *dev);
 void skl_ddb_get_hw_state(struct drm_i915_private *dev_priv,
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 50c03472ea41..3e98f30517c6 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6772,6 +6772,41 @@ int intel_freq_opcode(struct drm_i915_private *dev_priv, int val)
 		return val / GT_FREQUENCY_MULTIPLIER;
 }
 
+struct request_boost {
+	struct work_struct work;
+	struct drm_i915_gem_request *rq;
+};
+
+static void __intel_rps_boost_work(struct work_struct *work)
+{
+	struct request_boost *boost = container_of(work, struct request_boost, work);
+
+	if (!i915_gem_request_completed(boost->rq, true))
+		gen6_rps_boost(to_i915(boost->rq->ring->dev));
+
+	i915_gem_request_unreference__unlocked(boost->rq);
+	kfree(boost);
+}
+
+void intel_queue_rps_boost_for_request(struct drm_device *dev,
+				       struct drm_i915_gem_request *rq)
+{
+	struct request_boost *boost;
+
+	if (rq == NULL || INTEL_INFO(dev)->gen < 6)
+		return;
+
+	boost = kmalloc(sizeof(*boost), GFP_ATOMIC);
+	if (boost == NULL)
+		return;
+
+	i915_gem_request_reference(rq);
+	boost->rq = rq;
+
+	INIT_WORK(&boost->work, __intel_rps_boost_work);
+	queue_work(to_i915(dev)->wq, &boost->work);
+}
+
 void intel_pm_setup(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-- 
2.1.4

* [PATCH 08/70] drm/i915: Diminish contribution of wait-boosting from clients
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (6 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 07/70] drm/i915: Boost GPU frequency if we detect outstanding pageflips Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-07 15:20 ` [PATCH 09/70] drm/i915: Re-enable RPS wait-boosting for all engines Chris Wilson
                   ` (51 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC
  To: intel-gfx

With boosting for missed pageflips, we have a much stronger indication
of when we need to (temporarily) boost GPU frequency to ensure smooth
delivery of frames. So now we only allow each client one RPS boost in
each period of GPU activity in which it stalls waiting for results.
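
In sketch form: a client is linked onto rps.clients on its first boost
and the list is only emptied again when the GPU idles, so the
list_empty() check caps each client at one boost per busy period:

	if (file_priv == NULL || list_empty(&file_priv->rps_boost)) {
		intel_set_rps(dev_priv->dev, val);
		if (file_priv != NULL)
			list_add(&file_priv->rps_boost, &dev_priv->rps.clients);
	}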

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 39 +++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_drv.h     |  9 ++++++---
 drivers/gpu/drm/i915/i915_gem.c     | 35 ++++++++-------------------------
 drivers/gpu/drm/i915/intel_drv.h    |  3 ++-
 drivers/gpu/drm/i915/intel_pm.c     | 18 ++++++++++++++---
 5 files changed, 70 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 10ca5117fcee..9c23eec3277e 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2239,6 +2239,44 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
 	return 0;
 }
 
+static int i915_rps_boost_info(struct seq_file *m, void *data)
+{
+	struct drm_info_node *node = m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_file *file;
+	int ret;
+
+	ret = mutex_lock_interruptible(&dev->struct_mutex);
+	if (ret)
+		return ret;
+
+	ret = mutex_lock_interruptible(&dev_priv->rps.hw_lock);
+	if (ret)
+		goto unlock;
+
+	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
+		struct drm_i915_file_private *file_priv = file->driver_priv;
+		struct task_struct *task;
+
+		rcu_read_lock();
+		task = pid_task(file->pid, PIDTYPE_PID);
+		seq_printf(m, "%s [%d]: %d boosts%s\n",
+			   task ? task->comm : "<unknown>",
+			   task ? task->pid : -1,
+			   file_priv->rps_boosts,
+			   list_empty(&file_priv->rps_boost) ? "" : ", active");
+		rcu_read_unlock();
+	}
+	seq_printf(m, "Kernel boosts: %d\n", dev_priv->rps.boosts);
+
+	mutex_unlock(&dev_priv->rps.hw_lock);
+unlock:
+	mutex_unlock(&dev->struct_mutex);
+
+	return ret;
+}
+
 static int i915_llc(struct seq_file *m, void *data)
 {
 	struct drm_info_node *node = m->private;
@@ -4704,6 +4742,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_ddb_info", i915_ddb_info, 0},
 	{"i915_sseu_status", i915_sseu_status, 0},
 	{"i915_drrs_status", i915_drrs_status, 0},
+	{"i915_rps_boost_info", i915_rps_boost_info, 0},
 };
 #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list)
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0bc913934d3f..357d781095a2 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1044,6 +1044,8 @@ struct intel_gen6_power_mgmt {
 
 	bool enabled;
 	struct delayed_work delayed_resume_work;
+	struct list_head clients;
+	unsigned boosts;
 
 	/* manual wa residency calculations */
 	struct intel_rps_ei up_ei, down_ei;
@@ -2193,12 +2195,13 @@ struct drm_i915_file_private {
 	struct {
 		spinlock_t lock;
 		struct list_head request_list;
-		struct delayed_work idle_work;
 	} mm;
 	struct idr context_idr;
 
-	atomic_t rps_wait_boost;
-	struct  intel_engine_cs *bsd_ring;
+	struct list_head rps_boost;
+	struct intel_engine_cs *bsd_ring;
+
+	unsigned rps_boosts;
 };
 
 /*
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 567affeafec4..102bef9a77c0 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1181,14 +1181,6 @@ static bool missed_irq(struct drm_i915_private *dev_priv,
 	return test_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings);
 }
 
-static bool can_wait_boost(struct drm_i915_file_private *file_priv)
-{
-	if (file_priv == NULL)
-		return true;
-
-	return !atomic_xchg(&file_priv->rps_wait_boost, true);
-}
-
 /**
  * __i915_wait_request - wait until execution of request has finished
  * @req: duh!
@@ -1230,13 +1222,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	timeout_expire = timeout ?
 		jiffies + nsecs_to_jiffies_timeout((u64)*timeout) : 0;
 
-	if (INTEL_INFO(dev)->gen >= 6 && ring->id == RCS && can_wait_boost(file_priv)) {
-		gen6_rps_boost(dev_priv);
-		if (file_priv)
-			mod_delayed_work(dev_priv->wq,
-					 &file_priv->mm.idle_work,
-					 msecs_to_jiffies(100));
-	}
+	if (ring->id == RCS && INTEL_INFO(dev)->gen >= 6)
+		gen6_rps_boost(dev_priv, file_priv);
 
 	if (!irq_test_in_progress && WARN_ON(!ring->irq_get(ring)))
 		return -ENODEV;
@@ -5084,8 +5071,6 @@ void i915_gem_release(struct drm_device *dev, struct drm_file *file)
 {
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 
-	cancel_delayed_work_sync(&file_priv->mm.idle_work);
-
 	/* Clean up our request list when the client is going away, so that
 	 * later retire_requests won't dereference our soon-to-be-gone
 	 * file_priv.
@@ -5101,15 +5086,12 @@ void i915_gem_release(struct drm_device *dev, struct drm_file *file)
 		request->file_priv = NULL;
 	}
 	spin_unlock(&file_priv->mm.lock);
-}
-
-static void
-i915_gem_file_idle_work_handler(struct work_struct *work)
-{
-	struct drm_i915_file_private *file_priv =
-		container_of(work, typeof(*file_priv), mm.idle_work.work);
 
-	atomic_set(&file_priv->rps_wait_boost, false);
+	if (!list_empty(&file_priv->rps_boost)) {
+		mutex_lock(&to_i915(dev)->rps.hw_lock);
+		list_del(&file_priv->rps_boost);
+		mutex_unlock(&to_i915(dev)->rps.hw_lock);
+	}
 }
 
 int i915_gem_open(struct drm_device *dev, struct drm_file *file)
@@ -5126,11 +5108,10 @@ int i915_gem_open(struct drm_device *dev, struct drm_file *file)
 	file->driver_priv = file_priv;
 	file_priv->dev_priv = dev->dev_private;
 	file_priv->file = file;
+	INIT_LIST_HEAD(&file_priv->rps_boost);
 
 	spin_lock_init(&file_priv->mm.lock);
 	INIT_LIST_HEAD(&file_priv->mm.request_list);
-	INIT_DELAYED_WORK(&file_priv->mm.idle_work,
-			  i915_gem_file_idle_work_handler);
 
 	ret = i915_gem_context_open(dev, file);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 4f1d02af1237..6163be8be812 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -1262,7 +1262,8 @@ void gen6_update_ring_freq(struct drm_device *dev);
 void gen6_rps_busy(struct drm_i915_private *dev_priv);
 void gen6_rps_reset_ei(struct drm_i915_private *dev_priv);
 void gen6_rps_idle(struct drm_i915_private *dev_priv);
-void gen6_rps_boost(struct drm_i915_private *dev_priv);
+void gen6_rps_boost(struct drm_i915_private *dev_priv,
+		    struct drm_i915_file_private *file_priv);
 void intel_queue_rps_boost_for_request(struct drm_device *dev,
 				       struct drm_i915_gem_request *rq);
 void ilk_wm_get_hw_state(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 3e98f30517c6..d3f4e9593db1 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -4091,10 +4091,14 @@ void gen6_rps_idle(struct drm_i915_private *dev_priv)
 		dev_priv->rps.last_adj = 0;
 		I915_WRITE(GEN6_PMINTRMSK, 0xffffffff);
 	}
+
+	while (!list_empty(&dev_priv->rps.clients))
+		list_del_init(dev_priv->rps.clients.next);
 	mutex_unlock(&dev_priv->rps.hw_lock);
 }
 
-void gen6_rps_boost(struct drm_i915_private *dev_priv)
+void gen6_rps_boost(struct drm_i915_private *dev_priv,
+		    struct drm_i915_file_private *file_priv)
 {
 	u32 val;
 
@@ -4102,9 +4106,16 @@ void gen6_rps_boost(struct drm_i915_private *dev_priv)
 	val = dev_priv->rps.max_freq_softlimit;
 	if (dev_priv->rps.enabled &&
 	    dev_priv->mm.busy &&
-	    dev_priv->rps.cur_freq < val) {
+	    dev_priv->rps.cur_freq < val &&
+	    (file_priv == NULL || list_empty(&file_priv->rps_boost))) {
 		intel_set_rps(dev_priv->dev, val);
 		dev_priv->rps.last_adj = 0;
+
+		if (file_priv != NULL) {
+			list_add(&file_priv->rps_boost, &dev_priv->rps.clients);
+			file_priv->rps_boosts++;
+		} else
+			dev_priv->rps.boosts++;
 	}
 	mutex_unlock(&dev_priv->rps.hw_lock);
 }
@@ -6782,7 +6793,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
 	struct request_boost *boost = container_of(work, struct request_boost, work);
 
 	if (!i915_gem_request_completed(boost->rq, true))
-		gen6_rps_boost(to_i915(boost->rq->ring->dev));
+		gen6_rps_boost(to_i915(boost->rq->ring->dev), NULL);
 
 	i915_gem_request_unreference__unlocked(boost->rq);
 	kfree(boost);
@@ -6815,6 +6826,7 @@ void intel_pm_setup(struct drm_device *dev)
 
 	INIT_DELAYED_WORK(&dev_priv->rps.delayed_resume_work,
 			  intel_gen6_powersave_work);
+	INIT_LIST_HEAD(&dev_priv->rps.clients);
 
 	dev_priv->pm.suspended = false;
 }
-- 
2.1.4

* [PATCH 09/70] drm/i915: Re-enable RPS wait-boosting for all engines
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (7 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 08/70] drm/i915: Diminish contribution of wait-boosting from clients Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-07 15:20 ` [PATCH 10/70] drm/i915: Split i915_gem_batch_pool into its own header Chris Wilson
                   ` (50 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC
  To: intel-gfx; +Cc: daniel.vetter

This reverts commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jun 12 10:28:55 2014 +0100

    drm/i915: Restrict GPU boost to the RCS engine

The premise that media/blitter workloads are not affected by boosting
is patently false after a trip through igt. The question that remains
is what exactly is going wrong with the media workload that prompted
this? Hopefully that will be fixed by the previously missing aggressive
downclocking, in addition to the extra restrictions imposed on how
frequently a process is allowed to boost.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Deepak S <deepak.s@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 102bef9a77c0..973f0c94f26e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1222,7 +1222,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	timeout_expire = timeout ?
 		jiffies + nsecs_to_jiffies_timeout((u64)*timeout) : 0;
 
-	if (ring->id == RCS && INTEL_INFO(dev)->gen >= 6)
+	if (INTEL_INFO(dev)->gen >= 6)
 		gen6_rps_boost(dev_priv, file_priv);
 
 	if (!irq_test_in_progress && WARN_ON(!ring->irq_get(ring)))
-- 
2.1.4

* [PATCH 10/70] drm/i915: Split i915_gem_batch_pool into its own header
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (8 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 09/70] drm/i915: Re-enable RPS wait-boosting for all engines Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-07 15:20 ` [PATCH 11/70] drm/i915: Tidy batch pool logic Chris Wilson
                   ` (49 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC
  To: intel-gfx

In the next patch, I want to use the structure elsewhere and so require
it to be defined earlier. Rather than move the definition to an earlier
location where it would feel very odd, place it in its own header file.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            | 13 +--------
 drivers/gpu/drm/i915/i915_gem_batch_pool.c |  1 +
 drivers/gpu/drm/i915/i915_gem_batch_pool.h | 42 ++++++++++++++++++++++++++++++
 3 files changed, 44 insertions(+), 12 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_batch_pool.h

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 357d781095a2..e629939bd23c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -37,6 +37,7 @@
 #include "intel_bios.h"
 #include "intel_ringbuffer.h"
 #include "intel_lrc.h"
+#include "i915_gem_batch_pool.h"
 #include "i915_gem_gtt.h"
 #include "i915_gem_render_state.h"
 #include <linux/io-mapping.h>
@@ -1143,11 +1144,6 @@ struct intel_l3_parity {
 	int which_slice;
 };
 
-struct i915_gem_batch_pool {
-	struct drm_device *dev;
-	struct list_head cache_list;
-};
-
 struct i915_gem_mm {
 	/** Memory allocator for GTT stolen memory */
 	struct drm_mm stolen;
@@ -3078,13 +3074,6 @@ void i915_destroy_error_state(struct drm_device *dev);
 void i915_get_extra_instdone(struct drm_device *dev, uint32_t *instdone);
 const char *i915_cache_level_str(struct drm_i915_private *i915, int type);
 
-/* i915_gem_batch_pool.c */
-void i915_gem_batch_pool_init(struct drm_device *dev,
-			      struct i915_gem_batch_pool *pool);
-void i915_gem_batch_pool_fini(struct i915_gem_batch_pool *pool);
-struct drm_i915_gem_object*
-i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool, size_t size);
-
 /* i915_cmd_parser.c */
 int i915_cmd_parser_get_version(void);
 int i915_cmd_parser_init_ring(struct intel_engine_cs *ring);
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index c690170a1c4f..564be7c5ea7e 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -23,6 +23,7 @@
  */
 
 #include "i915_drv.h"
+#include "i915_gem_batch_pool.h"
 
 /**
  * DOC: batch pool
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.h b/drivers/gpu/drm/i915/i915_gem_batch_pool.h
new file mode 100644
index 000000000000..5ed70ef6a887
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.h
@@ -0,0 +1,42 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#ifndef I915_GEM_BATCH_POOL_H
+#define I915_GEM_BATCH_POOL_H
+
+#include "i915_drv.h"
+
+struct i915_gem_batch_pool {
+	struct drm_device *dev;
+	struct list_head cache_list;
+};
+
+/* i915_gem_batch_pool.c */
+void i915_gem_batch_pool_init(struct drm_device *dev,
+			      struct i915_gem_batch_pool *pool);
+void i915_gem_batch_pool_fini(struct i915_gem_batch_pool *pool);
+struct drm_i915_gem_object*
+i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool, size_t size);
+
+#endif /* I915_GEM_BATCH_POOL_H */
-- 
2.1.4

* [PATCH 11/70] drm/i915: Tidy batch pool logic
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (9 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 10/70] drm/i915: Split i915_gem_batch_pool into its own header Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-07 15:20 ` [PATCH 12/70] drm/i915: Split the batch pool by engine Chris Wilson
                   ` (48 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC
  To: intel-gfx

Move the madvise logic out of the execbuffer main path into the
relatively rare allocation path, making the execbuffer manipulation less
fragile.
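
Usage under the new contract, in sketch form (error paths elided;
"shadow" is an illustrative name): the pool now hands back an object
with its pages already pinned, and the caller simply unpins it when
done:

	shadow = i915_gem_batch_pool_get(&dev_priv->mm.batch_pool,
					 PAGE_ALIGN(batch_len));
	if (IS_ERR(shadow))
		return shadow;
	/* ... copy and parse the batch ... */
	i915_gem_object_unpin_pages(shadow);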

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c     | 12 +++------
 drivers/gpu/drm/i915/i915_gem_batch_pool.c | 39 +++++++++++++++---------------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 13 ++++------
 3 files changed, 27 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 61ae8ff4eaed..9605ff8f2fcd 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -869,6 +869,9 @@ static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
 	    batch_len + batch_start_offset > src_obj->base.size)
 		return ERR_PTR(-E2BIG);
 
+	if (WARN_ON(dest_obj->pages_pin_count == 0))
+		return ERR_PTR(-ENODEV);
+
 	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
 	if (ret) {
 		DRM_DEBUG_DRIVER("CMD: failed to prepare shadow batch\n");
@@ -882,13 +885,6 @@ static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
 		goto unpin_src;
 	}
 
-	ret = i915_gem_object_get_pages(dest_obj);
-	if (ret) {
-		DRM_DEBUG_DRIVER("CMD: Failed to get pages for shadow batch\n");
-		goto unmap_src;
-	}
-	i915_gem_object_pin_pages(dest_obj);
-
 	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
 	if (ret) {
 		DRM_DEBUG_DRIVER("CMD: Failed to set shadow batch to CPU\n");
@@ -898,7 +894,6 @@ static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
 	dst = vmap_batch(dest_obj, 0, batch_len);
 	if (!dst) {
 		DRM_DEBUG_DRIVER("CMD: Failed to vmap shadow batch\n");
-		i915_gem_object_unpin_pages(dest_obj);
 		ret = -ENOMEM;
 		goto unmap_src;
 	}
@@ -1129,7 +1124,6 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 	}
 
 	vunmap(batch_base);
-	i915_gem_object_unpin_pages(shadow_batch_obj);
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index 564be7c5ea7e..21f3356cc0ab 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -67,25 +67,23 @@ void i915_gem_batch_pool_fini(struct i915_gem_batch_pool *pool)
 					 struct drm_i915_gem_object,
 					 batch_pool_list);
 
-		WARN_ON(obj->active);
-
-		list_del_init(&obj->batch_pool_list);
+		list_del(&obj->batch_pool_list);
 		drm_gem_object_unreference(&obj->base);
 	}
 }
 
 /**
- * i915_gem_batch_pool_get() - select a buffer from the pool
+ * i915_gem_batch_pool_get() - allocate a buffer from the pool
  * @pool: the batch buffer pool
  * @size: the minimum desired size of the returned buffer
  *
- * Finds or allocates a batch buffer in the pool with at least the requested
- * size. The caller is responsible for any domain, active/inactive, or
- * purgeability management for the returned buffer.
+ * Returns an inactive buffer from @pool with at least @size bytes,
+ * with the pages pinned. The caller must i915_gem_object_unpin_pages()
+ * on the returned object.
  *
  * Note: Callers must hold the struct_mutex
  *
- * Return: the selected batch buffer object
+ * Return: the buffer object or an error pointer
  */
 struct drm_i915_gem_object *
 i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
@@ -97,8 +95,7 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 	WARN_ON(!mutex_is_locked(&pool->dev->struct_mutex));
 
 	list_for_each_entry_safe(tmp, next,
-			&pool->cache_list, batch_pool_list) {
-
+				 &pool->cache_list, batch_pool_list) {
 		if (tmp->active)
 			continue;
 
@@ -114,25 +111,27 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 		 * but not 'too much' bigger. A better way to do this
 		 * might be to bucket the pool objects based on size.
 		 */
-		if (tmp->base.size >= size &&
-		    tmp->base.size <= (2 * size)) {
+		if (tmp->base.size >= size && tmp->base.size <= 2 * size) {
 			obj = tmp;
 			break;
 		}
 	}
 
-	if (!obj) {
+	if (obj == NULL) {
+		int ret;
+
 		obj = i915_gem_alloc_object(pool->dev, size);
-		if (!obj)
+		if (obj == NULL)
 			return ERR_PTR(-ENOMEM);
 
-		list_add_tail(&obj->batch_pool_list, &pool->cache_list);
-	}
-	else
-		/* Keep list in LRU order */
-		list_move_tail(&obj->batch_pool_list, &pool->cache_list);
+		ret = i915_gem_object_get_pages(obj);
+		if (ret)
+			return ERR_PTR(ret);
 
-	obj->madv = I915_MADV_WILLNEED;
+		obj->madv = I915_MADV_DONTNEED;
+	}
 
+	list_move_tail(&obj->batch_pool_list, &pool->cache_list);
+	i915_gem_object_pin_pages(obj);
 	return obj;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 4d31993545ab..05fc36449461 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -37,7 +37,6 @@
 #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
 #define  __EXEC_OBJECT_NEEDS_MAP (1<<29)
 #define  __EXEC_OBJECT_NEEDS_BIAS (1<<28)
-#define  __EXEC_OBJECT_PURGEABLE (1<<27)
 
 #define BATCH_OFFSET_BIAS (256*1024)
 
@@ -224,12 +223,7 @@ i915_gem_execbuffer_unreserve_vma(struct i915_vma *vma)
 	if (entry->flags & __EXEC_OBJECT_HAS_PIN)
 		vma->pin_count--;
 
-	if (entry->flags & __EXEC_OBJECT_PURGEABLE)
-		obj->madv = I915_MADV_DONTNEED;
-
-	entry->flags &= ~(__EXEC_OBJECT_HAS_FENCE |
-			  __EXEC_OBJECT_HAS_PIN |
-			  __EXEC_OBJECT_PURGEABLE);
+	entry->flags &= ~(__EXEC_OBJECT_HAS_FENCE | __EXEC_OBJECT_HAS_PIN);
 }
 
 static void eb_destroy(struct eb_vmas *eb)
@@ -1185,11 +1179,13 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
 	if (ret)
 		goto err;
 
+	i915_gem_object_unpin_pages(shadow_batch_obj);
+
 	memset(shadow_exec_entry, 0, sizeof(*shadow_exec_entry));
 
 	vma = i915_gem_obj_to_ggtt(shadow_batch_obj);
 	vma->exec_entry = shadow_exec_entry;
-	vma->exec_entry->flags = __EXEC_OBJECT_PURGEABLE | __EXEC_OBJECT_HAS_PIN;
+	vma->exec_entry->flags = __EXEC_OBJECT_HAS_PIN;
 	drm_gem_object_reference(&shadow_batch_obj->base);
 	list_add_tail(&vma->exec_list, &eb->vmas);
 
@@ -1198,6 +1194,7 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
 	return shadow_batch_obj;
 
 err:
+	i915_gem_object_unpin_pages(shadow_batch_obj);
 	if (ret == -EACCES) /* unhandled chained batch */
 		return batch_obj;
 	else
-- 
2.1.4

* [PATCH 12/70] drm/i915: Split the batch pool by engine
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (10 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 11/70] drm/i915: Tidy batch pool logic Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-07 15:20 ` [PATCH 13/70] drm/i915: Free batch pool when idle Chris Wilson
                   ` (47 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC
  To: intel-gfx

I woke up one morning and found 50k objects sitting in the batch pool
and every search seemed to iterate the entire list... Painting the
screen in oils would provide a more fluid display.

One issue with the current design is that we only check for retirements
on the current ring when preparing to submit a new batch. This means
that we can have thousands of "active" batches on another ring that we
have to walk over. The simplest way to avoid that is to split the pools
per ring and then our LRU execution ordering will also ensure that the
inactive buffers remain at the front.
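
The pay-off is in the search itself: each ring retires requests in
order, so its pool list stays strictly LRU and the scan can stop at the
first still-active buffer (see the hunk below) instead of skipping over
thousands of them:

	list_for_each_entry_safe(tmp, next,
				 &pool->cache_list, batch_pool_list) {
		/* The batches are strictly LRU ordered */
		if (tmp->active)
			break;
		/* ... reuse or clean up the inactive buffer ... */
	}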

v2: execlists still requires duplicate code.
v3: execlists requires more duplicate code

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c        | 33 ++++++++++++++++++------------
 drivers/gpu/drm/i915/i915_dma.c            |  1 -
 drivers/gpu/drm/i915/i915_drv.h            |  8 --------
 drivers/gpu/drm/i915/i915_gem.c            |  2 --
 drivers/gpu/drm/i915/i915_gem_batch_pool.c |  3 ++-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +--
 drivers/gpu/drm/i915/intel_lrc.c           |  2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.c    |  2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.h    |  8 ++++++++
 9 files changed, 35 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 9c23eec3277e..f610a2cd2088 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -377,13 +377,17 @@ static void print_batch_pool_stats(struct seq_file *m,
 {
 	struct drm_i915_gem_object *obj;
 	struct file_stats stats;
+	struct intel_engine_cs *ring;
+	int i;
 
 	memset(&stats, 0, sizeof(stats));
 
-	list_for_each_entry(obj,
-			    &dev_priv->mm.batch_pool.cache_list,
-			    batch_pool_list)
-		per_file_stats(0, obj, &stats);
+	for_each_ring(ring, dev_priv, i) {
+		list_for_each_entry(obj,
+				    &ring->batch_pool.cache_list,
+				    batch_pool_list)
+			per_file_stats(0, obj, &stats);
+	}
 
 	print_file_stats(m, "batch pool", stats);
 }
@@ -613,21 +617,24 @@ static int i915_gem_batch_pool_info(struct seq_file *m, void *data)
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *obj;
+	struct intel_engine_cs *ring;
 	int count = 0;
-	int ret;
+	int ret, i;
 
 	ret = mutex_lock_interruptible(&dev->struct_mutex);
 	if (ret)
 		return ret;
 
-	seq_puts(m, "cache:\n");
-	list_for_each_entry(obj,
-			    &dev_priv->mm.batch_pool.cache_list,
-			    batch_pool_list) {
-		seq_puts(m, "   ");
-		describe_obj(m, obj);
-		seq_putc(m, '\n');
-		count++;
+	for_each_ring(ring, dev_priv, i) {
+		seq_printf(m, "%s cache:\n", ring->name);
+		list_for_each_entry(obj,
+				    &ring->batch_pool.cache_list,
+				    batch_pool_list) {
+			seq_puts(m, "   ");
+			describe_obj(m, obj);
+			seq_putc(m, '\n');
+			count++;
+		}
 	}
 
 	seq_printf(m, "total: %d\n", count);
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index ec661fe44e70..7b0109e2ab23 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1076,7 +1076,6 @@ int i915_driver_unload(struct drm_device *dev)
 
 	mutex_lock(&dev->struct_mutex);
 	i915_gem_cleanup_ringbuffer(dev);
-	i915_gem_batch_pool_fini(&dev_priv->mm.batch_pool);
 	i915_gem_context_fini(dev);
 	mutex_unlock(&dev->struct_mutex);
 	i915_gem_cleanup_stolen(dev);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e629939bd23c..178da6eb4edb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -37,7 +37,6 @@
 #include "intel_bios.h"
 #include "intel_ringbuffer.h"
 #include "intel_lrc.h"
-#include "i915_gem_batch_pool.h"
 #include "i915_gem_gtt.h"
 #include "i915_gem_render_state.h"
 #include <linux/io-mapping.h>
@@ -1157,13 +1156,6 @@ struct i915_gem_mm {
 	 */
 	struct list_head unbound_list;
 
-	/*
-	 * A pool of objects to use as shadow copies of client batch buffers
-	 * when the command parser is enabled. Prevents the client from
-	 * modifying the batch contents after software parsing.
-	 */
-	struct i915_gem_batch_pool batch_pool;
-
 	/** Usable portion of the GTT for GEM */
 	unsigned long stolen_base; /* limited to low memory (32-bit) */
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 973f0c94f26e..9e1d872aa576 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5062,8 +5062,6 @@ i915_gem_load(struct drm_device *dev)
 
 	i915_gem_shrinker_init(dev_priv);
 
-	i915_gem_batch_pool_init(dev, &dev_priv->mm.batch_pool);
-
 	mutex_init(&dev_priv->fb_tracking.lock);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index 21f3356cc0ab..1287abf55b84 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -96,8 +96,9 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 
 	list_for_each_entry_safe(tmp, next,
 				 &pool->cache_list, batch_pool_list) {
+		/* The batches are strictly LRU ordered */
 		if (tmp->active)
-			continue;
+			break;
 
 		/* While we're looping, do some clean up */
 		if (tmp->madv == __I915_MADV_PURGED) {
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 05fc36449461..bd7b7ec5c184 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1156,12 +1156,11 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
 			  u32 batch_len,
 			  bool is_master)
 {
-	struct drm_i915_private *dev_priv = to_i915(batch_obj->base.dev);
 	struct drm_i915_gem_object *shadow_batch_obj;
 	struct i915_vma *vma;
 	int ret;
 
-	shadow_batch_obj = i915_gem_batch_pool_get(&dev_priv->mm.batch_pool,
+	shadow_batch_obj = i915_gem_batch_pool_get(&ring->batch_pool,
 						   PAGE_ALIGN(batch_len));
 	if (IS_ERR(shadow_batch_obj))
 		return shadow_batch_obj;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 1c3834fc5608..5dacd402975a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1383,6 +1383,7 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 		ring->cleanup(ring);
 
 	i915_cmd_parser_fini_ring(ring);
+	i915_gem_batch_pool_fini(&ring->batch_pool);
 
 	if (ring->status_page.obj) {
 		kunmap(sg_page(ring->status_page.obj->pages->sgl));
@@ -1400,6 +1401,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	ring->dev = dev;
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	i915_gem_batch_pool_init(dev, &ring->batch_pool);
 	init_waitqueue_head(&ring->irq_queue);
 
 	INIT_LIST_HEAD(&ring->execlist_queue);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 6ee19e4ccad7..d9c2ae504a66 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1983,6 +1983,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
 	INIT_LIST_HEAD(&ring->execlist_queue);
+	i915_gem_batch_pool_init(dev, &ring->batch_pool);
 	ringbuf->size = 32 * PAGE_SIZE;
 	ringbuf->ring = ring;
 	memset(ring->semaphore.sync_seqno, 0, sizeof(ring->semaphore.sync_seqno));
@@ -2061,6 +2062,7 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring)
 	cleanup_status_page(ring);
 
 	i915_cmd_parser_fini_ring(ring);
+	i915_gem_batch_pool_fini(&ring->batch_pool);
 
 	kfree(ringbuf);
 	ring->buffer = NULL;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 6566dd447498..39f6dfc0ee54 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -2,6 +2,7 @@
 #define _INTEL_RINGBUFFER_H_
 
 #include <linux/hashtable.h>
+#include "i915_gem_batch_pool.h"
 
 #define I915_CMD_HASH_ORDER 9
 
@@ -133,6 +134,13 @@ struct  intel_engine_cs {
 	struct		drm_device *dev;
 	struct intel_ringbuffer *buffer;
 
+	/*
+	 * A pool of objects to use as shadow copies of client batch buffers
+	 * when the command parser is enabled. Prevents the client from
+	 * modifying the batch contents after software parsing.
+	 */
+	struct i915_gem_batch_pool batch_pool;
+
 	struct intel_hw_status_page status_page;
 
 	unsigned irq_refcount; /* protected by dev_priv->irq_lock */
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [PATCH 13/70] drm/i915: Free batch pool when idle
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (11 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 12/70] drm/i915: Split the batch pool by engine Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-07 15:20 ` [PATCH 14/70] drm/i915: Split batch pool into size buckets Chris Wilson
                   ` (46 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC (permalink / raw)
  To: intel-gfx

At runtime, this helps ensure that the batch pools are kept trim and
fast. Then at suspend, this releases memory that we do not need to
restore. It also ties into the oom-notifier to ensure that we recover as
much kernel memory as possible during OOM.
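
The mechanism is to piggyback on the existing idle worker; in outline
(a sketch matching the hunk below):

	/* Runs once the GPU has been idle for a while. Only free the
	 * pools if struct_mutex is uncontended; otherwise leave them
	 * for the next idle cycle (or the shrinker under memory
	 * pressure).
	 */
	if (mutex_trylock(&dev->struct_mutex)) {
		for_each_ring(ring, dev_priv, i)
			i915_gem_batch_pool_fini(&ring->batch_pool);
		mutex_unlock(&dev->struct_mutex);
	}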

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9e1d872aa576..2282f579e101 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2795,8 +2795,19 @@ i915_gem_idle_work_handler(struct work_struct *work)
 {
 	struct drm_i915_private *dev_priv =
 		container_of(work, typeof(*dev_priv), mm.idle_work.work);
+	struct drm_device *dev = dev_priv->dev;
+
+	intel_mark_idle(dev);
 
-	intel_mark_idle(dev_priv->dev);
+	if (mutex_trylock(&dev->struct_mutex)) {
+		struct intel_engine_cs *ring;
+		int i;
+
+		for_each_ring(ring, dev_priv, i)
+			i915_gem_batch_pool_fini(&ring->batch_pool);
+
+		mutex_unlock(&dev->struct_mutex);
+	}
 }
 
 /**
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [PATCH 14/70] drm/i915: Split batch pool into size buckets
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (12 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 13/70] drm/i915: Free batch pool when idle Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-07 15:20 ` [PATCH 15/70] drm/i915: Include active flag when describing objects in debugfs Chris Wilson
                   ` (45 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC (permalink / raw)
  To: intel-gfx

Now with the trimmed memcpy before the command parser, we try to
allocate many different sizes of batches, predominantly one or two
pages. We can therefore speed up the search for a suitably sized batch
by keeping the objects in buckets of roughly the same size.
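
The bucket index is simply log2 of the page count, clamped to the last
bucket (a worked sketch of the arithmetic used in the hunk below):

	n = fls(size >> PAGE_SHIFT) - 1;
	/* 1 page -> bucket 0, 2-3 pages -> 1, 4-7 pages -> 2 */
	if (n >= ARRAY_SIZE(pool->cache_list))
		n = ARRAY_SIZE(pool->cache_list) - 1; /* 8+ pages -> 3 */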

v2: Add a comment about bucket sizes

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c        | 46 ++++++++++++++++++----------
 drivers/gpu/drm/i915/i915_drv.h            |  2 +-
 drivers/gpu/drm/i915/i915_gem.c            |  2 +-
 drivers/gpu/drm/i915/i915_gem_batch_pool.c | 49 +++++++++++++++++++-----------
 drivers/gpu/drm/i915/i915_gem_batch_pool.h |  2 +-
 5 files changed, 64 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index f610a2cd2088..11eebc28775a 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -378,15 +378,17 @@ static void print_batch_pool_stats(struct seq_file *m,
 	struct drm_i915_gem_object *obj;
 	struct file_stats stats;
 	struct intel_engine_cs *ring;
-	int i;
+	int i, j;
 
 	memset(&stats, 0, sizeof(stats));
 
 	for_each_ring(ring, dev_priv, i) {
-		list_for_each_entry(obj,
-				    &ring->batch_pool.cache_list,
-				    batch_pool_list)
-			per_file_stats(0, obj, &stats);
+		for (j = 0; j < ARRAY_SIZE(ring->batch_pool.cache_list); j++) {
+			list_for_each_entry(obj,
+					    &ring->batch_pool.cache_list[j],
+					    batch_pool_link)
+				per_file_stats(0, obj, &stats);
+		}
 	}
 
 	print_file_stats(m, "batch pool", stats);
@@ -618,26 +620,38 @@ static int i915_gem_batch_pool_info(struct seq_file *m, void *data)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *obj;
 	struct intel_engine_cs *ring;
-	int count = 0;
-	int ret, i;
+	int total = 0;
+	int ret, i, j;
 
 	ret = mutex_lock_interruptible(&dev->struct_mutex);
 	if (ret)
 		return ret;
 
 	for_each_ring(ring, dev_priv, i) {
-		seq_printf(m, "%s cache:\n", ring->name);
-		list_for_each_entry(obj,
-				    &ring->batch_pool.cache_list,
-				    batch_pool_list) {
-			seq_puts(m, "   ");
-			describe_obj(m, obj);
-			seq_putc(m, '\n');
-			count++;
+		for (j = 0; j < ARRAY_SIZE(ring->batch_pool.cache_list); j++) {
+			int count;
+
+			count = 0;
+			list_for_each_entry(obj,
+					    &ring->batch_pool.cache_list[j],
+					    batch_pool_link)
+				count++;
+			seq_printf(m, "%s cache[%d]: %d objects\n",
+				   ring->name, j, count);
+
+			list_for_each_entry(obj,
+					    &ring->batch_pool.cache_list[j],
+					    batch_pool_link) {
+				seq_puts(m, "   ");
+				describe_obj(m, obj);
+				seq_putc(m, '\n');
+			}
+
+			total += count;
 		}
 	}
 
-	seq_printf(m, "total: %d\n", count);
+	seq_printf(m, "total: %d\n", total);
 
 	mutex_unlock(&dev->struct_mutex);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 178da6eb4edb..cc7956c7f251 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1911,7 +1911,7 @@ struct drm_i915_gem_object {
 	/** Used in execbuf to temporarily hold a ref */
 	struct list_head obj_exec_link;
 
-	struct list_head batch_pool_list;
+	struct list_head batch_pool_link;
 
 	/**
 	 * This is set if the object is on the active lists (has pending
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 2282f579e101..c7d9ee2f708a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4450,7 +4450,7 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
 	INIT_LIST_HEAD(&obj->ring_list);
 	INIT_LIST_HEAD(&obj->obj_exec_link);
 	INIT_LIST_HEAD(&obj->vma_list);
-	INIT_LIST_HEAD(&obj->batch_pool_list);
+	INIT_LIST_HEAD(&obj->batch_pool_link);
 
 	obj->ops = ops;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index 1287abf55b84..7bf2f3f2968e 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -47,8 +47,12 @@
 void i915_gem_batch_pool_init(struct drm_device *dev,
 			      struct i915_gem_batch_pool *pool)
 {
+	int n;
+
 	pool->dev = dev;
-	INIT_LIST_HEAD(&pool->cache_list);
+
+	for (n = 0; n < ARRAY_SIZE(pool->cache_list); n++)
+		INIT_LIST_HEAD(&pool->cache_list[n]);
 }
 
 /**
@@ -59,16 +63,20 @@ void i915_gem_batch_pool_init(struct drm_device *dev,
  */
 void i915_gem_batch_pool_fini(struct i915_gem_batch_pool *pool)
 {
+	int n;
+
 	WARN_ON(!mutex_is_locked(&pool->dev->struct_mutex));
 
-	while (!list_empty(&pool->cache_list)) {
-		struct drm_i915_gem_object *obj =
-			list_first_entry(&pool->cache_list,
-					 struct drm_i915_gem_object,
-					 batch_pool_list);
+	for (n = 0; n < ARRAY_SIZE(pool->cache_list); n++) {
+		while (!list_empty(&pool->cache_list[n])) {
+			struct drm_i915_gem_object *obj =
+				list_first_entry(&pool->cache_list[n],
+						 struct drm_i915_gem_object,
+						 batch_pool_link);
 
-		list_del(&obj->batch_pool_list);
-		drm_gem_object_unreference(&obj->base);
+			list_del(&obj->batch_pool_link);
+			drm_gem_object_unreference(&obj->base);
+		}
 	}
 }
 
@@ -91,28 +99,33 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 {
 	struct drm_i915_gem_object *obj = NULL;
 	struct drm_i915_gem_object *tmp, *next;
+	struct list_head *list;
+	int n;
 
 	WARN_ON(!mutex_is_locked(&pool->dev->struct_mutex));
 
-	list_for_each_entry_safe(tmp, next,
-				 &pool->cache_list, batch_pool_list) {
+	/* Compute a power-of-two bucket, but throw everything of 8 or
+	 * more pages into the last bucket: i.e. the buckets hold objects
+	 * of (1 page, 2-3 pages, 4-7 pages, 8+ pages).
+	 */
+	n = fls(size >> PAGE_SHIFT) - 1;
+	if (n >= ARRAY_SIZE(pool->cache_list))
+		n = ARRAY_SIZE(pool->cache_list) - 1;
+	list = &pool->cache_list[n];
+
+	list_for_each_entry_safe(tmp, next, list, batch_pool_link) {
 		/* The batches are strictly LRU ordered */
 		if (tmp->active)
 			break;
 
 		/* While we're looping, do some clean up */
 		if (tmp->madv == __I915_MADV_PURGED) {
-			list_del(&tmp->batch_pool_list);
+			list_del(&tmp->batch_pool_link);
 			drm_gem_object_unreference(&tmp->base);
 			continue;
 		}
 
-		/*
-		 * Select a buffer that is at least as big as needed
-		 * but not 'too much' bigger. A better way to do this
-		 * might be to bucket the pool objects based on size.
-		 */
-		if (tmp->base.size >= size && tmp->base.size <= 2 * size) {
+		if (tmp->base.size >= size) {
 			obj = tmp;
 			break;
 		}
@@ -132,7 +145,7 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 		obj->madv = I915_MADV_DONTNEED;
 	}
 
-	list_move_tail(&obj->batch_pool_list, &pool->cache_list);
+	list_move_tail(&obj->batch_pool_link, list);
 	i915_gem_object_pin_pages(obj);
 	return obj;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.h b/drivers/gpu/drm/i915/i915_gem_batch_pool.h
index 5ed70ef6a887..848e90703eed 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.h
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.h
@@ -29,7 +29,7 @@
 
 struct i915_gem_batch_pool {
 	struct drm_device *dev;
-	struct list_head cache_list;
+	struct list_head cache_list[4];
 };
 
 /* i915_gem_batch_pool.c */
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [PATCH 15/70] drm/i915: Include active flag when describing objects in debugfs
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (13 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 14/70] drm/i915: Split batch pool into size buckets Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-08 11:33   ` Daniel Vetter
  2015-04-07 15:20 ` [PATCH 16/70] drm/i915: Suppress empty lines from debugfs/i915_gem_objects Chris Wilson
                   ` (44 subsequent siblings)
  59 siblings, 1 reply; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC (permalink / raw)
  To: intel-gfx

Since we use obj->active as a hint in many places throughout the code,
knowing its state in debugfs is extremely useful.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 11eebc28775a..e87f031abc99 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -123,8 +123,9 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 	struct i915_vma *vma;
 	int pin_count = 0;
 
-	seq_printf(m, "%pK: %s%s%s %8zdKiB %02x %02x %x %x %x%s%s%s",
+	seq_printf(m, "%pK: %s%s%s%s %8zdKiB %02x %02x %x %x %x%s%s%s",
 		   &obj->base,
+		   obj->active ? "*" : " ",
 		   get_pin_flag(obj),
 		   get_tiling_flag(obj),
 		   get_global_flag(obj),
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [PATCH 16/70] drm/i915: Suppress empty lines from debugfs/i915_gem_objects
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (14 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 15/70] drm/i915: Include active flag when describing objects in debugfs Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-08 11:34   ` Daniel Vetter
  2015-04-07 15:20 ` [PATCH 17/70] drm/i915: Optimistically spin for the request completion Chris Wilson
                   ` (43 subsequent siblings)
  59 siblings, 1 reply; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC (permalink / raw)
  To: intel-gfx

This is just so that I don't have to read about the batch pool on
systems that are not using it! Rather than using a newline between the
kernel clients and userspace clients, just distinguish the internal
allocations with a '[k]'.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index e87f031abc99..fbba5c267f5d 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -362,16 +362,18 @@ static int per_file_stats(int id, void *ptr, void *data)
 	return 0;
 }
 
-#define print_file_stats(m, name, stats) \
-	seq_printf(m, "%s: %u objects, %zu bytes (%zu active, %zu inactive, %zu global, %zu shared, %zu unbound)\n", \
-		   name, \
-		   stats.count, \
-		   stats.total, \
-		   stats.active, \
-		   stats.inactive, \
-		   stats.global, \
-		   stats.shared, \
-		   stats.unbound)
+#define print_file_stats(m, name, stats) do { \
+	if (stats.count) \
+		seq_printf(m, "%s: %u objects, %zu bytes (%zu active, %zu inactive, %zu global, %zu shared, %zu unbound)\n", \
+			   name, \
+			   stats.count, \
+			   stats.total, \
+			   stats.active, \
+			   stats.inactive, \
+			   stats.global, \
+			   stats.shared, \
+			   stats.unbound); \
+} while (0)
 
 static void print_batch_pool_stats(struct seq_file *m,
 				   struct drm_i915_private *dev_priv)
@@ -392,7 +394,7 @@ static void print_batch_pool_stats(struct seq_file *m,
 		}
 	}
 
-	print_file_stats(m, "batch pool", stats);
+	print_file_stats(m, "[k]batch pool", stats);
 }
 
 #define count_vmas(list, member) do { \
@@ -478,8 +480,6 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
 
 	seq_putc(m, '\n');
 	print_batch_pool_stats(m, dev_priv);
-
-	seq_putc(m, '\n');
 	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
 		struct file_stats stats;
 		struct task_struct *task;
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [PATCH 17/70] drm/i915: Optimistically spin for the request completion
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (15 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 16/70] drm/i915: Suppress empty lines from debugfs/i915_gem_objects Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-08 11:39   ` Daniel Vetter
  2015-04-13 11:34   ` Tvrtko Ursulin
  2015-04-07 15:20 ` [PATCH 18/70] drm/i915: Implement inter-engine read-read optimisations Chris Wilson
                   ` (42 subsequent siblings)
  59 siblings, 2 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Eero Tamminen, Rantala, Valtteri

This provides a nice boost to mesa in swap-bound scenarios (as mesa
throttles itself to the previous frame, which in that scenario will
complete shortly). It will also provide a good boost to systems running
with semaphores disabled and so frequently waiting on the GPU as it
switches rings. In the most favourable of microbenchmarks, this can
increase performance by around 15% - though in practice improvements
will be marginal and rarely noticeable.
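
The structure of the optimisation, stripped of driver detail (a sketch
only; request_completed() and irq_wait() stand in for the existing
seqno check and interrupt-driven wait):

	timeout = jiffies + 1;
	while (!need_resched()) {
		if (request_completed(rq))
			return 0;	/* won: no irq setup, no sleep */
		if (time_after_eq(jiffies, timeout))
			break;		/* bounded to at most one tick */
		cpu_relax_lowlatency();
	}
	return irq_wait(rq);		/* lost: sleep as before */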

v2: Account for user timeouts
v3: Limit the spinning to a single jiffie (~1ms at HZ=1000) at most. On an
otherwise idle system, there is no scheduler contention and so without a
limit we would spin until the GPU is ready.
v4: Drop forcewake - the lazy coherent access doesn't require it, and we
have no reason to believe that the forcewake itself improves seqno
coherency - it only adds delay.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 44 +++++++++++++++++++++++++++++++++++------
 1 file changed, 38 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c7d9ee2f708a..47650327204e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1181,6 +1181,29 @@ static bool missed_irq(struct drm_i915_private *dev_priv,
 	return test_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings);
 }
 
+static int __i915_spin_request(struct drm_i915_gem_request *rq)
+{
+	unsigned long timeout;
+
+	if (i915_gem_request_get_ring(rq)->irq_refcount)
+		return -EBUSY;
+
+	timeout = jiffies + 1;
+	while (!need_resched()) {
+		if (i915_gem_request_completed(rq, true))
+			return 0;
+
+		if (time_after_eq(jiffies, timeout))
+			break;
+
+		cpu_relax_lowlatency();
+	}
+	if (i915_gem_request_completed(rq, false))
+		return 0;
+
+	return -EAGAIN;
+}
+
 /**
  * __i915_wait_request - wait until execution of request has finished
  * @req: duh!
@@ -1225,12 +1248,20 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	if (INTEL_INFO(dev)->gen >= 6)
 		gen6_rps_boost(dev_priv, file_priv);
 
-	if (!irq_test_in_progress && WARN_ON(!ring->irq_get(ring)))
-		return -ENODEV;
-
 	/* Record current time in case interrupted by signal, or wedged */
 	trace_i915_gem_request_wait_begin(req);
 	before = ktime_get_raw_ns();
+
+	/* Optimistic spin for the next jiffie before touching IRQs */
+	ret = __i915_spin_request(req);
+	if (ret == 0)
+		goto out;
+
+	if (!irq_test_in_progress && WARN_ON(!ring->irq_get(ring))) {
+		ret = -ENODEV;
+		goto out;
+	}
+
 	for (;;) {
 		struct timer_list timer;
 
@@ -1279,14 +1310,15 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			destroy_timer_on_stack(&timer);
 		}
 	}
-	now = ktime_get_raw_ns();
-	trace_i915_gem_request_wait_end(req);
-
 	if (!irq_test_in_progress)
 		ring->irq_put(ring);
 
 	finish_wait(&ring->irq_queue, &wait);
 
+out:
+	now = ktime_get_raw_ns();
+	trace_i915_gem_request_wait_end(req);
+
 	if (timeout) {
 		s64 tres = *timeout - (now - before);
 
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [PATCH 18/70] drm/i915: Implement inter-engine read-read optimisations
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (16 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 17/70] drm/i915: Optimistically spin for the request completion Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-14 13:51   ` Tvrtko Ursulin
  2015-04-07 15:20 ` [PATCH 19/70] drm/i915: Inline check required for object syncing prior to execbuf Chris Wilson
                   ` (41 subsequent siblings)
  59 siblings, 1 reply; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC (permalink / raw)
  To: intel-gfx; +Cc: Lionel Landwerlin

Currently, we only track the last request globally across all engines.
This prevents us from issuing concurrent read requests on e.g. the RCS
and BCS engines (or more likely the render and media engines). Without
semaphores, we incur costly stalls as we synchronise between rings -
greatly impacting the current performance of Broadwell versus Haswell in
certain workloads (like video decode). With the introduction of
reference counted requests, it is much easier to track the last request
per ring, as well as the last global write request so that we can
optimise inter-engine read-read requests (as well as better optimise
certain CPU waits).
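
In outline, obj->active becomes a bitmask with one bit per engine, and
the single last_read_req pointer becomes an array (a sketch distilled
from the hunks below, not additional code):

	/* At most one writer, but one outstanding read per engine. */
	obj->active |= intel_ring_flag(ring);		/* move to active */
	i915_gem_request_assign(&obj->last_read_req[ring->id],
				intel_ring_get_request(ring));
	...
	obj->active &= ~(1 << ring);			/* retire that read */
	if (obj->active == 0)
		drm_gem_object_unreference(&obj->base);	/* fully idle */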

v2: Fix inverted readonly condition for nonblocking waits.
v3: Handle non-contiguous engine array after waits
v4: Rebase, tidy, rewrite ring list debugging
v5: Use obj->active as a bitfield, it looks cool
v6: Micro-optimise, mostly involving moving code around
v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf)
v8: Rebase

Benchmark: igt/gem_read_read_speed
hsw:gt3e (with semaphores):
Before: Time to read-read 1024k:		275.794µs
After:  Time to read-read 1024k:		123.260µs

hsw:gt3e (w/o semaphores):
Before: Time to read-read 1024k:		230.433µs
After:  Time to read-read 1024k:		124.593µs

bdw-u (w/o semaphores):             Before          After
Time to read-read 1x1:            26.274µs       10.350µs
Time to read-read 128x128:        40.097µs       21.366µs
Time to read-read 256x256:        77.087µs       42.608µs
Time to read-read 512x512:       281.999µs      181.155µs
Time to read-read 1024x1024:    1196.141µs     1118.223µs
Time to read-read 2048x2048:    5639.072µs     5225.837µs
Time to read-read 4096x4096:   22401.662µs    21137.067µs
Time to read-read 8192x8192:   89617.735µs    85637.681µs

Testcase: igt/gem_concurrent_blit (read-read and friends)
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

---
 drivers/gpu/drm/i915/i915_debugfs.c     |  16 +-
 drivers/gpu/drm/i915/i915_drv.h         |  19 +-
 drivers/gpu/drm/i915/i915_gem.c         | 540 +++++++++++++++++++-------------
 drivers/gpu/drm/i915/i915_gem_context.c |   2 -
 drivers/gpu/drm/i915/i915_gem_debug.c   |  92 ++----
 drivers/gpu/drm/i915/i915_gpu_error.c   |  19 +-
 drivers/gpu/drm/i915/intel_display.c    |   6 +-
 drivers/gpu/drm/i915/intel_lrc.c        |  19 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |  14 +-
 9 files changed, 407 insertions(+), 320 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index fbba5c267f5d..5da74b46e202 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -120,10 +120,13 @@ static inline const char *get_global_flag(struct drm_i915_gem_object *obj)
 static void
 describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 {
+	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
+	struct intel_engine_cs *ring;
 	struct i915_vma *vma;
 	int pin_count = 0;
+	int i;
 
-	seq_printf(m, "%pK: %s%s%s%s %8zdKiB %02x %02x %x %x %x%s%s%s",
+	seq_printf(m, "%pK: %s%s%s%s %8zdKiB %02x %02x [ ",
 		   &obj->base,
 		   obj->active ? "*" : " ",
 		   get_pin_flag(obj),
@@ -131,8 +134,11 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		   get_global_flag(obj),
 		   obj->base.size / 1024,
 		   obj->base.read_domains,
-		   obj->base.write_domain,
-		   i915_gem_request_get_seqno(obj->last_read_req),
+		   obj->base.write_domain);
+	for_each_ring(ring, dev_priv, i)
+		seq_printf(m, "%x ",
+				i915_gem_request_get_seqno(obj->last_read_req[i]));
+	seq_printf(m, "] %x %x%s%s%s",
 		   i915_gem_request_get_seqno(obj->last_write_req),
 		   i915_gem_request_get_seqno(obj->last_fenced_req),
 		   i915_cache_level_str(to_i915(obj->base.dev), obj->cache_level),
@@ -169,9 +175,9 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		*t = '\0';
 		seq_printf(m, " (%s mappable)", s);
 	}
-	if (obj->last_read_req != NULL)
+	if (obj->last_write_req != NULL)
 		seq_printf(m, " (%s)",
-			   i915_gem_request_get_ring(obj->last_read_req)->name);
+			   i915_gem_request_get_ring(obj->last_write_req)->name);
 	if (obj->frontbuffer_bits)
 		seq_printf(m, " (frontbuffer: 0x%03x)", obj->frontbuffer_bits);
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index cc7956c7f251..d35778797ef0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -500,7 +500,7 @@ struct drm_i915_error_state {
 	struct drm_i915_error_buffer {
 		u32 size;
 		u32 name;
-		u32 rseqno, wseqno;
+		u32 rseqno[I915_NUM_RINGS], wseqno;
 		u32 gtt_offset;
 		u32 read_domains;
 		u32 write_domain;
@@ -1907,7 +1907,7 @@ struct drm_i915_gem_object {
 	struct drm_mm_node *stolen;
 	struct list_head global_list;
 
-	struct list_head ring_list;
+	struct list_head ring_list[I915_NUM_RINGS];
 	/** Used in execbuf to temporarily hold a ref */
 	struct list_head obj_exec_link;
 
@@ -1918,7 +1918,7 @@ struct drm_i915_gem_object {
 	 * rendering and so a non-zero seqno), and is not set if it is on
 	 * inactive (ready to be unbound) list.
 	 */
-	unsigned int active:1;
+	unsigned int active:I915_NUM_RINGS;
 
 	/**
 	 * This is set if the object has been written to since last bound
@@ -1989,8 +1989,17 @@ struct drm_i915_gem_object {
 	void *dma_buf_vmapping;
 	int vmapping_count;
 
-	/** Breadcrumb of last rendering to the buffer. */
-	struct drm_i915_gem_request *last_read_req;
+	/** Breadcrumb of last rendering to the buffer.
+	 * There can only be one writer, but we allow for multiple readers.
+	 * If there is a writer that necessarily implies that all other
+	 * read requests are complete - but we may only be lazily clearing
+	 * the read requests. A read request is naturally the most recent
+	 * request on a ring, so we may have two different write and read
+	 * requests on one ring where the write request is older than the
+	 * read request. This allows for the CPU to read from an active
+	 * buffer by only waiting for the write to complete.
+	 */
+	struct drm_i915_gem_request *last_read_req[I915_NUM_RINGS];
 	struct drm_i915_gem_request *last_write_req;
 	/** Breadcrumb of last fenced GPU access to the buffer. */
 	struct drm_i915_gem_request *last_fenced_req;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 47650327204e..a32a84598fac 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -38,14 +38,17 @@
 #include <linux/pci.h>
 #include <linux/dma-buf.h>
 
+#define RQ_BUG_ON(expr)
+
 static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
 static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
+static void
+i915_gem_object_retire__write(struct drm_i915_gem_object *obj);
+static void
+i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring);
 static __must_check int
 i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
 			       bool readonly);
-static void
-i915_gem_object_retire(struct drm_i915_gem_object *obj);
-
 static void i915_gem_write_fence(struct drm_device *dev, int reg,
 				 struct drm_i915_gem_object *obj);
 static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
@@ -518,8 +521,6 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
 		ret = i915_gem_object_wait_rendering(obj, true);
 		if (ret)
 			return ret;
-
-		i915_gem_object_retire(obj);
 	}
 
 	ret = i915_gem_object_get_pages(obj);
@@ -939,8 +940,6 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
 		ret = i915_gem_object_wait_rendering(obj, false);
 		if (ret)
 			return ret;
-
-		i915_gem_object_retire(obj);
 	}
 	/* Same trick applies to invalidate partially written cachelines read
 	 * before writing. */
@@ -1239,6 +1238,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 
 	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
 
+	if (list_empty(&req->list))
+		return 0;
+
 	if (i915_gem_request_completed(req, true))
 		return 0;
 
@@ -1338,6 +1340,63 @@ out:
 	return ret;
 }
 
+static inline void
+i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
+{
+	struct drm_i915_file_private *file_priv = request->file_priv;
+
+	if (!file_priv)
+		return;
+
+	spin_lock(&file_priv->mm.lock);
+	list_del(&request->client_list);
+	request->file_priv = NULL;
+	spin_unlock(&file_priv->mm.lock);
+}
+
+static void i915_gem_request_retire(struct drm_i915_gem_request *request)
+{
+	trace_i915_gem_request_retire(request);
+
+	/* We know the GPU must have read the request to have
+	 * sent us the seqno + interrupt, so use the position
+	 * of tail of the request to update the last known position
+	 * of the GPU head.
+	 *
+	 * Note this requires that we are always called in request
+	 * completion order.
+	 */
+	request->ringbuf->last_retired_head = request->postfix;
+
+	list_del_init(&request->list);
+	i915_gem_request_remove_from_client(request);
+
+	put_pid(request->pid);
+
+	i915_gem_request_unreference(request);
+}
+
+static void
+__i915_gem_request_retire__upto(struct drm_i915_gem_request *rq)
+{
+	struct intel_engine_cs *engine = rq->ring;
+	struct drm_i915_gem_request *tmp;
+
+	lockdep_assert_held(&engine->dev->struct_mutex);
+
+	if (list_empty(&rq->list))
+		return;
+
+	do {
+		tmp = list_first_entry(&engine->request_list,
+				       typeof(*tmp), list);
+
+		i915_gem_request_retire(tmp);
+	} while (tmp != rq);
+
+	WARN_ON(i915_verify_lists(engine->dev));
+}
+
 /**
  * Waits for a request to be signaled, and cleans up the
  * request and object lists appropriately for that event.
@@ -1348,7 +1407,6 @@ i915_wait_request(struct drm_i915_gem_request *req)
 	struct drm_device *dev;
 	struct drm_i915_private *dev_priv;
 	bool interruptible;
-	unsigned reset_counter;
 	int ret;
 
 	BUG_ON(req == NULL);
@@ -1367,29 +1425,13 @@ i915_wait_request(struct drm_i915_gem_request *req)
 	if (ret)
 		return ret;
 
-	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
-	i915_gem_request_reference(req);
-	ret = __i915_wait_request(req, reset_counter,
+	ret = __i915_wait_request(req,
+				  atomic_read(&dev_priv->gpu_error.reset_counter),
 				  interruptible, NULL, NULL);
-	i915_gem_request_unreference(req);
-	return ret;
-}
-
-static int
-i915_gem_object_wait_rendering__tail(struct drm_i915_gem_object *obj)
-{
-	if (!obj->active)
-		return 0;
-
-	/* Manually manage the write flush as we may have not yet
-	 * retired the buffer.
-	 *
-	 * Note that the last_write_req is always the earlier of
-	 * the two (read/write) requests, so if we haved successfully waited,
-	 * we know we have passed the last write.
-	 */
-	i915_gem_request_assign(&obj->last_write_req, NULL);
+	if (ret)
+		return ret;
 
+	__i915_gem_request_retire__upto(req);
 	return 0;
 }
 
@@ -1401,18 +1443,52 @@ static __must_check int
 i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
 			       bool readonly)
 {
-	struct drm_i915_gem_request *req;
-	int ret;
+	int ret, i;
 
-	req = readonly ? obj->last_write_req : obj->last_read_req;
-	if (!req)
+	if (!obj->active)
 		return 0;
 
-	ret = i915_wait_request(req);
-	if (ret)
-		return ret;
+	if (readonly) {
+		if (obj->last_write_req != NULL) {
+			ret = i915_wait_request(obj->last_write_req);
+			if (ret)
+				return ret;
 
-	return i915_gem_object_wait_rendering__tail(obj);
+			i = obj->last_write_req->ring->id;
+			if (obj->last_read_req[i] == obj->last_write_req)
+				i915_gem_object_retire__read(obj, i);
+			else
+				i915_gem_object_retire__write(obj);
+		}
+	} else {
+		for (i = 0; i < I915_NUM_RINGS; i++) {
+			if (obj->last_read_req[i] == NULL)
+				continue;
+
+			ret = i915_wait_request(obj->last_read_req[i]);
+			if (ret)
+				return ret;
+
+			i915_gem_object_retire__read(obj, i);
+		}
+		RQ_BUG_ON(obj->active);
+	}
+
+	return 0;
+}
+
+static void
+i915_gem_object_retire_request(struct drm_i915_gem_object *obj,
+			       struct drm_i915_gem_request *rq)
+{
+	int ring = rq->ring->id;
+
+	if (obj->last_read_req[ring] == rq)
+		i915_gem_object_retire__read(obj, ring);
+	else if (obj->last_write_req == rq)
+		i915_gem_object_retire__write(obj);
+
+	__i915_gem_request_retire__upto(rq);
 }
 
 /* A nonblocking variant of the above wait. This is a highly dangerous routine
@@ -1423,37 +1499,66 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 					    struct drm_i915_file_private *file_priv,
 					    bool readonly)
 {
-	struct drm_i915_gem_request *req;
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_gem_request *requests[I915_NUM_RINGS];
 	unsigned reset_counter;
-	int ret;
+	int ret, i, n = 0;
 
 	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
 	BUG_ON(!dev_priv->mm.interruptible);
 
-	req = readonly ? obj->last_write_req : obj->last_read_req;
-	if (!req)
+	if (!obj->active)
 		return 0;
 
 	ret = i915_gem_check_wedge(&dev_priv->gpu_error, true);
 	if (ret)
 		return ret;
 
-	ret = i915_gem_check_olr(req);
-	if (ret)
-		return ret;
-
 	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
-	i915_gem_request_reference(req);
+
+	if (readonly) {
+		struct drm_i915_gem_request *rq;
+
+		rq = obj->last_write_req;
+		if (rq == NULL)
+			return 0;
+
+		ret = i915_gem_check_olr(rq);
+		if (ret)
+			goto err;
+
+		requests[n++] = i915_gem_request_reference(rq);
+	} else {
+		for (i = 0; i < I915_NUM_RINGS; i++) {
+			struct drm_i915_gem_request *rq;
+
+			rq = obj->last_read_req[i];
+			if (rq == NULL)
+				continue;
+
+			ret = i915_gem_check_olr(rq);
+			if (ret)
+				goto err;
+
+			requests[n++] = i915_gem_request_reference(rq);
+		}
+	}
+
 	mutex_unlock(&dev->struct_mutex);
-	ret = __i915_wait_request(req, reset_counter, true, NULL, file_priv);
+	for (i = 0; ret == 0 && i < n; i++)
+		ret = __i915_wait_request(requests[i], reset_counter, true,
+					  NULL, file_priv);
 	mutex_lock(&dev->struct_mutex);
-	i915_gem_request_unreference(req);
-	if (ret)
-		return ret;
 
-	return i915_gem_object_wait_rendering__tail(obj);
+err:
+	for (i = 0; i < n; i++) {
+		if (ret == 0)
+			i915_gem_object_retire_request(obj, requests[i]);
+		i915_gem_request_unreference(requests[i]);
+	}
+
+	return ret;
 }
 
 /**
@@ -2204,78 +2309,58 @@ i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
 	return 0;
 }
 
-static void
-i915_gem_object_move_to_active(struct drm_i915_gem_object *obj,
-			       struct intel_engine_cs *ring)
+void i915_vma_move_to_active(struct i915_vma *vma,
+			     struct intel_engine_cs *ring)
 {
-	struct drm_i915_gem_request *req;
-	struct intel_engine_cs *old_ring;
-
-	BUG_ON(ring == NULL);
-
-	req = intel_ring_get_request(ring);
-	old_ring = i915_gem_request_get_ring(obj->last_read_req);
-
-	if (old_ring != ring && obj->last_write_req) {
-		/* Keep the request relative to the current ring */
-		i915_gem_request_assign(&obj->last_write_req, req);
-	}
+	struct drm_i915_gem_object *obj = vma->obj;
 
 	/* Add a reference if we're newly entering the active list. */
-	if (!obj->active) {
+	if (obj->active == 0)
 		drm_gem_object_reference(&obj->base);
-		obj->active = 1;
-	}
+	obj->active |= intel_ring_flag(ring);
 
-	list_move_tail(&obj->ring_list, &ring->active_list);
+	list_move_tail(&obj->ring_list[ring->id], &ring->active_list);
+	i915_gem_request_assign(&obj->last_read_req[ring->id],
+				intel_ring_get_request(ring));
 
-	i915_gem_request_assign(&obj->last_read_req, req);
+	list_move_tail(&vma->mm_list, &vma->vm->active_list);
 }
 
-void i915_vma_move_to_active(struct i915_vma *vma,
-			     struct intel_engine_cs *ring)
+static void
+i915_gem_object_retire__write(struct drm_i915_gem_object *obj)
 {
-	list_move_tail(&vma->mm_list, &vma->vm->active_list);
-	return i915_gem_object_move_to_active(vma->obj, ring);
+	RQ_BUG_ON(obj->last_write_req == NULL);
+	RQ_BUG_ON(!(obj->active & intel_ring_flag(obj->last_write_req->ring)));
+
+	i915_gem_request_assign(&obj->last_write_req, NULL);
+	intel_fb_obj_flush(obj, true);
 }
 
 static void
-i915_gem_object_move_to_inactive(struct drm_i915_gem_object *obj)
+i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
 {
 	struct i915_vma *vma;
 
-	BUG_ON(obj->base.write_domain & ~I915_GEM_GPU_DOMAINS);
-	BUG_ON(!obj->active);
+	RQ_BUG_ON(obj->last_read_req[ring] == NULL);
+	RQ_BUG_ON(!(obj->active & (1 << ring)));
+
+	list_del_init(&obj->ring_list[ring]);
+	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
+
+	if (obj->last_write_req && obj->last_write_req->ring->id == ring)
+		i915_gem_object_retire__write(obj);
+
+	obj->active &= ~(1 << ring);
+	if (obj->active)
+		return;
 
 	list_for_each_entry(vma, &obj->vma_list, vma_link) {
 		if (!list_empty(&vma->mm_list))
 			list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
 	}
 
-	intel_fb_obj_flush(obj, true);
-
-	list_del_init(&obj->ring_list);
-
-	i915_gem_request_assign(&obj->last_read_req, NULL);
-	i915_gem_request_assign(&obj->last_write_req, NULL);
-	obj->base.write_domain = 0;
-
 	i915_gem_request_assign(&obj->last_fenced_req, NULL);
-
-	obj->active = 0;
 	drm_gem_object_unreference(&obj->base);
-
-	WARN_ON(i915_verify_lists(dev));
-}
-
-static void
-i915_gem_object_retire(struct drm_i915_gem_object *obj)
-{
-	if (obj->last_read_req == NULL)
-		return;
-
-	if (i915_gem_request_completed(obj->last_read_req, true))
-		i915_gem_object_move_to_inactive(obj);
 }
 
 static int
@@ -2452,20 +2537,6 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	return 0;
 }
 
-static inline void
-i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
-{
-	struct drm_i915_file_private *file_priv = request->file_priv;
-
-	if (!file_priv)
-		return;
-
-	spin_lock(&file_priv->mm.lock);
-	list_del(&request->client_list);
-	request->file_priv = NULL;
-	spin_unlock(&file_priv->mm.lock);
-}
-
 static bool i915_context_is_banned(struct drm_i915_private *dev_priv,
 				   const struct intel_context *ctx)
 {
@@ -2511,16 +2582,6 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
 	}
 }
 
-static void i915_gem_free_request(struct drm_i915_gem_request *request)
-{
-	list_del(&request->list);
-	i915_gem_request_remove_from_client(request);
-
-	put_pid(request->pid);
-
-	i915_gem_request_unreference(request);
-}
-
 void i915_gem_request_free(struct kref *req_ref)
 {
 	struct drm_i915_gem_request *req = container_of(req_ref,
@@ -2620,9 +2681,9 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 
 		obj = list_first_entry(&ring->active_list,
 				       struct drm_i915_gem_object,
-				       ring_list);
+				       ring_list[ring->id]);
 
-		i915_gem_object_move_to_inactive(obj);
+		i915_gem_object_retire__read(obj, ring->id);
 	}
 
 	/*
@@ -2659,7 +2720,7 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 					   struct drm_i915_gem_request,
 					   list);
 
-		i915_gem_free_request(request);
+		i915_gem_request_retire(request);
 	}
 
 	/* This may not have been flushed before the reset, so clean it now */
@@ -2707,6 +2768,8 @@ void i915_gem_reset(struct drm_device *dev)
 	i915_gem_context_reset(dev);
 
 	i915_gem_restore_fences(dev);
+
+	WARN_ON(i915_verify_lists(dev));
 }
 
 /**
@@ -2715,11 +2778,11 @@ void i915_gem_reset(struct drm_device *dev)
 void
 i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 {
-	if (list_empty(&ring->request_list))
-		return;
-
 	WARN_ON(i915_verify_lists(ring->dev));
 
+	if (list_empty(&ring->active_list))
+		return;
+
 	/* Retire requests first as we use it above for the early return.
 	 * If we retire requests last, we may use a later seqno and so clear
 	 * the requests lists without clearing the active list, leading to
@@ -2735,16 +2798,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 		if (!i915_gem_request_completed(request, true))
 			break;
 
-		trace_i915_gem_request_retire(request);
-
-		/* We know the GPU must have read the request to have
-		 * sent us the seqno + interrupt, so use the position
-		 * of tail of the request to update the last known position
-		 * of the GPU head.
-		 */
-		request->ringbuf->last_retired_head = request->postfix;
-
-		i915_gem_free_request(request);
+		i915_gem_request_retire(request);
 	}
 
 	/* Move any buffers on the active list that are no longer referenced
@@ -2756,12 +2810,12 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 
 		obj = list_first_entry(&ring->active_list,
 				      struct drm_i915_gem_object,
-				      ring_list);
+				      ring_list[ring->id]);
 
-		if (!i915_gem_request_completed(obj->last_read_req, true))
+		if (!list_empty(&obj->last_read_req[ring->id]->list))
 			break;
 
-		i915_gem_object_move_to_inactive(obj);
+		i915_gem_object_retire__read(obj, ring->id);
 	}
 
 	if (unlikely(ring->trace_irq_req &&
@@ -2850,17 +2904,30 @@ i915_gem_idle_work_handler(struct work_struct *work)
 static int
 i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 {
-	struct intel_engine_cs *ring;
-	int ret;
+	int ret, i;
+
+	if (!obj->active)
+		return 0;
 
-	if (obj->active) {
-		ring = i915_gem_request_get_ring(obj->last_read_req);
+	for (i = 0; i < I915_NUM_RINGS; i++) {
+		struct drm_i915_gem_request *rq;
 
-		ret = i915_gem_check_olr(obj->last_read_req);
+		rq = obj->last_read_req[i];
+		if (rq == NULL)
+			continue;
+
+		if (list_empty(&rq->list))
+			goto retire;
+
+		ret = i915_gem_check_olr(rq);
 		if (ret)
 			return ret;
 
-		i915_gem_retire_requests_ring(ring);
+		if (i915_gem_request_completed(rq, true)) {
+			__i915_gem_request_retire__upto(rq);
+retire:
+			i915_gem_object_retire__read(obj, i);
+		}
 	}
 
 	return 0;
@@ -2894,9 +2961,10 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_wait *args = data;
 	struct drm_i915_gem_object *obj;
-	struct drm_i915_gem_request *req;
+	struct drm_i915_gem_request *req[I915_NUM_RINGS];
 	unsigned reset_counter;
-	int ret = 0;
+	int i, n = 0;
+	int ret;
 
 	if (args->flags != 0)
 		return -EINVAL;
@@ -2916,11 +2984,9 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	if (ret)
 		goto out;
 
-	if (!obj->active || !obj->last_read_req)
+	if (!obj->active)
 		goto out;
 
-	req = obj->last_read_req;
-
 	/* Do this after OLR check to make sure we make forward progress polling
 	 * on this IOCTL with a timeout == 0 (like busy ioctl)
 	 */
@@ -2931,13 +2997,23 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 
 	drm_gem_object_unreference(&obj->base);
 	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
-	i915_gem_request_reference(req);
+
+	for (i = 0; i < I915_NUM_RINGS; i++) {
+		if (obj->last_read_req[i] == NULL)
+			continue;
+
+		req[n++] = i915_gem_request_reference(obj->last_read_req[i]);
+	}
+
 	mutex_unlock(&dev->struct_mutex);
 
-	ret = __i915_wait_request(req, reset_counter, true,
-				  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
-				  file->driver_priv);
-	i915_gem_request_unreference__unlocked(req);
+	for (i = 0; i < n; i++) {
+		if (ret == 0)
+			ret = __i915_wait_request(req[i], reset_counter, true,
+						  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
+						  file->driver_priv);
+		i915_gem_request_unreference__unlocked(req[i]);
+	}
 	return ret;
 
 out:
@@ -2946,6 +3022,59 @@ out:
 	return ret;
 }
 
+static int
+__i915_gem_object_sync(struct drm_i915_gem_object *obj,
+		       struct intel_engine_cs *to,
+		       struct drm_i915_gem_request *rq)
+{
+	struct intel_engine_cs *from;
+	int ret;
+
+	if (rq == NULL)
+		return 0;
+
+	from = i915_gem_request_get_ring(rq);
+	if (to == from)
+		return 0;
+
+	if (i915_gem_request_completed(rq, true))
+		return 0;
+
+	ret = i915_gem_check_olr(rq);
+	if (ret)
+		return ret;
+
+	if (!i915_semaphore_is_enabled(obj->base.dev)) {
+		ret = __i915_wait_request(rq,
+					  atomic_read(&to_i915(obj->base.dev)->gpu_error.reset_counter),
+					  to_i915(obj->base.dev)->mm.interruptible, NULL, NULL);
+		if (ret)
+			return ret;
+
+		i915_gem_object_retire_request(obj, rq);
+	} else {
+		int idx = intel_ring_sync_index(from, to);
+		u32 seqno = i915_gem_request_get_seqno(rq);
+
+		if (seqno <= from->semaphore.sync_seqno[idx])
+			return 0;
+
+		trace_i915_gem_ring_sync_to(from, to, rq);
+		ret = to->semaphore.sync_to(to, from, seqno);
+		if (ret)
+			return ret;
+
+		/* We use last_read_req because sync_to()
+		 * might have just caused seqno wrap under
+		 * the radar.
+		 */
+		from->semaphore.sync_seqno[idx] =
+			i915_gem_request_get_seqno(obj->last_read_req[from->id]);
+	}
+
+	return 0;
+}
+
 /**
  * i915_gem_object_sync - sync an object to a ring.
  *
@@ -2954,7 +3083,17 @@ out:
  *
  * This code is meant to abstract object synchronization with the GPU.
  * Calling with NULL implies synchronizing the object with the CPU
- * rather than a particular GPU ring.
+ * rather than a particular GPU ring. Conceptually we serialise writes
+ * between engines inside the GPU. We only allow one engine to write
+ * into a buffer at any time, but multiple readers. To ensure each has
+ * a coherent view of memory, we must:
+ *
+ * - If there is an outstanding write request to the object, the new
+ *   request must wait for it to complete (either CPU or in hw, requests
+ *   on the same ring will be naturally ordered).
+ *
+ * - If we are a write request (pending_write_domain is set), the new
+ *   request must wait for outstanding read requests to complete.
  *
  * Returns 0 if successful, else propagates up the lower layer error.
  */
@@ -2962,39 +3101,25 @@ int
 i915_gem_object_sync(struct drm_i915_gem_object *obj,
 		     struct intel_engine_cs *to)
 {
-	struct intel_engine_cs *from;
-	u32 seqno;
-	int ret, idx;
-
-	from = i915_gem_request_get_ring(obj->last_read_req);
-
-	if (from == NULL || to == from)
-		return 0;
-
-	if (to == NULL || !i915_semaphore_is_enabled(obj->base.dev))
-		return i915_gem_object_wait_rendering(obj, false);
-
-	idx = intel_ring_sync_index(from, to);
+	const bool readonly = obj->base.pending_write_domain == 0;
+	int ret, i;
 
-	seqno = i915_gem_request_get_seqno(obj->last_read_req);
-	/* Optimization: Avoid semaphore sync when we are sure we already
-	 * waited for an object with higher seqno */
-	if (seqno <= from->semaphore.sync_seqno[idx])
+	if (!obj->active)
 		return 0;
 
-	ret = i915_gem_check_olr(obj->last_read_req);
-	if (ret)
-		return ret;
-
-	trace_i915_gem_ring_sync_to(from, to, obj->last_read_req);
-	ret = to->semaphore.sync_to(to, from, seqno);
-	if (!ret)
-		/* We use last_read_req because sync_to()
-		 * might have just caused seqno wrap under
-		 * the radar.
-		 */
-		from->semaphore.sync_seqno[idx] =
-				i915_gem_request_get_seqno(obj->last_read_req);
+	if (to == NULL) {
+		ret = i915_gem_object_wait_rendering(obj, readonly);
+	} else if (readonly) {
+		ret = __i915_gem_object_sync(obj, to,
+					     obj->last_write_req);
+	} else {
+		for (i = 0; i < I915_NUM_RINGS; i++) {
+			ret = __i915_gem_object_sync(obj, to,
+						     obj->last_read_req[i]);
+			if (ret)
+				break;
+		}
+	}
 
 	return ret;
 }
@@ -3081,10 +3206,6 @@ int i915_vma_unbind(struct i915_vma *vma)
 	/* Since the unbound list is global, only move to that list if
 	 * no more VMAs exist. */
 	if (list_empty(&obj->vma_list)) {
-		/* Throw away the active reference before
-		 * moving to the unbound list. */
-		i915_gem_object_retire(obj);
-
 		i915_gem_gtt_finish_object(obj);
 		list_move_tail(&obj->global_list, &dev_priv->mm.unbound_list);
 	}
@@ -3117,6 +3238,7 @@ int i915_gpu_idle(struct drm_device *dev)
 			return ret;
 	}
 
+	WARN_ON(i915_verify_lists(dev));
 	return 0;
 }
 
@@ -3750,8 +3872,6 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 	if (ret)
 		return ret;
 
-	i915_gem_object_retire(obj);
-
 	/* Flush and acquire obj->pages so that we are coherent through
 	 * direct access in memory with previous cached writes through
 	 * shmemfs and that our cache domain tracking remains valid.
@@ -3977,11 +4097,9 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 	bool was_pin_display;
 	int ret;
 
-	if (pipelined != i915_gem_request_get_ring(obj->last_read_req)) {
-		ret = i915_gem_object_sync(obj, pipelined);
-		if (ret)
-			return ret;
-	}
+	ret = i915_gem_object_sync(obj, pipelined);
+	if (ret)
+		return ret;
 
 	/* Mark the pin_display early so that we account for the
 	 * display coherency whilst setting up the cache domains.
@@ -4086,7 +4204,6 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
 	if (ret)
 		return ret;
 
-	i915_gem_object_retire(obj);
 	i915_gem_object_flush_gtt_write_domain(obj);
 
 	old_write_domain = obj->base.write_domain;
@@ -4396,15 +4513,15 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 	 * necessary flushes here.
 	 */
 	ret = i915_gem_object_flush_active(obj);
+	if (ret)
+		goto unref;
 
-	args->busy = obj->active;
-	if (obj->last_read_req) {
-		struct intel_engine_cs *ring;
-		BUILD_BUG_ON(I915_NUM_RINGS > 16);
-		ring = i915_gem_request_get_ring(obj->last_read_req);
-		args->busy |= intel_ring_flag(ring) << 16;
-	}
+	BUILD_BUG_ON(I915_NUM_RINGS > 16);
+	args->busy = obj->active << 16;
+	if (obj->last_write_req)
+		args->busy |= obj->last_write_req->ring->id;
 
+unref:
 	drm_gem_object_unreference(&obj->base);
 unlock:
 	mutex_unlock(&dev->struct_mutex);
@@ -4478,8 +4595,11 @@ unlock:
 void i915_gem_object_init(struct drm_i915_gem_object *obj,
 			  const struct drm_i915_gem_object_ops *ops)
 {
+	int i;
+
 	INIT_LIST_HEAD(&obj->global_list);
-	INIT_LIST_HEAD(&obj->ring_list);
+	for (i = 0; i < I915_NUM_RINGS; i++)
+		INIT_LIST_HEAD(&obj->ring_list[i]);
 	INIT_LIST_HEAD(&obj->obj_exec_link);
 	INIT_LIST_HEAD(&obj->vma_list);
 	INIT_LIST_HEAD(&obj->batch_pool_link);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index f3e84c44d009..18900f745bc6 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -768,8 +768,6 @@ static int do_switch(struct intel_engine_cs *ring,
 		 * swapped, but there is no way to do that yet.
 		 */
 		from->legacy_hw_ctx.rcs_state->dirty = 1;
-		BUG_ON(i915_gem_request_get_ring(
-			from->legacy_hw_ctx.rcs_state->last_read_req) != ring);
 
 		/* obj is kept alive until the next request by its active ref */
 		i915_gem_object_ggtt_unpin(from->legacy_hw_ctx.rcs_state);
diff --git a/drivers/gpu/drm/i915/i915_gem_debug.c b/drivers/gpu/drm/i915/i915_gem_debug.c
index f462d1b51d97..17299d04189f 100644
--- a/drivers/gpu/drm/i915/i915_gem_debug.c
+++ b/drivers/gpu/drm/i915/i915_gem_debug.c
@@ -34,82 +34,34 @@ int
 i915_verify_lists(struct drm_device *dev)
 {
 	static int warned;
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct drm_i915_gem_object *obj;
+	struct intel_engine_cs *ring;
 	int err = 0;
+	int i;
 
 	if (warned)
 		return 0;
 
-	list_for_each_entry(obj, &dev_priv->render_ring.active_list, list) {
-		if (obj->base.dev != dev ||
-		    !atomic_read(&obj->base.refcount.refcount)) {
-			DRM_ERROR("freed render active %p\n", obj);
-			err++;
-			break;
-		} else if (!obj->active ||
-			   (obj->base.read_domains & I915_GEM_GPU_DOMAINS) == 0) {
-			DRM_ERROR("invalid render active %p (a %d r %x)\n",
-				  obj,
-				  obj->active,
-				  obj->base.read_domains);
-			err++;
-		} else if (obj->base.write_domain && list_empty(&obj->gpu_write_list)) {
-			DRM_ERROR("invalid render active %p (w %x, gwl %d)\n",
-				  obj,
-				  obj->base.write_domain,
-				  !list_empty(&obj->gpu_write_list));
-			err++;
-		}
-	}
-
-	list_for_each_entry(obj, &dev_priv->mm.flushing_list, list) {
-		if (obj->base.dev != dev ||
-		    !atomic_read(&obj->base.refcount.refcount)) {
-			DRM_ERROR("freed flushing %p\n", obj);
-			err++;
-			break;
-		} else if (!obj->active ||
-			   (obj->base.write_domain & I915_GEM_GPU_DOMAINS) == 0 ||
-			   list_empty(&obj->gpu_write_list)) {
-			DRM_ERROR("invalid flushing %p (a %d w %x gwl %d)\n",
-				  obj,
-				  obj->active,
-				  obj->base.write_domain,
-				  !list_empty(&obj->gpu_write_list));
-			err++;
-		}
-	}
-
-	list_for_each_entry(obj, &dev_priv->mm.gpu_write_list, gpu_write_list) {
-		if (obj->base.dev != dev ||
-		    !atomic_read(&obj->base.refcount.refcount)) {
-			DRM_ERROR("freed gpu write %p\n", obj);
-			err++;
-			break;
-		} else if (!obj->active ||
-			   (obj->base.write_domain & I915_GEM_GPU_DOMAINS) == 0) {
-			DRM_ERROR("invalid gpu write %p (a %d w %x)\n",
-				  obj,
-				  obj->active,
-				  obj->base.write_domain);
-			err++;
-		}
-	}
-
-	list_for_each_entry(obj, &i915_gtt_vm->inactive_list, list) {
-		if (obj->base.dev != dev ||
-		    !atomic_read(&obj->base.refcount.refcount)) {
-			DRM_ERROR("freed inactive %p\n", obj);
-			err++;
-			break;
-		} else if (obj->pin_count || obj->active ||
-			   (obj->base.write_domain & I915_GEM_GPU_DOMAINS)) {
-			DRM_ERROR("invalid inactive %p (p %d a %d w %x)\n",
-				  obj,
-				  obj->pin_count, obj->active,
-				  obj->base.write_domain);
-			err++;
+	for_each_ring(ring, dev_priv, i) {
+		list_for_each_entry(obj, &ring->active_list, ring_list[ring->id]) {
+			if (obj->base.dev != dev ||
+			    !atomic_read(&obj->base.refcount.refcount)) {
+				DRM_ERROR("%s: freed active obj %p\n",
+					  ring->name, obj);
+				err++;
+				break;
+			} else if (!obj->active ||
+				   obj->last_read_req[ring->id] == NULL) {
+				DRM_ERROR("%s: invalid active obj %p\n",
+					  ring->name, obj);
+				err++;
+			} else if (obj->base.write_domain) {
+				DRM_ERROR("%s: invalid write obj %p (w %x)\n",
+					  ring->name,
+					  obj, obj->base.write_domain);
+				err++;
+			}
 		}
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 1d4e60df8883..5f798961266f 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -192,15 +192,20 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m,
 				struct drm_i915_error_buffer *err,
 				int count)
 {
+	int i;
+
 	err_printf(m, "  %s [%d]:\n", name, count);
 
 	while (count--) {
-		err_printf(m, "    %08x %8u %02x %02x %x %x",
+		err_printf(m, "    %08x %8u %02x %02x [ ",
 			   err->gtt_offset,
 			   err->size,
 			   err->read_domains,
-			   err->write_domain,
-			   err->rseqno, err->wseqno);
+			   err->write_domain);
+		for (i = 0; i < I915_NUM_RINGS; i++)
+			err_printf(m, "%02x ", err->rseqno[i]);
+
+		err_printf(m, "] %02x", err->wseqno);
 		err_puts(m, pin_flag(err->pinned));
 		err_puts(m, tiling_flag(err->tiling));
 		err_puts(m, dirty_flag(err->dirty));
@@ -679,10 +684,12 @@ static void capture_bo(struct drm_i915_error_buffer *err,
 		       struct i915_vma *vma)
 {
 	struct drm_i915_gem_object *obj = vma->obj;
+	int i;
 
 	err->size = obj->base.size;
 	err->name = obj->base.name;
-	err->rseqno = i915_gem_request_get_seqno(obj->last_read_req);
+	for (i = 0; i < I915_NUM_RINGS; i++)
+		err->rseqno[i] = i915_gem_request_get_seqno(obj->last_read_req[i]);
 	err->wseqno = i915_gem_request_get_seqno(obj->last_write_req);
 	err->gtt_offset = vma->node.start;
 	err->read_domains = obj->base.read_domains;
@@ -695,8 +702,8 @@ static void capture_bo(struct drm_i915_error_buffer *err,
 	err->dirty = obj->dirty;
 	err->purgeable = obj->madv != I915_MADV_WILLNEED;
 	err->userptr = obj->userptr.mm != NULL;
-	err->ring = obj->last_read_req ?
-			i915_gem_request_get_ring(obj->last_read_req)->id : -1;
+	err->ring = obj->last_write_req ?
+			i915_gem_request_get_ring(obj->last_write_req)->id : -1;
 	err->cache_level = obj->cache_level;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 1846fb510ebb..83785976aa85 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -10016,7 +10016,7 @@ static bool use_mmio_flip(struct intel_engine_cs *ring,
 	else if (i915.enable_execlists)
 		return true;
 	else
-		return ring != i915_gem_request_get_ring(obj->last_read_req);
+		return ring != i915_gem_request_get_ring(obj->last_write_req);
 }
 
 static void skl_do_mmio_flip(struct intel_crtc *intel_crtc)
@@ -10321,7 +10321,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 	} else if (IS_IVYBRIDGE(dev) || IS_HASWELL(dev)) {
 		ring = &dev_priv->ring[BCS];
 	} else if (INTEL_INFO(dev)->gen >= 7) {
-		ring = i915_gem_request_get_ring(obj->last_read_req);
+		ring = i915_gem_request_get_ring(obj->last_write_req);
 		if (ring == NULL || ring->id != RCS)
 			ring = &dev_priv->ring[BCS];
 	} else {
@@ -10337,7 +10337,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 	 */
 	ret = intel_pin_and_fence_fb_obj(crtc->primary, fb,
 					 crtc->primary->state,
-					 mmio_flip ? i915_gem_request_get_ring(obj->last_read_req) : ring);
+					 mmio_flip ? i915_gem_request_get_ring(obj->last_write_req) : ring);
 	if (ret)
 		goto cleanup_pending;
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 5dacd402975a..8ff8c5326b23 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -634,7 +634,8 @@ static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf,
 {
 	struct intel_engine_cs *ring = ringbuf->ring;
 	struct drm_i915_gem_request *request;
-	int ret, new_space;
+	unsigned space;
+	int ret;
 
 	if (intel_ring_space(ringbuf) >= bytes)
 		return 0;
@@ -645,14 +646,13 @@ static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf,
 		 * from multiple ringbuffers. Here, we must ignore any that
 		 * aren't from the ringbuffer we're considering.
 		 */
-		struct intel_context *ctx = request->ctx;
-		if (ctx->engine[ring->id].ringbuf != ringbuf)
+		if (request->ringbuf != ringbuf)
 			continue;
 
 		/* Would completion of this request free enough space? */
-		new_space = __intel_ring_space(request->postfix, ringbuf->tail,
-				       ringbuf->size);
-		if (new_space >= bytes)
+		space = __intel_ring_space(request->postfix, ringbuf->tail,
+					   ringbuf->size);
+		if (space >= bytes)
 			break;
 	}
 
@@ -663,11 +663,8 @@ static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf,
 	if (ret)
 		return ret;
 
-	i915_gem_retire_requests_ring(ring);
-
-	WARN_ON(intel_ring_space(ringbuf) < new_space);
-
-	return intel_ring_space(ringbuf) >= bytes ? 0 : -ENOSPC;
+	ringbuf->space = space;
+	return 0;
 }
 
 /*
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index d9c2ae504a66..a242178d6792 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2072,15 +2072,16 @@ static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
 {
 	struct intel_ringbuffer *ringbuf = ring->buffer;
 	struct drm_i915_gem_request *request;
-	int ret, new_space;
+	unsigned space;
+	int ret;
 
 	if (intel_ring_space(ringbuf) >= n)
 		return 0;
 
 	list_for_each_entry(request, &ring->request_list, list) {
-		new_space = __intel_ring_space(request->postfix, ringbuf->tail,
-				       ringbuf->size);
-		if (new_space >= n)
+		space = __intel_ring_space(request->postfix, ringbuf->tail,
+					   ringbuf->size);
+		if (space >= n)
 			break;
 	}
 
@@ -2091,10 +2092,7 @@ static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
 	if (ret)
 		return ret;
 
-	i915_gem_retire_requests_ring(ring);
-
-	WARN_ON(intel_ring_space(ringbuf) < new_space);
-
+	ringbuf->space = space;
 	return 0;
 }
 
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [PATCH 19/70] drm/i915: Inline check required for object syncing prior to execbuf
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (17 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 18/70] drm/i915: Implement inter-engine read-read optimisations Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-07 15:20 ` [PATCH 20/70] drm/i915: Limit ring synchronisation (sw semaphores) RPS boosts Chris Wilson
                   ` (40 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC (permalink / raw)
  To: intel-gfx

This trims a little overhead from the common case of not needing to
synchronize between rings.
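
As a sketch of the idea (illustrative names, not the driver's API):
obj->active is kept as a per-ring bitmask, so one cheap mask test
decides whether any other ring still has outstanding work on the
object before we pay for the call into the sync path.

  /* Hedged sketch: `active` holds one bit per ring; the expensive
   * sync path is only needed when a ring other than the one we are
   * submitting on still references the object. */
  static inline bool object_needs_sync(unsigned active, unsigned ring_flag)
  {
          return (active & ~ring_flag) != 0;
  }

With that test inlined into the execbuffer reservation loop, the common
single-ring case costs a single AND plus a branch.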

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index bd7b7ec5c184..1eda0bdc5eab 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -909,9 +909,12 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
 
 	list_for_each_entry(vma, vmas, exec_list) {
 		struct drm_i915_gem_object *obj = vma->obj;
-		ret = i915_gem_object_sync(obj, ring);
-		if (ret)
-			return ret;
+
+		if (obj->active & ~intel_ring_flag(ring)) {
+			ret = i915_gem_object_sync(obj, ring);
+			if (ret)
+				return ret;
+		}
 
 		if (obj->base.write_domain & I915_GEM_DOMAIN_CPU)
 			flush_chipset |= i915_gem_clflush_object(obj, false);
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [PATCH 20/70] drm/i915: Limit ring synchronisation (sw semaphores) RPS boosts
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (18 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 19/70] drm/i915: Inline check required for object syncing prior to execbuf Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-08 11:46   ` Daniel Vetter
  2015-04-07 15:20 ` [PATCH 21/70] drm/i915: Limit mmio flip " Chris Wilson
                   ` (39 subsequent siblings)
  59 siblings, 1 reply; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC (permalink / raw)
  To: intel-gfx

Ring switches can occur many times per frame, and are often out of
control, causing frequent RPS boosting for no practical benefit. Treat
the sw semaphore synchronisation as a separate client and only allow it
to boost once per busy/idle cycle.
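
To picture the once-per-cycle limit (a sketch with assumed semantics,
not the driver code; `link` must be INIT_LIST_HEAD'd at creation): a
client sits on the boost list while its boost is pending and only
becomes eligible again once it is taken off at idle.

  #include <linux/list.h>

  struct rps_client {
          struct list_head link;          /* empty when not boosting */
          unsigned boosts;
  };

  /* Request a boost for a client; a no-op if one is already pending. */
  static void client_boost(struct rps_client *client,
                           struct list_head *clients)
  {
          if (!list_empty(&client->link))
                  return;                 /* one boost per busy cycle */
          list_add(&client->link, clients);
          client->boosts++;
  }

  /* When the GPU idles, every client may boost again. */
  static void clients_idle(struct list_head *clients)
  {
          struct rps_client *c, *n;

          list_for_each_entry_safe(c, n, clients, link)
                  list_del_init(&c->link);
  }

Treating the semaphore wait as one such synthetic client caps it at one
boost per busy span, however many ring switches occur within it.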

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c |  1 +
 drivers/gpu/drm/i915/i915_drv.h     | 34 ++++++++++++++++++----------------
 drivers/gpu/drm/i915/i915_gem.c     |  7 +++++--
 drivers/gpu/drm/i915/intel_pm.c     |  1 +
 4 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 5da74b46e202..c8fe548af41d 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2296,6 +2296,7 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
 			   list_empty(&file_priv->rps_boost) ? "" : ", active");
 		rcu_read_unlock();
 	}
+	seq_printf(m, "Semaphore boosts: %d\n", dev_priv->rps.semaphores.rps_boosts);
 	seq_printf(m, "Kernel boosts: %d\n", dev_priv->rps.boosts);
 
 	mutex_unlock(&dev_priv->rps.hw_lock);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d35778797ef0..057a1346e81f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -268,6 +268,22 @@ struct drm_i915_private;
 struct i915_mm_struct;
 struct i915_mmu_object;
 
+struct drm_i915_file_private {
+	struct drm_i915_private *dev_priv;
+	struct drm_file *file;
+
+	struct {
+		spinlock_t lock;
+		struct list_head request_list;
+	} mm;
+	struct idr context_idr;
+
+	struct list_head rps_boost;
+	struct intel_engine_cs *bsd_ring;
+
+	unsigned rps_boosts;
+};
+
 enum intel_dpll_id {
 	DPLL_ID_PRIVATE = -1, /* non-shared dpll in use */
 	/* real shared dpll ids must be >= 0 */
@@ -1047,6 +1063,8 @@ struct intel_gen6_power_mgmt {
 	struct list_head clients;
 	unsigned boosts;
 
+	struct drm_i915_file_private semaphores;
+
 	/* manual wa residency calculations */
 	struct intel_rps_ei up_ei, down_ei;
 
@@ -2185,22 +2203,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
  * a later patch when the call to i915_seqno_passed() is obsoleted...
  */
 
-struct drm_i915_file_private {
-	struct drm_i915_private *dev_priv;
-	struct drm_file *file;
-
-	struct {
-		spinlock_t lock;
-		struct list_head request_list;
-	} mm;
-	struct idr context_idr;
-
-	struct list_head rps_boost;
-	struct intel_engine_cs *bsd_ring;
-
-	unsigned rps_boosts;
-};
-
 /*
  * A command that requires special handling by the command parser.
  */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a32a84598fac..3d31ff11fbef 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3045,9 +3045,12 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 		return ret;
 
 	if (!i915_semaphore_is_enabled(obj->base.dev)) {
+		struct drm_i915_private *i915 = to_i915(obj->base.dev);
 		ret = __i915_wait_request(rq,
-					  atomic_read(&to_i915(obj->base.dev)->gpu_error.reset_counter),
-					  to_i915(obj->base.dev)->mm.interruptible, NULL, NULL);
+					  atomic_read(&i915->gpu_error.reset_counter),
+					  i915->mm.interruptible,
+					  NULL,
+					  &i915->rps.semaphores);
 		if (ret)
 			return ret;
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index d3f4e9593db1..3e274cf3adaa 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6827,6 +6827,7 @@ void intel_pm_setup(struct drm_device *dev)
 	INIT_DELAYED_WORK(&dev_priv->rps.delayed_resume_work,
 			  intel_gen6_powersave_work);
 	INIT_LIST_HEAD(&dev_priv->rps.clients);
+	INIT_LIST_HEAD(&dev_priv->rps.semaphores.rps_boost);
 
 	dev_priv->pm.suspended = false;
 }
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [PATCH 21/70] drm/i915: Limit mmio flip RPS boosts
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (19 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 20/70] drm/i915: Limit ring synchronisation (sw semaphores) RPS boosts Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-07 15:20 ` [PATCH 22/70] drm/i915: Reduce frequency of unspecific HSW reg debugging Chris Wilson
                   ` (38 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC (permalink / raw)
  To: intel-gfx

Since we often pageflip to an active surface, we will have to wait for
the surface to be written before issuing the flip. We are also likely
to wait on that surface in plenty of time before the vblank. Since we
have a mechanism for boosting when a flip misses the expected vblank,
curtail the number of times we RPS boost when simply waiting for the
mmioflip.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c  | 1 +
 drivers/gpu/drm/i915/i915_drv.h      | 1 +
 drivers/gpu/drm/i915/intel_display.c | 4 +++-
 drivers/gpu/drm/i915/intel_drv.h     | 1 +
 drivers/gpu/drm/i915/intel_pm.c      | 1 +
 5 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index c8fe548af41d..dc5394032077 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2297,6 +2297,7 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
 		rcu_read_unlock();
 	}
 	seq_printf(m, "Semaphore boosts: %d\n", dev_priv->rps.semaphores.rps_boosts);
+	seq_printf(m, "MMIO flip boosts: %d\n", dev_priv->rps.mmioflips.rps_boosts);
 	seq_printf(m, "Kernel boosts: %d\n", dev_priv->rps.boosts);
 
 	mutex_unlock(&dev_priv->rps.hw_lock);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 057a1346e81f..8bb7e66dd4cd 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1064,6 +1064,7 @@ struct intel_gen6_power_mgmt {
 	unsigned boosts;
 
 	struct drm_i915_file_private semaphores;
+	struct drm_i915_file_private mmioflips;
 
 	/* manual wa residency calculations */
 	struct intel_rps_ei up_ei, down_ei;
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 83785976aa85..0c2bb2ce04fc 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -10111,7 +10111,8 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
 	if (mmio_flip->rq)
 		WARN_ON(__i915_wait_request(mmio_flip->rq,
 					    mmio_flip->crtc->reset_counter,
-					    false, NULL, NULL));
+					    false, NULL,
+					    &mmio_flip->i915->rps.mmioflips));
 
 	intel_do_mmio_flip(mmio_flip->crtc);
 
@@ -10132,6 +10133,7 @@ static int intel_queue_mmio_flip(struct drm_device *dev,
 	if (mmio_flip == NULL)
 		return -ENOMEM;
 
+	mmio_flip->i915 = to_i915(dev);
 	mmio_flip->rq = i915_gem_request_reference(obj->last_write_req);
 	mmio_flip->crtc = to_intel_crtc(crtc);
 
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 6163be8be812..160f6a28e9a1 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -404,6 +404,7 @@ struct intel_pipe_wm {
 
 struct intel_mmio_flip {
 	struct work_struct work;
+	struct drm_i915_private *i915;
 	struct drm_i915_gem_request *rq;
 	struct intel_crtc *crtc;
 };
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 3e274cf3adaa..17092897c728 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6828,6 +6828,7 @@ void intel_pm_setup(struct drm_device *dev)
 			  intel_gen6_powersave_work);
 	INIT_LIST_HEAD(&dev_priv->rps.clients);
 	INIT_LIST_HEAD(&dev_priv->rps.semaphores.rps_boost);
+	INIT_LIST_HEAD(&dev_priv->rps.mmioflips.rps_boost);
 
 	dev_priv->pm.suspended = false;
 }
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [PATCH 22/70] drm/i915: Reduce frequency of unspecific HSW reg debugging
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (20 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 21/70] drm/i915: Limit mmio flip " Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-07 15:20 ` [PATCH 23/70] drm/i915: Record ring->start address in error state Chris Wilson
                   ` (37 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Mika Kuoppala, Paulo Zanoni

Delay the expensive read on the FPGA_DBG register from once per mmio to
once per forcewake section when we are doing the general wellbeing
check rather than the targeted error detection. This reduces the
overhead of the debug facility (for example when submitting execlists)
to almost zero whilst keeping the debug checks around.
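
The amortisation can be sketched as below (all names here are
placeholders, not the uncore API): one unclaimed-register read per
forcewake section instead of one per write inside it.

  struct device_state;
  void forcewake_get(struct device_state *dev);
  void forcewake_put(struct device_state *dev);
  void mmio_write(struct device_state *dev);
  void check_unclaimed(struct device_state *dev); /* reads FPGA_DBG */

  static void forcewake_section(struct device_state *dev,
                                unsigned int nr_writes)
  {
          unsigned int i;

          forcewake_get(dev);
          for (i = 0; i < nr_writes; i++)
                  mmio_write(dev);        /* no debug read in hot path */
          check_unclaimed(dev);           /* one read per section */
          forcewake_put(dev);
  }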

v2: Enable one-shot mmio debugging from the interrupt check as well as a
    safeguard to catch invalid display writes from outside the powerwell.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
---
 drivers/gpu/drm/i915/intel_uncore.c | 56 ++++++++++++++++++++-----------------
 1 file changed, 30 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index ab5cc94588e1..0e32bbbcada8 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -149,6 +149,30 @@ fw_domains_put(struct drm_i915_private *dev_priv, enum forcewake_domains fw_doma
 }
 
 static void
+hsw_unclaimed_reg_detect(struct drm_i915_private *dev_priv)
+{
+	static bool mmio_debug_once = true;
+
+	if (i915.mmio_debug || !mmio_debug_once)
+		return;
+
+	if (__raw_i915_read32(dev_priv, FPGA_DBG) & FPGA_DBG_RM_NOCLAIM) {
+		DRM_DEBUG("Unclaimed register detected, "
+			  "enabling oneshot unclaimed register reporting. "
+			  "Please use i915.mmio_debug=N for more information.\n");
+		__raw_i915_write32(dev_priv, FPGA_DBG, FPGA_DBG_RM_NOCLAIM);
+		i915.mmio_debug = mmio_debug_once--;
+	}
+}
+
+static void
+fw_domains_put_debug(struct drm_i915_private *dev_priv, enum forcewake_domains fw_domains)
+{
+	hsw_unclaimed_reg_detect(dev_priv);
+	fw_domains_put(dev_priv, fw_domains);
+}
+
+static void
 fw_domains_posting_read(struct drm_i915_private *dev_priv)
 {
 	struct intel_uncore_forcewake_domain *d;
@@ -561,23 +585,6 @@ hsw_unclaimed_reg_debug(struct drm_i915_private *dev_priv, u32 reg, bool read,
 	}
 }
 
-static void
-hsw_unclaimed_reg_detect(struct drm_i915_private *dev_priv)
-{
-	static bool mmio_debug_once = true;
-
-	if (i915.mmio_debug || !mmio_debug_once)
-		return;
-
-	if (__raw_i915_read32(dev_priv, FPGA_DBG) & FPGA_DBG_RM_NOCLAIM) {
-		DRM_DEBUG("Unclaimed register detected, "
-			  "enabling oneshot unclaimed register reporting. "
-			  "Please use i915.mmio_debug=N for more information.\n");
-		__raw_i915_write32(dev_priv, FPGA_DBG, FPGA_DBG_RM_NOCLAIM);
-		i915.mmio_debug = mmio_debug_once--;
-	}
-}
-
 #define GEN2_READ_HEADER(x) \
 	u##x val = 0; \
 	assert_device_not_suspended(dev_priv);
@@ -829,7 +836,6 @@ hsw_write##x(struct drm_i915_private *dev_priv, off_t reg, u##x val, bool trace)
 		gen6_gt_check_fifodbg(dev_priv); \
 	} \
 	hsw_unclaimed_reg_debug(dev_priv, reg, false, false); \
-	hsw_unclaimed_reg_detect(dev_priv); \
 	GEN6_WRITE_FOOTER; \
 }
 
@@ -871,7 +877,6 @@ gen8_write##x(struct drm_i915_private *dev_priv, off_t reg, u##x val, bool trace
 		__force_wake_get(dev_priv, FORCEWAKE_RENDER); \
 	__raw_i915_write##x(dev_priv, reg, val); \
 	hsw_unclaimed_reg_debug(dev_priv, reg, false, false); \
-	hsw_unclaimed_reg_detect(dev_priv); \
 	GEN6_WRITE_FOOTER; \
 }
 
@@ -1120,6 +1125,10 @@ static void intel_uncore_fw_domains_init(struct drm_device *dev)
 			       FORCEWAKE, FORCEWAKE_ACK);
 	}
 
+	if (HAS_FPGA_DBG_UNCLAIMED(dev) &&
+	    dev_priv->uncore.funcs.force_wake_put == fw_domains_put)
+		dev_priv->uncore.funcs.force_wake_put = fw_domains_put_debug;
+
 	/* All future platforms are expected to require complex power gating */
 	WARN_ON(dev_priv->uncore.fw_domains == 0);
 }
@@ -1411,11 +1420,6 @@ int intel_gpu_reset(struct drm_device *dev)
 
 void intel_uncore_check_errors(struct drm_device *dev)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
-	if (HAS_FPGA_DBG_UNCLAIMED(dev) &&
-	    (__raw_i915_read32(dev_priv, FPGA_DBG) & FPGA_DBG_RM_NOCLAIM)) {
-		DRM_ERROR("Unclaimed register before interrupt\n");
-		__raw_i915_write32(dev_priv, FPGA_DBG, FPGA_DBG_RM_NOCLAIM);
-	}
+	if (HAS_FPGA_DBG_UNCLAIMED(dev))
+		hsw_unclaimed_reg_detect(to_i915(dev));
 }
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [PATCH 23/70] drm/i915: Record ring->start address in error state
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (21 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 22/70] drm/i915: Reduce frequency of unspecific HSW reg debugging Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-08 11:47   ` Daniel Vetter
  2015-04-07 15:20 ` [PATCH 24/70] drm/i915: Use simpler form of spin_lock_irq(execlist_lock) Chris Wilson
                   ` (36 subsequent siblings)
  59 siblings, 1 reply; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC (permalink / raw)
  To: intel-gfx

This is mostly useful for execlists where the rings switch between
contexts (and so checking that the ring's start register matches the
context is important).
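
A hypothetical decode-time check this enables (illustrative, not driver
code): under execlists, the START register captured at hang time should
point at the ring buffer of the context believed to be active.

  /* Sanity check when reading an execlists error state. */
  static inline bool ring_matches_context(u32 ring_start,
                                          u32 ctx_ringbuf_offset)
  {
          return ring_start == ctx_ringbuf_offset;
  }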

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h       |  1 +
 drivers/gpu/drm/i915/i915_gpu_error.c | 10 ++++++----
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8bb7e66dd4cd..d69ccd16cd60 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -471,6 +471,7 @@ struct drm_i915_error_state {
 		u32 semaphore_seqno[I915_NUM_RINGS - 1];
 
 		/* Register state */
+		u32 start;
 		u32 tail;
 		u32 head;
 		u32 ctl;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 5f798961266f..17dc2fcaba10 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -256,10 +256,11 @@ static void i915_ring_error_state(struct drm_i915_error_state_buf *m,
 		return;
 
 	err_printf(m, "%s command stream:\n", ring_str(ring_idx));
-	err_printf(m, "  HEAD: 0x%08x\n", ring->head);
-	err_printf(m, "  TAIL: 0x%08x\n", ring->tail);
-	err_printf(m, "  CTL: 0x%08x\n", ring->ctl);
-	err_printf(m, "  HWS: 0x%08x\n", ring->hws);
+	err_printf(m, "  START: 0x%08x\n", ring->start);
+	err_printf(m, "  HEAD:  0x%08x\n", ring->head);
+	err_printf(m, "  TAIL:  0x%08x\n", ring->tail);
+	err_printf(m, "  CTL:   0x%08x\n", ring->ctl);
+	err_printf(m, "  HWS:   0x%08x\n", ring->hws);
 	err_printf(m, "  ACTHD: 0x%08x %08x\n", (u32)(ring->acthd>>32), (u32)ring->acthd);
 	err_printf(m, "  IPEIR: 0x%08x\n", ring->ipeir);
 	err_printf(m, "  IPEHR: 0x%08x\n", ring->ipehr);
@@ -890,6 +891,7 @@ static void i915_record_ring_state(struct drm_device *dev,
 	ering->instpm = I915_READ(RING_INSTPM(ring->mmio_base));
 	ering->seqno = ring->get_seqno(ring, false);
 	ering->acthd = intel_ring_get_active_head(ring);
+	ering->start = I915_READ_START(ring);
 	ering->head = I915_READ_HEAD(ring);
 	ering->tail = I915_READ_TAIL(ring);
 	ering->ctl = I915_READ_CTL(ring);
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [PATCH 24/70] drm/i915: Use simpler form of spin_lock_irq(execlist_lock)
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (22 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 23/70] drm/i915: Record ring->start address in error state Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-07 15:20 ` [PATCH 25/70] drm/i915: Use the global runtime-pm wakelock for a busy GPU for execlists Chris Wilson
                   ` (35 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC (permalink / raw)
  To: intel-gfx

We can use the simpler spinlock form to disable interrupts as we are
always outside of an irq/softirq handler.
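
For reference, the two locking forms differ only in whether the
caller's interrupt state is saved (a generic sketch, not driver code):

  #include <linux/spinlock.h>

  static DEFINE_SPINLOCK(example_lock);

  /* Caller may already have interrupts off: must save and restore. */
  static void from_any_context(void)
  {
          unsigned long flags;

          spin_lock_irqsave(&example_lock, flags);
          /* ... */
          spin_unlock_irqrestore(&example_lock, flags);
  }

  /* Caller is known to run in process context with interrupts on
   * (e.g. under struct_mutex), so the unconditional form suffices:
   * unlock simply re-enables interrupts. */
  static void from_process_context(void)
  {
          spin_lock_irq(&example_lock);
          /* ... */
          spin_unlock_irq(&example_lock);
  }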

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8ff8c5326b23..ce84aa9811ba 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -501,7 +501,6 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 {
 	struct drm_i915_gem_request *cursor;
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
-	unsigned long flags;
 	int num_elements = 0;
 
 	if (to != ring->default_context)
@@ -528,7 +527,7 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 
 	intel_runtime_pm_get(dev_priv);
 
-	spin_lock_irqsave(&ring->execlist_lock, flags);
+	spin_lock_irq(&ring->execlist_lock);
 
 	list_for_each_entry(cursor, &ring->execlist_queue, execlist_link)
 		if (++num_elements > 2)
@@ -554,7 +553,7 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 	if (num_elements == 0)
 		execlists_context_unqueue(ring);
 
-	spin_unlock_irqrestore(&ring->execlist_lock, flags);
+	spin_unlock_irq(&ring->execlist_lock);
 
 	return 0;
 }
@@ -936,7 +935,6 @@ void intel_execlists_retire_requests(struct intel_engine_cs *ring)
 {
 	struct drm_i915_gem_request *req, *tmp;
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
-	unsigned long flags;
 	struct list_head retired_list;
 
 	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
@@ -944,9 +942,9 @@ void intel_execlists_retire_requests(struct intel_engine_cs *ring)
 		return;
 
 	INIT_LIST_HEAD(&retired_list);
-	spin_lock_irqsave(&ring->execlist_lock, flags);
+	spin_lock_irq(&ring->execlist_lock);
 	list_replace_init(&ring->execlist_retired_req_list, &retired_list);
-	spin_unlock_irqrestore(&ring->execlist_lock, flags);
+	spin_unlock_irq(&ring->execlist_lock);
 
 	list_for_each_entry_safe(req, tmp, &retired_list, execlist_link) {
 		struct intel_context *ctx = req->ctx;
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [PATCH 25/70] drm/i915: Use the global runtime-pm wakelock for a busy GPU for execlists
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (23 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 24/70] drm/i915: Use simpler form of spin_lock_irq(execlist_lock) Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-07 15:20 ` [PATCH 26/70] drm/i915: Map the execlists context regs once during pinning Chris Wilson
                   ` (34 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC (permalink / raw)
  To: intel-gfx

When we submit a request to the GPU, we first take the rpm wakelock, and
only release it once the GPU has been idle for a small period of time
after all requests have completed. This means that we are sure no new
interrupt can arrive whilst we do not hold the rpm wakelock and so can
drop the individual get/put around every single request inside
execlists.
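
The wakelock lifetime then brackets the whole busy period, roughly (a
sketch; `gpu_busy` is an illustrative flag, not a real field):

  static void mark_gpu_busy(struct drm_i915_private *i915)
  {
          if (i915->gpu_busy)
                  return;
          intel_runtime_pm_get(i915);     /* held for the busy span */
          i915->gpu_busy = true;
  }

  static void mark_gpu_idle(struct drm_i915_private *i915)
  {
          /* called only after the post-idle grace period */
          i915->gpu_busy = false;
          intel_runtime_pm_put(i915);     /* no requests, no interrupts */
  }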

Note: to close one potential issue we should mark the GPU as busy
earlier in __i915_add_request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c  | 1 -
 drivers/gpu/drm/i915/intel_lrc.c | 3 ---
 2 files changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3d31ff11fbef..f94fe2ba4f6f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2698,7 +2698,6 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 				struct drm_i915_gem_request,
 				execlist_link);
 		list_del(&submit_req->execlist_link);
-		intel_runtime_pm_put(dev_priv);
 
 		if (submit_req->ctx != ring->default_context)
 			intel_lr_context_unpin(ring, submit_req->ctx);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index ce84aa9811ba..8acfcf39e72d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -525,8 +525,6 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 	}
 	request->tail = tail;
 
-	intel_runtime_pm_get(dev_priv);
-
 	spin_lock_irq(&ring->execlist_lock);
 
 	list_for_each_entry(cursor, &ring->execlist_queue, execlist_link)
@@ -953,7 +951,6 @@ void intel_execlists_retire_requests(struct intel_engine_cs *ring)
 
 		if (ctx_obj && (ctx != ring->default_context))
 			intel_lr_context_unpin(ring, ctx);
-		intel_runtime_pm_put(dev_priv);
 		list_del(&req->execlist_link);
 		i915_gem_request_unreference(req);
 	}
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [PATCH 26/70] drm/i915: Map the execlists context regs once during pinning
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (24 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 25/70] drm/i915: Use the global runtime-pm wakelock for a busy GPU for execlists Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-07 15:20 ` [PATCH 27/70] drm/i915: Remove vestigal DRI1 ring quiescing code Chris Wilson
                   ` (33 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC (permalink / raw)
  To: intel-gfx

When we pin the execlists context on queuing, it is the ideal time to map
the register page that we need to update when we submit the request to
the hardware (and to keep that mapping around for future requests).

This avoids having to do an atomic kmap on every submission. On the
other hand, it does depend upon correct request construction.
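
The trade is a kmap_atomic/kunmap_atomic pair on every submission
versus a single kmap held for the lifetime of the pin; roughly (the
kmap API is real, the helper structure is a sketch):

  /* Before: map and unmap around every tail update. */
  static void update_tail_atomic(struct drm_i915_gem_object *ctx_obj,
                                 u32 tail)
  {
          u32 *regs = kmap_atomic(i915_gem_object_get_page(ctx_obj, 1));

          regs[CTX_RING_TAIL + 1] = tail;
          kunmap_atomic(regs);
  }

  /* After: map once at pin time and cache the pointer... */
  static void pin_ctx_regs(struct intel_ringbuffer *ringbuf,
                           struct drm_i915_gem_object *ctx_obj)
  {
          ringbuf->regs = kmap(i915_gem_object_get_page(ctx_obj, 1));
  }

  /* ...so each submission is a plain store (kunmap at final unpin). */
  static void update_tail_cached(struct intel_ringbuffer *ringbuf, u32 tail)
  {
          ringbuf->regs[CTX_RING_TAIL + 1] = tail;
  }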

v2: Rebase

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c         |  10 --
 drivers/gpu/drm/i915/intel_lrc.c        | 189 ++++++++++++--------------------
 drivers/gpu/drm/i915/intel_lrc.h        |   2 -
 drivers/gpu/drm/i915/intel_ringbuffer.h |   1 +
 4 files changed, 73 insertions(+), 129 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f94fe2ba4f6f..071800553a43 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2589,13 +2589,6 @@ void i915_gem_request_free(struct kref *req_ref)
 	struct intel_context *ctx = req->ctx;
 
 	if (ctx) {
-		if (i915.enable_execlists) {
-			struct intel_engine_cs *ring = req->ring;
-
-			if (ctx != ring->default_context)
-				intel_lr_context_unpin(ring, ctx);
-		}
-
 		i915_gem_context_unreference(ctx);
 	}
 
@@ -2699,9 +2692,6 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 				execlist_link);
 		list_del(&submit_req->execlist_link);
 
-		if (submit_req->ctx != ring->default_context)
-			intel_lr_context_unpin(ring, submit_req->ctx);
-
 		i915_gem_request_unreference(submit_req);
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8acfcf39e72d..4c985e186e3a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -203,9 +203,6 @@ enum {
 };
 #define GEN8_CTX_ID_SHIFT 32
 
-static int intel_lr_context_pin(struct intel_engine_cs *ring,
-		struct intel_context *ctx);
-
 /**
  * intel_sanitize_enable_execlists() - sanitize i915.enable_execlists
  * @dev: DRM device.
@@ -318,47 +315,18 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 }
 
-static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
-				    struct drm_i915_gem_object *ring_obj,
-				    u32 tail)
-{
-	struct page *page;
-	uint32_t *reg_state;
-
-	page = i915_gem_object_get_page(ctx_obj, 1);
-	reg_state = kmap_atomic(page);
-
-	reg_state[CTX_RING_TAIL+1] = tail;
-	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
-
-	kunmap_atomic(reg_state);
-
-	return 0;
-}
-
 static void execlists_submit_contexts(struct intel_engine_cs *ring,
 				      struct intel_context *to0, u32 tail0,
 				      struct intel_context *to1, u32 tail1)
 {
 	struct drm_i915_gem_object *ctx_obj0 = to0->engine[ring->id].state;
-	struct intel_ringbuffer *ringbuf0 = to0->engine[ring->id].ringbuf;
 	struct drm_i915_gem_object *ctx_obj1 = NULL;
-	struct intel_ringbuffer *ringbuf1 = NULL;
-
-	BUG_ON(!ctx_obj0);
-	WARN_ON(!i915_gem_obj_is_pinned(ctx_obj0));
-	WARN_ON(!i915_gem_obj_is_pinned(ringbuf0->obj));
 
-	execlists_update_context(ctx_obj0, ringbuf0->obj, tail0);
+	to0->engine[ring->id].ringbuf->regs[CTX_RING_TAIL+1] = tail0;
 
 	if (to1) {
-		ringbuf1 = to1->engine[ring->id].ringbuf;
+		to1->engine[ring->id].ringbuf->regs[CTX_RING_TAIL+1] = tail1;
 		ctx_obj1 = to1->engine[ring->id].state;
-		BUG_ON(!ctx_obj1);
-		WARN_ON(!i915_gem_obj_is_pinned(ctx_obj1));
-		WARN_ON(!i915_gem_obj_is_pinned(ringbuf1->obj));
-
-		execlists_update_context(ctx_obj1, ringbuf1->obj, tail1);
 	}
 
 	execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
@@ -500,29 +468,17 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 				   struct drm_i915_gem_request *request)
 {
 	struct drm_i915_gem_request *cursor;
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	int num_elements = 0;
 
-	if (to != ring->default_context)
-		intel_lr_context_pin(ring, to);
+	if (WARN_ON(request == NULL))
+		return -ENODEV;
+
+	if (WARN_ON(to->engine[ring->id].pin_count == 0))
+		return -ENODEV;
+
+	i915_gem_request_reference(request);
+	WARN_ON(to != request->ctx);
 
-	if (!request) {
-		/*
-		 * If there isn't a request associated with this submission,
-		 * create one as a temporary holder.
-		 */
-		request = kzalloc(sizeof(*request), GFP_KERNEL);
-		if (request == NULL)
-			return -ENOMEM;
-		request->ring = ring;
-		request->ctx = to;
-		kref_init(&request->ref);
-		request->uniq = dev_priv->request_uniq++;
-		i915_gem_context_reference(request->ctx);
-	} else {
-		i915_gem_request_reference(request);
-		WARN_ON(to != request->ctx);
-	}
 	request->tail = tail;
 
 	spin_lock_irq(&ring->execlist_lock);
@@ -608,16 +564,47 @@ static int execlists_move_to_gpu(struct intel_ringbuffer *ringbuf,
 	return logical_ring_invalidate_all_caches(ringbuf, ctx);
 }
 
+static int intel_lr_context_pin(struct intel_engine_cs *ring,
+				struct intel_context *ctx)
+{
+	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
+	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
+	int ret;
+
+	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
+	if (ctx->engine[ring->id].pin_count++)
+		return 0;
+
+	ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
+	if (ret)
+		goto reset_pin_count;
+
+	ret = intel_pin_and_map_ringbuffer_obj(ring->dev, ringbuf);
+	if (ret)
+		goto unpin_ctx_obj;
+
+	ringbuf->regs = kmap(i915_gem_object_get_page(ctx_obj, 1));
+	ringbuf->regs[CTX_RING_BUFFER_START+1] =
+		i915_gem_obj_ggtt_offset(ringbuf->obj);
+
+	return 0;
+
+unpin_ctx_obj:
+	i915_gem_object_ggtt_unpin(ctx_obj);
+reset_pin_count:
+	ctx->engine[ring->id].pin_count = 0;
+
+	return ret;
+}
+
 int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request,
 					    struct intel_context *ctx)
 {
 	int ret;
 
-	if (ctx != request->ring->default_context) {
-		ret = intel_lr_context_pin(request->ring, ctx);
-		if (ret)
-			return ret;
-	}
+	ret = intel_lr_context_pin(request->ring, ctx);
+	if (ret)
+		return ret;
 
 	request->ringbuf = ctx->engine[request->ring->id].ringbuf;
 	request->ctx     = ctx;
@@ -929,30 +916,42 @@ int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 	return 0;
 }
 
+static void intel_lr_context_unpin(struct intel_engine_cs *ring,
+				   struct intel_context *ctx)
+{
+	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
+	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
+
+	if (--ctx->engine[ring->id].pin_count)
+		return;
+
+	kunmap(i915_gem_object_get_page(ctx_obj, 1));
+	ringbuf->regs = NULL;
+
+	intel_unpin_ringbuffer_obj(ringbuf);
+	i915_gem_object_ggtt_unpin(ctx_obj);
+}
+
 void intel_execlists_retire_requests(struct intel_engine_cs *ring)
 {
-	struct drm_i915_gem_request *req, *tmp;
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
-	struct list_head retired_list;
+	struct list_head list;
 
 	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
 	if (list_empty(&ring->execlist_retired_req_list))
 		return;
 
-	INIT_LIST_HEAD(&retired_list);
 	spin_lock_irq(&ring->execlist_lock);
-	list_replace_init(&ring->execlist_retired_req_list, &retired_list);
+	list_replace_init(&ring->execlist_retired_req_list, &list);
 	spin_unlock_irq(&ring->execlist_lock);
 
-	list_for_each_entry_safe(req, tmp, &retired_list, execlist_link) {
-		struct intel_context *ctx = req->ctx;
-		struct drm_i915_gem_object *ctx_obj =
-				ctx->engine[ring->id].state;
+	while (!list_empty(&list)) {
+		struct drm_i915_gem_request *rq;
+
+		rq = list_first_entry(&list, typeof(*rq), execlist_link);
+		list_del(&rq->execlist_link);
 
-		if (ctx_obj && (ctx != ring->default_context))
-			intel_lr_context_unpin(ring, ctx);
-		list_del(&req->execlist_link);
-		i915_gem_request_unreference(req);
+		intel_lr_context_unpin(ring, rq->ctx);
+		i915_gem_request_unreference(rq);
 	}
 }
 
@@ -995,50 +994,6 @@ int logical_ring_flush_all_caches(struct intel_ringbuffer *ringbuf,
 	return 0;
 }
 
-static int intel_lr_context_pin(struct intel_engine_cs *ring,
-		struct intel_context *ctx)
-{
-	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
-	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
-	int ret = 0;
-
-	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
-	if (ctx->engine[ring->id].pin_count++ == 0) {
-		ret = i915_gem_obj_ggtt_pin(ctx_obj,
-				GEN8_LR_CONTEXT_ALIGN, 0);
-		if (ret)
-			goto reset_pin_count;
-
-		ret = intel_pin_and_map_ringbuffer_obj(ring->dev, ringbuf);
-		if (ret)
-			goto unpin_ctx_obj;
-	}
-
-	return ret;
-
-unpin_ctx_obj:
-	i915_gem_object_ggtt_unpin(ctx_obj);
-reset_pin_count:
-	ctx->engine[ring->id].pin_count = 0;
-
-	return ret;
-}
-
-void intel_lr_context_unpin(struct intel_engine_cs *ring,
-		struct intel_context *ctx)
-{
-	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
-	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
-
-	if (ctx_obj) {
-		WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
-		if (--ctx->engine[ring->id].pin_count == 0) {
-			intel_unpin_ringbuffer_obj(ringbuf);
-			i915_gem_object_ggtt_unpin(ctx_obj);
-		}
-	}
-}
-
 static int intel_logical_ring_workarounds_emit(struct intel_engine_cs *ring,
 					       struct intel_context *ctx)
 {
@@ -1967,7 +1922,7 @@ error_unpin_ctx:
 }
 
 void intel_lr_context_reset(struct drm_device *dev,
-			struct intel_context *ctx)
+			    struct intel_context *ctx)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *ring;
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 04d3a6d8b207..b6fd4c2e8b6e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -70,8 +70,6 @@ static inline void intel_logical_ring_emit(struct intel_ringbuffer *ringbuf,
 void intel_lr_context_free(struct intel_context *ctx);
 int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring);
-void intel_lr_context_unpin(struct intel_engine_cs *ring,
-		struct intel_context *ctx);
 void intel_lr_context_reset(struct drm_device *dev,
 			struct intel_context *ctx);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 39f6dfc0ee54..0f0325e88b5a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -97,6 +97,7 @@ struct intel_ring_hangcheck {
 struct intel_ringbuffer {
 	struct drm_i915_gem_object *obj;
 	void __iomem *virtual_start;
+	uint32_t *regs;
 
 	struct intel_engine_cs *ring;
 
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [PATCH 27/70] drm/i915: Remove vestigal DRI1 ring quiescing code
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (25 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 26/70] drm/i915: Map the execlists context regs once during pinning Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-09 15:02   ` Daniel Vetter
  2015-04-07 15:20 ` [PATCH 28/70] drm/i915: Overhaul execlist submission Chris Wilson
                   ` (32 subsequent siblings)
  59 siblings, 1 reply; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC (permalink / raw)
  To: intel-gfx

After the removal of DRI1, all access to the rings is through requests
and so we can always be sure that there is a request to wait upon to
free up available space. The fallback code only existed so that we could
quiesce the GPU following unmediated access by DRI1.
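
The circular-buffer arithmetic behind this (a sketch matching
__intel_ring_space's semantics, ignoring the reserved slack the driver
subtracts): once the chosen request retires, the head has advanced to
its postfix, reclaiming the gap from the current tail up to that point.

  /* Space made available once the ring head reaches `postfix`. */
  static inline int space_after_wait(int postfix, int tail, int size)
  {
          int space = postfix - tail;

          if (space <= 0)
                  space += size;          /* wrapped past the end */
          return space;
  }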

v2: Rebase

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_trace.h       | 27 ----------------
 drivers/gpu/drm/i915/intel_lrc.c        | 57 +++------------------------------
 drivers/gpu/drm/i915/intel_ringbuffer.c | 56 ++------------------------------
 3 files changed, 6 insertions(+), 134 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index b3070a4501ab..97483e21c9b4 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -597,33 +597,6 @@ DEFINE_EVENT(i915_gem_request, i915_gem_request_wait_end,
 	    TP_ARGS(req)
 );
 
-DECLARE_EVENT_CLASS(i915_ring,
-	    TP_PROTO(struct intel_engine_cs *ring),
-	    TP_ARGS(ring),
-
-	    TP_STRUCT__entry(
-			     __field(u32, dev)
-			     __field(u32, ring)
-			     ),
-
-	    TP_fast_assign(
-			   __entry->dev = ring->dev->primary->index;
-			   __entry->ring = ring->id;
-			   ),
-
-	    TP_printk("dev=%u, ring=%u", __entry->dev, __entry->ring)
-);
-
-DEFINE_EVENT(i915_ring, i915_ring_wait_begin,
-	    TP_PROTO(struct intel_engine_cs *ring),
-	    TP_ARGS(ring)
-);
-
-DEFINE_EVENT(i915_ring, i915_ring_wait_end,
-	    TP_PROTO(struct intel_engine_cs *ring),
-	    TP_ARGS(ring)
-);
-
 TRACE_EVENT(i915_flip_request,
 	    TP_PROTO(int plane, struct drm_i915_gem_object *obj),
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 4c985e186e3a..d1a9701c7f7b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -613,8 +613,9 @@ int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request
 	return 0;
 }
 
-static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf,
-				     int bytes)
+static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf,
+				       struct intel_context *ctx,
+				       int bytes)
 {
 	struct intel_engine_cs *ring = ringbuf->ring;
 	struct drm_i915_gem_request *request;
@@ -640,7 +641,7 @@ static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf,
 			break;
 	}
 
-	if (&request->list == &ring->request_list)
+	if (WARN_ON(&request->list == &ring->request_list))
 		return -ENOSPC;
 
 	ret = i915_wait_request(request);
@@ -675,56 +676,6 @@ intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf,
 	execlists_context_queue(ring, ctx, ringbuf->tail, request);
 }
 
-static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf,
-				       struct intel_context *ctx,
-				       int bytes)
-{
-	struct intel_engine_cs *ring = ringbuf->ring;
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long end;
-	int ret;
-
-	ret = logical_ring_wait_request(ringbuf, bytes);
-	if (ret != -ENOSPC)
-		return ret;
-
-	/* Force the context submission in case we have been skipping it */
-	intel_logical_ring_advance_and_submit(ringbuf, ctx, NULL);
-
-	/* With GEM the hangcheck timer should kick us out of the loop,
-	 * leaving it early runs the risk of corrupting GEM state (due
-	 * to running on almost untested codepaths). But on resume
-	 * timers don't work yet, so prevent a complete hang in that
-	 * case by choosing an insanely large timeout. */
-	end = jiffies + 60 * HZ;
-
-	ret = 0;
-	do {
-		if (intel_ring_space(ringbuf) >= bytes)
-			break;
-
-		msleep(1);
-
-		if (dev_priv->mm.interruptible && signal_pending(current)) {
-			ret = -ERESTARTSYS;
-			break;
-		}
-
-		ret = i915_gem_check_wedge(&dev_priv->gpu_error,
-					   dev_priv->mm.interruptible);
-		if (ret)
-			break;
-
-		if (time_after(jiffies, end)) {
-			ret = -EBUSY;
-			break;
-		}
-	} while (1);
-
-	return ret;
-}
-
 static int logical_ring_wrap_buffer(struct intel_ringbuffer *ringbuf,
 				    struct intel_context *ctx)
 {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index a242178d6792..93788a36db62 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2068,7 +2068,7 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring)
 	ring->buffer = NULL;
 }
 
-static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
+static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
 {
 	struct intel_ringbuffer *ringbuf = ring->buffer;
 	struct drm_i915_gem_request *request;
@@ -2085,7 +2085,7 @@ static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
 			break;
 	}
 
-	if (&request->list == &ring->request_list)
+	if (WARN_ON(&request->list == &ring->request_list))
 		return -ENOSPC;
 
 	ret = i915_wait_request(request);
@@ -2096,58 +2096,6 @@ static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
 	return 0;
 }
 
-static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
-{
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct intel_ringbuffer *ringbuf = ring->buffer;
-	unsigned long end;
-	int ret;
-
-	ret = intel_ring_wait_request(ring, n);
-	if (ret != -ENOSPC)
-		return ret;
-
-	/* force the tail write in case we have been skipping them */
-	__intel_ring_advance(ring);
-
-	/* With GEM the hangcheck timer should kick us out of the loop,
-	 * leaving it early runs the risk of corrupting GEM state (due
-	 * to running on almost untested codepaths). But on resume
-	 * timers don't work yet, so prevent a complete hang in that
-	 * case by choosing an insanely large timeout. */
-	end = jiffies + 60 * HZ;
-
-	ret = 0;
-	trace_i915_ring_wait_begin(ring);
-	do {
-		if (intel_ring_space(ringbuf) >= n)
-			break;
-		ringbuf->head = I915_READ_HEAD(ring);
-		if (intel_ring_space(ringbuf) >= n)
-			break;
-
-		msleep(1);
-
-		if (dev_priv->mm.interruptible && signal_pending(current)) {
-			ret = -ERESTARTSYS;
-			break;
-		}
-
-		ret = i915_gem_check_wedge(&dev_priv->gpu_error,
-					   dev_priv->mm.interruptible);
-		if (ret)
-			break;
-
-		if (time_after(jiffies, end)) {
-			ret = -EBUSY;
-			break;
-		}
-	} while (1);
-	trace_i915_ring_wait_end(ring);
-	return ret;
-}
-
 static int intel_wrap_ring_buffer(struct intel_engine_cs *ring)
 {
 	uint32_t __iomem *virt;
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [PATCH 28/70] drm/i915: Overhaul execlist submission
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (26 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 27/70] drm/i915: Remove vestigal DRI1 ring quiescing code Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-07 15:20 ` [PATCH 29/70] drm/i915: Move the execlists retirement to the right spot Chris Wilson
                   ` (31 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC (permalink / raw)
  To: intel-gfx

The list handling during submission was quite confusing as the retired
requests were out of order - making it much harder in future to reduce
the extra lists. Simplify the submission mechanism to explicitly track
the requests currently on each port, trimming the amount of work
required to track the hardware and making execlists more consistent
with the GEM core.
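
Since the hardware's ELSP takes at most two contexts, the bookkeeping
reduces to a two-slot array mirroring what was last written to the
ports (a sketch with illustrative names):

  #define EXECLIST_PORTS 2

  struct engine_ports {
          struct drm_i915_gem_request *execlist_port[EXECLIST_PORTS];
  };

  /* On a context-switch interrupt port[0] has retired: promote
   * port[1] and submit the next queued request, if any. */
  static void port_complete(struct engine_ports *e,
                            struct drm_i915_gem_request *next)
  {
          e->execlist_port[0] = e->execlist_port[1];
          e->execlist_port[1] = next;
  }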

v2: Rebase

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  21 +--
 drivers/gpu/drm/i915/i915_drv.h         |   5 +-
 drivers/gpu/drm/i915/i915_gem.c         |  17 +-
 drivers/gpu/drm/i915/intel_lrc.c        | 306 ++++++++++++--------------------
 drivers/gpu/drm/i915/intel_lrc.h        |   1 -
 drivers/gpu/drm/i915/intel_ringbuffer.h |   3 +-
 6 files changed, 136 insertions(+), 217 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index dc5394032077..6c147e1bff0c 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1942,8 +1942,7 @@ static void i915_dump_lrc_obj(struct seq_file *m,
 		return;
 	}
 
-	seq_printf(m, "CONTEXT: %s %u\n", ring->name,
-		   intel_execlists_ctx_id(ctx_obj));
+	seq_printf(m, "CONTEXT: %s\n", ring->name);
 
 	if (!i915_gem_obj_ggtt_bound(ctx_obj))
 		seq_puts(m, "\tNot bound in GGTT\n");
@@ -2029,7 +2028,7 @@ static int i915_execlists(struct seq_file *m, void *data)
 	intel_runtime_pm_get(dev_priv);
 
 	for_each_ring(ring, dev_priv, ring_id) {
-		struct drm_i915_gem_request *head_req = NULL;
+		struct drm_i915_gem_request *rq[2];
 		int count = 0;
 		unsigned long flags;
 
@@ -2059,22 +2058,16 @@ static int i915_execlists(struct seq_file *m, void *data)
 		}
 
 		spin_lock_irqsave(&ring->execlist_lock, flags);
+		memcpy(rq, ring->execlist_port, sizeof(rq));
 		list_for_each(cursor, &ring->execlist_queue)
 			count++;
-		head_req = list_first_entry_or_null(&ring->execlist_queue,
-				struct drm_i915_gem_request, execlist_link);
 		spin_unlock_irqrestore(&ring->execlist_lock, flags);
 
 		seq_printf(m, "\t%d requests in queue\n", count);
-		if (head_req) {
-			struct drm_i915_gem_object *ctx_obj;
-
-			ctx_obj = head_req->ctx->engine[ring_id].state;
-			seq_printf(m, "\tHead request id: %u\n",
-				   intel_execlists_ctx_id(ctx_obj));
-			seq_printf(m, "\tHead request tail: %u\n",
-				   head_req->tail);
-		}
+		seq_printf(m, "\tPort[0] seqno: %u\n",
+			   rq[0] ? rq[0]->seqno : 0);
+		seq_printf(m, "\tPort[1] seqno: %u\n",
+			   rq[1] ? rq[1]->seqno : 0);
 
 		seq_putc(m, '\n');
 	}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d69ccd16cd60..b36c97c8c486 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2137,14 +2137,11 @@ struct drm_i915_gem_request {
 
 	/** Execlist link in the submission queue.*/
 	struct list_head execlist_link;
-
-	/** Execlists no. of times this request has been sent to the ELSP */
-	int elsp_submitted;
-
 };
 
 int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct intel_context *ctx);
+void i915_gem_request_retire(struct drm_i915_gem_request *request);
 void i915_gem_request_free(struct kref *req_ref);
 
 static inline uint32_t
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 071800553a43..b64454a64d63 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1354,7 +1354,7 @@ i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
 	spin_unlock(&file_priv->mm.lock);
 }
 
-static void i915_gem_request_retire(struct drm_i915_gem_request *request)
+void i915_gem_request_retire(struct drm_i915_gem_request *request)
 {
 	trace_i915_gem_request_retire(request);
 
@@ -2684,15 +2684,14 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 	 * are the ones that keep the context and ringbuffer backing objects
 	 * pinned in place.
 	 */
-	while (!list_empty(&ring->execlist_queue)) {
-		struct drm_i915_gem_request *submit_req;
-
-		submit_req = list_first_entry(&ring->execlist_queue,
-				struct drm_i915_gem_request,
-				execlist_link);
-		list_del(&submit_req->execlist_link);
+	if (i915.enable_execlists) {
+		spin_lock_irq(&ring->execlist_lock);
+		list_splice_tail_init(&ring->execlist_queue,
+				      &ring->execlist_completed);
+		memset(&ring->execlist_port, 0, sizeof(ring->execlist_port));
+		spin_unlock_irq(&ring->execlist_lock);
 
-		i915_gem_request_unreference(submit_req);
+		intel_execlists_retire_requests(ring);
 	}
 
 	/*
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d1a9701c7f7b..24c367f9fddf 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -230,78 +230,54 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 	return 0;
 }
 
-/**
- * intel_execlists_ctx_id() - get the Execlists Context ID
- * @ctx_obj: Logical Ring Context backing object.
- *
- * Do not confuse with ctx->id! Unfortunately we have a name overload
- * here: the old context ID we pass to userspace as a handler so that
- * they can refer to a context, and the new context ID we pass to the
- * ELSP so that the GPU can inform us of the context status via
- * interrupts.
- *
- * Return: 20-bits globally unique context ID.
- */
-u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj)
-{
-	u32 lrca = i915_gem_obj_ggtt_offset(ctx_obj);
-
-	/* LRCA is required to be 4K aligned so the more significant 20 bits
-	 * are globally unique */
-	return lrca >> 12;
-}
-
-static uint64_t execlists_ctx_descriptor(struct intel_engine_cs *ring,
+static uint32_t execlists_ctx_descriptor(struct intel_engine_cs *ring,
 					 struct drm_i915_gem_object *ctx_obj)
 {
-	struct drm_device *dev = ring->dev;
-	uint64_t desc;
-	uint64_t lrca = i915_gem_obj_ggtt_offset(ctx_obj);
-
-	WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
+	uint32_t desc;
 
 	desc = GEN8_CTX_VALID;
 	desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
 	desc |= GEN8_CTX_L3LLC_COHERENT;
 	desc |= GEN8_CTX_PRIVILEGE;
-	desc |= lrca;
-	desc |= (u64)intel_execlists_ctx_id(ctx_obj) << GEN8_CTX_ID_SHIFT;
+	desc |= i915_gem_obj_ggtt_offset(ctx_obj);
 
 	/* TODO: WaDisableLiteRestore when we start using semaphore
 	 * signalling between Command Streamers */
 	/* desc |= GEN8_CTX_FORCE_RESTORE; */
 
 	/* WaEnableForceRestoreInCtxtDescForVCS:skl */
-	if (IS_GEN9(dev) &&
-	    INTEL_REVID(dev) <= SKL_REVID_B0 &&
+	if (IS_GEN9(ring->dev) && INTEL_REVID(ring->dev) <= SKL_REVID_B0 &&
 	    (ring->id == BCS || ring->id == VCS ||
-	    ring->id == VECS || ring->id == VCS2))
+	     ring->id == VECS || ring->id == VCS2))
 		desc |= GEN8_CTX_FORCE_RESTORE;
 
 	return desc;
 }
 
-static void execlists_elsp_write(struct intel_engine_cs *ring,
-				 struct drm_i915_gem_object *ctx_obj0,
-				 struct drm_i915_gem_object *ctx_obj1)
+static uint32_t execlists_request_write_tail(struct intel_engine_cs *ring,
+					     struct drm_i915_gem_request *rq)
+
 {
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	uint64_t temp = 0;
+	rq->ctx->engine[ring->id].ringbuf->regs[CTX_RING_TAIL+1] = rq->tail;
+	return execlists_ctx_descriptor(ring, rq->ctx->engine[ring->id].state);
+}
+
+static void execlists_submit_pair(struct intel_engine_cs *ring)
+{
+	struct drm_i915_private *dev_priv = to_i915(ring->dev);
 	uint32_t desc[4];
 
-	/* XXX: You must always write both descriptors in the order below. */
-	if (ctx_obj1)
-		temp = execlists_ctx_descriptor(ring, ctx_obj1);
-	else
-		temp = 0;
-	desc[1] = (u32)(temp >> 32);
-	desc[0] = (u32)temp;
+	if (ring->execlist_port[1]) {
+		desc[0] = execlists_request_write_tail(ring,
+						       ring->execlist_port[1]);
+		desc[1] = ring->execlist_port[1]->seqno;
+	} else
+		desc[1] = desc[0] = 0;
 
-	temp = execlists_ctx_descriptor(ring, ctx_obj0);
-	desc[3] = (u32)(temp >> 32);
-	desc[2] = (u32)temp;
+	desc[2] = execlists_request_write_tail(ring, ring->execlist_port[0]);
+	desc[3] = ring->execlist_port[0]->seqno;
 
+	/* Note: You must always write both descriptors in the order below. */
 	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
 	I915_WRITE(RING_ELSP(ring), desc[1]);
 	I915_WRITE(RING_ELSP(ring), desc[0]);
@@ -310,96 +286,82 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 	/* The context is automatically loaded after the following */
 	I915_WRITE(RING_ELSP(ring), desc[2]);
 
-	/* ELSP is a wo register, so use another nearby reg for posting instead */
+	/* ELSP is a wo register, use another nearby reg for posting instead */
 	POSTING_READ(RING_EXECLIST_STATUS(ring));
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 }
 
-static void execlists_submit_contexts(struct intel_engine_cs *ring,
-				      struct intel_context *to0, u32 tail0,
-				      struct intel_context *to1, u32 tail1)
+static void execlists_context_unqueue(struct intel_engine_cs *ring)
 {
-	struct drm_i915_gem_object *ctx_obj0 = to0->engine[ring->id].state;
-	struct drm_i915_gem_object *ctx_obj1 = NULL;
+	struct drm_i915_gem_request *cursor;
+	bool submit = false;
+	int i = 0;
 
-	to0->engine[ring->id].ringbuf->regs[CTX_RING_TAIL+1] = tail0;
+	assert_spin_locked(&ring->execlist_lock);
 
-	if (to1) {
-		to1->engine[ring->id].ringbuf->regs[CTX_RING_TAIL+1] = tail1;
-		ctx_obj1 = to1->engine[ring->id].state;
+	/* Try to read in pairs */
+	cursor = ring->execlist_port[0];
+	if (cursor == NULL)
+		cursor = list_first_entry(&ring->execlist_queue,
+					  typeof(*cursor),
+					  execlist_link);
+	else
+		cursor = list_next_entry(cursor, execlist_link);
+	while (&cursor->execlist_link != &ring->execlist_queue) {
+		/* Same ctx: ignore earlier request, as the
+		 * second request extends the first.
+		 */
+		if (ring->execlist_port[i] &&
+		    cursor->ctx != ring->execlist_port[i]->ctx) {
+			if (++i == ARRAY_SIZE(ring->execlist_port))
+				break;
+		}
+
+		ring->execlist_port[i] = cursor;
+		submit = true;
+
+		cursor = list_next_entry(cursor, execlist_link);
 	}
 
-	execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
+	if (submit)
+		execlists_submit_pair(ring);
 }
 
-static void execlists_context_unqueue(struct intel_engine_cs *ring)
+static bool execlists_complete_requests(struct intel_engine_cs *ring,
+					u32 seqno)
 {
-	struct drm_i915_gem_request *req0 = NULL, *req1 = NULL;
-	struct drm_i915_gem_request *cursor = NULL, *tmp = NULL;
-
 	assert_spin_locked(&ring->execlist_lock);
 
-	if (list_empty(&ring->execlist_queue))
-		return;
-
-	/* Try to read in pairs */
-	list_for_each_entry_safe(cursor, tmp, &ring->execlist_queue,
-				 execlist_link) {
-		if (!req0) {
-			req0 = cursor;
-		} else if (req0->ctx == cursor->ctx) {
-			/* Same ctx: ignore first request, as second request
-			 * will update tail past first request's workload */
-			cursor->elsp_submitted = req0->elsp_submitted;
-			list_del(&req0->execlist_link);
-			list_add_tail(&req0->execlist_link,
-				&ring->execlist_retired_req_list);
-			req0 = cursor;
-		} else {
-			req1 = cursor;
-			break;
-		}
-	}
+	if (seqno == 0)
+		return false;
 
-	WARN_ON(req1 && req1->elsp_submitted);
+	do {
+		struct drm_i915_gem_request *rq;
 
-	execlists_submit_contexts(ring, req0->ctx, req0->tail,
-				  req1 ? req1->ctx : NULL,
-				  req1 ? req1->tail : 0);
+		rq = ring->execlist_port[0];
+		if (rq == NULL)
+			break;
 
-	req0->elsp_submitted++;
-	if (req1)
-		req1->elsp_submitted++;
-}
+		if (!i915_seqno_passed(seqno, rq->seqno))
+			break;
 
-static bool execlists_check_remove_request(struct intel_engine_cs *ring,
-					   u32 request_id)
-{
-	struct drm_i915_gem_request *head_req;
+		do {
+			struct drm_i915_gem_request *prev =
+				list_entry(rq->execlist_link.prev,
+					   typeof(*rq),
+					   execlist_link);
 
-	assert_spin_locked(&ring->execlist_lock);
+			list_move_tail(&rq->execlist_link,
+				       &ring->execlist_completed);
 
-	head_req = list_first_entry_or_null(&ring->execlist_queue,
-					    struct drm_i915_gem_request,
-					    execlist_link);
+			rq = prev;
+		} while (&rq->execlist_link != &ring->execlist_queue);
 
-	if (head_req != NULL) {
-		struct drm_i915_gem_object *ctx_obj =
-				head_req->ctx->engine[ring->id].state;
-		if (intel_execlists_ctx_id(ctx_obj) == request_id) {
-			WARN(head_req->elsp_submitted == 0,
-			     "Never submitted head request\n");
-
-			if (--head_req->elsp_submitted <= 0) {
-				list_del(&head_req->execlist_link);
-				list_add_tail(&head_req->execlist_link,
-					&ring->execlist_retired_req_list);
-				return true;
-			}
-		}
-	}
+		ring->execlist_port[0] = ring->execlist_port[1];
+		ring->execlist_port[1] = NULL;
+	} while (1);
 
-	return false;
+	return ring->execlist_port[1] == NULL;
 }
 
 /**
@@ -411,55 +373,36 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
  */
 void intel_lrc_irq_handler(struct intel_engine_cs *ring)
 {
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
-	u32 status_pointer;
-	u8 read_pointer;
-	u8 write_pointer;
-	u32 status;
-	u32 status_id;
-	u32 submit_contexts = 0;
-
-	status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
-
-	read_pointer = ring->next_context_status_buffer;
-	write_pointer = status_pointer & 0x07;
-	if (read_pointer > write_pointer)
-		write_pointer += 6;
-
-	spin_lock(&ring->execlist_lock);
-
-	while (read_pointer < write_pointer) {
-		read_pointer++;
-		status = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
-				(read_pointer % 6) * 8);
-		status_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
-				(read_pointer % 6) * 8 + 4);
-
-		if (status & GEN8_CTX_STATUS_PREEMPTED) {
-			if (status & GEN8_CTX_STATUS_LITE_RESTORE) {
-				if (execlists_check_remove_request(ring, status_id))
-					WARN(1, "Lite Restored request removed from queue\n");
-			} else
-				WARN(1, "Preemption without Lite Restore\n");
-		}
-
-		 if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) ||
-		     (status & GEN8_CTX_STATUS_ELEMENT_SWITCH)) {
-			if (execlists_check_remove_request(ring, status_id))
-				submit_contexts++;
+	struct drm_i915_private *dev_priv = to_i915(ring->dev);
+	u8 head, tail;
+	u32 seqno = 0;
+
+	head = ring->next_context_status_buffer;
+	tail = I915_READ(RING_CONTEXT_STATUS_PTR(ring)) & 0x7;
+	if (head > tail)
+		tail += 6;
+
+	while (head++ < tail) {
+		u32 reg = RING_CONTEXT_STATUS_BUF(ring) + (head % 6)*8;
+		u32 status = I915_READ(reg);
+		if (unlikely(status & GEN8_CTX_STATUS_PREEMPTED && 0)) {
+			DRM_ERROR("Pre-empted request %x %s Lite Restore\n",
+				  I915_READ(reg + 4),
+				  status & GEN8_CTX_STATUS_LITE_RESTORE ? "with" : "without");
 		}
+		if (status & (GEN8_CTX_STATUS_ACTIVE_IDLE |
+			      GEN8_CTX_STATUS_ELEMENT_SWITCH))
+			seqno = I915_READ(reg + 4);
 	}
 
-	if (submit_contexts != 0)
-		execlists_context_unqueue(ring);
+	ring->next_context_status_buffer = tail % 6;
+	I915_WRITE(RING_CONTEXT_STATUS_PTR(ring),
+		   (u32)ring->next_context_status_buffer << 8);
 
+	spin_lock(&ring->execlist_lock);
+	if (execlists_complete_requests(ring, seqno))
+		execlists_context_unqueue(ring);
 	spin_unlock(&ring->execlist_lock);
-
-	WARN(submit_contexts > 2, "More than two context complete events?\n");
-	ring->next_context_status_buffer = write_pointer % 6;
-
-	I915_WRITE(RING_CONTEXT_STATUS_PTR(ring),
-		   ((u32)ring->next_context_status_buffer & 0x07) << 8);
 }
 
 static int execlists_context_queue(struct intel_engine_cs *ring,
@@ -467,9 +410,6 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 				   u32 tail,
 				   struct drm_i915_gem_request *request)
 {
-	struct drm_i915_gem_request *cursor;
-	int num_elements = 0;
-
 	if (WARN_ON(request == NULL))
 		return -ENODEV;
 
@@ -483,29 +423,11 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 
 	spin_lock_irq(&ring->execlist_lock);
 
-	list_for_each_entry(cursor, &ring->execlist_queue, execlist_link)
-		if (++num_elements > 2)
-			break;
-
-	if (num_elements > 2) {
-		struct drm_i915_gem_request *tail_req;
-
-		tail_req = list_last_entry(&ring->execlist_queue,
-					   struct drm_i915_gem_request,
-					   execlist_link);
-
-		if (to == tail_req->ctx) {
-			WARN(tail_req->elsp_submitted != 0,
-				"More than 2 already-submitted reqs queued\n");
-			list_del(&tail_req->execlist_link);
-			list_add_tail(&tail_req->execlist_link,
-				&ring->execlist_retired_req_list);
-		}
-	}
-
 	list_add_tail(&request->execlist_link, &ring->execlist_queue);
-	if (num_elements == 0)
-		execlists_context_unqueue(ring);
+	if (ring->execlist_port[0] == NULL) {
+		ring->execlist_port[0] = request;
+		execlists_submit_pair(ring);
+	}
 
 	spin_unlock_irq(&ring->execlist_lock);
 
@@ -579,6 +501,11 @@ static int intel_lr_context_pin(struct intel_engine_cs *ring,
 	if (ret)
 		goto reset_pin_count;
 
+	if (WARN_ON(i915_gem_obj_ggtt_offset(ctx_obj) & 0xFFFFFFFF00000FFFULL)) {
+		ret = -ENODEV;
+		goto unpin_ctx_obj;
+	}
+
 	ret = intel_pin_and_map_ringbuffer_obj(ring->dev, ringbuf);
 	if (ret)
 		goto unpin_ctx_obj;
@@ -888,11 +815,11 @@ void intel_execlists_retire_requests(struct intel_engine_cs *ring)
 	struct list_head list;
 
 	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
-	if (list_empty(&ring->execlist_retired_req_list))
+	if (list_empty(&ring->execlist_completed))
 		return;
 
 	spin_lock_irq(&ring->execlist_lock);
-	list_replace_init(&ring->execlist_retired_req_list, &list);
+	list_replace_init(&ring->execlist_completed, &list);
 	spin_unlock_irq(&ring->execlist_lock);
 
 	while (!list_empty(&list)) {
@@ -901,6 +828,9 @@ void intel_execlists_retire_requests(struct intel_engine_cs *ring)
 		rq = list_first_entry(&list, typeof(*rq), execlist_link);
 		list_del(&rq->execlist_link);
 
+		if (!list_empty(&rq->list))
+			i915_gem_request_retire(rq);
+
 		intel_lr_context_unpin(ring, rq->ctx);
 		i915_gem_request_unreference(rq);
 	}
@@ -1303,7 +1233,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	init_waitqueue_head(&ring->irq_queue);
 
 	INIT_LIST_HEAD(&ring->execlist_queue);
-	INIT_LIST_HEAD(&ring->execlist_retired_req_list);
+	INIT_LIST_HEAD(&ring->execlist_completed);
 	spin_lock_init(&ring->execlist_lock);
 
 	ret = i915_cmd_parser_init_ring(ring);
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index b6fd4c2e8b6e..0790e4b26b13 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -82,7 +82,6 @@ int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 			       struct list_head *vmas,
 			       struct drm_i915_gem_object *batch_obj,
 			       u64 exec_start, u32 dispatch_flags);
-u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
 
 void intel_lrc_irq_handler(struct intel_engine_cs *ring);
 void intel_execlists_retire_requests(struct intel_engine_cs *ring);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 0f0325e88b5a..298b0ff46ecb 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -239,8 +239,9 @@ struct  intel_engine_cs {
 
 	/* Execlists */
 	spinlock_t execlist_lock;
+	struct drm_i915_gem_request *execlist_port[2];
 	struct list_head execlist_queue;
-	struct list_head execlist_retired_req_list;
+	struct list_head execlist_completed;
 	u8 next_context_status_buffer;
 	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
 	int		(*emit_request)(struct intel_ringbuffer *ringbuf,
-- 
2.1.4
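
For reference, the two-element execlist_port[] introduced above mirrors
the two ELSP slots: execlists_context_queue()/execlists_context_unqueue()
fill the ports from the queue head, and the interrupt handler retires
every port whose seqno has completed. A minimal userspace model of the
completion half (a sketch only: struct request, seqno_passed() and
complete() are simplified stand-ins for the driver's types, and the
movement onto execlist_completed is omitted):

#include <stdio.h>

struct request { unsigned int seqno; };

static struct request *port[2];	/* models ring->execlist_port[] */

/* Wrap-safe seqno comparison, as i915_seqno_passed() does it. */
static int seqno_passed(unsigned int seq1, unsigned int seq2)
{
	return (int)(seq1 - seq2) >= 0;
}

/* Models execlists_complete_requests(): drop every port slot whose
 * request the hardware has finished, shifting port[1] down; a new
 * pair may be written to the ELSP once port[1] is free again. */
static int complete(unsigned int seqno)
{
	while (port[0] && seqno_passed(seqno, port[0]->seqno)) {
		port[0] = port[1];
		port[1] = NULL;
	}
	return port[1] == NULL;
}

int main(void)
{
	struct request a = { 1 }, b = { 2 };

	port[0] = &a;
	port[1] = &b;
	printf("seqno 0: room to submit? %d\n", complete(0)); /* 0: both busy */
	printf("seqno 1: room to submit? %d\n", complete(1)); /* 1: a retired */
	printf("seqno 2: room to submit? %d\n", complete(2)); /* 1: b retired */
	return 0;
}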


* [PATCH 29/70] drm/i915: Move the execlists retirement to the right spot
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (27 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 28/70] drm/i915: Overhaul execlist submission Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-07 15:20 ` [PATCH 30/70] drm/i915: Map the ringbuffer using WB on LLC machines Chris Wilson
                   ` (30 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC (permalink / raw)
  To: intel-gfx

We want to run the execlists retire-ring callback whilst we retire the
requests on a particular ring. Having done so, we know that the per-ring
request list is the superset of all requests and so can simplify the
is-idle check.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b64454a64d63..5ab974a19779 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2771,6 +2771,9 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 	if (list_empty(&ring->active_list))
 		return;
 
+	if (i915.enable_execlists)
+		intel_execlists_retire_requests(ring);
+
 	/* Retire requests first as we use it above for the early return.
 	 * If we retire requests last, we may use a later seqno and so clear
 	 * the requests lists without clearing the active list, leading to
@@ -2826,15 +2829,6 @@ i915_gem_retire_requests(struct drm_device *dev)
 	for_each_ring(ring, dev_priv, i) {
 		i915_gem_retire_requests_ring(ring);
 		idle &= list_empty(&ring->request_list);
-		if (i915.enable_execlists) {
-			unsigned long flags;
-
-			spin_lock_irqsave(&ring->execlist_lock, flags);
-			idle &= list_empty(&ring->execlist_queue);
-			spin_unlock_irqrestore(&ring->execlist_lock, flags);
-
-			intel_execlists_retire_requests(ring);
-		}
 	}
 
 	if (idle)
-- 
2.1.4


* [PATCH 30/70] drm/i915: Map the ringbuffer using WB on LLC machines
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (28 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 29/70] drm/i915: Move the execlists retirement to the right spot Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-07 15:20 ` [PATCH 31/70] drm/i915: Refactor duplicate object vmap functions Chris Wilson
                   ` (29 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC (permalink / raw)
  To: intel-gfx

If we have LLC coherency, we can write directly into the ringbuffer
using ordinary cached writes rather than forcing WC access.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 60 +++++++++++++++++++++++++++------
 1 file changed, 49 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 93788a36db62..5b837ed842f6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1907,11 +1907,35 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
 
 void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
 {
-	iounmap(ringbuf->virtual_start);
+	if (HAS_LLC(ringbuf->obj->base.dev) && !ringbuf->obj->stolen)
+		vunmap(ringbuf->virtual_start);
+	else
+		iounmap(ringbuf->virtual_start);
 	ringbuf->virtual_start = NULL;
 	i915_gem_object_ggtt_unpin(ringbuf->obj);
 }
 
+static u32 *vmap_obj(struct drm_i915_gem_object *obj)
+{
+	struct sg_page_iter sg_iter;
+	struct page **pages;
+	void *addr;
+	int i;
+
+	pages = drm_malloc_ab(obj->base.size >> PAGE_SHIFT, sizeof(*pages));
+	if (pages == NULL)
+		return NULL;
+
+	i = 0;
+	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0)
+		pages[i++] = sg_page_iter_page(&sg_iter);
+
+	addr = vmap(pages, i, 0, PAGE_KERNEL);
+	drm_free_large(pages);
+
+	return addr;
+}
+
 int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
 				     struct intel_ringbuffer *ringbuf)
 {
@@ -1923,17 +1947,31 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
 	if (ret)
 		return ret;
 
-	ret = i915_gem_object_set_to_gtt_domain(obj, true);
-	if (ret) {
-		i915_gem_object_ggtt_unpin(obj);
-		return ret;
-	}
+	if (HAS_LLC(dev_priv) && !obj->stolen) {
+		ret = i915_gem_object_set_to_cpu_domain(obj, true);
+		if (ret) {
+			i915_gem_object_ggtt_unpin(obj);
+			return ret;
+		}
 
-	ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base +
-			i915_gem_obj_ggtt_offset(obj), ringbuf->size);
-	if (ringbuf->virtual_start == NULL) {
-		i915_gem_object_ggtt_unpin(obj);
-		return -EINVAL;
+		ringbuf->virtual_start = vmap_obj(obj);
+		if (ringbuf->virtual_start == NULL) {
+			i915_gem_object_ggtt_unpin(obj);
+			return -ENOMEM;
+		}
+	} else {
+		ret = i915_gem_object_set_to_gtt_domain(obj, true);
+		if (ret) {
+			i915_gem_object_ggtt_unpin(obj);
+			return ret;
+		}
+
+		ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base +
+						    i915_gem_obj_ggtt_offset(obj), ringbuf->size);
+		if (ringbuf->virtual_start == NULL) {
+			i915_gem_object_ggtt_unpin(obj);
+			return -EINVAL;
+		}
 	}
 
 	return 0;
-- 
2.1.4


* [PATCH 31/70] drm/i915: Refactor duplicate object vmap functions
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (29 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 30/70] drm/i915: Map the ringbuffer using WB on LLC machines Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-07 15:20 ` [PATCH 32/70] drm/i915: Treat ringbuffer writes as writes to normal memory Chris Wilson
                   ` (28 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC (permalink / raw)
  To: intel-gfx

We now have two implementations for vmapping a whole object, one for
dma-buf and one for the ringbuffer. If we couple the vmapping into the
obj->pages lifetime, then we can reuse a single obj->vmapping for both and at
the same time couple it into the shrinker.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h         | 12 ++++---
 drivers/gpu/drm/i915/i915_gem.c         | 41 ++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_dmabuf.c  | 55 +++++----------------------------
 drivers/gpu/drm/i915/intel_ringbuffer.c | 53 ++++++++++---------------------
 4 files changed, 72 insertions(+), 89 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b36c97c8c486..600b6d4a0139 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2004,10 +2004,7 @@ struct drm_i915_gem_object {
 		struct scatterlist *sg;
 		int last;
 	} get_page;
-
-	/* prime dma-buf support */
-	void *dma_buf_vmapping;
-	int vmapping_count;
+	void *vmapping;
 
 	/** Breadcrumb of last rendering to the buffer.
 	 * There can only be one writer, but we allow for multiple readers.
@@ -2706,12 +2703,19 @@ static inline void i915_gem_object_pin_pages(struct drm_i915_gem_object *obj)
 	BUG_ON(obj->pages == NULL);
 	obj->pages_pin_count++;
 }
+
 static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj)
 {
 	BUG_ON(obj->pages_pin_count == 0);
 	obj->pages_pin_count--;
 }
 
+void *__must_check i915_gem_object_pin_vmap(struct drm_i915_gem_object *obj);
+static inline void i915_gem_object_unpin_vmap(struct drm_i915_gem_object *obj)
+{
+	i915_gem_object_unpin_pages(obj);
+}
+
 int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
 int i915_gem_object_sync(struct drm_i915_gem_object *obj,
 			 struct intel_engine_cs *to);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5ab974a19779..1f07cd17be04 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2150,6 +2150,11 @@ i915_gem_object_put_pages(struct drm_i915_gem_object *obj)
 	ops->put_pages(obj);
 	obj->pages = NULL;
 
+	if (obj->vmapping) {
+		vunmap(obj->vmapping);
+		obj->vmapping = NULL;
+	}
+
 	i915_gem_object_invalidate(obj);
 
 	return 0;
@@ -2309,6 +2314,42 @@ i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
 	return 0;
 }
 
+void *i915_gem_object_pin_vmap(struct drm_i915_gem_object *obj)
+{
+	int ret;
+
+	ret = i915_gem_object_get_pages(obj);
+	if (ret)
+		return ERR_PTR(ret);
+
+	i915_gem_object_pin_pages(obj);
+
+	if (obj->vmapping == NULL) {
+		struct sg_page_iter sg_iter;
+		struct page **pages;
+		int n;
+
+		n = obj->base.size >> PAGE_SHIFT;
+		pages = kmalloc(n*sizeof(*pages), GFP_TEMPORARY);
+		if (pages == NULL)
+			pages = drm_malloc_ab(n, sizeof(*pages));
+		if (pages != NULL) {
+			n = 0;
+			for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0)
+				pages[n++] = sg_page_iter_page(&sg_iter);
+
+			obj->vmapping = vmap(pages, n, 0, PAGE_KERNEL);
+			drm_free_large(pages);
+		}
+		if (obj->vmapping == NULL) {
+			i915_gem_object_unpin_pages(obj);
+			return ERR_PTR(-ENOMEM);
+		}
+	}
+
+	return obj->vmapping;
+}
+
 void i915_vma_move_to_active(struct i915_vma *vma,
 			     struct intel_engine_cs *ring)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index 82a1f4b57778..18bdad6a54d2 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -95,14 +95,12 @@ static void i915_gem_unmap_dma_buf(struct dma_buf_attachment *attachment,
 {
 	struct drm_i915_gem_object *obj = dma_buf_to_obj(attachment->dmabuf);
 
-	mutex_lock(&obj->base.dev->struct_mutex);
-
 	dma_unmap_sg(attachment->dev, sg->sgl, sg->nents, dir);
 	sg_free_table(sg);
 	kfree(sg);
 
-	i915_gem_object_unpin_pages(obj);
-
+	mutex_lock(&obj->base.dev->struct_mutex);
+	i915_gem_object_unpin_vmap(obj);
 	mutex_unlock(&obj->base.dev->struct_mutex);
 }
 
@@ -110,51 +108,17 @@ static void *i915_gem_dmabuf_vmap(struct dma_buf *dma_buf)
 {
 	struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
 	struct drm_device *dev = obj->base.dev;
-	struct sg_page_iter sg_iter;
-	struct page **pages;
-	int ret, i;
+	void *addr;
+	int ret;
 
 	ret = i915_mutex_lock_interruptible(dev);
 	if (ret)
 		return ERR_PTR(ret);
 
-	if (obj->dma_buf_vmapping) {
-		obj->vmapping_count++;
-		goto out_unlock;
-	}
-
-	ret = i915_gem_object_get_pages(obj);
-	if (ret)
-		goto err;
-
-	i915_gem_object_pin_pages(obj);
-
-	ret = -ENOMEM;
-
-	pages = drm_malloc_ab(obj->base.size >> PAGE_SHIFT, sizeof(*pages));
-	if (pages == NULL)
-		goto err_unpin;
-
-	i = 0;
-	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0)
-		pages[i++] = sg_page_iter_page(&sg_iter);
-
-	obj->dma_buf_vmapping = vmap(pages, i, 0, PAGE_KERNEL);
-	drm_free_large(pages);
-
-	if (!obj->dma_buf_vmapping)
-		goto err_unpin;
-
-	obj->vmapping_count = 1;
-out_unlock:
+	addr = i915_gem_object_pin_vmap(obj);
 	mutex_unlock(&dev->struct_mutex);
-	return obj->dma_buf_vmapping;
 
-err_unpin:
-	i915_gem_object_unpin_pages(obj);
-err:
-	mutex_unlock(&dev->struct_mutex);
-	return ERR_PTR(ret);
+	return addr;
 }
 
 static void i915_gem_dmabuf_vunmap(struct dma_buf *dma_buf, void *vaddr)
@@ -163,12 +127,7 @@ static void i915_gem_dmabuf_vunmap(struct dma_buf *dma_buf, void *vaddr)
 	struct drm_device *dev = obj->base.dev;
 
 	mutex_lock(&dev->struct_mutex);
-	if (--obj->vmapping_count == 0) {
-		vunmap(obj->dma_buf_vmapping);
-		obj->dma_buf_vmapping = NULL;
-
-		i915_gem_object_unpin_pages(obj);
-	}
+	i915_gem_object_unpin_pages(obj);
 	mutex_unlock(&dev->struct_mutex);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 5b837ed842f6..99a1fdff4924 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1908,34 +1908,12 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
 void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
 {
 	if (HAS_LLC(ringbuf->obj->base.dev) && !ringbuf->obj->stolen)
-		vunmap(ringbuf->virtual_start);
+		i915_gem_object_unpin_vmap(ringbuf->obj);
 	else
 		iounmap(ringbuf->virtual_start);
-	ringbuf->virtual_start = NULL;
 	i915_gem_object_ggtt_unpin(ringbuf->obj);
 }
 
-static u32 *vmap_obj(struct drm_i915_gem_object *obj)
-{
-	struct sg_page_iter sg_iter;
-	struct page **pages;
-	void *addr;
-	int i;
-
-	pages = drm_malloc_ab(obj->base.size >> PAGE_SHIFT, sizeof(*pages));
-	if (pages == NULL)
-		return NULL;
-
-	i = 0;
-	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0)
-		pages[i++] = sg_page_iter_page(&sg_iter);
-
-	addr = vmap(pages, i, 0, PAGE_KERNEL);
-	drm_free_large(pages);
-
-	return addr;
-}
-
 int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
 				     struct intel_ringbuffer *ringbuf)
 {
@@ -1949,32 +1927,33 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
 
 	if (HAS_LLC(dev_priv) && !obj->stolen) {
 		ret = i915_gem_object_set_to_cpu_domain(obj, true);
-		if (ret) {
-			i915_gem_object_ggtt_unpin(obj);
-			return ret;
-		}
+		if (ret)
+			goto unpin;
 
-		ringbuf->virtual_start = vmap_obj(obj);
-		if (ringbuf->virtual_start == NULL) {
-			i915_gem_object_ggtt_unpin(obj);
-			return -ENOMEM;
+		ringbuf->virtual_start = i915_gem_object_pin_vmap(obj);
+		if (IS_ERR(ringbuf->virtual_start)) {
+			ret = PTR_ERR(ringbuf->virtual_start);
+			ringbuf->virtual_start = NULL;
+			goto unpin;
 		}
 	} else {
 		ret = i915_gem_object_set_to_gtt_domain(obj, true);
-		if (ret) {
-			i915_gem_object_ggtt_unpin(obj);
-			return ret;
-		}
+		if (ret)
+			goto unpin;
 
 		ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base +
 						    i915_gem_obj_ggtt_offset(obj), ringbuf->size);
 		if (ringbuf->virtual_start == NULL) {
-			i915_gem_object_ggtt_unpin(obj);
-			return -EINVAL;
+			ret = -ENOMEM;
+			goto unpin;
 		}
 	}
 
 	return 0;
+
+unpin:
+	i915_gem_object_ggtt_unpin(obj);
+	return ret;
 }
 
 void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
-- 
2.1.4
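
With both users converted, callers share one pattern; a sketch of the
expected usage (error handling beyond the mapping itself elided):

void *addr;

addr = i915_gem_object_pin_vmap(obj);	/* pins the pages and builds the
					 * mapping on first use */
if (IS_ERR(addr))
	return PTR_ERR(addr);

/* ... access the object's contents through addr ... */

i915_gem_object_unpin_vmap(obj);	/* drops the pages pin only; the
					 * mapping is kept until put_pages,
					 * which is how the shrinker gets
					 * to reclaim it */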


* [PATCH 32/70] drm/i915: Treat ringbuffer writes as writes to normal memory
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (30 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 31/70] drm/i915: Refactor duplicate object vmap functions Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-07 15:20 ` [PATCH 33/70] drm/i915: Use a separate slab for requests Chris Wilson
                   ` (27 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC (permalink / raw)
  To: intel-gfx

The hardware is documented as treating the TAIL register update as
serialising, so we can relax the barriers when filling the rings.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.h        |  6 +++---
 drivers/gpu/drm/i915/intel_ringbuffer.h | 17 ++++++++++++-----
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 0790e4b26b13..16c717672020 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -52,8 +52,9 @@ int logical_ring_flush_all_caches(struct intel_ringbuffer *ringbuf,
  */
 static inline void intel_logical_ring_advance(struct intel_ringbuffer *ringbuf)
 {
-	ringbuf->tail &= ringbuf->size - 1;
+	intel_ringbuffer_advance(ringbuf);
 }
+
 /**
  * intel_logical_ring_emit() - write a DWORD to the ringbuffer.
  * @ringbuf: Ringbuffer to write to.
@@ -62,8 +63,7 @@ static inline void intel_logical_ring_advance(struct intel_ringbuffer *ringbuf)
 static inline void intel_logical_ring_emit(struct intel_ringbuffer *ringbuf,
 					   u32 data)
 {
-	iowrite32(data, ringbuf->virtual_start + ringbuf->tail);
-	ringbuf->tail += 4;
+	intel_ringbuffer_emit(ringbuf, data);
 }
 
 /* Logical Ring Contexts */
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 298b0ff46ecb..0899123c6bcc 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -404,17 +404,24 @@ int intel_ring_alloc_request_extras(struct drm_i915_gem_request *request);
 
 int __must_check intel_ring_begin(struct intel_engine_cs *ring, int n);
 int __must_check intel_ring_cacheline_align(struct intel_engine_cs *ring);
+static inline void intel_ringbuffer_emit(struct intel_ringbuffer *rb,
+					 u32 data)
+{
+	*(uint32_t __force *)(rb->virtual_start + rb->tail) = data;
+	rb->tail += 4;
+}
+static inline void intel_ringbuffer_advance(struct intel_ringbuffer *rb)
+{
+	rb->tail &= rb->size - 1;
+}
 static inline void intel_ring_emit(struct intel_engine_cs *ring,
 				   u32 data)
 {
-	struct intel_ringbuffer *ringbuf = ring->buffer;
-	iowrite32(data, ringbuf->virtual_start + ringbuf->tail);
-	ringbuf->tail += 4;
+	intel_ringbuffer_emit(ring->buffer, data);
 }
 static inline void intel_ring_advance(struct intel_engine_cs *ring)
 {
-	struct intel_ringbuffer *ringbuf = ring->buffer;
-	ringbuf->tail &= ringbuf->size - 1;
+	intel_ringbuffer_advance(ring->buffer);
 }
 int __intel_ring_space(int head, int tail, int size);
 void intel_ring_update_space(struct intel_ringbuffer *ringbuf);
-- 
2.1.4
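
Callers are unchanged by this; for illustration, the usual emit sequence
(just NOOP padding here) now compiles to plain stores into the WB
mapping, relying on the later TAIL write for ordering:

ret = intel_ring_begin(ring, 2);
if (ret)
	return ret;

intel_ring_emit(ring, MI_NOOP);	/* plain store, no iowrite32 barrier */
intel_ring_emit(ring, MI_NOOP);
intel_ring_advance(ring);

/* The eventual TAIL update is still an uncached MMIO write, which the
 * hardware treats as serialising, so the stores above become visible
 * to the GPU in order. */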


* [PATCH 33/70] drm/i915: Use a separate slab for requests
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (31 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 32/70] drm/i915: Treat ringbuffer writes as writes to normal memory Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-05-22 14:48   ` Robert Beckett
  2015-04-07 15:20 ` [PATCH 34/70] drm/i915: Use a separate slab for vmas Chris Wilson
                   ` (26 subsequent siblings)
  59 siblings, 1 reply; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC (permalink / raw)
  To: intel-gfx

Requests are allocated even more frequently than objects and so benefit
equally from having a dedicated slab.

v2: Rebase

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_dma.c         | 12 ++++++----
 drivers/gpu/drm/i915/i915_drv.h         |  4 +++-
 drivers/gpu/drm/i915/i915_gem.c         | 41 +++++++++++++++++++--------------
 drivers/gpu/drm/i915/intel_ringbuffer.c |  1 -
 4 files changed, 35 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 7b0109e2ab23..135fbcad367f 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1010,8 +1010,10 @@ out_regs:
 put_bridge:
 	pci_dev_put(dev_priv->bridge_dev);
 free_priv:
-	if (dev_priv->slab)
-		kmem_cache_destroy(dev_priv->slab);
+	if (dev_priv->requests)
+		kmem_cache_destroy(dev_priv->requests);
+	if (dev_priv->objects)
+		kmem_cache_destroy(dev_priv->objects);
 	kfree(dev_priv);
 	return ret;
 }
@@ -1094,8 +1096,10 @@ int i915_driver_unload(struct drm_device *dev)
 	if (dev_priv->regs != NULL)
 		pci_iounmap(dev->pdev, dev_priv->regs);
 
-	if (dev_priv->slab)
-		kmem_cache_destroy(dev_priv->slab);
+	if (dev_priv->requests)
+		kmem_cache_destroy(dev_priv->requests);
+	if (dev_priv->objects)
+		kmem_cache_destroy(dev_priv->objects);
 
 	pci_dev_put(dev_priv->bridge_dev);
 	kfree(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 600b6d4a0139..ad08aa532456 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1578,7 +1578,8 @@ struct i915_virtual_gpu {
 
 struct drm_i915_private {
 	struct drm_device *dev;
-	struct kmem_cache *slab;
+	struct kmem_cache *objects;
+	struct kmem_cache *requests;
 
 	const struct intel_device_info info;
 
@@ -2070,6 +2071,7 @@ struct drm_i915_gem_request {
 	struct kref ref;
 
 	/** On Which ring this request was generated */
+	struct drm_i915_private *i915;
 	struct intel_engine_cs *ring;
 
 	/** GEM sequence number associated with this request. */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1f07cd17be04..a4a62592f0f8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -381,13 +381,13 @@ out:
 void *i915_gem_object_alloc(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	return kmem_cache_zalloc(dev_priv->slab, GFP_KERNEL);
+	return kmem_cache_zalloc(dev_priv->objects, GFP_KERNEL);
 }
 
 void i915_gem_object_free(struct drm_i915_gem_object *obj)
 {
 	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
-	kmem_cache_free(dev_priv->slab, obj);
+	kmem_cache_free(dev_priv->objects, obj);
 }
 
 static int
@@ -2633,43 +2633,45 @@ void i915_gem_request_free(struct kref *req_ref)
 		i915_gem_context_unreference(ctx);
 	}
 
-	kfree(req);
+	kmem_cache_free(req->i915->requests, req);
 }
 
 int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct intel_context *ctx)
 {
+	struct drm_i915_private *dev_priv = to_i915(ring->dev);
+	struct drm_i915_gem_request *rq;
 	int ret;
-	struct drm_i915_gem_request *request;
-	struct drm_i915_private *dev_private = ring->dev->dev_private;
 
 	if (ring->outstanding_lazy_request)
 		return 0;
 
-	request = kzalloc(sizeof(*request), GFP_KERNEL);
-	if (request == NULL)
+	rq = kmem_cache_zalloc(dev_priv->requests, GFP_KERNEL);
+	if (rq == NULL)
 		return -ENOMEM;
 
-	ret = i915_gem_get_seqno(ring->dev, &request->seqno);
+	kref_init(&rq->ref);
+	rq->i915 = dev_priv;
+
+	ret = i915_gem_get_seqno(ring->dev, &rq->seqno);
 	if (ret) {
-		kfree(request);
+		kmem_cache_free(dev_priv->requests, rq);
 		return ret;
 	}
 
-	kref_init(&request->ref);
-	request->ring = ring;
-	request->uniq = dev_private->request_uniq++;
+	rq->ring = ring;
+	rq->uniq = dev_priv->request_uniq++;
 
 	if (i915.enable_execlists)
-		ret = intel_logical_ring_alloc_request_extras(request, ctx);
+		ret = intel_logical_ring_alloc_request_extras(rq, ctx);
 	else
-		ret = intel_ring_alloc_request_extras(request);
+		ret = intel_ring_alloc_request_extras(rq);
 	if (ret) {
-		kfree(request);
+		kmem_cache_free(dev_priv->requests, rq);
 		return ret;
 	}
 
-	ring->outstanding_lazy_request = request;
+	ring->outstanding_lazy_request = rq;
 	return 0;
 }
 
@@ -5204,11 +5206,16 @@ i915_gem_load(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int i;
 
-	dev_priv->slab =
+	dev_priv->objects =
 		kmem_cache_create("i915_gem_object",
 				  sizeof(struct drm_i915_gem_object), 0,
 				  SLAB_HWCACHE_ALIGN,
 				  NULL);
+	dev_priv->requests =
+		kmem_cache_create("i915_gem_request",
+				  sizeof(struct drm_i915_gem_request), 0,
+				  SLAB_HWCACHE_ALIGN,
+				  NULL);
 
 	INIT_LIST_HEAD(&dev_priv->vm_list);
 	i915_init_vm(dev_priv, &dev_priv->gtt.base);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 99a1fdff4924..bf7837d30388 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2162,7 +2162,6 @@ int intel_ring_idle(struct intel_engine_cs *ring)
 int intel_ring_alloc_request_extras(struct drm_i915_gem_request *request)
 {
 	request->ringbuf = request->ring->buffer;
-
 	return 0;
 }
 
-- 
2.1.4
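
In isolation, the cache lifecycle looks like the sketch below
(my_request and the helper names are hypothetical, not the driver's).
Note that memory from a dedicated cache must be returned with
kmem_cache_free(), never plain kfree(), which is why the error paths
above free back through the cache:

#include <linux/errno.h>
#include <linux/slab.h>

struct my_request { unsigned int seqno; };

static struct kmem_cache *requests;

static int my_requests_init(void)
{
	requests = kmem_cache_create("my_requests",
				     sizeof(struct my_request), 0,
				     SLAB_HWCACHE_ALIGN, NULL);
	return requests ? 0 : -ENOMEM;
}

static struct my_request *my_request_alloc(void)
{
	return kmem_cache_zalloc(requests, GFP_KERNEL);
}

static void my_request_free(struct my_request *rq)
{
	kmem_cache_free(requests, rq);	/* not kfree(rq) */
}

static void my_requests_fini(void)
{
	kmem_cache_destroy(requests);
}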


* [PATCH 34/70] drm/i915: Use a separate slab for vmas
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (32 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 33/70] drm/i915: Use a separate slab for requests Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-10  8:32   ` Daniel Vetter
  2015-04-07 15:20 ` [PATCH 35/70] drm/i915: Use the new rq->i915 field where appropriate Chris Wilson
                   ` (25 subsequent siblings)
  59 siblings, 1 reply; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC (permalink / raw)
  To: intel-gfx

vmas are allocated more frequently than objects and so should benefit
equally from having a dedicated slab.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_dma.c     | 4 ++++
 drivers/gpu/drm/i915/i915_drv.h     | 1 +
 drivers/gpu/drm/i915/i915_gem.c     | 7 ++++++-
 drivers/gpu/drm/i915/i915_gem_gtt.c | 3 ++-
 4 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 135fbcad367f..9cbc04df94fb 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1012,6 +1012,8 @@ put_bridge:
 free_priv:
 	if (dev_priv->requests)
 		kmem_cache_destroy(dev_priv->requests);
+	if (dev_priv->vmas)
+		kmem_cache_destroy(dev_priv->vmas);
 	if (dev_priv->objects)
 		kmem_cache_destroy(dev_priv->objects);
 	kfree(dev_priv);
@@ -1098,6 +1100,8 @@ int i915_driver_unload(struct drm_device *dev)
 
 	if (dev_priv->requests)
 		kmem_cache_destroy(dev_priv->requests);
+	if (dev_priv->vmas)
+		kmem_cache_destroy(dev_priv->vmas);
 	if (dev_priv->objects)
 		kmem_cache_destroy(dev_priv->objects);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ad08aa532456..2ca11208983e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1579,6 +1579,7 @@ struct i915_virtual_gpu {
 struct drm_i915_private {
 	struct drm_device *dev;
 	struct kmem_cache *objects;
+	struct kmem_cache *vmas;
 	struct kmem_cache *requests;
 
 	const struct intel_device_info info;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a4a62592f0f8..05d7431db4ab 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4832,7 +4832,7 @@ void i915_gem_vma_destroy(struct i915_vma *vma)
 
 	list_del(&vma->vma_link);
 
-	kfree(vma);
+	kmem_cache_free(to_i915(vma->obj->base.dev)->vmas, vma);
 }
 
 static void
@@ -5211,6 +5211,11 @@ i915_gem_load(struct drm_device *dev)
 				  sizeof(struct drm_i915_gem_object), 0,
 				  SLAB_HWCACHE_ALIGN,
 				  NULL);
+	dev_priv->vmas =
+		kmem_cache_create("i915_gem_vma",
+				  sizeof(struct i915_vma), 0,
+				  SLAB_HWCACHE_ALIGN,
+				  NULL);
 	dev_priv->requests =
 		kmem_cache_create("i915_gem_request",
 				  sizeof(struct drm_i915_gem_request), 0,
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index f48d8454f0ef..a9f24236efd9 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2542,7 +2542,8 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 
 	if (WARN_ON(i915_is_ggtt(vm) != !!ggtt_view))
 		return ERR_PTR(-EINVAL);
-	vma = kzalloc(sizeof(*vma), GFP_KERNEL);
+
+	vma = kmem_cache_zalloc(to_i915(obj->base.dev)->vmas, GFP_KERNEL);
 	if (vma == NULL)
 		return ERR_PTR(-ENOMEM);
 
-- 
2.1.4


* [PATCH 35/70] drm/i915: Use the new rq->i915 field where appropriate
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (33 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 34/70] drm/i915: Use a separate slab for vmas Chris Wilson
@ 2015-04-07 15:20 ` Chris Wilson
  2015-04-07 15:21 ` [PATCH 36/70] drm/i915: Reduce the pointer dance of i915_is_ggtt() Chris Wilson
                   ` (24 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:20 UTC (permalink / raw)
  To: intel-gfx

In a few cases, having a direct pointer to the drm_i915_private from the
request is useful.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 11 ++++-------
 drivers/gpu/drm/i915/intel_pm.c |  2 +-
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 05d7431db4ab..796dc69a6c47 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1227,8 +1227,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			struct drm_i915_file_private *file_priv)
 {
 	struct intel_engine_cs *ring = i915_gem_request_get_ring(req);
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_private *dev_priv = req->i915;
 	const bool irq_test_in_progress =
 		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_ring_flag(ring);
 	DEFINE_WAIT(wait);
@@ -1247,7 +1246,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	timeout_expire = timeout ?
 		jiffies + nsecs_to_jiffies_timeout((u64)*timeout) : 0;
 
-	if (INTEL_INFO(dev)->gen >= 6)
+	if (INTEL_INFO(dev_priv)->gen >= 6)
 		gen6_rps_boost(dev_priv, file_priv);
 
 	/* Record current time in case interrupted by signal, or wedged */
@@ -1404,18 +1403,16 @@ __i915_gem_request_retire__upto(struct drm_i915_gem_request *rq)
 int
 i915_wait_request(struct drm_i915_gem_request *req)
 {
-	struct drm_device *dev;
 	struct drm_i915_private *dev_priv;
 	bool interruptible;
 	int ret;
 
 	BUG_ON(req == NULL);
 
-	dev = req->ring->dev;
-	dev_priv = dev->dev_private;
+	dev_priv = req->i915;
 	interruptible = dev_priv->mm.interruptible;
 
-	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
+	BUG_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
 
 	ret = i915_gem_check_wedge(&dev_priv->gpu_error, interruptible);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 17092897c728..a48c65cffb97 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6793,7 +6793,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
 	struct request_boost *boost = container_of(work, struct request_boost, work);
 
 	if (!i915_gem_request_completed(boost->rq, true))
-		gen6_rps_boost(to_i915(boost->rq->ring->dev), NULL);
+		gen6_rps_boost(boost->rq->i915, NULL);
 
 	i915_gem_request_unreference__unlocked(boost->rq);
 	kfree(boost);
-- 
2.1.4


* [PATCH 36/70] drm/i915: Reduce the pointer dance of i915_is_ggtt()
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (34 preceding siblings ...)
  2015-04-07 15:20 ` [PATCH 35/70] drm/i915: Use the new rq->i915 field where appropriate Chris Wilson
@ 2015-04-07 15:21 ` Chris Wilson
  2015-04-07 15:21 ` [PATCH 37/70] drm/i915: Squash more pointer indirection for i915_is_ggtt Chris Wilson
                   ` (23 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:21 UTC (permalink / raw)
  To: intel-gfx

The multiple levels of indirection do nothing but hinder the compiler,
and the pointer chasing turns out to be quite painful but painless to fix.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h     | 4 +---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 1 +
 drivers/gpu/drm/i915/i915_gem_gtt.h | 2 ++
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2ca11208983e..2a5343a9ed24 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2898,9 +2898,7 @@ bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj);
 	(&((struct drm_i915_private *)(obj)->base.dev->dev_private)->gtt.base)
 static inline bool i915_is_ggtt(struct i915_address_space *vm)
 {
-	struct i915_address_space *ggtt =
-		&((struct drm_i915_private *)(vm)->dev->dev_private)->gtt.base;
-	return vm == ggtt;
+	return vm->is_ggtt;
 }
 
 static inline struct i915_hw_ppgtt *
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a9f24236efd9..df1ee971138e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2511,6 +2511,7 @@ int i915_gem_gtt_init(struct drm_device *dev)
 		return ret;
 
 	gtt->base.dev = dev;
+	gtt->base.is_ggtt = true;
 
 	/* GMADR is the PCI mmio aperture into the global GTT. */
 	DRM_INFO("Memory usable by graphics device = %zdM\n",
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index fc03c99317c9..db9ec04d312c 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -235,6 +235,8 @@ struct i915_address_space {
 	unsigned long start;		/* Start offset always 0 for dri2 */
 	size_t total;		/* size addr space maps (ex. 2GB for ggtt) */
 
+	bool is_ggtt;
+
 	struct {
 		dma_addr_t addr;
 		struct page *page;
-- 
2.1.4
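
Side by side (both taken from the diff above), the old check chased
three pointers on every call for a property that is fixed at init:

/* before: vm -> dev -> dev_private -> gtt.base, then compare */
struct i915_address_space *ggtt =
	&((struct drm_i915_private *)(vm)->dev->dev_private)->gtt.base;
return vm == ggtt;

/* after: a single load of a flag set once in i915_gem_gtt_init() */
return vm->is_ggtt;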


* [PATCH 37/70] drm/i915: Squash more pointer indirection for i915_is_ggtt
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (35 preceding siblings ...)
  2015-04-07 15:21 ` [PATCH 36/70] drm/i915: Reduce the pointer dance of i915_is_ggtt() Chris Wilson
@ 2015-04-07 15:21 ` Chris Wilson
  2015-04-07 15:21 ` [PATCH 38/70] drm/i915: Reduce locking in execlist command submission Chris Wilson
                   ` (22 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:21 UTC (permalink / raw)
  To: intel-gfx

12:58 < jlahtine> there're actually equally many i915_is_ggtt(vma->vm)
calls
12:58 < jlahtine> (one less)
12:59 < jlahtine> so while at it I'd make it vm->is_ggtt and
vma->is_ggtt
12:59 < jlahtine> then get rid of the whole helper, maybe
13:00 < ickle> you preempted my beautiful macro
13:03 < ickle> just don't complain about the increased churn

* to be squashed into the previous patch if desired
---
 drivers/gpu/drm/i915/i915_debugfs.c        |  4 ++--
 drivers/gpu/drm/i915/i915_drv.h            |  7 +------
 drivers/gpu/drm/i915/i915_gem.c            | 32 ++++++++++++++----------------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  5 ++---
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 21 ++++++++++----------
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  1 +
 drivers/gpu/drm/i915/i915_gpu_error.c      |  2 +-
 drivers/gpu/drm/i915/i915_trace.h          | 18 ++++++-----------
 8 files changed, 39 insertions(+), 51 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 6c147e1bff0c..2e851c6a310c 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -156,7 +156,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 	if (obj->fence_reg != I915_FENCE_REG_NONE)
 		seq_printf(m, " (fence: %d)", obj->fence_reg);
 	list_for_each_entry(vma, &obj->vma_list, vma_link) {
-		if (!i915_is_ggtt(vma->vm))
+		if (!vma->is_ggtt)
 			seq_puts(m, " (pp");
 		else
 			seq_puts(m, " (g");
@@ -335,7 +335,7 @@ static int per_file_stats(int id, void *ptr, void *data)
 			if (!drm_mm_node_allocated(&vma->node))
 				continue;
 
-			if (i915_is_ggtt(vma->vm)) {
+			if (vma->is_ggtt) {
 				stats->global += obj->base.size;
 				continue;
 			}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2a5343a9ed24..0dbc7d69f148 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2896,16 +2896,11 @@ bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj);
 /* Some GGTT VM helpers */
 #define i915_obj_to_ggtt(obj) \
 	(&((struct drm_i915_private *)(obj)->base.dev->dev_private)->gtt.base)
-static inline bool i915_is_ggtt(struct i915_address_space *vm)
-{
-	return vm->is_ggtt;
-}
 
 static inline struct i915_hw_ppgtt *
 i915_vm_to_ppgtt(struct i915_address_space *vm)
 {
-	WARN_ON(i915_is_ggtt(vm));
-
+	WARN_ON(vm->is_ggtt);
 	return container_of(vm, struct i915_hw_ppgtt, base);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 796dc69a6c47..36add864593a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3200,8 +3200,7 @@ int i915_vma_unbind(struct i915_vma *vma)
 	 * cause memory corruption through use-after-free.
 	 */
 
-	if (i915_is_ggtt(vma->vm) &&
-	    vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL) {
+	if (vma->is_ggtt && vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL) {
 		i915_gem_object_finish_gtt(obj);
 
 		/* release the fence reg _after_ flushing */
@@ -3215,7 +3214,7 @@ int i915_vma_unbind(struct i915_vma *vma)
 	vma->unbind_vma(vma);
 
 	list_del_init(&vma->mm_list);
-	if (i915_is_ggtt(vma->vm)) {
+	if (vma->is_ggtt) {
 		if (vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL) {
 			obj->map_and_fenceable = false;
 		} else if (vma->ggtt_view.pages) {
@@ -3658,7 +3657,7 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 	struct i915_vma *vma;
 	int ret;
 
-	if(WARN_ON(i915_is_ggtt(vm) != !!ggtt_view))
+	if (WARN_ON(vm->is_ggtt != !!ggtt_view))
 		return ERR_PTR(-EINVAL);
 
 	fence_size = i915_gem_get_gtt_size(dev,
@@ -3756,8 +3755,7 @@ search_free:
 
 	/*  allocate before insert / bind */
 	if (vma->vm->allocate_va_range) {
-		trace_i915_va_alloc(vma->vm, vma->node.start, vma->node.size,
-				VM_TO_TRACE_NAME(vma->vm));
+		trace_i915_va_alloc(vma->vm, vma->node.start, vma->node.size);
 		ret = vma->vm->allocate_va_range(vma->vm,
 						vma->node.start,
 						vma->node.size);
@@ -4360,13 +4358,13 @@ i915_gem_object_do_pin(struct drm_i915_gem_object *obj,
 	if (WARN_ON(vm == &dev_priv->mm.aliasing_ppgtt->base))
 		return -ENODEV;
 
-	if (WARN_ON(flags & (PIN_GLOBAL | PIN_MAPPABLE) && !i915_is_ggtt(vm)))
+	if (WARN_ON(flags & (PIN_GLOBAL | PIN_MAPPABLE) && !vm->is_ggtt))
 		return -EINVAL;
 
 	if (WARN_ON((flags & (PIN_MAPPABLE | PIN_GLOBAL)) == PIN_MAPPABLE))
 		return -EINVAL;
 
-	if (WARN_ON(i915_is_ggtt(vm) != !!ggtt_view))
+	if (WARN_ON(vm->is_ggtt != !!ggtt_view))
 		return -EINVAL;
 
 	vma = ggtt_view ? i915_gem_obj_to_ggtt_view(obj, ggtt_view) :
@@ -4456,7 +4454,7 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj,
 		    uint64_t flags)
 {
 	return i915_gem_object_do_pin(obj, vm,
-				      i915_is_ggtt(vm) ? &i915_ggtt_view_normal : NULL,
+				      vm->is_ggtt ? &i915_ggtt_view_normal : NULL,
 				      size, alignment, flags);
 }
 
@@ -4788,7 +4786,7 @@ struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
 {
 	struct i915_vma *vma;
 	list_for_each_entry(vma, &obj->vma_list, vma_link) {
-		if (i915_is_ggtt(vma->vm) &&
+		if (vma->is_ggtt &&
 		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
 			continue;
 		if (vma->vm == vm)
@@ -4824,7 +4822,7 @@ void i915_gem_vma_destroy(struct i915_vma *vma)
 
 	vm = vma->vm;
 
-	if (!i915_is_ggtt(vm))
+	if (!vm->is_ggtt)
 		i915_ppgtt_put(i915_vm_to_ppgtt(vm));
 
 	list_del(&vma->vma_link);
@@ -5188,7 +5186,7 @@ init_ring_lists(struct intel_engine_cs *ring)
 void i915_init_vm(struct drm_i915_private *dev_priv,
 		  struct i915_address_space *vm)
 {
-	if (!i915_is_ggtt(vm))
+	if (!vm->is_ggtt)
 		drm_mm_init(&vm->mm, vm->start, vm->total);
 	vm->dev = dev_priv->dev;
 	INIT_LIST_HEAD(&vm->active_list);
@@ -5353,7 +5351,7 @@ i915_gem_obj_offset(struct drm_i915_gem_object *o,
 	WARN_ON(vm == &dev_priv->mm.aliasing_ppgtt->base);
 
 	list_for_each_entry(vma, &o->vma_list, vma_link) {
-		if (i915_is_ggtt(vma->vm) &&
+		if (vma->is_ggtt &&
 		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
 			continue;
 		if (vma->vm == vm)
@@ -5361,7 +5359,7 @@ i915_gem_obj_offset(struct drm_i915_gem_object *o,
 	}
 
 	WARN(1, "%s vma for this object not found.\n",
-	     i915_is_ggtt(vm) ? "global" : "ppgtt");
+	     vm->is_ggtt ? "global" : "ppgtt");
 	return -1;
 }
 
@@ -5387,7 +5385,7 @@ bool i915_gem_obj_bound(struct drm_i915_gem_object *o,
 	struct i915_vma *vma;
 
 	list_for_each_entry(vma, &o->vma_list, vma_link) {
-		if (i915_is_ggtt(vma->vm) &&
+		if (vma->is_ggtt &&
 		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
 			continue;
 		if (vma->vm == vm && drm_mm_node_allocated(&vma->node))
@@ -5434,7 +5432,7 @@ unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o,
 	BUG_ON(list_empty(&o->vma_list));
 
 	list_for_each_entry(vma, &o->vma_list, vma_link) {
-		if (i915_is_ggtt(vma->vm) &&
+		if (vma->is_ggtt &&
 		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
 			continue;
 		if (vma->vm == vm)
@@ -5447,7 +5445,7 @@ bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj)
 {
 	struct i915_vma *vma;
 	list_for_each_entry(vma, &obj->vma_list, vma_link) {
-		if (i915_is_ggtt(vma->vm) &&
+		if (vma->is_ggtt &&
 		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
 			continue;
 		if (vma->pin_count > 0)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 1eda0bdc5eab..5f735b491e2f 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -642,7 +642,7 @@ need_reloc_mappable(struct i915_vma *vma)
 	if (entry->relocation_count == 0)
 		return false;
 
-	if (!i915_is_ggtt(vma->vm))
+	if (!vma->is_ggtt)
 		return false;
 
 	/* See also use_cpu_reloc() */
@@ -661,8 +661,7 @@ eb_vma_misplaced(struct i915_vma *vma)
 	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
 	struct drm_i915_gem_object *obj = vma->obj;
 
-	WARN_ON(entry->flags & __EXEC_OBJECT_NEEDS_MAP &&
-	       !i915_is_ggtt(vma->vm));
+	WARN_ON(entry->flags & __EXEC_OBJECT_NEEDS_MAP && !vma->is_ggtt);
 
 	if (entry->alignment &&
 	    vma->node.start & (entry->alignment - 1))
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index df1ee971138e..85077beb9338 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1703,7 +1703,7 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 					container_of(vm, struct i915_hw_ppgtt,
 						     base);
 
-			if (i915_is_ggtt(vm))
+			if (vm->is_ggtt)
 				ppgtt = dev_priv->mm.aliasing_ppgtt;
 
 			gen6_write_page_range(dev_priv, &ppgtt->pd,
@@ -1881,7 +1881,7 @@ static void i915_ggtt_bind_vma(struct i915_vma *vma,
 	unsigned int flags = (cache_level == I915_CACHE_NONE) ?
 		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
 
-	BUG_ON(!i915_is_ggtt(vma->vm));
+	BUG_ON(!vma->is_ggtt);
 	intel_gtt_insert_sg_entries(vma->ggtt_view.pages, entry, flags);
 	vma->bound = GLOBAL_BIND;
 }
@@ -1901,7 +1901,7 @@ static void i915_ggtt_unbind_vma(struct i915_vma *vma)
 	const unsigned int first = vma->node.start >> PAGE_SHIFT;
 	const unsigned int size = vma->obj->base.size >> PAGE_SHIFT;
 
-	BUG_ON(!i915_is_ggtt(vma->vm));
+	BUG_ON(!vma->is_ggtt);
 	vma->bound = 0;
 	intel_gtt_clear_range(first, size);
 }
@@ -1919,7 +1919,7 @@ static void ggtt_bind_vma(struct i915_vma *vma,
 	if (obj->gt_ro)
 		flags |= PTE_READ_ONLY;
 
-	if (i915_is_ggtt(vma->vm))
+	if (vma->is_ggtt)
 		pages = vma->ggtt_view.pages;
 
 	/* If there is no aliasing PPGTT, or the caller needs a global mapping,
@@ -2541,7 +2541,7 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 {
 	struct i915_vma *vma;
 
-	if (WARN_ON(i915_is_ggtt(vm) != !!ggtt_view))
+	if (WARN_ON(vm->is_ggtt != !!ggtt_view))
 		return ERR_PTR(-EINVAL);
 
 	vma = kmem_cache_zalloc(to_i915(obj->base.dev)->vmas, GFP_KERNEL);
@@ -2553,9 +2553,10 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 	INIT_LIST_HEAD(&vma->exec_list);
 	vma->vm = vm;
 	vma->obj = obj;
+	vma->is_ggtt = vm->is_ggtt;
 
 	if (INTEL_INFO(vm->dev)->gen >= 6) {
-		if (i915_is_ggtt(vm)) {
+		if (vm->is_ggtt) {
 			vma->ggtt_view = *ggtt_view;
 
 			vma->unbind_vma = ggtt_unbind_vma;
@@ -2565,14 +2566,14 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 			vma->bind_vma = ppgtt_bind_vma;
 		}
 	} else {
-		BUG_ON(!i915_is_ggtt(vm));
+		BUG_ON(!vm->is_ggtt);
 		vma->ggtt_view = *ggtt_view;
 		vma->unbind_vma = i915_ggtt_unbind_vma;
 		vma->bind_vma = i915_ggtt_bind_vma;
 	}
 
 	list_add_tail(&vma->vma_link, &obj->vma_list);
-	if (!i915_is_ggtt(vm))
+	if (!vm->is_ggtt)
 		i915_ppgtt_get(i915_vm_to_ppgtt(vm));
 
 	return vma;
@@ -2587,7 +2588,7 @@ i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
 	vma = i915_gem_obj_to_vma(obj, vm);
 	if (!vma)
 		vma = __i915_gem_vma_create(obj, vm,
-					    i915_is_ggtt(vm) ? &i915_ggtt_view_normal : NULL);
+					    vm->is_ggtt ? &i915_ggtt_view_normal : NULL);
 
 	return vma;
 }
@@ -2758,7 +2759,7 @@ i915_get_ggtt_vma_pages(struct i915_vma *vma)
 int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
 		  u32 flags)
 {
-	if (i915_is_ggtt(vma->vm)) {
+	if (vma->is_ggtt) {
 		int ret = i915_get_ggtt_vma_pages(vma);
 
 		if (ret)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index db9ec04d312c..4e6cdaba2569 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -160,6 +160,7 @@ struct i915_vma {
 #define LOCAL_BIND	(1<<1)
 #define PTE_READ_ONLY	(1<<2)
 	unsigned int bound : 4;
+	unsigned is_ggtt : 1;
 
 	/**
 	 * Support different GGTT views into the same object.
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 17dc2fcaba10..8832f1b2a495 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -606,7 +606,7 @@ i915_error_object_create(struct drm_i915_private *dev_priv,
 		dst->gtt_offset = -1;
 
 	reloc_offset = dst->gtt_offset;
-	if (i915_is_ggtt(vm))
+	if (vm->is_ggtt)
 		vma = i915_gem_obj_to_ggtt(src);
 	use_ggtt = (src->cache_level == I915_CACHE_NONE &&
 		   vma && (vma->bound & GLOBAL_BIND) &&
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 97483e21c9b4..ce8ee9e8bced 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -156,35 +156,29 @@ TRACE_EVENT(i915_vma_unbind,
 		      __entry->obj, __entry->offset, __entry->size, __entry->vm)
 );
 
-#define VM_TO_TRACE_NAME(vm) \
-	(i915_is_ggtt(vm) ? "G" : \
-		      "P")
-
 DECLARE_EVENT_CLASS(i915_va,
-	TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
-	TP_ARGS(vm, start, length, name),
+	TP_PROTO(struct i915_address_space *vm, u64 start, u64 length),
+	TP_ARGS(vm, start, length),
 
 	TP_STRUCT__entry(
 		__field(struct i915_address_space *, vm)
 		__field(u64, start)
 		__field(u64, end)
-		__string(name, name)
 	),
 
 	TP_fast_assign(
 		__entry->vm = vm;
 		__entry->start = start;
 		__entry->end = start + length - 1;
-		__assign_str(name, name);
 	),
 
-	TP_printk("vm=%p (%s), 0x%llx-0x%llx",
-		  __entry->vm, __get_str(name),  __entry->start, __entry->end)
+	TP_printk("vm=%p (%c), 0x%llx-0x%llx",
+		  __entry->vm, __entry->vm->is_ggtt ? 'G' : 'P',  __entry->start, __entry->end)
 );
 
 DEFINE_EVENT(i915_va, i915_va_alloc,
-	     TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
-	     TP_ARGS(vm, start, length, name)
+	     TP_PROTO(struct i915_address_space *vm, u64 start, u64 length),
+	     TP_ARGS(vm, start, length)
 );
 
 DECLARE_EVENT_CLASS(i915_page_table_entry,
-- 
2.1.4


* [PATCH 38/70] drm/i915: Reduce locking in execlist command submission
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (36 preceding siblings ...)
  2015-04-07 15:21 ` [PATCH 37/70] drm/i915: Squash more pointer indirection for i915_is_gtt Chris Wilson
@ 2015-04-07 15:21 ` Chris Wilson
  2015-04-07 15:21 ` [PATCH 39/70] drm/i915: Reduce more " Chris Wilson
                   ` (21 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

This eliminates six needless spin lock/unlock pairs when writing out
ELSP.
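
To illustrate the pattern (a rough sketch only, not the verbatim diff
below; the helpers are the new __locked variants this patch adds):

	/* before: every accessor took uncore.lock internally */
	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);	/* lock+unlock */
	I915_WRITE(RING_ELSP(ring), desc[1]);			/* lock+unlock */
	/* ... three more ELSP writes and a posting read ... */
	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);	/* lock+unlock */

	/* after: one critical section covers the whole sequence */
	spin_lock(&dev_priv->uncore.lock);
	intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);
	I915_WRITE_FW(RING_ELSP(ring), desc[1]);	/* raw, unlocked mmio */
	/* ... remaining writes via I915_WRITE_FW ... */
	intel_uncore_forcewake_put__locked(dev_priv, FORCEWAKE_ALL);
	spin_unlock(&dev_priv->uncore.lock);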

v2: Respin with my preferred colour.
v3: Mostly back to the original colour

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v1]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_drv.h     | 18 +++++++
 drivers/gpu/drm/i915/intel_lrc.c    | 16 +++---
 drivers/gpu/drm/i915/intel_uncore.c | 98 ++++++++++++++++++++++++++++---------
 3 files changed, 103 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0dbc7d69f148..7581c0f5908d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2551,6 +2551,13 @@ void intel_uncore_forcewake_get(struct drm_i915_private *dev_priv,
 				enum forcewake_domains domains);
 void intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
 				enum forcewake_domains domains);
+/* Like above but the caller must manage the uncore.lock itself.
+ * Must be used with I915_READ_FW and friends.
+ */
+void intel_uncore_forcewake_get__locked(struct drm_i915_private *dev_priv,
+					enum forcewake_domains domains);
+void intel_uncore_forcewake_put__locked(struct drm_i915_private *dev_priv,
+					enum forcewake_domains domains);
 void assert_forcewakes_inactive(struct drm_i915_private *dev_priv);
 static inline bool intel_vgpu_active(struct drm_device *dev)
 {
@@ -3249,6 +3256,17 @@ int intel_freq_opcode(struct drm_i915_private *dev_priv, int val);
 #define POSTING_READ(reg)	(void)I915_READ_NOTRACE(reg)
 #define POSTING_READ16(reg)	(void)I915_READ16_NOTRACE(reg)
 
+/* These are untraced mmio-accessors that are only valid to be used inside
+ * critical sections inside IRQ handlers where forcewake is explicitly
+ * controlled.
+ * Think twice, and think again, before using these.
+ * Note: Should only be used between intel_uncore_forcewake_get__locked()
+ * and intel_uncore_forcewake_put__locked().
+ */
+#define I915_READ_FW(reg__) readl(dev_priv->regs + (reg__))
+#define I915_WRITE_FW(reg__, val__) writel(val__, dev_priv->regs + (reg__))
+#define POSTING_READ_FW(reg__) (void)I915_READ_FW(reg__)
+
 /* "Broadcast RGB" property */
 #define INTEL_BROADCAST_RGB_AUTO 0
 #define INTEL_BROADCAST_RGB_FULL 1
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 24c367f9fddf..08e35003c4f2 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -278,17 +278,19 @@ static void execlists_submit_pair(struct intel_engine_cs *ring)
 	desc[3] = ring->execlist_port[0]->seqno;
 
 	/* Note: You must always write both descriptors in the order below. */
-	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
-	I915_WRITE(RING_ELSP(ring), desc[1]);
-	I915_WRITE(RING_ELSP(ring), desc[0]);
-	I915_WRITE(RING_ELSP(ring), desc[3]);
+	spin_lock(&dev_priv->uncore.lock);
+	intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);
+	I915_WRITE_FW(RING_ELSP(ring), desc[1]);
+	I915_WRITE_FW(RING_ELSP(ring), desc[0]);
+	I915_WRITE_FW(RING_ELSP(ring), desc[3]);
 
 	/* The context is automatically loaded after the following */
-	I915_WRITE(RING_ELSP(ring), desc[2]);
+	I915_WRITE_FW(RING_ELSP(ring), desc[2]);
 
 	/* ELSP is a wo register, use another nearby reg for posting instead */
-	POSTING_READ(RING_EXECLIST_STATUS(ring));
-	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+	POSTING_READ_FW(RING_EXECLIST_STATUS(ring));
+	intel_uncore_forcewake_put__locked(dev_priv, FORCEWAKE_ALL);
+	spin_unlock(&dev_priv->uncore.lock);
 }
 
 static void execlists_context_unqueue(struct intel_engine_cs *ring)
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 0e32bbbcada8..20cc325d6225 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -399,6 +399,26 @@ void intel_uncore_sanitize(struct drm_device *dev)
 	intel_disable_gt_powersave(dev);
 }
 
+static void __intel_uncore_forcewake_get(struct drm_i915_private *dev_priv,
+					 enum forcewake_domains fw_domains)
+{
+	struct intel_uncore_forcewake_domain *domain;
+	enum forcewake_domain_id id;
+
+	if (!dev_priv->uncore.funcs.force_wake_get)
+		return;
+
+	fw_domains &= dev_priv->uncore.fw_domains;
+
+	for_each_fw_domain_mask(domain, fw_domains, dev_priv, id) {
+		if (domain->wake_count++)
+			fw_domains &= ~(1 << id);
+	}
+
+	if (fw_domains)
+		dev_priv->uncore.funcs.force_wake_get(dev_priv, fw_domains);
+}
+
 /**
  * intel_uncore_forcewake_get - grab forcewake domain references
  * @dev_priv: i915 device instance
@@ -416,41 +436,39 @@ void intel_uncore_forcewake_get(struct drm_i915_private *dev_priv,
 				enum forcewake_domains fw_domains)
 {
 	unsigned long irqflags;
-	struct intel_uncore_forcewake_domain *domain;
-	enum forcewake_domain_id id;
 
 	if (!dev_priv->uncore.funcs.force_wake_get)
 		return;
 
 	WARN_ON(dev_priv->pm.suspended);
 
-	fw_domains &= dev_priv->uncore.fw_domains;
-
 	spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
-
-	for_each_fw_domain_mask(domain, fw_domains, dev_priv, id) {
-		if (domain->wake_count++)
-			fw_domains &= ~(1 << id);
-	}
-
-	if (fw_domains)
-		dev_priv->uncore.funcs.force_wake_get(dev_priv, fw_domains);
-
+	__intel_uncore_forcewake_get(dev_priv, fw_domains);
 	spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
 }
 
 /**
- * intel_uncore_forcewake_put - release a forcewake domain reference
+ * intel_uncore_forcewake_get__locked - grab forcewake domain references
  * @dev_priv: i915 device instance
- * @fw_domains: forcewake domains to put references
+ * @fw_domains: forcewake domains to get reference on
  *
- * This function drops the device-level forcewakes for specified
- * domains obtained by intel_uncore_forcewake_get().
+ * See intel_uncore_forcewake_get(). This variant places the onus
+ * on the caller to explicitly handle the dev_priv->uncore.lock spinlock.
  */
-void intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
-				enum forcewake_domains fw_domains)
+void intel_uncore_forcewake_get__locked(struct drm_i915_private *dev_priv,
+					enum forcewake_domains fw_domains)
+{
+	assert_spin_locked(&dev_priv->uncore.lock);
+
+	if (!dev_priv->uncore.funcs.force_wake_get)
+		return;
+
+	__intel_uncore_forcewake_get(dev_priv, fw_domains);
+}
+
+static void __intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
+					 enum forcewake_domains fw_domains)
 {
-	unsigned long irqflags;
 	struct intel_uncore_forcewake_domain *domain;
 	enum forcewake_domain_id id;
 
@@ -459,8 +477,6 @@ void intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
 
 	fw_domains &= dev_priv->uncore.fw_domains;
 
-	spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
-
 	for_each_fw_domain_mask(domain, fw_domains, dev_priv, id) {
 		if (WARN_ON(domain->wake_count == 0))
 			continue;
@@ -471,10 +487,48 @@ void intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
 		domain->wake_count++;
 		fw_domain_arm_timer(domain);
 	}
+}
 
+/**
+ * intel_uncore_forcewake_put - release a forcewake domain reference
+ * @dev_priv: i915 device instance
+ * @fw_domains: forcewake domains to put references
+ *
+ * This function drops the device-level forcewakes for specified
+ * domains obtained by intel_uncore_forcewake_get().
+ */
+void intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
+				enum forcewake_domains fw_domains)
+{
+	unsigned long irqflags;
+
+	if (!dev_priv->uncore.funcs.force_wake_put)
+		return;
+
+	spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
+	__intel_uncore_forcewake_put(dev_priv, fw_domains);
 	spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
 }
 
+/**
+ * intel_uncore_forcewake_put__locked - release forcewake domain references
+ * @dev_priv: i915 device instance
+ * @fw_domains: forcewake domains to put references
+ *
+ * See intel_uncore_forcewake_put(). This variant places the onus
+ * on the caller to explicitly handle the dev_priv->uncore.lock spinlock.
+ */
+void intel_uncore_forcewake_put__locked(struct drm_i915_private *dev_priv,
+					enum forcewake_domains fw_domains)
+{
+	assert_spin_locked(&dev_priv->uncore.lock);
+
+	if (!dev_priv->uncore.funcs.force_wake_put)
+		return;
+
+	__intel_uncore_forcewake_put(dev_priv, fw_domains);
+}
+
 void assert_forcewakes_inactive(struct drm_i915_private *dev_priv)
 {
 	struct intel_uncore_forcewake_domain *domain;
-- 
2.1.4


* [PATCH 39/70] drm/i915: Reduce more locking in execlist command submission
From: Chris Wilson @ 2015-04-07 15:21 UTC (permalink / raw)
  To: intel-gfx

Slightly more extravagant than the previous patch is to use the
I915_READ_FW() register accessors for all the bounded reads in
intel_lrc_irq_handler - for even more spinlock reduction.
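
Roughly, the handler ends up shaped like this (a sketch of the intent,
simplified from the diff below):

	spin_lock(&ring->execlist_lock);
	spin_lock(&dev_priv->uncore.lock);
	intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);

	/* all CSB accesses now use the raw accessors */
	tail = I915_READ_FW(RING_CONTEXT_STATUS_PTR(ring)) & 0x7;
	/* ... drain the context-status buffer, resubmit if needed ... */

	intel_uncore_forcewake_put__locked(dev_priv, FORCEWAKE_ALL);
	spin_unlock(&dev_priv->uncore.lock);
	spin_unlock(&ring->execlist_lock);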

v2: Rebase

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 35 +++++++++++++++++++++--------------
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 08e35003c4f2..27942f61d6fe 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -278,19 +278,12 @@ static void execlists_submit_pair(struct intel_engine_cs *ring)
 	desc[3] = ring->execlist_port[0]->seqno;
 
 	/* Note: You must always write both descriptors in the order below. */
-	spin_lock(&dev_priv->uncore.lock);
-	intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);
 	I915_WRITE_FW(RING_ELSP(ring), desc[1]);
 	I915_WRITE_FW(RING_ELSP(ring), desc[0]);
 	I915_WRITE_FW(RING_ELSP(ring), desc[3]);
 
 	/* The context is automatically loaded after the following */
 	I915_WRITE_FW(RING_ELSP(ring), desc[2]);
-
-	/* ELSP is a wo register, use another nearby reg for posting instead */
-	POSTING_READ_FW(RING_EXECLIST_STATUS(ring));
-	intel_uncore_forcewake_put__locked(dev_priv, FORCEWAKE_ALL);
-	spin_unlock(&dev_priv->uncore.lock);
 }
 
 static void execlists_context_unqueue(struct intel_engine_cs *ring)
@@ -379,31 +372,37 @@ void intel_lrc_irq_handler(struct intel_engine_cs *ring)
 	u8 head, tail;
 	u32 seqno = 0;
 
+	spin_lock(&ring->execlist_lock);
+	spin_lock(&dev_priv->uncore.lock);
+	intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);
+
 	head = ring->next_context_status_buffer;
-	tail = I915_READ(RING_CONTEXT_STATUS_PTR(ring)) & 0x7;
+	tail = I915_READ_FW(RING_CONTEXT_STATUS_PTR(ring)) & 0x7;
 	if (head > tail)
 		tail += 6;
 
 	while (head++ < tail) {
 		u32 reg = RING_CONTEXT_STATUS_BUF(ring) + (head % 6)*8;
-		u32 status = I915_READ(reg);
+		u32 status = I915_READ_FW(reg);
 		if (unlikely(status & GEN8_CTX_STATUS_PREEMPTED && 0)) {
 			DRM_ERROR("Pre-empted request %x %s Lite Restore\n",
-				  I915_READ(reg + 4),
+				  I915_READ_FW(reg + 4),
 				  status & GEN8_CTX_STATUS_LITE_RESTORE ? "with" : "without");
 		}
 		if (status & (GEN8_CTX_STATUS_ACTIVE_IDLE |
 			      GEN8_CTX_STATUS_ELEMENT_SWITCH))
-			seqno = I915_READ(reg + 4);
+			seqno = I915_READ_FW(reg + 4);
 	}
 
 	ring->next_context_status_buffer = tail % 6;
-	I915_WRITE(RING_CONTEXT_STATUS_PTR(ring),
-		   (u32)ring->next_context_status_buffer << 8);
+	I915_WRITE_FW(RING_CONTEXT_STATUS_PTR(ring),
+		      (u32)ring->next_context_status_buffer << 8);
 
-	spin_lock(&ring->execlist_lock);
 	if (execlists_complete_requests(ring, seqno))
 		execlists_context_unqueue(ring);
+
+	intel_uncore_forcewake_put__locked(dev_priv, FORCEWAKE_ALL);
+	spin_unlock(&dev_priv->uncore.lock);
 	spin_unlock(&ring->execlist_lock);
 }
 
@@ -427,8 +426,16 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 
 	list_add_tail(&request->execlist_link, &ring->execlist_queue);
 	if (ring->execlist_port[0] == NULL) {
+		struct drm_i915_private *dev_priv = to_i915(ring->dev);
+
+		spin_lock(&dev_priv->uncore.lock);
+		intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);
+
 		ring->execlist_port[0] = request;
 		execlists_submit_pair(ring);
+
+		intel_uncore_forcewake_put__locked(dev_priv, FORCEWAKE_ALL);
+		spin_unlock(&dev_priv->uncore.lock);
 	}
 
 	spin_unlock_irq(&ring->execlist_lock);
-- 
2.1.4


* [PATCH 40/70] drm/i915: Reduce locking in gen8 IRQ handler
From: Chris Wilson @ 2015-04-07 15:21 UTC (permalink / raw)
  To: intel-gfx

In a similar vein to reducing the number of unrequired spinlocks used
for execlist command submission (where forcewake is required but
manually controlled), we know that the IRQ registers are outside of the
powerwell and so we can access them directly. Since we now have direct
access exported via I915_READ_FW/I915_WRITE_FW, let's put those to use
in the irq handlers as well.

In the process, reorder the execlist submission to happen as early as
possible.
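
To put the two accessor flavours side by side (a rough sketch of what
each amounts to, not the verbatim macros):

	/* I915_READ: routed through the traced, forcewake-aware vfunc,
	 * which takes uncore.lock with interrupts disabled every time */
	tmp = I915_READ(GEN8_GT_IIR(0));

	/* I915_READ_FW: a bare readl(), valid here because the GT IIR
	 * registers sit outside the powerwell */
	tmp = I915_READ_FW(GEN8_GT_IIR(0));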

v2: Restrict the untraced register mmio to just the GT path (i.e. the
hotpath for execlists)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_irq.c | 47 ++++++++++++++++++++---------------------
 1 file changed, 23 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 8b5e0358c592..c2c80bf490c6 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1285,56 +1285,56 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 	irqreturn_t ret = IRQ_NONE;
 
 	if (master_ctl & (GEN8_GT_RCS_IRQ | GEN8_GT_BCS_IRQ)) {
-		tmp = I915_READ(GEN8_GT_IIR(0));
+		tmp = I915_READ_FW(GEN8_GT_IIR(0));
 		if (tmp) {
-			I915_WRITE(GEN8_GT_IIR(0), tmp);
+			I915_WRITE_FW(GEN8_GT_IIR(0), tmp);
 			ret = IRQ_HANDLED;
 
 			rcs = tmp >> GEN8_RCS_IRQ_SHIFT;
 			ring = &dev_priv->ring[RCS];
-			if (rcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, ring);
 			if (rcs & GT_CONTEXT_SWITCH_INTERRUPT)
 				intel_lrc_irq_handler(ring);
+			if (rcs & GT_RENDER_USER_INTERRUPT)
+				notify_ring(dev, ring);
 
 			bcs = tmp >> GEN8_BCS_IRQ_SHIFT;
 			ring = &dev_priv->ring[BCS];
-			if (bcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, ring);
 			if (bcs & GT_CONTEXT_SWITCH_INTERRUPT)
 				intel_lrc_irq_handler(ring);
+			if (bcs & GT_RENDER_USER_INTERRUPT)
+				notify_ring(dev, ring);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT0)!\n");
 	}
 
 	if (master_ctl & (GEN8_GT_VCS1_IRQ | GEN8_GT_VCS2_IRQ)) {
-		tmp = I915_READ(GEN8_GT_IIR(1));
+		tmp = I915_READ_FW(GEN8_GT_IIR(1));
 		if (tmp) {
-			I915_WRITE(GEN8_GT_IIR(1), tmp);
+			I915_WRITE_FW(GEN8_GT_IIR(1), tmp);
 			ret = IRQ_HANDLED;
 
 			vcs = tmp >> GEN8_VCS1_IRQ_SHIFT;
 			ring = &dev_priv->ring[VCS];
-			if (vcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, ring);
 			if (vcs & GT_CONTEXT_SWITCH_INTERRUPT)
 				intel_lrc_irq_handler(ring);
+			if (vcs & GT_RENDER_USER_INTERRUPT)
+				notify_ring(dev, ring);
 
 			vcs = tmp >> GEN8_VCS2_IRQ_SHIFT;
 			ring = &dev_priv->ring[VCS2];
-			if (vcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, ring);
 			if (vcs & GT_CONTEXT_SWITCH_INTERRUPT)
 				intel_lrc_irq_handler(ring);
+			if (vcs & GT_RENDER_USER_INTERRUPT)
+				notify_ring(dev, ring);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT1)!\n");
 	}
 
 	if (master_ctl & GEN8_GT_PM_IRQ) {
-		tmp = I915_READ(GEN8_GT_IIR(2));
+		tmp = I915_READ_FW(GEN8_GT_IIR(2));
 		if (tmp & dev_priv->pm_rps_events) {
-			I915_WRITE(GEN8_GT_IIR(2),
-				   tmp & dev_priv->pm_rps_events);
+			I915_WRITE_FW(GEN8_GT_IIR(2),
+				      tmp & dev_priv->pm_rps_events);
 			ret = IRQ_HANDLED;
 			gen6_rps_irq_handler(dev_priv, tmp);
 		} else
@@ -1342,17 +1342,17 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 	}
 
 	if (master_ctl & GEN8_GT_VECS_IRQ) {
-		tmp = I915_READ(GEN8_GT_IIR(3));
+		tmp = I915_READ_FW(GEN8_GT_IIR(3));
 		if (tmp) {
-			I915_WRITE(GEN8_GT_IIR(3), tmp);
+			I915_WRITE_FW(GEN8_GT_IIR(3), tmp);
 			ret = IRQ_HANDLED;
 
 			vcs = tmp >> GEN8_VECS_IRQ_SHIFT;
 			ring = &dev_priv->ring[VECS];
-			if (vcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, ring);
 			if (vcs & GT_CONTEXT_SWITCH_INTERRUPT)
 				intel_lrc_irq_handler(ring);
+			if (vcs & GT_RENDER_USER_INTERRUPT)
+				notify_ring(dev, ring);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT3)!\n");
 	}
@@ -2178,13 +2178,12 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
 		aux_mask |=  GEN9_AUX_CHANNEL_B | GEN9_AUX_CHANNEL_C |
 			GEN9_AUX_CHANNEL_D;
 
-	master_ctl = I915_READ(GEN8_MASTER_IRQ);
+	master_ctl = I915_READ_FW(GEN8_MASTER_IRQ);
 	master_ctl &= ~GEN8_MASTER_IRQ_CONTROL;
 	if (!master_ctl)
 		return IRQ_NONE;
 
-	I915_WRITE(GEN8_MASTER_IRQ, 0);
-	POSTING_READ(GEN8_MASTER_IRQ);
+	I915_WRITE_FW(GEN8_MASTER_IRQ, 0);
 
 	/* Find, clear, then process each source of interrupt */
 
@@ -2281,8 +2280,8 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
 
 	}
 
-	I915_WRITE(GEN8_MASTER_IRQ, GEN8_MASTER_IRQ_CONTROL);
-	POSTING_READ(GEN8_MASTER_IRQ);
+	I915_WRITE_FW(GEN8_MASTER_IRQ, GEN8_MASTER_IRQ_CONTROL);
+	POSTING_READ_FW(GEN8_MASTER_IRQ);
 
 	return ret;
 }
-- 
2.1.4


* [PATCH 41/70] drm/i915: Tidy gen8 IRQ handler
From: Chris Wilson @ 2015-04-07 15:21 UTC (permalink / raw)
  To: intel-gfx

Remove some needless variables and parameter passing.
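
The main transformation (sketched; see the diff below) is to test each
interrupt bit in place rather than first extracting a per-engine byte
into a temporary:

	/* before: shift out a per-engine copy, then test it */
	rcs = tmp >> GEN8_RCS_IRQ_SHIFT;
	ring = &dev_priv->ring[RCS];
	if (rcs & GT_RENDER_USER_INTERRUPT)
		notify_ring(dev, ring);

	/* after: shift the mask instead, dropping both temporaries */
	if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT))
		notify_ring(&dev_priv->ring[RCS]);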

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_irq.c | 113 +++++++++++++++++-----------------------
 1 file changed, 49 insertions(+), 64 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index c2c80bf490c6..46bcbff89760 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -985,8 +985,7 @@ static void ironlake_rps_change_irq_handler(struct drm_device *dev)
 	return;
 }
 
-static void notify_ring(struct drm_device *dev,
-			struct intel_engine_cs *ring)
+static void notify_ring(struct intel_engine_cs *ring)
 {
 	if (!intel_ring_initialized(ring))
 		return;
@@ -1248,9 +1247,9 @@ static void ilk_gt_irq_handler(struct drm_device *dev,
 {
 	if (gt_iir &
 	    (GT_RENDER_USER_INTERRUPT | GT_RENDER_PIPECTL_NOTIFY_INTERRUPT))
-		notify_ring(dev, &dev_priv->ring[RCS]);
+		notify_ring(&dev_priv->ring[RCS]);
 	if (gt_iir & ILK_BSD_USER_INTERRUPT)
-		notify_ring(dev, &dev_priv->ring[VCS]);
+		notify_ring(&dev_priv->ring[VCS]);
 }
 
 static void snb_gt_irq_handler(struct drm_device *dev,
@@ -1260,11 +1259,11 @@ static void snb_gt_irq_handler(struct drm_device *dev,
 
 	if (gt_iir &
 	    (GT_RENDER_USER_INTERRUPT | GT_RENDER_PIPECTL_NOTIFY_INTERRUPT))
-		notify_ring(dev, &dev_priv->ring[RCS]);
+		notify_ring(&dev_priv->ring[RCS]);
 	if (gt_iir & GT_BSD_USER_INTERRUPT)
-		notify_ring(dev, &dev_priv->ring[VCS]);
+		notify_ring(&dev_priv->ring[VCS]);
 	if (gt_iir & GT_BLT_USER_INTERRUPT)
-		notify_ring(dev, &dev_priv->ring[BCS]);
+		notify_ring(&dev_priv->ring[BCS]);
 
 	if (gt_iir & (GT_BLT_CS_ERROR_INTERRUPT |
 		      GT_BSD_CS_ERROR_INTERRUPT |
@@ -1275,63 +1274,65 @@ static void snb_gt_irq_handler(struct drm_device *dev,
 		ivybridge_parity_error_irq_handler(dev, gt_iir);
 }
 
-static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
-				       struct drm_i915_private *dev_priv,
+static irqreturn_t gen8_gt_irq_handler(struct drm_i915_private *dev_priv,
 				       u32 master_ctl)
 {
-	struct intel_engine_cs *ring;
-	u32 rcs, bcs, vcs;
-	uint32_t tmp = 0;
 	irqreturn_t ret = IRQ_NONE;
 
 	if (master_ctl & (GEN8_GT_RCS_IRQ | GEN8_GT_BCS_IRQ)) {
-		tmp = I915_READ_FW(GEN8_GT_IIR(0));
+		u32 tmp = I915_READ_FW(GEN8_GT_IIR(0));
 		if (tmp) {
 			I915_WRITE_FW(GEN8_GT_IIR(0), tmp);
 			ret = IRQ_HANDLED;
 
-			rcs = tmp >> GEN8_RCS_IRQ_SHIFT;
-			ring = &dev_priv->ring[RCS];
-			if (rcs & GT_CONTEXT_SWITCH_INTERRUPT)
-				intel_lrc_irq_handler(ring);
-			if (rcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, ring);
-
-			bcs = tmp >> GEN8_BCS_IRQ_SHIFT;
-			ring = &dev_priv->ring[BCS];
-			if (bcs & GT_CONTEXT_SWITCH_INTERRUPT)
-				intel_lrc_irq_handler(ring);
-			if (bcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, ring);
+			if (tmp & (GT_CONTEXT_SWITCH_INTERRUPT << GEN8_RCS_IRQ_SHIFT))
+				intel_lrc_irq_handler(&dev_priv->ring[RCS]);
+			if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT))
+				notify_ring(&dev_priv->ring[RCS]);
+
+			if (tmp & (GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT))
+				intel_lrc_irq_handler(&dev_priv->ring[BCS]);
+			if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT))
+				notify_ring(&dev_priv->ring[BCS]);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT0)!\n");
 	}
 
 	if (master_ctl & (GEN8_GT_VCS1_IRQ | GEN8_GT_VCS2_IRQ)) {
-		tmp = I915_READ_FW(GEN8_GT_IIR(1));
+		u32 tmp = I915_READ_FW(GEN8_GT_IIR(1));
 		if (tmp) {
 			I915_WRITE_FW(GEN8_GT_IIR(1), tmp);
 			ret = IRQ_HANDLED;
 
-			vcs = tmp >> GEN8_VCS1_IRQ_SHIFT;
-			ring = &dev_priv->ring[VCS];
-			if (vcs & GT_CONTEXT_SWITCH_INTERRUPT)
-				intel_lrc_irq_handler(ring);
-			if (vcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, ring);
-
-			vcs = tmp >> GEN8_VCS2_IRQ_SHIFT;
-			ring = &dev_priv->ring[VCS2];
-			if (vcs & GT_CONTEXT_SWITCH_INTERRUPT)
-				intel_lrc_irq_handler(ring);
-			if (vcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, ring);
+			if (tmp & (GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT))
+				intel_lrc_irq_handler(&dev_priv->ring[VCS]);
+			if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT))
+				notify_ring(&dev_priv->ring[VCS]);
+
+			if (tmp & (GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT))
+				intel_lrc_irq_handler(&dev_priv->ring[VCS2]);
+			if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT))
+				notify_ring(&dev_priv->ring[VCS2]);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT1)!\n");
 	}
 
+	if (master_ctl & GEN8_GT_VECS_IRQ) {
+		u32 tmp = I915_READ_FW(GEN8_GT_IIR(3));
+		if (tmp) {
+			I915_WRITE_FW(GEN8_GT_IIR(3), tmp);
+			ret = IRQ_HANDLED;
+
+			if (tmp & (GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT))
+				intel_lrc_irq_handler(&dev_priv->ring[VECS]);
+			if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT))
+				notify_ring(&dev_priv->ring[VECS]);
+		} else
+			DRM_ERROR("The master control interrupt lied (GT3)!\n");
+	}
+
 	if (master_ctl & GEN8_GT_PM_IRQ) {
-		tmp = I915_READ_FW(GEN8_GT_IIR(2));
+		u32 tmp = I915_READ_FW(GEN8_GT_IIR(2));
 		if (tmp & dev_priv->pm_rps_events) {
 			I915_WRITE_FW(GEN8_GT_IIR(2),
 				      tmp & dev_priv->pm_rps_events);
@@ -1341,22 +1342,6 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
 			DRM_ERROR("The master control interrupt lied (PM)!\n");
 	}
 
-	if (master_ctl & GEN8_GT_VECS_IRQ) {
-		tmp = I915_READ_FW(GEN8_GT_IIR(3));
-		if (tmp) {
-			I915_WRITE_FW(GEN8_GT_IIR(3), tmp);
-			ret = IRQ_HANDLED;
-
-			vcs = tmp >> GEN8_VECS_IRQ_SHIFT;
-			ring = &dev_priv->ring[VECS];
-			if (vcs & GT_CONTEXT_SWITCH_INTERRUPT)
-				intel_lrc_irq_handler(ring);
-			if (vcs & GT_RENDER_USER_INTERRUPT)
-				notify_ring(dev, ring);
-		} else
-			DRM_ERROR("The master control interrupt lied (GT3)!\n");
-	}
-
 	return ret;
 }
 
@@ -1651,7 +1636,7 @@ static void gen6_rps_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir)
 
 	if (HAS_VEBOX(dev_priv->dev)) {
 		if (pm_iir & PM_VEBOX_USER_INTERRUPT)
-			notify_ring(dev_priv->dev, &dev_priv->ring[VECS]);
+			notify_ring(&dev_priv->ring[VECS]);
 
 		if (pm_iir & PM_VEBOX_CS_ERROR_INTERRUPT)
 			DRM_DEBUG("Command parser error, pm_iir 0x%08x\n", pm_iir);
@@ -1845,7 +1830,7 @@ static irqreturn_t cherryview_irq_handler(int irq, void *arg)
 			I915_WRITE(VLV_IIR, iir);
 		}
 
-		gen8_gt_irq_handler(dev, dev_priv, master_ctl);
+		gen8_gt_irq_handler(dev_priv, master_ctl);
 
 		/* Call regardless, as some status bits might not be
 		 * signalled in iir */
@@ -2187,7 +2172,7 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
 
 	/* Find, clear, then process each source of interrupt */
 
-	ret = gen8_gt_irq_handler(dev, dev_priv, master_ctl);
+	ret = gen8_gt_irq_handler(dev_priv, master_ctl);
 
 	if (master_ctl & GEN8_DE_MISC_IRQ) {
 		tmp = I915_READ(GEN8_DE_MISC_IIR);
@@ -3692,7 +3677,7 @@ static irqreturn_t i8xx_irq_handler(int irq, void *arg)
 		new_iir = I915_READ16(IIR); /* Flush posted writes */
 
 		if (iir & I915_USER_INTERRUPT)
-			notify_ring(dev, &dev_priv->ring[RCS]);
+			notify_ring(&dev_priv->ring[RCS]);
 
 		for_each_pipe(dev_priv, pipe) {
 			int plane = pipe;
@@ -3883,7 +3868,7 @@ static irqreturn_t i915_irq_handler(int irq, void *arg)
 		new_iir = I915_READ(IIR); /* Flush posted writes */
 
 		if (iir & I915_USER_INTERRUPT)
-			notify_ring(dev, &dev_priv->ring[RCS]);
+			notify_ring(&dev_priv->ring[RCS]);
 
 		for_each_pipe(dev_priv, pipe) {
 			int plane = pipe;
@@ -4110,9 +4095,9 @@ static irqreturn_t i965_irq_handler(int irq, void *arg)
 		new_iir = I915_READ(IIR); /* Flush posted writes */
 
 		if (iir & I915_USER_INTERRUPT)
-			notify_ring(dev, &dev_priv->ring[RCS]);
+			notify_ring(&dev_priv->ring[RCS]);
 		if (iir & I915_BSD_USER_INTERRUPT)
-			notify_ring(dev, &dev_priv->ring[VCS]);
+			notify_ring(&dev_priv->ring[VCS]);
 
 		for_each_pipe(dev_priv, pipe) {
 			if (pipe_stats[pipe] & PIPE_START_VBLANK_INTERRUPT_STATUS &&
-- 
2.1.4


* [PATCH 42/70] drm/i915: Remove request retirement before each batch
From: Chris Wilson @ 2015-04-07 15:21 UTC (permalink / raw)
  To: intel-gfx

This reimplements the denial-of-service protection against igt from

commit 227f782e4667fc622810bce8be8ccdeee45f89c2
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu May 15 10:41:42 2014 +0100

    drm/i915: Retire requests before creating a new one

and transfers the stall from before each batch into a new close handler.
The issue is that the stall increases latency between batches, which in
some cases (especially coupled with execlists) is detrimental to
keeping the GPU well fed. Also we make the observation that retiring
requests can itself free objects (and requests) and therefore makes a
good first step when shrinking. However, we do still wish to do a
retire before forcing an allocation of a new batch pool object (prior
to an execbuffer), but we make the optimisation that we only need to do
so if the oldest available batch pool object is active.
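
The batch pool check then becomes (a simplified sketch of the hunk
below): since the list is strictly LRU ordered, if the oldest buffer is
still busy on this ring, everything younger must be busy as well, so
only then do we give up:

	list_for_each_entry(tmp, list, batch_pool_link) {
		if (tmp->active &&
		    !i915_gem_request_completed(tmp->last_read_req[pool->ring->id], true))
			break;	/* oldest is busy => all later entries are too */
		/* ... otherwise this buffer is idle and can be reused ... */
	}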

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c            |  1 +
 drivers/gpu/drm/i915/i915_drv.h            |  3 ++-
 drivers/gpu/drm/i915/i915_gem.c            | 17 ++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_batch_pool.c | 21 +++++++++++++++------
 drivers/gpu/drm/i915/i915_gem_batch_pool.h |  6 +++++-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  2 --
 drivers/gpu/drm/i915/i915_gem_shrinker.c   |  2 ++
 drivers/gpu/drm/i915/intel_lrc.c           |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c    |  2 +-
 9 files changed, 43 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index c3fdbb09ddd0..1366e0ec4933 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1593,6 +1593,7 @@ static struct drm_driver driver = {
 	.debugfs_init = i915_debugfs_init,
 	.debugfs_cleanup = i915_debugfs_cleanup,
 #endif
+	.gem_close_object = i915_gem_close_object,
 	.gem_free_object = i915_gem_free_object,
 	.gem_vm_ops = &i915_gem_vm_ops,
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7581c0f5908d..262ebb620112 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2649,6 +2649,8 @@ struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
 						  size_t size);
 void i915_init_vm(struct drm_i915_private *dev_priv,
 		  struct i915_address_space *vm);
+void i915_gem_close_object(struct drm_gem_object *obj,
+			   struct drm_file *file);
 void i915_gem_free_object(struct drm_gem_object *obj);
 void i915_gem_vma_destroy(struct i915_vma *vma);
 
@@ -2769,7 +2771,6 @@ struct drm_i915_gem_request *
 i915_gem_find_active_request(struct intel_engine_cs *ring);
 
 bool i915_gem_retire_requests(struct drm_device *dev);
-void i915_gem_retire_requests_ring(struct intel_engine_cs *ring);
 int __must_check i915_gem_check_wedge(struct i915_gpu_error *error,
 				      bool interruptible);
 int __must_check i915_gem_check_olr(struct drm_i915_gem_request *req);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 36add864593a..9511993daeea 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2201,6 +2201,10 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 	for (i = 0; i < page_count; i++) {
 		page = shmem_read_mapping_page_gfp(mapping, i, gfp);
 		if (IS_ERR(page)) {
+			i915_gem_retire_requests(dev_priv->dev);
+			page = shmem_read_mapping_page_gfp(mapping, i, gfp);
+		}
+		if (IS_ERR(page)) {
 			i915_gem_shrink(dev_priv,
 					page_count,
 					I915_SHRINK_BOUND |
@@ -2803,7 +2807,7 @@ void i915_gem_reset(struct drm_device *dev)
 /**
  * This function clears the request list as sequence numbers are passed.
  */
-void
+static void
 i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 {
 	WARN_ON(i915_verify_lists(ring->dev));
@@ -2955,6 +2959,17 @@ retire:
 	return 0;
 }
 
+void i915_gem_close_object(struct drm_gem_object *gem,
+			   struct drm_file *file)
+{
+	struct drm_i915_gem_object *obj = to_intel_bo(gem);
+
+	if (obj->active && mutex_trylock(&obj->base.dev->struct_mutex)) {
+		(void)i915_gem_object_flush_active(obj);
+		mutex_unlock(&obj->base.dev->struct_mutex);
+	}
+}
+
 /**
  * i915_gem_wait_ioctl - implements DRM_IOCTL_I915_GEM_WAIT
  * @DRM_IOCTL_ARGS: standard ioctl arguments
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index 7bf2f3f2968e..088020a1afc4 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -44,12 +44,13 @@
  * @dev: the drm device
  * @pool: the batch buffer pool
  */
-void i915_gem_batch_pool_init(struct drm_device *dev,
+void i915_gem_batch_pool_init(struct intel_engine_cs *ring,
 			      struct i915_gem_batch_pool *pool)
 {
 	int n;
 
-	pool->dev = dev;
+	pool->dev = ring->dev;
+	pool->ring = ring;
 
 	for (n = 0; n < ARRAY_SIZE(pool->cache_list); n++)
 		INIT_LIST_HEAD(&pool->cache_list[n]);
@@ -98,7 +99,7 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 			size_t size)
 {
 	struct drm_i915_gem_object *obj = NULL;
-	struct drm_i915_gem_object *tmp, *next;
+	struct drm_i915_gem_object *tmp;
 	struct list_head *list;
 	int n;
 
@@ -113,10 +114,18 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 		n = ARRAY_SIZE(pool->cache_list) - 1;
 	list = &pool->cache_list[n];
 
-	list_for_each_entry_safe(tmp, next, list, batch_pool_link) {
+	list_for_each_entry(tmp, list, batch_pool_link) {
 		/* The batches are strictly LRU ordered */
-		if (tmp->active)
-			break;
+		if (tmp->active) {
+			struct drm_i915_gem_request *rq;
+
+			rq = tmp->last_read_req[pool->ring->id];
+			if (!i915_gem_request_completed(rq, true))
+				break;
+
+			BUG_ON(tmp->active & ~intel_ring_flag(pool->ring));
+			BUG_ON(tmp->last_write_req);
+		}
 
 		/* While we're looping, do some clean up */
 		if (tmp->madv == __I915_MADV_PURGED) {
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.h b/drivers/gpu/drm/i915/i915_gem_batch_pool.h
index 848e90703eed..467578c621bc 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.h
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.h
@@ -27,13 +27,17 @@
 
 #include "i915_drv.h"
 
+struct drm_device;
+struct intel_engine_cs;
+
 struct i915_gem_batch_pool {
 	struct drm_device *dev;
+	struct intel_engine_cs *ring;
 	struct list_head cache_list[4];
 };
 
 /* i915_gem_batch_pool.c */
-void i915_gem_batch_pool_init(struct drm_device *dev,
+void i915_gem_batch_pool_init(struct intel_engine_cs *ring,
 			      struct i915_gem_batch_pool *pool);
 void i915_gem_batch_pool_fini(struct i915_gem_batch_pool *pool);
 struct drm_i915_gem_object*
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 5f735b491e2f..16fd922afb72 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -697,8 +697,6 @@ i915_gem_execbuffer_reserve(struct intel_engine_cs *ring,
 	bool has_fenced_gpu_access = INTEL_INFO(ring->dev)->gen < 4;
 	int retry;
 
-	i915_gem_retire_requests_ring(ring);
-
 	vm = list_first_entry(vmas, struct i915_vma, exec_list)->vm;
 
 	INIT_LIST_HEAD(&ordered_vmas);
diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index f7929e769250..87bfced67998 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -204,6 +204,8 @@ i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
 	if (!i915_gem_shrinker_lock(dev, &unlock))
 		return 0;
 
+	i915_gem_retire_requests(dev);
+
 	count = 0;
 	list_for_each_entry(obj, &dev_priv->mm.unbound_list, global_list)
 		if (obj->pages_pin_count == 0)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 27942f61d6fe..6e73ff798a2a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1238,7 +1238,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	ring->dev = dev;
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
-	i915_gem_batch_pool_init(dev, &ring->batch_pool);
+	i915_gem_batch_pool_init(ring, &ring->batch_pool);
 	init_waitqueue_head(&ring->irq_queue);
 
 	INIT_LIST_HEAD(&ring->execlist_queue);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index bf7837d30388..6b894ab9d0f2 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2000,7 +2000,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
 	INIT_LIST_HEAD(&ring->execlist_queue);
-	i915_gem_batch_pool_init(dev, &ring->batch_pool);
+	i915_gem_batch_pool_init(ring, &ring->batch_pool);
 	ringbuf->size = 32 * PAGE_SIZE;
 	ringbuf->ring = ring;
 	memset(ring->semaphore.sync_seqno, 0, sizeof(ring->semaphore.sync_seqno));
-- 
2.1.4


* [PATCH 43/70] drm/i915: Cache the GGTT offset for the execlists context
From: Chris Wilson @ 2015-04-07 15:21 UTC (permalink / raw)
  To: intel-gfx

The offset doesn't change once the context is pinned, but the lookup
turns out to be comparatively costly as it gets repeated for every
request.
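
In effect the lookup moves from once-per-request to once-per-pin (a
sketch of the two sides of the change, using the names from the diff):

	/* at context pin time, once the object can no longer move: */
	ringbuf->ggtt_offset = i915_gem_obj_ggtt_offset(ctx_obj);

	/* per request, the ELSP descriptor then uses the cached value
	 * instead of walking the object's vma list again: */
	desc |= ring->ggtt_offset;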

v2: Rebase

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 23 +++++++++++++----------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 6e73ff798a2a..26f96999a4a9 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -230,8 +230,8 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 	return 0;
 }
 
-static uint32_t execlists_ctx_descriptor(struct intel_engine_cs *ring,
-					 struct drm_i915_gem_object *ctx_obj)
+static uint32_t execlists_ctx_descriptor(struct intel_engine_cs *engine,
+					 uint32_t ggtt_offset)
 {
 	uint32_t desc;
 
@@ -239,27 +239,28 @@ static uint32_t execlists_ctx_descriptor(struct intel_engine_cs *ring,
 	desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
 	desc |= GEN8_CTX_L3LLC_COHERENT;
 	desc |= GEN8_CTX_PRIVILEGE;
-	desc |= i915_gem_obj_ggtt_offset(ctx_obj);
+	desc |= ggtt_offset;
 
 	/* TODO: WaDisableLiteRestore when we start using semaphore
 	 * signalling between Command Streamers */
 	/* desc |= GEN8_CTX_FORCE_RESTORE; */
 
 	/* WaEnableForceRestoreInCtxtDescForVCS:skl */
-	if (IS_GEN9(ring->dev) && INTEL_REVID(ring->dev) <= SKL_REVID_B0 &&
-	    (ring->id == BCS || ring->id == VCS ||
-	     ring->id == VECS || ring->id == VCS2))
+	if (IS_GEN9(engine->dev) && INTEL_REVID(engine->dev) <= SKL_REVID_B0 &&
+	    (engine->id == BCS || engine->id == VCS ||
+	     engine->id == VECS || engine->id == VCS2))
 		desc |= GEN8_CTX_FORCE_RESTORE;
 
 	return desc;
 }
 
-static uint32_t execlists_request_write_tail(struct intel_engine_cs *ring,
+static uint32_t execlists_request_write_tail(struct intel_engine_cs *engine,
 					     struct drm_i915_gem_request *rq)
 
 {
-	rq->ctx->engine[ring->id].ringbuf->regs[CTX_RING_TAIL+1] = rq->tail;
-	return execlists_ctx_descriptor(ring, rq->ctx->engine[ring->id].state);
+	struct intel_ringbuffer *ring = rq->ctx->engine[engine->id].ringbuf;
+	ring->regs[CTX_RING_TAIL+1] = rq->tail;
+	return execlists_ctx_descriptor(engine, ring->ggtt_offset);
 }
 
 static void execlists_submit_pair(struct intel_engine_cs *ring)
@@ -510,7 +511,8 @@ static int intel_lr_context_pin(struct intel_engine_cs *ring,
 	if (ret)
 		goto reset_pin_count;
 
-	if (WARN_ON(i915_gem_obj_ggtt_offset(ctx_obj) & 0xFFFFFFFF00000FFFULL)) {
+	ringbuf->ggtt_offset = i915_gem_obj_ggtt_offset(ctx_obj);
+	if (WARN_ON(ringbuf->ggtt_offset & 0xFFFFFFFF00000FFFULL)) {
 		ret = -ENODEV;
 		goto unpin_ctx_obj;
 	}
@@ -533,6 +535,7 @@ reset_pin_count:
 	return ret;
 }
 
+
 int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request,
 					    struct intel_context *ctx)
 {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 0899123c6bcc..55c91014bfdf 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -98,6 +98,7 @@ struct intel_ringbuffer {
 	struct drm_i915_gem_object *obj;
 	void __iomem *virtual_start;
 	uint32_t *regs;
+	uint32_t ggtt_offset;
 
 	struct intel_engine_cs *ring;
 
-- 
2.1.4


* [PATCH 44/70] drm/i915: Prefer to check for idleness in worker rather than sync-flush
From: Chris Wilson @ 2015-04-07 15:21 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9511993daeea..c394c0d13eb7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2570,7 +2570,6 @@ int __i915_add_request(struct intel_engine_cs *ring,
 
 	i915_queue_hangcheck(ring->dev);
 
-	cancel_delayed_work_sync(&dev_priv->mm.idle_work);
 	queue_delayed_work(dev_priv->wq,
 			   &dev_priv->mm.retire_work,
 			   round_jiffies_up_relative(HZ));
@@ -2908,6 +2907,12 @@ i915_gem_idle_work_handler(struct work_struct *work)
 	struct drm_i915_private *dev_priv =
 		container_of(work, typeof(*dev_priv), mm.idle_work.work);
 	struct drm_device *dev = dev_priv->dev;
+	struct intel_engine_cs *ring;
+	int i;
+
+	for_each_ring(ring, dev_priv, i)
+		if (!list_empty(&ring->request_list))
+			return;
 
 	intel_mark_idle(dev);
 
-- 
2.1.4


* [PATCH 45/70] drm/i915: Remove request->uniq
From: Chris Wilson @ 2015-04-07 15:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Jani Nikula

We already assign a unique identifier to every request: seqno. That
someone felt like adding a second one, without even mentioning why, and
tweaking the ABI in the process smells very fishy.

Fixes regression from
commit b3a38998f042b862f5ba4d7f2268f3a8dfb4883a
Author: Nick Hoath <nicholas.hoath@intel.com>
Date:   Thu Feb 19 16:30:47 2015 +0000

    drm/i915: Fix a use after free, and unbalanced refcounting

v2: Rebase

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Nick Hoath <nicholas.hoath@intel.com>
Cc: Thomas Daniel <thomas.daniel@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Jani Nikula <jani.nikula@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h   |  4 ----
 drivers/gpu/drm/i915/i915_gem.c   |  1 -
 drivers/gpu/drm/i915/i915_trace.h | 13 ++++---------
 3 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 262ebb620112..89839751237c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1843,8 +1843,6 @@ struct drm_i915_private {
 		void (*stop_ring)(struct intel_engine_cs *ring);
 	} gt;
 
-	uint32_t request_uniq;
-
 	/*
 	 * NOTE: This is the dri1/ums dungeon, don't add stuff here. Your patch
 	 * will be rejected. Instead look for a better place.
@@ -2120,8 +2118,6 @@ struct drm_i915_gem_request {
 	/** process identifier submitting this request */
 	struct pid *pid;
 
-	uint32_t uniq;
-
 	/**
 	 * The ELSP only accepts two elements at a time, so we queue
 	 * context/tail pairs on a given queue (ring->execlist_queue) until the
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c394c0d13eb7..e90894545fa4 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2660,7 +2660,6 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 	}
 
 	rq->ring = ring;
-	rq->uniq = dev_priv->request_uniq++;
 
 	if (i915.enable_execlists)
 		ret = intel_logical_ring_alloc_request_extras(rq, ctx);
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index ce8ee9e8bced..6e2eee52aaa2 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -499,7 +499,6 @@ DECLARE_EVENT_CLASS(i915_gem_request,
 	    TP_STRUCT__entry(
 			     __field(u32, dev)
 			     __field(u32, ring)
-			     __field(u32, uniq)
 			     __field(u32, seqno)
 			     ),
 
@@ -508,13 +507,11 @@ DECLARE_EVENT_CLASS(i915_gem_request,
 						i915_gem_request_get_ring(req);
 			   __entry->dev = ring->dev->primary->index;
 			   __entry->ring = ring->id;
-			   __entry->uniq = req ? req->uniq : 0;
 			   __entry->seqno = i915_gem_request_get_seqno(req);
 			   ),
 
-	    TP_printk("dev=%u, ring=%u, uniq=%u, seqno=%u",
-		      __entry->dev, __entry->ring, __entry->uniq,
-		      __entry->seqno)
+	    TP_printk("dev=%u, ring=%u, seqno=%u",
+		      __entry->dev, __entry->ring, __entry->seqno)
 );
 
 DEFINE_EVENT(i915_gem_request, i915_gem_request_add,
@@ -559,7 +556,6 @@ TRACE_EVENT(i915_gem_request_wait_begin,
 	    TP_STRUCT__entry(
 			     __field(u32, dev)
 			     __field(u32, ring)
-			     __field(u32, uniq)
 			     __field(u32, seqno)
 			     __field(bool, blocking)
 			     ),
@@ -575,14 +571,13 @@ TRACE_EVENT(i915_gem_request_wait_begin,
 						i915_gem_request_get_ring(req);
 			   __entry->dev = ring->dev->primary->index;
 			   __entry->ring = ring->id;
-			   __entry->uniq = req ? req->uniq : 0;
 			   __entry->seqno = i915_gem_request_get_seqno(req);
 			   __entry->blocking =
 				     mutex_is_locked(&ring->dev->struct_mutex);
 			   ),
 
-	    TP_printk("dev=%u, ring=%u, uniq=%u, seqno=%u, blocking=%s",
-		      __entry->dev, __entry->ring, __entry->uniq,
+	    TP_printk("dev=%u, ring=%u, seqno=%u, blocking=%s",
+		      __entry->dev, __entry->ring,
 		      __entry->seqno, __entry->blocking ?  "yes (NB)" : "no")
 );
 
-- 
2.1.4


* [PATCH 46/70] drm/i915: Cache the reset_counter for the request
From: Chris Wilson @ 2015-04-07 15:21 UTC (permalink / raw)
  To: intel-gfx

Instead of querying the reset counter before every access to the ring,
query it the first time we touch the ring, and do a final compare when
submitting the request. For correctness, we need to then sanitize how
the reset_counter is incremented to prevent broken submission and
waiting across resets, in the process fixing the persistent -EIO we
still see today on failed waits.
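
The intended flow, reduced to a sketch (names as introduced in the diff
below; the exact submission-time handling here is an assumption, not
the verbatim code):

	/* when the request is created, sample the counter once */
	rq->reset_counter = i915_reset_counter(&dev_priv->gpu_error);
	ret = i915_gem_check_wedge(rq->reset_counter, interruptible);
	if (ret)
		return ret;

	/* at submission (or wait), a changed counter means a reset
	 * occurred while building the request, so restart rather than
	 * report a stale -EIO */
	if (rq->reset_counter != i915_reset_counter(&dev_priv->gpu_error))
		return -EAGAIN;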

v2: Rebase

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c         | 32 +++++++-----
 drivers/gpu/drm/i915/i915_drv.h         | 29 +++++++----
 drivers/gpu/drm/i915/i915_gem.c         | 87 ++++++++++++++-------------------
 drivers/gpu/drm/i915/i915_irq.c         | 28 ++++-------
 drivers/gpu/drm/i915/intel_display.c    | 10 ++--
 drivers/gpu/drm/i915/intel_lrc.c        |  7 ---
 drivers/gpu/drm/i915/intel_ringbuffer.c |  6 ---
 7 files changed, 87 insertions(+), 112 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 1366e0ec4933..72b01323c549 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -827,6 +827,8 @@ int i915_resume_legacy(struct drm_device *dev)
 int i915_reset(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_gpu_error *error = &dev_priv->gpu_error;
+	unsigned reset_counter;
 	bool simulated;
 	int ret;
 
@@ -836,17 +838,23 @@ int i915_reset(struct drm_device *dev)
 	intel_reset_gt_powersave(dev);
 
 	mutex_lock(&dev->struct_mutex);
+	reset_counter = atomic_inc_return(&error->reset_counter);
+	if (WARN_ON(__i915_reset_in_progress(reset_counter))) {
+		atomic_set_mask(I915_WEDGED, &error->reset_counter);
+		mutex_unlock(&dev->struct_mutex);
+		return -EIO;
+	}
 
 	i915_gem_reset(dev);
 
-	simulated = dev_priv->gpu_error.stop_rings != 0;
+	simulated = error->stop_rings != 0;
 
 	ret = intel_gpu_reset(dev);
 
 	/* Also reset the gpu hangman. */
 	if (simulated) {
 		DRM_INFO("Simulated gpu hang, resetting stop_rings\n");
-		dev_priv->gpu_error.stop_rings = 0;
+		error->stop_rings = 0;
 		if (ret == -ENODEV) {
 			DRM_INFO("Reset not implemented, but ignoring "
 				 "error for simulated gpu hangs\n");
@@ -859,8 +867,7 @@ int i915_reset(struct drm_device *dev)
 
 	if (ret) {
 		DRM_ERROR("Failed to reset chip: %i\n", ret);
-		mutex_unlock(&dev->struct_mutex);
-		return ret;
+		goto error;
 	}
 
 	intel_overlay_reset(dev_priv);
@@ -879,20 +886,14 @@ int i915_reset(struct drm_device *dev)
 	 * was running at the time of the reset (i.e. we weren't VT
 	 * switched away).
 	 */
-
-	/* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset */
-	dev_priv->gpu_error.reload_in_reset = true;
-
 	ret = i915_gem_init_hw(dev);
-
-	dev_priv->gpu_error.reload_in_reset = false;
-
-	mutex_unlock(&dev->struct_mutex);
 	if (ret) {
 		DRM_ERROR("Failed hw init on reset %d\n", ret);
-		return ret;
+		goto error;
 	}
 
+	mutex_unlock(&dev->struct_mutex);
+
 	/*
 	 * rps/rc6 re-init is necessary to restore state lost after the
 	 * reset and the re-install of gt irqs. Skip for ironlake per
@@ -903,6 +904,11 @@ int i915_reset(struct drm_device *dev)
 		intel_enable_gt_powersave(dev);
 
 	return 0;
+
+error:
+	atomic_set_mask(I915_WEDGED, &error->reset_counter);
+	mutex_unlock(&dev->struct_mutex);
+	return ret;
 }
 
 static int i915_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 89839751237c..97f5d266b17c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1308,9 +1308,6 @@ struct i915_gpu_error {
 
 	/* For missed irq/seqno simulation. */
 	unsigned int test_irq_rings;
-
-	/* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset   */
-	bool reload_in_reset;
 };
 
 enum modeset_restore {
@@ -2072,6 +2069,7 @@ struct drm_i915_gem_request {
 	/** On Which ring this request was generated */
 	struct drm_i915_private *i915;
 	struct intel_engine_cs *ring;
+	unsigned reset_counter;
 
 	/** GEM sequence number associated with this request. */
 	uint32_t seqno;
@@ -2767,24 +2765,38 @@ struct drm_i915_gem_request *
 i915_gem_find_active_request(struct intel_engine_cs *ring);
 
 bool i915_gem_retire_requests(struct drm_device *dev);
-int __must_check i915_gem_check_wedge(struct i915_gpu_error *error,
+int __must_check i915_gem_check_wedge(unsigned reset_counter,
 				      bool interruptible);
 int __must_check i915_gem_check_olr(struct drm_i915_gem_request *req);
 
+static inline u32 i915_reset_counter(struct i915_gpu_error *error)
+{
+	return atomic_read(&error->reset_counter);
+}
+
+static inline bool __i915_reset_in_progress(u32 reset)
+{
+	return unlikely(reset & (I915_RESET_IN_PROGRESS_FLAG | I915_WEDGED));
+}
+
+static inline bool __i915_terminally_wedged(u32 reset)
+{
+	return reset & I915_WEDGED;
+}
+
 static inline bool i915_reset_in_progress(struct i915_gpu_error *error)
 {
-	return unlikely(atomic_read(&error->reset_counter)
-			& (I915_RESET_IN_PROGRESS_FLAG | I915_WEDGED));
+	return __i915_reset_in_progress(i915_reset_counter(error));
 }
 
 static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
 {
-	return atomic_read(&error->reset_counter) & I915_WEDGED;
+	return __i915_terminally_wedged(i915_reset_counter(error));
 }
 
 static inline u32 i915_reset_count(struct i915_gpu_error *error)
 {
-	return ((atomic_read(&error->reset_counter) & ~I915_WEDGED) + 1) / 2;
+	return ((i915_reset_counter(error) & ~I915_WEDGED) + 1) / 2;
 }
 
 static inline bool i915_stop_ring_allow_ban(struct drm_i915_private *dev_priv)
@@ -2816,7 +2828,6 @@ int __i915_add_request(struct intel_engine_cs *ring,
 #define i915_add_request(ring) \
 	__i915_add_request(ring, NULL, NULL)
 int __i915_wait_request(struct drm_i915_gem_request *req,
-			unsigned reset_counter,
 			bool interruptible,
 			s64 *timeout,
 			struct drm_i915_file_private *file_priv);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e90894545fa4..729c7fa02e12 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -100,14 +100,19 @@ static void i915_gem_info_remove_obj(struct drm_i915_private *dev_priv,
 	spin_unlock(&dev_priv->mm.object_stat_lock);
 }
 
+static inline bool reset_complete(struct i915_gpu_error *error)
+{
+	unsigned reset_counter = i915_reset_counter(error);
+	return (!__i915_reset_in_progress(reset_counter) ||
+		__i915_terminally_wedged(reset_counter));
+}
+
 static int
 i915_gem_wait_for_error(struct i915_gpu_error *error)
 {
 	int ret;
 
-#define EXIT_COND (!i915_reset_in_progress(error) || \
-		   i915_terminally_wedged(error))
-	if (EXIT_COND)
+	if (reset_complete(error))
 		return 0;
 
 	/*
@@ -116,17 +121,16 @@ i915_gem_wait_for_error(struct i915_gpu_error *error)
 	 * we should simply try to bail out and fail as gracefully as possible.
 	 */
 	ret = wait_event_interruptible_timeout(error->reset_queue,
-					       EXIT_COND,
+					       reset_complete(error),
 					       10*HZ);
 	if (ret == 0) {
 		DRM_ERROR("Timed out waiting for the gpu reset to complete\n");
 		return -EIO;
 	} else if (ret < 0) {
 		return ret;
+	} else {
+		return 0;
 	}
-#undef EXIT_COND
-
-	return 0;
 }
 
 int i915_mutex_lock_interruptible(struct drm_device *dev)
@@ -1127,26 +1131,18 @@ put_rpm:
 }
 
 int
-i915_gem_check_wedge(struct i915_gpu_error *error,
+i915_gem_check_wedge(unsigned reset_counter,
 		     bool interruptible)
 {
-	if (i915_reset_in_progress(error)) {
+	if (__i915_reset_in_progress(reset_counter)) {
 		/* Non-interruptible callers can't handle -EAGAIN, hence return
 		 * -EIO unconditionally for these. */
 		if (!interruptible)
 			return -EIO;
 
 		/* Recovery complete, but the reset failed ... */
-		if (i915_terminally_wedged(error))
+		if (__i915_terminally_wedged(reset_counter))
 			return -EIO;
-
-		/*
-		 * Check if GPU Reset is in progress - we need intel_ring_begin
-		 * to work properly to reinit the hw state while the gpu is
-		 * still marked as reset-in-progress. Handle this with a flag.
-		 */
-		if (!error->reload_in_reset)
-			return -EAGAIN;
 	}
 
 	return 0;
@@ -1206,7 +1202,6 @@ static int __i915_spin_request(struct drm_i915_gem_request *rq)
 /**
  * __i915_wait_request - wait until execution of request has finished
  * @req: duh!
- * @reset_counter: reset sequence associated with the given request
  * @interruptible: do an interruptible wait (normally yes)
  * @timeout: in - how long to wait (NULL forever); out - how much time remaining
  *
@@ -1221,7 +1216,6 @@ static int __i915_spin_request(struct drm_i915_gem_request *rq)
  * errno with remaining time filled in timeout argument.
  */
 int __i915_wait_request(struct drm_i915_gem_request *req,
-			unsigned reset_counter,
 			bool interruptible,
 			s64 *timeout,
 			struct drm_i915_file_private *file_priv)
@@ -1271,12 +1265,12 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 
 		/* We need to check whether any gpu reset happened in between
 		 * the caller grabbing the seqno and now ... */
-		if (reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter)) {
-			/* ... but upgrade the -EAGAIN to an -EIO if the gpu
-			 * is truely gone. */
-			ret = i915_gem_check_wedge(&dev_priv->gpu_error, interruptible);
-			if (ret == 0)
-				ret = -EAGAIN;
+		if (req->reset_counter != i915_reset_counter(&dev_priv->gpu_error)) {
+			/* As we do not requeue the request over a GPU reset,
+			 * if one does occur we know that the request is
+			 * effectively complete.
+			 */
+			ret = 0;
 			break;
 		}
 
@@ -1414,17 +1408,11 @@ i915_wait_request(struct drm_i915_gem_request *req)
 
 	BUG_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
 
-	ret = i915_gem_check_wedge(&dev_priv->gpu_error, interruptible);
-	if (ret)
-		return ret;
-
 	ret = i915_gem_check_olr(req);
 	if (ret)
 		return ret;
 
-	ret = __i915_wait_request(req,
-				  atomic_read(&dev_priv->gpu_error.reset_counter),
-				  interruptible, NULL, NULL);
+	ret = __i915_wait_request(req, interruptible, NULL, NULL);
 	if (ret)
 		return ret;
 
@@ -1499,7 +1487,6 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_request *requests[I915_NUM_RINGS];
-	unsigned reset_counter;
 	int ret, i, n = 0;
 
 	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
@@ -1508,12 +1495,6 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 	if (!obj->active)
 		return 0;
 
-	ret = i915_gem_check_wedge(&dev_priv->gpu_error, true);
-	if (ret)
-		return ret;
-
-	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
-
 	if (readonly) {
 		struct drm_i915_gem_request *rq;
 
@@ -1544,8 +1525,7 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 
 	mutex_unlock(&dev->struct_mutex);
 	for (i = 0; ret == 0 && i < n; i++)
-		ret = __i915_wait_request(requests[i], reset_counter, true,
-					  NULL, file_priv);
+		ret = __i915_wait_request(requests[i], true, NULL, file_priv);
 	mutex_lock(&dev->struct_mutex);
 
 err:
@@ -2489,6 +2469,9 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	if (WARN_ON(request == NULL))
 		return -ENOMEM;
 
+	if (request->reset_counter != i915_reset_counter(&dev_priv->gpu_error))
+		return -EAGAIN;
+
 	if (i915.enable_execlists) {
 		ringbuf = request->ctx->engine[ring->id].ringbuf;
 	} else
@@ -2640,18 +2623,24 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct intel_context *ctx)
 {
 	struct drm_i915_private *dev_priv = to_i915(ring->dev);
+	unsigned reset_counter = i915_reset_counter(&dev_priv->gpu_error);
 	struct drm_i915_gem_request *rq;
 	int ret;
 
 	if (ring->outstanding_lazy_request)
 		return 0;
 
+	ret = i915_gem_check_wedge(reset_counter, dev_priv->mm.interruptible);
+	if (ret)
+		return ret;
+
 	rq = kmem_cache_zalloc(dev_priv->requests, GFP_KERNEL);
 	if (rq == NULL)
 		return -ENOMEM;
 
 	kref_init(&rq->ref);
 	rq->i915 = dev_priv;
+	rq->reset_counter = reset_counter;
 
 	ret = i915_gem_get_seqno(ring->dev, &rq->seqno);
 	if (ret) {
@@ -2999,11 +2988,9 @@ void i915_gem_close_object(struct drm_gem_object *gem,
 int
 i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_wait *args = data;
 	struct drm_i915_gem_object *obj;
 	struct drm_i915_gem_request *req[I915_NUM_RINGS];
-	unsigned reset_counter;
 	int i, n = 0;
 	int ret;
 
@@ -3037,7 +3024,6 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	}
 
 	drm_gem_object_unreference(&obj->base);
-	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		if (obj->last_read_req[i] == NULL)
@@ -3050,7 +3036,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 
 	for (i = 0; i < n; i++) {
 		if (ret == 0)
-			ret = __i915_wait_request(req[i], reset_counter, true,
+			ret = __i915_wait_request(req[i], true,
 						  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
 						  file->driver_priv);
 		i915_gem_request_unreference__unlocked(req[i]);
@@ -3088,7 +3074,6 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 	if (!i915_semaphore_is_enabled(obj->base.dev)) {
 		struct drm_i915_private *i915 = to_i915(obj->base.dev);
 		ret = __i915_wait_request(rq,
-					  atomic_read(&i915->gpu_error.reset_counter),
 					  i915->mm.interruptible,
 					  NULL,
 					  &i915->rps.semaphores);
@@ -4298,14 +4283,15 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 	unsigned long recent_enough = jiffies - msecs_to_jiffies(20);
 	struct drm_i915_gem_request *request, *target = NULL;
-	unsigned reset_counter;
 	int ret;
 
 	ret = i915_gem_wait_for_error(&dev_priv->gpu_error);
 	if (ret)
 		return ret;
 
-	ret = i915_gem_check_wedge(&dev_priv->gpu_error, false);
+	/* ABI: return -EIO if wedged */
+	ret = i915_gem_check_wedge(i915_reset_counter(&dev_priv->gpu_error),
+				   false);
 	if (ret)
 		return ret;
 
@@ -4316,7 +4302,6 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 
 		target = request;
 	}
-	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
 	if (target)
 		i915_gem_request_reference(target);
 	spin_unlock(&file_priv->mm.lock);
@@ -4324,7 +4309,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	if (target == NULL)
 		return 0;
 
-	ret = __i915_wait_request(target, reset_counter, true, NULL, NULL);
+	ret = __i915_wait_request(target, true, NULL, NULL);
 	if (ret == 0)
 		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
 
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 46bcbff89760..47c9c02e6731 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2299,6 +2299,13 @@ static void i915_error_wake_up(struct drm_i915_private *dev_priv,
 		wake_up_all(&dev_priv->gpu_error.reset_queue);
 }
 
+static bool reset_pending(struct i915_gpu_error *error)
+{
+	unsigned reset_counter = i915_reset_counter(error);
+	return (__i915_reset_in_progress(reset_counter) &&
+		!__i915_terminally_wedged(reset_counter));
+}
+
 /**
  * i915_reset_and_wakeup - do process context error handling work
  *
@@ -2308,7 +2315,6 @@ static void i915_error_wake_up(struct drm_i915_private *dev_priv,
 static void i915_reset_and_wakeup(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
-	struct i915_gpu_error *error = &dev_priv->gpu_error;
 	char *error_event[] = { I915_ERROR_UEVENT "=1", NULL };
 	char *reset_event[] = { I915_RESET_UEVENT "=1", NULL };
 	char *reset_done_event[] = { I915_ERROR_UEVENT "=0", NULL };
@@ -2326,7 +2332,7 @@ static void i915_reset_and_wakeup(struct drm_device *dev)
 	 * the reset in-progress bit is only ever set by code outside of this
 	 * work we don't need to worry about any other races.
 	 */
-	if (i915_reset_in_progress(error) && !i915_terminally_wedged(error)) {
+	if (reset_pending(&dev_priv->gpu_error)) {
 		DRM_DEBUG_DRIVER("resetting chip\n");
 		kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE,
 				   reset_event);
@@ -2354,25 +2360,9 @@ static void i915_reset_and_wakeup(struct drm_device *dev)
 
 		intel_runtime_pm_put(dev_priv);
 
-		if (ret == 0) {
-			/*
-			 * After all the gem state is reset, increment the reset
-			 * counter and wake up everyone waiting for the reset to
-			 * complete.
-			 *
-			 * Since unlock operations are a one-sided barrier only,
-			 * we need to insert a barrier here to order any seqno
-			 * updates before
-			 * the counter increment.
-			 */
-			smp_mb__before_atomic();
-			atomic_inc(&dev_priv->gpu_error.reset_counter);
-
+		if (ret == 0)
 			kobject_uevent_env(&dev->primary->kdev->kobj,
 					   KOBJ_CHANGE, reset_done_event);
-		} else {
-			atomic_set_mask(I915_WEDGED, &error->reset_counter);
-		}
 
 		/*
 		 * Note: The wake_up also serves as a memory barrier so that
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 0c2bb2ce04fc..69db1c3b26a8 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3196,8 +3196,7 @@ static bool intel_crtc_has_pending_flip(struct drm_crtc *crtc)
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	bool pending;
 
-	if (i915_reset_in_progress(&dev_priv->gpu_error) ||
-	    intel_crtc->reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))
+	if (intel_crtc->reset_counter != i915_reset_counter(&dev_priv->gpu_error))
 		return false;
 
 	spin_lock_irq(&dev->event_lock);
@@ -9689,8 +9688,7 @@ static bool page_flip_finished(struct intel_crtc *crtc)
 	struct drm_device *dev = crtc->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	if (i915_reset_in_progress(&dev_priv->gpu_error) ||
-	    crtc->reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))
+	if (crtc->reset_counter != i915_reset_counter(&dev_priv->gpu_error))
 		return true;
 
 	/*
@@ -10109,9 +10107,7 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
 		container_of(work, struct intel_mmio_flip, work);
 
 	if (mmio_flip->rq)
-		WARN_ON(__i915_wait_request(mmio_flip->rq,
-					    mmio_flip->crtc->reset_counter,
-					    false, NULL,
+		WARN_ON(__i915_wait_request(mmio_flip->rq, false, NULL,
 					    &mmio_flip->i915->rps.mmioflips));
 
 	intel_do_mmio_flip(mmio_flip->crtc);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 26f96999a4a9..fc57d4111e56 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -676,15 +676,8 @@ static int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf,
 				    struct intel_context *ctx, int num_dwords)
 {
 	struct intel_engine_cs *ring = ringbuf->ring;
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
 	int ret;
 
-	ret = i915_gem_check_wedge(&dev_priv->gpu_error,
-				   dev_priv->mm.interruptible);
-	if (ret)
-		return ret;
-
 	ret = logical_ring_prepare(ringbuf, ctx, num_dwords * sizeof(uint32_t));
 	if (ret)
 		return ret;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 6b894ab9d0f2..b6b2e076fed4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2189,14 +2189,8 @@ static int __intel_ring_prepare(struct intel_engine_cs *ring,
 int intel_ring_begin(struct intel_engine_cs *ring,
 		     int num_dwords)
 {
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	int ret;
 
-	ret = i915_gem_check_wedge(&dev_priv->gpu_error,
-				   dev_priv->mm.interruptible);
-	if (ret)
-		return ret;
-
 	ret = __intel_ring_prepare(ring, num_dwords * sizeof(uint32_t));
 	if (ret)
 		return ret;
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [PATCH 47/70] drm/i915: Allocate context objects from stolen
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (45 preceding siblings ...)
  2015-04-07 15:21 ` [PATCH 46/70] drm/i915: Cache the reset_counter for the request Chris Wilson
@ 2015-04-07 15:21 ` Chris Wilson
  2015-04-10  8:39   ` Daniel Vetter
  2015-04-07 15:21 ` [PATCH 48/70] drm/i915: Introduce an internal allocator for disposable private objects Chris Wilson
                   ` (12 subsequent siblings)
  59 siblings, 1 reply; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:21 UTC (permalink / raw)
  To: intel-gfx

As we never expose context objects directly to userspace, we can forgo
allocating a first-class GEM object for them and prefer to use the
limited resource of reserved/stolen memory for them. Note this means
that their initial contents are undefined.

However, a downside of using stolen objects for execlists is that we
cannot access the physical address directly (thanks MCH!), which
prevents their use there; execlist contexts are switched over to plain
GEM objects instead.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 4 +++-
 drivers/gpu/drm/i915/intel_lrc.c        | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 18900f745bc6..b9c6b0ad1d0f 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -157,7 +157,9 @@ i915_gem_alloc_context_obj(struct drm_device *dev, size_t size)
 	struct drm_i915_gem_object *obj;
 	int ret;
 
-	obj = i915_gem_alloc_object(dev, size);
+	obj = i915_gem_object_create_stolen(dev, size);
+	if (obj == NULL)
+		obj = i915_gem_alloc_object(dev, size);
 	if (obj == NULL)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index fc57d4111e56..a62ffaa45bd1 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1711,7 +1711,7 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 
 	context_size = round_up(get_lr_context_size(ring), 4096);
 
-	ctx_obj = i915_gem_alloc_context_obj(dev, context_size);
+	ctx_obj = i915_gem_alloc_object(dev, context_size);
 	if (IS_ERR(ctx_obj)) {
 		ret = PTR_ERR(ctx_obj);
 		DRM_DEBUG_DRIVER("Alloc LRC backing obj failed: %d\n", ret);
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [PATCH 48/70] drm/i915: Introduce an internal allocator for disposable private objects
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (46 preceding siblings ...)
  2015-04-07 15:21 ` [PATCH 47/70] drm/i915: Allocate context objects from stolen Chris Wilson
@ 2015-04-07 15:21 ` Chris Wilson
  2015-04-07 15:21 ` [PATCH 49/70] drm/i915: Do not zero initialise page tables Chris Wilson
                   ` (11 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:21 UTC (permalink / raw)
  To: intel-gfx

Quite a few of our objects used for internal hardware programming do not
benefit from being swappable or from being zero initialised. As such
they do not benefit from using shmemfs backing storage, and since they
are internal and never directly exposed to the user, we do not need to
worry about providing a filp. For these we can use a
drm_i915_gem_object wrapper around an sg_table of plain struct page.
They are not swap backed and are not automatically pinned. If they are
reaped by the shrinker, the pages are released and the contents
discarded. For the internal use case this is fine: ringbuffers, for
example, only need to be pinned from the point a request is written
into them until the hardware has finished reading it. Once they are
idle, they can be discarded entirely. As such they are a good match
for execlist ringbuffers and a small variety of other internal objects.
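
As a usage sketch, mirroring the batch pool hunk below: callers
allocate, pin the pages for the period of use, and must assume the
contents are lost once the object has been unpinned and reaped:

	obj = i915_gem_object_create_internal(dev, size);
	if (obj == NULL)
		return ERR_PTR(-ENOMEM);

	ret = i915_gem_object_get_pages(obj);
	if (ret)
		return ERR_PTR(ret);

	/* contents are only valid whilst the pages are pinned */
	i915_gem_object_pin_pages(obj);

	/* ... fill and execute ... */

	/* after unpinning, the shrinker may discard the backing store */
	i915_gem_object_unpin_pages(obj);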

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/Makefile                |   1 +
 drivers/gpu/drm/i915/i915_drv.h              |   5 +
 drivers/gpu/drm/i915/i915_gem_batch_pool.c   |  25 ++---
 drivers/gpu/drm/i915/i915_gem_internal.c     | 149 +++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_render_state.c |   2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c      |  10 +-
 6 files changed, 168 insertions(+), 24 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_internal.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index a69002e2257d..0054a058477d 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -27,6 +27,7 @@ i915-y += i915_cmd_parser.o \
 	  i915_gem_evict.o \
 	  i915_gem_execbuffer.o \
 	  i915_gem_gtt.o \
+	  i915_gem_internal.o \
 	  i915_gem.o \
 	  i915_gem_shrinker.o \
 	  i915_gem_stolen.o \
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 97f5d266b17c..c710a9ea1458 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3027,6 +3027,11 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 					       u32 gtt_offset,
 					       u32 size);
 
+/* i915_gem_internal.c */
+struct drm_i915_gem_object *
+i915_gem_object_create_internal(struct drm_device *dev,
+				unsigned size);
+
 /* i915_gem_shrinker.c */
 unsigned long i915_gem_shrink(struct drm_i915_private *dev_priv,
 			      long target,
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index 088020a1afc4..e68c439bceda 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -101,7 +101,7 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 	struct drm_i915_gem_object *obj = NULL;
 	struct drm_i915_gem_object *tmp;
 	struct list_head *list;
-	int n;
+	int n, ret;
 
 	WARN_ON(!mutex_is_locked(&pool->dev->struct_mutex));
 
@@ -127,13 +127,6 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 			BUG_ON(tmp->last_write_req);
 		}
 
-		/* While we're looping, do some clean up */
-		if (tmp->madv == __I915_MADV_PURGED) {
-			list_del(&tmp->batch_pool_link);
-			drm_gem_object_unreference(&tmp->base);
-			continue;
-		}
-
 		if (tmp->base.size >= size) {
 			obj = tmp;
 			break;
@@ -141,20 +134,16 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 	}
 
 	if (obj == NULL) {
-		int ret;
-
-		obj = i915_gem_alloc_object(pool->dev, size);
+		obj = i915_gem_object_create_internal(pool->dev, size);
 		if (obj == NULL)
 			return ERR_PTR(-ENOMEM);
-
-		ret = i915_gem_object_get_pages(obj);
-		if (ret)
-			return ERR_PTR(ret);
-
-		obj->madv = I915_MADV_DONTNEED;
 	}
 
-	list_move_tail(&obj->batch_pool_link, list);
+	ret = i915_gem_object_get_pages(obj);
+	if (ret)
+		return ERR_PTR(ret);
+
 	i915_gem_object_pin_pages(obj);
+	list_move_tail(&obj->batch_pool_link, list);
 	return obj;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_internal.c b/drivers/gpu/drm/i915/i915_gem_internal.c
new file mode 100644
index 000000000000..583908392ff5
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_gem_internal.c
@@ -0,0 +1,149 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include <drm/drmP.h>
+#include <drm/i915_drm.h>
+#include "i915_drv.h"
+
+static void __i915_gem_object_free_pages(struct sg_table *st)
+{
+	struct sg_page_iter sg_iter;
+
+	for_each_sg_page(st->sgl, &sg_iter, st->nents, 0)
+		page_cache_release(sg_page_iter_page(&sg_iter));
+
+	sg_free_table(st);
+	kfree(st);
+}
+
+static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj)
+{
+	const unsigned npages = obj->base.size / PAGE_SIZE;
+	struct sg_table *st;
+	struct scatterlist *sg;
+	unsigned long last_pfn = 0;	/* suppress gcc warning */
+	gfp_t gfp;
+	int i;
+
+	st = kmalloc(sizeof(*st), GFP_KERNEL);
+	if (st == NULL)
+		return -ENOMEM;
+
+	if (sg_alloc_table(st, npages, GFP_KERNEL)) {
+		kfree(st);
+		return -ENOMEM;
+	}
+
+	sg = st->sgl;
+	st->nents = 0;
+
+	gfp = GFP_KERNEL;
+	gfp |= __GFP_NORETRY | __GFP_NOWARN | __GFP_NO_KSWAPD;
+	gfp &= ~(__GFP_IO | __GFP_WAIT);
+	for (i = 0; i < npages; i++) {
+		struct page *page;
+
+		page = alloc_page(gfp);
+		if (page == NULL) {
+			i915_gem_shrink_all(to_i915(obj->base.dev));
+			page = alloc_page(GFP_KERNEL);
+			if (page == NULL)
+				goto err;
+		}
+
+		/* XXX page allocator needs to check for SNB bugs */
+
+#ifdef CONFIG_SWIOTLB
+		if (swiotlb_nr_tbl()) {
+			st->nents++;
+			sg_set_page(sg, page, PAGE_SIZE, 0);
+			sg = sg_next(sg);
+			continue;
+		}
+#endif
+		if (!i || page_to_pfn(page) != last_pfn + 1) {
+			if (i)
+				sg = sg_next(sg);
+			st->nents++;
+			sg_set_page(sg, page, PAGE_SIZE, 0);
+		} else {
+			sg->length += PAGE_SIZE;
+		}
+		last_pfn = page_to_pfn(page);
+	}
+#ifdef CONFIG_SWIOTLB
+	if (!swiotlb_nr_tbl())
+#endif
+		sg_mark_end(sg);
+	obj->pages = st;
+	obj->madv = I915_MADV_DONTNEED;
+
+	return 0;
+
+err:
+	sg_mark_end(sg);
+	__i915_gem_object_free_pages(st);
+	return -ENOMEM;
+}
+
+static void i915_gem_object_put_pages_internal(struct drm_i915_gem_object *obj)
+{
+	__i915_gem_object_free_pages(obj->pages);
+
+	obj->dirty = 0;
+	obj->madv = I915_MADV_WILLNEED;
+}
+
+static const struct drm_i915_gem_object_ops i915_gem_object_internal_ops = {
+	.get_pages = i915_gem_object_get_pages_internal,
+	.put_pages = i915_gem_object_put_pages_internal,
+};
+
+
+/**
+ * Creates a new object that wraps some internal memory for private use.
+ * This object is not backed by swappable storage, and as such its contents
+ * are volatile and only valid whilst pinned. If the object is reaped by the
+ * shrinker, its pages and data will be discarded. Equally, it is not a full
+ * GEM object and so not valid for access from userspace. This makes it useful
+ * for hardware interfaces like ringbuffers (which are pinned from the time
+ * the request is written to the time the hardware stops accessing it), but
+ * not for contexts (which need to be preserved when not active for later
+ * reuse).
+ */
+struct drm_i915_gem_object *
+i915_gem_object_create_internal(struct drm_device *dev,
+				unsigned size)
+{
+	struct drm_i915_gem_object *obj;
+
+	obj = i915_gem_object_alloc(dev);
+	if (obj == NULL)
+		return NULL;
+
+	drm_gem_private_object_init(dev, &obj->base, size);
+	i915_gem_object_init(obj, &i915_gem_object_internal_ops);
+
+	return obj;
+}
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index 521548a08578..4bb91cdadec9 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -57,7 +57,7 @@ static int render_state_init(struct render_state *so, struct drm_device *dev)
 	if (so->rodata->batch_items * 4 > 4096)
 		return -EINVAL;
 
-	so->obj = i915_gem_alloc_object(dev, 4096);
+	so->obj = i915_gem_object_create_internal(dev, 4096);
 	if (so->obj == NULL)
 		return -ENOMEM;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index b6b2e076fed4..441adc8fa535 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -669,7 +669,7 @@ intel_init_pipe_control(struct intel_engine_cs *ring)
 
 	WARN_ON(ring->scratch.obj);
 
-	ring->scratch.obj = i915_gem_alloc_object(ring->dev, 4096);
+	ring->scratch.obj = i915_gem_object_create_internal(ring->dev, 4096);
 	if (ring->scratch.obj == NULL) {
 		DRM_ERROR("Failed to allocate seqno page\n");
 		ret = -ENOMEM;
@@ -1845,7 +1845,7 @@ static int init_status_page(struct intel_engine_cs *ring)
 		unsigned flags;
 		int ret;
 
-		obj = i915_gem_alloc_object(ring->dev, 4096);
+		obj = i915_gem_object_create_internal(ring->dev, 4096);
 		if (obj == NULL) {
 			DRM_ERROR("Failed to allocate status page\n");
 			return -ENOMEM;
@@ -1971,7 +1971,7 @@ int intel_alloc_ringbuffer_obj(struct drm_device *dev,
 	if (!HAS_LLC(dev))
 		obj = i915_gem_object_create_stolen(dev, ringbuf->size);
 	if (obj == NULL)
-		obj = i915_gem_alloc_object(dev, ringbuf->size);
+		obj = i915_gem_object_create_internal(dev, ringbuf->size);
 	if (obj == NULL)
 		return -ENOMEM;
 
@@ -2446,7 +2446,7 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 
 	if (INTEL_INFO(dev)->gen >= 8) {
 		if (i915_semaphore_is_enabled(dev)) {
-			obj = i915_gem_alloc_object(dev, 4096);
+			obj = i915_gem_object_create_internal(dev, 4096);
 			if (obj == NULL) {
 				DRM_ERROR("Failed to allocate semaphore bo. Disabling semaphores\n");
 				i915.semaphores = 0;
@@ -2552,7 +2552,7 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 
 	/* Workaround batchbuffer to combat CS tlb bug. */
 	if (HAS_BROKEN_CS_TLB(dev)) {
-		obj = i915_gem_alloc_object(dev, I830_WA_SIZE);
+		obj = i915_gem_object_create_internal(dev, I830_WA_SIZE);
 		if (obj == NULL) {
 			DRM_ERROR("Failed to allocate batch bo\n");
 			return -ENOMEM;
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [PATCH 49/70] drm/i915: Do not zero initialise page tables
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (47 preceding siblings ...)
  2015-04-07 15:21 ` [PATCH 48/70] drm/i915: Introduce an internal allocator for disposable private objects Chris Wilson
@ 2015-04-07 15:21 ` Chris Wilson
  2015-04-07 15:21 ` [PATCH 50/70] drm/i915: The argument for postfix is redundant Chris Wilson
                   ` (10 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Mika Kuoppala

After we successfully allocate them, we will fill them with their
initial contents (either the chain of page tables, or a pointer to the
scratch page).

Regression from
commit 06fda602dbca9c59d87db7da71192e4b54c9f5ff
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date:   Tue Feb 24 16:22:36 2015 +0000

    drm/i915: Create page table allocators
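
Illustratively, the allocation no longer needs __GFP_ZERO because every
entry of the new page is written before the GPU can see it - either
with the address of a child page table or with the scratch page. A
hedged sketch of the shape of that fill (entry_count and scratch_entry
are placeholder names for illustration, not the driver's own):

	u32 *vaddr;
	int i;

	pd->page = alloc_page(GFP_KERNEL); /* no __GFP_ZERO needed */
	if (!pd->page)
		return ERR_PTR(-ENOMEM);

	vaddr = kmap_atomic(pd->page);
	for (i = 0; i < entry_count; i++)
		vaddr[i] = scratch_entry; /* or a child table address */
	kunmap_atomic(vaddr);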

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com> (v3+)
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 85077beb9338..a80573105a61 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -429,7 +429,7 @@ static struct i915_page_directory_entry *alloc_pd_single(void)
 	if (!pd)
 		return ERR_PTR(-ENOMEM);
 
-	pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	pd->page = alloc_page(GFP_KERNEL);
 	if (!pd->page) {
 		kfree(pd);
 		return ERR_PTR(-ENOMEM);
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [PATCH 50/70] drm/i915: The argument for postfix is redundant
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (48 preceding siblings ...)
  2015-04-07 15:21 ` [PATCH 49/70] drm/i915: Do not zero initialise page tables Chris Wilson
@ 2015-04-07 15:21 ` Chris Wilson
  2015-04-10  8:53   ` Daniel Vetter
  2015-04-07 15:21 ` [PATCH 51/70] drm/i915: Record the position of the start of the request Chris Wilson
                   ` (9 subsequent siblings)
  59 siblings, 1 reply; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:21 UTC (permalink / raw)
  To: intel-gfx

We are already conservative about the amount of free space available in
the ring, in order to avoid overrunning the potential MI_INTERRUPT after
the seqno write, so tracking a separate postfix position gains us
nothing. Further undermining the justification for the change that
introduced it, the postfix was also applied incorrectly.
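
For reference, the conservatism lives in the ring space accounting
itself, which never lets the tail run right up to the head; a sketch of
that helper as it stands in this era of the driver (so take the exact
form with a grain of salt):

	int __intel_ring_space(int head, int tail, int size)
	{
		int space = head - tail;
		if (space <= 0)
			space += size;
		/* always keep some slack behind the head */
		return space - I915_RING_FREE_SPACE;
	}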

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h         | 14 ++------------
 drivers/gpu/drm/i915/i915_gem.c         |  9 +--------
 drivers/gpu/drm/i915/i915_gpu_error.c   |  2 +-
 drivers/gpu/drm/i915/intel_dvo.c        |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c        |  8 +++-----
 drivers/gpu/drm/i915/intel_ringbuffer.c |  2 +-
 6 files changed, 9 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c710a9ea1458..4b46c5b5eb44 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2074,18 +2074,8 @@ struct drm_i915_gem_request {
 	/** GEM sequence number associated with this request. */
 	uint32_t seqno;
 
-	/** Position in the ringbuffer of the start of the request */
-	u32 head;
-
-	/**
-	 * Position in the ringbuffer of the start of the postfix.
-	 * This is required to calculate the maximum available ringbuffer
-	 * space without overwriting the postfix.
-	 */
-	 u32 postfix;
-
-	/** Position in the ringbuffer of the end of the whole request */
-	u32 tail;
+	/** Position in the ringbuffer of the request */
+	u32 head, tail;
 
 	/**
 	 * Context and ring buffer related to this request
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 729c7fa02e12..d9b5bf4f1f21 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1359,7 +1359,7 @@ void i915_gem_request_retire(struct drm_i915_gem_request *request)
 	 * Note this requires that we are always called in request
 	 * completion order.
 	 */
-	request->ringbuf->last_retired_head = request->postfix;
+	request->ringbuf->last_retired_head = request->tail;
 
 	list_del_init(&request->list);
 	i915_gem_request_remove_from_client(request);
@@ -2495,13 +2495,6 @@ int __i915_add_request(struct intel_engine_cs *ring,
 			return ret;
 	}
 
-	/* Record the position of the start of the request so that
-	 * should we detect the updated seqno part-way through the
-	 * GPU processing the request, we never over-estimate the
-	 * position of the head.
-	 */
-	request->postfix = intel_ring_get_tail(ringbuf);
-
 	if (i915.enable_execlists) {
 		ret = ring->emit_request(ringbuf, request);
 		if (ret)
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 8832f1b2a495..b7a00e464ba4 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1072,7 +1072,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
 			erq = &error->ring[i].requests[count++];
 			erq->seqno = request->seqno;
 			erq->jiffies = request->emitted_jiffies;
-			erq->tail = request->postfix;
+			erq->tail = request->tail;
 		}
 	}
 }
diff --git a/drivers/gpu/drm/i915/intel_dvo.c b/drivers/gpu/drm/i915/intel_dvo.c
index 9a27ec7100ef..f45caa6af7d2 100644
--- a/drivers/gpu/drm/i915/intel_dvo.c
+++ b/drivers/gpu/drm/i915/intel_dvo.c
@@ -496,7 +496,7 @@ void intel_dvo_init(struct drm_device *dev)
 		int gpio;
 		bool dvoinit;
 		enum pipe pipe;
-		uint32_t dpll[2];
+		uint32_t dpll[I915_MAX_PIPES];
 
 		/* Allow the I2C driver info to specify the GPIO to be used in
 		 * special cases, but otherwise default to what's defined
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a62ffaa45bd1..b3ca88ff88eb 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -409,7 +409,6 @@ void intel_lrc_irq_handler(struct intel_engine_cs *ring)
 
 static int execlists_context_queue(struct intel_engine_cs *ring,
 				   struct intel_context *to,
-				   u32 tail,
 				   struct drm_i915_gem_request *request)
 {
 	if (WARN_ON(request == NULL))
@@ -421,8 +420,6 @@ static int execlists_context_queue(struct intel_engine_cs *ring,
 	i915_gem_request_reference(request);
 	WARN_ON(to != request->ctx);
 
-	request->tail = tail;
-
 	spin_lock_irq(&ring->execlist_lock);
 
 	list_add_tail(&request->execlist_link, &ring->execlist_queue);
@@ -574,7 +571,7 @@ static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf,
 			continue;
 
 		/* Would completion of this request free enough space? */
-		space = __intel_ring_space(request->postfix, ringbuf->tail,
+		space = __intel_ring_space(request->tail, ringbuf->tail,
 					   ringbuf->size);
 		if (space >= bytes)
 			break;
@@ -608,11 +605,12 @@ intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf,
 	struct intel_engine_cs *ring = ringbuf->ring;
 
 	intel_logical_ring_advance(ringbuf);
+	request->tail = ringbuf->tail;
 
 	if (intel_ring_stopped(ring))
 		return;
 
-	execlists_context_queue(ring, ctx, ringbuf->tail, request);
+	execlists_context_queue(ring, ctx, request);
 }
 
 static int logical_ring_wrap_buffer(struct intel_ringbuffer *ringbuf,
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 441adc8fa535..0b68ac5a7298 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2096,7 +2096,7 @@ static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
 		return 0;
 
 	list_for_each_entry(request, &ring->request_list, list) {
-		space = __intel_ring_space(request->postfix, ringbuf->tail,
+		space = __intel_ring_space(request->tail, ringbuf->tail,
 					   ringbuf->size);
 		if (space >= n)
 			break;
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [PATCH 51/70] drm/i915: Record the position of the start of the request
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (49 preceding siblings ...)
  2015-04-07 15:21 ` [PATCH 50/70] drm/i915: The argument for postfix is redundant Chris Wilson
@ 2015-04-07 15:21 ` Chris Wilson
  2015-04-07 15:21 ` [PATCH 52/70] drm/i915: Cache the execlist ctx descriptor Chris Wilson
                   ` (8 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:21 UTC (permalink / raw)
  To: intel-gfx

Not only does it make for good documentation and a debugging aid, but
it is also vital for when we want to unwind requests - such as when
throwing away an incomplete request.

v2: Rebase
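
The mechanism is small, condensed here from the hunks below: the start
position is sampled when the request is allocated, and unwinding an
abandoned request is then just a matter of rewinding the ring tail:

	/* i915_gem_request_alloc(): record where the request begins */
	rq->head = intel_ring_get_tail(rq->ringbuf);

	/* reset cleanup: throw away everything the incomplete
	 * outstanding request had emitted
	 */
	request->ringbuf->tail = request->head;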

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c         | 13 ++++++++----
 drivers/gpu/drm/i915/i915_gem_context.c |  9 +--------
 drivers/gpu/drm/i915/intel_lrc.c        | 35 ---------------------------------
 drivers/gpu/drm/i915/intel_lrc.h        |  2 --
 4 files changed, 10 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d9b5bf4f1f21..e9f2d2b102de 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2462,7 +2462,6 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	struct drm_i915_gem_request *request;
 	struct intel_ringbuffer *ringbuf;
-	u32 request_start;
 	int ret;
 
 	request = ring->outstanding_lazy_request;
@@ -2477,7 +2476,6 @@ int __i915_add_request(struct intel_engine_cs *ring,
 	} else
 		ringbuf = ring->buffer;
 
-	request_start = intel_ring_get_tail(ringbuf);
 	/*
 	 * Emit any outstanding flushes - execbuf can fail to emit the flush
 	 * after having emitted the batchbuffer command. Hence we need to fix
@@ -2505,7 +2503,6 @@ int __i915_add_request(struct intel_engine_cs *ring,
 			return ret;
 	}
 
-	request->head = request_start;
 	request->tail = intel_ring_get_tail(ringbuf);
 
 	/* Whilst this request exists, batch_obj will be on the
@@ -2652,6 +2649,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 		return ret;
 	}
 
+	rq->head = intel_ring_get_tail(rq->ringbuf);
 	ring->outstanding_lazy_request = rq;
 	return 0;
 }
@@ -2736,7 +2734,14 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 	}
 
 	/* This may not have been flushed before the reset, so clean it now */
-	i915_gem_request_assign(&ring->outstanding_lazy_request, NULL);
+	if (ring->outstanding_lazy_request) {
+		struct drm_i915_gem_request *request;
+
+		request = ring->outstanding_lazy_request;
+		request->ringbuf->tail = request->head;
+
+		i915_gem_request_assign(&ring->outstanding_lazy_request, NULL);
+	}
 }
 
 void i915_gem_restore_fences(struct drm_device *dev)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index b9c6b0ad1d0f..43e58249235b 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -298,15 +298,8 @@ void i915_gem_context_reset(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int i;
 
-	if (i915.enable_execlists) {
-		struct intel_context *ctx;
-
-		list_for_each_entry(ctx, &dev_priv->context_list, link) {
-			intel_lr_context_reset(dev, ctx);
-		}
-
+	if (i915.enable_execlists)
 		return;
-	}
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		struct intel_engine_cs *ring = &dev_priv->ring[i];
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b3ca88ff88eb..459a27a2b486 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1804,38 +1804,3 @@ error_unpin_ctx:
 	drm_gem_object_unreference(&ctx_obj->base);
 	return ret;
 }
-
-void intel_lr_context_reset(struct drm_device *dev,
-			    struct intel_context *ctx)
-{
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct intel_engine_cs *ring;
-	int i;
-
-	for_each_ring(ring, dev_priv, i) {
-		struct drm_i915_gem_object *ctx_obj =
-				ctx->engine[ring->id].state;
-		struct intel_ringbuffer *ringbuf =
-				ctx->engine[ring->id].ringbuf;
-		uint32_t *reg_state;
-		struct page *page;
-
-		if (!ctx_obj)
-			continue;
-
-		if (i915_gem_object_get_pages(ctx_obj)) {
-			WARN(1, "Failed get_pages for context obj\n");
-			continue;
-		}
-		page = i915_gem_object_get_page(ctx_obj, 1);
-		reg_state = kmap_atomic(page);
-
-		reg_state[CTX_RING_HEAD+1] = 0;
-		reg_state[CTX_RING_TAIL+1] = 0;
-
-		kunmap_atomic(reg_state);
-
-		ringbuf->head = 0;
-		ringbuf->tail = 0;
-	}
-}
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 16c717672020..1aafb99cfff4 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -70,8 +70,6 @@ static inline void intel_logical_ring_emit(struct intel_ringbuffer *ringbuf,
 void intel_lr_context_free(struct intel_context *ctx);
 int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring);
-void intel_lr_context_reset(struct drm_device *dev,
-			struct intel_context *ctx);
 
 /* Execlists */
 int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists);
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [PATCH 52/70] drm/i915: Cache the execlist ctx descriptor
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (50 preceding siblings ...)
  2015-04-07 15:21 ` [PATCH 51/70] drm/i915: Record the position of the start of the request Chris Wilson
@ 2015-04-07 15:21 ` Chris Wilson
  2015-04-07 15:21 ` [PATCH 53/70] drm/i915: Eliminate vmap overhead for cmd parser Chris Wilson
                   ` (7 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:21 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 56 +++++++++++++++++----------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  3 +-
 2 files changed, 31 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 459a27a2b486..3fe63bf604b4 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -230,37 +230,13 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 	return 0;
 }
 
-static uint32_t execlists_ctx_descriptor(struct intel_engine_cs *engine,
-					 uint32_t ggtt_offset)
-{
-	uint32_t desc;
-
-	desc = GEN8_CTX_VALID;
-	desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
-	desc |= GEN8_CTX_L3LLC_COHERENT;
-	desc |= GEN8_CTX_PRIVILEGE;
-	desc |= ggtt_offset;
-
-	/* TODO: WaDisableLiteRestore when we start using semaphore
-	 * signalling between Command Streamers */
-	/* desc |= GEN8_CTX_FORCE_RESTORE; */
-
-	/* WaEnableForceRestoreInCtxtDescForVCS:skl */
-	if (IS_GEN9(engine->dev) && INTEL_REVID(engine->dev) <= SKL_REVID_B0 &&
-	    (engine->id == BCS || engine->id == VCS ||
-	     engine->id == VECS || engine->id == VCS2))
-		desc |= GEN8_CTX_FORCE_RESTORE;
-
-	return desc;
-}
-
 static uint32_t execlists_request_write_tail(struct intel_engine_cs *engine,
 					     struct drm_i915_gem_request *rq)
 
 {
 	struct intel_ringbuffer *ring = rq->ctx->engine[engine->id].ringbuf;
 	ring->regs[CTX_RING_TAIL+1] = rq->tail;
-	return execlists_ctx_descriptor(engine, ring->ggtt_offset);
+	return ring->descriptor;
 }
 
 static void execlists_submit_pair(struct intel_engine_cs *ring)
@@ -498,6 +474,7 @@ static int intel_lr_context_pin(struct intel_engine_cs *ring,
 {
 	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
 	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
+	u32 ggtt_offset;
 	int ret;
 
 	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
@@ -508,11 +485,12 @@ static int intel_lr_context_pin(struct intel_engine_cs *ring,
 	if (ret)
 		goto reset_pin_count;
 
-	ringbuf->ggtt_offset = i915_gem_obj_ggtt_offset(ctx_obj);
-	if (WARN_ON(ringbuf->ggtt_offset & 0xFFFFFFFF00000FFFULL)) {
+	ggtt_offset = i915_gem_obj_ggtt_offset(ctx_obj);
+	if (WARN_ON(ggtt_offset & 0xFFFFFFFF00000FFFULL)) {
 		ret = -ENODEV;
 		goto unpin_ctx_obj;
 	}
+	ringbuf->descriptor = ggtt_offset | ring->execlist_ctx_descriptor;
 
 	ret = intel_pin_and_map_ringbuffer_obj(ring->dev, ringbuf);
 	if (ret)
@@ -1222,6 +1200,28 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 	}
 }
 
+static uint32_t base_ctx_descriptor(struct intel_engine_cs *engine)
+{
+	uint32_t desc;
+
+	desc = GEN8_CTX_VALID;
+	desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
+	desc |= GEN8_CTX_L3LLC_COHERENT;
+	desc |= GEN8_CTX_PRIVILEGE;
+
+	/* TODO: WaDisableLiteRestore when we start using semaphore
+	 * signalling between Command Streamers */
+	/* desc |= GEN8_CTX_FORCE_RESTORE; */
+
+	/* WaEnableForceRestoreInCtxtDescForVCS:skl */
+	if (IS_GEN9(engine->dev) && INTEL_REVID(engine->dev) <= SKL_REVID_B0 &&
+	    (engine->id == BCS || engine->id == VCS ||
+	     engine->id == VECS || engine->id == VCS2))
+		desc |= GEN8_CTX_FORCE_RESTORE;
+
+	return desc;
+}
+
 static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *ring)
 {
 	int ret;
@@ -1243,6 +1243,8 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	if (ret)
 		return ret;
 
+	ring->execlist_ctx_descriptor = base_ctx_descriptor(ring);
+
 	ret = intel_lr_context_deferred_create(ring->default_context, ring);
 
 	return ret;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 55c91014bfdf..97832b6369a6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -98,7 +98,7 @@ struct intel_ringbuffer {
 	struct drm_i915_gem_object *obj;
 	void __iomem *virtual_start;
 	uint32_t *regs;
-	uint32_t ggtt_offset;
+	uint32_t descriptor;
 
 	struct intel_engine_cs *ring;
 
@@ -243,6 +243,7 @@ struct  intel_engine_cs {
 	struct drm_i915_gem_request *execlist_port[2];
 	struct list_head execlist_queue;
 	struct list_head execlist_completed;
+	u32 execlist_ctx_descriptor;
 	u8 next_context_status_buffer;
 	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
 	int		(*emit_request)(struct intel_ringbuffer *ringbuf,
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* [PATCH 53/70] drm/i915: Eliminate vmap overhead for cmd parser
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (51 preceding siblings ...)
  2015-04-07 15:21 ` [PATCH 52/70] drm/i915: Cache the execlist ctx descriptor Chris Wilson
@ 2015-04-07 15:21 ` Chris Wilson
  2015-04-07 15:21 ` [PATCH 54/70] drm/i915: Cache last cmd descriptor when parsing Chris Wilson
                   ` (6 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:21 UTC (permalink / raw)
  To: intel-gfx

With a little complexity to handle cmds straddling page boundaries, we
can completely avoid having to vmap the batch and the shadow batch
objects whilst running the command parser.

On ivb i7-3720MQ:

x11perf -dot before 54.3M, after 53.2M (max 203M)
glxgears before 7110 fps, after 7300 fps (max 7860 fps)

Before:
Time to blt 16384 bytes x      1:	 12.400µs, 1.2GiB/s
Time to blt 16384 bytes x   4096:	  3.055µs, 5.0GiB/s

After:
Time to blt 16384 bytes x      1:	  8.600µs, 1.8GiB/s
Time to blt 16384 bytes x   4096:	  2.456µs, 6.2GiB/s

Removing the vmap is mostly a win; we only lose in a few cases where
the batch is larger than a page, due to the extra complexity (the loss
of a simple cache-efficient large copy, plus the boundary handling).

v2: Reorder so that we do check oacontrol remaining set at end-of-batch
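
The straddling case is handled by stashing however many dwords of a
command fit on the current page into a small on-stack buffer, then
completing it from the start of the next page before validation. A
sketch of both halves - the completion side is taken from the hunk
below, the stash side from the remainder of the patch not quoted here,
so treat it as an outline:

	u32 tmp[128];	/* large enough for the longest command */
	u32 partial = 0, length;

	/* end of page N: the command runs off the page */
	if (cmd + length > page_end) {
		partial = page_end - cmd;
		memcpy(tmp, cmd, partial * 4);
	}

	/* start of page N+1: splice the two halves together */
	if (partial) {
		memcpy(tmp + partial, cmd, (length - partial) * 4);
		cmd -= partial;
		partial = 0;
		buf = tmp;
		/* fall through to the normal command check on buf */
	}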

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 299 ++++++++++++++++-----------------
 1 file changed, 148 insertions(+), 151 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 9605ff8f2fcd..4a80ab953715 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -818,100 +818,6 @@ static bool valid_reg(const u32 *table, int count, u32 addr)
 	return false;
 }
 
-static u32 *vmap_batch(struct drm_i915_gem_object *obj,
-		       unsigned start, unsigned len)
-{
-	int i;
-	void *addr = NULL;
-	struct sg_page_iter sg_iter;
-	int first_page = start >> PAGE_SHIFT;
-	int last_page = (len + start + 4095) >> PAGE_SHIFT;
-	int npages = last_page - first_page;
-	struct page **pages;
-
-	pages = drm_malloc_ab(npages, sizeof(*pages));
-	if (pages == NULL) {
-		DRM_DEBUG_DRIVER("Failed to get space for pages\n");
-		goto finish;
-	}
-
-	i = 0;
-	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, first_page) {
-		pages[i++] = sg_page_iter_page(&sg_iter);
-		if (i == npages)
-			break;
-	}
-
-	addr = vmap(pages, i, 0, PAGE_KERNEL);
-	if (addr == NULL) {
-		DRM_DEBUG_DRIVER("Failed to vmap pages\n");
-		goto finish;
-	}
-
-finish:
-	if (pages)
-		drm_free_large(pages);
-	return (u32*)addr;
-}
-
-/* Returns a vmap'd pointer to dest_obj, which the caller must unmap */
-static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
-		       struct drm_i915_gem_object *src_obj,
-		       u32 batch_start_offset,
-		       u32 batch_len)
-{
-	int needs_clflush = 0;
-	void *src_base, *src;
-	void *dst = NULL;
-	int ret;
-
-	if (batch_len > dest_obj->base.size ||
-	    batch_len + batch_start_offset > src_obj->base.size)
-		return ERR_PTR(-E2BIG);
-
-	if (WARN_ON(dest_obj->pages_pin_count == 0))
-		return ERR_PTR(-ENODEV);
-
-	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
-	if (ret) {
-		DRM_DEBUG_DRIVER("CMD: failed to prepare shadow batch\n");
-		return ERR_PTR(ret);
-	}
-
-	src_base = vmap_batch(src_obj, batch_start_offset, batch_len);
-	if (!src_base) {
-		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
-		ret = -ENOMEM;
-		goto unpin_src;
-	}
-
-	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
-	if (ret) {
-		DRM_DEBUG_DRIVER("CMD: Failed to set shadow batch to CPU\n");
-		goto unmap_src;
-	}
-
-	dst = vmap_batch(dest_obj, 0, batch_len);
-	if (!dst) {
-		DRM_DEBUG_DRIVER("CMD: Failed to vmap shadow batch\n");
-		ret = -ENOMEM;
-		goto unmap_src;
-	}
-
-	src = src_base + offset_in_page(batch_start_offset);
-	if (needs_clflush)
-		drm_clflush_virt_range(src, batch_len);
-
-	memcpy(dst, src, batch_len);
-
-unmap_src:
-	vunmap(src_base);
-unpin_src:
-	i915_gem_object_unpin_pages(src_obj);
-
-	return ret ? ERR_PTR(ret) : dst;
-}
-
 /**
  * i915_needs_cmd_parser() - should a given ring use software command parsing?
  * @ring: the ring in question
@@ -1046,16 +952,34 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 		    u32 batch_len,
 		    bool is_master)
 {
-	u32 *cmd, *batch_base, *batch_end;
+	u32 tmp[128];
+	struct sg_page_iter src_iter, dst_iter;
+	const struct drm_i915_cmd_descriptor *desc;
+	int needs_clflush = 0;
+	void *src, *dst;
+	unsigned in, out;
+	u32 *buf, partial = 0, length;
 	struct drm_i915_cmd_descriptor default_desc = { 0 };
 	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
 	int ret = 0;
 
-	batch_base = copy_batch(shadow_batch_obj, batch_obj,
-				batch_start_offset, batch_len);
-	if (IS_ERR(batch_base)) {
-		DRM_DEBUG_DRIVER("CMD: Failed to copy batch\n");
-		return PTR_ERR(batch_base);
+	if (batch_len > shadow_batch_obj->base.size ||
+	    batch_len + batch_start_offset > batch_obj->base.size)
+		return -E2BIG;
+
+	if (WARN_ON(shadow_batch_obj->pages_pin_count == 0))
+		return -ENODEV;
+
+	ret = i915_gem_obj_prepare_shmem_read(batch_obj, &needs_clflush);
+	if (ret) {
+		DRM_DEBUG_DRIVER("CMD: failed to prepare shadow batch\n");
+		return ret;
+	}
+
+	ret = i915_gem_object_set_to_cpu_domain(shadow_batch_obj, true);
+	if (ret) {
+		DRM_DEBUG_DRIVER("CMD: Failed to set shadow batch to CPU\n");
+		goto unpin;
 	}
 
 	/*
@@ -1063,68 +987,141 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 	 * large or larger and copy_batch() will write MI_NOPs to the extra
 	 * space. Parsing should be faster in some cases this way.
 	 */
-	batch_end = batch_base + (batch_len / sizeof(*batch_end));
+	ret = -EINVAL;
+
+	__sg_page_iter_start(&dst_iter,
+			     shadow_batch_obj->pages->sgl,
+			     shadow_batch_obj->pages->nents,
+			     0);
+	__sg_page_iter_next(&dst_iter);
+	dst = kmap_atomic(sg_page_iter_page(&dst_iter));
+
+	in = offset_in_page(batch_start_offset);
+	out = 0;
+	for_each_sg_page(batch_obj->pages->sgl,
+			 &src_iter,
+			 batch_obj->pages->nents,
+			 batch_start_offset >> PAGE_SHIFT) {
+		u32 this, i, j, k;
+		u32 *cmd, *page_end, *batch_end;
+
+		this = batch_len;
+		if (this > PAGE_SIZE - in)
+			this = PAGE_SIZE - in;
+
+		src = kmap_atomic(sg_page_iter_page(&src_iter));
+		if (needs_clflush)
+			drm_clflush_virt_range(src + in, this);
+
+		i = this;
+		j = in;
+		do {
+			/* We keep dst around so that we do not blow
+			 * the CPU caches immediately after the copy (due
+			 * to the kunmap_atomic(dst) doing a TLB flush.
+			 */
+			if (out == PAGE_SIZE) {
+				__sg_page_iter_next(&dst_iter);
+				kunmap_atomic(dst);
+				dst = kmap_atomic(sg_page_iter_page(&dst_iter));
+				out = 0;
+			}
 
-	cmd = batch_base;
-	while (cmd < batch_end) {
-		const struct drm_i915_cmd_descriptor *desc;
-		u32 length;
+			k = i;
+			if (k > PAGE_SIZE - out)
+				k = PAGE_SIZE - out;
+			if (k == PAGE_SIZE)
+				copy_page(dst, src);
+			else
+				memcpy(dst + out, src + j, k);
+
+			out += k;
+			j += k;
+			i -= k;
+		} while (i);
+
+		cmd = src + in;
+		page_end = (void *)cmd + this;
+		batch_end = (void *)cmd + batch_len;
+
+		if (partial) {
+			memcpy(tmp + partial, cmd, (length - partial)*4);
+			cmd -= partial;
+			partial = 0;
+			buf = tmp;
+			goto check;
+		}
 
-		if (*cmd == MI_BATCH_BUFFER_END)
-			break;
+		do {
+			if (*cmd == MI_BATCH_BUFFER_END) {
+				if (oacontrol_set) {
+					DRM_DEBUG_DRIVER("CMD: batch set OACONTROL but did not clear it\n");
+					ret = -EINVAL;
+				} else
+					ret = 0;
+				goto unmap;
+			}
 
-		desc = find_cmd(ring, *cmd, &default_desc);
-		if (!desc) {
-			DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
-					 *cmd);
-			ret = -EINVAL;
-			break;
-		}
+			desc = find_cmd(ring, *cmd, &default_desc);
+			if (!desc) {
+				DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
+						 *cmd);
+				goto unmap;
+			}
 
-		/*
-		 * If the batch buffer contains a chained batch, return an
-		 * error that tells the caller to abort and dispatch the
-		 * workload as a non-secure batch.
-		 */
-		if (desc->cmd.value == MI_BATCH_BUFFER_START) {
-			ret = -EACCES;
-			break;
-		}
+			/*
+			 * If the batch buffer contains a chained batch, return an
+			 * error that tells the caller to abort and dispatch the
+			 * workload as a non-secure batch.
+			 */
+			if (desc->cmd.value == MI_BATCH_BUFFER_START) {
+				ret = -EACCES;
+				goto unmap;
+			}
 
-		if (desc->flags & CMD_DESC_FIXED)
-			length = desc->length.fixed;
-		else
-			length = ((*cmd & desc->length.mask) + LENGTH_BIAS);
-
-		if ((batch_end - cmd) < length) {
-			DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X length=%u batchlen=%td\n",
-					 *cmd,
-					 length,
-					 batch_end - cmd);
-			ret = -EINVAL;
-			break;
-		}
+			if (desc->flags & CMD_DESC_FIXED)
+				length = desc->length.fixed;
+			else
+				length = ((*cmd & desc->length.mask) + LENGTH_BIAS);
+
+			if (cmd + length > page_end) {
+				if (length + cmd > batch_end) {
+					DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X length=%u batchlen=%td\n",
+							 *cmd, length, batch_end - cmd);
+					goto unmap;
+				}
+
+				if (WARN_ON(length > sizeof(tmp)/4)) {
+					ret = -ENODEV;
+					goto unmap;
+				}
+
+				partial = page_end - cmd;
+				memcpy(tmp, cmd, partial*4);
+				break;
+			}
 
-		if (!check_cmd(ring, desc, cmd, is_master, &oacontrol_set)) {
-			ret = -EINVAL;
-			break;
-		}
+			buf = cmd;
+check:
+			if (!check_cmd(ring, desc, buf, is_master, &oacontrol_set))
+				goto unmap;
 
-		cmd += length;
-	}
+			cmd += length;
+		} while (cmd < page_end);
 
-	if (oacontrol_set) {
-		DRM_DEBUG_DRIVER("CMD: batch set OACONTROL but did not clear it\n");
-		ret = -EINVAL;
-	}
+		batch_len -= this;
+		if (batch_len == 0)
+			break;
 
-	if (cmd >= batch_end) {
-		DRM_DEBUG_DRIVER("CMD: Got to the end of the buffer w/o a BBE cmd!\n");
-		ret = -EINVAL;
+		kunmap_atomic(src);
+		in = 0;
 	}
 
-	vunmap(batch_base);
-
+unmap:
+	kunmap_atomic(src);
+	kunmap_atomic(dst);
+unpin:
+	i915_gem_object_unpin_pages(batch_obj);
 	return ret;
 }
 
-- 
2.1.4


* [PATCH 54/70] drm/i915: Cache last cmd descriptor when parsing
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (52 preceding siblings ...)
  2015-04-07 15:21 ` [PATCH 53/70] drm/i915: Eliminate vmap overhead for cmd parser Chris Wilson
@ 2015-04-07 15:21 ` Chris Wilson
  2015-04-07 15:21 ` [PATCH 55/70] drm/i915: Use WC copies on !llc platforms for the command parser Chris Wilson
                   ` (5 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:21 UTC (permalink / raw)
  To: intel-gfx

The cmd parser has the biggest impact on the BLT ring, because its
command stream is relatively verbose compared to the other engines, as
the vertex data is inline. It also typically has runs of repeating
commands (again since the vertex data is inline, it typically has
sequences of XY_SETUP_BLT, XY_SCANLINE_BLT, ...). We can easily reduce
the impact of cmd parsing on benchmarks by caching the last descriptor
and comparing it against the next command header. To get the maximum
benefit, we also skip a few validations and length determinations
whenever the header is unchanged between commands.

ivb i7-3720QM:
x11perf -dot: before 52.3M, after 124M (max 203M)
glxgears: before 7310 fps, after 7550 fps (max 7860 fps)
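
The core of the trick, reduced to a hedged standalone C sketch (the
table, lookup() and parse() below are toy stand-ins for the per-ring
hash table and the parser loop, not the driver code):

#include <stdint.h>
#include <stddef.h>

struct cmd_desc {
	uint32_t header;
	uint32_t length; /* in dwords */
};

/* Toy descriptor table standing in for the per-ring hash table. */
static const struct cmd_desc table[] = {
	{ 0x50000000, 6 }, /* pretend XY_SETUP_BLT */
	{ 0x51000000, 3 }, /* pretend XY_SCANLINE_BLT */
};

static const struct cmd_desc *lookup(uint32_t header)
{
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (table[i].header == header)
			return &table[i];
	return NULL;
}

/* Walk the batch, redoing the lookup and validation only when the
 * command header differs from the previous one; a run of identical
 * blits hits the cached descriptor every time.
 */
static int parse(const uint32_t *cmd, const uint32_t *end)
{
	const struct cmd_desc *desc = NULL;
	uint32_t last_header = ~0u;

	while (cmd < end) {
		if (*cmd != last_header) {
			desc = lookup(*cmd); /* slow path: lookup + checks */
			if (!desc)
				return -1;
			last_header = *cmd;
		}
		cmd += desc->length; /* fast path reuses desc and length */
	}
	return 0;
}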

v2: Fix initial cached cmd descriptor to match MI_NOOP.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 129 +++++++++++++++------------------
 drivers/gpu/drm/i915/i915_drv.h        |  10 +--
 2 files changed, 62 insertions(+), 77 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 4a80ab953715..f4d3e7dc3835 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -754,6 +754,9 @@ void i915_cmd_parser_fini_ring(struct intel_engine_cs *ring)
 	fini_hash_table(ring);
 }
 
+/*
+ * Returns a pointer to a descriptor for the command specified by cmd_header.
+ */
 static const struct drm_i915_cmd_descriptor*
 find_cmd_in_table(struct intel_engine_cs *ring,
 		  u32 cmd_header)
@@ -773,37 +776,6 @@ find_cmd_in_table(struct intel_engine_cs *ring,
 	return NULL;
 }
 
-/*
- * Returns a pointer to a descriptor for the command specified by cmd_header.
- *
- * The caller must supply space for a default descriptor via the default_desc
- * parameter. If no descriptor for the specified command exists in the ring's
- * command parser tables, this function fills in default_desc based on the
- * ring's default length encoding and returns default_desc.
- */
-static const struct drm_i915_cmd_descriptor*
-find_cmd(struct intel_engine_cs *ring,
-	 u32 cmd_header,
-	 struct drm_i915_cmd_descriptor *default_desc)
-{
-	const struct drm_i915_cmd_descriptor *desc;
-	u32 mask;
-
-	desc = find_cmd_in_table(ring, cmd_header);
-	if (desc)
-		return desc;
-
-	mask = ring->get_cmd_length_mask(cmd_header);
-	if (!mask)
-		return NULL;
-
-	BUG_ON(!default_desc);
-	default_desc->flags = CMD_DESC_SKIP;
-	default_desc->length.mask = mask;
-
-	return default_desc;
-}
-
 static bool valid_reg(const u32 *table, int count, u32 addr)
 {
 	if (table && count != 0) {
@@ -844,17 +816,6 @@ static bool check_cmd(const struct intel_engine_cs *ring,
 		      const bool is_master,
 		      bool *oacontrol_set)
 {
-	if (desc->flags & CMD_DESC_REJECT) {
-		DRM_DEBUG_DRIVER("CMD: Rejected command: 0x%08X\n", *cmd);
-		return false;
-	}
-
-	if ((desc->flags & CMD_DESC_MASTER) && !is_master) {
-		DRM_DEBUG_DRIVER("CMD: Rejected master-only command: 0x%08X\n",
-				 *cmd);
-		return false;
-	}
-
 	if (desc->flags & CMD_DESC_REGISTER) {
 		u32 reg_addr = cmd[desc->reg.offset] & desc->reg.mask;
 
@@ -954,12 +915,13 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 {
 	u32 tmp[128];
 	struct sg_page_iter src_iter, dst_iter;
-	const struct drm_i915_cmd_descriptor *desc;
+	struct drm_i915_cmd_descriptor default_desc = { CMD_DESC_SKIP };
+	const struct drm_i915_cmd_descriptor *desc = &default_desc;
+	u32 last_cmd_header = 0;
 	int needs_clflush = 0;
 	void *src, *dst;
 	unsigned in, out;
-	u32 *buf, partial = 0, length;
-	struct drm_i915_cmd_descriptor default_desc = { 0 };
+	u32 *buf, partial = 0, length = 1;
 	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
 	int ret = 0;
 
@@ -1053,37 +1015,60 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 		}
 
 		do {
-			if (*cmd == MI_BATCH_BUFFER_END) {
-				if (oacontrol_set) {
-					DRM_DEBUG_DRIVER("CMD: batch set OACONTROL but did not clear it\n");
-					ret = -EINVAL;
-				} else
-					ret = 0;
-				goto unmap;
-			}
+			if (*cmd != last_cmd_header) {
+				if (*cmd == MI_BATCH_BUFFER_END) {
+					if (unlikely(oacontrol_set)) {
+						DRM_DEBUG_DRIVER("CMD: batch set OACONTROL but did not clear it\n");
+						ret = -EINVAL;
+					} else
+						ret = 0;
+					goto unmap;
+				}
 
-			desc = find_cmd(ring, *cmd, &default_desc);
-			if (!desc) {
-				DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
-						 *cmd);
-				goto unmap;
-			}
+				desc = find_cmd_in_table(ring, *cmd);
+				if (desc) {
+					if (unlikely(desc->flags & CMD_DESC_REJECT)) {
+						DRM_DEBUG_DRIVER("CMD: Rejected command: 0x%08X\n", *cmd);
+						goto unmap;
+					}
+
+					if (unlikely((desc->flags & CMD_DESC_MASTER) && !is_master)) {
+						DRM_DEBUG_DRIVER("CMD: Rejected master-only command: 0x%08X\n", *cmd);
+						goto unmap;
+					}
+
+					/*
+					 * If the batch buffer contains a
+					 * chained batch, return an error that
+					 * tells the caller to abort and
+					 * dispatch the workload as a
+					 * non-secure batch.
+					 */
+					if (unlikely(desc->cmd.value == MI_BATCH_BUFFER_START)) {
+						ret = -EACCES;
+						goto unmap;
+					}
+
+					if (desc->flags & CMD_DESC_FIXED)
+						length = desc->length.fixed;
+					else
+						length = (*cmd & desc->length.mask) + LENGTH_BIAS;
+				} else {
+					u32 mask = ring->get_cmd_length_mask(*cmd);
+					if (unlikely(!mask)) {
+						DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n", *cmd);
+						goto unmap;
+					}
+
+					default_desc.length.mask = mask;
+					desc = &default_desc;
+
+					length = (*cmd & mask) + LENGTH_BIAS;
+				}
 
-			/*
-			 * If the batch buffer contains a chained batch, return an
-			 * error that tells the caller to abort and dispatch the
-			 * workload as a non-secure batch.
-			 */
-			if (desc->cmd.value == MI_BATCH_BUFFER_START) {
-				ret = -EACCES;
-				goto unmap;
+				last_cmd_header = *cmd;
 			}
 
-			if (desc->flags & CMD_DESC_FIXED)
-				length = desc->length.fixed;
-			else
-				length = ((*cmd & desc->length.mask) + LENGTH_BIAS);
-
 			if (cmd + length > page_end) {
 				if (length + cmd > batch_end) {
 					DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X length=%u batchlen=%td\n",
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4b46c5b5eb44..c48909f6baa2 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2205,12 +2205,12 @@ struct drm_i915_cmd_descriptor {
 	 *                  is the DRM master
 	 */
 	u32 flags;
+#define CMD_DESC_SKIP     (0)
 #define CMD_DESC_FIXED    (1<<0)
-#define CMD_DESC_SKIP     (1<<1)
-#define CMD_DESC_REJECT   (1<<2)
-#define CMD_DESC_REGISTER (1<<3)
-#define CMD_DESC_BITMASK  (1<<4)
-#define CMD_DESC_MASTER   (1<<5)
+#define CMD_DESC_REJECT   (1<<1)
+#define CMD_DESC_REGISTER (1<<2)
+#define CMD_DESC_BITMASK  (1<<3)
+#define CMD_DESC_MASTER   (1<<4)
 
 	/*
 	 * The command's unique identification bits and the bitmask to get them.
-- 
2.1.4


* [PATCH 55/70] drm/i915: Use WC copies on !llc platforms for the command parser
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (53 preceding siblings ...)
  2015-04-07 15:21 ` [PATCH 54/70] drm/i915: Cache last cmd descriptor when parsing Chris Wilson
@ 2015-04-07 15:21 ` Chris Wilson
  2015-04-07 15:21 ` [PATCH 56/70] drm/i915: Cache kmap between relocations Chris Wilson
                   ` (4 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:21 UTC (permalink / raw)
  To: intel-gfx

Since we blow the TLB caches by using kmap/kunmap, we may as well go the
whole hog and see if declaring our destination page as WC is faster than
keeping it as WB and using clflush. It should be!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index f4d3e7dc3835..61248223f95b 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -918,8 +918,9 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 	struct drm_i915_cmd_descriptor default_desc = { CMD_DESC_SKIP };
 	const struct drm_i915_cmd_descriptor *desc = &default_desc;
 	u32 last_cmd_header = 0;
-	int needs_clflush = 0;
 	void *src, *dst;
+	int src_needs_clflush = 0;
+	bool dst_needs_clflush;
 	unsigned in, out;
 	u32 *buf, partial = 0, length = 1;
 	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
@@ -932,13 +933,17 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 	if (WARN_ON(shadow_batch_obj->pages_pin_count == 0))
 		return -ENODEV;
 
-	ret = i915_gem_obj_prepare_shmem_read(batch_obj, &needs_clflush);
+	ret = i915_gem_obj_prepare_shmem_read(batch_obj, &src_needs_clflush);
 	if (ret) {
 		DRM_DEBUG_DRIVER("CMD: failed to prepare shadow batch\n");
 		return ret;
 	}
 
-	ret = i915_gem_object_set_to_cpu_domain(shadow_batch_obj, true);
+	dst_needs_clflush = !INTEL_INFO(shadow_batch_obj->base.dev)->has_llc;
+	if (dst_needs_clflush)
+		ret = i915_gem_object_set_to_gtt_domain(shadow_batch_obj, true);
+	else
+		ret = i915_gem_object_set_to_cpu_domain(shadow_batch_obj, true);
 	if (ret) {
 		DRM_DEBUG_DRIVER("CMD: Failed to set shadow batch to CPU\n");
 		goto unpin;
@@ -972,7 +977,7 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 			this = PAGE_SIZE - in;
 
 		src = kmap_atomic(sg_page_iter_page(&src_iter));
-		if (needs_clflush)
+		if (src_needs_clflush)
 			drm_clflush_virt_range(src + in, this);
 
 		i = this;
@@ -984,6 +989,8 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 			 */
 			if (out == PAGE_SIZE) {
 				__sg_page_iter_next(&dst_iter);
+				if (dst_needs_clflush)
+					drm_clflush_virt_range(dst, PAGE_SIZE);
 				kunmap_atomic(dst);
 				dst = kmap_atomic(sg_page_iter_page(&dst_iter));
 				out = 0;
@@ -1104,6 +1111,8 @@ check:
 
 unmap:
 	kunmap_atomic(src);
+	if (dst_needs_clflush)
+		drm_clflush_virt_range(dst, PAGE_SIZE);
 	kunmap_atomic(dst);
 unpin:
 	i915_gem_object_unpin_pages(batch_obj);
-- 
2.1.4


* [PATCH 56/70] drm/i915: Cache kmap between relocations
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (54 preceding siblings ...)
  2015-04-07 15:21 ` [PATCH 55/70] drm/i915: Use WC copies on !llc platforms for the command parser Chris Wilson
@ 2015-04-07 15:21 ` Chris Wilson
  2015-04-07 15:21 ` [PATCH 57/70] drm/i915: intel_ring_initialized() must be simple and inline Chris Wilson
                   ` (3 subsequent siblings)
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:21 UTC (permalink / raw)
  To: intel-gfx

When doing relocations, we have to obtain a mapping to the page
containing the target address. This is either a kmap or an iomap,
depending on the GPU and its cache coherency. Neighbouring relocation
entries are typically within the same page and so we can cache our
kmapping between them and avoid those pesky TLB flushes.

Note that there is some sleight-of-hand in how the slow relocate
works, as the reloc_entry_cache implies that pagefaults are disabled
(we are inside a kmap_atomic section). However, the slow relocate code
is meant to be the fallback for when the atomic fast path fails.
Fortunately it works, as we have already performed the copy_from_user
for the relocation array (so no more pagefaults there) and the
kmap_atomic cache is only enabled after we have waited upon an active
buffer (so no more sleeping in atomic). Magic!
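
The caching idea itself is tiny. A hedged standalone C sketch, with
map_page()/unmap_page() as toy stand-ins for kmap_atomic()/
kunmap_atomic() or their iomap equivalents:

#include <stddef.h>

#define PAGE_SIZE 4096

/* Toy stand-ins: here "mapping" is just pointer arithmetic into one
 * buffer, but in the kernel every kunmap_atomic() costs a TLB flush,
 * which is exactly what the cache avoids.
 */
static char backing[16 * PAGE_SIZE];

static void *map_page(unsigned int page)
{
	return backing + (size_t)page * PAGE_SIZE;
}

static void unmap_page(void *vaddr)
{
	(void)vaddr; /* would kunmap_atomic() here */
}

struct reloc_cache {
	void *vaddr;       /* currently mapped page, or NULL */
	unsigned int page; /* its page index */
};

/* Return a mapping for 'page', reusing the previous one whenever the
 * next relocation lands in the same page (the common case for
 * neighbouring entries).
 */
static void *cached_map(struct reloc_cache *c, unsigned int page)
{
	if (!c->vaddr || c->page != page) {
		if (c->vaddr)
			unmap_page(c->vaddr);
		c->vaddr = map_page(page);
		c->page = page;
	}
	return c->vaddr;
}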

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 145 +++++++++++++++++++----------
 1 file changed, 96 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 16fd922afb72..9afd2dcba43b 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -248,9 +248,48 @@ static inline int use_cpu_reloc(struct drm_i915_gem_object *obj)
 		obj->cache_level != I915_CACHE_NONE);
 }
 
+struct reloc_entry_cache {
+	void *vaddr;
+	unsigned page;
+	enum { KMAP, IOMAP } type;
+};
+
+static void reloc_entry_cache_init(struct reloc_entry_cache *cache)
+{
+	cache->page = -1;
+	cache->vaddr = NULL;
+}
+
+static void reloc_entry_cache_fini(struct reloc_entry_cache *cache)
+{
+	if (cache->vaddr == NULL)
+		return;
+
+	switch (cache->type) {
+	case KMAP: kunmap_atomic(cache->vaddr); break;
+	case IOMAP: io_mapping_unmap_atomic(cache->vaddr); break;
+	}
+}
+
+static void *reloc_kmap(struct drm_i915_gem_object *obj,
+			struct reloc_entry_cache *cache,
+			int page)
+{
+	if (cache->page != page) {
+		if (cache->vaddr)
+			kunmap_atomic(cache->vaddr);
+		cache->page = page;
+		cache->vaddr = kmap_atomic(i915_gem_object_get_page(obj, page));
+		cache->type = KMAP;
+	}
+
+	return cache->vaddr;
+}
+
 static int
 relocate_entry_cpu(struct drm_i915_gem_object *obj,
 		   struct drm_i915_gem_relocation_entry *reloc,
+		   struct reloc_entry_cache *cache,
 		   uint64_t target_offset)
 {
 	struct drm_device *dev = obj->base.dev;
@@ -263,30 +302,41 @@ relocate_entry_cpu(struct drm_i915_gem_object *obj,
 	if (ret)
 		return ret;
 
-	vaddr = kmap_atomic(i915_gem_object_get_page(obj,
-				reloc->offset >> PAGE_SHIFT));
+	vaddr = reloc_kmap(obj, cache, reloc->offset >> PAGE_SHIFT);
 	*(uint32_t *)(vaddr + page_offset) = lower_32_bits(delta);
 
 	if (INTEL_INFO(dev)->gen >= 8) {
-		page_offset = offset_in_page(page_offset + sizeof(uint32_t));
-
-		if (page_offset == 0) {
-			kunmap_atomic(vaddr);
-			vaddr = kmap_atomic(i915_gem_object_get_page(obj,
-			    (reloc->offset + sizeof(uint32_t)) >> PAGE_SHIFT));
+		page_offset += sizeof(uint32_t);
+		if (page_offset == PAGE_SIZE) {
+			vaddr = reloc_kmap(obj, cache, cache->page + 1);
+			page_offset = 0;
 		}
-
 		*(uint32_t *)(vaddr + page_offset) = upper_32_bits(delta);
 	}
 
-	kunmap_atomic(vaddr);
-
 	return 0;
 }
 
+static void *reloc_iomap(struct drm_i915_private *i915,
+			 struct reloc_entry_cache *cache,
+			 uint64_t offset)
+{
+	if (cache->page != offset >> PAGE_SHIFT) {
+		if (cache->vaddr)
+			io_mapping_unmap_atomic(cache->vaddr);
+		cache->page = offset >> PAGE_SHIFT;
+		cache->vaddr =
+			io_mapping_map_atomic_wc(i915->gtt.mappable,
+						 offset & PAGE_MASK);
+		cache->type = IOMAP;
+	}
+
+	return cache->vaddr;
+}
 static int
 relocate_entry_gtt(struct drm_i915_gem_object *obj,
 		   struct drm_i915_gem_relocation_entry *reloc,
+		   struct reloc_entry_cache *cache,
 		   uint64_t target_offset)
 {
 	struct drm_device *dev = obj->base.dev;
@@ -307,26 +357,17 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj,
 	/* Map the page containing the relocation we're going to perform.  */
 	offset = i915_gem_obj_ggtt_offset(obj);
 	offset += reloc->offset;
-	reloc_page = io_mapping_map_atomic_wc(dev_priv->gtt.mappable,
-					      offset & PAGE_MASK);
+	reloc_page = reloc_iomap(dev_priv, cache, offset);
 	iowrite32(lower_32_bits(delta), reloc_page + offset_in_page(offset));
 
 	if (INTEL_INFO(dev)->gen >= 8) {
 		offset += sizeof(uint32_t);
-
-		if (offset_in_page(offset) == 0) {
-			io_mapping_unmap_atomic(reloc_page);
-			reloc_page =
-				io_mapping_map_atomic_wc(dev_priv->gtt.mappable,
-							 offset);
-		}
-
+		if (offset_in_page(offset) == 0)
+			reloc_page = reloc_iomap(dev_priv, cache, offset);
 		iowrite32(upper_32_bits(delta),
 			  reloc_page + offset_in_page(offset));
 	}
 
-	io_mapping_unmap_atomic(reloc_page);
-
 	return 0;
 }
 
@@ -342,6 +383,7 @@ clflush_write32(void *addr, uint32_t value)
 static int
 relocate_entry_clflush(struct drm_i915_gem_object *obj,
 		       struct drm_i915_gem_relocation_entry *reloc,
+		       struct reloc_entry_cache *cache,
 		       uint64_t target_offset)
 {
 	struct drm_device *dev = obj->base.dev;
@@ -354,31 +396,26 @@ relocate_entry_clflush(struct drm_i915_gem_object *obj,
 	if (ret)
 		return ret;
 
-	vaddr = kmap_atomic(i915_gem_object_get_page(obj,
-				reloc->offset >> PAGE_SHIFT));
+	vaddr = reloc_kmap(obj, cache, reloc->offset >> PAGE_SHIFT);
 	clflush_write32(vaddr + page_offset, lower_32_bits(delta));
 
 	if (INTEL_INFO(dev)->gen >= 8) {
-		page_offset = offset_in_page(page_offset + sizeof(uint32_t));
-
-		if (page_offset == 0) {
-			kunmap_atomic(vaddr);
-			vaddr = kmap_atomic(i915_gem_object_get_page(obj,
-			    (reloc->offset + sizeof(uint32_t)) >> PAGE_SHIFT));
+		page_offset += sizeof(uint32_t);
+		if (page_offset == PAGE_SIZE) {
+			vaddr = reloc_kmap(obj, cache, cache->page + 1);
+			page_offset = 0;
 		}
-
 		clflush_write32(vaddr + page_offset, upper_32_bits(delta));
 	}
 
-	kunmap_atomic(vaddr);
-
 	return 0;
 }
 
 static int
 i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 				   struct eb_vmas *eb,
-				   struct drm_i915_gem_relocation_entry *reloc)
+				   struct drm_i915_gem_relocation_entry *reloc,
+				   struct reloc_entry_cache *cache)
 {
 	struct drm_device *dev = obj->base.dev;
 	struct drm_gem_object *target_obj;
@@ -463,11 +500,11 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 		return -EFAULT;
 
 	if (use_cpu_reloc(obj))
-		ret = relocate_entry_cpu(obj, reloc, target_offset);
+		ret = relocate_entry_cpu(obj, reloc, cache, target_offset);
 	else if (obj->map_and_fenceable)
-		ret = relocate_entry_gtt(obj, reloc, target_offset);
+		ret = relocate_entry_gtt(obj, reloc, cache, target_offset);
 	else if (cpu_has_clflush)
-		ret = relocate_entry_clflush(obj, reloc, target_offset);
+		ret = relocate_entry_clflush(obj, reloc, cache, target_offset);
 	else {
 		WARN_ONCE(1, "Impossible case in relocation handling\n");
 		ret = -ENODEV;
@@ -490,9 +527,11 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
 	struct drm_i915_gem_relocation_entry stack_reloc[N_RELOC(512)];
 	struct drm_i915_gem_relocation_entry __user *user_relocs;
 	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-	int remain, ret;
+	struct reloc_entry_cache cache;
+	int remain, ret = 0;
 
 	user_relocs = to_user_ptr(entry->relocs_ptr);
+	reloc_entry_cache_init(&cache);
 
 	remain = entry->relocation_count;
 	while (remain) {
@@ -502,21 +541,24 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
 			count = ARRAY_SIZE(stack_reloc);
 		remain -= count;
 
-		if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0])))
-			return -EFAULT;
+		if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
+			ret = -EFAULT;
+			goto out;
+		}
 
 		do {
 			u64 offset = r->presumed_offset;
 
-			ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, r);
+			ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, r, &cache);
 			if (ret)
-				return ret;
+				goto out;
 
 			if (r->presumed_offset != offset &&
 			    __copy_to_user_inatomic(&user_relocs->presumed_offset,
 						    &r->presumed_offset,
 						    sizeof(r->presumed_offset))) {
-				return -EFAULT;
+				ret = -EFAULT;
+				goto out;
 			}
 
 			user_relocs++;
@@ -524,7 +566,9 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
 		} while (--count);
 	}
 
-	return 0;
+out:
+	reloc_entry_cache_fini(&cache);
+	return ret;
 #undef N_RELOC
 }
 
@@ -534,15 +578,18 @@ i915_gem_execbuffer_relocate_vma_slow(struct i915_vma *vma,
 				      struct drm_i915_gem_relocation_entry *relocs)
 {
 	const struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-	int i, ret;
+	struct reloc_entry_cache cache;
+	int i, ret = 0;
 
+	reloc_entry_cache_init(&cache);
 	for (i = 0; i < entry->relocation_count; i++) {
-		ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, &relocs[i]);
+		ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, &relocs[i], &cache);
 		if (ret)
-			return ret;
+			break;
 	}
+	reloc_entry_cache_fini(&cache);
 
-	return 0;
+	return ret;
 }
 
 static int
-- 
2.1.4


* [PATCH 57/70] drm/i915: intel_ring_initialized() must be simple and inline
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (55 preceding siblings ...)
  2015-04-07 15:21 ` [PATCH 56/70] drm/i915: Cache kmap between relocations Chris Wilson
@ 2015-04-07 15:21 ` Chris Wilson
  2015-12-08 15:02   ` [PATCH 0/1] " Dave Gordon
  2015-04-07 15:21 ` [PATCH 58/70] drm/i915: Before shrink_all we only need to idle the GPU Chris Wilson
                   ` (2 subsequent siblings)
  59 siblings, 1 reply; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:21 UTC (permalink / raw)
  To: intel-gfx

Fixes regression from
commit 48d823878d64f93163f5a949623346748bbce1b4
Author: Oscar Mateo <oscar.mateo@intel.com>
Date:   Thu Jul 24 17:04:23 2014 +0100

    drm/i915/bdw: Generic logical ring init and cleanup

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 18 ++++++++++++++++-
 drivers/gpu/drm/i915/intel_ringbuffer.c | 35 +++++++++++++++------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  6 +++++-
 3 files changed, 38 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3fe63bf604b4..db93eed9eacd 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1241,12 +1241,28 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 
 	ret = i915_cmd_parser_init_ring(ring);
 	if (ret)
-		return ret;
+		goto error;
 
 	ring->execlist_ctx_descriptor = base_ctx_descriptor(ring);
 
 	ret = intel_lr_context_deferred_create(ring->default_context, ring);
+	if (ret)
+		goto error;
+
+	return 0;
+error:
+	if (ring->cleanup)
+		ring->cleanup(ring);
+
+	i915_cmd_parser_fini_ring(ring);
+	i915_gem_batch_pool_fini(&ring->batch_pool);
+
+	if (ring->status_page.obj) {
+		kunmap(sg_page(ring->status_page.obj->pages->sgl));
+		ring->status_page.obj = NULL;
+	}
 
+	ring->dev = NULL;
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 0b68ac5a7298..913efe47054d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -33,23 +33,6 @@
 #include "i915_trace.h"
 #include "intel_drv.h"
 
-bool
-intel_ring_initialized(struct intel_engine_cs *ring)
-{
-	struct drm_device *dev = ring->dev;
-
-	if (!dev)
-		return false;
-
-	if (i915.enable_execlists) {
-		struct intel_context *dctx = ring->default_context;
-		struct intel_ringbuffer *ringbuf = dctx->engine[ring->id].ringbuf;
-
-		return ringbuf->obj;
-	} else
-		return ring->buffer && ring->buffer->obj;
-}
-
 int __intel_ring_space(int head, int tail, int size)
 {
 	int space = head - tail;
@@ -1992,8 +1975,10 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	WARN_ON(ring->buffer);
 
 	ringbuf = kzalloc(sizeof(*ringbuf), GFP_KERNEL);
-	if (!ringbuf)
-		return -ENOMEM;
+	if (!ringbuf) {
+		ret = -ENOMEM;
+		goto error;
+	}
 	ring->buffer = ringbuf;
 
 	ring->dev = dev;
@@ -2050,8 +2035,18 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	return 0;
 
 error:
+	if (ring->cleanup)
+		ring->cleanup(ring);
+
+	cleanup_status_page(ring);
+
+	i915_cmd_parser_fini_ring(ring);
+	i915_gem_batch_pool_fini(&ring->batch_pool);
+
 	kfree(ringbuf);
 	ring->buffer = NULL;
+
+	ring->dev = NULL;
 	return ret;
 }
 
@@ -2083,6 +2078,8 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring)
 
 	kfree(ringbuf);
 	ring->buffer = NULL;
+
+	ring->dev = NULL;
 }
 
 static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 97832b6369a6..75268b7d2d41 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -327,7 +327,11 @@ struct  intel_engine_cs {
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
 };
 
-bool intel_ring_initialized(struct intel_engine_cs *ring);
+static inline bool
+intel_ring_initialized(struct intel_engine_cs *ring)
+{
+	return ring->dev != NULL;
+}
 
 static inline unsigned
 intel_ring_flag(struct intel_engine_cs *ring)
-- 
2.1.4


* [PATCH 58/70] drm/i915: Before shrink_all we only need to idle the GPU
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (56 preceding siblings ...)
  2015-04-07 15:21 ` [PATCH 57/70] drm/i915: intel_ring_initialized() must be simple and inline Chris Wilson
@ 2015-04-07 15:21 ` Chris Wilson
  2015-04-07 15:21 ` [PATCH 59/70] drm/i915: Simplify object is-pinned checking for shrinker Chris Wilson
  2015-04-07 16:28 ` Chris Wilson
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:21 UTC (permalink / raw)
  To: intel-gfx

We can forgo an evict-everything here in favour of the simpler
i915_gpu_idle() as the shrinker operation itself will unbind any vma as
required. The simplicity allows us to ignore any errors whilst idling
and still force the request retirement to run.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index 87bfced67998..3b44ed54cf46 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -158,7 +158,13 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
  */
 unsigned long i915_gem_shrink_all(struct drm_i915_private *dev_priv)
 {
-	i915_gem_evict_everything(dev_priv->dev);
+	int ignore;
+
+	ignore = i915_gpu_idle(dev_priv->dev);
+	(void)ignore;
+
+	i915_gem_retire_requests(dev_priv->dev);
+
 	return i915_gem_shrink(dev_priv, LONG_MAX,
 			       I915_SHRINK_BOUND | I915_SHRINK_UNBOUND);
 }
-- 
2.1.4


* [PATCH 59/70] drm/i915: Simplify object is-pinned checking for shrinker
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (57 preceding siblings ...)
  2015-04-07 15:21 ` [PATCH 58/70] drm/i915: Before shrink_all we only need to idle the GPU Chris Wilson
@ 2015-04-07 15:21 ` Chris Wilson
  2015-04-07 16:28 ` Chris Wilson
  59 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 15:21 UTC (permalink / raw)
  To: intel-gfx

When looking for viable candidates to shrink, we only want objects
that are not pinned. However, to do so we performed a double iteration
over the vma in the objects, first looking for the pin-count, then
looking for allocations. We can do both at once and be slightly more
explicit in our validity test.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index 3b44ed54cf46..d64c54b329b2 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -190,9 +190,12 @@ static int num_vma_bound(struct drm_i915_gem_object *obj)
 	struct i915_vma *vma;
 	int count = 0;
 
-	list_for_each_entry(vma, &obj->vma_list, vma_link)
+	list_for_each_entry(vma, &obj->vma_list, vma_link) {
 		if (drm_mm_node_allocated(&vma->node))
 			count++;
+		if (vma->pin_count)
+			count++;
+	}
 
 	return count;
 }
@@ -218,8 +221,7 @@ i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
 			count += obj->base.size >> PAGE_SHIFT;
 
 	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
-		if (!i915_gem_obj_is_pinned(obj) &&
-		    obj->pages_pin_count == num_vma_bound(obj))
+		if (obj->pages_pin_count == num_vma_bound(obj))
 			count += obj->base.size >> PAGE_SHIFT;
 	}
 
-- 
2.1.4


* [PATCH 59/70] drm/i915: Simplify object is-pinned checking for shrinker
  2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
                   ` (58 preceding siblings ...)
  2015-04-07 15:21 ` [PATCH 59/70] drm/i915: Simplify object is-pinned checking for shrinker Chris Wilson
@ 2015-04-07 16:28 ` Chris Wilson
  2015-04-07 16:28   ` [PATCH 60/70] drm/i915: Make evict-everything more robust Chris Wilson
                     ` (10 more replies)
  59 siblings, 11 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 16:28 UTC (permalink / raw)
  To: intel-gfx

When looking for viable candidates to shrink, we only want objects
that are not pinned. However, to do so we performed a double iteration
over the vma in the objects, first looking for the pin-count, then
looking for allocations. We can do both at once and be slightly more
explicit in our validity test.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index 3b44ed54cf46..d64c54b329b2 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -190,9 +190,12 @@ static int num_vma_bound(struct drm_i915_gem_object *obj)
 	struct i915_vma *vma;
 	int count = 0;
 
-	list_for_each_entry(vma, &obj->vma_list, vma_link)
+	list_for_each_entry(vma, &obj->vma_list, vma_link) {
 		if (drm_mm_node_allocated(&vma->node))
 			count++;
+		if (vma->pin_count)
+			count++;
+	}
 
 	return count;
 }
@@ -218,8 +221,7 @@ i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
 			count += obj->base.size >> PAGE_SHIFT;
 
 	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
-		if (!i915_gem_obj_is_pinned(obj) &&
-		    obj->pages_pin_count == num_vma_bound(obj))
+		if (obj->pages_pin_count == num_vma_bound(obj))
 			count += obj->base.size >> PAGE_SHIFT;
 	}
 
-- 
2.1.4


* [PATCH 60/70] drm/i915: Make evict-everything more robust
  2015-04-07 16:28 ` Chris Wilson
@ 2015-04-07 16:28   ` Chris Wilson
  2015-04-07 16:28   ` [PATCH 61/70] drm/i915: Make fb_tracking.lock a spinlock Chris Wilson
                     ` (9 subsequent siblings)
  10 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 16:28 UTC (permalink / raw)
  To: intel-gfx

Since we are operating at the global level, we can simply iterate over
the bound list using the robust method developed for the shrinker.
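
The pattern itself, as a hedged kernel-style sketch (my_obj, unbind()
and walk_robustly() are placeholders, not the driver's types): park
each object on a private list before operating on it, so nothing
unbind() does to the main list can invalidate the walk, then splice
the survivors back at the end.

#include <linux/list.h>

struct my_obj {
	struct list_head link;
};

static void unbind(struct my_obj *obj)
{
	/* may drop locks, retire requests, delete other entries... */
}

static void walk_robustly(struct list_head *bound_list)
{
	LIST_HEAD(still_in_list);

	while (!list_empty(bound_list)) {
		struct my_obj *obj;

		obj = list_first_entry(bound_list, struct my_obj, link);
		/* Park it first: our iteration cursor is always the
		 * head of bound_list, which unbind() cannot invalidate.
		 */
		list_move_tail(&obj->link, &still_in_list);
		unbind(obj);
	}

	list_splice(&still_in_list, bound_list);
}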

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h            |  2 +-
 drivers/gpu/drm/i915/i915_gem_evict.c      | 54 +++++++++++++++++-------------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  2 +-
 3 files changed, 32 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c48909f6baa2..97372869097f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2994,7 +2994,7 @@ int __must_check i915_gem_evict_something(struct drm_device *dev,
 int __must_check
 i915_gem_evict_range(struct drm_device *dev, struct i915_address_space *vm,
 		     unsigned long start, unsigned long end);
-int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle);
+int i915_gem_evict_vm(struct i915_address_space *vm);
 int i915_gem_evict_everything(struct drm_device *dev);
 
 /* belongs in i915_gem_gtt.h */
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 9754740edecd..cf33f982da8e 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -254,7 +254,6 @@ i915_gem_evict_range(struct drm_device *dev, struct i915_address_space *vm,
 /**
  * i915_gem_evict_vm - Evict all idle vmas from a vm
  * @vm: Address space to cleanse
- * @do_idle: Boolean directing whether to idle first.
  *
  * This function evicts all idles vmas from a vm. If all unpinned vmas should be
  * evicted the @do_idle needs to be set to true.
@@ -265,7 +264,7 @@ i915_gem_evict_range(struct drm_device *dev, struct i915_address_space *vm,
  * To clarify: This is for freeing up virtual address space, not for freeing
  * memory in e.g. the shrinker.
  */
-int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle)
+int i915_gem_evict_vm(struct i915_address_space *vm)
 {
 	struct i915_vma *vma, *next;
 	int ret;
@@ -273,16 +272,14 @@ int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle)
 	WARN_ON(!mutex_is_locked(&vm->dev->struct_mutex));
 	trace_i915_gem_evict_vm(vm);
 
-	if (do_idle) {
-		ret = i915_gpu_idle(vm->dev);
-		if (ret)
-			return ret;
-
-		i915_gem_retire_requests(vm->dev);
+	ret = i915_gpu_idle(vm->dev);
+	if (ret)
+		return ret;
 
-		WARN_ON(!list_empty(&vm->active_list));
-	}
+	i915_gem_retire_requests(vm->dev);
+	WARN_ON(!list_empty(&vm->active_list));
 
+	/* Having flushed everything, unbind() should never raise an error */
 	list_for_each_entry_safe(vma, next, &vm->inactive_list, mm_list)
 		if (vma->pin_count == 0)
 			WARN_ON(i915_vma_unbind(vma));
@@ -297,23 +294,19 @@ int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle)
  * This functions tries to evict all gem objects from all address spaces. Used
  * by the shrinker as a last-ditch effort and for suspend, before releasing the
  * backing storage of all unbound objects.
+ *
+ * This is similar to i915_gem_shrink_all() with the important exception that
+ * we keep a reference to the obj->pages after unbinding (so we can avoid
+ * any expensive migration between the CPU and GPU).
  */
 int
 i915_gem_evict_everything(struct drm_device *dev)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct i915_address_space *vm, *v;
-	bool lists_empty = true;
+	struct drm_i915_private *dev_priv = to_i915(dev);
+	struct list_head still_in_list;
 	int ret;
 
-	list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
-		lists_empty = (list_empty(&vm->inactive_list) &&
-			       list_empty(&vm->active_list));
-		if (!lists_empty)
-			lists_empty = false;
-	}
-
-	if (lists_empty)
+	if (list_empty(&dev_priv->mm.bound_list))
 		return -ENOSPC;
 
 	trace_i915_gem_evict_everything(dev);
@@ -328,9 +321,22 @@ i915_gem_evict_everything(struct drm_device *dev)
 
 	i915_gem_retire_requests(dev);
 
-	/* Having flushed everything, unbind() should never raise an error */
-	list_for_each_entry_safe(vm, v, &dev_priv->vm_list, global_link)
-		WARN_ON(i915_gem_evict_vm(vm, false));
+	INIT_LIST_HEAD(&still_in_list);
+	while (!list_empty(&dev_priv->mm.bound_list)) {
+		struct drm_i915_gem_object *obj;
+		struct i915_vma *vma, *v;
+
+		obj = list_first_entry(&dev_priv->mm.bound_list,
+				       typeof(*obj), global_list);
+		list_move_tail(&obj->global_list, &still_in_list);
+
+		drm_gem_object_reference(&obj->base);
+		list_for_each_entry_safe(vma, v, &obj->vma_list, vma_link)
+			if (WARN_ON(i915_vma_unbind(vma)))
+				break;
+		drm_gem_object_unreference(&obj->base);
+	}
+	list_splice(&still_in_list, &dev_priv->mm.bound_list);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 9afd2dcba43b..bd48393fb91f 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -820,7 +820,7 @@ err:
 		list_for_each_entry(vma, vmas, exec_list)
 			i915_gem_execbuffer_unreserve_vma(vma);
 
-		ret = i915_gem_evict_vm(vm, true);
+		ret = i915_gem_evict_vm(vm);
 		if (ret)
 			return ret;
 	} while (1);
-- 
2.1.4


* [PATCH 61/70] drm/i915: Make fb_tracking.lock a spinlock
  2015-04-07 16:28 ` Chris Wilson
  2015-04-07 16:28   ` [PATCH 60/70] drm/i915: Make evict-everything more robust Chris Wilson
@ 2015-04-07 16:28   ` Chris Wilson
  2015-04-14 14:52     ` Tvrtko Ursulin
  2015-04-07 16:28   ` [PATCH 62/70] drm/i915: Reduce locking inside busy ioctl Chris Wilson
                     ` (8 subsequent siblings)
  10 siblings, 1 reply; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 16:28 UTC (permalink / raw)
  To: intel-gfx

We only need a very lightweight mechanism here, as the lock merely
co-ordinates a bitfield.

Also double check that the object is still pinned to the display plane
before processing the state change.
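
A hedged sketch of why a spinlock suffices (names are illustrative,
not the driver's): the critical sections touch nothing but a pair of
bitmasks and never sleep.

#include <linux/spinlock.h>

struct fb_tracking {
	spinlock_t lock; /* spin_lock_init() at setup */
	unsigned int busy_bits;
	unsigned int flip_bits;
};

/* Mark frontbuffer bits busy for gpu rendering: a few instructions
 * under the lock, nothing that can block.
 */
static void track_busy(struct fb_tracking *t, unsigned int bits)
{
	spin_lock(&t->lock);
	t->busy_bits |= bits;
	t->flip_bits &= ~bits;
	spin_unlock(&t->lock);
}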

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h          |  2 +-
 drivers/gpu/drm/i915/i915_gem.c          |  2 +-
 drivers/gpu/drm/i915/intel_frontbuffer.c | 40 +++++++++++++++++---------------
 3 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 97372869097f..eeffefa10612 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1545,7 +1545,7 @@ struct intel_pipe_crc {
 };
 
 struct i915_frontbuffer_tracking {
-	struct mutex lock;
+	spinlock_t lock;
 
 	/*
 	 * Tracking bits for delayed frontbuffer flushing du to gpu activity or
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e9f2d2b102de..43baac2c1e20 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5260,7 +5260,7 @@ i915_gem_load(struct drm_device *dev)
 
 	i915_gem_shrinker_init(dev_priv);
 
-	mutex_init(&dev_priv->fb_tracking.lock);
+	spin_lock_init(&dev_priv->fb_tracking.lock);
 }
 
 void i915_gem_release(struct drm_device *dev, struct drm_file *file)
diff --git a/drivers/gpu/drm/i915/intel_frontbuffer.c b/drivers/gpu/drm/i915/intel_frontbuffer.c
index a20cffb78c0f..28ce2ab94189 100644
--- a/drivers/gpu/drm/i915/intel_frontbuffer.c
+++ b/drivers/gpu/drm/i915/intel_frontbuffer.c
@@ -139,16 +139,14 @@ void intel_fb_obj_invalidate(struct drm_i915_gem_object *obj,
 
 	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
 
-	if (!obj->frontbuffer_bits)
+	if (!obj->frontbuffer_bits || !obj->pin_display)
 		return;
 
 	if (ring) {
-		mutex_lock(&dev_priv->fb_tracking.lock);
-		dev_priv->fb_tracking.busy_bits
-			|= obj->frontbuffer_bits;
-		dev_priv->fb_tracking.flip_bits
-			&= ~obj->frontbuffer_bits;
-		mutex_unlock(&dev_priv->fb_tracking.lock);
+		spin_lock(&dev_priv->fb_tracking.lock);
+		dev_priv->fb_tracking.busy_bits |= obj->frontbuffer_bits;
+		dev_priv->fb_tracking.flip_bits &= ~obj->frontbuffer_bits;
+		spin_unlock(&dev_priv->fb_tracking.lock);
 	}
 
 	intel_mark_fb_busy(dev, obj->frontbuffer_bits, ring);
@@ -175,9 +173,12 @@ void intel_frontbuffer_flush(struct drm_device *dev,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
 	/* Delay flushing when rings are still busy.*/
-	mutex_lock(&dev_priv->fb_tracking.lock);
+	spin_lock(&dev_priv->fb_tracking.lock);
 	frontbuffer_bits &= ~dev_priv->fb_tracking.busy_bits;
-	mutex_unlock(&dev_priv->fb_tracking.lock);
+	spin_unlock(&dev_priv->fb_tracking.lock);
+
+	if (frontbuffer_bits == 0)
+		return;
 
 	intel_mark_fb_busy(dev, frontbuffer_bits, NULL);
 
@@ -204,21 +205,21 @@ void intel_fb_obj_flush(struct drm_i915_gem_object *obj,
 
 	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
 
-	if (!obj->frontbuffer_bits)
+	if (!obj->frontbuffer_bits || !obj->pin_display)
 		return;
 
 	frontbuffer_bits = obj->frontbuffer_bits;
 
 	if (retire) {
-		mutex_lock(&dev_priv->fb_tracking.lock);
+		spin_lock(&dev_priv->fb_tracking.lock);
 		/* Filter out new bits since rendering started. */
 		frontbuffer_bits &= dev_priv->fb_tracking.busy_bits;
-
 		dev_priv->fb_tracking.busy_bits &= ~frontbuffer_bits;
-		mutex_unlock(&dev_priv->fb_tracking.lock);
+		spin_unlock(&dev_priv->fb_tracking.lock);
 	}
 
-	intel_frontbuffer_flush(dev, frontbuffer_bits);
+	if (frontbuffer_bits)
+		intel_frontbuffer_flush(dev, frontbuffer_bits);
 }
 
 /**
@@ -238,11 +239,11 @@ void intel_frontbuffer_flip_prepare(struct drm_device *dev,
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	mutex_lock(&dev_priv->fb_tracking.lock);
+	spin_lock(&dev_priv->fb_tracking.lock);
 	dev_priv->fb_tracking.flip_bits |= frontbuffer_bits;
 	/* Remove stale busy bits due to the old buffer. */
 	dev_priv->fb_tracking.busy_bits &= ~frontbuffer_bits;
-	mutex_unlock(&dev_priv->fb_tracking.lock);
+	spin_unlock(&dev_priv->fb_tracking.lock);
 }
 
 /**
@@ -260,11 +261,12 @@ void intel_frontbuffer_flip_complete(struct drm_device *dev,
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	mutex_lock(&dev_priv->fb_tracking.lock);
+	spin_lock(&dev_priv->fb_tracking.lock);
 	/* Mask any cancelled flips. */
 	frontbuffer_bits &= dev_priv->fb_tracking.flip_bits;
 	dev_priv->fb_tracking.flip_bits &= ~frontbuffer_bits;
-	mutex_unlock(&dev_priv->fb_tracking.lock);
+	spin_unlock(&dev_priv->fb_tracking.lock);
 
-	intel_frontbuffer_flush(dev, frontbuffer_bits);
+	if (frontbuffer_bits)
+		intel_frontbuffer_flush(dev, frontbuffer_bits);
 }
-- 
2.1.4


* [PATCH 62/70] drm/i915: Reduce locking inside busy ioctl
  2015-04-07 16:28 ` Chris Wilson
  2015-04-07 16:28   ` [PATCH 60/70] drm/i915: Make evict-everything more robust Chris Wilson
  2015-04-07 16:28   ` [PATCH 61/70] drm/i915: Make fb_tracking.lock a spinlock Chris Wilson
@ 2015-04-07 16:28   ` Chris Wilson
  2015-04-07 16:28   ` [PATCH 63/70] drm/i915: Reduce locking inside swfinish ioctl Chris Wilson
                     ` (7 subsequent siblings)
  10 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 16:28 UTC (permalink / raw)
  To: intel-gfx

We can do the trivial check for an inactive buffer without acquiring
the struct_mutex, reducing contention. If we are required to flush the
object and check for retirements, then we do indeed have to resort to
taking the struct_mutex.
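
The shape of the optimisation, as a hedged standalone sketch with
invented names: an object that is already idle can only stay idle, so
the unlocked read is safe to report; only the active case pays for the
mutex.

#include <pthread.h>
#include <stdbool.h>

struct obj {
	bool active; /* cleared under the mutex, read locklessly */
};

static pthread_mutex_t struct_mutex = PTHREAD_MUTEX_INITIALIZER;

static unsigned int query_busy(struct obj *o)
{
	unsigned int busy = 0;

	if (o->active) {
		pthread_mutex_lock(&struct_mutex);
		/* recheck under the lock, where flushing and retiring
		 * may have just transitioned the object to idle
		 */
		busy = o->active;
		pthread_mutex_unlock(&struct_mutex);
	}
	return busy;
}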

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 49 ++++++++++++++++++++++-------------------
 1 file changed, 26 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 43baac2c1e20..f9ea8c932a6a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4522,34 +4522,37 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 	struct drm_i915_gem_object *obj;
 	int ret;
 
-	ret = i915_mutex_lock_interruptible(dev);
-	if (ret)
-		return ret;
-
 	obj = to_intel_bo(drm_gem_object_lookup(dev, file, args->handle));
-	if (&obj->base == NULL) {
-		ret = -ENOENT;
-		goto unlock;
-	}
+	if (&obj->base == NULL)
+		return -ENOENT;
 
-	/* Count all active objects as busy, even if they are currently not used
-	 * by the gpu. Users of this interface expect objects to eventually
-	 * become non-busy without any further actions, therefore emit any
-	 * necessary flushes here.
-	 */
-	ret = i915_gem_object_flush_active(obj);
-	if (ret)
-		goto unref;
+	if (obj->active) {
+		ret = i915_mutex_lock_interruptible(dev);
+		if (ret)
+			goto unref;
 
-	BUILD_BUG_ON(I915_NUM_RINGS > 16);
-	args->busy = obj->active << 16;
-	if (obj->last_write_req)
-		args->busy |= obj->last_write_req->ring->id;
+		/* Count all active objects as busy, even if they are
+		 * currently not used by the gpu. Users of this interface
+		 * expect objects to eventually become non-busy without any
+		 * further actions, therefore emit any necessary flushes here.
+		 */
+		ret = i915_gem_object_flush_active(obj);
+		if (ret == 0) {
+			BUILD_BUG_ON(I915_NUM_RINGS > 16);
+			args->busy = obj->active << 16;
+			if (obj->last_write_req)
+				args->busy |= obj->last_write_req->ring->id;
+		}
 
+		drm_gem_object_unreference(&obj->base);
+		mutex_unlock(&dev->struct_mutex);
+	} else {
+		ret = 0;
+		args->busy = 0;
 unref:
-	drm_gem_object_unreference(&obj->base);
-unlock:
-	mutex_unlock(&dev->struct_mutex);
+		drm_gem_object_unreference_unlocked(&obj->base);
+	}
+
 	return ret;
 }
 
-- 
2.1.4


* [PATCH 63/70] drm/i915: Reduce locking inside swfinish ioctl
  2015-04-07 16:28 ` Chris Wilson
                     ` (2 preceding siblings ...)
  2015-04-07 16:28   ` [PATCH 62/70] drm/i915: Reduce locking inside busy ioctl Chris Wilson
@ 2015-04-07 16:28   ` Chris Wilson
  2015-04-10  9:14     ` Daniel Vetter
  2015-04-07 16:28   ` [PATCH 64/70] drm/i915: Remove pinned check from madvise ioctl Chris Wilson
                     ` (6 subsequent siblings)
  10 siblings, 1 reply; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 16:28 UTC (permalink / raw)
  To: intel-gfx

We only need to take the struct_mutex if the object is pinned to the
display engine and so requires checking for clflush. (The race with
userspace pinning the object to a framebuffer is irrelevant.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f9ea8c932a6a..3e3d8ed3b97d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1606,25 +1606,28 @@ i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
 {
 	struct drm_i915_gem_sw_finish *args = data;
 	struct drm_i915_gem_object *obj;
-	int ret = 0;
-
-	ret = i915_mutex_lock_interruptible(dev);
-	if (ret)
-		return ret;
+	int ret;
 
 	obj = to_intel_bo(drm_gem_object_lookup(dev, file, args->handle));
-	if (&obj->base == NULL) {
-		ret = -ENOENT;
-		goto unlock;
-	}
+	if (&obj->base == NULL)
+		return -ENOENT;
 
 	/* Pinned buffers may be scanout, so flush the cache */
-	if (obj->pin_display)
+	if (obj->pin_display) {
+		ret = i915_mutex_lock_interruptible(dev);
+		if (ret)
+			goto unref;
+
 		i915_gem_object_flush_cpu_write_domain(obj);
 
-	drm_gem_object_unreference(&obj->base);
-unlock:
-	mutex_unlock(&dev->struct_mutex);
+		drm_gem_object_unreference(&obj->base);
+		mutex_unlock(&dev->struct_mutex);
+	} else {
+		ret = 0;
+unref:
+		drm_gem_object_unreference_unlocked(&obj->base);
+	}
+
 	return ret;
 }
 
-- 
2.1.4


* [PATCH 64/70] drm/i915: Remove pinned check from madvise ioctl
  2015-04-07 16:28 ` Chris Wilson
                     ` (3 preceding siblings ...)
  2015-04-07 16:28   ` [PATCH 63/70] drm/i915: Reduce locking inside swfinish ioctl Chris Wilson
@ 2015-04-07 16:28   ` Chris Wilson
  2015-04-07 16:28   ` [PATCH 65/70] drm/i915: Reduce locking for gen6+ GT interrupts Chris Wilson
                     ` (5 subsequent siblings)
  10 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 16:28 UTC (permalink / raw)
  To: intel-gfx

We don't need to incur the overhead of checking whether the object is
pinned prior to changing its madvise. If the object is pinned, the
madvise will not take effect until it is unpinned, and so we cannot
free the pages being pointed at by hardware. Marking a pinned object
with allocated pages as DONTNEED will not trigger any undue warnings.
The check is therefore superfluous, and by removing it we also drop a
linear walk over all the vma the object has.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3e3d8ed3b97d..bd60bb552920 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4593,11 +4593,6 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
 		goto unlock;
 	}
 
-	if (i915_gem_obj_is_pinned(obj)) {
-		ret = -EINVAL;
-		goto out;
-	}
-
 	if (obj->pages &&
 	    obj->tiling_mode != I915_TILING_NONE &&
 	    dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
@@ -4616,7 +4611,6 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
 
 	args->retained = obj->madv != __I915_MADV_PURGED;
 
-out:
 	drm_gem_object_unreference(&obj->base);
 unlock:
 	mutex_unlock(&dev->struct_mutex);
-- 
2.1.4


* [PATCH 65/70] drm/i915: Reduce locking for gen6+ GT interrupts
  2015-04-07 16:28 ` Chris Wilson
                     ` (4 preceding siblings ...)
  2015-04-07 16:28   ` [PATCH 64/70] drm/i915: Remove pinned check from madvise ioctl Chris Wilson
@ 2015-04-07 16:28   ` Chris Wilson
  2015-04-07 16:28   ` [PATCH 66/70] drm/i915: Remove obj->pin_mappable Chris Wilson
                     ` (4 subsequent siblings)
  10 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 16:28 UTC (permalink / raw)
  To: intel-gfx
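
The conversion below swaps the locking register accessors in the hard
irq handler for their raw _FW counterparts. A rough model of the
difference -- a sketch assuming the 2015-era uncore layout, not the
real accessors (those live in intel_uncore.c):

#include <linux/io.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct uncore {
	spinlock_t lock;	/* serialises MMIO and forcewake bookkeeping */
	void __iomem *regs;
};

/* model of I915_READ(): spinlock (and forcewake checks) around the MMIO */
static u32 read_locked(struct uncore *uncore, u32 reg)
{
	unsigned long flags;
	u32 val;

	spin_lock_irqsave(&uncore->lock, flags);
	val = readl(uncore->regs + reg);
	spin_unlock_irqrestore(&uncore->lock, flags);
	return val;
}

/* model of I915_READ_FW(): bare MMIO, the caller guarantees it is safe */
static u32 read_fw(struct uncore *uncore, u32 reg)
{
	return readl(uncore->regs + reg);
}

The interrupt registers touched here sit outside the forcewake ranges
on these parts, so the raw accessors shed the uncore spinlock from
every interrupt; the single POSTING_READ_FW(DEIER) kept at the end
still flushes the write that re-enables the master interrupt.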

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_irq.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 47c9c02e6731..eecbbab921d9 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2090,9 +2090,8 @@ static irqreturn_t ironlake_irq_handler(int irq, void *arg)
 	intel_uncore_check_errors(dev);
 
 	/* disable master interrupt before clearing iir  */
-	de_ier = I915_READ(DEIER);
-	I915_WRITE(DEIER, de_ier & ~DE_MASTER_IRQ_CONTROL);
-	POSTING_READ(DEIER);
+	de_ier = I915_READ_FW(DEIER);
+	I915_WRITE_FW(DEIER, de_ier & ~DE_MASTER_IRQ_CONTROL);
 
 	/* Disable south interrupts. We'll only write to SDEIIR once, so further
 	 * interrupts will will be stored on its back queue, and then we'll be
@@ -2100,16 +2099,15 @@ static irqreturn_t ironlake_irq_handler(int irq, void *arg)
 	 * it, we'll get an interrupt if SDEIIR still has something to process
 	 * due to its back queue). */
 	if (!HAS_PCH_NOP(dev)) {
-		sde_ier = I915_READ(SDEIER);
-		I915_WRITE(SDEIER, 0);
-		POSTING_READ(SDEIER);
+		sde_ier = I915_READ_FW(SDEIER);
+		I915_WRITE_FW(SDEIER, 0);
 	}
 
 	/* Find, clear, then process each source of interrupt */
 
-	gt_iir = I915_READ(GTIIR);
+	gt_iir = I915_READ_FW(GTIIR);
 	if (gt_iir) {
-		I915_WRITE(GTIIR, gt_iir);
+		I915_WRITE_FW(GTIIR, gt_iir);
 		ret = IRQ_HANDLED;
 		if (INTEL_INFO(dev)->gen >= 6)
 			snb_gt_irq_handler(dev, dev_priv, gt_iir);
@@ -2136,12 +2134,10 @@ static irqreturn_t ironlake_irq_handler(int irq, void *arg)
 		}
 	}
 
-	I915_WRITE(DEIER, de_ier);
-	POSTING_READ(DEIER);
-	if (!HAS_PCH_NOP(dev)) {
-		I915_WRITE(SDEIER, sde_ier);
-		POSTING_READ(SDEIER);
-	}
+	I915_WRITE_FW(DEIER, de_ier);
+	if (!HAS_PCH_NOP(dev))
+		I915_WRITE_FW(SDEIER, sde_ier);
+	POSTING_READ_FW(DEIER);
 
 	return ret;
 }
-- 
2.1.4

* [PATCH 66/70] drm/i915: Remove obj->pin_mappable
  2015-04-07 16:28 ` Chris Wilson
                     ` (5 preceding siblings ...)
  2015-04-07 16:28   ` [PATCH 65/70] drm/i915: Reduce locking for gen6+ GT interrupts Chris Wilson
@ 2015-04-07 16:28   ` Chris Wilson
  2015-04-13 11:35     ` Tvrtko Ursulin
  2015-04-07 16:28   ` [PATCH 67/70] drm/i915: Start passing around i915_vma from execbuffer Chris Wilson
                     ` (3 subsequent siblings)
  10 siblings, 1 reply; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 16:28 UTC (permalink / raw)
  To: intel-gfx

The obj->pin_mappable flag only exists for debug purposes and is a
hindrance that is mishandled for rotated GGTT views. For debug purposes,
it suffices to mark objects with pin_display as being of note.
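
Concretely, the flag was asymmetric, as the hunks below show: any
mappable pin set it, but only the last unpin of the normal view ever
cleared it, so unpinning a rotated view left it stale:

	if (flags & PIN_MAPPABLE)
		obj->pin_mappable |= true;
	...
	if (--vma->pin_count == 0 && view->type == I915_GGTT_VIEW_NORMAL)
		obj->pin_mappable = false;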

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 6 +++---
 drivers/gpu/drm/i915/i915_drv.h     | 1 -
 drivers/gpu/drm/i915/i915_gem.c     | 6 +-----
 3 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 2e851c6a310c..6508eec3cf60 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -166,9 +166,9 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 	}
 	if (obj->stolen)
 		seq_printf(m, " (stolen: %08llx)", obj->stolen->start);
-	if (obj->pin_mappable || obj->fault_mappable) {
+	if (obj->pin_display || obj->fault_mappable) {
 		char s[3], *t = s;
-		if (obj->pin_mappable)
+		if (obj->pin_display)
 			*t++ = 'p';
 		if (obj->fault_mappable)
 			*t++ = 'f';
@@ -464,7 +464,7 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
 			size += i915_gem_obj_ggtt_size(obj);
 			++count;
 		}
-		if (obj->pin_mappable) {
+		if (obj->pin_display) {
 			mappable_size += i915_gem_obj_ggtt_size(obj);
 			++mappable_count;
 		}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index eeffefa10612..2c72ee0214b5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1980,7 +1980,6 @@ struct drm_i915_gem_object {
 	 * accurate mappable working set.
 	 */
 	unsigned int fault_mappable:1;
-	unsigned int pin_mappable:1;
 	unsigned int pin_display:1;
 
 	/*
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index bd60bb552920..3d4463930267 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4445,9 +4445,6 @@ i915_gem_object_do_pin(struct drm_i915_gem_object *obj,
 	WARN_ON(flags & PIN_MAPPABLE && !obj->map_and_fenceable);
 
 	vma->pin_count++;
-	if (flags & PIN_MAPPABLE)
-		obj->pin_mappable |= true;
-
 	return 0;
 }
 
@@ -4487,8 +4484,7 @@ i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
 	WARN_ON(vma->pin_count == 0);
 	WARN_ON(!i915_gem_obj_ggtt_bound_view(obj, view));
 
-	if (--vma->pin_count == 0 && view->type == I915_GGTT_VIEW_NORMAL)
-		obj->pin_mappable = false;
+	--vma->pin_count;
 }
 
 bool
-- 
2.1.4

* [PATCH 67/70] drm/i915: Start passing around i915_vma from execbuffer
  2015-04-07 16:28 ` Chris Wilson
                     ` (6 preceding siblings ...)
  2015-04-07 16:28   ` [PATCH 66/70] drm/i915: Remove obj->pin_mappable Chris Wilson
@ 2015-04-07 16:28   ` Chris Wilson
  2015-04-07 16:28   ` [PATCH 68/70] drm/i915: Simplify vma-walker for i915_gem_objects Chris Wilson
                     ` (2 subsequent siblings)
  10 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 16:28 UTC (permalink / raw)
  To: intel-gfx

During execbuffer we look up the i915_vma in order to reserve it in the
VM. However, we then do a second lookup of the vma in order to pin it,
all because we lack the necessary interfaces to operate on the i915_vma
directly.
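
A sketch of the calling convention this moves us to -- look the vma up
once, then pin, query and unpin through it (names as in the diff below,
error handling trimmed):

	struct i915_vma *vma;
	int ret;

	vma = i915_gem_obj_lookup_or_create_vma(obj, vm); /* one lookup */
	if (IS_ERR(vma))
		return PTR_ERR(vma);

	ret = i915_vma_pin(vma, size, alignment, flags);  /* act on the vma */
	if (ret)
		return ret;

	/* ... use vma->node.start directly, no i915_gem_obj_offset() ... */

	i915_vma_unpin(vma);                              /* release it too */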

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h            |  13 +++
 drivers/gpu/drm/i915/i915_gem.c            | 170 ++++++++++++++---------------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 114 ++++++++++---------
 3 files changed, 154 insertions(+), 143 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2c72ee0214b5..ba593ee78863 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2637,6 +2637,19 @@ void i915_gem_close_object(struct drm_gem_object *obj,
 void i915_gem_free_object(struct drm_gem_object *obj);
 void i915_gem_vma_destroy(struct i915_vma *vma);
 
+int __must_check
+i915_vma_pin(struct i915_vma *vma,
+	     uint32_t size,
+	     uint32_t alignment,
+	     uint64_t flags);
+
+static inline void i915_vma_unpin(struct i915_vma *vma)
+{
+	WARN_ON(vma->pin_count == 0);
+	WARN_ON(!drm_mm_node_allocated(&vma->node));
+	vma->pin_count--;
+}
+
 #define PIN_MAPPABLE 0x1
 #define PIN_NONBLOCK 0x2
 #define PIN_GLOBAL 0x4
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3d4463930267..7b27236f2c29 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3644,10 +3644,9 @@ static bool i915_gem_valid_gtt_space(struct i915_vma *vma,
 /**
  * Finds free space in the GTT aperture and binds the object there.
  */
-static struct i915_vma *
+static int
 i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
-			   struct i915_address_space *vm,
-			   const struct i915_ggtt_view *ggtt_view,
+			   struct i915_vma *vma,
 			   uint32_t size,
 			   unsigned alignment,
 			   uint64_t flags)
@@ -3658,13 +3657,9 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 	unsigned long start =
 		flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
 	unsigned long end =
-		flags & PIN_MAPPABLE ? dev_priv->gtt.mappable_end : vm->total;
-	struct i915_vma *vma;
+		flags & PIN_MAPPABLE ? dev_priv->gtt.mappable_end : vma->vm->total;
 	int ret;
 
-	if (WARN_ON(vm->is_ggtt != !!ggtt_view))
-		return ERR_PTR(-EINVAL);
-
 	fence_size = i915_gem_get_gtt_size(dev,
 					   obj->base.size,
 					   obj->tiling_mode);
@@ -3681,7 +3676,7 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 						unfenced_alignment;
 	if (flags & PIN_MAPPABLE && alignment & (fence_alignment - 1)) {
 		DRM_DEBUG("Invalid object alignment requested %u\n", alignment);
-		return ERR_PTR(-EINVAL);
+		return -EINVAL;
 	}
 
 	size = max_t(u32, size, obj->base.size);
@@ -3696,57 +3691,51 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 			  size, obj->base.size,
 			  flags & PIN_MAPPABLE ? "mappable" : "total",
 			  end);
-		return ERR_PTR(-E2BIG);
+		return -E2BIG;
 	}
 
 	ret = i915_gem_object_get_pages(obj);
 	if (ret)
-		return ERR_PTR(ret);
+		return ret;
 
 	i915_gem_object_pin_pages(obj);
 
-	vma = ggtt_view ? i915_gem_obj_lookup_or_create_ggtt_vma(obj, ggtt_view) :
-			  i915_gem_obj_lookup_or_create_vma(obj, vm);
-
-	if (IS_ERR(vma))
-		goto err_unpin;
-
 	if (flags & PIN_OFFSET_FIXED) {
 		uint64_t offset = flags & PIN_OFFSET_MASK;
 		if (offset & (alignment - 1)) {
-			vma = ERR_PTR(-EINVAL);
-			goto err_free_vma;
+			ret = -EINVAL;
+			goto err_unpin;
 		}
 		vma->node.start = offset;
 		vma->node.size = size;
 		vma->node.color = obj->cache_level;
-		ret = drm_mm_reserve_node(&vm->mm, &vma->node);
+		ret = drm_mm_reserve_node(&vma->vm->mm, &vma->node);
 		if (ret) {
-			ret = i915_gem_evict_range(dev, vm, start, end);
+			ret = i915_gem_evict_range(dev, vma->vm, start, end);
 			if (ret == 0)
-				ret = drm_mm_reserve_node(&vm->mm, &vma->node);
-		}
-		if (ret) {
-			vma = ERR_PTR(ret);
-			goto err_free_vma;
+				ret = drm_mm_reserve_node(&vma->vm->mm, &vma->node);
+			if (ret)
+				goto err_unpin;
 		}
 	} else {
 search_free:
-		ret = drm_mm_insert_node_in_range_generic(&vm->mm, &vma->node,
+		ret = drm_mm_insert_node_in_range_generic(&vma->vm->mm,
+							  &vma->node,
 							  size, alignment,
 							  obj->cache_level,
 							  start, end,
 							  DRM_MM_SEARCH_DEFAULT,
 							  DRM_MM_CREATE_DEFAULT);
 		if (ret) {
-			ret = i915_gem_evict_something(dev, vm, size, alignment,
+			ret = i915_gem_evict_something(dev, vma->vm,
+						       size, alignment,
 						       obj->cache_level,
 						       start, end,
 						       flags);
 			if (ret == 0)
 				goto search_free;
 
-			goto err_free_vma;
+			goto err_unpin;
 		}
 	}
 	if (WARN_ON(!i915_gem_valid_gtt_space(vma, obj->cache_level))) {
@@ -3775,20 +3764,17 @@ search_free:
 		goto err_finish_gtt;
 
 	list_move_tail(&obj->global_list, &dev_priv->mm.bound_list);
-	list_add_tail(&vma->mm_list, &vm->inactive_list);
+	list_add_tail(&vma->mm_list, &vma->vm->inactive_list);
 
-	return vma;
+	return 0;
 
 err_finish_gtt:
 	i915_gem_gtt_finish_object(obj);
 err_remove_node:
 	drm_mm_remove_node(&vma->node);
-err_free_vma:
-	i915_gem_vma_destroy(vma);
-	vma = ERR_PTR(ret);
 err_unpin:
 	i915_gem_object_unpin_pages(obj);
-	return vma;
+	return ret;
 }
 
 bool
@@ -4347,6 +4333,65 @@ i915_vma_misplaced(struct i915_vma *vma,
 	return false;
 }
 
+int
+i915_vma_pin(struct i915_vma *vma,
+	     uint32_t size,
+	     uint32_t alignment,
+	     uint64_t flags)
+{
+	struct drm_i915_gem_object *obj = vma->obj;
+	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
+	unsigned bound = vma->bound;
+	int ret;
+
+	if (WARN_ON(vma->pin_count == DRM_I915_GEM_OBJECT_MAX_PIN_COUNT))
+		return -EBUSY;
+
+
+	if (!drm_mm_node_allocated(&vma->node)) {
+		/* In true PPGTT, bind has possibly changed PDEs, which
+		 * means we must do a context switch before the GPU can
+		 * accurately read some of the VMAs.
+		 */
+		ret = i915_gem_object_bind_to_vm(obj, vma,
+						 size, alignment, flags);
+		if (ret)
+			return ret;
+	}
+
+	if (flags & PIN_GLOBAL && !(vma->bound & GLOBAL_BIND)) {
+		ret = i915_vma_bind(vma, obj->cache_level, GLOBAL_BIND);
+		if (ret)
+			return ret;
+	}
+
+	if ((bound ^ vma->bound) & GLOBAL_BIND) {
+		bool mappable, fenceable;
+		u32 fence_size, fence_alignment;
+
+		fence_size = i915_gem_get_gtt_size(obj->base.dev,
+						   obj->base.size,
+						   obj->tiling_mode);
+		fence_alignment = i915_gem_get_gtt_alignment(obj->base.dev,
+							     obj->base.size,
+							     obj->tiling_mode,
+							     true);
+
+		fenceable = (vma->node.size >= fence_size &&
+			     (vma->node.start & (fence_alignment - 1)) == 0);
+
+		mappable = (vma->node.start + fence_size <=
+			    dev_priv->gtt.mappable_end);
+
+		obj->map_and_fenceable = mappable && fenceable;
+	}
+
+	WARN_ON(flags & PIN_MAPPABLE && !obj->map_and_fenceable);
+
+	vma->pin_count++;
+	return 0;
+}
+
 static int
 i915_gem_object_do_pin(struct drm_i915_gem_object *obj,
 		       struct i915_address_space *vm,
@@ -4357,7 +4402,6 @@ i915_gem_object_do_pin(struct drm_i915_gem_object *obj,
 {
 	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
 	struct i915_vma *vma;
-	unsigned bound;
 	int ret;
 
 	if (WARN_ON(vm == &dev_priv->mm.aliasing_ppgtt->base))
@@ -4379,9 +4423,6 @@ i915_gem_object_do_pin(struct drm_i915_gem_object *obj,
 		return PTR_ERR(vma);
 
 	if (vma) {
-		if (WARN_ON(vma->pin_count == DRM_I915_GEM_OBJECT_MAX_PIN_COUNT))
-			return -EBUSY;
-
 		if (i915_vma_misplaced(vma, size, alignment, flags)) {
 			unsigned long offset;
 			offset = ggtt_view ? i915_gem_obj_ggtt_offset_view(obj, ggtt_view) :
@@ -4403,49 +4444,14 @@ i915_gem_object_do_pin(struct drm_i915_gem_object *obj,
 		}
 	}
 
-	bound = vma ? vma->bound : 0;
-	if (vma == NULL || !drm_mm_node_allocated(&vma->node)) {
-		/* In true PPGTT, bind has possibly changed PDEs, which
-		 * means we must do a context switch before the GPU can
-		 * accurately read some of the VMAs.
-		 */
-		vma = i915_gem_object_bind_to_vm(obj, vm, ggtt_view,
-						 size, alignment, flags);
+	if (vma == NULL) {
+		vma = ggtt_view ? i915_gem_obj_lookup_or_create_ggtt_vma(obj, ggtt_view) :
+			i915_gem_obj_lookup_or_create_vma(obj, vm);
 		if (IS_ERR(vma))
 			return PTR_ERR(vma);
 	}
 
-	if (flags & PIN_GLOBAL && !(vma->bound & GLOBAL_BIND)) {
-		ret = i915_vma_bind(vma, obj->cache_level, GLOBAL_BIND);
-		if (ret)
-			return ret;
-	}
-
-	if ((bound ^ vma->bound) & GLOBAL_BIND) {
-		bool mappable, fenceable;
-		u32 fence_size, fence_alignment;
-
-		fence_size = i915_gem_get_gtt_size(obj->base.dev,
-						   obj->base.size,
-						   obj->tiling_mode);
-		fence_alignment = i915_gem_get_gtt_alignment(obj->base.dev,
-							     obj->base.size,
-							     obj->tiling_mode,
-							     true);
-
-		fenceable = (vma->node.size >= fence_size &&
-			     (vma->node.start & (fence_alignment - 1)) == 0);
-
-		mappable = (vma->node.start + fence_size <=
-			    dev_priv->gtt.mappable_end);
-
-		obj->map_and_fenceable = mappable && fenceable;
-	}
-
-	WARN_ON(flags & PIN_MAPPABLE && !obj->map_and_fenceable);
-
-	vma->pin_count++;
-	return 0;
+	return i915_vma_pin(vma, size, alignment, flags);
 }
 
 int
@@ -4478,13 +4484,7 @@ void
 i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
 				const struct i915_ggtt_view *view)
 {
-	struct i915_vma *vma = i915_gem_obj_to_ggtt_view(obj, view);
-
-	BUG_ON(!vma);
-	WARN_ON(vma->pin_count == 0);
-	WARN_ON(!i915_gem_obj_ggtt_bound_view(obj, view));
-
-	--vma->pin_count;
+	i915_vma_unpin(i915_gem_obj_to_ggtt_view(obj, view));
 }
 
 bool
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index bd48393fb91f..734a7ef56a93 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -42,6 +42,7 @@
 
 struct eb_vmas {
 	struct list_head vmas;
+	struct i915_vma *batch_vma;
 	int and;
 	union {
 		struct i915_vma *lut[0];
@@ -88,6 +89,26 @@ eb_reset(struct eb_vmas *eb)
 		memset(eb->buckets, 0, (eb->and+1)*sizeof(struct hlist_head));
 }
 
+static struct i915_vma *
+eb_get_batch(struct eb_vmas *eb)
+{
+	struct i915_vma *vma = list_entry(eb->vmas.prev, typeof(*vma), exec_list);
+
+	/*
+	 * SNA is doing fancy tricks with compressing batch buffers, which leads
+	 * to negative relocation deltas. Usually that works out ok since the
+	 * relocate address is still positive, except when the batch is placed
+	 * very low in the GTT. Ensure this doesn't happen.
+	 *
+	 * Note that actual hangs have only been observed on gen7, but for
+	 * paranoia do it everywhere.
+	 */
+	if ((vma->exec_entry->flags & EXEC_OBJECT_PINNED) == 0)
+		vma->exec_entry->flags |= __EXEC_OBJECT_NEEDS_BIAS;
+
+	return vma;
+}
+
 static int
 eb_lookup_vmas(struct eb_vmas *eb,
 	       struct drm_i915_gem_exec_object2 *exec,
@@ -165,6 +186,9 @@ eb_lookup_vmas(struct eb_vmas *eb,
 		++i;
 	}
 
+	/* take note of the batch buffer before we might reorder the lists */
+	eb->batch_vma = eb_get_batch(eb);
+
 	return 0;
 
 
@@ -644,16 +668,16 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
 			flags |= entry->offset | PIN_OFFSET_FIXED;
 	}
 
-	ret = i915_gem_object_pin(obj, vma->vm,
-				  entry->pad_to_size,
-				  entry->alignment,
-				  flags);
-	if ((ret == -ENOSPC  || ret == -E2BIG) &&
+	ret = i915_vma_pin(vma,
+			   entry->pad_to_size,
+			   entry->alignment,
+			   flags);
+	if ((ret == -ENOSPC || ret == -E2BIG) &&
 	    only_mappable_for_reloc(entry->flags))
-		ret = i915_gem_object_pin(obj, vma->vm,
-					  entry->pad_to_size,
-					  entry->alignment,
-					  flags & ~(PIN_GLOBAL | PIN_MAPPABLE));
+		ret = i915_vma_pin(vma,
+				   entry->pad_to_size,
+				   entry->alignment,
+				   flags & ~(PIN_GLOBAL | PIN_MAPPABLE));
 	if (ret)
 		return ret;
 
@@ -1194,11 +1218,10 @@ i915_emit_box(struct intel_engine_cs *ring,
 	return 0;
 }
 
-static struct drm_i915_gem_object*
+static struct i915_vma*
 i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
 			  struct drm_i915_gem_exec_object2 *shadow_exec_entry,
 			  struct eb_vmas *eb,
-			  struct drm_i915_gem_object *batch_obj,
 			  u32 batch_start_offset,
 			  u32 batch_len,
 			  bool is_master)
@@ -1210,10 +1233,10 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
 	shadow_batch_obj = i915_gem_batch_pool_get(&ring->batch_pool,
 						   PAGE_ALIGN(batch_len));
 	if (IS_ERR(shadow_batch_obj))
-		return shadow_batch_obj;
+		return ERR_CAST(shadow_batch_obj);
 
 	ret = i915_parse_cmds(ring,
-			      batch_obj,
+			      eb->batch_vma->obj,
 			      shadow_batch_obj,
 			      batch_start_offset,
 			      batch_len,
@@ -1235,14 +1258,12 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
 	drm_gem_object_reference(&shadow_batch_obj->base);
 	list_add_tail(&vma->exec_list, &eb->vmas);
 
-	shadow_batch_obj->base.pending_read_domains = I915_GEM_DOMAIN_COMMAND;
-
-	return shadow_batch_obj;
+	return vma;
 
 err:
 	i915_gem_object_unpin_pages(shadow_batch_obj);
 	if (ret == -EACCES) /* unhandled chained batch */
-		return batch_obj;
+		return NULL;
 	else
 		return ERR_PTR(ret);
 }
@@ -1442,26 +1463,6 @@ static int gen8_dispatch_bsd_ring(struct drm_device *dev,
 	}
 }
 
-static struct drm_i915_gem_object *
-eb_get_batch(struct eb_vmas *eb)
-{
-	struct i915_vma *vma = list_entry(eb->vmas.prev, typeof(*vma), exec_list);
-
-	/*
-	 * SNA is doing fancy tricks with compressing batch buffers, which leads
-	 * to negative relocation deltas. Usually that works out ok since the
-	 * relocate address is still positive, except when the batch is placed
-	 * very low in the GTT. Ensure this doesn't happen.
-	 *
-	 * Note that actual hangs have only been observed on gen7, but for
-	 * paranoia do it everywhere.
-	 */
-	if ((vma->exec_entry->flags & EXEC_OBJECT_PINNED) == 0)
-		vma->exec_entry->flags |= __EXEC_OBJECT_NEEDS_BIAS;
-
-	return vma->obj;
-}
-
 static int
 i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		       struct drm_file *file,
@@ -1470,7 +1471,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct eb_vmas *eb;
-	struct drm_i915_gem_object *batch_obj;
 	struct drm_i915_gem_exec_object2 shadow_exec_entry;
 	struct intel_engine_cs *ring;
 	struct intel_context *ctx;
@@ -1582,9 +1582,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (ret)
 		goto err;
 
-	/* take note of the batch buffer before we might reorder the lists */
-	batch_obj = eb_get_batch(eb);
-
 	/* Move the objects en-masse into the GTT, evicting if necessary. */
 	need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
 	ret = i915_gem_execbuffer_reserve(ring, &eb->vmas, &need_relocs);
@@ -1605,24 +1602,25 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	}
 
 	/* Set the pending read domains for the batch buffer to COMMAND */
-	if (batch_obj->base.pending_write_domain) {
+	if (eb->batch_vma->obj->base.pending_write_domain) {
 		DRM_DEBUG("Attempting to use self-modifying batch buffer\n");
 		ret = -EINVAL;
 		goto err;
 	}
 
 	if (i915_needs_cmd_parser(ring) && args->batch_len) {
-		batch_obj = i915_gem_execbuffer_parse(ring,
-						      &shadow_exec_entry,
-						      eb,
-						      batch_obj,
-						      args->batch_start_offset,
-						      args->batch_len,
-						      file->is_master);
-		if (IS_ERR(batch_obj)) {
-			ret = PTR_ERR(batch_obj);
+		struct i915_vma *vma;
+
+		vma = i915_gem_execbuffer_parse(ring, &shadow_exec_entry, eb,
+						args->batch_start_offset,
+						args->batch_len,
+						file->is_master);
+		if (IS_ERR(vma)) {
+			ret = PTR_ERR(vma);
 			goto err;
 		}
+		if (vma)
+			eb->batch_vma = vma;
 
 		/*
 		 * Set the DISPATCH_SECURE bit to remove the NON_SECURE
@@ -1641,7 +1639,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		exec_start = 0;
 	}
 
-	batch_obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
+	eb->batch_vma->obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
 
 	/* snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
 	 * batch" bit. Hence we need to pin secure batches into the global gtt.
@@ -1657,17 +1655,17 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		 *   fitting due to fragmentation.
 		 * So this is actually safe.
 		 */
-		ret = i915_gem_obj_ggtt_pin(batch_obj, 0, 0);
+		ret = i915_gem_obj_ggtt_pin(eb->batch_vma->obj, 0, 0);
 		if (ret)
 			goto err;
 
-		exec_start += i915_gem_obj_ggtt_offset(batch_obj);
+		exec_start += i915_gem_obj_ggtt_offset(eb->batch_vma->obj);
 	} else
-		exec_start += i915_gem_obj_offset(batch_obj, vm);
+		exec_start += eb->batch_vma->node.start;
 
 	ret = dev_priv->gt.execbuf_submit(dev, file, ring, ctx, args,
-					  &eb->vmas, batch_obj, exec_start,
-					  dispatch_flags);
+					  &eb->vmas, eb->batch_vma->obj,
+					  exec_start, dispatch_flags);
 
 	/*
 	 * FIXME: We crucially rely upon the active tracking for the (ppgtt)
@@ -1676,7 +1674,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	 * active.
 	 */
 	if (dispatch_flags & I915_DISPATCH_SECURE)
-		i915_gem_object_ggtt_unpin(batch_obj);
+		i915_vma_unpin(eb->batch_vma);
 err:
 	/* the request owns the ref now */
 	i915_gem_context_unreference(ctx);
-- 
2.1.4

* [PATCH 68/70] drm/i915: Simplify vma-walker for i915_gem_objects
  2015-04-07 16:28 ` Chris Wilson
                     ` (7 preceding siblings ...)
  2015-04-07 16:28   ` [PATCH 67/70] drm/i915: Start passing around i915_vma from execbuffer Chris Wilson
@ 2015-04-07 16:28   ` Chris Wilson
  2015-04-07 16:28   ` [PATCH 69/70] drm/i915: Skip holding an object reference for execbuf preparation Chris Wilson
  2015-04-07 16:28   ` [PATCH 70/70] drm/i915: Use vma as the primary token for managing binding Chris Wilson
  10 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 16:28 UTC (permalink / raw)
  To: intel-gfx

When walking the VMAs of the GGTT, we already have the node size
available, so there is no need to do a double lookup. To further
simplify, just drop the mappable counts, which will be useful when we
come to inspect other VMs later.
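
The double lookup being removed: i915_gem_obj_ggtt_size(vma->obj)
re-walks the object's VMA list to find the GGTT vma we are already
standing on. After the patch the walker is just a field access on the
vma in hand (matching the hunk below):

	list_for_each_entry(vma, list, mm_list) {
		size += vma->node.size;	/* no second list walk */
		count++;
	}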

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 6508eec3cf60..7c84420b374f 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -405,12 +405,8 @@ static void print_batch_pool_stats(struct seq_file *m,
 
 #define count_vmas(list, member) do { \
 	list_for_each_entry(vma, list, member) { \
-		size += i915_gem_obj_ggtt_size(vma->obj); \
+		size += vma->node.size; \
 		++count; \
-		if (vma->obj->map_and_fenceable) { \
-			mappable_size += i915_gem_obj_ggtt_size(vma->obj); \
-			++mappable_count; \
-		} \
 	} \
 } while (0)
 
@@ -440,15 +436,13 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
 	seq_printf(m, "%u [%u] objects, %zu [%zu] bytes in gtt\n",
 		   count, mappable_count, size, mappable_size);
 
-	size = count = mappable_size = mappable_count = 0;
+	size = count = 0;
 	count_vmas(&vm->active_list, mm_list);
-	seq_printf(m, "  %u [%u] active objects, %zu [%zu] bytes\n",
-		   count, mappable_count, size, mappable_size);
+	seq_printf(m, "  %u active objects, %zu bytes\n", count, size);
 
-	size = count = mappable_size = mappable_count = 0;
+	size = count = 0;
 	count_vmas(&vm->inactive_list, mm_list);
-	seq_printf(m, "  %u [%u] inactive objects, %zu [%zu] bytes\n",
-		   count, mappable_count, size, mappable_size);
+	seq_printf(m, "  %u inactive objects, %zu bytes\n", count, size);
 
 	size = count = purgeable_size = purgeable_count = 0;
 	list_for_each_entry(obj, &dev_priv->mm.unbound_list, global_list) {
-- 
2.1.4

* [PATCH 69/70] drm/i915: Skip holding an object reference for execbuf preparation
  2015-04-07 16:28 ` Chris Wilson
                     ` (8 preceding siblings ...)
  2015-04-07 16:28   ` [PATCH 68/70] drm/i915: Simplify vma-walker for i915_gem_objects Chris Wilson
@ 2015-04-07 16:28   ` Chris Wilson
  2015-04-07 16:28   ` [PATCH 70/70] drm/i915: Use vma as the primary token for managing binding Chris Wilson
  10 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 16:28 UTC (permalink / raw)
  To: intel-gfx

This is a golden oldie! We can shave a couple of locked instructions,
worth about 10% of the per-object overhead, by not taking an extra kref
whilst reserving objects for an execbuf. Due to the lock management this
is safe, as we cannot lose the original object reference without holding
the lock. Equally, because this relies on the heavy BKL^W struct_mutex,
it is also likely to be only a temporary optimisation until we have
fine-grained locking. (That's what we said 5 years ago!)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 734a7ef56a93..1b673c55934e 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -142,7 +142,6 @@ eb_lookup_vmas(struct eb_vmas *eb,
 			goto err;
 		}
 
-		drm_gem_object_reference(&obj->base);
 		list_add_tail(&obj->obj_exec_link, &objects);
 	}
 	spin_unlock(&file->table_lock);
@@ -260,7 +259,6 @@ static void eb_destroy(struct eb_vmas *eb)
 				       exec_list);
 		list_del_init(&vma->exec_list);
 		i915_gem_execbuffer_unreserve_vma(vma);
-		drm_gem_object_unreference(&vma->obj->base);
 	}
 	kfree(eb);
 }
@@ -873,7 +871,6 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 		vma = list_first_entry(&eb->vmas, struct i915_vma, exec_list);
 		list_del_init(&vma->exec_list);
 		i915_gem_execbuffer_unreserve_vma(vma);
-		drm_gem_object_unreference(&vma->obj->base);
 	}
 
 	mutex_unlock(&dev->struct_mutex);
-- 
2.1.4

* [PATCH 70/70] drm/i915: Use vma as the primary token for managing binding
  2015-04-07 16:28 ` Chris Wilson
                     ` (9 preceding siblings ...)
  2015-04-07 16:28   ` [PATCH 69/70] drm/i915: Skip holding an object reference for execbuf preparation Chris Wilson
@ 2015-04-07 16:28   ` Chris Wilson
  10 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-07 16:28 UTC (permalink / raw)
  To: intel-gfx

This is a nasty patch that does multiple things:

1. Fixes the obj->pin_display confusion (separated out by Tvrtko).
2. Simplifies the view API.
3. Introduces a vma hashtable for lookups (optimising for OglDrvCtx,
igt/gem_ctx_thrash and friends).
4. Introduces the VMA as the binding token. That is, when you bind your
object you are given a VMA cookie which you then use for all queries
(such as how much and where in the VM am I) and then to unbind. This is
to try and kill all the repeated i915_obj_to_vma() lookups when we
already have the vma. This is less successful than hoped (~90% is a
trivial conversion that naturally operates on the i915_vma rather than
the obj; the biggest sticking point is the atomic modesetting code,
where we do not have the ability to track per-instance data).
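
A sketch of the binding-token idiom from point 4: pin hands back the
vma, and all later queries and the unpin go through that cookie
(signatures as in the header changes below; error paths trimmed):

	struct i915_vma *vma;
	unsigned long offset;

	vma = i915_gem_object_ggtt_pin(obj, view, size, alignment, flags);
	if (IS_ERR(vma))
		return PTR_ERR(vma);

	offset = vma->node.start;	/* query via the cookie, no re-lookup */
	/* ... */
	i915_vma_unpin(vma);		/* and drop the pin the same way */

Note also that pin_display grows from a single bit into a counter in
the hunks below, which is what fixes the confusion in point 1.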

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c          |   8 +-
 drivers/gpu/drm/i915/i915_drv.h              |  56 +++---
 drivers/gpu/drm/i915/i915_gem.c              | 282 +++++++++------------------
 drivers/gpu/drm/i915/i915_gem_context.c      |  52 +++--
 drivers/gpu/drm/i915/i915_gem_evict.c        |   2 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c   |  46 +++--
 drivers/gpu/drm/i915/i915_gem_gtt.c          | 103 ++++++----
 drivers/gpu/drm/i915/i915_gem_gtt.h          |   3 +-
 drivers/gpu/drm/i915/i915_gem_render_state.c |  22 +--
 drivers/gpu/drm/i915/i915_gem_render_state.h |   1 +
 drivers/gpu/drm/i915/i915_gem_shrinker.c     |   4 +-
 drivers/gpu/drm/i915/i915_gem_stolen.c       |   6 +-
 drivers/gpu/drm/i915/i915_gem_userptr.c      |   2 +-
 drivers/gpu/drm/i915/i915_gpu_error.c        |   6 +-
 drivers/gpu/drm/i915/intel_display.c         |  59 +++---
 drivers/gpu/drm/i915/intel_drv.h             |  10 +-
 drivers/gpu/drm/i915/intel_fbdev.c           |   9 +-
 drivers/gpu/drm/i915/intel_lrc.c             |  34 ++--
 drivers/gpu/drm/i915/intel_overlay.c         |  39 ++--
 drivers/gpu/drm/i915/intel_ringbuffer.c      | 127 +++++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.h      |   2 +
 21 files changed, 433 insertions(+), 440 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 7c84420b374f..e62fa2236ece 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -114,7 +114,7 @@ static const char *get_tiling_flag(struct drm_i915_gem_object *obj)
 
 static inline const char *get_global_flag(struct drm_i915_gem_object *obj)
 {
-	return i915_gem_obj_to_ggtt(obj) ? "g" : " ";
+	return i915_gem_obj_to_ggtt(obj, NULL) ? "g" : " ";
 }
 
 static void
@@ -146,7 +146,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		   obj->madv == I915_MADV_DONTNEED ? " purgeable" : "");
 	if (obj->base.name)
 		seq_printf(m, " (name: %d)", obj->base.name);
-	list_for_each_entry(vma, &obj->vma_list, vma_link) {
+	list_for_each_entry(vma, &obj->vma_list, obj_link) {
 		if (vma->pin_count > 0)
 			pin_count++;
 	}
@@ -155,7 +155,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		seq_printf(m, " (display)");
 	if (obj->fence_reg != I915_FENCE_REG_NONE)
 		seq_printf(m, " (fence: %d)", obj->fence_reg);
-	list_for_each_entry(vma, &obj->vma_list, vma_link) {
+	list_for_each_entry(vma, &obj->vma_list, obj_link) {
 		if (!vma->is_ggtt)
 			seq_puts(m, " (pp");
 		else
@@ -329,7 +329,7 @@ static int per_file_stats(int id, void *ptr, void *data)
 		stats->shared += obj->base.size;
 
 	if (USES_FULL_PPGTT(obj->base.dev)) {
-		list_for_each_entry(vma, &obj->vma_list, vma_link) {
+		list_for_each_entry(vma, &obj->vma_list, obj_link) {
 			struct i915_hw_ppgtt *ppgtt;
 
 			if (!drm_mm_node_allocated(&vma->node))
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ba593ee78863..b9830a48436b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -802,6 +802,7 @@ struct intel_context {
 	/* Legacy ring buffer submission */
 	struct {
 		struct drm_i915_gem_object *rcs_state;
+		struct i915_vma *rcs_vma;
 		bool initialized;
 	} legacy_hw_ctx;
 
@@ -809,6 +810,7 @@ struct intel_context {
 	bool rcs_initialized;
 	struct {
 		struct drm_i915_gem_object *state;
+		struct i915_vma *vma;
 		struct intel_ringbuffer *ringbuf;
 		int pin_count;
 	} engine[I915_NUM_RINGS];
@@ -1919,6 +1921,7 @@ struct drm_i915_gem_object {
 
 	/** List of VMAs backed by this object */
 	struct list_head vma_list;
+	struct hlist_head *vma_ht;
 
 	/** Stolen memory for this object, instead of being backed by shmem. */
 	struct drm_mm_node *stolen;
@@ -1980,7 +1983,6 @@ struct drm_i915_gem_object {
 	 * accurate mappable working set.
 	 */
 	unsigned int fault_mappable:1;
-	unsigned int pin_display:1;
 
 	/*
 	 * Is the object to be mapped as read-only to the GPU
@@ -1994,6 +1996,8 @@ struct drm_i915_gem_object {
 
 	unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS;
 
+	unsigned int pin_display;
+
 	struct sg_table *pages;
 	int pages_pin_count;
 	struct get_page {
@@ -2656,13 +2660,13 @@ static inline void i915_vma_unpin(struct i915_vma *vma)
 #define PIN_OFFSET_BIAS 0x8
 #define PIN_OFFSET_FIXED 0x10
 #define PIN_OFFSET_MASK (~4095)
-int __must_check
+struct i915_vma * __must_check
 i915_gem_object_pin(struct drm_i915_gem_object *obj,
 		    struct i915_address_space *vm,
 		    uint32_t size,
 		    uint32_t alignment,
 		    uint64_t flags);
-int __must_check
+struct i915_vma * __must_check
 i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 			 const struct i915_ggtt_view *view,
 			 uint32_t size,
@@ -2840,13 +2844,12 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj,
 				  bool write);
 int __must_check
 i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write);
-int __must_check
+struct i915_vma * __must_check
 i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 				     u32 alignment,
 				     struct intel_engine_cs *pipelined,
 				     const struct i915_ggtt_view *view);
-void i915_gem_object_unpin_from_display_plane(struct drm_i915_gem_object *obj,
-					      const struct i915_ggtt_view *view);
+void i915_gem_object_unpin_from_display_plane(struct i915_vma *vma);
 int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj,
 				int align);
 int i915_gem_open(struct drm_device *dev, struct drm_file *file);
@@ -2878,7 +2881,7 @@ i915_gem_obj_offset(struct drm_i915_gem_object *o,
 static inline unsigned long
 i915_gem_obj_ggtt_offset(struct drm_i915_gem_object *o)
 {
-	return i915_gem_obj_ggtt_offset_view(o, &i915_ggtt_view_normal);
+	return i915_gem_obj_ggtt_offset_view(o, NULL);
 }
 
 bool i915_gem_obj_bound_any(struct drm_i915_gem_object *o);
@@ -2891,29 +2894,16 @@ unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o,
 				struct i915_address_space *vm);
 struct i915_vma *
 i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
-		    struct i915_address_space *vm);
-struct i915_vma *
-i915_gem_obj_to_ggtt_view(struct drm_i915_gem_object *obj,
-			  const struct i915_ggtt_view *view);
+		     struct i915_address_space *vm,
+		     const struct i915_ggtt_view *view);
 
 struct i915_vma *
 i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
-				  struct i915_address_space *vm);
-struct i915_vma *
-i915_gem_obj_lookup_or_create_ggtt_vma(struct drm_i915_gem_object *obj,
-				       const struct i915_ggtt_view *view);
+				  struct i915_address_space *vm,
+				  const struct i915_ggtt_view *view);
 
-static inline struct i915_vma *
-i915_gem_obj_to_ggtt(struct drm_i915_gem_object *obj)
-{
-	return i915_gem_obj_to_ggtt_view(obj, &i915_ggtt_view_normal);
-}
 bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj);
 
-/* Some GGTT VM helpers */
-#define i915_obj_to_ggtt(obj) \
-	(&((struct drm_i915_private *)(obj)->base.dev->dev_private)->gtt.base)
-
 static inline struct i915_hw_ppgtt *
 i915_vm_to_ppgtt(struct i915_address_space *vm)
 {
@@ -2921,10 +2911,20 @@ i915_vm_to_ppgtt(struct i915_address_space *vm)
 	return container_of(vm, struct i915_hw_ppgtt, base);
 }
 
+/* Some GGTT VM helpers */
+#define i915_obj_to_ggtt(obj) \
+	(&((struct drm_i915_private *)(obj)->base.dev->dev_private)->gtt.base)
+
+static inline struct i915_vma *
+i915_gem_obj_to_ggtt(struct drm_i915_gem_object *obj,
+		     const struct i915_ggtt_view *view)
+{
+	return i915_gem_obj_to_vma(obj, i915_obj_to_ggtt(obj), view);
+}
 
 static inline bool i915_gem_obj_ggtt_bound(struct drm_i915_gem_object *obj)
 {
-	return i915_gem_obj_ggtt_bound_view(obj, &i915_ggtt_view_normal);
+	return i915_gem_obj_ggtt_bound_view(obj, NULL);
 }
 
 static inline unsigned long
@@ -2933,7 +2933,7 @@ i915_gem_obj_ggtt_size(struct drm_i915_gem_object *obj)
 	return i915_gem_obj_size(obj, i915_obj_to_ggtt(obj));
 }
 
-static inline int __must_check
+static inline struct i915_vma * __must_check
 i915_gem_obj_ggtt_pin(struct drm_i915_gem_object *obj,
 		      uint32_t alignment,
 		      unsigned flags)
@@ -2945,7 +2945,7 @@ i915_gem_obj_ggtt_pin(struct drm_i915_gem_object *obj,
 static inline int
 i915_gem_object_ggtt_unbind(struct drm_i915_gem_object *obj)
 {
-	return i915_vma_unbind(i915_gem_obj_to_ggtt(obj));
+	return i915_vma_unbind(i915_gem_obj_to_ggtt(obj, NULL));
 }
 
 void i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
@@ -2953,7 +2953,7 @@ void i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
 static inline void
 i915_gem_object_ggtt_unpin(struct drm_i915_gem_object *obj)
 {
-	i915_gem_object_ggtt_unpin_view(obj, &i915_ggtt_view_normal);
+	i915_gem_object_ggtt_unpin_view(obj, NULL);
 }
 
 /* i915_gem_context.c */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7b27236f2c29..42410571440d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -295,7 +295,7 @@ drop_pages(struct drm_i915_gem_object *obj)
 	int ret;
 
 	drm_gem_object_reference(&obj->base);
-	list_for_each_entry_safe(vma, next, &obj->vma_list, vma_link)
+	list_for_each_entry_safe(vma, next, &obj->vma_list, obj_link)
 		if (i915_vma_unbind(vma))
 			break;
 
@@ -789,14 +789,17 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev,
 			 struct drm_file *file)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_vma *vma;
 	ssize_t remain;
 	loff_t offset, page_base;
 	char __user *user_data;
 	int page_offset, page_length, ret;
 
-	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
-	if (ret)
+	vma = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
 		goto out;
+	}
 
 	ret = i915_gem_object_set_to_gtt_domain(obj, true);
 	if (ret)
@@ -809,7 +812,7 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev,
 	user_data = to_user_ptr(args->data_ptr);
 	remain = args->size;
 
-	offset = i915_gem_obj_ggtt_offset(obj) + args->offset;
+	offset = vma->node.start + args->offset;
 
 	intel_fb_obj_invalidate(obj, NULL, ORIGIN_GTT);
 
@@ -844,7 +847,7 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev,
 out_flush:
 	intel_fb_obj_flush(obj, false);
 out_unpin:
-	i915_gem_object_ggtt_unpin(obj);
+	i915_vma_unpin(vma);
 out:
 	return ret;
 }
@@ -1720,6 +1723,7 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	struct drm_i915_gem_object *obj = to_intel_bo(vma->vm_private_data);
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_vma *ggtt;
 	pgoff_t page_offset;
 	unsigned long pfn;
 	int ret = 0;
@@ -1753,8 +1757,8 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	}
 
 	/* Now bind it into the GTT if needed */
-	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
-	if (ret)
+	ggtt = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
+	if (IS_ERR(ggtt))
 		goto unlock;
 
 	ret = i915_gem_object_set_to_gtt_domain(obj, write);
@@ -1766,7 +1770,7 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 		goto unpin;
 
 	/* Finally, remap it using the new GTT offset */
-	pfn = dev_priv->gtt.mappable_base + i915_gem_obj_ggtt_offset(obj);
+	pfn = dev_priv->gtt.mappable_base + ggtt->node.start;
 	pfn >>= PAGE_SHIFT;
 
 	if (!obj->fault_mappable) {
@@ -1789,7 +1793,7 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 				    (unsigned long)vmf->virtual_address,
 				    pfn + page_offset);
 unpin:
-	i915_gem_object_ggtt_unpin(obj);
+	ggtt->pin_count--;
 unlock:
 	mutex_unlock(&dev->struct_mutex);
 out:
@@ -2379,7 +2383,7 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
 	if (obj->active)
 		return;
 
-	list_for_each_entry(vma, &obj->vma_list, vma_link) {
+	list_for_each_entry(vma, &obj->vma_list, obj_link) {
 		if (!list_empty(&vma->mm_list))
 			list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
 	}
@@ -3184,7 +3188,7 @@ int i915_vma_unbind(struct i915_vma *vma)
 	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
 	int ret;
 
-	if (list_empty(&vma->vma_link))
+	if (list_empty(&vma->obj_link))
 		return 0;
 
 	if (!drm_mm_node_allocated(&vma->node)) {
@@ -3929,7 +3933,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 					    old_write_domain);
 
 	/* And bump the LRU for this access */
-	vma = i915_gem_obj_to_ggtt(obj);
+	vma = i915_gem_obj_to_ggtt(obj, NULL);
 	if (vma && drm_mm_node_allocated(&vma->node) && !obj->active)
 		list_move_tail(&vma->mm_list,
 			       &to_i915(obj->base.dev)->gtt.base.inactive_list);
@@ -3947,13 +3951,12 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 	if (obj->cache_level == cache_level)
 		return 0;
 
-	if (i915_gem_obj_is_pinned(obj)) {
-		DRM_DEBUG("can not change the cache level of pinned objects\n");
-		return -EBUSY;
-	}
-
-	list_for_each_entry_safe(vma, next, &obj->vma_list, vma_link) {
+	list_for_each_entry_safe(vma, next, &obj->vma_list, obj_link) {
 		if (!i915_gem_valid_gtt_space(vma, cache_level)) {
+			if (vma->pin_count) {
+				DRM_DEBUG("can not change the cache level of pinned objects\n");
+				return -EBUSY;
+			}
 			ret = i915_vma_unbind(vma);
 			if (ret)
 				return ret;
@@ -3977,7 +3980,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 				return ret;
 		}
 
-		list_for_each_entry(vma, &obj->vma_list, vma_link)
+		list_for_each_entry(vma, &obj->vma_list, obj_link)
 			if (drm_mm_node_allocated(&vma->node)) {
 				ret = i915_vma_bind(vma, cache_level,
 						    vma->bound & GLOBAL_BIND);
@@ -3986,7 +3989,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 			}
 	}
 
-	list_for_each_entry(vma, &obj->vma_list, vma_link)
+	list_for_each_entry(vma, &obj->vma_list, obj_link)
 		vma->node.color = cache_level;
 	obj->cache_level = cache_level;
 
@@ -4078,48 +4081,29 @@ unlock:
 	return ret;
 }
 
-static bool is_pin_display(struct drm_i915_gem_object *obj)
-{
-	struct i915_vma *vma;
-
-	vma = i915_gem_obj_to_ggtt(obj);
-	if (!vma)
-		return false;
-
-	/* There are 2 sources that pin objects:
-	 *   1. The display engine (scanouts, sprites, cursors);
-	 *   2. Reservations for execbuffer;
-	 *
-	 * We can ignore reservations as we hold the struct_mutex and
-	 * are only called outside of the reservation path.
-	 */
-	return vma->pin_count;
-}
-
 /*
  * Prepare buffer for display plane (scanout, cursors, etc).
  * Can be called from an uninterruptible phase (modesetting) and allows
  * any flushes to be pipelined (for pageflips).
  */
-int
+struct i915_vma *
 i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 				     u32 alignment,
 				     struct intel_engine_cs *pipelined,
 				     const struct i915_ggtt_view *view)
 {
 	u32 old_read_domains, old_write_domain;
-	bool was_pin_display;
+	struct i915_vma *vma;
 	int ret;
 
 	ret = i915_gem_object_sync(obj, pipelined);
 	if (ret)
-		return ret;
+		return ERR_PTR(ret);
 
 	/* Mark the pin_display early so that we account for the
 	 * display coherency whilst setting up the cache domains.
 	 */
-	was_pin_display = obj->pin_display;
-	obj->pin_display = true;
+	obj->pin_display++;
 
 	/* The display engine is not coherent with the LLC cache on gen6.  As
 	 * a result, we make sure that the pinning that is about to occur is
@@ -4132,8 +4116,10 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 	 */
 	ret = i915_gem_object_set_cache_level(obj,
 					      HAS_WT(obj->base.dev) ? I915_CACHE_WT : I915_CACHE_NONE);
-	if (ret)
+	if (ret) {
+		vma = ERR_PTR(ret);
 		goto err_unpin_display;
+	}
 
 	/* As the user may map the buffer once pinned in the display plane
 	 * (e.g. libkms for the bootup splash), we have to ensure that we
@@ -4142,14 +4128,15 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 	 * put it anyway and hope that userspace can cope (but always first
 	 * try to preserve the existing ABI).
 	 */
-	ret = i915_gem_object_ggtt_pin(obj, view, 0, alignment,
-				       view->type == I915_GGTT_VIEW_NORMAL ?
-				       PIN_MAPPABLE : 0);
-	if (ret)
-		ret = i915_gem_obj_ggtt_pin(obj, alignment, 0);
-	if (ret)
+	vma = i915_gem_object_ggtt_pin(obj, view, 0, alignment,
+				       view ? PIN_MAPPABLE : 0);
+	if (IS_ERR(vma))
+		vma = i915_gem_object_ggtt_pin(obj, view, 0, alignment, 0);
+	if (IS_ERR(vma))
 		goto err_unpin_display;
 
+	WARN_ON(obj->pin_display > vma->pin_count);
+
 	i915_gem_object_flush_cpu_write_domain(obj);
 
 	old_write_domain = obj->base.write_domain;
@@ -4165,21 +4152,22 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 					    old_read_domains,
 					    old_write_domain);
 
-	return 0;
+	return vma;
 
 err_unpin_display:
-	WARN_ON(was_pin_display != is_pin_display(obj));
-	obj->pin_display = was_pin_display;
-	return ret;
+	obj->pin_display--;
+	return vma;
 }
 
 void
-i915_gem_object_unpin_from_display_plane(struct drm_i915_gem_object *obj,
-					 const struct i915_ggtt_view *view)
+i915_gem_object_unpin_from_display_plane(struct i915_vma *vma)
 {
-	i915_gem_object_ggtt_unpin_view(obj, view);
+	WARN_ON(vma->obj->pin_display == 0);
+	vma->obj->pin_display--;
+
+	i915_vma_unpin(vma);
 
-	obj->pin_display = is_pin_display(obj);
+	WARN_ON(vma->obj->pin_display > vma->pin_count);
 }
 
 int
@@ -4392,99 +4380,89 @@ i915_vma_pin(struct i915_vma *vma,
 	return 0;
 }
 
-static int
-i915_gem_object_do_pin(struct drm_i915_gem_object *obj,
-		       struct i915_address_space *vm,
-		       const struct i915_ggtt_view *ggtt_view,
-		       uint32_t size,
-		       uint32_t alignment,
-		       uint64_t flags)
+static struct i915_vma *
+__i915_gem_object_pin(struct drm_i915_gem_object *obj,
+		      struct i915_address_space *vm,
+		      const struct i915_ggtt_view *ggtt_view,
+		      uint32_t size,
+		      uint32_t alignment,
+		      uint64_t flags)
 {
 	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
 	struct i915_vma *vma;
 	int ret;
 
 	if (WARN_ON(vm == &dev_priv->mm.aliasing_ppgtt->base))
-		return -ENODEV;
+		return ERR_PTR(-ENODEV);
 
 	if (WARN_ON(flags & (PIN_GLOBAL | PIN_MAPPABLE) && !vm->is_ggtt))
-		return -EINVAL;
+		return ERR_PTR(-EINVAL);
 
 	if (WARN_ON((flags & (PIN_MAPPABLE | PIN_GLOBAL)) == PIN_MAPPABLE))
-		return -EINVAL;
-
-	if (WARN_ON(vm->is_ggtt != !!ggtt_view))
-		return -EINVAL;
-
-	vma = ggtt_view ? i915_gem_obj_to_ggtt_view(obj, ggtt_view) :
-			  i915_gem_obj_to_vma(obj, vm);
+		return ERR_PTR(-EINVAL);
 
+	vma = i915_gem_obj_to_vma(obj, vm, ggtt_view);
 	if (IS_ERR(vma))
-		return PTR_ERR(vma);
+		return vma;
 
 	if (vma) {
 		if (i915_vma_misplaced(vma, size, alignment, flags)) {
-			unsigned long offset;
-			offset = ggtt_view ? i915_gem_obj_ggtt_offset_view(obj, ggtt_view) :
-					     i915_gem_obj_offset(obj, vm);
 			WARN(vma->pin_count,
 			     "bo is already pinned in %s with incorrect alignment:"
 			     " offset=%lx, req.alignment=%x, req.map_and_fenceable=%d,"
 			     " obj->map_and_fenceable=%d\n",
-			     ggtt_view ? "ggtt" : "ppgtt",
-			     offset,
-			     alignment,
+			     vma->is_ggtt ? "ggtt" : "ppgtt",
+			     (long)vma->node.start, alignment,
 			     !!(flags & PIN_MAPPABLE),
 			     obj->map_and_fenceable);
 			ret = i915_vma_unbind(vma);
 			if (ret)
-				return ret;
+				return ERR_PTR(ret);
 
 			vma = NULL;
 		}
 	}
 
 	if (vma == NULL) {
-		vma = ggtt_view ? i915_gem_obj_lookup_or_create_ggtt_vma(obj, ggtt_view) :
-			i915_gem_obj_lookup_or_create_vma(obj, vm);
+		vma = i915_gem_obj_lookup_or_create_vma(obj, vm, ggtt_view);
 		if (IS_ERR(vma))
-			return PTR_ERR(vma);
+			return vma;
 	}
 
-	return i915_vma_pin(vma, size, alignment, flags);
+	ret = i915_vma_pin(vma, size, alignment, flags);
+	if (ret)
+		return ERR_PTR(ret);
+
+	return vma;
 }
 
-int
+struct i915_vma *
 i915_gem_object_pin(struct drm_i915_gem_object *obj,
 		    struct i915_address_space *vm,
 		    uint32_t size,
 		    uint32_t alignment,
 		    uint64_t flags)
 {
-	return i915_gem_object_do_pin(obj, vm,
-				      vm->is_ggtt ? &i915_ggtt_view_normal : NULL,
-				      size, alignment, flags);
+	return __i915_gem_object_pin(obj, vm, NULL,
+				     size, alignment, flags);
 }
 
-int
+struct i915_vma *
 i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 			 const struct i915_ggtt_view *view,
 			 uint32_t size,
 			 uint32_t alignment,
 			 uint64_t flags)
 {
-	if (WARN_ONCE(!view, "no view specified"))
-		return -EINVAL;
-
-	return i915_gem_object_do_pin(obj, i915_obj_to_ggtt(obj), view,
-				      size, alignment, flags | PIN_GLOBAL);
+	return __i915_gem_object_pin(obj, i915_obj_to_ggtt(obj), view,
+				     size, alignment, flags | PIN_GLOBAL);
 }
 
 void
 i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
 				const struct i915_ggtt_view *view)
 {
-	i915_vma_unpin(i915_gem_obj_to_ggtt_view(obj, view));
+	i915_vma_unpin(i915_gem_obj_to_ggtt(obj, view));
 }
 
 bool
@@ -4492,11 +4470,6 @@ i915_gem_object_pin_fence(struct drm_i915_gem_object *obj)
 {
 	if (obj->fence_reg != I915_FENCE_REG_NONE) {
 		struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
-		struct i915_vma *ggtt_vma = i915_gem_obj_to_ggtt(obj);
-
-		WARN_ON(!ggtt_vma ||
-			dev_priv->fence_regs[obj->fence_reg].pin_count >
-			ggtt_vma->pin_count);
 		dev_priv->fence_regs[obj->fence_reg].pin_count++;
 		return true;
 	} else
@@ -4726,7 +4699,7 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
 
 	trace_i915_gem_object_destroy(obj);
 
-	list_for_each_entry_safe(vma, next, &obj->vma_list, vma_link) {
+	list_for_each_entry_safe(vma, next, &obj->vma_list, obj_link) {
 		int ret;
 
 		vma->pin_count = 0;
@@ -4773,42 +4746,13 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
 	drm_gem_object_release(&obj->base);
 	i915_gem_info_remove_obj(dev_priv, obj->base.size);
 
+	kfree(obj->vma_ht);
 	kfree(obj->bit_17);
 	i915_gem_object_free(obj);
 
 	intel_runtime_pm_put(dev_priv);
 }
 
-struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
-				     struct i915_address_space *vm)
-{
-	struct i915_vma *vma;
-	list_for_each_entry(vma, &obj->vma_list, vma_link) {
-		if (vma->is_ggtt &&
-		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
-			continue;
-		if (vma->vm == vm)
-			return vma;
-	}
-	return NULL;
-}
-
-struct i915_vma *i915_gem_obj_to_ggtt_view(struct drm_i915_gem_object *obj,
-					   const struct i915_ggtt_view *view)
-{
-	struct i915_address_space *ggtt = i915_obj_to_ggtt(obj);
-	struct i915_vma *vma;
-
-	if (WARN_ONCE(!view, "no view specified"))
-		return ERR_PTR(-EINVAL);
-
-	list_for_each_entry(vma, &obj->vma_list, vma_link)
-		if (vma->vm == ggtt &&
-		    i915_ggtt_view_equal(&vma->ggtt_view, view))
-			return vma;
-	return NULL;
-}
-
 void i915_gem_vma_destroy(struct i915_vma *vma)
 {
 	struct i915_address_space *vm = NULL;
@@ -4823,7 +4767,8 @@ void i915_gem_vma_destroy(struct i915_vma *vma)
 	if (!vm->is_ggtt)
 		i915_ppgtt_put(i915_vm_to_ppgtt(vm));
 
-	list_del(&vma->vma_link);
+	list_del(&vma->obj_link);
+	hash_del(&vma->obj_node);
 
 	kmem_cache_free(to_i915(vma->obj->base.dev)->vmas, vma);
 }
@@ -5348,13 +5293,9 @@ i915_gem_obj_offset(struct drm_i915_gem_object *o,
 
 	WARN_ON(vm == &dev_priv->mm.aliasing_ppgtt->base);
 
-	list_for_each_entry(vma, &o->vma_list, vma_link) {
-		if (vma->is_ggtt &&
-		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
-			continue;
-		if (vma->vm == vm)
-			return vma->node.start;
-	}
+	vma = i915_gem_obj_to_vma(o, vm, NULL);
+	if (vma)
+		return vma->node.start;
 
 	WARN(1, "%s vma for this object not found.\n",
 	     vm->is_ggtt ? "global" : "ppgtt");
@@ -5365,13 +5306,9 @@ unsigned long
 i915_gem_obj_ggtt_offset_view(struct drm_i915_gem_object *o,
 			      const struct i915_ggtt_view *view)
 {
-	struct i915_address_space *ggtt = i915_obj_to_ggtt(o);
-	struct i915_vma *vma;
-
-	list_for_each_entry(vma, &o->vma_list, vma_link)
-		if (vma->vm == ggtt &&
-		    i915_ggtt_view_equal(&vma->ggtt_view, view))
-			return vma->node.start;
+	struct i915_vma *vma = i915_gem_obj_to_ggtt(o, view);
+	if (vma)
+		return vma->node.start;
 
 	WARN(1, "global vma for this object not found.\n");
 	return -1;
@@ -5380,39 +5317,22 @@ i915_gem_obj_ggtt_offset_view(struct drm_i915_gem_object *o,
 bool i915_gem_obj_bound(struct drm_i915_gem_object *o,
 			struct i915_address_space *vm)
 {
-	struct i915_vma *vma;
-
-	list_for_each_entry(vma, &o->vma_list, vma_link) {
-		if (vma->is_ggtt &&
-		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
-			continue;
-		if (vma->vm == vm && drm_mm_node_allocated(&vma->node))
-			return true;
-	}
-
-	return false;
+	struct i915_vma *vma = i915_gem_obj_to_vma(o, vm, NULL);
+	return vma && drm_mm_node_allocated(&vma->node);
 }
 
 bool i915_gem_obj_ggtt_bound_view(struct drm_i915_gem_object *o,
 				  const struct i915_ggtt_view *view)
 {
-	struct i915_address_space *ggtt = i915_obj_to_ggtt(o);
-	struct i915_vma *vma;
-
-	list_for_each_entry(vma, &o->vma_list, vma_link)
-		if (vma->vm == ggtt &&
-		    i915_ggtt_view_equal(&vma->ggtt_view, view) &&
-		    drm_mm_node_allocated(&vma->node))
-			return true;
-
-	return false;
+	struct i915_vma *vma = i915_gem_obj_to_ggtt(o, view);
+	return vma && drm_mm_node_allocated(&vma->node);
 }
 
 bool i915_gem_obj_bound_any(struct drm_i915_gem_object *o)
 {
 	struct i915_vma *vma;
 
-	list_for_each_entry(vma, &o->vma_list, vma_link)
+	list_for_each_entry(vma, &o->vma_list, obj_link)
 		if (drm_mm_node_allocated(&vma->node))
 			return true;
 
@@ -5429,26 +5349,16 @@ unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o,
 
 	BUG_ON(list_empty(&o->vma_list));
 
-	list_for_each_entry(vma, &o->vma_list, vma_link) {
-		if (vma->is_ggtt &&
-		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
-			continue;
-		if (vma->vm == vm)
-			return vma->node.size;
-	}
+	vma = i915_gem_obj_to_vma(o, vm, NULL);
+	if (vma)
+		return vma->node.size;
+
 	return 0;
 }
 
 bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj)
 {
-	struct i915_vma *vma;
-	list_for_each_entry(vma, &obj->vma_list, vma_link) {
-		if (vma->is_ggtt &&
-		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
-			continue;
-		if (vma->pin_count > 0)
-			return true;
-	}
-	return false;
+	struct i915_vma *vma = i915_gem_obj_to_ggtt(obj, NULL);
+
+	return vma && vma->pin_count;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 43e58249235b..e8b3c56256c3 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -243,6 +243,7 @@ i915_gem_create_context(struct drm_device *dev,
 			struct drm_i915_file_private *file_priv)
 {
 	const bool is_global_default_ctx = file_priv == NULL;
+	struct i915_vma *vma = NULL;
 	struct intel_context *ctx;
 	int ret = 0;
 
@@ -260,12 +261,10 @@ i915_gem_create_context(struct drm_device *dev,
 		 * be available. To avoid this we always pin the default
 		 * context.
 		 */
-		ret = i915_gem_obj_ggtt_pin(ctx->legacy_hw_ctx.rcs_state,
+		vma = i915_gem_obj_ggtt_pin(ctx->legacy_hw_ctx.rcs_state,
 					    get_context_alignment(dev), 0);
-		if (ret) {
-			DRM_DEBUG_DRIVER("Couldn't pin %d\n", ret);
+		ret = PTR_ERR_OR_ZERO(vma);
+		if (ret)
 			goto err_destroy;
-		}
 	}
 
 	if (USES_FULL_PPGTT(dev)) {
@@ -286,8 +285,8 @@ i915_gem_create_context(struct drm_device *dev,
 	return ctx;
 
 err_unpin:
-	if (is_global_default_ctx && ctx->legacy_hw_ctx.rcs_state)
-		i915_gem_object_ggtt_unpin(ctx->legacy_hw_ctx.rcs_state);
+	if (vma)
+		i915_vma_unpin(vma);
 err_destroy:
 	i915_gem_context_unreference(ctx);
 	return ERR_PTR(ret);
@@ -481,7 +480,7 @@ i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id)
 
 static inline int
 mi_set_context(struct intel_engine_cs *ring,
-	       struct intel_context *new_context,
+	       struct intel_context *to,
 	       u32 hw_flags)
 {
 	u32 flags = hw_flags | MI_MM_SPACE_GTT;
@@ -535,8 +534,7 @@ mi_set_context(struct intel_engine_cs *ring,
 
 	intel_ring_emit(ring, MI_NOOP);
 	intel_ring_emit(ring, MI_SET_CONTEXT);
-	intel_ring_emit(ring, i915_gem_obj_ggtt_offset(new_context->legacy_hw_ctx.rcs_state) |
-			flags);
+	intel_ring_emit(ring, to->legacy_hw_ctx.rcs_vma->node.start | flags);
 	/*
 	 * w/a: MI_SET_CONTEXT must always be followed by MI_NOOP
 	 * WaMiSetContext_Hang:snb,ivb,vlv
@@ -631,7 +629,6 @@ static int do_switch(struct intel_engine_cs *ring,
 	struct intel_context *from = ring->last_context;
 	u32 hw_flags = 0;
 	bool uninitialized = false;
-	struct i915_vma *vma;
 	int ret, i;
 
 	if (from != NULL && ring == &dev_priv->ring[RCS]) {
@@ -644,10 +641,18 @@ static int do_switch(struct intel_engine_cs *ring,
 
 	/* Trying to pin first makes error handling easier. */
 	if (ring == &dev_priv->ring[RCS]) {
-		ret = i915_gem_obj_ggtt_pin(to->legacy_hw_ctx.rcs_state,
+		struct i915_vma *vma;
+
+		vma = i915_gem_obj_ggtt_pin(to->legacy_hw_ctx.rcs_state,
 					    get_context_alignment(ring->dev), 0);
-		if (ret)
-			return ret;
+		if (IS_ERR(vma))
+			return PTR_ERR(vma);
+
+		to->legacy_hw_ctx.rcs_vma = vma;
+		if (WARN_ON(!(vma->bound & GLOBAL_BIND))) {
+			ret = -ENODEV;
+			goto unpin_out;
+		}
 	}
 
 	/*
@@ -689,16 +694,6 @@ static int do_switch(struct intel_engine_cs *ring,
 	if (ret)
 		goto unpin_out;
 
-	vma = i915_gem_obj_to_ggtt(to->legacy_hw_ctx.rcs_state);
-	if (!(vma->bound & GLOBAL_BIND)) {
-		ret = i915_vma_bind(vma,
-				    to->legacy_hw_ctx.rcs_state->cache_level,
-				    GLOBAL_BIND);
-		/* This shouldn't ever fail. */
-		if (WARN_ONCE(ret, "GGTT context bind failed!"))
-			goto unpin_out;
-	}
-
 	if (!to->legacy_hw_ctx.initialized) {
 		hw_flags |= MI_RESTORE_INHIBIT;
 		/* NB: If we inhibit the restore, the context is not allowed to
@@ -754,7 +749,7 @@ static int do_switch(struct intel_engine_cs *ring,
 	 */
 	if (from != NULL) {
 		from->legacy_hw_ctx.rcs_state->base.read_domains = I915_GEM_DOMAIN_INSTRUCTION;
-		i915_vma_move_to_active(i915_gem_obj_to_ggtt(from->legacy_hw_ctx.rcs_state), ring);
+		i915_vma_move_to_active(from->legacy_hw_ctx.rcs_vma, ring);
 		/* As long as MI_SET_CONTEXT is serializing, ie. it flushes the
 		 * whole damn pipeline, we don't need to explicitly mark the
 		 * object dirty. The only exception is that the context must be
@@ -765,7 +760,8 @@ static int do_switch(struct intel_engine_cs *ring,
 		from->legacy_hw_ctx.rcs_state->dirty = 1;
 
 		/* obj is kept alive until the next request by its active ref */
-		i915_gem_object_ggtt_unpin(from->legacy_hw_ctx.rcs_state);
+		i915_vma_unpin(from->legacy_hw_ctx.rcs_vma);
+		from->legacy_hw_ctx.rcs_vma = NULL;
 		i915_gem_context_unreference(from);
 	}
 
@@ -787,8 +783,10 @@ done:
 	return 0;
 
 unpin_out:
-	if (ring->id == RCS)
-		i915_gem_object_ggtt_unpin(to->legacy_hw_ctx.rcs_state);
+	if (ring->id == RCS) {
+		i915_vma_unpin(to->legacy_hw_ctx.rcs_vma);
+		to->legacy_hw_ctx.rcs_vma = NULL;
+	}
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index cf33f982da8e..9f14b4e87842 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -331,7 +331,7 @@ i915_gem_evict_everything(struct drm_device *dev)
 		list_move_tail(&obj->global_list, &still_in_list);
 
 		drm_gem_object_reference(&obj->base);
-		list_for_each_entry_safe(vma, v, &obj->vma_list, vma_link)
+		list_for_each_entry_safe(vma, v, &obj->vma_list, obj_link)
 			if (WARN_ON(i915_vma_unbind(vma)))
 				break;
 		drm_gem_object_unreference(&obj->base);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 1b673c55934e..eac86d97f935 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -162,7 +162,7 @@ eb_lookup_vmas(struct eb_vmas *eb,
 		 * from the (obj, vm) we don't run the risk of creating
 		 * duplicated vmas for the same vm.
 		 */
-		vma = i915_gem_obj_lookup_or_create_vma(obj, vm);
+		vma = i915_gem_obj_lookup_or_create_vma(obj, vm, NULL);
 		if (IS_ERR(vma)) {
 			DRM_DEBUG("Failed to lookup VMA\n");
 			ret = PTR_ERR(vma);
@@ -244,7 +244,7 @@ i915_gem_execbuffer_unreserve_vma(struct i915_vma *vma)
 		i915_gem_object_unpin_fence(obj);
 
 	if (entry->flags & __EXEC_OBJECT_HAS_PIN)
-		vma->pin_count--;
+		i915_vma_unpin(vma);
 
 	entry->flags &= ~(__EXEC_OBJECT_HAS_FENCE | __EXEC_OBJECT_HAS_PIN);
 }
@@ -1215,7 +1215,7 @@ i915_emit_box(struct intel_engine_cs *ring,
 	return 0;
 }
 
-static struct i915_vma*
+static struct i915_vma *
 i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
 			  struct drm_i915_gem_exec_object2 *shadow_exec_entry,
 			  struct eb_vmas *eb,
@@ -1224,7 +1224,7 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
 			  bool is_master)
 {
 	struct drm_i915_gem_object *shadow_batch_obj;
-	struct i915_vma *vma;
+	struct i915_vma *vma = NULL;
 	int ret;
 
 	shadow_batch_obj = i915_gem_batch_pool_get(&ring->batch_pool,
@@ -1238,31 +1238,28 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
 			      batch_start_offset,
 			      batch_len,
 			      is_master);
-	if (ret)
+	if (ret) {
+		if (ret != -EACCES) /* unhandled chained batch */
+			vma = ERR_PTR(ret);
 		goto err;
+	}
 
-	ret = i915_gem_obj_ggtt_pin(shadow_batch_obj, 0, 0);
-	if (ret)
+	vma = i915_gem_obj_ggtt_pin(shadow_batch_obj, 0, 0);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
 		goto err;
-
-	i915_gem_object_unpin_pages(shadow_batch_obj);
+	}
 
 	memset(shadow_exec_entry, 0, sizeof(*shadow_exec_entry));
 
-	vma = i915_gem_obj_to_ggtt(shadow_batch_obj);
 	vma->exec_entry = shadow_exec_entry;
 	vma->exec_entry->flags = __EXEC_OBJECT_HAS_PIN;
 	drm_gem_object_reference(&shadow_batch_obj->base);
 	list_add_tail(&vma->exec_list, &eb->vmas);
 
-	return vma;
-
 err:
 	i915_gem_object_unpin_pages(shadow_batch_obj);
-	if (ret == -EACCES) /* unhandled chained batch */
-		return NULL;
-	else
-		return ERR_PTR(ret);
+	return vma;
 }
 
 int
@@ -1642,6 +1639,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	 * batch" bit. Hence we need to pin secure batches into the global gtt.
 	 * hsw should have this fixed, but bdw mucks it up again. */
 	if (dispatch_flags & I915_DISPATCH_SECURE) {
+		struct i915_vma *vma;
 		/*
 		 * So on first glance it looks freaky that we pin the batch here
 		 * outside of the reservation loop. But:
@@ -1652,17 +1650,17 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		 *   fitting due to fragmentation.
 		 * So this is actually safe.
 		 */
-		ret = i915_gem_obj_ggtt_pin(eb->batch_vma->obj, 0, 0);
-		if (ret)
+		vma = i915_gem_obj_ggtt_pin(eb->batch_vma->obj, 0, 0);
+		ret = PTR_ERR_OR_ZERO(vma);
+		if (ret)
 			goto err;
-
-		exec_start += i915_gem_obj_ggtt_offset(eb->batch_vma->obj);
-	} else
-		exec_start += eb->batch_vma->node.start;
+		eb->batch_vma = vma;
+	}
 
 	ret = dev_priv->gt.execbuf_submit(dev, file, ring, ctx, args,
-					  &eb->vmas, eb->batch_vma->obj,
-					  exec_start, dispatch_flags);
+					  &eb->vmas,
+					  eb->batch_vma->obj,
+					  exec_start + eb->batch_vma->node.start,
+					  dispatch_flags);
 
 	/*
 	 * FIXME: We crucially rely upon the active tracking for the (ppgtt)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a80573105a61..3cf5fb62aff5 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1668,8 +1668,8 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 				       true);
 
 	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
-		struct i915_vma *vma = i915_gem_obj_to_vma(obj,
-							   &dev_priv->gtt.base);
+		struct i915_vma *vma =
+			i915_gem_obj_to_vma(obj, &dev_priv->gtt.base, NULL);
 		if (!vma)
 			continue;
 
@@ -2051,12 +2051,12 @@ static int i915_gem_setup_global_gtt(struct drm_device *dev,
 
 	/* Mark any preallocated objects as occupied */
 	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
-		struct i915_vma *vma = i915_gem_obj_to_vma(obj, ggtt_vm);
+		struct i915_vma *vma = i915_gem_obj_to_vma(obj, ggtt_vm, NULL);
 
-		DRM_DEBUG_KMS("reserving preallocated space: %lx + %zx\n",
-			      i915_gem_obj_ggtt_offset(obj), obj->base.size);
+		DRM_DEBUG_KMS("reserving preallocated space: %lx + %lx\n",
+			      (long)vma->node.start, (long)vma->node.size);
 
-		WARN_ON(i915_gem_obj_ggtt_bound(obj));
+		WARN_ON(vma->bound & GLOBAL_BIND);
 		ret = drm_mm_reserve_node(&ggtt_vm->mm, &vma->node);
 		if (ret) {
 			DRM_DEBUG_KMS("Reservation failed: %i\n", ret);
@@ -2534,6 +2534,17 @@ int i915_gem_gtt_init(struct drm_device *dev)
 	return 0;
 }
 
+static inline unsigned __vma_hash(struct i915_address_space *vm,
+				  unsigned int type)
+{
+	return hash_min((unsigned long)vm | type, 4);
+}
+
+static inline unsigned vma_hash(const struct i915_vma *vma)
+{
+	return __vma_hash(vma->vm, vma->ggtt_view.type);
+}
+
 static struct i915_vma *
 __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 		      struct i915_address_space *vm,
@@ -2541,14 +2552,11 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 {
 	struct i915_vma *vma;
 
-	if (WARN_ON(vm->is_ggtt != !!ggtt_view))
-		return ERR_PTR(-EINVAL);
-
 	vma = kmem_cache_zalloc(to_i915(obj->base.dev)->vmas, GFP_KERNEL);
 	if (vma == NULL)
 		return ERR_PTR(-ENOMEM);
 
-	INIT_LIST_HEAD(&vma->vma_link);
+	INIT_LIST_HEAD(&vma->obj_link);
 	INIT_LIST_HEAD(&vma->mm_list);
 	INIT_LIST_HEAD(&vma->exec_list);
 	vma->vm = vm;
@@ -2557,8 +2565,6 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 
 	if (INTEL_INFO(vm->dev)->gen >= 6) {
 		if (vm->is_ggtt) {
-			vma->ggtt_view = *ggtt_view;
-
 			vma->unbind_vma = ggtt_unbind_vma;
 			vma->bind_vma = ggtt_bind_vma;
 		} else {
@@ -2567,52 +2573,83 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 		}
 	} else {
 		BUG_ON(!vm->is_ggtt);
-		vma->ggtt_view = *ggtt_view;
 		vma->unbind_vma = i915_ggtt_unbind_vma;
 		vma->bind_vma = i915_ggtt_bind_vma;
 	}
 
-	list_add_tail(&vma->vma_link, &obj->vma_list);
+	if (ggtt_view)
+		vma->ggtt_view = *ggtt_view;
+
 	if (!vm->is_ggtt)
 		i915_ppgtt_get(i915_vm_to_ppgtt(vm));
 
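+	/* Lazily build the (vm, view) lookup hash once the object
+	 * collects more than two vmas and the linear list walk in
+	 * i915_gem_obj_to_vma() starts to hurt.
+	 */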
+	INIT_HLIST_NODE(&vma->obj_node);
+	if (obj->vma_ht == NULL &&
+	    obj->vma_list.next->next != obj->vma_list.prev->prev) {
+		obj->vma_ht = kmalloc(sizeof(struct hlist_head)*16, GFP_KERNEL);
+		if (obj->vma_ht) {
+			struct i915_vma *old;
+
+			__hash_init(obj->vma_ht, 16);
+			list_for_each_entry(old, &obj->vma_list, obj_link)
+				hlist_add_head(&old->obj_node,
+					       &obj->vma_ht[vma_hash(old)]);
+		}
+	}
+	if (obj->vma_ht)
+		hlist_add_head(&vma->obj_node, &obj->vma_ht[vma_hash(vma)]);
+	list_add_tail(&vma->obj_link, &obj->vma_list);
 	return vma;
 }
 
-struct i915_vma *
-i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
-				  struct i915_address_space *vm)
+static inline bool vma_matches(struct i915_vma *vma,
+			       struct i915_address_space *vm,
+			       const struct i915_ggtt_view *view)
 {
-	struct i915_vma *vma;
+	if (vma->vm != vm)
+		return false;
 
-	vma = i915_gem_obj_to_vma(obj, vm);
-	if (!vma)
-		vma = __i915_gem_vma_create(obj, vm,
-					    vm->is_ggtt ? &i915_ggtt_view_normal : NULL);
+	if (vma->ggtt_view.type != (view ? view->type : 0))
+		return false;
 
-	return vma;
+	return true;
 }
 
 struct i915_vma *
-i915_gem_obj_lookup_or_create_ggtt_vma(struct drm_i915_gem_object *obj,
-				       const struct i915_ggtt_view *view)
+i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
+		    struct i915_address_space *vm,
+		    const struct i915_ggtt_view *view)
 {
-	struct i915_address_space *ggtt = i915_obj_to_ggtt(obj);
 	struct i915_vma *vma;
 
-	if (WARN_ON(!view))
-		return ERR_PTR(-EINVAL);
+	if (obj->vma_ht == NULL) {
+		list_for_each_entry(vma, &obj->vma_list, obj_link) {
+			if (vma_matches(vma, vm, view))
+				return vma;
+		}
+	} else {
+		int bkt = __vma_hash(vm, view ? view->type : 0);
+		hlist_for_each_entry(vma, &obj->vma_ht[bkt], obj_node)
+			if (vma_matches(vma, vm, view))
+				return vma;
+	}
 
-	vma = i915_gem_obj_to_ggtt_view(obj, view);
+	return NULL;
+}
 
-	if (IS_ERR(vma))
-		return vma;
+struct i915_vma *
+i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
+				  struct i915_address_space *vm,
+				  const struct i915_ggtt_view *view)
+{
+	struct i915_vma *vma;
 
+	vma = i915_gem_obj_to_vma(obj, vm, view);
 	if (!vma)
-		vma = __i915_gem_vma_create(obj, ggtt, view);
+		vma = __i915_gem_vma_create(obj, vm, view);
 
 	return vma;
-
 }
 
 static void
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 4e6cdaba2569..bdae99da71c3 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -174,7 +174,8 @@ struct i915_vma {
 	/** This object's place on the active/inactive lists */
 	struct list_head mm_list;
 
-	struct list_head vma_link; /* Link in the object's VMA list */
+	struct list_head obj_link; /* Link in the object's VMA list */
+	struct hlist_node obj_node;
 
 	/** This vma's place in the batchbuffer or on the eviction list */
 	struct list_head exec_list;
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index 4bb91cdadec9..140581b66481 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -47,7 +47,7 @@ render_state_get_rodata(struct drm_device *dev, const int gen)
 
 static int render_state_init(struct render_state *so, struct drm_device *dev)
 {
-	int ret;
+	struct i915_vma *vma;
 
 	so->gen = INTEL_INFO(dev)->gen;
 	so->rodata = render_state_get_rodata(dev, so->gen);
@@ -61,16 +61,16 @@ static int render_state_init(struct render_state *so, struct drm_device *dev)
 	if (so->obj == NULL)
 		return -ENOMEM;
 
-	ret = i915_gem_obj_ggtt_pin(so->obj, 4096, 0);
-	if (ret)
-		goto free_gem;
+	vma = i915_gem_obj_ggtt_pin(so->obj, 0, 0);
+	if (IS_ERR(vma)) {
+		drm_gem_object_unreference(&so->obj->base);
+		return PTR_ERR(vma);
+	}
 
-	so->ggtt_offset = i915_gem_obj_ggtt_offset(so->obj);
-	return 0;
+	so->vma = vma;
+	so->ggtt_offset = vma->node.start;
 
-free_gem:
-	drm_gem_object_unreference(&so->obj->base);
-	return ret;
+	return 0;
 }
 
 static int render_state_setup(struct render_state *so)
@@ -124,7 +124,7 @@ static int render_state_setup(struct render_state *so)
 
 void i915_gem_render_state_fini(struct render_state *so)
 {
-	i915_gem_object_ggtt_unpin(so->obj);
+	i915_vma_unpin(so->vma);
 	drm_gem_object_unreference(&so->obj->base);
 }
 
@@ -171,7 +171,7 @@ int i915_gem_render_state_init(struct intel_engine_cs *ring)
 	if (ret)
 		goto out;
 
-	i915_vma_move_to_active(i915_gem_obj_to_ggtt(so.obj), ring);
+	i915_vma_move_to_active(so.vma, ring);
 
 	ret = __i915_add_request(ring, NULL, so.obj);
 	/* __i915_add_request moves object to inactive if it fails */
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.h b/drivers/gpu/drm/i915/i915_gem_render_state.h
index c44961ed3fad..09eb56fafdc0 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.h
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.h
@@ -35,6 +35,7 @@ struct intel_renderstate_rodata {
 struct render_state {
 	const struct intel_renderstate_rodata *rodata;
 	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
 	u64 ggtt_offset;
 	int gen;
 };
diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index d64c54b329b2..bd1cf921aead 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -127,7 +127,7 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
 
 			/* For the unbound phase, this should be a no-op! */
 			list_for_each_entry_safe(vma, v,
-						 &obj->vma_list, vma_link)
+						 &obj->vma_list, obj_link)
 				if (i915_vma_unbind(vma))
 					break;
 
@@ -190,7 +190,7 @@ static int num_vma_bound(struct drm_i915_gem_object *obj)
 	struct i915_vma *vma;
 	int count = 0;
 
-	list_for_each_entry(vma, &obj->vma_list, vma_link) {
+	list_for_each_entry(vma, &obj->vma_list, obj_link) {
 		if (drm_mm_node_allocated(&vma->node))
 			count++;
 		if (vma->pin_count)
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index 348ed5abcdbf..51e0f11aed90 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -514,7 +514,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 	if (gtt_offset == I915_GTT_OFFSET_NONE)
 		return obj;
 
-	vma = i915_gem_obj_lookup_or_create_vma(obj, ggtt);
+	vma = i915_gem_obj_lookup_or_create_vma(obj, ggtt, NULL);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
 		goto err_out;
@@ -533,9 +533,9 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 			DRM_DEBUG_KMS("failed to allocate stolen GTT space\n");
 			goto err_vma;
 		}
-	}
 
-	vma->bound |= GLOBAL_BIND;
+		vma->bound |= GLOBAL_BIND;
+	}
 
 	list_add_tail(&obj->global_list, &dev_priv->mm.bound_list);
 	list_add_tail(&vma->mm_list, &ggtt->inactive_list);
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 1719078c763a..d96276caab49 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -79,7 +79,7 @@ static unsigned long cancel_userptr(struct drm_i915_gem_object *obj)
 		was_interruptible = dev_priv->mm.interruptible;
 		dev_priv->mm.interruptible = false;
 
-		list_for_each_entry_safe(vma, tmp, &obj->vma_list, vma_link) {
+		list_for_each_entry_safe(vma, tmp, &obj->vma_list, obj_link) {
 			int ret = i915_vma_unbind(vma);
 			WARN_ON(ret && ret != -EIO);
 		}
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index b7a00e464ba4..fc69f53059ef 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -607,7 +607,7 @@ i915_error_object_create(struct drm_i915_private *dev_priv,
 
 	reloc_offset = dst->gtt_offset;
 	if (vm->is_ggtt)
-		vma = i915_gem_obj_to_ggtt(src);
+		vma = i915_gem_obj_to_ggtt(src, NULL);
 	use_ggtt = (src->cache_level == I915_CACHE_NONE &&
 		   vma && (vma->bound & GLOBAL_BIND) &&
 		   reloc_offset + num_pages * PAGE_SIZE <= dev_priv->gtt.mappable_end);
@@ -737,7 +737,7 @@ static u32 capture_pinned_bo(struct drm_i915_error_buffer *err,
 		if (err == last)
 			break;
 
-		list_for_each_entry(vma, &obj->vma_list, vma_link)
+		list_for_each_entry(vma, &obj->vma_list, obj_link)
 			if (vma->vm == vm && vma->pin_count > 0)
 				capture_bo(err++, vma);
 	}
@@ -1096,7 +1096,7 @@ static void i915_gem_capture_vm(struct drm_i915_private *dev_priv,
 	error->active_bo_count[ndx] = i;
 
 	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
-		list_for_each_entry(vma, &obj->vma_list, vma_link)
+		list_for_each_entry(vma, &obj->vma_list, obj_link)
 			if (vma->vm == vm && vma->pin_count > 0)
 				i++;
 	}
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 69db1c3b26a8..0cfa852983c6 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -2348,7 +2348,7 @@ intel_fill_fb_ggtt_view(struct i915_ggtt_view *view, struct drm_framebuffer *fb,
 	return 0;
 }
 
-int
+struct i915_vma *
 intel_pin_and_fence_fb_obj(struct drm_plane *plane,
 			   struct drm_framebuffer *fb,
 			   const struct drm_plane_state *plane_state,
@@ -2358,6 +2358,7 @@ intel_pin_and_fence_fb_obj(struct drm_plane *plane,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *obj = intel_fb_obj(fb);
 	struct i915_ggtt_view view;
+	struct i915_vma *vma;
 	u32 alignment;
 	int ret;
 
@@ -2386,17 +2387,17 @@ intel_pin_and_fence_fb_obj(struct drm_plane *plane,
 	case I915_FORMAT_MOD_Yf_TILED:
 		if (WARN_ONCE(INTEL_INFO(dev)->gen < 9,
 			  "Y tiling bo slipped through, driver bug!\n"))
-			return -EINVAL;
+			return ERR_PTR(-ENODEV);
 		alignment = 1 * 1024 * 1024;
 		break;
 	default:
 		MISSING_CASE(fb->modifier[0]);
-		return -EINVAL;
+		return ERR_PTR(-ENODEV);
 	}
 
 	ret = intel_fill_fb_ggtt_view(&view, fb, plane_state);
 	if (ret)
-		return ret;
+		return ERR_PTR(ret);
 
 	/* Note that the w/a also requires 64 PTE of padding following the
 	 * bo. We currently fill all unused PTE with the shadow page and so
@@ -2416,10 +2417,12 @@ intel_pin_and_fence_fb_obj(struct drm_plane *plane,
 	intel_runtime_pm_get(dev_priv);
 
 	dev_priv->mm.interruptible = false;
-	ret = i915_gem_object_pin_to_display_plane(obj, alignment, pipelined,
+	vma = i915_gem_object_pin_to_display_plane(obj, alignment, pipelined,
 						   &view);
-	if (ret)
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
 		goto err_interruptible;
+	}
 
 	if (obj->map_and_fenceable) {
 		/* Install a fence for tiled scan-out. Pre-i965 always needs a
@@ -2447,31 +2450,33 @@ intel_pin_and_fence_fb_obj(struct drm_plane *plane,
 
 	dev_priv->mm.interruptible = true;
 	intel_runtime_pm_put(dev_priv);
-	return 0;
+	return vma;
 
 err_unpin:
-	i915_gem_object_unpin_from_display_plane(obj, &view);
+	i915_gem_object_unpin_from_display_plane(vma);
 err_interruptible:
 	dev_priv->mm.interruptible = true;
 	intel_runtime_pm_put(dev_priv);
-	return ret;
+	return ERR_PTR(ret);
 }
 
 static void intel_unpin_fb_obj(struct drm_framebuffer *fb,
-			       const struct drm_plane_state *plane_state)
+			       const struct drm_plane_state *state)
 {
 	struct drm_i915_gem_object *obj = intel_fb_obj(fb);
 	struct i915_ggtt_view view;
-	int ret;
+	struct i915_vma *vma;
 
 	WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex));
 
-	ret = intel_fill_fb_ggtt_view(&view, fb, plane_state);
-	WARN_ONCE(ret, "Couldn't get view from plane state!");
+	WARN_ONCE(intel_fill_fb_ggtt_view(&view, fb, state),
+		  "Couldn't get view from plane state!");
 
 	if (obj->map_and_fenceable)
 		i915_gem_object_unpin_fence(obj);
-	i915_gem_object_unpin_from_display_plane(obj, &view);
+
+	vma = i915_gem_obj_to_ggtt(obj, &view);
+	i915_gem_object_unpin_from_display_plane(vma);
 }
 
 /* Computes the linear offset to the base tile and adjusts x, y. bytes per pixel
@@ -10229,6 +10234,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 	struct intel_unpin_work *work;
 	struct intel_engine_cs *ring;
 	bool mmio_flip;
+	struct i915_vma *vma;
 	int ret;
 
 	/*
@@ -10333,11 +10339,13 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 	 * synchronisation, so all we want here is to pin the framebuffer
 	 * into the display plane and skip any waits.
 	 */
-	ret = intel_pin_and_fence_fb_obj(crtc->primary, fb,
+	vma = intel_pin_and_fence_fb_obj(crtc->primary, fb,
 					 crtc->primary->state,
 					 mmio_flip ? i915_gem_request_get_ring(obj->last_write_req) : ring);
-	if (ret)
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
 		goto cleanup_pending;
+	}
 
 	work->gtt_offset = intel_plane_obj_offset(to_intel_plane(primary), obj)
 						  + intel_crtc->dspaddr_offset;
@@ -12544,7 +12552,11 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 		if (ret)
 			DRM_DEBUG_KMS("failed to attach phys object\n");
 	} else {
-		ret = intel_pin_and_fence_fb_obj(plane, fb, new_state, NULL);
+		struct i915_vma *vma;
+
+		vma = intel_pin_and_fence_fb_obj(plane, fb, new_state, NULL);
+		ret = PTR_ERR_OR_ZERO(vma);
 	}
 
 	if (ret == 0)
@@ -12565,18 +12577,19 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 void
 intel_cleanup_plane_fb(struct drm_plane *plane,
 		       struct drm_framebuffer *fb,
-		       const struct drm_plane_state *old_state)
+		       const struct drm_plane_state *state)
 {
 	struct drm_device *dev = plane->dev;
-	struct drm_i915_gem_object *obj = intel_fb_obj(fb);
-
-	if (WARN_ON(!obj))
-		return;
 
 	if (plane->type != DRM_PLANE_TYPE_CURSOR ||
 	    !INTEL_INFO(dev)->cursor_needs_physical) {
+		struct drm_i915_gem_object *obj = intel_fb_obj(fb);
+
+		if (WARN_ON(!obj))
+			return;
+
 		mutex_lock(&dev->struct_mutex);
-		intel_unpin_fb_obj(fb, old_state);
+		intel_unpin_fb_obj(fb, state);
 		mutex_unlock(&dev->struct_mutex);
 	}
 }
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 160f6a28e9a1..ba4c872e2fa1 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -960,10 +960,10 @@ bool intel_get_load_detect_pipe(struct drm_connector *connector,
 void intel_release_load_detect_pipe(struct drm_connector *connector,
 				    struct intel_load_detect_pipe *old,
 				    struct drm_modeset_acquire_ctx *ctx);
-int intel_pin_and_fence_fb_obj(struct drm_plane *plane,
-			       struct drm_framebuffer *fb,
-			       const struct drm_plane_state *plane_state,
-			       struct intel_engine_cs *pipelined);
+struct i915_vma *intel_pin_and_fence_fb_obj(struct drm_plane *plane,
+					    struct drm_framebuffer *fb,
+					    const struct drm_plane_state *plane_state,
+					    struct intel_engine_cs *pipelined);
 struct drm_framebuffer *
 __intel_framebuffer_create(struct drm_device *dev,
 			   struct drm_mode_fb_cmd2 *mode_cmd,
@@ -977,7 +977,7 @@ int intel_prepare_plane_fb(struct drm_plane *plane,
 			   const struct drm_plane_state *new_state);
 void intel_cleanup_plane_fb(struct drm_plane *plane,
 			    struct drm_framebuffer *fb,
-			    const struct drm_plane_state *old_state);
+			    const struct drm_plane_state *state);
 int intel_plane_atomic_get_property(struct drm_plane *plane,
 				    const struct drm_plane_state *state,
 				    struct drm_property *property,
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index 4e7e7da2e03b..033ad90201f9 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -119,6 +119,7 @@ static int intelfb_alloc(struct drm_fb_helper *helper,
 	struct drm_device *dev = helper->dev;
 	struct drm_mode_fb_cmd2 mode_cmd = {};
 	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
 	int size, ret;
 
 	/* we don't do packed 24bpp */
@@ -151,13 +152,15 @@ static int intelfb_alloc(struct drm_fb_helper *helper,
 	}
 
 	/* Flush everything out, we'll be doing GTT only from now on */
-	ret = intel_pin_and_fence_fb_obj(NULL, fb, NULL, NULL);
-	if (ret) {
+	vma = intel_pin_and_fence_fb_obj(NULL, fb, NULL, NULL);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
 		DRM_ERROR("failed to pin obj: %d\n", ret);
 		goto out_fb;
 	}
 
 	ifbdev->fb = to_intel_framebuffer(fb);
+	/* ifbdev->vma = vma; */
 
 	return 0;
 
@@ -279,7 +282,7 @@ static int intelfb_create(struct drm_fb_helper *helper,
 	return 0;
 
 out_unpin:
-	i915_gem_object_ggtt_unpin(obj);
+	/* intel_unpin_fb_obj(&ifbdev->fb->base, NULL); */
 	drm_gem_object_unreference(&obj->base);
 out_unlock:
 	mutex_unlock(&dev->struct_mutex);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index db93eed9eacd..45f3d487944e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -475,17 +475,20 @@ static int intel_lr_context_pin(struct intel_engine_cs *ring,
 	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
 	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
 	u32 ggtt_offset;
+	struct i915_vma *vma;
 	int ret;
 
 	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
 	if (ctx->engine[ring->id].pin_count++)
 		return 0;
 
-	ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
-	if (ret)
+	vma = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
 		goto reset_pin_count;
+	}
 
-	ggtt_offset = i915_gem_obj_ggtt_offset(ctx_obj);
+	ggtt_offset = vma->node.start;
 	if (WARN_ON(ggtt_offset & 0xFFFFFFFF00000FFFULL)) {
 		ret = -ENODEV;
 		goto unpin_ctx_obj;
@@ -500,17 +503,17 @@ static int intel_lr_context_pin(struct intel_engine_cs *ring,
 	ringbuf->regs[CTX_RING_BUFFER_START+1] =
 		i915_gem_obj_ggtt_offset(ringbuf->obj);
 
+	ctx->engine[ring->id].vma = vma;
 	return 0;
 
 unpin_ctx_obj:
-	i915_gem_object_ggtt_unpin(ctx_obj);
+	i915_vma_unpin(vma);
 reset_pin_count:
 	ctx->engine[ring->id].pin_count = 0;
 
 	return ret;
 }
 
-
 int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request,
 					    struct intel_context *ctx)
 {
@@ -778,17 +781,18 @@ int intel_execlists_submission(struct drm_device *dev, struct drm_file *file,
 static void intel_lr_context_unpin(struct intel_engine_cs *ring,
 				   struct intel_context *ctx)
 {
-	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
 	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
 
 	if (--ctx->engine[ring->id].pin_count)
 		return;
 
-	kunmap(i915_gem_object_get_page(ctx_obj, 1));
+	kunmap(i915_gem_object_get_page(ctx->engine[ring->id].state, 1));
 	ringbuf->regs = NULL;
 
 	intel_unpin_ringbuffer_obj(ringbuf);
-	i915_gem_object_ggtt_unpin(ctx_obj);
+
+	i915_vma_unpin(ctx->engine[ring->id].vma);
+	ctx->engine[ring->id].vma = NULL;
 }
 
 void intel_execlists_retire_requests(struct intel_engine_cs *ring)
@@ -1147,7 +1151,7 @@ static int intel_lr_context_render_state_init(struct intel_engine_cs *ring,
 	if (ret)
 		goto out;
 
-	i915_vma_move_to_active(i915_gem_obj_to_ggtt(so.obj), ring);
+	i915_vma_move_to_active(so.vma, ring);
 
 	ret = __i915_add_request(ring, file, so.obj);
 	/* intel_logical_ring_add_request moves object to inactive if it
@@ -1735,12 +1739,14 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 	}
 
 	if (is_global_default_ctx) {
-		ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
-		if (ret) {
-			DRM_DEBUG_DRIVER("Pin LRC backing obj failed: %d\n",
-					ret);
+		struct i915_vma *vma;
+
+		vma = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
+		if (IS_ERR(vma)) {
+			DRM_DEBUG_DRIVER("Pin LRC backing obj failed: %ld\n",
+					 PTR_ERR(vma));
 			drm_gem_object_unreference(&ctx_obj->base);
-			return ret;
+			return PTR_ERR(vma);
 		}
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index 5fd2d5ac02e2..936cf160bb7d 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -170,8 +170,8 @@ struct overlay_registers {
 struct intel_overlay {
 	struct drm_device *dev;
 	struct intel_crtc *crtc;
-	struct drm_i915_gem_object *vid_bo;
-	struct drm_i915_gem_object *old_vid_bo;
+	struct drm_i915_gem_object *vid_bo, *old_vid_bo;
+	struct i915_vma *vid_vma, *old_vid_vma;
 	bool active;
 	bool pfit_active;
 	u32 pfit_vscale_ratio; /* shifted-point number, (1<<12) == 1.0 */
@@ -197,7 +197,7 @@ intel_overlay_map_regs(struct intel_overlay *overlay)
 		regs = (struct overlay_registers __iomem *)overlay->reg_bo->phys_handle->vaddr;
 	else
 		regs = io_mapping_map_wc(dev_priv->gtt.mappable,
-					 i915_gem_obj_ggtt_offset(overlay->reg_bo));
+					 overlay->flip_addr);
 
 	return regs;
 }
@@ -299,7 +299,7 @@ static void intel_overlay_release_old_vid_tail(struct intel_overlay *overlay)
 {
 	struct drm_i915_gem_object *obj = overlay->old_vid_bo;
 
-	i915_gem_object_ggtt_unpin(obj);
+	i915_gem_object_unpin_from_display_plane(overlay->old_vid_vma);
 	drm_gem_object_unreference(&obj->base);
 
 	overlay->old_vid_bo = NULL;
@@ -718,6 +718,7 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 	struct drm_device *dev = overlay->dev;
 	u32 swidth, swidthsw, sheight, ostride;
 	enum pipe pipe = overlay->crtc->pipe;
+	struct i915_vma *vma;
 
 	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
 	WARN_ON(!drm_modeset_is_locked(&dev->mode_config.connection_mutex));
@@ -726,10 +727,9 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 	if (ret != 0)
 		return ret;
 
-	ret = i915_gem_object_pin_to_display_plane(new_bo, 0, NULL,
-						   &i915_ggtt_view_normal);
-	if (ret != 0)
-		return ret;
+	vma = i915_gem_object_pin_to_display_plane(new_bo, 0, NULL, NULL);
+	if (IS_ERR(vma))
+		return PTR_ERR(vma);
 
 	ret = i915_gem_object_put_fence(new_bo);
 	if (ret)
@@ -772,7 +772,7 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 	swidth = params->src_w;
 	swidthsw = calc_swidthsw(overlay->dev, params->offset_Y, tmp_width);
 	sheight = params->src_h;
-	iowrite32(i915_gem_obj_ggtt_offset(new_bo) + params->offset_Y, &regs->OBUF_0Y);
+	iowrite32(vma->node.start + params->offset_Y, &regs->OBUF_0Y);
 	ostride = params->stride_Y;
 
 	if (params->format & I915_OVERLAY_YUV_PLANAR) {
@@ -786,8 +786,8 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 				      params->src_w/uv_hscale);
 		swidthsw |= max_t(u32, tmp_U, tmp_V) << 16;
 		sheight |= (params->src_h/uv_vscale) << 16;
-		iowrite32(i915_gem_obj_ggtt_offset(new_bo) + params->offset_U, &regs->OBUF_0U);
-		iowrite32(i915_gem_obj_ggtt_offset(new_bo) + params->offset_V, &regs->OBUF_0V);
+		iowrite32(vma->node.start + params->offset_U, &regs->OBUF_0U);
+		iowrite32(vma->node.start + params->offset_V, &regs->OBUF_0V);
 		ostride |= params->stride_UV << 16;
 	}
 
@@ -812,7 +812,9 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 			  INTEL_FRONTBUFFER_OVERLAY(pipe));
 
 	overlay->old_vid_bo = overlay->vid_bo;
+	overlay->old_vid_vma = overlay->vid_vma;
 	overlay->vid_bo = new_bo;
+	overlay->vid_vma = vma;
 
 	intel_frontbuffer_flip(dev,
 			       INTEL_FRONTBUFFER_OVERLAY(pipe));
@@ -820,7 +822,7 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 	return 0;
 
 out_unpin:
-	i915_gem_object_ggtt_unpin(new_bo);
+	i915_gem_object_unpin_from_display_plane(vma);
 	return ret;
 }
 
@@ -1383,12 +1385,15 @@ void intel_setup_overlay(struct drm_device *dev)
 		}
 		overlay->flip_addr = reg_bo->phys_handle->busaddr;
 	} else {
-		ret = i915_gem_obj_ggtt_pin(reg_bo, PAGE_SIZE, PIN_MAPPABLE);
-		if (ret) {
+		struct i915_vma *vma;
+
+		vma = i915_gem_obj_ggtt_pin(reg_bo, PAGE_SIZE, PIN_MAPPABLE);
+		if (IS_ERR(vma)) {
 			DRM_ERROR("failed to pin overlay register bo\n");
+			ret = PTR_ERR(vma);
 			goto out_free_bo;
 		}
-		overlay->flip_addr = i915_gem_obj_ggtt_offset(reg_bo);
+		overlay->flip_addr = vma->node.start;
 
 		ret = i915_gem_object_set_to_gtt_domain(reg_bo, true);
 		if (ret) {
@@ -1466,7 +1471,7 @@ intel_overlay_map_regs_atomic(struct intel_overlay *overlay)
 			overlay->reg_bo->phys_handle->vaddr;
 	else
 		regs = io_mapping_map_atomic_wc(dev_priv->gtt.mappable,
-						i915_gem_obj_ggtt_offset(overlay->reg_bo));
+						overlay->flip_addr);
 
 	return regs;
 }
@@ -1499,7 +1504,7 @@ intel_overlay_capture_error_state(struct drm_device *dev)
 	if (OVERLAY_NEEDS_PHYSICAL(overlay->dev))
 		error->base = (__force long)overlay->reg_bo->phys_handle->vaddr;
 	else
-		error->base = i915_gem_obj_ggtt_offset(overlay->reg_bo);
+		error->base = overlay->flip_addr;
 
 	regs = intel_overlay_map_regs_atomic(overlay);
 	if (!regs)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 913efe47054d..dfde7fd7b45e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -636,10 +636,11 @@ intel_fini_pipe_control(struct intel_engine_cs *ring)
 	if (ring->scratch.obj == NULL)
 		return;
 
-	if (INTEL_INFO(dev)->gen >= 5) {
+	if (INTEL_INFO(dev)->gen >= 5)
 		kunmap(sg_page(ring->scratch.obj->pages->sgl));
-		i915_gem_object_ggtt_unpin(ring->scratch.obj);
-	}
+
+	if (ring->scratch.vma)
+		i915_vma_unpin(ring->scratch.vma);
 
 	drm_gem_object_unreference(&ring->scratch.obj->base);
 	ring->scratch.obj = NULL;
@@ -648,6 +649,7 @@ intel_fini_pipe_control(struct intel_engine_cs *ring)
 int
 intel_init_pipe_control(struct intel_engine_cs *ring)
 {
+	struct i915_vma *vma;
 	int ret;
 
 	WARN_ON(ring->scratch.obj);
@@ -663,11 +665,13 @@ intel_init_pipe_control(struct intel_engine_cs *ring)
 	if (ret)
 		goto err_unref;
 
-	ret = i915_gem_obj_ggtt_pin(ring->scratch.obj, 4096, 0);
-	if (ret)
+	vma = i915_gem_obj_ggtt_pin(ring->scratch.obj, 4096, 0);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
 		goto err_unref;
+	}
 
-	ring->scratch.gtt_offset = i915_gem_obj_ggtt_offset(ring->scratch.obj);
+	ring->scratch.gtt_offset = vma->node.start;
 	ring->scratch.cpu_page = kmap(sg_page(ring->scratch.obj->pages->sgl));
 	if (ring->scratch.cpu_page == NULL) {
 		ret = -ENOMEM;
@@ -676,10 +680,11 @@ intel_init_pipe_control(struct intel_engine_cs *ring)
 
 	DRM_DEBUG_DRIVER("%s pipe control offset: 0x%08x\n",
 			 ring->name, ring->scratch.gtt_offset);
+	ring->scratch.vma = vma;
 	return 0;
 
 err_unpin:
-	i915_gem_object_ggtt_unpin(ring->scratch.obj);
+	i915_vma_unpin(vma);
 err_unref:
 	drm_gem_object_unreference(&ring->scratch.obj->base);
 err:
@@ -1823,45 +1828,45 @@ static void cleanup_status_page(struct intel_engine_cs *ring)
 static int init_status_page(struct intel_engine_cs *ring)
 {
 	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+	unsigned flags;
+	int ret;
 
-	if ((obj = ring->status_page.obj) == NULL) {
-		unsigned flags;
-		int ret;
+	if (ring->status_page.obj)
+		return 0;
 
-		obj = i915_gem_object_create_internal(ring->dev, 4096);
-		if (obj == NULL) {
-			DRM_ERROR("Failed to allocate status page\n");
-			return -ENOMEM;
-		}
+	obj = i915_gem_object_create_internal(ring->dev, 4096);
+	if (obj == NULL) {
+		DRM_ERROR("Failed to allocate status page\n");
+		return -ENOMEM;
+	}
 
-		ret = i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
-		if (ret)
-			goto err_unref;
-
-		flags = 0;
-		if (!HAS_LLC(ring->dev))
-			/* On g33, we cannot place HWS above 256MiB, so
-			 * restrict its pinning to the low mappable arena.
-			 * Though this restriction is not documented for
-			 * gen4, gen5, or byt, they also behave similarly
-			 * and hang if the HWS is placed at the top of the
-			 * GTT. To generalise, it appears that all !llc
-			 * platforms have issues with us placing the HWS
-			 * above the mappable region (even though we never
-			 * actualy map it).
-			 */
-			flags |= PIN_MAPPABLE;
-		ret = i915_gem_obj_ggtt_pin(obj, 4096, flags);
-		if (ret) {
-err_unref:
-			drm_gem_object_unreference(&obj->base);
-			return ret;
-		}
+	ret = i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
+	if (ret)
+		goto err_unref;
 
-		ring->status_page.obj = obj;
+	flags = 0;
+	if (!HAS_LLC(ring->dev))
+		/* On g33, we cannot place HWS above 256MiB, so
+		 * restrict its pinning to the low mappable arena.
+		 * Though this restriction is not documented for
+		 * gen4, gen5, or byt, they also behave similarly
+		 * and hang if the HWS is placed at the top of the
+		 * GTT. To generalise, it appears that all !llc
+		 * platforms have issues with us placing the HWS
+		 * above the mappable region (even though we never
+	 * actually map it).
+		 */
+		flags |= PIN_MAPPABLE;
+	vma = i915_gem_obj_ggtt_pin(obj, 4096, flags);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
+		goto err_unref;
 	}
 
-	ring->status_page.gfx_addr = i915_gem_obj_ggtt_offset(obj);
+	ring->status_page.obj = obj;
+	ring->status_page.gfx_addr = vma->node.start;
+
 	ring->status_page.page_addr = kmap(sg_page(obj->pages->sgl));
 	memset(ring->status_page.page_addr, 0, PAGE_SIZE);
 
@@ -1869,6 +1874,10 @@ err_unref:
 			ring->name, ring->status_page.gfx_addr);
 
 	return 0;
+
+err_unref:
+	drm_gem_object_unreference(&obj->base);
+	return ret;
 }
 
 static int init_phys_status_page(struct intel_engine_cs *ring)
@@ -1894,7 +1903,8 @@ void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
 		i915_gem_object_unpin_vmap(ringbuf->obj);
 	else
 		iounmap(ringbuf->virtual_start);
-	i915_gem_object_ggtt_unpin(ringbuf->obj);
+	i915_vma_unpin(ringbuf->vma);
+	ringbuf->vma = NULL;
 }
 
 int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
@@ -1902,11 +1912,12 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct drm_i915_gem_object *obj = ringbuf->obj;
+	struct i915_vma *vma;
 	int ret;
 
-	ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, PIN_MAPPABLE);
-	if (ret)
-		return ret;
+	vma = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, PIN_MAPPABLE);
+	if (IS_ERR(vma))
+		return PTR_ERR(vma);
 
 	if (HAS_LLC(dev_priv) && !obj->stolen) {
 		ret = i915_gem_object_set_to_cpu_domain(obj, true);
@@ -1932,10 +1943,11 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
 		}
 	}
 
+	ringbuf->vma = vma;
 	return 0;
 
 unpin:
-	i915_gem_object_ggtt_unpin(obj);
+	i915_vma_unpin(vma);
 	return ret;
 }
 
@@ -2448,14 +2460,18 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 				DRM_ERROR("Failed to allocate semaphore bo. Disabling semaphores\n");
 				i915.semaphores = 0;
 			} else {
+				struct i915_vma *vma;
+
 				i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
-				ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_NONBLOCK);
-				if (ret != 0) {
+				vma = i915_gem_obj_ggtt_pin(obj, 0, PIN_NONBLOCK);
+				if (IS_ERR(vma)) {
 					drm_gem_object_unreference(&obj->base);
 					DRM_ERROR("Failed to pin semaphore bo. Disabling semaphores\n");
 					i915.semaphores = 0;
-				} else
-					dev_priv->semaphore_obj = obj;
+					obj = NULL;
+				}
+
+				dev_priv->semaphore_obj = obj;
 			}
 		}
 
@@ -2549,21 +2565,24 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 
 	/* Workaround batchbuffer to combat CS tlb bug. */
 	if (HAS_BROKEN_CS_TLB(dev)) {
+		struct i915_vma *vma;
+
 		obj = i915_gem_object_create_internal(dev, I830_WA_SIZE);
 		if (obj == NULL) {
 			DRM_ERROR("Failed to allocate batch bo\n");
 			return -ENOMEM;
 		}
 
-		ret = i915_gem_obj_ggtt_pin(obj, 0, 0);
-		if (ret != 0) {
+		vma = i915_gem_obj_ggtt_pin(obj, 0, 0);
+		if (IS_ERR(vma)) {
 			drm_gem_object_unreference(&obj->base);
-			DRM_ERROR("Failed to ping batch bo\n");
-			return ret;
+			DRM_ERROR("Failed to pin batch bo\n");
+			return PTR_ERR(vma);
 		}
 
 		ring->scratch.obj = obj;
-		ring->scratch.gtt_offset = i915_gem_obj_ggtt_offset(obj);
+		ring->scratch.vma = vma;
+		ring->scratch.gtt_offset = vma->node.start;
 	}
 
 	ret = intel_init_ring_buffer(dev, ring);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 75268b7d2d41..58931f902ccf 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -101,6 +101,7 @@ struct intel_ringbuffer {
 	uint32_t descriptor;
 
 	struct intel_engine_cs *ring;
+	struct i915_vma *vma;
 
 	u32 head;
 	u32 tail;
@@ -289,6 +290,7 @@ struct  intel_engine_cs {
 
 	struct {
 		struct drm_i915_gem_object *obj;
+		struct i915_vma *vma;
 		u32 gtt_offset;
 		volatile u32 *cpu_page;
 	} scratch;
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 113+ messages in thread

* Re: [PATCH 01/70] drm/i915: Cache last obj->pages location for i915_gem_object_get_page()
  2015-04-07 15:20 ` [PATCH 01/70] drm/i915: Cache last obj->pages location for i915_gem_object_get_page() Chris Wilson
@ 2015-04-08 11:16   ` Daniel Vetter
  0 siblings, 0 replies; 113+ messages in thread
From: Daniel Vetter @ 2015-04-08 11:16 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Apr 07, 2015 at 04:20:25PM +0100, Chris Wilson wrote:
> The biggest user of i915_gem_object_get_page() is the relocation
> processing during execbuffer. Typically userspace passes in a set of
> relocations in sorted order. Sadly, we alternate between relocations
> increasing from the start of the buffers, and relocations decreasing
> from the end. However, the majority of consecutive lookups will still be
> in the same page. We could cache the start of the last sg chain; however,
> for most callers, the entire sgl is inside a single chain and so we see
> no improvement from the extra layer of caching.
> 
> v2: Avoid the double increment inside unlikely()
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=88308
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: John Harrison <John.C.Harrison@Intel.com>

Indeed this makes gem_exec_big a lot faster. Queued for -next, thanks for the patch.
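
For readers following along, the trick reduces to this standalone
sketch (a plain segment array stands in for the sg chain; the names
here are illustrative, not the driver code):

	struct seg { int npages; };

	struct lookup_cache {
		const struct seg *sg;	/* segment the last lookup hit */
		int last;		/* index of the first page in *sg */
	};

	/* Amortised O(1) for mostly-monotonic n: walk forwards from the
	 * cached segment, restarting from the head only when n jumps
	 * backwards. Callers start with sg = sgl, last = 0.
	 */
	static int page_offset(struct lookup_cache *c,
			       const struct seg *sgl, int n)
	{
		if (n < c->last) {
			c->sg = sgl;
			c->last = 0;
		}
		while (c->last + c->sg->npages <= n)
			c->last += c->sg++->npages;
		return n - c->last;	/* page index within c->sg */
	}
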
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_drv.h | 31 ++++++++++++++++++++++++++-----
>  drivers/gpu/drm/i915/i915_gem.c |  4 ++++
>  2 files changed, 30 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 4f5dae9a23f9..51b21483b95f 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1987,6 +1987,10 @@ struct drm_i915_gem_object {
>  
>  	struct sg_table *pages;
>  	int pages_pin_count;
> +	struct get_page {
> +		struct scatterlist *sg;
> +		int last;
> +	} get_page;
>  
>  	/* prime dma-buf support */
>  	void *dma_buf_vmapping;
> @@ -2665,15 +2669,32 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
>  				    int *needs_clflush);
>  
>  int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
> -static inline struct page *i915_gem_object_get_page(struct drm_i915_gem_object *obj, int n)
> +
> +static inline int __sg_page_count(struct scatterlist *sg)
> +{
> +	return sg->length >> PAGE_SHIFT;
> +}
> +
> +static inline struct page *
> +i915_gem_object_get_page(struct drm_i915_gem_object *obj, int n)
>  {
> -	struct sg_page_iter sg_iter;
> +	if (WARN_ON(n >= obj->base.size >> PAGE_SHIFT))
> +		return NULL;
>  
> -	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, n)
> -		return sg_page_iter_page(&sg_iter);
> +	if (n < obj->get_page.last) {
> +		obj->get_page.sg = obj->pages->sgl;
> +		obj->get_page.last = 0;
> +	}
> +
> +	while (obj->get_page.last + __sg_page_count(obj->get_page.sg) <= n) {
> +		obj->get_page.last += __sg_page_count(obj->get_page.sg++);
> +		if (unlikely(sg_is_chain(obj->get_page.sg)))
> +			obj->get_page.sg = sg_chain_ptr(obj->get_page.sg);
> +	}
>  
> -	return NULL;
> +	return nth_page(sg_page(obj->get_page.sg), n - obj->get_page.last);
>  }
> +
>  static inline void i915_gem_object_pin_pages(struct drm_i915_gem_object *obj)
>  {
>  	BUG_ON(obj->pages == NULL);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index be4f2645b637..567affeafec4 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2178,6 +2178,10 @@ i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
>  		return ret;
>  
>  	list_add_tail(&obj->global_list, &dev_priv->mm.unbound_list);
> +
> +	obj->get_page.sg = obj->pages->sgl;
> +	obj->get_page.last = 0;
> +
>  	return 0;
>  }
>  
> -- 
> 2.1.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 03/70] drm/i915: Ensure cache flushes prior to doing CS flips
  2015-04-07 15:20 ` [PATCH 03/70] drm/i915: Ensure cache flushes prior to doing CS flips Chris Wilson
@ 2015-04-08 11:23   ` Daniel Vetter
  2015-04-08 11:29     ` Chris Wilson
  0 siblings, 1 reply; 113+ messages in thread
From: Daniel Vetter @ 2015-04-08 11:23 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Apr 07, 2015 at 04:20:27PM +0100, Chris Wilson wrote:
> Synchronising to an object active on the same ring is a no-op, for the
> benefit of the execbuffer scheduler. However, for CS flips this means that
> we can forgo checking whether the last write request of the object is
> actually queued and more importantly whether the cache flush for the
> write was emitted.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Does this go boom in reality, i.e. bugzilla/igt? If so I guess this should
be for 4.1+cc:stable? Otherwise I think I'll punt since olr is on its
way out anyway.
-Daniel

> ---
>  drivers/gpu/drm/i915/intel_display.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 4af89c27504e..0415e40cef6e 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -10347,6 +10347,12 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
>  		i915_gem_request_assign(&work->flip_queued_req,
>  					obj->last_write_req);
>  	} else {
> +		if (obj->last_write_req) {
> +			ret = i915_gem_check_olr(obj->last_write_req);
> +			if (ret)
> +				goto cleanup_unpin;
> +		}
> +
>  		ret = dev_priv->display.queue_flip(dev, crtc, fb, obj, ring,
>  						   page_flip_flags);
>  		if (ret)
> -- 
> 2.1.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 03/70] drm/i915: Ensure cache flushes prior to doing CS flips
  2015-04-08 11:23   ` Daniel Vetter
@ 2015-04-08 11:29     ` Chris Wilson
  0 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-08 11:29 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Wed, Apr 08, 2015 at 01:23:50PM +0200, Daniel Vetter wrote:
> On Tue, Apr 07, 2015 at 04:20:27PM +0100, Chris Wilson wrote:
> > Synchronising to an object active on the same ring is a no-op, for the
> > benefit of the execbuffer scheduler. However, for CS flips this means that
> > we can forgo checking whether the last write request of the object is
> > actually queued and more importantly whether the cache flush for the
> > write was emitted.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Does this go boom in reality, i.e. bugzilla/igt? If so I guess this should
> be for 4.1+cc:stable? Otherwise I think I'll punt since olr is on its
> way out anyway.

Who can say for sure? Maybe
https://bugs.freedesktop.org/show_bug.cgi?id=80948: it looks like cache
dirt(?) without a known cause.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 06/70] drm/i915: Fix race on unreferencing the wrong mmio-flip-request
  2015-04-07 15:20 ` [PATCH 06/70] drm/i915: Fix race on unreferencing the wrong mmio-flip-request Chris Wilson
@ 2015-04-08 11:30   ` Daniel Vetter
  0 siblings, 0 replies; 113+ messages in thread
From: Daniel Vetter @ 2015-04-08 11:30 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Ander Conselvan de Oliveira, intel-gfx

On Tue, Apr 07, 2015 at 04:20:30PM +0100, Chris Wilson wrote:
> As we perform the mmio-flip without any locking and then try to acquire
> the struct_mutex prior to dereferencing the request, it is possible for
> userspace to queue a new pageflip before the worker can finish clearing
> the old state - and then it will clear the new flip request. The result
> is that the new flip could be completed before the GPU has finished
> rendering.
> 
> The bug stems from removing the seqno checking in
> commit 536f5b5e86b225dab94c7ff8061ae482b6077387
> Author: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
> Date:   Thu Nov 6 11:03:40 2014 +0200
> 
>     drm/i915: Make mmio flip wait for seqno in the work function
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>

Usual question: Does this go boom/do we have an igt?
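
The ownership model it moves to, as a sketch (the names here are made
up, the real hunks are below): each flip allocates its own work item
which takes its own reference on the request, so a second flip can no
longer clear the request the first flip's worker is still waiting on.

	struct flip_work {
		struct work_struct work;
		struct request *rq;	/* reference owned by this work item */
	};

	static int queue_flip_sketch(struct request *rq)
	{
		struct flip_work *flip;

		flip = kmalloc(sizeof(*flip), GFP_KERNEL);
		if (flip == NULL)
			return -ENOMEM;

		flip->rq = request_get(rq);	/* request_get() is invented */
		INIT_WORK(&flip->work, flip_work_fn);
		schedule_work(&flip->work);	/* worker drops rq, frees flip */
		return 0;
	}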

> ---
>  drivers/gpu/drm/i915/i915_drv.h      |  6 ++++--
>  drivers/gpu/drm/i915/intel_display.c | 39 ++++++++++++++++++------------------
>  drivers/gpu/drm/i915/intel_drv.h     |  4 ++--
>  3 files changed, 25 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 31011988d153..0bc913934d3f 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2140,10 +2140,12 @@ i915_gem_request_get_ring(struct drm_i915_gem_request *req)
>  	return req ? req->ring : NULL;
>  }
>  
> -static inline void
> +static inline struct drm_i915_gem_request *
>  i915_gem_request_reference(struct drm_i915_gem_request *req)
>  {
> -	kref_get(&req->ref);
> +	if (req)
> +		kref_get(&req->ref);
> +	return req;

Neat pattern, but since it's different from all the other reference
counting functions I don't think it's a good idea ...
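
Side by side, as a sketch of the two call sites:

	/* conventional kref style: NULL check and get at every caller */
	if (obj->last_write_req)
		i915_gem_request_reference(obj->last_write_req);
	mmio_flip->rq = obj->last_write_req;

	/* the variant above folds both into the call */
	mmio_flip->rq = i915_gem_request_reference(obj->last_write_req);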

>  }
>  
>  static inline void
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 0415e40cef6e..94c09bf0047d 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -10105,22 +10105,18 @@ static void intel_do_mmio_flip(struct intel_crtc *intel_crtc)
>  
>  static void intel_mmio_flip_work_func(struct work_struct *work)
>  {
> -	struct intel_crtc *crtc =
> -		container_of(work, struct intel_crtc, mmio_flip.work);
> -	struct intel_mmio_flip *mmio_flip;
> +	struct intel_mmio_flip *mmio_flip =
> +		container_of(work, struct intel_mmio_flip, work);
>  
> -	mmio_flip = &crtc->mmio_flip;
> -	if (mmio_flip->req)
> -		WARN_ON(__i915_wait_request(mmio_flip->req,
> -					    crtc->reset_counter,
> -					    false, NULL, NULL) != 0);
> +	if (mmio_flip->rq)
> +		WARN_ON(__i915_wait_request(mmio_flip->rq,
> +					    mmio_flip->crtc->reset_counter,
> +					    false, NULL, NULL));
>  
> -	intel_do_mmio_flip(crtc);
> -	if (mmio_flip->req) {
> -		mutex_lock(&crtc->base.dev->struct_mutex);
> -		i915_gem_request_assign(&mmio_flip->req, NULL);
> -		mutex_unlock(&crtc->base.dev->struct_mutex);
> -	}
> +	intel_do_mmio_flip(mmio_flip->crtc);
> +
> +	i915_gem_request_unreference__unlocked(mmio_flip->rq);
> +	kfree(mmio_flip);
>  }
>  
>  static int intel_queue_mmio_flip(struct drm_device *dev,
> @@ -10130,12 +10126,17 @@ static int intel_queue_mmio_flip(struct drm_device *dev,
>  				 struct intel_engine_cs *ring,
>  				 uint32_t flags)
>  {
> -	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> +	struct intel_mmio_flip *mmio_flip;
> +
> +	mmio_flip = kmalloc(sizeof(*mmio_flip), GFP_KERNEL);
> +	if (mmio_flip == NULL)
> +		return -ENOMEM;
>  
> -	i915_gem_request_assign(&intel_crtc->mmio_flip.req,
> -				obj->last_write_req);
> +	mmio_flip->rq = i915_gem_request_reference(obj->last_write_req);
> +	mmio_flip->crtc = to_intel_crtc(crtc);
>  
> -	schedule_work(&intel_crtc->mmio_flip.work);
> +	INIT_WORK(&mmio_flip->work, intel_mmio_flip_work_func);
> +	schedule_work(&mmio_flip->work);
>  
>  	return 0;
>  }
> @@ -13059,8 +13060,6 @@ static void intel_crtc_init(struct drm_device *dev, int pipe)
>  	dev_priv->plane_to_crtc_mapping[intel_crtc->plane] = &intel_crtc->base;
>  	dev_priv->pipe_to_crtc_mapping[intel_crtc->pipe] = &intel_crtc->base;
>  
> -	INIT_WORK(&intel_crtc->mmio_flip.work, intel_mmio_flip_work_func);
> -
>  	drm_crtc_helper_add(&intel_crtc->base, &intel_helper_funcs);
>  
>  	WARN_ON(drm_crtc_index(&intel_crtc->base) != intel_crtc->pipe);
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 686014bd5ec0..0bcc5f36a810 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -403,8 +403,9 @@ struct intel_pipe_wm {
>  };
>  
>  struct intel_mmio_flip {
> -	struct drm_i915_gem_request *req;
>  	struct work_struct work;
> +	struct drm_i915_gem_request *rq;

I haven't really followed the discussion with Tvrtko, but what exactly is
the distinction between req and rq again? At least to me rq sounds more
like a short-form for runqueue than request, so why not just leave it at
req?

Besides these two nitpicks the patch looks good.
-Daniel

> +	struct intel_crtc *crtc;
>  };
>  
>  struct skl_pipe_wm {
> @@ -489,7 +490,6 @@ struct intel_crtc {
>  	} wm;
>  
>  	int scanline_offset;
> -	struct intel_mmio_flip mmio_flip;
>  
>  	struct intel_crtc_atomic_commit atomic;
>  };
> -- 
> 2.1.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 07/70] drm/i915: Boost GPU frequency if we detect outstanding pageflips
  2015-04-07 15:20 ` [PATCH 07/70] drm/i915: Boost GPU frequency if we detect outstanding pageflips Chris Wilson
@ 2015-04-08 11:31   ` Daniel Vetter
  0 siblings, 0 replies; 113+ messages in thread
From: Daniel Vetter @ 2015-04-08 11:31 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx

On Tue, Apr 07, 2015 at 04:20:31PM +0100, Chris Wilson wrote:
> If we hit a vblank and see that we have a pageflip queued but not yet
> processed, ensure that the GPU is running at maximum in order to clear
> the backlog. Pageflips are only queued for the following vblank; if we
> miss it, there will be a visible stutter. Boosting the GPU frequency
> doesn't prevent us from missing the target vblank, but it should help
> the subsequent frames hit theirs.
> 
> v2: Reorder vblank vs flip-complete so that we only check for a missed
> flip after processing the completion events, and avoid spurious boosts.
> 
> v3: Rename missed_vblank
> v4: Rebase
> v5: Cancel the outstanding work in runtime suspend
> v6: Rebase
> v7: Rebase required fixing
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Cc: Deepak S<deepak.s@linux.intel.com>

This one also had an r-b from Deepak already, so I applied all the rps
tuning patches from this series.

Thanks, Daniel

> ---
>  drivers/gpu/drm/i915/intel_display.c | 11 ++++++++---
>  drivers/gpu/drm/i915/intel_drv.h     |  2 ++
>  drivers/gpu/drm/i915/intel_pm.c      | 35 +++++++++++++++++++++++++++++++++++
>  3 files changed, 45 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 94c09bf0047d..1846fb510ebb 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -10195,6 +10195,7 @@ void intel_check_page_flip(struct drm_device *dev, int pipe)
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct drm_crtc *crtc = dev_priv->pipe_to_crtc_mapping[pipe];
>  	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> +	struct intel_unpin_work *work;
>  
>  	WARN_ON(!in_interrupt());
>  
> @@ -10202,12 +10203,16 @@ void intel_check_page_flip(struct drm_device *dev, int pipe)
>  		return;
>  
>  	spin_lock(&dev->event_lock);
> -	if (intel_crtc->unpin_work && __intel_pageflip_stall_check(dev, crtc)) {
> +	work = intel_crtc->unpin_work;
> +	if (work != NULL && __intel_pageflip_stall_check(dev, crtc)) {
>  		WARN_ONCE(1, "Kicking stuck page flip: queued at %d, now %d\n",
> -			 intel_crtc->unpin_work->flip_queued_vblank,
> -			 drm_vblank_count(dev, pipe));
> +			 work->flip_queued_vblank, drm_vblank_count(dev, pipe));
>  		page_flip_completed(intel_crtc);
> +		work = NULL;
>  	}
> +	if (work != NULL &&
> +	    drm_vblank_count(dev, pipe) - work->flip_queued_vblank > 1)
> +		intel_queue_rps_boost_for_request(dev, work->flip_queued_req);
>  	spin_unlock(&dev->event_lock);
>  }
>  
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 0bcc5f36a810..4f1d02af1237 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -1263,6 +1263,8 @@ void gen6_rps_busy(struct drm_i915_private *dev_priv);
>  void gen6_rps_reset_ei(struct drm_i915_private *dev_priv);
>  void gen6_rps_idle(struct drm_i915_private *dev_priv);
>  void gen6_rps_boost(struct drm_i915_private *dev_priv);
> +void intel_queue_rps_boost_for_request(struct drm_device *dev,
> +				       struct drm_i915_gem_request *rq);
>  void ilk_wm_get_hw_state(struct drm_device *dev);
>  void skl_wm_get_hw_state(struct drm_device *dev);
>  void skl_ddb_get_hw_state(struct drm_i915_private *dev_priv,
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 50c03472ea41..3e98f30517c6 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -6772,6 +6772,41 @@ int intel_freq_opcode(struct drm_i915_private *dev_priv, int val)
>  		return val / GT_FREQUENCY_MULTIPLIER;
>  }
>  
> +struct request_boost {
> +	struct work_struct work;
> +	struct drm_i915_gem_request *rq;
> +};
> +
> +static void __intel_rps_boost_work(struct work_struct *work)
> +{
> +	struct request_boost *boost = container_of(work, struct request_boost, work);
> +
> +	if (!i915_gem_request_completed(boost->rq, true))
> +		gen6_rps_boost(to_i915(boost->rq->ring->dev));
> +
> +	i915_gem_request_unreference__unlocked(boost->rq);
> +	kfree(boost);
> +}
> +
> +void intel_queue_rps_boost_for_request(struct drm_device *dev,
> +				       struct drm_i915_gem_request *rq)
> +{
> +	struct request_boost *boost;
> +
> +	if (rq == NULL || INTEL_INFO(dev)->gen < 6)
> +		return;
> +
> +	boost = kmalloc(sizeof(*boost), GFP_ATOMIC);
> +	if (boost == NULL)
> +		return;
> +
> +	i915_gem_request_reference(rq);
> +	boost->rq = rq;
> +
> +	INIT_WORK(&boost->work, __intel_rps_boost_work);
> +	queue_work(to_i915(dev)->wq, &boost->work);
> +}
> +
>  void intel_pm_setup(struct drm_device *dev)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
> -- 
> 2.1.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 15/70] drm/i915: Include active flag when describing objects in debugfs
  2015-04-07 15:20 ` [PATCH 15/70] drm/i915: Include active flag when describing objects in debugfs Chris Wilson
@ 2015-04-08 11:33   ` Daniel Vetter
  0 siblings, 0 replies; 113+ messages in thread
From: Daniel Vetter @ 2015-04-08 11:33 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Apr 07, 2015 at 04:20:39PM +0100, Chris Wilson wrote:
> Since we use obj->active as a hint in many places throughout the code,
> knowing its state in debugfs is extremely useful.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Merged all the batch pool tuning plus this one with Tvrtko's review.

Thanks, Daniel

> ---
>  drivers/gpu/drm/i915/i915_debugfs.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 11eebc28775a..e87f031abc99 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -123,8 +123,9 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>  	struct i915_vma *vma;
>  	int pin_count = 0;
>  
> -	seq_printf(m, "%pK: %s%s%s %8zdKiB %02x %02x %x %x %x%s%s%s",
> +	seq_printf(m, "%pK: %s%s%s%s %8zdKiB %02x %02x %x %x %x%s%s%s",
>  		   &obj->base,
> +		   obj->active ? "*" : " ",
>  		   get_pin_flag(obj),
>  		   get_tiling_flag(obj),
>  		   get_global_flag(obj),
> -- 
> 2.1.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 16/70] drm/i915: Suppress empty lines from debugfs/i915_gem_objects
  2015-04-07 15:20 ` [PATCH 16/70] drm/i915: Suppress empty lines from debugfs/i915_gem_objects Chris Wilson
@ 2015-04-08 11:34   ` Daniel Vetter
  0 siblings, 0 replies; 113+ messages in thread
From: Daniel Vetter @ 2015-04-08 11:34 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Apr 07, 2015 at 04:20:40PM +0100, Chris Wilson wrote:
> This is just so that I don't have to read about the batch pool on
> systems that are not using it! Rather than using a newline between the
> kernel clients and userspace clients, just distinguish the internal
> allocations with a '[k]'.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Queued for -next, thanks for the patch.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_debugfs.c | 26 +++++++++++++-------------
>  1 file changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index e87f031abc99..fbba5c267f5d 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -362,16 +362,18 @@ static int per_file_stats(int id, void *ptr, void *data)
>  	return 0;
>  }
>  
> -#define print_file_stats(m, name, stats) \
> -	seq_printf(m, "%s: %u objects, %zu bytes (%zu active, %zu inactive, %zu global, %zu shared, %zu unbound)\n", \
> -		   name, \
> -		   stats.count, \
> -		   stats.total, \
> -		   stats.active, \
> -		   stats.inactive, \
> -		   stats.global, \
> -		   stats.shared, \
> -		   stats.unbound)
> +#define print_file_stats(m, name, stats) do { \
> +	if (stats.count) \
> +		seq_printf(m, "%s: %u objects, %zu bytes (%zu active, %zu inactive, %zu global, %zu shared, %zu unbound)\n", \
> +			   name, \
> +			   stats.count, \
> +			   stats.total, \
> +			   stats.active, \
> +			   stats.inactive, \
> +			   stats.global, \
> +			   stats.shared, \
> +			   stats.unbound); \
> +} while (0)
>  
>  static void print_batch_pool_stats(struct seq_file *m,
>  				   struct drm_i915_private *dev_priv)
> @@ -392,7 +394,7 @@ static void print_batch_pool_stats(struct seq_file *m,
>  		}
>  	}
>  
> -	print_file_stats(m, "batch pool", stats);
> +	print_file_stats(m, "[k]batch pool", stats);
>  }
>  
>  #define count_vmas(list, member) do { \
> @@ -478,8 +480,6 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
>  
>  	seq_putc(m, '\n');
>  	print_batch_pool_stats(m, dev_priv);
> -
> -	seq_putc(m, '\n');
>  	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
>  		struct file_stats stats;
>  		struct task_struct *task;
> -- 
> 2.1.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 17/70] drm/i915: Optimistically spin for the request completion
  2015-04-07 15:20 ` [PATCH 17/70] drm/i915: Optimistically spin for the request completion Chris Wilson
@ 2015-04-08 11:39   ` Daniel Vetter
  2015-04-08 13:43     ` Rantala, Valtteri
  2015-04-13 11:34   ` Tvrtko Ursulin
  1 sibling, 1 reply; 113+ messages in thread
From: Daniel Vetter @ 2015-04-08 11:39 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx, Rantala, Valtteri, Eero Tamminen

On Tue, Apr 07, 2015 at 04:20:41PM +0100, Chris Wilson wrote:
> This provides a nice boost to mesa in swap bound scenarios (as mesa
> throttles itself to the previous frame and given the scenario that will
> complete shortly). It will also provide a good boost to systems running
> with semaphores disabled and so frequently waiting on the GPU as it
> switches rings. In the most favourable of microbenchmarks, this can
> increase performance by around 15% - though in practice improvements
> will be marginal and rarely noticeable.
> 
> v2: Account for user timeouts
> v3: Limit the spinning to a single jiffie (~1us) at most. On an
> otherwise idle system, there is no scheduler contention and so without a
> limit we would spin until the GPU is ready.
> v4: Drop forcewake - the lazy coherent access doesn't require it, and we
> have no reason to believe that the forcewake itself improves seqno
> coherency - it only adds delay.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>

Eero/Valtteri, do you have perf data for this one?

Thanks, Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 44 +++++++++++++++++++++++++++++++++++------
>  1 file changed, 38 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c7d9ee2f708a..47650327204e 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1181,6 +1181,29 @@ static bool missed_irq(struct drm_i915_private *dev_priv,
>  	return test_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings);
>  }
>  
> +static int __i915_spin_request(struct drm_i915_gem_request *rq)
> +{
> +	unsigned long timeout;
> +
> +	if (i915_gem_request_get_ring(rq)->irq_refcount)
> +		return -EBUSY;
> +
> +	timeout = jiffies + 1;
> +	while (!need_resched()) {
> +		if (i915_gem_request_completed(rq, true))
> +			return 0;
> +
> +		if (time_after_eq(jiffies, timeout))
> +			break;
> +
> +		cpu_relax_lowlatency();
> +	}
> +	if (i915_gem_request_completed(rq, false))
> +		return 0;
> +
> +	return -EAGAIN;
> +}
> +
>  /**
>   * __i915_wait_request - wait until execution of request has finished
>   * @req: duh!
> @@ -1225,12 +1248,20 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  	if (INTEL_INFO(dev)->gen >= 6)
>  		gen6_rps_boost(dev_priv, file_priv);
>  
> -	if (!irq_test_in_progress && WARN_ON(!ring->irq_get(ring)))
> -		return -ENODEV;
> -
>  	/* Record current time in case interrupted by signal, or wedged */
>  	trace_i915_gem_request_wait_begin(req);
>  	before = ktime_get_raw_ns();
> +
> +	/* Optimistic spin for the next jiffie before touching IRQs */
> +	ret = __i915_spin_request(req);
> +	if (ret == 0)
> +		goto out;
> +
> +	if (!irq_test_in_progress && WARN_ON(!ring->irq_get(ring))) {
> +		ret = -ENODEV;
> +		goto out;
> +	}
> +
>  	for (;;) {
>  		struct timer_list timer;
>  
> @@ -1279,14 +1310,15 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  			destroy_timer_on_stack(&timer);
>  		}
>  	}
> -	now = ktime_get_raw_ns();
> -	trace_i915_gem_request_wait_end(req);
> -
>  	if (!irq_test_in_progress)
>  		ring->irq_put(ring);
>  
>  	finish_wait(&ring->irq_queue, &wait);
>  
> +out:
> +	now = ktime_get_raw_ns();
> +	trace_i915_gem_request_wait_end(req);
> +
>  	if (timeout) {
>  		s64 tres = *timeout - (now - before);
>  
> -- 
> 2.1.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 20/70] drm/i915: Limit ring synchronisation (sw semaphores) RPS boosts
  2015-04-07 15:20 ` [PATCH 20/70] drm/i915: Limit ring synchronisation (sw semaphores) RPS boosts Chris Wilson
@ 2015-04-08 11:46   ` Daniel Vetter
  0 siblings, 0 replies; 113+ messages in thread
From: Daniel Vetter @ 2015-04-08 11:46 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Apr 07, 2015 at 04:20:44PM +0100, Chris Wilson wrote:
> Ring switches can occur many times per frame, and are often out of
> control, causing frequent RPS boosting for no practical benefit. Treat
> the sw semaphore synchronisation as a separate client and only allow it
> to boost once per busy/idle cycle.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c |  1 +
>  drivers/gpu/drm/i915/i915_drv.h     | 34 ++++++++++++++++++----------------
>  drivers/gpu/drm/i915/i915_gem.c     |  7 +++++--
>  drivers/gpu/drm/i915/intel_pm.c     |  1 +
>  4 files changed, 25 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 5da74b46e202..c8fe548af41d 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -2296,6 +2296,7 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
>  			   list_empty(&file_priv->rps_boost) ? "" : ", active");
>  		rcu_read_unlock();
>  	}
> +	seq_printf(m, "Semaphore boosts: %d\n", dev_priv->rps.semaphores.rps_boosts);
>  	seq_printf(m, "Kernel boosts: %d\n", dev_priv->rps.boosts);
>  
>  	mutex_unlock(&dev_priv->rps.hw_lock);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index d35778797ef0..057a1346e81f 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -268,6 +268,22 @@ struct drm_i915_private;
>  struct i915_mm_struct;
>  struct i915_mmu_object;
>  
> +struct drm_i915_file_private {
> +	struct drm_i915_private *dev_priv;
> +	struct drm_file *file;
> +
> +	struct {
> +		spinlock_t lock;
> +		struct list_head request_list;
> +	} mm;
> +	struct idr context_idr;
> +
> +	struct list_head rps_boost;
> +	struct intel_engine_cs *bsd_ring;
> +
> +	unsigned rps_boosts;
> +};

This looks really confusing to me and feels a bit too much like abuse. I
think extracting a tiny struct intel_rps_boost_client and switching the
interfaces of gen6_rps_boost and __i915_wait_request over to it instead of
using the file_priv would be a lot clearer.
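
Roughly what I have in mind (untested sketch; the struct and field
names here are made up for illustration):

	struct intel_rps_client {
		struct list_head link;
		unsigned boosts;
	};

	void gen6_rps_boost(struct drm_i915_private *dev_priv,
			    struct intel_rps_client *rps);
	int __i915_wait_request(struct drm_i915_gem_request *req,
				unsigned reset_counter, bool interruptible,
				s64 *timeout, struct intel_rps_client *rps);

drm_i915_file_private would then embed one for userspace clients, and
the semaphore case keeps its own instance in dev_priv->rps.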

Otherwise these two patches make a lot of sense, but can you please cc
Deepak when resending for his ack/r-b?

Thanks, Daniel

> +
>  enum intel_dpll_id {
>  	DPLL_ID_PRIVATE = -1, /* non-shared dpll in use */
>  	/* real shared dpll ids must be >= 0 */
> @@ -1047,6 +1063,8 @@ struct intel_gen6_power_mgmt {
>  	struct list_head clients;
>  	unsigned boosts;
>  
> +	struct drm_i915_file_private semaphores;
> +
>  	/* manual wa residency calculations */
>  	struct intel_rps_ei up_ei, down_ei;
>  
> @@ -2185,22 +2203,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>   * a later patch when the call to i915_seqno_passed() is obsoleted...
>   */
>  
> -struct drm_i915_file_private {
> -	struct drm_i915_private *dev_priv;
> -	struct drm_file *file;
> -
> -	struct {
> -		spinlock_t lock;
> -		struct list_head request_list;
> -	} mm;
> -	struct idr context_idr;
> -
> -	struct list_head rps_boost;
> -	struct intel_engine_cs *bsd_ring;
> -
> -	unsigned rps_boosts;
> -};
> -
>  /*
>   * A command that requires special handling by the command parser.
>   */
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index a32a84598fac..3d31ff11fbef 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3045,9 +3045,12 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
>  		return ret;
>  
>  	if (!i915_semaphore_is_enabled(obj->base.dev)) {
> +		struct drm_i915_private *i915 = to_i915(obj->base.dev);
>  		ret = __i915_wait_request(rq,
> -					  atomic_read(&to_i915(obj->base.dev)->gpu_error.reset_counter),
> -					  to_i915(obj->base.dev)->mm.interruptible, NULL, NULL);
> +					  atomic_read(&i915->gpu_error.reset_counter),
> +					  i915->mm.interruptible,
> +					  NULL,
> +					  &i915->rps.semaphores);
>  		if (ret)
>  			return ret;
>  
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index d3f4e9593db1..3e274cf3adaa 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -6827,6 +6827,7 @@ void intel_pm_setup(struct drm_device *dev)
>  	INIT_DELAYED_WORK(&dev_priv->rps.delayed_resume_work,
>  			  intel_gen6_powersave_work);
>  	INIT_LIST_HEAD(&dev_priv->rps.clients);
> +	INIT_LIST_HEAD(&dev_priv->rps.semaphores.rps_boost);
>  
>  	dev_priv->pm.suspended = false;
>  }
> -- 
> 2.1.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 23/70] drm/i915: Record ring->start address in error state
  2015-04-07 15:20 ` [PATCH 23/70] drm/i915: Record ring->start address in error state Chris Wilson
@ 2015-04-08 11:47   ` Daniel Vetter
  0 siblings, 0 replies; 113+ messages in thread
From: Daniel Vetter @ 2015-04-08 11:47 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Apr 07, 2015 at 04:20:47PM +0100, Chris Wilson wrote:
> This is mostly useful for execlists where the rings switch between
> contexts (and so checking that the ring's start register matches the
> context is important).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Queued for -next, thanks for the patch.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_drv.h       |  1 +
>  drivers/gpu/drm/i915/i915_gpu_error.c | 10 ++++++----
>  2 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 8bb7e66dd4cd..d69ccd16cd60 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -471,6 +471,7 @@ struct drm_i915_error_state {
>  		u32 semaphore_seqno[I915_NUM_RINGS - 1];
>  
>  		/* Register state */
> +		u32 start;
>  		u32 tail;
>  		u32 head;
>  		u32 ctl;
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 5f798961266f..17dc2fcaba10 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -256,10 +256,11 @@ static void i915_ring_error_state(struct drm_i915_error_state_buf *m,
>  		return;
>  
>  	err_printf(m, "%s command stream:\n", ring_str(ring_idx));
> -	err_printf(m, "  HEAD: 0x%08x\n", ring->head);
> -	err_printf(m, "  TAIL: 0x%08x\n", ring->tail);
> -	err_printf(m, "  CTL: 0x%08x\n", ring->ctl);
> -	err_printf(m, "  HWS: 0x%08x\n", ring->hws);
> +	err_printf(m, "  START: 0x%08x\n", ring->start);
> +	err_printf(m, "  HEAD:  0x%08x\n", ring->head);
> +	err_printf(m, "  TAIL:  0x%08x\n", ring->tail);
> +	err_printf(m, "  CTL:   0x%08x\n", ring->ctl);
> +	err_printf(m, "  HWS:   0x%08x\n", ring->hws);
>  	err_printf(m, "  ACTHD: 0x%08x %08x\n", (u32)(ring->acthd>>32), (u32)ring->acthd);
>  	err_printf(m, "  IPEIR: 0x%08x\n", ring->ipeir);
>  	err_printf(m, "  IPEHR: 0x%08x\n", ring->ipehr);
> @@ -890,6 +891,7 @@ static void i915_record_ring_state(struct drm_device *dev,
>  	ering->instpm = I915_READ(RING_INSTPM(ring->mmio_base));
>  	ering->seqno = ring->get_seqno(ring, false);
>  	ering->acthd = intel_ring_get_active_head(ring);
> +	ering->start = I915_READ_START(ring);
>  	ering->head = I915_READ_HEAD(ring);
>  	ering->tail = I915_READ_TAIL(ring);
>  	ering->ctl = I915_READ_CTL(ring);
> -- 
> 2.1.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 17/70] drm/i915: Optimistically spin for the request completion
  2015-04-08 11:39   ` Daniel Vetter
@ 2015-04-08 13:43     ` Rantala, Valtteri
  2015-04-08 14:15       ` Daniel Vetter
  0 siblings, 1 reply; 113+ messages in thread
From: Rantala, Valtteri @ 2015-04-08 13:43 UTC (permalink / raw)
  To: Daniel Vetter, Chris Wilson; +Cc: Daniel Vetter, intel-gfx, Tamminen, Eero T

Hi, 


> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel Vetter
> Sent: Wednesday, April 08, 2015 2:40 PM
> To: Chris Wilson
> Cc: intel-gfx@lists.freedesktop.org; Daniel Vetter; Tvrtko Ursulin; Tamminen,
> Eero T; Rantala, Valtteri
> Subject: Re: [PATCH 17/70] drm/i915: Optimistically spin for the request
> completion
> 
> On Tue, Apr 07, 2015 at 04:20:41PM +0100, Chris Wilson wrote:
> > This provides a nice boost to mesa in swap bound scenarios (as mesa
> > throttles itself to the previous frame and given the scenario that
> > will complete shortly). It will also provide a good boost to systems
> > running with semaphores disabled and so frequently waiting on the GPU
> > as it switches rings. In the most favourable of microbenchmarks, this
> > can increase performance by around 15% - though in practice
> > improvements will be marginal and rarely noticeable.
> >
> > v2: Account for user timeouts
> > v3: Limit the spinning to a single jiffie (~1us) at most. On an
> > otherwise idle system, there is no scheduler contention and so without
> > a limit we would spin until the GPU is ready.
> > v4: Drop forcewake - the lazy coherent access doesn't require it, and
> > we have no reason to believe that the forcewake itself improves seqno
> > coherency - it only adds delay.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> > Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> > Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
> 
> Eero/Valtteri, do you have perf data for this one?
> 
[Rantala, Valtteri] 
I have issues with applying this patch to the latest nightly; I'll have to check that out.

Can you provide a git tree/branch that I could use?

--
Valtteri

> Thanks, Daniel
> 
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 44
> > +++++++++++++++++++++++++++++++++++------
> >  1 file changed, 38 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c
> > b/drivers/gpu/drm/i915/i915_gem.c index c7d9ee2f708a..47650327204e
> > 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -1181,6 +1181,29 @@ static bool missed_irq(struct drm_i915_private
> *dev_priv,
> >  	return test_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings);
> >  }
> >
> > +static int __i915_spin_request(struct drm_i915_gem_request *rq) {
> > +	unsigned long timeout;
> > +
> > +	if (i915_gem_request_get_ring(rq)->irq_refcount)
> > +		return -EBUSY;
> > +
> > +	timeout = jiffies + 1;
> > +	while (!need_resched()) {
> > +		if (i915_gem_request_completed(rq, true))
> > +			return 0;
> > +
> > +		if (time_after_eq(jiffies, timeout))
> > +			break;
> > +
> > +		cpu_relax_lowlatency();
> > +	}
> > +	if (i915_gem_request_completed(rq, false))
> > +		return 0;
> > +
> > +	return -EAGAIN;
> > +}
> > +
> >  /**
> >   * __i915_wait_request - wait until execution of request has finished
> >   * @req: duh!
> > @@ -1225,12 +1248,20 @@ int __i915_wait_request(struct
> drm_i915_gem_request *req,
> >  	if (INTEL_INFO(dev)->gen >= 6)
> >  		gen6_rps_boost(dev_priv, file_priv);
> >
> > -	if (!irq_test_in_progress && WARN_ON(!ring->irq_get(ring)))
> > -		return -ENODEV;
> > -
> >  	/* Record current time in case interrupted by signal, or wedged
> */
> >  	trace_i915_gem_request_wait_begin(req);
> >  	before = ktime_get_raw_ns();
> > +
> > +	/* Optimistic spin for the next jiffie before touching IRQs */
> > +	ret = __i915_spin_request(req);
> > +	if (ret == 0)
> > +		goto out;
> > +
> > +	if (!irq_test_in_progress && WARN_ON(!ring->irq_get(ring))) {
> > +		ret = -ENODEV;
> > +		goto out;
> > +	}
> > +
> >  	for (;;) {
> >  		struct timer_list timer;
> >
> > @@ -1279,14 +1310,15 @@ int __i915_wait_request(struct
> drm_i915_gem_request *req,
> >  			destroy_timer_on_stack(&timer);
> >  		}
> >  	}
> > -	now = ktime_get_raw_ns();
> > -	trace_i915_gem_request_wait_end(req);
> > -
> >  	if (!irq_test_in_progress)
> >  		ring->irq_put(ring);
> >
> >  	finish_wait(&ring->irq_queue, &wait);
> >
> > +out:
> > +	now = ktime_get_raw_ns();
> > +	trace_i915_gem_request_wait_end(req);
> > +
> >  	if (timeout) {
> >  		s64 tres = *timeout - (now - before);
> >
> > --
> > 2.1.4
> >
> 
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

* Re: [PATCH 17/70] drm/i915: Optimistically spin for the request completion
  2015-04-08 13:43     ` Rantala, Valtteri
@ 2015-04-08 14:15       ` Daniel Vetter
  0 siblings, 0 replies; 113+ messages in thread
From: Daniel Vetter @ 2015-04-08 14:15 UTC (permalink / raw)
  To: Rantala, Valtteri; +Cc: Daniel Vetter, intel-gfx, Tamminen, Eero T

On Wed, Apr 08, 2015 at 01:43:47PM +0000, Rantala, Valtteri wrote:
> Hi, 
> 
> 
> > -----Original Message-----
> > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel Vetter
> > Sent: Wednesday, April 08, 2015 2:40 PM
> > To: Chris Wilson
> > Cc: intel-gfx@lists.freedesktop.org; Daniel Vetter; Tvrtko Ursulin; Tamminen,
> > Eero T; Rantala, Valtteri
> > Subject: Re: [PATCH 17/70] drm/i915: Optimistically spin for the request
> > completion
> > 
> > On Tue, Apr 07, 2015 at 04:20:41PM +0100, Chris Wilson wrote:
> > > This provides a nice boost to mesa in swap bound scenarios (as mesa
> > > throttles itself to the previous frame and given the scenario that
> > > will complete shortly). It will also provide a good boost to systems
> > > running with semaphores disabled and so frequently waiting on the GPU
> > > as it switches rings. In the most favourable of microbenchmarks, this
> > > can increase performance by around 15% - though in practice
> > > improvements will be marginal and rarely noticeable.
> > >
> > > v2: Account for user timeouts
> > > v3: Limit the spinning to a single jiffie (~1us) at most. On an
> > > otherwise idle system, there is no scheduler contention and so without
> > > a limit we would spin until the GPU is ready.
> > > v4: Drop forcewake - the lazy coherent access doesn't require it, and
> > > we have no reason to believe that the forcewake itself improves seqno
> > > coherency - it only adds delay.
> > >
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> > > Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> > > Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
> > 
> > Eero/Valtteri, do you have perf data for this one?
> > 
> [Rantala, Valtteri] 
> I have issues with applying this patch to the latest nightly; I'll have to check that out.
> 
> Can you provide a git tree/branch that I could use?

It applies cleanly here afaict - you double-checked that you updated
drm-intel.git? Just today I merged a pile of patches from Chris' series
here.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 27/70] drm/i915: Remove vestigal DRI1 ring quiescing code
  2015-04-07 15:20 ` [PATCH 27/70] drm/i915: Remove vestigal DRI1 ring quiescing code Chris Wilson
@ 2015-04-09 15:02   ` Daniel Vetter
  2015-04-09 15:24     ` Chris Wilson
  0 siblings, 1 reply; 113+ messages in thread
From: Daniel Vetter @ 2015-04-09 15:02 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Apr 07, 2015 at 04:20:51PM +0100, Chris Wilson wrote:
> @@ -640,7 +641,7 @@ static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf,
>  			break;
>  	}
>  
> -	if (&request->list == &ring->request_list)
> +	if (WARN_ON(&request->list == &ring->request_list))
>  		return -ENOSPC;

Checking for new_space < n (and initializing new_space to 0) would be a
clearer check imo. But that's just a bikeshed. Same for the legacy one
below.
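
I.e. something like this (untested sketch, with space_for() standing in
for however the loop computes the free space today):

	unsigned new_space = 0;

	list_for_each_entry(request, &ring->request_list, list) {
		new_space = space_for(ringbuf, request);
		if (new_space >= n)
			break;
	}

	if (new_space < n)
		return -ENOSPC;

so the error path doesn't depend on the iterator having walked off the
end of the list.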
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 27/70] drm/i915: Remove vestigal DRI1 ring quiescing code
  2015-04-09 15:02   ` Daniel Vetter
@ 2015-04-09 15:24     ` Chris Wilson
  2015-04-09 15:31       ` Daniel Vetter
  0 siblings, 1 reply; 113+ messages in thread
From: Chris Wilson @ 2015-04-09 15:24 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Thu, Apr 09, 2015 at 05:02:36PM +0200, Daniel Vetter wrote:
> On Tue, Apr 07, 2015 at 04:20:51PM +0100, Chris Wilson wrote:
> > @@ -640,7 +641,7 @@ static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf,
> >  			break;
> >  	}
> >  
> > -	if (&request->list == &ring->request_list)
> > +	if (WARN_ON(&request->list == &ring->request_list))
> >  		return -ENOSPC;
> 
> Checking for new_space < n (and initializing new_space to 0) would be a
> clearer check imo. But that's just a bikeshed. Same for the legacy one
> below.

If you look later in the series, I remove the double update of
ringbuf->space. However, I am quite fond of the if (iter == list_head)
return -ENOSPC idiom, so I am a bit biased.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 27/70] drm/i915: Remove vestigal DRI1 ring quiescing code
  2015-04-09 15:24     ` Chris Wilson
@ 2015-04-09 15:31       ` Daniel Vetter
  0 siblings, 0 replies; 113+ messages in thread
From: Daniel Vetter @ 2015-04-09 15:31 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, intel-gfx

On Thu, Apr 09, 2015 at 04:24:53PM +0100, Chris Wilson wrote:
> On Thu, Apr 09, 2015 at 05:02:36PM +0200, Daniel Vetter wrote:
> > On Tue, Apr 07, 2015 at 04:20:51PM +0100, Chris Wilson wrote:
> > > @@ -640,7 +641,7 @@ static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf,
> > >  			break;
> > >  	}
> > >  
> > > -	if (&request->list == &ring->request_list)
> > > +	if (WARN_ON(&request->list == &ring->request_list))
> > >  		return -ENOSPC;
> > 
> > Checking for new_space < n (and initializing new_space to 0) would be a
> > clearer check imo. But that's just a bikeshed. Same for the legacy one
> > below.
> 
> If you watch later, I remove the double update of ringbuf->space.
> However, I am quite found of the if (iter == list_head) return -ENOSPC,
> so I am a bit biased.

Oh it was mostly that I had to double-check the loop above (which was out
of the diff context). With context it's all good. I'm a really lazy
reviewer ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 34/70] drm/i915: Use a separate slab for vmas
  2015-04-07 15:20 ` [PATCH 34/70] drm/i915: Use a separate slab for vmas Chris Wilson
@ 2015-04-10  8:32   ` Daniel Vetter
  0 siblings, 0 replies; 113+ messages in thread
From: Daniel Vetter @ 2015-04-10  8:32 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Apr 07, 2015 at 04:20:58PM +0100, Chris Wilson wrote:
> vmas are allocated more frequently than objects and so should equally
> benefit from having a dedicated slab.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Merged these two moar-slab patches, thanks.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_dma.c     | 4 ++++
>  drivers/gpu/drm/i915/i915_drv.h     | 1 +
>  drivers/gpu/drm/i915/i915_gem.c     | 7 ++++++-
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 3 ++-
>  4 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 135fbcad367f..9cbc04df94fb 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1012,6 +1012,8 @@ put_bridge:
>  free_priv:
>  	if (dev_priv->requests)
>  		kmem_cache_destroy(dev_priv->requests);
> +	if (dev_priv->vmas)
> +		kmem_cache_destroy(dev_priv->vmas);
>  	if (dev_priv->objects)
>  		kmem_cache_destroy(dev_priv->objects);
>  	kfree(dev_priv);
> @@ -1098,6 +1100,8 @@ int i915_driver_unload(struct drm_device *dev)
>  
>  	if (dev_priv->requests)
>  		kmem_cache_destroy(dev_priv->requests);
> +	if (dev_priv->vmas)
> +		kmem_cache_destroy(dev_priv->vmas);
>  	if (dev_priv->objects)
>  		kmem_cache_destroy(dev_priv->objects);
>  
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index ad08aa532456..2ca11208983e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1579,6 +1579,7 @@ struct i915_virtual_gpu {
>  struct drm_i915_private {
>  	struct drm_device *dev;
>  	struct kmem_cache *objects;
> +	struct kmem_cache *vmas;
>  	struct kmem_cache *requests;
>  
>  	const struct intel_device_info info;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index a4a62592f0f8..05d7431db4ab 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4832,7 +4832,7 @@ void i915_gem_vma_destroy(struct i915_vma *vma)
>  
>  	list_del(&vma->vma_link);
>  
> -	kfree(vma);
> +	kmem_cache_free(to_i915(vma->obj->base.dev)->vmas, vma);
>  }
>  
>  static void
> @@ -5211,6 +5211,11 @@ i915_gem_load(struct drm_device *dev)
>  				  sizeof(struct drm_i915_gem_object), 0,
>  				  SLAB_HWCACHE_ALIGN,
>  				  NULL);
> +	dev_priv->vmas =
> +		kmem_cache_create("i915_gem_vma",
> +				  sizeof(struct i915_vma), 0,
> +				  SLAB_HWCACHE_ALIGN,
> +				  NULL);
>  	dev_priv->requests =
>  		kmem_cache_create("i915_gem_request",
>  				  sizeof(struct drm_i915_gem_request), 0,
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index f48d8454f0ef..a9f24236efd9 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -2542,7 +2542,8 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
>  
>  	if (WARN_ON(i915_is_ggtt(vm) != !!ggtt_view))
>  		return ERR_PTR(-EINVAL);
> -	vma = kzalloc(sizeof(*vma), GFP_KERNEL);
> +
> +	vma = kmem_cache_zalloc(to_i915(obj->base.dev)->vmas, GFP_KERNEL);
>  	if (vma == NULL)
>  		return ERR_PTR(-ENOMEM);
>  
> -- 
> 2.1.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 41/70] drm/i915: Tidy gen8 IRQ handler
  2015-04-07 15:21 ` [PATCH 41/70] drm/i915: Tidy " Chris Wilson
@ 2015-04-10  8:36   ` Daniel Vetter
  0 siblings, 0 replies; 113+ messages in thread
From: Daniel Vetter @ 2015-04-10  8:36 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Apr 07, 2015 at 04:21:05PM +0100, Chris Wilson wrote:
> Remove some needless variables and parameter passing.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Merged 3 patches up to this one, thanks.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_irq.c | 113 +++++++++++++++++-----------------------
>  1 file changed, 49 insertions(+), 64 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index c2c80bf490c6..46bcbff89760 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -985,8 +985,7 @@ static void ironlake_rps_change_irq_handler(struct drm_device *dev)
>  	return;
>  }
>  
> -static void notify_ring(struct drm_device *dev,
> -			struct intel_engine_cs *ring)
> +static void notify_ring(struct intel_engine_cs *ring)
>  {
>  	if (!intel_ring_initialized(ring))
>  		return;
> @@ -1248,9 +1247,9 @@ static void ilk_gt_irq_handler(struct drm_device *dev,
>  {
>  	if (gt_iir &
>  	    (GT_RENDER_USER_INTERRUPT | GT_RENDER_PIPECTL_NOTIFY_INTERRUPT))
> -		notify_ring(dev, &dev_priv->ring[RCS]);
> +		notify_ring(&dev_priv->ring[RCS]);
>  	if (gt_iir & ILK_BSD_USER_INTERRUPT)
> -		notify_ring(dev, &dev_priv->ring[VCS]);
> +		notify_ring(&dev_priv->ring[VCS]);
>  }
>  
>  static void snb_gt_irq_handler(struct drm_device *dev,
> @@ -1260,11 +1259,11 @@ static void snb_gt_irq_handler(struct drm_device *dev,
>  
>  	if (gt_iir &
>  	    (GT_RENDER_USER_INTERRUPT | GT_RENDER_PIPECTL_NOTIFY_INTERRUPT))
> -		notify_ring(dev, &dev_priv->ring[RCS]);
> +		notify_ring(&dev_priv->ring[RCS]);
>  	if (gt_iir & GT_BSD_USER_INTERRUPT)
> -		notify_ring(dev, &dev_priv->ring[VCS]);
> +		notify_ring(&dev_priv->ring[VCS]);
>  	if (gt_iir & GT_BLT_USER_INTERRUPT)
> -		notify_ring(dev, &dev_priv->ring[BCS]);
> +		notify_ring(&dev_priv->ring[BCS]);
>  
>  	if (gt_iir & (GT_BLT_CS_ERROR_INTERRUPT |
>  		      GT_BSD_CS_ERROR_INTERRUPT |
> @@ -1275,63 +1274,65 @@ static void snb_gt_irq_handler(struct drm_device *dev,
>  		ivybridge_parity_error_irq_handler(dev, gt_iir);
>  }
>  
> -static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
> -				       struct drm_i915_private *dev_priv,
> +static irqreturn_t gen8_gt_irq_handler(struct drm_i915_private *dev_priv,
>  				       u32 master_ctl)
>  {
> -	struct intel_engine_cs *ring;
> -	u32 rcs, bcs, vcs;
> -	uint32_t tmp = 0;
>  	irqreturn_t ret = IRQ_NONE;
>  
>  	if (master_ctl & (GEN8_GT_RCS_IRQ | GEN8_GT_BCS_IRQ)) {
> -		tmp = I915_READ_FW(GEN8_GT_IIR(0));
> +		u32 tmp = I915_READ_FW(GEN8_GT_IIR(0));
>  		if (tmp) {
>  			I915_WRITE_FW(GEN8_GT_IIR(0), tmp);
>  			ret = IRQ_HANDLED;
>  
> -			rcs = tmp >> GEN8_RCS_IRQ_SHIFT;
> -			ring = &dev_priv->ring[RCS];
> -			if (rcs & GT_CONTEXT_SWITCH_INTERRUPT)
> -				intel_lrc_irq_handler(ring);
> -			if (rcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, ring);
> -
> -			bcs = tmp >> GEN8_BCS_IRQ_SHIFT;
> -			ring = &dev_priv->ring[BCS];
> -			if (bcs & GT_CONTEXT_SWITCH_INTERRUPT)
> -				intel_lrc_irq_handler(ring);
> -			if (bcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, ring);
> +			if (tmp & (GT_CONTEXT_SWITCH_INTERRUPT << GEN8_RCS_IRQ_SHIFT))
> +				intel_lrc_irq_handler(&dev_priv->ring[RCS]);
> +			if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT))
> +				notify_ring(&dev_priv->ring[RCS]);
> +
> +			if (tmp & (GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT))
> +				intel_lrc_irq_handler(&dev_priv->ring[BCS]);
> +			if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT))
> +				notify_ring(&dev_priv->ring[BCS]);
>  		} else
>  			DRM_ERROR("The master control interrupt lied (GT0)!\n");
>  	}
>  
>  	if (master_ctl & (GEN8_GT_VCS1_IRQ | GEN8_GT_VCS2_IRQ)) {
> -		tmp = I915_READ_FW(GEN8_GT_IIR(1));
> +		u32 tmp = I915_READ_FW(GEN8_GT_IIR(1));
>  		if (tmp) {
>  			I915_WRITE_FW(GEN8_GT_IIR(1), tmp);
>  			ret = IRQ_HANDLED;
>  
> -			vcs = tmp >> GEN8_VCS1_IRQ_SHIFT;
> -			ring = &dev_priv->ring[VCS];
> -			if (vcs & GT_CONTEXT_SWITCH_INTERRUPT)
> -				intel_lrc_irq_handler(ring);
> -			if (vcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, ring);
> -
> -			vcs = tmp >> GEN8_VCS2_IRQ_SHIFT;
> -			ring = &dev_priv->ring[VCS2];
> -			if (vcs & GT_CONTEXT_SWITCH_INTERRUPT)
> -				intel_lrc_irq_handler(ring);
> -			if (vcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, ring);
> +			if (tmp & (GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT))
> +				intel_lrc_irq_handler(&dev_priv->ring[VCS]);
> +			if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT))
> +				notify_ring(&dev_priv->ring[VCS]);
> +
> +			if (tmp & (GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT))
> +				intel_lrc_irq_handler(&dev_priv->ring[VCS2]);
> +			if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT))
> +				notify_ring(&dev_priv->ring[VCS2]);
>  		} else
>  			DRM_ERROR("The master control interrupt lied (GT1)!\n");
>  	}
>  
> +	if (master_ctl & GEN8_GT_VECS_IRQ) {
> +		u32 tmp = I915_READ_FW(GEN8_GT_IIR(3));
> +		if (tmp) {
> +			I915_WRITE_FW(GEN8_GT_IIR(3), tmp);
> +			ret = IRQ_HANDLED;
> +
> +			if (tmp & (GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT))
> +				intel_lrc_irq_handler(&dev_priv->ring[VECS]);
> +			if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT))
> +				notify_ring(&dev_priv->ring[VECS]);
> +		} else
> +			DRM_ERROR("The master control interrupt lied (GT3)!\n");
> +	}
> +
>  	if (master_ctl & GEN8_GT_PM_IRQ) {
> -		tmp = I915_READ_FW(GEN8_GT_IIR(2));
> +		u32 tmp = I915_READ_FW(GEN8_GT_IIR(2));
>  		if (tmp & dev_priv->pm_rps_events) {
>  			I915_WRITE_FW(GEN8_GT_IIR(2),
>  				      tmp & dev_priv->pm_rps_events);
> @@ -1341,22 +1342,6 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device *dev,
>  			DRM_ERROR("The master control interrupt lied (PM)!\n");
>  	}
>  
> -	if (master_ctl & GEN8_GT_VECS_IRQ) {
> -		tmp = I915_READ_FW(GEN8_GT_IIR(3));
> -		if (tmp) {
> -			I915_WRITE_FW(GEN8_GT_IIR(3), tmp);
> -			ret = IRQ_HANDLED;
> -
> -			vcs = tmp >> GEN8_VECS_IRQ_SHIFT;
> -			ring = &dev_priv->ring[VECS];
> -			if (vcs & GT_CONTEXT_SWITCH_INTERRUPT)
> -				intel_lrc_irq_handler(ring);
> -			if (vcs & GT_RENDER_USER_INTERRUPT)
> -				notify_ring(dev, ring);
> -		} else
> -			DRM_ERROR("The master control interrupt lied (GT3)!\n");
> -	}
> -
>  	return ret;
>  }
>  
> @@ -1651,7 +1636,7 @@ static void gen6_rps_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir)
>  
>  	if (HAS_VEBOX(dev_priv->dev)) {
>  		if (pm_iir & PM_VEBOX_USER_INTERRUPT)
> -			notify_ring(dev_priv->dev, &dev_priv->ring[VECS]);
> +			notify_ring(&dev_priv->ring[VECS]);
>  
>  		if (pm_iir & PM_VEBOX_CS_ERROR_INTERRUPT)
>  			DRM_DEBUG("Command parser error, pm_iir 0x%08x\n", pm_iir);
> @@ -1845,7 +1830,7 @@ static irqreturn_t cherryview_irq_handler(int irq, void *arg)
>  			I915_WRITE(VLV_IIR, iir);
>  		}
>  
> -		gen8_gt_irq_handler(dev, dev_priv, master_ctl);
> +		gen8_gt_irq_handler(dev_priv, master_ctl);
>  
>  		/* Call regardless, as some status bits might not be
>  		 * signalled in iir */
> @@ -2187,7 +2172,7 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
>  
>  	/* Find, clear, then process each source of interrupt */
>  
> -	ret = gen8_gt_irq_handler(dev, dev_priv, master_ctl);
> +	ret = gen8_gt_irq_handler(dev_priv, master_ctl);
>  
>  	if (master_ctl & GEN8_DE_MISC_IRQ) {
>  		tmp = I915_READ(GEN8_DE_MISC_IIR);
> @@ -3692,7 +3677,7 @@ static irqreturn_t i8xx_irq_handler(int irq, void *arg)
>  		new_iir = I915_READ16(IIR); /* Flush posted writes */
>  
>  		if (iir & I915_USER_INTERRUPT)
> -			notify_ring(dev, &dev_priv->ring[RCS]);
> +			notify_ring(&dev_priv->ring[RCS]);
>  
>  		for_each_pipe(dev_priv, pipe) {
>  			int plane = pipe;
> @@ -3883,7 +3868,7 @@ static irqreturn_t i915_irq_handler(int irq, void *arg)
>  		new_iir = I915_READ(IIR); /* Flush posted writes */
>  
>  		if (iir & I915_USER_INTERRUPT)
> -			notify_ring(dev, &dev_priv->ring[RCS]);
> +			notify_ring(&dev_priv->ring[RCS]);
>  
>  		for_each_pipe(dev_priv, pipe) {
>  			int plane = pipe;
> @@ -4110,9 +4095,9 @@ static irqreturn_t i965_irq_handler(int irq, void *arg)
>  		new_iir = I915_READ(IIR); /* Flush posted writes */
>  
>  		if (iir & I915_USER_INTERRUPT)
> -			notify_ring(dev, &dev_priv->ring[RCS]);
> +			notify_ring(&dev_priv->ring[RCS]);
>  		if (iir & I915_BSD_USER_INTERRUPT)
> -			notify_ring(dev, &dev_priv->ring[VCS]);
> +			notify_ring(&dev_priv->ring[VCS]);
>  
>  		for_each_pipe(dev_priv, pipe) {
>  			if (pipe_stats[pipe] & PIPE_START_VBLANK_INTERRUPT_STATUS &&
> -- 
> 2.1.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 44/70] drm/i915: Prefer to check for idleness in worker rather than sync-flush
  2015-04-07 15:21 ` [PATCH 44/70] drm/i915: Prefer to check for idleness in worker rather than sync-flush Chris Wilson
@ 2015-04-10  8:37   ` Daniel Vetter
  0 siblings, 0 replies; 113+ messages in thread
From: Daniel Vetter @ 2015-04-10  8:37 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Apr 07, 2015 at 04:21:08PM +0100, Chris Wilson wrote:
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Queued for -next, thanks for the patch.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 9511993daeea..c394c0d13eb7 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2570,7 +2570,6 @@ int __i915_add_request(struct intel_engine_cs *ring,
>  
>  	i915_queue_hangcheck(ring->dev);
>  
> -	cancel_delayed_work_sync(&dev_priv->mm.idle_work);
>  	queue_delayed_work(dev_priv->wq,
>  			   &dev_priv->mm.retire_work,
>  			   round_jiffies_up_relative(HZ));
> @@ -2908,6 +2907,12 @@ i915_gem_idle_work_handler(struct work_struct *work)
>  	struct drm_i915_private *dev_priv =
>  		container_of(work, typeof(*dev_priv), mm.idle_work.work);
>  	struct drm_device *dev = dev_priv->dev;
> +	struct intel_engine_cs *ring;
> +	int i;
> +
> +	for_each_ring(ring, dev_priv, i)
> +		if (!list_empty(&ring->request_list))
> +			return;
>  
>  	intel_mark_idle(dev);
>  
> -- 
> 2.1.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 45/70] drm/i915: Remove request->uniq
  2015-04-07 15:21 ` [PATCH 45/70] drm/i915: Remove request->uniq Chris Wilson
@ 2015-04-10  8:38   ` Daniel Vetter
  0 siblings, 0 replies; 113+ messages in thread
From: Daniel Vetter @ 2015-04-10  8:38 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, Jani Nikula

On Tue, Apr 07, 2015 at 04:21:09PM +0100, Chris Wilson wrote:
> We already assign a unique identifier to every request: seqno. That
> someone felt like adding a second one without even mentioning why and
> tweaking ABI smells very fishy.
> 
> Fixes regression from
> commit b3a38998f042b862f5ba4d7f2268f3a8dfb4883a
> Author: Nick Hoath <nicholas.hoath@intel.com>
> Date:   Thu Feb 19 16:30:47 2015 +0000
> 
>     drm/i915: Fix a use after free, and unbalanced refcounting
> 
> v2: Rebase
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Nick Hoath <nicholas.hoath@intel.com>
> Cc: Thomas Daniel <thomas.daniel@intel.com>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: Jani Nikula <jani.nikula@intel.com>

Queued for -next, thanks for the patch.
-Daniel
> ---
>  drivers/gpu/drm/i915/i915_drv.h   |  4 ----
>  drivers/gpu/drm/i915/i915_gem.c   |  1 -
>  drivers/gpu/drm/i915/i915_trace.h | 13 ++++---------
>  3 files changed, 4 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 262ebb620112..89839751237c 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1843,8 +1843,6 @@ struct drm_i915_private {
>  		void (*stop_ring)(struct intel_engine_cs *ring);
>  	} gt;
>  
> -	uint32_t request_uniq;
> -
>  	/*
>  	 * NOTE: This is the dri1/ums dungeon, don't add stuff here. Your patch
>  	 * will be rejected. Instead look for a better place.
> @@ -2120,8 +2118,6 @@ struct drm_i915_gem_request {
>  	/** process identifier submitting this request */
>  	struct pid *pid;
>  
> -	uint32_t uniq;
> -
>  	/**
>  	 * The ELSP only accepts two elements at a time, so we queue
>  	 * context/tail pairs on a given queue (ring->execlist_queue) until the
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c394c0d13eb7..e90894545fa4 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2660,7 +2660,6 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>  	}
>  
>  	rq->ring = ring;
> -	rq->uniq = dev_priv->request_uniq++;
>  
>  	if (i915.enable_execlists)
>  		ret = intel_logical_ring_alloc_request_extras(rq, ctx);
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index ce8ee9e8bced..6e2eee52aaa2 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -499,7 +499,6 @@ DECLARE_EVENT_CLASS(i915_gem_request,
>  	    TP_STRUCT__entry(
>  			     __field(u32, dev)
>  			     __field(u32, ring)
> -			     __field(u32, uniq)
>  			     __field(u32, seqno)
>  			     ),
>  
> @@ -508,13 +507,11 @@ DECLARE_EVENT_CLASS(i915_gem_request,
>  						i915_gem_request_get_ring(req);
>  			   __entry->dev = ring->dev->primary->index;
>  			   __entry->ring = ring->id;
> -			   __entry->uniq = req ? req->uniq : 0;
>  			   __entry->seqno = i915_gem_request_get_seqno(req);
>  			   ),
>  
> -	    TP_printk("dev=%u, ring=%u, uniq=%u, seqno=%u",
> -		      __entry->dev, __entry->ring, __entry->uniq,
> -		      __entry->seqno)
> +	    TP_printk("dev=%u, ring=%u, seqno=%u",
> +		      __entry->dev, __entry->ring, __entry->seqno)
>  );
>  
>  DEFINE_EVENT(i915_gem_request, i915_gem_request_add,
> @@ -559,7 +556,6 @@ TRACE_EVENT(i915_gem_request_wait_begin,
>  	    TP_STRUCT__entry(
>  			     __field(u32, dev)
>  			     __field(u32, ring)
> -			     __field(u32, uniq)
>  			     __field(u32, seqno)
>  			     __field(bool, blocking)
>  			     ),
> @@ -575,14 +571,13 @@ TRACE_EVENT(i915_gem_request_wait_begin,
>  						i915_gem_request_get_ring(req);
>  			   __entry->dev = ring->dev->primary->index;
>  			   __entry->ring = ring->id;
> -			   __entry->uniq = req ? req->uniq : 0;
>  			   __entry->seqno = i915_gem_request_get_seqno(req);
>  			   __entry->blocking =
>  				     mutex_is_locked(&ring->dev->struct_mutex);
>  			   ),
>  
> -	    TP_printk("dev=%u, ring=%u, uniq=%u, seqno=%u, blocking=%s",
> -		      __entry->dev, __entry->ring, __entry->uniq,
> +	    TP_printk("dev=%u, ring=%u, seqno=%u, blocking=%s",
> +		      __entry->dev, __entry->ring,
>  		      __entry->seqno, __entry->blocking ?  "yes (NB)" : "no")
>  );
>  
> -- 
> 2.1.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 47/70] drm/i915: Allocate context objects from stolen
  2015-04-07 15:21 ` [PATCH 47/70] drm/i915: Allocate context objects from stolen Chris Wilson
@ 2015-04-10  8:39   ` Daniel Vetter
  0 siblings, 0 replies; 113+ messages in thread
From: Daniel Vetter @ 2015-04-10  8:39 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Apr 07, 2015 at 04:21:11PM +0100, Chris Wilson wrote:
> As we never expose context objects directly to userspace, we can forgo
> allocating a first-class GEM object for them and prefer to use the
> limited resource of reserved/stolen memory for them. Note this means
> that their initial contents are undefined.
> 
> However, a downside of using stolen objects for execlists is that we
> cannot access the physical address directly (thanks MCH!) which prevents
> their use.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Queued for -next, thanks for the patch.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 4 +++-
>  drivers/gpu/drm/i915/intel_lrc.c        | 2 +-
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 18900f745bc6..b9c6b0ad1d0f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -157,7 +157,9 @@ i915_gem_alloc_context_obj(struct drm_device *dev, size_t size)
>  	struct drm_i915_gem_object *obj;
>  	int ret;
>  
> -	obj = i915_gem_alloc_object(dev, size);
> +	obj = i915_gem_object_create_stolen(dev, size);
> +	if (obj == NULL)
> +		obj = i915_gem_alloc_object(dev, size);
>  	if (obj == NULL)
>  		return ERR_PTR(-ENOMEM);
>  
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index fc57d4111e56..a62ffaa45bd1 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1711,7 +1711,7 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
>  
>  	context_size = round_up(get_lr_context_size(ring), 4096);
>  
> -	ctx_obj = i915_gem_alloc_context_obj(dev, context_size);
> +	ctx_obj = i915_gem_alloc_object(dev, context_size);
>  	if (IS_ERR(ctx_obj)) {
>  		ret = PTR_ERR(ctx_obj);
>  		DRM_DEBUG_DRIVER("Alloc LRC backing obj failed: %d\n", ret);
> -- 
> 2.1.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 50/70] drm/i915: The argument for postfix is redundant
  2015-04-07 15:21 ` [PATCH 50/70] drm/i915: The argument for postfix is redundant Chris Wilson
@ 2015-04-10  8:53   ` Daniel Vetter
  2015-04-10  9:00     ` Chris Wilson
  0 siblings, 1 reply; 113+ messages in thread
From: Daniel Vetter @ 2015-04-10  8:53 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Apr 07, 2015 at 04:21:14PM +0100, Chris Wilson wrote:
> We are conservative on the amount of free space available in the ring to
> avoid overrunning the potential MI_INTERRUPT after the seqno write.
> Further undermining the justification for the change was that it was
> applied incorrectly.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Hm, where are we conservative with our estimates? Could we just wait for
req->head instead? And I don't see what's been implemented wrongly with
postfix.

Looking at

commit a71d8d94525e8fd855c0466fb586ae1cb008f3a2
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Feb 15 11:25:36 2012 +0000

    drm/i915: Record the tail at each request and use it to estimate the head

->postfix does get updated where ->tail was, Nick just renamed it from
->tail to ->postfix since execlist used ->tail with a different meaning.

Or do I miss something?

> diff --git a/drivers/gpu/drm/i915/intel_dvo.c b/drivers/gpu/drm/i915/intel_dvo.c
> index 9a27ec7100ef..f45caa6af7d2 100644
> --- a/drivers/gpu/drm/i915/intel_dvo.c
> +++ b/drivers/gpu/drm/i915/intel_dvo.c
> @@ -496,7 +496,7 @@ void intel_dvo_init(struct drm_device *dev)
>  		int gpio;
>  		bool dvoinit;
>  		enum pipe pipe;
> -		uint32_t dpll[2];
> +		uint32_t dpll[I915_MAX_PIPES];

Unrelated change, and there are indeed only 2 DVO PLLs ever on gen2.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 50/70] drm/i915: The argument for postfix is redundant
  2015-04-10  8:53   ` Daniel Vetter
@ 2015-04-10  9:00     ` Chris Wilson
  2015-04-10  9:32       ` Daniel Vetter
  0 siblings, 1 reply; 113+ messages in thread
From: Chris Wilson @ 2015-04-10  9:00 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Fri, Apr 10, 2015 at 10:53:02AM +0200, Daniel Vetter wrote:
> On Tue, Apr 07, 2015 at 04:21:14PM +0100, Chris Wilson wrote:
> > We are conservative on the amount of free space available in the ring to
> > avoid overrunning the potential MI_INTERRUPT after the seqno write.
> > Further undermining the justification for the change was that it was
> > applied incorrectly.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Hm, where are we conservative with our estimates? Could we just wait for
> req->head instead? And I don't see what's been implemented wrongly with
> postfix.

It took more patches for it to get fixed, when it wasn't actually broken, as
the calculation of space remaining is conservative. req->head would be a
reasonable compromise compared to adding another variable, and the extra bit
of hysteresis here would probably be useful. Hmm.

> Looking at
> 
> commit a71d8d94525e8fd855c0466fb586ae1cb008f3a2
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Wed Feb 15 11:25:36 2012 +0000
> 
>     drm/i915: Record the tail at each request and use it to estimate the head
> 
> ->postfix does get updated where ->tail was, Nick just renamed it from
> ->tail to ->postfix since execlist used ->tail with a different meaning.
> 
> Or do I miss something?

tail was always the position of the end of the request.
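
For reference, the conservative estimate works roughly like this: when
the ring looks full, HEAD is taken from the recorded position of the
oldest completed request rather than read live from the hardware, so
the driver may under-report the free space but never over-reports it.
A sketch of the arithmetic (illustrative only, modelled on the
ring-space helper visible later in this thread):

static int ring_space(int head, int tail, int size)
{
	int space = head - tail;	/* bytes behind the estimated HEAD */
	if (space <= 0)
		space += size;		/* wrap around the ring */
	return space;	/* the driver additionally reserves a small slack */
}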
 
> > diff --git a/drivers/gpu/drm/i915/intel_dvo.c b/drivers/gpu/drm/i915/intel_dvo.c
> > index 9a27ec7100ef..f45caa6af7d2 100644
> > --- a/drivers/gpu/drm/i915/intel_dvo.c
> > +++ b/drivers/gpu/drm/i915/intel_dvo.c
> > @@ -496,7 +496,7 @@ void intel_dvo_init(struct drm_device *dev)
> >  		int gpio;
> >  		bool dvoinit;
> >  		enum pipe pipe;
> > -		uint32_t dpll[2];
> > +		uint32_t dpll[I915_MAX_PIPES];
> 
> Unrelated change, and there are indeed only 2 DVO PLLs ever on gen2.

Accidental squashing of a compiler-warning fix.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 63/70] drm/i915: Reduce locking inside swfinish ioctl
  2015-04-07 16:28   ` [PATCH 63/70] drm/i915: Reduce locking inside swfinish ioctl Chris Wilson
@ 2015-04-10  9:14     ` Daniel Vetter
  2015-04-15  9:03       ` Chris Wilson
  0 siblings, 1 reply; 113+ messages in thread
From: Daniel Vetter @ 2015-04-10  9:14 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Apr 7, 2015 at 6:28 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>         /* Pinned buffers may be scanout, so flush the cache */
> -       if (obj->pin_display)
> +       if (obj->pin_display) {
> +               ret = i915_mutex_lock_interruptible(dev);
> +               if (ret)
> +                       goto unref;

I think an ACCESS_ONCE here and in the previous one would be good.
Wanted to do that and merge both, but they seem to conflict with the
lack of read-read ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 50/70] drm/i915: The argument for postfix is redundant
  2015-04-10  9:00     ` Chris Wilson
@ 2015-04-10  9:32       ` Daniel Vetter
  2015-04-10  9:45         ` Chris Wilson
  0 siblings, 1 reply; 113+ messages in thread
From: Daniel Vetter @ 2015-04-10  9:32 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, intel-gfx

On Fri, Apr 10, 2015 at 10:00:33AM +0100, Chris Wilson wrote:
> On Fri, Apr 10, 2015 at 10:53:02AM +0200, Daniel Vetter wrote:
> > On Tue, Apr 07, 2015 at 04:21:14PM +0100, Chris Wilson wrote:
> > > We are conservative on the amount of free space available in the ring to
> > > avoid overrunning the potential MI_INTERRUPT after the seqno write.
> > > Further undermining the justification for the change was that it was
> > > applied incorrectly.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > 
> > Hm, where are we conservative with our estimates? Could we just wait for
> > req->head instead? And I don't see what's been implemented wrongly with
> > postfix.
> 
> It took more patches for it to get fixed, when it wasn't actually broken as
> the calculation of space remaining is conservative. req->head would be a
> reasonable compromise to the addition of another variable, and the extra bit
> of hysteresis here would probably be useful. Hmm.
> 
> > Looking at
> > 
> > commit a71d8d94525e8fd855c0466fb586ae1cb008f3a2
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Wed Feb 15 11:25:36 2012 +0000
> > 
> >     drm/i915: Record the tail at each request and use it to estimate the head
> > 
> > ->postfix does get updated where ->tail was, Nick just renamed it from
> > ->tail to ->postfix since execlist used ->tail with a different meaning.
> > 
> > Or do I miss something?
> 
> tail was always the position of the end of the request.

In the above referenced commit request->tail is sampled _before_ we call
->add_request and all that stuff. That seems to have changed in

commit 6d3d8274bc45de4babb62d64562d92af984dd238
Author: Nick Hoath <nicholas.hoath@intel.com>
Date:   Thu Jan 15 13:10:39 2015 +0000

    drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request

by essentially doing an s/request->tail/request->postfix/ and adding a new
request->tail (real tail) to satisfy execlist.
-Daniel
>  
> > > diff --git a/drivers/gpu/drm/i915/intel_dvo.c b/drivers/gpu/drm/i915/intel_dvo.c
> > > index 9a27ec7100ef..f45caa6af7d2 100644
> > > --- a/drivers/gpu/drm/i915/intel_dvo.c
> > > +++ b/drivers/gpu/drm/i915/intel_dvo.c
> > > @@ -496,7 +496,7 @@ void intel_dvo_init(struct drm_device *dev)
> > >  		int gpio;
> > >  		bool dvoinit;
> > >  		enum pipe pipe;
> > > -		uint32_t dpll[2];
> > > +		uint32_t dpll[I915_MAX_PIPES];
> > 
> > Unrelated change, and there are indeed only 2 DVO PLLs ever on gen2.
> 
> Accidental squashing of a compiler-warning fix.
> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 50/70] drm/i915: The argument for postfix is redundant
  2015-04-10  9:32       ` Daniel Vetter
@ 2015-04-10  9:45         ` Chris Wilson
  0 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-10  9:45 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Fri, Apr 10, 2015 at 11:32:36AM +0200, Daniel Vetter wrote:
> On Fri, Apr 10, 2015 at 10:00:33AM +0100, Chris Wilson wrote:
> > On Fri, Apr 10, 2015 at 10:53:02AM +0200, Daniel Vetter wrote:
> > > On Tue, Apr 07, 2015 at 04:21:14PM +0100, Chris Wilson wrote:
> > > > We are conservative on the amount of free space available in the ring to
> > > > avoid overrunning the potential MI_INTERRUPT after the seqno write.
> > > > Further undermining the justification for the change was that it was
> > > > applied incorrectly.
> > > > 
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > 
> > > Hm, where are we conservative with our estimates? Could we just wait for
> > > req->head instead? And I don't see what's been implemented wrongly with
> > > postfix.
> > 
> > It took more patches for it to get fixed, when it wasn't actually broken as
> > the calculation of space remaining is conservative. req->head would be a
> > reasonable compromise to the addition of another variable, and the extra bit
> > of hysteresis here would probably be useful. Hmm.
> > 
> > > Looking at
> > > 
> > > commit a71d8d94525e8fd855c0466fb586ae1cb008f3a2
> > > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > > Date:   Wed Feb 15 11:25:36 2012 +0000
> > > 
> > >     drm/i915: Record the tail at each request and use it to estimate the head
> > > 
> > > ->postfix does get updated where ->tail was, Nick just renamed it from
> > > ->tail to ->postfix since execlist used ->tail with a different meaning.
> > > 
> > > Or do I miss something?
> > 
> > tail was always the position of the end of the request.
> 
> In the above referenced commit request->tail is sampled _before_ we call
> ->add_request and all that stuff. That seems to have changed in
> 
> commit 6d3d8274bc45de4babb62d64562d92af984dd238
> Author: Nick Hoath <nicholas.hoath@intel.com>
> Date:   Thu Jan 15 13:10:39 2015 +0000
> 
>     drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request
> 
> by essentially doing an s/request->tail/request->postfix/ and adding a new
> request->tail (real tail) to satisfy execlist.

Oh sorry, execlists was broken. Quelle surprise. The model should have
been to add the breadcrumb, grab the tail for the end of the request,
then submit to the engine:

http://cgit.freedesktop.org/~ickle/linux-2.6/tree/drivers/gpu/drm/i915/i915_gem_request.c#n378
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 17/70] drm/i915: Optimistically spin for the request completion
  2015-04-07 15:20 ` [PATCH 17/70] drm/i915: Optimistically spin for the request completion Chris Wilson
  2015-04-08 11:39   ` Daniel Vetter
@ 2015-04-13 11:34   ` Tvrtko Ursulin
  2015-04-13 12:25     ` Daniel Vetter
  1 sibling, 1 reply; 113+ messages in thread
From: Tvrtko Ursulin @ 2015-04-13 11:34 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Daniel Vetter, Eero Tamminen, Rantala, Valtteri


Hi,

On 04/07/2015 04:20 PM, Chris Wilson wrote:
> This provides a nice boost to mesa in swap bound scenarios (as mesa
> throttles itself to the previous frame and given the scenario that will
> complete shortly). It will also provide a good boost to systems running
> with semaphores disabled and so frequently waiting on the GPU as it
> switches rings. In the most favourable of microbenchmarks, this can
> increase performance by around 15% - though in practice improvements
> will be marginal and rarely noticeable.
>
> v2: Account for user timeouts
> v3: Limit the spinning to a single jiffie (~1ms) at most. On an
> otherwise idle system, there is no scheduler contention and so without a
> limit we would spin until the GPU is ready.
> v4: Drop forcewake - the lazy coherent access doesn't require it, and we
> have no reason to believe that the forcewake itself improves seqno
> coherency - it only adds delay.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>

I already said that I already gave my r-b for this one. :)

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 66/70] drm/i915: Remove obj->pin_mappable
  2015-04-07 16:28   ` [PATCH 66/70] drm/i915: Remove obj->pin_mappable Chris Wilson
@ 2015-04-13 11:35     ` Tvrtko Ursulin
  2015-04-13 12:30       ` Daniel Vetter
  0 siblings, 1 reply; 113+ messages in thread
From: Tvrtko Ursulin @ 2015-04-13 11:35 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 04/07/2015 05:28 PM, Chris Wilson wrote:
> The obj->pin_mappable flag only exists for debug purposes and is a
> hindrance that is mistreated with rotated GGTT views. For debug
> purposes, it suffices to mark objects with pin_display as being of note.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c | 6 +++---
>   drivers/gpu/drm/i915/i915_drv.h     | 1 -
>   drivers/gpu/drm/i915/i915_gem.c     | 6 +-----
>   3 files changed, 4 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 2e851c6a310c..6508eec3cf60 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -166,9 +166,9 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>   	}
>   	if (obj->stolen)
>   		seq_printf(m, " (stolen: %08llx)", obj->stolen->start);
> -	if (obj->pin_mappable || obj->fault_mappable) {
> +	if (obj->pin_display || obj->fault_mappable) {
>   		char s[3], *t = s;
> -		if (obj->pin_mappable)
> +		if (obj->pin_display)
>   			*t++ = 'p';
>   		if (obj->fault_mappable)
>   			*t++ = 'f';
> @@ -464,7 +464,7 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
>   			size += i915_gem_obj_ggtt_size(obj);
>   			++count;
>   		}
> -		if (obj->pin_mappable) {
> +		if (obj->pin_display) {
>   			mappable_size += i915_gem_obj_ggtt_size(obj);
>   			++mappable_count;
>   		}
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index eeffefa10612..2c72ee0214b5 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1980,7 +1980,6 @@ struct drm_i915_gem_object {
>   	 * accurate mappable working set.
>   	 */
>   	unsigned int fault_mappable:1;
> -	unsigned int pin_mappable:1;
>   	unsigned int pin_display:1;
>
>   	/*
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index bd60bb552920..3d4463930267 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4445,9 +4445,6 @@ i915_gem_object_do_pin(struct drm_i915_gem_object *obj,
>   	WARN_ON(flags & PIN_MAPPABLE && !obj->map_and_fenceable);
>
>   	vma->pin_count++;
> -	if (flags & PIN_MAPPABLE)
> -		obj->pin_mappable |= true;
> -
>   	return 0;
>   }
>
> @@ -4487,8 +4484,7 @@ i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
>   	WARN_ON(vma->pin_count == 0);
>   	WARN_ON(!i915_gem_obj_ggtt_bound_view(obj, view));
>
> -	if (--vma->pin_count == 0 && view->type == I915_GGTT_VIEW_NORMAL)
> -		obj->pin_mappable = false;
> +	--vma->pin_count;
>   }
>
>   bool
>

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 17/70] drm/i915: Optimistically spin for the request completion
  2015-04-13 11:34   ` Tvrtko Ursulin
@ 2015-04-13 12:25     ` Daniel Vetter
  0 siblings, 0 replies; 113+ messages in thread
From: Daniel Vetter @ 2015-04-13 12:25 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, intel-gfx, Rantala, Valtteri, Eero Tamminen

On Mon, Apr 13, 2015 at 12:34:19PM +0100, Tvrtko Ursulin wrote:
> 
> Hi,
> 
> On 04/07/2015 04:20 PM, Chris Wilson wrote:
> >This provides a nice boost to mesa in swap bound scenarios (as mesa
> >throttles itself to the previous frame and given the scenario that will
> >complete shortly). It will also provide a good boost to systems running
> >with semaphores disabled and so frequently waiting on the GPU as it
> >switches rings. In the most favourable of microbenchmarks, this can
> >increase performance by around 15% - though in practice improvements
> >will be marginal and rarely noticeable.
> >
> >v2: Account for user timeouts
> >v3: Limit the spinning to a single jiffie (~1ms) at most. On an
> >otherwise idle system, there is no scheduler contention and so without a
> >limit we would spin until the GPU is ready.
> >v4: Drop forcewake - the lazy coherent access doesn't require it, and we
> >have no reason to believe that the forcewake itself improves seqno
> >coherency - it only adds delay.
> >
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> >Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> >Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> >Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
> 
> I already said that I already gave my r-b for this one. :)
> 
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Queued for -next, thanks for the patch.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 66/70] drm/i915: Remove obj->pin_mappable
  2015-04-13 11:35     ` Tvrtko Ursulin
@ 2015-04-13 12:30       ` Daniel Vetter
  0 siblings, 0 replies; 113+ messages in thread
From: Daniel Vetter @ 2015-04-13 12:30 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Mon, Apr 13, 2015 at 12:35:53PM +0100, Tvrtko Ursulin wrote:
> 
> On 04/07/2015 05:28 PM, Chris Wilson wrote:
> >The obj->pin_mappable flag only exists for debug purposes and is a
> >hindrance that is mistreated with rotated GGTT views. For debug
> >purposes, it suffices to mark objects with pin_display as being of note.
> >
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >---
> >  drivers/gpu/drm/i915/i915_debugfs.c | 6 +++---
> >  drivers/gpu/drm/i915/i915_drv.h     | 1 -
> >  drivers/gpu/drm/i915/i915_gem.c     | 6 +-----
> >  3 files changed, 4 insertions(+), 9 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> >index 2e851c6a310c..6508eec3cf60 100644
> >--- a/drivers/gpu/drm/i915/i915_debugfs.c
> >+++ b/drivers/gpu/drm/i915/i915_debugfs.c
> >@@ -166,9 +166,9 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
> >  	}
> >  	if (obj->stolen)
> >  		seq_printf(m, " (stolen: %08llx)", obj->stolen->start);
> >-	if (obj->pin_mappable || obj->fault_mappable) {
> >+	if (obj->pin_display || obj->fault_mappable) {
> >  		char s[3], *t = s;
> >-		if (obj->pin_mappable)
> >+		if (obj->pin_display)
> >  			*t++ = 'p';
> >  		if (obj->fault_mappable)
> >  			*t++ = 'f';
> >@@ -464,7 +464,7 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
> >  			size += i915_gem_obj_ggtt_size(obj);
> >  			++count;
> >  		}
> >-		if (obj->pin_mappable) {
> >+		if (obj->pin_display) {
> >  			mappable_size += i915_gem_obj_ggtt_size(obj);
> >  			++mappable_count;
> >  		}
> >diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >index eeffefa10612..2c72ee0214b5 100644
> >--- a/drivers/gpu/drm/i915/i915_drv.h
> >+++ b/drivers/gpu/drm/i915/i915_drv.h
> >@@ -1980,7 +1980,6 @@ struct drm_i915_gem_object {
> >  	 * accurate mappable working set.
> >  	 */
> >  	unsigned int fault_mappable:1;
> >-	unsigned int pin_mappable:1;
> >  	unsigned int pin_display:1;
> >
> >  	/*
> >diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >index bd60bb552920..3d4463930267 100644
> >--- a/drivers/gpu/drm/i915/i915_gem.c
> >+++ b/drivers/gpu/drm/i915/i915_gem.c
> >@@ -4445,9 +4445,6 @@ i915_gem_object_do_pin(struct drm_i915_gem_object *obj,
> >  	WARN_ON(flags & PIN_MAPPABLE && !obj->map_and_fenceable);
> >
> >  	vma->pin_count++;
> >-	if (flags & PIN_MAPPABLE)
> >-		obj->pin_mappable |= true;
> >-
> >  	return 0;
> >  }
> >
> >@@ -4487,8 +4484,7 @@ i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
> >  	WARN_ON(vma->pin_count == 0);
> >  	WARN_ON(!i915_gem_obj_ggtt_bound_view(obj, view));
> >
> >-	if (--vma->pin_count == 0 && view->type == I915_GGTT_VIEW_NORMAL)
> >-		obj->pin_mappable = false;
> >+	--vma->pin_count;
> >  }
> >
> >  bool
> >
> 
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Queued for -next, thanks for the patch.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 18/70] drm/i915: Implement inter-engine read-read optimisations
  2015-04-07 15:20 ` [PATCH 18/70] drm/i915: Implement inter-engine read-read optimisations Chris Wilson
@ 2015-04-14 13:51   ` Tvrtko Ursulin
  2015-04-14 14:00     ` Chris Wilson
  0 siblings, 1 reply; 113+ messages in thread
From: Tvrtko Ursulin @ 2015-04-14 13:51 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Lionel Landwerlin


On 04/07/2015 04:20 PM, Chris Wilson wrote:
> Currently, we only track the last request globally across all engines.
> This prevents us from issuing concurrent read requests on e.g. the RCS
> and BCS engines (or more likely the render and media engines). Without
> semaphores, we incur costly stalls as we synchronise between rings -
> greatly impacting the current performance of Broadwell versus Haswell in
> certain workloads (like video decode). With the introduction of
> reference counted requests, it is much easier to track the last request
> per ring, as well as the last global write request so that we can
> optimise inter-engine read read requests (as well as better optimise
> certain CPU waits).
>
> v2: Fix inverted readonly condition for nonblocking waits.
> v3: Handle non-contiguous engine array after waits
> v4: Rebase, tidy, rewrite ring list debugging
> v5: Use obj->active as a bitfield, it looks cool
> v6: Micro-optimise, mostly involving moving code around
> v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf)
> v8: Rebase
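
Sketch of the per-object tracking the message describes (field names
are illustrative, inferred from the commit message rather than copied
from the patch):

	/* one outstanding read per engine ... */
	struct drm_i915_gem_request *last_read_req[I915_NUM_RINGS];
	/* ... but still only a single outstanding write */
	struct drm_i915_gem_request *last_write_req;
	/* v5: obj->active becomes a bitmask of engines with reads */
	unsigned active;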

I am still slightly concerned about the sequential ring request waiting in
combination with optimistic spinning, but other than that it looks good to me:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 18/70] drm/i915: Implement inter-engine read-read optimisations
  2015-04-14 13:51   ` Tvrtko Ursulin
@ 2015-04-14 14:00     ` Chris Wilson
  0 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-14 14:00 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Lionel Landwerlin, intel-gfx

On Tue, Apr 14, 2015 at 02:51:37PM +0100, Tvrtko Ursulin wrote:
> 
> On 04/07/2015 04:20 PM, Chris Wilson wrote:
> >Currently, we only track the last request globally across all engines.
> >This prevents us from issuing concurrent read requests on e.g. the RCS
> >and BCS engines (or more likely the render and media engines). Without
> >semaphores, we incur costly stalls as we synchronise between rings -
> >greatly impacting the current performance of Broadwell versus Haswell in
> >certain workloads (like video decode). With the introduction of
> >reference counted requests, it is much easier to track the last request
> >per ring, as well as the last global write request so that we can
> >optimise inter-engine read read requests (as well as better optimise
> >certain CPU waits).
> >
> >v2: Fix inverted readonly condition for nonblocking waits.
> >v3: Handle non-contiguous engine array after waits
> >v4: Rebase, tidy, rewrite ring list debugging
> >v5: Use obj->active as a bitfield, it looks cool
> >v6: Micro-optimise, mostly involving moving code around
> >v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf)
> >v8: Rebase
> 
> I am still slightly concerned about the sequential ring request waiting
> in combination with optimistic spinning, but other than that it looks
> good to me:

I hear you; I don't yet have a scenario where I care, but with a little
more refactoring (see next version) extending i915_wait_request to work
on an array of requests will be a reasonably easy task.
 
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Thanks, but I have a new version on its way with minor changes.

Spotted an issue with Ironlake and do_idling(), and did some slight
refactoring.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 61/70] drm/i915: Make fb_tracking.lock a spinlock
  2015-04-07 16:28   ` [PATCH 61/70] drm/i915: Make fb_tracking.lock a spinlock Chris Wilson
@ 2015-04-14 14:52     ` Tvrtko Ursulin
  2015-04-14 15:05       ` Chris Wilson
  0 siblings, 1 reply; 113+ messages in thread
From: Tvrtko Ursulin @ 2015-04-14 14:52 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 04/07/2015 05:28 PM, Chris Wilson wrote:
> We only need a very lightweight mechanism here as the locking is only
> used for co-ordinating a bitfield.
>
> Also double check that the object is still pinned to the display plane
> before processing the state change.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_drv.h          |  2 +-
>   drivers/gpu/drm/i915/i915_gem.c          |  2 +-
>   drivers/gpu/drm/i915/intel_frontbuffer.c | 40 +++++++++++++++++---------------
>   3 files changed, 23 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 97372869097f..eeffefa10612 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1545,7 +1545,7 @@ struct intel_pipe_crc {
>   };
>
>   struct i915_frontbuffer_tracking {
> -	struct mutex lock;
> +	spinlock_t lock;
>
>   	/*
>   	 * Tracking bits for delayed frontbuffer flushing du to gpu activity or
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index e9f2d2b102de..43baac2c1e20 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -5260,7 +5260,7 @@ i915_gem_load(struct drm_device *dev)
>
>   	i915_gem_shrinker_init(dev_priv);
>
> -	mutex_init(&dev_priv->fb_tracking.lock);
> +	spin_lock_init(&dev_priv->fb_tracking.lock);
>   }
>
>   void i915_gem_release(struct drm_device *dev, struct drm_file *file)
> diff --git a/drivers/gpu/drm/i915/intel_frontbuffer.c b/drivers/gpu/drm/i915/intel_frontbuffer.c
> index a20cffb78c0f..28ce2ab94189 100644
> --- a/drivers/gpu/drm/i915/intel_frontbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_frontbuffer.c
> @@ -139,16 +139,14 @@ void intel_fb_obj_invalidate(struct drm_i915_gem_object *obj,
>
>   	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
>
> -	if (!obj->frontbuffer_bits)
> +	if (!obj->frontbuffer_bits || !obj->pin_display)
>   		return;
>
>   	if (ring) {
> -		mutex_lock(&dev_priv->fb_tracking.lock);
> -		dev_priv->fb_tracking.busy_bits
> -			|= obj->frontbuffer_bits;
> -		dev_priv->fb_tracking.flip_bits
> -			&= ~obj->frontbuffer_bits;
> -		mutex_unlock(&dev_priv->fb_tracking.lock);
> +		spin_lock(&dev_priv->fb_tracking.lock);
> +		dev_priv->fb_tracking.busy_bits |= obj->frontbuffer_bits;
> +		dev_priv->fb_tracking.flip_bits &= ~obj->frontbuffer_bits;
> +		spin_unlock(&dev_priv->fb_tracking.lock);
>   	}
>
>   	intel_mark_fb_busy(dev, obj->frontbuffer_bits, ring);
> @@ -175,9 +173,12 @@ void intel_frontbuffer_flush(struct drm_device *dev,
>   	struct drm_i915_private *dev_priv = dev->dev_private;
>
>   	/* Delay flushing when rings are still busy.*/
> -	mutex_lock(&dev_priv->fb_tracking.lock);
> +	spin_lock(&dev_priv->fb_tracking.lock);
>   	frontbuffer_bits &= ~dev_priv->fb_tracking.busy_bits;
> -	mutex_unlock(&dev_priv->fb_tracking.lock);
> +	spin_unlock(&dev_priv->fb_tracking.lock);

Looks like you could just remove the lock here in the process.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 61/70] drm/i915: Make fb_tracking.lock a spinlock
  2015-04-14 14:52     ` Tvrtko Ursulin
@ 2015-04-14 15:05       ` Chris Wilson
  2015-04-14 15:15         ` Tvrtko Ursulin
  0 siblings, 1 reply; 113+ messages in thread
From: Chris Wilson @ 2015-04-14 15:05 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Tue, Apr 14, 2015 at 03:52:09PM +0100, Tvrtko Ursulin wrote:
> >  	/* Delay flushing when rings are still busy.*/
> >-	mutex_lock(&dev_priv->fb_tracking.lock);
> >+	spin_lock(&dev_priv->fb_tracking.lock);
> >  	frontbuffer_bits &= ~dev_priv->fb_tracking.busy_bits;
> >-	mutex_unlock(&dev_priv->fb_tracking.lock);
> >+	spin_unlock(&dev_priv->fb_tracking.lock);
> 
> Looks like you could just remove the lock here in the process.

...as in we are always protected by struct_mutex? I think Daniel was
planning for a future where that was guaranteed.

Anyway my v2 patch does:

void __intel_fb_obj_invalidate(struct drm_i915_gem_object *obj,
                               struct intel_engine_cs *ring,
                               enum fb_op_origin origin);

static inline void intel_fb_obj_invalidate(struct drm_i915_gem_object *obj,
                                           struct intel_engine_cs *ring,
                                           enum fb_op_origin origin)
{
        if (!obj->frontbuffer_bits || !obj->pin_display)
                return;

        __intel_fb_obj_invalidate(obj, ring, origin);
}


As the function call overhead itself was annoying me in the execbuffer
profiles.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 61/70] drm/i915: Make fb_tracking.lock a spinlock
  2015-04-14 15:05       ` Chris Wilson
@ 2015-04-14 15:15         ` Tvrtko Ursulin
  0 siblings, 0 replies; 113+ messages in thread
From: Tvrtko Ursulin @ 2015-04-14 15:15 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 04/14/2015 04:05 PM, Chris Wilson wrote:
> On Tue, Apr 14, 2015 at 03:52:09PM +0100, Tvrtko Ursulin wrote:
>>>   	/* Delay flushing when rings are still busy.*/
>>> -	mutex_lock(&dev_priv->fb_tracking.lock);
>>> +	spin_lock(&dev_priv->fb_tracking.lock);
>>>   	frontbuffer_bits &= ~dev_priv->fb_tracking.busy_bits;
>>> -	mutex_unlock(&dev_priv->fb_tracking.lock);
>>> +	spin_unlock(&dev_priv->fb_tracking.lock);
>>
>> Looks like you could just remove the lock here in the process.
>
> ...as in we are always protected by struct_mutex? I think Daniel was
> planning for a future where that was guaranteed.

No, it always looks to be updated with a single write - so I don't see
why we need a lock for this read?
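
The lockless read being suggested would look something like this
(sketch only; it tolerates a slightly stale value):

	/* busy_bits is updated as a whole word, so sample it unlocked */
	frontbuffer_bits &= ~READ_ONCE(dev_priv->fb_tracking.busy_bits);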

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 63/70] drm/i915: Reduce locking inside swfinish ioctl
  2015-04-10  9:14     ` Daniel Vetter
@ 2015-04-15  9:03       ` Chris Wilson
  2015-04-15  9:33         ` Daniel Vetter
  0 siblings, 1 reply; 113+ messages in thread
From: Chris Wilson @ 2015-04-15  9:03 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Fri, Apr 10, 2015 at 11:14:56AM +0200, Daniel Vetter wrote:
> On Tue, Apr 7, 2015 at 6:28 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> >         /* Pinned buffers may be scanout, so flush the cache */
> > -       if (obj->pin_display)
> > +       if (obj->pin_display) {
> > +               ret = i915_mutex_lock_interruptible(dev);
> > +               if (ret)
> > +                       goto unref;
> 
> I think an ACCESS_ONCE here and in the previous one would be good.
> Wanted to do that and merge both, but they seem to conflict with the
> lack of read-read ...

What do you want to accomplish with ACCESS_ONCE()? Unfortunately we
can't use it on bitfields.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 63/70] drm/i915: Reduce locking inside swfinish ioctl
  2015-04-15  9:03       ` Chris Wilson
@ 2015-04-15  9:33         ` Daniel Vetter
  2015-04-15  9:38           ` Chris Wilson
  0 siblings, 1 reply; 113+ messages in thread
From: Daniel Vetter @ 2015-04-15  9:33 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, intel-gfx

On Wed, Apr 15, 2015 at 10:03:56AM +0100, Chris Wilson wrote:
> On Fri, Apr 10, 2015 at 11:14:56AM +0200, Daniel Vetter wrote:
> > On Tue, Apr 7, 2015 at 6:28 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > >         /* Pinned buffers may be scanout, so flush the cache */
> > > -       if (obj->pin_display)
> > > +       if (obj->pin_display) {
> > > +               ret = i915_mutex_lock_interruptible(dev);
> > > +               if (ret)
> > > +                       goto unref;
> > 
> > > I think an ACCESS_ONCE here and in the previous one would be good.
> > Wanted to do that and merge both, but they seem to conflict with the
> > lack of read-read ...
> 
> What do you want to accomplish with ACCESS_ONCE()? Unfortunately we
> can't use it on bitfields.

Making sure that gcc doesn't reload the variable when it shuffles basic
blocks around. Admittedly unlikely. The point is not to do an atomic load
but just to ensure that we have a consistent value, no matter how it is
obtained. But I guess we need to open-code the ACCESS_ONCE with a
(volatile bool) cast.
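
For reference, ACCESS_ONCE() in kernels of this era is essentially a
volatile cast (simplified form):

	#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

It requires an addressable lvalue, which is exactly why it cannot be
applied to the pin_display bitfield directly.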

The other reason is that it serves as a nice reminder to the reader that
something tricky is going on, and that the lockless reading of
->pin_display is intentional.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 63/70] drm/i915: Reduce locking inside swfinish ioctl
  2015-04-15  9:33         ` Daniel Vetter
@ 2015-04-15  9:38           ` Chris Wilson
  0 siblings, 0 replies; 113+ messages in thread
From: Chris Wilson @ 2015-04-15  9:38 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Wed, Apr 15, 2015 at 11:33:33AM +0200, Daniel Vetter wrote:
> On Wed, Apr 15, 2015 at 10:03:56AM +0100, Chris Wilson wrote:
> > On Fri, Apr 10, 2015 at 11:14:56AM +0200, Daniel Vetter wrote:
> > > On Tue, Apr 7, 2015 at 6:28 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > >         /* Pinned buffers may be scanout, so flush the cache */
> > > > -       if (obj->pin_display)
> > > > +       if (obj->pin_display) {
> > > > +               ret = i915_mutex_lock_interruptible(dev);
> > > > +               if (ret)
> > > > +                       goto unref;
> > > 
> > > > I think an ACCESS_ONCE here and in the previous one would be good.
> > > Wanted to do that and merge both, but they seem to conflict with the
> > > lack of read-read ...
> > 
> > What do you want to accomplish with ACCESS_ONCE()? Unfortunately we
> > can't use it on bitfields.
> 
> making sure that gcc doesn't reload the variable when it shuffles basic
> blocks around. Admittedly unlikely. The point is not to do an atomic load
> but just to ensure that we have a consistent value, no matter how obtained
> so. But I guess we need to open-code the ACCESS_ONCE with a (volatile
> bool) cast.
> 
> The other reason is that it serves as a nice reminder to the reader that
> something tricky is going on, and that the lockless reading of
> ->pin_display is intentional.

Ok, that's just what I thought you wanted. We can hope that the
optimistic_read() is more friendly (presupposing Linus actually adds
it).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 33/70] drm/i915: Use a separate slab for requests
  2015-04-07 15:20 ` [PATCH 33/70] drm/i915: Use a separate slab for requests Chris Wilson
@ 2015-05-22 14:48   ` Robert Beckett
  0 siblings, 0 replies; 113+ messages in thread
From: Robert Beckett @ 2015-05-22 14:48 UTC (permalink / raw)
  To: intel-gfx

On 07/04/2015 16:20, Chris Wilson wrote:
> Requests are even more frequently allocated than objects and equally
> benefit from having a dedicated slab.
>
> v2: Rebase
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_dma.c         | 12 ++++++----
>   drivers/gpu/drm/i915/i915_drv.h         |  4 +++-
>   drivers/gpu/drm/i915/i915_gem.c         | 41 +++++++++++++++++++--------------
>   drivers/gpu/drm/i915/intel_ringbuffer.c |  1 -
>   4 files changed, 35 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 7b0109e2ab23..135fbcad367f 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1010,8 +1010,10 @@ out_regs:
>   put_bridge:
>   	pci_dev_put(dev_priv->bridge_dev);
>   free_priv:
> -	if (dev_priv->slab)
> -		kmem_cache_destroy(dev_priv->slab);
> +	if (dev_priv->requests)
> +		kmem_cache_destroy(dev_priv->requests);
> +	if (dev_priv->objects)
> +		kmem_cache_destroy(dev_priv->objects);
>   	kfree(dev_priv);
>   	return ret;
>   }
> @@ -1094,8 +1096,10 @@ int i915_driver_unload(struct drm_device *dev)
>   	if (dev_priv->regs != NULL)
>   		pci_iounmap(dev->pdev, dev_priv->regs);
>
> -	if (dev_priv->slab)
> -		kmem_cache_destroy(dev_priv->slab);
> +	if (dev_priv->requests)
> +		kmem_cache_destroy(dev_priv->requests);
> +	if (dev_priv->objects)
> +		kmem_cache_destroy(dev_priv->objects);
>
>   	pci_dev_put(dev_priv->bridge_dev);
>   	kfree(dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 600b6d4a0139..ad08aa532456 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1578,7 +1578,8 @@ struct i915_virtual_gpu {
>
>   struct drm_i915_private {
>   	struct drm_device *dev;
> -	struct kmem_cache *slab;
> +	struct kmem_cache *objects;
> +	struct kmem_cache *requests;
>
>   	const struct intel_device_info info;
>
> @@ -2070,6 +2071,7 @@ struct drm_i915_gem_request {
>   	struct kref ref;
>
>   	/** On Which ring this request was generated */
> +	struct drm_i915_private *i915;
>   	struct intel_engine_cs *ring;
>
>   	/** GEM sequence number associated with this request. */
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 1f07cd17be04..a4a62592f0f8 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -381,13 +381,13 @@ out:
>   void *i915_gem_object_alloc(struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	return kmem_cache_zalloc(dev_priv->slab, GFP_KERNEL);
> +	return kmem_cache_zalloc(dev_priv->objects, GFP_KERNEL);
>   }
>
>   void i915_gem_object_free(struct drm_i915_gem_object *obj)
>   {
>   	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
> -	kmem_cache_free(dev_priv->slab, obj);
> +	kmem_cache_free(dev_priv->objects, obj);
>   }
>
>   static int
> @@ -2633,43 +2633,45 @@ void i915_gem_request_free(struct kref *req_ref)
>   		i915_gem_context_unreference(ctx);
>   	}
>
> -	kfree(req);
> +	kmem_cache_free(req->i915->requests, req);
>   }
>
>   int i915_gem_request_alloc(struct intel_engine_cs *ring,
>   			   struct intel_context *ctx)
>   {
> +	struct drm_i915_private *dev_priv = to_i915(ring->dev);
> +	struct drm_i915_gem_request *rq;
>   	int ret;
> -	struct drm_i915_gem_request *request;
> -	struct drm_i915_private *dev_private = ring->dev->dev_private;
>
>   	if (ring->outstanding_lazy_request)
>   		return 0;
>
> -	request = kzalloc(sizeof(*request), GFP_KERNEL);
> -	if (request == NULL)
> +	rq = kmem_cache_zalloc(dev_priv->requests, GFP_KERNEL);
> +	if (rq == NULL)
>   		return -ENOMEM;
>
> -	ret = i915_gem_get_seqno(ring->dev, &request->seqno);
> +	kref_init(&rq->ref);
> +	rq->i915 = dev_priv;
> +
> +	ret = i915_gem_get_seqno(ring->dev, &rq->seqno);
>   	if (ret) {
> -		kfree(request);
> +		kmem_cache_free(dev_priv->requests, rq);
>   		return ret;
>   	}
>
> -	kref_init(&request->ref);
> -	request->ring = ring;
> -	request->uniq = dev_private->request_uniq++;
> +	rq->ring = ring;
> +	rq->uniq = dev_priv->request_uniq++;
>
>   	if (i915.enable_execlists)
> -		ret = intel_logical_ring_alloc_request_extras(request, ctx);
> +		ret = intel_logical_ring_alloc_request_extras(rq, ctx);
>   	else
> -		ret = intel_ring_alloc_request_extras(request);
> +		ret = intel_ring_alloc_request_extras(rq);
>   	if (ret) {
> -		kfree(request);
> +		kmem_cache_free(dev_priv->requests, rq);
>   		return ret;
>   	}
>
> -	ring->outstanding_lazy_request = request;
> +	ring->outstanding_lazy_request = rq;
>   	return 0;
>   }
>
> @@ -5204,11 +5206,16 @@ i915_gem_load(struct drm_device *dev)
>   	struct drm_i915_private *dev_priv = dev->dev_private;
>   	int i;
>
> -	dev_priv->slab =
> +	dev_priv->objects =
>   		kmem_cache_create("i915_gem_object",
>   				  sizeof(struct drm_i915_gem_object), 0,
>   				  SLAB_HWCACHE_ALIGN,
>   				  NULL);
> +	dev_priv->requests =
> +		kmem_cache_create("i915_gem_request",
> +				  sizeof(struct drm_i915_gem_request), 0,
> +				  SLAB_HWCACHE_ALIGN,
> +				  NULL);
>
>   	INIT_LIST_HEAD(&dev_priv->vm_list);
>   	i915_init_vm(dev_priv, &dev_priv->gtt.base);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 99a1fdff4924..bf7837d30388 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2162,7 +2162,6 @@ int intel_ring_idle(struct intel_engine_cs *ring)
>   int intel_ring_alloc_request_extras(struct drm_i915_gem_request *request)
>   {
>   	request->ringbuf = request->ring->buffer;
> -
>   	return 0;
>   }
>
>

You missed a request allocation in execlists_context_queue() in
intel_lrc.c when !request. By the look of it, that code could be changed
to use i915_gem_request_alloc, unless there is any reason not to set the
outstanding_lazy_request to the dummy request.
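
A hypothetical reshaping along those lines (illustrative only - not the
actual intel_lrc.c code, and it assumes the engine's default_context is
the right context to pass):

	if (!request) {
		ret = i915_gem_request_alloc(ring, ring->default_context);
		if (ret)
			return ret;
		request = ring->outstanding_lazy_request;
	}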


^ permalink raw reply	[flat|nested] 113+ messages in thread

* [PATCH 0/1] drm/i915: intel_ring_initialized() must be simple and inline
  2015-04-07 15:21 ` [PATCH 57/70] drm/i915: intel_ring_initialized() must be simple and inline Chris Wilson
@ 2015-12-08 15:02   ` Dave Gordon
  2015-12-08 15:02     ` [PATCH 1/1] " Dave Gordon
  0 siblings, 1 reply; 113+ messages in thread
From: Dave Gordon @ 2015-12-08 15:02 UTC (permalink / raw)
  To: intel-gfx

Based on Chris Wilson's patch from 6 months ago, rebased and adapted.

The idea is to use ring->dev as an indicator showing which engines have
been initialised and are therefore to be included in iterations that use
for_each_ring(). This allows us to avoid multiple memory references and
a (non-inlined) function call on each iteration of each such loop.

This version differs from Chris' primarily in the error cleanup paths,
where initialisation has failed and we therefore want to mark an engine
as NOT initialised. I have made the ring_cleanup() functions callable from
the failure path of the ring_init() code, rather than duplicating all the
steps to tear down a partially-constructed state. This also increases
symmetry; ring->dev is set at the start of ring_init, and cleared at the
end of ring_cleanup, in both the normal and error cases.



^ permalink raw reply	[flat|nested] 113+ messages in thread

* [PATCH 1/1] drm/i915: intel_ring_initialized() must be simple and inline
  2015-12-08 15:02   ` [PATCH 0/1] " Dave Gordon
@ 2015-12-08 15:02     ` Dave Gordon
  2015-12-10 10:24       ` Daniel Vetter
  0 siblings, 1 reply; 113+ messages in thread
From: Dave Gordon @ 2015-12-08 15:02 UTC (permalink / raw)
  To: intel-gfx

Based on Chris Wilson's patch from 6 months ago, rebased and adapted.

The current implementation of intel_ring_initialized() is too heavyweight;
it's a non-inlined function that chases several levels of pointers. This
wouldn't matter too much if it were rarely called, but it's used inside
the iterator test of for_each_ring() and is therefore called quite
frequently. So let's make it simple and inline ...

The idea here is to use ring->dev as an indicator showing which engines
have been initialised and are therefore to be included in iterations that
use for_each_ring(). This allows us to avoid multiple memory references
and a (non-inlined) function call on each iteration of each such loop.
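
For context, the iterator consuming this predicate looks roughly like
this (paraphrased from i915_drv.h of this era, not quoted verbatim):

#define for_each_ring(ring__, dev_priv__, i__) \
	for ((i__) = 0; (i__) < I915_NUM_RINGS; (i__)++) \
		if (((ring__) = &(dev_priv__)->ring[(i__)]), \
		    intel_ring_initialized((ring__)))

With ring->dev as the flag, the per-iteration test collapses to a
single inlined pointer check.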

	Fixes regression from
	commit 48d823878d64f93163f5a949623346748bbce1b4
	Author: Oscar Mateo <oscar.mateo@intel.com>
	Date:   Thu Jul 24 17:04:23 2014 +0100

	    drm/i915/bdw: Generic logical ring init and cleanup

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 17 +++++++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.c | 39 +++++++++++----------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  6 ++++-
 3 files changed, 30 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 4ebafab..7644c48 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1894,8 +1894,10 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 
 	dev_priv = ring->dev->dev_private;
 
-	intel_logical_ring_stop(ring);
-	WARN_ON((I915_READ_MODE(ring) & MODE_IDLE) == 0);
+	if (ring->buffer) {
+		intel_logical_ring_stop(ring);
+		WARN_ON((I915_READ_MODE(ring) & MODE_IDLE) == 0);
+	}
 
 	if (ring->cleanup)
 		ring->cleanup(ring);
@@ -1909,6 +1911,7 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 	}
 
 	lrc_destroy_wa_ctx_obj(ring);
+	ring->dev = NULL;
 }
 
 static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *ring)
@@ -1931,11 +1934,11 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 
 	ret = i915_cmd_parser_init_ring(ring);
 	if (ret)
-		return ret;
+		goto error;
 
 	ret = intel_lr_context_deferred_alloc(ring->default_context, ring);
 	if (ret)
-		return ret;
+		goto error;
 
 	/* As this is the default context, always pin it */
 	ret = intel_lr_context_do_pin(
@@ -1946,9 +1949,13 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 		DRM_ERROR(
 			"Failed to pin and map ringbuffer %s: %d\n",
 			ring->name, ret);
-		return ret;
+		goto error;
 	}
 
+	return 0;
+
+error:
+	intel_logical_ring_cleanup(ring);
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 57d78f2..921c8a6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -33,23 +33,6 @@
 #include "i915_trace.h"
 #include "intel_drv.h"
 
-bool
-intel_ring_initialized(struct intel_engine_cs *ring)
-{
-	struct drm_device *dev = ring->dev;
-
-	if (!dev)
-		return false;
-
-	if (i915.enable_execlists) {
-		struct intel_context *dctx = ring->default_context;
-		struct intel_ringbuffer *ringbuf = dctx->engine[ring->id].ringbuf;
-
-		return ringbuf->obj;
-	} else
-		return ring->buffer && ring->buffer->obj;
-}
-
 int __intel_ring_space(int head, int tail, int size)
 {
 	int space = head - tail;
@@ -2167,8 +2150,10 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	init_waitqueue_head(&ring->irq_queue);
 
 	ringbuf = intel_engine_create_ringbuffer(ring, 32 * PAGE_SIZE);
-	if (IS_ERR(ringbuf))
-		return PTR_ERR(ringbuf);
+	if (IS_ERR(ringbuf)) {
+		ret = PTR_ERR(ringbuf);
+		goto error;
+	}
 	ring->buffer = ringbuf;
 
 	if (I915_NEED_GFX_HWS(dev)) {
@@ -2197,8 +2182,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	return 0;
 
 error:
-	intel_ringbuffer_free(ringbuf);
-	ring->buffer = NULL;
+	intel_cleanup_ring_buffer(ring);
 	return ret;
 }
 
@@ -2211,12 +2195,14 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring)
 
 	dev_priv = to_i915(ring->dev);
 
-	intel_stop_ring_buffer(ring);
-	WARN_ON(!IS_GEN2(ring->dev) && (I915_READ_MODE(ring) & MODE_IDLE) == 0);
+	if (ring->buffer) {
+		intel_stop_ring_buffer(ring);
+		WARN_ON(!IS_GEN2(ring->dev) && (I915_READ_MODE(ring) & MODE_IDLE) == 0);
 
-	intel_unpin_ringbuffer_obj(ring->buffer);
-	intel_ringbuffer_free(ring->buffer);
-	ring->buffer = NULL;
+		intel_unpin_ringbuffer_obj(ring->buffer);
+		intel_ringbuffer_free(ring->buffer);
+		ring->buffer = NULL;
+	}
 
 	if (ring->cleanup)
 		ring->cleanup(ring);
@@ -2225,6 +2211,7 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring)
 
 	i915_cmd_parser_fini_ring(ring);
 	i915_gem_batch_pool_fini(&ring->batch_pool);
+	ring->dev = NULL;
 }
 
 static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 5d1eb20..49574ff 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -350,7 +350,11 @@ struct  intel_engine_cs {
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
 };
 
-bool intel_ring_initialized(struct intel_engine_cs *ring);
+static inline bool
+intel_ring_initialized(struct intel_engine_cs *ring)
+{
+	return ring->dev != NULL;
+}
 
 static inline unsigned
 intel_ring_flag(struct intel_engine_cs *ring)
-- 
1.9.1



^ permalink raw reply related	[flat|nested] 113+ messages in thread

* Re: [PATCH 1/1] drm/i915: intel_ring_initialized() must be simple and inline
  2015-12-08 15:02     ` [PATCH 1/1] " Dave Gordon
@ 2015-12-10 10:24       ` Daniel Vetter
  0 siblings, 0 replies; 113+ messages in thread
From: Daniel Vetter @ 2015-12-10 10:24 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Tue, Dec 08, 2015 at 03:02:36PM +0000, Dave Gordon wrote:
> Based on Chris Wilson's patch from 6 months ago, rebased and adapted.
> 
> The current implementation of intel_ring_initialized() is too heavyweight;
> it's a non-inlined function that chases several levels of pointers. This
> wouldn't matter too much if it were rarely called, but it's used inside
> the iterator test of for_each_ring() and is therefore called quite
> frequently. So let's make it simple and inline ...
> 
> The idea here is to use ring->dev as an indicator showing which engines
> have been initialised and are therefore to be included in iterations that
> use for_each_ring(). This allows us to avoid multiple memory references
> and a (non-inlined) function call on each iteration of each such loop.
> 
> 	Fixes regression from
> 	commit 48d823878d64f93163f5a949623346748bbce1b4
> 	Author: Oscar Mateo <oscar.mateo@intel.com>
> 	Date:   Thu Jul 24 17:04:23 2014 +0100
> 
> 	    drm/i915/bdw: Generic logical ring init and cleanup
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>

Queued for -next, thanks for the patch.
-Daniel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 113+ messages in thread

end of thread, other threads:[~2015-12-10 10:24 UTC | newest]

Thread overview: 113+ messages
-- links below jump to the message on this page --
2015-04-07 15:20 Low hanging fruit take 2 Chris Wilson
2015-04-07 15:20 ` [PATCH 01/70] drm/i915: Cache last obj->pages location for i915_gem_object_get_page() Chris Wilson
2015-04-08 11:16   ` Daniel Vetter
2015-04-07 15:20 ` [PATCH 02/70] drm/i915: Fix the flip synchronisation to consider mmioflips Chris Wilson
2015-04-07 15:20 ` [PATCH 03/70] drm/i915: Ensure cache flushes prior to doing CS flips Chris Wilson
2015-04-08 11:23   ` Daniel Vetter
2015-04-08 11:29     ` Chris Wilson
2015-04-07 15:20 ` [PATCH 04/70] drm/i915: Agressive downclocking on Baytrail Chris Wilson
2015-04-07 15:20 ` [PATCH 05/70] drm/i915: Fix computation of last_adjustment for RPS autotuning Chris Wilson
2015-04-07 15:20 ` [PATCH 06/70] drm/i915: Fix race on unreferencing the wrong mmio-flip-request Chris Wilson
2015-04-08 11:30   ` Daniel Vetter
2015-04-07 15:20 ` [PATCH 07/70] drm/i915: Boost GPU frequency if we detect outstanding pageflips Chris Wilson
2015-04-08 11:31   ` Daniel Vetter
2015-04-07 15:20 ` [PATCH 08/70] drm/i915: Deminish contribution of wait-boosting from clients Chris Wilson
2015-04-07 15:20 ` [PATCH 09/70] drm/i915: Re-enable RPS wait-boosting for all engines Chris Wilson
2015-04-07 15:20 ` [PATCH 10/70] drm/i915: Split i915_gem_batch_pool into its own header Chris Wilson
2015-04-07 15:20 ` [PATCH 11/70] drm/i915: Tidy batch pool logic Chris Wilson
2015-04-07 15:20 ` [PATCH 12/70] drm/i915: Split the batch pool by engine Chris Wilson
2015-04-07 15:20 ` [PATCH 13/70] drm/i915: Free batch pool when idle Chris Wilson
2015-04-07 15:20 ` [PATCH 14/70] drm/i915: Split batch pool into size buckets Chris Wilson
2015-04-07 15:20 ` [PATCH 15/70] drm/i915: Include active flag when describing objects in debugfs Chris Wilson
2015-04-08 11:33   ` Daniel Vetter
2015-04-07 15:20 ` [PATCH 16/70] drm/i915: Suppress empty lines from debugfs/i915_gem_objects Chris Wilson
2015-04-08 11:34   ` Daniel Vetter
2015-04-07 15:20 ` [PATCH 17/70] drm/i915: Optimistically spin for the request completion Chris Wilson
2015-04-08 11:39   ` Daniel Vetter
2015-04-08 13:43     ` Rantala, Valtteri
2015-04-08 14:15       ` Daniel Vetter
2015-04-13 11:34   ` Tvrtko Ursulin
2015-04-13 12:25     ` Daniel Vetter
2015-04-07 15:20 ` [PATCH 18/70] drm/i915: Implement inter-engine read-read optimisations Chris Wilson
2015-04-14 13:51   ` Tvrtko Ursulin
2015-04-14 14:00     ` Chris Wilson
2015-04-07 15:20 ` [PATCH 19/70] drm/i915: Inline check required for object syncing prior to execbuf Chris Wilson
2015-04-07 15:20 ` [PATCH 20/70] drm/i915: Limit ring synchronisation (sw sempahores) RPS boosts Chris Wilson
2015-04-08 11:46   ` Daniel Vetter
2015-04-07 15:20 ` [PATCH 21/70] drm/i915: Limit mmio flip " Chris Wilson
2015-04-07 15:20 ` [PATCH 22/70] drm/i915: Reduce frequency of unspecific HSW reg debugging Chris Wilson
2015-04-07 15:20 ` [PATCH 23/70] drm/i915: Record ring->start address in error state Chris Wilson
2015-04-08 11:47   ` Daniel Vetter
2015-04-07 15:20 ` [PATCH 24/70] drm/i915: Use simpler form of spin_lock_irq(execlist_lock) Chris Wilson
2015-04-07 15:20 ` [PATCH 25/70] drm/i915: Use the global runtime-pm wakelock for a busy GPU for execlists Chris Wilson
2015-04-07 15:20 ` [PATCH 26/70] drm/i915: Map the execlists context regs once during pinning Chris Wilson
2015-04-07 15:20 ` [PATCH 27/70] drm/i915: Remove vestigal DRI1 ring quiescing code Chris Wilson
2015-04-09 15:02   ` Daniel Vetter
2015-04-09 15:24     ` Chris Wilson
2015-04-09 15:31       ` Daniel Vetter
2015-04-07 15:20 ` [PATCH 28/70] drm/i915: Overhaul execlist submission Chris Wilson
2015-04-07 15:20 ` [PATCH 29/70] drm/i915: Move the execlists retirement to the right spot Chris Wilson
2015-04-07 15:20 ` [PATCH 30/70] drm/i915: Map the ringbuffer using WB on LLC machines Chris Wilson
2015-04-07 15:20 ` [PATCH 31/70] drm/i915: Refactor duplicate object vmap functions Chris Wilson
2015-04-07 15:20 ` [PATCH 32/70] drm/i915: Treat ringbuffer writes as write to normal memory Chris Wilson
2015-04-07 15:20 ` [PATCH 33/70] drm/i915: Use a separate slab for requests Chris Wilson
2015-05-22 14:48   ` Robert Beckett
2015-04-07 15:20 ` [PATCH 34/70] drm/i915: Use a separate slab for vmas Chris Wilson
2015-04-10  8:32   ` Daniel Vetter
2015-04-07 15:20 ` [PATCH 35/70] drm/i915: Use the new rq->i915 field where appropriate Chris Wilson
2015-04-07 15:21 ` [PATCH 36/70] drm/i915: Reduce the pointer dance of i915_is_ggtt() Chris Wilson
2015-04-07 15:21 ` [PATCH 37/70] drm/i915: Squash more pointer indirection for i915_is_gtt Chris Wilson
2015-04-07 15:21 ` [PATCH 38/70] drm/i915: Reduce locking in execlist command submission Chris Wilson
2015-04-07 15:21 ` [PATCH 39/70] drm/i915: Reduce more " Chris Wilson
2015-04-07 15:21 ` [PATCH 40/70] drm/i915: Reduce locking in gen8 IRQ handler Chris Wilson
2015-04-07 15:21 ` [PATCH 41/70] drm/i915: Tidy " Chris Wilson
2015-04-10  8:36   ` Daniel Vetter
2015-04-07 15:21 ` [PATCH 42/70] drm/i915: Remove request retirement before each batch Chris Wilson
2015-04-07 15:21 ` [PATCH 43/70] drm/i915: Cache the GGTT offset for the execlists context Chris Wilson
2015-04-07 15:21 ` [PATCH 44/70] drm/i915: Prefer to check for idleness in worker rather than sync-flush Chris Wilson
2015-04-10  8:37   ` Daniel Vetter
2015-04-07 15:21 ` [PATCH 45/70] drm/i915: Remove request->uniq Chris Wilson
2015-04-10  8:38   ` Daniel Vetter
2015-04-07 15:21 ` [PATCH 46/70] drm/i915: Cache the reset_counter for the request Chris Wilson
2015-04-07 15:21 ` [PATCH 47/70] drm/i915: Allocate context objects from stolen Chris Wilson
2015-04-10  8:39   ` Daniel Vetter
2015-04-07 15:21 ` [PATCH 48/70] drm/i915: Introduce an internal allocator for disposable private objects Chris Wilson
2015-04-07 15:21 ` [PATCH 49/70] drm/i915: Do not zero initialise page tables Chris Wilson
2015-04-07 15:21 ` [PATCH 50/70] drm/i915: The argument for postfix is redundant Chris Wilson
2015-04-10  8:53   ` Daniel Vetter
2015-04-10  9:00     ` Chris Wilson
2015-04-10  9:32       ` Daniel Vetter
2015-04-10  9:45         ` Chris Wilson
2015-04-07 15:21 ` [PATCH 51/70] drm/i915: Record the position of the start of the request Chris Wilson
2015-04-07 15:21 ` [PATCH 52/70] drm/i915: Cache the execlist ctx descriptor Chris Wilson
2015-04-07 15:21 ` [PATCH 53/70] drm/i915: Eliminate vmap overhead for cmd parser Chris Wilson
2015-04-07 15:21 ` [PATCH 54/70] drm/i915: Cache last cmd descriptor when parsing Chris Wilson
2015-04-07 15:21 ` [PATCH 55/70] drm/i915: Use WC copies on !llc platforms for the command parser Chris Wilson
2015-04-07 15:21 ` [PATCH 56/70] drm/i915: Cache kmap between relocations Chris Wilson
2015-04-07 15:21 ` [PATCH 57/70] drm/i915: intel_ring_initialized() must be simple and inline Chris Wilson
2015-12-08 15:02   ` [PATCH 0/1] " Dave Gordon
2015-12-08 15:02     ` [PATCH 1/1] " Dave Gordon
2015-12-10 10:24       ` Daniel Vetter
2015-04-07 15:21 ` [PATCH 58/70] drm/i915: Before shrink_all we only need to idle the GPU Chris Wilson
2015-04-07 15:21 ` [PATCH 59/70] drm/i915: Simplify object is-pinned checking for shrinker Chris Wilson
2015-04-07 16:28 ` Chris Wilson
2015-04-07 16:28   ` [PATCH 60/70] drm/i915: Make evict-everything more robust Chris Wilson
2015-04-07 16:28   ` [PATCH 61/70] drm/i915: Make fb_tracking.lock a spinlock Chris Wilson
2015-04-14 14:52     ` Tvrtko Ursulin
2015-04-14 15:05       ` Chris Wilson
2015-04-14 15:15         ` Tvrtko Ursulin
2015-04-07 16:28   ` [PATCH 62/70] drm/i915: Reduce locking inside busy ioctl Chris Wilson
2015-04-07 16:28   ` [PATCH 63/70] drm/i915: Reduce locking inside swfinish ioctl Chris Wilson
2015-04-10  9:14     ` Daniel Vetter
2015-04-15  9:03       ` Chris Wilson
2015-04-15  9:33         ` Daniel Vetter
2015-04-15  9:38           ` Chris Wilson
2015-04-07 16:28   ` [PATCH 64/70] drm/i915: Remove pinned check from madvise ioctl Chris Wilson
2015-04-07 16:28   ` [PATCH 65/70] drm/i915: Reduce locking for gen6+ GT interrupts Chris Wilson
2015-04-07 16:28   ` [PATCH 66/70] drm/i915: Remove obj->pin_mappable Chris Wilson
2015-04-13 11:35     ` Tvrtko Ursulin
2015-04-13 12:30       ` Daniel Vetter
2015-04-07 16:28   ` [PATCH 67/70] drm/i915: Start passing around i915_vma from execbuffer Chris Wilson
2015-04-07 16:28   ` [PATCH 68/70] drm/i915: Simplify vma-walker for i915_gem_objects Chris Wilson
2015-04-07 16:28   ` [PATCH 69/70] drm/i915: Skip holding an object reference for execbuf preparation Chris Wilson
2015-04-07 16:28   ` [PATCH 70/70] drm/i915: Use vma as the primary token for managing binding Chris Wilson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.